Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4550
Julie A. Jacko (Ed.)
Human-Computer Interaction
Interaction Design and Usability
12th International Conference, HCI International 2007
Beijing, China, July 22-27, 2007
Proceedings, Part I
Volume Editor
Julie A. Jacko
Georgia Institute of Technology and Emory University School of Medicine
901 Atlantic Drive, Suite 4100, Atlanta, GA 30332-0477, USA
Library of Congress Control Number: 2007929779
CR Subject Classification (1998): H.5.2, H.5.3, H.3-5, C.2, I.3, D.2, F.3, K.4.2
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN 0302-9743
ISBN-10 3-540-73104-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-73104-7 Springer Berlin Heidelberg New York
Foreword

The 12th International Conference on Human-Computer Interaction, HCI International 2007, was held in Beijing, P.R. China, 22-27 July 2007, jointly with the Symposium on Human Interface (Japan) 2007, the 7th International Conference on Engineering Psychology and Cognitive Ergonomics, the 4th International Conference on Universal Access in Human-Computer Interaction, the 2nd International Conference on Virtual Reality, the 2nd International Conference on Usability and Internationalization, the 2nd International Conference on Online Communities and Social Computing, the 3rd International Conference on Augmented Cognition, and the 1st International Conference on Digital Human Modeling.

A total of 3403 individuals from academia, research institutes, industry and governmental agencies from 76 countries submitted contributions, and 1681 papers, judged to be of high scientific quality, were included in the program. These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of Human-Computer Interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas.

This volume, edited by Julie A. Jacko, contains papers in the thematic area of Human-Computer Interaction, addressing the following major topics:

• Interaction Design: Theoretical Issues, Methods, Techniques and Practice
• Usability and Evaluation Methods and Tools
• Understanding Users and Contexts of Use
• Models and Patterns in HCI

The remaining volumes of the HCI International 2007 proceedings are:
• Volume 2, LNCS 4551, Interaction Platforms and Techniques, edited by Julie A. Jacko
• Volume 3, LNCS 4552, HCI Intelligent Multimodal Interaction Environments, edited by Julie A. Jacko
• Volume 4, LNCS 4553, HCI Applications and Services, edited by Julie A. Jacko
• Volume 5, LNCS 4554, Coping with Diversity in Universal Access, edited by Constantine Stephanidis
• Volume 6, LNCS 4555, Universal Access to Ambient Interaction, edited by Constantine Stephanidis
• Volume 7, LNCS 4556, Universal Access to Applications and Services, edited by Constantine Stephanidis
• Volume 8, LNCS 4557, Methods, Techniques and Tools in Information Design, edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 4558, Interacting in Information Environments, edited by Michael J. Smith and Gavriel Salvendy
• Volume 10, LNCS 4559, HCI and Culture, edited by Nuray Aykin
• Volume 11, LNCS 4560, Global and Local User Interfaces, edited by Nuray Aykin
• Volume 12, LNCS 4561, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 13, LNAI 4562, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
• Volume 14, LNCS 4563, Virtual Reality, edited by Randall Shumaker
• Volume 15, LNCS 4564, Online Communities and Social Computing, edited by Douglas Schuler
• Volume 16, LNAI 4565, Foundations of Augmented Cognition 3rd Edition, edited by Dylan D. Schmorrow and Leah M. Reeves
• Volume 17, LNCS 4566, Ergonomics and Health Aspects of Work with Computers, edited by Marvin J. Dainoff

I would like to thank the Program Chairs and the members of the Program Boards of all Thematic Areas, listed below, for their contribution to the highest scientific quality and the overall success of the HCI International 2007 Conference.
Ergonomics and Health Aspects of Work with Computers
Program Chair: Marvin J. Dainoff
Arne Aaras, Norway
Pascale Carayon, USA
Barbara G.F. Cohen, USA
Wolfgang Friesdorf, Germany
Martin Helander, Singapore
Ben-Tzion Karsh, USA
Waldemar Karwowski, USA
Peter Kern, Germany
Danuta Koradecka, Poland
Kari Lindstrom, Finland
Holger Luczak, Germany
Aura C. Matias, Philippines
Kyung (Ken) Park, Korea
Michelle Robertson, USA
Steven L. Sauter, USA
Dominique L. Scapin, France
Michael J. Smith, USA
Naomi Swanson, USA
Peter Vink, The Netherlands
John Wilson, UK
Human Interface and the Management of Information
Program Chair: Michael J. Smith
Lajos Balint, Hungary
Gunilla Bradley, Sweden
Hans-Jörg Bullinger, Germany
Alan H.S. Chan, Hong Kong
Klaus-Peter Fähnrich, Germany
Michitaka Hirose, Japan
Yoshinori Horie, Japan
Richard Koubek, USA
Yasufumi Kume, Japan
Mark Lehto, USA
Jiye Mao, P.R. China
Fiona Nah, USA
Shogo Nishida, Japan
Leszek Pacholski, Poland
Robert Proctor, USA
Youngho Rhee, Korea
Anxo Cereijo Roibás, UK
Francois Sainfort, USA
Katsunori Shimohara, Japan
Tsutomu Tabe, Japan
Alvaro Taveira, USA
Kim-Phuong L. Vu, USA
Tomio Watanabe, Japan
Sakae Yamamoto, Japan
Hidekazu Yoshikawa, Japan
Li Zheng, P.R. China
Bernhard Zimolong, Germany
Human-Computer Interaction
Program Chair: Julie A. Jacko
Sebastiano Bagnara, Italy
Jianming Dong, USA
John Eklund, Australia
Xiaowen Fang, USA
Sheue-Ling Hwang, Taiwan
Yong Gu Ji, Korea
Steven J. Landry, USA
Jonathan Lazar, USA
V. Kathlene Leonard, USA
Chang S. Nam, USA
Anthony F. Norcio, USA
Celestine A. Ntuen, USA
P.L. Patrick Rau, P.R. China
Andrew Sears, USA
Holly Vitense, USA
Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics
Program Chair: Don Harris
Kenneth R. Boff, USA
Guy Boy, France
Pietro Carlo Cacciabue, Italy
Judy Edworthy, UK
Erik Hollnagel, Sweden
Kenji Itoh, Japan
Peter G.A.M. Jorna, The Netherlands
Kenneth R. Laughery, USA
Nicolas Marmaras, Greece
David Morrison, Australia
Sundaram Narayanan, USA
Eduardo Salas, USA
Dirk Schaefer, France
Axel Schulte, Germany
Neville A. Stanton, UK
Andrew Thatcher, South Africa
Universal Access in Human-Computer Interaction
Program Chair: Constantine Stephanidis
Julio Abascal, Spain
Ray Adams, UK
Elizabeth Andre, Germany
Margherita Antona, Greece
Chieko Asakawa, Japan
Christian Bühler, Germany
Noelle Carbonell, France
Jerzy Charytonowicz, Poland
Pier Luigi Emiliani, Italy
Michael Fairhurst, UK
Gerhard Fischer, USA
Jon Gunderson, USA
Andreas Holzinger, Austria
Arthur Karshmer, USA
Simeon Keates, USA
George Kouroupetroglou, Greece
Jonathan Lazar, USA
Seongil Lee, Korea
Zhengjie Liu, P.R. China
Klaus Miesenberger, Austria
John Mylopoulos, Canada
Michael Pieper, Germany
Angel Puerta, USA
Anthony Savidis, Greece
Andrew Sears, USA
Ben Shneiderman, USA
Christian Stary, Austria
Hirotada Ueda, Japan
Jean Vanderdonckt, Belgium
Gregg Vanderheiden, USA
Gerhard Weber, Germany
Harald Weber, Germany
Toshiki Yamaoka, Japan
Mary Zajicek, UK
Panayiotis Zaphiris, UK
Virtual Reality
Program Chair: Randall Shumaker
Terry Allard, USA
Pat Banerjee, USA
Robert S. Kennedy, USA
Heidi Kroemker, Germany
Ben Lawson, USA
Ming Lin, USA
Bowen Loftin, USA
Holger Luczak, Germany
Annie Luciani, France
Gordon Mair, UK
Ulrich Neumann, USA
Albert "Skip" Rizzo, USA
Lawrence Rosenblum, USA
Dylan Schmorrow, USA
Kay Stanney, USA
Susumu Tachi, Japan
John Wilson, UK
Wei Zhang, P.R. China
Michael Zyda, USA
Usability and Internationalization
Program Chair: Nuray Aykin
Genevieve Bell, USA
Alan Chan, Hong Kong
Apala Lahiri Chavan, India
Jori Clarke, USA
Pierre-Henri Dejean, France
Susan Dray, USA
Paul Fu, USA
Emilie Gould, Canada
Sung H. Han, South Korea
Veikko Ikonen, Finland
Richard Ishida, UK
Esin Kiris, USA
Tobias Komischke, Germany
Masaaki Kurosu, Japan
James R. Lewis, USA
Rungtai Lin, Taiwan
Aaron Marcus, USA
Allen E. Milewski, USA
Patrick O'Sullivan, Ireland
Girish V. Prabhu, India
Kerstin Röse, Germany
Eunice Ratna Sari, Indonesia
Supriya Singh, Australia
Serengul Smith, UK
Denise Spacinsky, USA
Christian Sturm, Mexico
Adi B. Tedjasaputra, Singapore
Myung Hwan Yun, South Korea
Chen Zhao, P.R. China
Online Communities and Social Computing
Program Chair: Douglas Schuler
Chadia Abras, USA
Lecia Barker, USA
Amy Bruckman, USA
Peter van den Besselaar, The Netherlands
Peter Day, UK
Fiorella De Cindio, Italy
John Fung, P.R. China
Michael Gurstein, USA
Tom Horan, USA
Piet Kommers, The Netherlands
Jonathan Lazar, USA
Stefanie Lindstaedt, Austria
Diane Maloney-Krichmar, USA
Isaac Mao, P.R. China
Hideyuki Nakanishi, Japan
A. Ant Ozok, USA
Jennifer Preece, USA
Partha Pratim Sarker, Bangladesh
Gilson Schwartz, Brazil
Sergei Stafeev, Russia
F.F. Tusubira, Uganda
Cheng-Yen Wang, Taiwan
Augmented Cognition
Program Chair: Dylan D. Schmorrow
Kenneth Boff, USA
Joseph Cohn, USA
Blair Dickson, UK
Henry Girolamo, USA
Gerald Edelman, USA
Eric Horvitz, USA
Wilhelm Kincses, Germany
Amy Kruse, USA
Lee Kollmorgen, USA
Dennis McBride, USA
Jeffrey Morrison, USA
Denise Nicholson, USA
Dennis Proffitt, USA
Harry Shum, P.R. China
Kay Stanney, USA
Roy Stripling, USA
Michael Swetnam, USA
Robert Taylor, UK
John Wagner, USA
Digital Human Modeling
Program Chair: Vincent G. Duffy
Norm Badler, USA
Heiner Bubb, Germany
Don Chaffin, USA
Kathryn Cormican, Ireland
Andris Freivalds, USA
Ravindra Goonetilleke, Hong Kong
Anand Gramopadhye, USA
Sung H. Han, South Korea
Pheng Ann Heng, Hong Kong
Dewen Jin, P.R. China
Kang Li, USA
Zhizhong Li, P.R. China
Lizhuang Ma, P.R. China
Timo Maatta, Finland
J. Mark Porter, UK
Jim Potvin, Canada
Jean-Pierre Verriest, France
Zhaoqi Wang, P.R. China
Xiugan Yuan, P.R. China
Shao-Xiang Zhang, P.R. China
Xudong Zhang, USA
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Kelly Hale, David Kobus, Amy Kruse, Cali Fidopiastis and Karl Van Orden from the USA, Mark Neerincx and Marc Grootjen from the Netherlands, Wilhelm Kincses from Germany, Ganesh Bhutkar and Mathura Prasad from India, Frederick Li from the UK, and Dimitris Grammenos, Angeliki Kastrinaki, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece.

This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem, and of the members of the Organizational Board from P.R. China, Patrick Rau (Chair), Bo Chen, Xiaolan Fu, Zhibin Jiang, Congdong Li, Zhenjie Liu, Mowei Shen, Yuanchun Shi, Hui Su, Linyang Sun, Ming Po Tham, Ben Tsiang, Jian Wang, Guangyou Xu, Winnie Wanli Yang, Shuping Yi, Kan Zhang, and Wei Zho.

I would also like to thank the members of the Human Computer Interaction Laboratory of ICS-FORTH for their contribution towards the organization of the HCI International 2007 Conference, in particular Margherita Antona, Maria Pitsoulaki, George Paparoulis, Maria Bouhli, Stavroula Ntoa and George Margetis.
Constantine Stephanidis General Chair, HCI International 2007
HCI International 2009
The 13th International Conference on Human-Computer Interaction, HCI International 2009, will be held jointly with the affiliated Conferences in San Diego, California, USA, in the Town and Country Resort & Convention Center, 19-24 July 2009. It will cover a broad spectrum of themes related to Human Computer Interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. For more information, please visit the Conference website: http://www.hcii2009.org/
General Chair
Professor Constantine Stephanidis
ICS-FORTH and University of Crete
Heraklion, Crete, Greece
Table of Contents

Part 1: Interaction Design: Theoretical Issues, Methods, Techniques and Practice

Design Principles Based on Cognitive Aging
Hiroko Akatsu, Hiroyuki Miki, and Naotsune Hosono

Redesigning the Rationale for Design Rationale
Michael E. Atwood and John Horner

HCI and the Face: Towards an Art of the Soluble
Christoph Bartneck and Michael J. Lyons

Towards Generic Interaction Styles for Product Design
Jacob Buur and Marcelle Stienstra

Context-Centered Design: Bridging the Gap Between Understanding and Designing
Yunan Chen and Michael E. Atwood

Application of Micro-Scenario Method (MSM) to User Research for the Motorcycle’s Informatization - A Case Study for the Information Support System for Safety
Hiroshi Daimoto, Sachiyo Araki, Masamitsu Mizuno, and Masaaki Kurosu

A New User-Centered Design Process for Creating New Value and Future
Yasuhisa Itoh, Yoko Hirose, Hideaki Takahashi, and Masaaki Kurosu

The Evasive Interface – The Changing Concept of Interface and the Varying Role of Symbols in Human–Computer Interaction
Lars-Erik Janlert

An Ignored Factor of User Experience: FEEDBACK-QUALITY
Ji Hong and Jiang Xubo

10 Heuristics for Designing Administrative User Interfaces – A Collaboration Between Ethnography, Design, and Engineering
Luke Kowalski and Kristyn Greenwood

Micro-Scenario Database for Substantializing the Collaboration Between Human Science and Engineering
Masaaki Kurosu, Kentaro Go, Naoki Hirasawa, and Hideaki Kasai

A Meta-cognition Modeling of Engineering Product Designer in the Process of Product Design
Jun Liang, Zu-Hua Jiang, Yun-Song Zhao, and Jin-Lian Wang

User Oriented Design to the Chinese Industries Scenario and Experience Innovation Design Approach for the Industrializing Countries in the Digital Technology Era
You Zhao Liang, Ding Hau Huang, and Wen Ko Chiou

Axiomatic Design Approach for E-Commercial Web Sites
Mehmet Mutlu Yenisey

Development of Quantitative Metrics to Support UI Designer Decision-Making in the Design Process
Young Sik Yoon and Wan Chul Yoon

Scenario-Based Product Design, a Real Case
Der-Jang Yu and Huey-Jiuan Yeh

Long Term Usability; Its Concept and Research Approach - The Origin of the Positive Feeling Toward the Product
Masaya Ando and Masaaki Kurosu

General Interaction Expertise: An Approach for Sampling in Usability Testing of Consumer Products
Ali Emre Berkman

Are Guidelines and Standards for Web Usability Comprehensive?
Nigel Bevan and Lonneke Spinhof

A Game to Promote Understanding About UCD Methods and Process
Muriel Garreta-Domingo, Magí Almirall-Hill, and Enric Mor

DEPTH TOOLKIT: A Web-Based Tool for Designing and Executing Usability Evaluations of E-Sites Based on Design Patterns
Petros Georgiakakis, Symeon Retalis, Yannis Psaromiligkos, and George Papadimitriou

Evaluator of User’s Actions (Eua) Using the Model of Abstract Representation Dgaui
Susana Gómez-Carnero and Javier Rodeiro Iglesias

Adaptive Evaluation Strategy Based on Surrogate Model
Yi-nan Guo, Dun-wei Gong, and Hui Wang

A Study on the Improving Product Usability Applying the Kano’s Model of Customer Satisfaction
Jeongyun Heo, Sanhyun Park, and Chiwon Song

The Practices of Usability Analysis to Wireless Facility Controller for Conference Room
Ding Hau Huang, You Zhao Liang, and Wen Ko Chiou

What Makes Evaluators to Find More Usability Problems?: A Meta-analysis for Individual Detection Rates
Wonil Hwang and Gavriel Salvendy

Evaluating in a Healthcare Setting: A Comparison Between Concurrent and Retrospective Verbalisation
Janne Jul Jensen

Development of AHP Model for Telematics Haptic Interface Evaluation
Yong Gu Ji, Beom Suk Jin, Jae Seung Mun, and Sang Min Ko

Why It Is Difficult to Use a Simple Device: An Analysis of a Room Thermostat
Sami Karjalainen

Usability Improvements for WLAN Access
Kristiina Karvonen and Janne Lindqvist

A New Framework of Measuring the Business Values of Software
In Ki Kim, Beom Suk Jin, Seungyup Baek, Andrew Kim, Yong Gu Ji, and Myung Hwan Yun

Evaluating Usability Evaluation Methods: Criteria, Method and a Case Study
P. Koutsabasis, T. Spyrou, and J. Darzentas

How to Use Emotional Usability to Make the Product Serves a Need Beyond the Traditional Functional Objective to Satisfy the Emotion Needs of the User in Order to Improve the Product Differentiator - Focus on Home Appliance Product
Liu Ning and Shang Ting

Towards Remote Empirical Evaluation of Web Pages’ Usability
Juan Miguel López, Inmaculada Fajardo, and Julio Abascal

Mixing Evaluation Methods for Assessing the Utility of an Interactive InfoVis Technique
Markus Rester, Margit Pohl, Sylvia Wiltner, Klaus Hinum, Silvia Miksch, Christian Popow, and Susanne Ohmann

Effectiveness of Content Preparation in Information Technology Operations: Synopsis of a Working Paper
A. Savoy and G. Salvendy

Traces Using Aspect Oriented Programming and Interactive Agent-Based Architecture for Early Usability Evaluation: Basic Principles and Comparison
Jean-Claude Tarby, Houcine Ezzedine, José Rouillard, Chi Dung Tran, Philippe Laporte, and Christophe Kolski

Usability and Software Development: Roles of the Stakeholders
Tobias Uldall-Espersen and Erik Frøkjær

Human Performance Model and Evaluation of PBUI
Naoki Urano and Kazunari Morimoto

The Balancing Act Between Computer Security and Convenience
Mayuresh Ektare and Yanxia Yang

What Makes Them So Special?: Identifying Attributes of Highly Competent Information System Users
Brenda Eschenbrenner and Fiona Fui-Hoon Nah

User Acceptance of Digital Tourist Guides Lessons Learnt from Two Field Studies
Bente Evjemo, Sigmund Akselsen, and Anders Schürmann

Why Does IT Support Enjoyment of Elderly Life? - Case Studies Performed in Japan
Kaori Fujimura, Hitomi Sato, Takayoshi Mochizuki, Kubo Koichiro, Kenichiro Shimokura, Yoshihiro Itoh, Setsuko Murata, Kenji Ogura, Takumi Watanabe, Yuichi Fujino, and Toshiaki Tsuboi

Design Effective Navigation Tools for Older Web Users
Qin Gao, Hitomi Sato, Pei-Luen Patrick Rau, and Yoko Asano

Out of Box Experience Issues of Free and Open Source Software
Mehmet Göktürk and Görkem Çetin

Factor Structure of Content Preparation for E-Business Web Sites: A Survey Results of Industrial Employees in P.R. China
Yinni Guo and Gavriel Salvendy

Streamlining Checkout Experience – A Case Study of Iterative Design of a China e-Commerce Site
Alice Han, Jianming Dong, Winnie Tseng, and Bernd Ewert

Presence, Creativity and Collaborative Work in Virtual Environments
Ilona Heldal, David Roberts, Lars Bråthe, and Robin Wolff

Mental Models of Chinese and German Users and Their Implications for MMI: Experiences from the Case Study Navigation System
Barbara Knapp

User Response to Free Trial Restrictions: A Coping Perspective
Xue Yang, Chuan-Hoo Tan, and Hock-Hai Teo

A Study on the Form of Representation of the User’s Mental Model-Oriented Ancient Map of China
Rui Yang, Dan Li, and Wei Zhou

Towards Automatic Cognitive Load Measurement from Speech Analysis
Bo Yin and Fang Chen

Attitudes in ICT Acceptance and Use
Ping Zhang and Shelley Aikman

Part 4: Models and Patterns in HCI

Using Patterns to Support the Design of Flexible User Interaction
M. Cecília C. Baranauskas and Vania Paula de Almeida Neris

Model-Based Usability Evaluation - Evaluation of Tool Support
Gregor Buchholz, Jürgen Engel, Christian Märtin, and Stefan Propp

User-Oriented Design (UOD) Patterns for Innovation Design at Digital Products
Chiou Wen-Ko, Chen Bi-Hui, Wang Ming-Hsu, and Liang You-Zhao

Formal Validation of Java/Swing User Interfaces with the Event B Method
Alexandre Cortier, Bruno d’Ausbourg, and Yamine Aït-Ameur

Task Analysis, Usability and Engagement
David Cox

ORCHESTRA: Formalism to Express Static and Dynamic Model of Mobile Collaborative Activities and Associated Patterns
Bertrand David, René Chalon, Olivier Delotte, and Guillaume Masserey

Effective Integration of Task-Based Modeling and Object-Oriented Specifications
Anke Dittmar and Ashraf Gaffar

A Pattern Decomposition and Interaction Design Approach
Cunhao Fang, Pengwei Tian, and Ming Zhong

Towards an Integrated Approach for Task Modeling and Human Behavior Recognition
Martin Giersich, Peter Forbrig, Georg Fuchs, Thomas Kirste, Daniel Reichart, and Heidrun Schumann

A Pattern-Based Framework for the Exploration of Design Alternatives
Tibor Kunert and Heidi Krömker

Tasks Models Merging for High-Level Component Composition
Arnaud Lewandowski, Sophie Lepreux, and Grégory Bourguin

Application of Visual Programming to Web Mash Up Development
Seung Chan Lim, Sandi Lowe, and Jeremy Koempel

Comprehensive Task and Dialog Modelling
Víctor López-Jaquero and Francisco Montero

Structurally Supported Design of HCI Pattern Languages
Christian Märtin and Alexander Roski

Integrating Authoring Tools into Model-Driven Development of Interactive Multimedia Applications
Andreas Pleuß and Heinrich Hußmann

A Survey on Transformation Tools for Model Based User Interface Development
Robbie Schaefer

A Task Model Proposal for Web Sites Usability Evaluation for the ErgoMonitor Environment
André Luis Schwerz, Marcelo Morandini, and Sérgio Roberto da Silva

Model-Driven Architecture for Web Applications
Mohamed Taleb, Ahmed Seffah, and Alain Abran

HCI Design Patterns for PDA Running Space Structured Applications
Ricardo Tesoriero, Francisco Montero, María D. Lozano, and José A. Gallud

Task-Based Prediction of Interaction Patterns for Ambient Intelligence Environments
Kristof Verpoorten, Kris Luyten, and Karin Coninx

Patterns for Task- and Dialog-Modeling
Maik Wurdel, Peter Forbrig, T. Radhakrishnan, and Daniel Sinnig

Author Index
Part I
Interaction Design: Theoretical Issues, Methods, Techniques and Practice
Design Principles Based on Cognitive Aging

Hiroko Akatsu1, Hiroyuki Miki1, and Naotsune Hosono2
1 Oki Electric Industry Co., Ltd., 1-16-8 Chuou Warabi-shi, Saitama, 335-8510 Japan
2 Oki Consulting Solutions Co., Ltd.
Abstract. This study proposes design principles that balance ‘simplicity’ and ‘helpfulness’, based on cognitive aging. As the aging population grows, equipment of many kinds is required to better assist elderly users. ATMs (Automatic Teller Machines) have always been considered difficult for elderly users to operate. This paper therefore discusses a new ATM interface design that applies the principles. The effectiveness of the new design was examined by comparing it with a conventional ATM. The usability test results favored the new ATM design, and it was consequently accepted by many elderly users.

Keywords: cognitive aging, design principles, elderly users, ATM.
2 Influences of Interaction Equipments by Cognitive Aging

2.1 Issues

It is important to consider not only perceptive and physical characteristics but also the cognitive behavioral characteristics that clearly influence operation (Figure 1).
[Figure 1: diagram relating aged-changes to the characteristics elderly users show when they operate various equipment. Labels in the figure: decreased vision; cataracta senilis; decreased sensibility; longer response time; diminished attention; decline in memory; slow operations through confirmations; hard to understand all the information at once; hard to notice screen changes; repeat similar errors; hesitate to take initiatives. Category labels: perceptive/physical characteristics; cognitive behavioral characteristics.]

Fig. 1. Cognitive aging
The characteristics of elderly users presented below were found through usability tests of various equipment [3].

1) Longer Response Time than Younger Users
Entry with the 50-character keys took quite a long time, as did inserting a passbook or cash and responding to individual items overall. This often resulted in a time-out, meaning that many of the elderly users had to repeat the procedure from the beginning. A comparison of the average times needed for each task revealed that the group of elderly users took twice as long as the group of university students for withdrawal operations and three times as long for fund transfers. However, by repeating the same operations, such as entering one's name using the 50-character keys, the elderly users also learned the operation, and this shortened the time for such tasks.
2) Difficulties Collecting All the Information in a Short Time
Under certain conditions, the elderly users had difficulty collecting all the necessary information at once; for example, they were able to read only a portion of the messages displayed on the screen.

3) Excessive Response to Voice Messages
In general, voice message prompts kept the elderly users from forgetting to press a key (example: a voice message such as “Please verify the amount and press the 'Confirm' key if the amount is correct”). However, when a voice message prompting them to “enter your name” was played after the name had already been entered, they proceeded to enter the name again.

4) Recurrence of the Same Errors
Once an operational error had been made, there was a tendency to repeat the same error. It appears to be difficult for the elderly to determine what state they are currently in or how the operation was performed previously, which makes it difficult for them to avoid the same errors.

5) They tend to respond to items that are easily seen or can be touched directly by hand (example: hardware keys).

6) They hardly notice changes to the information displayed on the screen.

7) They cannot always extract the necessary information (or they try to read all the information, tire partway through, and are unable to finish reading).

8) They do not take any initiative on their own (for example, they simply follow instructions when asked to push keys).

2.2 Ease of Use and Cognitive Aging: A Three-Layered Factor Model

Sorting out the problems of the elderly observed in various experiments suggests that the three factors shown in Figure 2 overlap in a complex manner, causing the phenomenon that the elderly “cannot use equipment”.
The three factors are as follows.
(a) Factors Associated with the Deterioration of the Cognitive Capacity of the Elderly Users. The basis of the inability to use equipment is the deterioration of cognitive function that occurs with aging. As reported by research in experimental cognitive psychology, the deterioration of capabilities due to aging is considered to have a clear influence on the matter.
H. Akatsu, H. Miki, and N. Hosono
(b) Factors Relevant to the Lack of Knowledge and Mental Models (for Equipment and Systems). A mental model is the image a user forms of how equipment should be used. It is believed that the lack of such knowledge accelerates the effects of cognitive aging outlined in (a) of Figure 2 and delays the understanding of how equipment operates. Such problems arise from the rapid advancement of IT equipment, and they will continue to bring difficulties for the elderly in the future: so long as new technologies are continually being developed, it is believed that new problems, different from those of today, will keep appearing.
(c) Factors Relevant to Attitude (cultural and social values). Elderly users seem reluctant even to try the equipment from the start, preferring methods and means they are already familiar with (example: using a teller rather than an ATM), as they do not want to be seen as incapable. This factor is a problem for manufacturers. Still, as mentioned before, with the branches of many banks being consolidated and reduced in number, it is believed that there will be an increasing number of situations in which the elderly are forced to use ATMs, which ultimately prove difficult for them to use. As our agenda for the future, it is essential to broaden the scope of usability research and to conduct studies from other perspectives, such as what needs to be done to enable the elderly to use the equipment.
It is necessary to recognize that a single issue may be caused not by one factor alone but by all three. The design principles are therefore based on cognitive aging with the three factors taken into account. A new ATM design for elderly users following these design principles is then proposed, and its effectiveness is compared with that of a conventional ATM.
(c) Factors associated with attitudes
• Negative attitude toward using the equipment.
• Values, knowledge and framework for each generation.
• Select methods and means to effectively sustain their own capabilities.
(b) Factors associated with a lack of knowledge and mental models
• Knowledge and mental models concerning particular modes of operation of equipment.
• Knowledge relative to the concept of the information itself.
(a) Factors associated with the deterioration of cognitive capabilities of the elderly
• Deterioration of inhibition functions.
• Decrease in short-term memory capacity.
• Delays in comprehension.
Fig. 2. Ease of use and cognitive aging: A three-layered factor model (The material was touched up and corrected by Harada and Akatsu [3])
3 Design Principles and ATM Design
Based on the characteristics of elderly users described above, the following design principles were derived. A new ATM design that balances ‘simplicity’ and ‘helpfulness’ based on cognitive aging is proposed.
1) Only One Operation Is Required per Screen. ATM design example: elderly users can perform the banking transaction in a step-by-step manner.
2) The Screen Switch Must Be Noticed. ATM design example: blinking buttons, and a side-slide screen switch when the page is renewed (Figure 3).
3) The Operation Flow Must Also Be Comprehensible. ATM design example: the conventional ATM demands the two operations of input and confirmation on a single screen. The new ATM divides them into an input screen and a confirmation screen. As a result, elderly users could perform input and confirmation with confidence (see Figure 4).
4) The Screen Information Must Be Easy to Read (sufficient font size and contrast)
5) Screen Information Must Be as Simple as Possible
Announcements generally support the operation. However, they sometimes hinder it because of inappropriate timing or content. Hence the following points were considered.
6) The Same Content as the Announcement Must Be Displayed on the Screen
7) The Announcement Must Be Made Just Before Changing to the Next Screen, and It Must Not Repeat
8) Feedback Message Announcements Can Be Given Through the Handset
Fig. 3. Screen switch by side slide
[Figure 4: The conventional ATM shows a single screen with the message “Please enter the amount to remit. Next then please confirm.”, an amount field, and cancel, confirm, and clear buttons. The new ATM for elderly users splits this into an Input Screen (“Please enter the amount to remit.”) and a Confirmation Screen (“The amount is ‘65,000 yen’. Is it OK?”), using Clear, Next, and Yes buttons.]
Fig. 4. Input screen and confirmation screen
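The split into an input screen and a confirmation screen can be sketched as a small state machine. This is only an illustration, not the authors' implementation: the class and method names (`RemittanceFlow`, `press`, `message`) and the way the keys change the screen are assumptions; only the screen messages and the button labels Clear, Next, and Yes come from Figure 4.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Screen(Enum):
    INPUT = auto()         # the user only enters the amount
    CONFIRMATION = auto()  # the user only answers "Is it OK?"
    DONE = auto()


@dataclass
class RemittanceFlow:
    """Hypothetical two-screen flow: one operation per screen."""
    screen: Screen = Screen.INPUT
    amount: int = 0

    def message(self) -> str:
        # Each screen carries exactly one instruction.
        if self.screen is Screen.INPUT:
            return "Please enter the amount to remit."
        if self.screen is Screen.CONFIRMATION:
            return f'The amount is "{self.amount:,} yen". Is it OK?'
        return "Done."

    def press(self, key: str, value: int = 0) -> None:
        # Assumed key behaviour; unknown keys leave the screen unchanged.
        if self.screen is Screen.INPUT and key == "Next" and value > 0:
            self.amount = value
            self.screen = Screen.CONFIRMATION
        elif self.screen is Screen.CONFIRMATION and key == "Yes":
            self.screen = Screen.DONE
        elif key == "Clear" and self.screen is not Screen.DONE:
            self.amount = 0
            self.screen = Screen.INPUT
```

Because every screen accepts a single kind of input, the user never has to decide between entering and confirming on the same page, at the cost of one additional step.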
4 ATM Usability Testing
The effectiveness of the new ATM design for elderly users was compared with that of a conventional one.
4.1 Methods
First, the test participants were instructed to say aloud what they were thinking while operating an ATM simulator (the “Think Aloud Method”). Then the collected data (every behavior and utterance of the test participants) were subjected to protocol analysis.
4.2 Test Participants
The test participants were six elderly users (three males and three females, aged between 68 and 75). None of them had ever used an ATM before.
4.3 Experimental Equipment
As the target system, an ATM simulator was prepared (a personal computer and a touch display installed in a paper model housing), and ordinary transaction
operations were then to be performed. A video camera, a tiepin-type small microphone, recording equipment, etc., were prepared for recording.
4.4 Experimental Procedures
Each participant was tested individually. Before the tasks, the objectives of the usability test and the use of the equipment were explained, the think-aloud method was practiced, and a preliminary questionnaire survey on ATM use was conducted. After the tasks had been completed, a follow-up questionnaire survey and additional interviews were conducted. The two prepared tasks were (1) withdrawal using a cash card, and (2) money transfer.
4.5 Results and Considerations
1) Decreased Number of Time-outs from Operational Errors. It was found that most time-outs in ATM operation occur when elderly users become confused and are uncertain what to do next. When a time-out occurs, the display usually returns to the top screen and wipes out the user's previous efforts. The number of time-outs each user experienced during the money transfer task was recorded. As a whole, the new ATM design reduced the number of time-outs to less than half that of a conventional ATM. On the conventional ATM, time-outs occurred mainly during the money transfer operation, when entering the first letter of the bank branch name and when selecting a bank branch from a list. On the new ATM, by contrast, time-outs occurred during name input using the Japanese character list. Consequently, it can be said that the new ATM solved the usability issues, although some problems remain with name input.
2) Less Cognitive Load. The six users were interviewed after the experimental evaluation. They reported that the new ATM was easier to use, and most of them were satisfied. 
From the comments made by the users, it is surmised that the accumulation of useful tips on each screen page and the overall effort to reduce cognitive load were effective.
3) Number of Operational Steps and Operational Confidence. There is a trade-off between simplifying the information on each screen page and increasing the number of page operations. In the elderly user mode, additional screen pages are added so that the operations can be performed more easily and with confidence. Operational rhythm is enhanced with subsidiary announcements to make the additional steps less noticeable. Interviews with the test participants showed that they preferred simple usability even if several steps are added. Judging by the results of the usability test, the effectiveness of the proposed principles was confirmed.
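The time-out behaviour described in the results, where the display returns to the top screen and earlier entries are lost, can be illustrated with a minimal sketch. Everything here is a hypothetical model rather than the simulator used in the study: the class `AtmSession`, its fields, and the 30-second limit are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AtmSession:
    """Hypothetical session model with an idle time-out."""
    timeout_s: float = 30.0        # assumed limit, not taken from the paper
    screen: str = "top"
    entries: Dict[str, str] = field(default_factory=dict)
    last_input_at: float = 0.0
    timeouts: int = 0

    def enter(self, now: float, screen: str, name: str, value: str) -> bool:
        """Record one input at time `now`; return False if it came too late."""
        if now - self.last_input_at > self.timeout_s:
            # Time-out: back to the top screen, all earlier entries are lost.
            self.screen = "top"
            self.entries.clear()
            self.timeouts += 1
            self.last_input_at = now
            return False
        self.screen = screen
        self.entries[name] = value
        self.last_input_at = now
        return True
```

Counting `timeouts` per session is one simple way to obtain the kind of per-user time-out figure that the comparison between the two designs relies on.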
5 Conclusion
This paper proposed a new ATM interface design that specifically reflects the requirements of cognitive aging. The experimental evaluation showed fewer instances of operational confusion and fewer errors than with the conventional ATM. The elderly users appreciated the step-by-step operations, which were more in line with their input pace. The effectiveness of the proposed principles was therefore confirmed. The principles should be applicable not only to ATMs but also to other equipment.
References
1. Fisk, A.D., Rogers, W.A., et al.: Designing for older adults: Principles and Creative Human Factors Approaches. CRC Press (2004)
2. Kyoyou-Hin Foundation: Inconvenience list such as the elderly people (1999)
3. Harada, T.E., Akatsu, H.: What is “Usability” - A Perspective of Universal Design in An Aging Society. In: Cognitive Science of Usability. Kyoritsu Publisher (2003)
Redesigning the Rationale for Design Rationale Michael E. Atwood and John Horner College of Information Science and Technology Drexel University Philadelphia, PA 19104 USA {atwood, jh38} @drexel.edu
Abstract. One goal of design rationale systems is to support designers by providing a means to record and communicate the argumentation and reasoning behind the design process. However, there are several inherent limitations to developing systems that effectively capture and utilize design rationale. The dynamic and contextual nature of design and our inability to exhaustively analyze all possible design issues result in cognitive, capture, retrieval, and usage limitations. In addition, there are the organizational limitations that ensue when systems are deployed. In this paper we analyze the essential problems that prevent the successful development and use of design rationale systems. We argue that useful and effective design rationale systems cannot be built unless we carefully redefine the goal of design rationale systems. Keywords: Design rationale, theories of design, interactive systems design.
1 Introduction Over the past two decades, much has been written about design rationale. That design rationale has remained an active research area within the human-computer interaction (HCI) community for an extended time indicates that researchers see it as an attractive and productive area for research. We share this enthusiasm for research on design rationale. But, at the same time, we have little confidence that useful and usable design rationale systems will ever be built. And, should they ever be built, we have little confidence that they will be used. The only solution we see to successful research on design rationale is to carefully define the rationale underlying design rationale. Our motivation in writing this paper is derived from two questions. First, since we don’t have a common understanding of what design is, how can we have a common understanding of what design rationale is? Second, why is the collection of papers that describe design rationale systems so much larger than the collection that describe design rationale successes?
Wania et al. [1] reported a bibliometric cocitation analysis of the HCI literature over much of the past two decades. From this analysis, shown in Figure 1, seven major approaches to design were identified. It is important to note that the Design Rationale cluster spans much of the map, almost connecting one side to the other. Two points are worth noting here. First, design rationale is not a tool that other design communities use as much as it is a research area of its own; that is why it appears here as a separate cluster. Second, the design rationale community does not have a great deal of commonality in interest. The authors in the Design Rationale cluster all seem to be boundary spanners. Each author in this cluster is located very close to another cluster. This suggests that design rationale may mean different things to the different researchers and practitioners within this community. 2.1 Why Do the Papers Describing Systems Outnumber Those Describing Successes? In analyzing the papers that describe design rationale systems, we will look at two end-points. In 1991, a special issue of the journal Human-Computer Interaction presented six papers on design rationale. Of these six, only one reported any data on system use, and these data indicated only that one design rationale system was usable; there were no data supporting a claim that it was useful. In 2006, an edited text [2] presented twenty papers on design rationale. Of these twenty, only one reported data on system usability; no data on usefulness were presented. Clearly, the number of papers describing design rationale systems is much larger than the number reporting design rationale successes. In order to understand why design rationale is not seen as a tool for designers and why successes are so rare, we will begin with a common view of design rationale. In Figure 2, we show the flow of information in most design rationale systems. 
Initially, designers consider alternatives to design issues they are facing [3]. Then, they store the rationale for their decisions in a design rationale system. At a later time, another designer can browse the design rationale system to review earlier decisions and potentially to apply these earlier decisions to the current design. All of this, of course, sits in some organizational context.
[Figure 2: Within an Organizational Setting, numbered steps 1 to 4 connect Artifact A, the DR System, and Artifact B.]
Fig. 2. Barriers to Effective Design Rationale Systems
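The record-and-retrieve flow described above can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions, not a description of any existing design rationale system: the names `RationaleEntry`, `DRSystem`, `capture`, and `retrieve`, and the choice of plain keyword matching, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RationaleEntry:
    """One recorded decision (hypothetical schema)."""
    issue: str
    alternatives: List[str]
    decision: str
    reason: str
    artifact: str


@dataclass
class DRSystem:
    entries: List[RationaleEntry] = field(default_factory=list)

    def capture(self, entry: RationaleEntry) -> None:
        # The original designer pays the cost of recording.
        self.entries.append(entry)

    def retrieve(self, keyword: str) -> List[RationaleEntry]:
        # A later designer must already know what to ask for.
        k = keyword.lower()
        return [e for e in self.entries
                if k in e.issue.lower() or k in e.reason.lower()]
```

Even this small sketch shows where the effort falls: the cost of `capture` lands on the designer of the first artifact, while `retrieve` returns something only if the later designer happens to phrase the query in the terms that were recorded.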
Overall, design rationale systems are intended to support communication, reflection, and analysis in design. Design rationale systems are intended to support the communication of design decisions to others, to support reflecting on design options, and to support analyzing which option to select. But, referring back to Figure 2, the goal of transmitting information to future designers detracts from the goal of doing good designs today! Simply put, a designer’s cognitive energy can be focused on solving today’s problems or on recording information to be used in the future. But, doing one detracts from the other. We argue that the main use of design rationale systems is to support today’s design. In essence, this brings design rationale back to its starting point (e.g., [4]).
3 The Essential Barriers For each of the activities shown in Figure 2, we list the essential problems that inhibit the success of design rationale systems. We use the term essential in the same way that Brooks [5] did; essential problems are inherent in the nature of the activity, in contrast to accidental problems, which are problems for today but which are not inherent and may well be solved by future technological advances. After analyzing these essential problems, we return to two additional questions. In order to better understand what the rationale for design rationale should be, we must ask: what do designers do? And then, what should the goal of design rationale be? 3.1 Cognitive Barriers Designers must focus their cognitive energy on the problem at hand. Imposing inappropriate constraints or introducing irrelevant information into design activities can have detrimental effects. Satisficing, Not Optimization. People have a limited capacity to process information. This limitation can hinder the effectiveness of design rationale. Simon [6] states that we are bounded by our rationality and cannot consider all possible
alternatives. Therefore, people choose satisfactory rather than optimal solutions. Since we are bounded by the amount of information we can process, design rationale is necessarily incomplete. Unintended Consequences. It is important to recognize the potential for unintended consequences, especially in systems where the risks are high [6]. In these situations, designers may want to ensure that they have exhaustively covered the design space so as to minimize the risk for unanticipated effects. The key question in this type of query is “what are we missing?” Design rationale is a potential solution to help designers identify issues that they may have otherwise left unconsidered. Systems could allow designers to search for similar projects or issues to identify issues that were considered in those projects. Collaboration Hampers Conceptual Integrity. One mechanism to more exhaustively analyze the design space is to use collaboration in the design process [7]. However, in any collaborative design context, maintaining conceptual integrity is important to keep the design project focused [5]. More people are capable of considering more ideas, but this adds complexity and effort in keeping persons on the design team up to speed. It also increases the effort of integrating diverse perspectives. 3.2 Capture Barriers There are many different situations in which design rationale may not be captured. In some cases, the omission is unintentional. In others, it is quite intentional. We consider both below. Work-benefit Disparity. Complex design is normally a group activity, and tools to support designers can therefore be considered a type of groupware. Grudin [8] describes several problems involved in developing groupware. Specifically, one of the obstacles he discusses is of particular interest to design rationale systems. He contends that there should not be a disparity between who incurs the cost and who receives the benefit. 
If the focus of design rationale is placed only on minimizing the cost to later users, it can add significant costs to the original designers. A major shortcoming in design rationale is the failure to minimize the cost to the original designers. Gruber and Russell [9] contend that design rationale must go beyond the record and replay paradigm and collect data that can benefit later users, while also not being a burden on designers. Context Is Hard to Capture. Design rationale may be considered, but unintentionally not recorded by the capture process. There are several reasons why considerations could be unintentionally omitted from design rationale. If the design rationale capture takes place outside of the design process, it is possible that contextual cues may not be present, and designers may not recall what they deliberated upon, or designers may not be available at the time the rationale is captured. For these reasons, it would appear that rationale should be captured in the context of design. However, it is not always possible or advantageous to capture rationale in
the design context. Grudin [10] notes that in certain development environments, exploring design space can be detrimental because it diverts critical resources. Additionally, many design decisions are considered in informal situations, where capturing the rationale is infeasible [11]. Tracking the location of where the rationale was recorded, the persons present at the time of design rationale capture, their roles and expertise, and the environmental context of the capture can help reviewers infer why specific information was considered. Designers Should Design, Not Record Rationale. Tacit knowledge [12] is a term used to describe things that we know, but are not able to bring to consciousness. It is possible that design rationale may unintentionally be omitted because a designer may not be able to explicate their tacit knowledge. Designers may not be able or willing to spend the energy to articulate their thoughts into the design rationale system, especially when they reach breakdowns, and are focusing on understanding and resolving the problem at hand. Conklin and Bergess-Yakemovic [7] state that designers’ focus should be on solving problems and not on capturing their decisions. During routine situations, designers react to problems as they arise without consciously thinking about them. Recording Rationale Can Be Dangerous! Sharing knowledge can be detrimental to designers, especially if the information they share could potentially be used against them. Designers may be hesitant to simply give away knowledge without knowing who will use it or how it will be used. Rewarding knowledge sharing is a challenging task that involves creating tangible rewards for intangible ideas. This is especially difficult considering that there is often no way to evaluate which ideas resulted in the success or failure of an artifact. In certain contexts, there are privacy and security concerns with the design rationale. 
For instance, organizations may want to keep their rationale secure so that competing organizations cannot gain a competitive advantage. Similarly, there may be political repercussions or security breaches if policy makers make their rationale available to the public. For example, designers may not want to document all of their considerations because politically motivated information could be held against them. There are also situations where people working outside the specified work procedures may not want to document their work-arounds in fear that it will be detrimental to them. Designers may not want to capture rationale that could be viewed as detrimental to themselves or certain other people, and therefore will intentionally omit certain rationale. Additionally, individual designers may not want their design considerations to be available for post-hoc scrutiny. 3.3 Retrieval Barriers Karsenty [13] evaluated design documents and found that design rationale questions were by far the most frequent questions during design evaluation meetings. However, only 41% of the design rationale questions were answered by the design rationale documentation. The reasoning for the discrepancy between the needed and captured design rationale is broken into several high-level reasons, including analysts not capturing questions, options, or criteria; the inadequacy of the design rationale method; and the lack of understanding. Other literature has focused on several issues
that contribute to this failure, including inappropriate representations [14,15], the added workload required of designers [7,10], exigent organizational constraints [11], and contextual differences between the design environment at the time when the rationale is captured and the time when it is needed [9]. Relevance Is Situational. Initial designers and subsequent users of rationale may have different notions of what is relevant in a given design context. Wilson [16] describes relevance as a relationship between a user and a piece of information, and as independent of truth. Relevance is based on a user’s situational understanding of a concern. Moreover, he argues that situational relevance is an inherently indeterminate notion because of the changing, unsettled, and undecided character of our concerns. This suggests that the rationale constructed at design time may not be relevant to those reviewing the rationale at a later time in a different context. When rationale is exhaustively captured, there is an additional effort required to capture the information. And, when too little information is captured, the reviewers’ questions remain unanswered. Belkin [17] describes information retrieval as a type of communication whereby a user is investigating their state of knowledge with respect to a problem. Belkin contends that the success of the communication is dependent upon the extent to which the anomaly can be resolved based on the information provided, and thus is controlled by the recipient. This suggests that designers cannot recognize the relevance of rationale until a person queries it. And, later users may not be able to specify what information will be most useful, but rather will only recognize that they do not have the necessary knowledge to resolve a problem. Indexing. A more structured representation can make it more difficult to capture design ideas, but can facilitate indexing and retrieval. 
One problem is that there is an inherent tradeoff between representational flexibility and ease of retrieval. Unstructured text is easier to record, but more difficult to structure in a database. One solution is to push the burden on to those who are receiving the benefit [8], which would be the retrievers in this case. However, if the potential users of the rationale find the system to be too effortful, then it will go unused. Then, designers will not be inclined to spend time entering design rationale into a system that will not be used. 3.4 Usage Barriers People reviewing design rationale have a goal and a task at hand that they hope the design rationale will support. Often, these people are also involved in designing. If this is the case, the reviewers may not know whether retrieved rationale is applicable to their current problem. The Same Problem in a Different Context Is a Different Problem. Because design problems are unique, even rationale that successfully resolved one design problem may not be applicable to a different problem. In addition to the problem of accurately and exhaustively capturing rationale, recognizing the impact of rationale can be a difficult task. Understanding rationale tied to one problem could help resolve similar problems in the future. However, design is contextual, and external factors often interact with the
design activity in a complex and unexpected manner. Reviewers of rationale are interested in understanding information to help them with their task-at-hand, and without understanding the context of those problems, utilization of the information becomes difficult. The inherent problem of identifying the impact of rationale across different design problems adds a net cost to utilizing rationale, decreasing the overall utility in the design process. Initiative Falls on the User. Design rationale systems are passive rather than active. The initiative to find relevant rationale falls on the user. The system does not suggest it; it is the user’s responsibility to retrieve it. 3.5 Organizational Barriers As Davenport and Prusak warn in their book [18], “if you build it, they may not come.” Being able to build a system is only an initial step; the “gold standard” against which success is measured, however, is whether people will accept and use it. Designers Don’t Control the Reward Structure of Users. As system builders, we have little control over the personal reward systems of individual users or over the management mandates that many [18,19] recommend to enhance usage of the technology, and therefore we cannot motivate our users in these ways. We must therefore rely on other factors. Informal Knowledge Is Difficult to Capture. Design rationale tools must support both formal and informal knowledge, making the system flexible enough that broad content types are supported [20]. They must support multiple levels of organization of content, and systems must be designed so that knowledge can be structured at any time after it is entered [21].
4 Conclusions In this paper, we have explored the role of design rationale research within the broader design community. And, we have looked into a number of barriers that impede design rationale as an effective tool for reflection, communication, and analysis. The barriers were discussed in terms of cognitive, capture, retrieval, usage, and organizational limitations. At one level, the intent of design rationale is to transmit information from a designer working at one time and in one context to another designer working in another time and context. This is the most frequently-cited goal in design rationale research. But, is this the ultimate goal of design rationale? We argue that it is not. The goal of research on design rationale is to improve the quality of designs. There are fundamental barriers to developing information systems that support asynchronous communication among designers working on different design problems. Therefore, design research should focus on supporting designers who better understand the context of their unique problems.
The goal of research on design rationale is to improve the quality of designs. There are fundamental barriers to developing computer systems that support communication among designers working on design problems. Therefore, the focus of design rationale should be on identifying what tools are most appropriate for the task. Using less persistent modes of communication, putting a greater emphasis on supporting design processes rather than design tools, and creating systems that are optimized for a single purpose are necessary steps for improving design.
References
1. Wania, C., McCain, K., Atwood, M.E.: How do design and evaluation interrelate in HCI research? In: Proceedings of the 6th ACM conference on Designing Interactive systems, June 26-28, 2006, University Park, PA, USA (2006)
2. Dutoit, McCall, Mistrik, Paech (eds.): Rationale Management in Software Engineering. Springer, Heidelberg
3. Horner, J., Atwood, M.E.: Design rationale: the rationale and the barriers. In: Proceedings of the 4th ACM Nordic conference on Human-computer interaction: changing roles (2006)
4. Rittel, H., Webber, M.: Planning Problems are Wicked Problems. In: Cross, N. (ed.) Developments in design methodology, pp. 135–144. Wiley, Chichester; New York (1984)
5. Brooks, F.P.: The mythical man-month: essays on software engineering. Addison-Wesley Pub. Co, Reading, Mass (1995)
6. Simon, H.A.: The sciences of the artificial. MIT Press, Cambridge, MA (1996); Tenner, E.: Why things bite back: technology and the revenge of unintended consequences. Knopf, New York (1996)
7. Conklin, E., Bergess-Yakemovic, K.: A process oriented approach to design rationale. In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
8. Grudin, J.: Groupware and social dynamics: eight challenges for developers. Communications of the ACM 37(1), 92–105 (1994)
9. Gruber, T., Russell, D.: Generative Design Rationale. Beyond the Record and Replay Paradigm. In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
10. Grudin, J.: Evaluating opportunities for design capture. In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
11. Sharrock, W., Anderson, R.: Synthesis and Analysis: Five modes of reasoning that guide design. In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
12. Polanyi, M.: The tacit dimension. Doubleday, Garden City, NY (1966)
13. Karsenty, L.: An empirical evaluation of design rationale documents. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 150–156. ACM Press, New York (1996)
14. Lee, J., Lai, K.: What’s in design rationale? In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
15. MacLean, A., Young, R., Bellotti, V., Moran, T.: Questions, Options, Criteria: Elements of design space analysis. In: Moran, T.P., Carroll, J.M. (eds.) Design rationale: concepts, techniques, and use, L. Erlbaum Associates, Mahwah, N.J (1996)
16. Wilson, P.: Situational Relevance. Information Stor. Retrieval 9, 457–471 (1973)
17. Belkin, N.: Anomalous States of Knowledge as a Basis for Information Retrieval. Canadian Journal of Information Science 5, 133–143 (1980) 18. Davenport, T.H., Prusak, L.: Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, Massachusetts (1998) 19. Orlikowski, W.J., Hofman, J.D.: An Improvisational Model for Change Management: The Case of Groupware Technologies, Sloan Management Review/Winter, pp. 11–21 (1997) 20. Davenport, T.H.: Saving IT’s Soul: Human-Centered Information Management, Harvard Business Review: Creating a System to Manage Knowledge, 1994, product #39103, pp. 39–53 (1994) 21. Shipman, F., McCall, R.: Incremental Formalization with the Hyper-Object Substrate. ACM Transactions on Information Systems (1999)
HCI and the Face: Towards an Art of the Soluble

Christoph Bartneck (1) and Michael J. Lyons (2)

(1) Department of Industrial Design, Eindhoven University of Technology, Den Dolech 2, 5600 MB Eindhoven, The Netherlands
[email protected]
(2) ATR Intelligent Robotics and Communication Labs, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
[email protected]
Abstract. The human face plays a central role in most forms of natural human interaction, so we may expect that computational methods for analysis of facial information, and graphical and robotic methods for synthesis of faces and facial expressions, will play a growing role in human-computer and human-robot interaction. However, certain areas of face-based HCI, such as facial expression recognition and robotic facial display, have lagged behind others, such as eye-gaze tracking, facial recognition, and conversational characters. Our goal in this paper is to review the situation in HCI with regard to the human face, and to discuss strategies which could bring the more slowly developing areas up to speed.

Keywords: face, hci, soluble, recognition, synthesis.
recognition community and the findings are highly relevant to HCI [1, 2]. Work on animated avatars may be considered to be mature [3], while the younger field of social robotics is expanding rapidly [4-6]. FP is a central concern in both of these fields, and HCI researchers can contribute to and benefit from the results.
2 HCI and the Face

Computer scientists and engineers have worked increasingly on FP, from the widely varying viewpoints of graphics, animation, computer vision, and pattern recognition. However, an examination of the HCI research literature indicates that activity is restricted to a relatively narrow selection of these areas. Eye gaze has occupied the greatest share of HCI research on the human face (e.g. [7]). Eye gaze tracking technology is now sufficiently advanced that several commercial solutions are available (e.g. Tobii Technology [8]). Gaze tracking is a widely used technique in interface usability, machine-mediated human communication, and alternative input devices. This area can be viewed as a successful sub-field of face-based HCI.

Numerous studies have emphasized the neglect of human affect in interface design and argued that this could have a major impact on the human aspects of computing [9]. Accordingly, there has been much effort in the pattern recognition, AI, and robotics communities towards the analysis, understanding, and synthesis of emotion and expression. In the following sections we briefly introduce the areas related to analysis and synthesis, especially by robots, of facial expressions. In addition, we share insights on these areas gained during a workshop we organized on the topic.

2.1 Analysis: Facial Expression Classification

Gaining insight into a user’s affective state is an attractive prospect, and may be considered one of the key unsolved problems in HCI. It is known that it is difficult to measure the “valence” component of affective state, as compared to “arousal”, which may be gauged using biosensors. However, a smile, or frown, provides a clue that goes beyond physiological measurements. It is also attractive that expressions can be gauged non-invasively with inexpensive video cameras.
Automatic analysis of video data displaying facial expressions has become an active area of computer vision and pattern recognition research (for reviews see [10, 11]). The scope of the problem statement has, however, been relatively narrow. Typically one measures the performance of a novel classification algorithm on recognition of the basic expression classes proposed by Ekman and Friesen [12]. Expression data often consists of a segmented headshot taken under relatively controlled conditions, and classification accuracy is based on comparison with emotion labels provided by human experts.

This bird’s-eye caricature of the methodology used by the pattern recognition community is necessarily simplistic; however, it underlines two general reflections. First, pattern recognition has successfully framed the essentials of the facial expression problem to allow for effective comparison of algorithms. This narrowing of focus has led to impressive developments of the techniques for facial expression analysis and substantial understanding. Second, the narrow framing of the FP problem typical in computer vision and pattern recognition may not be appropriate for HCI problems. This observation is a main theme of this paper, and we suggest that progress on the use of FP in HCI may require re-framing the problem.

Perhaps the most salient aspect of our second general observation on the problem of automatic facial expression recognition is that HCI technology can often get by with partial solutions. A system that can discriminate between a smile and a frown, but not between an angry and a disgusted face, can still be a valuable tool for HCI researchers, even if it is not regarded as a particularly successful algorithm from the pattern recognition standpoint. Putting this more generally, components of algorithms developed in the pattern recognition community may already have sufficient power to be useful in HCI, even if they do not yet constitute general facial expression analysis systems. Elsewhere in this paper we give several examples to back up this statement.

2.2 Synthesis: Robotic Facial Expressions

There is a long tradition within the HCI community of investigating and building screen based characters that communicate with users [3]. Recently, robots have also been introduced to communicate with users, and this area has progressed sufficiently that some review articles are available [4, 6]. The main advantage that robots have over screen based agents is that they are able to directly manipulate the world. They not only converse with users, but also perform embodied physical actions. Nevertheless, screen based characters and robots share an overlap in motivations for and problems with communicating with users. Bartneck et al. [13] have shown, for example, that there is no significant difference in the users’ perception of emotions as expressed by a robot or a screen based character. The main motivation for using facial expressions to communicate with a user is that it is, in fact, impossible not to communicate.
If the face of a character or robot remains inert, it communicates indifference. To put it another way, since humans are trained to recognize and interpret facial expressions, it would be wasteful to ignore this rich communication channel. Compared to the state of the art in screen-based characters, such as Embodied Conversational Agents [3], however, the field of robotic facial expressions is underdeveloped. Much attention has been paid to robot motor skills, such as locomotion and gesturing, but relatively little work has been done on their facial expression.

Two main approaches can be observed in the field of robotics and screen based characters. In one camp are researchers and engineers who work on the generation of highly realistic faces. A recent example of a highly realistic robot is the Geminoid H1, which has 13 degrees of freedom (DOF) in its face alone. The annual Miss Digital award [14] may be thought of as a benchmark for the development of this kind of realistic computer generated face. While significant progress has been made in these areas, we have not yet reached human-like detail and realism, and this is acutely true for the animation of facial expressions. Hence, many highly realistic robots and characters currently struggle with the phenomenon of the “Uncanny Valley” [15], with users experiencing these artificial beings as spooky or unnerving. Even the Repliee Q1Expo is only able to convince humans of the naturalness of its expressions for at best a few seconds [16]. In summary, natural robotic expressions remain in their infancy [6].

Major obstacles to the development of realistic robots lie with the actuators and the skin. At least 25 muscles are involved in expression in the human face. These muscles are flexible and small, and can be activated very quickly. Electric motors emit noise, while pneumatic actuators are difficult to control. These problems often result in robotic heads that either have a small number of actuators or a somewhat larger-than-normal head. The Geminoid H1 robot, for example, is approximately five percent larger than its human counterpart. It also remains difficult to attach skin, which is often made of latex, to the head. This results in unnatural and non-human looking wrinkles and folds in the face.

At the other end of the spectrum, there are many researchers who are developing more iconic faces. Bartneck [17] showed that a robot with only two DOF in the face can produce a considerable repertoire of emotional expressions that make the interaction with the robot more enjoyable. Many popular robots, such as Asimo [18], Aibo [19] and PaPeRo [20], have only a schematic face with few or no actuators. Some of these only feature LEDs for creating facial expressions. The recently developed iCat robot is a good example of an iconic robot that has a simple physically-animated face [21]. The eyebrows and lips of this robot move, and this allows synthesis of a wide range of expressions.

More general and fundamental unsolved theoretical aspects of facial information are also relevant to the synthesis of facial expressions. The representation of the space of emotional expressions is a prime example [22]. The space of expressions is often modeled either with continuous dimensions, such as valence and arousal [23], or with a categorical approach [12].
This controversial issue has broad implications for all HCI applications involving facial expression [22]. The same can be said for other fundamental aspects of facial information processing, such as the believability of synthetic facial expressions by characters and robots [5, 24].
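The relation between the continuous and the categorical representation can be made concrete with a small sketch. The Python fragment below is only an illustration and not part of any system discussed here: the six category names are the commonly cited basic expressions, the (valence, arousal) coordinates are rough placeholder values chosen for this example rather than data from [12], [22] or [23], and the function name nearest_category is hypothetical. It reduces a continuous point to a categorical label by nearest neighbour, which also shows what the categorical view discards.

```python
import math

# Illustrative (valence, arousal) placements in [-1, 1] x [-1, 1].
# ASSUMPTION: placeholder values for this sketch, not published measurements.
CATEGORY_COORDS = {
    "happiness": (0.8, 0.5),
    "surprise": (0.2, 0.9),
    "fear": (-0.6, 0.8),
    "anger": (-0.7, 0.6),
    "disgust": (-0.7, 0.2),
    "sadness": (-0.7, -0.5),
}


def nearest_category(valence, arousal):
    """Return the category whose placeholder point is closest (Euclidean distance)."""
    return min(
        CATEGORY_COORDS,
        key=lambda name: math.dist(CATEGORY_COORDS[name], (valence, arousal)),
    )
```

Two nearby points such as (-0.65, 0.75) and (-0.68, 0.65) fall into different categories under this mapping although they differ only slightly in the continuous space, which is one way to see why the choice of representation matters for applications.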
3 Workshop on “HCI and the Face”

As part of our effort to examine the state of the field of FP in HCI, we organized a day-long workshop at the ACM CHI’2006 conference (see: http://www.bartneck.de/workshop/chi2006/ for details). The workshop included research reports, focus groups, and general discussions. This has informed our perspective on the role of FP in HCI, as presented in the current paper. One focus group summarized the state of the art in facial expression analysis and synthesis, while another brainstormed HCI applications. The idea was to examine whether current technology is sufficiently advanced to support HCI applications. The proposed applications were organized with regard to the factors “Application domain” and “Intention” (Table 1). Group discussion seemed to focus naturally on applications that involve some type of agent, avatar or robot. It is nearly impossible to provide an exhaustive list of applications for each field in the matrix. The ones listed in the table should therefore be considered only as representative examples.
Table 1. Examples of face processing applications in HCI and HRI

                    Intention
Application domain  Persuade                     Being a companion               Educate
Entertainment       Advertisement: REA [3],      Aibo [19], Tamagotchi [26]      My Real Baby [27]
                    Greta [25]
Communication       Persuasive Technology [28],  Avatar [30]                     Language tutor [31]
                    Cat [29]
Health              Health advisor,              Aibo for elderly [33],          Autistic children [35]
                    Fitness tutor [32]           Paro [29], Attention Capture
                                                 for Dementia Patients [34]
These examples illustrate well a fundamental problem of this research field. The workshop participants can be considered experts in the field, and all the proposed example applications were related to artificial characters, such as robots, conversational agents and avatars. Yet not one of these applications has become a lasting commercial success. Even Aibo, the previously somewhat successful entertainment robot, was discontinued by Sony in 2006.

A problem that all these artificial entities have to deal with is that, while their expression processing has reached an almost sufficient maturity, their intelligence has not. This is especially problematic, since the mere presence of an animated face raises the expectation levels of its user. An entity that is able to express emotions is also expected to recognize and understand them. The same holds true for speech. If an artificial entity talks, then we also expect it to listen and understand. As we all know, no artificial entity has yet passed the Turing test or claimed the Loebner Prize. All of the examples given in Table 1 presuppose the existence of a strong AI as described by John Searle [36].

The reasons why strong AI has not yet been achieved are manifold and the topic of lengthy discussion. Briefly then, there are, from the outset, conceptual problems. John Searle [36] pointed out that digital computers alone can never truly understand reality because they only manipulate syntactical symbols that do not contain semantics. The famous ‘Chinese room’ example points out some conceptual constraints in the development of strong AIs. According to his line of argument, IBM’s chess playing computer “Deep Blue” does not actually understand chess. It may have beaten Kasparov, but it does so only by manipulating meaningless symbols.
Drew McDermott [37] replied to this criticism: "Saying Deep Blue doesn't really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings." This debate reflects different philosophical viewpoints on what it means to think and understand. For centuries philosophers have thought about such questions, and perhaps the most important conclusion is that there is no conclusion at this point in time. Similarly, the possibility of developing a strong AI remains an open question. All the same, it must be admitted that some kind of progress has been made.
In the past, a chess-playing machine would have been regarded as intelligent. But now it is regarded as the feat of a calculating machine; our criteria for what constitutes an intelligent machine have shifted. In any case, suffice it to say that no sufficiently intelligent machine has yet emerged that would provide a foundation for our example applications given in Table 1. The point we hope to have made with this digression into AI is that the application dreams of researchers sometimes conceal rather unrealistic assumptions about what is possible to achieve with current technology.
4 Towards an “Art of the Soluble”

The outcome of the workshop we organized was unexpected in a number of ways. Most striking was the vast mismatch between the concrete and fairly realistic description of the available FP technology and its limitations arrived at by one of the focus groups, and the blue-sky applications discussed by the second group. Another sharp contrast was evident at the workshop. The actual presentations given by participants were pragmatic and showed effective solutions to real problems in HCI that do not rely on AI.

This led us to the reflection that scientific progress often relies on what the Nobel Prize-winning biologist Peter Medawar called “The Art of the Soluble” [38]. That is, skill in doing science requires the ability to select a research problem which is soluble, but which has not yet been solved. Very difficult problems such as strong AI may not yield to solution over the course of decades, so in most cases it is preferable to work on problems of intermediate difficulty, which can yield results over a more reasonable time span, while still being of sufficient interest to constitute progress. Some researchers of course are lucky or insightful enough to re-frame a difficult problem in such a way as to reduce its difficulty, or to recognize a new problem which is not difficult, but nevertheless of wide interest. In the next two subsections we illustrate the general concept with examples from robotic facial expression synthesis as well as facial expression analysis.

4.1 Facial Expression Synthesis in Social Robotics

As we argued in Section 3, the problems inherited by HRI researchers from the field of AI can be severe. Even if we neglect philosophical aspects of the AI problem and are satisfied with a computer that passes the Turing test, independently of how it achieves this, we will still encounter many practical problems. This leads us to the so-called “weak AI” position, in which claims of achieving human cognitive abilities are abandoned. Instead, this approach focuses on specific problem solving or reasoning tasks. There has certainly been progress in weak AI, but this has not yet matured sufficiently to support artificial entities. Indeed, at present, developers of artificial entities must resort to scripting behaviors. Clearly, the scripting approach has its limits, and even the most advanced common sense database, Cyc [39], is largely incomplete. FP should therefore not bet on the arrival of strong AI solutions, but focus on what weak AI solutions can offer today. Of course there is still hope that eventually strong AI applications will also become possible, but this may take a long time.
Fig. 1. Robots with animated faces
When we look at what types of HRI solutions are currently being built, we see that a large number of them have barely any facial features at all. Qrio, Asimo and Hoap-2, for example, are only able to turn their heads with 2 degrees of freedom (DOF). Other robots, such as Aibo, are able to move their head, but have only LEDs to express their inner states in an abstract way. While these robots are intended to interact with humans, they certainly avoid facial expression synthesis.

When we look at robots that have truly animated faces, we can distinguish between two dimensions: DOF and iconic/realistic appearance (see Figure 1). Robots in the High DOF/Realistic quadrant not only have to fight with the uncanny valley [40]; they may also raise user expectations of a strong AI which they are not able to fulfill. By contrast, the Low DOF/Iconic quadrant includes robots that are extremely simple and perform well in their limited application domain. These robots lie well within the domain of the soluble in FP. The most interesting quadrant is the High DOF/Iconic quadrant. These robots have rich facial expressions but avoid evoking associations with a strong AI through their iconic appearance. We propose that research on such robots has the greatest potential for significant advances in the use of FP in HRI.

4.2 Facial Analysis for Direct Gesture-Based Interaction

The second example we use to illustrate the “Art of the Soluble” strategy comes from the analysis of facial expressions. While there is a large body of work on automatic facial expression recognition and lip reading within the computer vision and pattern recognition research communities, relatively few studies have examined the possible use of the face in direct, intentional interaction with computers. However, the complex musculature of the face and the extensive cortical circuitry devoted to facial control suggest that motor actions of the face could play a complementary or supplementary role to that played by the hands in HCI [1]. One of us has explored this idea through a series of projects using vision-based methods to capture movement of the head and facial features and use these for intentional, direct interaction with computers. For example, we have used head and mouth motions for the purposes of hands-free text entry and single-stroke text character entry on small keyboards such as those found on mobile phones. Related projects used action of the mouth and face for digital sketching and musical expression.

One of the systems we developed tracked the head and position of the nose and mapped the projected position of the nose tip in the image plane to the coordinates of the cursor. Another algorithm segmented the area of the mouth and measured the visible area of the cavity of the user’s mouth in the image plane. The state of opening/closing of the mouth could be determined robustly and used in place of mouse-button clicks. This simple interface allowed for text entry using the cursor to select streaming text. Text entry was started and paused by opening and closing the mouth, while selection of letters was accomplished by small movements of the head. The system was tested extensively and found to permit comfortable text entry at a reasonable speed. Details are reported in [41].

Another project used the shape of the mouth to disambiguate the multiple letters mapped to the keys of a cell phone key pad [42].
Such an approach works very well for Japanese, which has a nearly strict CV (consonant-vowel) phoneme structure, and only five vowels. The strength of this system was that it exploited existing user expertise in shaping the mouth to select vowels. With some practice, users found they could enter text faster than with the standard multi-tap approach.

The unusual idea of using facial actions for direct input may find least resistance in the realm of artistic expression. Indeed, our first explorations of the concept were with musical controllers using mouth shape to control timbre and other auditory features [43]. Of course, since many musical instruments rely on action of the face and mouth, this work has precedent, and was greeted with enthusiasm by some musicians. Similarly, we used a mouth action-sensitive device to control line properties while drawing and sketching with a digital tablet [44]. Here again our exploration elicited a positive response from artists who tried the system.

The direct action facial gesture interface serves to illustrate the concept that feasible FP technology is ready to be used as the basis for working HCI applications. The techniques used in all the examples discussed are not awaiting the solution of some grand problem in pattern recognition: they work robustly in real-time under a variety of lighting conditions.
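To make the simplicity of these mappings tangible, the following Python sketch illustrates three of the ideas described in this section: mapping the nose tip to cursor coordinates, treating the mouth as a button, and combining a key press with a mouth shape to select a Japanese character. It is a hedged illustration, not the implementation reported in [41] or [42]. All names (nose_to_cursor, MouthSwitch, kana_for), the linear mapping, the normalisation of the mouth cavity area by the face area, the two threshold values and the key-to-row assignment (taken from the common Japanese phone keypad layout) are assumptions made for this example.

```python
from dataclasses import dataclass


def nose_to_cursor(nose_xy, image_size, screen_size):
    """Linearly map a nose-tip position in the camera image to screen pixels.

    ASSUMPTION: a plain proportional mapping, clamped to the screen bounds.
    """
    nx, ny = nose_xy
    iw, ih = image_size
    sw, sh = screen_size
    x = int(nx / iw * (sw - 1))
    y = int(ny / ih * (sh - 1))
    return (min(max(x, 0), sw - 1), min(max(y, 0), sh - 1))


@dataclass
class MouthSwitch:
    """Two-threshold (hysteresis) switch driven by the visible mouth cavity area.

    ASSUMPTION: the area is normalised by the face area, and the thresholds
    are hypothetical values; hysteresis avoids flicker near a single threshold.
    """
    open_above: float = 0.06
    close_below: float = 0.03
    is_open: bool = False

    def update(self, cavity_area, face_area):
        ratio = cavity_area / face_area
        if not self.is_open and ratio > self.open_above:
            self.is_open = True
        elif self.is_open and ratio < self.close_below:
            self.is_open = False
        return self.is_open


# Key selects the consonant row, mouth shape selects the vowel.
# ASSUMPTION: common Japanese keypad rows; keys 8 and 0 are omitted here
# because their rows do not contain all five vowels.
KANA_ROWS = {
    "1": "あいうえお", "2": "かきくけこ", "3": "さしすせそ", "4": "たちつてと",
    "5": "なにぬねの", "6": "はひふへほ", "7": "まみむめも", "9": "らりるれろ",
}
VOWELS = "aiueo"


def kana_for(key, mouth_vowel):
    """Return the kana selected by one key press plus one recognised mouth vowel."""
    return KANA_ROWS[key][VOWELS.index(mouth_vowel)]
```

In this sketch a single key press plus one of five mouth shapes replaces up to five presses of the same key, which is the intuition behind the speed advantage over multi-tap noted above.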
5 Conclusion

In this paper we have argued in favour of an “Art of the Soluble” approach in HCI. Progress can often be made by sidestepping long-standing difficult issues in artificial intelligence and pattern recognition. This is partly intrinsic to HCI: the presence of a human user for the system being developed implies leverage for existing computational algorithms. Our experience and the discussions that led to this article have also convinced us that HCI researchers tend towards an inherently pragmatic approach, even if they are not always conscious of the fact. In summary, we would like to suggest that skill in identifying soluble problems is already a relative strength of HCI, and that this is something worth developing further.
References
[1] Lyons, M.J.: Facial Gesture Interfaces for Expression and Communication, IEEE International Conference on Systems, Man and Cybernetics, The Hague (2004)
[2] Lyons, M.J., Budynek, J., Akamatsu, S.: Automatic Classification of Single Facial Images. IEEE PAMI 21, 1357–1362 (1999)
[3] Cassell, J., Sullivan, J., Prevost, S., Churchill, E.: Embodied Conversational Agents. MIT Press, Cambridge (2000)
[4] Bartneck, C., Okada, M.: Robotic User Interfaces, HC2001, Aizu (2001)
[5] Bartneck, C., Suzuki, N.: Subtle Expressivity for Characters and Robots. In International Journal of Human Computer Studies, vol. 62, Elsevier, pp. 306 (2004)
[6] Fong, T., Nourbakhsh, I., Dautenhahn, K.: A survey of socially interactive robots. Robotics and Autonomous Systems 42, 143–166 (2003)
[7] Zhai, S., Morimoto, C., Ihde, S.: Manual and gaze input cascaded (MAGIC) pointing, presented at ACM CHI’99
[8] Tobii Technology, Tobii Technology (2007) Retrieved February 2007, from http://www.tobii.com/
[9] Picard, R.W.: Affective computing. MIT Press, Cambridge (1997)
[10] Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: the state of the art. IEEE PAMI 22, 1424–1445 (2000)
[11] Fasel, B., Luettin, J.: Automatic facial expression analysis: a survey. Pattern Recognition 36, 259–275 (2003)
[12] Ekman, P., Friesen, W.V.: Unmasking the Face. Prentice-Hall, Englewood Cliffs (1975)
[13] Bartneck, C., Reichenbach, J., Breemen, A.: In your face, robot! The influence of a character’s embodiment on how users perceive its expressions, Design and Emotion (2004)
[14] Cerami, F.: Miss Digital World (2006) Retrieved August 4th, from http://www.missdigitalworld.com/
[15] Mori, M.: The Uncanny Valley. Energy 7, 33–35 (1970)
[16] Ishiguro, H.: Towards a new cross-interdisciplinary framework, presented at CogSci Workshop Towards social Mechanisms of android science, Stresa (2005)
[17] Bartneck, C.: Interacting with an Embodied Emotional Character, presented at Design for Pleasurable Products Conference (DPPI2004), Pittsburgh (2003)
[18] Honda, Asimo (2002) Retrieved from http://www.honda.co.jp/ASIMO/
[19] Sony, Aibo (1999) Retrieved January, 1999, from http://www.aibo.com
[20] NEC, PaPeRo (2001) Retrieved from http://www.incx.nec.co.jp/robot
[21] Breemen, A., Yan, X., Meerbeek, B.: iCat: an animated user-interface robot with personality, 4th Intl. Conference on Autonomous Agents & Multi Agent Systems (2005)
[22] Schiano, D.J.: Categorical Imperative NOT: Facial Affect is Perceived Continuously, presented at ACM CHI’2004 (2004)
[23] Russell, J.A.: Affective space is bipolar. Journal of personality and social psychology 37, 345–356 (1979)
[24] Bartneck, C.: How convincing is Mr. Data’s smile: Affective expressions of machines. User Modeling and User-Adapted Interaction 11, 279–295 (2001)
[25] Pelachaud, C.: Multimodal expressive embodied conversational agents, In: Proceedings of the 13th annual ACM international conference on Multimedia (2005)
[26] Bandai, Tamagotchi (2000) Retrieved from http://www.bandai.com/
[27] Lund, H.H., Nielsen, J.: An Edutainment Robotics Survey, 3rd Intl. Symposium on Human and Artificial Intelligence Systems (2002)
[28] Fogg, B.J.: Persuasive technology: using computers to change what we think and do. Morgan Kaufmann Publishers, Amsterdam, Boston (2003)
[29] Catherine, Z., Paula, G., Larry, H.: Can a virtual cat persuade you?: The role of gender and realism in speaker persuasiveness, presented at ACM CHI’2006 (2006)
[30] Biocca, F.: The cyborg’s dilemma: embodiment in virtual environments, 2nd Intl. Conference on Cognitive Technology - Humanizing the Information Age (1997)
[31] Schwienhorst, K.: The State of VR: A Meta-Analysis of Virtual Reality Tools in Second Language Acquisition. Computer Assisted Language Learning 15, 221–239 (2002)
[32] Mahmood, A.K., Ferneley, E.: Can Avatars Replace The Trainer? A case study evaluation, International Conference on Enterprise Information Systems (ICEIS), Porto (2004)
[33] Tamura, T., Yonemitsu, S., Itoh, A., Oikawa, D., Kawakami, A., Higashi, Y., Fujimoto, T., Nakajima, K.: Is an entertainment robot useful in the care of elderly people with severe dementia? The Journals of Gerontology Series A 59, M83–M85 (2004)
[34] Wiratanaya, A., Lyons, M.J., Abe, S.: An interactive character animation system for dementia care, Research poster, ACM SIGGRAPH (2006)
[35] Robins, B., Dautenhahn, K., te Boekhorst, R., Billard, A.: Robotic Assistants in Therapy and Education of Children with Autism: Can a Small Humanoid Robot Help Encourage Social Interaction Skills? In: UAIS, 4(2), 1–20. Springer-Verlag, Heidelberg (2005)
[36] Searle, J.R.: Minds, brains and programs. Behavioral and Brain Sciences 3, 417–457 (1980)
[37] McDermott, D.: Yes, Computers Can Think, in New York Times (1997)
[38] Medawar, P.B.: The art of the soluble. Methuen, London (1967)
[39] Cycorp, Cyc. (2007) Retrieved February 2007, from http://www.cyc.com/
[40] MacDorman, K.F.: Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the uncanny valley, ICCS/CogSci-2006 (2006)
[41] de Silva, G.C., Lyons, M.J., Kawato, S., Tetsutani, N.: Human Factors Evaluation of a Vision-Based Facial Gesture Interface, IEEE CVPR (2003)
[42] Lyons, M.J., Chan, C., Tetsutani, N.: MouthType: Text Entry by Hand and Mouth, presented at ACM CHI’2004 (2004)
[43] Lyons, M.J., Tetsutani, N.: Facing the Music: A Facial Action Controlled Musical Interface, presented at ACM CHI’2001 (2001)
[44] Chan, C., Lyons, M.J., Tetsutani, N.: Mouthbrush: Drawing and Painting by Hand and Mouth, ACM ICMI-PUI’2003 (2003)
Towards Generic Interaction Styles for Product Design

Jacob Buur and Marcelle Stienstra

Mads Clausen Institute for Product Innovation, University of Southern Denmark, Grundtvigs Allé 150, 6400 Sønderborg, Denmark
{buur, marcelle}@mci.sdu.dk
Abstract. A growing uneasiness among users with the experience of current product user interfaces puts mounting pressure on interaction designers to innovate user interface conventions. In previous research we have shown that a study of the history of product interaction triggers a broader discussion of interaction qualities among designers in a team, and that the naming of interaction styles helps establish an aesthetics of interaction design. However, that research focused on one particular product field, namely industrial controllers, and it remained to be shown whether interaction styles have generic traits across a wider range of interactive products. In this paper we report on five years of continued research into interaction styles for telephones, kitchen equipment, HiFi products and medical devices, and we show how it is indeed possible and beneficial to formulate a set of generic interaction styles.

Keywords: Interaction styles, interaction history, product design, user interface design, tangible interaction, quality of interaction.
in education as a way of explaining the historical inheritance and debating the difference between alternative design solutions. Since user interaction design shares many characteristics with industrial design, we claim that interaction design can benefit greatly from an understanding of the concept of style. It can provide designers with strong visions and a sense of direction in designing new user interfaces.

In particular we focus on user interface design for physical IT-products with small displays and dedicated keys, because of the tight coupling of interaction design and industrial design. The design of such user interfaces seems largely governed by technological progress, and to a large extent they seem to inherit user interface principles from the computer world, just one generation delayed. Human-Computer Interaction (HCI) interface principles were designed for full keyboard and mouse operation; they therefore become much more cumbersome with a tiny display and a limited number of keys. In particular, when moving away from buttons and screen to forms of tangible interaction, HCI principles fall short of providing much help.

We are concerned that interaction designers, in their enthusiasm for new technologies, fail to transfer the qualities of use that were achieved with previous technologies. It is, however, pointless to copy products of the past exactly, as society’s needs and values have changed and technology has moved on. But we argue that it is possible to use the interaction style of a particular period as inspiration for an innovative blend of interaction style, functionality and technology within a contemporary interaction design. In this way we may be able to preserve qualities of interaction otherwise lost in history.
In previous research we have shown that a study of the history of product interaction triggers a deep discussion of interaction qualities among designers in a team, and that the naming of interaction styles helps establish an aesthetics of interaction design [1, 2]. Since then we have expanded our research from industrial controllers to a broader range of interactive products including telephones, kitchen equipment, HiFi products and medical devices, and in this paper we will show that it is possible and beneficial to formulate a set of generic interaction styles for interactive products. Our research is based on two types of investigations:

1. Historical analysis of interactive products. We identify and characterise style eras for each of five product fields, then compare the style eras across the fields.
2. Design experiments (research-through-design). We exaggerate the qualities found in historic eras, but implement them with contemporary technology, e.g., a mobile phone with the interaction experience of a 1930 rotary dial telephone. Then we analyse all design experiments across the five product fields to identify core dilemmas in current interaction design.

Based on these investigations we propose a set of four generic interaction styles.
2 Interaction Styles in History

The concept of style has been the focus of much debate within all genres of art, from literature and visual arts to architecture and design. In recent decades, emphasis has shifted from understanding style as a method of categorisation based on particular
J. Buur and M. Stienstra
conventions of content and norms to an understanding that styles are defined within social groups and are essentially dynamic, both in form and function [3, 4]. In a relatively new field such as interaction design, discussions about style have started only recently, e.g., [2, 5, 6]. Style has been used for different purposes: to classify products and systems [6], but also to serve as an inspiration for creating a specific look and feel [7]. In this paper we focus mainly on the latter approach.

In our understanding of style, the following concepts are important: ‘network of norms’, ‘style marker’, and ‘interpretation community’. Essential to style is, as Merleau-Ponty explains in [5], the fact that perception – which lies at the basis of stylization – ‘cannot help but to constitute and express a point of view’. Stylization thus starts the moment we perceive an object, and it is an individual activity: it depends very much upon the person (his/her competences, references and experiences) and the context in which the stylization takes place. We compare the object with similar objects based on, for example, function and usage. Essential to this systematic activity is the existence of a given system, which Enkvist calls the ‘network of norms’ [8]: ‘a compilation of prior experiences with objects into a style taxonomy that makes it possible to find correspondences, both differences and similarities, between new objects and previous norms’. Enkvist observes that all style experiences arise from comparison. The comparison of artefacts that we see as similar lets us identify ‘style markers’, i.e. elements in the products that significantly correlate with or deviate from prevailing norms of design [8].

Our investigation covers five product genres: industrial controllers (in collaboration with Danfoss), telephones (Nokia), kitchen equipment, HiFi equipment (Bang & Olufsen), and medical devices (Novo Nordisk).
We organised the style study as a yearly 2-week seminar for graduate design students, with a new product genre each year. Each seminar included literature search, museum studies, interviews with curators, and videotaping of interactions with historic products. Product collections in museums provide a good opportunity to engage in the comparison activity. With groups of 16-20 students (our ‘interpretation community’) we were able to cover 2-3 museums for each product genre, typically a combination of a science museum and a private company collection. To ensure a broad view of styles, we split into 3-5 teams, each with a particular focus of study:

• Society context: What is the dominating view on humans and technology?
• Hands and skills: What movements and skills are required to interact?
• Technology: What main technologies are employed for functions and manufacturing?
• Company Spirit: What is the dominating self-image of the manufacturer?

Based on the collected data we sequenced a timeline into appropriate eras of a dominating ‘style’, characterised each style era, created an appropriate set of style names, and produced collage posters to communicate findings (see fig. 1). The posters then served as input for the ensuing design experiments (described in section 3).

The naming of style eras posed a particular challenge. Where style labels in the history of architecture and design typically spring out of the style discourse of the period (e.g., De Stijl, Dada, Art Deco) and the origins of dominant pieces of art (e.g., Bauhaus, Pop Art, Swiss Style), the discussion of user interaction experience is rather
Fig. 1. Four posters describing eras of telephone interaction styles, each covering the dominant aspects of community, hand movements & skills, knowledge allocation, technology & design
recent. So for each product genre we were in the unique situation of simultaneously discussing and naming all style eras through the 20th century¹. It became clear to us that interaction style names need to point to people, interaction purpose and experience, rather than to the visual identity of user interfaces (buttons, knobs, sliders). Thus we chose labels like Routine Caller (30s-70s telephones), Food Processor Queen (50s-70s kitchen equipment) and Analog Professional (60s-70s industry controllers) – rather flamboyant names to trigger the imagination of interaction designers.

The naming discussions were long and intensive, because they seemed to condense the many observations and interpretations. In the first seminars we left the naming to a small sub-group in order to reach consensus more easily. But later we realised that this discussion may well be the core of forming a shared style understanding, as it contributes exactly to the development of the ‘network of norms’. The naming discussion seemed precisely to foster the building of shared norms in the investigation team. According to Engholm [5]: ‘the stylization will always depend on the discursive context that one is part of and on one’s historical, cultural or technical competence’. Therefore it was all the more important that all students were involved, not just a small group.

In addition to the naming activity, the style posters worked exceedingly well as a format for synthesizing what we had seen at the museums. The graduate students took pride in their work, and the style period labels quickly became part of the language repertoire in discussions.

¹ One can argue that the invention of electricity also gave birth to the field of user interface design as we know it today. Therefore our study focused on products invented at the end of the 19th century and the start of the 20th.
Fig. 2. Four interaction style eras presented in the form of a ‘style book’. The style eras have been generated after analyzing the timelines for all five product genres. The ‘operation’ of the pages symbolizes the main mode of interaction of the respective eras: turning, sliding, clicking, and brushing.
When we align the style timelines for all five product genres, it is obvious that they share similar developments, although for some genres a new era may arrive earlier than for others (see fig. 3). To stress this we have indicated rather sharp transitions between one style and the next (see fig. 2). In reality the eras should be seen as waves with large overlaps. One would expect this similarity, as all products in an overall sense are embedded in the same society discourses and draw on the same technology inventions. We have however refrained from coining composite style labels, as we feel that the collapsing of the specific product style names would result in too abstract names without sufficient imaginative power.
3 Designing with Interaction Styles

To test the power of interaction style thinking, we challenged our graduate students to design contemporary digital products that would incorporate the interaction experience of each of the style eras studied: a mobile phone, a microwave oven, a motor controller. By keeping the specifications constant across styles we were able to compare experiences (see fig. 4). This led to a large collection of design samples and many challenging discussions, such as: how would you send an SMS with the feeling of an old-time crank? Or how would you use the rotary dialing motion in a portable telephone?
Fig. 3. Comparison of interaction style studies for five different product genres
Fig. 4. Four mobile phones inspired by respectively the Magic Connector period, the Routine Caller period, the Life Chatter period and the Information Navigator period (see also fig. 2)
Based on the design experiments we claim that interaction style thinking indeed helps designers to increase their sensitivity to experience issues and to break with user interface conventions. We support this claim with three observations.

Firstly, we observed a fine spread of interaction qualities in the designs that the graduate students produced following the history style studies. Along the way some teams found it very difficult to let go of their preoccupation with button and display technology, and some simply copied user interface components of the past. In the end, however, all teams created designs that support rich actions and established convincing links between actions and functions.

Secondly, the graduate students were able to compare their designs to exemplars in history and, most importantly, they were explicit about the expression of interaction they wanted to support. They demonstrated in their presentations that they had established a shared understanding of different interaction styles based in history and the respective qualities of each style.

Thirdly, the students themselves were positive about interaction style thinking compared to their prior experiences. One student, for instance, expressed his surprise about the richness of interaction history: »Inspiration from the past is like going to the beach - there is so much more to find.« Another added: »We did suffer from preconceptions. We think we know all about telephones already.«

Our next step was to see whether, by looking at the designs themselves, we were able to abstract generic characteristics. We analyzed four motor controller designs, four mobile phones, and five microwave ovens. In order to reduce the risk of a circular argument (that what we learn from the designs only confirms what we knew already from the historical study), we added a set of 10 MP3 player designs.
This assignment did not explicitly refer to historical interaction styles, but required students to design a new interface for an iShuffle-inspired MP3 player (no screen, very simple functionality) that would support rich interaction, bodily engagement and the expressiveness of product movements [9]. For clarity reasons only 17 of the 23 designs appear on the clustering diagram in fig. 5. The analysis helped us explicate two dilemmas in current (tangible) interaction design. One, the designs seem to support either an explanatory or an exploratory mode of interacting. The ‘explanatory’ designs provide a direct link between the goal you want to achieve, and how to get there. Every step is explainable: there is a feeling of being in control. The ‘exploratory’ designs, on the other hand, are less ‘serious’ in that they support a playful building of interaction skills, where the goal may be less important than the action itself.
Fig. 5. Visual comparison of 17 student designs; 4 motor controllers, 4 telephones, 5 microwaves and 4 MP3 players. Only the MP3 players were not designed using the history styles explicitly as inspiration.
Two, there seems to be an important distinction between discrete and composite interface designs, or – put very bluntly – simple and complex products. The ‘discrete’ interaction designs favor one control for each function they offer (think of old radios with different buttons to choose wave lengths, sliders to select radio channels, knobs to adjust volume, treble, bass). Products with ‘composite’ interaction have general controls to access different, layered functions (think of the keypad on mobile phones).
4 Towards Generic Interaction Styles

Looking back at the interaction styles in history and comparing them with the designs made by students, we argue that it is possible to extract four generic and contemporary interaction styles based on the presented material. We take inspiration from the work of Maaß & Oberquelle [10], who proposed four perspectives to explain differences in how designers conceive of the computer in relation to its users: the system perspective, the information processor perspective, the workshop perspective and the media perspective. We propose four interaction styles for interactive products, characterized as follows:

• Tangible Control (discrete, explanatory): the product exhibits its function through its design; the interface consists of several discrete controls; the spatial arrangement of the controls supports the understanding of the product; the interaction takes place where the product is placed. This style supports the view that interactive technology is a tool that people employ to achieve a certain, explicit purpose.
• Elastic Play (discrete, exploratory): there are specific controls for specific functions; the interface consists of a wide variety of general control types (buttons, sliders, handles etc.); the interaction supports physical input and feedback; learning to interact with the product requires both a cognitive and a bodily understanding. Elastic Play banks on virtuosity: technology is an expressive instrument that people can learn to master, and that aims to grow with their skills.
• Rhythmic Logics (composite, explanatory): the product is a complex system which consists of different layers; the interaction requires a cognitive understanding of the product; input is a rhythmic sequence of simple actions, like button tapping; the interaction focuses on efficiency; feedback is digitally mediated. Technology is an ‘intelligent’ partner that people negotiate sense with.
• Touch-free Magic (composite, exploratory): the product reacts in surprising ways; it may not have one clear identity (e.g., phone, camera and music player in one); personal style (in appearance and/or interaction) is important - in a way, the user also becomes the designer of the product; the product supports an exploratory type of interaction with no or very light touch; the product may move and respond physically, but there is no tactile feedback; interaction with the product takes place where the user is. This style supports the view of technology as a wonder, as something unexplainable, a magic that people can learn to engage in/with.
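The four styles can be read as the cells of a two-by-two grid spanned by the two dilemmas identified in section 3. The sketch below is only an illustration of that reading, not a tool used in the study; the names ControlStructure, InteractionMode and generic_style are hypothetical and introduced here for the example.

```python
from enum import Enum


class ControlStructure(Enum):
    """Hypothetical label for the discrete vs. composite dilemma."""
    DISCRETE = "discrete"    # one control per function
    COMPOSITE = "composite"  # general controls for layered functions


class InteractionMode(Enum):
    """Hypothetical label for the explanatory vs. exploratory dilemma."""
    EXPLANATORY = "explanatory"  # every step is explainable
    EXPLORATORY = "exploratory"  # playful building of skills


# Style names and their grid positions are taken from the list above.
GENERIC_STYLES = {
    (ControlStructure.DISCRETE, InteractionMode.EXPLANATORY): "Tangible Control",
    (ControlStructure.DISCRETE, InteractionMode.EXPLORATORY): "Elastic Play",
    (ControlStructure.COMPOSITE, InteractionMode.EXPLANATORY): "Rhythmic Logics",
    (ControlStructure.COMPOSITE, InteractionMode.EXPLORATORY): "Touch-free Magic",
}


def generic_style(structure: ControlStructure, mode: InteractionMode) -> str:
    """Return the generic interaction style for one cell of the grid."""
    return GENERIC_STYLES[(structure, mode)]
```

A design judged to be composite and exploratory would, under this reading, be discussed as Touch-free Magic. The grid is meant as a trigger for discussion rather than as a classifier, since real designs may sit between cells.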
5 Conclusions

The generic interaction styles presented here are based on five studies of interaction history combined with a number of conceptual design experiments. They have come into being after long discussions amongst interaction designers and researchers. They clearly refer to qualities of interaction from the past, but have a contemporary character being based upon current technology and needs and values of today’s society. A next step would be to investigate how the generic interaction styles work for interaction designers who haven’t been involved in the preceding discussions. Rather than use our generic style proposals as an analytic tool or as design guidelines, we aim at provoking interaction designers to discuss how they relate to their own product genres. Such discussions are vital in order for common understanding and agreement to arise, and to create a shared ‘network of norms’ in Enkvist’s sense [8]. Are the descriptions, examples and illustrations provided for each style enough for designers to serve as the inspiration we intended them to be? Or are specific activities – like museum studies - required to get a deeper understanding of the styles, and what could such activities be? Employed as a trigger for discussion we believe that the generic interaction styles can help interaction designers to innovate the dominant user interface conventions.
Acknowledgments. We would like to thank the IT Product Design students at the University of Southern Denmark for their enthusiastic participation in the interaction style experiments, in particular Mads Vedel Jensen, Peng Cheng, Mette Mark Larsen, Ken Zupan, Kyle Kilbourn, Anda Grarup, René Petersen and Yingying Wang who created poster style guides and helped analyze the material.
References

1. Øritsland, T.A., Buur, J.: Taking the best from a company history - designing with interaction styles. In: Symposium on Designing Interactive Systems 2000, ACM Press, New York (2000)
2. Øritsland, T.A., Buur, J.: Interaction Styles: An Aesthetic Sense of Direction in Interface Design. International Journal of Human-Computer Interaction 15(1), 67–85 (2003)
3. Chandler, D.: An Introduction to Genre Theory, [WWW document], [15.02.2007] (1997), URL: http://www.aber.ac.uk/media/Documents/intgenre/intgenre.html
4. Ylimaula, A.M.: Origins of style - Phenomenological approach to the essence of style in the architecture of Antoni Gaudi. In: Mackintosh, C.R., Wagner, O. (eds.) University of Oulu, Oulu, Finland (1992)
5. Engholm, I.: Digital style history: the development of graphic design on the Internet. Digital Creativity 13(4), 193–211 (2002)
6. Ehn, P., et al.: What kind of car is this sales support system? In: On styles, artifacts, and quality-in-use. In Computers and design in context, MIT Press, Cambridge (1997)
7. Engholm, I., Salamon, K.L.: Webgenres and -styles as socio-cultural indicators - an experimental, interdisciplinary dialogue. In: The Making, Copenhagen, Denmark (2005)
8. Enkvist, N.E.: Någat om begrepp och metoder i språkvetenskaplig stilforskning. In Om stilforskning/Research on Style, Stockholm, Sweden: Kunglig Vitterhets Historie och Antikvitetsakademien (1983)
9. Djajadiningrat, J.P., Matthews, B., Stienstra, M.: Easy Doesn’t Do It: Skill and Expression in Tangible Aesthetics. Special Issue on Movement of the Journal for Personal and Ubiquitous Computing, forthcoming
10. Maaß, S., Oberquelle, H.: Perspectives and Metaphors for Human-Computer Interaction. In: Floyd, C., et al. (eds.) Software Development and Reality Construction, pp. 233–251. Springer, Heidelberg (1992)
Context-Centered Design: Bridging the Gap Between Understanding and Designing

Yunan Chen and Michael E. Atwood
College of Information Science & Technology, Drexel University, Philadelphia, PA, 19104, USA
{yunan.chen, michael.atwood}@ischool.drexel.edu
Abstract. HCI is about how people use systems to conduct tasks in context. Most current HCI research focuses on the interaction of single or multiple users with system(s). Compared with the user, system and task components, context is a less studied area. The emergence of ubiquitous computing, context-aware computing, and mobile computing requires system design to be adaptive and to respond to aspects of the setting in which tasks are performed, including other users, devices and environments. Given the importance of context in information system design, we note that even the notion of context in HCI is not well defined. In this paper, we review several theories of context as it relates to interaction design. We also present our Context-centered Framework, which aims to bridge end users’ understanding and designers’ designing. The research design and expected outcomes are also presented.
context of the interaction itself. Dourish [4] identified two perspectives on context: representational and interactional. He argues that the correct focus for research is on the interaction between objects and activities and not solely on the representation of the objects. We concur with this observation and also with Greenberg’s point [5] that context is not a fixed, descriptive element, but a dynamic and interactive one.

Designing context-aware systems for complex environments is very challenging because the knowledge needed to solve such complex problems is possessed by people who typically work in different domains. This is known as the Symmetry of Ignorance, and a communication breakthrough is needed in these cases [6]. Since end users live in their context, they understand it much better than system designers do. But end users must rely on others to design the systems they need. Doing so effectively requires a shared understanding of context to ensure a good design in context-rich environments. To address this problem, in this paper we present a Context-centered Framework for interactive system design, which is intended to answer the following three research questions:

− What is context when it is applied in interactive design?
− What are the components of the context?
− How can we use context to bridge the gap between understanding and designing?
2 Literature Review: Theories and Metaphors

Although many current theories within HCI do not explicitly address the issue of context, some consideration of context is embedded in these theories. We review these theories in this section (Table 1).

Table 1. Theories and Metaphors Applicable to the Use of Context in HCI

Theory | Basic Unit of Analysis | Components of context
Activity Theory | An activity: a form of doing directed to an object that transforms an object into an outcome. | Subject, Tools, Object, Rules, Community, Division of labor
Distributed Cognition | A cognition system composed of individuals and the artifacts they use. | Goals, Internal Representation, External Representation
Situated Action | The activity of persons acting in a setting | Person, Activity, Setting, Relationship between person and setting
GOMS | GOMS: the user’s cognitive structure | Goals, Operators, Methods for achieving the goals, Rules for choosing methods
Awareness | Awareness: knowledge about the state of some environment | People, Artifacts, Time, Actions happened and happening
Locales Framework | Locales: the relationship between a social world and its interactional needs, and the “site and means” its members use to meet those needs. | Locales foundation, Civic structures, Individual views, Interaction trajectory, Mutuality
2.1 Activity Theory

Activity theory (AT) is a research framework originating in Soviet psychology in the 1920s [7]. It has more recently been applied in the information systems area [8, 9]. The objective of AT is to understand the unity of consciousness and activity [9].

Emphasis on Context: Nardi [9] argued that AT is a descriptive tool which provides a different perspective on human activity. Activity theory begins with the notion of activity. Unlike many other theories, which take human actions as the unit of analysis, AT takes actions and their situated context as a whole and calls this an activity. Context is the activity and the environment in which it occurs.

2.2 Distributed Cognition

Distributed cognition theory [10, 11] holds that humans augment their knowledge by placing memories, facts, or knowledge on the objects, individuals, and tools in their environment. It breaks down the traditional boundary between internal and external and combines the two into a single distributed system.

Emphasis on Context: The distributed cognition system as a whole is the context for the activities carried out within it. Since distributed cognition theory focuses on the distributed nature of the task-solving process, it takes into account people and artifacts situated in various locations. It is widely adopted in Computer Supported Cooperative Work (CSCW) studies, which emphasize collaboration across multiple participants [12].

2.3 Situated Action

Situated action was first introduced in 1987 by Lucy Suchman [13]. Rather than decomposing the circumstances and the actions taken according to a preset plan, situated action theory holds that actions are highly contextualized: the context of the specific situation determines what the next action is. Suchman believes that people construct their plan as they go along in the situation, creating and altering their next move based on what has just happened, rather than planning all actions in advance and simply carrying out that plan.

Emphasis on Context: Situated action theory holds that context is dynamic and associated with actions. From the situated action point of view, an action plan is not pre-defined, but consists of many unpredicted actions which are determined by the specific context in which it is situated. In this way we can define and analyze context as an interactional entity from the action point of view.

2.4 Locales Framework

The Locales framework [14, 15] is a theory that creates a shared abstraction among stakeholders and bridges understanding and design in the CSCW field. Basically, a locale
is a space, together with the resources available there, which stands in a particular relationship to a social world and its interaction needs and serves to meet people’s needs. A locale can be either a physical space or a virtually shared environment.

Emphasis on Context: Though Fitzpatrick only studied locales in the CSCW field, the notion of ‘locales’ could be applied to any interaction situation. A locale is an individual context in this sense, and the framework could help identify a locale’s five properties. This would potentially

2.5 GOMS

GOMS [16] is a method for modeling and describing human task performance. GOMS is an acronym that stands for Goals, Operators, Methods, and Selection Rules, the components which are used as the building blocks of a GOMS model.

Emphasis on Context: GOMS provides an alternative view of context. Instead of being a shared environment together with the people and artifacts inside it, context is a means to select and conduct activities. Context does not necessarily consist of tangible artifacts. As distributed cognition theory claims, human cognition is part of context too. Though rules are not physical artifacts, they restrict the way in which actions are carried out.

2.6 Awareness

Awareness is generally defined in terms of two concepts: knowledge and consciousness. Within the scope of HCI, awareness is studied as it relates to the interaction between an agent and its environment.

Emphasis on Context: Dourish and Bellotti [17] defined awareness as “an understanding of the activities of others, which provides a context for your own activities.” In this sense, awareness could simply be defined as “knowing what’s going on in the context [18].” This definition indicates that awareness is associated with the context in which the intended task is being processed. Knowing what is going on also provides users with feedback on, and consciousness of, the context.
2.7 Contextual Factors Identified

From the above review, we conclude that context, although defined and used differently in these theories, does share some common elements. The contextual factors associated with each theory are outlined in Table 2. Our review and analysis suggest that context is not a fixed, descriptive element. Instead, it is a dynamic and interactive element which arises from the activity and is particular to each occasion of activity.
Table 2. Contextual Factors Extracted from HCI Theories

Factors | Explanations
Motivation | The reasons for an action
Goal | The intended outcome of an action
Activity | Action
Rules | Principles or regulations of an action
Constraint | Limitation or restriction of an action
Awareness | Knowing what’s going on
Methods | Different ways of conducting an action
People | People involved in an action and their roles
Objects | Relevant artifacts
Settings | Either physical or virtual space for an action
3 Context Revisited

Given the importance of context in system design and the contextual factors extracted from the previous theories, we are interested in what exactly context is, and what a context-aware system is, from this activity-bounded view.

3.1 Context Definition

Both Dourish’s [4] point and the literature review above indicate that context is a property of an interaction between objects and activities, not of the objects in the environment alone. From this interactional point of view, context is “a relational property held between objects or activities. We can not simply say that something is or is not context; rather, it may or may not be contextually relevant to some particular activity.” [4] This viewpoint shows that context is a dynamic property which is particular to each occasion of activity or action. Therefore, context in our definition is: a dynamic property arising from activities, which interacts with and constrains the activities that happen within it.

3.2 Context-Aware System

A context-aware application is adaptive, reactive, responsive, situated, context-sensitive and environment-directed [19]. Since the definition of context varies with usage, the notion and usage of context-aware applications also differ greatly. In its early stages, a context-aware system was described as one that “adapts according to its location of use, the collection of nearby people and objects, as well as
changes to those objects over time.” [20] Context as depicted in this definition is only a representational problem. What does context-aware mean when context is an interactional property? Dey [2] defines context-aware as follows: “a system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user’s task”. There is no doubt that being adaptive and responsive to the surrounding environment is the key characteristic of context-aware computing. From the activity point of view, contextual information is determined by the activities that happen within the context; a task is a more general notion, and a task may contain many goal-oriented activities. Therefore we define a context-aware application as a system which can incorporate relevant contextual information and adapt to the situation in which it is situated, where the contextual information is determined by the goal-oriented activities users carry out to complete tasks.
4 Context-Centered Framework

Our Context-Centered Framework is intended both to incorporate context into design and to facilitate communication between end users and designers. In contrast to the locales framework, which considers context as a static environment, we adopt a dynamic view that combines context with the task-solving process. End users can use this framework to identify the contextual information associated with their working activities. It also assists designers in analyzing system features and validating them in context. We take activity as the unit of analysis in this framework.

4.1 Action as a Unit of Analysis

The review shows that context is inseparable from activities; whether something is considered to be context is determined by its relevance to a particular activity. Therefore, we set the unit of analysis in our study at the activity level. From the interaction point of view, contextual information is initiated from and bounded by the activities that happen within it. According to Nardi’s [9] hierarchical levels of activities, activities are long-term formations whose objects cannot be transformed into outcomes at once, but only through a process often consisting of several steps or phases of actions. Actions under the same activity are related to each other by the same overall object and motive.

4.2 Context-Centered Framework Aspects

From the hierarchy-of-activity point of view [9], the activity is similar to the task which users are trying to accomplish, actions are the steps towards achieving it, and operations are the procedures under each step. Context differs for each step and for the overall task. Each action can be analyzed along four aspects. These four aspects are highly interdependent and overlapping; they are connected by the action being undertaken. Taken together, the aspects have the potential to capture many contextual characteristics of the working setting.
Y. Chen and M.E. Atwood
Goal: The first step in understanding the context is to identify the object of the activity, since it determines which context information is relevant. The goal includes the users’ motivation and the intended outcome of performing the activity.

Setting: The setting is the place where participants perform the activity; it can be either a virtual or a physical environment. The relevant setting information includes:
− The people who conduct the activity and their roles;
− The characteristics of the setting in which the activity is performed;
− The available tools, such as other available methods and approaches;
− The artifacts involved in the setting, such as other devices and objects.
Rules: The rules for using the resources in the current setting, and the constraints on the use of any tool, that shape how users perform the activity (e.g., a time constraint for an action).
− Constraints on using the resources in the working setting
− Rules for allocating resources

Awareness: An understanding of others (either objects or people), which provides feedback on and consciousness of the context and the activities.
− The shared context: awareness of the other people involved in the activity and their roles; awareness of the tools and artifacts in the current setting; awareness of the rules and constraints for performing the activity
− Actions: awareness of the actions that have been taken; awareness of the actions being carried out now

Table 3. Contextual factors identified

Goal        The object determines what the contextual information for the activity is.
Setting     The place where participants perform activities. It includes the resources involved in the task-solving process.
Rules       Rules and constraints on using the resources.
Awareness   An understanding and consciousness of the setting and activity.
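The four aspects can be read as a small record that is filled in once per action. The following is a minimal sketch of such a record, not part of the framework as published: the class names, the field names, and the nursing example are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Setting:
    """Where the action takes place (physical or virtual)."""
    participants: Dict[str, str] = field(default_factory=dict)  # person -> role
    characteristics: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.participants or self.characteristics
                    or self.tools or self.artifacts)


@dataclass
class ActionContext:
    """Context of one action, described along the four aspects."""
    action: str
    goal: str
    setting: Setting = field(default_factory=Setting)
    rules: List[str] = field(default_factory=list)
    awareness: List[str] = field(default_factory=list)

    def missing_aspects(self) -> List[str]:
        """Return the aspects that have not been filled in yet."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if self.setting.is_empty():
            missing.append("setting")
        if not self.rules:
            missing.append("rules")
        if not self.awareness:
            missing.append("awareness")
        return missing


# Hypothetical example of a nursing action (not taken from the study).
example = ActionContext(
    action="record vital signs",
    goal="keep the patient chart up to date",
    setting=Setting(participants={"nurse": "records the values"},
                    tools=["bedside monitor"],
                    artifacts=["patient chart"]),
    rules=["must be completed within the shift"],
)
print(example.missing_aspects())  # awareness has not been described yet
```

A record like this could serve as a checklist during scenario writing: an action whose `missing_aspects()` list is not empty still lacks part of its context description.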
5 Research Design

To understand how context can be used to bridge the gap between understanding and designing, we designed a 2x2 experiment to test 1) whether the context-centered framework can bridge understanding and designing and 2) whether the context-centered framework can generate a better design than an approach without contextual consideration. We apply a scenario-based design (SBD) [21, 22] approach in our experiment. SBD is an ideal way to measure the implications of context in design [23]. Two groups of students will be recruited to generate scenarios based on given tasks. Students who have taken HCI courses are assumed to have some design expertise, whereas students who have nursing training are considered end users. We will apply two conditions to these students: with and without context-centered framework training (Table 4).
Context-Centered Design: Bridging the Gap Between Understanding and Designing
Table 4. Research Design

             Without training   With training
Designers    Group D1           Group D2
End Users    Group E1           Group E2
The hypotheses are:

Ha: In both the designer and the end-user groups, using the context-centered framework could produce better design scenarios.
Hb: Without context consideration, designers will generate better design scenarios, whereas with context-centered framework training, end users could generate better design scenarios.

To assess the scenarios, two HCI experts will review them and score them according to their quality as a basis for design.
6 Expected Results

We believe that the focus on context can improve communication between end users and designers. This focus will also produce high-quality scenarios, which will lead to better design products. We expect that without instruction in the context-centered framework, designers (Group D1) will produce better interaction scenarios than end users (Group E1), whereas when context is taken into consideration, end users (Group E2) will generate higher-quality scenarios than designers (Group D2).
7 Conclusion and Future Work

We intend to use the context-centered framework to connect end users’ understanding of the working setting with designers’ design activities. We believe that the results of this study will be relevant to both researchers and practitioners and will help in designing useful and usable systems, for two reasons. First, the context-centered framework can be a starting point to help analysts and designers understand the working environment. The task-dependent framework can be used to generate initial questions and direct observations. It can also capture working settings from the end users’ point of view. Second, the context-centered framework can be used by system designers to identify where features can be added to enhance an existing design, and to identify task-related context issues and how to incorporate them into system design. Our future work includes conducting the experiment for this study; we also intend to adapt the context-centered framework into a contextual walkthrough for system evaluation.
References

1. Greenbaum, J., Kyng, M. (eds.): Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Hillsdale, New Jersey (1991)
2. Dey, A.K., Abowd, G.D., Salber, D.: A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications. Human-Computer Interaction 16, 97–166 (2001)
3. Schilit, B., Theimer, M.: Disseminating active map information to mobile hosts. IEEE Network 8, 22–32 (1994)
4. Dourish, P.: What we talk about when we talk about context. Personal Ubiquitous Comput. 8, 19–30 (2004)
5. Greenberg, S.: Context as a Dynamic Construct. Human-Computer Interaction 16, 257–268 (2001)
6. Rittel, H.: Second-Generation Design Methods. In: Cross, N. (ed.) Developments in Design Methodology, pp. 317–327. John Wiley & Sons, New York (1984)
7. Wertsch, J.V.: Vygotsky and the Social Formation of Mind. Harvard University Press, Cambridge, MA, London (1985)
8. Bødker, S.: A human activity approach to user interfaces. Human-Computer Interaction 4, 171–195 (1989)
9. Nardi, B.: Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge (1996)
10. Hutchins, E.: Cognition in the Wild. The MIT Press, Cambridge, MA (1996)
11. Zhang, J., Norman, D.A.: Representations in Distributed Cognitive Tasks. Cognitive Science 18, 87–122 (1994)
12. Rogers, Y., Ellis, J.: Distributed cognition: An alternative framework for analysing and explaining collaborative working. Journal of Information Technology 9, 119–128 (1994)
13. Suchman, L.A.: Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press, New York (1987)
14. Fitzpatrick, G., Mansfield, T., Kaplan, S.M.: Locales framework: exploring foundations for collaboration support, pp. 34–41 (1996)
15. Fitzpatrick, G., Kaplan, S., Mansfield, T.: Applying the Locales Framework to Understanding and Designing. In: Proceedings of the Australasian Conference on Computer Human Interaction, p. 122. IEEE Computer Society Press, Los Alamitos (1998)
16. Card, S.K., Moran, T.P., Newell, A.: The psychology of human computer interaction. Lawrence Erlbaum Associates, Inc, Hillsdale, NJ (1983)
17. Dourish, P., Bly, S.: Portholes: Supporting Awareness in a Distributed Work Group. In: Proceedings of the Conference on Human Factors in Computing Systems, Monterey, CA, pp. 541–547 (1992)
18. Gutwin, C., Greenberg, S.: A Descriptive Framework of Workspace Awareness for Real-Time Groupware. Computer Supported Cooperative Work (CSCW) 11, 411–446 (2002)
19. Abowd, G.D., Dey, A.K., Brown, P.J., Davies, N., Smith, M., Steggles, P.: Towards a Better Understanding of Context and Context-Awareness. In: Proceedings of the 1st international symposium on Handheld and Ubiquitous Computing, pp. 304–307. Springer-Verlag, Karlsruhe, Germany (1999)
20. Schilit, B., Theimer, M.: Disseminating Active Map Information to Mobile Hosts. IEEE Network 8, 22–32 (1994)
21. Carroll, J.: Scenario-Based Design: Envisioning Work and Technology in System Development. John Wiley & Sons, Chichester (1995)
22. Rosson, M.B., Carroll, J.M.: Usability Engineering: scenario-based development of human-computer interaction. Morgan Kaufmann, Seattle, Washington (2001)
23. Pinelle, D., Gutwin, C.: Groupware walkthrough: adding context to groupware usability evaluation. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, pp. 455–462. ACM Press, Minneapolis (2002)
Application of Micro-Scenario Method (MSM) to User Research for the Motorcycle’s Informatization - A Case Study for the Information Support System for Safety

Hiroshi Daimoto1,3, Sachiyo Araki1, Masamitsu Mizuno1, and Masaaki Kurosu2,3

1 YAMAHA MOTOR CO., LTD., Japan
2 National Institute of Multimedia Education, Japan
3 Department of Cyber Society and Culture, The Graduate University for Advanced Studies, Japan
Abstract. The Micro-Scenario Method (MSM) is an approach for uncovering consumer needs and establishing development concepts [2]. In this study, the MSM is applied to the Information Support System for Safety for motorcycles and is modified to make its application more efficient. The modifications are to build a prescriptive model before the interview research and to set up syntax rules for the problem-scenario (a sentence describing a problem situation). As a result, the modified MSM improves development efficiency. Communication among the relevant parties can be sped up, because the prescriptive model, in which keywords are structurally organized, helps development actors share wide-ranging information about problem situations. Moreover, the time needed to create problem-scenarios can be cut, because the syntax rule simplifies how they are described. The modified MSM is an effort to put MSM into practical use at YAMAHA Motor Company Ltd. (YMC), and it was considered a useful approach for reducing the workload of HCD (Human-Centred Design).
user needs and to improve their usability. This approach corresponds to the activity “understanding and specifying the context of use” in the early development stage of ISO 13407. The purpose of the present paper is to propose a modified MSM that improves the analytical method for the problem-scenario (p-scenario). There are two distinctive improvements to the analysis of the p-scenario: “the prescriptive model” and “the syntax rule”.
2 The Prescriptive Model and the Syntax Rule of the Modified MSM

2.1 The Prescriptive Model

The prescriptive model consists of structured keywords derived from literature research. It is used to cover the broad aspects of the target field and to facilitate understanding of the research contents among development actors (users, engineers, designers, usability engineers). Before the interview research, we built a prescriptive model (see Fig. 1) organized in terms of i) rider factors (physical factor, emotional factor, personality factor, information processing factor), ii) vehicle body factors (breakdown, poor maintenance, etc.), and iii) environmental factors (surrounding vehicles, traffic situation, road surface condition, etc.). Fig. 1 shows the structured accident causes for a motorcycle. The keywords for the accident causes are grouped and organized structurally, as in the KJ method [1].
Fig. 1. Prescriptive model about the accident factors of a motorcycle
Table 1. Detail descriptions of contextual factors on the accidents of a motorcycle
Application of MSM to User Research for the Motorcycle’s Informatization
H. Daimoto et al.
After the interview research, the prescriptive model is revised by adding keywords derived from the interviews. Table 1 gives detailed descriptions of the contextual factors in motorcycle accidents. The prescriptive model is based on this structured classification. It is used to help participants understand the overall picture of the accident causes. At the p-scenario analysis stage, usability engineers use the prescriptive model to analyze the accident causes by connecting its keywords with the p-scenarios.

2.2 The Syntax Rule

The p-scenarios are derived by organizing the interview data and the literature research. The person responsible for usability spends a great deal of time writing p-scenarios, because the text derived from the interviews and the literature is voluminous. Scenario writers therefore have difficulty deciding how to describe the p-scenarios, and their way of writing them varies considerably from person to person. This is resolved by setting up syntax rules for the p-scenarios. The syntax rule regulates the elements that should be described. Fig. 2 shows an example p-scenario for the Information Support System for Safety for motorcycles. Fig. 3 shows the traffic situation (Japan keeps to the left) of the p-scenario. This example p-scenario describes the “subject-object”, “provided information”, “to whom”, “when”, “condition of rider”, “situation of environment and other vehicles”, “kind of hazard”, and “means”.

When one's own motorcycle goes straight through an intersection on a green light while there is a preceding vehicle (truck, etc.) and an oncoming right-turn car, and both the motorcycle’s rider and the oncoming car’s driver fail to see each other, there is a risk that the oncoming right-turn car might come into the intersection by mistake. Therefore, it is desirable to indicate the presence of one's own motorcycle to the oncoming car's driver. However, such a means does not exist.
Fig. 2. A case example of p-scenario for the Information Support System for Safety related to a motorcycle
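The eight elements named by the syntax rule can be pictured as the slots of a fixed record that a scenario writer fills in before composing the sentence. The sketch below is one possible reading, not the authors' tooling: the class and field names are hypothetical, and the assignment of phrases from Fig. 2 to individual slots is our own interpretation.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PScenario:
    """One problem-scenario, with one field per element of the syntax rule."""
    subject_object: str
    provided_information: str
    to_whom: str
    when: str
    rider_condition: str
    environment_and_other_vehicles: str
    hazard: str
    means: str

    def empty_slots(self) -> List[str]:
        """Names of the elements the writer has not described yet."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# The Fig. 2 example, split into slots (the split is an interpretation).
fig2 = PScenario(
    subject_object="one's own motorcycle",
    provided_information="presence of one's own motorcycle",
    to_whom="driver of the oncoming right-turn car",
    when="going straight through an intersection on a green light",
    rider_condition="rider fails to see the oncoming car",
    environment_and_other_vehicles="preceding vehicle (truck, etc.) and oncoming right-turn car",
    hazard="oncoming right-turn car enters the intersection by mistake",
    means="no such means exists",
)
print(fig2.empty_slots())  # an empty list means every element is described
```

Checking for empty slots is one simple way to make the writing style less dependent on the individual writer, which is the problem the syntax rule is meant to address.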
The p-scenarios were written to cover all the keywords of the prescriptive model in Fig. 1. When the p-scenario of Fig. 2 is connected with the prescriptive model of Fig. 1, the accident factors (keywords of the prescriptive model) assumed by the p-scenario are “invisible (= an invisible oncoming car)”, “surrounding vehicle (= a preceding vehicle)”, and “road geometry (= an intersection)”.
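The requirement that the p-scenarios cover all keywords of the prescriptive model can be checked mechanically once each p-scenario is tagged with its keywords. The following is a minimal sketch under that assumption; the function name and the scenario identifier are hypothetical, and only a handful of keywords mentioned in the text are used rather than the full model.

```python
from typing import Dict, Set


def uncovered_keywords(model_keywords: Set[str],
                       scenario_keywords: Dict[str, Set[str]]) -> Set[str]:
    """Keywords of the prescriptive model that no p-scenario refers to."""
    covered: Set[str] = set()
    for keywords in scenario_keywords.values():
        covered |= keywords
    return model_keywords - covered


# A few keywords named in the text (the real model is larger).
model = {"invisible", "surrounding vehicles", "road geometry", "poor maintenance"}

# Tags for the Fig. 2 p-scenario, as given in the text.
tagged = {"fig2": {"invisible", "surrounding vehicles", "road geometry"}}

print(uncovered_keywords(model, tagged))  # keywords still needing a p-scenario
```

With only the Fig. 2 scenario tagged, "poor maintenance" remains uncovered, which signals that a further p-scenario would be needed for that keyword.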
3 An Application Study of Modified Micro-Scenario Method
Fig. 3. The traffic situation of Fig. 2
An application study of the modified MSM for the Information Support System for Safety for motorcycles is described below. The modified MSM is characterized by “the prescriptive model” and “the syntax rule”.
3.1 Participants

The participants in the interview research were 20 working people aged from their 20s to their 50s. Table 2 shows the detailed attributes and the number of participants. The general riders were recruited, for payment, through a research company. The instructors were motorcycle riding instructors. The other selection criteria were (1) riding a motorcycle more than twice a month and (2) having experience of riding on a highway. The attributes were spread as widely as possible in order to hear the voices of a variety of riders. The participants in the questionnaire research were the same 20 working people as in the interview research. However, 4 of them could not take part in the questionnaire research, so the questionnaire data were analyzed for 16 participants.

Table 2. The attributes of participants
3.2 Procedure

In the interview research, each participant answered our questions (“What kinds of information do you want under what circumstances?” etc.) and explained the context of the problem situation. Additionally, each participant was shown 15 typical traffic scenes that can lead to an accident (e.g., “crossing collision”, “right turn”, and “bump from behind”), taken from safety education teaching materials for motorcycles, and reported the requirements for motorcycle informatization for safety in each scene; together the scenes covered the major traffic situations. At the end of each report, the participant was also shown the prescriptive model for the scene, which showed the envisioned accident factors. Then, after a general accident cause for the scene had been explained, each participant was asked to report more detailed requirements. Each interview took about two hours. The audio of the interviews was recorded.

In the questionnaire research, each participant rated each p-scenario derived from the interview research on a five-point scale for “Level of importance (How important is the problem to be solved for motorcycle safety?)”, “Degree of risk (How dangerous is it due to the absence of a means to solve it?)”, and “Frequency (How often does one encounter a case requiring the means?)”. The p-scenarios were written according to the syntax rule.

3.3 Result

The prescriptive model is used for two purposes. The first is to cover the broad aspects of the accident factors and to facilitate understanding of the accident causes among development actors. The participants could easily share the overall picture of the accident causes with the interviewers and could report requirements without omission. As a result of the interviews, 66 p-scenarios were written for the various traffic situations, covering the 39 keywords of the prescriptive model. Table 3 shows the number of p-scenarios for each traffic situation derived from the interviews and the literature.

Table 3. The number of p-scenarios for each traffic situation
The second is to analyze the accident causes by connecting the prescriptive model with the p-scenarios (39 keywords x 66 p-scenarios). The important accident factors can be made clear by analyzing the ten highest-scored p-scenarios (see Fig. 3-1 and Fig. 3-2). Table 4 shows the result of the accident factor analysis for the checked factors (checked = 1). The result indicates that “surrounding vehicles”, “invisible”, and “road geometry” were particularly important factors.
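The factor analysis described here amounts to counting, over the highest-scored p-scenarios, how often each keyword is checked. A minimal sketch of that tally follows; the function name is hypothetical and the three toy scenarios are invented purely to show the mechanics, so they do not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def factor_counts(checked: Dict[str, Set[str]],
                  top_ids: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how many of the selected p-scenarios check each keyword.

    `checked` maps a p-scenario id to the keywords marked 1 for it.
    The result is sorted by count (descending), then alphabetically.
    """
    counts: Counter = Counter()
    for scenario_id in top_ids:
        counts.update(checked[scenario_id])
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data for illustration only (not the study's matrix).
toy_checked = {
    "s1": {"invisible", "road geometry"},
    "s2": {"invisible", "surrounding vehicles"},
    "s3": {"poor maintenance"},
}
for keyword, count in factor_counts(toy_checked, ["s1", "s2"]):
    print(keyword, count)
```

Keywords that appear at the top of such a tally correspond to the "particularly important factors" reported in Table 4.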
Table 4. The result of the accident factor analysis
Fig. 3-1. High-scored p-scenarios
*1 The score of the supporting data is the average of the questionnaire ratings for “Level of importance (How important is the problem for motorcycle safety?)”, “Degree of risk (How dangerous is it due to the absence of a means to solve it?)”, and “Frequency (How often does one encounter a case requiring the means?)” for the ten best p-scenarios.
*2 The order of the ten high-scored p-scenarios is defined by the overall score (overall score = “Level of importance” x “Degree of risk” x “Frequency”).

Fig. 3-2. High-scored p-scenarios
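The ranking rule in note *2 can be sketched in a few lines. The sketch assumes that each of the three ratings is first averaged over participants (as note *1 suggests) and that the three averages are then multiplied; the source does not state the order of averaging and multiplying, so this is an assumption, and the function names and ratings below are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Ratings for one p-scenario: criterion name -> five-point ratings per participant.
Ratings = Dict[str, List[int]]


def overall_score(ratings: Ratings) -> float:
    """Overall score = mean importance x mean risk x mean frequency."""
    return (mean(ratings["importance"])
            * mean(ratings["risk"])
            * mean(ratings["frequency"]))


def top_scenarios(all_ratings: Dict[str, Ratings],
                  n: int = 10) -> List[Tuple[str, float]]:
    """Return the n p-scenarios with the highest overall score."""
    scored = [(sid, overall_score(r)) for sid, r in all_ratings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Invented ratings from two participants, for illustration only.
toy_ratings = {
    "a": {"importance": [5, 5], "risk": [4, 4], "frequency": [3, 3]},
    "b": {"importance": [2, 2], "risk": [2, 2], "frequency": [2, 2]},
}
print(top_scenarios(toy_ratings, n=1))
```

Because the three criteria are multiplied, a p-scenario that is rated low on any one of them drops sharply in the ranking, even if the other two ratings are high.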
The syntax rule is used to describe the p-scenarios systematically. Using the syntax rule, the writer produced 66 p-scenarios from the text data of the interviews and the literature. Without the syntax rule, the writer could not have written the p-scenarios efficiently and would have spent much time in vain. In fact, the writer reported that having the syntax rule made the p-scenarios easy to write.
4 Summary

The purpose of the present study is to propose a modified MSM that improves the analytical method for the p-scenario. Specifically, the modified MSM is applied to the Information Support System for Safety for motorcycles, and the application example is presented. The modified MSM offers two distinctive improvements: “the prescriptive model” and “the syntax rule”. The application indicates that (1) the prescriptive model helps development actors share wide-ranging information about the accident causes in a structured way, (2) the prescriptive model helps usability engineers analyze the accident causes in detail, and (3) the syntax rule helps scenario writers write p-scenarios easily. The scenario method has been considered effective in the early development stage of HCD [3]. MSM is a method of analysis that applies the scenario technique to qualitative data such as interview data, and its methodological framework is becoming clear. However, few examples of its application to real development have been reported. The present study is a case study of MSM, and the modified MSM is an effort to apply MSM to practical development at YMC. The coverage and quantitative evaluation of the modified MSM remain future work, and the method will be improved by taking in more voices from the real development field.
References

1. Kawakita, J.: Hassouhou. Chuko Shinsho, Tokyo (1967) [in Japanese]
2. Kurosu, M.: Micro Scenario Method. Research Reports on National Institute of Multimedia Education 17 (2006) [in Japanese]
3. Carroll, J.M.: Five reasons for scenario-based design. In: Proceedings of the 32nd Hawaii International Conference on System Sciences (Maui, HI, January 4-8), pp. 4–8 [published as CD-ROM]. IEEE Computer Society Press, Los Alamitos, CA (1999)
4. Carroll, J.M.: Scenario-Based Design of Human-Computer Interactions. MIT Press, Boston, MA (2000)
Incorporating User Centered Requirement Engineering into Agile Software Development

Markus Düchting1, Dirk Zimmermann2, and Karsten Nebe1

1 University of Paderborn C-LAB, Cooperative Computing & Communication Laboratory, Fürstenallee 11, 33102 Paderborn, Germany
2 T-Mobile Germany, Landgrabenweg 151, 53227 Bonn, Germany
{markus.duechting, karsten.nebe}@c-lab.de, [email protected]
Abstract. Agile Software Engineering approaches are gaining more and more popularity in today’s development organizations. The need for usable products is also a growing factor for organizations. Their development processes therefore have to react to this demand and offer ways to integrate the factor “usability”. The approach presented in this paper evaluates how agile software engineering models consider Usability Engineering activities to ensure the creation of usable software products. The user-centeredness of the two agile SE models Scrum and XP has been analyzed, and the question of how potential gaps can be filled without losing the process’ agility is discussed. Requirements play a decisive role during software development, in Software Engineering as well as Usability Engineering. Therefore, different User Centered Requirements that ensure the development of usable systems served as the basis for the gap analysis.

Keywords: Agile Software Engineering, Usability Engineering, User-Centered Requirements.
Incorporating User Centered Requirement Engineering
the spiral consists of four major activities and ends with a progress assessment, followed by a planning phase for the next process iteration. Additionally, a risk assessment is performed after each iteration. The iterative approach makes it possible to react adequately to changing requirements. This makes the process of developing software more manageable and minimizes the risk of failure, in contrast to the sequential SE Model.
2 Agile Software Engineering

A recently emerging trend in SE focuses on lightweight, so-called agile models, which follow a different approach to software development. Agile models follow the idea of Iterative and Incremental Development (IID), similar to the Spiral Model mentioned above. But in contrast to Boehm’s model, the iterations in agile models are shorter. The iterations in the Scrum Model, for instance, take 30 calendar days. Agile software development does not rely on comprehensive documentation and monolithic analysis activities; instead, agile approaches are more delivery- and code-quality-oriented. Through co-location of the development team, the tacit knowledge among the team members compensates for extensive documentation efforts. Agile models emphasize communication and aspire to early and frequent feedback through testing, on-site customers and continuous reviews. The basic motivation behind agile and iterative development is to acknowledge that software development is similar to creating new and inventive products [8]. New product development requires room for research and creativity. It is rarely possible to gather all requirements of a complex software system upfront and to identify, define and schedule all detailed activities. Many details emerge later during the development process. This is a known problem within the domain of SE and the reason for many failed projects [8]. For this reason, agile models implement mechanisms to deal with changing requirements and other unforeseen incidents when planning, monitoring and managing SE activities.

2.1 Scrum

Scrum is an agile and iterative-incremental SE model. Its development tasks are organized in short iterations, called Sprints. Each Sprint starts with a Sprint Planning meeting where stakeholders decide the functionality to be developed in the following Sprint. All requirements for a software system are collected in the Product Backlog. The Product Backlog is a prioritized list and serves as a repository for all requirements related to the product. However, the Product Backlog is never a finalized document but rather evolves along with the product. At the beginning of a project the Product Backlog contains only high-level requirements, and it becomes more and more precise during the Sprints. Each Backlog item has a priority assigned to represent its business value, and an effort estimate to plan the resources required to implement it. During the Sprint Planning, the Scrum Team picks high-priority backlog items that they think are realistic for the next Sprint. The Scrum Teams are small interdisciplinary groups of 7 to 9 people [12], which are self-organized and have full authority to determine the best way for reaching the
M. Düchting, D. Zimmermann, and K. Nebe
Sprint Goals. There are no explicit roles defined within the Scrum Team. Scrum places emphasis on emergent behavior of the team, meaning that the teams develop their mode of cooperation autonomously. This self-organizing aspect supports creativity and high productivity [12]. The Scrum Team and its manager, the Scrum Master, meet in a short daily meeting, called the Daily Scrum, to report progress, impediments and further proceedings. Every Sprint ends with a Sprint Review meeting, where the current product increment is demonstrated to project stakeholders.

2.2 Extreme Programming

Extreme Programming (XP) [1] is one of the established agile SE methodologies. Similar to Scrum, XP is an iterative-incremental development model. However, XP’s iterations are even shorter than Scrum’s. According to Beck, the optimal iteration length is somewhere between 1 and 3 weeks. XP adopts reliable SE techniques to a very high degree. Continuous reviewing is assured by pair programming, where two developers sit together at one workstation. XP also applies the common code ownership principle: all team members are allowed to make changes to code written by someone else when necessary. In addition, XP requires a user stakeholder to be on-site as a means of gathering early user feedback. The requirements in XP are defined by the customer in so-called User Stories. Each story is a brief, informal specification of requirements. Similar to the items in Scrum’s Product and Sprint Backlogs, each User Story has a priority and an effort estimate assigned to it. Before a new iteration starts, the User Stories are decomposed into more granular technical work packages. The literature on XP does not mention an explicit design phase, but strongly emphasizes continuous refactoring and modeling. The functionality described in the User Stories is converted into test cases. The simplest concept that passes the tests is implemented. Development is finished when all tests pass.
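Both Scrum's Backlog items and XP's User Stories carry a priority and an effort estimate, and both are selected for the next iteration on that basis. The sketch below illustrates this with a simple greedy rule. It is only a stand-in: in practice the team chooses items by judgment, and the class name, the numeric scales, and the capacity figure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacklogItem:
    title: str
    priority: int  # higher means more business value (assumed scale)
    effort: int    # estimated effort, e.g. in person-days (assumed unit)


def plan_iteration(backlog: List[BacklogItem], capacity: int) -> List[BacklogItem]:
    """Pick items in priority order as long as they fit the remaining capacity."""
    selected: List[BacklogItem] = []
    remaining = capacity
    for item in sorted(backlog, key=lambda i: i.priority, reverse=True):
        if item.effort <= remaining:
            selected.append(item)
            remaining -= item.effort
    return selected


# Hypothetical backlog for illustration.
backlog = [
    BacklogItem("search by customer name", priority=3, effort=5),
    BacklogItem("export report as PDF", priority=2, effort=4),
    BacklogItem("remember last filter", priority=1, effort=2),
]
for item in plan_iteration(backlog, capacity=7):
    print(item.title)
```

With a capacity of 7, the highest-priority item (effort 5) is taken first, the second no longer fits, and the third (effort 2) fills the remaining capacity.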
3 User Centered Design

A recent trend shows that usability criteria are becoming a sales argument for products and that awareness of the need for usable systems is growing. But many software development projects are mainly driven by the SE model that is used. Usability Engineering (UE) provides a wide range of methods and systematic approaches to support user-centered development. These approaches are called Usability Engineering Models (UE Models), e.g. the Usability Engineering Lifecycle [9] or Goal-Directed Design [4]. Mayhew’s UE process consists of three phases, which are processed sequentially. The first phase is the Requirement Analysis, followed by the Design/Testing/Development phase and the Installation of the product. The process is iterative: concepts, preliminary designs and detailed designs are evaluated until all problems are identified and resolved. Cooper’s Goal-Directed Design process also passes through several phases. During the Research Phase, qualitative research leads to a picture of how users work in their daily work environment. During the Modeling Phase, Domain Models and User Models (so-called Personas) are developed, which are then translated into a Framework for the design solutions, which is detailed in the Refinement Phase. These two models have much in common, since they describe an idealized approach to ensuring the usability of a software system, but they usually differ in the details. UE Models usually define an order of activities and their resulting deliverables. UE activities often happen concurrently with the other software development activities, so there is an obvious need to integrate the two approaches in order to make the budgets, resources and timelines of the UE activities predictable within software development.
4 Motivation

According to Ferre [7], the basic conditions for integrating SE and UE are an iterative approach and active user involvement. The two agile SE models outlined above are iterative-incremental approaches that rely on solid customer involvement. They even speak of user representatives as a special kind of customer stakeholder. The involved customer should at least have a solid knowledge of the users’ domain and their needs. This raises the question of whether and how usability is ensured in an agile software development process, and whether UE activities can be performed in a satisfactory way. This paper discusses the user-centeredness of two agile SE Models and the question of how potential gaps can be filled without losing the process’ agility. The UCD Models described above have something in common with the traditional SE Models: both are strongly driven by phases and the resulting deliverables. However, documentation plays a minor part in agile models. Because of their incremental approach and overlapping development phases, agile SE Models have no distinct phases such as Analysis, Design, Development and Validation. Without fixed deliverables or activities, other criteria are needed to assess the user-centeredness of agile SE Models. Requirements play a decisive role during the software development lifecycle, in both the SE and the UE domain. SE is mainly concerned with system requirements, while UE takes the users’ needs into account. Requirements are measurable criteria, and their elicitation, implementation and validation take place in most approaches to software development. Defining granular requirements makes it possible to look at activities independently of the larger modules, which lends itself well to the agile approach of developing smaller increments that ultimately add up to the final system rather than preparing a big design up front.
In order to develop recommendations for the integration, the authors analyze Scrum and XP to see how they can adopt UCD activities, specifically how they can utilize UCD requirements. The Requirement Framework introduced in the following section offers a way to approach this.
5 User Centered Requirements Based on a generalized UCD model defined in DIN EN ISO 13407 [5], Zimmermann & Grötzbach [13] describe a Requirement Engineering framework where three types of requirements are generated, each of which constitutes the analysis and design outcome for one of the three UCD activity types. Usability Requirements are
M. Düchting, D. Zimmermann, and K. Nebe
developed during the Context of Use analyses, which revolve mainly around the anticipated users, their jobs and tasks, their mental models, their conceptions of how the system is used, the physical environment, organizational constraints and determinants, and the like. It is important to elicit these findings from actual users in their context of use, in order to get a reliable baseline for requirements pertaining to users' effectiveness, efficiency and satisfaction. These requirements can be used as evaluation criteria for the system and intermediate prototypes through usability tests, questionnaires, or expert-based evaluations.

The Workflow Requirements focus on individual workflows and tasks to be performed by a user. Task performance models are elicited from users, the workflow is optimized, and an improved task performance model is generated. The outcome of this module is a set of requirements pertaining to a specific user's interaction with the system in the context of a specific workflow or task, e.g. as described in use case scenarios. The requirements describe the discrete sub-steps of a user's interaction flow and the expected behavior of the system for each of these steps in an optimized workflow. It is important to validate these requirements against the Usability Requirements with users, e.g. by comparing an optimized workflow to the current state of workflow performance with regard to effectiveness, efficiency and user satisfaction. Workflow Requirements are ideal input for test cases against which prototypes or the final system can be tested, either through usability tests or expert evaluations.

The User Interface (UI) Requirements, generated in the Produce Design Solution activities, define properties of the intended system that are derived from Usability or User Requirements, e.g. interaction flow or screen layout.
During the development phase, the UI Requirements provide guidance for technical designers regarding the information and navigation model, which can then be aligned with other technical models. They also help programmers implement the required functions using the correct visual and interaction model. UI requirements serve as criteria for the actual system that has been developed, i.e. to determine if it follows the defined model for layout and interaction. These evaluations can be user or expert based, and can be conducted during system design and testing. By translating UI Requirements into test cases, this evaluation step is facilitated.
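The remark that translating UI Requirements into test cases facilitates evaluation can be illustrated with a minimal sketch. It assumes a UI Requirement can be paired with an executable check over a simplified screen description; the names `UIRequirement`, `evaluate`, the requirement identifiers and the example screen are hypothetical and are not taken from the framework in [13].

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical, simplified description of one screen of the system under test.
Screen = dict


@dataclass(frozen=True)
class UIRequirement:
    """One UI Requirement together with a check that turns it into a test case."""
    identifier: str
    description: str
    check: Callable[[Screen], bool]


def evaluate(screen: Screen, requirements: list) -> list:
    """Return the identifiers of all requirements that the screen violates."""
    return [r.identifier for r in requirements if not r.check(screen)]


# Illustrative requirements, invented for this sketch only.
REQUIREMENTS = [
    UIRequirement(
        "UI-1",
        "The primary action is labelled 'Save'",
        lambda s: s.get("primary_action") == "Save",
    ),
    UIRequirement(
        "UI-2",
        "The navigation offers at most seven top-level entries",
        lambda s: len(s.get("navigation", [])) <= 7,
    ),
]

example_screen = {"primary_action": "Submit", "navigation": ["Home", "Help"]}
failed = evaluate(example_screen, REQUIREMENTS)
print(failed)  # ['UI-1']
```

Checks of this kind only cover layout and interaction properties that can be stated as predicates; user-based or expert-based evaluations remain necessary for everything else.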
6 Proceedings

The authors used the User Centered Requirements summarized above as the basis for a gap analysis to determine whether the two agile SE models (Scrum and XP) consider the three types of requirements adequately. As each requirement passes through distinct stages, it has to be elicited, implemented and evaluated appropriately. Fulfilling the requirements is taken to ensure user-centeredness in the development process. To prepare the gap analysis, the authors derived several assessment criteria from the descriptions of the different requirement types. The goal was to specify criteria that apply to both models. There is no 1:1 relation between the stages (elicitation, implementation, evaluation) and the criteria derived for the different types of requirements. Thus, there might be no criteria at a specific stage for a specific type of requirement, as the framework suggests. As an example, selected criteria for the UI Requirements are shown in Table 1.
Incorporating User Centered Requirement Engineering
Table 1. Selection of criteria defined for the UI Requirements, based on the definition in Section 5

Stage | Criterion
Elicitation | develop appropriate representation of workflow by UI designer
Elicitation | specify interaction and behavioral detail
Implementation | verify feasibility
Implementation | transform architecture into design solutions
Evaluation | evaluate if UI meets UI requirements
Evaluation | methods to measure improvements in effectiveness and efficiency
Evaluation | verify requirements and refine existing requirements
Using these criteria, the two agile models Scrum and XP were analyzed to determine whether each criterion is met. This allows comprehensive statements about how UE activities and outcomes are considered in agile SE. The results are presented by type of requirement and in the order of the three stages, and are based on the model descriptions in the sources cited above. Following the analysis, the authors give recommendations that enhance the consideration of the three requirement types in Scrum and XP.

6.1 Implementation of User Centered Requirements

The results of the analysis for the Usability Requirements (Table 2) show that neither Scrum nor XP considers this type of requirement appropriately. During the elicitation of Usability Requirements only one criterion, the consideration of stakeholder input, is partly fulfilled by both models. The insufficient attention to overarching Usability Requirements is also evident in the evaluation activities: just one criterion is met to some extent, and only by the Scrum model.

Table 2. Selection of criteria defined for the Usability Requirements. + fulfilled; - not fulfilled; o partly fulfilled.

Stage | Criterion | Scrum | XP
Elicitation | observe user in context of use | - | -
Elicitation | consider workflow-oriented quality criteria | - | -
Elicitation | measurable, verifiable and precise usability requirements | - | -
Elicitation | gather and consider stakeholders input | o | o
Evaluation | verify if requirements are met | - | -
Evaluation | measure end user's satisfaction | - | -
Evaluation | check requirements and refine existing requirements | o | -
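The ratings in Table 2 can be tallied mechanically. The following is a minimal sketch of such a tally, not the authors' procedure: it transcribes the Table 2 ratings (with an ASCII "-" for "not fulfilled") and counts them per model. The names `TABLE_2`, `tally` and `gaps` are hypothetical.

```python
from collections import Counter

# Ratings transcribed from Table 2: "+" fulfilled, "o" partly fulfilled,
# "-" not fulfilled. Each row: (stage, criterion, Scrum rating, XP rating).
TABLE_2 = [
    ("Elicitation", "observe user in context of use", "-", "-"),
    ("Elicitation", "consider workflow-oriented quality criteria", "-", "-"),
    ("Elicitation", "measurable, verifiable and precise usability requirements", "-", "-"),
    ("Elicitation", "gather and consider stakeholders input", "o", "o"),
    ("Evaluation", "verify if requirements are met", "-", "-"),
    ("Evaluation", "measure end user's satisfaction", "-", "-"),
    ("Evaluation", "check requirements and refine existing requirements", "o", "-"),
]

SCRUM, XP = 2, 3  # column indices of the two models in each row


def tally(rows, model):
    """Count how often each rating occurs for one model."""
    return Counter(row[model] for row in rows)


def gaps(rows, model):
    """List the criteria that one model does not fulfil at all."""
    return [row[1] for row in rows if row[model] == "-"]


print(tally(TABLE_2, SCRUM))  # Counter({'-': 5, 'o': 2})
print(tally(TABLE_2, XP))     # Counter({'-': 6, 'o': 1})
print(gaps(TABLE_2, XP))
```

No criterion in Table 2 is rated "+", which is the basis for the statement that neither model considers Usability Requirements appropriately.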
During the elicitation of Workflow Requirements hardly any of the criteria could be found in Scrum or in XP; the only exception is that the XP model partly fulfills the criterion of verifying whether the new workflow is an improvement from the user's perspective. However, the agile models possess solid strengths in the evaluation of user requirements. The only criterion met by neither model is the verification of workflow mockups against the improved workflow. The impact of these unconsidered criteria on usability is negligible.
Table 3. Selection of criteria defined for the Workflow Requirements

Stage | Criterion | Scrum | XP
Elicitation | specify system behavior for given task, related to concrete goal | - | -
Elicitation | check if new workflow is an improvement from user's perspective | - | o
Evaluation | check correctness and completeness of workflow description | + | +
Evaluation | check workflow mockups for correctness, completeness and possibly find new requirements | - | -
Evaluation | verify requirements and refine existing requirements | + | +
Evaluation | verify that final system meets requirements | o | +
Table 4. Selection of criteria defined for the User Interface Requirements

Stage | Criterion | Scrum | XP
Elicitation | develop appropriate representation of workflow by UI designer | - | -
Elicitation | specify interaction and behavioral detail | - | -
Implementation | verify feasibility | + | +
Implementation | transform architecture into design solutions | + | +
Evaluation | evaluate if UI meets UI requirements | o | -
Evaluation | concluding evaluation to see if system meets requirements | - | +
Evaluation | methods to measure improvements in effectiveness and efficiency | - | -
Evaluation | verify requirements and refine existing requirements | + | -
The elicitation of User Interface Requirements is not provided for by either of the two models. For the implementation criteria, however, both models provide an opportunity to verify the feasibility of certain interaction concepts and to consider technical constraints in design decisions before the UI concepts are implemented. In the evaluation of UI Requirements the two models differ in several respects. The Scrum model provides a way to verify UI Requirements with users and experts, whereas the literature describes no comparable activity for the XP model. Conversely, XP performs concluding evaluations, within the scope of automated tests, to see if the system meets the UI Requirements. Neither model considers measuring improvements in the user's effectiveness and efficiency.

6.2 Conclusion and Recommendations

Looking at the summarized results, it becomes apparent that both agile models have significant deficiencies in handling User-Centered Requirements. Usability Requirements are treated insufficiently in the important stages of development. With regard to more detailed requirements, the agile models possess certain strengths and the potential for integration with UE activities.
Workflow Requirements, for instance, are dealt with appropriately in evaluative activities. But from a UE standpoint it needs to be assured that they are elicited and processed adequately during the preceding stages. Development can be substantially influenced at the granular level of UI Requirements. However, the UI Requirements have to be derived from correct workflow descriptions and qualitative Usability Requirements. The recommendations listed below suggest how to enhance the two models so that they include the criteria of User-Centered Requirements and ensure the usability of a software product. The recommendations are derived from the results of the analysis described above.

The descriptions of both models mention an explicit exploration phase prior to the actual development. The development teams work out system architecture and technology topics to evaluate technical feasibility, while customer stakeholders generate Product Backlog Items (in Scrum) or User Stories (in XP). Compared to common UE analysis activities, the exploration phases in Scrum and XP are rather short and are not supposed to exceed one usual process iteration. Nevertheless, these exploration phases can be used by UE experts to support the development teams in a rough exploration of the real users in their natural work environment. In order to stay agile, it is important not to insist on comprehensive documentation of the results, but rather to emphasize lightweight artifacts and to share the knowledge with the rest of the team. Having a UE domain expert in the development team also ensures that generic Usability Requirements are taken into account during requirement-gathering activities.

Due to the vague definition of the customer role in Scrum and XP, it is not guaranteed that real users are among the group of customer stakeholders. From a UE point of view, it is essential to gather information regarding the context of use and the users' workflows, and to validate early design mockups and prototypes with real users. Therefore, it is necessary to explicitly involve users on-site for certain UE activities instead of other customer stakeholders, even when the latter claim to have solid knowledge of the end users' needs.

The Product Backlog (in Scrum) and the User Stories (in XP) would be the right place to capture Workflow Requirements. However, there is a risk of losing the "big picture" of how single system features relate to each other, because both artifacts focus on documenting high-level requirements instead of full workflows. Modeling the workflow with Essential Use Cases and scenario-based descriptions [10] would be sufficient, but is not intended by either of the two models.

Neither Scrum nor XP intends to perform usability tests to verify whether the requirements are met, nor do they measure the users' satisfaction, e.g. using questionnaires. However, the Sprint Review in Scrum offers an opportunity for expert evaluations involving people with UE expertise and/or real users, either as members of the Scrum Team or as attendees of the Sprint Review. This cannot substitute for comprehensive usability evaluations, but it helps to avoid user problems at an early stage.

System testing in terms of usability is a problem in agile models because the solutions are specified, conventionalized and developed in small incremental steps. To perform a usability test with real users, however, the system has to have reached a certain level of complexity so that the implementation of their workflows can be evaluated. In traditional SE models, including those using incremental development, these workflows and the related requirements are documented beforehand, and a prototype could be developed for
such a set of requirements for one workflow to be tested with the users. It certainly does not make sense to demand usability testing after each process iteration, but the tests could be tied to a release plan.

Agile models provide good opportunities for close collaboration between developers and designers during development activities. Due to the overlapping development phases and the multidisciplinarity of the development teams, the feasibility of certain interaction models can be clarified with developers frequently and without fundamentally slowing down design and implementation activities. As a result, design decisions can take Usability Requirements and technical constraints into account easily and at an early stage.

In the evaluation of UI Requirements the two models differ in their procedures. The Sprint Review in Scrum can be used to review the user interface in order to verify whether the design meets the previously defined specifications, presuming that those specifications have been created and defined as Sprint Goals beforehand. XP does not stipulate a review meeting like the one in the Scrum model. Unlike Scrum, the XP model explicitly demands constant, frequent testing. Certain subsets of UI Requirements are suited to automated tests, e.g. interaction- or behavior-related requirements. But it is barely possible to test in this way whether a style guide has been implemented accurately.
7 Summary and Outlook

The underlying criteria for the assessment do not claim to be exhaustive. Nevertheless, they indicate the right tendencies and allow statements to be made about how the requirements are realized in the particular models. The approach presented in this paper is used to evaluate how agile software engineering (SE) models consider activities of usability engineering (UE) in order to ensure the creation of usable software products. The user-centeredness of the two agile SE models Scrum and XP was analyzed, and the question of how potential gaps can be filled without losing process agility was discussed. As requirements play a decisive role during software development, in software engineering as well as in usability engineering, the authors assumed that requirements can serve as the common basis on which agile SE models can work together with the results of usability engineering activities. The User Centered Requirements defined by Zimmermann and Grötzbach describe three types of requirements derived from the results of the UCD activities outlined in DIN EN ISO 13407 [5]. From these three types of requirements the authors derived more specific criteria in order to perform a gap analysis of the two agile models. The fulfillment of the criteria allowed comprehensive statements about how UE activities and outcomes are considered in agile SE. It turned out that both agile models have significant deficiencies in handling User-Centered Requirements. Usability Requirements are treated insufficiently in all the important stages of development. The presented approach has been used to gain first insights into the ability of agile SE models to create usable software. However, the authors are well aware of the need for further, more extensive and more specific criteria. Applying such criteria to other agile models will make it possible to derive more generic statements about the integration of UE in agile SE models in general.
References

1. Beck, K.: Extreme Programming explained. Addison-Wesley, Boston (2000)
2. Boehm, B.: A Spiral Model of Software Development and Enhancement. IEEE Computer 21, 61–72 (1988)
3. Cohn, M.: User Stories Applied – For Agile Software Development. Addison-Wesley, Boston (2004)
4. Cooper, A.: About Face 2.0. Wiley Publishing Inc, Indianapolis, Chichester (2003)
5. DIN EN ISO 13407: Human-centered design processes for interactive systems. Brussels, CEN - European Committee for Standardization (1999)
6. DIN EN ISO 9241-11: Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability. International Organization for Standardization (1998)
7. Ferre, X.: Integration of Usability Techniques into the Software Development Process. In: Proceedings of the 2003 International Conference on Software Engineering, pp. 28–35, Portland (2003)
8. Larman, C.: Agile & Iterative Development – A Manager's Guide. Addison-Wesley, Boston (2004)
9. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (1999)
10. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. Academic Press, London (2002)
11. Royce, W.: Managing the Development of Large Software Systems. In: Proceedings of IEEE WESCON, vol. 26, pp. 1–9 (August 1970)
12. Schwaber, K., Beedle, M.: Agile Software Development with Scrum. Prentice Hall, Upper Saddle River (2002)
13. Zimmermann, D., Groetzbach, L.: A Requirement Engineering Approach to User Centered Design. In: HCII 2007, Beijing (2007)
How a Human-Centered Approach Impacts Software Development

Xavier Ferre and Nelson Medinilla
Universidad Politecnica de Madrid, Campus de Montegancedo, 28660 - Boadilla del Monte (Madrid), Spain
{xavier, nelson}@fi.upm.es
Abstract. Usability has become a critical quality factor in software systems, and it requires the adoption of a human-centered approach to software development. Including humans and their social context among the issues to consider throughout development deeply influences software development at large. Waterfall approaches are not feasible, since they are based on eliminating uncertainty from software development. On the contrary, the uncertainty of dealing with human beings, and with their social or work context, makes it necessary to introduce uncertainty-based approaches into software development. HCI (Human-Computer Interaction) has a long tradition of dealing with such uncertainty during development, but most current software development practices in industry are not rooted in a human-centered approach. This paper reviews the current roots of software development practices, illustrating how their limitations in dealing with uncertainty may be tackled by adopting well-known HCI practices.

Keywords: uncertainty, software engineering, waterfall, iterative, Human-Computer Interaction-Software Engineering integration.
leading to a need for integration of usability methods into SE practices, giving them the necessary human-centered flavor. The term "Human-Centered Software Engineering" has been coined [25] to convey this idea. HCI practitioners, for their part, need to show upper management how their practices provide value to the company in the software development endeavor, in order to gain a stronger position in the decision-making process. HCI and SE need to understand each other so that the two can complement each other effectively. While SE may offer HCI practitioners participation in decision-making, HCI may offer its proven practices, which help in dealing with the uncertainty present in most software development projects.

In the next section the diverging approaches of HCI and SE are analyzed. Section 3 then outlines the role of uncertainty in software development, elaborating on problem-solving strategies and how they apply to software development. Section 4 presents how joint HCI-SE strategies may be adopted for projects where uncertainty is present. Finally, section 5 presents the conclusions gathered.
2 HCI and SE Development Approaches

SE is defined as the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software [13]. In pursuit of these objectives, SE has highlighted software process issues, and it has traditionally focused on dealing with descriptive complexity. HCI, on the other hand, is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use in a social context, and with the study of major phenomena surrounding them [22]. Usability is the main concern of HCI, and the discipline is multidisciplinary in essence. The HCI view of software development is, in a certain sense, broader than the SE one, which mostly focuses on the running system in isolation. In contrast, HCI does not handle specific issues such as software process or software architecture in comparable depth.

Fig. 1 shows how SE and HCI differ in their main subject of interest in software development. While HCI cares about the impact created by the software on the user and his social context, SE focuses mainly on the correctness of the running software system itself. Software engineers mostly consider usability a user interface issue, usually dealt with at the end of development, when the 'important' part of the system has already been built. HCI experts, by contrast, carefully study the users and their tasks in order to better fit the system to the intended users, and they consider that software engineers may begin 'building' the system once the system interaction has been defined.

There is a high degree of misunderstanding between the two fields, along with some lack of appreciation for the work performed by the other discipline. Practitioners of both fields think it is they who do the "important job" in software development. Compared to SE, HCI may appear to lack maturity. In this vein, Mayhew states that the integration of usability engineering with the existing software development lifecycle has not yet been solved, mostly due to the state of maturity of the Usability Engineering discipline [20]. Conversely, SE methods may look too system-centered for an effective user-system interaction, as understood in HCI.
Fig. 1. Comparison between HCI and SE main focus
Despite this lack of mutual understanding, both disciplines need to collaborate, since there is a non-trivial overlap between their respective objects of study and practice. In particular, requirements-related activities are considered a cornerstone of the work of both HCI and SE. The decision about which system is going to be built is quite important for usability purposes, so HCI has a lot to say about it. Requirements engineering, meanwhile, is an SE subdiscipline of recognized importance in the field, so software engineers will not hand requirements-related activities over completely to usability experts.

The traditional overall approach to development in SE has been the waterfall lifecycle. With respect to requirements, it relies on requirements being fixed (frozen) at early stages of development. Nowadays, however, the waterfall lifecycle is considered in SE to be valid only for developing software systems of low to medium complexity in domains where the development team has extensive experience. As an alternative to the waterfall, iterative development is currently identified as the development approach of choice, even if its practical application meets some opposition. HCI, by contrast, has traditionally adopted an iterative approach to development. Therefore, some promising opportunities for SE-HCI collaboration emerge. Conflicts may arise between the two kinds of practitioners, but they must be resolved if usability is to be considered a relevant quality attribute in mainstream software development. Fortunately, recent trends in SE show a higher acceptance of uncertainty in software development, and this can lead to a higher appreciation of HCI practices, as explained in the next sections.
3 Uncertainty in Software Development

Uncertainty is currently accepted as a necessary companion of software development [3],[19]. However, SE has traditionally considered uncertainty harmful and eradicable. The aim was to define a "safe" space where no uncertainty could affect the work of software developers. The development of software systems of higher complexity has made it necessary to change this approach. In order to deal with complexity, the traditional SE view considers only descriptive complexity (the quantity of information required to describe the system, according to Klir & Folger [17]). It is a useful dimension for working in the software universe but, on most occasions, it is not enough on its own to explain the software universe. Descriptive complexity needs to be combined with the complexity due to uncertainty, which Klir & Folger define as the quantity of information required to resolve any uncertainty related to the system [17]. Ignoring uncertainty in software development obstructs the objective of coping better with the highly complex problems addressed by software systems, since it narrows the interpretation of both the problem and the possible strategies for building a successful solution. Complexity due to uncertainty adds a new dimension to the software space, as shown in Fig. 2. When the software universe is extended to two dimensions, some hidden issues that hinder software development projects are uncovered, and new solutions emerge.

[Figure: the software universe plotted along two axes, descriptive complexity and complexity due to uncertainty]

Fig. 2. Extension of the software universe when considering the uncertainty dimension
Dealing with uncertainty is unavoidable in software development. But uncertainty is not just an undesired companion on the software development journey; it can be used as a tool that offers a powerful means of attacking recurring problems in software development. Having uncertainty-based means in their toolbox gives software development teams a richer background and vision with which to tackle their work in the complex software universe. The use of uncertainty as a tool in software development takes several forms: the introduction of ambiguity in the solution and the adoption of problem-solving strategies that manage uncertainty.
3.1 Ambiguity as a Way of Introducing Uncertainty in the Solution

Abstraction is a simplification tool that expresses just the essential information about something, leaving out the unnecessary details. This omission deliberately introduces uncertainty, which manifests itself in the form of ambiguity. An abstraction is precise with respect to the essence of the topic conveyed, but it is necessarily ambiguous with respect to the particulars, which are intentionally left out of the picture. When making design decisions, uncertainty also plays a major role in providing solutions which are easier to maintain, modify or extend. For example, the information hiding principle [21] promotes the introduction of uncertainty in the design by not providing details on how a particular module is implemented. Modularization on its own does not provide benefits for this purpose, since careful design of the modules and their headers is necessary to attain the required relation of indifference between modules. Any design decision that attempts to introduce some degree of ambiguity into the solution being developed uses uncertainty as a tool for allowing easier future modifications. As a side effect, development usually gets more complex and more difficult to manage when employing uncertainty-based strategies, in the same way that object-oriented design is more complex than the structured development paradigm but provides a more powerful and less constrained instrument for the development of complex software systems.

3.2 Problem-Solving Strategies and Uncertainty

Human beings use different strategies according to the extent of the uncertainty they must confront. A linear or industrial strategy may be employed with zero or negligible uncertainty; a cyclical or experimental strategy with medium uncertainty (something is known); and an exploratory or trial-and-error strategy when high uncertainty needs to be dealt with.
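The mapping from uncertainty level to strategy stated above can be written down as a small lookup. This is only an illustrative sketch: the three-level scale follows the text, while the names `Uncertainty`, `Strategy` and `choose_strategy` are hypothetical.

```python
from enum import Enum


class Uncertainty(Enum):
    """Extent of uncertainty to be confronted, on the three-level scale of the text."""
    NEGLIGIBLE = "zero or negligible"
    MEDIUM = "medium (something is known)"
    HIGH = "high"


class Strategy(Enum):
    """Problem-solving strategies named in the text."""
    LINEAR = "linear or industrial"
    CYCLICAL = "cyclical or experimental"
    EXPLORATORY = "exploratory or trial and error"


# One strategy per uncertainty level, as stated in the paragraph above.
STRATEGY_FOR = {
    Uncertainty.NEGLIGIBLE: Strategy.LINEAR,
    Uncertainty.MEDIUM: Strategy.CYCLICAL,
    Uncertainty.HIGH: Strategy.EXPLORATORY,
}


def choose_strategy(uncertainty: Uncertainty) -> Strategy:
    """Return the strategy the text associates with the given uncertainty level."""
    return STRATEGY_FOR[uncertainty]


for level in Uncertainty:
    print(f"{level.value}: {choose_strategy(level).value}")
```

In practice the uncertainty level is a judgment rather than a measured value, so the lookup only records the correspondence; it does not decide which level applies to a given project.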
The higher the uncertainty level provided by the strategy, the greater its power for dealing with uncertainty (in the problem).

The linear strategy (step after step) follows a path from a starting point to an end point, given that both points and the path between them are known in advance. That is, it is necessary to know the problem, the solution, and the way to reach that solution. If all these requirements are met, the linear strategy is the cheapest one. For it to be applicable, any uncertainty needs to be eradicated before the resolution process begins. The paradigm that represents the linear strategy in software development is the waterfall life cycle. It follows the sequence requirements, analysis, design, implementation, and testing, which is a direct translation of the Cartesian precepts enunciated in the Discourse on Method [8]: evidence, analysis, synthesis and evaluation. The idea behind these principles is to undertake first the what and afterwards the how. This separation between requirements and design is an abstract goal and not a human reality [1]. The so-called incremental strategy is a variant of the linear one in which the problem is divided into pieces, which are then undertaken one by one.

The cyclical or experimental strategy (successive approximations), when it converges, comes progressively closer to an unknown final destination through the periodic refinement of an initial proposition (hypothesis). A cyclical strategy is adopted when
the solution is unknown, but there is enough information on the issue to formulate a hypothesis. The paradigm for the cyclical strategy in the software world is the spiral model [2]. Each cycle in the spiral model is commonly described as a small waterfall. This is inappropriate, since the spiral recognizes the presence of uncertainty throughout the (risk-driven) process, whereas the waterfall, whatever its size, requires eradicating the uncertainty at the beginning.

The arboreal or exploratory strategy (trial and error) is the way to reach an unknown destination without a best first guess, provided that the universe is closed. In an open universe, the exploratory strategy does not guarantee finding the solution, but under the same conditions of uncertainty none of the other strategies can guarantee it either. An exploratory strategy is in place every time a solution is discarded and development goes back to the previous situation. The Chaos life cycle [23] is very close to an exploratory strategy, but it is limited by Raccoon's waterfall mindset.

3.3 Uncertainty and HCI

HCI has developed interactive software for decades without the obsession with eradicating uncertainty that is present in SE. In fact, the HCI literature offers some examples of insight into real software development. Hix & Hartson's [10] observations about the work of software developers show that they usually operate in alternating waves of two complementary types of activities: bottom-up, creative activities (a synthesis mode) and top-down, abstracting ones (an analysis mode). Hix & Hartson also reveal the closeness that exists between analysis and design activity types, especially in requirements-related activities. It is not sensible, then, to try to draw a clear separation between the two activity types.
With regard to the methodologies in place in software development companies (based on a waterfall approach), they report that in some of their empirical studies they noticed that "iterative and alternating development activities occurred, but because corporate standards required it, they reported their work as having been done strictly top-down" [10]. The reality of development was hidden behind the waterfall's mask of order. According to Hakiel, "There is no reason why a design idea might not survive from its original appearance in requirements elicitation, through high- and low-level design and into the final product, but its right to do so must be questioned at every step" [9]. This approach is a radical departure from the waterfall mindset mostly present in SE, which was traditionally presented as the way to develop software in an orderly manner. The multidisciplinary essence of HCI has helped to give the field a less rigid approach to development. As Gould and Lewis [12] say, when a human user is considered (as in the upper part of Fig. 1), a coprocessor of largely unpredictable behavior has been added. Uncertainty is a companion of any attempt to develop interactive systems of non-trivial complexity, since human beings are part of the supra-system we are addressing: the combination of the user and the software system, trying to perform tasks which directly relate to the user's goals. User-centered or human-centered development is the HCI approach to the development process, and it has traditionally introduced uncertainty by describing itself as iterative. In this sense, [5], [10], [16], [22], and [26] agree in considering iterative development a must for a user-centered development process. Therefore,
X. Ferre and N. Medinilla
iterativeness is at the core of HCI practices. This is real iterativeness, in the sense that evaluation is often considered formative: not just an exam for identifying mistakes, but a tool for giving form to the interaction design, and possibly for identifying new requirements.
4 Common HCI-SE Problem-Solving Strategies As presented in the previous section, uncertainty is a tool for problem resolution; in particular, it is a tool for interactive software development. Uncertainty-based approaches have been adopted in the resolution strategies of both HCI and SE, without being labeled as such. When trying to integrate usability and HCI methods into mainstream development, the extensive HCI experience in dealing with uncertainty may be incorporated into SE practices, making them better prepared to cope with the development of complex systems with a high usability level. Non-linear problem-solving strategies present important challenges with respect to estimation and planning, along with the danger of continuously iterating without advancing towards the solution. A certain degree of flexibility, of the kind HCI usually employs, is necessary for dealing with these issues. Accordingly, some degree of uncertainty will have to be introduced in the formal procedures advocated by SE methodologies. 4.1 Iterative Development Iterative-cyclical strategies are currently at the center of debate in SE, with agile and iterative practices (see, for example, [18]). When adopting cyclical strategies of this kind, the introduction of HCI practices may be undertaken with greater success than former proposals for integration into waterfall lifecycles, like [7]. The aim of integrating usability engineering and HCI practices into mainstream software development, which mostly refuses to deal with uncertainty, has led to solutions that are more formal, in an SE sense, but that leave out the uncertainty present, for example, in iterative approaches. One example is Mayhew's Usability Engineering Lifecycle [20], which is based on a two-step process where analysis activities are performed in a first phase, and then design and evaluation activities are performed iteratively in a second phase; but there is no place for resuming analysis activities.
Therefore, it is based on a frozen requirements paradigm, with reminiscences of a waterfall mindset. Nevertheless, iterativeness has been at the heart of usability engineering practices because usability testing has been the central point around which the whole development effort turns. It is necessary to test any best-first-guess design. Observational techniques and sound analysis are performed with the aim of getting a high quality first design, but usability testing with representative users is then performed to check against reality the logical constructs the design is made of. The expected functionality and quality levels of the final system can be specified, but there is a certain degree of uncertainty in building the solution, the software system, in the sense that when undertaking the construction of some part of the system we do not exactly know how far we are from the specified solution. This is especially true when dealing with usability. Any design decision directed to usability
How a Human-Centered Approach Impacts Software Development
improvement needs to be tested with representative users, in order to check the actual improvement in usability attributes like efficiency in use. When the system under scrutiny includes the final user on top of the computer system, as is necessary for managing the usability of the final product, flexibility is required for adapting the partial prototypes according to evaluation outcomes. 4.2 Exploratory Strategies and the Definition of the Product Concept Exploratory strategies are not yet dealt with in SE literature and practice. Traditional information systems, like payroll systems, are well defined and most SE methodologies are directed to building them. Input-process-output models fit this kind of problem very well: automation of procedures previously performed manually, with well-defined rules and algorithms. The product concept is clearly delimited in this kind of system, so requirements can be written down with less risk of misunderstandings between the customer and the development team. In fact, the IEEE body of standards includes a standard for establishing user requirements, the Concept of Operations [15] or ConOps, but it is seldom used in software development, unlike the more system-oriented (or developer-oriented) IEEE recommended practice for software requirements specification [14], which receives much more attention from the SE field. On the other hand, the HCI field has a long tradition of dealing with ill-defined problems, developing new products with a high degree of innovation. Even if the creation of these systems has not been their main focus of activity, dealing with problems that have neither an obvious solution nor indications of how development should proceed has been part of HCI practitioners' work. Accordingly, several HCI techniques are especially well suited for defining the product concept.
These techniques favor participative and creative activities, which fit very well the purpose of creating a model of how the system works from the user's point of view, and of studying whether it fits user needs and expectations. Examples of these techniques are Personas [6], Scenarios [4], Storyboards and Visual Brainstorming [22]. As long as the development of interactive systems keeps moving to new paradigms of interaction, with an ever-increasing degree of novelty required, these HCI techniques will have to be either adopted by software engineers or applied by HCI experts belonging to the development team.
5 Conclusions In this paper we have shown how uncertainty plays a major role in software development in the construction of non-trivial interactive software systems. While uncertainty in the problem may be harmful, uncertainty in the solution may be useful when used as a tool for dealing with the former kind of uncertainty (the one in the problem). HCI has traditionally applied flexible processes that allow participatory design, and it has promoted the usage of prototypes aiming at greater flexibility for making changes to the (partial) software solution. Additionally, some HCI techniques are especially well suited for the development of innovative software systems, which
are ill-defined by definition, and they may be adopted for exploratory problem-solving strategies. Even if this is part of standard HCI practices, the convenience of this approach has not been formalized in a way that helps the integration of HCI methods into mainstream software development practices. Recent awareness of the obstacles that traditional approaches, like the waterfall life cycle, impose on the endeavor of successful systems development leads to a more favorable attitude to the introduction of HCI methods, which ultimately lead to better quality products. In particular, HCI may play an important role in introducing practices that improve the usability of the final product, while additionally preparing businesses to better deal with uncertainty in software development. Understanding the roots of current software development practices and knowing their deficiencies in dealing with uncertainty is essential for any software development business. A model for software development that considers uncertainty is needed, in order to change from a field that is based only on the expertise of gurus to a software development field with sound foundations for the selection of development practices.
References
1. Blum, B.I.: Software Engineering: A Holistic View. Oxford University Press, New York, USA (1992)
2. Boehm, B.W.: A Spiral Model of Software Development and Enhancement. ACM SIGSOFT Software Engineering Notes 11-4, 14–24 (1986)
3. Bourque, P., Dupuis, R., Abran, A., Moore, J.W., Tripp, L., Wolf, S.: Fundamental principles of software engineering - a journey. Journal of Systems and Software 62, 59–70 (2002)
4. Carroll, J.M.: Scenario-Based Design. In: Helander, M., Landauer, T., Prabhu, P. (eds.) Handbook of Human-Computer Interaction, 2nd edn., pp. 383–406. Elsevier, North-Holland (1997)
5. Constantine, L.L., Lockwood, L.A.D.: Software for Use: A Practical Guide to the Models and Methods of Usage-Centered Design. Addison-Wesley, New York, USA (1999)
6. Cooper, A., Reimann, R.: About Face 2.0: The Essentials of Interaction Design. Wiley Publishing, Indianapolis (IN), USA (2003)
7. Costabile, M.F.: Usability in the Software Life Cycle. In: Chang, S.K. (ed.) Handbook of Software Engineering and Knowledge Engineering, pp. 179–192. World Scientific, New Jersey, USA (2001)
8. Descartes, R.: Discourse on the Method of Rightly Conducting One’s Reason and of Seeking Truth (1993), http://www.gutenberg.org/etext/59
9. Hakiel, S.: Delivering Ease of Use. Computing and Control Engineering Journal 8-2, 81–87 (1997)
10. Hix, D., Hartson, H.R.: Developing User Interfaces: Ensuring Usability Through Product and Process. John Wiley and Sons, New York (NY), USA (1993)
11. Glass, R.L.: Facts and Fallacies of Software Engineering. Addison-Wesley, Boston (MA), USA (2003)
12. Gould, J.D., Lewis, C.: Designing for Usability: Key Principles and What Designers Think. Communications of the ACM, 300–311 (March 1985)
13. IEEE: IEEE Std 610.12-1990. IEEE Standard Glossary of Software Engineering Terminology. IEEE, New York (NY), USA (1990)
14. IEEE: IEEE Std 830-1998. IEEE Recommended Practice for Software Requirements Specifications. IEEE, New York (NY), USA (1998)
15. IEEE: IEEE Std 1362-1998. IEEE Guide for Information Technology - System Definition - Concept of Operations (ConOps) Document. IEEE, New York (NY), USA (1998)
16. ISO: International Standard: Human-Centered Design Processes for Interactive Systems, ISO Standard 13407: 1999. ISO, Geneva, Switzerland (1999)
17. Klir, G.J., Folger, T.A.: Fuzzy Sets, Uncertainty and Information. Prentice Hall, N.J. (1988)
18. Larman, C.: Agile and Iterative Development: A Manager’s Guide. Addison-Wesley, Boston (MA), USA (2004)
19. Matsubara, T., Ebert, C.: Benefits and Applications of Cross-Pollination. IEEE Software, 24–26 (2000)
20. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (CA), USA (1999)
21. Parnas, D.L.: On the Criteria To Be Used in Decomposing Systems into Modules. Communications of the ACM 15-12, 1053–1058 (1972)
22. Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., Carey, T.: Human-Computer Interaction. Addison Wesley, Harlow, England (1994)
23. Raccoon, L.B.S.: The Chaos Strategy. ACM SIGSOFT Software Engineering Notes 20-5, 40–46 (1995)
24. Seffah, A., Andreevskaia, A.: Empowering Software Engineers in Human-Centered Design. In: Proc. of the ICSE’03 Conference, Portland (OR), USA, pp. 653–658 (2003)
25. Seffah, A., Gulliksen, J., Desmarais, M.D. (eds.): Human-Centered Software Engineering - Integrating Usability in the Development Process. Springer, Heidelberg (2005)
26. Shneiderman, B.: Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edn. Addison-Wesley, Reading (MA), USA (1998)
27. Vredenburg, K., Mao, J.Y., Smith, P.W., Carey, T.: A Survey of User-Centered Design Practice. In: Proc. of CHI-2002, Minneapolis (MN), USA, pp. 471–478 (2002)
After Hurricane Katrina: Post Disaster Experience Research Using HCI Tools and Techniques Catherine Forsman USA [email protected]
1 Introduction Disaster is a complex human and environmental event oftentimes perplexing to the most brilliant social scientist, humanitarian worker, governmental official, or human being. FEMA, and other governmental agencies, work with a defined categorization of disaster, called a cycle of disaster [1]. This high-level cycle is an overall view of stages within a continual loop of disaster. Yet, defining a fixed type of disaster and the resulting solutions for appropriate information dissemination, acquisition and organization is difficult because no disaster is the same. Each disaster brings with it specific characteristics. To easily envision the complexity, one could consider the difference between the impact of extreme environmental elements such as wind, water, air and fire and then compound that with the varying contexts where such a disaster could take place. These contexts could be urban or rural with different cultural and language requirements. For example, imagine the difference between 9/11 and Hurricane Katrina. These disasters took place in two different cities with different local governments, cultural histories, demographics, and within different urban plans. If one were to envision the types of activities and information involved in both disasters there may be a few overlapping qualities at a high level, but in reality, they are specifically different at the informational need, activity and urban context level. In other words, disaster has a site specific element to it that involves understanding context, activities, information and the flexibility of real-time, ad hoc information adaptation to contextual activities. If this hypothesis is interesting, then research in disaster management with an HCI perspective may reap interesting findings. 
Because HCI deals with the study of information systems and appropriate technologies for people within situated activities, it is a unique field, well suited for understanding human needs in adaptive and changing environments. That is what this paper is about: the process of using HCI tools and techniques in a post-disaster situation in order to learn how people, within their contexts and activities, learn what to do, and what information and technology they may need in order to do those things. In the future, conducting HCI research in disaster areas may lead to important findings regarding innovation for disaster situations, technology devices, information structures and the creation of ontological frameworks of experience used as infrastructures for adaptive learning tools when the cycle repeats itself.
2 The Cycle of Disaster The disaster cycle is outlined next in this paper in order to illustrate a framework and define what is meant by “disaster management.” Mitigation. This phase encompasses lessening the effects of possible disasters. It differs from other phases because it involves trying to learn from past disasters through information and data to lessen the severity of any future disaster. This phase also deals with evaluating risks and risk management [2].
Preparedness. Common preparedness measures include the proper maintenance and training of emergency services, the development and exercise of emergency population warning methods combined with emergency shelters and evacuation plans, the stockpiling, inventory, and maintenance of supplies and equipment, and the development and practice of multi-agency coordination. An efficient preparedness measure is an emergency operations center (EOC) combined with a practiced region-wide doctrine for managing emergencies [2]. A development of interest to HCI professionals in this area is one where ethnographic observations regarding self-organizing behavior were used after 9/11. In 2002, the US Federal government created a new procedure for evacuating federal employees in Washington. The protocol is based upon observed social dynamics exhibited in 9/11 and attempts to “improve the ad hoc process” based upon ethnographic findings [3]. But, even if there are some insights into how field research can contribute to understanding self-organizing systems for future disaster scenarios, is the concept of preparedness flawed? Certainly some risks can be avoided, but disaster by definition is about chaos and the unexpected, taking place in specific contexts that cannot be predetermined. Is it possible to be prepared for dynamic and complex situations that may not now exist, even in a risk model? In two surveys conducted by NYU’s Center for Catastrophe Preparedness and Response (CCPR), one after 9/11 and one after Hurricane Katrina, researchers noticed a steep rise, after the widespread destruction of Hurricane Katrina, in participants’ belief that one could not prepare for a disaster. The survey data is as follows: “62% of Americans said that it was nearly impossible to be very prepared for terrorist bombings, 60% said the same about hurricanes and floods, and 55% said the same of a flu epidemic” [4].
This shows, perhaps, a lack of confidence in the idea that anyone can “prepare” for such events. But, it also illustrates a perception that preparedness may be an area of inquiry. The question is: What tools does one use to understand this issue? In reality, managing disaster requires understanding the ad hoc organization before, during, and after the disaster occurs, which is a very difficult proposition. Response. The response phase includes the mobilization of the necessary emergency services and first responders in the disaster area, such as firefighters, police, volunteers, and non-governmental organizations (NGOs). Additionally, this phase includes organizing volunteers [2]. One could imagine that there are dynamic events that occur in the real world during a disaster. Additionally, there are static preparedness protocols that could be described in a taxonomic way, such as a “type” of response (e.g. rescue in water), and scenarios of rescue (e.g. evacuation to hospital). However, what actually occurs is rescuing someone from a nest of poisonous water snakes as the person struggles to stay afloat in oil-enriched water, with no clear directive on where to take the victim due to limited radio frequency and lack of organizational directives. The actual narrative of events is very different from a simulation of the event prior to the disaster. This underscores an aspect inherent in response: the need for real-time collaboration in interactions with people, information and technology, in a social networking and ad hoc organizational manner, as needs arise and have outcomes that can rarely be predicted. It also underscores the need for HCI research that deals both
with narratives of actual events and with the creation of technical infrastructures, information structures and organizational models for real-time response and access and organization (reading of patterns). Recovery. The aim of the recovery phase is to restore the affected area to its previous state. It differs from the response phase in its focus; recovery efforts are concerned with issues and decisions that must be made after immediate needs are addressed [2]. The idea of “restructuring” brings with it a wealth of opportunity to explain and explore contextual and population needs through narrative. In other words, new socialized orders can be explored and remapped in accordance with what may not have existed before but ideally could have. The narrative could be shown in personas and scenarios, yet grounded in field research that can be validated by communities and individuals in order to ensure a community feedback loop.
3 Technology and Disaster Looking more closely at the technology used during the Rescue and Recovery phase after Katrina leads to some interesting findings. For example, first responders often communicate via two-way radios. Two-way radios have limited range, about a kilometer, but repeater stations can be positioned to increase the range. They are most often used to coordinate supplies, rescue missions and communications between team members and a Coordinator [5]. Additionally, to accommodate a real-time dynamic, cell phones can be used, but oftentimes the network cannot respond, or infrastructure or device failure occurs due to environmental issues [6]. However, even if a cell phone could work, when power failure occurred, all 911 center capabilities were disabled [7]. There was nowhere to call but friends or family. A useful technology after Hurricane Katrina was Ham radio. Commercial radio antennas are placed on top of hills in order to cover broader areas of reception, making them highly vulnerable to wind and earthquake tremors, whereas Ham radio operators build smaller antennas knowing their broadcast range is within the local area, such as within a city. If a Ham radio tower (which can be as small as 100 feet) falls over, it is easy to pick up and reinstall. ARRL President Jim Haynie testified before Congress in 2005 that “1000 Amateur Radio volunteers have been serving the stricken area to provide communication for such agencies as The Red Cross and the Salvation Army and to promote interoperability between agencies” [8]. FEMA passed out radios to citizenry in Bay St. Louis, MS so that they could listen to the broadcasts from a local radio station and obtain information regarding food, shelter and medicine [9]. The idea of the local and smaller technology prevailed simply because it was quickly repairable and could be supported easily by governmental agencies (just pass out the radio from a truck).
There is no guarantee that any type of technology would not suffer the same fate as cell phones did during Hurricane Katrina. That is, devices may work, but there are important factors to consider that involve information and organization (e.g. 911 operators, system overload, complete loss of communication tools). However, as in the case of Ham radio, the key is the separation of information structures from the device and the local, smaller aspect of the technology. This, then, has some precedent in HCI with pervasive computing [10, 11].
For the volunteers outside the disaster zone, a proliferation of internet use and social networking took place (bulletin boards, websites, etc…). The internet became a platform for grassroots initiatives and individual and small-group citizen rescue mission needs (e.g. “Please send money for a gas card for the plane we are flying in to the Gentilly area to distribute water.”) [12]. While those outside the disaster zone can use the internet for organizational and informational purposes, those within the disaster area are most likely without immediate access to the internet. The important takeaway from this is that technology, understood in a real-world framework, worked best when it was easily restructured, could relay distribution and organizing information, and allowed for social networking via either voice or text.
4 Considering Disaster, Context and the User If we think about the “user’s” capacity for information organization and processing in a complex environment such as a disaster, the idea of the user as an isolated element, understood and normalized for specific psycho-cognitive interactions with an information system in a laboratory, does not hold. This type of user definition arose from cybernetics, where Herbert Simon proposed his ideas of bounded rationality and learning through information feedback and adaptation [13]. The objective of these studies was to determine task models, ergonomic needs, information models and cognitive responses to a system [14]. From the set of results, usually obtained in a laboratory with task-based questions or via survey, a baseline could be created of a user with varying degrees of expertise and satisfaction in relation to a technology system and tasks performed. Yet another train of thought, as written about by Drs. Lamb and Kling, is the idea of the user as a social actor, where context plays an important role in understanding the requirements for interactions [15]. The added relation of context and user accounts for the complexities of situated actions, such as space, interactions with objects and people, and power dynamics for use of systems or information. Situated action was first introduced in 1987 in Lucy Suchman’s book, Plans and Situated Actions: The Problem of Human-Machine Communication. And, as Lamb and Kling point out, “years later a particularly formative and influential study in this area appeared in Mumford’s socio-technical ETHICS PD approach [16]. PD practitioners became keenly aware that structural constraints may prevent the exchange of information, but they believed that users were social actors and capable of mobilizing change.
This is not to say the early PD research sided only with the social actor as worker, but included within their perspective organizational changes through technology throughout the full power hierarchy of an organization.” Basically, it was a creative way to consider context and people as social actors within their interactions with context, information and technology [17, 18, 19]. Disaster by its very definition is chaos: rapidly changing and possibly disintegrating contexts. Context here can be the urban landscape, such as a city or home, or mental models of operation, such as knowing how to reach a medical facility if one is hurt. Because disaster is so disruptive to context, context is an important object of study for HCI field research in disaster situations. That is, how do people, whether organizers or
survivors, deal with varying cognitive loads of information and organizational complexity in order to readjust themselves within survival contexts (shelters, hospitals, etc…)? Extending context beyond the workplace does have some precedent in HCI in the area of pervasive and urban computing, where context is extended so that “cities can also be viewed as products of historically and culturally situated practices and flows. When we view urban areas in this context, rather than as collections of people and buildings, infrastructure and practice are closely entwined” [15, 20]. Over the course of time, the research methodology organically moved closer to the sensibilities of PD as envisioned in earlier Scandinavian HCI projects [23]. Observational notes were requested by NGOs so that they could better understand the conditions of specific locations. These notes were sent via email with participants’ permission and editing. Participants in the research began organizing within the shelter and asking for photography or journal recording advice in order to post information to the internet. Traditionally, PD dealt with context and human activities in work environments and was deployed with the understanding that a community would be studied and impacted by decisions made about information systems or machines [24]. The core premise of PD was that better and safer working conditions will result from some sharing of power and an appreciation of the tacit knowledge and adaptive capabilities that workers contribute to organizational processes. In other words, researchers immersed themselves in a culture in order to contextualize culture within their research, create a feedback loop with people within the context, and participate with the community in developing prototypes and articulations of requirements. In the historical context of PD, the research itself became a conduit for requirements that arbitrated the needs of the workers to management and vice versa.
In the case of HCI disaster research, the need for research within context becomes even more strongly coupled than in industry as understanding the needs of a population cannot be divorced from survival actions in context. Understanding what organizational and information needs confront people while coping with the myriad facets of disaster very likely can inform information structures for disaster response in the future, as well as immediate feedback loops in the present.
5 The Research Experience The full study took place over a two-year period, but for the purposes of this paper, only the first experiment is briefly described, as it set the direction for the following research experiments. To understand the organizational complexity of people and how they rapidly relearn a new context in order to survive, ethnographic observation and contextual inquiry were used. Interviews were conducted regarding people’s memories of their experience in the changed flow of the city during escape compared to preexisting conditions. Interviews took place in order to understand the memory of how the participant once contextualized their day-to-day experiences within their
homes and neighborhoods, and how this had changed. Additionally, design probes¹ (cameras, diaries, sketching, asking participants to post photos to an internet site and video to YouTube) were used [21]. The goal of the research was to understand what was critical about the effect that changes in context-as-situated-action had on participants. The inquiry highlighted the needs of survivors for both information and organization around flow in order for them to co-create ongoing survival strategies. The output from this area of inquiry was a set of narratives, field notes, scenarios and personas with clear representations of the participants in context before and after disaster. These personas and scenarios were taken back to communities and validated and iterated, via workshops in some cases and with individuals in others, thereby involving a “community” aspect. Another interesting output from this research was political in nature. Given that context became a central area of inquiry, findings regarding the appropriate distribution of goods and cultural misinterpretation of needs between governmental and NGO workers and hurricane survivors were evident. That is, survivors believed “You just don’t get me” when it came to privacy concerns, organization of cots, shower schedules and food. The NGOs and Red Cross organizers followed a protocol of organization that had little to do with context and had been prepared prior to the hurricane. This, then, may be categorized as a missing layer of information interfacing between a specific, situated population and the relief organization’s protocols. In order to bridge this gap, organizational meetings took place within the shelters and field notes were used as tools of information dissemination, pointing to the need for a more efficient yet malleable form for creating an information interface between the two populations. Similar research was conducted in the city of Bay St. Louis, MS.
In this context, gas stations and Red Cross distribution centers had gluts of certain supplies (baby diapers and sweaters in 90 °F weather where babies had been evacuated) and very little of the needed supplies for navigating the new context (accessible medical centers for tetanus shots, appropriate makeshift shelter for 200+ people sleeping in the church’s parking lot, or fork lifts to clear major thoroughfares and street signs to order the flow of traffic). This is not an uncommon pattern, as witnessed in an ethnographic work in Peru in 1990 when a powerful earthquake struck. Noticing that indigenous populations were not receiving food and goods that arrived through NGOs, Theo Schilderman, an anthropologist, studied the problem of how official relief agencies, survivors, and grassroots volunteers misinterpreted each other’s needs. The result was a deprivation of goods to survivors because both governmental agencies and NGOs
¹ The word “probe” is a label used to categorize a set of tools used in field research and design practices to gather information and iterate ideas with people. Tools within the “probe” category include diaries, remote cameras, drawing exercises mailed to the researcher, etc… Probes have a lineage within the design field for open inquiry. However, the author has intentionally not classified Participatory Design (PD) within the area of probes, in order to distinguish a different historical lineage: PD resulted from a need to incorporate an understanding of politics into the research process, whereas probes in the classic definition (diaries, cameras, etc…) were developed for design feedback loops that may or may not have had as an objective an understanding of mediating politics.
After Hurricane Katrina: Post Disaster Experience Research
85
were unfamiliar with the community conditions. Additionally, when the goods were finally distributed, the results were a mismatch between what emergency management authorities were trying to give to the victims and what they actually needed [22]. Creating information structures that can easily be accessed around population needs within specific contexts could alleviate some of this tension. 5.1 Aftermath Experiment The first experiment was performed from September 25 – 30, 2005 in three separate locations. Location 1: the Houston Astrodome. Location 2: NGO center outside Baton Rouge shelter, and Location 3: at various places within the city limits (home or makeshift shelter) of Bay St. Louis and Waveland, Mississippi. Two days were spent in each location, interviewing and observing people with a video camera and recording field notes. The recruitment process was done by word-of-mouth and over the internet. 5.2 Social Networking Via the Internet Before arriving onsite, the necessity to quickly establish connections with NGOs and emergency medical personnel within specific locations was done through email and the internet. Craig’s list and Katrina bulletin boards were also used during this phase. Due to the organizational complexities thwarting NGOs and governmental agencies in the accurate distribution of volunteers and supplies, certain people formed their own organizations and drove into the disaster area distributing goods. They organized via the internet and appeared on the doorsteps of shelters, churches or roadsides with supplies. Connecting with these people became invaluable in order to cover wider geographic areas of interest because they distributed life-saving goods in an ad hoc fashion. 5.2.1 Research Issues and the Failure of the Probe Ethnography was performed in the following way: observational, interactive (conversational), and some times participatory in a shelter or on the street with survivors. 
Cognitive issues arose, such as memory recall when telling a story. Additionally, the relative importance of temporal information shifted with sudden interruptions in the conversation. Participants would interrupt themselves with more pressing concerns, such as fears that “my house may blow up,” or “I don’t know where my child is. Could you help me get information?” Given this, cameras were passed out to each participant so that they could record the details of their lives when they had time and mail them to the researcher at a later date. Envelopes and stamps were included so that participants could mail the envelope from wherever they relocated. These diaries are still arriving in the mail from Sept. 2005, illustrating the importance of understanding post-traumatic stress disorder and its lasting effects, as well as how long it may take to restructure key infrastructure, such as post offices. Table 1 below shows the response times.
Table 1. Cameras passed out in September 2005, with return dates
Participant Location     Number of Cameras     Date/Amount Returned
Baton Rouge Shelter      25                    1, Jan, 2005; 10, Dec. 2006
Houston Shelter          10                    5, June, 2005
Bay St. Louis            27                    3, Oct. 2005; 5, Nov. 2005; 6, collected
Waveland                 8                     2, collected Mar 2006
Understanding the context (destroyed post offices, multiple relocations of participants, stress in recounting events) demonstrates the need for adaptable research. After the initial field experiment, participants began to email images taken with their cell phones or cameras. Additionally, participants wished to post information on YouTube and Flickr in order to reach a wider audience. They said that these structures worked better for them as they struggled for assistance and wished to be noticed by a wider population. As this pattern developed, new approaches for organizing information in the research took shape. A good example of a website dedicated to ethnographic research and disaster is the Indigenous Knowledge in Disaster Management website.
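The response pattern in Table 1 can be summarized as a return rate per location. The following is a minimal sketch, assuming that the returned amounts in Table 1 are attributed to the locations in the order listed and that cameras marked "collected" count as returned; the names `TABLE_1`, `return_rates`, and `overall_rate` are hypothetical and not part of the original study.

```python
# Counts transcribed from Table 1: (location, cameras handed out, cameras returned).
# The returned figure is the sum of the amounts listed for each location,
# including cameras marked "collected" (an assumption of this sketch).
TABLE_1 = [
    ("Baton Rouge Shelter", 25, 1 + 10),
    ("Houston Shelter", 10, 5),
    ("Bay St. Louis", 27, 3 + 5 + 6),
    ("Waveland", 8, 2),
]


def return_rates(rows):
    """Map each location to the fraction of its cameras that came back."""
    return {location: returned / handed_out
            for location, handed_out, returned in rows}


def overall_rate(rows):
    """Fraction of all cameras, across locations, that came back."""
    total_out = sum(handed_out for _, handed_out, _ in rows)
    total_back = sum(returned for _, _, returned in rows)
    return total_back / total_out


if __name__ == "__main__":
    for location, rate in return_rates(TABLE_1).items():
        print(f"{location}: {rate:.0%}")
    print(f"Overall: {overall_rate(TABLE_1):.0%}")
```

Under these assumptions, fewer than half of the 70 cameras had come back, which is consistent with the point that probe returns in a post-disaster setting are slow and partial.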
6 Conclusion
The Disaster Cycle was highlighted in this paper in order to set the stage for HCI field research. The research was described in order to present anecdotal evidence of how HCI research needs to be both participatory and adaptive in a post-disaster environment.
Acknowledgments. The author thanks the participants in Bay St. Louis, Slidell, and Waveland, Mississippi, and in New Orleans and Baton Rouge, who graciously gave of their time.
References
1. Alexander, D.: Principles of Emergency Planning and Management. Terra Publishing, Harpenden (1991)
2. Haddow, G.D., Bullock, J.A.: Introduction to Emergency Management. Butterworth-Heinemann, Amsterdam (2004)
3. Jason, P.: (August 14, 2002), http://www.govexec.com
4. Berne, R.: CCPR: Organizational & Community Preparedness Project Executive Summary (2005)
5. SAFAM Summary of Events for a Medical Mission to Mozambique (2007)
6. Banipal, K.: Strategic Approach to Disaster Management: Lessons Learned from Hurricane Katrina. Disaster Prevention and Management, pp. 299–421 (2006)
7. Hurricane Katrina Timelines. The Brookings Institute (2004)
8. ARRL President Congressional Testimony on Hams’ Katrina Response, Submitted to the House Government Reform Committee (September 15, 2005)
9. Moyers, B., Kleinenberg, E.: Fighting for Air Transcripts (2006)
10. Dourish, P.: Seeking a Foundation for Context-Aware Computing. Human-Computer Interaction 16(2, 3 & 4), 229–241 (2001)
11. Dourish, P.: Speech-gesture driven multimodal interfaces for Crisis Management. Proceedings of the IEEE 91, 1327–1354 (2003)
12. Anonymous: Craig’s List posting, retrieved (September 10, 2005)
13. Simon, H.A.: A Behavioral Model of Rational Choice. Quarterly Journal of Economics 69, 99–118 (1955)
14. Norman, D.A.: Cognitive Engineering. In: Norman, D.A., Draper, S.W. (eds.) User-Centered System Design. Lawrence Erlbaum Associates, Hillsdale, NJ (1986)
15. Dourish, P.: What We Talk About When We Talk About Context. Personal and Ubiquitous Computing 8(1), 19–30 (2004)
16. Mumford, E.: Effective Systems Design & Requirements Analysis: The ETHICS Approach. MacMillan, New York (1995)
17. Greenbaum, J., Kyng, M.: Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum, Hillsdale, NJ (1992)
18. Gutwin, C., Greenberg, S.: Design for Individuals, Design for Groups: Tradeoffs between Power and Workspace Awareness. In: Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work 2003, Philadelphia, PA (2000)
19. Nardi, B., Miller, J.: Twinkling Lights and Nested Loops: Distributed Problem Solving and Spreadsheet Development. International Journal of Man-Machine Studies 34, 161–184 (1991)
20. Curry, M., Phillips, D., Regan, P.: Emergency Response Systems and the Creeping Legibility of People and Places. The Information Society 20, 357–369 (2004)
21. Boehner, K., Vertesi, J., Sengers, P., Dourish, P.: How HCI Interprets the Probes. In: Proceedings of CHI (2007)
22. Schilderman, T.: Strengthening the Knowledge and Information System for the Urban Poor. Cambridge University Press, Cambridge (2003)
23. Nygaard, K.: Program Development as Social Activity. In: Kugler, H.-J. (ed.) Information Processing, pp. 189–198. Elsevier Science Publishers, Amsterdam (1986)
24. Schuler, D., Namioka, A.: Participatory Design: Principles and Practices. Lawrence Erlbaum Associates, Hillsdale, NJ (1993)
A Scenario-Based Design Method with Photo Diaries and Photo Essays
Kentaro Go
Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi, 4-3-11 Takeda, Kofu 400-8511, Japan
[email protected]
Abstract. In this paper, we propose a requirements elicitation method called Scenarios, Photographic Essays and Diaries as User Probes (SPED-UP). In SPED-UP, participants create photographic diaries and photographic essays themselves. Each participant creates photographic diaries to capture a day in their own life. They reflect upon their personal experiences and create photographic essays based upon this reflection. This approach enables designers to collect user data conveniently. Designers, who might be participants themselves in a participatory approach, can then analyze these experiences by forming design concepts, envision scenarios by imagining contexts of use, and create artifacts by sketching these scenarios. We also describe an exemplary workshop using the SPED-UP approach.
Keywords: user research, photographic diary, photographic essay, probe, requirements inquiry, scenario.
To address this issue, we propose a design approach in which designers conduct lightweight user research and create design ideas from the user data. Our approach, Scenarios, Photo Essays and Diaries as User Probes (SPED-UP), is a scenario-based design approach using participants’ self-produced photographic essays and photographic diaries. This paper gives an overview of the SPED-UP approach and specifically examines photographic diaries and photographic essays as representations of user research.
2 User Research to Elicit Requirements
The early stage of design for human-computer interaction has the following four goals.
• Elicit potential desires and requirements.
• Envision novel scenarios of use.
• Create designs reflecting the material of user research.
• Bring actual users into design activities.
Several efforts have been made to study user research for design. Researchers and practitioners have transferred research methods for field work to the design of human-computer interaction. For example, in the contextual inquiry technique [1], researchers visit users’ work settings and ask questions during the actual activities. This technique is useful for recording and understanding actual users’ tasks and activities in order to elicit their potential wants and requirements. Gaver, Dunne and Pacenti [3] created cultural probes, which are packages of devices such as postcards, disposable cameras, notebooks, and so forth. Each device is designed to encourage potential users to keep a diary themselves; instructions and messages from the designers are printed on it. The packages are distributed to potential users, who in turn keep a diary using the devices and send the package back to the designers. The designers browse the materials, which provide them with clues for design.
As in the cultural probe technique, photographs taken by actual users often play a central role in user research. Frost and Smith [2] used photographs taken by patients with diabetes themselves for self-management training. In the marketing research field, Holbrook and Kuwahara [8] proposed a data collection method using collective stereographic essays to probe consumption experiences. Holbrook and Kuwahara’s approach inspired us to develop the Participatory Requirements Elicitation using Scenarios and Photo Essays (PRESPE) approach [6, 7]. Based on our experience with the PRESPE approach, we created the SPED-UP approach. With devices such as photographs and writings created by potential users, we intend to address the four goals above in the early stage of the design process.
3 SPED-UP: Scenarios, Photo Essays and Diaries as User Probes
Our approach to user research for design employs three key devices: scenarios, photographic essays, and photographic diaries. The approach is called Scenarios, Photo Essays and Diaries as User Probes (SPED-UP). Fig. 1 depicts an overview of the SPED-UP approach.
[Figure: diagram with the labels Coordinator, Theme, Participants, Personal Experience, Photo Diaries, Photo Essays, Design Concept, Requirements and needs, Scenarios, and Artifacts, connected by the numbered activities (1) Collect, (2) Reflect, (3) Analyze, (4) Envision, and (5) Translate]
Fig. 1. Overview of the Scenarios, Photo Essays and Diaries as User Probes (SPED-UP) approach
3.1 SPED-UP Overview
As a participatory design approach, SPED-UP brings together a group of major stakeholders (including designers and real users) to produce and evaluate product designs [11]. The SPED-UP approach encompasses two roles: coordinators and participants. The coordinators assign a project theme and provide ongoing support for the participants’ activities. The five main activities are (1) collection, (2) reflection, (3) analysis, (4) envisioning, and (5) translation. Participants collect their own personal photographic diaries. For the assigned theme, participants create photographic essays to reflect on their personal experiences using existing artifacts. The participants are then divided into several groups; the remaining SPED-UP activities are conducted as group work. By comparing the individual photographic essays, the participants can analyze shared ideas, identify the concepts behind them, and then develop design concepts. The participants can then use these design concepts as inspiration for future uses of the relevant technology when they envision use scenarios and contexts. This activity, called scenario exploration, is a structured brainstorming session with role-playing using scenarios and questions. The participants then translate scenes described in the scenarios into artifacts by making sketches of the scenes [4]. Three devices are used in SPED-UP: photographic diaries, photographic essays, and scenarios.
3.2 Photo Diaries
A photographic diary comprises a series of photographs and their descriptions. Fig. 2 shows an example of a photographic diary. A participant takes a photograph at specified time intervals and describes the activity at the time the photograph is taken.
In Fig. 2, the participant took a photograph and wrote a diary entry at one-hour intervals. Each photograph and description represents a scene from a day in the participant’s life. The purpose of collecting users’ photographic diaries is to capture actual scenes from their lives. The final outcome of the design process is a set of design ideas or designed products relating to information and communications technologies. Therefore, we are interested in finding opportunities for information processing and communication in users’ daily lives.
8:45 All I have in my wallet is a thousand-yen note. I stop at an ATM to withdraw money on the way to work.
9:45 While I am working on a business meeting, a business partner phones me.
10:45 The business meeting started at 10:30. The meeting material got in under the wire. I will be giving a talk on the material soon.
Fig. 2. An example of a photographic diary
A timer or prompter is useful for reminding users to take photographs for their photographic diaries. However, using a self-timer to take photographs automatically might not be appropriate for our approach because it might capture unintended scenes and raise privacy and security concerns. For this reason, we ask users to take the photographs themselves so that they can choose what they capture as a scene of daily life. We ask users to capture a scene that represents their actions, tasks, or activities as well as the environment surrounding them. In fact, we ask them to appear in the photographs themselves to show clearly what they are doing in what situation. Current technologies such as small portable digital cameras, mobile telephones with digital cameras, and personal digital assistants (PDAs) with digital cameras make it possible to create photographic diaries without too much trouble. In addition, because the participants produce the photographic diaries themselves, designers can collect user data in a short period of time.
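A prompter of the kind described here only needs a list of reminder times; the participant still takes each photograph. The following is a minimal sketch, assuming a fixed interval between a chosen start and end time. The function name `prompt_times`, its parameters, and the example date are hypothetical and not part of the SPED-UP materials.

```python
from datetime import datetime, timedelta


def prompt_times(start, end, interval_minutes=30):
    """Return the times at which a participant is reminded to take a photo.

    The reminder only prompts. The participant takes the photograph,
    so they decide what the scene shows.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


if __name__ == "__main__":
    # One-hour intervals over a hypothetical morning, matching the times in Fig. 2.
    morning = prompt_times(datetime(2006, 2, 1, 8, 45),
                           datetime(2006, 2, 1, 10, 45),
                           interval_minutes=60)
    print(", ".join(t.strftime("%H:%M") for t in morning))  # 08:45, 09:45, 10:45
```

Keeping the schedule separate from the camera reflects the design choice in the text: the tool reminds, and the user decides what to capture.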
3.3 Photo Essays
A photographic essay contains representative photographs on an assigned theme and an essay explaining why the participant thinks the photographs fit the theme. Photos might be stereograms to increase the viewer’s sense of reality [8]. Fig. 3 shows an example of a photographic essay.
I live alone. The first thing I do is turn on the TV when I get back home. I guess I might be feeling lonely. I try to find an entertaining program. I watch many kinds of programs, such as variety shows, dramas, and comedies. Because I live alone, I have a habit of channel surfing. Because I do not subscribe to a newspaper, I do not know what TV programs are currently on the air. So after turning on the TV, I start channel surfing and stop when I find an entertaining program. During commercial breaks, I start channel surfing again because I do not want to miss any entertaining programs that might be airing simultaneously on a different channel. Another reason for this habit is that I am not disturbing anyone because I live by myself. I think that this habit might change depending on my environment.
Fig. 3. An example of a photographic essay: Channel surfing [7]. The theme assigned to the participant is “something I usually do with an IT product.” In the essay, the author assumed that the television is an IT product.
The purpose of collecting users’ photographic essays is to elicit potential hidden needs. This is achieved through users’ deep introspection on the assigned theme. The photographic diaries and photographic essays are the key user data in the SPED-UP approach. We expect that the needs or requirements that emerge from photographic essays can be matched with the opportunities for information processing or communication found in photographic diaries. Toward this end, designers analyze the collected photographic diaries and essays. The ideas obtained from the data are summarized and listed as Design Concepts, as shown in Fig. 1. The next step in the SPED-UP approach is to create scenarios.
3.4 Scenarios
Scenarios in the SPED-UP approach have two aspects: as a tool to support idea generation and as a representation of design ideas based on user data. At the idea
generation stage, starting from the design concepts produced by the analysis of the photographic diaries and photographic essays, designers conduct brainstorming sessions using an affinity diagram. In this activity, scenarios might take a textual narrative form. During the SPED-UP brainstorming session, participants create short scenarios that include usage situations. The participants ask 5W1H (What, Why, Who, When, Where and How) and what-if questions to identify concrete details of various use situations. The answers to the questions are represented as scenarios with detailed information. As a representation of design ideas, designers create scenarios that represent scenes of a task or activity. Scenarios at this stage are much longer descriptions than those in the brainstorming session.
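The 5W1H and what-if questioning can be illustrated as a small prompt generator for one design concept. This is only a sketch under stated assumptions: the question wording in `FIVE_W_ONE_H`, the function `exploration_questions`, and the example what-if condition are hypothetical and were not taken from the workshop materials.

```python
# Hypothetical question templates; the wording is illustrative only.
FIVE_W_ONE_H = {
    "What": "What is the person doing when they encounter '{concept}'?",
    "Why": "Why does '{concept}' matter to them in this situation?",
    "Who": "Who else is involved in '{concept}'?",
    "When": "When does '{concept}' happen during the day?",
    "Where": "Where does '{concept}' take place?",
    "How": "How does the person act on '{concept}'?",
}


def exploration_questions(concept, what_ifs=()):
    """Build 5W1H questions for a design concept, followed by what-if questions."""
    questions = [template.format(concept=concept)
                 for template in FIVE_W_ONE_H.values()]
    questions += [f"What if {condition}?" for condition in what_ifs]
    return questions


if __name__ == "__main__":
    for question in exploration_questions(
            "learning like a pot-luck party",
            what_ifs=["a student arrives halfway through"]):
        print(question)
```

Each answer to such a question would then be written down as a short scenario, as described above.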
4 Example
We conducted a two-day workshop on the SPED-UP approach at the Ergo-Design Division, Japan Ergonomics Society. This section gives an overview of the workshop as an example. Other reports on the workshop can be found in [9, 10, 16]. The workshop was intended to create design specifications for a ubiquitous computing system for a university campus. Specifically, we designed the system not only for traditional usability aspects but also for emotional aspects; in this sense, we intended to incorporate the aspect of happiness into the system. The workshop participants came from several companies and universities in Japan. They had various backgrounds and experience in industrial and product design but no experience using the SPED-UP approach. Box 1 and Box 2 show the assignments given to the participants. Following our SPED-UP approach, we asked them to address the theme “Something I feel happy about” by taking a representative photograph and writing a brief vignette indicating the significance of the photo. We provided the assignments to the workshop participants beforehand, and they created the photographic diaries and photographic essays prior to the workshop. Fig. 4 shows the first two hours of a photographic diary created by a participant. She is a supporting staff member of a university field hockey team, and she describes her day during spring break. The photographic diaries provided by the participants enable the workshop members to share and understand each individual’s daily life.
Photo Diary Project
Description
Do the following.
• Take a photograph every thirty minutes from morning to night (a one-hour interval is acceptable if you think a thirty-minute interval is too demanding).
• Write a short diary entry that explains the scene captured in each photograph.
• Construct a summary document (a PowerPoint presentation or a poster) that contains the photographs and diary.
Notes
• Consider what the theme means to you.
• Describe the scene in the photographs; explain why you selected that particular scene.
Box 1. Photo diary assignment given to the participants
Photo Essay Project
Description
For the theme below, do the following.
• Take a pair of photographs (overview and close-up) that describes the theme.
• Write a short essay that explains the meaning of the scene captured in the photographs.
• Construct a summary document (a PowerPoint slide) that contains the photographs and essay.
Theme
• Something I feel happy about
Notes
• Consider what the theme means to you.
• Describe the scene in the photographs, and explain why you selected that particular scene.
Box 2. Photo essay assignment given to the participants
8:30-9:00 (1), (2) I wake up in the morning and check e-mail first. I use a microwave for making a drink in the cold winter.
9:00-9:30 (3) At a convenience store, I use a photo-printing service. The cash insertion slot is out of reach of the printing terminal.
9:30-10:00 (4) I time the warm-up exercise with a stopwatch behind the backstop on the field hockey field.
10:00-10:30 (5) I hand out drinks to players every thirty minutes.
Fig. 4. A photographic diary created by a participant. She is a supporting staff member of a university field hockey team (excerpt from her poster, translated by the author).
Fig. 5 shows a photographic essay created by a participant. In it, he explains why self-made coffee in the morning is important to him. At the workshop, we started by explaining the photographic diaries and photographic essays that the participants had brought. Then we divided the workshop members into three groups. Each group reviewed all the photographic diaries and photographic essays and identified the common ideas and opportunities behind them. They created design keywords through this activity. All materials were posted on the wall of the workshop room so that the participants could review them at any time.
Fig. 5. A photographic essay created by a participant. He explains why self-made coffee in the morning is important for him to have a happy day.
During the analysis phase of photographic diaries and photographic essays, the participants created keyword descriptions. Box 3 shows an example of the keyword descriptions created by a participant group. Based on those keywords, the participants conducted scenario-based brainstorming sessions. Finally, they created design ideas about restructuring the concept of a lecture on campus. They proposed the “learning like a pot-luck party” concept, a student-led learning environment where anyone can come and leave at any time and share knowledge and experience.
Keyword descriptions:
Relativity: The degree of happiness is perceived in a relative manner. The same life event can be experienced differently from person to person.
Rhythm: Series of events in daily life create a harmony of happiness.
Box 3. Keyword description by the participant group
5 Conclusions
In this paper, we introduced a user research and design method using a scenario-based approach with photographic diaries and photographic essays. The Scenarios, Photo Essays and Diaries as User Probes (SPED-UP) approach enables designers to collect user data at the beginning of the design process in a lightweight manner. In this paper, we specifically addressed the representation of photographic diaries and photographic essays.
We introduced the SPED-UP approach at a workshop held by the Ergo-Design Division, Japan Ergonomics Society in February 2006. The participants at the workshop quickly acquired the approach and then started using it at design departments of several companies and universities in Japan, including Fujitsu Co. Ltd., Canon Inc., Ricoh Company, Ltd., Chiba University, Musashi Institute of Technology, Kurashiki University of Science and The Arts, and University of Yamanashi. The Ergo-Design Division is now considering using it as a basic design approach for ubiquitous services, applications, and products. Ueda and Watanabe [15] reported that the SPED-UP approach enables design students to center their creative efforts specifically on their design target, which suggests the potential value of SPED-UP for use in design education.
Acknowledgments. The author thanks the Ergo-Design Division, Japan Ergonomics Society. The photographic diary and photographic essay in Section 4 were provided by Saori Oku, Wakayama University, and Hiromasa Yoshikawa, Design Center, Fujitsu Co. Ltd.
References
1. Beyer, H., Holtzblatt, K.: Contextual design: Defining customer-centered systems. Morgan Kaufmann, San Francisco (1998)
2. Frost, J., Smith, B.K.: Visualizing Health: Imagery in Diabetes Education. In: Proceedings DUX 2003 Case Study, Designing for User Experience ACM/AIGA (2003)
3. Gaver, B., Dunne, T., Pacenti, E.: Cultural probes. Interactions 6(1), 21–29 (1999)
4. Go, K., Carroll, J.M.: Scenario-based task analysis. In: Diaper, D., Stanton, N. (eds.) The Handbook of Task Analysis for Human-Computer Interaction, pp. 117–134 (2003)
5. Go, K., Carroll, J.M.: The Blind Men and the Elephant: Views of Scenario-Based System Design. Interactions 11(6), 44–53 (2004)
6. Go, K., Takamoto, Y., Carroll, J.M., Imamiya, A., Masuda, H.: PRESPE: Participatory Requirements Elicitation using Scenarios and Photo Essays, Extended. In: Proceedings of the CHI 2003, Conference on Human Factors in Computing Systems, pp. 780–781 (2003)
7. Go, K., Takamoto, Y., Carroll, J.M., Imamiya, A., Masuda, H.: Envisioning systems using a photo-essay technique and a scenario-based inquiry. In: Proceedings of HCI International 2003, pp. 375–379 (2003)
8. Holbrook, M.B., Kuwahara, T.: Collective Stereographic Photo Essays: An Integrated Approach to Probing Consumption Experiences in Depth. International Journal of Research in Marketing 15, 201–221 (1998)
9. Inoue, A.: A Proposal for New Campus Life for the Ubiquitous Generation: An approach using Photo Scenario Method. The Japanese Journal of Ergonomics 42 Supplement, 58–59 (in Japanese) (2006)
10. Ito, J.: How to Make Campus Life Unforgettable with Ubiquitous Service. The Japanese Journal of Ergonomics 42 Supplement, 54–55 (in Japanese) (2006)
11. Muller, M.J., Haslwanter, J.H., Dayton, T.: Participatory Practices in the Software Lifecycle. In: Helander, M., Landauer, T.K., Prabhu, P.V. (eds.) Handbook of Human-Computer Interaction, 2nd edn., pp. 255–297. Elsevier, Amsterdam (1997)
12. Poltrock, S.E., Grudin, J.: Organizational obstacles to interface design and development: two participant-observer studies. ACM Transactions on Computer-Human Interaction 1(1), 52–80 (1994)
13. Pruitt, J., Adlin, T.: The Persona Lifecycle: Keeping People in Mind throughout Product Design. Morgan Kaufmann, San Francisco (2006)
14. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. Morgan Kaufmann, San Francisco (2001)
15. Ueda, Y., Watanabe, M.: A study of vision-development methods for the ubiquitous generation. In: Proceedings of the 36th Annual Meeting of Kanto-Branch, Japan Ergonomics Society, pp. 29–30 (in Japanese) (2006)
16. Yoshikawa, H.: Campus Life Support by Ubiquitous Technology. The Japanese Journal of Ergonomics 42 Supplement, 56–57 (in Japanese) (2006)
Alignment of Product Portfolio Definition and User Centered Design Activities
Ron Hofer¹, Dirk Zimmermann², and Melanie Jekal³
¹ Siemens IT Solutions and Services C-LAB, Fürstenallee 11, 33102 Paderborn, Germany, [email protected]
² T-Mobile Deutschland GmbH, Landgrabenweg 151, 53227 Bonn, Germany, [email protected]
³ Universität Paderborn C-LAB, Fürstenallee 11, 33102 Paderborn, Germany, [email protected]
Abstract. To reach a product’s business objectives, the requirements of all relevant stakeholders have to be analyzed and considered in the product definition. This paper focuses on the processes applied to analyze and consider the needs and expectations of two of these stakeholder groups, namely the customers and the users of a product. The processes for producing customer centered product definitions and user centered product definitions are compared, making visible opportunities to increase their efficiency and effectiveness by means of collaboration.
Keywords: Business Requirements, Customer Requirements, Marketing, Product Definition, Product Portfolio Management, Usability Engineering, User Centered Design, User Requirements.
2 The Playground
The roles that one or more persons might perform in a buying decision can be classified into six buying roles: the initiator, the influencer, the decider, the buyer, the user, and the gatekeeper [1]. This framework helps to understand the different viewpoints, expectations, and needs of customers and users regarding the same products. Business plans consider all six roles in order to define products that deliberately influence all factors leading to a purchase decision. One of these buying roles is the user. User Centered Design (UCD) offers established processes, methods, and tools to understand and consider this role, which leads to the authors’ belief that an early start of UCD activities supports business decisions already in the initial phase of the PL. Another buying role is the decider (the one who decides on the purchase of a product). In the context of this paper, the motivation to make a purchase decision differs between organizational customers, who purchase IT systems to be used by members of the organization (e.g. a call center or an intranet solution), and private customers, who are the actual end users (e.g. the purchaser of tax software or a mobile phone). These differences will be addressed at relevant points within the paper. The PPD is conducted at the very beginning of a product’s lifecycle. Product portfolios (PPs) consist of a unified basic product platform and product modules, which are tailored to fit the needs of specific market segments. Objectives and requirements of PPs are defined in “product vision” documents [22]. The modules of a PP can be developed and launched as independent projects at different times. There is a wide range of drivers influencing the definition of the product vision for PPs.
Company-external drivers, such as society and politics, science and technology, and the target market, as well as more internal drivers, such as the business strategy, the product strategy, and existing and planned products (both the company’s own and competitors’), must be considered. This paper focuses solely on one of these drivers, the so-called “voice of the customer” ([16], [22]), which has to be heard and considered in the definition of product visions and project scopes in order to tailor the modules of a product line to customer segments and to align each module with specific customer needs and expectations. Literature on the process of product definition (PD) emphasizes that the analysis of the context of use provides valuable insights into customers’ needs and expectations and should be considered in the definition of product visions and project scopes ([16], [22]). On the other hand, usability experts (e.g. in the QIU Reference model [8]) and related ISO standards (DIN EN ISO 13407 [7], ISO/TR 18529 [13], and ISO/PAS 18152 [12]) point out that the interests and needs of the user groups that will work with the products should be considered throughout the entire product lifecycle, “from the cradle to the grave”, to thoroughly ensure and enhance the ease of use and usability of interactive products.
3 Comparison of Focus and Methods
The following comparison identifies activities within both processes which need to be aligned to assure and increase both the customer and the user acceptance of
R. Hofer, D. Zimmermann, and M. Jekal
Fig. 1. The four activities within the PPD and subsequent UCD phase
products. To ease the comparison, both processes are divided into four steps, namely “Analyze Context”, “Specify Requirements”, “Produce Concepts” and “Evaluate Concepts”. This sequence is in line with the iterative human centered design steps [7] and customer centered approaches to define products [16]. For each step, product definition and UCD activities are juxtaposed to identify opportunities to increase the efficiency of both processes through joint activities and to explore the usage related effects of decisions within the product definition phase. The steps are mapped onto a schematic diagram visualizing the commonly acknowledged sequence from the PPD phase to the UCD phase.
3.1 Analyze Context
Analyzing the Business and Customer Context
Within the business context, product visions and project scopes are defined on the basis of a thorough analysis. This paper focuses on a significant part of the overall analysis activities, namely the identification of “the voice of the customer” [22]. Within this part, significant differences between groups of customers are identified in order to segment markets, and detailed insights about each customer group’s specific current and future needs and expectations are gathered. In the case of product offerings for private customers, information about customers’ geographics, demographics (addressing the social levels and the family lifecycle), psychographics (addressing patterns by which people live and spend time and money) and behavioristics (addressing the customers’ extent of use and loyalty, unrealized consumer needs and the usage situation) ([9], [10]) supports “the process of dividing a market into groups of similar consumers and selecting the most appropriate group(s) […] for the firm to serve” [19] and provides valuable information about the private customers’ motivation to make purchase decisions.
Common sources to analyze customers’ needs and expectations are problem reports and enhancement requests for a current system, marketing surveys, system
Alignment of Product Portfolio Definition and User Centered Design Activities
requirements specifications and descriptions of current or competitive products. These sources are supplemented with interviews and discussions with potential users, user questionnaires, the observation of users at work and the analysis of user tasks [22] to “perform foresight research into potential user groups in order to identify forthcoming needs for systems and new users or user organizations” and to “Identify expected context of use of future systems” [13]. These methods overlap significantly with the analysis methods used in the UCD process.
Analyzing the User Context
UCD processes begin with a thorough analysis of the context of use. The context of use includes “the characteristics of the intended users”, “the tasks the users are to perform” and “the environment in which the users are to use the system” [7]. Additionally, a “competitive analysis” [17] of competitive systems can add valuable information. The characteristics of the intended users include information about their “knowledge, skill, experience, education, training, physical attributes, habits, preferences and capabilities” [7]. This information is summarized in user profiles [14], often represented as Personas ([20], [5]). User profiles help to keep each user group’s specific constraints, abilities and mental models in mind throughout product development. The relevant user goals are captured and analyzed to identify the as-is sequences of tasks that users perform to reach these goals. The usage environment analysis adds information about “the hardware, software and materials to be used [and] the organizational and physical environment” [7]. Information about the environment helps to take restrictions into account and to identify potential opportunities to enhance the product-to-be. Common methods to analyze the context of use are structured on-site visits, structured interviews or interviews using the master/apprentice model [3] with users and customers ([6], [11]).
3.2 Specify Requirements
Business and Customer Requirements
Business requirements set the overall “product vision” and determine the product portfolio modules to be developed. Furthermore, business requirements contain the identified business opportunity, business objectives and criteria, customer and market needs, business risks, scopes and limitations, and the business context, which includes information about the stakeholder profiles. Customers are a subset of the overall stakeholders considered in the definition of business requirements. Business requirements are the basis for eliciting customer requirements for each project. This is done in close collaboration with customers. Customer requirements can be grouped into nine classes, namely “Business Requirements, Business Rules, Use Cases or Scenarios, Functional Requirements, Quality Attributes, External Interface Requirements, Constraints, Data Definitions and Solution Ideas” [22]. Business as well as customer requirements address issues related to the context of use. High-level business requirements “set business tasks (Use Cases) that the product enables” and “influence the implementation priorities for use cases” [22], and project related customer requirements include those Use Cases.
User Requirements
User or Workflow requirements specify how the system should support users in completing their tasks and thus have an impact on the early definition of products and market segments [7]. They are captured in Use Cases that “describe the system behavior under various conditions as the system responds to a request from one of the stakeholders, called the primary actor” [4]. The core element of a Use Case is the main scenario, which lists the flow of interaction to reach a specific goal. This interaction flow is improved into a reengineered shall-be status to “realize the power and efficiency that automation makes possible” and to “more effectively support business goals” [14] and customer requirements. Use Cases are an ideal container to gather all functional requirements necessary to enable a specific user group (primary actor) to reach a specific goal. As products usually enable several distinctly different user groups to reach several goals, Use Cases can be organized into a matrix showing user groups and their respective user goals. This matrix supports decisions concerning the product portfolio elements and project scopes.
3.3 Produce Concepts
Business and Project Concepts
On the business level, a consistent concept is developed under consideration of the business requirements. This process is complex, as there is more than one alternative solution for each component of the concept ([2], [18]). On the product level, customer requirements are consolidated into product definition concepts describing the “Place” variable (referring to a geographic location, an industry and/or a group of people - a segment - to whom a company wants to sell its products or services) and the “Product” variable (addressing a product’s functionality, product differentiation, product shape and the Product Portfolio management) of the “4Ps” of a so-called “marketing mix”.
From a marketing perspective, the “Pricing” and “Promotion” variables supplement the concepts [15]. Methods to systematically derive an optimum configuration of business and product concepts address the visualization of complex requirement interrelations, the production and usage of prototypes and the prioritization of requirements. To deal with the given uncertainties, several concepts are usually derived and evaluated to reduce the risk of misconceptions [22].
User Interface Concept
The conceptual phase within the UCD process deals with two major objectives. The first objective is to organize the identified and reengineered tasks into models that describe the overall hierarchy and interrelations of tasks, considering the user and business perspective. The second objective is to translate these models into a consistent specification of the UI through several iterations. The first iteration focuses on the creation of the “Conceptual Model Design”, which defines “a coherent rule-based framework that will provide a unifying foundation for all the detailed interface design decisions to come” [14]. This framework, visualized in mock-ups, represents the reengineered task models in a more tangible way and can thus support customer-focused evaluation activities.
3.4 Evaluate Concepts
Evaluation of Business and Project Concepts
From a business perspective, evaluation activities address business concepts and product concepts defining the segmentation of markets and the corresponding products. These concepts are reviewed with customers (usually specific registered customers of the company), relevant stakeholders and domain experts [22]. Customer requirements are evaluated with customers to get feedback on how to adjust concepts and which concept to choose. Feedback on product concepts is gained through surveys, focus groups, reviews and structured interviews with potential and known customers. When several concepts are to be compared, benchmarking methods such as the KANO method or the Conjoint Analysis method [16] are used to identify promising project concepts and marketing mixes. These methods are based on the assumption that customers are able to explain and predict their thinking and behavior [20]. They can be supplemented by methods to gain insights about the 95% of thinking that takes place in customers’ unconscious minds and strongly affects their purchasing behavior [23]. Additionally, launching products with a limited area of circulation or limited functionality (single modules, beta versions) provides early feedback from the marketplace.
User Centered Evaluation
One of the basic principles of UCD is to develop human system interfaces in iterations to decrease the chance of costly changes and revisions at late stages of product development [22]. With this approach, the risk of unforeseen obstacles that might result from reengineered task sequences, task models and UI concepts can be reduced, and initially undetected issues concerning the users’ needs and expectations can be considered at an early stage of UI development. There are two types of UCD evaluations. Summative evaluations (e.g.
usability tests, benchmarks and reviews) aim at the final assessment of products, whereas formative evaluations are conducted continuously to support decisions concerning UCD concepts within the process. As this paper discusses mutual benefits in joint customer centered and user centered activities in the early “cradle” step of product development, the formative UCD evaluation is of foremost interest. The methods used for formative evaluations at this point of product development are collaborative reviews, expert reviews, validations with users and customers and focus groups. Formative evaluations confirm intermediate results within the process and identify potential areas for optimization or correction.
4 Mutual Benefits As shown, the methods used within product development overlap with methods used in UCD activities. This overlap can be a promising starting point to reduce time and effort (the two basic metrics for efficiency) within product development.
Fig. 2. Promising areas for collaboration within the PPD and subsequent UCD phase
The second advantage of conducting Product Portfolio Definition and UCD activities simultaneously is the opportunity to explore the effects of PPD activities on the context of use within the PPD phase. This feedback is a valuable basis for adjustments within each of the PPD steps, enhancing the reliability of all subsequent steps and reducing cost-intensive change requests in subsequent PL phases. It enables the product definition team to adjust analysis plans, requirement specifications, concepts and evaluation focus accordingly. In the following, we summarize all potential areas of collaboration. The areas are mapped onto the schematic diagram (Figure 2) visualizing the PPD and subsequent UCD phase, introduced in Section 3.
a) Joint Analysis and Customer Selection
The identification of relevant customer and user segments for analysis activities can be simplified by collaboration between business and user analysts. Business analysts can use the user groups described in Personas to segment markets ([20], [22]), which leads to a significant reduction of the set of customers to be investigated [16]. On the other hand, “ethnographic interviewers should use market research to help them select interview partners” [5] and derive user groups [20]. Some of the main methods used to analyze the characteristics of target customers are equally used within the UCD process to gain insights about the characteristics of
the intended users, their goals and the environment in which the users are to use the system. A simultaneous analysis approach could therefore reduce time and effort. The relevant interview partners can be interviewed jointly, adding valuable mutual insights. As stated by Cooper, “data gathered via market research and that gathered via qualitative user research complement each other quite well” [5].
b) Exploring User Requirements for Product Definition
First (jointly analyzed) insights about customers and users can be utilized by UCD activities to “perform foresight research into potential user groups in order to identify forthcoming needs for systems and new users or user organizations” [13], which can be used as a basis for a product modularization oriented toward user groups and goals and for the identification of “technology capabilities that would be needed” [21]. The UCD methods to translate user goals into meaningful Use Case requirements can be utilized in PD to “Identify expected context of use of future systems” [13]. Use Cases fill the “Use case or scenario” class of the customer requirements derived within PD [22]. Furthermore, early insights about the expected context of use can reveal analysis data about customers that is missing from the customer analysis step.
c) Joint Requirement Specification
Business requirements “determine both the set of business tasks (use cases) that the application enables” and “influence the implementation priorities for use cases and their associated functional requirements” [22]. Within the requirement elicitation phase of PD, analysts elaborate customer and user statements into general customer requirements. Some of these requirements address statements concerning user goals or business tasks that users need to perform.
UCD methods can be utilized to condense these requirements in the form of Use Cases, which cluster all product requirements necessary to fulfill a certain user goal in one single requirement and can thus reduce the complexity of the requirements to be considered [17]. In the requirement phase of the UCD process, task sequences are reengineered to optimally achieve the identified business goals. These UCD reengineering activities allow improved workflows and changes in users and tasks to be considered within the PD phase.
d) Explorative Concepts
Usage related product requirements can be translated into first conceptual models and mock-ups. Especially in the context of private customers, these mock-ups can be used in the requirement elicitation phase to get early customer feedback and adjust the requirements accordingly.
e) Joint Conception
In the concept phase of PD, several marketing mix concepts are derived to identify the best mixture of all variables of the product offering. Joint conceptualization activities make it possible to see the effect of trade-off decisions in the marketing mix immediately and to adjust the marketing mix concepts accordingly. Furthermore, a simultaneous creation of first conceptual UI models increases the real-world character of the marketing mixes to be evaluated with customers and users.
f) Explorative Evaluation of Usage Related Components of the Marketing Mix
Explorative evaluation efforts to “assess the significance and relevance of the system to each stakeholder group which will be end users of the system and/or will be affected by input to or output from the system” [13] provide early feedback in the context of use. Marketing mix concepts can be evaluated up front by UCD activities based on the first set of user requirements to allow usage related concept adjustment within the PD phase.
g) Joint Evaluation
UCD processes offer appropriate methods to evaluate the (high-level) usability of product concepts. Furthermore, UI mock-ups derived within the UCD processes help to communicate the product part of marketing mixes to customers and users within review and evaluation sessions.
h) Positive Influence on Schedule, Budget, Resources and Quality
The alignment of PD and UCD activities reduces time and effort, allows each side to draw on the other’s expertise and increases product quality and thereby the predictability of product acceptance by customers and users.
5 Summary
This paper identified opportunities to improve the alignment of PPD and UCD activities. It offers a basis for the discussion of how these joint activities can be embedded into established product development processes. Considering the specific requirements of users within the Product Portfolio Definition increases the user acceptance of future products and helps to smoothly implement the UCD process into the overall Product Development:
• The users’ acceptance of future products is considered from the beginning and leads to strategic product portfolios aiming at high-level user goals.
• As UCD activities can start earlier in the product development process, the time necessary to analyze the context of use in subsequent process steps is reduced.
• The simultaneous customer and user focus enhances the shared understanding and awareness of business and user goals across development teams early in the project development process.
• Feedback about the user acceptance of portfolio definitions is provided early in the process, which enables the adjustment of product portfolios within the first process steps and thus reduces extra costs of change requests in subsequent steps.
References
1. American Marketing Association: Dictionary of Marketing Terms. Retrieved (February 16 2007), from http://www.marketingpower.com/mg-dictionary-view435.php
2. Becker, J.: Marketing-Konzeption. Grundlagen des ziel-strategischen und operativen Marketing-Managements. 8th edn. München, Vahlen (2006)
3. Beyer, H., Holtzblatt, K.: Contextual Design. Defining Customer-Centered Systems. Morgan Kaufmann Publishers, San Francisco, CA (1998)
4. Cockburn, A.: Writing Effective Use Cases, vol. 1. Addison-Wesley, Boston, MA (2001)
5. Cooper, A.: About Face 2.0., vol. 53. Wiley Publishing Inc, Indianapolis, US (2003)
6. Courage, C., Baxter, K.: Understanding Your Users. A Practical Guide to User Requirements [...]. Morgan Kaufmann Publisher (Elsevier), San Francisco, CA (2005)
7. DIN EN ISO 13407: Human-centered design processes for interactive systems. Brussels, CEN - European Committee for Standardization vol. 9(10) (1999)
8. Earthy, J., Sherwood-Jones, B.: Quality in use processes and their integration - Part 1 Reference Model. Lloyd’s Register of Shipping, London (2000)
9. Engel, J.F., Blackwell, R.D., Miniard, P.W.: Consumer Behavior. The Dryden Press, Chicago (1990)
10. Evans, M., Jamal, A., Foxall, G.: Consumer Behaviour. John Wiley & Sons Ltd, West Sussex, England (2006)
11. Hackos, J.T., Redish, J.C.: User and Task Analysis for Interface Design. John Wiley & Sons, Inc, USA (1998)
12. ISO/PAS 18152: Ergonomics of human-system interaction - Specification for the process assessment of human-system issues. ISO, Geneva. 8, 9, 11 (2003)
13. ISO/TR 18529: Ergonomics - Ergonomics of human-system interaction - human-centred lifecycle process descriptions. ISO, Geneva (2000)
14. Mayhew, D.J.: The Usability Engineering Lifecycle, Morgan Kaufmann, San Francisco, pp. 172, 174, 188 (1999)
15. McCarthy, J.: Basic Marketing - A managerial approach. Irwin, Homewood, IL (1960)
16. Mello, S.: Customer-centric product definition. Amacom, New York (2002)
17. Nielsen, J.: Usability Engineering. Academic Press, Boston (1993)
18. Nieschlag, R., Dichtl, E., Hörschgen, H.: Marketing. 18th edn. Duncker & Humblot, Berlin (1997)
19. Peter, J.P., Olson, J.C.: Consumer Behavior and Marketing Strategy, p. 378. McGraw-Hill Higher Education, Boston (2002)
20. Pruitt, J., Adlin, T.: The Persona Lifecycle. Keeping People in Mind Throughout Product Design. Morgan Kaufmann Publishers (Elsevier), San Francisco, CA (2006)
21. Sengupta, U., Sherry, J.: Future vision 2015: Building a User-Focused Vision for Future Technology. Technology@intel Magazine (9/2004) (2004)
22. Wiegers, K.E.: Software Requirements. In: Practical Techniques for gathering and managing Requirements [...], 2nd edn. Microsoft Press, Redmond, Washington 120, 95,81 (2003)
23. Zaltman, G.: How Customers Think. Essential Insights into the Mind of the Market. Harvard Business School Press, Boston, MA (2003)
A New User-Centered Design Process for Creating New Value and Future
Yasuhisa Itoh 1,2, Yoko Hirose 3, Hideaki Takahashi 3, and Masaaki Kurosu 3
1 U'eyes Design Inc., Housquare Yokohama 4th Floor 1-4-1 Nakagawa, Tsuzuki-ku, Yokohama, Kanagawa-ken 224-0001 Japan
2 Department of Cyber Society and Culture, The Graduate University for Advanced Studies, 2-12, Wakaba, Mihama-ku, Chiba-shi, 261-0014 Japan
3 National Institute of Multimedia Education, 2-12, Wakaba, Mihama-ku, Chiba-shi, 261-0014 Japan
Abstract. This paper presents a new process model of user-centered design that can be applied to the development of new value and future. Recognizing that the widely known conventional human-centered design process defined by ISO13407 is not always effective, we propose a new process model and give an overview of the activities based on this process. The model aims not only at developing new value and future, but also at generating new ideas in concept planning.
Keywords: User-centered design; ISO13407; Developing new value and future; Concept planning.
ISO13407 processes. Here we introduce some specific characteristics of the projects we are focusing on:
• The system’s realization date (the launch date) is in the near future
• The system will make use of technology that is not currently available
• New value is to be added, but there are currently no specific ideas
Development that meets these kinds of requirements does not concern products that have just become available for sale; it is aimed at products or systems that will be released over a period ranging from 2 to 3 years following development up to 5 or 10 years in the future. These products include items that contain entirely new functions or added value, are equipped with a completely new user interface, or fall under the category of a completely new product or service. Realizing these new functions and added value often requires new technology, and a suitable amount of time is needed to develop it. This often means that what is required is not the most recent technology but technology that, while not currently available, will be developed in the near future. In the initial stages of such development the product or service itself is often still being planned, so it becomes necessary to create new ideas regarding new value and to investigate the feasibility of realizing them as part of the product planning process. In this study we introduce a conceptual model for a user-centered design process for a system that is to be realized in the near future and is intended to create new value. The near future is defined here as 2 to 10 years from the current date.
2 Scope of a New User-Centered Design Process
2.1 Scope
Table 1 shows the scope of the proposed process model. The scope is divided along two axes: whether the system being examined (either a product or a service) offers new value, and the proposed realization date of that system. Neither axis can be divided at a precise quantitative boundary, but the table conveys the general concept of how the cases can be distinguished. For the area in Table 1 with no new value and a recent realization date, the ISO13407 process model is thought to be suitable for bringing products to development. Items that have already undergone the product planning stage can be treated under ISO13407 using the “Specific requirements for user-centered planning” defined in the upper-left panel of Figure 1. Once the decision has been made as to whether the product or service needs user-centered design, we think it can then undergo the same process. In contrast, in the development process that is the subject of this study, the user-centered design process starts during the initial product planning process (Figure 2). As a result, the planning process is incorporated as the first of a series of processes.
Table 1. Scope of a new user-centered design process
Columns: realization time for the relevant system (recent events / near future). Rows: new value? (yes / no).
New value: yes; realization time: recent events. Development of new value that has been recently realized. *Suitable for application using the proposed process
New value: yes; realization time: near future. Development of new value that will be realized in the near future. *Suitable for application using the proposed process
New value: no; realization time: recent events. Relatively little development of new value that has been recently realized. *Suitable for application using ISO13407
New value: no; realization time: near future. Relatively little development of new value that will be realized in the near future. *Suitable for application using the proposed process
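The four cells of Table 1 can be read as a simple decision rule. The following is only an illustrative sketch of that reading: the function name suitable_process and the two boolean parameters are hypothetical and do not appear in the paper, and the paper itself notes that the two axes cannot be divided sharply.

```python
def suitable_process(new_value: bool, near_future: bool) -> str:
    """Return the process that Table 1 marks as suitable for one cell.

    new_value:   True if the system is meant to offer new value.
    near_future: True if the realization time is the near future,
                 False if it is recent.
    Both flags are hypothetical simplifications of the table's axes.
    """
    # Only the cell "no new value, recent realization" is marked
    # as suitable for ISO13407 in Table 1.
    if not new_value and not near_future:
        return "ISO13407"
    # The three remaining cells are marked for the proposed process.
    return "proposed process"
```

For example, a system without new value that is planned for the near future falls in the lower-right cell, so suitable_process(False, True) selects the proposed process.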
Fig. 1. Process of human-centered design activities
2.2 The Process Model
Figure 2 shows a general outline of the conceptual model for the user-centered design process that we propose here. In contrast to the conventional ISO13407 process model outlined in Figure 1, three processes have been added: 1) user-centered planning, 2) study and prediction of future circumstances, and 3) selection and creation of new value.
1) The user-centered design process differs from development in which the planning has already been decided: it begins at the initial product planning stage. As the product planning that takes place here includes a user-oriented philosophy, we named this process user-centered planning.
Fig. 2. A new user-centered design process for creating new value and future
2) The study and prediction of future circumstances is a necessary process for envisaging the relevant product or service at its actual realization time in the near future. If the development period is within several months or 1 to 2 years from the present, it can be assumed that circumstances and users will be virtually unchanged, so development can proceed on the basis of current conditions. In contrast, if the realization period is in the near future (anticipated as being between 2 and 10 years ahead), a wide variety of factors is likely to change in this period, including the currently available technology, and future users cannot be expected to have the same needs and requirements as current users. In this case it is necessary to study future circumstances and global changes and to attempt to predict the characteristics of potential future users, together with the anticipated conditions for the relevant product or service. The study and prediction of future circumstances is therefore an integral part of the proposed process.
3) The process of creating and selecting new value is also connected to user-centered planning. If new value is one of the requirements of this planning process, coming up with new ideas is an essential element. Ideas that are subsequently found to be of high value and feasible to implement can be used as the basis for refining the product plan. To achieve this, however, it is first necessary to develop a number of creative ideas. This involves generating a number of ordinary ideas and then choosing the best of them as the basis for refining the product plan.
This element of generating and selecting new ideas is an important factor in the user-centered design process.
2.3 The ISO13407 Process Model and Its Application
In addition to the 3 processes outlined in section 2.2, the proposed model contains processes that have much in common with those of ISO13407. The content of each of these processes, however, has undergone some change and expansion, and is touched on in chapter 3.
2.4 The Life Cycle of ISO/IEC15288 and Its Application
Table 2 shows the system life cycle stages of ISO/IEC15288 [2]. We consider our proposed process (Figure 2) to correspond to the concept and development stages of ISO/IEC15288. In our proposed process we anticipate that the activities of the concept stage and the development stage will be carried out repeatedly. It is then possible to switch between the concept stage and the development stage; if the result does not fully satisfy the user or the organization, or is unsatisfactory from the planning point of view, the process returns to the previous stage and the overall process is repeated.
Table 2. System life cycle stages and purposes [2]
LIFE CYCLE STAGE: PURPOSE
CONCEPT: Identify stakeholders’ needs; Explore concepts; Propose viable solutions
DEVELOPMENT: Refine system requirements; Determine system components; Build system; Verify and validate system
PRODUCTION: Mass produce system; Inspect and test
UTILIZATION: Operate system to satisfy users’ needs
SUPPORT: Provide sustained system capability
RETIREMENT: Retire; archive or dispose the system
DECISIONS: Execute next stage; Continue this stage; Go to previous state; Hold project activity; Terminate project
3 Proposed Process Activity 3.1 User-Centered Planning Product planning is an essential element of the process to develop the relevant service or product. Although product planning does give rise to technology-driven planning in a number of cases, this process adopts a system that doesn’t rely solely on
technology but also involves planning that takes into account the perspective of the users who will actually use the system. Based on this planning, the realization date for the service and the planning-side requirements for creating new value can then be determined. The process model described here is mainly intended for systems whose realization date is in the near future and which require the creation of new value. It can, however, also be used when the creation of new value is required but the realization date is closer to the present, or when the realization date is in the near future but there is no demand for the creation of new value. In such cases some parts of the process are not required: in the former case there is no need to study and predict future circumstances; in the latter case the process for creating and selecting new value becomes redundant.

3.2 Understand and Specify the Context of Use

The proposed process also involves a survey and analysis of users. The subjects of the survey should be the anticipated users of the system. Note, however, that when the system is to be realized at a future date, its future users cannot be surveyed. If the proposed realization date is 1 to 2 years after the planning process, the survey can be carried out on the assumption that future users will not be noticeably different from current users. If the current use and users of the system or product are unclear, it will be difficult to develop a system for the future, which means that it is essential to survey current users. The results of this survey can be used as requirement definitions for the system and as important source data for creating new value.
3.3 Study and Prediction of Future Circumstances

When the aim is to implement the system at a future date, it is first necessary to study and predict future circumstances. While it is impossible to predict the future completely, it is possible to survey and predict, as far as possible, the aspects of the future that relate to the development of the system and its targeted users. If the proposed realization date is only 1 to 2 years after the initial planning process, future users can be expected not to differ noticeably from current users, and there should be no significant changes [7]. Events that are expected to change can be predicted quantitatively by extrapolating from previous data [7]. It is nevertheless important to remember that considerable change can still occur in new technology, products or services, and that the rate of usage or adoption of the relevant services or products is also subject to significant change. If the realization date is in the near future (roughly 2 to 10 years after the initial planning process), significant change is to be expected between present and future conditions, which makes the ability to predict the future valuable. Although it is impossible to
completely predict the future, applying the principles of scenario planning makes it possible to portray a number of different scenarios for the future [4], [5]. To make these predictions it is first necessary to clarify the items that relate to future change and to identify the principal factors that contribute to change on a global scale [4], [5]. With these factors as the core, a number of different possible futures are then considered. Figure 3 shows a conceptual diagram of futures that have a high potential of actually coming about. There is more than one future with a high potential of being realized, which means that several scenarios with different prospects for the future should be drawn up. The future scenarios drawn up here include stages showing each of the specified requirements, and they also serve as important source data for creating new value.
Fig. 3. Model of future scenario planning
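As a rough illustration of how several futures can be derived from a small set of principal factors, the sketch below enumerates every combination of factor outcomes. The factor names and outcomes are invented for illustration and do not come from the paper or from [4], [5]; in practice only a few of the more plausible combinations would be developed into full scenarios.

```python
from itertools import product

# Hypothetical principal factors, each with a few plausible outcomes.
factors = {
    "network_coverage": ["patchy", "near-universal"],
    "device_cost": ["high", "low"],
    "user_age_profile": ["similar to today", "older"],
}


def build_scenarios(factors):
    """Return one candidate scenario per combination of factor outcomes."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


scenarios = build_scenarios(factors)
print(len(scenarios))  # 2 * 2 * 2 = 8 candidate futures
for scenario in scenarios[:3]:
    print(scenario)
```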
3.4 Specify the User and Organizational Requirements, and Future Circumstances

This process uses the results of user-centered planning, of understanding and specifying the context of use, and of the study and prediction of future circumstances to extract the requirements for the system and to describe them in text form. Beyond the requirements specified in the ISO13407 process, it is also necessary to specify requirements from the initial planning stage and requirements concerning future circumstances. The usability requirements for the system and the conditions required of its functions are the same as conventional requirements. Particular attention should be paid to the fact that the results of the study and prediction of future circumstances can define which technologies will be usable, and which will be unsuitable, at the future realization date. These also act as constraints on the creation of new value. In order to choose among a wide variety of ideas when creating new value
it is necessary to define a rating scale for ideas, and this rating scale can be developed from the requirements for new value.

3.5 Creation and Selection of New Value

This process is the distinctive element of the process model, and it is essential whenever the product planning stage calls for the creation of new value. New value here does not mean a few minor changes to the product or a routine model change, but rather the introduction of completely new functions, a new user interface, high added value that did not previously exist, or a product or system based on new findings. Realizing such new value requires a range of creative ideas, which are usually produced through brainstorming sessions or individual thinking by product planning and design staff. The most appropriate of these ideas are then selected and a concept is generated from the best ones. This is ultimately compiled as part of the product plan. The process uses the ideas of user-centered design and the results of user surveys and analysis, as well as predictions of future usage, the future world, future markets, and future users, as a basis for creative thinking and idea development. We plan to explore specific methods for creative thinking in a separate study and publication. Once a broad range of ideas has been generated, the ideas are evaluated quantitatively against the rating scale developed in section 3.4, and the results are used as the basis for selecting the most appropriate ideas.

3.6 Produce Design Solutions

The requirements, including the selected ideas, can then be used to design and develop a range of solutions.
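The quantitative selection step of section 3.5, which yields the selected ideas, could look something like the following. This is a minimal sketch under stated assumptions: the paper does not specify the form of the rating scale, so a weighted sum over criteria is assumed here, and the criteria, weights, ratings and function names are all hypothetical.

```python
def score(idea_ratings, weights):
    """Weighted sum of an idea's ratings over the criteria of the scale."""
    return sum(weights[c] * idea_ratings[c] for c in weights)


def select_ideas(ideas, weights, top_n):
    """Rank ideas by score (highest first) and keep the best top_n."""
    ranked = sorted(ideas, key=lambda name: score(ideas[name], weights),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical rating scale: criteria derived from the requirements for
# new value, each with a weight; ratings run from 1 (poor) to 5 (good).
weights = {"user_value": 0.5, "novelty": 0.3, "feasibility": 0.2}
ideas = {
    "idea_a": {"user_value": 4, "novelty": 2, "feasibility": 5},
    "idea_b": {"user_value": 5, "novelty": 4, "feasibility": 2},
    "idea_c": {"user_value": 2, "novelty": 5, "feasibility": 3},
}

for name in ideas:
    print(name, round(score(ideas[name], weights), 2))
print(select_ideas(ideas, weights, top_n=2))
```

With these invented numbers idea_b ranks first and idea_a second. Other aggregation rules (for example minimum thresholds per criterion) would fit the description in the text equally well.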
At this point we recommend creating several different prototypes of the product. Because this process includes an element of the concept stage, it is carried out on the planned ideas that were selected as having a high feasibility of being implemented. The processes of user-centered planning, specifying the requirements, creating and selecting new value, and producing design solutions may in some cases be carried out repeatedly and simultaneously, and they are not necessarily carried out in the order indicated by the arrows in Figure 2. This is the same as in ISO13407 [1], [3].

3.7 Evaluate Designs Against Requirements

The product or system prototypes created by the previous process are evaluated in this process. The evaluation is essentially carried out with the anticipated users of the product or system, using usability tests and user tests. These differ from regular tests, however, in that the anticipated users are users at some point in the future. Although it is impossible to test future users themselves, it is possible to test subjects who are expected to be relatively close to such future users.
To evaluate systems that will be used in the future, it is preferable to test progressive users of the product or system rather than regular users [6].
4 Conclusion

We have introduced a conceptual model of a user-centered design process for systems that involve the creation of new value and that will be realized and implemented in the future. Although efforts to develop systems using this conceptual model are already underway, these systems are currently in use and we have yet to see clear results from these efforts. We plan to investigate the effectiveness of this process further and to continue using the process model as part of the development process.
Acknowledgement. We presented the first draft of this paper at HIS2004.
References

1. ISO13407: Human-centered design processes for interactive systems (1999). JIS Z 8530: Human-centered design process for interactive systems (2000)
2. ISO/IEC 15288: Systems engineering - System life cycle processes (2002). JIS X 0170: Systems engineering - System life cycle processes (2002)
3. Kurosu, Hirasawa, Horibe, Miki: Understanding human-centered design processes for interactive systems. Ohmsha (2001)
4. Schwartz, P.: The Art of the Long View. John Wiley & Sons (1997). Translated as Shinario puraningu no giho (Scenario Planning Techniques) (trans. Taomoto and Ikeda), Toyokeizai (2000)
5. Teramoto, Yamamoto, Yamamoto: Advanced Evaluation of Technology. Nikkei BP (2003)
6. Holmquist, L.E.: User-Driven Innovation in the Future Applications Lab. In: Proc. CHI2004, pp. 1091–1092 (2004)
7. Sherden, W.: The Fortune Sellers: The Big Business of Buying and Selling Predictions. Diamond (1999)
The Evasive Interface – The Changing Concept of Interface and the Varying Role of Symbols in Human–Computer Interaction

Lars-Erik Janlert
Department of Computing Science, Umeå University, Sweden
[email protected]
Abstract. This is an analysis of the changes the concept of interface is going through in the shift from the currently dominating virtuality paradigm of use to two new use paradigms, namely ubiquity and mobility; an analysis of the concomitantly shifting role of symbols in relation to the user and to the world; ending with an attempt to identify and analyze important research issues in the new situation that arises, two of which are to better understand the various ways different kinds of interface symbols can link to their real-world referents, and how to combine tracking reality with supporting the user’s own thinking.
1 Changing Paradigms of Use, Changing Notions of Interface

There is enormous diversity in the ways modern information technology (that is, computer, telecommunication and interface technology¹) has been put to use. Narrowing down to uses that would normally count as involving a “user” and falling within the field of study of human–computer interaction (HCI) still leaves a very great variety. On a high level of abstraction it is possible to discern general, broadly characterized forms of use, which may be helpful in identifying and understanding long-term trends and important challenges ahead. Often, specific technological advancements (e.g. in display or telecommunication technology) play a major role in determining new forms of usage, but there is also considerable inertia in a well-established form of use, striving to assimilate technological changes while retaining basically the same form. In this paper three of the most important paradigms of use in the last decades will be identified and examined: one older and well established, the virtuality paradigm; and two new ones, which are rapidly gaining ground theoretically as well as in practical applications, the ubiquity and the mobility paradigms. The purpose of this analysis is to draw some conclusions from the changing notion of interface and to identify some central research issues that arise as a consequence of the ongoing paradigm shifts.

¹ Usually just “information technology” (IT) or “information and communication technology” (ICT), as if deliberately ignoring the fact that such technologies (by definition) have been around since the beginning of history.

The choice of the term “paradigm” in this context is inspired by Thomas Kuhn’s famous notion of scientific paradigms [9]. A use paradigm comprises important design examples, use scenarios, specific techniques and technologies, specific views on key concepts, such as what a “user” is, and what goals to pursue in HCI, and, not least important, groups or communities of people (researchers and interaction designers) developing and defending the paradigm. Unlike the scientific paradigms in Kuhn’s understanding of scientific development, however, new paradigms in HCI seldom completely replace old ones; even if a new paradigm becomes predominant over time, older paradigms can find niches where they survive. In this manner several use paradigms can coexist and come to be seen as complementing each other.

Shifting paradigms of use imply shifting notions of interface. The interface concept in HCI has emerged from a variegated background: the precise physical specifications of components necessary for industrial mass production and assembly, the control panels and steering devices of complicated engines, and, of course, software interfaces between different parts of complicated programs. Within HCI, the interface concept has developed into a complex and multifaceted notion, and the development with regard to the three chosen use paradigms will specifically be studied here. The changing role of symbols is of particular interest.

1.1 The Role of Symbols in Human–Computer Interaction

Human–computer interaction, as it is usually understood (which includes the three paradigms examined here), invariably involves the use of symbols, in the general, technical sense of the term [1]. Symbols are being used to represent input, control settings, system status, events, ongoing processes, available resources, possible user actions, results, outputs, etc., all for the benefit of the user. The earliest use of computers was as advanced calculating machines.
Since then, not only have there been important changes in the kinds of symbols used, but also in what they are used to refer to, how the user envisages the relations between self, symbol and referent, and how these relations are upheld, in the abstract and concretely. Broadly speaking, in this context symbols may serve three general purposes: as a means for the user to access and acquire information; as a means for the user to supply information, including for the special purpose of achieving certain ends; and as an instrument simplifying, supporting or extending the user’s own thinking. The first two are easy enough to understand: symbols are used for output; they are also used for input, data as well as control. The third purpose, supporting cognition, is less obvious but of special interest here.

1.2 Cognitive Artifacts

Donald Norman [12] introduced the concept of a cognitive artifact, defining it as “artificial devices that maintain, display, or operate upon information in order to serve a representational function and that affect human cognitive performance.” Computer applications that normally concern HCI are generally cognitive artifacts. But there are two different senses in which a cognitive artifact can assist thinking, related to the distinction Norman makes between “personal view” and “system view.” One sense is that it can substitute for (parts of the) thinking in performing a certain task. An example would be the pocket calculator. The user doesn’t have to do all the thinking
involved in the usual routine for multiplying numbers using pen and paper. The pocket calculator requires the user to press the right buttons to input the right numbers (watching out for errors) and the desired operation, and to read off the result, too, but that is very much the same as in the pen-and-paper version. Norman would say that the task for the user has changed, the “personal view” has changed significantly, whereas from the “system view,” the result should be the same but delivered faster and possibly with fewer errors. The other manner in which a cognitive artifact can assist in the cognitive work of the user is well illustrated by the manual method of multiplying numbers using cognitive artifacts such as pen, paper, and mathematical notation such as numerals and arithmetical operator symbols. In many cases, these two senses can be two different aspects of the same artifact. For this to happen, it is important that the symbols employed in the interface are chosen with care: they should raise the level of abstraction in such a manner that they really support fruitful higher-level thinking on the user’s part. We should not underestimate the extent to which computer applications can support users in their own thinking, serving as cognitive artifacts in the second sense. Spreadsheet applications and word processors are typical examples of a range of common applications where the support for the user’s own thinking is about equally important as supporting the user in producing results.

1.3 Thinking Versus Doing and Perceiving

To be able to claim that thinking is taking place, it is of some importance that a distinction between thinking and doing can be maintained, even if it is relative and at a symbolic level itself: e.g.
to entertain the possibility of X should not be tantamount to causing X.² Thinking by “doing” is certainly sometimes a possibility, for example when we think about how to best lay the table by trying out different placements of plates, cutlery, glasses, etc. It is reported that in playing the game of Tetris, contrary to what one would expect, as users become more skilled they increase rather than decrease the number of “epistemic” actions, i.e. actions performed to uncover or speed up access to information, compared to “pragmatic” actions, i.e. actions performed with a purpose to put the current piece in its chosen place and orientation [8, 10]. In some circumstances it seems difficult to say whether an action is part of the thinking preceding the “real,” effective action or the effective action itself, until after the fact. Computer applications supporting undo encourage such tentative actions, but if the ultimate purpose of the application is some real-world implement or effect, we can still see it as a (productive) play with symbols; at least as long as the symbols are easier to change than their (ultimate) referents, and failures are less devastating at the symbolic level than at the referent level. In considering various hypothetical stages of thinking in evolution, Daniel Dennett arrives first at what he calls the “Popperian creature,” which, as Karl Popper succinctly put it, “permits our hypotheses to die in our stead,” and then at the “Gregorian creature,” named after Richard Gregory, which is also able to take cognitive shortcuts by importing “mind tools” from the environment [1].

² Compare Hegel’s remark in Lectures on the philosophy of history, that whereas animals cannot interpose anything between an impulse and its satisfaction, human beings have the ability to arrest an impulse and reflect on it before letting it pass into action [5].
Mainstream cognitive science has been attacked from different quarters for attaching too much importance to thinking with the help of symbols. Within HCI there have been several attempts to rectify the predominance of interaction through explicit symbols, by investigating alternatives in the direction of rich perceptual experiences and complex physical actions, which presumably make better use of natural human capabilities to interact. An influential case in point is the concept of affordance brought in from ecological psychology and adapted to HCI by Donald Norman [14]. Affordances, in Gibson’s original version, are not symbols (possibly they might count as indices in Peirce’s taxonomy of signs); they are rather perceptual cues that trigger responses, behaviors [3]. Still, it is one thing to perceive that a button invites pressing, another to know what the effect will be, and when and for what purpose it is appropriate to perform the action. In the pedagogical examples Norman likes to use, such as operating doors and water taps, the function of the artifact is severely limited and well known: just about the only thing you expect to be able to do with a door is to open and close it, so if you perceive a button-looking feature that invites pushing, you can reasonably infer that pushing the button will either open or close the door. Going further in the direction of tangible user interfaces, consider computerized artifacts that lack a dedicated symbolic interface, e.g. a computerized chair that adapts to your body, interprets your spontaneous, small movements, learns and remembers your favorite positions, wakes you up when you fall asleep, makes you change your posture when you have been sitting too long in the same position, etc. It may be an academic question whether this is really HCI, but researchers and designers will have to deal with such cases. At this time, however, none of the paradigms studied here seems to include artifacts of this kind.
2 The Virtuality Paradigm

In what may be called the virtuality paradigm, the interface is a means for the user to access a different and symbolic world. This is the use paradigm that has become so common and dominating that we are hardly aware of it. The user ultimately wants to get through the interface, partly (as in the typical graphical user interface, GUI) or completely (virtual reality), into that other world. Transparency is commonly seen as an ideal. In engaging with the virtual world, the user more or less shuts out the real world and the specific situation of use; it rather disturbs the interaction and task performance. Maintaining links and relations between the symbols and the real world is the responsibility of the user and the service provider: mapping real-world regularities and states of affairs into symbolic models, and interpreting and mapping symbolic results back for application in the real world. This arrangement puts the user in the position of a middleman: streams of information pass through the user in both directions; the user easily becomes a bottleneck, exhausted and confused by the traffic, afflicted by information and communication overload. Although taking its name from virtual-reality technology (VR), which may be said to have as its ideal the complete immersion of the user in an alternative, virtual world appearing as real to the user as the real world, the virtuality paradigm not only
antedates VR, but also GUIs.³ The “other” world accessed through early textual interfaces, before the advent of graphical user interfaces, was also a symbolic world, typically consisting of mathematical models and data about the real world. It was a rather abstract “world,” usually lacking spatiality and shape, in some sense comparable to the world evoked by a book. GUIs transformed these abstract and spatially weak symbolic models into what could more properly be called worlds, directly accessible to the user’s perception, in the process also replacing the previous conversation model of interaction with the acting-in-a-world model. In some sense this parallels the step from book to motion picture.

Interface concept. The interface provides access to a different and symbolic world, whether the means are textual or graphical (or involve other modalities). The interface is something the user wants to reach or get through, to engage in the virtual world behind. Graphical interfaces open for a more vivid interpretation of “world,” and the interface can be viewed literally as an opening.

Use scenario. The user accesses or enters the virtual, symbolic world via the interface in order to perform some operations in the world, to retrieve information, to update and develop. In many cases this is done in order to support some real-world activity: tasks arise in the real world; the user enters the virtual world beyond the interface, for help and assistance, mentally more or less leaving the real world (since it is difficult to engage in more than one world at a time); and eventually returns to the real world with an answer. In preparation for future uses, the user may also learn facts about the real world, and enter the virtual world to record or modify the facts or change the model. Whereas it is hard to find examples of virtual worlds that bear absolutely no relation to the real world, some uses are undeniably more escape from, than support for, the real world.

Symbols.
In most cases the symbolic world thus represents aspects of the real world even if large parts can be hypothetical, counterfactual, even fantastic. The task of keeping track of which real-world referents the symbols have, and what status they have, falls on the user and the service provider (maintaining the basic model, updating variable data). The situation of use is not linked to the model world. When engaged in using the application, the application is basically the only means of accessing the real world, which usually means a rather abstract, alienated view of the world, with little chance to verify that the virtual world gives a correct picture of the real state of affairs, especially since the user is in principle cut off from the real world by the very way the interface concept works.

³ Similar to how computer graphics has had as its longstanding ideal the ability to produce pictures qualitatively indistinguishable from photographs of any actual or imaginable real-world scene.

2.1 Mixed Reality

Leaving the purely virtual approach where symbols are unaffected by the real world, there are now many applications in which virtual-world elements are causally coupled to real-world counterparts. Some actions in the virtual world have real-world effects; they are not just symbolic actions. Some real-world changes are reflected in virtual-world updates. This is part of the idea of cyberspace as interpreted by, among others, Benedikt [1]. By this move users are somewhat relieved in their role as mediators. Information can bypass the user. Some tasks can be completely automated, taking the user out of the loop completely. Typically though, real-world feedback to the user through the interface is weak and abstract, giving the user a feeling of unreality (as e.g. in computerized warfare). In lifting part of the responsibility of connecting symbols to reality off the user, the overview of consequences and quality of control may suffer. In the case of more radical forms of mixed reality, like augmented reality, the user may face a single world that is a fusion of real world and virtual world, where it potentially may become difficult to distinguish what is real and what is just a symbol, or perhaps even to insist that the distinction still exists. There are two types of augmentation. The first is to superimpose extra information (normally inaccessible to the user’s senses) about the real world on top of the real-world elements it is about, producing a kind of “annotated reality.” The second is to introduce elements, components, aspects that are simply non-existent, fictional, in relation to the real, actual world. The first type of augmentation is less problematic as long as the extra symbols are easy to distinguish as such (e.g. textual annotations); the second kind is more problematic: it is what may turn this into a kind of “magic reality,” where you might become uncertain whether you can walk through that wall or not. Of course, it is not easy to freely mix fantasy with hard reality if the basic requirement is that reality is perceived as such and as it is.
This branch of the virtuality paradigm is not so well developed yet—it clearly needs the addition of mobility to become more than very locally realizable—so it will have to await further analysis, but potentially there is a whole new use paradigm hidden here, just waiting for the right technology: efficient, comfortable, and cheap. One interesting technical possibility is hand-held VR [6].
3 The Ubiquity Paradigm

If the old idea was to put a world into the computer, the new idea is to put the computer into the world of real objects and environments. In what may be called the ubiquity paradigm, ubiquitous computing and computer artifacts divide the traditional interface into a multitude of individual thing and environment interfaces. The computer artifact is reality, and the interface is a way to use and control the real thing. This is a notion of interface more in line with traditional industrial design: an envelope of the object, negotiating between inner and outer environment, as elaborated by Herbert Simon [16]. Whereas the virtual approach is arbitrarily free relative to the real world, the ubiquitous approach tends to be earthbound, welding symbol and object together, as in the notion of the object symbol introduced by Donald Norman and Edwin Hutchins [15, 14]. In the more environmentally oriented areas of ubiquitous computing, such as calm technology, introduced by Mark Weiser [17], the unobtrusiveness and even invisibility of the interface is emphasized. The interface can signal real-world states of affairs, but it should not be in the form of proper symbols, rather like indexical signs in nature (e.g. smoke or smell of burning indicates fire).
Interface concept. The interface is the surface of a real, clearly distinguishable physical object, which it covers and is the means of controlling. The “invisibility” ideal, the interface as something the user should not have to think about, is an ideal of superficiality (everything of importance is on the surface), which is complementary to the transparency ideal of the virtuality paradigm.

Use scenario. Users use facilities on site, wherever they happen to be, use objects and devices where they are present, for purposes that are pertinent to the situation of use. Computer artifacts typically have specialized functions (compared to the traditional general-purpose computer), dedicated uses.

Symbols. Symbols are strongly real-world related, more precisely to the real-world situation of use, to the point where symbol and referent threaten to fuse into one entity. There is no reference to a different world. Accessing the symbols is accessing the real world, here and now.

3.1 The Problem with Object Symbols

Three of the most basic expectations we have of symbols are: 1) that they are lightweight and easy to manipulate compared to their referents; 2) that they can be at a distance from their referents; and 3) that they can symbolize states of affairs other than the real and actual. In dropping one or two of these conditions, the third in particular, we also lose some or all of their ability to serve as tools for thinking. They may still work as tools for observing and acting. The notion of the object symbol was put forward to encourage very tight couplings between symbols and referents in HCI, as well as in artifact interaction in general; in many older, mechanical artifacts this tight coupling is already present, and it is seen as an ideal by Norman: “when the object in the artifact is both the means of control (for execution of actions) and also the representation of the object state (for evaluation), then we have the case of an object symbol” [12].
It seems that object symbols violate all three of the above conditions for symbols. By definition they violate the third condition, and thus give poor support for the user’s own thinking: if you cannot represent counterfactual states of affairs, if you do not have the ability to fantasize, you are not, properly speaking, thinking at all. Tracking reality is not thinking. By definition, object symbols also violate the second condition: when objects represent themselves or a larger artifact of which they are a proper part, they cannot be at a distance from their referent. Of course, in the ubiquity paradigm, this is a feature, not a bug. Again, much depends on how cognitively sophisticated the applications and artifacts under consideration are. For example, since we do not use stoves to help us think, perhaps the idea of object symbols might work out fine. Imagine that the knobs of the stove are object symbols: not only can the user control the heat by turning the knob, the current temperature is simultaneously indicated by the current angle of rotation of the knob. Here we see the effect of violating the first condition: if the stove has ordinary electric heaters, the logic of object symbols will require the user to apply torque to the knob for as long as it takes the stove to reach the desired temperature. Not very convenient. And there is another problem: if the symbol really works both ways, how does the user express desired artifact states except by constantly working the controls? What stops the stove from gradually
getting cooler, slowly turning the knob to indicate lower and lower temperature? In many ways it is easier to make interfaces to virtual worlds than to the real world where you cannot adjust the physics to suit the desired logic of the interface.
4 The Mobility Paradigm
Another important new use paradigm is mobility, very much a consequence of mobile computing, using mobile, “untethered,” and (usually) small units, connected through wireless technology. Mobility brings two new scenarios of use: remote operation, which is the main focus of practical applications at present; and, more important, in situ application, which is just beginning to be explored. The latter creates a new kind of situation with regard to the interface. When computer applications are brought to bear directly and dynamically on their very point of use in the real world, precisely where the user is in space and time, the user will need to relate symbols to their real-world referents, which are also present, contingent on real-world location and real-world changes. Contrary to the virtuality paradigm, the real world, and the actual situation of use in particular, is not a distraction but a resource as well as an obviously present target for the use of the application.
Interface concept. The interface concept is not one and fixed. One possible concept ties in with the remote access use scenario, basically inheriting the interface concept of the virtuality paradigm. With regard to the in situ use scenario, the issue of interface concept is interesting but so far unresolved: it is clear that, as in the ubiquity paradigm, the interface must relate closely to the objects and environment at hand in the situation of use; on the other hand the interface must allow access to informational and computational resources not tied to a particular real-world location or time, as in the virtuality paradigm.
Use scenarios. There are thus two use scenarios. One scenario is remote access and control, that is, use independent of situation, which can be seen as extending the virtuality paradigm to allow remote operation from wherever the user happens to be; as if bringing along your desktop computer, connections and all.
The second use scenario is the exact opposite, in situ application: use is determined by and dependent upon the situation. The computational resources are brought to bear on the very situation of use and user.
Symbols. For the remote operation scenario and interface notion, symbols work similarly to those of the virtuality paradigm. For the in situ application scenario and interface notion, we have a more complex situation. Some of the symbols need to relate to referents that are copresent with the user: the user needs to mentally and dynamically link present real-world referents to symbols in the interface. This is different from both the virtuality paradigm, where the user disappears into the interface, and the ubiquity paradigm, where the referents are within the artifact itself, so it puts the interaction designer in a new kind of situation. We have not really had to deal before with how the user is supposed to match symbols to particular, present real-world referents, dynamically and efficiently. In [7] there is an attempt to begin a systematic investigation of the possibilities for making this kind of link in the particular case of visual symbols.
4.1 Context Awareness and Use Situation
The mobility paradigm brings with it the opportunity and challenge of context-aware computing (CAC) [11]. Many suggested applications of CAC build on the assumption that the physical setting and situation largely define social roles and agenda. Ironically, just when we have the means to automatically silence mobile phones as we enter the meeting room (remote operation scenario), it is becoming less obvious that we should do so, and less axiomatic that “meeting room” is a physically fixed location with this one purpose. If before, the physical environment very much determined the social environment (e.g. a classroom is for teaching, which involves teacher and pupils playing their particular roles) and, vice versa, the informational environment, i.e., the available informational and computational resources, very much determined the physical environment (e.g. to access the reference literature you would need to go to the library), with the mobility paradigm of use we are now both freer to mix environments and more exposed to inconvenient environment combinations (e.g. driving and using the mobile phone at the same time). Before, the user would typically do one thing at a time: handling the physical stuff, negotiating with people, and doing the thinking and information work, in turns. The mobility paradigm creates a condition where the total situation of use (i.e. the information situation, the social situation, and the physical setting) has to be taken into account in parallel, and where the course of events in each environment can no longer be assumed to be well correlated with the course of events in the others.
5 Conclusion
Earlier and more recent developments in HCI have worked to modify, extend and elaborate the concept of user interface, making it the complex and multifaceted notion it is today. The meeting of the established virtuality paradigm with the new ubiquity and mobility paradigms (and there are no signs at this point that any of these three paradigms will recede into the background) seems to result in a confusion of options and requirements that need to be satisfied regarding the status of symbols and their relation to the real world. The mobility paradigm, in particular, produces some new research challenges by bringing to the fore the issue of linking interface symbols to the real world at the very point of use. Research challenges identifiable from the above analysis include: examining and developing the various ways different kinds of symbols can link to their real-world referents, as seen from the user’s point of view; investigating how conceptual links can be turned into effective perceptual links; studying how to make different statuses of relation between symbols and reality perspicuous to the user, as well as the distinction between symbol and reality itself; and finding out how in doing all this we can strike a balance between tracking reality and allowing symbolic “freedom of thought” supporting the user’s thinking, without confusing the user too much. When it comes to practical answers, they will certainly depend on the application, on the particular circumstances and functions.
L.-E. Janlert
References
1. Benedikt, M.: Cyberspace: Some Proposals. In: Benedikt, M. (ed.) Cyberspace: First Steps, The MIT Press, Cambridge MA (1991)
2. Dennett, D.C.: Darwin’s Dangerous Idea. Simon & Schuster, New York (1995)
3. Gibson, J.J.: The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, Hillsdale NJ (1986)
4. Goodman, N.: Languages of Art: An Approach to a Theory of Symbols, 2nd edn. Hackett, Indianapolis IN (1976)
5. Hegel, G.F.: Vorlesungen über die Philosophie der Geschichte (1837)
6. Hwang, J., Jung, J., Kim, G.J.: Hand-held Virtual Reality: A Feasibility Study. In: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 356–363. ACM Press, New York (2006)
7. Janlert, L.E.: Putting Pictures in Context. In: Janlert, L.E. (ed.) Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 463–466. ACM Press, New York (2006)
8. Kirsh, D., Maglio, P.: On Distinguishing Epistemic from Pragmatic Action. Cognitive Science 18, 513–549 (1994)
9. Kuhn, T.S.: The Structure of Scientific Revolutions, 2nd edn. The University of Chicago Press, Chicago (1970)
10. Maglio, P.P., Kirsh, D.: Epistemic Action Increases With Skill. In: Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society, Lawrence Erlbaum Associates, Hillsdale NJ (1999)
11. Moran, T.P., Dourish, P. (eds.): Context-Aware Computing. Special Issue of Human–Computer Interaction 16(2–4) (2001)
12. Norman, D.: Cognitive Artifacts. In: Carroll, J.M. (ed.) Designing Interaction, Cambridge University Press, Cambridge (1991)
13. Norman, D.: Emotional Design. Basic Books, New York (2004)
14. Norman, D.: The Psychology of Everyday Things. Basic Books, New York (1988)
15. Norman, D.A., Hutchins, E.L.: Computation via Direct Manipulation (Final Report: ONR Contract N00014-85-C-0133). Institute for Cognitive Science, University of California, San Diego, La Jolla CA (1988)
16. Simon, H.A.: The Sciences of the Artificial, 3rd edn. The MIT Press, Cambridge MA (1996)
17. Weiser, M., Brown, J.S.: The Coming Age of Calm Technology. In: Denning, P.J., Metcalfe, R.M. (eds.) Beyond Calculation: The Next Fifty Years of Computing, Springer, Heidelberg (1997)
An Ignored Factor of User Experience: FEEDBACK-QUALITY
Ji Hong and Jiang Xubo
Abstract. User experience plays an increasingly important role in the design and development of information products. For network-based applications (Internet and mobile network), many research and development teams focus on information architecture (IA) and user interface (UI) design, which sit at the middle and front levels of a product. At the same time, a very important factor of user experience is ignored: FEEDBACK-QUALITY, which is determined by the quality of telecommunication provided by Telecom Service Support. Through long-term observation and research we find that this factor can fundamentally influence most network-based products.
Keywords: feedback quality, feedback periods, integrity of feedback periods, feedback time.
1 Brief Introduction
At present, the study of user experience concentrates on user interface design, which is in direct contact with the user, while another important factor, which we call feedback quality, is ignored by most people. By studying three remote surveillance software products, we discovered that this ignored factor plays an important part in the user experience of information systems, most of which are mediated by the Internet.
2 Definition
To make our discussion clear, we introduce several definitions.
2.1 Feedback Periods
A feedback period is the process that runs from the moment the user sends an instruction aimed at the information storeroom to the moment the user receives the corresponding feedback. The definition can be made clearer with a picture.
[Figure: feedback period model; only the label “USER” survives]
The model shows two principal parts: the user and the information storeroom. In addition, there is an intermedium between them. This intermedium contains the user interface that we are familiar with, and also an important component of the whole information system: the Internet connection.
User interface ----- Internet physical layer ----- information storeroom interface (machine interface)
Fig. 2. Intermedium forming model
2.2 Feedback Quality
Feedback quality is a measure of how effective the feedback is, and an important factor in user experience that has been ignored for a long time. We think there are two criteria for measuring feedback quality:
1. Integrity of feedback periods, which directly determines the form of the feedback periods and affects whether users’ needs can be satisfied at all.
2. Feedback time, that is, the time users need to complete a feedback period, which meets users’ need for efficiency.
Generally speaking, the user experience field attends only to the effect of the user interface, but our study shows that Internet speed also affects user experience.
3 Methodology
Our findings come from a usability test (UT) of three software products from China Telecom. The method is standard usability testing: real users perform selected tasks in the same designed scenarios, and from the task completion times, the number of errors, and interviews with the participants we identify usability problems in the tested products [1][2].
4 The Design of Experiments
This test is a side-by-side comparison of 3 different versions of the remote-controlled security software. We initially recruited 7 users to test the main functions of the software.
4.1 The Choice of Participants
First we set the selection criteria for participants: security room staff or ordinary personnel with no prior experience of the tested software. Applying these criteria, we found that 2 of the 7 participants worked in sales of the software, so we excluded their data from the test.
4.2 The Test Plan
To avoid learning effects, we assigned each participant a test order taken from a matrix (Table 1).
Table 1.
Order   Participant 1   Participant 2   Participant 3   Participant 4   Participant 5
1       Version A       Version C       Version B       Version A       Version B
2       Version B       Version A       Version C       Version C       Version A
3       Version C       Version B       Version A       Version B       Version C
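The counterbalancing behind Table 1 can be sketched in a few lines. The paper does not say how the orders were generated, so the procedure below (cycling through all permutations of the versions) is an assumption, and the function name `assign_orders` is hypothetical.

```python
from itertools import permutations

def assign_orders(versions, n_participants):
    """Give each participant one presentation order.

    Assumption: orders are drawn by cycling through every permutation of
    the versions, so no order repeats until all have been used once.
    """
    orders = list(permutations(versions))
    return {p + 1: orders[p % len(orders)] for p in range(n_participants)}

plan = assign_orders(["A", "B", "C"], 5)
for participant, order in plan.items():
    print(f"Participant {participant}: " + " -> ".join(order))
```

With three versions and five participants this yields the same set of five orders that appears in Table 1, though not necessarily assigned to the same participants. Every participant still sees each version exactly once, which is the property that guards against learning effects.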
4.3 Task Arrangement
We designed tasks for the users according to the main functions of the software:
1. Show the XXXX watch menu in the top left corner window.
2. Show the YYYY watch menu in the bottom right corner window.
3. Select the picture in the top left corner window, turn the camera left and up, then take an establishing shot.
4. Take a picture from the bottom left corner window.
5. Check the picture taken in step 4.
4.4 Data Collection Criteria
1. Time criterion for task completion. Timing starts after the user finishes reading the task and ends when the user announces completion. If the time a user takes exceeds the average time, that user’s task is considered a failure.
2. Criterion for successful task completion. The user announces completion, and the completion is confirmed by the test moderator.
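The two criteria in 4.4 can be combined into a single pass/fail rule, sketched below. The paper does not define "average time" precisely; this sketch assumes it means the mean completion time of all participants on the same task. The function name `classify` and the sample values are made up for illustration.

```python
from statistics import mean

def classify(times_s, confirmed):
    """Return pass/fail per participant for one task.

    times_s:   participant -> completion time in seconds
    confirmed: participant -> True if the moderator confirmed completion
    Assumption: the time limit is the mean time across participants.
    """
    limit = mean(times_s.values())
    return {p: confirmed[p] and t <= limit for p, t in times_s.items()}

# Illustrative values only, not data from the study.
times = {"P1": 40.0, "P2": 55.0, "P3": 100.0}
confirmed = {"P1": True, "P2": False, "P3": True}
result = classify(times, confirmed)
print(result)
```

Under this reading, any participant slower than the group mean fails the task, so a single very slow participant raises the limit for everyone else. This sensitivity is worth keeping in mind when interpreting failure counts.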
5 Analysis of Data and Experiment Result
5.1 Statistics of Testing Time
[Table not recoverable; surviving labels: Software A, User1]
Mistake Description (task label not recoverable)
No.  Description
2    User selected “Kinescope Research” to display the watch picture.
5    User hit the control under the “facility list” catalog, hitting the subdirectory images rather than the camera image.
3    User dragged and hit the images of the control.
4    User hit other, irrelevant widgets to search for watch pictures.
Task 3 Mistake Description (rows span Software A, Software C, Software B; row assignment not recoverable)
No.  Description
3    User went into “Image Effect” and “Advanced Control” to search for the establishing shot control function.
4    User could not find the direction control function.
4    User operated the direction control incorrectly many times.
3    User hit the wrong widget to control the establishing shot.
5    User confused the images of establishing shot and close shot.
Tasks 4 and 5
Since all three tests included task failures, comparison and statistics are hard to handle. However, the task failures themselves reveal an error-tolerance problem in the software design. When mistakes happened, none of the three products showed a clear hint or help, or offered any necessary supporting function, so the user’s operation could follow only one single path. If any problem occurs along that path, the user has no way to finish the whole task. This is the biggest error-tolerance problem of the three products so far.
5.3 Our Discoveries
After analysing the statistics on task times and errors, together with the interviews with the participants, we found that version C received the worst user experience (UE) rating. But we did not immediately conclude that all problems are attributable to the user interface design (UID) of this version, because we found 2 strange phenomena:
1. Task 2 in version C is not difficult; it is just a simple select-and-perform operation. Yet we recorded many errors: 4 of 5 participants got lost. It confused us why the participants’ performance got worse after they had learned from task 1, when the difficulty of the task should have been reduced. To explain this, we reviewed the participants’ feedback after the test and watched the video tapes again. Through this analysis we found that the objective reason for getting lost was the discontinuity of the video cable: participants could not find the video they wanted, and moreover there was no clear cue for them. We define this as a discontinuity of the feedback cycle: its integrity is destroyed.
The interface on the user side (UID) --- Destroyed --- The Medium --- the interface on the machine side
Fig. 3. The integrity of the feedback cycle is destroyed
2. Task 3 in version C produced poor results for both task time and errors. We discovered the reason when reviewing the test: there was a delay of 2-3 seconds when participants tried to control the direction of the camera. This feedback time is far longer than the participants’ limit of patience. As a result, they could not operate in their accustomed way; they had to face the difficulty of learning, and efficiency suffered as well.
In the end we concluded that a broken feedback cycle and an overly long feedback time directly affect feedback quality and, moreover, reduce the UE, which is therefore not confined to the field of UID.
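The two findings above map onto the two feedback-quality criteria, and a minimal sketch can make the distinction concrete. The paper reports no numeric patience limit, so `PATIENCE_LIMIT_S` below is a hypothetical threshold, and the names `FeedbackCycle` and `assess` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the paper gives no value for the patience limit.
PATIENCE_LIMIT_S = 1.0

@dataclass
class FeedbackCycle:
    sent_at: float                # time the user issued the command (seconds)
    received_at: Optional[float]  # time feedback arrived; None if it never did

def assess(cycle: FeedbackCycle, limit_s: float = PATIENCE_LIMIT_S) -> str:
    """Classify one feedback cycle by integrity first, then by feedback time."""
    if cycle.received_at is None:
        return "broken"           # integrity destroyed: no feedback came back
    latency = cycle.received_at - cycle.sent_at
    return "slow" if latency > limit_s else "ok"

print(assess(FeedbackCycle(0.0, None)))  # lost video, as in task 2
print(assess(FeedbackCycle(0.0, 2.5)))   # delayed camera control, as in task 3
print(assess(FeedbackCycle(0.0, 0.3)))
```

Integrity is checked before time because a cycle that never completes has no meaningful latency. This mirrors the order in which the two criteria are introduced in Section 2.2.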
6 Conclusions
Customer experience is usually equated with user interface design. Once it involves a technical issue, it is easily neglected and simply treated as a technical bug. However, this usability test leads us to the opinion that user experience is more than user interface design. A company should focus not only on user research and on user interface design at the two ends of the feedback period, but should also pay attention to the related technical aspects. For China Telecom in particular, it is significant to improve the quality of feedback, that is, to satisfy customers’ basic need for complete feedback periods and to shorten feedback periods so as to meet the potential need for feedback efficiency. User experience can be thought of as a piece of furniture made up of several pieces of wood, among them user research, feedback quality, and user interface. If one of them is neglected, user experience will be affected. Therefore, researching and improving feedback quality, which is always neglected, is the key to improving user experience for China Telecom.
References
1. Rubin, J.: Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, pp. 25–26. John Wiley & Sons Inc.
2. Dumas, J.S., Redish, J.C.: A Practical Guide to Usability Testing, p. 4. Intellect
10 Heuristics for Designing Administrative User Interfaces – A Collaboration Between Ethnography, Design, and Engineering
Luke Kowalski and Kristyn Greenwood
Oracle Corporation, 500 Oracle Parkway, Redwood Shores, CA 94065
[email protected], [email protected]
Abstract. The lack of focus on administrative interfaces often comes from management's mandate to prioritize end user screens ahead of others. This often shortchanges a more technical class of users with unique needs and requirements. At Oracle, design heuristics for administrative GUIs were drawn from a multitude of sources in the corporate ecosystem. Ethnographers, software architects, designers, and the administrators themselves all contributed to bring a better understanding of this often forgotten class of user. Administrators were found to inhabit anywhere from two to five particular classifications, depending on the size of the company. Recently, an ethnographer studied one classification in greater detail, the Database Administrator, while a designer, in the course of an E-Business Suite Installer project, analyzed another, the application administrator. What emerged from the gathered data was a remarkably consistent and universal set of rules and tools that can be used to lower the total cost of ownership and increase usability, attractiveness, and satisfaction for administrative interfaces.
Keywords: Design, Administrative interfaces, design techniques, heuristics, ethnographic research, design methods.
deal with administration, configuring, tuning, and maintenance of databases. They possess highly specialized skills. This last user type was the subject of a 2-year long, 8-site ethnographic study. Data collection in this instance involved 23 Database Administrators and included a self-report survey, observational sessions during which task, object, and tool use was recorded at set intervals, and a follow-up interview to elicit more qualitative data. The study was designed to find out what the administrators spend time on, what tools they use, and how this information could influence the next generation of Oracle’s server products.
The fourth administrator type is the Application Administrator or Functional Administrator. These professionals usually work with a given application like Human Resources, Manufacturing, or Financials. Their duties encompass Lifecycle Change Management (LCM), which spans installation, setup, configuration, maintenance (patching), and upgrade. It is often said that the funds spent on LCM are anywhere from 2 to 4 times as much as the initial license cost of the software. Total Cost of Ownership (TCO) issues are much more relevant in software supporting complex enterprises than in consumer software. Most of the administrator’s time is spent tailoring the applications to meet the business needs and practices of a given company. The fact that an enterprise suite is installed does not mean that it is ready to use. Administrators need to configure things like security, populate or provision the system with users, and define defaults for invoicing, printers, and tax structure, among other tasks. These individuals were studied in detail in the context of a design project to improve the task completion of a suite installer. The installer was a Java-based wizard that installed the database, application server, and application tiers of the Oracle E-Business Suite.
The last type of administrator belongs in the Business Analyst or Implementation Consultant class. They often customize the seeded business flows to meet specific business needs, or work on legacy system integration projects. When the project gets too technical they are often joined by a team of developers who extend and customize the application programmatically, often using a developer tool like Oracle JDeveloper. In studying the administrators through the ethnographic research and through a series of design projects we were able to abstract out heuristics and tools that are generalizable for most administrators and could help a designer better target their deliverables to the needs of this unique community.
2 Heuristics
Heuristic 1: Do Not Force a Graphical User Interface (GUI). Innovate Only Where Appropriate. In the ethnographic study, we found that 32 percent (see Fig. 1) of the administrators relied on the command line as their primary tool on the job. They often found it more efficient and faster than a GUI, and found that it offered more feedback. It can also be accessed remotely. Furthermore, the UNIX command line does not involve any setup or configuration in order to be immediately usable. Designers often assume that command line tools and utilities only exist because engineers did not have the time to develop a GUI. Rather than forcing a GUI, it is advisable to support the habits, comfort zone, and core competencies of the administrators by developing tools to accommodate the command line. These could include repositories of custom scripts for batching jobs, or logging tools and information visualization for monitoring.
Fig. 1. Percentage of Time Using Tool Categories. Data from Oracle Study of Database Administrators.
Heuristic 2: Design Based on Observation. Do Not Rely on Self-Reported Data When It Comes to Design for Administrators. Participate in user groups, advisory councils, and include observational data. This is often a universal truth when it comes to data gathering methods, but we found it to be more pronounced for this type of user. Surveys and interviews provided inconsistent data compared to observational sessions. Administrators told us that they spent little time doing troubleshooting, whereas the observational data showed otherwise (Fig. 2). In the study design we made sure that our sample came from a representative day, and did not include a singular task. It is recommended to focus on 2 or 3 methods when gathering information for the design of administrative applications. One of them should include some form of direct observation, in context, or with a prototype.
Fig. 2. Comparison of Self Report and Observed Database Administrators for Top 5 Self-Report tasks
Heuristic 3: Design Lightweight and Flexible Applications to Accommodate Remote Administration. Administrators often work from home, or administer hardware located in a data center far away. We have observed that if a tool needs to be installed, or if it has slow performance, or long download times, it will not be used
at all. With the current technology, this means thin client web applications, as opposed to native operating system applications, or Java on the client. Mobile applications are critical for administrators, as well. More intelligent devices that can provide more information about a given escalation are slowly replacing pagers that notify the administrator of a given alert. Personal Digital Assistants (PDAs) like Treos and BlackBerries were very popular in the Database Administrator and Application Administrator environments. One data point came from a user in a supervisory capacity. His role was to serve as a traffic master for alerts and data center escalations. He would send specific tasks to administrators based on severity and acquired competencies.
Heuristic 4: Design for Collaboration. Administrators spend a large portion of their time communicating with others. Database Administrators spent 19 percent of their time talking to others and 9 percent using e-mail. A good set of collaboration tools can help them become more efficient, automate certain tasks, or just become better organized. Accountability and record keeping also come into question. If collaboration tools are not integrated with the other tools to monitor or tune the hardware and software, they are not considered as useful. Rob Barrett of IBM Almaden Research presented a similar finding, where collaboration was found to be a critical element in the Database Administrator’s work [1]. In our study we found that administrators underreported all of the communication tasks. Once we were able to identify collaboration as a key feature, we were able to design it into the knowledge repository and other tools used by our users, and these features were extremely well received in future lab tests.
Heuristic 5: Integrate the Major Administrative Tool Silos: Collaboration, Monitoring, Information Knowledgebase.
All administrators studied expressed a desire for a better-integrated portal that would provide an overview of their systems and tools. The application to monitor and tune was only useful if it had an “in context” connection to the application that was used to troubleshoot (Information Knowledgebase, or the repository of solutions to known problems). Collaboration tools were also deemed more useful if they were integrated with their monitoring tools and were designed specifically for administrators to collaborate on lifecycle management of the software environments they were supporting. A good example of this is the ability for the administrator to append notes to an alert in the application that monitors database performance. Administrators are often presented with multiple interrupts of different priorities. We found that they could be more efficient if provided additional context. If they receive two critical notifications (running out of tablespaces) they will triage the one that involves a sales deal database before the end of the quarter and then try to troubleshoot one that belongs to a test system for a future version implementation.
Heuristic 6: Documentation for Administrators Is More Frequently Referenced, Needs To Be Fresher, Task vs. Product-Based, and Include the Web. If an application administrator needs to apply patches to their system, they need to have the most recent source of truth, since patches can affect security and stability of the applications they are administering. A 3-month-old printed manual will not be as
useful as online documentation (Fig. 3). Administrators, in contrast to end users, study the documentation and form detailed project plans around installation and production deployments. Administrators also work with software and tools authored by sometimes disconnected product groups within one company. Their tasks do not correspond to the product or organizational boundaries. They often span them. In working with the application administrators, in the context of administering a Common Industry Format (CIF) test from NIST, we found that when administrators were stuck after reading the documentation, they went to search Google. They would often find a web-based discussion group where this exact error message was analyzed and the problem solved. These were not always official or company-sponsored sites.
Fig. 3. Documentation in the form of a Post-Installation Page with Links to Tools, Guides, and Information Knowledgebases
Heuristic 7: Manage Complexity by Providing Defaults, and Automating Tasks. A constant point of feedback from the application administrators was a request to provide tool defaults that work. This tends to entail a reduction in the number of
screens and fewer decision points. If an administrator is using a wizard to perform an installation they do not always want to see all the choices and all the paths (Fig. 4). Creating a Quick Install path and an Expert path resonated very well with administrators in the next iteration of the design. Sometimes intelligent assumptions are better, and the optimization, or “tweaking,” can happen after the system is working in its basic configuration. Other feedback included complaints about the number of manual steps necessary to prepare for the installation. Automation of some steps proved to be the answer. In one usability test, one issue involved the absence of system checks that, when not performed beforehand, would cause failure of the installation. One of the checks, for free disk space, took place at the end of the installation when it was too late to do anything about it.
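The disk space problem described above suggests running such checks before the installation starts. The following is a minimal sketch of a pre-flight check, not the installer's actual logic: the function name `preflight_disk_check` and the required size are hypothetical, while `shutil.disk_usage` is a standard library call.

```python
import shutil

def preflight_disk_check(path, required_bytes):
    """Check free space on the volume holding `path` before installing.

    Returns (ok, free_bytes). The required size is supplied by the caller;
    no real installer requirement is assumed here.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes, free

# Hypothetical requirement of 5 GiB, for illustration only.
REQUIRED = 5 * 1024**3
ok, free = preflight_disk_check(".", REQUIRED)
if ok:
    print("Enough disk space, proceeding with installation.")
else:
    print(f"Need {REQUIRED} bytes but only {free} are free; aborting early.")
```

Placing the check first means a failure costs the administrator seconds rather than the duration of an entire installation run.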
Fig. 4. Managing Complexity by Providing Alternate Paths and Decreasing the Number of Decisions on a given screen
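The free-disk-space problem described under Heuristic 7 suggests running all system checks before the installation begins. The following is a minimal sketch of that idea, not the installer studied in the paper. The names `CheckResult`, `check_free_disk_space`, and `run_preflight` are hypothetical and introduced only for illustration. The `usage_fn` parameter is an assumption that lets the check be exercised without touching a real disk.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    """Outcome of one preflight check (hypothetical structure)."""
    name: str
    ok: bool
    detail: str


def check_free_disk_space(path: str, required_bytes: int,
                          usage_fn: Callable = shutil.disk_usage) -> CheckResult:
    """Verify that the volume holding `path` has enough free space."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    free = usage_fn(path).free
    ok = free >= required_bytes
    detail = f"{free} bytes free, {required_bytes} required"
    return CheckResult("free disk space", ok, detail)


def run_preflight(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check up front and return only the failures.

    An empty list means the installation can start.
    """
    results = [check() for check in checks]
    return [result for result in results if not result.ok]
```

An installer could call `run_preflight([lambda: check_free_disk_space(".", 1 << 30)])` before its first step and report every failure at once, instead of discovering a full disk at the end of the run. The 1 GiB figure is only an illustrative value.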
Heuristic 8: Perform Competitive Analysis, Including Open Source Tools. As much as a company will try, it is impossible to force the administrators to use only your tools. They will find utilities developed by their user group, an open source product for monitoring and health checks, or even deploy your competitor’s product. The more a designer studies these tools, the more effective the integration exercise will be; the information can also be used to enhance existing applications. Subjects of our DBA study all had their favorite collections of tools, and while there were some patterns, there seemed to be a race to discover the coolest and latest utility to make administration tasks more efficient and operations transparent.

Heuristic 9: International Focus and Hosted Applications. Administration is being outsourced. In some cases, the physical infrastructure and the software are remote to both the end users and the administrators. This is the case when a company hosts an application suite for a customer who accesses it over the Web. In other cases, only the administrators are in remote locations. Designers need to include sensitivity to other cultures, and design with internationalization support in mind, including support for
languages, bi-directional support, and accessibility standards relevant to the local government bodies.

Heuristic 10: Use the Right Communication Vehicle during the Design Process. When designing for administrators, it is very common to create designs that are not implemented. Results of studies are often communicated in 100-page reports that the stakeholders do not have time to read. Posters representing flows, or “before and after” designs, are more successful. What also helps is “speaking the administrator’s or the developer’s language” and using the bug database to record design and usability issues. Communication among team members can also prove to be a failure point for a designer. Using a collaborative tool such as TWiki can accelerate communication and foster a feeling of an extended virtual team, with everyone working toward the same goal. A designer is also more successful if they extend their role and try to understand why technology, legal, or business issues stand in the way of implementing their vision. Standardized testing, while not always useful in the creative phases of a project, can still be instrumental when comparing unassisted task completion rates between one release and the next, or when comparing yourself to the competition. Lastly, direct involvement with the end users and project stakeholders tends to work better than management mandates and lengthy, abstract guidelines.
3 Conclusion

Administrators are not yet a fully understood user type. More work is needed to develop complete user profiles. Enterprise software also represents just one dimension: consumer companies like eBay and Yahoo are cultivating their own administrative ecosystems. The domain is not an easy one, since it involves constantly evolving technology and industry standards. Furthermore, few enterprise installations include only the software sold. There are always legacy systems, and integration exercises present unique logistical, financial, and human factors challenges. The heuristics identified provide a focus for a designer who is new to this domain and to this user type. If they are taken into consideration, the most basic administration UI design bloopers will be avoided.
Reference

1. Barrett, R.: System Administrators are Users, Too, Stanford Human Computer Interaction Seminar (May 30, 2003), http://hci.stanford.edu/seminar/abstracts/02-03/030530-barrett.html
Micro-Scenario Database for Substantializing the Collaboration Between Human Science and Engineering

Masaaki Kurosu, Kentaro Go, Naoki Hirasawa, and Hideaki Kasai
Abstract. To achieve effective and efficient human-centered design, a database of problem micro scenarios (p-MS) is proposed. In the concept of this system, human scientists work first to obtain information about the user and the context of use by applying field work methods. Information about the problems discovered in the field data is stored in the p-MS database together with the tag and the ground information. Engineers who plan to manufacture something can retrieve relevant problem information from this database and thus shorten the time required for the early stage of development. The p-MS database is believed to facilitate human-centered design, and a feasibility study will be conducted within a year of this presentation.

Keywords: usability, scenario based design, micro scenario method, database.
Fig. 1. Relationship between the human science and the engineering (above: previous situation where human science served just as a separate information source; below: situation where HCD is implemented and both approaches are integrated into one)
2 Collaboration Between Human Science and Engineering

As shown in Figure 2, there are two types of collaboration between the human science and the engineering. The upper part of the figure shows an idealized type in which the human scientist goes first, investigating the user characteristics and the context of use and then summarizing the requirement.
Fig. 2. Two types of collaboration between the human science and the engineering
But most real development follows the lower type, in which both parties start at the same time. Although this type of development is better than no collaboration, engineers will not wait for the requirement information to be presented to them. Because waiting feels like a waste of time, they start “something” in the meantime. As a result, by the time the requirement information is given, engineers may already have moved into the design process without adequate information about the user and the context of use. If engineers are flexible and receptive, they will redo the design. But in most cases, to our regret, engineers do not lend their ears to the requirement and thus design something that does not fit the user requirement. On the other hand, the serial approach shown in the upper part of Figure 2 is difficult because it is unbearable for engineers just to wait for the completion of the requirement and do nothing until then.
3 Micro Scenario Database

One answer to the problem above is to construct a database of problem micro scenarios, as shown in Figure 3.
Fig. 3. Concept of micro-scenario database
The problem micro scenario (p-MS) is a scenario that represents the micro information structure constructed from the field work data. It is the output of the first half of the micro scenario method (Kurosu et al. 2003, Kurosu 2004, 2005, 2006) described in Figure 4. The micro scenario method is a successor to scenario-based design, originally proposed by Carroll (1995). As shown in Figure 5, each p-MS represents a problem in terms of the relationship between the user and the artefact. Fundamental information about the user and the context of use is described as the ground information (GI) and linked to each p-MS; anyone who wants the background information of a p-MS can get it by following the link to the GI. Each p-MS also carries tag information that represents the content or the domain of the problem. A tag is similar to a keyword: it is used to retrieve the relevant p-MSs from the p-MS database, so that the user of the system can collect p-MSs with similar problems and summarize the information. In this way, the p-MS database can be used to create the requirement for developing products or systems.
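The relationship between a p-MS, its tags, and its linked GI can be sketched as a small data model. This is an illustrative sketch only: the paper describes the database as a concept, so the class names (`GroundInfo`, `ProblemMicroScenario`, `PMSDatabase`) and all field names are assumptions, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class GroundInfo:
    """Ground information (GI): the user and the context of use."""
    gi_id: str
    user: str
    context_of_use: str


@dataclass
class ProblemMicroScenario:
    """One p-MS: a problem between user and artefact, with tags and a GI link."""
    text: str
    tags: Set[str]
    gi_id: str  # link to the ground information


@dataclass
class PMSDatabase:
    """In-memory stand-in for the p-MS database (hypothetical)."""
    ground_info: Dict[str, GroundInfo] = field(default_factory=dict)
    scenarios: List[ProblemMicroScenario] = field(default_factory=list)

    def add_ground_info(self, gi: GroundInfo) -> None:
        self.ground_info[gi.gi_id] = gi

    def add(self, pms: ProblemMicroScenario) -> None:
        # Every p-MS must link to ground information that is already stored.
        if pms.gi_id not in self.ground_info:
            raise KeyError(f"unknown ground information: {pms.gi_id}")
        self.scenarios.append(pms)

    def find_by_tag(self, tag: str) -> List[ProblemMicroScenario]:
        """Retrieve all p-MSs carrying the given tag, in insertion order."""
        return [pms for pms in self.scenarios if tag in pms.tags]

    def ground_info_for(self, pms: ProblemMicroScenario) -> GroundInfo:
        """Follow the link from a p-MS to its background information."""
        return self.ground_info[pms.gi_id]
```

With this shape, retrieving by tag returns every p-MS with a similar problem, and `ground_info_for` plays the role of tracing the link to the GI.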
Fig. 4. Basic flow of micro scenario method
Fig. 5. Problem micro scenario
As shown in Figure 3, human scientists investigate the user and the context of use with field work methods, independently of the engineering development process. They summarize the information as a set of p-MSs, tags, and GI and store it in the database. Engineers can use the database whenever they start a project to manufacture something. Relevant information can be retrieved from the database by entering a keyword. Figure 6 represents the situation where the micro scenario database is used by many engineers. In this figure, an interpreter is added at the top of each engineering project. The interpreter must have a background in usability engineering and be able to interpret the retrieved p-MSs adequately in order to create the requirement.
Fig. 6. Use of micro scenario database
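Because the database is meant to be shared by many engineering projects, a persistent store with keyword retrieval is one plausible realization. The sketch below uses Python's standard `sqlite3` module. The schema, the table names, and the helper functions `open_store`, `store_pms`, and `retrieve` are assumptions made for illustration, since the paper does not specify an implementation.

```python
import sqlite3

# Hypothetical schema: one row per GI, one per p-MS, and one per (p-MS, tag) pair.
SCHEMA = """
CREATE TABLE ground_info (
    id INTEGER PRIMARY KEY,
    user TEXT NOT NULL,
    context TEXT NOT NULL
);
CREATE TABLE pms (
    id INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    gi_id INTEGER NOT NULL REFERENCES ground_info(id)
);
CREATE TABLE pms_tag (
    pms_id INTEGER NOT NULL REFERENCES pms(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (pms_id, tag)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store; the default keeps everything in memory."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_pms(conn, body, tags, user, context):
    """Human-scientist side: store one p-MS with its tags and its GI."""
    cur = conn.execute(
        "INSERT INTO ground_info (user, context) VALUES (?, ?)", (user, context))
    gi_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO pms (body, gi_id) VALUES (?, ?)", (body, gi_id))
    pms_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO pms_tag (pms_id, tag) VALUES (?, ?)",
        [(pms_id, tag) for tag in sorted(set(tags))])
    conn.commit()
    return pms_id


def retrieve(conn, keyword):
    """Engineer side: return (p-MS text, user, context) rows matching a tag."""
    rows = conn.execute(
        "SELECT p.body, g.user, g.context FROM pms p "
        "JOIN pms_tag t ON t.pms_id = p.id "
        "JOIN ground_info g ON g.id = p.gi_id "
        "WHERE t.tag = ? ORDER BY p.id",
        (keyword,))
    return rows.fetchall()
```

Each retrieved row pairs the problem text with its ground information, which is what the interpreter in Figure 6 would read before writing the requirement.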
4 Conclusion

The p-MS database is just a concept at the time of this presentation, but it is planned to be implemented in a year or two. The feasibility study will then be started. The authors believe that this kind of database will be useful in spreading human-centered design. In addition, the micro-scenario authoring tool (Kurosu et al. 2006), which has just been completed, will facilitate the use of the database.
References

1. Carroll, J.M. (ed.): Scenario-Based Design: Envisioning Work and Technology in System Development. Wiley, Chichester, UK (1995)
2. Kurosu, M., Nishida, S., Osugi, T., Mitsui, M.: Analysis of Field Data by Micro-scenario Method (in Japanese). In: Proceedings of Human Interface Symposium (2003)
3. Kurosu, M.: Micro-scenario method for designing and re-designing the e-Learning system, E-Learn 2004 (2004)
4. Kurosu, M.: Micro-scenario method: a new approach to the requirement analysis, WWCS 2004 (2004)
5. Kurosu, M.: Micro-scenario method (MSM) - a new approach to the requirement analysis, Human Interface Society SIGUSE (2004)
6. Kurosu, M.: Micro-scenario method – interface design based on the context of use information, Design IT (in Japanese) (2005)
7. Kurosu, M.: Scenario creation by using the micro-scenario analysis system, JPA 2006 (in Japanese) (2006)
8. Kurosu, M., Kasai, H., Hirasawa, N., Go, K.: Analysis tool for micro-scenario, Human Interface Society SIGUSE, 2006 (in Japanese) (2006)
9. Kurosu, M.: Micro-scenario method, NIME report, 2006 (in Japanese) (2006)
10. Ohnishi, J., Go, K.: Requirement Engineering, Kyoritsu-shuppan, 2002 (in Japanese) (2002)
A Meta-cognition Modeling of Engineering Product Designer in the Process of Product Design

Jun Liang, Zu-Hua Jiang, Yun-Song Zhao, and Jin-Lian Wang

Department of Industrial Engineering & Management, School of Mechanical Engineering, Shanghai Jiao Tong University, 800, Dong Chuan Road, Shanghai, 200240, P.R. China
{jliang, zhjiang, lemon_zhao, buada}@sjtu.edu.cn
Abstract. To reuse tacit knowledge more effectively in the process of product design, individual cognitive processes, cognitive factors, and cognitive strategies need to be understood, in order to find the essential factors that affect the generation of tacit knowledge and control designer activities throughout the design process. These key factors depend on individual cognitive capability and meta-cognitive level. Therefore, based on the physical symbol system hypothesis (PSSH) and connectionism, this paper provides a meta-cognition model of the engineering product designer to elucidate active monitoring and the consequent regulation. Designers’ cognitive activities in the process of product design are analyzed from the viewpoint of cognition science. Finally, the cognitive differences between experienced designers and novices in the process of fuel injection pump design are compared and elaborated in detail.

Keywords: Meta-cognition, Cognitive activity, Individual Difference, Product design.
product design, meta-cognition can monitor and control the cognitive process of the designers about product design. For example, in case-based design, the designer starts cognitive and meta-cognitive activities from the design tasks and design requirements, continues with the confirmation of features, case retrieval, case revision, and case use, and ends with the accomplishment of the design. Effective design support systems must complement human cognitive activities and must be based on a sound understanding of human cognitive abilities [4]. This paper focuses on the analysis of designers’ cognitive and meta-cognitive activities and builds a bridge that connects cognition psychology and engineering product design. The paper is organized as follows. Section 2 introduces the cognitive foundation of the designer meta-cognition model and related work on meta-cognition. Section 3 provides a meta-cognition model of the engineering product designer and presents its components. Cognitive and meta-cognitive activities in the process of product design are explored and analyzed in Sect. 4. Section 5 discusses cognitive and meta-cognitive activities in fuel injection pump design and compares the cognitive differences between experienced designers and novices. The conclusions are presented in Sect. 6.
2 Cognition Science Foundation for Individual Meta-cognition in Product Design

Meta-cognition emphasizes personal mental activity, thought, perception, memory, and the interaction of cognitive activities, and pays particular attention to self-awareness and self-regulation. Flavell defines meta-cognition as “knowledge and cognition about cognitive phenomena” [5]. It is often described as the executive process governing our cognitive efforts [1], and it consists of meta-cognitive knowledge and self-regulation [6]. Susan V. Baxt [7] defined six meta-cognitive processes, i.e., problem definition, planning, strategy selection, flexibility (of strategy use), evaluating, and checking and monitoring, which are based on the above three meta-cognition models. Meta-cognitive activity was significantly related to knowledge acquisition, skilled performance at the end of training, and self-efficacy [8]. Monitoring and control are the two important information flow processes: one flow, from the cognitive to the meta-cognitive level, allows the meta-cognitive level to monitor the cognitive level; the other, from the meta-cognitive to the cognitive level, allows meta-cognition to control cognition [9]. Monitoring one's thinking and the effects of controlling it are the model's mechanisms for increasing meta-cognitive understanding [10]. Furthermore, Erik Hollnagel [11] emphasized the importance of context and pointed out that cognition and context are inseparable. Valkenburg R. [12] and Smith R.P. [13] studied the cognitive activities of design teams. Mao-Lin Chiu [14] considered that design status is a kind of manner of design operation, which can be implemented through sense input, the perception process, the conception process, the status-structure process, and the memory construction process.
3 Meta-cognition Model of Engineering Product Designer

3.1 The Framework of the Model

In the product design domain, designer meta-cognition refers to designers self-consciously monitoring and controlling a series of cognitive activities, and drawing on individual knowledge to solve design problems, as stimulus information from the design environment interacts with their cognitive behaviors. The meta-cognitive process in product design is a continuous process of driving the design task forward until its accomplishment. Designers can be aware not only of the design objectives and the design process but also of their own cognitive process and cognitive results, and these cognitive activities take place in an active and self-conscious state.
Fig. 1. Meta-cognition model of Engineering Product Designer
As shown in Fig. 1, the meta-cognition model of the engineering product designer involves five sub-modules: meta-cognitive knowledge, meta-cognitive experience, meta-cognitive operation, the product design cognition sub-module, and the long-term memory module of product design knowledge, each introduced below. Among them, meta-cognitive knowledge, meta-cognitive experience, and meta-cognitive operation are the core of the model. The long-term memory module of product design knowledge provides various kinds of knowledge for solving design problems, and the product design cognition sub-module supports the detailed cognitive activities, such as sense perception.
3.2 The Components of the Model

3.2.1 Meta-cognitive Knowledge

Meta-cognitive knowledge refers to the beneficial knowledge, experiences, and lessons that affect cognitive processes, cognitive strategies, cognitive structures, and cognitive results during the cognitive activities of the product design process. It supports and affects meta-cognitive operation and meta-cognitive activities and transfers cognitive tasks. In product design, meta-cognitive knowledge includes three main aspects: people, tasks, and strategies.

People: the designer himself or herself, or others, is the object of cognition. This aspect concerns cognitive capability, intelligence level, design experience, knowledge, cognitive structure, etc. It involves recognizing one's own cognitive capability and perceiving cognitive states with respect to the design requirements, cognitive differences and similarities, and the special cognitive rules and experiences formed in the product design process.

Tasks: the cognitive knowledge applied when the designer analyzes and judges detailed cognitive goals and cognitive requirements. It includes recognizing the requirements, goals, and features of cognitive tasks; the properties, characteristics, and mode of appearance of cognitive materials; and the familiarity, degree of difficulty, and schedule of the cognitive object in product design.

Strategies: the cognitive knowledge and methods that designers use when they plan, employ, monitor, control, and adjust cognitive activities. They include methods for recognizing the designer’s own cognitive activities, analysis of the merits and demerits of cognitive strategies, guidelines for handling exceptional problems in the process, and directions for cognitive activities such as attention, memory, and thought.
3.2.2 Meta-cognitive Experience

Meta-cognitive experience refers to designers’ comprehension and consciousness of their cognitive activities and cognitive process. It reflects awareness or unawareness of cognitive activities and takes the form of affective experience. The execution of designer cognitive activities in product design emerges from meta-cognitive knowledge activated by meta-cognitive experience, which moves meta-cognitive knowledge from an activated level to a working state in order to serve meta-cognitive monitoring and meta-cognitive regulation. Whether meta-cognitive experience is positive or negative affects designer cognitive activities and determines the designer’s decision-making behaviors, such as the degree of attention paid to the design process and the choice of cognitive strategy and method, and ultimately leads to the success or failure of the product design. Meta-cognitive experience is a mediator and a trigger for monitoring and regulating cognitive activities. From the viewpoint of engineering product design, at the initial stage the designer experiences the degree of difficulty, the familiarity, and the ongoing situation of the cognitive tasks. In the middle stage, the designer experiences the progress of the cognitive tasks, all kinds of difficulties and obstacles, the gap between planning and practice, and the rescheduling of cognitive strategies. At the final stage, the designer experiences the effect of the cognitive activities, the evaluation of planning and practice, meta-cognitive activities such as the improvement of cognitive strategy, and emotional experiences such as gladness and sadness. So, it is very important to
arouse designer meta-cognitive experience in product design, because meta-cognitive experience can stimulate enthusiasm for cognitive activity and improve the validity of the cognitive process applied to design problems.

3.2.3 Meta-cognitive Operation

Meta-cognitive operation refers to a series of meta-cognitive activities that, once activated by meta-cognitive experience, monitor and regulate the designer's cognitive activities; the object of study is the designer's own cognitive activity. It is a continuous working process of different operative behaviors. It regulates and acts on cognitive activities directly and interacts with meta-cognitive experience and meta-cognitive knowledge. Here, the operative behaviors of meta-cognitive operation include choosing, controlling, feedback, monitoring, evaluating, comparing, and analyzing, all of which are self-consciously governed by the phenomenal consciousness called the “meta-cognitive center” in this model. These operative behaviors may execute concurrently or serially. For example, “monitoring-feedback-controlling” is a serial handling structure in the individual cognitive process of product design, whereas “choosing” meta-cognitive knowledge or design domain knowledge from the long-term memory module runs concurrently with “monitoring-feedback-controlling”. The meta-cognitive center is the core of meta-cognitive operation; it guides the operative behaviors and communicates with meta-cognitive knowledge. It is affected by the designer himself or herself and calls cognitive tasks with the corresponding cognitive strategy. Meta-cognitive operation revises meta-cognitive knowledge and responds to the activation of meta-cognitive experience. In the individual cognitive process of product design, meta-cognitive operation carries out meta-cognitive activity to monitor, control, and regulate the cognitive process, and interacts with the other sub-modules in the model.
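The serial “monitoring-feedback-controlling” structure can be illustrated with a toy control loop. This is only an illustrative sketch and not part of the authors' model: the classes `CognitiveProcess` and `MetaCognitiveCenter`, the numeric progress rates, and the rule of switching to the strategy with the highest known rate are all assumptions chosen to make the loop concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CognitiveProcess:
    """Toy cognitive level: each step advances the task by the active strategy's rate."""
    rates: Dict[str, float]  # hypothetical progress per step for each strategy
    strategy: str
    progress: float = 0.0

    def step(self) -> float:
        gain = self.rates[self.strategy]
        self.progress = min(1.0, self.progress + gain)
        return gain


@dataclass
class MetaCognitiveCenter:
    """Toy meta-cognitive level: monitors gains and controls the strategy."""
    min_gain: float  # assumed threshold below which progress counts as stalled
    log: List[str] = field(default_factory=list)

    def monitor(self, gain: float) -> float:
        # Monitoring: information flows from the cognitive to the meta level.
        return gain

    def feedback(self, observed_gain: float) -> bool:
        # Feedback: decide whether the current strategy is working.
        return observed_gain < self.min_gain

    def control(self, process: CognitiveProcess) -> None:
        # Controlling: information flows back down; switch to the best-known strategy.
        best = max(process.rates, key=process.rates.get)
        if best != process.strategy:
            self.log.append(f"switch {process.strategy} -> {best}")
            process.strategy = best


def regulate(process: CognitiveProcess, center: MetaCognitiveCenter,
             max_steps: int = 20) -> int:
    """Run the serial monitoring-feedback-controlling loop; return steps used."""
    for step_number in range(1, max_steps + 1):
        gain = process.step()
        observed = center.monitor(gain)
        if center.feedback(observed):
            center.control(process)
        if process.progress >= 1.0:
            return step_number
    return max_steps
```

In this toy setting, a process that starts with a slow strategy is redirected after its first step and then finishes with the faster one. The concurrent “choosing” of knowledge from long-term memory is deliberately left out of the sketch.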
3.2.4 Cognition of Product Design

Cognition of product design refers to a set of cognitive activities in the designer's consciousness that starts with receiving the stimulation of design requirements and design tasks and ends with the completion of a concrete design. This process is a special application of general cognitive activities to product design, an access point between meta-cognition and cognition, and the cognitive access point to design problems. It includes cognitive activities about product design, the cognitive process of product design, attention, the characteristics of the cognitive tasks of product design, cognitive effects, mental feeling, etc.

3.2.5 Product Design Expertise Knowledge in Long-Term Memory

From the moment a product design task arrives until the product is accomplished, all individual memory contents about product design, such as the expertise knowledge, experiences, and lessons that accompany all product design activities, are stored in the long-term memory module of product design knowledge, and this module serves meta-cognitive operation. Tulving [15] divides memory into episodic memory and semantic memory. Here, semantic memory refers to memory of the general knowledge and rules of product design, and relates to the connotation of concepts that emerges from the
whole product design process. The information in episodic memory, in contrast, comes from external information resources and concerns design experiences together with their concrete scenes and specific details. This module provides the expertise knowledge, domain knowledge, and other knowledge that the designer needs to carry out cognitive activities, and it supports meta-cognitive knowledge.
4 Relationship Between Meta-cognitive and Cognitive Activities and the Product Design Process

Individual cognitive activities in the product design process mainly focus on the image and cognition of the components, concepts, execution, and completion of cognitive tasks about design, and involve cognitive processes and mental activities such as sensation, perception, imagery, thinking, memory, and attention. Individual meta-cognition is cognition about product design cognition and a continuous process of realizing design tasks. The designers can be aware of the objective tasks and of their own cognitive process and cognitive results. Cognitive and meta-cognitive activities for product design are governed and regulated in an active and self-conscious state, through self-regulation, self-awareness, and self-control. The designers start their cognitive activities, such as sensation, perception, and attention, upon receiving the stimulation of design tasks. At the same time, meta-cognitive activities, such as meta-cognitive monitoring and meta-cognitive controlling, work concurrently. As the product design activities develop, cognitive and meta-cognitive activities continue to advance and improve. Finally, individual cognitive and meta-cognitive activities end with the completion of the design tasks. Observed at a particular time or place, the designers' cognitive and meta-cognitive activities appear dispersed, fragmented, and concurrent, but over the whole design process they proceed in sequence, in order, and in series.
5 Cognitive and Meta-cognitive Activities and Cognitive Differences in Fuel Injection Pump Design

The retrospective verbal protocols of two experienced designers and four novices were analyzed and compared to study the cognitive process and meta-cognitive activities that occurred in the PM fuel injection pump design process.

5.1 Cognition and Meta-cognition Analysis in Fuel Injection Pump Design

As soon as designers receive the design task of the PM fuel injection pump, their cognition and thinking begin to deal with the related design tasks and cognitive tasks in terms of task assignment, technology resources, strategy, role, potential problems, etc. The design information is sensed and perceived through the designers' vision and hearing, and the design requirements of the PM fuel injection pump, such as the type of matching engine and the key parameters, receive attention first. With the stimulation of design information, the meta-cognitive center handles the related cognitive information from
bottom to top. Meta-cognition analyzes the cognitive tasks, considers the designer's own role, cognitive goals, and intention, and monitors cognitive activities through the meta-cognitive center. Meta-cognitive operation takes effect in serial or concurrent mode, for example by planning the individual cognitive process, selecting a cognitive strategy, and comparing, in terms of mental feeling, this design task with earlier design tasks. At the same time, meta-cognitive operation inspires meta-cognitive experience, which activates meta-cognitive knowledge to call up the related knowledge and design scenario segments, such as PL and/or PM fuel injection pump design scenes. Designer meta-cognitive knowledge guides and affects meta-cognitive operation and comprehends meta-cognitive experience; in turn, meta-cognitive experience supports all kinds of operative behaviors. They interact with, restrict, collaborate with, and depend on one another to monitor, control, and regulate cognitive activities in product design. The design tasks and design intention of the PM fuel injection pump, its function-behavior-structure, and its sub-goals and sub-tasks need to be arranged, discussed, and decided in meetings and sub-meetings over several working days, which leads to designer cognition that appears dispersed and fragmented when observed in time and space. Designer cognitive and meta-cognitive activities govern and dominate individual behaviors, such as the verbal expression of the design scheme, the drawing practice, and the concrete design steps. Detailed design calculations and basic parameters are completed under the direction of design templates and design manuals or with professional software. Because these involve little creative activity, designers only need to notice, monitor, and control their cognitive activities. When designers encounter difficulties, they need to extract related experiences, knowledge, and shortcuts from the long-term memory module of product design.
Sometimes, designers need to activate individual imagination, creativity, and inspiration to complete the design task and design activities of the PM fuel injection pump. In general, designer cognitive and meta-cognitive activities in PM fuel injection pump design conform to the principle of economy.

5.2 Cognitive Differences of Different Designers

In a nutshell, the whole design process of the PM fuel injection pump contains two stages: the preparation of the design scheme, and the concrete design and calculation of the PM fuel injection pump. At the first stage, the experienced designers and the novices differ in cognitive plan, cognitive strategy, and the perception and prediction of the PM fuel injection pump design process and its detailed steps. The cognitive differences between them mainly concern cognitive effects, mental feeling, cognitive goal and intention, result prediction of cognitive tasks, cognitive process, and meta-cognitive activities, as shown in Fig. 2. For example, regarding cognitive effects and cognitive tasks, the experienced designers prefer to perceive the design tasks as a whole in order to plan their cognitive tasks and to transfer and use their design experience, whereas the novices focus their attention on design details and design difficulties; their cognitive strategies also differ. Furthermore, the experienced designers emphasize reusing the techniques, setting forms, materials, and tolerances of existing series products, like the IW fuel injection pump and the P fuel injection pump, and of mature products, like the PW2000 fuel injection pump, but no such use of experience was found in the novices.
Fig. 2. Meta-cognitive and cognitive activities and individual differences in the preparation of fuel injection pump design
Because the two groups differ in knowledge quantity, problem analysis, experience, and possession of shortcuts for similar design tasks, the design effects and design schemes they generate at this stage are clearly distinct. At the second stage, the experienced designers and the novices solve the detailed design problems and calculate the parameters. The cognitive differences concern the perception of key problems in the design process, design experience, knowledge quantity, and knowledge structure, and they show in the choice of methods for concrete parts, like the plunger and camshaft, and in the determination of parameters, like the pressure of fuel supply. For example, the novices design the plunger and plunger barrel according to the fuel delivery per cycle and the duration of feeding. The experienced designers, in contrast, analyze the parameters and historical data of the dimension chain and the maximum pressure at the pump end of the PL and PW2000 fuel injection pumps, and consider the influence of fuel supply rate, spray quality, and the pressure of the combustion system at the end of injection, in order to calculate the coefficient of plunger diameter/effective stroke and the chute inclination of the plunger. Table 1 shows a partial comparison of the cognitive differences between the experienced designers and the novices in the design process of the fuel injection pump. Owing to the differences in design role, cognitive tasks, cognitive strategies, knowledge structure, etc., the designers have different mental feelings, perceptions, cognitive activities, and meta-cognitive activities, and their activated meta-cognitive experience and meta-cognitive operative behaviors also differ.
Table 1. Partial cognitive differences between the experienced designers and the novices in fuel injection pump design

| Differences | Experienced Designers | Novices |
| --- | --- | --- |
| Cognitive People | Understand them and solve difficulty by easy stages, but lack creativity. | Deficiency of self-cognition, excessive self-confidence or negative in problem-solving, and sometimes creativity. |
| Cognitive Style | Trend to field dependence, reflective, divergence, holist | Trend to field independence, impulsive, convergence, serialist |
| Knowledge Quantity | Abundant expertise knowledge, domain knowledge, and practice experience | Only part expertise knowledge learned in university or by enterprise training |
| Organized Manner | Ordered, connected, and hierarchical organizing | Out of order, untrimmed, and random organizing |
| Cognitive Level of Problem | Simple and effective product design cognitive process | Form product design cognitive process gradually. |
| Manner of Knowledge Extraction | Extracting according to the rules of schema and hierarchy | Extracting in stochastic and disorder manner |
6 Conclusions
This paper explores the designer’s cognitive activities in the process of product design and provides a meta-cognition model of the engineering product designer, which affords a cognitive-psychology foundation for research on cognitive processes and meta-cognitive activities in the engineering design process. The core factors of the model are described and discussed in detail; they interact with, restrict, collaborate with, and depend on one another in the product design process. Meta-cognitive and cognitive activities in the process of product design are analyzed, and the cognitive differences between the experienced designers and the novices in the PM pump design process are compared. The model can support and serve cognition research in engineering design. Furthermore, meta-cognitive activity can guide the reuse of important tacit knowledge and provide the designer with effective knowledge, experience and the right design orientation. At the same time, this study provides a useful reference for research on cognitive and meta-cognitive activity in other domains.
Acknowledgments. This work is supported by the Shuguang Program of the Shanghai Educational Committee under grant No. 05SG15 and the National Basic Research Program of China (973 Program) under grant No. 2003CB317005.
User Oriented Design to the Chinese Industries Scenario and Experience Innovation Design Approach for the Industrializing Countries in the Digital Technology Era
You Zhao Liang, Ding Hau Huang, and Wen Ko Chiou
Chang Gung University, 259 Wen-Hwa 1st Road, Kwei-Shan Tao-Yuan, Taiwan, 333, R.O.C
[email protected]
Abstract. Designing for Chinese industries and the new China market has become a ‘hot’ issue within the global and Chinese industrial design community. Low labor costs and a hard-working Chinese workforce have contributed to rapid economic development within the region as a whole. The purpose of this paper is to analyze state-of-the-art industrial development within Taiwan and Mainland China, and to evaluate the critical problems of industrial design development in both regions. It additionally examines how Taiwan Chinese digital technology industries confront this situation with user-oriented design (UOD). The paper synthesizes six approaches into an innovative product development framework of new product development procedures, with user-oriented scenario prediction and an experience innovation approach. These approaches not only generate original design data from a user’s point of view, but also make it much easier to reach consensus within product development teams and to create genuinely innovative designs through interdisciplinary collaboration, leading to innovative cultural enterprises.
Keywords: User oriented design, Scenario approach, Innovation design, Industrializing countries, Digital technology.
own, preferring instead to copy or imitate those products that are already available in highly industrialized countries. Most manufacturers in the region involve themselves more with technical and production problems and with upgrading their production and technical quality. It is thus obvious that most makers are primarily concerned with ‘how to produce’ rather than with ‘what to produce’. In the past Taiwan developed the export of low-priced items based on the island’s competitive edge, which stems from relatively low labor costs. Taiwan has been competing in terms of ‘price’ rather than ‘quality’. The product itself has not been considered ‘important’ and manufacturers have spent comparatively little on it. This situation has been changing as other nations with even lower labor costs are producing lower-priced products. Looking particularly at the recent history of Taiwan, the slow but steady implementation of industrial design reflects this dilemma. This history can be grouped into three periods. The first, the economic industrial development period from 1966 to 1973, focused on ‘design as a tool’ in developing products which must satisfy local users’ needs as well as environmental requirements. The second, the export industries development period from 1973 to 1989, emphasized ‘design as a bridge’ between foreign buyers and local manufacturers. The third period, the industrial period from 1981 to the present, has implemented ‘design as a tool’ in developing unique Taiwanese products for the global market. The purpose of this paper is therefore to analyze state-of-the-art industrial development within Taiwan and Mainland China, and to evaluate the critical problems of industrial design development in both regions. It additionally examines how Taiwan Chinese digital technology industries confront this situation with user-oriented design.
2 The Value of Design
First, we outline how product design and development actually work.
2.1 Definition and Scope of Industrial Design
A number of managers in Taiwan’s local industries have understood that industrial design is a very important element in industry. However, it is still necessary to clarify the role of industrial design as something more than cosmetic ‘face-lifting’ or the creation of a ‘nice outer shell’ surrounding technology in general. In this respect, we would like to quote the definition of industrial design as formulated by the International Council of Societies of Industrial Design (ICSID): “industrial design is a creative activity. Its objective is to improve human life and its environment through product design which satisfies user’s needs and habits, and is concerned with their functional and emotional requirements” [3]. Today, most top managers in global business enterprises have recognized the importance of industrial design, not only as an important specialized field during the product development process, but also as a quality ‘tool’.
2.2 Product Design and Value Planning
To further illuminate the issue, we would like to quote the industrial design policy of the Concern Industrial Design Center (CIDC) of Philips Netherlands [2]: “It is the task of the CIDC to transform technology into products which are simple to produce, ergonomically correct, safe and easy to use and to service, and which are also aesthetically appealing, thereby improving man’s comfort and environment”. Based on this policy we can list the main design factors as ‘function’, ‘use’, ‘appearance’ and ‘production’. Each factor significantly influences a product’s quality and value. The relationship between the factors can be formulated as:

$$ V = \frac{Q}{C} = \frac{F + U + A}{C} $$

where V = value, C = cost of materials and production, Q = quality, F = function, U = use, and A = appearance.
2.3 Function of Product Design
Many managers in the Taiwan region imagine the function of product design to be simply a product’s engineering and manufacturing. Others may think of it in terms of electronics. In fact, design can be defined as a conscious plan. Its main contribution to product development lies in the synthesis of a concept using carefully assembled facts. Design skills may be defined in relation to the type of product and may also be related to the various functions of the designer. The three main groups directly involved in the product design and development process are ‘the marketing group’, ‘the technical development and production group’, and ‘the industrial design group’. Teamwork is the key word during the product development process. All specialists involved cooperate according to a systematic product development pattern, and they must be competent enough to coordinate their specialized ‘optimal solution’ with the expected holistic solution. This coordination creates an optimal product or product system and, at the same time, prevents the dominance of one function over another. The product development procedure is a systematic process which integrates all product design and development activities from the idea stage to mass production, to ensure that the product meets the market’s and consumers’ time and price needs. Product development also works as a coordinator and integrator to ensure that every functional division works as an integrated team and maintains good communication with full commitment to the project goal.
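As a minimal sketch of the value relation in Section 2.2, the following Python function computes V = Q/C with Q = F + U + A. The function name, the numeric scores and the cost figure are hypothetical illustrations; the text does not prescribe units or a scoring scale for the factors.

```python
def product_value(function: float, use: float, appearance: float, cost: float) -> float:
    """Return V = Q / C, where Q = F + U + A (relation from Section 2.2)."""
    if cost <= 0:
        raise ValueError("cost of materials and production must be positive")
    quality = function + use + appearance  # Q = F + U + A
    return quality / cost                  # V = Q / C


# Hypothetical scores on an arbitrary common scale, for illustration only.
baseline = product_value(function=3.0, use=2.0, appearance=1.0, cost=2.0)
restyled = product_value(function=3.0, use=2.0, appearance=3.0, cost=2.0)
print(baseline, restyled)  # 3.0 4.0
```

Under this relation, raising any one quality factor at constant cost raises the value, which is consistent with the statement that each factor influences a product’s quality and value.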
3 Experience in Taiwan
On the Pacific rim, Mainland China seems to be following in the same footsteps as the Taiwan Chinese, that is, developing its industries on the basis of original equipment manufacturer (OEM) orders, and then trying to upgrade to an original brand manufacturing (OBM) level through original design manufacturing (ODM) business. We therefore describe the experience of Taiwanese industry, especially in product design and development, as follows.
3.1 The Gap of Smile Curve and Its Shifting
‘ACER’ has a famous brand image the world over. It symbolizes that Taiwan not only has a manufacturing industry, but can also create a brand which embodies value. The founder of ACER described the current characteristics of the electronic equipment manufacturing industry with a theory called the ‘smile curve’. Within the smile curve, the two sides are marketing and development; the manufacturing function is in the middle [2]. Mr. Shih encouraged Taiwan industry to move toward these two functions of the smile curve within the global value system. Marketing and development have higher added value throughout the industry. Taiwan should not stay in the middle of the curve, which has lower value within the industry. Therefore Taiwan should develop industries on the basis of OEM orders and upgrade to an OBM level through ODM business (as shown in Fig. 1).
Fig. 1. Smile curve
However, a product developed without market strategy, positioning and user knowledge will find it hard to gain customer acceptance and to become a market-recognized brand. Therefore, in the knowledge economy era we should shift the strategy from the ‘smile curve theory’ to the ‘close cycle concept’. We should integrate manufacturing knowledge at the bottom of the smile curve, technology knowledge on the left side and marketing knowledge on the right side, but more importantly should add the content knowledge of user-oriented needs at the top of the cycle (as shown in Fig. 2).
3.2 The Missing Link from OEM to OBM
In product development practice in Taiwan, which reflects the industrial development gap, the term ODM actually means ‘own development manufacturing’: we are qualified in technical and engineering development, but usually only offer an ongoing solution with ‘me too’ (follower) design, rather than applying user-oriented design (UOD) principles.
Fig. 2. Smile curve shifting [figure not reproduced; recoverable labels: close cycle; Gap 1; original design manufacturing; own development manufacturing; engineering knowledge; research and development; strategic innovation design; marketing, branding and sales knowledge (OBM); manufacturing knowledge (OEM)]
Fig. 3. Reasonable product innovation development process [figure not reproduced; recoverable labels: Gap II; market strategy; R&D; hi-management, hi-tech, hi-design; design; engineering; ‘me too’; manufacturing; DCOR]
However, a reasonable innovation design process should first define the direction of the innovation strategies, then conduct R&D and design according to the goals of those strategies, and only then put the result into manufacture. Moreover, we should be concerned with both technological innovation and product strategy at the same time, and should build brand image efficiently through interdisciplinary collaborative design and marketing-based value. In order to bridge the gap of industrial development in Taiwan, UOD and ‘interdisciplinary collaboration’ integration should be emphasized. It is proposed to upgrade own development manufacturing (follower design) to original design manufacturing and to build design strategies comprising hi-design on a user-oriented base, hi-tech on a technology base and hi-management on an interdisciplinary collaboration base (as shown in Fig. 3).
4 Scenario and Experience Innovation Design Approach
This paper recommends six approaches for carrying out a user-oriented innovative product development framework of new product development procedures, which can be applied to a series of practical cases. The approaches are as follows:
4.1 I-Ching and Darwin’s Natural Law
I-Ching (the theory of change) and Darwin’s natural law are applied to describe the principle by which form, shape and function are created and develop from the ‘natural environment and scenario’ in which things live.
4.2 Competitive Product Appraisal and Monitoring Competition
As products are developed following their ‘field of use’ and ‘use scenario’, monitoring competition from the users’ point of view and market positioning assists in evaluating their advantages and disadvantages in order to position and define the competitive advantage.
4.3 Macro Vision Scenario
This approach takes an economic, social and technological point of view, defining the product opportunity from a macro vision in order to develop the key issue/s for new product development.
4.4 Micro Scenario
This approach defines the target user group and the detailed scenario situations and activities, derived from the above product opportunities, that interact with the product/s (the user target groups are generated from character mapping, which is defined from a set of attributes and relates to the product and users). From the micro scenario, key issues and design requirements for new products can be identified. The above approaches not only generate original design data from a user’s point of view, but also make it much easier to reach consensus within product development teams and to create genuinely innovative design/s through interdisciplinary collaboration, leading to innovative cultural enterprises.
4.5 Scenario Observation
Observations of actual situations and interaction with actual sampled characters verify the critical issues and design requirements generated from the micro scenarios, so that the final design definitions become evident.
4.6 Design Development and Scenario Verification
Scenario simulation and scenario verification are facilitated by means of rough ‘mock ups’, prototyping and ‘field test sampling’ to experience and verify users’ scenario/s in order to refine designs and to reduce risks from both users’ and business’ points of view.
5 User Oriented Innovation Design Concept
With this approach, we collaborated with ADVANTECH Co. Ltd [1], Taiwan, which is a leader in the industrial computing and automation market. The above methods were applied, with UOD scenario prediction and the experience innovation approach, to a series of interactive interface products for e-automation systems at ADVANTECH, including an industrial automation e-platform, a service automation e-platform (medical, vehicle) and a home automation e-platform. The innovative UOD concept for e-automation industries is shown in Fig. 4.
Fig. 4. Innovative UOD concept [1]
6 Conclusion
The most important consideration for managers in this region is the development of marketing and design and not just technology and production. Products have to be designed to closely fit the market and complement users’ life-styles, needs and habits.
It is also essential for our region’s producers to think more along the lines of long-term advantages instead of immediate profit. Manufacturers have to put more effort into creating new products as well as improving existing products. They must simultaneously establish their own corporate identity and product image to further their global development. These goals are best met by adhering to a set procedure of product development. This will both give the customer what he desires and generate an inbred ‘quality consciousness’ toward innovative design manufacturers. As noted earlier, product development is the coordinator and integrator of the entire product development cycle. It ensures that the overall program stays on schedule and that the product introduction date is met. Most important, the whole concept is based on the premise that the customer is the boss.
7 Implications
Taiwan is an island with a population of 23 million; the market is too small for new innovative products to survive unless enterprises scale up to international markets. With a population of 1.3 billion, Mainland China’s market and industries will have many more opportunities for developing innovative UOD in today’s knowledge economy era.
References
1. ADVANTECH: http://www.advantech.com/
2. CIDC: Concern Industrial Design Centre, Philips, Nederland: http://www.design.philips.com
3. ICSID: http://www.icsid.org/
4. Shi, Z.R.: Acer reconstruction: Initiating, growing up and challenge. Commonwealth Publishing (2004)
Emotional Experiences and Quality Perceptions of Interactive Products
Sascha Mahlke1 and Gitte Lindgaard2
1 Centre of Human-Machine Systems, Berlin University of Technology, Franklinstr. 28/29 – FR2-7/2, 10587 Berlin, Germany, [email protected]
2 Human-Oriented Technology Lab, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada, [email protected]
Abstract. Over the past few years, various novel approaches have been applied to the evaluation of interactive systems. Particularly, the importance of two categories of concepts has been emphasized: non-instrumental qualities and emotions. In this paper we present an application of an integrative approach to the experimental study of instrumental and non-instrumental quality perceptions as well as emotional user reactions as three central components of the user experience. A study is presented that investigates the influence of system properties and context parameters on these three components. The results show that specific system properties independently influence the perception of instrumental (i.e. usability) and non-instrumental qualities (i.e. visual aesthetics). Especially the perception of instrumental qualities was shown to have an impact on the users’ emotional reactions (subjective feelings as well as cognitive appraisals). There was also evidence suggesting that context parameters influenced emotional user reactions.
Mahlke [2] reviewed various approaches to the study of non-instrumental quality aspects. Briefly, he argued that two distinct categories of non-instrumental qualities have been differentiated in most approaches. On the one hand, aesthetic aspects have been discussed. These contain first and foremost visual aspects of product appearance, but can also imply other sensory experiences like haptic or auditory aspects of product use, as for example discussed by Jordan [3] and captured in his definition of physio-pleasure. The other category refers to a symbolic dimension of product appearance. The concept of hedonic quality discussed by Hassenzahl [4] belongs to this category, which is similar to what Jordan [3] calls socio- and ideo-pleasure. Although much is being said about non-instrumental quality aspects and their application to design, only a few empirical studies actually measuring these have been reported. In a study of the interplay of non-instrumental quality perceptions with other concepts, Tractinsky, Katz and Ikar [5] highlighted the connection between aesthetics and usability. They argue that users’ aesthetic judgments made before using an interactive system affect their perceived usability even after using it. Lindgaard and Dudek [6] found a more complex relationship between these two concepts. Hassenzahl [4] studied the interplay between usability and hedonic quality in forming overall judgments concerning beauty and goodness. He found that judgments of beauty are more influenced by the user’s perception of the hedonic qualities, while judgments of goodness, as a more general evaluative construct, are affected by both hedonic quality and usability. Although a few empirical studies do exist that contribute to a better understanding of the role of non-instrumental qualities and their interplay with other relevant aspects of technology use, many questions remain to be addressed.
In particular, the relationships between quality perceptions and emotional experiences have barely been explored.
1.2 Emotions as Part of the User Experience
Rafaeli and Vilnai-Yavetz [7] attempted to link quality perceptions and emotional experience. They suggested that artifacts should be analyzed in terms of three conceptually distinct quality dimensions: instrumentality, aesthetics, and symbolism. They conducted a qualitative study in a non-interactive product domain to better understand the influence of these three quality dimensions on emotional responses. All three categories contributed significantly to the emergence of emotion. Tractinsky and Zmiri [8] applied this idea to an interactive domain by studying various existing websites, which yielded similar results, and Mahlke’s [9] study on actual audio players showed that various instrumental and non-instrumental quality perceptions influenced users’ emotional responses. While Rafaeli and Vilnai-Yavetz [7] used interviews, Tractinsky and Zmiri [8] and Mahlke [9] applied questionnaires to assess users’ emotional responses. All these studies focused on the subjective feelings that arise when perceiving or using the relevant products. Much research has been conducted on measurements of emotion during interaction with technical devices, and different methods have been proposed to measure emotions in interactive contexts. Mahlke, Minge and Thüring [10] used Scherer’s [11] multi-component model of emotion to structure a range of relevant emotion-measurement methods and relate them to the five components of emotion: subjective feelings, facial expressions, physiological reactions, cognitive appraisals and behavioral tendencies. Taken together, there are two major problems with the interpretation of results emerging from the studies reported above that relate emotional experiences during the interaction to users’ quality perceptions [7, 8, 9]:
1. They took a quasi-experimental approach by using existing products. As it was not discussed which properties of the stimuli or other variables influenced quality perceptions and the emotional experience, this question remains unanswered.
2. Rather than measuring all five components of Scherer’s [11] model, only subjective feelings were measured as indicators of emotions.
1.3 Research Approach
Mahlke and Thüring [12] describe an integrated research approach to the experimental study of emotional user reactions considering both instrumental and non-instrumental quality perceptions of interactive systems. Their model defines instrumental and non-instrumental quality perceptions as well as emotional reactions as three central components of the user experience, claiming that characteristics of the interaction affect all three of these. These characteristics primarily depend on system properties, but both user characteristics and context parameters, like aspects of the tasks and the situation, can play an important role. The outcomes of the users’ interactive experience, as expressed in overall judgments of a product, usage behavior or choices of alternatives, are shown to involve all three components, namely emotional user reactions as well as instrumental and non-instrumental quality perceptions. This model has been applied to study the influence of system properties on the three user experience components and users’ overall appraisal of the system [12].
In an effort to affect the perception of instrumental qualities as well as user performance, the level of usability was systematically varied, and other system properties expected to affect the perception of visual aesthetics were modified as well. Emotions were measured in terms of subjective feelings, motor expressions and physiological responses. The results confirmed that the manipulations had the predicted impact on the perception of both instrumental and non-instrumental qualities. Prototypes high in usability and attractiveness were rated significantly more highly than those that were low in both aspects. The results of the questionnaire assessing subjective feelings showed an effect of both factors. They revealed that the effect of variations in usability was greater than that of variations in visual aesthetics on both valence and arousal measures. Consequently, the high-usability/high-aesthetics prototype was experienced as most satisfying, while the low-usability/low-aesthetics prototype was found to be most annoying. Since no statistical interaction of usability and aesthetics was found, both factors contributed additively to these emotions. EMG data of facial muscle sites and other physiological measures (dermal activity and heart rate) supported this interpretation. The following study is based on the same research approach, but differs in two aspects. First, the measurement of emotions focuses on subjective feelings and cognitive appraisals to learn more about another component of emotions defined by Scherer [11], and second, task demands were varied as an example of contextual parameters. Hassenzahl, Kekez and Burmester [13] found that the influence of instrumental and non-instrumental quality perceptions on overall judgments differs depending on whether users are in a goal- or action-mode. In the goal-mode participants were required to accomplish given tasks, while in the action-mode they had the same amount of time to explore the system on their own. This variation was applied to investigate the effect of context parameters on emotional responses. The following predictions were made:
1. The versions with higher levels of usability and/or visual aesthetics would lead to higher instrumental and/or non-instrumental quality ratings.
2. Quality ratings would not be influenced by the usage mode [13].
3. The versions with higher levels of usability and/or visual aesthetics would lead to differences in the cognitive appraisal of the usage situation and more positive subjective feelings.
4. In goal-mode, the correlation between instrumental quality perceptions and subjective feelings would be higher than between non-instrumental quality perceptions and subjective feelings. In action-mode the opposite would be found.
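Prediction 4 compares correlations within a usage mode. A minimal sketch of such a comparison is given below, assuming Pearson’s r as the correlation measure (the text does not name one) and using made-up ratings purely to show the computation; `pearson_r` and the sample lists are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up ratings for participants in one usage mode (illustration only).
usability = [0.2, 0.4, 0.6, 0.8]   # instrumental quality perception
aesthetics = [0.5, 0.3, 0.6, 0.4]  # non-instrumental quality perception
valence = [1.0, 2.0, 3.0, 4.0]     # subjective feeling

r_instrumental = pearson_r(usability, valence)
r_non_instrumental = pearson_r(aesthetics, valence)
# For goal-mode, prediction 4 expects r_instrumental > r_non_instrumental.
print(r_instrumental > r_non_instrumental)
```

In an actual analysis the two coefficients would be computed separately for the goal-mode and action-mode groups and then compared.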
2 Method
The variables investigated concerned the influence of system properties associated with usability and aesthetics of the system, and of task demands (goal- versus action-mode), on the perception of instrumental and non-instrumental qualities and on emotional user reactions. The latter included subjective feelings and cognitive appraisals.
2.1 Participants
Eighty undergraduate students (48 women, 32 men) participated in the study. They were between 18 and 54 years old (average 21.3 years) and received course credit for participation in the study. Most of the participants (n = 72) owned a portable audio player and used it regularly. Almost all (n = 78) used computers daily.
2.2 Material
Portable audio players were chosen as the domain of study and different versions were simulated on a computer. The aim of the variation of system attributes was to influence perceived usability and aesthetics of the system independently. To produce two versions with different levels of usability, three system features were varied: the number of menu lines shown (five versus two), a scrollbar indicating available but hidden menu items (given or not), and a cue about the present position in the menu hierarchy (given or not). These variations had been used in a previous experiment [12], in which their effect on usability was in the predicted direction, that is, the most usable version resulted in the highest usability ratings. With respect to system features designed to influence the perception of visual aesthetics, two different body designs were used in the earlier experiment [12], varying in symmetry (high or low), color combination (high or low color differences) and shape (round or square). Because these manipulations resulted in only small differences in perceived aesthetics between the two versions, an attempt was made here to improve the high-aesthetic version by consulting a professional designer.
The prototypes were presented on a 7” TFT-display with touch screen functionality that participants could hold in their hands for providing input. The display was connected to a computer which ran the simulation of the audio player. 2.3 Design Three independent variables were manipulated: ‘usability’, ‘visual aesthetics’, and ‘mode’ (goal- vs. action-mode). Since each of the variations of ‘usability’ and ‘visual aesthetics’ had two levels (‘high’ and ‘low’), four prototypes were created: (a) ‘highusability’ and ‘high-aesthetics’, (b) ‘high-usability’ and ‘low-aesthetics’, (c) ‘lowusability’ and ‘high-aesthetics’, (d) ‘low-usability’ and ‘low-aesthetics’. In the goalmode participants were required to accomplish a set of tasks, and in the action-mode they were freely browsing the system for the same amount of time. All three variables were between-subjects factors. 2.4 Measures Two types of behavioral data were recorded in the goal-mode condition to ensure that versions of assumed high or low usability differed as planned: task completion rates and time on task. Questionnaires were employed to assess the user’s perception of instrumental and non-instrumental qualities. Selected sub-dimensions (controllability, effectiveness, helpfulness, learnability) of the Subjective Usability Measurement Inventory (SUMI) [14] served to rate usability. The dimension ‘classical visual aesthetics’ of a questionnaire developed by Lavie and Tractinsky [15] was used to measure visual aesthetics. Subjective emotional data were obtained via the Self-Assessment Manikin (SAM) [16] which captures the quality, or valence (positive/negative), and intensity (arousal) of emotions. Cognitive appraisals were obtained via a questionnaires based on the Geneva Appraisal Questionnaire [17]. It measures five appraisal dimensions: intrinsic pleasantness, novelty, goal/need conduciveness, coping potential, and norm/self compatibility. 
Novelty is a measure of familiarity and predictability of the occurrence of a stimulus, while intrinsic pleasantness describes whether a stimulus event is likely to result in a positive or negative emotion. A goal conduciveness check establishes the importance of a stimulus for the current goals or needs. Coping potential refers to the extent to which an event can be controlled or influenced. Norm/self compatibility describes the extent to which a stimulus satisfies external and internal standards.

2.5 Procedure

The experiment took roughly 30 minutes on average. Participants were given instructions describing the experimental procedure and the use of SAM. They were then asked to rate their subjective feelings as a baseline measure. Then, depending on the experimental condition to which they had been randomly assigned, the relevant player was presented and participants rated its visual aesthetics. Next, they read a short text describing how to use the system.
Emotional Experiences and Quality Perceptions of Interactive Products
Participants were then asked either to complete the set of five tasks or to explore the system for a certain amount of time. In the goal-mode condition a limit of two minutes was set for each task. Typical tasks were ‘Please have a look which songs you find on the player in the Genre POP’ or ‘Please change the sound setting of the player to CLASSIC’. However, participants actually completed the five tasks in five minutes on average. Therefore, a five-minute time limit was also set for the browsing participants. In the task condition participants filled in SAM scales after the first, third and fifth task. In the browsing condition, they were asked to rate their current subjective feeling after one, three and five minutes of exploration. At the end of this, the cognitive appraisal questionnaire was completed and usability ratings were obtained.
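The 2x2x2 between-subjects design from Section 2.3 and the random assignment mentioned in Section 2.5 can be illustrated with a short sketch. It is not the authors' procedure: the equal cell size of ten participants per condition, the participant numbering and the names `FACTORS`, `build_conditions` and `assign_participants` are assumptions made only for this example.

```python
import itertools
import random

# Factor levels as named in Section 2.3.
FACTORS = {
    "usability": ["high", "low"],
    "visual_aesthetics": ["high", "low"],
    "mode": ["goal", "action"],
}


def build_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]


def assign_participants(participant_ids, conditions, seed=None):
    """Randomly assign participants to conditions with equal cell sizes.

    Equal cell sizes are an assumption of this sketch, not a reported fact.
    """
    shuffled = list(participant_ids)
    if len(shuffled) % len(conditions) != 0:
        raise ValueError("participants cannot be split evenly across conditions")
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    per_cell = len(shuffled) // len(conditions)
    # The first `per_cell` shuffled ids go to condition 0, the next to 1, ...
    return {pid: conditions[i // per_cell] for i, pid in enumerate(shuffled)}


conditions = build_conditions(FACTORS)          # 2 x 2 x 2 = 8 cells
assignment = assign_participants(range(1, 81), conditions, seed=42)
```

Shuffling the full participant list once and cutting it into equal blocks keeps the cells balanced, which a per-participant coin flip would not guarantee.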
3 Results

A 2x2 ANOVA for ‘usability’ and ‘visual aesthetics’ was performed on the goal-mode data only, assessing task-completion rates and task-completion time. There was a significant main effect for ‘usability’ only, for both task-completion rates, F(1,38)=9.20, p < .01, and task-completion time, F(1,38)=13.10, p < .01. Thus, high usability led to better performance on both measures.

3.1 Instrumental and Non-instrumental Quality Perception

Table 1 summarizes the average usability and visual aesthetics ratings for each condition. The ratings were transformed to values between 0 and 1 because the range of ratings differed between the variables. The table shows that the average ratings were comparatively high even in the low-usability and the low-aesthetics conditions.

Table 1. The first number in each cell represents the average usability rating and the second number the average visual aesthetics rating for each condition (ratings are transformed to values between 0 and 1)
A 2x2x2 ANOVA for ‘usability’, ‘visual aesthetics’ and ‘mode’ performed on the usability ratings revealed a significant main effect for ‘usability’ only, F(1,72)=9.0, p < .01. A similar 2x2x2 ANOVA carried out on the visual aesthetics ratings showed a significant main effect for ‘visual aesthetics’ only, F(1,72)=34.3, p < .001. Consistent with hypotheses 1 and 2, this suggests that the system properties affected the perception of both instrumental (i.e. usability) and non-instrumental qualities (i.e. visual aesthetics), and that quality perceptions were not influenced by usage mode.
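The transformation of ratings to values between 0 and 1 mentioned for Table 1 is not specified further. A minimal sketch, assuming a plain linear (min-max) rescaling, is shown below; the scale ranges 1 to 5 and 1 to 7 and the function name `rescale_to_unit` are hypothetical and serve only to show how ratings from differently sized scales become comparable.

```python
def rescale_to_unit(rating, scale_min, scale_max):
    """Linearly map a rating from [scale_min, scale_max] onto [0, 1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    return (rating - scale_min) / (scale_max - scale_min)


# Hypothetical mean ratings on two scales with different ranges.
usability_unit = rescale_to_unit(4.0, 1, 5)    # (4.0 - 1) / (5 - 1)
aesthetics_unit = rescale_to_unit(5.5, 1, 7)   # (5.5 - 1) / (7 - 1)
```

Both hypothetical ratings end up at the same relative position on their scales, which is the point of rescaling before placing them side by side in one table.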
3.2 Emotional User Reactions

A series of 2x2x2 ANOVAs for ‘usability’, ‘visual aesthetics’ and ‘mode’ on each of the five cognitive appraisal dimensions showed that participants rated the intrinsic pleasantness of the interaction higher for the high-usability than for the low-usability version, F(1,72)=3.9, p < .05. Furthermore, the experience with the low-usability system was rated as more novel, F(1,72)=5.6, p < .05, and self/norm compatibility was higher for the high-usability version, F(1,72)=5.2, p < .05. Neither ‘visual aesthetics’ nor ‘mode’ influenced intrinsic pleasantness, novelty or self/norm compatibility, and neither goal conduciveness nor coping potential showed a significant effect for any of the independent variables. In summary, we found partial support for hypothesis 3: cognitive appraisals differed on three of the five appraisal dimensions, and only the factor ‘usability’ had a significant influence.

For the analysis of subjective feelings we calculated, for each participant, the changes from the baseline value obtained at the beginning of the experiment to the three values assessed during the interaction. For the changes from the baseline to the first two assessments of subjective feelings, the 2x2x2 ANOVAs with ‘usability’, ‘visual aesthetics’ and ‘mode’ as independent variables revealed no significant effects for either valence or arousal. Figure 1 shows the average subjective feeling changes to the third data point at the end of the interaction for the four prototypes. A 2x2x2 ANOVA for ‘usability’, ‘visual aesthetics’ and ‘mode’ with the changes in valence as dependent variable revealed a significant effect for ‘usability’ only, F(1,72)=25.5, p < .05. The ANOVA for arousal as dependent variable showed no significant effects. Thus, only ‘usability’ affected the valence of subjective feelings, which again only partially supported hypothesis 3.
Fig. 1. Changes of subjective feeling ratings from the beginning of the experiment to the third assessment during the interaction with the system for the four systems (axes: valence and arousal; squared high vs. round low usability; filled high vs. unfilled low aesthetics; SAM ratings were between 0 and 8)
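The change scores described above (each in-session SAM rating minus the participant's baseline rating) can be sketched as follows. The ratings are invented for illustration and the names `changes_from_baseline` and `participants` are hypothetical; only the 0 to 8 SAM range and the three assessment points come from the text.

```python
from statistics import mean


def changes_from_baseline(baseline, assessments):
    """Return each later rating minus the baseline rating."""
    return [value - baseline for value in assessments]


# Invented valence ratings on the 0-8 SAM scale: one baseline value and
# three assessments during the interaction per participant.
participants = {
    "p01": {"baseline": 5, "assessments": [5, 6, 7]},
    "p02": {"baseline": 4, "assessments": [4, 3, 5]},
    "p03": {"baseline": 6, "assessments": [6, 6, 6]},
}

changes = {
    pid: changes_from_baseline(data["baseline"], data["assessments"])
    for pid, data in participants.items()
}

# Average change at the third assessment, the quantity plotted per
# prototype in Fig. 1 (here computed over all invented participants).
mean_third_change = mean(change[2] for change in changes.values())
```

Working with differences from baseline removes each participant's initial mood from the comparison, so that only the change during the interaction is analyzed.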
To test prediction 4 we computed partial correlations between the usability and visual aesthetics ratings and subjective feelings in the two usage situations. As shown in Table 2, in the goal-mode we found a high correlation between perceived usability and valence, but none between perceived aesthetics and valence. For arousal none of the correlations was significant. In the action-mode, valence showed moderate, significant correlations with perceived usability and also with perceived aesthetics. For arousal again none of the correlations was significant.
Table 2. Correlation coefficients between quality ratings (usability and visual aesthetics) and subjective feelings (valence and arousal)

Correlation | Goal-mode (tasks) | Action-mode (exploration)
perceived usability – valence | .66 a) ** | .35 a) *
perceived aesthetics – valence | -.01 b) | .35 b) *
perceived usability – arousal | -.16 a) | -.19 a)
perceived aesthetics – arousal | .22 b) | .04 b)

Partial correlation coefficients with * p < .05; ** p < .01; a) visual aesthetics controlled; b) usability controlled
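The paper does not state how the partial correlations in Table 2 were computed. As a minimal sketch, assuming the standard first-order partial correlation formula, the correlation between two ratings with a third held constant can be derived from the three pairwise Pearson correlations. The data and the names `pearson` and `partial_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, control):
    """Correlation of x and y with the linear effect of `control` removed."""
    r_xy = pearson(x, y)
    r_xc = pearson(x, control)
    r_yc = pearson(y, control)
    denominator = math.sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))
    if denominator == 0:
        raise ValueError("control variable fully determines x or y")
    return (r_xy - r_xc * r_yc) / denominator


# Invented ratings: usability and valence, with aesthetics as the control.
usability = [1, 2, 3, 4]
valence = [2, 1, 4, 3]
aesthetics = [4, 2, 2, 4]

usability_valence = partial_correlation(usability, valence, aesthetics)
```

In this invented data set the control variable is uncorrelated with both ratings, so the partial correlation equals the plain correlation of 0.6; with a correlated control the two values would differ.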
4 Discussion

As stated in hypothesis 1, system properties independently influenced instrumental as well as non-instrumental quality perceptions. Both the usability and the aesthetics manipulations affected subjective perceptions in the predicted directions. In contrast to other studies [5, 18], we did not find any influence of the visual aesthetics variation on perceived usability. One reason may be that other studies used an overall usability rating, while we applied a detailed measure of usability. No effect of the factor ‘mode’ on quality perceptions was found (prediction 2), as one would have expected based on Hassenzahl et al.’s [13] findings.

The integration of cognitive appraisals as another component of emotions followed the recommendation by Mahlke et al. [10] to consider different components of emotions. We found an influence of the factor ‘usability’ on cognitive appraisals. The interaction with the low-usability system was experienced as less intrinsically pleasant, which corresponds to the findings regarding subjective feelings. Furthermore, participants rated it as more novel or unusual, which may have led to more negative subjective feelings. The low-usability system was also rated as less self/norm compatible. Although this experiment is another step towards the study of cognitive appraisals in interactive contexts, further research is clearly needed on this topic.

The users’ subjective feelings were affected only by variations in usability, and only the valence dimension was influenced. Participants’ subjective feelings were more positive in the high-usability condition towards the end of the experiment compared to the beginning. Surprisingly, we did not find an effect of ‘visual aesthetics’, although we tried to increase the differences in visual aesthetics in comparison to a previous experiment [12].
The variation of usage mode revealed differences in the connections between quality perceptions and participants’ subjective feelings. These differences were most pronounced for the subjective feeling dimension of valence. While there was a high correlation between the valence of users’ subjective feelings and the perceived usability of a system and no correlation with the perceived visual aesthetics when participants focused on the given tasks in the goal-mode, we found moderate correlations between valence and both perceived usability and aesthetics when participants were merely exploring the system. These results indicate that context
parameters like usage mode influence both the importance of specific quality dimensions for overall judgments [13] and the quality of the emotional experience. However, more research is needed on these relationships, especially with respect to the subjective feeling dimension of arousal. In future studies the influence of user characteristics should also be studied in addition to system properties and context parameters. Furthermore, the variation of system properties that influence non-instrumental qualities other than visual aesthetics (e.g. haptic and acoustic quality) may reveal important insights, especially for the domain of consumer electronic products.

Acknowledgements. This research was supported by the German Research Foundation (DFG) as part of the Research Training Group ‘Prospective Engineering of Human-Technology Interaction’ (no. 1013) and by the German Academic Exchange Service (DAAD) with a travel grant. We would like to thank Lucienne Blessing, Manfred Thüring and various colleagues at the Center on Human-Machine-Systems in Berlin and the Human-Oriented Technology Lab in Ottawa for the discussions on the study.
References

1. ISO: ISO 9241: Ergonomic requirements for office work with visual display terminals. Part 11: Guidance on usability. ISO, Geneva (1998)
2. Mahlke, S.: Aesthetic and Symbolic Qualities as Antecedents of Overall Judgements of Interactive Products. In: Bryan-Kinns, N., Blandford, A., Curzon, P., Nigay, L. (eds.) People and Computers XX - Engage, pp. 57–64. Springer, Heidelberg (2006)
3. Jordan, P.W.: Designing pleasurable products. Taylor & Francis, London (2000)
4. Hassenzahl, M.: The Interplay of Beauty, Goodness, and Usability in Interactive Products. Human-Computer Interaction 19, 319–349 (2004)
5. Tractinsky, N., Katz, A.S., Ikar, D.: What is beautiful is usable. Interacting with Computers 13, 127–145 (2000)
6. Lindgaard, G., Dudek, C.: What is the evasive beast we call user satisfaction? Interacting with Computers 15(3), 429–452 (2003)
7. Rafaeli, A., Vilnai-Yavetz, I.: Instrumentality, aesthetics and symbolism of physical artifacts as triggers of emotion. Theoretical Issues in Ergonomics Science 5, 91–112 (2004)
8. Tractinsky, N., Zmiri, D.: Exploring Attributes of Skins as Potential Antecedents of Emotion in HCI. In: Fishwick, P. (ed.) Aesthetic Computing. MIT Press, Cambridge (2006)
9. Mahlke, S.: Studying user experience with digital audio players. In: Harper, R., Rauterberg, M., Combetto, M. (eds.) ICEC 2006. LNCS, vol. 4161, pp. 358–361. Springer, Heidelberg (2006)
10. Mahlke, S., Minge, M., Thüring, M.: Measuring multiple components of emotions in interactive contexts. In: CHI ’06 extended abstracts on human factors in computing systems, pp. 1061–1066. ACM Press, New York (2006)
11. Scherer, K.R.: What are emotions? And how can they be measured? Social Science Information 44, 693–727 (2005)
12. Mahlke, S., Thüring, M.: Antecedents of Emotional Experiences in Interactive Contexts. In: CHI ’06 proceedings on human factors in computing. ACM Press, New York (2007)
13. Hassenzahl, M., Kekez, R., Burmester, M.: The importance of a software’s pragmatic quality depends on usage modes. In: Lucsak, H., Cakir, A.E., Cakir, G. (eds.) (WWDU2002). Proceedings of the 6th international conference on Work With Display Units, pp. 275–276. ERGONOMIC Institut für Arbeits- und Sozialforschung, Berlin (2002)
14. Kirakowski, J.: The software usability measurement inventory: Background and usage. In: Jordan, P.W., et al. (eds.) Usability Evaluation in Industry, pp. 169–178. Taylor & Francis, London (1996)
15. Lavie, T., Tractinsky, N.: Assessing dimensions of perceived visual aesthetics of web sites. International Journal of Human-Computer Studies 60, 269–298 (2004)
16. Lang, P.J.: Behavioral treatment and bio-behavioral assessment: Computer applications. In: Sidowski, J., Johnson, H., Williams, T. (eds.) Technology in Mental Health Care Delivery Systems, pp. 119–137. Ablex Publishing, Greenwich (1980)
17. Scherer, K.R.: Appraisal considered as a process of multi-level sequential checking. In: Scherer, K.R., Schorr, A., Johnstone, T. (eds.) Appraisal processes in emotion: Theory, methods, research, pp. 92–120. Oxford University Press, New York, Oxford (2001)
18. Ben-Bassat, T., Meyer, J., Tractinsky, N.: Economic and Subjective Measures of the Perceived Value of Aesthetics and Usability. ACM Transactions on Computer-Human Interaction 2, 210–234 (2006)
CRUISER: A Cross-Discipline User Interface and Software Engineering Lifecycle

Thomas Memmel, Fredrik Gundelsweiler, and Harald Reiterer

Human-Computer Interaction Lab, University of Konstanz, D-78457 Konstanz, Germany
{memmel,gundelsw,reiterer}@inf.uni-konstanz.de
Abstract. This article seeks to close the gap between software engineering and human-computer interaction by indicating interdisciplinary interfaces of SE and HCI lifecycles. We present a cross-discipline user interface design lifecycle that integrates SE and HCI under the umbrella of agile development.

Keywords: Human-Computer Interaction, Usability Engineering, Extreme Programming, Agile Modeling, User-centered Design & Development (UCD).
1 Human-Computer Interaction and Software Engineering

From its birth in the 1980s, the field of human-computer interaction (HCI) has been defined as a multidisciplinary subject. To design usable systems, experts in the HCI arena are required to have distinct skills, ranging from an understanding of human psychology to requirements modeling and user interface design (UID) [1]. In this article we will use the term user interface (UI) designer as a synonym for a professional who combines knowledge of usability, graphics and interaction design.

Table 1. Methods for integrating SE and UE, based on [2] (excerpt)

Integration issue | Method of application
Mediating and improving the communication lines between users, usability experts and developers | Use medium-weight artifacts, work with toolkits appropriate for collaborative design, talk the same language, work in pairs
Extending software engineering artifacts for UI specification & conceptualization | Use artifacts known by both professions and adjust their expressiveness
Extending RE methods for collecting information about users and usability | Include principles, practice and light- to medium-weight methods from HCI into RE
Representing design artifacts including prototypes using different formalisms | Apply prototyping as a method of participatory design; all stakeholders gather requirements
functional requirements are translated into a running system. HCI and SE are recognized as professions made up of very distinct populations. Each skill set is essential for the production of quality software, but no one set is sufficient on its own. The interaction layer is the area where HCI and SE are required to work together, in order to ensure that the resulting software product behaves as specified in the initial requirements engineering (RE). To provide a high level of UI usability, SE has to work with people with a background in HCI, but the course of collaboration is mostly unclear. Classic and agile SE methods therefore still lack integration of HCI methods and processes (see Table 1).

Bearing these two different engineering disciplines in mind, each software design process can be characterized in terms of its dependency on its engineering orientation, ranging from a formal and model-based methodology to an informal explanatory design. SE tends to be more formal and “consequently, the business user and IT analyst may think that they both agree on a design, only to discover down the line that they had very different detailed implementations and behaviors in mind” [3]. Very formal or complex models are an inappropriate basis for communication, especially for collaborative design processes with high user and business-stakeholder participation.

Scenarios [4] - known as user stories in Extreme Programming (XP) [5] - and prototypes are recognized as an interdisciplinary modeling language for RE and as bridging techniques for HCI and SE [6]. In SE, scenarios – as a sequence of events triggered by the user – are generally used for requirements gathering and for model checking. HCI applies scenarios to describe software context, users, user roles, tasks and interaction [4]. Prototypes in SE are used to verify functional specifications and models.
Agile Modeling (AM) and XP recognize prototypes as a type of small release [5,7], whereas HCI mainly employs them for iterative UID [8]. The bottom line is that some informal methods of XP and AM are close to HCI practice and can therefore pave the way for a common course of action. While heavy-weight methods such as style guides (HCI) are far too expensive, light-weight methods such as essential use cases (SE) are in contrast too abstract for system specification. Cross-discipline agile methods are the optimum, and workable, compromise. Agile approaches of both SE [5] and HCI [9,10] are therefore the interface for our common and balanced software lifecycle known as CRUISER.
2 From XP to Agile Cross-Discipline Software Engineering

In contrast to classic, heavy-weight SE processes like the V-Model, agile methods begin coding at a very early stage while having a shorter up-front RE phase. Following the paradigm of XP, implementation of code takes place in small increments and iterations, and the customer is supplied with small releases after each development cycle. During the exploration phase, teams write user stories in an attempt to describe user needs and roles. But the people interviewed need not necessarily be the end-users of the eventual software. XP therefore often starts coding based only on assumptions about end-user needs [10]. AM is less rigid than XP and takes more care over initial RE, as it provides room for low-fi prototyping, activity diagrams or use-case diagrams [11]. Nevertheless, the analysis phase is finished as soon as requirements have been declared on a horizontal
level, because the iterative process assumes that missing information will be filled in at later stages. Development in small increments may work properly as long as the software is not focused on the UI. Changes to software architecture usually have no impact on what the user sees and interacts with. With the UI, however, it is a different story. Continual changes to the UI may conflict with user expectations and learnability, cause inconsistency and finally lead to user dissatisfaction. Thus, agile development does not really qualify as user-centered design (UCD), but can function as one pillar for an integrated approach [10].

Both SE and UID have to cope with a shorter time-to-market, in which the quality of the delivered software must not suffer. This is a great challenge both for management and for the methods and tools applied. Our idea is a balanced hybrid process, which is both agile SE and agile UCD, and which is consistent with the principles and practices of both disciplines.

In order to identify interfaces between agile SE and agile HCI, we have to highlight different approaches to UID, analyze their agile potential and their different contributions to a cross-discipline process. Like XP, original UCD is a highly iterative process. It differs from agile methods, however, since real users are taken into account and the development team tries to understand user needs and tasks before any line of code is written. The lifecycles of usability engineering processes [4,12] provide numerous methods and tools that should support the designer in gathering all of the required information. Most of these methods are rated as heavy-weight, due to their claim to analyze and document as much as possible about users, work flows, context, etc. right from the beginning.
Constantine [9] argues that UCD produces design ideas in a rather magical process in which the transformation from claims to design is neither comprehensible nor traceable. Such a “Black Box Designer” produces creative solutions without being able to explain or illustrate what goes on in the process. Furthermore, UCD tries to converge the resulting, often diverse, design alternatives into a single solution, which is then continuously evaluated and refined. UCD may therefore take a long time, or even fail, if too many users are involved and narrowing the design space is difficult. Iteration may create the illusion of progress, although the design actually goes round in circles and solutions remain elusive. Altogether, a one-to-one integration of UCD processes and methods is in general inappropriate for an agile course of action.

Constantine’s usage-centered design approach takes up the basic philosophy of AM and concentrates on essential and easy to understand models. Through their application, HCI becomes more formal, but the simplicity of their syntax still enables collaborative design by engineering rather than by trial and error [9] (see Table 2). Although the list of usage-centered design success stories is creditable, the products praised tend to support user performance rather than user experience. This cannot be the only aspiration of a modern design approach, however.

This is where Donald Norman's recently proposed activity-centered design approach (ACD) [13] comes in. Products causing a high joy of use can reach great user acceptance even when they lack usability. Norman therefore argues for the integration of emotional design issues and a stronger consideration of user satisfaction. In Löwgren and Stolterman’s book about thoughtful interaction design (TID) [14], the designer, in order to design such highly usable and aesthetic systems, switches between three levels of abstraction: vision, operative image and specification.
If the designer is confronted with a design situation, at first an often sketchy and diffuse vision emerges. Frequently, several visions are promising and are therefore competing to be implemented, eventually resulting in a chaos of conflicting visions. The initial
version of the operative image is the first externalization of the vision, e.g. captured in mock-ups or elaborated interactive (hi-fi) prototypes. It enables manipulation, stimulation, visualization and decision making for the most promising design. The designer wants to learn as much about the design space as possible, narrowing the design towards the best solution as late as possible. The operative image is transformed into a (visual) specification of the final design if it is sufficiently detailed. Table 2 shows a comparison of the design approaches under discussion. Our development lifecycle is set up on the core methods of all the approaches presented, such as selective user involvement (UCD, ACD), prototyping for visual thinking (TID), as well as modeling with scenarios or task maps (usage-centered design).

Table 2. Comparison of user interface design approaches, adapted from [9]

User-Centered Design | Usage-Centered Design | Activity-Centered Design | Thoughtful Interaction Design
Focus is on users | Focus is on usage | Focus is on activities | Focus is on design
Substantial user involvement | Selective user involvement | Authoritative user involvement | Thoughtful user involvement
User studies, participatory design, user testing | Explorative modeling, model validation, usability inspections | |
All designers in a project need to have a similar understanding of the vision and the wholeness of the system (TID). Thus continuous and lively discussion is necessary (XP). Informal communication across organizational borders should be easy, and teams should have common spaces (XP). Since reaching agreement on abstract notions (text) is difficult, ideas have to be made visible, allowing participants to look at, feel, analyze and evaluate them as early as possible (XP, AM). The process should be controlled by an authoritative person who must have a deep understanding of both SE and HCI. With our demand for such highly capable personnel, we concur with what XP and AM announced as one of their most important criteria for project success [5]. The leader navigates through the development process, proposes solutions to critical design issues and applies the appropriate design, engineering and development methods. Since the gap between SE and HCI becomes less significant “when the (HCI) specialist is also a strong programmer and analyst” [2], we chose XP as fundamental to our thoughts on bonding SE and HCI. Its principle of pair programming allows people with different fields of expertise, but common capabilities, to design a system together.
The basis of our cross-discipline lifecycle is therefore the identification of similarities between XP and HCI (see Table 3), AM and HCI (see Table 4), as well as ACD and TID when compared to HCI, AM and XP (see Table 5). We outline some major similarities, although our comparison highlighted many more interfaces of these disciplines. Although different in their wording, agile principles and practices are comparable and show a significant overlap, such as in iterative design, small releases and prototyping, story cards, active stakeholder participation and scenarios, or testing and evaluation. Modern UID approaches do not oppose collaboration with SE; on the contrary, they underline the commonalities.

Table 3. Similarities between XP and HCI (excerpt)

XP Practice | HCI Practice
Iteration, Small Increments, Adaptivity | Prototyping
Planning Game | Focus Groups
Story Cards, Task Cards, User Stories | Scenarios, User Profiles, Task Model
Table 4. Similarities between AM and HCI (excerpt)

Agile Modeling Practice | Usability Engineering Practice
Prove It With Code | Prototyping
Create Several Models in Parallel | Concurrent Modeling
Active Stakeholder Participation | Usage-Centered Design, User Participation
Consider Testability | Evaluation, Usability Inspections
Table 5. Overall comparison of agile SE, usual HCI and other practice (excerpt)

AM & XP Practice | HCI Practice | TID & ACD Practice
Minimalist documentation | Comprehensible models | Interactive representations
Show results early | Lo-/Hi-Fi prototyping | Make ideas visible asap
Small teams, design rooms | Design rooms, style guides | Informal communication
Active stakeholder part. | Collaborative design | Externalization of visions
User performance | User performance, user experience | User performance, user experience, hedonic quality
3 Agile Cross-Discipline User Interface Design and Software Engineering Lifecycle

Our agile cross-discipline user interface and software engineering lifecycle, called CRUISER, originates in our experience of developing various kinds of interactive
software systems in teams with up to 20 members [16]. Although CRUISER is based on XP, we firmly believe that our lifecycle scales to larger teams, bearing in mind success stories of agile development with several hundred team members [17] and within large organizations [18]. For the following explanation of CRUISER, we concentrate on those issues that need to be worked out collaboratively by HCI and SE experts. SE practices that are independent of UID are not discussed in detail. CRUISER starts with the initial requirements up-front (IRUP, see Table 6), which must not take longer than the claims analysis in XP. The agile timeframe can be preserved if the methods employed can be rated as agile (see Tables 3, 4 and 5) and interdisciplinary. Concerning the design of the UI, XP and AM practice is not sufficient and has to be complemented by UID practice and authoritative design (TID, ACD).

Table 6. CRUISER initial requirements up-front (IRUP); contributions of disciplines

Agile SE | Human-Computer Interaction | Authoritative Design
Use Cases, Usage Scenarios; Technical Requirements; User Performance Goals | Role & Task Model; User-, Task-, Interaction Scenarios; Essential Use Cases; UI Patterns; User Experience Goals |
As discussed in Section 2, the real users have to be taken into account rather than just stakeholders of any kind. Appropriate cross-discipline methods for analyzing user needs are role models and task models. The model-based RE proposed by [9] focuses on surveying essential information and, thanks to its use of index cards, fits an agile course of action. The user roles are prioritized (Focal User Roles) and sorted in relation to their impact on product success. Finally, essential use cases describe user tasks and enable the building of a task model and a task map. Like user roles, task cases are sorted in accordance with Kent Beck’s proposal, which is “required - do first, desired - do if time, deferred - do next time”, whenever the necessary scenarios are established for understanding and communication. For a shared understanding among developers and for communication with stakeholders, all models are translated into scenarios, which can focus on different aspects of UID (users, tasks, interactions). Since agile methods do not consider the UI in detail, they do not recognize extensive style guides as used in HCI practice. We therefore suggest light-weight style guides that are shorter, more relevant and contain UI patterns [19]. They ease the design process by providing design knowledge and experience (AM: Apply Design Standards, Use Existing Resources). During all IRUP assignments, users, HCI, SE and business personnel support and finalize RE with initial discussions about scenarios and design alternatives. This alone will result in various outline visions such as mockups or prototypes that make up the initial project design space. In contrast to other HCI lifecycles (e.g. [12]), CRUISER envisions the externalization of design visions even before the requirements analysis is finished. In our opinion, this break with common HCI practice enables the UI designer to decide
very early about the degree of user involvement and the necessity of more innovative solutions. He can have a considerable influence on balancing user performance, user experience and hedonic quality demands and can guide the IRUP accordingly. The second phase of the development process is the initial conceptual phase (ICP, see Figure 1). In the ICP we envisage a separation of ongoing UI prototyping from architectural prototyping whenever possible to speed up the process. The conscientious application of software patterns [19] facilitates this procedure. The development of UI and system architecture can take place in parallel as soon as a minimalist, common UI specification [13] is generated and the necessary interfaces are identified. Dependencies between UI and system architecture can be found with the help of task cases and scenarios established during IRUP. It is very likely that highly interactive UIs will have greater impact on the system architecture.
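The sorting of task cases during the IRUP by the three categories quoted from Kent Beck (required, desired, deferred) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the task case names, the `Priority` and `TaskCase` types and the function `order_task_cases` are hypothetical and not part of CRUISER.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """The three categories quoted in the text, in working order."""
    REQUIRED = 1   # do first
    DESIRED = 2    # do if time
    DEFERRED = 3   # do next time


@dataclass(frozen=True)
class TaskCase:
    name: str
    priority: Priority


def order_task_cases(task_cases):
    """Sort task cases by priority.

    sorted() is stable, so task cases with the same priority keep the
    order in which they were written down (e.g. on index cards).
    """
    return sorted(task_cases, key=lambda task_case: task_case.priority)


# Invented task cases for illustration.
backlog = [
    TaskCase("export report", Priority.DEFERRED),
    TaskCase("search catalogue", Priority.REQUIRED),
    TaskCase("save favourite", Priority.DESIRED),
    TaskCase("log in", Priority.REQUIRED),
]

ordered = order_task_cases(backlog)
```

Keeping the priority as an ordered enumeration rather than free text makes the working order explicit and lets the same sort be reused for user roles.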
Fig. 1. CRUISER initial conceptual phase
As discussed, prototypes are common practice in HCI and SE. The overall purpose of the ICP is therefore the generation of more detailed and interactive prototypes for narrowing the design space towards a single solution through discussion with stakeholders and through scenario refinement [3]. For this assignment, the designer must leap between abstract and detailed levels of prototyping, always considering a timeframe and expressivity suitable for an agile project environment (see Table 7). Bearing in mind the claims of agile methods, prototypes should be easy to work with and, above all, quick to produce and easy to maintain. With more interactive and complex external representations, the designer conducts a dialogue about design solutions and ideas. Prototypes that are visually more detailed help us to overcome the limitations of our cognitive abilities to process, develop, and maintain complex ideas and to produce a detailed operative image (TID). As long as the prototype can be modified using simple direct manipulation techniques, the users can be proactively involved in the participatory process. In addition to low-fi prototyping for e.g. conceptual design, a modern UID approach must also provide methods and tools for hi-fi prototyping that overcomes most of the disadvantages mentioned in Table 7. We
CRUISER: A Cross-Discipline User Interface and Software Engineering Lifecycle
recommend prototyping tools such as Macromedia Flash and iRise Studio. They are easy to use for all stakeholders due to the absence of coding, they allow reuse of components through the application of patterns or templates, and they produce running interactive simulations that can be enhanced to small releases.
Table 7. Low- and High-Fidelity Prototyping, based on [8] (excerpt)
Type: Low-Fidelity
Advantages: less time & lower cost; evaluate multiple concepts; communication device; address screen layout issues
Disadvantages: limited usefulness for usability tests; navigational and flow limitations; facilitator-driven; poor specification
Type: High-Fidelity
Advantages: partial/complete functionality; interactive; use for exploration and test; marketing & sales tool
Disadvantages: time-consuming to create; inefficient for proof-of-concept designs; blinds users to major representational flaws; management may think it is real
Interactive prototypes can also run as “Spike Solutions”, which are used to evaluate and prove the functionality and interoperability of UI concepts and system architecture. More importantly, they can be applied as visual, interactive UI specifications in the ensuing construction phase. Visual specifications are unambiguous and can guarantee the final system matches stakeholder expectations about UID and behavior. The prototyping-based process minimizes the risk of making wrong design decisions and leads the way towards a winning design solution. Through the well-balanced and thoughtful application of selected RE methods such as abstract modeling or detailed prototyping, CRUISER avoids design by trial-and-error and makes the design process move forward in a traceable manner. The process of identifying the most promising design solution is guided by UI evaluations, which can be kept at low complexity if the UE methods applied are agile [20]. In order to give due regard to the UI's hedonic qualities, such as the ability to stimulate or to express identity, we envision a design review with AttrakDiff [15].
On entering the construction and test phase (CTP), coding starts (see Figure 2). In this phase, the CRUISER lifecycle closely resembles the incremental and iterative manner of XP. The CTP therefore begins with iteration planning and the creation of unit and acceptance tests, which are later used to evaluate parts of the system architecture (e.g. automatically) and the UI (e.g. with extreme evaluations [20]). The latter guarantee that the previously defined usability or hedonic quality goals are properly taken into account. They are to be executed only if a usability expert on the team identifies a need for them. We therefore recommend the integration of HCI personnel in the pair programming development.
As with the construction of prototypes, the actual coding of UI and system architecture again takes place in parallel, and components of the UI that have great impact may be developed faster initially and then later refined during the following iterations. As in XP, the CTP ends with the deployment of a small release. Before the next iteration starts, each small release can again be evaluated using cheap and fast methods [20]. If usability or hedonic quality issues are identified, they can also be
documented on index cards (“defect cards”). Each defect is assigned to its corresponding task case. The usability defects may be sorted and prioritized and thus reviewed during earlier or later iterations. If usability or design catastrophes occur, HCI and SE experts and stakeholders can decide on the necessary measures. The last step in the CRUISER lifecycle is the deployment phase. While users are working with the system, new functionality may be requested, or usability and design issues that were underrated during the iterations may be raised. The lifecycle therefore allows for a return to earlier phases to cater for such new requirements.
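The defect-card bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of CRUISER itself: the class names `TaskCase` and `DefectCard`, the `severity` scale and the function `review_order` are hypothetical, and only the three priority levels follow the sorting scheme quoted earlier in the paper (required, desired, deferred).

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    """Three-level sorting quoted in the paper; lower value is handled first."""
    REQUIRED = 1   # do first
    DESIRED = 2    # do if time
    DEFERRED = 3   # do next time


@dataclass
class DefectCard:
    """Hypothetical index card describing one usability or design defect."""
    description: str
    severity: int  # assumed scale: 1 (cosmetic) to 4 (catastrophe)


@dataclass
class TaskCase:
    """Hypothetical task case that collects the defect cards assigned to it."""
    name: str
    priority: Priority
    defects: list = field(default_factory=list)


def review_order(task_cases):
    """Return (task case name, defect card) pairs in a possible review order.

    Defects of higher-priority task cases come first; within one task case,
    more severe defects come first.
    """
    ordered = []
    for case in sorted(task_cases, key=lambda c: c.priority):
        for defect in sorted(case.defects, key=lambda d: -d.severity):
            ordered.append((case.name, defect))
    return ordered


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    search = TaskCase("search catalogue", Priority.REQUIRED)
    export = TaskCase("export report", Priority.DEFERRED)
    search.defects.append(DefectCard("icon misaligned", severity=1))
    search.defects.append(DefectCard("results list unreadable", severity=3))
    export.defects.append(DefectCard("export dialog confusing", severity=2))
    for name, card in review_order([export, search]):
        print(name, "-", card.description)
```

Linking each card to exactly one task case keeps the defect list traceable to the task model, so a team could decide per iteration how far down the ordered list it wants to go.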
Fig. 2. CRUISER construction and test phase
4 Summary
Our motivation was to take a step towards a cross-discipline procedure for software design with respect to agile movements. With the CRUISER lifecycle, we bridge HCI and SE based on the commonalities of both fields. Similarities can be found in basic principles and practices as well as among the methods and tools that are typically applied. CRUISER has important links to XP [5], but differs from it in many important aspects related to AM, HCI and beyond. For integrating all critical disciplines under the umbrella of one common lifecycle, we concur with the findings of interdisciplinary researchers and use scenarios and prototypes as fundamental artifacts propelling a design process with high involvement of users and stakeholders.
References
1. Pyla, P.S., Pérez-Quiñones, M.A., Arthur, J.D., Hartson, H.R.: Towards a Model-Based Framework for Integrating Usability and Software Engineering Life Cycles. In: Proceedings of Interact 2003, Zurich, Switzerland, September 1-3, IOS Press, Amsterdam (2003)
2. Seffah, A., Gulliksen, J., Desmarais, M.C. (eds.): Human-centered software engineering – integrating usability in the development process, pp. 3–14. Springer, Heidelberg (2005)
3. Zetie, C.: Show, Don’t tell - How High-Fidelity Prototyping Tools Improve Requirements Gathering, Forrester Research Inc. (2005)
4. Rosson, M.B., Carroll, J.M.: Usability engineering: scenario-based development of human computer interaction. Morgan Kaufmann, San Francisco (2002)
5. Beck, K.: Extreme Programming Explained. Addison-Wesley, London, UK (1999)
6. Sutcliffe, A.G.: Convergence or competition between software engineering and human computer interaction. In: Seffah, A., Gulliksen, J., Desmarais, M.C. (eds.) Human-centered software engineering – integrating usability in the development process, pp. 71–84. Springer, Heidelberg (2005)
7. Blomkvist, S.: Towards a model for bridging agile development and user-centered design. In: Seffah, A., Gulliksen, J., Desmarais, M.C. (eds.) Human-centered software engineering – integrating usability in the development process, pp. 219–244. Springer, Heidelberg (2005)
8. Rudd, J., Stern, K., Isensee, S.: Low vs. high fidelity prototyping debate, Interactions, vol. 3(1), pp. 76–85. ACM Press, New York (1996)
9. Constantine, L.L.: Process agility and software usability: Toward lightweight usage-centered design, Information Age, vol. 8(8) (August 2002)
10. Gundelsweiler, F., Memmel, T., Reiterer, H.: Agile Usability Engineering. In: Keil-Slawik, R., Selke, H., Szwillus, G. (eds.) Mensch & Computer 2004: Allgegenwärtige Interaktion, pp. 33–42. Oldenbourg Verlag, München (2004)
11. Ambler, S.W.: Agile Modeling. John Wiley & Sons, New York (2002)
12. Mayhew, D.J.: The usability engineering lifecycle - A Practitioner's Handbook for User Interface Design. Morgan Kaufmann, San Francisco (1999)
13. Norman, D.: Human-Centered Design Considered Harmful. Interactions 12(4), 14–19 (2005)
14. Lowgren, J., Stolterman, E.: Thoughtful Interaction Design: A Design Perspective on Information Technology. MIT Press, Cambridge, MA (2004)
15. Hassenzahl, M., Platz, A., Burmester, M., Lehner, K.: Hedonic and Ergonomic Quality Aspects Determine a Software’s Appeal, In: Proceedings of the CHI 2000, Conference on Human Factors in Computing, The Hague, NL, pp. 201–208 (2000)
16. Limbach, T., Reiterer, H., Klein, P., Müller, F.: VisMeB: A visual Metadata Browser. In: Rauterberg, M. pp. 993–996. IOS Press, Amsterdam (2003)
17. Eckstein, J.: Agile Software Development in the Large: Diving Into the Deep. Dorset House Publishing Co., Inc., New York (2004)
18. Lindvall, M., Muthig, D., Dagnino, A.: Agile Software Development in Large Organizations. Computer 37(12), 26–34 (2004)
19. Borchers, J.: A Pattern Approach to Interaction Design. John Wiley & Sons, New York (2001)
20. Gellner, M., Forbrig, P.: Extreme Evaluations – Lightweight Evaluations for Software Developers, In: IFIP Working Group 2.7/13.4, editor, INTERACT 2003 Workshop on Bridging the Gap Between Software Engineering and Human-Computer Interaction (2003)
Interface Between Two Disciplines - The Development of Theatre as a Research Tool
Maggie Morgan and Alan Newell
School of Computing, University of Dundee, Scotland, DD1 4HN
Abstract. Dundee University’s School of Computing is researching technology for older users, whose difficulties with technology often exclude them from its benefits. This paper discusses the problems raised in consulting potential users who feel they do not understand technology and are anxious about using it. How should the technologists and designers convey to this clientele the somewhat abstract concepts of ‘what might be developed’ and how it might affect the users’ quality of life? How could they keep the focus of discussion while giving the older people the confidence to be truthful? Experiments with video and live theatre in consulting older users, gathering requirements and evaluating designs are described. This paper addresses: the process by which scientific data are transformed into appropriate and useful ‘stories’ to the satisfaction of both writer and researchers; the role of actors and facilitator; the impact on the ‘extreme users’ in the audience; and the data thus gained by the researchers.
to look after people as they become increasingly frail. Technology should have an important role in improving an old frail person’s quality of life, giving him/her more control over his/her environment, and in giving support to the carers. In order for such technology to be successful, however, older people should be consulted as part of any design process [3].
2 Problems of Consultation
Consulting older people about the design of potential technology raises a number of questions:
− How do you translate rather abstract scientific concepts into a ‘reality’ that older people can relate to and apply to their own lives?
− How can you make older people really understand a piece of technology that has not yet been developed?
− How can you make it easier for older people to be critical? They often do not want to ‘upset’ designers and their responses aim to please.
− How can you create a ‘safe’ method of lively discussion between older people and designers, without the older people feeling intimidated and ashamed of their ‘ignorance’ or the designers either being frustrated or unwittingly patronising?
3 The Introduction of Drama
The School of Computing is experimenting with using drama, both video and live theatre, to address these problems [7]. This is based on the following premises:
− Theatre, whether live or on video, has the ability to ‘pretend’, so undeveloped technology can be presented as real and working.
− Scientific concepts and novel technology, with their esoteric language and jargon, can be translated into everyday life. This enables the audience to apply them to their own situation, thus facilitating significant information transfer between researchers and older users.
− Stories with ‘real’ characters, with whom the audience can identify, help the audience engage with the problems and questions encountered [4,5,11].
− All discussion, debate and criticism are focussed on the story and the characters; no-one is going to be offended. This enables both older people and designers to discuss, argue, inform and share needs and experience in a very safe way. This very safety helps older people and designers to draw on and share their experiences, which can be particularly useful in an area where individual needs and disabilities are subject to very wide variation.
The roles of researchers, writers, actors and facilitators within this process are all very important, and will be discussed later in this paper.
3.1 Maggie Morgan
The Scotland-based Foxtrot Theatre Company, which specialises in interactive forum theatre, provided Maggie Morgan, a theatre writer, director and interactive theatre
facilitator, to work with researchers, write scripts and produce video for two research projects within the School of Computing. The success of these resulted in her being awarded a Leverhulme Art-in-Residence Fellowship for the academic year 2005-6, with the remit to further develop the role of theatre as a research tool within computing.
3.2 The Fall Monitoring Project – Requirements Gathering Using Video
A group of researchers were developing a monitoring system that used video cameras within an old person’s home to detect falls. The pictures would be transmitted to a computer, which would alert a carer if it detected the person suffering a fall [6]. The initial reaction of people to the idea of having cameras in the home can be completely negative, but this is perhaps an uninformed judgement. To address this issue in more depth, Morgan and the researchers devised four different situations which would inform the viewers, open up wider discussion, and provide valuable data for the researchers. Videos of these scenarios were then made using professional actors and video engineers. The four brief video scenes consisted of:
− An older man rushing to answer the door bell, then tripping and falling when there was no monitor in his house to detect the fall.
− An older woman who has a monitor in her room reaches up to dust, loses her balance and falls. She is shocked and cannot get up. The monitor registers the fall, and soon someone arrives, having been alerted.
− A false alarm: an older woman, with a monitor, drops a jigsaw and gets down on the floor to pick up the pieces. The monitor registers this and alerts her daughter, who rings her immediately. Music is playing, so the mother does not hear the phone for a long time; the daughter rushes out from an important meeting and arrives to find her mother enjoying her jigsaw. She is both relieved and frustrated!
− A daughter, talking to her father, describes the monitor her mother-in-law has, and how it has somewhat eased the burden of checking up on the old lady. The conversation is interrupted by a phone message from the computer connected to the mother-in-law’s monitor. The computer is letting her know that, although the old lady has not fallen, she is not moving around as usual. The daughter-in-law rings to check whether she might be ill. It’s OK! It is Wimbledon fortnight, the old lady is a tennis enthusiast and is hardly moving from the television! There is relief but some irritation, yet the father comments to his daughter that she might be very glad of this function some day.
Pauses were built into these video scenes so that the audience could comment on and discuss each scenario. The researcher facilitating the discussions, who had been trained in facilitation by Morgan, was able to answer questions about what happened to the ‘pictures’ the cameras took and how carers might be alerted. Audiences varied from relatively fit older people living independently in sheltered housing or their own house to very frail old people who needed a lot of care in order to stay at home and who came together at Day Centres. One audience consisted of a group of professional carers. Each audience brought its own experiences and perspectives; among the
topics covered were anxieties about privacy; what support systems were already in use and how effective these were; anxieties about falling or becoming ill and this being detected; where falls were most likely; how their individual activities differed; and false alarms. The narrative form of the video clips engaged the audience and kept the focus of the discussion. Using drama was found to be an extremely useful method of provoking discussion at the pre-prototyping stage and provided many insights that we believe would not have been obtained without such techniques being used. This confirms the comments made by Sato & Salvador [13] that human centred stories lead to a more detailed discussion and that the drama provides a point of contact, which makes the evaluative task much easier. Although Strom [14] reported that he found it difficult to combine large or dramatic consequences with the exploration of an interface, this was not an issue in this piece of research.
3.3 The UTOPIA Trilogy Video – An Attitude Changing Exercise Using Video
A similar technique to the above was used to produce narratives for discussion aimed at communicating the essential findings of the UTOPIA group (Usable Technology for Older People: Inclusive and appropriate) [2] to designers of technology for older people. During the research phase of the project, which included discussions with individuals and groups of older people, important data emerged concerning older people’s problems with language; anxiety; assumptions of knowledge that they in fact lacked; confusing software and the increase of disabilities with aging. Designers, however – usually young – found it difficult to conceive of people who were totally unfamiliar with basic modern technology. Three videos were produced, which focussed on: installing a web camera, a completely novice user attempting to use email, and a first time user of a mobile telephone [15].
The video stories were viewed and discussed by several audiences: some consisting of designers and engineers, some of older people, and others mixed. Changes in audience attitudes were measured by having identical questionnaires about perceptions of older people filled in before the viewing and at the end of the event. Each performance provoked lively discussion and proved very enjoyable. Significant changes in attitude were noted in all audiences who viewed these videos [1].
3.4 The Rice Digital Television Project – Live Theatre for Requirements Gathering
Rice, a researcher in the Dundee University School of Computing, used focus groups in his initial requirements gathering for the design of a home telecommunication system for older adults, and subsequently used live interactive theatre as a method of holding in-depth discussions with large groups of older people [12]. Although digital television and its possible applications are very topical, many people, particularly older people, understand neither how digital TV works nor what its potential uses are, especially those which could enhance the quality of life of older people. The potential uses of digital TV examined were: a ‘chatting’ service (communication between homes via a camera), a ‘scrap book’, and a reminder service. The problems of describing technology that had not yet been developed, and therefore of ascertaining
how desirable or useful it might be seen to be were solved by the ability of theatre to ‘pretend’. A ‘multi-media’ production was scripted, developed and produced, using professional actors, on-stage props, and the projection of DVD onto a back screen. The situations chosen were those frequently found in real-life experience: children and grandchildren living at a distance; having to move from the family home to a smaller place; becoming more forgetful. The creation of characters in life-like situations resulted in a ‘reality’ with which older audiences could identify and empathise, directly relating the action to their own experiences and expectations. The discussion was enhanced even further when the characters, i.e. the actors who remained in role, took part in the discussion with the audience. The characters bore the brunt of being unsure of the role of the technology and finding the possible disadvantages, but also discovered how it might help their human situation. The performances, and all the audience interaction, were conducted in a purpose-designed studio theatre within the School of Computing [9] and were recorded using four cameras and a high quality sound system. This ensured that all the interaction within the audience was faithfully recorded; the recordings were subsequently transcribed. This provided extensive data which was extremely useful both in the decision-making process for, and in the detailed development of, digital television applications.
4 Experiments with Combining Video and Live Theatre
Live theatre has a big impact, but a fully rehearsed performance is not always feasible, either practically or financially. We therefore also experimented with a mixture of video clips and live theatre. The showing of a video clip was followed by the actors in that clip being present ‘in role’ for a dialogue with the audience. The aim of the viewings was to measure change in attitude towards older people and technology with three audiences: undergraduate students, post-graduate students and professionals at an HCI conference. The undergraduate and post-graduate students reported that, although the video was interesting and informative, being able to question and discuss with the ‘live’ characters had more impact. The response of the professional audience at the HCI conference [10], who were not specialists in designing for older people, was very mixed, but again the session with the actors stimulated a huge amount of discussion and argument and made the session highly memorable for the audience. With all three very different audiences, the fact that the characters were actually actors liberated everyone to say what they really thought. The ‘characters’ were highly believable and convincing, but the audience could attack the characters, knowing that the actors were not going to take their comments personally.
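The before-and-after questionnaire comparison used here and in Section 3.3 can be illustrated with a short sketch. The paper does not state how the change in attitude was computed, so the mean paired difference below is only one plausible summary and not the authors' analysis; the function name `attitude_change`, the assumed 1 to 5 rating scale and the sample scores are hypothetical.

```python
from statistics import mean


def attitude_change(before, after):
    """Mean paired difference (after minus before) for one questionnaire item.

    `before` and `after` hold one score per respondent, in the same order,
    taken from the identical questionnaire filled in before and after the event.
    """
    if len(before) != len(after):
        raise ValueError("each respondent needs a before and an after score")
    if not before:
        raise ValueError("no responses given")
    return mean([a - b for a, b in zip(after, before)])


if __name__ == "__main__":
    # Invented scores on an assumed 1-5 agreement scale, for illustration only.
    audiences = {
        "undergraduates": ([2, 3, 2, 4], [4, 3, 3, 5]),
        "professionals": ([3, 3, 4], [3, 4, 4]),
    }
    for name, (before, after) in audiences.items():
        print(name, attitude_change(before, after))
```

Pairing each respondent's two answers, rather than comparing group averages, keeps individual starting attitudes from masking the shift; whether a shift is statistically significant would need a proper test on real data.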
5 Continuing Use of Theatre in Technological Research
Plans are already being implemented by a group of researchers from four Scottish Universities, and involving “telecare”, health and social work stakeholders, to use live theatre for requirements gathering, evaluation and inter-communication among
audiences of older people, formal and informal carers, designers and engineers, and health and social work professionals. Two different formats for discussion following the performances will be tested, and the results of this methodological experiment will be reported at the conference at HCI 2007.
5.1 How Does It Actually Work?
The essential constituents of Interactive Forum Theatre are the quality of:
• the script,
• the performance,
• the facilitation of interaction with the audience, and
• the use of appropriate interaction techniques.
5.1.1 The Script – The “Story of an Interface”
The script must be the result of thorough collaboration between researchers and writer. The task of the researchers is to convey their aims accurately and clearly to the writer: the questions they want answered and/or the information they wish conveyed. The writer’s task is to understand clearly the aims of the researchers, and to translate their research issues into the form of a story. This interaction between researchers and writer may sound simple, but is in fact complex. The researchers may well be anxious about their measured scientific data being rendered inaccurately; they may find the whole process very alien. Researchers with no experience of this method may feel a lack of trust both in the process and in the writer. The writer, on the other hand, may find their technical jargon impenetrable, and have to ask many ‘idiot’ questions in order to understand what is really required. The writer has to produce a good story that will work dramatically in performance: how can this be reconciled with scientific data and analyses? The writer too can feel frustrated if the researchers seem not to understand what (s)he is trying to do and are even suspicious of the process. The process, however, gradually builds up a rapport between researchers and writer. The writer goes through several stages of composition: (s)he produces one or more outline ideas, then a first draft of the script, then a second draft, then a ‘working draft’ that the director and actors can begin to rehearse with. At each stage, the writer’s outlines and scripts are referred back to a working group of the researchers for checking. The writer needs to be clear about her or his limitations and continually ask the researchers to amend or suggest. For example, when audience responses to technological help in the home are needed, what pieces of technology would the researchers like to see in the story? What are the questions they would like asked around this piece of technology? How would the character make this work? An older person might have disabilities to take into account when operating it. Or even, how might you persuade an older person that this facility would really benefit them?
Alien as this process may seem to traditional theatre, the structure of a dramatised story is actually very appropriate. Tension and conflict are needed to achieve drama: characters resisting or struggling with pieces of technology introduces tension and asks questions, and, as with all HCI technology, the interface is with human beings, with their own psychologies, knowledge and context. Theatre can create the “story of an interface”, where an audience can look at a piece of technology, its possible
usefulness, design and usability, and how a human being interacts with it, the human being having attitudes, emotions, physical difficulties and needs.
5.1.2 The Actors
Only professional actors have been used in the experiments reported. Minimal costume and only essential props were used, and the actors were physically very close to the audience. This form of theatre requires experienced professional actors who can take direction and immediately, or almost immediately, produce three-dimensional, believable characters. The actors who have been used in this interactive work are also experienced in interactive theatre; they are able to ‘suspend disbelief’ and to engage an audience without the normal technical aids of a full theatre production. The actors were very well briefed on the aims of the theatre: the way the pieces of technology were supposed to ‘work’; how they might relate to the life style and needs of the character; and what questions might arise in the audience that they might have to react to. It was extremely useful for one or more researchers to be present for some of the rehearsals. Questions inevitably arise about the technology during rehearsal, and a researcher can supply the information and explanation the actors need. This also assures the researchers that they still have control over the project and that their research is being respected in detail. For example, if a character is being ‘hot-seated’ (questioned ‘in role’ in a dialogue with the audience), (s)he needs to be well versed in the character’s own story and circumstances and also in the issues around the piece of technology. Other dramatic possibilities with this format include the audience being able to redirect a character in the story. For example, one of the characters in the story may have explained the technology in a way that is either incomprehensible or patronising to the older person; the audience can then be given the opportunity to replay that part of the story to see the effect of a different approach to the challenge of communicating technology to older people.
5.1.3 The Director and Facilitator
The director needs to thoroughly understand the research aims and brief the actors as they rehearse. The director and facilitator have to be as well briefed as the writer. In the case of the work reported here, the writer was also the director and facilitator. If this is not the case, the writer, director and facilitator must work very collaboratively. The facilitator’s role is crucial. (S)he must:
− Thoroughly understand the issues which the researchers need investigated.
− Explain clearly and simply to the audience how the process will work and how the facilitator will enable them to interact.
− Particularly with older people, but in fact with any audience, hold a brief, relaxed ‘warm up’ session, to begin the process of audience members responding and beginning to focus, and to establish the rapport between facilitator and audience.
− At the ‘Pauses’ for interaction, guide the audience through the techniques appropriate at that point.
− Ask questions that are as open as possible, and accept contributions from the audience unconditionally. No one should be made to feel belittled by a facilitator’s response.
Frequently repeat or paraphrase what an audience member has just said both to reinforce the point and also to make sure everyone in the audience has heard. Where conflicting attitudes and perspectives come from the audience, briefly sum up the divergence, with respect, which often moves the discussion on. The different perspectives are aired and heard by everyone, but there is the safety of the differences being projected onto the characters and the situation in the story. If the focus of the discussion is being lost, regain the focus by referring back to the story. 5.1.4 Co-facilitation In some projects it is appropriate to have a co-facilitator who is a member of the research team. Whenever scientific issues or queries arise, the main facilitator can call on the co-facilitator to supply the information. In the case of a researcher / cofacilitator thinking an important issue or question is being missed in the discussion, (s)he can raise this with the audience. This method of co-facilitation worked well [12]. 5.2 Focus The performance of the story maintains the focus of the discussion, the characters bear the brunt of any negative comments, the audience increasingly engages and feels it’s comfortable to join in and a great deal of data emerges from the discussion. The whole process can be recorded unobtrusively (though with permission) for subsequent transcription and analysis. 5.3 Cost Video and live theatre are both extremely useful for engaging and informing an audience and stimulating lively discussion. They can be used for requirements gathering and evaluation by large groups of people at a time. The impact of live theatre and the ability of the audience to respond, and often directly interact with the characters, cannot be underestimated. If a video is used the discussions following the viewing need to be as well facilitated, as those in live performances, though obviously there is no direct interaction with the performers. 
The balance of costs between producing a DVD and live performances depends on the number of performances planned. Economically, live performances need to be put on close together, so that the actors are employed for a single period and need only one rehearsal period as part of this. If the presentations are spread out in time, re-briefing and re-rehearsal of the actors will be needed. The cost of producing a good quality video can be up to five times the cost of producing a series of between 2 and 5 live performances within a single run of productions, but if researchers wish to use the performance many times, at intervals and in different places, the initial cost of a video may be more economical. A useful compromise, where performances have to be at intervals, is to make a video and have at least one of the actors present in character for dialogue with the audience. This means that the actor(s) do not need a rehearsal period prior to the performance.
M. Morgan and A. Newell
6 Conclusions: the Appropriateness of Theatre for HCI
The work reported has shown that theatre can be very effective in many stages of the development of technology. There is a logic to the use of theatre in HCI research. Human needs and wants should be the starting point, with researchers frequently needing to consult potential users at the earliest stage, and theatre provides a very effective communication method. Once technological ideas begin to be developed, further consultation is needed with potential users. At the pre-prototype stage, theatre is particularly useful in helping the researchers create a ‘reality’ in which we imagine these devices being used, while raising questions about the appropriateness of the design for older people’s life situations and about its usability by people who are unsure about technology and slower to learn than when they were younger. An interactive performance essentially provides a very flexible ‘virtual’ world in which an audience can play with novel technology and concepts.
Acknowledgements. The work reported has been supported by the Scottish Higher Education Funding Council, the Engineering and Physical Sciences Research Council, and the Leverhulme Trust.
References
1. Carmichael, A., Newell, A.F., Dickinson, A., Morgan, M.: Using theatre and film to represent user requirements. Include, Royal College of Art, London (April 5-8, 2005)
2. Dickinson, A., Eisma, R., Syme, A., Gregor, P.: UTOPIA: Usable Technology for Older People: Inclusive and Appropriate. In: Brewster, S., Zajicek, M. (eds.) A New Research Agenda for Older Adults, Proc. BCS HCI, London, pp. 38–39 (2002)
3. Eisma, R., Dickinson, A., Goodman, Mival, O.J., Syme, A., Tiwari, L.: Mutual inspiration in the development of new technology for older people. In: Proc. Include 2003, London, pp. 7:252–7:259 (March 2003)
4. Grudin, J.: Why Personas Work – the psychological evidence. In: Pruitt, J., Adlin, T. (eds.) The Persona Lifecycle, keeping people in mind throughout product design, Elsevier (In press)
5. Head, A.: Personas: Setting the stage for building usable information sites. Online 27(4), 14–21 (2003)
6. Marquis-Faulkes, F., McKenna, S.J., Gregor, P., Newell, A.F.: Gathering the requirements for a fall monitor using drama and video with older people. Technology and Disability 17(4), 227–236 (2005)
7. Newell, A.F., Carmichael, A., Morgan, M., Dickinson, A.: The use of theatre in requirements gathering and usability studies. Interacting with Computers 18, 996–1011 (2006)
8. Newell, A.F., Gregor, P.: User sensitive inclusive design in search of a new paradigm. In: Scholtz, J., Thomas, J. (eds.) CUU 2000, Proc. First ACM Conference on Universal Usability, USA, pp. 39–44 (2000)
9. Newell, A.F., Gregor, P., Alm, N.: HCI for older and disabled people in the Queen Mother Research Centre at Dundee University, Scotland. CHI 2006, Montreal, Quebec, Canada, 22-27 April 2006, pp. 299–303 (2006)
10. Newell, A.F., Morgan, M.: The use of theatre in HCI research. In: “Engage” 20th Annual BCS HCI Conference, University of London (September 11-15, 2006)
11. Pruitt, J., Grudin, J.: Personas: Practice and Theory. In: Proceedings DUX 2003, CD ROM, 15 (2003)
12. Rice, M., Newell, A.F., Morgan, M.: Forum Theatre as a requirement gathering methodology in the design of a home telecommunication system for older adults. Behaviour and Information Technology (In press)
13. Sato, S., Salvador, T.: Playacting and Focus Troupes: Theatre Techniques for creating quick, intensive, immersive and engaging focus group sessions. Interactions, pp. 35–41 (September-October, 1999)
14. Strom, G.: Perception of Human-centered Stories and Technical Descriptions when Analyzing and Negotiating Requirements. In: Proceedings of the IFIP TC13 Interact 2003 Conference (2003)
15. Utopia Trilogy can be downloaded from: http://www.computing.dundee.ac.uk/projects/UTOPIA/utopiavideo.asp
Aspects of Integrating User Centered Design into Software Engineering Processes
Karsten Nebe (1) and Dirk Zimmermann (2)
(1) University of Paderborn, C-LAB, 33098 Paderborn, Germany, [email protected]
(2) T-Mobile Deutschland GmbH, Landgrabenweg 151, 53227 Bonn, Germany, [email protected]
Abstract. Software Engineering (SE) and Usability Engineering (UE) both provide a wide range of elaborated process models to create software solutions. Today, many companies have realized the need for usable products and understood that a systematic and structured approach to usability is as important as the process of software development itself. However, in both theory and practice it remains difficult to incorporate UE methods efficiently and smoothly into established development processes. One challenge is to identify integration points between the two disciplines SE and UE that allow a close collaboration with acceptable additional organizational and operational effort. The approach presented in this paper identifies integration points between software engineering and usability engineering on the level of process models. The authors analyzed four different software engineering process models to determine their ability to create usable products. To this end, the authors synthesized demands of usability engineering and performed an assessment of the models.
Keywords: Software Engineering, Usability Engineering, Standards, Models, Processes, Integration, Assessment.
1.1 Software Engineering
Software engineering is a discipline that adopts various engineering approaches to address all phases of software production, from the early stages of system specification up to the maintenance phase after the release of the system ([15], [18]). Software engineering tries to provide a systematic and plannable approach to software development. To achieve this, it provides comprehensive, systematic and manageable procedures: so-called software engineering process models (SE Models).
SE Models usually define detailed activities, the sequence in which these activities have to be performed, and the resulting deliverables. The goal of SE Models is to define a process in which project success does not depend on the individual efforts of particular people or on fortunate circumstances [5]. Hence, SE Models partially map to process properties and process elements and add concrete procedures.
Existing SE Models vary with regard to specific properties (such as type and number of iterations, level of detail in the description or definition of procedures or activities, etc.), and each model has specific advantages and disadvantages concerning predictability, risk management, coverage of complexity, generation of fast deliverables and outcomes, etc. Examples of such SE Models are the Linear Sequential Model (also called Classic Life Cycle Model or Waterfall Model) [16], Evolutionary Software Development [12], the Spiral Model by Boehm [1], and the V-Model [9].
Software engineering standards define a framework for SE Models on a higher abstraction level. They define rules and guidelines as well as properties of process elements as recommendations for the development of software. Thereby, standards support consistency, compatibility and exchangeability, and cover the improvement of quality and communication. The ISO/IEC 12207 provides such a general process framework for the development and management of software [7].
It defines processes, activities and tasks and describes how to perform these items on an abstract level.
Thus, there is a hierarchy of different levels of abstraction for software engineering: standards define the overarching framework, and process models describe systematic and traceable approaches for the implementation. All these levels put the focus on system requirements and system design.
1.2 Usability Engineering
Usability engineering is a discipline concerned with the question of how to design software that is easy to use. Usability engineering is “an approach to the development of software and systems which involves user participation from the outset and guarantees the efficacy of the product through the use of a usability specification and metrics.” [4]. To this end, usability engineering provides a wide range of methods and systematic approaches to support the development process. These approaches are called Usability Engineering Models (UE Models). Examples are Goal-Directed-Design [2], the Usability Engineering Lifecycle [11] or the User-Centered Design-Process Model of
IBM [6]. They describe an idealized approach to ensure the development of usable software, but they usually differ in their details, in the applied methods (the “how?”) and in the general description of the procedure (the “what?”, e.g. phases, dependencies, goals, responsibilities, etc.) [19].
Usability engineering provides standards that are similar in idea to software engineering standards. They also serve as a framework to ensure consistency, compatibility, exchangeability, and quality. However, usability engineering standards focus on the users and on the construction of usable solutions during the development of software. Examples of such standards are the DIN EN ISO 13407 [3] and the ISO/PAS 18152 [8]. The DIN EN ISO 13407 introduces a process framework for the human-centered design of interactive systems. Its overarching aim is to support the definition and the management of human-centered design activities. The ISO/PAS 18152 is based on the DIN EN ISO 13407 and describes a reference model to measure the maturity of an organization in performing processes that make usable, healthy and safe systems.
Thus, in usability engineering there exists a similar hierarchy of abstraction levels as in software engineering: standards define the overarching framework, and process models describe systematic and traceable approaches for the implementation. However, usability engineering puts the focus on creating usable and user-friendly systems instead of on system requirements and system design.
1.3 Relationship of Standards, Models and Operational Processes
In general, standards and models are seldom applied directly, neither in software engineering nor in usability engineering. Standards merely define a framework to ensure compatibility and consistency and to set quality standards.
Models are adapted and/or tailored according to the corresponding organizational conditions, such as existing processes, organizational or project goals and constraints, legal policies, etc. Accordingly, the models are detailed by the selection and definition of activities, tasks, methods, roles, deliverables, etc. as well as the responsibilities and relationships between them. The derived instantiation of the model, fitted to the organizational aspects, is called the software development process (for SE Models) or the usability lifecycle (for UE Models). Thus, the resulting Operational Process is an instance of the underlying model and the implementation of activities and information processing within the organization. This applies to both software engineering and usability engineering.
Thus, there is not just a single hierarchy of standards and models but an additional level of operational processes, for software engineering as well as for usability engineering. Standards define the overarching framework, models describe systematic and traceable approaches, and on the operational level these models are adjusted and put into practice (Figure 1). In order to achieve sufficient alignment between the two disciplines, all three levels have to be regarded, to ensure that the integration points and suggestions for optimized collaboration meet the objectives of both sides and do not lose the intentions behind a standard, model or operational implementation.
[Figure 1 (diagram not reproduced). Labels: Software Engineering: Standards (ISO/IEC 12207), Process Model, Operational Process (Procedure). Usability Engineering: Standards (DIN EN ISO 13407, ISO/PAS 18152), Process Model, Operational Process (Procedure).]
Fig. 1. Similar hierarchies in the two disciplines software engineering and usability engineering: standards, process models and operational processes
2 Motivation
For development organizations, SE Models are an instrument to plan and systematically structure the activities and tasks to be performed during software creation. However, software development organizations aim to fulfill specific goals when they plan a software solution. Such goals could be the rapid development of a new software solution (to become the leader in this area), the development of a very stable and reliable solution (e.g. because of the organization’s prestige) and, of course, the creation of revenue. Depending on its goals, an organization will choose the SE Model (or the combination of several) that in its estimation fits best. As an example, the Linear Sequential Model, with its predefined results at the end of each phase and its sequential flow of work, certainly provides a good basis for plannability. On the other hand, Evolutionary Development might not be a good choice if the main focus of the solution is on error-robustness, because the continuous assembling of the solution is known to cause problems in the structure and maintenance of software code.
As usability engineering puts the focus on the user and the usability of products, which is an important aspect of quality, usability becomes important for the development process and thus also an important criterion for organizations in choosing a well-suited SE Model. However, usability engineering activities are not just a subset of software engineering activities. Although different models exist for software and usability engineering, there is a lack of systematic and structured integration [17]. They often coexist as two separate processes in an organization and therefore need to be managed separately and, in addition, need to be synchronized by adding usability engineering activities to the software engineering process models.
In order to identify integration points between the two disciplines the authors believe examinations on each level of the hierarchy have to be performed: On the level of standards it has to be shown that aspects of software engineering and usability
engineering can coexist and can be integrated, even on this abstract level. On the level of process models it has to be analyzed how usability engineering aspects can be incorporated into SE Models. And on the operational level, a close collaboration of activities should be achieved, resulting in reasonable additional organizational and operational effort.
2.1 Common Framework on the Level of Standards
In previous work the authors already performed an initial analysis on the first two hierarchy levels, Standards and Processes [13]. First integration points on the level of Standards were found by comparing the software engineering standard ISO/IEC 12207 with the usability engineering standard DIN EN ISO 13407. To this end, the standards’ detailed descriptions of processes, activities and tasks, output artifacts, etc. were analyzed and similarities were found. Based on common goals and definitions, the single activities of the standards could be consolidated into five common activities: Requirement Analysis, Software Specification, Software Design and Implementation, Software Validation, and Evaluation. These common activities represent and divide the process of development from both a software engineering and a usability engineering point of view. The five common activities can be seen as a basis for integrating the two disciplines on the overarching level of standards: a common framework for software engineering and usability engineering activities. The authors used the framework to set the boundaries for the next level of analysis in the hierarchy: the level of process models.
2.2 Ability of SE Models to Create Usable Products
Based on the common framework, different SE Models were analyzed to see how they already support the implementation of the usability activities. Thus, an assessment of SE Models was performed with the goal of identifying their ability to create usable software solutions.
In order to create valuable results, the authors defined several tasks to be performed. First, adequate criteria for the assessment of the SE Models needed to be defined, by which unbiased and reliable statements about process models and their ability to create usable software can be made. The assumption was that, based on the results of the assessment, specific recommendations can be derived to enrich the SE Models by adding or adapting usability engineering activities, phases, artifacts, etc. By doing this, the development of usable software on the level of process models can be guaranteed. Furthermore, hypotheses about the process improvements can be made for each recommendation, which can then be evaluated on the Operational Process level. To this end, case studies will be identified, based on which the recommendations can be transferred into concrete measures. These measures can then be evaluated by field-testing to verify their efficiency in making software engineering activities user-centered. In summary, four types of analyses need to be performed: two on the level of process models and two on the operational process level. The four respective analysis topics differ in their proceedings as well as their expected results:
- Operationalization of the base practices and identification of criteria for the assessment of usability engineering activities and the corresponding deliverables.
- Assessment of SE Models, based on the identified criteria, and derivation of adequate recommendations.
- Inspection of case studies with regard to the recommendations, and derivation of specific measures for the implementation of UE activities in SE processes.
- Evaluation of the measures in practice.
For each of the analyses several methods can be used, some of which involve domain experts as interview partners, whereas others are more document oriented. This paper focuses on the analyses performed for the first topic listed above, i.e. the operationalization of base practices and derivation of UE criteria for the assessment, and gives first results on the second topic as a forecast based on the results of the first.
2.3 Criteria for the Assessment of SE
As the authors identified the need for assessment criteria to define the degree of usability engineering coverage in SE Models, the following section shows how these criteria were gathered, what results were derived, and what is to be expected from further research activity. To obtain detailed knowledge about usability engineering activities, methods, deliverables and their respective quality aspects, the authors analyzed the DIN EN ISO 13407 and the ISO/PAS 18152. In addition to the identified common activities of the framework within the human-centered design activities, ISO/PAS 18152 defines detailed Base Practices that specify the tasks for creating usable products. These base practices have been used as a foundation to derive requirements that represent the usability engineering perspective of the common activities. The number of fulfilled requirements for each activity of the framework indicates the level of compliance of the SE Model with the base practices and thus with the usability view of activities. For each base practice the authors determined whether the model complied with it or not. In a second iteration of the gap-analysis, expert interviews will lead to more detailed criteria in order to assess the corresponding SE Models more specifically. Additionally, the completeness and correctness of the base practices and human-centered design activities as defined in the ISO/PAS 18152 itself need to be verified.
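The compliance figure described above, i.e. the share of fulfilled base practices per activity under a fulfilled / not fulfilled rating, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' tooling: the function name `compliance`, the number of base practices per activity, and the individual ratings are all hypothetical.

```python
from typing import Dict, List


def compliance(assessments: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return the percentage of fulfilled base practices per activity.

    Each list holds one dichotomous rating (True = fulfilled) per base practice.
    """
    result = {}
    for activity, ratings in assessments.items():
        if not ratings:
            raise ValueError(f"no base practices listed for {activity!r}")
        # sum() counts the True entries, i.e. the fulfilled base practices.
        result[activity] = 100.0 * sum(ratings) / len(ratings)
    return result


# Hypothetical ratings for one SE Model; the counts of base practices
# (8 and 5) are assumptions made for this example only.
example = {
    "Context of Use": [True] + [False] * 7,
    "Evaluation of Use": [True, True, True, True, False],
}

scores = compliance(example)
for activity, percent in scores.items():
    print(f"{activity}: {percent:.1f} %")
```

Because each rating is binary, a model that partly addresses a base practice can only be scored as fully fulfilled or not fulfilled at all.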
The detailed descriptions of the base practices have been used to pre-structure the collection of criteria and the expected results. Since the base practices are structured by activities, methods, and deliverables, the authors used this structure for the expected results as well. Further expected results are criteria about the quality aspects of the overall process. The results will be separated into those that relate to specific human-centered design activities and those that are more generic and overarching. This results in a matrix of activities & methods, content & deliverables, and roles & quality aspects in relation to the human-centered design activities and overall activities, as shown in Table 1.
Table 1. Structure and orientation of criteria for the assessment of software engineering models
[Table 1 (layout lost in extraction). Column headings: Context of Use, User Requirements, Produce Design Solutions, Evaluation of Use, Overarching Aspects.]
Based on this, several evaluation questions have been gathered, focusing on the abstract level of process models. The goal is to define overarching criteria and not to evaluate the concrete accomplishment within one specific model or particular procedure, e.g. questions about overlaps of activities, phases, and deliverables, or questions about the relevance of specific activities or roles within a process model. According to the questions and based on the initial structure shown in Table 1, the authors performed the first analysis, the documentation analysis of existing SE Models (Linear Sequential Model, Evolutionary Software Development, the Spiral Model by Boehm and the V-Model), and for the second analysis created an interview guideline that is currently used as a basis for the expert interviews. Initial results of these analyses are described in the following section.
Table 2. Summary results of the gap-analysis, showing the sufficiency of SE Models in covering the requirements of usability engineering (based on the ISO/PAS 18152; HS 3)
                         | Context of Use | User Requirements | Produce Design Solutions | Evaluation of Use | Across Activities
Linear Sequential Model  | 0 %            | 0 %               | 0 %                      | 60 %              | 13 %
Evolutionary Development | 13 %           | 40 %              | 40 %                     | 80 %              | 39 %
Spiral Model             | 13 %           | 80 %              | 40 %                     | 100 %             | 52 %
V-Modell                 | 88 %           | 80 %              | 40 %                     | 100 %             | 78 %
Across Models            | 28 %           | 50 %              | 30 %                     | 85 %              |
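As a plausibility check, the "Across Models" figures of Table 2 can be recomputed as the mean of the four per-model values in each column. The sketch below assumes the cell assignment read from the extracted table (in particular the Spiral Model row as 13, 80, 40, 100); the names `TABLE_2` and `mean_per_activity` are illustrative only.

```python
# Per-model coverage in percent, in the column order of ACTIVITIES.
# The assignment of values to cells is an assumption based on the extracted
# table and the percentages quoted in the Results text.
ACTIVITIES = [
    "Context of Use",
    "User Requirements",
    "Produce Design Solutions",
    "Evaluation of Use",
]

TABLE_2 = {
    "Linear Sequential Model": [0, 0, 0, 60],
    "Evolutionary Development": [13, 40, 40, 80],
    "Spiral Model": [13, 80, 40, 100],
    "V-Modell": [88, 80, 40, 100],
}


def mean_per_activity(table):
    """Average each activity column over all SE Models."""
    n_models = len(table)
    return {
        activity: sum(row[i] for row in table.values()) / n_models
        for i, activity in enumerate(ACTIVITIES)
    }


averages = mean_per_activity(TABLE_2)
for activity, value in averages.items():
    print(f"{activity}: {value:.1f} %")
```

Averaging the rounded cell values gives 28.5 for Context of Use, which agrees with the reported 28 % to within rounding; the other three columns reproduce the reported 50 %, 30 % and 85 % exactly.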
3 Results As a result of the first analysis of selected SE Models first general statements can be made: The overall level of compliance of the SE Models satisfying the base practices and therewith the usability view of activities, is rather low (Table 2). For none of the
SE Models all base practices of the ISO/PAS 18152 are fulfilled. However, there is also a large variability in the coverage rate between the SE Models. For example, the V-Model shows very good coverage for all modules except for lower compliance with the HS 3.3 Produce Design Solution criteria, whereas the Linear Sequential Model fulfills only a few of the HS 3.4 Evaluation of Use criteria and none of the other modules. Evolutionary Development and the Spiral Model share a similar pattern of findings, in that they show little coverage for Context of Use, medium to good coverage of User Requirements, limited coverage for Produce Design Solution and good support for Evaluation of Use activities.
Comparing the percentage of fulfilled requirements for each SE Model in the summary of results shows that the V-Model has better compliance than the other models and can basically be regarded as able to produce usable products. In the comparison, the Linear Sequential Model falls short, followed by Evolutionary Development and the Spiral Model.
Both the overview and the detailed findings show that the emphasis in all SE Models is laid on evaluation (Evaluation of Use), especially in comparison to the remaining activities. The lowest overall coverage was found in Context of Use and Produce Design Solution. Based on the relatively small compliance values for the Context of Use (28%), User Requirements (50%) and Produce Design Solutions (30%) activities across all SE Models, the authors see this as an indicator that there is only a loose integration between usability engineering and software engineering.
In summary, the results confirmed the expectations of the authors, showing the low level of integration between both disciplines on the level of the overarching process models. As expected, it becomes apparent that there is a dire need to compile more specific and detailed criteria for the assessment of the SE Models.
As the analysis showed, the base practices currently give too much leeway for interpretation. In addition, it turned out that the dichotomous assessment scale (in terms of “not fulfilled” or “fulfilled”) is not sufficient. A more fine-grained rating is necessary to evaluate the process models adequately. Performing the documentation analysis of the SE Models produced first insights, but it turned out that the documentation is not comprehensive enough to ensure the validity of the resulting statements.
In the second analysis the authors plan to conduct, more specific criteria will be determined according to the previously described structure. These will be compiled in semi-structured interviews with experts from the domain of usability engineering. The criteria focus on the activities defined in the module Human-centered design (ISO/PAS 18152) and their respective base practices, and on specifics in: fundamental activities, basic conditions and constraints, relevance of activities, resulting outcomes, type of documentation, and respective roles and responsibilities. Beyond this, a substantial focus is put on the quality aspects based on the activities, deliverables, roles and the superordinate model. The criteria will be evaluated concerning questions like:
- How to identify good activities?
- How to identify good results or deliverables?
- How to identify appropriate roles?
- What are properties/characteristics for the relevance and frequency?
- How could the progress of an activity or deliverable be measured and controlled?
Based on these criteria the authors expect to obtain evidence of which activities, deliverables and roles are necessary to ensure the development of usable products from the experts’ point of view. Relevant factors of influence could be for instance: “When will an activity A not be performed, and why?” or “Under which circumstances will an activity A be performed completely, and when just partly?” Additionally, criteria are to be raised based on which the progress of the process could be measured. However, the central point will be the collection of criteria that focus on quality aspects of the activities, deliverables and roles as well as their relevance. It is expected that the results cannot just be used as more detailed criteria for the assessment but will also provide evidence on the level of completeness of the ISO/PAS 18152 and surface potential areas of improvement.
4 Summary and Outlook
The approach presented in this paper was used to identify integration points between software engineering and usability engineering on the level of process models. The authors analyzed four different software engineering process models to identify their ability to create usable products. The authors synthesized demands of usability engineering and performed an assessment of the models. The results provide an overview of the degree of compliance of the models with usability engineering demands. It turned out that there is relatively little compliance with the usability engineering activities across all software engineering models. This is an indicator that only little integration between usability engineering and software engineering exists. There are few overlaps between the disciplines regarding these activities, and therefore it is necessary to provide suitable interfaces to create a foundation for the integration.
The authors identified the need to compile more specific and detailed criteria for the assessment, as well as an assessment scale that is more differentiated than the dichotomous one, in order to evaluate the process models appropriately. Therefore the authors introduced a structured approach for the follow-up analysis. The more detailed criteria will be compiled in semi-structured interviews with experts from the domain of usability engineering. Thereby, a substantial focus is put on the quality aspects based on the activities, deliverables, roles and the superordinate model. Based on these criteria the authors expect to be able to make statements about their necessity and relevance for ensuring the development of usable products from the experts’ point of view. It is expected that the results could not just be used as criteria for the assessment of software engineering models but could also define the demands of usability more precisely and give evidence about the completeness and potential extension areas of the ISO/PAS 18152.
References
1. Boehm, B.: A Spiral Model of Software Development and Enhancement. IEEE Computer 21, 61–72 (1988)
2. Cooper, A., Reimann, R.: About Face 2.0. Wiley, Indianapolis, IN (2003)
3. DIN EN ISO 13407. Human-centered design processes for interactive systems. CEN European Committee for Standardization, Brussels (1999)
4. Faulkner, X.: Usability Engineering, pp. 10–12. Palgrave, New York (2000)
5. Glinz, M.: Eine geführte Tour durch die Landschaft der Software-Prozesse und -Prozessverbesserung. Informatik – Informatique, pp. 7–15 (6/1999)
6. IBM: Ease of Use Model (11/2004). Retrieved from http://www-3.ibm.com/ibm/easy/eou_ext.nsf/publish/1996
7. ISO/IEC 12207. Information technology - Software life cycle processes. Amendment 1, 2002-05-01. ISO copyright office, Switzerland (2002)
8. ISO/PAS 18152. Ergonomics of human-system interaction - Specification for the process assessment of human-system issues. First Edition 2003-10-01. ISO copyright office, Switzerland (2003)
9. KBST: V-Modell 97 (05/2006). Retrieved from http://www.kbst.bund.de
10. Larman, C., Basili, V.R.: Iterative and Incremental Development: A Brief History. Computer 36(6), 47–56 (6/2003)
11. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (1999)
12. McCracken, D.D., Jackson, M.A.: Life-Cycle Concept Considered Harmful. ACM Software Engineering Notes, pp. 29–32 (4/1982)
13. Nebe, K., Zimmermann, D.: Suitability of Software Engineering Models for the Production of Usable Software. In: Proceedings of the Engineering Interactive Systems 2007, HCSE (IFIP Working Group 13.2, Methodologies for User Centered Systems Design). Lecture Notes in Computer Science (LNCS), Springer, Heidelberg (in prep. 2007)
14. Pagel, B., Six, H.: Software Engineering: Die Phasen der Softwareentwicklung, 1st edn., vol. 1. Addison-Wesley Publishing Company, Bonn, D (1994)
15. Patel, D., Wang, Y. (eds.): Annals of Software Engineering. Editors’ introduction: Comparative software engineering: Review and perspectives, vol. 10, pp. 1–10. Springer, Heidelberg (2000)
16. Royce, W.W.: Managing the Development of Large Software Systems. In: Proceedings IEEE, pp. 328–338. IEEE, Wescon (1970)
17. Seffah, A. (ed.): Human-Centered Software Engineering – Integrating Usability in the Development Process, pp. 3–14. Springer, Heidelberg (2005)
18. Sommerville, I.: Software Engineering, 7th edn. Pearson Education Limited, Essex, GB (2004)
19. Woletz, N.: Evaluation eines User-Centred Design-Prozessassessments - Empirische Untersuchung der Qualität und Gebrauchstauglichkeit im praktischen Einsatz. Doctoral Thesis. University of Paderborn, Paderborn, Germany (4/2006)
Activity Theoretical Analysis and Design Model for Web-Based Experimentation∗

Anh Vu Nguyen-Ngoc
Department of Computer Science, University of Leicester, United Kingdom
[email protected]
Abstract. This paper presents an Activity Theoretical analysis and design model for Web-based experimentation, one of the online activities that play a key role in the development and deployment of the flexible learning paradigm. Such a learning context is very complex, as it requires both synchronous and asynchronous solutions to support different types of interaction, which can take place not only among users but also between the user and the provided experimentation environment, and between the different software components that constitute the environment. The proposed analysis and design model helps clarify many concepts needed for the analysis of a Web-based experimentation environment. It also represents an interpretation of Activity Theory in the context of Web-based experimentation.

Keywords: Analysis and Design model, Activity Theory, Web-based experimentation.
1 Introduction

For about a decade, several engineering departments in colleges and universities have faced the logistical problem of educating more students with the same resources while maintaining the quality of education. There is also an increasing need to expand the diversity of laboratory resources provided to students. Within this challenging context, the flexible learning paradigm [1, 2] can be seen as an appropriate solution. It refers to a hybrid learning scheme in which traditional courses are combined with online activities. In engineering education, Web-based experimentation is one of the online activities that play a key role in the development and deployment of such a flexible paradigm. Over the last decade, several institutions have already exploited the Web infrastructure and developed the experimentation courses in their engineering curricula using this medium as the main infrastructure.

However, Web-based experimentation is a very complex socio-technical setting [2-4]. As a consequence, understanding the main factors that constitute this particular learning context is an essential step in finding solutions to support and sustain interaction, collaboration, and learning processes. Though several Web-based experimentation environments have been developed, such as [5-9], there is still no analysis and design model that really captures the main characteristics of such a learning context and provides useful guidelines for analysts, designers, and developers who design and develop Web-based experimentation environments. This paper proposes such a model.

Section 2 of this paper discusses the major characteristics of Web-based experimentation. Section 3 presents a typical scenario of interaction and collaboration processes in such a learning context. The Activity Theoretical analysis and design is discussed in Section 4. Finally, Section 5 concludes the paper.

∗ Most of this work was carried out while the author was with the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
2 Characteristics of Web-Based Experimentation

Although there have been several works on the design, development, and deployment of Web-based experimentation environments, there is still no clear standard for determining the main characteristics of the collaborative hands-on activities in such learning environments. This section discusses these essential characteristics.

2.1 Hands-On Activities Support

First of all, the content delivered in engineering courses that rely on Web-based experimentation includes not only static documents, textual presentations, or video presentations but also computation, graphics generated on the fly, measurements from real devices, and the like. Web-based experimentation can include virtual and/or remote laboratory resources. Real experimentation is still irreplaceable in engineering curricula, since students need to have contact with the apparatus and materials, and labs should include the possibility of unexpected data occurring as a result of material problems, noise, or other uncontrolled real-world variables. Virtual and remote laboratory resources provide a complementary means to carry out real experimentation online and/or at a distance. A typical virtual laboratory resource is an interactive experiment that relies on a simulation engine. A typical remote laboratory resource is a physical experimental system that is equipped with the necessary facilities to enable Web-based measuring, monitoring, and manipulation [2].

2.2 Components Integration

Due to the complexity of hands-on work [2-4], several components may need to be integrated into the same experimentation environment. These components should help support the whole experimentation process, from the preparation stage, to the design stage, to the experiment stage, and to the experimental analysis stage. Each component provides a working space or working console where students carry out dedicated tasks to solve a particular problem within a complete experiment. Since the output from one stage may serve as the input for the next stages, there should be some linkages between these components. A comparative study has been carried out
in various engineering courses at the EPFL to determine the most common service spaces that may require supporting components for completing typical experimentation assignments [2, 10]. Each service space can be supported by one or several components developed using different technologies. These spaces are as follows:
• The first space that needs to be supported relates to the experimentation itself. This can be regarded as the interaction part of the environment. It enables the actual realization of experiments by interacting with virtual or remote laboratory resources.
• The second space that needs to be supported concerns tools to carry out interactive design and analysis activities related to the experiment.
• The third space of a Web-based experimentation environment relates to collaboration support. This is where the professors and the teaching assistants can interact with the students to monitor their progress and to guide their learning activities, and where students interact with each other to get the tasks done.
• Furthermore, a Web-based experimentation environment may also need to integrate some supplementary components, which give access to a number of pieces of information, including relevant reminders or links presenting the underlying theory, the experimental protocol, and a description of the environment, including the laboratory resources and the environment features that are used in the experiment.

Obviously, depending on the experimental protocol, a Web-based experimentation environment may not need to integrate all of these components.

2.3 Multi-session Experiment

Typical Web-based experimentation sessions are mediated by teaching assistants and also by the professors responsible for the course. There may be some face-to-face sessions, in which the students work in the laboratory in the presence of the professor and/or teaching assistants, but most of the learning activities take place in flexible sessions.
Multi-session experiments are an important factor in helping students perform experimentation in a flexible way. In a Web-based experimentation environment, students should be able to carry out several trial-and-error experiments that help them reinforce their understanding of theoretical lectures and physical phenomena in a framework where errors are neither penalized nor hazardous. Ideally, a Web-based experimentation environment should allow students to reconstruct the whole or some parts of an experiment and perform it as many times as they want. Hence, the experimental parameters need to be stored for later reconstruction or reuse of that experiment. To support multi-session experiments carried out by a single student or by groups of students, many issues need to be addressed, such as the continuity of interaction [11], which allows students to interact smoothly and without interruption with the experimentation environment, with the laboratory resources, and with other students. Several asynchronous and synchronous collaboration facilities need to be considered as well.
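The requirement that experimental parameters be stored for later reconstruction or reuse can be illustrated with a minimal sketch. It is not taken from any environment discussed in this paper: the class name `ExperimentSnapshot`, its fields, the helper `rerun_with`, and the example parameter names are all hypothetical, and JSON is only one assumed storage format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExperimentSnapshot:
    """Parameters saved at the end of one session (all field names are hypothetical)."""
    experiment_id: str
    student_group: str
    parameters: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize the snapshot so it can be kept between sessions.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ExperimentSnapshot":
        # Rebuild the snapshot in a later session.
        return cls(**json.loads(text))


def rerun_with(snapshot: ExperimentSnapshot, **overrides) -> ExperimentSnapshot:
    """Return a new snapshot that reuses the stored parameters with some values changed."""
    merged = {**snapshot.parameters, **overrides}
    return ExperimentSnapshot(snapshot.experiment_id, snapshot.student_group, merged)


# Session 1: the group saves its parameters.
first = ExperimentSnapshot("electrical-drive", "group-07", {"gain": 1.5, "setpoint": 100.0})
stored = first.to_json()

# Session 2: the experiment is reconstructed and repeated with a different gain.
restored = ExperimentSnapshot.from_json(stored)
second = rerun_with(restored, gain=2.0)
```

Returning a new snapshot instead of modifying the stored one keeps every trial available, which fits the trial-and-error use described above.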
2.4 Types of Collaboration

The importance of collaboration among students has long been recognized in education, especially in distance and online education. According to social constructivists, learning is a social construct mediated by language via social interactions [12], where collaboration among learners is a critical concept [13]. In addition, hands-on activities are usually conducted in small groups [2]. Consequently, Web-based experimentation environments should integrate components that help students to actively create their own contextual meaning, rather than passively acquire knowledge structures created by others [3]. These components should help students to interact with their peers, discuss their positions, form arguments, reevaluate their initial positions, and negotiate meaning. Students become responsible for learning as they collaborate with one another, with their environment, and with their teaching assistants and professors. Both synchronous and asynchronous collaboration should be supported in a Web-based experimentation environment.

2.5 Discretionary Collaboration

The autonomy of individual students working in flexible modalities means that collaboration with other students is, in many cases, not strictly required. In other words, students collaborate with other students only when they believe that it is worth doing so. Students participating in a course that uses the provided Web-based experimentation environment may be enrolled in various other courses. This means that they may have different study schedules and may carry out different tasks at different times. These variations can make it difficult to find common times when students can collaborate. As a consequence, even when working in groups, students usually work together, either face-to-face or at a distance, only when a due date is approaching, e.g. before the laboratory sessions or before the laboratory test.

Of course, there also exist other modes of group work. Our experience in observing the students’ work shows that there are some “well-organized” groups, in which the members clearly divide the tasks among themselves. There are also many cases in which only one member of the group does the “whole job”. However, depending on the experimental protocol, and more precisely on how the laboratory test is carried out, it is sometimes difficult for the teaching assistants and professors to recognize such problems. The Web-based experimentation environment should allow students to switch between single working mode and collaborative working mode. This switching should be as smooth and transparent as possible from the student’s point of view.
3 Typical Scenario of Interaction and Collaboration Process

Fig. 1 illustrates the interaction and collaboration process that takes place in Web-based experimentation, in which collaborative actors perform a chain of activities to obtain an outcome, i.e. to acquire knowledge from the course (see 1 in the figure). Collaborative actors are, for instance, student groups that are enrolled in the course and use the environment to carry out their experimentation. In hands-on sessions, the group size is usually small (2 or 3 students) [2, 3]. These actors share their common
background, divide tasks, coordinate their work, and collaborate with each other based on some social rules to get the work done. To support the coordination and communication between these actors, several collaboration and communication facilities may need to be integrated into the experimentation environment.
Fig. 1. The interaction and collaboration process of Web-based experimentation
These actors interact with various (software) objects displayed in the GUI of the Web-based environment (2). For example, a student uses the computer mouse to modify the parameters of an electrical drive, which are displayed in the GUI as scrollbars. These objects are actually the representations of software components (3), which may be located on different servers. The interaction between the actors and the objects may change the status and the behaviour of the components, and may also invoke interaction between these components and/or their internal calculation processes (4). In turn, the interaction between the components at the system level facilitates the interaction process at the user level, which may serve the next activities of the students (5).

To summarize, this scenario depicts the complexity of a context in which:
• Students can collaboratively carry out their hands-on activities in a flexible way.
• The online learning community is heterogeneous and its members may have different roles. The coordination and collaboration among the members of the community may be defined by different social protocols and rules.
• The Web-based experimentation environment itself may integrate a large variety of software components, which constitute what we call the system level. These components are represented by several objects displayed in the interface of the provided experimentation environment.
• The interaction process conducted by the actors, which happens externally and internally at both the user and system levels, allows the actors to acquire the outcome of the course.
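The chain from a user action on a GUI object (2), through the component it represents (3), to the resulting interaction between components (4) can be sketched as follows. This is only an illustrative sketch under assumed names: `Component`, `GuiObject`, the `drive` and `plotter` components, and the `speed_reference` parameter are hypothetical and do not describe the architecture of any environment cited here.

```python
class Component:
    """A software component at the system level (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.listeners = []  # other components notified when this one changes

    def connect(self, other):
        # Link two components so that a change in one reaches the other.
        self.listeners.append(other)

    def update(self, key, value):
        # (4) A user action changes the status of the component ...
        self.state[key] = value
        # ... and may invoke interaction with other components.
        for other in self.listeners:
            other.on_change(self.name, key, value)

    def on_change(self, source, key, value):
        # Record the value received from another component.
        self.state[f"{source}.{key}"] = value


class GuiObject:
    """(2)-(3) A GUI object, e.g. a scrollbar, representing one component parameter."""

    def __init__(self, component, key):
        self.component = component
        self.key = key

    def set_value(self, value):
        # Forward the user's action to the component behind the GUI object.
        self.component.update(self.key, value)


drive = Component("drive")
plotter = Component("plotter")
drive.connect(plotter)

# A student moves a scrollbar that represents a parameter of the drive.
scrollbar = GuiObject(drive, "speed_reference")
scrollbar.set_value(42.0)
```

After the call, both the drive and the connected plotter hold the new value, which corresponds to the system-level interaction serving the student's next activity (5).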
4 Activity Theoretical Analysis and Design

Obviously, the complexity of Web-based experimentation is caused by several social and technical factors. As a consequence, when studying collaborative hands-on work in Web-based experimentation, the interaction and collaboration process should be analyzed as a whole, not as any of its constituent entities in isolation, since there are close, active, reciprocal, and bidirectional interdependences among these entities. The importance of Activity Theory as a framework for conceptualizing human activities has long been studied by the CSCW and CSCL communities [14, 15]. In an influential paper published in 1999, Jonassen and Rohrer-Murphy also argued that Activity Theory provides a powerful instrument for analyzing the needs, tasks, and outcomes involved in designing constructivist learning environments [16]. They proposed a framework that helps analyze and design a constructivist learning environment. However, one of the most difficult problems for analysts and designers is how to apply these abstract concepts to a real-world problem, e.g. to design a real Web-based experimentation environment that supports online collaborative hands-on activities. In this section, Jonassen and Rohrer-Murphy’s framework is adapted to introduce a mapping and interpretation from the abstract concepts of Activity Theory into the real context of Web-based experimentation. The constructed framework helps to understand and clarify the context of Web-based experimentation from an Activity Theoretical perspective.

4.1 Activity Theory Concepts

1. Subject: There can be several types of subjects in the context of Web-based experimentation. The most important ones are the following:
   a. Professor: someone who is in charge of the course. His/her role is to design and construct the pedagogical scenario of the course, to guide students in their learning process during the whole course, and to evaluate the students’ progress and their acquired knowledge.
   b. Teaching assistant: someone who may play a very important role in distributing knowledge in the class. The teaching assistant helps students during hands-on sessions. His/her role can also be to support the course management and administration.
   c. Student: the main subject using the environment, who enrols in the course to carry out experimentation using the environment provided.
   d. Technician: responsible for the configuration of the physical equipment in the laboratory.
   e. Evaluator, research assistant: responsible for assessing the effectiveness and efficiency of the environment, and/or proposing further improvement, development, and the like.
2. Object: Different objects can be defined. These objects are transformed during the course to obtain different outcomes.
   a. Long-term object: can be composed of both physical and mental products. The physical object could be the deliverables obtained after finishing the course, e.g. a course report, or a set of adequate parameters to obtain a stable state of the system. The mental product refers rather to the knowledge, the concepts, or the perceptions of students regarding a particular engineering domain.
   b. Short-term object: the objects of each experimental session or module. Deliverables representing short-term objects could be a report, a mathematical problem to be solved, a hands-on module to be realized, and the like. Short-term objects can also be the knowledge obtained after finishing these modules.
3. Community: All professors, assistants, students, and technicians using the environment for the course form an online learning community, in which the student is the central character and the professors and teaching assistants are usually the central source of knowledge distribution.
4. Rule: Several rules can be defined for a course depending on the course requirements, the laboratory policies, and the pedagogical scenarios. The task organization among the members of the same group normally relies on a social protocol or a compromise established within the group or between groups in the community. In hands-on sessions, the experimental protocol is what the professors define to guide the students’ hands-on steps.
5. Tool, artefact: The tools that need to be integrated should support and reflect the major characteristics of Web-based experimentation as presented in the contextual model. Various tools may be required. The analysts and designers should also consider whether to develop the tools themselves or to integrate those already developed by other institutions.
6. Division of labour: This means the division of tasks between the members of the learning community. The division of labour depends on the learning community and the rules defined for that community.
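The six concepts above can be read as a small data model. The following sketch is one possible reading under stated assumptions: the names `Role`, `ActivitySystem`, `assign`, and `unassigned_tasks`, and all example values, are hypothetical and are not part of the proposed model. Following the definition of the community given above, evaluators are treated as subjects but are not counted as community members.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PROFESSOR = "professor"
    TEACHING_ASSISTANT = "teaching assistant"
    STUDENT = "student"
    TECHNICIAN = "technician"
    EVALUATOR = "evaluator"


@dataclass
class ActivitySystem:
    """Hypothetical container for the six Activity Theory elements of one course."""
    subjects: dict              # subject name -> Role
    long_term_objects: list
    short_term_objects: list
    rules: list
    tools: list
    division_of_labour: dict = field(default_factory=dict)  # task -> member name

    @property
    def community(self):
        # Professors, assistants, students, and technicians form the community.
        return sorted(n for n, r in self.subjects.items() if r is not Role.EVALUATOR)

    def assign(self, task, member):
        # The division of labour is defined among community members only.
        if member not in self.community:
            raise ValueError(f"{member} is not a member of the learning community")
        self.division_of_labour[task] = member

    def unassigned_tasks(self):
        # Short-term objects that nobody has taken on yet.
        return [t for t in self.short_term_objects if t not in self.division_of_labour]


course = ActivitySystem(
    subjects={"Alice": Role.STUDENT, "Bob": Role.STUDENT,
              "Carol": Role.TEACHING_ASSISTANT, "Dan": Role.EVALUATOR},
    long_term_objects=["course report"],
    short_term_objects=["pre-lab report", "hands-on module"],
    rules=["experimental protocol"],
    tools=["simulation console"],
)
course.assign("pre-lab report", "Alice")
```

A structure of this kind could serve as a checklist during analysis, for instance to see which short-term objects have no one responsible for them.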
4.2 Activity Structure

This part involves defining the activities that engage the subject. Each activity can be decomposed into its component actions and operations. However, the definition of the activity structure and its granularity depends solely on the pedagogical scenarios and on the objectives of the environment evaluators. In a practical course, an activity is usually equated with the task students need to complete [11]. For each activity (or task), actions are the complete, meaningful experimental steps that need to be realized. Operations are what students do unconsciously when interacting with the environment to complete each step. In an automatic control laboratory course, for example, a task could be “Modelling and control of an electrical drive”. For each task, several actions need to be realized. These actions have an immediate, pre-defined goal, such as “preparing the pre-lab”, “manipulating the physical drive”, or “analyzing the experimental result”. Actions consist of a chain of operations, such as “moving the parameter scrollbar to increase or decrease the value of a parameter of the electrical drive under study”.
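The decomposition of an activity into actions and operations can be written down as a simple nested structure. The sketch below uses the example task from the text; the class names `Activity`, `Action`, and `Operation` and the method `operation_count` are hypothetical and only illustrate the three levels.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """What students do unconsciously while interacting with the environment."""
    description: str


@dataclass
class Action:
    """A meaningful experimental step with an immediate, pre-defined goal."""
    goal: str
    operations: list = field(default_factory=list)


@dataclass
class Activity:
    """The task students need to complete."""
    task: str
    actions: list = field(default_factory=list)

    def operation_count(self):
        # Total number of operations recorded across all actions.
        return sum(len(action.operations) for action in self.actions)


activity = Activity(
    task="Modelling and control of an electrical drive",
    actions=[
        Action("preparing the pre-lab"),
        Action(
            "manipulating the physical drive",
            [Operation("moving the parameter scrollbar to change a drive parameter")],
        ),
        Action("analyzing the experimental result"),
    ],
)
```

The granularity chosen here is only an example; as noted above, it would in practice follow the pedagogical scenario and the objectives of the evaluators.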
4.3 System Dynamism

This part investigates the interrelationships between the components that are integrated into the environment. These interrelationships depend on the pedagogical scenarios defined by the professors. The dynamics of the relationships between the members of the community, who use the environment for their learning activities, depend on the social protocol, the division of labour established, and the rules set for the course. Usually, in hands-on sessions, the experimental protocol is pre-defined by the professors and always available for students to follow; hence, for students, the task complexity mostly depends on how they carry out the tasks following the steps defined in the experimental protocol. In addition, the “objectives of work” are also pre-defined; thus collaborative activities usually do not need to reach the co-construction level of activity [17].

Fig. 2 summarizes the Activity Theoretical analysis and design model, in which all major elements of Activity Theory are mapped into the context of Web-based experimentation. In other words, the proposed model illustrates our Activity Theoretical vision of the analysis and design of Web-based experimentation environments. It can also be used as independent guidance for analysts and designers who analyze and design Web-based experimentation environments. This model has facilitated the design and development of the eJournal, an electronic laboratory journal integrated into the eMersion experimentation environment. In turn, the iterative design and development of the eMersion environment and the eJournal have validated the reliability and usefulness of the proposed model. The eMersion environment has been used for several academic semesters in several automatic control courses offered by the EPFL. It has also been deployed and tested in other European institutions such as the University of Hanover in Germany, the UNED University in Spain, and the Ecole Nationale de Mines St. Etienne in France. More information about the design and evaluation of eMersion and the eJournal can be found in [2, 3, 10, 18, 19].
Fig. 2. Activity Theoretical analysis and design model
5 Conclusion

This paper presents what we call the Activity Theoretical analysis and design model. It discusses the characteristics of Web-based experimentation and introduces a typical scenario of interaction and collaboration processes in such a learning context. The model sheds light on many concepts needed for the design of Web-based experimentation environments. It also represents a mapping from Activity Theory to the context of Web-based experimentation. The goal of the proposed model is to capture the important aspects of collaborative hands-on activities in a Web-based experimentation environment. The model could be used by a variety of users. Researchers and professors could base their studies of students’ behaviours and activities in this particular learning context on the model. Environment developers could use the model to facilitate their development tasks, since it already focuses on the most relevant issues of the domain, and to structure the environment in a coherent way.

Acknowledgments. This work would not have been finished without the invaluable support from the eMersion team, EPFL.
References

1. Holmberg, B.: Theory and practice of distance education. Routledge, London (1995)
2. Gillet, D., et al.: The Cockpit, An effective metaphor for Web-based Experimentation in engineering education. Int. Journal of Engineering Education, 389–397 (2003)
3. Gillet, D., Nguyen-Ngoc, A.V., Rekik, Y.: Collaborative Web-based Experimentation in Flexible engineering education. IEEE Trans on Education, 696–704 (2005)
4. Feisel, L.D., Rosa, A.J.: The role of the laboratory in undergraduate engineering education. ASEE Journal of Engineering Education (2005)
5. Böhne, A., Faltin, N., Wagner, B.: Synchronous tele-tutorial support in a Remote laboratory for process control. In: Aung, W., et al. (eds.) Innovations 2004: World Innovations in Engineering education and research. iNEER in cooperation, pp. 317–329. Begell House Publishers, New York (2004)
6. Schmid, C.: Using the World Wide Web for control engineering education. Journal of Electrical Engineering, 205–214 (1998)
7. Tzafestas, C.S., et al.: Development and evaluation of a virtual and remote laboratory in Robotics. In: Innovations 2005: World innovations in Engineering education and Research. iNEER in cooperation, pp. 255–270. Begell House Publishers, New York (2005)
8. Ko, C.C., et al.: A Web-based virtual laboratory on a frequency modulation experiment. IEEE Trans on Systems, Man, and Cybernetics, pp. 295–303 (2001)
9. Sepe, R.B., Short, N.: Web-based virtual engineering laboratory (VE-LAB) for collaborative experimentation on a hybrid electric vehicle starter/alternator. IEEE Trans on Industrial Applications (2001)
10. Nguyen-Ngoc, A.V., Rekik, Y., Gillet, D.: Iterative design and evaluation of a Web-based experimentation environment. In: Lambropoulos, N., Zaphiris, P.P. (eds.) User-centered design of online learning communities, pp. 286–313. Idea Group Inc, Pennsylvania (2006)
11. Nguyen-Ngoc, A.V., Rekik, Y., Gillet, D.: A framework for sustaining the continuity of interaction in Web-based learning environment for engineering education. ED-MEDIA conference, Montreal, Canada (2005)
12. Vygotsky, L.S.: Mind in Society. In: The development of higher psychological processes. Harvard University Press, London (1978)
13. Jonassen, D.H., et al.: Constructivism and computer-mediated communication in distance education. The American Journal of Distance Education, pp. 7–26 (1995)
14. Kuutti, K.: Activity Theory as a potential framework for Human-Computer Interaction research. In: Nardi, B.A. (ed.) Context and Consciousness: Activity theory and Human-computer interaction. The MIT Press, MA (1995)
15. Nardi, B.A.: Context and consciousness: Activity theory and Human-computer interaction. MIT Press, MA (1996)
16. Jonassen, D.H., Rohrer-Murphy, L.: Activity Theory as a framework for designing constructivist learning environments. Educational Technology Research and Development, pp. 61–79 (1999)
17. Bardram, J.E.: Collaboration, Coordination, and Computer Support: An Activity Theoretical Approach to the Design of CSCW. University of Aarhus (1998)
18. Nguyen-Ngoc, A.V., Gillet, D.S., Sire, S.: Evaluation of a Web-based learning environment for Hands-on experimentation. In: Aung, W., et al. (eds.) Innovations 2004: World Innovations in Engineering education and research. iNEER in cooperation, pp. 303–315. Begell House Publishers, New York (2004)
19. Nguyen-Ngoc, A.V., Gillet, D., Sire, S.: Sustaining collaboration within a learning community in flexible engineering education. In: ED-MEDIA conference, Lugano, Switzerland (2004)
Collaborative Design for Strategic UXD Impact and Global Product Value

James Nieters1 and David Williams2
1 255 W Tasman Ave, San Jose, CA 95134- PhD
2 934 Nanjing West Road, Suite 505, Shanghai, 20041 China
[email protected], [email protected]
Abstract. Experts in the field of HCI have spoken at length about how to increase the strategic influence of User Experience Design (UXD) teams in industry [2] [5]. Others have talked about how to build a usability or user experience team in industry [3], and still others have offered courses in managing HCI organizations [1] [7]. At the same time, other experts have spoken about the importance of making products usable and desirable for international audiences [9] and the value of “offshoring” their usability efforts [8]. Few, though, have discussed the value of, and the process for, an embedded UXD Group functioning as an internal consultancy to different product teams within its organization. This paper presents both how the consultancy model can increase the strategic effectiveness of UXD inside a company, and how, by leveraging partners internationally, such groups can broaden the usefulness, usability, and desirability of their products to a more global audience.

Keywords: User Experience Design, Organizational development, User Experience Teams, Management, Internationalization.
leaders in each division may request that UXD resources working on their project report directly to them.
• Client-funded model, where individual business units fund a central team that provides UXD resources to their teams, and one central UXD organization manages these people. The benefits of this model are similar to those of the central model. In addition, the central organization does not become a cost center, because other divisions pay for UXD resources. However, managers in each division may feel that UXD practitioners who are not part of their organization are not core or central to their business, and they can decline to pay for the individuals at any point. This challenge becomes more likely when managers need to reduce headcount and do not want to eliminate the individuals whom they “own” (who report to them).
• Distributed model, where there is no central UXD group, but UXD practitioners (and smaller groups) report directly to the divisions for the products on which they work. One benefit of this model is that such people are viewed more as “insiders,” as part of the team. While an increasing number of companies are using this model, it poses many challenges for the UXD groups and their influence. There is often no explicit sharing of resources or processes across UXD groups, and destructive competition can arise. Unless each UXD group is large enough, practitioners can end up reporting to a manager who does not understand the value of the UXD function. In addition, without a central UXD group, there is no team responsible for UXD process, standards, or infrastructure.

At Cisco, these more traditional organizational structures met with some success. One group within the centralized model was able to show an ROI of more than 10x, or $50 Million USD annually. However, $50 Million in a company that grew from $4 Billion to >$30 Billion from 1999 through 2006 was barely noticed.
Senior leaders at Cisco and other companies, in both mature and emerging markets, are held responsible for steep revenue growth. As such, they are in search of the next “advanced technology” (AT). ATs are disruptive innovations [6] that differentiate one company from its competition, resulting in large revenue increases. Therefore, these executives want to invest in groups that can drive radical differentiation. They may also invest in groups that incrementally increase revenue or decrease costs (such as prior Cisco usability teams), but they are likely to invest the most in groups that prove they can stimulate disruptive innovation [6]. To become strategically relevant, the Cisco UXD team needed to deliver disruptive innovation that changed the way people thought about and interacted in a domain. Attempting to improve the usability, usefulness, and desirability of too many products at one time diminished the Cisco UXD group’s ability to gain the sustained support of senior executives. The Cisco UXD group needed a different model, so it could increase revenue geometrically instead of incrementally. To influence a complex-systems company [6] such as Cisco, the UXD Group needed an ROI of 100x to 1000x.
J. Nieters and D. Williams
2 Enter the ‘External Consultancy Model’

Within the areas of product and interaction research, design and testing, independent design studios have flourished in mature markets such as the US, Europe and South Korea (IDEO, Fitch, Razorfish). Now, a new breed of international and cost-effective design studios such as Asentio Design is developing business from bases in emerging markets such as China or India. Asentio Design flourishes due to its ability to allocate multi-function design teams to chosen client projects without being constrained by the processes and corporate politics experienced by design teams within companies. By capitalizing on its geographic and linguistic context, it is also able to act as a design bridge between clients in mature markets and ODM/OEM design teams in emerging markets. This model has been referred to as “Collaborative Design” [9].

With such companies in mind, the Cisco UXD Group is able to act like an external design firm. Instead of assigning one UI designer to one or even multiple projects, the Cisco UXD Group now assembles highly focused teams comprising multiple cross-functional experts to support speedy innovation on carefully selected products. These experts include user researchers, interaction designers, visual designers, developers, and industrial designers as necessary to deliver a superior user experience in a very short time. The consultancy model has the additional advantage of placing the UXD group outside of the organization, allowing freedom of decision-making and objectivity when selecting projects to pursue. Following this model, the group can focus intensively on the five or six most strategic products, and work with teams truly interested in their expertise. Since the conversion to the Focus Team model, senior leaders recognize that the UXD Group’s contribution to revenue has increased to more than $2.5 Billion.
Such impact has been difficult to ignore; one result is that Cisco’s new motto is “Lead the Experience.” Cisco executives now recognize that the experience itself is the next “advanced technology.”

2.1 Engagement Model for Successful Focus Teams

The Internal Consultancy Model is not ideal in every environment. For it to succeed, UXD management must:
1. Only choose worthwhile projects where measurable opportunity exists for demonstrable impact, and where management is willing to give credit to the UXD Focus Team.
2. Merge each UXD Focus Team into the Product Development Team with clearly delineated roles.
3. Adhere to best practices by following a clearly defined process, with well-defined entry and exit criteria.
4. Choose Focus Team members carefully.
5. Follow through to demonstrate impact.
Collaborative Design for Strategic UXD Impact and Global Product Value
2.1.1 Choosing Worthwhile Projects

While it is a shame to forego UXD on smaller projects, the point is to dedicate resources where they will have the most effect; we must pick our battles wisely. To take this metaphor a bit further, a classic military strategy is to focus overwhelming resources on a single target. Then, when success has been achieved, move to the next target. This model can apply to UXD efforts: Shouldn’t any UXD manager make sure that critical projects are fully resourced, even if it means neglecting other projects? The alternative is to be spread desperately thin, resulting in average improvements on most projects, rather than disruptive innovation [6] on a few projects.

Choosing the right projects also includes:
1. Conducting an Opportunity Review before agreeing to commit resources, to ensure that the product team is receptive and executives recognize the problem. The product team must agree that their success requires a UXD Focus Team.
2. Generating a Project Brief, a statement of work that describes:
   • Statement of value (summary)
   • Challenges (such as competition)
   • Solution (typically broken into multiple phases)
   • Deliverables to be provided
   • Resources (people) required on UXD team
   • Detailed schedule
   • Costs
   • Assumptions and risks
3. Securing Concept and Execution Commitments, in which managers from the different organizations agree to supply people and money.
4. Embedding and integrating the UXD Focus Team with the product development team.
5. Ensuring that the project has clear start and stop points, with clear exit criteria, and is not open-ended.
6. Selecting Focus Team members who love to collaborate and excel at working in teams.

When UXD Group leaders decide which projects to accept, they consider the following factors:
• Product team receptivity. The product development team itself has requested support from UXD, rather than had it “pushed” upon them by management. If a product team is ambivalent, the UXD group disengages.
• Potential revenue or cost savings. The UXD group seeks projects on which they anticipate a minimum revenue increase of $25 Million in the first year.
• Advanced technology: a new technology that has not yet been introduced to the market, so the UXD Group can make a larger impact than on legacy products (preferable, but not required).
• Leveraging the Cisco UE Standards (UI guidelines and tools). If a product team does not intend to adopt the UE Standards, the UXD Group will not assign
resources. These standards include component libraries to help engineers quickly create code that is accessible, usable, internationalized, and branded.
• High visibility. If a project is a “pet project” of a cross-functional or highly visible organization within the company, the UXD Group is more willing to accept it.
• Point in the product lifecycle. If design has already begun, it is often too late to impact a product’s overall experience at a fundamental level. There are times when the UXD group agrees to work on a project through multiple iterations, starting late in one cycle to impact a subsequent release.
• Realistic time-to-market demands. The Cisco UXD Group delivers value rapidly. However, if project schedules make delivering a high-quality user experience impossible, the UXD group is less likely to accept the project.

While there are other factors, this list represents the most salient ingredients used in deciding to work on a project.

2.1.2 Merging the UXD Focus Teams into Product Teams with Clearly Delineated Roles

UXD Focus Teams must integrate completely with the product development team during a project. They cannot function as the “icing on the product team’s cake.” In the centralized and client-funded models, product teams can more easily treat UXD team members like outsiders. In the focus team model, management and product team members have all committed to a stellar user experience. UXD Focus Teams need to be viewed as true partners with product teams, and they must treat each product team like the paying customer it actually is. The roles of the UXD Focus Team must be specifically defined, just as the roles of the product team members are. Cisco’s UXD management created a role grid that explicitly defines UXD roles and skills. The UXD Focus Team functions as the architect who provides the blueprint for the elements of the product that define the user experience, and the developers function as the carpenters who deliver to the specifications.
If the product team does not agree in advance to these roles, the UXD group does not accept the project.

2.1.3 Choosing Focus Team Members Carefully

To win the trust and respect of product teams, members of the UXD Group must demonstrate world-class user experience design skills. Of equal importance, UXD practitioners must have the business, teamwork, technical, communication, and advocacy skills to ensure that product teams will choose to work with the UXD Focus Team. We must understand the larger business context of our work rather than drive single-mindedly toward an ideal design goal. By approaching the design role as though the product team is a customer with a revenue target that we need to help meet, we become more strategically relevant in our organizations. Despite their underlying focus on business goals, corporate executives need to trust you to understand their requirements, to trust that you can help them succeed. Personal trust and accountability can be more important than ROI. UXD Focus Team members must be able to build this credibility.
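The acceptance factors listed in Section 2.1.1, together with the role agreement required in Section 2.1.2, can be read as a simple screening procedure. The sketch below is one possible reading, not Cisco's actual process: the class, field, and function names are hypothetical, and the split into hard "blockers" (conditions under which the text says the group disengages or declines) and a soft preference tally (factors the text describes as preferable or making acceptance more likely) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_FIRST_YEAR_IMPACT_USD = 25_000_000  # threshold quoted in Section 2.1.1


@dataclass
class ProjectRequest:
    """Hypothetical record of the facts gathered in an Opportunity Review."""
    name: str
    team_requested_uxd: bool            # product team itself asked for support
    expected_first_year_impact: float   # anticipated revenue increase (USD)
    adopts_ue_standards: bool           # team intends to adopt the UE Standards
    roles_agreed: bool                  # team agreed in advance to the role grid
    advanced_technology: bool = False   # preferable, but not required
    high_visibility: bool = False
    design_not_yet_begun: bool = True
    schedule_realistic: bool = True


def screen(req: ProjectRequest) -> Tuple[bool, List[str], int]:
    """Return (accepted, blockers, preference_score) for a project request."""
    blockers = []
    if not req.team_requested_uxd:
        blockers.append("product team did not request UXD support")
    if req.expected_first_year_impact < MIN_FIRST_YEAR_IMPACT_USD:
        blockers.append("anticipated first-year impact below threshold")
    if not req.adopts_ue_standards:
        blockers.append("team does not intend to adopt the UE Standards")
    if not req.roles_agreed:
        blockers.append("roles not agreed in advance")

    # Soft factors only rank projects that pass the hard checks.
    preference = sum([
        req.advanced_technology,
        req.high_visibility,
        req.design_not_yet_begun,
        req.schedule_realistic,
    ])
    return (not blockers, blockers, preference)


# Hypothetical requests, invented for illustration.
strong = ProjectRequest("Example A", True, 40_000_000, True, True,
                        advanced_technology=True)
weak = ProjectRequest("Example B", False, 10_000_000, True, True)

print(screen(strong))
print(screen(weak))
```

Separating hard blockers from soft preferences mirrors the wording of the text, in which some conditions end the engagement outright while others only make acceptance more or less likely.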
2.1.4 Following Through to Demonstrate Impact

As any consultancy would do, it is essential to make all successes visible. Future business requires such demonstrable impact. No one would engage a consultancy without a fine reputation and portfolio, and the same rules apply to internal consultancies. To achieve this visibility, the Cisco UXD Group tracks impact and records case studies on its website, as you would find on the websites of design firms in industry. The stories in this portfolio describe:
• The Problem
• Our Solution
• The Impact

If the UXD Group cannot calculate the financial impact and managers do not provide a quote attesting to the value of the UXD Group activities, that project does not appear on the portfolio website. Other managers can refer to these examples of impact and trust that the group can deliver the same value for them.

2.2 Extending UXD with a Partner Ecosystem

Since Cisco’s UXD Group now behaves as an internal consultancy, it has been able to increase its influence by subcontracting to external consultants. To the customers of the UXD Group (Cisco’s product teams), there is little difference. Such collaboration with external design firms such as Asentio Design in China not only increases the internal UXD team’s capacity. It also injects emerging and global perspectives on research, design, technology, partnerships, and the connection between these domains. Such fresh perspectives are critical to stimulate the innovation required in such a company. The UXD Group soon realized it needed an ecosystem of partners who could augment staff, drive entire projects, and introduce ideas that stimulate disruptive innovation. Using external consultants has become a natural extension of the group’s engagement model. The UXD partner ecosystem includes different types of design firms for different types of design projects.
Asentio Design, through its international team, can provide dedicated support in all areas of the design lifecycle as well as specific market knowledge and partner relationships from its base in China. As product experiences are increasingly designed to support emerging and mature markets, models such as Asentio Design’s are crucial in allowing Cisco to collaborate with manufacturers in, and develop new products for, emerging markets. A partner ecosystem therefore provides opportunities for innovation between internal and external consultancies as well as reducing cost and providing design “bridges” between markets. Many Western companies now leverage a global network of partners [4]. Should we, as designers, not also leverage this business model to deliver rapid, low-cost, and globally relevant products?
2.3 Leveraging Intact Design Firms Is Not Offshoring

It is important to distinguish “offshoring” from leveraging intact global partners. In the consultancy model, as companies work with external design firms, they are seeking rapid, high-quality and globally relevant engagements. This process differs from “offshoring,” which in this paper we define as a company hiring its own resources in another country in order to decrease costs. One of the key value propositions for hiring an intact design team (international design firm) is that it has already performed the hard work of seeking and hiring trusted researchers and designers. These teams have also already gone through the hard work of teambuilding. Developing an ecosystem of partners spares leaders of UXD organizations from having to attract, hire, and retain talent, which can be even more difficult across international boundaries.
3 Examples of Impact

Asentio Design and Cisco are currently working on some joint projects that we hope will change market dynamics, but because these products have not yet reached market, we look forward to reporting on these in subsequent years. From a Cisco perspective though, the company is attempting to enter emerging markets in which it has less exposure to cultural expectations, norms, and challenges from a user perspective. As such, it is critical that it partner with design firms in the following areas:
• Design of personal experiences, which encompasses physical products, application user interfaces, out-of-box experiences and retail environments.
• Consumer Research in markets where Cisco does not have a UXD research or design presence. The costs of leveraging a company such as Asentio Design are significantly lower than those of setting up a presence in each such emerging market.
• Globalization. As Cisco has focused more on Internationalization and Localization while entering new markets, it needs partners in-country to help test its products for these international audiences.

Asentio Design has many examples of working with US and European companies and delivering world-class and culturally appropriate designs at a much lower cost than if US or European-based companies had designed them. The following case studies show examples of such international collaboration.

3.1 Case-Study 1 (US/China): Commercialization of a Military Product

The client had a long history in developing products for military customers. However, they now wished to take their advanced image processing technology into the commercial marketplace. While building sourcing relationships in China, the client was introduced to Asentio Design as a possible design partner. In order to develop their first consumer product, the scope of the client’s requirement was broad, covering consumer research, feature planning, retail and packaging, user interface design and industrial design.
Asentio Design, through its international team and position in Shanghai (allowing rapid travel to the client’s west coast US headquarters), conducted
consumer research on the US East and West coast and personal experience strategy planning through two design workshops at the client’s US-based office. Research and Strategy work was followed by a user interface and product design phase where teams in Shanghai and the US worked in close collaboration with frequent face-to-face meetings.

3.2 Case-Study 2 (Europe/China): Research into Digital Imaging Lifestyles in China and Europe

A European mobile phone OEM wished to research and compare the usage of high-end camera phones in Europe and China. The company approached Asentio Design because of the latter’s partners’ long experience in researching and designing mobile personal experience across global markets, their location in China and their lower cost base compared to European design consultancies. Asentio Design, through its multilingual team, was able to conduct diary studies, one-to-one interviews and online surveys in 4 languages (Mandarin, Cantonese, English, German) in Shanghai, Hong Kong, London and Germany. The ongoing results of the research were presented to client teams in Europe and China, allowing wide dissemination and providing the stimulus for subsequent, more focused research.
4 Choosing a UXD Organizational Model

The Focus Team model is not right for every company. Perhaps the most important factor in deciding what UXD structure to adopt for your group is management that understands what business model is appropriate for your company’s unique environment. The Focus Team, or Internal Consultancy, model is best when:
• The organization does not have enough UXD practitioners to support every project.
• Cost is an issue. Working with a reputable design firm, such as Asentio, that knows how to deliver excellent results provides highly qualified resources at a much lower cost.
• You need to design products for international markets and need a partner who can design a culturally appropriate product.
• Your team’s survival or reputation depends on delivering excellence on every project (you cannot afford to assign one designer to multiple projects, thus diluting their impact).
• Product teams can “opt out” from working with you. If your company does not require every product team to follow UCD practices and work with a UXD staff, then working only with motivated teams can optimize your resources.
• You can “opt out” of minor projects and focus on the highest-priority projects in the company. Trying to make small improvements on all (or most) products can dilute a UXD group’s impact.
5 Summary

Personal experience design is now a truly global activity. In order for companies such as Cisco to effectively support product teams and innovate in global markets, their
UXD groups must look increasingly to the new breed of international design studios located in these markets. Companies such as Asentio Design can offer local knowledge allied with western design processes and experience.
References

1. Anderson, R.I.: Managing User Experience Groups. Second Offering, UCSC Extension, Cupertino, CA (2006), http://www.well.com/user/riander/mguxgrps.html
2. Bias, R.G., Mayhew, D.J.: Cost-Justifying Usability. Academic Press, Inc., San Diego, CA, USA (1994)
3. Huh, B.-L., Henry.: PhD. Developing usability team in a company: Multiple perspectives from industries. In: Conference Proceedings, Asia-Pacific CHI (2006)
4. Engardio, P., Einhorn, B.: Outsourcing Innovation. BusinessWeek (March 21, 2005)
5. Innes, J., Friedland, L.: Re-positioning User Experience as a Strategic Process. CHI 2004 tutorial (2004)
6. Moore, G.A.: Dealing with Darwin: How Great Companies Innovate at Every Phase of Their Evolution. Portfolio, New York, New York, USA (2005)
7. Rohn, Janice.: Managing a User Experience Team. In: Proceedings of the Nielsen Norman Group Conference, Seattle, WA (2006)
8. Schaffer, E.: Offshore Usability: Helping Meet the Global Demand? Interactions, p. 12 (March – April, 2006)
9. Williams, D.M.L.: Co-Design, China and the Commercialisation of the Mobile User Interface. ACM “Interactions” Special Gadget Issue, October, Vol. XIII(5) (2006)
Participatory Design Using Scenarios in Different Cultures

Makoto Okamoto1, Hidehiro Komatsu1, Ikuko Gyobu2, and Kei Ito1

1 Media Architecture, Future University-Hakodate, Kamedanakano 116-2 Hakodate, 041-8655, Japan
2 Faculty of Human Life and Environmental Sciences, Ochanomizu University, 2-1-1 Otuka, Tokyo, 112-8610, Japan
{maq, g2105009, k-ito}@fun.ac.jp, [email protected]
Abstract. In this paper we have examined the effects of scenarios from a participatory design and cross-cultural perspective. The Scenario Exchange Project was an international workshop using scenarios. The participants were university students from Japan and Taiwan. The impetus behind this project was the practical demand for designers to correctly understand different cultures and design products and services. We confirmed that scenarios are effective techniques for bolstering participatory design. Furthermore, we have recognized that we must create new methods for describing the lifestyle and cultural background of personas.

Keywords: Scenario, Information Design, Cross Culture, Situated Design, Participatory Design.
2 Design Process Using Scenarios

Many researchers have worked on the scenario-based design approach, starting with John M. Carroll [1]. Because they describe human activities and goals using unformatted symbols (words or pictures), scenarios are special in that anybody can understand and use them easily. Furthermore, they facilitate smooth, mutual understanding between stakeholders, such as requirement analysts (designers), customers, users and developers [2]. While scenarios are common tools for expression, they are also effective tools for interaction designers or media architects. However, it cannot be said that most designers commonly use scenarios in their work. A number of questions need to be settled before designers can use them at work, including the situations in which they are effective, simple and effective methods for describing situations, how to elicit requirements, ways of expressing new scenarios and methods for evaluating them.

We believe that the meaning behind the use of scenarios is the process by which designers and users cooperate in order to understand unknown living conditions and create new products. As advances continue to be made in information technology, we will be confronted with situations for which we have never designed before. The scenario-related tasks which we have worked on include systems that support the lifestyles of the visually impaired [3, 4], systems that assist exchanges between people from different cultural and linguistic backgrounds [5], and mobile communications services [6]. In an increasingly complex and globalizing modern society, we believe that there is a limit to the world view which individuals are capable of understanding, and new techniques for sharing situations, such as scenarios, will become more and more necessary in the future. In this paper we report on cases in which we carried out design activities using scenarios in situations where language and culture differed.
3 Scenario Exchange Project

Okamoto, Der-Jang Yu and associates held a workshop for students from different cultures to design systems using scenarios (from May 2005 to May 2006). The Scenario Exchange Project (hereafter, SEP) proposed by Yu was one in which Japanese and Taiwanese students designed new information systems through the medium of scenarios. They used the Scenario Exchange Server to share scenarios: Problem Scenarios that describe situations and Solution Scenarios that address those situations. Furthermore, in order to verify factors which could not be expressed via scenarios or online communication, we held workshops in the respective countries (Table 1).

There were two kinds of scenarios which we used with this technique. The first was a Problem Scenario. It describes how users applied the device and the kinds of problems that they may have confronted. It was written from field surveys and interviews. Based on the requirements extrapolated from an analysis of this scenario, a Solution Scenario, which describes how the proposed service should be used, was introduced. These scenarios are special in that they are specific and easy for anybody to understand, making them useful for communication between stakeholders, such as designers, engineers and those involved in the process [1].
Table 1. Summary of Project

1st Workshop
  Title: Scenario Design & Episode Exchange
  Term: Dec 16-18, 2005
  Place: Hakodate, Japan
  Participants: FUN 11, NCTU 18

2nd Workshop
  Title: Mobile Taiwan & Ubiquitous City
  Term: May 7-9, 2006
  Place: Taipei, Taiwan
  Participants: FUN 14, NCTU 18, TAU 22, NYUST 4
Table 2. U-team’s and D-team’s Roles

One Group
  Team: U-team (User)
    Role: - User’s perspective (or assuming the role of the user)
          - Write a Problem Scenario
  Team: D-team (Designer)
    Role: - Idea development
          - Establishing a hypothesis
SEP constructed the Scenario Exchange Web to enable stakeholders to share scenarios and exchange opinions with each other. This Web enables the whole process, from Problem Scenario to Solution Scenario, to be recorded and shared. It is possible to post not only text, but also camera images and hand-drawn sketches. Furthermore, users and designers can exchange opinions by means of a function for commenting on scenarios. This environment makes it possible for information to be shared via unformatted symbols. Our aim was to consider how much participants who were interacting in this environment were able to understand their counterpart’s situation. The participants were students studying user interface, graphic design and product design. Every group had five to six members, a combination of Japanese and Taiwanese. Each group was further subdivided into a U-team and a D-team (Table 2). The U-team acted as observers. They had to carefully observe the condition of users (or assume the role of users) and create Problem Scenarios. The D-team had to propose ideas based on U-team’s Problem Scenario, that is, they acted as designers. All of the workshops were on the theme of proposing information-processing devices which facilitate travel. By adopting the theme of travel, the students had to take into account the local characteristics of the travel destination.
4 The SEP Process

SEP comprised two phases (Fig.1). Phase 1 was the Remote Research Phase. During this phase, U-team and D-team carried out activities from separate locations (Japan and Taiwan). U-team was the first to travel. The actions of one subject being observed were recorded using a camera or by taking notes. The observer interviewed the subject and wrote a Problem Scenario. This scenario was then uploaded onto the Scenario Exchange Web. This scenario was divided into multiple scenes and each scene was provided with Positive, Negative and Wish categories. Observers wrote down brief notes on the users’ satisfied, positive attitudes and behavior under
Positive, mistakes and passive attitudes and behavior under Negative, and desires under Wish. Furthermore, personas were set up based on the subject observed, and brief profiles were written down at the beginning of each scenario. The personas described in this paper are virtual user profiles used in the scenario method. Usually, personas are set up based on multiple persons and the most appropriate one is determined from among them, but this step was omitted in the SEP. D-team gained an understanding of what U-team had experienced on their travels from the scenarios and proposed ideas (establishing a hypothesis) via the Scenario Exchange Web. They proactively used online communication such as the Internet, email and chat, asking questions about unclear points in the Problem Scenarios.

In Phase 2, D-team actually visited U-team’s country and held a joint workshop. D-team re-experienced the situations which they had previously only been able to understand from the Problem Scenarios. U-team answered questions on differences in the social and cultural background, in particular, and facilitated D-team’s understanding. By re-experiencing, D-team became aware of things which they had been unable to understand from the scenario, and reconsidered their ideas based on the new insights and views which they had attained. Solution Scenarios and Product Images were then created collectively.
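The structure of a Problem Scenario described above (a brief persona profile, a sequence of scenes, Positive, Negative and Wish notes per scene, and comments exchanged online) can be sketched as a small data model. This is an illustrative sketch only: the class and field names are hypothetical and do not describe the actual Scenario Exchange Web, and treating the Negative and Wish notes as a starting point for requirements is an assumption, since the paper does not prescribe how D-team extracted them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    """One scene of a Problem Scenario with its three note categories."""
    description: str
    positive: List[str] = field(default_factory=list)  # satisfied, positive attitudes and behavior
    negative: List[str] = field(default_factory=list)  # mistakes, passive attitudes and behavior
    wish: List[str] = field(default_factory=list)      # desires


@dataclass
class ProblemScenario:
    persona_profile: str                                # brief profile at the start of the scenario
    scenes: List[Scene] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)   # questions and opinions exchanged online

    def open_issues(self) -> List[str]:
        """Collect Negative and Wish notes in scene order.

        Assumption: a D-team might start from these notes when looking
        for user requirements.
        """
        issues = []
        for scene in self.scenes:
            issues.extend(f"Negative: {note}" for note in scene.negative)
            issues.extend(f"Wish: {note}" for note in scene.wish)
        return issues


# Hypothetical example content, invented for illustration only.
scenario = ProblemScenario(
    persona_profile="male, 22 years old, hobby: skiing",
    scenes=[
        Scene(
            description="Preparing for the trip the day before",
            positive=["Looks forward to the trip"],
            negative=["Unsure what to bring"],
            wish=["Wants a checklist of items"],
        )
    ],
)
scenario.comments.append("D-team: Why did the persona choose this destination?")

for issue in scenario.open_issues():
    print(issue)
```

Note that such a record captures actions and situations but has no obvious place for intentions or cultural background beyond the short persona profile, which is the kind of gap the authors report.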
Fig. 1. Process of Scenario Exchange Project
5 Project Results

5.1 Workshop 1: Scenario Design and Episode Exchange (December 2005, Japan)

We will discuss the significance of this technique by using an example from a group that worked on the “Service Proposal for Fishermen” at the first workshop, which was held in Hakodate in December, 2005.

Phase 1: Three students from Future University (U-team) went to a fishing port in the vicinity of Hakodate City for a fishing trip. The actions of one subject being observed were recorded using a camera and by taking notes. Additionally, details of the participant’s experiences were gathered at an interview, and were written up as a Problem Scenario before being uploaded onto the Scenario Exchange Web (Fig.2, Left). The Problem Scenario descriptions began with the fishing preparations on the day prior to the trip, right up until the moment when the fish that had been caught were eaten.
Fig. 2. Problem Scenario (Left) and Idea Sketch based on User Requirements (Right)
Fig. 3. Solution Scenario (Left) and Product Images (Right)
Four students from Chiao Tung University (D-team) extracted user requirements from the Problem Scenario and proposed three Idea Sketches (Fig.2, Right).

Phase 2: On the first day of the workshop, the students from Future University re-experienced fishing with the students of Chiao Tung University. By actually experiencing the situation which they had previously only known via the scenario, D-team was able to understand the enjoyment of fishing and the persona’s feelings. During Phase 1, the members of D-team assumed that hardly anybody went fishing in the winter, and thought that young people did not fish. These assumptions were at odds with the facts. Re-experiencing enabled the students to become aware of such assumptions and led to an understanding of the persona’s intentions. On the second day, the students discussed in groups whether ideas were valid or not. U-team and D-team cooperated with one another to create a final design summary. Students then proposed an information system that enhances the enjoyment of fishing by allowing people to compete against other fishermen with regard to the size
of the fish they have caught using Solution Scenarios and 3D models (Fig.3). The proposed solutions provided extensive support, from a persona making preparations at a fishing tackle store to taking a fish print of the fish which they had caught as a memento. We feel that this was the result of a design that grasped the broad spectrum of the persona’s experiences.

5.2 Workshop 2: Mobile Taiwan and Ubiquitous City (May 2006, Taiwan)

The second workshop was held in Taiwan in May, 2006. In this workshop the roles were reversed, with the Japanese students becoming D-team and the Taiwanese students becoming U-team. We use the proposal for a “device which supports communication between people who do not understand each other’s language during a trip” as an example for discussion.

Phase 1: Two students from Chiao Tung University (U-team) went on a trip to Tamsui in northern Taipei. Tamsui is a historical town blessed with water and greenery. In accordance with the SEP process, they created a Problem Scenario and uploaded it onto the Scenario Exchange Web (Fig.4, Left). The Problem Scenario described subjects who were unfamiliar with Tamsui freely traveling in the town. Four students from Future University (D-team) then proposed multiple ideas (Fig.4, Right) from the Problem Scenario, from the perspective of new tourism experiences in Tamsui.
Fig. 4. Problem Scenario (Left), Ideas based on User Requirements (Right)
Phase 2: On the first day of the workshop the Japanese and Taiwanese students joined together and went on a trip to Tamsui. In Taipei, Tamsui is a leading sightseeing area with many market stalls. The group walked around sampling the local food and taking in the natural scenery, historic buildings and landmarks. As a result of re-experiencing Tamsui, D-team realized that the streets were a maze, complicated and easy to get lost in. The group then became aware that many problems arose when the Japanese and Taiwanese students communicated with each other, such
Fig. 5. Solution Scenario (Left) and Product Image (Right)
as when trying to fill in direct information on the map for deciding the next destination, or when the Taiwanese students communicated about food which they recommended through pictures. Furthermore, they focused their attention on the importance of finger pointing in these activities. The group then proposed that an IC chip be attached to the finger and a device be worn over one of the eyes (Fig.5). A canvas could then hypothetically be spread in the air and words and pictures drawn onto it, operated by metaphorical movements of the fingers. Additionally, it would also be possible to search for information using a network. This innovative design grew directly out of the exchange experience shared by U-team and D-team.
6 Discussion

6.1 Scenario and Hypothesis Exchange

In Phase 1, the Problem Scenario which U-team had created was exchanged for a hypothesis by D-team relating to it. It is thought that the Problem Scenario was useful in communicating the situation to D-team, the members of which were from a different cultural background. U-team had to condense the actual experience of the trip into the form of a scenario. Although the actions of the persona and the situations could be written into the scenario, the intentions of individual actions and the cultural meaning behind them were obscure. The persona profiles described, for example, that someone was male, 22 years old and that their hobby was skiing, but nothing more detailed. As a result, it became clear from the follow-up interviews that D-team had trouble understanding the intentions or culture, even though they were able to learn about the individual situations. Although scenarios were definitely able to relay the situation, it was difficult for them to get across factors, such as context and culture, that were related to those situations.
M. Okamoto et al.
6.2 Re-experiencing in Workshops
Re-experiencing at Phase 2 was useful for developing a more refined solution, by D-team’s having an experience equivalent to that of U-team. The interactive efforts during Phase 1 also deepened rapport at the time of the workshops. Furthermore, it is also thought that the level of cross-cultural understanding increased according to the depth of that rapport. We concluded from the students’ reports that proactive communication for gaining background knowledge of their counterparts and re-experiencing made them realize that they had made assumptions about facts, which in real-world situations may lead to poor interaction between stakeholders and design processes.
7 Conclusion
The advantages and limitations of our discussion so far are divided and summarized in Table 3. In the SEP, students from completely different cultural backgrounds cooperated to design products. The results of their activities were that they created proposals which offered rich experiences and they were able to practically apply situational designs. For these efforts, an understanding of the cultural background was important when designing. Cultural background is not limited to the national and ethnic cultures of Japan and Taiwan. Culture exists in different structures, such as age, generation, occupation, family or area. Scenarios are extremely helpful for grasping situations, but they are no more than a doorway to understanding. The repetition of questions about problematical points or obscure areas contained in scenarios leads to a deep understanding of the user (context or cultural background). Although re-experiencing grants a deeper understanding of the counterpart’s situation, it seems that formation of a rapport between users and designers such as that gained in our workshops is of significance. The use of a representational scenario as a mediator has the effect of stimulating active participation, even when the participants’ counterparts are from countries where different languages are spoken. Scenarios are effective media in participatory design efforts. However, scenarios also have the following limitations:
• The information takes a lot of processing effort (resizing of photographs, composition of text, etc.) before it can be sent to the server.
• Text-intensive descriptions take time to read and write.
• There is no easy way of writing up background information (context and culture).
In order to solve these problems, we would like to create a design system which easily stores observational records on a server and allows viewers to understand situations easily.
Table 3. Advantages and Limitations of the Scenario Exchange Project
Phase 1: Scenario Exchange
Advantages:
1. D-team can understand situation in which U-team is placed.
2. Easy for D-team to discover problems from scenarios.
3. Scenarios give opportunity to try to understand intentions and culture. (Questions and interests arise easily)
4. Since scenarios and hypotheses are disclosed on web, they are always available for viewing.
Limitations:
1. U-team occasionally takes time to write up scenarios. Innovations for explicitly relaying situations are required.
2. Risk of subjective and objective perspectives becoming mixed in scenarios.
3. Skills for expressing self in foreign language (English) are required in order to communicate ideas to counterpart.
4. Possibility that D-team will carry their assumptions on reality.
Online Communication
Advantages:
1. Can hold discussions in real time by using chat software.
2. Can observe counterpart’s face and voice by using video chat.
3. Can exchange information which cannot be completely supplemented with scenarios.
4. Leads to rapport building.
Limitations:
1. Risk of exchanges taking up a lot of time.
2. U-team is required to be well acquainted with their own country’s culture and have skills to relay that knowledge adequately.
Phase 2: Re-experience
Advantages:
1. D-team can notice environments or information which was not described in scenarios. (Discovery of new problem areas)
2. D-team can notice assumptions about reality.
3. D-team can increase level of understanding of intentions and culture.
4. Can verify whether ideas are appropriate through re-experience.
Limitations:
1. Possibility that information gathering will be insufficient when time is short.
References 1. Carroll, J.M.: MAKING USE. MIT Press, Cambridge, MA (2000) 2. Go, K.: Requirement Engineering, Kyoritsu Publisher (2002) 3. Kato, S., Okamoto, M.: Tool supporting memory of visually impaired person, WIT (2006) 4. Okamoto, M., Akita, J., Ito, K., Ono, T., Takagi, T.: CyARM; Interactive Device for Environment Recognition Using a Non-Visual Modality. In: Miesenberger, K., Klaus, J., Zagler, W., Burger, D. (eds.) ICCHP 2004. LNCS, vol. 3118, Springer, Heidelberg (2004) 5. Komatsu, H., Ogawa, T., Gyobu, I., Okamoto, M.: Scenario Exchange Project. In: International workshop using Scenario Based Design, Human Interface 2006, Japan, pp. 503–508 (2006) 6. Okamoto, M., Ishii, K.: Method for Information Design based user’s thinking process, JSSD, pp.18–19 (2002)
Wizard of Oz for Multimodal Interfaces Design: Deployment Considerations
Ronnie Taib and Natalie Ruiz
ATP Research Laboratory, National ICT Australia, Locked Bag 9013, NSW 1435, Sydney, Australia
School of Computer Science and Engineering, The University of New South Wales, NSW 2052, Sydney, Australia
{ronnie.taib, natalie.ruiz}@nicta.com.au
Abstract. The use of Wizard of Oz (WOz) techniques for the acquisition of multimodal interaction patterns is common, but often relies on highly or fully simulated functionality. This paper suggests that a more operational WOz can benefit multimodal interaction research. The use of a hybrid system containing both fully-functional components and WOz-enabled components is an effective approach, especially for highly multimodal systems and, collaterally, for cognitively loaded applications. We present the requirements and the resulting WOz set-up created for a user study in a traffic incident management application design. We also discuss the impact of the ratio of simulated and operational parts of the system dictated by these requirements, in particular those related to multimodal interaction analysis.
Keywords: Wizard of Oz, Multimodal user interface, Speech and gesture, User-centred design.
particular task, and has been associated with the limited capacity of working memory [4, 5]. TIM operators are bombarded with live information that needs to be integrated, synthesised and entered into the system. They need to monitor heterogeneous information sources and respond to several complex incidents at a time, activities which induce very high levels of cognitive load. Thus, the research involves correlating two factors: high levels of cognitive load and use of multimodality, i.e. speech and gesture interaction.
Fig. 1. TIM User Interface in the Experiment
We hypothesised that the operators’ patterns of multimodality would significantly change as their cognitive load increased. In more detail, we expected:
• Increase in level of multimodality, i.e. using more than one modality when given the choice, as a strategy for cognitive load management;
• Increase in the frequency of complementary information carried across modalities, for example, where the same message is carried partly by speech and partly by gesture, with no semantic overlap;
• Decrease in the level of redundant information in interactions as cognitive load increased, i.e. fewer occurrences of input where each modality would carry the same message, with semantic overlap.
In this paper, we present the design of a Wizard of Oz (WOz)-based user experiment intended to verify these hypotheses. We review the constraints imposed by the study of multimodal interaction, given our research field and requirements, and discuss the trade-off existing between simulated and operational parts of the WOz.
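As a minimal sketch of how the complementary and redundant categories used in these hypotheses might be operationalised during annotation, the following Python fragment compares the semantic content carried by each modality. The slot strings, the class and function names, and the additional "partially redundant" and "empty" labels are assumptions made for illustration only; they are not taken from the annotation scheme of the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultimodalInput:
    """One user input, as annotated semantic slots per modality (hypothetical format)."""
    speech: frozenset = frozenset()
    gesture: frozenset = frozenset()


def classify(inp: MultimodalInput) -> str:
    """Label an input by the semantic overlap between its two modalities."""
    if not inp.speech and not inp.gesture:
        return "empty"
    if not inp.speech or not inp.gesture:
        return "unimodal"
    if not (inp.speech & inp.gesture):
        # The message is split across modalities with no semantic overlap.
        return "complementary"
    if inp.speech == inp.gesture:
        # Each modality carries the same message.
        return "redundant"
    # Assumed intermediate case, not defined in the text.
    return "partially redundant"


# Pointing selects the element while speech tags it.
complementary = MultimodalInput(speech=frozenset({"tag:accident"}),
                                gesture=frozenset({"select:church"}))
# Speech and a closed fist both tag the element as an accident.
redundant = MultimodalInput(speech=frozenset({"tag:accident"}),
                            gesture=frozenset({"tag:accident"}))
```

Counting these labels per cognitive load level would give one possible measure against which the three expectations could be checked.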
2 Background
Multimodal interaction, though characterised as more intuitive or natural, is not yet robust enough to fulfil its promises. Product-oriented multimodal systems become
limited in their functionality to alleviate robustness problems, while research-oriented multimodal systems can suffer from over-customisation and application dependency, not allowing broader reuse of components. The Wizard of Oz (WOz) technique was recognised early on as an essential tool for the design of multimodal interfaces, where novel interaction patterns were expected to appear [6]. WOz set-ups allow intuitive interaction while removing limitations such as input recognition error rates and misinterpreted semantic fusion of multimodal signals. The ethics of the method have been criticised, considering that the subjects are deceived into believing that they are interacting with a working system; however, [7] have noted a positive acceptance by the subjects when informed during post-hoc debriefing. Another limitation, that of evaluating simulated systems versus real working systems, is mitigated by the same authors on the ground that human users can adapt to the capabilities of a system. While this remark is interesting and correct, we have found that an unconstrained WOz implementation is an efficient UCD tool since it may still require a large user adaptation to the system functionality. This highlights a crucial aspect of the development of a WOz set-up: the relationship between the boundaries of the real system in comparison to the simulated functionality of the WOz system.
3 Design Methods for MMUI Systems
3.1 Task Design for Eliciting Multimodal Interaction
A user study of multimodal user interaction requires well planned experiment tasks in order to elicit as natural interaction as possible, yet providing targeted data. The traffic incident management scenario we designed comprised the tasks of marking entities or locations on a map, then deploying resources in relation to those incidents. Four sets of tasks with varying difficulty corresponded to four distinct cognitive load levels. Each set comprised three modality-based conditions, namely using speech only, gesture only, and multimodal (speech and gesture) interaction, this latter being the focus of this paper. Each condition had three repeat tasks in order to obtain statistical power. Hence, subjects had to perform 48 tasks in total. Each set of tasks was completed with the same interface, and the subjects were trained in all conditions during a preliminary session. Task difficulty can be induced in two ways. Firstly, the content and inherent complexity of the problem can be increased: this is known as intrinsic load [4]. Secondly, task difficulty can be induced by increasing the complexity of the representation of the data, known as extraneous load [4]. A good example of this is performing a simple ‘drag and drop’ operation with a mouse-driven UI versus a speech-driven UI. The operation is the same, so the difference in complexity originates from affordances of the input modality. It is much simpler to increase the task difficulty (and cognitive load) by increasing the inherent complexity of the concepts in the task, rather than providing more complex representations, where the effects are much more subjective and unpredictable. For these reasons, we chose to
manipulate intrinsic load to increase task difficulty. The four distinct levels of cognitive load varied in five ways (Table 1):
• Visual complexity: The number of streets in each task increased from ~40 to ~60;
• Entities: The number of distinct entities in the task description was increased;
• Extras: The number of distractor (not needed for the task) entities increased;
• Actions to completion: The minimum number of actions required for task completion;
• Time Limit: The most difficult level was induced by a time limit for completion.
Table 1. Cognitive Load Levels
Level   Entities   Actions   Distractors   Time
1       6          3         2             ∞
2       10         8         2             ∞
3       12         13        4             ∞
4       12         13        4             90 sec.
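The values of Table 1 can be written down as a small configuration structure, which makes it easy to see which factors change from one level to the next. The sketch below is illustrative only: the class name, field names and the helper function are assumptions, and the "∞" entries are represented as `None` (no time limit).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class LoadLevel:
    level: int
    entities: int
    actions: int
    distractors: int
    time_limit_s: Optional[int]  # None stands for the "∞" (no limit) entries


# Values transcribed from Table 1.
LEVELS = [
    LoadLevel(1, 6, 3, 2, None),
    LoadLevel(2, 10, 8, 2, None),
    LoadLevel(3, 12, 13, 4, None),
    LoadLevel(4, 12, 13, 4, 90),
]


def changed_factors(a: LoadLevel, b: LoadLevel) -> List[str]:
    """Names of the factors (other than the level number) that differ."""
    return [f.name for f in fields(LoadLevel)
            if f.name != "level" and getattr(a, f.name) != getattr(b, f.name)]
```

Comparing levels 3 and 4 with `changed_factors` returns only the time limit, which matches the statement that the most difficult level was induced by a time limit for completion.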
3.2 Modality Selection
Intelligent transportation systems manipulate large amounts of spatial data and traffic control rooms invariably offer wall displays providing an overview of complex situations. In this context, we introduce speech and gesture as two novel communication media, allowing the operator to interact from a distance with the large displays. We further discriminate modalities over these carrier media by the type of interaction that they allow. The resulting modalities are:
• Hand pointing: Simple deictic gestures can be used to point to items on the large display, e.g. a specific location on the map;
• Hand pausing: Pausing for a short lapse during the deictic movement results in the selection or the activation of the item being pointed at;
• Hand shapes: A few specific hand shapes have been allocated specific arbitrary meanings, e.g. a closed fist to tag a location as an accident;
• Speech: Natural language can be used for the selection or tagging of items;
• Menu bar buttons: Some graphical buttons on the display can be selected by hand pointing and pausing, in order to tag items.
The hypotheses of this study necessitate that all tasks be achievable in either a complementary or redundant multimodal way, hence all modalities should be made as semantically equivalent as possible. For this experiment, most tasks could be achieved using the main three modalities: speech, hand pointing and hand shapes. This required a careful crafting of the user interface so that the modalities provide similar functionality in spite of their various affordances. Table 2 provides some examples of equivalent speech and gesture interaction.
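The hand pausing modality described above amounts to dwell-based selection: a selection is triggered when the tracked hand position stays roughly still for a short time. The following Python sketch shows one simple way such a detector could work. The function name, the sample format (timestamp in milliseconds, x, y) and the radius and duration thresholds are all assumptions for illustration; the paper does not specify how the in-house tracking module detects pauses.

```python
import math
from typing import List, Tuple

Sample = Tuple[int, float, float]  # (timestamp in ms, x, y), hypothetical format


def detect_dwell(samples: List[Sample],
                 radius: float = 30.0,
                 min_duration_ms: int = 500) -> List[Sample]:
    """Return one (time, x, y) selection per pause in a time-ordered pointer trace.

    A pause starts at an anchor sample and lasts while the pointer stays
    within `radius` of that anchor; a selection fires once the pause has
    lasted at least `min_duration_ms`. Both thresholds are assumed values.
    """
    selections = []
    anchor = None   # first sample of the current candidate pause
    fired = False   # whether this pause has already produced a selection
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius:
            # The hand moved away: start a new candidate pause here.
            anchor = (t, x, y)
            fired = False
        elif not fired and t - anchor[0] >= min_duration_ms:
            selections.append((t, anchor[1], anchor[2]))
            fired = True
    return selections
```

In practice the thresholds would need tuning against tracking jitter: too small a radius misses genuine pauses, while too short a duration triggers selections during slow deictic movements.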
An important aspect to note is that the design allowed users the freedom to choose combined multimodal interaction. They could opt to interact with a single input, in either modality; or with more than one input, in the same or different modality. This applied to the task as a whole, e.g. performing the whole task using speech or using gesture; but also to each subtask, e.g. performing the item selection using pointing and tagging it using hand shape or speech.
Table 2. Examples of multimodal inputs
Functionality: Zooming in
  Speech: “Zoom in to the top left quadrant”
  Gesture: Point to the corners of the top left quadrant
Functionality: Selecting an element
  Speech: “Select the Church on Street X”; or “Church on Street X”; or “St Mary’s Church”
  Gesture: Point to the element and pause
Functionality: Requesting information on an element
  Speech: <Select an element> then, “Information on selected element please”; or “Information”
  Gesture: <Select an element> then, Point to “Info” button
Functionality: Tagging an element as an accident
  Speech: <Select an element> then, “Mark as accident”; or “Accident”
  Gesture: <Select an element> then, Point to “Accident” button; Or, make closed fist shape
Functionality: Tagging an element as an event
  Speech: <Select an element> then, “Mark as event”; or “Event”
  Gesture: <Select an element> then, Point to “Event” button; Or, make scissors shape
Using automatic speech and video-based gesture recognition would dramatically decrease the usability of the system because of the average recognition rates exhibited by such technologies [8]. Reduced usability, in turn, forces subjects to adapt to the system limitations, which works against our primary objective of collecting natural inputs. Hence a WOz approach was selected for this set of experiments, where the wizard manually performed speech and hand shape recognition. An automated hand tracking module developed in-house was found to be sufficiently robust to use during the experiment.
3.3 Data Collection
Given our hypotheses and selected modalities, a number of interaction features have to be captured and analysed. Each stream reflects different aspects of the interaction and involves specific requirements.
Application-Generated Events. The application monitors the progress towards task completion by recording relevant actions such as the selection or tagging of items on the map. The time of occurrence of such actions may also be used to estimate the subject’s performance on the task.
Speech Input. Speech is a major input in many multimodal user interfaces, and we decided to use unconstrained, natural language input during this experiment. The wizard is in charge of interpreting inputs and a the task. However, a complete recording of the speech inputs is very desirable as it contains rich features for the post-analysis of the interaction. Since this experiment was using a single user, we opted for a directional microphone connected to a camcorder in order to capture speech. The major benefit is the inherent synchronisation with the video signal.
Gesture Input. An in-house gesture tracking and recognition software module was used to capture hand moves and shapes. This provides untethered gesture interaction with a fair reliability in a lab setting, by using a dedicated high quality Firewire camera focusing on the subject’s hand. The subjects were also videotaped on a classical camcorder in order to capture the overall gestures (see Figure 2).
Biosensor data. Physiological data was captured in order to evaluate the level of stress and arousal of the subject during the interaction. In particular, galvanic skin response (GSR) and blood volume pulse (BVP) were recorded using an external device with finger sensors.
Fig. 2. Gesture and speech TIM prototype
3.4 Data Type Limitations
Each data stream provides a rich source of information for the analysis of multimodal interaction; however, there are inherent limitations that have to be balanced in view of the experiment’s purpose.
Volume. Audio-visual information is very rich but high quality recordings imply large storage capacity requirements and potential playback and trans-coding issues. Recording on tapes (e.g. MiniDV) requires transfer to a computer at a later stage, often with trans-coding. Further to the consumable cost, this process is extremely time consuming. Hence we opted for connecting the camcorder directly to a computer and recording the stream directly on the hard drive. A flat format codec was used for the video streams in order to ensure correct synchronisation between audio and video channels. The resulting files are very large though, so we decided to record them directly on external hard drives in order to provide the maximum flexibility during the post-analysis, while avoiding file copies and transfers that have the potential to corrupt data. Biosensor data also generate large amounts of information due to the high sampling rate at which they should be acquired. As these are short text records, however, the overall file sizes remain easily manageable.
Reliability. Multimodal interaction analysis relies on the combination of distinct modality streams in order to improve recognition of other parameters, such as cognitive load. This mutual disambiguation process [8] is most effective when the individual streams are unreliable because of inaccurate recognisers or user context, e.g. automatic speech recogniser or noisy rooms. The biosensors and their acquisition chain are fairly complex, hence often inaccurate. The position and stability of the sensor are paramount for reading GSR, for example. In our experiment, subjects used their main hand for gesture interaction, while their other hand was connected to the biosensors and rested on the back rest of a chair. Any unnecessary movement with the ‘sensor’ hand could cause a disruption in the reading. While it may be difficult to compare results across subjects, within-subject evaluation is reasonably stable with this set-up. Another key reliability issue is manual annotation. Uniformity among annotators is difficult to achieve and requires precise annotation rules and cross-validation between annotators. The precision of manual annotations usually comes at a cost; for example we annotated start and end of speech with a precision of around 10ms, which required specialised software tools and more annotation time. Finally, data precision is important as it can restrain the span of numerical analysis. Biosensor technologies vary in cost and precision, so a trade-off between these parameters dictates the final choice. In this experiment, we used professional grade biosensors, with a real-time link to the computer for acquisition.
Synchronisation. Accurate synchronisation of all the data streams is crucial to the effective annotation and analysis of multimodal interaction. Logging data on separate computers and devices requires means to ensure synchronisation during recording, for example using the Network Time Protocol (NTP) to synchronise the computers’ time.
But it also requires means to synchronise streams post-hoc, which may be unreliable for video or biosensor data for example. To alleviate this issue, we directed all data streams, except the audio-visual, to a single software logging application. This latter provides a uniform time scale as well as a preformatted output easing annotations. Output is buffered to memory during the tasks, in order to avoid loss of information, and is stored to disk files between tasks. The post-hoc synchronisation of the audio-visual stream is possible thanks to auditory beeps played by the system at the beginning and end of each task. The time of occurrence of the beeps is logged by the unified software logger, and can be manually reconciled with the audio channel during annotation.
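The logging scheme described in this section (a single logger with one time scale, output buffered in memory during a task and written to disk between tasks, and logged beeps used to align the audio-visual recording) can be sketched as follows. This is an illustrative Python outline, not the logging application used in the experiment: the class, the JSON-lines output format, and the mean-offset alignment in `video_offset` are assumptions.

```python
import json
import time
from typing import Callable, List


class UnifiedLogger:
    """Collects events from several streams on one clock (hypothetical design)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._buffer: List[dict] = []

    def log(self, stream: str, data: dict) -> float:
        """Timestamp an event and keep it in memory; no disk access during a task."""
        t = self._clock()
        self._buffer.append({"t": t, "stream": stream, "data": data})
        return t

    def flush(self, path: str) -> int:
        """Append buffered events to a file between tasks; return how many were written."""
        with open(path, "a", encoding="utf-8") as fh:
            for record in self._buffer:
                fh.write(json.dumps(record) + "\n")
        written = len(self._buffer)
        self._buffer.clear()
        return written


def video_offset(beep_log_times: List[float], beep_video_times: List[float]) -> float:
    """Mean offset to add to a video timestamp to obtain logger time.

    `beep_log_times` come from the logger; `beep_video_times` are the
    positions of the same beeps found manually on the audio channel.
    """
    if not beep_log_times or len(beep_log_times) != len(beep_video_times):
        raise ValueError("need matching, non-empty lists of beep times")
    diffs = [l - v for l, v in zip(beep_log_times, beep_video_times)]
    return sum(diffs) / len(diffs)
```

With a start and an end beep per task, a large difference between the two individual offsets would indicate drift between the camcorder and the logging computer, which a single mean offset cannot correct.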
4 Discussion: Level of Actual Functionality
In the field of multimodal research, system robustness is critical to elicit intuitive and reliable interactions from the subjects, e.g. users will compensate unnaturally for errors in recognition of a certain input and may stop using it or increase use of other modalities that are perceived to be less error prone. Hence, the WOz systems are usually highly or fully simulated, sometimes based on multi-wizard support (e.g. one for speech recognition and one for output preparation). However, there are no general guidelines available in terms of design factors for WOz systems, and our experiment allowed us to determine some characteristics of the data that greatly impact the design, such as volume, reliability and synchronisation. Further to those characteristics, we discovered that the balance between functional components and ‘wizardry’ is highly dependent on the user study design and the goals of the research. When the goals are largely evaluative, more functional modules are necessary, such that feedback on actual functionality can be assessed and incorporated into final versions of software. In addition, having a fairly functional system makes product development far more achievable. In our case, the focus was on identifying multimodal behavioural patterns in highly multimodal systems: the goals were exploratory and we aimed to capture naturalistic interaction. Though input could only be conveyed through three asynchronous modalities (speech, hand movements and hand shapes), the temporal, syntactical and semantic characteristics of the interaction were highly complex. To illustrate: the least expressive modality, free-hand gesture, could be used to issue 11 distinct commands in a single movement, each of which could then be combined with other commands, in groups of two or three, along various temporal arrangements to alter the semantics of the command.
Further, any command could also be conveyed through the speech modality, and again, combined with others in various temporal arrangements. The choice of modality and the temporal arrangements are very delicate characteristics of interaction and subject to both unreliable input recognition and individual subject differences [9]. The state of the art in fully functional speech and gesture recognition would not be sufficiently error-free to produce unbiased interaction, and for this reason, the decision was made to use wizard-based simulation in place of the recognition and multimodal fusion engines. Giving the wizard this responsibility meant that very few other tasks could be allocated to him, so as to prevent overloading. The limitations of the wizard’s attention span, and the lack of resources
to provide a second wizard, drove the rest of the functionality to be automated as much as possible. The WOz technique relies on the user believing the system is fully functional. This gives rise to two aspects of system design that impact the implementation of the system and hence the percentage of actual vs. simulated functionality. The complex form of input in multimodal interaction requires equally complex forms of output. Though primarily graphical, the task scenario was also required to provide textual output at different stages of input, forcing the lag time for system feedback to be as short as possible. The feedback for each different kind of command may require more than one element to appear on the screen, or some text at various stages of the command being issued. The back-end logic of the application, e.g. responses and immediate output, was fully functional and largely operated by the wizard once user input was interpreted, but the wizard did not need to concern themselves with selecting the content or form of output on the fly. The wizard’s interface was tailored to suit, providing large buttons which would facilitate this process. Another factor that may also drive the decision of how to distribute the ratio of functional vs. simulated components in a WOz system is the post-analysis required. The more system events are fully automated, the more markers can be placed on the data and the more features can be recorded on the fly, such as time stamps and command sequences and types. The centralisation of system models on a single machine allows a better synchronisation of input signals, facilitating data analysis post hoc. In conclusion, our WOz design allowed us to collect the target data and to confirm our hypotheses. However, there are still many aspects of multimodal user interaction that need addressing, especially in view of the evaluation of the cognitive load experienced by a user. Reflecting on the design choices brought some important insights for the design of future WOz-based user experiments. In particular, we identified data characteristics that have a deep impact on the design choices, and we clarified the necessary trade-off between implemented and simulated functionality.
References 1. Oviatt, S.: Ten Myths of Multimodal Interaction. ACM, Communications of the ACM 42(11), 74–81 (1999) 2. Bolt, R.A.: “Put-That-There”: Voice and Gesture at the Graphics Interface. In: Bolt, R.A. (ed.) Proc. 7th annual conference on Computer Graphics and Interactive Techniques, Seattle, WA, USA, pp. 262–270. ACM Press, New York, USA (1980) 3. Schapira, E., Sharma, R.: Experimental Evaluation of Vision and Speech based Multimodal Interfaces. In: PUI’01, Workshop on Perceptive User Interfaces, Orlando, FL, pp. 1–9. ACM Press, New York, USA (2001) 4. Paas, F., et al.: Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist 38, 63–71 (2003) 5. Baddeley, A.D.: Working Memory. Science 255(5044), 556–559 (1992) 6. Salber, D., Coutaz, J.: A Wizard of Oz platform for the study of multimodal systems. In: Ashlund, S., Mullet, K., Henderson, A., Hollnagel, E., White, T. (eds.) INTERACT’93 and CHI’93 Conference Companion on Human Factors in Computing Systems, Amsterdam, The Netherlands, pp. 95–96. ACM Press, NY (1993)
7. Dahlbäck, N., Jönsson, A., Ahrenberg, L.: Wizard of Oz studies: why and how. In: Gray, W.D., Hefley, W.E., Murray, D. (eds.) Proc. 1st international Conference on intelligent User interfaces, Orlando FL, USA, pp. 193–200. ACM Press, NY (1993) 8. Oviatt, S., Cohen, P.: Perceptual user interfaces: multimodal interfaces that process what comes naturally. Communications of the ACM 43(3), 45–53 (2000) 9. Oviatt, S., DeAngeli, A., Kuhn, K.: Integration and Synchronization of Input Modes During Multimodal Human-Computer Interaction. In: SIGCHI conference on Human factors in computing systems, Atlanta, GA, USA, pp. 415–422 (1997)
Extreme Programming in Action: A Longitudinal Case Study
Peter Tingling1 and Akbar Saeed2
1 Faculty of Business Administration, Simon Fraser University, 8888 University Drive, Burnaby, Canada V5A 1S6 [email protected]
2 Ivey School of Business, University of Western Ontario, 1151 Richmond St. N., London, Canada N6A 3K7 [email protected]
Abstract. Rapid Application Development (RAD) has captured interest as a solution to problems associated with traditional systems development. Describing the adoption of agile methods and Extreme Programming by a software start-up we find that all XP principles were not adopted equally and were subject to temporal conditions. Small releases, on site customer, continuous integration and refactoring were most vigorously advanced by management and adopted by developers. Paired programming on the other hand was culturally avoided. Keywords: Extreme Programming, Agile Methods, Rapid Application Development.
1 Introduction
The speed and quality with which systems are delivered continues to concern both practitioners and academics. Traditional methodologies, while praised for their rigor, are often criticized as non-responsive, bloated, bureaucratic, or contributing to late and over budget systems that when delivered solve problems that are no longer relevant. Various solutions have been proposed. Frequently combined under the rubric of Rapid Application Development (RAD), these include extensive user involvement, Joint Application Design, prototyping, integrated CASE tools, and more recently, agile methods such as eXtreme Programming (XP). Following a qualitative study of agile methods and concepts we conclude that adoption and extent of agile principle appropriation are affected temporally and by culture. Coding standards, for example, may initially be excluded in a search for creativity and flexibility. Similarly, in addition to the continuous improvement of refactoring, bursts of intense focus also occur.
Life Cycle is a well adopted ‘systematic, disciplined, quantifiable approach to the development, operation and maintenance of software’ [2, 6]. However, with increasing backlogs, some high profile development failures, and the need to adapt to emerging business conditions, the SDLC has been subject to criticism that it is constraining, heavyweight and results in projects that are outdated before they are finished [7]. Consequently, many organizations have adopted alternatives that emphasize incremental development with constant customer feedback (Rapid Application Development); structured processes where constituents collectively and intensely review requirements (Joint Application Development); construction of partial systems to demonstrate operation, gain acceptance or technical feasibility (Prototyping); and tools that assist in software development and business analysis (Computer Aided Systems Engineering).
Table 1. Agile Principles of Extreme Programming
XP Principle: Rationale and Description
40-Hour Work Week: Alert programmers are less likely to make mistakes. XP teams do not work excessive hours.
Coding Standards: Co-operation requires clear communication. Code conforms to standards.
Collective Ownership: Decisions about the code are made by those actively working on the modules. All code is owned by all developers.
Continuous Integration: Frequent integration reduces the probability of problems. Software is built and integrated several times per day.
Continuous Testing: Test scripts are written before the code and used for validation. Ongoing customer acceptance ensures features are provided.
On-Site Customer: Rapid decisions on requirements, priorities and questions reduce expensive communication. A dedicated and empowered individual steers the project.
Pair Programming: Two programmers using a single computer write higher quality code than individual programmers.
Planning Game: Business feature value is determined by programming cost. The customer decides what needs to be done or deferred.
Refactoring: The software is continually improved.
Simple Design: Programs are simple and meet current rather than future evolving requirements.
Small Releases: Systems are updated frequently and migrated on a short cycle.
System Metaphor: Communication is simplified and development guided by a common system of names and description.
Source: Adapted from [8]
In 2001, a group of programmers created a manifesto that embodied the core principles of a new methodology [9]. An extreme application of RAD, agile methods capitalize on member skill and favor individuals and interactions over process and tools, working software over comprehensive documentation, customer collaboration over contract negotiation, and responding to change over following plans and requirements. Dynamic, context specific, aggressive and growth oriented [10, 11], agile methods favor time boxing
P. Tingling and A. Saeed
and iterative development over long or formal development cycles. The most widely adopted agile development methodology, eXtreme Programming (XP), is a generative set of twelve inter-related principles. These are described in Table 1.
3 Methodology and Data Collection
For this study, we used a case oriented approach, which is an “empirical inquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident” [12]. Site selection was opportunistic, the result of an ongoing relationship with Semper Corporation, a year-old start-up developing an interactive software product. Data was collected between August 2005 and December 2006 and consisted of interviews with employees, observation of the environment and work practices, and retrospective examination of documents and email [13]. These are described in Table 2.
Table 2. Data Collection Activities
Data Type | Description
Interviews | The development staff and company principals were regularly interviewed throughout the year long data gathering.
Observation | Programming and development staff were observed at least weekly. This was both active (at the development offices) and passive (by remote viewing of video cameras).
Artifact examination | Employment and programming records, progress and bug reports, and copies of each build and version of the product were reviewed, as was email correspondence.
The main steps in analysis involved identification and definition of concepts, followed by the theorizing and write up of ideas and relationships. Content relating to agile methods and extreme programming was separated and categorized according to discrete principles. Illustrative yet concise examples were then selected. Direct quotations have been italicized and placed within quotation marks.
4 Extreme Programming at Semper Corporation
This section reviews the principles described in Table 1. Although these principles were meant to be generative rather than all-inclusive, typical recommendations recognize their inter-relatedness and suggest that they be implemented in their entirety, with adaptation encouraged only once use and familiarity are established [6]. Findings are summarized in Table 3.
40-Hour Work Week. Company policy was one of flexible work hours. Other than core hours between 10:00 and 15:00, developers were free to set their own schedule. While there were work weeks longer than 40 hours (during customer testing or
Extreme Programming in Action: A Longitudinal Case Study
resolving production problems) this was the exception rather than the norm. There was no overtime compensation. Another factor affecting the schedule was the young age of the developers (average 21), who adopted nocturnal habits because of their social schedule. For example, advising when he might be in the office, one of the developers noted “I will be in early tomorrow - around 10:00-10:30”. Email conversations (where a message was sent and a response received) between the developers and managers declined during the core hours from 45% to 31% and increased from 6% to 37% between 22:00 and 04:00.
Table 3. Adoption Faithfulness of XP Principles
eXtreme Programming Principle | Adoption Level* | Temporal Effects | Summary
40-Hour Work Week | Full | N | Developers worked flexible but regular workdays.
Coding Standards | Low to Partial | Y | Standards were initially avoided but later implemented.
Collective Code Ownership | Partial | Y | Code was officially shared but developers exhibited possessiveness.
Continuous Integration | Full | N | Code was rarely broken and was continually linked and compiled.
Continuous Testing | Partial to Full | Y | Testing was continuous but advance scripts were not created. Black box testing was phased.
On-Site Customer | Full | N | The CEO and Analytic Director acted as customers.
Pair Programming | Low | N | Programmers were independent except when difficulties or interdependencies existed.
Planning Game | Full | N | Value engineering balanced features against time and budget.
Refactoring | Full | Y | Modules were constantly improved. Periodic bursts of dramatic improvement occurred.
Simple Design | Full | N | Working software was favored.
Small Releases | Full | N | Frequent (weekly) build cycles.
System Metaphor | Full | N | Communication was simple and informal but unambiguous.
*Adoption is considered Complete, Partial or Full.
Coding Standards. Coding standards were initially avoided. For example, rather than conventionally declaring all variables at the beginning of a module, one programmer simply added them wherever they were needed. Requests to impose standards were generally ignored by management until the program became sufficiently complex as to require tighter control and the CEO realized development teams continually rewrote the variables when they refactored or changed modules. Staff attrition later resulted in de facto standards.
Collective Code Ownership. With the exception of a few core modules, module decisions were made by the active developers. As a consequence, modules were continually rewritten according to individual ideas and preferences. Although modules had multiple authors, one team tended to write the analytic modules while another wrote the graphically intense reporting component. While code was officially shared and located on a common storage medium, developers were reluctant to adapt code written by others and continued to speak of “their code”.
Continuous Integration. The programming environment was Visual Basic in the Microsoft .NET Framework. Modules were tested in isolation and embedded into the program several times a day, in addition to the formal build schedule with weekly integration. During the sixteen months of observation, more than 35 complete formal versions of the product and 225 integrations were compiled. In addition, internal and external users were given replacement Dynamic Link Libraries (DLLs) that encouraged up-to-date testing. Despite a preference for working code, there were several occasions when changes to the data model required extensive rewrites and the code was broken for up to two weeks as modules were re-written and tested.
Continuous Testing. Test scripts were not written in advance of coding (as recommended by XP) and were frequently developed in parallel. Ongoing functional and compatibility testing used standardized and ad hoc test scripts. Because the design was modular and addressed a specific rather than generic problem, the majority of the code could be tested in isolation. Integration testing was completed after each weekly build and was conducted by management and external users. Black box testing was conducted using a combination of end user and test samples. HCI and usability aspects were the most dynamic, with the majority of the changes immediately accepted or rejected by the onsite customer.
The few exceptions to this occurred when the developers were given free rein to creatively design new ideas or when previously adopted choices were abandoned. The CEO often challenged the developers to present complex information simply and intuitively rather than providing them with a design to be implemented. After reviewing the work, he frequently commented that they seemed to anticipate what he wanted or were able to implement what he had been unable to imagine. In addition to a comprehensive series of test scripts that were developed and executed, the program was also provided to industry professionals. Two beta tests involving early customer experience programs were used by the company for acceptance testing, and both of these surfaced unanticipated areas for attention. Semper used formal bug and feature tracking software for major or outstanding problems, but generally the developers tended simply to fix problems immediately once they were identified. Often the first indication that management had of a problem was when a fix was provided or noted in the change log. Discussing the need to document bugs, the programmers opined that judgment was used to determine if a bug report should be completed after the fact and that this was only done for difficult or particularly complex solutions.
Onsite Customer. Because Semper was an early-stage pre-market company, it did not have customers in the traditional sense. Instead, the product vision was provided by the CEO and the Director of Analytics. Originally trained as a mainframe
programmer, the CEO was empathetic to technical problems but was not familiar with modern systems development and did not get involved in construction details. He would often jokingly describe programming and analytic modules as “it is just a sort and a print right - what is the big deal – three to four hours programming tops!” and would often laugh and offer to write some code himself if he thought some simple aspects were taking too long. He would challenge developers by reminding them that they learned little by programming simple tasks. A developer's response to his question about a particularly complex change provides an example: “This is possible but will be hard to do. This is because [text redacted]. Anyway, I’m not going to start talking about the how-to parts. I know your response will be ‘if it were easy, why would you want to do it?’ ”. The Director of Analytics, on the other hand, had current technical skills and would often interact directly with the developers and offer suggestions. Generally developers worked interactively with the management team and demonstrated prototypes for immediate feedback. Where planned requirements or changes necessitated extensive coding and development work, Unified Modeling Language (UML) use cases, conceptual sketches and data models were used as scaffolding to be discarded in favor of a prototype. A great deal of the management and developer communication was oral, but the fact that offices were physically separated meant that email and instant messenger were also used a great deal. The main design artefacts were the data model and build reports that identified progress and what was planned for or deferred to the next iteration.
Pair Programming. Pair programming was not adopted. Developers were dyadic but each worked at their own workstation. Modules were coded by one person, although complex or difficult problems were shared.
Although management discussed paired programming as an option with developers when they were hired (new applicants were interviewed by the programming staff and, in addition to technical competency, had to “fit in”), it was not pursued. Developers, hired directly from university where assignments and evaluations were competitive and individual, did not embrace collective approaches. While the environment was co-operative, developers would occasionally compete to see who could write the most efficient and effective code. Further exacerbating the difficulties with paired programming were work schedules, staff turnover, and personalities. Two of the development staff, for example, preferred to listen to iPods and to be isolated. Although programmers would often compete to see who could develop the better module, they were reluctant to comment on code written by co-workers except in a joking manner. However, once a programmer left the company or was assigned to a different capacity they immediately became part of the out group and their code would often be referred to as “strange”, “poorly written” or “in need of a re-write”. Although developers would blame problems on former co-workers, they would laugh when reminded that they might ultimately be subject to the same criticism. After one developer had been gone for six months another noted it was “too late to blame [redacted] now”.
Planning Game. Management realized that development had aspects of both art and science. Nevertheless the planning game was used extensively and trade-offs between time and features were routine. Estimates were almost exclusively provided by the developers and once established were treated as firm deadlines against which they
were evaluated. Development was categorized into Structural Requirements, Differentiating Features, Human Computer Interaction, and Cosmetic Changes.
Structural Requirements. These were features and capabilities outlined in the business plan, considered core and treated as priority and foundational items.
Differentiating Features. These provided differentiating or competitive capabilities and were further grouped into “must have”, “nice to have” and “defer”. The majority of the “must haves” differentiated the product. Additions to this list resulted from competitive reviews or extensions to existing capabilities suggested by users. Typically a few “must haves” were included each week and developers knew that these could delay the build (there were two or three occasions where a deadline was missed). “Nice to have” items were optional. There were between eight and twenty of these each week, although they were added to a cumulative list. Approximately three-quarters of these were included in each time box. “Defer” items were a combination of large and small features or changes that could be moved over time into the “must have” or “nice to have” group. Examples ranged from the tutorial to complex encryption requirements that were included in subsequent builds.
Human Computer Interaction. Although management realized that HCI was important, it was considered secondary to programming, and design staff were not hired until the first version of the product had been completed. The main proponent of a more expanded view of usability was the Director of Analytics. Rather than criticize the existing product he would usually make his point by identifying other products that he believed exemplified good design. The result of these comparisons was a complete re-write from the existing traditional Windows-based interface (Icons, Menus and Pointers) to one that was much more intuitive and conversational.
Despite the fact that Human Computer Interface issues were later seen as critical to the system and a great deal of time was spent in design, HCI was considered technically minor by the CEO.
Cosmetic Changes. Semper viewed all non-programming changes as important to customers and use but mainly “cosmetic”. There were numerous evolutions and changes to text, font, color, position and alignment. These were continuous and, in the words of a developer, were “tedious but not hard”. The frequency and approach used to manage these changes are described in Table 4.
Refactoring. Code focused on functionality and was continually refined and improved. The first product build, created after just two weeks, was essentially a shell program but was designated version 1.0.0. Substantive changes incremented the second order digit and minor changes usually incremented the low order identifier. In addition there were several major changes. For example, a complete change in system interface required that all of the modules be re-written simultaneously, and the main analytic engine (over 6,000 lines of code) was completely re-written over a two month period. As such, in addition to continuous improvement through refactoring there were periods of intense improvement in function, usability, reliability, speed and stability.
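The version-numbering convention described under Refactoring can be sketched in a few lines. This is a minimal illustration only, not Semper's tooling: the `Version` class and the `"substantive"` and `"minor"` labels are hypothetical names, and resetting the low order identifier when the second order digit increments is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Three-part identifier such as 1.0.0 (hypothetical helper)."""
    first: int
    second: int
    low: int

    def bump(self, change: str) -> "Version":
        if change == "substantive":
            # Second order digit increments; resetting `low` is an assumption.
            return Version(self.first, self.second + 1, 0)
        if change == "minor":
            # Minor changes increment the low order identifier.
            return Version(self.first, self.second, self.low + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.first}.{self.second}.{self.low}"


shell = Version(1, 0, 0)                      # the first build, a shell program
after_fix = shell.bump("minor")               # a minor change
after_feature = after_fix.bump("substantive")  # a substantive change
print(shell, after_fix, after_feature)
```

The sketch deliberately leaves the first digit untouched, because the text does not say how the several major changes were numbered.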
Table 4. Development Taxonomy
Type | Description | Number of Changes | Approach
Structural | Fundamental aspects or product core. | <12 | Simple Design & System Metaphor
Feature | Market and competitive requirements. Grouped into “must have”, “nice to have” and “defer”. | >100 | On Site Customer, Planning Game, Simple Design, & Refactoring
HCI | Usability issues such as placement of glyphs, screen dialogue and presentation. | >250 | On Site Customer, Small Releases, Continuous Testing, Refactoring
Cosmetic | Icons, glyph, color, dialogue and position changes (not all simple). | >1,000 | Onsite Customer, Refactoring, & Small Releases
Simple Design. Development was guided by simple principles while trying to avoid architectural constraints, or what the CEO called “painting themselves into a corner”. Problems were designated BR or AR. BR were those that impacted customers and had to be fixed before revenue. AR were those that could be solved with the increased resources provided after revenue. The planning game arbitrated the cost of desired features, and refactoring later improved functionality that had already been delivered. Conceptually, developers were told to consider the metaphor of a ‘modern digital camera’, where a high level of complexity and functionality was behind a simple interface that users could employ in a myriad of sophisticated ways.
Small Releases. Time boxing was part of the discipline. Consequently, developers released a new version almost every second week. This was relaxed during major revisions and accelerated to almost daily versions when approaching a major deadline. In addition, management and users were also given replacement modules (DLLs) that delivered specific functionality, fixed problems or generally improved the code. Despite periods where developers complained that the ongoing short term focus impeded delivery of a more systematic and quality oriented product, management remained committed to the concept of small releases. In a twelve month period developers delivered approximately 35 complete versions, with almost two dozen non-developer compiles and more than 150 replacement DLLs over and above the build cycle. Working through the planning game, management and the developers laid out a build schedule that was tracked using basic project management tools and rarely modified.
System Metaphor. Communication was simple and direct, facilitated most often by the data model, the program itself, and the fact that, with the exception of the Director of Finance and two junior business analysts, all employees had been formally trained in systems analysis or computer programming.
Design of the products was handled through a combination of strategic and tactical adjustments. Joint Application Design
(JAD) sessions were used to begin product development and after each of the beta programs and before each of the three program redesigns. Tactically, designers and management met twice a week to receive the weekly build and to review progress, bug status and planned revisions to the upcoming version schedule. We next draw conclusions about the degree and extent of appropriation, discuss limitations and suggest future research and implications.
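The weekly trade-off between time and features described under the Planning Game in this section can be sketched as a simple time-box filler. This is an illustrative assumption rather than Semper's actual procedure: the `Item` class, the hour-based estimates, the capacity figure and the greedy fill order are all hypothetical. It follows the text only in that “must have” items always enter the build (and may therefore overrun it), “nice to have” items are optional, and “defer” items wait for a later iteration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    priority: str          # "must have", "nice to have" or "defer"
    estimate_hours: float  # developer-provided estimate (assumed unit)


def plan_time_box(items, capacity_hours):
    """Return (included, carried_over) for one time box."""
    included, carried = [], []
    remaining = capacity_hours
    # "Must have" items always go in, even if they overrun the time box.
    for item in items:
        if item.priority == "must have":
            included.append(item)
            remaining -= item.estimate_hours
    # Optional items fill whatever capacity is left, in backlog order.
    for item in items:
        if item.priority == "must have":
            continue
        if item.priority == "nice to have":
            if item.estimate_hours <= remaining:
                included.append(item)
                remaining -= item.estimate_hours
            else:
                carried.append(item)
        elif item.priority == "defer":
            carried.append(item)
        else:
            raise ValueError(f"unknown priority: {item.priority!r}")
    return included, carried


backlog = [
    Item("encryption", "defer", 40),
    Item("export report", "must have", 16),
    Item("tooltip text", "nice to have", 2),
    Item("chart zoom", "nice to have", 12),
    Item("new palette", "nice to have", 4),
]
included, carried = plan_time_box(backlog, capacity_hours=24)
print([i.name for i in included], [i.name for i in carried])
```

A negative remaining capacity after the first loop corresponds to the situation the text mentions, in which “must haves” could delay the build.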
5 Conclusions and Summary
Semper’s partial adoption of agile principles reinforces other findings that indicate up to two thirds of large companies have adopted ‘some form’ of agile methods [8], which are then blended with more traditional practices. Practitioners have not adopted XP in an all or none action, and faithful appropriation of all principles seems to be a rarity. Initially Semper implemented only eight principles. Interestingly, three of the remaining four (continuous testing, shared code and coding standards) did later become more fully and faithfully appropriated. At first, it would appear that Semper should have applied more diligence in following agile principles from the outset. Alternatively, we suggest that these principles may have required a certain level of maturity not present in the organization’s employees and processes. Coding standards were initially eschewed by management in favor of creativity, until a basic level of code had been developed. While the programming staff themselves favored standards, they were unable to agree on the specifics until staff turnover and management support of a standard pressured them to do so. Similarly, developers still sought code ownership despite a concerted effort by management to curb such behavior. Paired programming, the only principle that did not manage to gain any momentum, continues to be supported by management but has yet to be embraced by the developers. Therefore, we find that temporal conditions and maturity affect the extent to which extreme programming principles are adopted and that both management and developer cultures are salient considerations. Consequently, future research should consider both cultural conditions and managerial preferences.
Acknowledgments. We are grateful to Semper Corporation. This research was supported by a grant from Simon Fraser University.
References
1. Georgiadou, E.: Software Process and Product Improvement: A Historical Perspective. Cybernetics and Systems Analysis 39(1), 125–142 (2003)
2. Gibbs, W.W.: Software’s Chronic Crisis. Scientific American 271(3), 89–96 (1994)
3. Berry, D., Wirsing, M., Knapp, A., Simonetta, B.: The Inevitable Pain of Software Development: Why there is no silver bullet. Radical Innovations of Software and Systems Engineering in the Future. Venice (2002)
4. Brooks, F.P.: The Mythical Man Month. Addison-Wesley, London, UK (1975)
5. Duggan, E.W.: Silver Pellets for Improving Software Quality. Information Resources Management Journal 17(2), 1–21 (2004)
6. Beck, K.: Extreme Programming Explained: Embrace Change. Addison-Wesley, Reading, Mass (2000)
7. Highsmith, J.: Agile Software Development Ecosystems. In: Cockburn, A., Highsmith, J. (eds.) Agile Software Development Series, Addison-Wesley, Boston (2002)
8. Barnett, L., Narsu, U.: Best Practices for Agile Development. on accessed, (January 15, 2003, 2005), http://www.gigaweb.com
9. AgileManifesto: The Agile Manifesto (2001)
10. Goldman, S.L., Nagel, R.N., Preiss, K.: Agile Competitors and Virtual Organizations. Van Nostrand Reinhold, NY (1995)
11. Williams, L., Cockburn, A.: Agile Software Development: It’s About Feedback and Change. Computer 36(6), 39–43 (2003)
12. Yin, R.K.: Case Study Research: Design and Methods. Sage Publications, Thousand Oaks, CA (1994)
13. Spradley, J.P.: The Ethnographic Interview. Holt, Rinehart and Winston, New York (1979)
Holistic Interaction Between the Computer and the Active Human Being
Hannu Vanharanta and Tapio Salminen
Tampere University of Technology, Industrial Management and Engineering, Pohjoisranta 11, 28101 Pori, Finland
Abstract. In the design, development and use of computer-based decision support systems, the ultimate challenge and goal is to arrange and organize successful interaction between the computer and the active human being. This paper therefore examines the extent to which, by applying the hyperknowledge framework developed by Ai-Mei Chang, the holistic concept of man developed by Lauri Rauhala, and the Circles of Mind metaphor developed by Hannu Vanharanta for decision support systems, these systems can be made to emulate human cognitive processes. The approach is a new one, and it represents an emerging paradigm for achieving emulation and synergy between human decision-making processes and computer configurations.
Keywords: Holistic, Interaction, Human Beings, Computer Systems, Concepts, Constructs, Architecture, Co-Evolution, Decision Support Systems.
architecture can easily be applied to many computer systems as well as to new areas of computer usage where holism plays an important role.
1.1 A Philosophic Model of the User
The Holistic Concept of Man (HCM) is a philosophic model that has been described in a number of books and articles by Rauhala, a Finnish phenomenological philosopher and psychologist [1] [2] [3]. Rauhala’s source material consists, in particular, of the works of two well-known German philosophers: Husserl [4] and Heidegger [5]. The advantage of the holistic concept of man, compared to the theories presented by Husserl and Heidegger, is that it has a rather simple construction and is therefore more understandable for non-experts.
1.2 The User’s Mind
The Circles of Mind metaphor [6] opens up the mind’s most important sectors: the memory system, interpretation system, motivation system and automatic system. These systems and their content must be reinforced by the computer system so that the user feels supported when using the computer.
1.3 The User as a Decision Maker
The hyperknowledge framework [7], in turn, views a decision maker as cognitively possessing many diverse and interrelated pieces of knowledge (i.e. concepts). Some are descriptive, some procedural, some are concerned with reasoning, and so forth. The mind is able to deal with these in a fluid and inclusive manner via the controlled focusing of attention. That is, the decision maker actively acquires (recalls, focuses on) desired pieces of knowledge by cognitively navigating through the universe of available concepts. The result is then a hyperknowledge view of the underlying concepts and content involved in the decision making.
1.4 Computer Architecture
By combining the three views described above, we end up with a new architecture for computer applications and constructs.
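The hyperknowledge view in Section 1.3, in which a decision maker focuses attention on one concept and navigates to related ones, can be pictured as a small graph structure. The following is a minimal sketch under stated assumptions, not the architecture proposed in this paper or the implementation in [7]: the `ConceptMap` class, its method names and the example concepts are all hypothetical.

```python
class ConceptMap:
    """Interrelated concepts with a single focus of attention (hypothetical)."""

    def __init__(self):
        self._kind = {}    # concept name -> "descriptive", "procedural", ...
        self._links = {}   # concept name -> set of related concept names
        self.focus = None  # concept currently attended to

    def add(self, name, kind):
        self._kind[name] = kind
        self._links.setdefault(name, set())

    def relate(self, a, b):
        # Relationships are treated as symmetric in this sketch.
        for name in (a, b):
            if name not in self._kind:
                raise KeyError(name)
        self._links[a].add(b)
        self._links[b].add(a)

    def focus_on(self, name):
        """Move the focus of attention to a concept; return its kind."""
        if name not in self._kind:
            raise KeyError(name)
        self.focus = name
        return self._kind[name]

    def related(self):
        """Concepts reachable in one step from the current focus."""
        if self.focus is None:
            return []
        return sorted(self._links[self.focus])


cmap = ConceptMap()
cmap.add("sales figures", "descriptive")
cmap.add("forecast procedure", "procedural")
cmap.add("pricing rule", "reasoning")
cmap.relate("sales figures", "forecast procedure")
cmap.relate("forecast procedure", "pricing rule")

print(cmap.focus_on("forecast procedure"))
print(cmap.related())
```

Keeping a single `focus` attribute is a design choice meant to mirror the "controlled focusing of attention"; a fuller model might track a navigation history as well.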
2 The Holistic Concept of Man Metaphor
2.1 Modes of Existence
The Holistic Concept of Man (HCM) is a human metaphor. The basic dimensions of the metaphor consist of a body, a mind and a situation [2] [8]. A human being is an organism [9] which uses thinking processes and exists in particular and individually formed situations. The human being consists simultaneously of three modes of existence, based on the above basic dimensions of the HCM, which cannot be separated from each other. According to the HCM, all three modes are needed in
order to make human existence possible and to understand the holistic nature of the human being. These modes of existence of the human being are called 1) corporeality, or existence as an organism with organic processes (the body), 2) consciousness, or existence as a psychic-mental phenomenon, as perceiving and experiencing (the mind), and 3) situationality, or existence in relation to reality (the situation). Human beings have relationships and interrelationships that characterize their qualities as individuals in specific situations [10].
2.2 Corporeality
The first mode of existence, corporeality, maintains the basic processes of existence and implements the physical activities of the human being. The human brain and sense organs (internal and external) are needed when observing the objects and concepts in a specific situation which send meanings to the observer [10].
2.3 Consciousness
In consciousness, the active human being experiences, perceives and understands the phenomena encountered. This is more than a mere thinking process (cf. Res cogitans) because qualities such as experiencing, perceiving and understanding are also involved. When human beings use their inner and outer senses to receive physical signals from the environment, the situation provides the consciousness with a meaningful content, and the human being understands this content, i.e. perceives the corresponding construct(s) or object(s) or concept(s) to be “something.” As a result of an act of understanding, there emerges a relationship, or a meaning or meanings. The HCM metaphor separates the terms consciousness and mind. Consciousness is the totality of the psychic-mental existence of the human being. Mind is used in a more functional sense to refer to the psychical and mental processes which, when taken as a totality, form the mode of existence called consciousness. Mind is a continuous process in which meanings emerge, change and relate to each other.
Meanings are linked together in the mind and collectively form networks of meanings. The totality of these networks is called the world view of a human being. In relation to the world view, a human being understands both old and new phenomena. “Cause maps” or “mental models and maps” used in the cognitive psychology approach correspond to some degree to the notion of the world view. The psychological term “memory” also corresponds to some degree to the world view in the HCM metaphor [10].
2.4 Situationality
Situationality is the third dimension of human existence. Situationality emphasizes that a human being exists not only “as such,” in a vacuum or isolation, but always in relation and interrelation to reality with its multitude of aspects. The world, or reality, is all that exists concretely or ideally, i.e. the world with which people in general can
relate. Situation (or the situation of life) is that part of the world with which a particular human being forms relationships and interrelationships [10]. Situationality is always unique to each individual. Human beings understand the same object(s) in their situation in an individual way.
[Figure: diagram of an active human being. Recoverable labels: Corporeality, Consciousness, Situationality; Senses, Brains, Limbs, Activities; Mind, Will, World view; Scientific information, Everyday knowledge, Intuition, Feeling, Belief; Object(s).]
Fig. 1. An active human and different types of meaning [10]
3 The Circles of Mind Metaphor
3.1 The Theatre Metaphor
The HCM metaphor, or the idea of the human being in a specific situation as a totality, is not sufficient to be used for the development of a brain-based system. The metaphor lacks current research findings on the unconscious part of the human brain. Baars [11] has combined psychology with brain science and the old conception of the human mind to create a metaphor based on the workspace of the mind. The totality can be explained through the theatre metaphor, where the self as an agent and observer behaves as if on the theatre stage. Close to the stage is the unconscious part of the brain (the audience), which is divided into four main areas: the motivational system, automatic systems, interpreting system and memory system. The spotlight controller, context and theatre director are also present.
3.2 The Circles of Mind Construct
A combination of the HCM and the theatre metaphor of Baars led to our new and very practical metaphor. This was named the Circles of Mind metaphor [6]. The Circles of Mind metaphor was also designed as a physical entity so that the metaphor could be used for design purposes. This has led to the idea of a brain-based system which contains the physical body following the Cartesian mind-body relationships, i.e. as a thinking thing and an extended thing [9]. One version of the Circles of Mind metaphor is presented in Figure 2 below.
[Figure: diagram of the Circles of Mind metaphor. Recoverable labels include: Conscious Experience on the Stage; The Players; The Director; Context operators behind the scenes; The unconscious audience; and the memory, interpreting, motivational and automatic systems.]
Fig. 2. The Circles of Mind metaphor [6]
Res cogitans/A Thinking Thing was evident here, giving us the four main parts for the architecture of a new computer system. Res extensa/An Extended Thing (body) represents the other dimension of man, which physically uses the computer keyboard and gives the power of functionality to the computer application to be used on the stage.
4 The Hyperknowledge Framework
4.1 Hyperknowledge
The hyperknowledge framework views the decision maker, i.e. here an active computer user, as cognitively possessing many diverse and interrelated pieces of knowledge (i.e. concepts). Some concepts are descriptive, some procedural, some are concerned with reasoning, and so forth. The mind is able to deal with these in a fluid and inclusive manner via the controlled focusing of attention. That is, the decision maker actively acquires (recalls, focuses on) desired pieces of knowledge by cognitively navigating through the universe of available concepts. To the extent that a DSS emulates such activity, interacting with it should be relatively “friendly,” natural and comfortable for the user. That is, the DSS can be regarded as an extension of the decision maker’s innate knowledge management capabilities. The decision maker is able to contact and manipulate knowledge embodied in the DSS as a wide range of interrelated concepts. The decision maker is able to navigate through the concepts of the DSS in either a direct or an associative fashion, pausing to interact with it. Thus, the hyperknowledge framework regards a decision support environment ideally as an extension of the user’s mind or cognitive faculties. Its map of concepts and relationships extends the user’s cognitive map, pushing back the cognitive limits on knowledge representation. Its knowledge processing capabilities augment the
Holistic Interaction Between the Computer and the Active Human Being
user’s skills, overcoming cognitive limits on the speed and capacity of human knowledge processing. In the following passages we summarize, on a technical level, the major contents and functionality of a DSS specified as per the hyperknowledge framework. For further details, readers can refer to Chang, Holsapple, and Whinston [12], [7] and also study the prototype applications based on Vanharanta’s framework [13].
4.2 Decision Support Content and Functionality
According to the hyperknowledge framework, a decision support system is defined, architecturally, in terms of a language system [LS], a presentation system [PS], a problem processing system [PPS], and a knowledge system [KS]. The LS is the universe of all requests the DSS can accept from the user, and the PS is the universe of all responses the DSS can yield. The KS is the extensive universe of all knowledge stored in the DSS. The PPS has a wide range of knowledge management capabilities corresponding to a wide range of knowledge representations permitted in the KS. The KS holds concepts that can be related to each other by definition and association. These concepts and their relationships could be formally expressed and processed in terms of database, formal logic and model constructs. Associative and definitional relationships among concepts in the KS are the key to creating a hyperknowledge environment and navigating within it. The KS also contains more than just models and data. It contains reasoning, assimilation, linguistic and presentation knowledge (see Figure 3, the human system metaphor developed by Dos Santos and Holsapple [14]).
Fig. 3. Structure of the decision support system [14]
The dynamics of the DSS involve transformations of messages from the user's language system to the decision support system’s LS. These transformations are carried out by the PPS (subject to the content of the KS) using four basic functions: translation (t), assistance (a), functionality (f), and presentation (p). The user interface and functionality of a DSS specified as per the hyperknowledge framework are depicted in Figure 4.
[Figure 4 (diagram). Recoverable labels: User; User Interface; Functionality; inner loop; outer loop; messages m; functions a, f and p; SI, SO, DI, DO; Language System; Problem Processing System; Knowledge System with Kling, Kreas, Kdesc, Kproc and Kpres.]
Fig. 4. User interface and functionality of the hyperknowledge framework [13]
The knowledge symbols in Figure 4 signify the following:
Kling = linguistic knowledge available in the KS
Kreas = reasoning knowledge available in the KS
Kdesc = descriptive knowledge available in the KS
Kproc = procedural knowledge available in the KS
Kpres = presentation knowledge available in the KS
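To make the four-part architecture more concrete, the following minimal Python sketch shows one way a request could pass from the LS through the PPS to a response in the PS. It is an illustration only, not the authors' implementation: the class names, the dictionary-based KS, and the reading of the four functions t, a, f and p as simple methods are all assumptions made for this example, since the text names the functions but does not specify them in detail.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeSystem:
    """KS: the five kinds of knowledge listed in the legend (hypothetical layout)."""
    linguistic: dict = field(default_factory=dict)    # Kling: request text -> command name
    reasoning: dict = field(default_factory=dict)     # Kreas: not used in this sketch
    descriptive: dict = field(default_factory=dict)   # Kdesc: facts and data
    procedural: dict = field(default_factory=dict)    # Kproc: command name -> callable
    presentation: dict = field(default_factory=dict)  # Kpres: command name -> template


class ProblemProcessingSystem:
    """PPS: processes requests, subject to the content of the KS."""

    def __init__(self, ks):
        self.ks = ks

    def translate(self, request):
        """t (assumed meaning): map a request in the LS to an internal command."""
        return self.ks.linguistic.get(request.strip().lower())

    def assist(self, request):
        """a (assumed meaning): help the user when a request is not in the LS."""
        known = ", ".join(sorted(self.ks.linguistic))
        return f"Unknown request {request!r}. Try one of: {known}"

    def functionality(self, command):
        """f (assumed meaning): run a procedure over the descriptive knowledge."""
        return self.ks.procedural[command](self.ks.descriptive)

    def present(self, command, result):
        """p (assumed meaning): render the result as a response in the PS."""
        template = self.ks.presentation.get(command, "{result}")
        return template.format(result=result)

    def handle(self, request):
        command = self.translate(request)
        if command is None:
            return self.assist(request)
        return self.present(command, self.functionality(command))


# Invented example content, for illustration only.
ks = KnowledgeSystem(
    linguistic={"show total sales": "total_sales"},
    descriptive={"sales": [120, 80, 100]},
    procedural={"total_sales": lambda facts: sum(facts["sales"])},
    presentation={"total_sales": "Total sales: {result}"},
)
dss = ProblemProcessingSystem(ks)
print(dss.handle("Show total sales"))
print(dss.handle("plot margins"))
```

In this sketch the LS is simply the set of keys in the linguistic knowledge, and the PS is whatever the presentation templates can produce. A fuller treatment would also draw on the reasoning knowledge, which is left unused here.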
4.3 Working Space
When a decision maker is working in the hyperknowledge environment, a concept must be “contacted” before it can be “impacted” (affected) by or have an “impact” on the decision maker. Contact is the recognition of a concept in the environment and entails sensing the existence of the concept and bringing it into focus. Either implicitly or explicitly, the user is provided with a “concept map” as the basis for establishing contacts [13]. The concept map indicates what concepts are in the environment and what their interrelationships are. An implicit map is external to the DSS (e.g. in the user’s cognitive environment, which may be burdensome as the KS becomes complex). An explicit map is provided by the DSS itself and can be regarded as a piece of descriptive knowledge held in the KS, describing the present state of its contents. With a concept map as the original contact point within the environment, the user can make controlled, purposeful contacts with any desired concept in the hyperknowledge realm. Users can focus their attention on any part of an image, multiple windows can provide different views of parts of the same image, and different images of the same underlying concept can be seen in various windows. The result is extensive user interface flexibility, which is important in facile and adaptive interface design.
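An explicit concept map of the kind described above can be sketched as a small graph structure. The Python example below is a hypothetical illustration: the class, its method names and the sample concepts are invented, and treating both definitional and associative links as symmetric is a simplifying assumption. It distinguishes direct navigation (contacting any concept on the map) from associative navigation (following an existing link from the concept currently in focus).

```python
from collections import defaultdict


class ConceptMap:
    """Explicit concept map: concepts linked by definition or association."""

    KINDS = ("definition", "association")

    def __init__(self):
        self._links = defaultdict(set)  # concept -> {(kind, neighbour), ...}
        self.focus = None               # concept currently in contact

    def relate(self, a, b, kind):
        """Record a relationship; links are stored in both directions (assumption)."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown relationship kind: {kind}")
        self._links[a].add((kind, b))
        self._links[b].add((kind, a))

    def concepts(self):
        """What the map shows the user: every concept in the environment."""
        return sorted(self._links)

    def contact(self, concept):
        """Direct navigation: bring any known concept into focus."""
        if concept not in self._links:
            raise KeyError(concept)
        self.focus = concept
        return concept

    def neighbours(self, kind=None):
        """Concepts linked to the current focus, optionally filtered by link kind."""
        links = self._links.get(self.focus, ())
        return sorted(n for k, n in links if kind in (None, k))

    def follow(self, concept):
        """Associative navigation: move the focus along an existing link."""
        if concept not in self.neighbours():
            raise ValueError(f"{concept!r} is not linked to {self.focus!r}")
        return self.contact(concept)


# Invented example concepts, for illustration only.
cmap = ConceptMap()
cmap.relate("revenue", "sales volume", "definition")
cmap.relate("revenue", "market share", "association")
cmap.contact("revenue")
print(cmap.neighbours())               # every concept linked to the focus
print(cmap.neighbours("association"))  # associative links only
cmap.follow("market share")
print(cmap.focus)
```

Keeping the map itself as data inside the structure mirrors the idea that an explicit map is a piece of descriptive knowledge held in the KS.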
5 Emerging Paradigm
5.1 Fusion Framework
In the developed computer architecture, we have based our thinking on co-evolution by combining the HCM metaphor, hyperknowledge framework and the Circles of
Mind metaphor in one design framework, i.e. a fusion framework. The basic idea has been to map computer constructs and computer applications according to our theories based on modern brain science and the basics of the HCM and hyperknowledge functionality. With the created fusion framework we can design various computer applications and alter the design of existing knowledge and data bases. First, our created applications contain the same systems as are integral to the human brain, or emulate business processes in the way the brain emulates reality with its own processes. The knowledge structure therefore contains the same important areas as the unconscious part of the brain.
5.2 Functionality of the Fusion Framework
Figure 5 shows the user’s brain processes interacting with the user interface via the computer screen. The functionality is described as the hyperknowledge functionality and the database construction as the unconscious part of the human brain. In contemporary Internet applications, it is possible to navigate through the data and then combine the information according to the user’s needs, just as the hyperknowledge functionality describes the active computer user [10]. Again, these new applications share the same construct – to support the user through the user interface and, furthermore, to support the basic human processes of the mind, i.e. interpretation, memory, motivation and automatic activities. On the other hand, the combination possibilities are huge and, therefore, we have to focus on creating efficient and effective computer content for the computer user in a context-specific situation. In our computer applications, we first describe the content and the objectives of the application itself. The creation of the context-specific ontology then becomes crucial.
Fig. 5. A human compatible computer system (Salminen & Vanharanta 2007)
The construction is new and can be applied in many ways, from application design to database and computer design purposes. Our goal is to demand more from the computer and its application design. We require the design to be more holistic for the user.
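As a rough illustration of how the fusion framework might be used as a design checklist, the Python sketch below tags each feature of an application with the process of the mind it is meant to support and reports which of the four processes remain unsupported. The enum, the function name and the example features are hypothetical and are not taken from the authors' applications.

```python
from enum import Enum


class MindSystem(Enum):
    """The four basic processes of the mind named in Section 5.2."""
    INTERPRETATION = "interpretation"
    MEMORY = "memory"
    MOTIVATION = "motivation"
    AUTOMATIC = "automatic"


def coverage(features):
    """Split the four systems into (supported, missing) for a feature mapping.

    `features` maps a feature name to the MindSystem it is meant to support.
    """
    supported = set(features.values())
    missing = set(MindSystem) - supported
    return supported, missing


# Invented example application, for illustration only.
app_features = {
    "search history": MindSystem.MEMORY,
    "news alerts": MindSystem.AUTOMATIC,
}
supported, missing = coverage(app_features)
print(sorted(s.value for s in supported))
print(sorted(s.value for s in missing))
```

A check of this kind only records the designer's intent; whether a feature actually supports a given process for the user is an empirical question.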
6 Conclusion
In the developed computer architecture, we have based our thinking on co-evolution. In this kind of overall system design, the computer has been illustrated to have the same sub-systems as we have in our brains. This framework can be applied to many different applications which use hardware and software. We can increase our knowledge through computer interaction. Hyperknowledge is then created on the computer screen. The construction as such contains the basic ideas of co-evolution: self-development through the use of and interaction with the computer. Some applications bring the user information automatically and others extend the user’s memory capacity. Some applications also help the user to interpret the current reality, while others may help motivate the user. There are even some applications which support all system areas. Therefore, all applications, to one extent or another, increase and support our brain processes. In the same way, we can work with concepts other than computers within the conscious experience of humans. If we put an object into the conscious experience, for example different business processes, it is possible to create the extroversion of the business processes through an application. The actor can then explore the concept and gain a holistic view and understanding of the matter. These kinds of applications need supporting ontologies, concepts as well as technology, to uncover the underlying models behind the motivation, interpretation, memory and automatic systems and how these different sub-systems can be used in real life applications. These models also need other living system concepts to evolve with the processes and make the applications all the more humanistic.
Acknowledgments. The work behind this paper has been financed by the 4M-project (cf. National Agency of Technology in Finland DNro 770/31/06 and Nro 40189/06) in Industrial management and engineering department at Pori, Finland.
References
1. Rauhala, L.: The Hermeneutic Metascience of Psychoanalysis. Man and World, vol. 5, pp. 273–297 (1972)
2. Rauhala, L.: Ihmiskäsitys ihmistyössä, The Conception of Human Being in Helping People. Gaudeamus, Helsinki (1986)
3. Pihlanto, P.: The Holistic Concept of Man as a Framework for Management Accounting Research. Publications of the Turku School of Economics and Business Administration, Discussion and Working Papers, vol. 5 (1990)
4. Husserl, E.: Husserliana I-XVI, Gesammelte Werke. Martinus Nijhoff, Haag (1963-1973)
5. Heidegger, M.: Being and Time. Blackwell, Oxford (1962)
6. Vanharanta, H.: Circles of mind. Identity and diversity in organizations – building bridges in Europe, Programme, XIth European congress on work and organizational psychology, 14-17 May 2003, Lisboa, Portugal (2003)
7. Chang, A., Holsapple, C.W., Whinston, A.B.: A Hyperknowledge Framework of Decision Support Systems. Information Processing and Management 30(4), 473–498 (1994)
8. Rauhala, L.: Tajunnan itsepuolustus, Self-Defense of the Consciousness. Yliopistopaino, Helsinki (1995)
9. Maslin, K.T.: An Introduction to the Philosophy of Mind, p. 313. Blackwell Publishers, Malden (2001)
10. Vanharanta, H., Pihlanto, P., Chang, A.: Decision Support for Strategic Management in a Hyperknowledge Environment and The Holistic Concept of Man. In: Proceedings of the 30th Annual Hawaii International Conference on Systems Sciences, pp. 243–258. IEEE Computer Society Press, California (1997)
11. Baars, B.J.: In the Theatre of Consciousness. Oxford University Press, Oxford (1997)
12. Chang, A., Holsapple, C.W., Whinston, A.B.: Model Management: Issues and Directions. Decision Support Systems 9(1), 19–37 (1993)
13. Vanharanta, H.: Hyperknowledge and Continuous Strategy in Executive Support System. In: Acta Academiae Aboensis, vol. Ser. B, 55(1), Åbo Akademi Printing Press, Åbo (1995)
14. Dos Santos, B., Holsapple, C.W.: A Framework for Designing Adaptive DSS Interface. Decision Support Systems 5(1), 1–11 (1989)
The Use of Improvisational Role-Play in User Centered Design Processes
Yanna Vogiazou, Jonathan Freeman, and Jane Lessiter
Psychology Department, Goldsmiths College London, University of London, New Cross SE14 6NW
{y.vogiazou, j.freeman, j.lessiter}@gold.ac.uk
Abstract. This paper describes the development and piloting of a user-centered design method which enables participants to actively engage in a creative process to produce intuitive representations and inspire early design concepts for innovative mobile and ubiquitous applications. The research has been produced as part of the EC funded project PASION, aiming to enhance mediated communication in games and collaborative environments through the introduction of socio-emotional information cues, represented in ways that are meaningful yet abstract enough to accommodate variable thresholds of privacy. We describe our design research methodology, which combines analytical approaches, aiming to uncover participants’ needs, desires and perceptions, with creative, generative methods, with which participants inform and inspire the design process.
physiological data over time and the association of the data with communication exchanges with other people (former) or places the users were visiting (latter) at the time the changes in their physiological states were recorded. In a similar line of thought, we are particularly excited by the possibility of introducing such feedback in real-time in social and collaborative contexts and observing the kind of spontaneous individual and group behaviours that could emerge. For this purpose we have adopted a bottom-up, user centered design research approach in order to initially identify how people express various communication, personal and contextual cues in spontaneous ways that make sense to them. The benefits and future opportunities deriving from these research directions span across a range of application areas, in particular applications in which the communication and collaboration of individuals and groups through new technologies takes place. For instance, Reimann and Kay (2005) discuss the role of visualizations for groups of learners in improving upon their knowledge and performance. The authors consider groups as complex systems, where global dynamics can result from local interactions and propose visualizations as a means of providing team awareness. Research in social computing applications (Vogiazou, 2006) has shown that even minimal indicators of other people’s presence facilitate group awareness, which is beneficial for strengthening social bonds among groups and communities. Our interest in patterns of group behaviour and social dynamics in collaborative interactions, in work, learning and leisure oriented activities has motivated the initial phase of the research described in this paper. The goal is to identify through design and user research the kind of socio-emotional cues that can provide useful feedback in communication and to explore emergent group and individual user behaviours from the introduction of such cues. 
The studies we discuss in this paper, which are part of the EC funded PASION (Psychologically Augmented Social Interaction Over Networks) project, aim to identify:
• social, emotional and contextual information elements (situational cues, environmental context, and individual and group behaviours) that are relevant in mediated communication in collaborative work and social settings,
• potential real time and historical representations of these elements in the form of multimodal, non verbal/textual representations, and
• the relevance and importance of such cues in collaborative work and social gaming situations at different levels of privacy disclosure.
Next we describe the design research method we deployed to address these issues.
2 Design Research
The main premises of user centered design are to bring users closer to the design process and to help designers gain empathy with users and their everyday activities through the use of different methodologies. Role playing has been used in user-centred design workshops for the concept generation of innovative products in everyday life (Kuutti et al, 2002) as well as testing out design ideas with potential consumers (Salvador and Sato, 1999). In interaction design research, role playing has
been extensively performed with the use of low-fidelity prototypes to develop further design ideas, which Buchenau and Fulton Suri describe as ‘Experience Prototyping’. It is usually based on improvising user scenarios that create opportunities for some kind of technological intervention or design solution (Laurel, 2003). These scenarios of use are often acted out either by users or designers with some kind of props or imaginary objects, aiming to identify potential breakdowns as well as design opportunities. This method of user involvement in the design process tends to generate potential or ‘futuristic’ functionalities for products that the design team is working on. The functionalities are then eliminated or developed further in the continuing process by the design team. Role play has the main advantage of facilitating empathy with the context of use while trying out early design ideas. When everyday problems that refrigeration technicians are confronted with were acted out, with designers as actors and target users as the audience, Brandt and Grunnet (2000) found that the users recognized the situations shown in the dramatized scenarios as ones they often experienced. The designers who performed the scenarios, on the other hand, found it harder to use drama in an unfamiliar context like this. In another study, role playing was used to elicit a first brainstorm among users about the potential functionalities of an interactive book, the Dynabook, in the home environment. Both studies showed that drama can help designers to achieve a greater empathy for the users and the contexts of use. In our research, role play was not related to a particular prototype or imaginary object aiming to elicit ideas for functionality, but was used as an expressive medium for users to communicate emotional states and contextual situations. The provided props were open to interpretation and aimed to facilitate the acting itself, without binding the design process to a particular artefact.
Previous research in role playing as a design methodology has outlined the difficulties in involving drama professionals as facilitators (Svanaes and Seland, 2004) because: a) introducing users to acting techniques can be very time consuming and is a separate activity from design – with drama exercises lasting for 4,5 hours, the creative sessions need to be arranged at other time slots; and b) drama professionals tend to focus on their subject of expertise – teaching and facilitating the acting – rather than the generation of design ideas, and therefore need to be able to understand the purpose and scope of a generative workshop. In our studies it was important to ensure that the participants were initially immersed in the themes and ideas of the workshop. Following a group discussion, the role playing itself was presented as a game, so there was no need to provide any training in performance; it was sufficient to describe the activity and act out an example of what was asked, introduced by one of the facilitators.
An innovative research method, combining analytical and generative approaches, was developed and deployed in two user group workshops, which focused on collaborative work (at the Center for Knowledge and Innovation Research, in the Helsinki School of Economics, in Finland) and social gaming (at the Department of Computing and Informatics in the University of Lincoln, in the UK) respectively. The workshops were designed to identify relevant and potentially useful elements of personal, social and contextual information, represented in meaningful ways to be readily interpretable. User attitudes in relation to privacy and comfort with sharing these information cues were also explored.
Both the collaborative work and social gaming workshops followed a similar structure, which encouraged participants to get immersed in the subject and discuss their views, before engaging in creative activities that required them to generate ideas and concepts for representations. The phases can be summarized as follows:
• General group discussion and brainstorm. The discussion was focused on everyday collaborative work practices and different forms of play in either workshop, aiming to identify relevant information elements about individuals, teams and context.
• Feedback on early sketches. Participants were shown the same set of rough sketches (see figure 1 for example), representing individual, collective and contextual states and cues and were asked to guess what they were meant to suggest. This initiated further discussion and suggestions on non-verbal representations. At the same time this activity acted as a warm-up, to prepare the generative session that followed and inspire participants to think about representations in a more abstract, broader sense.
• Improvisational role playing. The role-playing was performed individually by each participant to come up with creative ideas about representing information using different modalities (e.g. visual, sounds, actions). Here we focus on this method in particular.
• Card sorting activities. In this last task participants prioritized and grouped the main information elements that emerged in the initial group discussion. They were also asked to comment on when these elements need to be private and when they can be public.
Fig. 1. Left: sketch of a group state indicating collective activity (movement, excitement). Right: sketch of an individual in a calm environment.
For the workshop on collaborative work six male participants were recruited from the age group of 24-40, with professional experience of collaborative work, either as researchers or PhD students. For the social gaming workshop nine participants were recruited from the age group of 17-40, five of whom were female and four were male. Four participants were pursuing a postgraduate degree and the other five were A Level Psychology students. Participants had variable gaming experiences, ranging from online massively multiplayer games to traditional board and card games and physical street games. The role playing was not used as a re-enactment of a user scenario or for the evaluation of a design concept, but in an entirely generative way: participants were asked to act out in a non verbal way different situations that were relevant to the workshop theme. For example, some situations in collaborative work were: “You are confused by what your manager is saying to you in a conversation”, “You are very
stressed about a forthcoming deadline”, “Being on the bus or train to work, very crowded during peak time”. Situations related to social gaming were along the lines of: “You and your team are exploring a new area – a danger is approaching”, “You have developed bonds with a team of people”, “You are playing a mobile game in a really crowded café”. Participants were asked to pick one situation and one modality that they should use for their representations from a box, both written on strips of paper. Examples of modalities were: “Draw on paper”, “Act out a situation, improvise, mimic an activity” and “Make a sound orally”. The activity was introduced as a game of ‘Charades’, which appears in variations across cultures. Part of the challenge for participants was to represent individual, group and contextual information cues in a way that the rest of the group could guess what was being represented. Various props (e.g. a tambourine, plasticine, paper, coloured pens and cups) were brought and used to express different modalities (e.g. auditory, visual, tactile). The workshop was recorded on audio and video. The video recording of the role playing workshop was used for the further generation of concepts and design ideas. Video recordings, still photographs and sketches from these role-playing activities were then used as generators (Holmquist, 2005): they formed part of a process that generated inspiration, insights and ideas – the beginning rather than the end in concept development.
Following the two workshops we organised a third one (at Goldsmiths College, University of London), which was primarily generative, aiming to explore in more depth the key emergent themes from the previous workshops. We used a similar method to the one used in the previous workshops. Two teams of graduate designers aged 23-30 were recruited (5 male, 1 female) to generate a breadth of concepts and multidimensional representations of individual, collective and contextual states.
The workshop was structured as follows:
• Brainstorm and concept mapping. Participants were asked to discuss the key concepts of ‘group power’ and ‘connecting’ in the context of different situations, taking into account location, user attributes and collaboration, either work or leisure related. They documented the generated ideas by drawing collaboratively a ‘concept map’ (Novak, 1998) on large sheets of paper. This acted as point of reference for further discussion and debate around the ideas.
• Individual role play. Role-playing activity performed individually to come up with creative ideas about representing various situations using different modalities (e.g. visual, sounds, drawing, modeling, actions). Similarly to the earlier workshops, participants had to choose randomly a ‘situation’ to represent and a modality to use.
• Collaborative role play. Role-playing in pairs: participants acted out together an idea they generated in the earlier discussion using various props.
A range of props was provided to facilitate improvisation and idea generation on the fly, including a mixer with many different sounds in order to experiment with representations (figure 2). The mixer had two CDs: one with ambient sounds (e.g. park, street noises) and another CD with short sound effects (e.g. clapping, stampede). These could also be used in combination with a touch microphone attached under the table. The touch microphone allowed participants to produce sounds spontaneously by tapping on the table or moving objects on its surface, enhancing the role playing experience and the richness of the representations created. The design graduates
Fig. 2. Participants exploring and then using the audio equipment for sound based representations
produced detailed multimodal representations using, for example, samples of background sounds to represent emotional states and environmental situations, and combining traditional design processes like sketching and modelling with acting.
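For facilitators who want to reproduce the random pairing of situation and modality used in the individual role play, the draw from the box can be sketched in a few lines of Python. This is only an illustration: the function name and participant labels are invented, the lists contain just the examples quoted in the text, and drawing with replacement (so that two participants may receive the same strip) is an assumption, since the paper does not say whether strips were returned to the box.

```python
import random

# Example strips quoted in the text; the real boxes held more than these.
SITUATIONS = [
    "You are confused by what your manager is saying to you in a conversation",
    "You are very stressed about a forthcoming deadline",
    "You and your team are exploring a new area – a danger is approaching",
    "You are playing a mobile game in a really crowded café",
]
MODALITIES = [
    "Draw on paper",
    "Act out a situation, improvise, mimic an activity",
    "Make a sound orally",
]


def draw_tasks(participants, situations, modalities, seed=None):
    """Give each participant one random situation and one random modality.

    Assumption: strips go back into the box after each draw, so repeats
    across participants are possible.
    """
    rng = random.Random(seed)
    return {
        name: (rng.choice(situations), rng.choice(modalities))
        for name in participants
    }


tasks = draw_tasks(["P1", "P2", "P3"], SITUATIONS, MODALITIES, seed=1)
for name, (situation, modality) in tasks.items():
    print(f"{name}: {modality} -> {situation}")
```

Passing a fixed seed makes a draw repeatable, which can help when documenting a session afterwards; leaving it out gives a fresh draw each time.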
3 Insights from Improvisational Role-Playing Activities
The two user workshops in Finland and the UK produced spontaneous representations with noticeable cross-cultural similarities. For example, we observed an open posture, shaking hands (or a tambourine) as a representation of positive affect relating to celebration or excitement. A more closed body posture indicated negative affect, namely confusion or sadness in collaborative work and gaming scenarios respectively. Participants in all three workshops engaged with the process and gave positive feedback; their interactions became more spontaneous during role play. Their nonverbal representations were very compelling in presentation and encouraged the continuous involvement of the rest of the group, as they tried to guess what was being represented as closely as possible. We found it easy to change the activity on the fly in the workshop, because of its flexible and non-prescriptive nature; participants could act out representations on their own or improvise collaboratively in pairs. The collaborative acts were more detailed and made extensive use of the available props. In the first two workshops, participants used visual representations and in particular actions (as opposed to static poses or drawing) more than other modalities, in spite of encouragement to explore all modalities. Often participants would try to combine modalities (e.g. drawing and then performing some gestures in relation to the drawing) in order to communicate their situation more accurately. The design graduates who participated in the third workshop discovered the use of sound as a powerful creative tool through mixing the different sounds provided. Modifying the available props in future workshops could therefore reveal more emergent representations and encourage more diverse improvisational play.
Below we present a selection of the generated representations that illustrate a variety of individual, group or contextual cues in collaborative work practice and social gaming, using various media (e.g. sound, hand gestures, poses, drawing and actions).
1. Using particular postures and movement to indicate personal states (confusion, stress, sadness)
Posture turned out to be a powerful means of communicating individual emotional states and social cues in role play. The postures and expressions of confusion (figure 3) were sketched out after the workshop to illustrate possible visual representations of confusion in technology mediated communication.
Fig. 3. Two different postures showing lack of understanding, confusion in communication. Sketches outlining the posture used to represent confusion and lack of understanding.
Fig. 4. One participant kept moving in a loop to indicate a high level of stress
Even when not using the whole body for acting, posture could be suggested by other means. A participant in the social gaming workshop communicated body posture in a rather abstract way using his fingers. An imaginary sad figure, represented by a bent finger and one finger moving away from it, showed the growing distance between two team players. This inspired the sketch in figure 5:
Fig. 5. Growing distance between two players
2. Using a continuous sound for ‘context’ and short sounds for an individual state or alert
Sound was used to communicate a sense of atmosphere. One participant in Finland made a continuous noise orally (i.e. “bla bla bla bla bla”) occasionally interrupted by sounds of yawning to suggest boredom during a presentation, which the rest of the group understood. A social gaming participant produced an intense and continuous sound alert to indicate approaching danger. The sound (generated by beating a spoon inside a glass) became more intense and loud towards an imaginary player (represented by an object) to signify some kind of danger getting closer. This was also easily perceived by the rest of the group. In the third workshop, graduate designers experimented by combining techniques they were familiar with such as sketching, with acting or sound creation on the fly, by using a combination of sounds from the mixer. Sound was a good tool for communicating environmental cues. For example, the noise of traffic and a stampede of horses were played to represent crowd flow, while the designer drew the sketch in figure 6 to illustrate the flow of people towards different directions in rush hour.
Fig. 6. Crowd flow during rush hour, accompanied by the combination of city sounds with the noise of a stampede
3. Using lively sounds and an open posture to indicate excitement
Open postures were used in all workshops to communicate positive affect, with cross-cultural similarities. One participant in the collaborative work workshop (Finland) held the tambourine up and shook it to show joy. Similarly, in the social gaming workshop (Lincoln) another participant used an open gesture and moving wrists to celebrate victory in a game.
Fig. 7. An expression of celebration of success (left) and victory in a game. The middle sketch illustrates the same posture.
Similar representations of excitement emerged in the third generative workshop, illustrated by shaking a pair of maracas. Other representations included an exclamation mark made from plasticine and drawings of ‘emoticons’ (smileys), a rather common representation of joy. Excitement was also communicated through a juxtaposition of natural sounds: the sound of animals (monkeys) against a background of calm, environmental sound.
4. Size representing status indicator
In the social gaming workshop, the size of a figure was used in a drawing to suggest that a player holds a higher status in a game.
Fig. 8. Size of figure represents hierarchical status; circle indicates one’s own team
5. Representations of private space
Participants’ attitudes to privacy issues were explored through discussion and card sorting in the first two workshops to identify different levels of privacy. The concept of personal or ‘communication-free’ space also emerged in the third, creative workshop, in which ideas on privacy and personal space were visualised in different ways, for example by marking the space with a line made out of objects or by creating an ‘isolation tank’ which completely disconnects all communication and external stimuli. The ‘tank’ was also sketched as an ‘isolation island’, a kind of mobile ‘cloud’ that protects the person from the intrusion of wireless communication when this is not desired. In some of the performances in the third workshop, a participant would try to engage a ‘stranger’ in conversation, for example by playing lively natural sounds (e.g. monkeys), making eye contact, pointing out objects, getting closer to the other person or drawing links between individuals to show connection. The other person would respond by trying to maintain his or her privacy, for example by hiding behind sunglasses or a book and moving further away. This performance also illustrated the idea of a state of ‘disconnection’ and of maintaining one’s privacy and personal space.
Fig. 9. Different representations of private space
6. Varying degrees of disagreement were represented with ‘emoticons’ and gestures
In the third workshop, the design graduates created representations to show varying degrees of disagreement within a group discussion or disapproval of a person. For example, in the sketch in figure 10, gradual disagreement is represented by a ‘smiley’ that eventually stops smiling and responds with ‘abuse’. Another participant drew different icons for disagreement and then smashed a plasticine model of himself to show complete rejection and exclusion. An interesting representation through role-play, which was fun to observe, was performed by a designer who pretended he was having a discussion with another participant (who had no idea what he was trying to communicate). He made gestures of anger, shaking his finger at the other person, and then hit his fist on the table with the contact microphone, making a very loud sound, and ripped up a sheet of paper.
Fig. 10. A sketch of a meeting in which gradual disagreement results in abuse! Different indicators of disagreement
4 Conclusion
The combination of analytical and generative methods worked well: it first immersed users in the ideas of the PASION project, helping to identify their needs and desires, and then engaged them in communicating those ideas in interesting ways that can be further explored and developed through a design process. The initial discussions introduced participants to the themes of collaboration, connection to other people and non-verbal communication. Asking participants to guess what the sketchy drawings meant was also a way of encouraging them to consider more abstract non-verbal representations of personal, environmental and collective states, and it set the scene for the role play. By introducing the role play activity as a fun ‘Charades’ game and demonstrating an example, we shifted the focus from trying to be a good actor to trying to come up with interesting ideas. Role playing and experimentation with different media also opened up a range of creative possibilities for the participating graduate designers, enabling them to enrich initial ideas and to bring them to life, from a one-line sentence written on a piece of paper to an engaging performance. Unlike other uses of role play, in which a user scenario is acted out to identify product functionalities or solutions to design problems, the activity was not bound to a particular artefact or technology; as a result, the generated representations were open to interpretation and diverse in their use of expressive media (actions, props, sound, drawing). In the future, we would like to see how this kind of improvisational role play can be applied in the exploratory design research phase for other innovative products and applications, which are not necessarily focused on non-verbal representations.
The concepts generated through the activities discussed in this paper demonstrate that improvisational role playing can be a powerful tool for both participants and designers: a) enabling participants to engage creatively in user centred design workgroups, and b) generating useful initial user input for the design process that can then be developed further for the design of easily interpretable and intuitive visualizations and interfaces. The method proved cost and time effective compared to other role playing methodologies that involve drama professionals as facilitators and require some training in acting. Most importantly, the method generated valuable concepts and ideas for novel representations of socio-emotional and situational states, which became part of the core design process for the PASION project. These representations are currently being developed further through sketching, mock-ups for application concepts and user interface design elements that can be trialed with users.
Acknowledgements. The research is supported by the European Community under the Information Society Technologies (IST) programme of the 6th Framework Programme (FP6-2004-IST-4) – Project PASION (Psychologically Augmented Social Interaction Over Networks). The authors would like to thank all our participants and Nela Brown (sound artist) who planned and arranged the set up for the sound experimentation in the third workshop.
References
1. Charades, Wikipedia definition and rules of play. Available online at http://en.wikipedia.org/wiki/Charades (last accessed on 3/11/06)
2. Brandt, E., Grunnet, C.: Evoking the future: drama and props in user centered design. In: Cherkasky, T., Greenbaum, J., Mambrey, P. (eds.) Proceedings of Participatory Design Conference, New York, CPSR (2000)
3. Buchenau, M., Fulton Suri, J.: Experience Prototyping. In: Proceedings of the DIS2000 conference, pp. 424–433. ACM Press, New York (2000)
4. Holmquist, L.: Practice: design: Prototyping: generating ideas or cargo cult designs? Interactions of the ACM 12(2), 48–54 (2005)
5. Kuutti, K., Iacucci, G., Iacucci, C.: Acting to Know: Improving Creativity in the Design of Mobile Services by Using Performances. In: Proceedings of the 4th Conference on Creativity & Cognition, Loughborough, UK (2002)
6. Laurel, B.: Design Research: Methods and perspectives, pp. 49–55. The MIT Press, Cambridge, MA London (2003)
7. Lindström, M., Ståhl, S., Höök, K., Sundström, P., Laaksolathi, J., Combetto, M., Taylor, A., Bresin, R.: Affective diary: designing for bodily expressiveness and self-reflection. In: CHI ’06 Extended Abstracts on Human Factors in Computing Systems, Montréal, Québec, Canada. ACM Press, San Francisco (2006)
8. Nold, C.: BioMapping Project. Available online at http://biomapping.net/press.htm (last accessed on 25/01/06)
9. Novak, J.D.: Learning, Creating, and Using Knowledge: Concept maps as facilitative tools for schools and corporations. Lawrence Erlbaum & Assoc., Mahwah, N.J.
10. Paulos, E.: Connexus: a communal interface. In: Proceedings of the 2003 conference on Designing for user experiences, pp. 1–4. ACM Press, San Francisco (2003)
11. Reimann, P., Kay, J.: Adaptive visualization of user models to support group coordination processes. In: Paper presented at the 2nd Joint Workshop of Cognition and Learning through Media-Communication for Advanced e-learning, Tokyo, Japan (2005)
12. Salvador, T., Sato, S.: Playacting and Focus Troupe: Theater techniques for creating quick, intense, immersive, and engaging focus group sessions. Interactions of the ACM 6(5), 35–41 (1999)
13. Svanaes, D., Seland, G.: Putting the users center stage: role playing and low-fi prototyping enable end users to design mobile systems. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’04, Vienna, Austria. ACM Press, New York (2004)
14. Vogiazou, Y.: Design for emergence: collaborative social play with online and location-based media. IOS Press, Amsterdam (2006)
Quantifying the Narration Board for Visualising Final Design Concepts by Interface Designers
Chui Yin Wong and Chee Weng Khong
Interface Design Department, Faculty of Creative Multimedia, Multimedia University, 63100 Cyberjaya, Malaysia
{cywong, cwkhong}@mmu.edu.my
Abstract. The narration board is a powerful design tool that helps translate user observation studies into a storytelling format. It helps to communicate design values and ideas among the design team by visualising user scenarios in their proper context during the early design stages. This paper discusses the narration board as a design tool that helps the design team conceptualise and visualise user scenarios interacting with future design concepts within their context of use. The second part of the paper discusses how narration boards assist interface designers in generating ideations and visualising final design concepts. Twenty design projects (N=20) were examined to study and quantify two important factors, i.e. the components of the narration board in relation to the attributes of the final design concepts. A non-parametric correlation test was used to study the correlation coefficient between the scores of the two factors. The results show that there is a statistically significant positive correlation between the components of the narration board and the attributes of the final design concept. Projects with higher narration board component scores tend to produce better final design concepts, and vice versa.
Keywords: Narration, Interface Design, Storyboard, and design concepts.
multi-disciplinary team to share the same vision and theme on a design project, there needs to be a means for communicating high-level concept designs across the team. Hence, narration, or storytelling, has become an important channel for depicting scenarios and sharing visions of design ideas and concepts in the design process. The objectives of this paper are twofold: firstly, to discuss the narration board as a design tool that helps the design team conceptualise and visualise user scenarios interacting with future design concepts within their context of use; secondly, to quantify the components of the narration (story) board in relation to the attributes of the final design concepts.
2 Storytelling and Scenarios
2.1 Rationale of Storytelling
User researchers or ethnographers conduct user studies to elicit user requirements during the early stages of the design process. The aim is to gain a closer understanding of how users behave and interact with artefacts within the real environment. Such studies highlight social activities, trends and values, which are then analysed and incorporated in the scenario-building process to depict user personas in the context of use. Storytelling is perceived as an acceptable channel for sharing similar beliefs and thoughts among a community. In general, people remember stories more easily than design principles, facts and figures. There are several reasons why stories are good communication catalysts for a design team [5], [11]:
• Stories communicate ideas holistically, conveying rich yet clear messages. They are therefore an excellent way of communicating intricate ideas and concepts in an easy-to-understand format, and they allow people to convey tacit knowledge that might otherwise be difficult to articulate.
• Stories are easily remembered by people because they are circulated with human passion and emotions.
• Stories aid team-building, as they become a communication tool for sharing similar user-activity events and information that help in constructing a vision. They ease the communication flow by nurturing a sense of community and help to build relationships, especially in a multi-disciplinary design team.
• Storytelling provides the context in which knowledge arises as well as the knowledge itself, and hence it can increase the likelihood of accurate and meaningful transfer of knowledge.
2.2 Adoption of Storytelling into Scenarios
Storytelling has been widely adopted in different disciplines, particularly in film, animation, education, design and business.
For instance, Walt Disney uses storyboards for creating motion pictures and animation characters in its film production process. In the business world, multi-national companies such as IBM, through its Knowledge Socialisation Project [6], use storytelling to share business visions within
the organizations. Instructional designers may use storyboards to create learning objects for courseware design whilst developing educational systems. In design practice, designers have used storytelling to share conceptual design prototypes and design solutions across the design team. Stories and event scenarios are collected from observational fieldwork studies to share user behaviour, cultural beliefs and insights with the whole design team in support of design strategy. Stories are concrete accounts of particular people and events in particular situations; scenarios are often more abstract, being scripts of events that may leave out details of history, motivation, and personality [5]. Despite these differences, storytelling and scenarios are intertwined, and it is difficult to distinguish a design story from a user-interaction scenario. In the user requirement stage, user researchers collect user stories and observational information from fieldwork studies. The observational data is then analysed and translated into various themes and consumer insights. This helps to create realistic examples and build scenarios as shared stories in the design team. User profiles, characters and goals form personas in the scenario-building process. Cooper [4] first proposed the concept of the persona; it has since been widely applied in academic and industrial practice and integrated into various design projects. In essence, a persona is an archetypal person representing a user profile, whereas scenarios describe how a person interacts with the product in the context of use. As mentioned earlier, stories are easily memorized by people; the medium used to present them is therefore crucial in making the stories memorable and in ensuring that the shared visions are understood within the design team. Rosson and Carroll [10] described the elements of user-interaction scenarios as setting, actors, task goals, plans, evaluation, actions and events.
However, these design scenario activities are conventionally illustrated as text-based descriptions that embed the characteristic elements of user-interaction scenarios. The next section therefore describes how narrative scenarios are illustrated in pictorial form to conceptualise high-level user-interaction scenarios.
3 Narration in Context
3.1 Narration in the Design Process
Narration has been used and applied in different phases of the design process. Lelie [8] described the value of storyboards in the product design process, using the term “storyboard” rather than narration board. In each phase of the design process, the storyboard has its own style of exploring, developing, discussing or presenting the product-user interaction. The design process ranges from the analysis, synthesis, simulation and evaluation phases to the decision phase. The visualisation style differs in each phase according to the design activities, the purpose or goals, and the form of representation [8]. In our context, we discuss how narration boards are used in the interface design process during the early conceptual design stages for ideation purposes. Figure 1 shows the detailed requirements in the conceptual design phase for interface designers. Two types of narration boards are adopted, namely the Narration Board (pre-ideation) and the Narration Board (post-ideation). For the Narration Board (pre-ideation),
interface designers are required to translate the results of observation studies and market research into problem scenarios highlighting the problems or any issues that users face in the real environment. Different design aids such as mood boards, product design specification, and product positioning are also developed in assisting designers to achieve a holistic grasp on the concept designs being developed. The interface designers will then be required to produce another Narration Board (post-ideation) to project how their concept designs will be used in future scenarios.
Fig. 1. Brief Conceptual Design Phase (Research – User Studies – Ideation/Conceptual Design – Prototype – (Re)evaluate)
In the realm of interface design, communication between designers and other team members is important for a successful design project. The narration board is a valuable design tool for the design team, as it provides a common visual medium for sharing an understanding of future design developments. Conventionally, scenarios are illustrated in textual descriptions to portray user-interaction scenarios [10]. For designers, visual media are important in helping them to ‘visualise’ and develop ideations for future design solutions. In such circumstances, scenarios described in visual form and accompanied by text explanations serve the communication purpose within the design team. Moreover, visual narrative is a valuable aid for provoking the thinking process, evoking ideations and spurring creativity to higher levels for interface designers. Several types of medium have been used to illustrate narration or storytelling in either analogue or digital format, such as hand drawing, sketching, photography and video [2], [8]. Some software tools have also been developed for storytelling, such as DEMAIS [1] and Comic Book Creator™ [3]. In developing narration boards, the interface designers are required to consider the characteristics of user personas, scenarios and context of use. They can select any medium of communication to illustrate the narrative scenarios. Given time and cost considerations, hand sketching, marker rendering and drawing on layout pads are the most cost-effective options. The designers then scan their narrative scenarios into digital formats, which can be posted online for sharing purposes. Alternatively, the interface designers can work from the photographs they captured during their observation studies, using graphical software such as Adobe Flash™, Adobe Photoshop™ or Comic Book Creator™.
3.2 Types of Narration Boards
Narration boards also play an important role in bridging the communication gap between the design team and other parties such as top management, the manufacturing department and the clients themselves. Top management and clients usually do not have ample time to go through the detailed design levels. Hence, the narration board assists in projecting the problem scenarios of the user experience. This is illustrated in the Narration Board (pre-ideation) (figure 2). On the
Fig. 2. An example of a Narration Board (pre-ideation) depicting a scenario of a primary school pupil who is robbed on his way home from school
Fig. 3. An example of a Narration Board (post-ideation) illustrating a scenario of how E-Hovx plays a role in protecting the primary school pupil from a potential robbery
other hand, top management and clients will be able to grasp the design solutions from the illustration of how the intended users interact with the new product design concepts or design solutions in the future scenarios, as demonstrated in the Narration Board (post-ideation) (figure 3). As an example, the E-Hovx project depicts a scenario in which a primary school pupil encounters danger as he is robbed on his way home from school (figure 2). Figure 3 shows how the E-Hovx device concept assists in the scenario by producing an alarm to alert the pupil and to ward off any potential harm.
4 Evaluating Narration Board for Visualising Final Design Concept
4.1 Methodology
In order to evaluate how effective the narration board (pre-ideation) is as a design tool in assisting interface designers to generate ideations and visualise final design concepts, a usability specialist conducted an empirical study to examine the relation between two variables: the narration board and the final design concept. The study examined twenty different design projects developed by interface designers as test subjects (sample size N=20) at the Interface Design Department. The null hypothesis (H0) is that “there is no relation between narration board and final design concept”. The alternative hypothesis (H1) is that “there is a positive association between the narration board (pre-ideation) and the final design concept for a design project”. To produce a successful narration board, there are certain elements that designers need to highlight. Truong et al. [12] identified five significant elements that a narration board needs in order to convey its narrative to the design team: level of detail, inclusion of text, inclusion of people and emotions, number of frames, and portrayal of time. There are likewise five attributes that determine how usable and functional the final design concepts derived from the input of the narration board are. These five attributes of the final design concept, generated in the later conceptual design stage, are form and functionality, usability (ease of use), user-artefact illustration, product semantics, and design appeal (emotional and mood). This study looks at 20 design projects (DPs) developed by interface designers addressing a common theme of “i-Companion”. The DPs were selected based on the inclusion of a narration board (pre-ideation) and a final design concept in the design process.
To quantify the effectiveness of the narration board, the usability specialist scored each element of the narration board on a 1-5 point Likert scale (1 means the element is applied least, 5 means it is applied most). The elements are level of detail, inclusion of text, inclusion of people and emotions, number of frames, and portrayal of time. A final score was then given to each of the 20 DPs as the sum of the 5 narration board element scores. Similarly, to evaluate the output of final design concepts, the final design concept scores were calculated as the total sum of the 5 attributes, i.e. form and
functionality, usability, user-artefact illustration, product semantics, and design appeal (emotional and mood) on the 20 DPs respectively.
4.2 Results, Data Analysis and Discussion
Result. The table below (table 1) shows the summary of the final scores of narration board and final design concepts for the 20 DPs.
Table 1. A summary of the final scores on Narration Board and Final Design Concepts for 20 Design Projects (first column: Design Project (DP))
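The scoring scheme described in Section 4.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names come from the paper, but the function name `total_score` and the example ratings are hypothetical and are not the study's data. The same function could be applied to the five final design concept attributes, assuming they are rated on the same 1-5 scale.

```python
# Element names follow the paper; everything else here is a hypothetical sketch.
NARRATION_ELEMENTS = (
    "level of detail",
    "inclusion of text",
    "inclusion of people and emotions",
    "number of frames",
    "portrayal of time",
)


def total_score(ratings, elements=NARRATION_ELEMENTS):
    """Sum one 1-5 Likert rating per element into a final score (range 5-25)."""
    missing = [e for e in elements if e not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for element in elements:
        if not 1 <= ratings[element] <= 5:
            raise ValueError(f"rating for {element!r} must be between 1 and 5")
    return sum(ratings[e] for e in elements)


# Invented ratings for one imaginary design project (not taken from Table 1).
example_ratings = {
    "level of detail": 4,
    "inclusion of text": 3,
    "inclusion of people and emotions": 5,
    "number of frames": 4,
    "portrayal of time": 2,
}
example_total = total_score(example_ratings)
print(example_total)  # 18
```

With five elements each rated from 1 to 5, every final score lies between 5 and 25, which is the range on which the two variables are later compared.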
Data Analysis. To examine the relation between the two variables (narration board and final design concept), a non-parametric Spearman’s rho test was conducted to obtain the correlation coefficient for the sample size (N) of 20. Table 2 below shows the ‘correlations’ matrix of the two variables (scores of narration board and final design concept). According to the table, there is a statistically significant positive correlation between narration board and final design concept scores (rho=0.78, df = 18, p<0.001). Thus, projects with higher narration board component scores tend to produce better final design concepts, and vice versa. In order to check for a curvilinear relationship or for outliers affecting the correlation, the final scores of the two variables (narration board scores and final design concept scores) were plotted in a scatterplot (figure 4). The scatterplot showed no evidence of a curvilinear relationship or of undue influence of outliers.
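For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of Spearman's rho, computed as the Pearson correlation of the two rank vectors with tied values given their average rank. It does not reproduce the statistical package used in the study; the function names and the five-project example data are hypothetical. The helper `t_statistic` uses the common t approximation with N-2 degrees of freedom, which matches the df = 18 reported for N = 20.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long samples with at least 3 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(rho, n):
    """t approximation for testing rho against zero, with n - 2 degrees of freedom."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


# Invented scores for five imaginary projects (not the study's data).
narration = [12, 15, 18, 20, 23]
concept = [11, 16, 14, 22, 19]
rho_example = spearman_rho(narration, concept)
print(round(rho_example, 2))  # 0.8

# The reported rho = 0.78 with N = 20 corresponds to a t value of about 5.29.
print(round(t_statistic(0.78, 20), 2))  # 5.29
```

Working on ranks rather than raw totals is what makes the test non-parametric: only the ordering of the design projects on each variable matters, not the distances between their scores.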
Table 2. Spearman’s correlation produced by Correlate for the two variables (narration board and final design concept). Rows and columns: Narration Scores and Concept Scores; each cell lists the Correlation Coefficient, Sig. (2-tailed) and N. ** Correlation is significant at the 0.01 level (2-tailed).
Fig. 4. A scatterplot diagram showing the relation of narration board scores and final design concept scores for 20 design projects
Discussion. The result showed a statistically significant positive relation between the two variables of narration board and final design concept. In other words, higher narration board scores tend to go with higher final design concept scores. This is consistent with the expectation that an interface designer who is proficient in design skills such as visualisation, drawing and sketching, and who is creative, would be able to generate better design solutions and more usable final design concepts. However, there are some exceptional cases, such as DP 5 and DP 8, which had higher narration board scores but lower final design concept scores. A likely reason is time constraints: the interface designers may have spent too much time and effort on producing the narration board (pre-ideation) and other design tools, leaving less time for developing ideations at the final conceptual design stage. In addition, the more elements of the narration board are applied, the higher the impact and the better the design solutions produced by interface designers. For
instance, the narration board (pre-ideation) in figure 2 shows a considerable level of detail: richness in colour, 8 narrated frames numbered in sequence, appropriate inclusion of text in dialogues, portrayal of time and location, facial expressions of the users in the scenarios, and issues highlighted in the user-scenario context that reveal user requirements for problem-solving solutions. As a result, a device called E-Hovx was produced as the final design concept to help resolve the scenarios, as shown in figure 3. The E-Hovx device was also produced as a high resolution 3D model and as a final prototype using a Rapid Prototyping machine for detailed visualisation and testing purposes.
5 Conclusion
In general, narration boards provide a visual reference for interface designers, both for illustrating user problems via scenarios and as a medium for promoting future user-interaction scenarios. In many ways, the approach is visually tacit. When applied in a holistic manner, narration boards are powerful visual communication aids that can provide input towards design strategies. Their use not only benefits the design team but also helps to convey ideations to top management and to clients. The projects suggest that designers show more panache and creativity when employing mixed media to produce narration boards. In a nutshell, a narration board with the required components of level of detail, inclusion of text, inclusion of people and emotions, appropriate number of frames and portrayal of time will greatly help interface designers in visualising and generating final design concepts. Future studies will examine the relation of each component of the narration board to the attributes of the final design concepts, and will also investigate which components contribute most to a usable and effective final design concept. Apart from this, expert opinions from multi-disciplinary teams will be gathered to examine the uptake of the narration board in visualising final design solutions in practice.
Acknowledgments. The authors would like to express their gratitude to the staff and students involved in the design projects at the Interface Design Department, Faculty of Creative Multimedia at Multimedia University, Malaysia. We also wish to thank Selina Ooi, whose works are mentioned in the paper.
Disclaimer. The authors wish to emphasize that the names and images shown in this paper are only for training and educational purposes, and do not intentionally infringe the rights of individuals or organisations.
Any company names, registered trademarks or commercial names mentioned or shown in the text solely belong to their respective owners, organisations and institutions.
References
1. Bailey, B., Konstan, J., Carlis, J.: DEMAIS: Designing Multimedia Applications with Interactive Storyboards. In: MM01, Ottawa, Canada (Sept. 30 – Oct 5, 2001), pp. 241–250. ACM Press, New York (2001)
2. Cheng, K.: Storytelling Techniques Workshop. Available online at http://www.okcancel.com/archives/article/2006/02/storytelling-techniques-workshop.html
3. Comic Book Creator. Available online at http://www.planetwidegames.com
4. Cooper, A.: The Inmates are running the asylum. SAMS, Indiana (1999)
5. Erickson, T.: Notes on Design Practice: Stories and Prototypes as Catalysts for Communication. Available online at http://www.pliant.org/personal/Tom_Erickson/Stories.html
6. IBM Knowledge Socialization Project. Available online at http://www.research.ibm.com/knowsoc/
7. Khong, C.W.: A Review of Applied Ergonomics Techniques Adopted by Product Designers. In: Lim, K.Y., et al. (eds.) Proceedings of 4th APCHI and 6th ASEAN Ergonomics 2000 (APCHI/SEAES 2000), pp. 317–322. Elsevier, Singapore (2000)
8. Lelie, C.: The value of storyboards in the product design process. Personal and Ubiquitous Computing 10(2), 159–162 (2006)
9. Pedell, S., Vetere, F.: Visualizing use context with picture scenarios in the design process. In: Mobile HCI 2005, Salzburg, Austria, pp. 271–274 (2005)
10. Rosson, M., Carroll, J.: Usability Engineering: Scenario-based Development of Human-computer Interaction. Morgan Kaufmann Publishers, San Francisco (2002)
11. Specialist Library Knowledge Management. Storytelling. Available online at http://www.nelh.nhs.uk/knowledge_management/km2/storytelling_toolkit.asp
12. Truong, K.N., Hayes, G.R., Abowd, G.D.: Storyboarding: An Empirical Determination of Best Practices and Effective Guidelines. In: Carroll, J. (ed.) Proceedings of the 6th ACM conference on Designing Interactive Systems, pp. 12–21. ACM Press, University Park, PA, USA (2006)
Scenario-Based Installability Design
Xiao Shanghong
UCD Research Department, Huawei Technologies Co. Ltd., China
Abstract. We introduce a user scenario-based approach to installability design. The basic idea is to observe how users complete the installation and thereby understand their experience, skills, and operating habits through onsite surveys. Special attention is paid to the installation time, which affects efficiency, to the problems encountered during installation, and to how users solve these problems. Four main issues need to be considered: how to select typical users, how to conduct installability task analysis, how to define the scenario, and how to conduct the installability user test.
Keywords: Installability, Typical Users, Task Analysis, Scenario Definition, User Test.
1 Introduction
Product installability design has aroused wide concern among R&D staff. They welcome the idea of approaching installability design from the user’s point of view. However, they do not have a clear idea of how to take user requirements into consideration and incorporate them into their designs. Huawei promotes UCD and lists the product installability project as a UCD project. The purpose is to enable the R&D staff to consider installability from the user’s point of view. The problems are how to carry out the installability design and how to let users participate in it. We introduce a user scenario-based approach to installability design. The basic idea is to observe how users complete the installation and thereby understand their experience, skills, and operating habits through onsite surveys. Special attention is paid to the installation time, which affects efficiency, to the problems encountered during installation, and to how users solve these problems. Through careful observation and analysis, we can find the problems that affect installation efficiency and quality, and then work out a task model that complies with users’ operating habits and facilitates their work. This task model is also called the user scenario. Based on the user scenario, we can develop a prototype, invite users to evaluate it, and then iterate on the design until it meets the installability requirements of users.
2 How to Select Typical Users
Typical installation users represent the majority of installation users. Selecting typical users is in fact a process of sampling from a large user population. During sampling, we need to consider the skill level of users, which is generally classified as novice, mid-level, or skilled. Another factor is the number of users interviewed. If the number is too small, the interview results cannot reflect the situation of the majority of users. If the number is too large, the cost of the user interviews increases. Thus, we need to find an economical and reasonable number.
2.1 Sample Size and Probability of Detecting Usability Problems
Formula (1) gives the probability p of detecting usability problems with a sample of n users, where λ is the probability that a single user detects the usability problems. Experience indicates that λ is about 31%. Figure 1 shows that if we select 6 to 7 typical users for interview, more than 80% of usability problems can be found.
$$ p = 1 - (1 - \lambda)^{n} \qquad (1) $$
Fig. 1. Relationship between sample size and probability of detecting usability problems
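Formula (1) can be checked numerically. The following is a minimal sketch, assuming λ = 0.31 as stated above and taking n as the number of users interviewed; the function name is ours and is used only for illustration.

```python
def detection_probability(n_users: int, lam: float = 0.31) -> float:
    """Probability of detecting usability problems with n_users,
    following Formula (1): p = 1 - (1 - lam) ** n."""
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - lam) ** n_users


if __name__ == "__main__":
    # Tabulate p for sample sizes 1..10, mirroring the curve in Fig. 1.
    for n in range(1, 11):
        print(f"n = {n:2d}  p = {detection_probability(n):.3f}")
```

Under this assumption, 6 users give p of about 0.89 and 7 users give about 0.93, which is consistent with the statement that 6 to 7 users find more than 80% of the problems.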
2.2 Set Up a Typical Installation User Database
The typical installation user database can be set up as follows.
For the domestic market:
• Select 2 to 3 qualified telecom equipment installation companies in each province.
• Ask the representative office to distribute a questionnaire that collects the basic information of installers, including age, education, experience, and responsibility.
• Set up the installation user database according to the survey results.
• Export the user profile.
• Select typical users according to the user profile.
For the overseas market, the same procedure can be followed.
Example of Task Analysis: Current task model

Task | Time (man-minutes) | Problem
1 Cabinet Transportation | 10 |
2 Cabinet Installation | 72 | It is inconvenient and also takes a long time to use pads to level the cabinet.
2.1 Mark lines based on template | 5 |
2.2 Drill holes and position the cabinet | 23 |
2.3 Level the cabinet by using pads | 34 |
2.4 Secure the cabinet | 10 |
3 Power Cable Installation | 24 |
4 User Cable Installation | 125 125 | Users need to route the cables from the cabinet top. At first, users can accept this method. But as more cables are installed, the cable routing space on either side of the cabinet becomes smaller and smaller. When it comes to the cables of the last two user boards, users can hardly install these cables.
4.1 Install the user subrack | |
4.2 Install the user boards | |
4.3 Route user cables from the cabinet top. (It is impossible to operate on either side of the cabinet.) | |
5 Installation upon Expansion (Board Expansion) | | User cables installed in advance are scattered in the cable trough and do not look nice. Cable connectors are full of dust and not well protected. As too many cables pile up near the door, sometimes the front door cannot be properly closed.
5.1 Replace the dummy panel with user board for expansion purpose. | |
5.2 Locate the correct user cable from the cable trough. | |
5.3 Install the user cable on the user board. | |
2.3 About User Proxy
User proxies are often used in product development. For example, in the installability pre-test, Huawei internal employees are invited to attend the test instead of real users, to save the expense of inviting users. Requirements are also set for this type of user proxy. The first requirement is that such users must work in the installation field. For example, it is not appropriate to invite software engineers to the cabinet installation test, because installation conclusions drawn by software engineers cannot reflect the installability of the equipment. The correct way is to select mechanical test engineers, new employees of the mechanical department, and document development engineers or technical support engineers engaged in cabinet installation.
3 How to Conduct Installability Task Analysis
After determining the typical users, we need to study how users complete the installation task. This process is called task analysis. During the task analysis, we observe how users execute the installation task, record the time needed for each installation task and the problems found in each task, and come to understand the user experience. A detailed task analysis helps the UCD team to understand the current product better and thus to locate the points for improvement. Task analysis can be conducted with various user research methods, such as interview, survey, group discussion, and observation. It should be completed in the early stage of the concept phase. Installability task analysis can be conducted by observing the installability test in the laboratory. The best choice, however, is onsite observation. After the observation, it is recommended to interview experienced users and collect their opinions and suggestions, to better understand the user problems on site. After the task analysis is completed, we need to produce the task model of the current product, the completion time of each task, and a detailed record of installation problems (if any). We also need to produce the task importance and satisfaction survey table. The importance of a task is generally judged according to factors such as the installation time and the severity of the problem. The satisfaction survey requires at least 6 users, and the satisfaction in the survey table is the average of the scores given by these users.
Task | Importance | Satisfaction
Cabinet Transportation | 3 | 5
Cabinet Installation | 4 | 3
Power Cable Installation | 4 | 5
User Cable Installation | 5 | 2
Installation upon Expansion | 4 | 2
Use the four-quadrant analysis method to find the key points for improvement based on the importance and satisfaction of tasks. Tasks that fall into the opportunity window are those to which users react strongly, and they should be given high priority. Task analysis is the foundation for user scenario-based design. Only through task analysis can we understand the installation scenario at the user site and define the user scenario oriented to the future design. Task analysis is indispensable in installability UCD.
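The four-quadrant analysis can be sketched as a simple classification over the survey table. The sketch below is illustrative only: the paper does not state the cut-off values, so the thresholds (importance of at least 4, satisfaction of at most 3) and all quadrant names other than "opportunity window" are our assumptions.

```python
from typing import Dict, List, Tuple

# (importance, satisfaction) scores taken from the survey table above.
TASK_SCORES: Dict[str, Tuple[int, int]] = {
    "Cabinet transportation": (3, 5),
    "Cabinet installation": (4, 3),
    "Power cable installation": (4, 5),
    "User cable installation": (5, 2),
    "Installation upon expansion": (4, 2),
}


def quadrant(importance: float, satisfaction: float,
             importance_cut: float = 4.0,
             satisfaction_cut: float = 3.0) -> str:
    """Assign a task to one of four quadrants.

    The cut-off values are hypothetical; the paper does not give them.
    """
    important = importance >= importance_cut
    satisfied = satisfaction > satisfaction_cut
    if important and not satisfied:
        return "opportunity window"   # important but unsatisfying
    if important and satisfied:
        return "keep up"              # important and satisfying
    if not important and not satisfied:
        return "low priority"
    return "possible overinvestment"


def opportunity_tasks(scores: Dict[str, Tuple[int, int]] = TASK_SCORES) -> List[str]:
    """Return the tasks that fall into the opportunity window."""
    return [task for task, (imp, sat) in scores.items()
            if quadrant(imp, sat) == "opportunity window"]
```

With these assumed cut-offs, cabinet installation, user cable installation and installation upon expansion fall into the opportunity window. Different cut-offs would give a different selection, so the thresholds should be agreed on by the UCD team.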
4 How to Define the Scenario Based on Task Analysis
A scenario defines the human-product interaction. Scenario definition is based on the analysis of the current user installation tasks. The purpose is to improve the current task model and provide a task model oriented to the future product design. The new task model should allow users to complete the installation task more conveniently and quickly. For example, in the cabinet installation scenario, the cabinet was originally leveled by using pads. Because installers need to insert the pads at the bottom of the cabinet, it is difficult and time-consuming to level the cabinet. Therefore, the cabinet leveling problem is the key concern of users in the cabinet installation scenario, and special attention must be paid to fixing it. For example:

Before | After
1 Cabinet Transportation | 1 Cabinet Transportation
2 Cabinet Installation | 2 Cabinet Installation
2.1 Mark lines based on template | 2.1 Mark lines based on template
2.2 Drill holes and position the cabinet | 2.2 Drill holes and position the cabinet
2.3 Level the cabinet by using pads | 2.3 Level the cabinet by using anchors
2.4 Secure the cabinet | 2.4 Secure the cabinet
3 Power Cable Installation | 3 Power Cable Installation
4 User Cable Installation | 4 User Cable Installation
4.1 Install the user subrack | 4.1 Remove the side door
4.2 Install the user boards | 4.2 Install the user subrack
4.3 Route user cables from the cabinet top. (It is impossible to operate on either side of the cabinet.) | 4.3 Install the user boards
 | 4.4 Install user cables (Users can operate at one side of the cabinet.)
5 Installation upon Expansion (Board Expansion) | 5 Installation upon Expansion (Board Expansion)
5.1 Replace the dummy panel with user board for expansion purpose. | 5.1 Loosen user cables that are installed in advance on the dummy panel.
5.2 Locate the right user cable from the cable trough. | 5.2 Replace the dummy panel with user board for expansion purpose.
5.3 Install the user cables on the user board. | 5.3 Install the user cable on the user board.
5 How to Conduct Installability User Test
The installability user test is based on the user scenario. Generally, 2 to 3 groups of typical users are invited to the test. The test team includes one instructor and two test recorders. The instructor steers the whole test process but does not persuade users to do anything. The recorders record data such as the installation time, the number of errors, the number of help requests, and installation problems. The test process must be videotaped, so that we can analyze it in detail after the test to locate user problems. Before the test, users are asked to complete the user acknowledgement letter and the user basic information table. We must inform users that the test is used for product design only and does not cause them any harm. Meanwhile, users are obliged to keep any related information confidential. Upon completion of each scenario test, users complete the satisfaction questionnaire. After the whole test is completed, users complete the overall satisfaction questionnaire.
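The measures collected by the recorders can be kept in a small record structure and averaged per scenario. The following is a minimal sketch under our own assumptions: the class, the field names and the summary function are hypothetical and are not part of the method described in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ScenarioRecord:
    """One user group's result for one scenario (illustrative field names)."""
    group: str
    scenario: str
    installation_minutes: float
    errors: int
    help_requests: int
    satisfaction: float  # score from the post-scenario questionnaire


def summarize(records: List[ScenarioRecord], scenario: str) -> Dict[str, float]:
    """Average the recorded measures over all groups for one scenario."""
    rows = [r for r in records if r.scenario == scenario]
    if not rows:
        raise ValueError(f"no records for scenario {scenario!r}")
    return {
        "groups": len(rows),
        "mean_minutes": mean(r.installation_minutes for r in rows),
        "mean_errors": mean(r.errors for r in rows),
        "mean_help_requests": mean(r.help_requests for r in rows),
        "mean_satisfaction": mean(r.satisfaction for r in rows),
    }
```

Comparing such summaries between the current task model and the redesigned one is one possible way to see whether a scenario has actually become faster and less error-prone.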
6 Conclusion
Besides the above key activities, scenario-based installability design also includes scenario-based prototype design, which reflects the consistency of user requirements throughout the product development process. Prototype design is not detailed here due to limited space. This article presents a scenario-based installability design method to resolve the difficulties encountered in current installability design. It explains how to select typical users, conduct installability task analysis, define user scenarios based on task analysis, and conduct the installability test based on the scenario. The purpose is to give the R&D staff clear guidance in product installability design. The article also illustrates the effectiveness of scenario-based installability design through examples.
A Case Study of New Way to Apply Card Sort in Panel Design
Yifei Xu, Xiangang Qin, and Shan Shan Cao
Corporate Technology, Siemens Ltd. China, 7 Wangjing Zhonghuan Nanlu, Beijing 100102 P.R. China
{yifei.xu, xiangang.qin, shanshan.cao}@siemens.com
Abstract. The aim of this paper is to describe a case of washing machine panel design. In this case, card sorting and cluster analysis were applied to obtain target users’ mental models of the information architectures of washing machine panels. The differences among the information architectures of existing panels were also quantitatively evaluated. In addition, the differences between users’ mental models and existing washing machines regarding the information architectures were identified. The methodology and results in this paper contribute to the design of washing machine panels.
Keywords: Panel Design, Card Sorting, Quantitative Measure.
it shows fewer “selling points” to customers in marketing. The goal of this project is to find a compromise solution for the washing machine panel with better usability and better visual affordance of operation and functionality.
Fig. 1. The first trend of HMI design
Fig. 2. The second trend of HMI design
Although washing machines are common in most families and the number of washing programs and functions is limited, users still complain about the usability of current products. One finding from the user study is that the information architectures of the panels are disorganized and inconsistent with users’ mental models. Both the semantic categories of the operations and customers’ usage habits are considered in panel design, and in most cases the two are inconsistent. For example, in most users’ common sense, Cotton, Synthetic, Wool and Silk are four programs for different fabrics, and Quick, Normal and Power Wash are three programs that differ in washing strength and time. However, according to users’ habits and requirements, Cotton, Synthetic and Quick Wash are used more frequently. So, in many products these three programs are grouped together and operated with the knob, while Normal and Power Wash are only available in the menu. Another example is that some new and innovative programs, such as Fresh Drying and Steam Wash, which are not frequently used, are located in high-priority places on the panel. Also, for reasons of visual appearance and manufacturing cost, the knob, buttons and lights on the panel cannot vary much in design and must be arranged in a visually pleasing order. As a result, much information indicating operation and functionality is lost in panel design. Because users’ mental model of the washing machine panel differs from the real product, it is difficult for them to understand and remember the panel information architecture, and they may not be able to locate a program or function when necessary. In order to provide a better design of the washing machine HMI, we study target users’ mental model of the panel and find out how it differs from current products, as a basis for improvement.
1.2 Card Sort Method
Card sorting is a usability method used in software and product design to discover users’ mental models of information architectures [1]. It has proved to be a useful and effective technique for organizing information items or concepts. The standard way of card sorting is to provide a group of target users with a set of cards. On each card is a concept or piece of information from the set that needs to be organized. Users then categorize the cards into groups based on their understanding [2]. The technique is based on the assumption that if users group cards together, the concepts probably should be grouped together in the system [3]. The result suggests users’ mental model of the information architecture of a system or website.
The standard way of analyzing card sort results is cluster analysis. Results from all or various groups of users can be entered into a statistical analysis program, and tree diagrams (dendrograms) are generated as a graphical representation of the relationships between the concepts under study. The dendrograms may provide useful insight into people’s mental model of a system, but they are usually very complex and difficult to interpret and measure quantitatively [4]. In this project, qualitative differences are not sufficient to support the redesign. Therefore, we applied a quantitative measure to evaluate the differences between the card sorting results from users. As a new trial, the information architectures of current washing machine panels were also entered as card sort results and compared with those of users. We also evaluated the differences among the information architectures of existing washing machine panels, and the differences between users’ mental models and machine information architectures. The assumption of this test is that the information architectures of current panel designs are also based on previous studies of their target users.
2 Method and Procedure
This section outlines the methodology employed in gathering the data and in performing and analyzing the card sort.
2.1 Card Sorting with Users
Participants
Twelve end users (6 male, 6 female) of high-end, front-loading washing machines were recruited from the 19 participants of a previous interview. Information was collected regarding their gender, age, education level and the type of washing machine they currently used.
Procedure
Thirty-nine physical cards were generated from the programs and functions of the washing machine panel to be evaluated. Each was printed with a menu entry or a visible option around the controls. The functions and programs of the target washing machine are very similar to those of existing products on the market. Participants conducted the card sorting separately, after all items on the cards had been explained to them in detail to make sure they understood the meanings. Each participant was accompanied by a usability engineer throughout the session and explained their card-sorting results to the usability engineer. Participants were instructed to sort the cards into piles based on their understanding and requirements, without any restriction on how many categories to create or what criteria to use to group the cards. Participants were also allowed to discard any item that they believed did not fit their requirements. After card sorting, participants were asked to name each group of cards and then to merge the sub-groups of cards into higher-level categories.
2.2 Card Sorting Derived from Washing Machines
Washing Machines
Ten high-end, front-loading washing machines were selected for the study. In this paper, we label them A1, A2, B1, B2, C, D1, D2, E, F and G. The letter indicates the brand, while the number distinguishes the model. A1, A2, B1, B2 and C are European brands, and the rest are Asian products.
Procedure
As mentioned above, we sought to analyze the information architecture trends of washing machines available on the market. To this end, the information architectures of the 10 high-end, front-loading washing machines were summarized into dendrograms based on the criterion that “items that are operated by the same control should be grouped in one category”. Additionally, distinct visual design of the controls or of the labels around a control naturally partitions the group into sub-categories. For example, the items around the knob, such as Cotton, Super Quick and Dark Wash, usually belong to the group of wash programs. But according to visual design elements such as location, font and color, they can be divided into several sub-groups. Items that are close in location and similar in visual appearance are usually placed under one sub-group. Items located in the software menu belong to another sub-group. All function sub-groups form the group of Functions and Features. Fig. 3 shows the item-sort result of one product. We conjecture that the items were meant to be presented in two groups: a program group and a function group. The items of the program group were split into two sub-categories by a red semi-ring around the 4 programs on the left and by red text labeled above them. The function group consists of 4 sub-categories. Buttons that are close in location and similar in visual appearance are placed under one sub-category.
Fig. 3. The information architecture of washing machine panel
2.3 Data Analysis
To make the information architectures from users and washing machines comparable, the card sort results must share the same “card pool”. We summarized the programs and functions from the 39 cards (Pool User) and from the 10 washing machines (Pool Machine). The same or very similar programs and functions were labeled with the same card name. When the users’ data was entered into the cluster analysis, the cards available only in Pool Machine were regarded as “useless cards”, and vice versa. The combined set of Pool User and Pool Machine was named Pool General and served as the card pool for the comparison.
Cluster analysis was performed to process the card sort data. The quantitative measure of card sort data was based on the assumption that each dendrogram can be represented by a card sort cluster and each cluster can be represented as a matrix. The triangular matrix can be concatenated into a single vector [4]. Therefore, a matrix was obtained from each participant’s card sort result and from each existing panel using USort and EZCalc. We then concatenated the matrices into vectors and performed statistical analysis on the vectors representing the card sort results [4].
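The step from a distance matrix to a vector can be illustrated as follows. This is a minimal sketch, assuming that the vector is formed by reading the entries above the diagonal row by row; the exact ordering used in [4] and by the tools is not stated here, and the function name is ours.

```python
from typing import List, Sequence


def upper_triangle_vector(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the entries above the diagonal of a square matrix,
    row by row, into a single vector."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Four cards in the order Memory1, Memory2, Quick, Normal. The diagonal is
# ignored by the function, so it is simply set to 0.0 in this example.
EXAMPLE = [
    [0.00, 0.00, 1.00, 1.00],
    [0.00, 0.00, 1.00, 1.00],
    [1.00, 1.00, 0.00, 0.06],
    [1.00, 1.00, 0.06, 0.00],
]

vector = upper_triangle_vector(EXAMPLE)
```

A matrix over n cards yields n(n-1)/2 entries. For the 39 user cards this gives 741 pairs, which agrees with the total N reported in Table 2.1.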
3 Results and Discussions
3.1 Information Architectures from Users
The results of the 12 users were manually entered into the card-sorting tool USort and analyzed by EZCalc with the complete-linkage algorithm. Twelve dendrograms and distance matrices were generated separately, representing the information architectures and the distances between pairs of cards in users’ minds.
Fig. 4. Analysis of the Distance Matrixes for 12 Users
Fig. 5. Dendrogram of card sorting results for 11 Users
From our observation, one participant did not understand the card-sorting process very well, and this participant’s result was extremely inconsistent with the others. The data from this participant was therefore excluded from further analysis. This decision was validated by Hierarchical Cluster Analysis of the users’ distance matrices. The card-sorting results of the remaining 11 users were then merged to generate one dendrogram (see Fig. 5 for example) and one distance matrix (see Table 1 for example) with the complete-linkage algorithm. A distance of 0 means that everyone placed two cards together, and a distance of 1 means that no one combined the two cards. In the dendrogram and distance matrix below, everyone combined Memory1 and Memory2 in the same pile, and no one put Quick and Memory1 together. The smaller the distance between two cards, the more users put them together.
Table 1. Distance matrix (11 users)
 | Memory1 | Memory2 | Standard | Synthetic | Quick | Normal | Intensive
Memory1 | 1.00 | 0.00 | 0.78 | 0.90 | 1.00 | 1.00 | 1.00
Memory2 | 0.00 | 1.00 | 0.78 | 0.90 | 1.00 | 1.00 | 1.00
Standard | 0.78 | 0.78 | 1.00 | 0.61 | 0.64 | 0.67 | 0.75
Synthetic | 0.90 | 0.90 | 0.61 | 1.00 | 0.83 | 1.00 | 0.90
Quick | 1.00 | 1.00 | 0.64 | 0.83 | 1.00 | 0.06 | 0.28
Normal | 1.00 | 1.00 | 0.67 | 0.94 | 0.06 | 1.00 | 0.13
Intensive | 1.00 | 1.00 | 0.75 | 0.90 | 0.28 | 0.13 | 1.00
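The meaning of the distance values can be illustrated with a simplified computation. The sketch below assumes flat (single-level) piles and defines the distance of a card pair as the share of participants who did not put the two cards into the same pile. EZCalc also takes merged higher-level groups into account, so this sketch does not reproduce the values in Table 1; the three example sorts are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def pairwise_distances(sorts: List[List[Set[str]]]) -> Dict[FrozenSet[str], float]:
    """Distance of a card pair = share of participants who did NOT put
    the two cards into the same pile (0 = always together, 1 = never)."""
    cards = sorted({card for sort in sorts for pile in sort for card in pile})
    together = {frozenset(pair): 0 for pair in combinations(cards, 2)}
    for sort in sorts:
        for pile in sort:
            for pair in combinations(sorted(pile), 2):
                together[frozenset(pair)] += 1
    n = len(sorts)
    return {pair: 1.0 - count / n for pair, count in together.items()}


# Hypothetical sorts of three participants, each a list of piles.
HYPOTHETICAL_SORTS = [
    [{"Memory1", "Memory2"}, {"Quick", "Normal"}],
    [{"Memory1", "Memory2"}, {"Quick"}, {"Normal"}],
    [{"Memory1", "Memory2"}, {"Quick", "Normal"}],
]

distances = pairwise_distances(HYPOTHETICAL_SORTS)
```

In this made-up example, Memory1 and Memory2 have distance 0 because all three participants grouped them, Quick and Normal have distance 1/3, and Memory1 and Quick have distance 1.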
Gender Difference
As household appliances, washing machines may be used by either men or women. One main purpose of the current project is to explore whether there are significant differences between male and female users regarding their mental structures of washing machines. The card sorting results of all 11 participants were divided into a male structure and a female structure, producing two distance matrices. The Wilcoxon Signed Ranks Test was used to compare the distance matrices of male and female users. The result (see Table 2.1 and Table 2.2) shows no significant difference, which indicates that it is not necessary to distinguish male from female users in designing the menu structures of washing machines. This result was supported by the correlation test (r = 0.613, p < 0.001).
Table 2.1 Wilcoxon Signed Ranks Test for the information architectures of male and female
Female - Male | N | Mean Rank | Sum of Ranks
Negative Ranks | 285 (a) | 307.69 | 87691.00
Positive Ranks | 312 (b) | 291.06 | 90812.00
Ties | 144 (c) | |
Total | 741 | |
a Female < Male. b Female > Male. c Female = Male.

Table 2.2 Signed Ranks Test for the information architectures of male and female
 | Female - Male
Z | -.371 (a)
Asymp. Sig. (2-tailed) | .711
a Based on negative ranks. b Wilcoxon Signed Ranks Test.
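The Z value in Table 2.2 can be approximately recomputed from the rank sums in Table 2.1. The sketch below uses the standard normal approximation of the Wilcoxon signed-rank statistic and, as a simplifying assumption, omits the correction for tied ranks that statistical packages usually apply; the function name is ours.

```python
import math
from typing import Tuple


def wilcoxon_z(sum_negative_ranks: float, sum_positive_ranks: float,
               n_nonzero: int) -> Tuple[float, float]:
    """Normal approximation of the Wilcoxon signed-rank test.

    n_nonzero is the number of pairs with a non-zero difference (ties are
    dropped). No tie correction is applied to the variance.
    """
    t = min(sum_negative_ranks, sum_positive_ranks)
    mean = n_nonzero * (n_nonzero + 1) / 4.0
    sd = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24.0)
    z = (t - mean) / sd
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# Rank sums and counts from Table 2.1: 285 negative and 312 positive ranks.
z, p = wilcoxon_z(87691.0, 90812.0, 285 + 312)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

This gives Z of roughly -0.37 and a two-tailed p of roughly 0.71, close to the reported values. The small difference in Z may be due to the omitted tie correction.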
3.2 Information Architectures from Washing Machines
The information architectures of the 10 washing machines are expected to differ among brands and to be consistent within one brand, since consistency is one of the usability principles recognized by ergonomics [5]. Besides that, brands from Europe and Asia are also assumed to differ. To test this, the distance matrices and dendrograms of the 10 individual washing machines were analyzed by Hierarchical Cluster Analysis to find which information architectures could be clustered together and thus be classified as one group. The results showed that when the number of clusters was set to 2, A1 and B2 formed one cluster and the other washing machines formed the other. This result was unexpected. The four washing machines of brands A and B should keep consistent design styles in panel structure within each brand, but the distance between B1 and B2 was the largest among those between B1 and the other washing machines, although they are of the same brand. When the number of clusters was set to 3, F was separated as an independent cluster. The reason is that all functions and corresponding parameters were displayed on the panel of F with only one level of menu structure, while the other washing machines had more than one level of menu structure. E and G were separated as independent clusters when the number of clusters was set to 4 and 5, respectively. When the number of clusters was set to 6, A2&B1, D1&D2&C, and A1&B2 were clustered together respectively, and E, F, and G were distinguished from each other. The classifications were also validated by the proximity of the washing machines (see Fig. 6). The results showed that the main dissimilarities exist between washing machines from European and Asian brands, except that the distance between C (European) and D1 was the smallest among those between C and the other washing machines.
Fig. 6. Hierarchical Cluster Analysis of 10 Washing Machines

Table 3. The Proximity of Washing Machines
Machine | Farthest machine (Max) | Max distance | Nearest machine (Min) | Min distance
A1 | F | 148 | B2 | 45
A2 | F | 104 | B1 | 44
B1 | B2 | 114 | A2 | 44
B2 | E&F | 163 | A1 | 45
C | B2 | 140 | D1 | 57
D1 | B2 | 144 | D2&C | 57
D2 | B2 | 162 | D1 | 56
E | B2 | 163 | D1 | 71
F | B2 | 163 | C | 78
G | B2 | 160 | B1 | 78
Note: Max is the maximum distance between pairs of washing machines; Min is the minimum distance between pairs of washing machines.
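Setting the number of clusters to k corresponds to stopping an agglomerative clustering once k clusters remain. The following is a minimal sketch of that idea with complete linkage. The linkage criterion is our assumption, since the paper does not state which one the Hierarchical Cluster Analysis used, and the machine labels and distances are made up because the full distance matrix is not given.

```python
from typing import Dict, FrozenSet, List, Tuple

Distance = Dict[FrozenSet[str], float]


def complete_linkage_clusters(items: List[str], dist: Distance,
                              k: int) -> List[List[str]]:
    """Merge the two closest clusters until k clusters remain.

    The distance between two clusters is the largest pairwise item
    distance between them (complete linkage).
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    clusters: List[List[str]] = [[item] for item in items]

    def linkage(a: List[str], b: List[str]) -> float:
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        best: Tuple[float, int, int] = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical distances between four machines M1..M4 (not from Table 3).
RAW = {("M1", "M2"): 44, ("M1", "M3"): 120, ("M1", "M4"): 150,
       ("M2", "M3"): 110, ("M2", "M4"): 140, ("M3", "M4"): 57}
HYPOTHETICAL_DIST: Distance = {frozenset(pair): d for pair, d in RAW.items()}

two_clusters = complete_linkage_clusters(["M1", "M2", "M3", "M4"],
                                         HYPOTHETICAL_DIST, k=2)
```

In this made-up example the two-cluster solution groups M1 with M2 and M3 with M4. Running the function for k = 2, 3, ... on a real distance matrix would give the kind of stepwise splits described in the text.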
3.3 Comparison of the Structures for Users and Washing Machines
Six information architectures of washing machines were generated (those of A2&B1, D1&D2&C, A1&B2, E, F and G) based on the results of the Hierarchical Cluster Analysis. The six groups could be validated against the real machines in terms of their panel layouts and brands. Therefore, the information architectures of the six groups were used for further analysis. The similarity between the information architectures of washing machines and users was compared using the Wilcoxon Signed Ranks Test. The results showed that the dendrogram of A1&B2 was the most similar to that of users (see Fig. 7).
Table 4. Wilcoxon Signed Ranks Test for the menu structures of users and competitors
Comparison | Z | Asymp. Sig. (2-tailed)
Users vs A1&B2 | -1.82 (a) | .069
Users vs D1&D2&C | -5.48 (a) | .000
Users vs A2&B1 | -8.26 (a) | .000
Users vs F | -10.72 (a) | .000
Users vs E | -12.09 (a) | .000
Users vs 7Others | -3.07 (b) | .002
Users vs G | -13.57 (a) | .000
a Based on positive ranks. b Based on negative ranks. c Wilcoxon Signed Ranks Test.
Fig. 7. Dendrogram for A1&B2
4 Strengths and Limitations
In this study, card sorting was used not only to gather users’ mental models of panel design, but also as a tool to quantitatively measure the differences among the information architectures of existing product panels and their correlations with users’ mental models. However, the study also has limitations, because it was carried out as an applied project rather than as a well-designed research project. First, the original purpose of card sorting by users was to obtain users’ mental models of a panel design for a specified new product as defined in the project. Measurement of existing products was conducted to further analyze the results. Therefore, the card pool for users was slightly different from the card pool of products, which could influence their comparison. Second, nonparametric tests were used for the quantitative analysis because of the small sample sizes of users (12) and washing machines (10). Although statistical principles were strictly observed, we should be cautious about generalizing the current results to a wider field.
5 Conclusions
Card sorting is usually an efficient and effective way to explore users’ requirements for the information architecture design of products. The results from card sorting by users can guide product information architecture design. But how the results are interpreted to support design depends highly on the individual application and on researchers’ personal experience. Quantitative analysis of card sorting results can be a good supplement to this method. In the current case study, the information architectures of existing products and of users were identified by card sorting, and the results were further analyzed by cluster analysis and nonparametric tests to classify and compare them. The information architectures of existing washing machines were classified into six groups by cluster analysis, and one of them was found to fit that of users best. No significant gender difference was found in the information architectures of users’ mental models. In summary, the current case study shows that card sorting and cluster analysis are effective ways to build up the information architectures of products. It also provides researchers and practitioners with qualitative and quantitative methods and results for information architecture design.
References
1. Nielsen, J., Sano, D.: SunWeb: User Interface Design for Sun Microsystems Internal Web. Computer Networks and ISDN Systems, 28, 179–188 (1995)
2. Dong, J.M., Fu, L.M., Salvendy, G.: Human-Computer Interaction: User Centered Design and Evaluation. Tsinghua University Press, pp. 74–83 (2003)
3. Berndtsson, J.: Designing an Intranet from Scratch to Sketch: Experiences from Techniques Used in the IDEnet Project. In: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences, vol. 2, p. 2019 (1999)
4. Ewing, G., Logie, R., Hunter, J., McIntosh, N., Rudkin, S. et al.: A New Measure Summarising ‘Information’ Conveyed in Cluster Analysis of Card-Sort Data: Application to a Neonatal Intensive Care environment. In: Proceedings of the 7th Workshop on Intelligent Data Analysis in medicine and pharmacology, pp. 25–29 (2002)
5. Vredenburg, K., Isensee, S., Righi, C.: User-Centered Design: An Integrated Approach (Reprint Edition). Pearson Education North Asia Limited and Higher Education Press, p. 152 (2003)
Design Tools for User Experience Design

Kazuhiko Yamazaki 1,* and Kazuo Furuta 2

1 IBM Japan Ltd., User Experience Design Center
2 The University of Tokyo
* [email protected]
Abstract. The purpose of this study is to develop an approach to the design of artifacts based on information technology. To make interactive systems easy to use, a user centered design approach is used in the development of many systems. For user centered design, it is important to consider the total user experience, but this is not easy because user experience includes many aspects. To address the total user experience, the authors propose a method of designing for user experience that consists of the “User viewpoint”, the “Environment viewpoint” and the “Lifecycle viewpoint”. The “User viewpoint” includes several user groups from the universal design viewpoint, several user characters and several user emotions. The “Environment viewpoint” includes the hardware product, software, applications, the space, and the people who are communicating. The “Lifecycle viewpoint” includes pre-sales, after-sales, support, upgrades, and setup of the product and application. To support this design approach, a user experience design tool named “UED (User Experience Design) Studio” is proposed. Based on the three proposed viewpoints, design tools were developed for user experience design: “The definition tool”, “The evaluation tool” and “The visualization tool”. “The definition tool” helps the designer define the user experience situation easily by selecting a user group, selecting an environment and entering user tasks based on the lifecycle stage. “The evaluation tool” makes it easy to evaluate the defined user experience. “The visualization tool” shows the results of the evaluation in 3D graphics so that complicated information is easy to understand. To evaluate the proposed tools, an experiment in making a prototype was conducted, and the results indicate that the proposed approach has the potential to help designers and multi-disciplinary teams consider user experience in user centered design.
The web site of the American Institute of Graphic Arts describes experience design as follows:
• A different approach to design that has wider boundaries than traditional design and that strives for creating experiences beyond just products or services.
• The view of a product or service from the entire lifecycle with a customer, from before they perceive the need to when they discard it.
• Creating a relationship with individuals, not targeting a mass market.
• Concerned with invoking and creating an environment that connects on an emotional or value level to the customer.
• Built upon both traditional design disciplines in the creation of products, services, as well as environments in a variety of disciplines.

For example, what is the user’s experience in making a presentation at a conference? Before the conference, the user needs to consider the title and content of the presentation, prepare the presentation slides, and travel to the conference with a notebook PC. After the presentation, the user receives questions and feedback and may change the presentation for the next conference. During the actual presentation, the user may need to set up a desk for the projector, connect cables to the projector, and decide where to put the notebook PC. Also, depending on the user’s character, the presentation style will be different from other presenters. For example, a younger presenter may not think about using small text on the slides, a designer may try to use a lot of graphics on the slides, and some presenters may not use any slides at all. There are a lot of factors that are related to the user’s experience. In reality, it is not easy to adopt a user experience design approach to products and services, because user experience design covers a wide range of aspects and will not be successful without collaboration by a multidisciplinary team.
In this paper, the authors try to organize the user experience design approach and propose methods for the design approach, for the processes, and for the teams in order to help designers and multidisciplinary teams. To help this design approach succeed, the authors also propose the creation of a user experience design tool.
2 Design Approach for User Experience Design

2.1 About User Experience Design

User experience design means designing products, systems, and environments by considering the total user experience, which includes various aspects such as usability, accessibility, appearance, personality, branding, etc. The design should span the entire lifecycle, and not be limited to just one scene of the user’s time. Also, it should cover the total environment that is related to all of the materials needed to achieve the user’s goals.

2.2 Design Approach for User Experience Design

To approach the total user experience, the authors propose a method of designing for user experience that consists of the Lifecycle viewpoint, the Environment viewpoint,
and Various User viewpoints. The Environment viewpoint covers all of the materials that users look at, touch, or feel. For example, it includes hardware products, software, applications, the space containing the systems, and the people who are communicating. The Lifecycle viewpoint covers all of the time that the user will be related to the product and its systems. For example, it includes the pre-sales period, the after-sales time, support, upgrades, the product setup, and actual use of the application. Various User viewpoints cover the differences between various people. For example, they include several user groups from the universal design viewpoint, users with various characters, and the various emotions felt by one person.

1) Consider User Experience from the Lifecycle Viewpoint
From the Lifecycle viewpoint, the user experience runs from a user’s initial awareness, through additional discoveries, on to ordering, delivery, installation, initial use, day-to-day use, service, support, upgrades, and end-of-life. For example, considering a presentation at a conference, before the conference the user needs to consider the title and content of the presentation, prepare the slides, and travel to the conference with a notebook PC, and after the presentation, the user receives feedback and may change the presentation for the next conference.

2) Consider User Experience from the Environment Viewpoint
The Environment viewpoint covers all of the materials that the user looks at, touches, or feels. It includes the hardware product, software, applications, the space where things happen, and the people who are communicating. For example, for a presentation, the environmental requirements include a desk, a projector, suitable cables, the projection screen, etc.

3) Consider User Experience from Various User Viewpoints
The various User viewpoints cover various human differences.
These include several user groups from the universal design viewpoint, users with various characters and personalities, and the various emotions that users feel. For example, for a presentation, depending on the user’s character, the presentation style will be different. Younger presenters may not consider small text on the slides, designers may try to use lots of graphics on the slides, and some presenters may not use any slides. Many of these items are related to the user’s unique experience.

2.3 Design Method for User Experience Design

The design method for user experience design has to be based on the user centered design approach, because user centered design is a useful method for solving problems from the user’s viewpoint and is also very popular in many companies. The design method for user experience design needs to extend user centered design to cover the Lifecycle viewpoint, the Environment viewpoint, and various User viewpoints. Also, from the corporate viewpoint, branding is very important for user experience design to succeed as a business. The design process and the design team for user experience design should be based on user centered design with extensions from the user experience viewpoint.
Here is the design process for user experience design:
• Make a plan for the user experience design: It is important to have the right design process, the right methods, and the right team. For this purpose, before starting the project, the project leader has to make a plan for the user experience design. The plan has to include an outline of the process, the schedule, the team members, and the budget.
• Understand the background of user experience: The background includes the market, the business, the users, the stakeholders, and the branding.
• Understand the user experience of the targeted users from the lifecycle, environment and various User viewpoints: This includes the people, the user roles, the user goals, the user tasks, and the user scenarios, as seen from the user experience viewpoints.
• Concept design for user experience: This includes a low fidelity user experience prototype, and a document for the concept design and its evaluation.
• Detailed design for user experience: This includes a high fidelity user experience prototype, detailed design specifications, and their evaluation.
• Evaluation from the user experience viewpoint: This includes the final prototype and evaluation.
• Validation of user experience in the marketplace: It is important to validate the results of user feedback from the user experience viewpoint.

The design team for user experience design should be formed on the basis of the user centered design approach. The members are almost the same as for user centered design, but all of the members have to have knowledge of user experience design. Here is the list of team members for user experience design:
• Project leader
• User researcher
• User experience designer
• Visual designer (Industrial designer or Graphic designer)
• User testing specialist
• Marketing planner
3 Design Tool for User Experience Design

3.1 Purpose

User experience design is not easy to understand because it is a new approach and covers many different fields. It is also a new approach for currently practicing usability specialists and designers. It needs to cover various fields from the Lifecycle viewpoint, the Environment viewpoint, and various User viewpoints. Also, it is important to share information within a multidisciplinary design team. To help designers approach design from the user experience viewpoint, an effective tool is desirable. In this chapter, the authors describe the current tools used to help designers apply the user experience viewpoint, and also the requirements for a newer tool.
3.2 Current Tool

To support user experience design, there are some current design tools such as Persona, User Scenarios, User Segment Tables, and Lifecycle for User Experience, as follows:

1) Persona for User Experience
A persona is an example of a person who characterizes a role that represents a user group from the user experience viewpoint. A persona describes a fictitious user including the roles, skills, goals, emotions, and other personal characteristics. It feels real to designers because it is an example of a person, not just a conceptual description. A persona helps designers understand and focus on characteristics of users from the user experience viewpoint.

2) User Scenario for User Experience
Modeling user scenarios is one of the useful methodologies for understanding users and sharing information among designers and related people. A user scenario has many roles, such as system vision, design rationale, usability specifications, functional specifications, user interface metaphors, prototypes, object models, formative evaluation, documentation, and overall evaluation. For an innovative design, user scenarios need to be developed for each end user segment to share the goals and aspirations of these users with a variety of professionals, who use these user scenarios to create and evaluate new ideas. User scenarios are very important tools for collaborating with many professionals around the world and for creating a common language for collaboration. As shown in Figure 1, one example of a user scenario is a visual user scenario for a notebook PC. It describes the typical user scenario, gives an image of a persona and the goods, and is like a poster to share among the people who are working on this project.

3) User Segment Table
As shown in Figure 2, the user segment table is prepared to help designers identify the various types of users of a product being designed and to put these user types in some target user groups.
This table is a matrix of rows and columns. The columns consist of the items related to human physical and mental functions and demographic, cultural, and environmental factors, all of which designers have to take into account. The rows list the basic types of users, such as disabled people, temporarily disabled people, children, and so on. Some cells in the matrix are filled in with typical or general user examples.

4) Lifecycle for User Experience
A lifecycle for user experience is an approach that considers user experiences for each step of the relationship between the users and the system or product. For example, for a product, a lifecycle could be divided into interest building, serious consideration, shopping, setup, support, and upgrade steps.
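The user segment table in item 3) can be pictured as a small matrix keyed by user type and factor. The following is only a minimal sketch: the class name `UserSegmentTable`, the shortened column labels, and the cell texts are hypothetical placeholders, not taken from the authors’ table in Fig. 2.

```python
from dataclasses import dataclass, field

# Row labels follow the user types named in the text; the column labels
# are shortened, assumed names for the factors the text lists.
USER_TYPES = ["disabled people", "temporarily disabled people", "children"]
FACTORS = ["physical function", "mental function", "demographic",
           "cultural", "environmental"]


@dataclass
class UserSegmentTable:
    """Matrix of user types (rows) by factors (columns)."""
    rows: list
    columns: list
    cells: dict = field(default_factory=dict)  # (row, column) -> example text

    def set_example(self, row, column, example):
        # Only cells that exist in the matrix may be filled in.
        if row not in self.rows or column not in self.columns:
            raise KeyError(f"unknown cell: ({row!r}, {column!r})")
        self.cells[(row, column)] = example

    def examples_for(self, row):
        """Return the filled-in cells of one row, in column order."""
        return {c: self.cells[(row, c)] for c in self.columns
                if (row, c) in self.cells}


table = UserSegmentTable(USER_TYPES, FACTORS)
# Placeholder cell contents, invented for illustration only.
table.set_example("children", "physical function", "small hands, short reach")
table.set_example("children", "mental function", "limited reading ability")
```

Leaving most cells empty mirrors the text, which says that only some cells carry typical or general user examples.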
3.3 Requirements for a Design Tool for User Experience Design

To help this design approach, the authors here describe the requirements for a design tool for user experience design:
• Understand the user experience approach including the lifecycle, environment, and various User viewpoints.
• Share the information among the project members with different backgrounds.
• Easy to update by changing the information.
• This tool can be utilized in several different steps of the design process.
• Easy to see the information for all members of a multidisciplinary team.
• The tool has to be networked application software and share the data among the team members.

Fig. 1. Example of Visual Scenario
4 Experiment for UED Studio

4.1 Introduction for UED Studio

To help support this design approach, the authors propose user experience design tools named UED (User Experience Design) Studio. Based on the requirements for a design tool for user experience design, UED Studio would be an integrated application to support design. UED Studio consists of three applications, the Definition Tool, the Evaluation Tool, and the Visualization Tool. To easily define the user experience situation, the Definition Tool helps designers in ways such as selecting a user group, selecting an environment, and entering user tasks based on the Lifecycle viewpoint. The Evaluation Tool is for easily evaluating the defined user experiences. The Visualization Tool is to show the results of the evaluations by using 3D graphics to make it easy to understand the complicated information. The Definition Tool and the Evaluation Tool are based on the three user experience design approaches, the Lifecycle viewpoint, the Environment viewpoint, and the various User viewpoints. For this purpose, both of these tools have three corresponding views to span these viewpoints.
Fig. 2. Example of User Segment
The purpose of the user view in both of these tools is to define a target user group, and a designer is able to define several user groups by using text and images. The purpose of the environment view in these tools is to define the environment of some target user, and a designer is also able to define several such environments by using text and images. The purpose of the lifecycle view in both of these tools is to select a stage of the lifecycle, such as product recognition, shopping, use, or update. The purpose of the user tasks in these tools is to define each user task, and a designer is able to input descriptions of each user task.

Here are the steps of user experience design and their relationships with each application of UED Studio:
1. Make the plan for user experience design. Consider which methods and tools will be good for each step of user experience design.
2. Understand the market and business, including branding, from the user experience viewpoint. The Evaluation and Visualization Tools help to evaluate the current products or systems and those of the competitors.
3. Understand the user experience of targeted users from the lifecycle, environment, and various User viewpoints. The Definition Tool helps designers make the definitions.
4. Concept design for user experience. The Definition Tool helps designers remember the basic user definitions.
5. Detailed design for user experience. The Definition Tool helps designers remember the use of user experience design. The Evaluation and Visualization Tools evaluate low-level user experience prototypes.
6. Evaluation from the user experience viewpoint. The Evaluation and Visualization Tools evaluate high-level user experience prototypes.
7. Validation of user experience in the market. The Evaluation and Visualization Tools evaluate the final user experiences.
The UED Studio application is composed of three applications, using text and visual data. The three applications are controlled with XML files, using common text data and common visual data for user information, environment information, user tasks, and the results of evaluations. It is easy to exchange text and images because the text data and image data are separate from the application. Designers can update the text data directly in each application.

4.2 UED Studio: The Definition Tool

The purpose of the Definition Tool is to help a designer define target users from their user experience viewpoints. When a designer develops products or systems from the user experience design viewpoint during the concept design stage, this tool helps the designer to define target users including user roles, user characteristics, user tasks, and user environments. As shown in Figure 3, this tool consists of three views, one for the user, one for the user’s environment, and one showing the user’s tasks within the lifecycle.
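To make the idea of XML-controlled shared data concrete, the sketch below shows what one definition (a user, an environment, and the user tasks for one lifecycle stage) might look like and how an application could read it. The paper does not give the file format, so every element name, attribute name, and value here is an assumption made for illustration, and `load_definition` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content: the element and attribute names are invented
# for this sketch and are not UED Studio's real format.
DEFINITION_XML = """
<definition>
  <user id="u1" image="images/student.png">
    <role>Student presenter</role>
    <characteristics>First conference talk</characteristics>
  </user>
  <environment id="e1" image="images/hall.png">Conference hall with projector</environment>
  <lifecycle stage="use">
    <task>Connect the notebook PC to the projector</task>
    <task>Present the slides</task>
  </lifecycle>
</definition>
"""


def load_definition(xml_text):
    """Parse one definition into plain dictionaries and lists."""
    root = ET.fromstring(xml_text.strip())
    user = root.find("user")
    env = root.find("environment")
    return {
        "user": {
            "id": user.get("id"),
            "image": user.get("image"),  # images stay outside the application
            "role": user.findtext("role"),
            "characteristics": user.findtext("characteristics"),
        },
        "environment": {
            "id": env.get("id"),
            "image": env.get("image"),
            "text": env.text,
        },
        # One list of task descriptions per lifecycle stage.
        "tasks": {
            stage.get("stage"): [task.text for task in stage.findall("task")]
            for stage in root.findall("lifecycle")
        },
    }


definition = load_definition(DEFINITION_XML)
```

Because the text and the image paths live in the file rather than in the code, replacing a description or a picture would only require editing the file, which is the separation of data and application that the text describes.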
Fig. 3. Definition Tool

Fig. 4. User View of Definition Tool

Fig. 5. Evaluation Tool

Fig. 6. Visualization Tool
To define a target user, the following is the process with UED Studio:
• Start from the main menu of the Definition Tool.
• Define a user group. The designer needs to select the user definition view from the main menu in order to define each user group. As shown in Figure 4, the user view has many examples of user groups with pictures and detailed definitions, and the designer can select various user groups by clicking the pictures.
• Define user environment. The designer needs to select the environment view to define each user’s environment. Like the user view, the environment view has many examples of user environments with pictures and detailed definitions, and the designer can select various environments by clicking the pictures.
• Define lifecycle and user tasks. The designer needs to select one of the lifecycle phases and input each user task.
• Return to the main menu to look at overviews of the definitions.

4.3 UED Studio: The Evaluation Tool

The purpose of the Evaluation Tool is to help a designer evaluate products or systems from the user experience viewpoint. As shown in Figure 5, the Evaluation Tool consists of a lifecycle and task view, an environment view and various user views. A designer evaluates the user experience by selecting one of five steps as the rating. This tool can convert the data and save it as a CSV file for use by other software.

4.4 UED Studio: The Visualization Tool

The purpose of the Visualization Tool is to show the results of the evaluations using 3D graphics to make the complicated information easy to understand. The 3D graphics are created automatically from the data of the Evaluation Tool. As shown in Figure 6, the Visualization Tool has three axes as designed by the authors: the X-axis is for time, the Y-axis is for a variety of users, and the Z-axis and each surface have pictures that are related to the functions. These pictures are also relevant to the Evaluation Tool. The results of an evaluation are visualized as columns, and the color and diameter of each column are related to the results of the evaluation. The authors intend that the designer will find it easy to quickly recognize the results of evaluation.
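The hand-off from the Evaluation Tool to the Visualization Tool can be sketched as two small functions: one that writes evaluation records as CSV, and one that turns a five-step score into a column diameter and color. The record layout, the 1 to 5 numbering of the steps, and the linear red-to-green mapping are all assumptions of this sketch; the paper states only that color and diameter are related to the evaluation results.

```python
import csv
import io

# Hypothetical evaluation records: (lifecycle stage, user group, score),
# where the score is assumed to be an integer from 1 to 5.
EVALUATIONS = [
    ("shopping", "novice user", 2),
    ("setup", "novice user", 4),
    ("use", "expert user", 5),
]


def to_csv(records):
    """Serialize evaluation records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["stage", "user_group", "score"])
    writer.writerows(records)
    return buffer.getvalue()


def column_style(score, min_diameter=0.2, max_diameter=1.0):
    """Map a 1..5 score to (diameter, RGB color).

    Assumed mapping: low scores give thin red columns, high scores give
    thick green columns, interpolated linearly in between.
    """
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    t = (score - 1) / 4  # 0.0 for the lowest step, 1.0 for the highest
    diameter = min_diameter + t * (max_diameter - min_diameter)
    color = (round(255 * (1 - t)), round(255 * t), 0)
    return diameter, color


csv_text = to_csv(EVALUATIONS)
styles = [column_style(score) for _, _, score in EVALUATIONS]
```

In a 3D view, each record would then be drawn as one column at the position given by its stage (time axis) and user group (user axis), using the diameter and color from `column_style`.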
5 Conclusions

To help designers, the authors have proposed a design process that uses UED Studio. After the proposal was created, this process was introduced to several companies as a design process from the user experience design viewpoint. Further study in practical situations is needed to evaluate this design process, and the next steps should include experiments in real product design processes. As a next step, the authors plan to enhance UED Studio to make it easier to input user tasks. In addition, based on the current UED tool, the authors plan to perform experiments to get designers’ feedback.

Acknowledgements. For help with this paper, we would like to give special thanks to Manabu Sasajima, Kousuke Akai and Akira Okada.
Axiomatic Design Approach for E-Commercial Web Sites

Mehmet Mutlu Yenisey

Istanbul Technical University, Industrial Engineering Department, 34367 Macka-Istanbul, TURKEY
[email protected]
Abstract. The success of e-commerce depends on a strong infrastructure, powerful business processes, error-free code in the e-commerce site, and highly usable interfaces. However, the most important factor in achieving these goals is the quality of the design process. The main objective of axiomatic design is to provide a scientific basis for the design process. Axioms are propositions that are accepted as true. The design axioms are determined by identifying the common elements of good designs. Four sets systemize the interaction between what is to be achieved and how it is achieved: the customer, functional, physical and process definition sets. The Customer Definition Set contains the customer’s expectations in terms of product, process, system and/or material. The customer needs are expressed as functional requirements and constraints in the Functional Definition Set. The physical design parameters that satisfy the functional requirements are defined in the Physical Definition Set. Finally, the process, characterized in terms of process variables, is described in the Process Definition Set.

Keywords: Axiomatic Design, Web page, Usability.
The axiomatic approach generates the basic governing principles of good design. It is based on creating decisions and processes that lead to a good design. Moreover, the axioms produce new concepts: new results and theories are generated by generalizing the axioms, and they are accepted as true and valid because they are based on axioms. The main objective of axiomatic design is to provide a scientific basis for the design process. It offers the designer logical and rational processes and tools, so that design activities are built on a theoretical foundation. By providing this scientific basis, axiomatic design also aims to improve creativity, shorten the research process, minimize the trial-and-error period, identify the best of all candidate solutions, and combine the power of the computer with creativity. Axiomatic design increases creativity when the functions and constraints needed to meet the customer needs are clearly defined. It helps the designer focus on the ideas that are guaranteed to meet the customer needs by eliminating the bad ones immediately, and it creates a systematic flow from emerging ideas to detailed design. It used to be thought that design could be learnt only by experience. However, it is now believed that both the creative and the experiential sides of design can be improved by a systematic and scientific approach. This development can be seen as a renaissance in the design field. Design is a process of interaction between “What do we want to achieve?” and “How do we achieve it?”. Four definition sets systemize this interaction and also mark the borders among the four design activities: the customer, functional, physical and process definition sets. The Customer Definition Set contains the customer’s expectations in terms of product, process, system and/or material. The customer needs are expressed as functional requirements and constraints in the Functional Definition Set. The physical design parameters that satisfy the functional requirements are defined in the Physical Definition Set. Finally, the process, characterized in terms of process variables, is described in the Process Definition Set [1].
2 Commerce vs. E-Commerce

Human beings have been trading for thousands of years. At the beginning, we collected the goods we needed from nature. Later, we realized that our requirements went beyond what nature gave us, so we started to exchange goods for those we did not own. This was probably the beginning of trading. As time went by, several means of shopping emerged. Today, we shop in e-stores in order to speed up the shopping procedure, to obtain an error-free shopping environment and, most important of all, to shop easily. Ng [2] classifies today’s shopping means as i) public markets, ii) stores, iii) supermarkets, iv) shopping malls or centers, and v) electronic and cyber malls. E-commerce is, simply put, performing commercial transactions via digital processes over computers and the networks connecting them. However, e-commerce means more than this definition. The main objective is to increase the accuracy and
efficiency in business processes. Hence, e-commerce enables the parties to benefit from more accurate and efficient transactions [3]. Rayport and Jaworski [4] define e-commerce as technology-mediated exchanges between parties (individuals or organizations) as well as the electronically based intra- or interorganizational activities that facilitate such exchanges. E-commerce is not merely a transfer of business transactions from the brick-and-mortar world to the virtual world. For companies that treated e-commerce as simply such a transfer, the e-commerce adventure ended in disappointment. Schniederjans and Cao [5] claim that, in order to be successful in the e-commerce era, companies need an integral business model with an emphasis on supply-chain management rather than a model based solely on sales and marketing. Critical success factors for e-commerce concern both usability and the business processes behind the web site. Usability, in fact, consists of interfaces and processes. This leads to a concept of ease of use that covers both the web content and the business procedures running behind the web server, and neither side of usability should be overlooked.

2.1 Basic E-Shopping Process

The e-shopping process is basically very similar to the physical one. In a physical store, the shopper first searches for the goods she/he needs, then checks their properties and compares the alternatives, makes a decision, and finally checks out. A similar process can be defined for e-shopping, although some additional steps exist for e-commerce (Fig. 1).
Fig. 1. Purchase process in e-commerce (flowchart boxes: Entering the web site; Navigation for product; Check product properties; Compare alternatives; Shopping cart; Payment and checkout; Order tracking; Delivery)
A successful e-commercial web site should reflect the customer’s mental model. Kwan et al. [6] divide e-customer behavior when requesting a URL into three phases:
Phase 1: Awareness; request entry, home page, browse page
Phase 2: Exploration; login page, registration page, search page
Phase 3: Commitment; select page, add to shopping cart page, payment page
Helander and Khalid [7] define a systems model for human factors research in e-commerce. It has three sub-systems: web environment, customer, and web technology. They classify the design parameters according to these sub-systems.
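The three phases of Kwan et al. [6] listed above amount to a lookup from the type of page requested to a behavior phase. The sketch below illustrates this under stated assumptions: the short page-type keys and the `furthest_phase` helper are hypothetical, and how a real site would label its URLs with page types is left open.

```python
# Page types per phase, following the three-phase list in the text.
# The short keys (e.g. "add_to_cart") are assumed labels for this sketch.
PHASE_BY_PAGE = {
    "entry": "Awareness",
    "home": "Awareness",
    "browse": "Awareness",
    "login": "Exploration",
    "registration": "Exploration",
    "search": "Exploration",
    "select": "Commitment",
    "add_to_cart": "Commitment",
    "payment": "Commitment",
}

PHASE_ORDER = ["Awareness", "Exploration", "Commitment"]


def furthest_phase(page_sequence):
    """Return the latest phase reached in a session, or None.

    Page types that are not in the lookup table are ignored.
    """
    reached = [PHASE_ORDER.index(PHASE_BY_PAGE[page])
               for page in page_sequence if page in PHASE_BY_PAGE]
    return PHASE_ORDER[max(reached)] if reached else None


session = ["home", "search", "browse"]
phase = furthest_phase(session)
```

For the example session, the visitor has searched but not yet selected a product, so the furthest phase reached is Exploration.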
The design parameters are as follows. In the web environment: physical environment, description, images, arrangement, and browse for merchandise; map, index, hierarchies, landmarks, and search for navigation; minimum number of clicks, shopping basket, and images for ease of purchase; games and free gifts for promotions; and seller and buyer for feedback. In the customer: needs, attitudes, purchasing power, competence, addiction, motivation, age, and trust as modulating variables. In web technology: search agents for features; tools and artifacts for controls; and visual and auditory for displays.

2.2 Critical Success Factors in E-Commerce

Sung [8] summarizes the critical success factors for e-commerce and compares them between East and West. According to this study, there are sixteen factors for success: customer relationship, privacy of information, low-cost operation, ease of use, electronic commerce strategy, technical electronic commerce expertise, stability of systems, security of systems, plenty of information, variety of goods/services, speed of system, payment process, services, delivery of goods/services, low price of goods/services, and evaluation of electronic commerce operations.
3 Axiomatic Design Concepts
Suh [1] defines design as an interplay between what we want to achieve and how we want to achieve it. Common to all design activities is what designers must do: i) know or understand their customers’ needs, ii) define the problem they must solve to satisfy the needs, iii) conceptualize the solution through synthesis, iv) perform analysis to optimize the proposed solution, and v) check the resulting design solution to see if it meets the original customer needs. Design begins with “What do we want to achieve?” and ends with a clear description of “How will we achieve it?”. There are iterations between “What” and “How”, and each loop of the iteration must clarify the “What”. Ultimately, our understanding of the customers’ needs must be transformed into a minimum set of specifications. The classical design approach is iterative, empirical, and intuitive. It is based on experience, cleverness, or creativity, and it involves much trial and error. Axiomatic design aims to establish a scientific basis for design. It improves design activities by providing a theoretical foundation based on logical and rational thought processes and tools. Axiomatic design requires a clear definition of design objectives. For this purpose, functional requirements and constraints are established, and criteria for good and bad designs are obtained. These criteria help the designer to eliminate bad designs as early as possible and to concentrate on promising ideas. Axiomatic design also provides a systematic decomposition process that flows from the creation of concepts to detailed design. Engelhardt [9] says that engineering design schools often rely on subjective engineering judgments when modeling product structure and behavior. Axiomatic Design, however, specifically addresses the internal relationships between a product’s components. It is a principle-based design method.
According to Thielman and Ge [10], Suh’s Axiomatic Design provides a consistent framework based on a logical thinking process, and techniques to carry out design activities in a well-organized manner. Axiomatic Design by Suh [1] requires four domains for the design lifecycle: i) the Customer Domain includes the needs (or attributes) (CAs) that the customer is looking
M.M. Yenisey
for in a product, ii) the Functional Domain contains the customer needs expressed as functional requirements (FRs) and constraints (Cs), iii) the Physical Domain covers the design parameters (DPs) that satisfy the specified FRs, and iv) the Process Domain encompasses the process variables (PVs) that characterize the process developed to produce the product specified in terms of DPs. Axiomatic Design has two fundamental axioms that establish a scientific foundation for design activity [1] [11]:
Axiom 1: The Independence Axiom: Maintain the independence of functional requirements (FRs).
The mapping between FRs and DPs is represented by a design equation:
$$\{FR\} = [A]\,\{DP\} \tag{1}$$
where {FR} is a column vector that contains all the FRs of the design, {DP} is a column vector that contains all the DPs, and [A] is the design matrix defining the relationships between DPs and FRs. If the number of FRs (n) is equal to the number of DPs, then [A] is a square matrix of size n x n. An element $A_{ij}$ of the matrix [A] is given by
$$A_{ij} = \frac{\partial FR_i}{\partial DP_j} \tag{2}$$
If $DP_j$ influences $FR_i$, this element is non-zero; otherwise, it is zero. A strictly uncoupled design has a design matrix in which only the diagonal elements are non-zero. This guarantees that the FRs are completely independent. However, such a design is very difficult to obtain in the real world. Designs in which FRs are satisfied by more than one DP are also acceptable when the matrix is triangular (a decoupled design).
Axiom 2: The Information Axiom: Minimize the information content of the design.
The information axiom defines the information content of a design as entropy, expressed as the logarithm of the inverse of the probability of success p:
$$I = \log_2 \frac{1}{p} \tag{3}$$
These two axioms lead the designer to the best design. The first axiom requires that each requirement be fulfilled by the design parameters without affecting other requirements, while the second axiom indicates that the best design is the one with the least information content. The first axiom facilitates concurrent design without interactions. The second axiom is a variation of the old adage “keep it simple”. Hence, they represent two quality characteristics of the design [12][13].
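As an illustration of the two axioms, the following Python sketch classifies a square design matrix by its non-zero pattern and evaluates the information content of Eq. (3). The function names, the tolerance parameter, and the example matrices are assumptions made for this sketch and do not come from the paper. The sketch also checks the matrix only as given; it does not try to reorder FRs or DPs to find a triangular form.

```python
import math


def classify_design(A, tol=1e-12):
    """Classify a square design matrix as 'uncoupled', 'decoupled', or 'coupled'.

    An entry counts as non-zero when its magnitude exceeds tol (an assumed
    threshold). The matrix is inspected as given, without reordering.
    """
    n = len(A)
    nonzero = [[abs(A[i][j]) > tol for j in range(n)] for i in range(n)]
    upper = any(nonzero[i][j] for i in range(n) for j in range(n) if j > i)
    lower = any(nonzero[i][j] for i in range(n) for j in range(n) if j < i)
    if not upper and not lower:
        return "uncoupled"   # only diagonal elements are non-zero
    if upper and lower:
        return "coupled"     # non-zero entries on both sides of the diagonal
    return "decoupled"       # triangular matrix


def information_content(p):
    """Information content I = log2(1/p) for a probability of success p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1 / p)


print(classify_design([[1, 0], [0, 2]]))  # uncoupled
print(classify_design([[1, 0], [3, 2]]))  # decoupled
print(classify_design([[1, 4], [3, 2]]))  # coupled
print(information_content(0.5))           # 1.0
```

A design that always succeeds (p = 1) has zero information content, which matches the reading of the second axiom as "keep it simple".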
4 Axiomatic Design Approach for E-Commercial Web Sites
An e-commercial web site has two dimensions. The first is the appearance of the web site and the second is the processes running behind it. Both of them define
CA: Shopping in an easy and usable way
FR: A web site with high : DP: System design

FR1 Web site appearance : DP1 Web site design
  FR11 Easy navigation : DP11 Navigation bar
  FR12 Locate the items : DP12 Efficient search mechanism
  FR13 Web site consistency : DP13 All pages look similar
  FR14 Color : DP14 Color contrast
  FR15 Text : DP15 Font size

FR2 Process management : DP2 Process design
  FR21 Customization : DP21 User preferences definition
  FR22 Shopping cart : DP22 Dynamically changing shopping cart in another window
  FR23 Ease of use processes : DP23 Simple, short processes
  FR24 Integration with suppliers : DP24 Strong process interaction
  FR25 Accurate information : DP25 Frequent update policy
  FR26 Efficient checkout : DP26 Checkout mechanism highly suitable with human mental model

FR3 Technical issues : DP3 Usage of
  FR31 Security : DP31 Strong encryption mechanism
  FR32 Privacy : DP32 Prevent database against unauthorized access
  FR33 Transaction speed : DP33 High quality servers
  FR34 Data transfer speed : DP34 Faster networking technology

Fig. 2. An example decomposition diagram for e-commercial web sites
the degree of ease of use of the web site. Success is measured by metrics related to the activities in these dimensions.
The ease of use reflects the customer requirements, expressed as functional requirements (FRs). The web site’s physical design and the processes developed are the design parameters (DPs). With this classification, the axiomatic design approach can be applied to the design of a commercial web site in terms of both usability and processes, all of which rest on technological issues. FRs may clearly conflict. For example, a customer may want a web page with many animations while also wanting short download times. For this reason, it is better to separate the procedural part of a web site from its physical layout. This distinction is especially important for e-commercial web sites, since the commercial side consists of many processes. FRs can easily be expressed as numerical values since they are related to measurable metrics. DPs can also be quantified, as they are directly measurable. Hence, it is possible to apply the two axioms of Axiomatic Design. Several works in the literature discuss customer attitudes or preferences in e-commerce [2], [6], [7], [14], [15]. These attitudes reflect the functional requirements of customers. However, almost all studies in the literature have so far focused on the appearance and/or content of the web site. The author believes that every aspect discussed in the literature has a process running in the background. A decomposition diagram expressing FRs and DPs, based on the well-known usability factors in the literature, is given in Fig. 2. If this decomposition diagram is examined according to the first axiom, it can be seen that an uncoupled design is achieved. Therefore, this is an acceptable design.
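The first-axiom check on the decomposition can be sketched in Python for one branch of Fig. 2. The sketch assumes that each DP influences only the FR it is paired with, since the paper lists no off-diagonal influences. The helper names and the extra influence used in the second example are hypothetical and serve only to show how a coupling would be detected.

```python
# FR -> DP pairs of the "Web site appearance" branch of Fig. 2.
APPEARANCE_PAIRS = {
    "FR11 Easy navigation": "DP11 Navigation bar",
    "FR12 Locate the items": "DP12 Efficient search mechanism",
    "FR13 Web site consistency": "DP13 All pages look similar",
    "FR14 Color": "DP14 Color contrast",
    "FR15 Text": "DP15 Font size",
}


def design_matrix(pairs, extra_influences=()):
    """Build a 0/1 design matrix: rows are FRs, columns are DPs.

    Assumption: a DP influences only its paired FR, unless an (FR, DP)
    pair is listed in extra_influences.
    """
    frs = list(pairs)
    dps = list(pairs.values())
    A = [[1 if pairs[fr] == dp else 0 for dp in dps] for fr in frs]
    for fr, dp in extra_influences:
        A[frs.index(fr)][dps.index(dp)] = 1
    return A


def is_uncoupled(A):
    """True when exactly the diagonal elements are non-zero."""
    n = len(A)
    return all((A[i][j] != 0) == (i == j) for i in range(n) for j in range(n))


A = design_matrix(APPEARANCE_PAIRS)
print(is_uncoupled(A))  # True under the one-to-one assumption

# Hypothetical coupling: font size also affecting web site consistency.
B = design_matrix(
    APPEARANCE_PAIRS,
    [("FR13 Web site consistency", "DP15 Font size")],
)
print(is_uncoupled(B))  # False
```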
5 Conclusions
Axiomatic design is a recent approach to design activities. It provides a scientific basis for design. Moreover, it is very useful for obtaining a simple design with minimal information content. In this paper, a decomposition diagram was proposed to show that Axiomatic Design can be applied to e-commercial web sites. It was also argued that e-commercial web sites consist not only of usability aspects but also of procedural features. Apart from this,
References
1. Suh, N.P.: Axiomatic Design: Advances and Applications. Oxford University Press, New York (2001)
2. Ng, C.F.: Satisfying Shoppers’ Psychological Needs: From Public Market to Cyber-mall. Journal of Environmental Psychology 23, 439–455 (2003)
3. Trepper, C.: E-commerce Strategies: Mapping Your Organization’s Success in Today’s Competitive Marketplace. Microsoft Press, Washington (2000)
4. Rayport, J.F., Jaworski, B.J.: Introduction to E-commerce, 2nd edn. McGraw Hill, New York (2004)
5. Schniederjans, M.J., Cao, Q.: E-commerce: Operations Management. World Scientific Publication Co, Singapore (2002)
6. Kwan, I.S.Y., Fong, J., Wong, H.K.: An E-customer Behavior Model with Online Analytical Mining for Internet Marketing Planning. Decision Support Systems 41, 189–204 (2005)
7. Helander, M.G., Khalid, H.M.: Modeling the Customer in Electronic Commerce. Applied Ergonomics 31, 609–619 (2000)
8. Sung, T.K.: E-commerce Critical Success Factors: East vs. West. Technological Forecasting & Social Change 73, 1161–1177 (2006)
9. Engelhardt, F.: Improving Systems by Combining Axiomatic Design, Quality Control Tools and Designed Experiments. Research in Engineering Design 12, 204–219 (2000)
10. Thielman, J., Ge, P.: Applying Axiomatic Design Theory to the Evaluation and Optimization of Large-scale Engineering Systems. Journal of Engineering Design 17(1), 1–16 (2006)
11. Helander, M.G., Lin, L.: Axiomatic Design in Ergonomics and an Extension of the Information Axiom. Journal of Engineering Design 13(4), 321–339 (2002)
12. Bras, B., Mistree, F.: A Compromise Decision Support Problem for Axiomatic and Robust Design. Advances in Design Automation 65(1), 359–369 (1993)
13. Su, J.C., Chen, S., Lin, L.: A Structured Approach to Measuring Functional Dependency and Sequencing of Coupled Tasks in Engineering Design. Computers and Industrial Engineering 45, 195–214 (2003)
14. Lightner, N., Yenisey, M.M., Ozok, A.A., Salvendy, G.: Shopping Behaviour and Preferences in E-commerce of Turkish and American University Students: Implication from Cross-cultural Design. Behaviour and Information Technology 21(6), 373–385 (2002)
15. Konradt, U., Wandke, H., Balazs, B., Christophersen, T.: Usability in Online Shops: Scale Construction, Validation and the Influence on the Buyers’ Intention and Decision. Behaviour and Information Technology 22(3), 165–174 (2003)
Development of Quantitative Metrics to Support UI Designer Decision-Making in the Design Process
Young Sik Yoon and Wan Chul Yoon
Department of Industrial Engineering, KAIST, 373-1, Guseong-dong, Yuseong-gu, Taejeon, Korea
{nanhari, wcyoon}@kaist.ac.kr
Abstract. The UI designer must be able to anticipate the cognitive difficulties of users in the UI design process. However, the designer is likely to make erroneous judgments in the context of increasing functionality. Furthermore, time constraints in the development process exacerbate the design problem. There are various techniques to support the UI designer in the design process, including abstract design principles, specific design guidelines, design cases, design inspections, and design metrics. Metrics can summarize the status of a UI design solution more objectively and more accurately than human designers. This paper aims to develop quantitative metrics based on a unified framework for interaction design, which decomposes the UI design problem into four components: information architecture, task procedure, system dynamics, and physical interface. Three metrics are proposed to assist the designer’s decision-making: incongruity, complexity, and inefficiency. A case study shows that the proposed metrics can support the designer’s decision-making in an efficient manner.
Keywords: Model-based UI Design, Metrics, Design Aids, Usability.
another. Therefore, it is necessary to provide the designer with holistic information that describes multiple dimensions of the UI design space. In the present research, metrics are defined as numerical values that reflect the status of a UI design solution. A set of metrics can evaluate multiple design aspects, and thus can serve as a powerful tool in a design-evaluation process. There are two kinds of metrics: internal metrics and external metrics [4]. The former measure the internal attributes of the designed interaction, such as complexity, ambiguity, and efficiency. The latter represent the external perspective of the designed interaction when the system is in use, such as user performance time, number of errors, and subjective satisfaction. External metrics can be used as usability indicators [2]; however, they are ineffective for an economically feasible design-evaluation process. Although it is not easy to identify useful internal metrics, internal metrics can serve as effective and early indicators of the usability of a designed interaction. (Hereafter, ‘internal metrics’ will be referred to as ‘metrics’ for convenience.) This paper suggests three quantitative metrics that can support UI designer decision-making in the design-evaluation process. We apply a systematic model-based approach to capture usability information of a designed interaction. These metrics can help the UI designer make decisions in an efficient and effective manner. Furthermore, the metrics can be built into automatic design and evaluation tools, which can reduce the cognitive work demands imposed on the UI designer. This paper is organized as follows. Section 2 provides an overview of related works on metrics for supporting user interaction design. Section 3 describes a unified framework for interaction design and the three metrics: incongruity, complexity, and inefficiency.
Section 4 presents a case study where the proposed metrics were applied to support the UI designer. Finally, section 5 describes our conclusions and further research directions.
2 Background and Related Works
UI designers must anticipate users’ behaviors during their interaction with a system. The Task-Interface Matching, or TIM, framework provides a unified view of the use/design space to deal with the usability problem [10]. Under this unified framework, both the use and the interaction design are employed to relate the task-level knowledge with the interface-level knowledge. Matching between abstraction levels does not entail random coupling of tasks and interface means. The designers should comply with existing social standards and users’ prior knowledge at these levels. Therefore, the matching relations must be congruous with the top-down and the bottom-up expectations of users. That is, the designers’ TIM relations must be congruous with the users’ TIM relations. The UI design problem can be decomposed into two domains, the behavioral domain and the constructional domain [7]. The former concentrates on user-centered aspects of interaction design, while the latter focuses on system-centered aspects, such as every operation and state, and the transitions among them. Many researchers advocate a coupling method that utilizes functional and structural models for representing the behavioral and constructional domains. Navarre et al. proposed a tool
Y.S. Yoon and W.C. Yoon
that couples ConcurTaskTrees for task modeling and Petri nets for system modeling [11]. Lee and Yoon suggested a coupling method that integrates OCD for the functional model and statecharts for the structural model [9]. Model-based UI design is an effective approach to managing design complexity [12]. However, model-based design approaches can be limited when supporting tools are not available. Designing and evaluating with a manual approach is a time-consuming endeavor [8], imposes a high workload on the designers [3], and may end the design search with premature, suboptimal solutions [13]. One approach to address these limitations is to develop useful metrics and to guide the design search with them. Many kinds of metrics have been proposed to address different types of usability attributes in interaction design. First, complexity is one of the most popular and widely applied metrics in evaluating human-machine interaction [14-16]. Complexity can degrade human performance and cause human errors [18]. Lee et al. proposed the use of system entropy to assess the cognitive complexity of an interface. This metric predicts the difficulty of learning how to use an interface by taking into account the user’s state schemas [16]. Second, efficiency is another important metric frequently referred to in the literature on usability design [2, 17]. UI designers must provide an efficient procedure for the frequent and important tasks of users. Several members of the GOMS family can assess the efficiency of an interaction sequence by predicting the task execution time [17]. Finally, the prior knowledge of the user is an important issue for usability design. Many studies have suggested that learning and comprehending new knowledge can be facilitated when that knowledge is compatible with prior knowledge. There are many other possible metrics for usable interaction design.
However, we focus here on applying a few important metrics based on a unified UI design framework.
3 A Framework for Interaction Design and I2C Metrics
3.1 A Framework for Interaction Design
The unified framework for interaction design, shown in Fig. 1, consists of four components: information architecture, task procedure, system dynamics, and physical interface. First, information architecture (IA) is the structure within which the information, functionalities, and services are grouped. The IA component affects the user’s performance of navigation tasks. Second, task procedure (TP) is the functional model of user interaction. It represents user actions when interacting with a system. The OCD model is an effective tool to describe the task procedure in an operation-centered manner. Third, system dynamics (SD) is the structural model of user interaction. It describes the designed interaction in a state-centered manner. Finally, physical interface (PI) represents the physical aspects of the user interface, such as interface layout, UI controls, and information elements. The PI component provides a common ground for interaction between the user and system.
Fig. 1. A unified framework for user interaction design
3.2 Models for Task Procedure and System Dynamics
We can model each component of the framework described in the previous section with suitable modeling techniques. We use the OCD model for the task procedure and the state-operation matrix for some aspects of the system dynamics. The basic entities of the OCD are operation, abstract operation, state, state closure, and state header. The OCD can represent three procedural structures: sequence, branch, and loop. The elements of the state-operation matrix, $a_{ij}$, have a binary value of 0 or 1, which reflects the availability of the jth operation at the ith state. The notation is given in Fig. 2, and further details are presented in [5, 9, 16].

(a) Entities of OCD: operation, abstract operation, state, state header, state closure

(b) State-Operation Matrix
        O1  O2  O3  O4  O5  O6
  S1     1   1   0   0   0   0
  S2     1   1   0   0   0   0
  S3     0   0   1   1   1   1

Fig. 2. OCD notations and an example of an S-O matrix
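The example S-O matrix of Fig. 2(b) can be held in a small data structure. The following Python sketch stores the matrix with states as rows and operations as columns, following the definition of $a_{ij}$ in the text; the helper function names are assumptions made for illustration.

```python
STATES = ["S1", "S2", "S3"]
OPERATIONS = ["O1", "O2", "O3", "O4", "O5", "O6"]

# Rows are states, columns are operations (values from Fig. 2(b)).
SO_MATRIX = [
    [1, 1, 0, 0, 0, 0],  # S1
    [1, 1, 0, 0, 0, 0],  # S2
    [0, 0, 1, 1, 1, 1],  # S3
]


def is_available(state, operation):
    """Return True if the operation is available at the given state."""
    i = STATES.index(state)
    j = OPERATIONS.index(operation)
    return SO_MATRIX[i][j] == 1


def available_operations(state):
    """List the operations available at the given state."""
    i = STATES.index(state)
    return [op for op, a in zip(OPERATIONS, SO_MATRIX[i]) if a == 1]


print(is_available("S1", "O2"))    # True
print(available_operations("S3"))  # ['O3', 'O4', 'O5', 'O6']
```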
3.3 Proposed Metrics as UI Design Aids
We considered the following axioms to develop the metrics for usable interaction. First, the designed task procedure should be compatible with the prior knowledge of the user. That is, the designer must minimize the semantic gap between the user’s
procedural knowledge and the designer’s conceptualization. Second, the user should be able to perform a task in an efficient manner. For example, the designer can provide a shortcut for important and frequently performed tasks. Third, the designer must reduce the relational complexity within a state-transitional structure. In this work, we suggest three metrics: incongruity, inefficiency, and complexity. Detailed descriptions of the metrics are given in the following sections.

Table 1. The proposed metrics for interaction design
Metric Name  | Description | Related Model
Incongruity  | The transformational distance between the user’s procedural knowledge and the designed task procedure | OCD
Inefficiency | The length of the task procedure | OCD
Complexity   | The entropy within the designed state transitional relations | S-O Matrix
Incongruity. The task procedure can be represented by a diagrammatic model, OCD. While description of the designed interaction with the OCD is a straightforward process, some additional steps, shown in Table 2, are needed to represent the user’s prior knowledge. The incongruity is estimated by measuring the transformational distance between OCD pairs. There are many kinds of operators to transform one structure into another [19-20]. However, the psychological relevance of the operators is thus far unknown. Here, we chose three operators: insert, delete, and substitute. These operators are frequently cited in the literature, and thus we assume that they have psychological relevance. The incongruity for two OCDs of the ith task is defined as given in Eq. (1). The weight of each variable, $w_x$, can be estimated by a regression analysis of the data from the usability test. We assume that the state of the OCD has negligible effects on the incongruity, because users learn the procedural knowledge in an operation-centered manner.

Table 2. The process of representing user's procedural knowledge
Step descriptions, in order:
• Provide the user with a physical interface and a task list. Then, record the user’s expectation during the interaction.
• Refine the expected interaction procedures.
• Represent the informal protocol data with the OCD.
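A transformational distance built from insert, delete, and substitute operators can be illustrated with a weighted edit distance. The Python sketch below is not the authors' Eq. (1), which is not reproduced in this text. It assumes that each OCD has been flattened into a linear sequence of operation names (ignoring branches, loops, and states) and that each operator has a single weight; the function name, the default weights, and the operation names in the example are all hypothetical.

```python
def incongruity(user_ops, designed_ops,
                w_insert=1.0, w_delete=1.0, w_substitute=1.0):
    """Weighted edit distance that transforms user_ops into designed_ops.

    user_ops: operation sequence the user expects (assumed flattened OCD).
    designed_ops: operation sequence of the designed task procedure.
    """
    m, n = len(user_ops), len(designed_ops)
    # dist[i][j]: cost of transforming the first i user operations
    # into the first j designed operations.
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * w_delete
    for j in range(1, n + 1):
        dist[0][j] = j * w_insert
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = user_ops[i - 1] == designed_ops[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + w_delete,
                dist[i][j - 1] + w_insert,
                dist[i - 1][j - 1] + (0.0 if same else w_substitute),
            )
    return dist[m][n]


# Hypothetical example: the design needs one step the user did not expect.
expected = ["menu", "select_song", "play"]
designed = ["menu", "open_music_list", "select_song", "play"]
print(incongruity(expected, designed))  # 1.0 (one insertion)
```

In the paper the weights are to be estimated by regression on usability test data; the unit weights here simply count the operators, which corresponds to the unweighted calculation described later in the case study.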
Inefficiency. The inefficiency is defined as the weighted sum of the lengths of the designed task procedures. Each weight is a value between 0 and 1. The designer should allocate more weight to frequent and important tasks when calculating the
inefficiency. The inefficiency is calculated using Eq. (2). We estimate the length of the task procedure by counting the number of operations in OCDs.
$$IE(D) = \sum_{i} w_i \cdot L_i \tag{2}$$
where $L_i$ is the length of the task procedure for task i.
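Eq. (2) translates directly into code. In this minimal Python sketch, the task identifiers, lengths, and weights are made-up values used only for illustration; the function name is also an assumption.

```python
def inefficiency(task_lengths, weights):
    """Weighted sum of task procedure lengths, IE(D) = sum_i w_i * L_i.

    task_lengths: mapping task id -> number of operations in its OCD.
    weights: mapping task id -> weight between 0 and 1.
    """
    if task_lengths.keys() != weights.keys():
        raise ValueError("every task needs exactly one weight")
    for task, w in weights.items():
        if not 0 <= w <= 1:
            raise ValueError(f"weight for {task} must be between 0 and 1")
    return sum(weights[t] * task_lengths[t] for t in task_lengths)


# Hypothetical values: T1 is frequent and important, T4 is less so.
lengths = {"T1": 4, "T4": 6}
weights = {"T1": 1.0, "T4": 0.5}
print(inefficiency(lengths, weights))  # 7.0
```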
Complexity. The complexity is defined based on entropy, an information theory concept, reflecting the user’s schemas. Users attempt to identify whether an operation is available at a state while planning procedures for their tasks. Thus, the schemas originate from the similarity between states in which an operation is available. The process to calculate the complexity is described in Table 3 and Eq. (3).

Table 3. The process of calculating the complexity
Step 1. Represent the designed interaction with an S-O matrix (A):
$$A = \{a_{ij} \mid i = 1, \dots, n;\ j = 1, \dots, r\}, \quad a_{ij} = \begin{cases} 1, & \text{if } O_j \text{ is available at } S_i \\ 0, & \text{otherwise} \end{cases}$$
Step 2. Calculate a similarity matrix (C) based on matrix A:
$$I^{j}_{ik} = -\{\log P(a_{\cdot j} = a_{ij}) + \log P(a_{\cdot j} = a_{kj})\}$$
$$C = \{c_{ik}\}, \quad c_{ik} = \frac{\sum_{j \in S_{ik}} I^{j}_{ik}}{\sum_{j} I^{j}_{ik}}, \quad S_{ik} = \{\, j \mid a_{ij} = a_{kj} \,\}$$
Step 3. Calculate a weight matrix (W) based on matrix C:
$$C'_{ij} = \begin{cases} 1, & \text{for } i = j \\ \dfrac{C_{ij}}{\sum_{k \neq i} C_{kj}}, & \text{otherwise} \end{cases}$$
$$W = \{w_{ij}\}, \quad w_{ij} = \frac{C'_{ij}}{\sum_{j} C'_{ij}}$$
Step 4. Calculate an entropy matrix (I):
$$E[A] = W^{T} A$$
$$P(a_{ij}) = \begin{cases} E(a_{ij}), & a_{ij} = 1 \\ 1 - E(a_{ij}), & \text{otherwise} \end{cases}$$
$$I(a_{ij}) = -\log P(a_{ij})$$

$$CM(D) = \frac{\sum_{i} \sum_{j} I(a_{ij})}{nr}, \quad \text{where } A = \{a_{ij} \mid i = 1, \dots, n;\ j = 1, \dots, r\} \tag{3}$$
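Step 2 of Table 3 can be sketched in Python under explicit assumptions. The sketch treats rows of A as states and columns as operations, and it interprets $P(a_{\cdot j} = v)$ as the relative frequency of value v in column j, which the text does not state. It also returns a similarity of 1 when the denominator is zero (all columns constant), which is a convention chosen here. The function name is hypothetical, and steps 3 and 4 are not covered.

```python
import math


def similarity_matrix(A):
    """Similarity c_ik between states i and k of a 0/1 S-O matrix A.

    Assumption: P(a_.j = v) is the share of states whose entry in
    column j equals v.
    """
    n, r = len(A), len(A[0])
    # Share of states in which operation j is available.
    p_one = [sum(A[i][j] for i in range(n)) / n for j in range(r)]

    def prob(i, j):
        # Probability of the value observed at state i for operation j.
        return p_one[j] if A[i][j] == 1 else 1 - p_one[j]

    def info(i, k, j):
        return -(math.log(prob(i, j)) + math.log(prob(k, j)))

    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            total = sum(info(i, k, j) for j in range(r))
            shared = sum(info(i, k, j) for j in range(r)
                         if A[i][j] == A[k][j])
            # Assumed convention: identical constant rows are fully similar.
            C[i][k] = shared / total if total > 0 else 1.0
    return C


# Example S-O matrix from Fig. 2(b): S1 and S2 are identical, S3 differs.
A = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
C = similarity_matrix(A)
print(C[0][1], C[0][2])  # 1.0 0.0
```

The ratio $c_{ik}$ does not depend on the base of the logarithm, so the natural logarithm is used here for convenience.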
4 A Case Study: Designing Interaction of an MP3 Player
We introduce a simple example to demonstrate how the metrics are used as design aids in a UI design process. A multi-functional MP3 player is rather complex and can illustrate the utility of the proposed method. We made three design solutions by surveying and modifying a variety of UI cases. A detailed description follows.
4.1 Information Architecture and Physical Interfaces
We assume that the design solutions have a common function list and common information architecture. The designed MP3 player has the following four functions:
[Figure: information architecture of the MP3 player (MP3P: Music, FM Radio, Voice, Setup, with operations such as Play/Pause, FF/Rew, F.Scan/B.Scan, Vol++/Vol--, Ch++/Ch--, Ch Scan, Memory/M.Cancel, Record/Pause/Save, Focus++/Focus--, Select, Value++/Value--, and Back) and the physical interfaces of Solutions (1), (2), and (3)]
Fig. 3. Information architecture and physical interfaces of the designed MP3 player
(1) play MP3 files or radio channels, (2) record a voice or radio channels, (3) manage files or radio channels, and (4) set up the MP3 player. The information architecture and three design solutions are given in Fig. 3.
4.2 Modeling User Interaction
We conducted a task analysis and created an initial task list, which includes playing MP3 files, playing radio, recording voice, setting up the lighting time of the display, storing a radio channel in memory, and setting up the radio volume. Fig. 4 shows the OCD pairs and the state-operation matrix of solution (1).

[Figure: OCD pairs (User’s Procedural Knowledge and Designed Task Procedure) for T1: Playing MP3 and T4: Setting up the Lighting Time of Display, and the Designed System Dynamics as a state-operation matrix with nine operation columns; recovered operation labels include ▲, ▼, R, Rew, Rew-L, PP, FF, and FF-L. Matrix rows:]
  Menu           0 1 1 1 0 0 0 0 0
  Music_list     1 1 1 1 0 0 0 0 0
  Music_play     1 1 1 0 1 1 1 1 1
  Radio_play     1 1 1 1 1 1 0 1 1
  Voice          1 0 0 1 0 0 0 0 0
  Voice_record   1 0 0 1 0 0 1 0 0
  Setup          1 1 1 1 0 0 0 0 0
  Lighting_time  1 1 1 1 0 0 0 0 0
Fig. 4. Modeling task procedures and system dynamics
4.3 Metrics-Based Analysis of the Design Solutions
The design solutions were analyzed with the proposed metrics, as presented in Table 4. As an initial attempt, we calculated the incongruity without considering the relative importance of each transformational operator. That is, the incongruity value is an average of the number of transformational operators in a design solution. In
calculating the inefficiency, we assigned more weight to tasks 1, 2, 3, 5, and 6 than to task 4. As a result, the design solutions rank in usability as follows: design solution (1), design solution (3), and design solution (2).
Table 4. Comparison of the design solutions based on the proposed metrics
5 Conclusion and Further Research
In this work, a unified framework for interaction design is proposed and three quantitative metrics (incongruity, inefficiency, and complexity) are suggested. The metrics reported here are expected to complement usability testing by quantifying, to some extent, the usability attributes of an interaction design. This should help reduce the development costs of user interaction. As future work, we plan to validate the proposed metrics with a series of empirical tests. A metric-based design support system will also be developed to facilitate the UI design process.
References
1. Yoon, W.C.: Identifying, Organizing and Exploring Problem Space for Interaction Design. In: Proceedings of the 8th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems, Kassel, Germany, pp. 81–86 (2001)
2. Nielsen, J., Mack, R.L.: Usability Inspection Methods. John Wiley & Sons, Inc, West Sussex, England (1994)
3. Sears, A.: AIDE: A step toward metric-based interface development tools. In: Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, Pittsburgh, Pennsylvania, United States, pp. 101–110 (1995)
4. ISO: ISO/IEC DIS 14598-1 Information Technology – Evaluation of Software Products – Part 1, General Guide (1996)
5. Yoon, W.C., Park, J.S.: A diagrammatic model for representing user’s interface knowledge of task procedures. In: Proceedings of Cognitive Systems Engineering in Process Control, Kyoto, Japan, pp. 276–285 (1996)
6. Kang, H.G., Seong, P.H.: An Information Theory-Based Approach for Quantitative Evaluation of User Interface Complexity. IEEE Trans. on Nuclear Science 45(6), 3165–3174 (1998)
7. Hix, D., Hartson, H.R.: Developing User Interfaces – Ensuring Usability Through Product and Process. John Wiley & Sons, Inc, New York (1993)
8. Paterno, F.: Tools for Task Modeling: Where we are, Where we are headed. In: Proceedings of the 1st International Workshop on Task Models and Diagrams for User Interface Design, Bucharest, Romania, pp. 10–17 (2002)
9. Lee, D.S., Yoon, W.C.: Coupling structural and functional models for interaction design. Interacting with Computers 16, 133–161 (2004)
10. Yoon, W.C.: Task-Interface Matching: How we may design user interfaces. In: Proceedings of the 15th Triennial Congress of the International Ergonomics Association, Seoul, Korea (2003)
11. Navarre, D., Palanque, P., Paterno, F., Santoro, C., Bastide, R.: A tool suite for the coevolutionary design of user interfaces. In: Proceedings of the 8th International Workshop on Design, Specification of Interactive Systems, Glasgow, Scotland, pp. 88–113 (2001)
12. Paterno, F.: Model-based Design and Evaluation of Interactive Applications. Springer, Heidelberg (1999)
13. Visser, W.: Use of episodic knowledge and information in design problem solving. In: Cross, N., Christiaans, H., Dorst, K. (eds.) Analysing Design Activity, pp. 271–289. Wiley, New York (1996)
14. Rouse, W.B., Rouse, S.H.: Measure of Complexity of Fault Diagnosis Tasks. IEEE Trans. on Systems, Man, and Cybernetics 9(11), 720–727 (1979)
15. Payne, J.S., Green, T.R.G.: Task-Action Grammars: A model of the mental representation of task languages. Human-Computer Interaction 2, 93–133 (1986)
16. Lee, D.S., Yoon, W.C., Choi, S.S.: An Entropy-Based Measure for Evaluating the Cognitive Complexity of User Interface. Korean Journal of The Science of Emotion and Sensibility 1(1), 213–221 (1998)
17. John, B.E., Kieras, D.E.: Using GOMS for user interface design and evaluation: Which technique? ACM Trans. on Computer-Human Interaction 3(4), 287–319 (1996)
18. Wickens, C.D.: Engineering Psychology of Human Performance. HarperCollins Publishers Inc. (1992)
19. Bunke, H., Shearer, K.: A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19, 255–259 (1998)
20. Hahn, U., Chater, N., Richardson, L.B.: Similarity as transformation. Cognition 87, 1–32 (2003)
Scenario-Based Product Design, a Real Case
Der-Jang Yu and Huey-Jiuan Yeh
Scenario Lab, No 1, Lane 49, Sianjheng 1st St., Jhubei City, Hsinchu County 302, Taiwan
Creativity Lab, Industrial Technology Research Institute, 195 Sec. 4, Chung Hsing Rd. Chutung, Hsinchu County 310, Taiwan
Abstract. This paper proposes a simple framework for implementing SBD. This framework consists of four elements (a basic story structure, an innovation acceleration field, a tool for expressing ideas and describing scenarios, and an activity theory-based tension detector/idea stimulator) and a process based on the Chinese traditional literature four-stage creation process. A case study is presented at the end of the paper to demonstrate the feasibility of the proposed framework.
Keywords: Scenario-based design, Activity Theory.
1 Introduction
A product design team faces the challenge of working with people from multiple disciplines. Factors such as user needs, user experiences, and user emotion must also be taken into account. The team needs a simple, familiar, and intuitive method to bridge the gap not only between designers and users, but also among designers. Such a bridge can make the exchange of experience and the sharing of knowledge easy and effective. The scenario-based design (SBD) technique is a good approach for solving communication problems between design team members. The issue then becomes how to build a successful scenario-based product design process that resolves the aforementioned issues. This paper proposes a simple framework for implementing SBD. This framework consists of four elements (a basic story structure, an innovation acceleration field, a tool for expressing ideas and describing scenarios, and an activity theory-based tension detector/idea stimulator) and a process based on the Chinese traditional literature four-stage creation process. The target users of this framework are people who have received no training in usability engineering or user-centered design.
means the existing stories that lifestyle or ethnographic researchers collect from the field. A scenario, or “new scenario”, is the new stage that people want to move to from the old story. A scenario is generated from changes to the tool, people, or situation together with the resulting activity narration. Conversely, a scenario can also be generated from a new narration that causes changes to the tool, people, or situation. In our approach, we focus on making changes to the tool.
2.2 Innovation Acceleration Field
A field that accelerates the stimulation of ideas is created by identifying negative issues and positive expectations in the old story. These negative and positive issues give SBD users inspiration and stimulation for creating a new tool that completes the “new scenario” originating from the “old story”.
2.3 Idea-Scenario Sketch Sheet
The idea-scenario sketch sheet is used for expressing ideas and describing scenarios. The sheet has a drawing area on the right and space for text on the left. For an old story, the right side contains a snapshot or drawing related to the story, and the story is narrated on the left. Similarly, for a new scenario, the related sketch is presented on the right and the new scenario is narrated on the left. The positive and negative issues must also be addressed in the old story narration.
2.4 Activity Theory-Based Tension Detector/Idea Stimulator
The activity theory-based tension detector/idea stimulator helps SBD users interpret the activities in the old story. Following the actor-goal-tool concept of activity theory, an SBD user pretends to be the subject of the story and tries to understand the intended goal and the obstacle between that goal and the storyteller. The objective is to understand how the tool is used and where it falls short, and to create solutions based on the SBD user’s own knowledge and experience.
3 The Chinese Traditional Literature Four-Stage Creation Process
“Chi”, “chen”, “tsuang”, and “ho” mean “start”, “adapt”, “evolve”, and “conclude”, respectively. Together they form the four-stage creation process of traditional Chinese literature. Chinese writers use this process to create literary works, and it also lets readers enjoy and appreciate the aesthetics of Chinese literature while reading. We adopt this traditional process to guide the SBD team through the design stages, so that team members can enjoy and experience a journey leading to successful innovations.
• Start: the stage where the old story is collected and positive/negative issues are identified.
• Adapt: the stage where the SBD team studies the story and uses the activity theory-based tension detector/idea stimulator to come up with new ideas.
• Evolve: the stage where each SBD team member explores ideas or creates new scenarios, which in turn stimulate new ideas/scenarios. Once this process ends, all members share their results with each other.
• Conclude: the stage where the final ideas/scenarios are created by evaluating, selecting, combining, and refining the ideas/scenarios created in the previous stage.
[Figure: old story → 起 (start) → 承 (adapt) → 轉 (evolve) → 合 (conclude) → new scenarios; the Chinese traditional literature four-stage creation procedure]
Fig. 1. The four-stage process transforms the old story into new scenarios
4 A Case Study
A major scooter company in Taiwan wanted to design a new model using the SBD methodology.
4.1 The Start Stage
The project started with user research. Thirty-two pioneer users were selected from 300 target market users. After the focus group meeting, 10 were invited to participate in further ethnographic studies, which included scooter use history, a scooter usage diary, an introduction of each participant’s vehicle, and a one-on-one interview. Participants were asked to describe memorable and unique experiences of using their scooters. Based on the ethnographic studies, a new market segment was identified and selected as the target market for the new product. Ethnographic data was collected. Statements of interest that carried negative or positive issues were used to create 50 short stories.
[Figure: the start stage (起). An old story, with its positive and negative issues, is structured as people (人), tool (物), situation (境), and activity (活動).]
Fig. 2. The start stage: the old story is composed of people, tool, situation, and activity
4.2 The Adapt Stage
A multidisciplinary team was formed, comprising product designers, market planners, and product planners. At this stage, two workshops were held in which team members read the stories collected in the start stage and created idea-scenarios on the idea-scenario sketch sheets using the activity theory-based tension detector/idea stimulator. The stories allowed the team members to empathize with the storytellers. Each member was encouraged to create ideas for the stories that affected them most. As a result, ideas for new scooter functions that would satisfy the new target market were created.
[Figure: the adapt stage in the Chinese traditional literature four-stage creation procedure]
Fig. 3. The adapt stage: the innovation acceleration field and the activity theory-based tension detector/idea stimulator help SBD users create ideas
4.3 The Evolve Stage
About 50 idea-scenario sketches were created. During the presentation, each creator described in detail not only the new concepts and scenarios but also the process of creating them, so each member was able to understand the key points of the other members’ creations. This common understanding improved communication between team members and helped generate better ideas. Each idea-scenario sketch was evaluated by a voting process, and all idea-scenarios were grouped by similarity. The number of votes for each concept group indicated its value to the team.
[Figure: the evolve stage (轉), showing the storyboard sketch sheet and the idea-scenario sketch sheet within the four-stage creation procedure]
Fig. 4. The evolve stage: the idea-scenario sketch sheet and the activity theory-based tension detector/idea stimulator help SBD users produce idea-scenario sketches and share them with team members
4.4 The Conclude Stage
Idea-scenarios were selected from each group on the basis of the votes and used to create 12 advertisement posters for the new scooter show to be held two years later. A further vote selected the final advertisement poster to be used for the new-generation scooter.
[Figure: the conclude stage (合), leading to the new scenarios within the four-stage creation procedure]
Fig. 5. The conclude stage: the final ideas/scenarios are created by evaluating, selecting, combining, and refining the ideas/scenarios created in the previous stage
The results were used by the market planning division, which started the new product development. Because each team had sufficient ethnographic information, new ideas and scenarios, and experience of cooperating with the other teams, the new product development started smoothly.
5 Conclusions
SBD combined with the proposed start-adapt-evolve-conclude process is a powerful tool for product design. This approach is specific to Chinese culture and has been applied successfully in various design projects in Taiwan. Further study is needed to determine whether it can be adopted successfully in a non-Chinese setting.
Designing Transparent Interaction for Ubiquitous Computing: Theory and Application
Weining Yue, Heng Wang, and Guoping Wang
Department of Computer Science and Technology, Peking University
[email protected]
Abstract. Designing transparent interaction is important for ubiquitous computing (ubicomp). A psychological framework that characterizes users’ cognitive behavior in ubicomp environments would be invaluable for guiding interaction design to be optimally compatible with human capabilities and limitations. This paper proposes such a framework by analyzing cognitive skill and attention selectivity. Correspondingly, a context-sensitive multimodal architecture is presented at the technology level. A case study, in which the theory was implemented in a handheld hypermedia guide and deployed in the context of authentic use, is then discussed.
As Weiser indicated, the disappearance of computation is a fundamental consequence not of computing technology, but of human psychology. This implies that a psychological framework characterizing users’ cognitive behavior in ubicomp environments is important when designing transparent interaction. Such a framework would be invaluable for guiding interaction design to be optimally compatible with human capabilities and limitations. In this article, we propose a cognitive framework that describes users’ common features in cognitive skill and attention selectivity. A corresponding technical architecture is then presented to support general ubicomp interaction design. Finally, we introduce an application in which the theory was implemented and deployed in real-world use.
2 Cognitive Psychology Framework
With regard to psychological frameworks, distributed cognition [3] is regarded as a new foundation of human-computer interaction. Its central hypothesis is that the cognitive and computational properties of systems can be accounted for in terms of the organization and propagation of constraints. Interacting Cognitive Subsystems (ICS) [4] represents the human information processing mechanism as a highly parallel organization with a modular structure. The assumption is that we are dealing with a system of distributed cognitive resources, in which behavior arises out of the coordinated operation of the constituent parts. As a fundamentally systemic approach to mental processing, ICS encompasses all aspects of perception, cognition, and emotion, as well as the control of action and internal bodily reactions. Norman also examines various psychological issues in The Invisible Computer [5]. These works have made substantial contributions. However, none of them offers a methodology that can be picked off the shelf and applied to a design problem: considerable effort is needed to understand the concepts and to learn to interpret and re-represent the data captured in interaction design. In this paper we present a cognitive framework that can be adopted directly when designing ubicomp interaction.
2.1 Cognitive Skill
Interaction load is most directly affected by cognitive skill. According to the widely accepted ACT (Adaptive Control of Thought) model [6], cognitive skill develops in three stages. In stage one, the declarative stage, the user produces a crude approximation of the skill by using general-purpose problem-solving strategies to interpret facts about the skill. Performance is slow and error-prone, and working memory load is high because facts about the skill (e.g., the correct sequence of operations) must be actively rehearsed.
The second stage, the knowledge compilation stage, is characterized by speedup, more seamless performance, and the dropout of verbal mediation. During this phase, declarative facts about the skill are converted into procedural knowledge through knowledge compilation. Procedural knowledge is a collection of productions, that is, if-then statements that specify a cognitive condition and an action to be performed if that condition is met. Two mechanisms underlie knowledge compilation: composition and proceduralization. Composition collapses successive productions into single productions, and produces speedup and more
seamless performance. The extent to which composition can occur is determined by the capacity of working memory, because the conditions specified in a production must be represented in working memory. Through proceduralization, declarative facts are instantiated in productions, thereby eliminating the need to represent declarative information in working memory. Proceduralization is responsible for the dropout of verbal mediation. In the final phase, tuning, the search of alternative solution paths becomes more selective. Generalization, discrimination, and strengthening are the three learning mechanisms responsible for tuning. As cognitive skill develops, humans can perform tasks with fewer attentional resources and lower cognitive load. In desktop environments, users’ cognitive skill and knowledge of interacting with computers always begin at the first stage. Users have to spend a varying length of time learning how to get their tasks done with particular applications. During this period, interactions are inefficient and unnatural and divert much of the users’ attention from their own tasks. Moreover, a number of people cannot reach the second or third stage even after long use. This runs counter to the original purpose of ubiquitous computing. To minimize cognitive load and technological distraction, ubicomp applications should allow users to draw on the procedural or tuned knowledge and skills they have acquired in daily life when interacting with computers.
2.2 Attention Selectivity
In a ubicomp environment, the user’s cognitive and motor modalities, especially the eyes and hands, are often preoccupied by other tasks, so we also need to discuss attention selectivity. Attention refers to a human’s ability to concentrate on certain objects and allocate processing resources to them. We can think of it as a spotlight that we shine on things around us to make them “stand out”.
Though people try to devote attention to several things at the same time, our ability to do so is clearly limited. When we give our attention to some things, we inevitably ignore others. This cognitive limit puts computers in competition with other tasks and objects for the user’s attention. Related topics have been discussed in depth in desktop computing. However, attention competition is more complex and more important in ubicomp, since users are often preoccupied with other physical or mental tasks while interacting with pervasive devices. Compared with computer applications, such physical and mental tasks often attract the user’s attention more strongly. To explain this phenomenon, we need to address the allocation policy of human attention. As shown in Fig. 1, according to the physiological mechanism of attention [7], attention allocation generally starts with the stimulation of the sense organs. As to the psychological mechanism of attention, Kahneman explains in his classic capacity model of attention [8] that people do have some control over how they allocate the mental capacity of attention, and that the allocation policy is principally affected by two factors:
a) Intention and experience: the objects that users are more interested in and familiar with are more attractive.
b) Evaluation of demands on capacity: when several things compete for attention, people evaluate the demands each places on capacity and usually attend to those that demand less.
Fig. 1. Attention allocation
The disappearance of computers will reduce their probability of obtaining the user’s attention. Meanwhile, users are usually more familiar with and interested in their daily tasks than in computers. In addition, long experience of WIMP interaction leads many users to consider interacting with computers more difficult than performing their daily tasks. Thus the user’s attention to computing systems will decrease markedly in ubicomp, along with a reduction of explicit input. If interaction remains purely user-driven, the functionality of applications will inevitably be weakened even if users can interact multimodally. Given these features of attention selectivity, we should improve the adaptability and the activity of interaction so that tasks can be accomplished without depending entirely on explicit user input.
3 Interaction Architecture
To deal with the challenges posed by cognitive skill and attention selectivity, two techniques play fundamental roles: multimodal interaction and context awareness.
3.1 Multimodal Interaction
Humans speak, gesture, and write to communicate with other humans and to alter physical artifacts every day. For the majority of people, the knowledge and skills needed to perform and interpret multimodal interaction are already at the proceduralization or tuning stage. If ubicomp applications support more natural human forms of communication, they will create more natural and expressively powerful means of interaction and will significantly reduce cognitive load. Flexibility will also improve, since users can alternate modes and switch modalities as conditions change. For example, a person may use speech input to voice-dial a car cell phone, but switch to pen input to avoid disclosing private information during a public transaction. In ubiquitous computing, speech is considered the most promising modality. It is convenient because it is an entirely eyes-free and hands-free modality: users can interact with computers by speech when their eyes and hands are busy. Speech also has obvious limitations, however. First, speech interaction is error-prone because recognition technology is still not reliable at present. Furthermore, speech is slow for presenting information, is transient and therefore
difficult to review or edit, and interferes significantly with other cognitive tasks, especially in noisy transactions. Consequently, we use the pen as an important complement. The pen supports pointing, handwriting, and gesture, which millions of people perform every day. Speech and pen are complementary along many dimensions [9]. By combining them, parallel recognition and interpretation can yield a higher likelihood of correct recognition, and the strengths of one can be used to offset the weaknesses of the other.
3.2 Context Awareness
Observing communication between humans, we can see that a person’s action is always performed in a certain situation and that a great deal of information is exploited implicitly in the exchange of messages. What happens in the surrounding environment often supplies information that is vital for the communication. If computer applications can also use such information to characterize the situation, the activity and adaptability of interaction will be enhanced. In other words, computers should have a certain understanding of the user’s behavior and of the surrounding state in a given situation, and should use this contextual knowledge as additional input. The hints carried in context can help ubicomp appliances select the most appropriate mode and automate tasks, so that the attention demanded of the user is minimized. Context is an abstract concept and is therefore difficult to capture directly. Salber and Dey’s context toolkit [10] and Schmidt’s architecture [11] proposed the idea of layered abstraction to turn ambiguous environmental situations into executable contextual information. In this architecture, physical or logical sensors capture raw data from the environment. The raw data is then divided into several basic elements named cues, which provide an abstraction of the sensor data. Generally, each cue depends on a single sensor, but multiple cues can be calculated from the data of one sensor.
Finally, a clustering algorithm is used to cluster cues into contexts.
3.3 Integration
Multimodal interaction and context awareness are closely related. When integrating them at the semantic level, two problems in particular require attention: a) where and how context plays its role in the fusion, and b) how to decide when pieces of information conflict during integration. The way in which humans integrate information from multiple sources offers useful hints. Although the detailed mechanisms of information integration in the human brain are still uncertain, the recently accepted Fuzzy Logical Model of Perception (FLMP) [12] gives an overview. The model makes two central assumptions: a) the sources of information are evaluated independently of one another and are integrated multiplicatively to provide an overall degree of support for each alternative, and perceptual identification and interpretation follow the relative degree of support among the alternatives; b) the result of multimodal integration is not always all-or-none, but allows the fuzzy nature of the information to be reflected in the subject’s evaluation and response. If the result is fuzzy, it is integrated with contextual information to reach an unambiguous decision.
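The multiplicative integration rule in assumption a) can be written down directly. The following Python sketch is illustrative only: the function names `flmp_integrate` and `is_fuzzy`, the example alternatives, the support values, and the 0.8 threshold are assumptions made here, not values from this paper or from [12]. Each source assigns every alternative a degree of support in [0, 1]; the supports are multiplied across sources and then divided by the total over all alternatives, which gives the relative degree of support.

```python
from math import prod


def flmp_integrate(supports):
    """Combine independent sources multiplicatively and normalize.

    supports maps each alternative to a list of support values in [0, 1],
    one value per information source (for example speech and pen).
    Returns the relative degree of support for each alternative.
    """
    combined = {alt: prod(values) for alt, values in supports.items()}
    total = sum(combined.values())
    if total == 0:
        # No alternative has any support; fall back to a uniform result.
        return {alt: 1.0 / len(combined) for alt in combined}
    return {alt: value / total for alt, value in combined.items()}


def is_fuzzy(relative_support, threshold=0.8):
    """Treat the result as fuzzy when no alternative clearly dominates.

    The threshold is a hypothetical tuning parameter.
    """
    return max(relative_support.values()) < threshold


# Hypothetical supports from a speech recognizer and a pen recognizer.
supports = {
    "show_route": [0.7, 0.9],
    "show_location": [0.3, 0.1],
}
result = flmp_integrate(supports)
print(result)
print("needs context:", is_fuzzy(result))
```

When two sources agree, the product sharpens the decision; when every alternative receives the same support, the result stays evenly split and would be passed on to be combined with contextual information.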
In accordance with the FLMP model, the fusion of multiple sources of information in ubicomp interaction is also designed as a two-stage procedure. After being recognized by parallel recognizers, user inputs are assigned weight factors and integrated. If the result is clear enough to indicate an independent task and all of its parameters are available, the task is issued. Otherwise, the result is integrated with contexts in the second stage. If the result is still ambiguous, the application can ask the user to make the final decision. Context can also derive tasks independently. To control how tasks are communicated to the high-level application and to eliminate conflicts among them, a buffer algorithm (e.g., a statistical model or a neural network) is needed between the task space and the application. It checks whether transitions among tasks are probable. If a transition is not very likely, the application does not change to the new behavior. Generally, the longer the system is trained, the better its performance becomes. The fusion of multimodal interaction and context awareness also plays a part in output: applications select the most appropriate modes for presenting the feedback of the high-level application based on context. Fig. 2 illustrates the major components and data flow of the transparent interaction.
Fig. 2. Interaction architecture for ubiquitous computing
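The two-stage fusion and the buffer described in Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weighted-sum fusion, the multiplicative context prior, the thresholds, the first-order Markov transition check, and all names (`fuse`, `apply_context`, `decide`, `TransitionBuffer`, the task labels) are hypothetical. The check that all task parameters are available is omitted for brevity.

```python
from collections import defaultdict


def fuse(candidates, weights):
    """Stage 1: weighted sum of recognizer scores per candidate task.

    candidates maps modality -> {task: score}; weights maps modality -> weight.
    """
    totals = defaultdict(float)
    for modality, scores in candidates.items():
        for task, score in scores.items():
            totals[task] += weights.get(modality, 1.0) * score
    norm = sum(totals.values()) or 1.0
    return {task: value / norm for task, value in totals.items()}


def apply_context(scores, context_prior):
    """Stage 2: rescale each task by a context-derived prior and renormalize."""
    scaled = {t: s * context_prior.get(t, 1.0) for t, s in scores.items()}
    norm = sum(scaled.values()) or 1.0
    return {t: v / norm for t, v in scaled.items()}


def decide(candidates, weights, context_prior, threshold=0.75):
    """Return (task, stage); stage is 'inputs', 'context', or 'ask_user'."""
    scores = fuse(candidates, weights)
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, "inputs"
    scores = apply_context(scores, context_prior)
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, "context"
    return None, "ask_user"  # still ambiguous: let the user decide


class TransitionBuffer:
    """First-order Markov check on task-to-task transitions (assumed model)."""

    def __init__(self, min_probability=0.1):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.min_probability = min_probability

    def observe(self, previous, current):
        """Record one accepted transition as training data."""
        self.counts[previous][current] += 1

    def allows(self, previous, current):
        """Accept the transition only if it has been seen often enough."""
        row = self.counts.get(previous, {})
        total = sum(row.values())
        if total == 0:
            return True  # no history yet, so do not block anything
        return row.get(current, 0) / total >= self.min_probability


# Hypothetical recognizer output for one multimodal command.
candidates = {
    "speech": {"route_to": 0.6, "locate": 0.4},
    "pen": {"route_to": 0.5, "locate": 0.5},
}
weights = {"speech": 0.6, "pen": 0.4}
context_prior = {"route_to": 0.9, "locate": 0.2}
print(decide(candidates, weights, context_prior))

buffer = TransitionBuffer()
for _ in range(3):
    buffer.observe("locate", "route_to")
buffer.observe("locate", "zoom")
print(buffer.allows("locate", "route_to"), buffer.allows("locate", "rotate_map"))
```

In this example the inputs alone are not decisive, so the context prior settles the choice. The buffer becomes more selective as more transitions are observed, which mirrors the remark that performance improves with longer training.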
4 Application
Handheld guides have been demonstrated in several research projects and commercial applications as a way to ease the plight of tourists. We developed a hypermedia handheld guide system, named TGH (Tour Guide in Hand), in accordance with the principles and architecture described above. It illustrates the basic idea of transparent ubicomp interaction and also implements several novel facilities. We deployed it in practice to test the theory in a real-world setting.
4.1 Interaction Facilities
Users can perform tasks by speech, pointing, handwriting, sketching, or a combination of these, according to their own habits and the task type. Table 1 shows some representative cases.
Table 1. Representative cases of multimodal interaction
Task | Multimodal interaction modes
Look for the location of A | Speak the name of A, or write it by pen
Look for the route from current location to A | Input the entire command by speech, or speak “the way here” and point to A by pen synchronously
Look for all the restaurants in certain area | Speak “all the restaurants” while sketching an area by pen on the screen
Contexts in TGH include location (current and past), time, moving direction, device orientation, task type, the user’s operation habits, and so on. A Markov chain model is used as the buffer algorithm to control the tasks. As mentioned above, users’ responses to context-based adaptations are recorded to personalize subsequent interactions. The following example illustrates how contexts work.
Example one: Detecting and displaying the user’s location and trace on a digital map is a basic function of handheld guides. However, we find that users often have to turn the device or themselves to look for certain locations or routes, because the map is always oriented with NORTH up regardless of the user’s heading. This produces a mismatch between the physical representation and the user’s psychological representation of orientation. For a user walking south, the track moves “downwards” on the map, and physical objects on the user’s right are presented on the left. To solve this problem and allow users to locate objects more easily, the digital map in TGH rotates automatically to match the user’s direction. Users can also rotate the map back by sketching with the pen. If the system detects that a user does so several times, this adaptation is disabled for that user.
4.2 Performance Experiments
To evaluate the advantage of context-sensitive multimodal interaction over the context-insensitive unimodal style, we conducted user tests in two stages. In the first stage, which we ran on the campus of Peking University, we invited 36 undergraduates who were novices with handheld devices to take part in a formative and qualitative evaluation of performance in the laboratory. After a short training session, the participants were divided into three groups (A, B, and C) and given the same 12 typical guide-application tasks, shown in Table 2, to perform. To reduce the effect of chance errors, each task type was represented by two similar tasks.
In Group A, TGH worked in the traditional mode, in which users could interact only by pen and all context-based facilities except location tracing were disabled. In Group B, users were permitted to interact multimodally. In Group C, interaction was context-sensitive and multimodal. The results (see Fig. 3) showed that users in Group C achieved the shortest average completion time. In the experiment, we found that although the multimodal interface could improve efficiency and users did like being able to interact multimodally, they did not always do so when given a free choice. Generally, they preferred to interact unimodally for simple tasks (e.g., tasks 1-4) and multimodally for difficult or complex tasks (e.g., tasks 9-12).
Table 2. Experimental tasks
Task | Description
1 | Zoom in, zoom out and move the map
2 | Display all the restaurants on the map
3 | Look for the Stone Fish and point out its orientation
4 | Look for the original owner of the Stone Fish from its textual introduction
5 | Look for Boya Tower and point out its orientation
6 | Look for the original purpose of the Boya Tower from its textual introduction
7 | Look for the nearest ATM and point out its orientation
8 | Look for the nearest classroom and point out its orientation
9 | Look for the optimized route from West Gate to the library
10 | Look for the optimized route from the library to Boya Tower
11 | Look for the optimized route from current location to the Main Building
12 | Look for the optimized route from current location to the post office
Fig. 3. Average task completion time. x-axis: task number; y-axis: average completion time (in seconds).
Besides the formative experiments, we ran a second-stage series of tests to evaluate general user acceptance in the context of authentic use. Eighty-three visitors to Peking University, between 15 and 54 years of age, were invited to use TGH on the campus, which they had never visited before. During the tests, we observed the users’ interactions and recorded their reactions and comments. At the end of the test, each participant was asked to describe their experience with TGH and to evaluate it through a 22-question questionnaire covering ease of use, general helpfulness, multimodal interaction, awareness, and other aspects. The tests indicate that context-aware multimodal interaction does improve user acceptance and reduce effort. We also noticed that the theoretical advantages of the speech modality are currently compromised by reliability and other problems that stem essentially from recognition technology. This suggests that a speech-only interface is still inappropriate at present. We also carried out chi-square tests to find correlations between some of the survey’s variables. According to the tests, ease of use is strongly related to age (90 percent confidence), with users aged 19 to 44 enjoying the system more. The use of speech is closely related to the level of familiarity with computers (95 percent confidence), with novices being more likely to use speech. Enjoyability is correlated
with the user interface design, such as the quality of the images and buttons (95 percent confidence). These findings are valuable for designing hypermedia guide tools.
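The chi-square analysis mentioned above tests whether two categorical survey variables are independent. The paper does not report its contingency tables, so the sketch below uses hypothetical counts purely to show the calculation; the function name `chi_square_statistic` and the table layout are also assumptions. It computes the Pearson statistic for a contingency table and compares it with the standard critical value for one degree of freedom at the 95 percent level (3.841).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, NOT the study's data.
# Rows: novice / experienced computer users.
# Columns: used speech / did not use speech.
observed = [[30, 10], [15, 25]]

CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

statistic, dof = chi_square_statistic(observed)
significant = statistic > CRITICAL_95_DF1
print(f"chi-square = {statistic:.3f}, dof = {dof}, significant at 95%: {significant}")
```

For tables with more rows or columns, the critical value must be looked up for the corresponding degrees of freedom; a statistics library could be used instead of the fixed constant.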
5 Conclusion
Improving human-computer interaction is a great challenge for ubiquitous computing. To support the design of transparent interaction, this paper proposes a psychological framework that accounts for users’ cognitive behavior in ubicomp environments. By analyzing cognitive skill and attention selectivity, two principles are derived: a) allow users to draw on their procedural or tuned knowledge and skills when interacting with computers, and b) improve the adaptability and activity of applications. A context-sensitive multimodal architecture is then proposed to support universal interaction design. The implicit knowledge from contexts and the explicit user inputs are integrated at the semantic level in accordance with the FLMP model of multimodal integration in the human brain. Finally, a case study gives an overview of the handheld guide system TGH. User studies indicated that context-sensitive multimodal interaction does improve user acceptance and reduce effort.
Acknowledgements. This work was supported by NSFC (No. 60473100 and 60573151), and China 973 Program (No. 2004CB719403).
References
1. Weiser, M.: The Computer for the 21st Century. Scientific American 265(3), 94–104 (1991)
2. Garlan, D., Siewiorek, D.P., Smailagic, A., Steenkiste, P.: Project Aura: Toward Distraction-free Pervasive Computing. IEEE Pervasive Computing 1(2), 22–31 (2002)
3. Abowd, G.D., Mynatt, E.D.: Distributed Cognition: Toward a New Foundation for Human-Computer Interaction Research. ACM Trans. on Computer-Human Interaction 7(2), 174–196 (2000)
4. Barnard, P.J., Teasdale, J.D.: Interacting Cognitive Subsystems: A Systemic Approach to Cognitive-Affective Interaction and Change. Cognition and Emotion 5, 1–39 (1991)
5. Norman, D.: The Invisible Computer. MIT Press, Cambridge, Mass (1999)
6. Anderson, J.R.: Automaticity and the ACT Theory. American Journal of Psychology 105, 165–180 (1992)
7. Eysenck, M.W.: Principles of Cognitive Psychology. Psychology Press, UK (1997)
8. Kahneman, D.: Attention and Effort. Prentice-Hall, New Jersey (1973)
9. Oviatt, S.L., Cohen, P., et al.: Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions. Journal of Human-Computer Interaction 15(4), 263–322 (2000)
10. Salber, D., Dey, A.K., Abowd, G.D.: Aiding the Development of Context-Enabled Applications. In: Proc. Conf. Human Factors in Computing Systems, pp. 434–441 (1999)
11. Schmidt, A., Karlsruhe, U.: How to Build Smart Appliances. IEEE Personal Communications 8(4), 66–71 (2001)
12. Massaro, D.W., Stork, D.G.: Speech Recognition and Sensory Integration: A 240-year-old Theorem Helps Explain how People and Machines Can Integrate Auditory and Visual Information to Understand Speech. American Scientist 86(3), 236–245 (1998)
Understanding, Measuring, and Designing User Experience: The Causal Relationship Between the Aesthetic Quality of Products and User Affect
Haotian Zhou1,2 and Xiaolan Fu1,∗
1 State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Science, Beijing 100101, China
{zhouht, fuxl}@psych.ac.cn
2 Graduate School, Chinese Academy of Science, Beijing 100101, China
Abstract. This study tested the often-taken-for-granted assumption about the causal relationship between the aesthetic quality of products and user affect, using the affective priming paradigm. The results showed that when beautiful web-pages were used as primes, the discrepancy between the response latencies to positive targets and negative targets was larger than when the primes were ugly web-pages. A parallel pattern was obtained when pleasant and unpleasant pictures were used as primes. These findings support the hypothesis that the visual Gestalt of products can lead to affective change independent of reflective beauty judgment. The possibility of employing the affective priming procedure to measure product beauty is also discussed in the light of the experimental results.
Keywords: user experience, aesthetics, beauty, affect, affective priming.
1 Introduction
Since the first documented attempt to define user experience (UX) in 1996 [1], a major shift of focus from functionality and usability to non-pragmatic or hedonic aspects of products has been observed in the field of human-computer interaction and interaction design. The evidence available to date points to the same conclusion: the hedonic aspects of a given interface can significantly influence the user experience of that product [8, 19]. Of all the non-instrumental qualities of products, beauty has been gaining prominence amongst HCI researchers, and striving for beauty has become one of the ultimate goals of the product design process [4, 16]. Though a limited number of studies have yielded important findings attesting to the leverage beauty has on UX [17], the underlying mechanism remains nebulous [9]. Norman [15] argues that beautiful products induce positive affect in users, which in turn facilitates the user-product interaction process. Though Norman’s claim has enormous appeal among researchers [9], it is of both theoretical and practical importance to subject this
causal chain to careful examination. Until recently, much of the emphasis has been put on testing the link between positive affect and UX [e.g. 13], while the first part of Norman’s claim has been largely assumed to be true. For example, Hassenzahl [9] asserts that beauty judgment is driven by the affect evoked by the visual Gestalt of products (Fig. 1a). Yet, given the lack of empirical evidence, one can propose an alternative model in which the direction of causality between beauty judgment and elicited affect is reversed (Fig. 1b). Note that the principal difference between the two models is whether or not conscious aesthetic evaluation is a precondition for the visual Gestalt of products to exert an impact on users’ affective state.
Fig. 1. Outlines of two competing models of how beauty leads to affect: (a) Norman’s model; (b) alternative model. Despite their differences, we acknowledge that cognitive appraisal per se is capable of influencing UX (user experience).
Since the rationale behind many UX professionals’ advocacy of assigning more weight to aesthetics during the product design process rests heavily on Norman’s model [15], this growing interest in beauty might be rendered groundless if the alternative model proves to be right (see the Discussion section for a detailed explication). In fact, Hassenzahl [9] has indicated the necessity of carefully scrutinizing Norman’s claim. The lack of research discriminating between the two competing models may be due to the inability of traditional UX methodologies to dissociate affective processes from reflective processes. In the present research, we endeavored to tackle this issue by adopting an approach often used to investigate the interplay between cognition and emotion. In addition, we intended to demonstrate the possibility of adapting this paradigm for use as a promising measuring instrument of product beauty.
2 Methods
2.1 Overview
The affective priming paradigm developed by Fazio et al. [6] was adopted to assess the automatic affective response evoked by the visual Gestalt of products. The
underpinning of the affective priming paradigm is the so-called congruency effect: when the affect induced by the prime is of the same valence as the target (e.g. both are positive), the evaluation of target valence (i.e. whether it is negative or positive) is facilitated compared with the response to an incongruent target (in this case, a negative one). Thus, rather than explicitly asking participants about the affect elicited by certain stimuli (the primes), it is inferred from participants’ responses to other distinct yet affectively related stimuli (the targets) [10]. Consequently, this paradigm enables us to examine the affective effect of beauty free from possible distortion by conscious reflection (e.g. social desirability).
2.2 Participants
Twenty-five undergraduates participated in the experiment (12 men and 13 women), with a mean age of 21.5 years and an age range of 19-23 years. All of them were right-handed and had normal or corrected-to-normal vision. All participants were reimbursed upon completion of the experiment.
2.3 Materials
In this study, we concentrated specifically on the beauty of websites. Screenshots of 100 English-language web pages constitute the primary stimulus pool. Among them, 50 are badly designed web pages (Fig. 2a) taken from www.webpagesthatsuck.com and
Fig. 2. Examples of web-pages used in this study: (a) well-designed webpage; (b) bad-looking webpage
other sources; and the remainder are beautifully designed web pages (Fig. 2b) from a few design-award winner lists (e.g. www.worldwidewebawards.net). Thirty Chinese undergraduate students (15 males and 15 females) participated in a rating procedure designed by Lindgaard et al. [14], in which they were asked to give a beauty judgment for each of the candidate web-pages, presented one by one in random order. Fig. 3 depicts the time course of a single trial in the rating session: each webpage was on screen for 500 ms after an 800 ms fixation symbol, and participants then assigned a visual appeal score to that webpage via a sliding bar. Participants could not proceed to the next webpage until they had completed the rating. An average visual attractiveness score
was computed for each webpage following the algorithm suggested by Lindgaard et al. [14], and the candidates were ranked accordingly. The 20 most appealing and the 20 least appealing web-pages were retained for use in the subsequent experiment. Sixty affective pictures taken from the Native Chinese Affective Picture System [2] served as controls for the web-pages in the subsequent study. The valence of each picture was assessed on a 9-point valence scale, with 1 designating extremely unpleasant and 9 extremely pleasant. One third of the pictures are unpleasant, such as a bloody scene (mean rating = 2.18, SD = 0.20); 20 are of positive valence, such as a smiling baby (mean rating = 7.47, SD = 0.19); and the remainder are neutral, such as a common tool (mean rating = 4.99, SD = 0.17). The targets consist of 20 positive Chinese adjectives (e.g. outstanding) and 20 negative ones (e.g. selfish) selected from the list standardized by Luo and Wang. These words were rated using the same valence scale. The mean valence rating of the 20 negative adjectives is 2.81 (SD = 0.10), while that of the 20 positive ones is 6.81 (SD = 0.16). The average familiarity of the positive words is 5.22 (SD = 0.22), slightly higher than that of the negative ones, 4.73 (SD = 0.42).
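The selection step described above (average the ratings per webpage, rank, and keep the two extremes) can be illustrated with a minimal sketch. It assumes a plain arithmetic mean as a stand-in for the scoring algorithm of Lindgaard et al. [14], which is not detailed here; the function name `select_extremes`, the page identifiers, and the toy scores are hypothetical.

```python
from statistics import mean

def select_extremes(ratings, k):
    """Rank items by mean rating and return (k most appealing, k least appealing).

    ratings: dict mapping an item id to the list of scores given by all raters.
    Assumption: a plain arithmetic mean stands in for the scoring algorithm.
    """
    avg = {item: mean(scores) for item, scores in ratings.items()}
    ranked = sorted(avg, key=avg.get, reverse=True)  # best first
    return ranked[:k], ranked[-k:]

# Hypothetical toy data: 6 pages rated by 3 raters
# (the study itself used 100 pages, 30 raters, and k = 20).
toy_ratings = {
    "page_a": [80, 75, 90],
    "page_b": [20, 30, 25],
    "page_c": [60, 55, 65],
    "page_d": [10, 15, 5],
    "page_e": [95, 85, 90],
    "page_f": [40, 50, 45],
}
most_appealing, least_appealing = select_extremes(toy_ratings, k=2)
print(most_appealing, least_appealing)
```

With the toy data, the two highest-rated pages are `page_e` and `page_a`, and the two lowest-rated are `page_b` and `page_d`.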
Fig. 3. Time sequence of the webpage rating procedure
2.4 Experiment
The time sequence of the affective priming procedure is shown in Fig. 4. Each trial started with a fixation symbol (600 ms) followed by the prime (100 ms). The prime was followed by a blank screen lasting 50 ms, after which the target appeared. Participants were required to judge whether the target was a positive or a negative word as quickly and accurately as possible by pressing specified keys on the keyboard. After the judgment, the target disappeared and the program paused for 3 s before the next trial began. Reaction time and accuracy for each target were recorded. The 40 retained web-pages plus the 60 affective pictures constituted the prime pool of this experiment. The primes were classified into five categories: beautiful web-pages (BW), ugly web-pages (UW), pleasant pictures (PP), unpleasant pictures (NP), and neutral/control pictures (CT). The experiment employed a fully crossed 5 (prime category) by 2 (target valence: negative word vs. positive word) within-subject design. Each of the 40 targets appeared five times, once with each of the five prime categories, with the stipulation that the same instance from a given prime category could only be paired with one instance from a given target category. Such a pairing scheme
guaranteed that the same set of targets served as its own control with respect to prime categories. The whole experimental session consisted of a training block of 16 trials and 200 experimental trials, which were divided evenly into two blocks with a break in between.
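As a rough check of the design arithmetic, the sketch below enumerates the fully crossed 5 by 2 design and derives the interval from prime onset to target onset from the stated durations. It does not model the assignment of individual prime instances to targets; the names `build_trials` and `soa_ms` are illustrative only.

```python
from itertools import product

PRIME_CATEGORIES = ["BW", "UW", "PP", "NP", "CT"]
TARGET_VALENCES = ["positive", "negative"]
TARGETS_PER_VALENCE = 20  # 20 positive and 20 negative adjectives

# Event durations in ms, as stated in Section 2.4.
FIXATION_MS, PRIME_MS, BLANK_MS = 600, 100, 50

def build_trials():
    """Enumerate the crossed design: every target appears once per prime category."""
    return [
        {"prime_category": cat, "target_valence": val, "target_index": idx}
        for cat, val in product(PRIME_CATEGORIES, TARGET_VALENCES)
        for idx in range(TARGETS_PER_VALENCE)
    ]

trials = build_trials()
# Interval from prime onset to target onset (prime duration plus blank screen).
soa_ms = PRIME_MS + BLANK_MS
print(len(trials), soa_ms)  # 200 trials, 150 ms
```

Each of the ten cells of the design therefore holds 20 trials, which matches the 200 experimental trials reported above.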
Fig. 4. Time sequence of a single trial in the affective priming procedure
3 Results
3.1 Data Screening and Preliminary Data Analysis
The priming data were screened for outliers by excluding trials with reaction times below 250 ms or above 1000 ms (8.6% of all trials). After this correction, trials with an incorrect response (1.7% of the remaining trials) were also eliminated from subsequent analysis. Preliminary analysis showed that response latencies to positive targets (M = 608.9 ms) were shorter than those to negative targets (M = 632.18 ms), F(1, 24) = 17.15, p < .001. Such a positive-target premium (PTP) is a typical finding in previous affective priming studies [3, 20]. Past affective priming studies [3] have demonstrated that the direction of PTP score variations can be used to infer the valence of the affect elicited by primes. Specifically, a decrease in PTP score is associated with negative affect, and an increase with positive affect. Therefore, we investigated the priming effect of different types of primes by observing the variation of the PTP score as a function of prime category. The PTP score for each prime category was computed per subject by subtracting the mean response latency to positive targets from the mean latency to negative ones, resulting in five PTP scores (one for each of the five prime categories) per subject.
3.2 Priming Effects of Prime Categories
If beauty judgment is indeed an affect-driven response (i.e. affect change precedes beauty judgment, Fig. 1a), it can be hypothesized that the priming effect (indexed by PTP scores) of the visual attractiveness of web-pages (ugly vs. beautiful) should resemble that of the pleasantness of pictures (unpleasant vs. pleasant).
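The screening rules and the PTP computation of Section 3.1 can be sketched for a single participant as follows. The trial records, field names, and reaction times are hypothetical; only the 250 ms and 1000 ms cut-offs, the order of the two exclusion steps, and the definition of the PTP score come from the text.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 250, 1000  # cut-offs stated in Section 3.1

def screen(trials):
    """Drop outlier reaction times first, then incorrect responses."""
    in_range = [t for t in trials if RT_MIN_MS <= t["rt"] <= RT_MAX_MS]
    return [t for t in in_range if t["correct"]]

def ptp_score(trials, prime_category):
    """PTP = mean RT to negative targets minus mean RT to positive targets."""
    kept = [t for t in screen(trials) if t["prime"] == prime_category]
    negative = mean(t["rt"] for t in kept if t["valence"] == "negative")
    positive = mean(t["rt"] for t in kept if t["valence"] == "positive")
    return negative - positive

# Hypothetical trials for one participant and one prime category.
toy_trials = [
    {"prime": "BW", "valence": "positive", "rt": 580, "correct": True},
    {"prime": "BW", "valence": "positive", "rt": 600, "correct": True},
    {"prime": "BW", "valence": "negative", "rt": 640, "correct": True},
    {"prime": "BW", "valence": "negative", "rt": 620, "correct": True},
    {"prime": "BW", "valence": "negative", "rt": 1200, "correct": True},   # outlier, excluded
    {"prime": "BW", "valence": "positive", "rt": 590, "correct": False},   # error, excluded
]
print(ptp_score(toy_trials, "BW"))  # 630 - 590 = 40
```

A positive score means that positive targets were judged faster than negative ones after that prime category; repeating the call for each of the five categories yields the five PTP scores per participant.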
Fig. 5. Mean PTP scores of targets as a function of prime type and prime polarity (Table 1)
To test this hypothesis, data related to neutral primes were dropped, and the remaining four prime categories used in the previous analysis were recoded through two two-level variables: type (webpage and picture) and polarity (negative and positive). Table 1 shows how the prime categories fall into the cells determined by type and polarity. The PTP score was then analyzed via a 2 by 2 (type by polarity) ANOVA with both variables as repeated measures. The outcome clearly supports the hypothesis (Fig. 5). Of the two main effects and the one two-way interaction, only the main effect of polarity reached significance, F(1, 24) = 32.37, p < .001, with the PTP score associated with positive primes (M = 37.39, SD = 8.00) much higher than that associated with negative primes (M = 6.92, SD = 8.71).
Table 1. Correspondence between prime categories and their values on polarity and type

Polarity    Type: picture    Type: webpage
negative    NP               UW
positive    PP               BW
Post hoc comparisons showed that priming with ugly web-pages and with unpleasant pictures both led to a significantly decreased PTP score compared with neutral primes: both ts(24) > 2.30, ps < .03. On the other hand, when the priming effects of
beautiful web-pages and pleasant pictures were each compared with those of neutral primes, PTP scores increased as predicted. However, neither increase reached significance, ps > .45 (a detailed account of this unexpected finding and its implications is provided in the Discussion section). The means and standard deviations of the PTP scores associated with all five prime categories are displayed in Table 2.
Table 2. Means and standard deviations for PTP score as a function of prime categories

Prime type    Mean (ms)    Std Deviation (ms)
BW            35.43        41.50
UW            14.06        49.37
PP            39.35        48.90
NP            -0.23        47.82
CT            32.79        47.24
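The cell means in Table 2 can be cross-checked against the polarity means reported for the ANOVA. The sketch below assumes that every participant contributed to every cell, so that each marginal mean equals the average of its two cell means; the variable names are illustrative.

```python
# Mean PTP scores (ms) copied from Table 2.
ptp_mean_ms = {"BW": 35.43, "UW": 14.06, "PP": 39.35, "NP": -0.23, "CT": 32.79}

# Marginal means over prime type (webpage and picture) for each polarity.
positive_polarity = (ptp_mean_ms["BW"] + ptp_mean_ms["PP"]) / 2  # about 37.39
negative_polarity = (ptp_mean_ms["UW"] + ptp_mean_ms["NP"]) / 2  # about 6.915, reported as 6.92

# Shift of each category relative to the neutral (CT) primes.
shift_vs_neutral = {
    cat: round(m - ptp_mean_ms["CT"], 2)
    for cat, m in ptp_mean_ms.items()
    if cat != "CT"
}
print(positive_polarity, negative_polarity, shift_vs_neutral)
```

The negative categories fall well below the neutral baseline (by roughly 19 ms and 33 ms), whereas the positive categories exceed it by only about 3 ms and 7 ms, which is consistent with the pattern of post hoc results described above.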
4 Discussion
4.1 Beauty and Affect
In the present study, we found that the influence of ugly web-pages on the subsequent adjective-evaluation task is similar to that of unpleasant pictures, while that of beautiful web-pages is similar to that of pleasant pictures. It can be inferred by analogy that ugly web-pages induced negative affect in participants, whereas beautiful web-pages pushed the affective state in the positive direction. Participants were required to concentrate on the target evaluation task rather than on the primes. With little attention directed to the prime, any affective reaction evoked by it was unlikely to be contaminated by conscious reflective processing, and can thus be seen as the direct outcome of the visual Gestalt’s impact on the affective system (Fig. 1a). Such findings bear out the often-taken-for-granted assumption about the link between beauty and affect: the visual Gestalt of products is capable of changing the affective state of users independent of explicit judgment about product aesthetics. What would be the consequences if the alternative model outlined in Fig. 1b were true? According to this competing model, the visual Gestalt of a product cannot affect users’ affective state until users make a conscious aesthetic judgment about its appearance. Yet, in most goal-mode (i.e. driven by a predetermined goal) [7] user-product interaction contexts, users seldom explicitly evaluate the visual appeal of a product before proceeding to interact with it. Accordingly, at least in the case of goal-mode products, much of the discussion about making beauty a design goal would become irrelevant, because in this model the involvement of conscious reflection is required for beauty to influence UX (Fig. 1b).
4.2 Direct Measure vs. Indirect Measure
The affective priming paradigm has recently been developed as an indirect measure of stimulus valence (e.g. of food or friends) to replace the more conventional direct measures (e.g. questionnaires) in the field of social psychology [5]. The advantage of indirect measures over direct measures is that the former are less susceptible to distortion and bias by respondents (e.g. impression management). However, the UX field has been slow in taking up this recent advance in measuring methodology, and traditional measuring instruments (e.g. Likert scales) still predominate [12]. In fact, the unexpected result of our study testifies to the possible caveats associated with direct measurement. Recall that, compared to neutral pictures, neither beautiful web-pages nor pleasant pictures increased PTP scores significantly as predicted. There are two possible explanations for this finding. The first is that both pleasant pictures and beautiful web-pages failed to alter the affective states of participants, assuming that neutral pictures did not induce affective changes. However, this account can hardly be reconciled with two facts: (1) past affective priming studies have consistently demonstrated that positive pictures can induce affective reactions [e.g. 11, 19], and (2) in this study, negative primes resulted in affect change, as indexed by a significant PTP score decrease. Hence, we believe the alternative account makes more sense: neutral pictures led to positive affect in participants. After reexamining the neutral pictures, we noticed that most of them are beautifully designed artifacts (e.g. antiques) or pleasing geometric patterns (Fig. 6). Their neutrality on the rating scale is likely due to distortion by raters’ cognitive processes. Despite the initial pleasant affect brought about by the visual Gestalt of, say, a streamlined hairdryer, the raters might have engaged in a reflective process (e.g. how can such a mundane object make me feel good?) that dismissed this affective change, and therefore rated the picture as emotionally bland. Recently, Zhou et al. [21] discovered that Chinese participants took a neutral schematic face (explicitly rated as such) as a positive one in an implicit task (i.e. categorization). Admittedly, including an extra condition in our experiment, in which no primes precede the targets, would help clarify this ambiguity. Nonetheless, the present study points to the caution one should take when interpreting the results of beauty studies employing direct measuring instruments.
Fig. 6. Examples of emotionally-neutral pictures used in the experiment
Notwithstanding the uncertainty concerning neutral pictures, the present research unequivocally demonstrated that the PTP score was capable of distinguishing good designs from bad ones. We have therefore provided some empirical evidence suggesting that the affective priming paradigm can serve as a promising alternative measuring instrument in future UX research.
5 Conclusion
By differentiating between two opposing accounts of how beauty creates affect (Fig. 1), the present study demonstrated that product beauty is one of the ‘many circumstances in which affective reaction precedes the very cognitive appraisal on which the affective reaction is presumed to be based’ [18]. Note that in our experiment, the affective changes of participants were inferred from PTP scores; such evidence is therefore inconclusive at best. Clearly, more research employing alternative procedures, such as physiological instruments, is needed to validate our conclusion. By showing that beautiful and ugly products can be set apart by their effect on PTP scores, this study has interesting implications for the important question of how to measure beauty [9]. Though we speculated about the possibility of adopting the affective priming paradigm to measure the visual attractiveness of products, the adjustments necessary for achieving this end have yet to be specified.
Acknowledgments. This research was supported by grants from 973 Program of Chinese Ministry of Science and Technology (#2006CB303101), and the National Natural Science Foundation of China (#60433030).
References
1. Alben, L.: Quality of Experience: Defining the Criteria for Effective Interaction Design. Interactions 3, 11–15 (1996)
2. Bai, L., Ma, H., Huang, Y.X., Luo, Y.J.: The Development of Native Chinese Affective Picture System-A Pretest in 46 College Students. Chinese Mental Health Journal 19, 719–722 (2005)
3. Banse, R.: Affective Priming with Liked and Disliked Persons: Prime Visibility Determines Congruency and Incongruency Effects. Cognition & Emotion 15, 501–520 (2001)
4. Desmet, P.M.A., Hekkert, P.: The Basis of Product Emotions. In: Green, W., Jordan, P. (eds.) Pleasure with Products, beyond Usability, pp. 60–68. Taylor & Francis, London (2002)
5. Fazio, R.H., Olson, M.A.: Implicit Measures in Social Cognition Research: Their Meaning and Use. Annual Review of Psychology, pp. 297–328 (2003)
6. Fazio, R.H., Sanbonmatsu, D.M., Powell, M.C., Kardes, F.R.: On the Automatic Activation of Attitudes. Journal of Personality and Social Psychology 50, 229–238 (1986)
7. Hassenzahl, M.: The Thing and I: Understanding the Relationship between User and Product. In: Blythe, M., Overbeeke, K., Monk, A., Wright, P. (eds.) Funology: From Usability to Enjoyment, pp. 31–42. Kluwer Academic Publishers, Dordrecht Boston London (2003)
8. Hassenzahl, M.: The Interplay of Beauty, Goodness, and Usability in Interactive Products. Human-Computer Interaction 19, 319–349 (2004)
9. Hassenzahl, M.: Aesthetics in Interactive Products: Correlates and Consequences of Beauty. In: Schifferstein, H.N.J., Hekkert, P. (eds.) Product Experience. Elsevier, Amsterdam (2006)
10. Hermans, D., Baeyens, F., Lamote, S., Spruyt, A., Eelen, P.: Affective Priming as an Indirect Measure of Food Preferences Acquired through Odor Conditioning. Exp. Psychol. 52, 180–186 (2005)
11. Hermans, D., Spruyt, A., De Houwer, J., Eelen, P.: Affective Priming with Subliminally Presented Pictures. Can. J. Exp. Psychol. 57, 97–114 (2003)
12. Kuniavsky, M.: Observing the User Experience: A Practitioner’s Guide to User Research. Morgan Kaufmann, San Francisco (2003)
13. Lyubomirsky, S., King, L., Diener, E.: The Benefits of Frequent Positive Affect: Does Happiness Lead to Success. Psychological Bulletin 131, 803–851 (2005)
14. Lindgaard, G., Fernandes, G., Dudek, C., Brown, J.: Attention Web Designers: You Have 50 Milliseconds to Make a Good First Impression! Behaviour & Information Technology 25, 115–126 (2006)
15. Norman, D.A.: Emotional Design: Why We Love (Or Hate) Everyday Things. Basic Books, New York (2004)
16. Norman, D.A.: Introduction to This Special Section on Beauty, Goodness, and Usability. Human-Computer Interaction 19, 311–318 (2004)
17. Tractinsky, N., Katz, A.S., Ikar, D.: What is Beautiful is Usable. Interacting with Computers 13, 127–145 (2000)
18. Zajonc, R.B., Markus, H.: Affective and Cognitive Factors in Preferences. The Journal of Consumer Research 9, 123–131 (1982)
19. Zhang, P., Li, N.: The Importance of Affective Quality. Communications of the ACM 48, 105–108 (2005)
20. Zhang, Q., Li, X.: Affective Priming Effects under Two SOA Conditions. Chinese Journal of Applied Psychology 11, 154–159 (2005)
21. Zhou, G., Fu, X., Hayward, W.G., Locke, V., Pellicano, E.: Cultural Difference in the Application of the Diagnosticity Principle to Schematic Faces. Journal of Cognition and Culture 5(1), 240–247 (2005)
Enhancing User-Centered Design by Adopting the Taguchi Philosophy
Wei Zhou1,2, David Heesom2, and Panagiotis Georgakis1
1 West Midlands Centre for Constructing Excellence (WMCCE), The Development Centre, Wolverhampton Science Park, Wolverhampton, WV10 9RU, U.K
2 School of Engineering and the Built Environment, University of Wolverhampton, Wulfruna Street, Wolverhampton, WV1 1SB, U.K
{wei.zhou,d.heesom,p.georgakis}@wlv.ac.uk
Abstract. Since the 1980s, User-Centered Design (UCD) has become increasingly popular in the ICT industry. It helps designers seek usable designs through a set of workflows, evaluation methods, and design approaches, which together constitute a comprehensive UCD framework. With its extensive use, its pitfalls have also been exposed with respect to cost-benefit, robustness, and optimization. Applying the Taguchi Method can remedy these pitfalls and yield robust, optimal designs. This approach is feasible but has received little emphasis in the Human-Computer Interaction field. From a theoretical perspective, this paper describes a practical approach to enhancing the UCD framework by adopting the Taguchi philosophy. Based on an analysis of the UCD framework and the Taguchi Method, it discusses key adaptation points for adopting the Taguchi philosophy in the UCD framework. As a result, the Taguchi-Compliant User-Centered Design (TC-UCD) framework is proposed.
Keywords: Taguchi-Compliant User-Centered Design, the Taguchi Method, usability, User-Centered Design.
this repetitiveness feature is a weakness of the UCD method. It costs a substantial amount of time and money to ensure usability. Secondly, the current UCD is not highly effective at achieving robust usability. Design and evaluation in the UCD are two separate phases, with no relationship or mechanism to integrate them. Nielsen (1993, p. 107) pointed out that in an iterative design it is likely that "additional usability problems appear in repeated tests after the most blatant problems have been corrected". This shows that unstructured iterative design is incapable of delivering a robust design, and inevitably introduces uncertainty into achieving robust usability. Last but not least, it is questionable whether today’s UCD can be applied to obtain an optimized design. In many circumstances, a design solution might have different options for its design components. To pick out the best one, the normal approach is to compare all the design options. Nonetheless, the current parallel design approach in the UCD is essentially a collection of several independent designs, which are unsystematic and lack analytical comparison. Moreover, evaluation analysis methods such as within-subjects and between-subjects designs are weak in dealing with multi-variable situations, which normally involve a large number of design choices to evaluate and analyze. Undertaking extensive optimization work with those evaluation methods in the UCD is a formidable task. In these respects, ironically, current UCD approaches are unusable for designers seeking robust, optimal usability in their designs. Total Quality Management (TQM) theory offers inspiration for enhancing the robustness and optimization of the UCD method. The philosophy underlying the Taguchi Method advocates designing quality into the product during the design process rather than after the design is complete.
Comparably, it is possible to design usability into the user interface design process, and hence significantly shorten design-testing cycles, save cost in usability testing, and deliver an optimized, robust design. Its applicability in HCI was demonstrated in the early 1990s (Reed, 1992). Applying the Taguchi Method, Smith (1996) attempted to create another design approach, Logical User Centered Interface Design (LUCID, referred to as LUCID-Smith in this paper). Unfortunately, it lacks explicit specifications for adopting UCD elements, even though the UCD has been increasingly emphasized in the HCI realm. However, a pioneering design project (Zhou, 2005) has shown that combining the essentials of UCD with the Taguchi philosophy can overcome those weaknesses of the UCD and achieve optimized, robust usability. The aim of this paper is to describe an enhanced UCD framework that adopts the Taguchi philosophy. Firstly, the UCD methodology is described to show its fundamental elements. Secondly, the Taguchi Method is reviewed to highlight its key concepts and features. Thirdly, the integration of UCD essentials with the Taguchi philosophy is discussed to show their correlation and the adaptations required. Based on the analysis of these theories and methodologies, a new Taguchi-Compliant User-Centered Design (TC-UCD) framework is proposed.
2 UCD Framework
UCD was advocated by Donald Norman in the 1980s (Norman, 1988). It recommends placing the user at the center of the design. Since its inception, it has developed into a substantial framework with various methods for requirement analysis, design,
and evaluation in usability engineering. A few UCD methodologies have been devised by HCI specialists, institutions, and organizations to meet needs within the framework. This section outlines the UCD framework in terms of UCD workflow, evaluation methods, and design approaches.
2.1 UCD Workflow
Mayhew (1999) introduced a detailed roadmap of the usability engineering lifecycle, which can guide practitioners to a usable design step by step. In this roadmap, a complete usability engineering lifecycle consists of requirement analysis, design/testing/development, and installation, in each of which a series of specific activities needs to be carried out to achieve certain goals. Roughly speaking, requirement analysis deals with the user profile, contextual task analysis, platform capabilities and constraints, and general design principles. These analyses help determine usability goals. The design/testing/development phase plays a central role in the roadmap. On the one hand, it applies design strategies derived from the preceding requirement analysis phase to prototyping and development. On the other hand, through testing it provides feedback to the preceding phase for adjustment and improvement. The installation phase gathers deployment feedback for further design improvement. In addition to this roadmap, similar UCD models have been proposed by Nielsen (1993), Hix (1999), and commercial institutions such as Cognetics Corp., which created another LUCID (Logical User Centered Interaction Design) framework. Essentially, the backbone of these models can be generalized into several key parts: user study for requirement analysis, conceptual design/development for prototyping and implementation, usability testing for finding usability problems, and deployment for identifying points of design improvement.
2.2 Evaluation Method
There are three types of evaluation method for usability testing: heuristic evaluation, formative evaluation, and summative evaluation.
Their cost varies from low to high according to the number of users needed in usability testing (Hix, 1999). Heuristic evaluation applies existing design guidelines or checklists without involving real users. Formative evaluation is often applied in design-testing cycles to identify usability problems. It can produce both qualitative (narrative) and quantitative (numeric) results. Summative evaluation is used when finalizing a design in order to obtain statistical information. Besides these general types of usability evaluation, Nielsen (1993, p. 224) summarized popular usability methods and their suitability. Among these methods, thinking aloud is a typical formative evaluation, which can be further developed into several methods such as constructive interaction (codiscovery), retrospective testing, and the coaching method.
2.3 Design Approach
Iterative design and parallel design are two approaches in the UCD. The mainstream approach in the HCI field is iterative design. Usually, it is realized through several design-testing cycles that incrementally achieve usability. Mayhew (1999) applied this approach
extensively in the roadmap. Such an iterative approach has been considered the best choice in user interface/interaction design. However, Dix (2003) pointed out that the iterative approach might be prevented from reaching the best design by an inappropriate starting point. To overcome this pitfall, he argued that it is crucial to have a good initial design based on experience and judgment. Another approach is to have several initial design ideas and drop them one by one as they are developed further. Dix’s (2003) suggestion is consistent with Nielsen’s parallel design model (Nielsen, 1993, p. 86), in which several independent designs are produced simultaneously by different designers, and their merits are then merged into a new design for further iterative design. A case study (Nielsen, 1996) reported that parallel design is more expensive than iterative design because it consumes more resources. Nevertheless, it can shorten time-to-market and explore the design space in less time. It was noted that the merged design depended on the senior designer’s subjective judgment and individual experience. The study concluded that this method is not recommended for all projects because of its cost, unless time-to-market is of the essence.
3 Taguchi Method

The Taguchi Method was devised by Dr. Genichi Taguchi in the late 1940s. It has a strong theoretical relationship with Design of Experiments (DoE), which was founded by Sir Ronald Fisher in the 1920s. Initially, the Taguchi Method was created for quality control, to deliver robust products. Nowadays, it is applied in a wide range of fields. Thousands of successful cases from diverse companies have been reported in the past 40 years. Its detailed theory and applications are available in the references (Ross, 1988; Taguchi, 2000).

3.1 Robust Design

Dr. Taguchi established a philosophy of robust design. He defines “robustness” as: the state where the technology, product, or process performance is minimally sensitive to factors causing variability (either in the manufacturing or user’s environment) and aging at the lowest unit manufacturing cost (Taguchi, 2000). Following this philosophy, the designer’s goal should be to reduce output variability in the presence of noise. Traditionally, design proceeds as follows: design → test → find problem → solve problem → test → find problem → … until the problem is eliminated. Such a “plug-the-leak” design approach is just what UCD follows. It is a time-consuming and costly way to improve design quality. The Taguchi Method breaks with this conventional design approach. It advocates designing quality into the product instead of inspecting it after production. To realize this aim, a three-stage design process is suggested for product quality control.

3.2 Three Stage Design

Stage 1 - System Design. The focus of system design is to determine suitable working levels of design factors. The Taguchi Method treats design in an analytical way. It identifies design issues as design factors, design levels, and noise factors.
W. Zhou, D. Heesom, and P. Georgakis
Design factors are the main controllable design issues in product creation. They directly influence product performance. Design levels are the options of a design factor. Noise factors are uncontrollable external issues, which usually interfere with product performance. Basically, noise factors come from three sources: outer noise (environment), inner noise (the product itself) and between-product noise (piece-to-piece variation). The choice of design factors, levels and noise factors can be decided by the researcher’s judgment based on the selected materials, parts and technology.

Stage 2 - Parameter Design. Parameter design seeks the design factor levels that produce the best performance of the product/process under study. These optimal conditions are selected so that the influence of uncontrollable factors (noise factors) causes minimum variation in system performance. To search for optimal conditions that are insensitive to noise, a partial factorial experiment is used rather than a tedious full-factorial experiment. The Orthogonal Array (OA) and its optimization analysis play a key role in this stage. An OA consists of an inner array and an outer array. The inner array arranges design factors and their levels into a group of parallel trials for an experiment, so that a fraction of the trials tests the whole set of design combinations. Its balance and orthogonality make the experimental results comparable, and thus dramatically decrease the number of experiments. An outer array, likewise, creates different noise situations for testing those design combinations. Optimization analysis of the experimental results finds the most robust design levels, from which optimized design combinations are created. Besides the often-used ANOVA method, Dr. Taguchi suggested using the Signal-to-Noise (S/N) ratio to identify the design levels that provide the best performance under study.
A confirmation experiment should follow to verify the validity of the optimized designs.

Stage 3 - Tolerance Design. Tolerance design refines the results of the parameter design by tightening the tolerance of factors. It is possible that design levels are improperly chosen by designers. Even after optimization in the experiment, the optimized designs might not show the desired performance in the confirmation experiment. In this situation, the design levels need to be adjusted and another design-testing cycle initiated.

In accordance with the three-stage design and its OA rationale, the Taguchi Method constructs a design space, or problem space, from design factors and levels, in which optimal design solutions can be sought by optimization analysis. In contrast to the “plug-the-leak” way of aimless searching, this target-oriented approach ensures that a robust design is achieved, and is also applicable to the quality control of user interface/interaction design, that is, usability.
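The parameter-design stage can be illustrated with a minimal Python sketch. It is not taken from the paper: the array is the standard L9 orthogonal array (four factors, three levels each), the task times are invented illustrative data, and the smaller-the-better S/N form, η = −10·log10(mean(y²)), is one of Taguchi's standard definitions chosen here as an assumption, since the text does not say which form is used. All function names are hypothetical.

```python
import math
from itertools import combinations, product

# Standard L9(3^4) orthogonal array: 9 trials, 4 factors, levels coded 0, 1, 2.
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]


def is_orthogonal(array, levels=3):
    """True if every pair of columns holds each level combination equally often."""
    n_cols = len(array[0])
    expected = len(array) // (levels * levels)
    for a, b in combinations(range(n_cols), 2):
        counts = {pair: 0 for pair in product(range(levels), repeat=2)}
        for row in array:
            counts[(row[a], row[b])] += 1
        if any(c != expected for c in counts.values()):
            return False
    return True


def sn_smaller_is_better(values):
    """Smaller-the-better S/N ratio in dB: -10 * log10(mean(y^2))."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


def best_levels(array, sn_per_trial, levels=3):
    """For each factor, return the level with the highest mean S/N ratio."""
    best = []
    for col in range(len(array[0])):
        means = []
        for level in range(levels):
            sn = [s for row, s in zip(array, sn_per_trial) if row[col] == level]
            means.append(sum(sn) / len(sn))
        best.append(max(range(levels), key=lambda lv: means[lv]))
    return best


# Invented data: one response per trial under each of two noise conditions
# (the two columns play the role of a very small outer array).
times = [
    (30, 34), (28, 31), (35, 39),
    (22, 25), (20, 24), (27, 30),
    (26, 29), (24, 27), (31, 36),
]
sn = [sn_smaller_is_better(t) for t in times]
print(is_orthogonal(L9))                      # balance/orthogonality check
print(best_levels(L9, sn))                    # most robust level per factor
print(len(L9), "trials instead of", 3 ** 4)   # partial vs. full factorial
```

The nine trials stand in for the 81 combinations of a full factorial design. The per-factor comparison is meaningful only because each level of each factor appears equally often alongside every level of the other factors.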
4 Adoption and Adaptation

The Taguchi philosophy demonstrates a structured approach to combining design with evaluation. It allows a group of correlated parallel designs to be created, evaluated, and optimized through comparison. In the UCD framework, adopting the Taguchi philosophy can compensate for its weaknesses in robustness and optimization, and shorten tedious design-testing cycles to improve usability control. Some adaptations in both
UCD and the Taguchi Method are needed so that each complies with the principles of the other. The following sections discuss the primary adaptations.

4.1 Taguchi Design

Taguchi design is introduced into the UCD framework. It connects conceptual design and usability evaluation to achieve robustness and optimization in application. In accordance with the Taguchi philosophy, Taguchi design consists of system design and parameter design. Their functions are clarified as follows.

System design. The objective of this stage is to identify the main design factors and levels that influence usability. It has been acknowledged that a task-centered process can foster design (Davis, 2001). Task analysis can therefore produce design issues and help to decide design factors and design levels. To achieve this objective, task analysis needs to be differentiated into two levels: abstract-level analysis for generating design factors, and concrete-level analysis for choosing specific design levels. ConcurTaskTrees (CTT) (Paternò, 2000), a popular task analysis method, provides an ideal interface to meet this need. Besides user tasks, specific design elements in the user interface, such as layout and GUI components, can also be design factors if they cause variations in usability. Task analysis in the UCD framework leads to the definition of usability goals, and thereafter to the creation of prototypes. It plays the same role in system design, except that prototyping is performed in parameter design.

Parameter design. Parameter design seeks the most usable design levels through usability testing on prototypes. It needs a proper inner array to arrange the design factors and levels identified in the system design. The inner array thereby constructs a set of parallel combinations for prototyping.
Compared with independent parallel design in the UCD framework, these parallel prototype designs are governed by the inner array, and every prototype essentially corresponds to a trial for usability testing. The usability testing in every trial examines design factor levels in order to identify robust design levels, which are insensitive to noise factors. As these prototypes have the same design factors but different design levels, the prototypes are comparable, and robust design levels can be picked out after the testing. The balance and orthogonality of the inner array allow a partial factorial experiment to stand in for testing all possible prototypes. This is particularly valuable for saving cost in usability testing. Formative evaluation is the main approach in this evaluation. Its execution and result analysis can lead to optimized designs.

4.2 Formative Evaluation and Analysis

Formative evaluation is performed in the trials of the parameter design. It serves two aims: usability testing and design level optimization. The former checks usability problems in each design level; the latter identifies robust design levels. Among usability methods, thinking aloud and performance measurement are applicable for the testing, partly because both can check usability problems in prototypes, and partly because both can provide objective quantitative information to identify
robust design levels. According to ISO 9241-11, usability is defined in terms of effectiveness, efficiency and satisfaction. Its evaluation accordingly comprises objective evaluation of effectiveness and efficiency, and subjective evaluation of satisfaction. In objective evaluation, effectiveness and efficiency can be associated with the participant’s error rate and time cost respectively. During the testing, this quantified objective information can be recorded to assess the usability of design levels. To identify the most robust design levels among the test results, the S/N ratio can be analyzed to judge effectiveness and efficiency. The measure of satisfaction, however, is unsuited to a partial factorial experiment because of its subjective nature. Nevertheless, the participant can pick out his or her favorite design combinations after finishing all the trials. Based on these objective and subjective evaluations, three types of most usable design levels can be found in terms of effectiveness, efficiency, and satisfaction. As this analysis approach focuses on an individual’s behavioral information, its optimized design solutions apply only to that individual. Such an analysis is accordingly named Within Individual Analysis.

4.3 Human Factors and Outer Array

In terms of the Taguchi philosophy, human factors are the main noise in the parameter design. Smith (1998) applied the outer array to arrange objective human factors such as age, gender and ethnic background. For subjective human factors such as cognitive style and attitude, he suggested using cognitive approaches. However, using the outer array to build noise conditions in usability evaluation is questionable because human factors are chaotic in the testing. It cannot be asserted with certainty that a user’s performance is influenced by a single isolated human factor such as gender or nationality.
Moreover, every participant brings a mixture of objective and subjective factors to usability testing. For these reasons, the obtained optimization results could be inconsistent. A more persuasive and economical approach to dealing with these variations is to check the statistical significance of the obtained optimized designs. Nonparametric statistics are applicable to this problem.

4.4 Significance Analysis

By applying the Within Individual Analysis, optimal robust designs can be obtained in the Taguchi design. These optimized designs are not the same for all people because of individual differences. The optimal designs obtained in the evaluations fall into different categories of design level combinations. This raises the question of whether there are significant differences among these combinations. In essence, this is a hypothesis test of one-sample goodness-of-fit for categorical measurement. The Chi-square test (Siegel, 1988), a nonparametric statistical method, can answer this question. As this test concentrates on the significance analysis of optimization results from all individuals, it is named Among Individual Analysis. Such an analysis can identify whether there are most effective, most efficient, or most satisfactory solutions for all end users.
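The Among Individual Analysis can be sketched as a one-sample chi-square goodness-of-fit test. The sketch below rests on assumptions that do not come from the paper: a uniform null hypothesis (no design combination is preferred across participants), invented participant counts, and a significance level of 0.05 with critical values taken from a standard chi-square table.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values).
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_gof(observed):
    """Return (statistic, df) for a goodness-of-fit test against a uniform split."""
    total = sum(observed)
    expected = total / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


def is_significant(observed):
    """True if the observed split differs from uniform at the 0.05 level."""
    statistic, df = chi_square_gof(observed)
    return statistic > CHI2_CRITICAL_05[df]


# Hypothetical outcome: for how many of 30 participants each design
# combination was the individually optimal (e.g. most efficient) one.
counts = {"combination A": 18, "combination B": 7, "combination C": 5}
observed = list(counts.values())
stat, df = chi_square_gof(observed)
print(f"chi-square = {stat:.2f}, df = {df}, significant: {is_significant(observed)}")
```

A significant result suggests that the individually optimal designs cluster on some combinations rather than spreading evenly. With few participants per category the approximation becomes unreliable, so a sketch like this is only meaningful for adequately sized samples.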
5 TC-UCD Framework

On the basis of the foregoing discussion, an enhanced UCD framework is proposed: Taguchi-Compliant User-Centered Design (TC-UCD). The backbone of TC-UCD consists of user study, conceptual design, Taguchi design, usability evaluation, confirmation test, and deployment. TC-UCD not only keeps the essence of UCD in design techniques and usability evaluations, but also has some unique features for exploring optimized, robust usability in designs. Fig. 1 illustrates the present UCD framework and the TC-UCD framework for comparison. Compared with the current UCD framework, TC-UCD has four major characteristics. Firstly, it preserves the user study as the starting point of design for user profile definition and for functional and non-functional requirement analysis. Secondly, the Taguchi design is integrated into the UCD framework. Its system design and parameter design belong to the conceptual design and the usability evaluation respectively. The former determines design factors and levels; the latter seeks robust optimal design factor levels through parallel prototyping and formative evaluation. Meanwhile, the Within Individual Analysis identifies optimal design combinations, whilst the Among Individual Analysis verifies the significance of optimized designs. Thirdly, a confirmation test is suggested in TC-UCD for checking the usability of optimized designs. Summative evaluation is no longer necessary in the new framework. Lastly, TC-UCD itself is compatible with iterative design. Although the emphasis of TC-UCD is placed on parallel design and evaluation, it remains flexible enough to adjust design strategies from the beginning of the user study, or to tighten design levels from system design. The starting point of iteration depends on the real design situation.
6 Conclusions and Future Work

This paper presents a new vision for enhancing the present UCD framework. From a theoretical perspective, it presents TC-UCD to compensate for the shortcomings of UCD in robustness and optimization. HCI is a multidisciplinary field whose theories and methods mainly derive from the behavioral sciences. Combining feasible engineering theories with behavioral sciences is a valuable practice for enriching the HCI framework. TC-UCD is an advanced and comprehensive design approach. It is highly useful for exploring in-depth design solutions in multi-variable situations. In particular, the identification of design factors provides a common starting point for seeking appropriate design levels within a parallel design space. Such a design mechanism can fulfill the needs of Nielsen’s parallel design model at the price of additional consideration and analysis only. Its analysis methods, theories, and comparable quantitative design approach make the model more practical and applicable. Hence the change of design mechanism does not increase the cost of this parallel design. However, designers need good skills in quick prototyping to meet the needs of evaluation and time-to-market. Designers also need to gain knowledge of the Taguchi philosophy in order to use it in practice. Adopting engineering theories into the UCD framework is a new attempt. Especially, UCD has
Fig. 1. UCD and TC-UCD framework
strong characteristics in empirical design and behavioral sciences. Merging these two aspects is a challenge. TC-UCD is expected to be fully applied in real design practice for further verification, validation, and improvement.
References

1. Davis, L., Dawe, M.: Collaborative Design with Use Case Scenarios. International Conference on Digital Libraries. In: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, Roanoke, Virginia, United States, pp. 146–147, ISBN: 1-58113-345-6 (2001)
2. Dix, A., Finlay, J., Abowd, G., Beale, R.: Human-Computer Interaction, 3rd edn. Prentice Hall, Englewood Cliffs (2003)
3. Hix, D., Ii, S., Gabbard, J.L., McGee, M., Durbin, J., King, T.: User-Centered Design and Evaluation of a Real-Time Battlefield Visualization Virtual Environment. In: IEEE Virtual Reality Conference (VR’99) (1999)
4. Paternò, F.: Model-Based Design and Evaluation of Interactive Applications. Springer, Heidelberg (2000)
5. Mayhew, D.J.: The Usability Engineering Lifecycle. Academic Press, San Diego (CA) (1999)
6. Nielsen, J.: Usability Engineering. Morgan Kaufmann, San Francisco, California (1993)
7. Nielsen, J., Faber, J.M.: Improving System Usability Through Parallel Design. IEEE Computer 29(2), 29–35 (1996)
8. Norman, D.: The Psychology of Everyday Things. Basic Books, New York (1988)
9. Reed, B.M.: A robust approach to human-computer interface design using the Taguchi method. Old Dominion University, Norfolk, VA, USA (1992)
10. Ross, P.J.: Taguchi Techniques for Quality Engineering. McGraw-Hill, New York (1988)
11. Siegel, S.: Nonparametric Statistics, 2nd edn. International Editions. McGraw-Hill, New York (1988)
12. Smith, A.: Towards the total quality interface - applying Taguchi TQM techniques within the LUCID method. In: People and Computers XI, Proceedings of HCI-96, Springer, Heidelberg (1996)
13. Smith, A., Dunckley, L.: Using the LUCID method to optimize the acceptability of shared interfaces. Proceedings of Interacting with Computers 9, 335–345 (1998)
14. Taguchi, G.: Robust Engineering. McGraw-Hill, New York (2000)
15. Weiss, S.: An Alternative Business Model for Addressing Usability: Subscription Research for the Telecom Industry. Interactions (July + August 2005)
16. Zhou, W.: Confidential Report: The interface development of Explicit Recommender for an Open Media Centre. Stan Ackermans Institute, Eindhoven University of Technology, the Netherlands (2005) ISBN 90-444-0543-8
A Requirement Engineering Approach to User Centered Design

Dirk Zimmermann1 and Lennart Grötzbach2

1 T-Mobile Deutschland GmbH, Landgrabenweg 151, 53227 Bonn, Germany
2 Siemens IT Solutions and Services C-LAB, Fürstenallee 11, 33098 Paderborn, Germany
[email protected],[email protected]
Abstract. This paper describes an approach to integrating UCD activities into existing Software Engineering practices and processes. The aim is to use the outcomes of UCD activities throughout the development process and to ensure that they can be utilized, traced and tested by subsequent development groups. This makes UCD activities plannable and manageable just like any other development activity. The authors introduce a framework of three different usability-related requirement types that reflect the results of the UCD activities performed during software development. Each requirement type is extracted from the UCD results generated in the first three phases of the DIN EN ISO 13407 model.

Keywords: Usability Engineering, Requirements Engineering, Processes, Integration.
The approach taken in this paper is to create a Requirement Engineering (RE) framework that distinguishes three different types of Usability-related requirements, which correlate to the three facets: Usability, Workflow and Design. Because most Usability Engineering (UE) processes are embedded into a more complex Software Engineering (SE) process, one of the goals was to align the framework both with current best practices in Usability Engineering and with existing RE approaches in the Software Engineering discipline.

1.1 Requirement Engineering

Current software development practices use software requirements to specify the functional and non-functional aspects of software systems. While functional requirements specify the services and functions the system should provide and how it should react and behave, non-functional requirements define constraints on the offered services and functions. Thus non-functional requirements define the product quality. ISO/IEC 9126 differentiates six non-functional requirement dimensions, one of which is Usability [13]. The process of requirement elicitation, as well as the results of the requirements engineering activities, is well described in today’s literature. IEEE Std 830-1998 [12] describes the resulting artifact, the Software Requirement Specification (SRS), as a document that “correctly define[s] all of the software requirements” of the system to be developed. Each of these specifies a “software capability needed by a user to solve a problem to achieve an objective” [1]. For the SRS, IEEE Std 830-1998 also defines quality attributes such as correctness, unambiguousness, completeness, consistency and verifiability. These attributes thus apply to each individual requirement contained in the SRS. This is a huge benefit for the development process, since an SRS provides verifiable and testable demands on the system.
However, specifying such testable and precise requirements is difficult; sometimes requirements represent architecture/design constraints or lofty goals that can neither be met by the system nor sufficiently tested. Recent efforts in the Usability community aim to produce similar guidelines tailored to Usability-related requirements. One result of these efforts is the Common Industry Specification for Usability-Requirements (CISU-R) [5], which provides guidelines for “defining usability requirements in sufficient detail to make an effective contribution to design and development and [create] usability criteria that can be empirically validated subsequently if needed.” It is closely related to the Common Industry Format for Usability Test Reports (CIF) [14], which offers guidance and a specification format for performing summative Usability Tests based on Usability requirements and for describing their results. Tests that follow the CIF can be used to validate requirements written in the CISU-R notation. Both standards aim to specify the level of Usability based on its three dimensions, Effectiveness, Efficiency and Satisfaction: “The CIF and CISU-R take a broad approach to Usability based on DIN EN ISO 9241 Part 11 […] The value of specifying these high level requirements is that they relate closely to business requirements for successful use of a product and increased productivity” [5]. From an RE point of view, it is therefore important that the quality attributes defined in IEEE Std 830-1998 are applicable to any Usability-related requirement, in order to guarantee that they can be verified and tested throughout the development process.
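To make the idea of a verifiable Usability requirement concrete, a minimal Python sketch follows. The class, its fields and the sample requirements are hypothetical and do not reproduce the CISU-R or IEEE Std 830-1998 formats. The sketch only shows how a requirement that carries a measure and a target becomes checkable, while a "lofty goal" without a target does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsabilityRequirement:
    """Hypothetical record for one usability-related requirement."""
    req_id: str
    dimension: str            # "effectiveness", "efficiency" or "satisfaction"
    measure: str              # what is measured, in words
    target: Optional[float]   # threshold; None marks a goal without a measure
    higher_is_better: bool = True

    def is_verifiable(self) -> bool:
        """A requirement is testable only if it has a measurable target."""
        return self.target is not None

    def is_met(self, measured: float) -> bool:
        """Compare a measured test result with the target."""
        if self.target is None:
            raise ValueError(f"{self.req_id} has no measurable target")
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


requirements = [
    UsabilityRequirement("U-1", "effectiveness", "task completion rate (%)", 90.0),
    UsabilityRequirement("U-2", "efficiency", "mean task time (s)", 120.0,
                         higher_is_better=False),
    UsabilityRequirement("U-3", "satisfaction", "system should be pleasant to use", None),
]
unverifiable = [r.req_id for r in requirements if not r.is_verifiable()]
print(unverifiable)                   # requirements that need a measurable target
print(requirements[1].is_met(95.0))   # checks a measured mean task time of 95 s
```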
D. Zimmermann and L. Grötzbach
1.2 User Centered Design Process

In current literature several User Centered Design (UCD) processes are described (e.g. [6], [16]). They focus on different aspects and address different needs of software development, but all of them share a set of common properties. These properties are described in DIN EN ISO 13407, which defines a generalized human-centered design process for interactive systems [8]. The process is not a replacement for established software development processes but an addition to them: “This International Standard does not assume any one standard design process, nor does it cover all the different activities necessary to ensure effective system design. It is complementary to existing design methods and provides a human-centered perspective that can be integrated on different forms of design process in a way that is appropriate to the particular context.” The process consists of four base activities common to UCD process models:

• To understand and specify the context of use,
• To specify the user and organizational requirements,
• To produce design solutions and
• To evaluate the designs against requirements.
These four activities are performed iteratively during the development process. The process is complete when the resulting system meets its specified requirements. Iterating to close in on the requirements and testing the results throughout the process are also common traits of this process model. Even though DIN EN ISO 13407 requires that mock-ups and prototypes be evaluated throughout the process, few details are given on how this relates to the basic evaluation phase of the process. Because the results are iteratively improved, it seems reasonable to suggest that the intermediate results as well as the requirements need to increase in detail. Not only will new requirements be discovered during the process; more specific requirements are also needed to evaluate the intermediate process results as the system design becomes more concrete and precise. This is in line with the activities described in DIN IEC TR 18529, a standard that supplements DIN EN ISO 13407 by providing process descriptions for the human-centered lifecycle [9]. In the Evaluate designs against requirements sub-phase, evaluation activities are proposed to improve the design, to define and validate requirements, and to check whether the defined practices are being followed. Thus, results of the UCD phases serve as input for subsequent phases as well as validation criteria for intermediate and final UCD results.

1.3 Evaluation

Since DIN EN ISO 13407 describes an iterative process model, all of its phases are performed several times during one development cycle. As Woletz [17] pointed out, evaluation activities focus on different aspects during this cycle: in early phases, intermediate results are evaluated to identify weak points, gaps or errors for further improvement, while at the end a final assessment of the system is performed.
Therefore evaluation activities should be performed not just once at the end of the cycle but several times during the development cycle. “In the general software industry it is increasingly recognized that continued evaluation is needed throughout the system development lifecycle, from early design to summative testing, in order to ensure final products meet expectations of designers, users, and organizations” [15]. Two types of goals and corresponding evaluation methods can be differentiated for validation activities. First, it can be evaluated whether the documents resulting from the development phases are correct in terms of content and style and capture the information needed for the later phases of the cycle. This can be done via interviews in which the captured information is discussed with user representatives, stakeholders or other domain experts. In addition, “upward validations” are needed to check whether the results of a phase correspond to the results of previous phases. For example, all requirements generated in the Specify Requirements phase are checked against the results of the Context of Use phase to find open loops, conflicts or mismatches between them. Second, it can be evaluated whether the system (in development) matches the requirements that were specified in advance. For UCD evaluations this can be done through Usability Tests or Expert Reviews. To do this effectively, the specified requirements need to be measurable and precise, as stated above. For both evaluation types, the activities and the granularity of the resulting artifacts vary greatly between phases. Therefore different evaluation methods need to be applied to the results. For example, analysis results from the Context of Use phase require different evaluation methods than software prototypes from the Design Solution phase.
Results of the Context of Use phase, such as persona descriptions or other models describing the work, the context and the users in focus, can be validated with Usability Tests or through Expert Reviews. To check how well the system is accepted by the target audience, questionnaires or surveys can be conducted. The workflow descriptions resulting from the Specify Requirements phase can best be evaluated with real or potential users by comparing their real workflows with the proposed ones and gathering their feedback on the modifications. The same applies to early sketches showing the envisioned interaction steps the user will have to take. Design solutions, such as screens showing the user interface or detailed system interactions showing the flow of information, can a) be evaluated with users to see if the solution supports their workflows and b) be tested against style guides or measurable criteria to see if the solution meets the previously specified expectations and requirements. As mentioned before, not just final results but also intermediate results from the different phases of the development cycle need to be evaluated. To evaluate the system effectively against the results of the previous phases, these results need to contain measurable criteria. Thus, in order to be able to evaluate UCD results, requirements should be generated from each of the first three phases described in DIN EN ISO 13407. The following sections describe this.
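The “upward validation” between phases can be sketched as a simple traceability check. The data model is an assumption made purely for illustration: each requirement lists the identifiers of the earlier-phase results it was derived from, and all identifiers are invented.

```python
def upward_validation(requirements, earlier_results):
    """Find requirements without a source and earlier results nothing refers to.

    requirements: dict mapping requirement id -> set of source ids it was derived from
    earlier_results: set of ids produced by the previous phase
    Returns (untraced requirement ids, unreferenced earlier result ids).
    """
    untraced = sorted(req for req, sources in requirements.items()
                      if not sources & earlier_results)
    covered = set().union(*requirements.values())
    open_loops = sorted(earlier_results - covered)
    return untraced, open_loops


# Hypothetical Context of Use results and workflow requirements.
context_of_use = {"persona:dispatcher", "task:assign-job", "env:noisy-office"}
workflow_requirements = {
    "W-1": {"task:assign-job", "persona:dispatcher"},
    "W-2": {"task:print-report"},   # refers to nothing from the earlier phase
}
untraced, open_loops = upward_validation(workflow_requirements, context_of_use)
print(untraced)     # requirements that lack a Context of Use source
print(open_loops)   # Context of Use results not picked up by any requirement
```

A check of this kind only flags candidates for review. Deciding whether an untraced requirement is a conflict or a legitimate new discovery still requires discussion with user representatives or domain experts.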
2 A User Centered Requirement Framework

Given that, on the one hand, each of the first three phases described in DIN EN ISO 13407 generates some type of result and, on the other hand, DIN EN ISO
9241 Parts 10 to 12 also describes three facets (Usability, Workflow and Design), the RE framework presented in this paper takes these three types of results as its starting points. Within the Context of Use phase, the analysis revolves mainly around the anticipated users, their jobs and tasks, their mental models, their conceptions of the usage of the system, the physical environment, organizational constraints and determinants, and the like. While much of this information is descriptive (i.e. it helps to create concepts and models rather than being a testable requirement), the Context of Use phase also yields users’ expectations regarding the fundamental Usability dimensions, namely effectiveness, efficiency and satisfaction. CISU-R reflects these overarching metrics pertaining to the use and perception of the system by the user in the form of requirements, whereas CIF provides a standard format for presenting the results of tests based on Usability requirements. These requirements can be used as evaluation criteria for the system and any intermediate prototypes, as well as for intermediate UCD outcomes of subsequent phases.

The User Requirements phase focuses on individual workflows and tasks to be performed by one of the target users. Taking into consideration some of the Context of Use data, specific task performance models are elicited from users, the workflow is optimized and an improved task performance model is generated. The outcome of this phase can be described as a set of requirements pertaining to a user’s interaction with the system in the context of a specific workflow or task, e.g. as use case scenarios. The requirements describe the discrete sub-steps of a user’s interaction flow and the expected system behavior for each of these steps. These requirements themselves can already be evaluated against the Usability requirements elicited from the context of use, e.g. by comparing an optimized workflow to the current state of workflow performance with regard to effectiveness, efficiency and user satisfaction. However, they also serve as input and evaluation criteria for the subsequent process step: Produce Design Solution.

During the Produce Design Solution phase, properties of the intended system are defined, e.g. information architecture, interaction flow, screen layout, component design, etc. Some of these properties are more conceptual, i.e. they serve as underlying models, but others can be experienced by the user during the use of the system. In order to translate these designs into solutions, different facets have to be considered:

• Conceptual/Structural/Framework type requirements that describe a model that is more underlying and less visible.
• Visual requirements (or other perceptive modality) that describe the physical properties of the solution (e.g. size, color, spacing, contrast, alignment, etc.)
• Interaction requirements that describe the behavior of the system (e.g. interaction flow, messaging, etc.)

These User Interface (UI) requirements can be evaluated against the Usability requirements generated in the Context of Use phase, i.e. to evaluate whether the layout and interaction model fit the mental model that the users have of the task and associated information, and whether the general usage would be effective, efficient and satisfactory. They can also be evaluated against the Workflow Requirements from the Specify Requirements phase, in order to assess whether all specific workflows can be performed
easily with the given concept and design. They serve as criteria for the actual system that has been developed, i.e. does it follow the defined model for layout and interaction? In summary, the user-related requirements, which can be generated in the first three phases of the DIN EN ISO 13407 model, are depicted in Figure 1.
[Figure 1 content: each requirement type with its evaluation activity and the underlying principles]
Usability Requirements (from Context of Use phase): Usability Evaluation; Usability Principles
Workflow Requirements (from User Requirements phase): Workflow Evaluation; Dialog Principles
User Interface Requirements (from Design Solution phase): UI Evaluation; Information Design Principles
Fig. 1. Usability-related requirements, their origin during the DIN EN ISO 13407 phases and their evaluation activities based on DIN EN ISO 9241 principles
For the overall system design, it should be noted, however, that these three types of Usability-related requirements are not exhaustive in the description of a system. There is a variety of requirements stemming from different stakeholder groups that need to be viewed in conjunction with the Usability-related requirements described in this paper. But from a UCD perspective it is important to include user needs in the form of requirements into the overall process for scoping, implementation or testing purposes.
2.1 Usability Requirements
This type of requirement contains general Usability criteria that the system must meet. It is based on the three Usability dimensions effectiveness, efficiency and satisfaction described in DIN EN ISO 9241 Part 11 and specifies the requirements of the user population towards the holistic usage of the system in the specified context of use. Thus, these requirements are the result of the Context of Use phase in DIN EN ISO 13407 and provide measures that the system should comply with. Usability requirements are derived from the results of the context analysis and from competitive analyses, previously identified areas of improvement, and from general expertise about the domain, human-computer interaction practices, etc. For example, UCD artifacts to be used as input can be the Contextual Design Models as specified by Beyer and Holtzblatt [2] (Flow of Work, Sequence of Work, Work Artifact, Work Culture and Physical model) or Persona Descriptions as introduced by Cooper [6], since they describe user archetypes capturing user characteristics and describing overarching contextual and task information. As usability requirements are one of many non-functional requirement types (as mentioned above), they have a strong impact on functional requirements, since the usability goals address how the system shall be used and perceived by the user. They
can be documented using established SRS formats or using the best practices defined in the CISU-R (see [12], [5]). Usability Requirements can be verified through interviews and reviews during their elicitation, and tested via Usability Tests, Expert Reviews or with surveys and questionnaires towards the end of the development cycle. They can be used to generate unique selling points or other arguments used by marketing or sales divisions. The requirements can be tested by the testing department in conjunction with the other non-functional requirements.
2.2 Workflow Requirements
During the Specify Requirements phase of DIN EN ISO 13407 the current state captured during the Context of Use phase is considered and iteratively modified towards an optimized set of workflows and requirements that reflect the planned changes and improvements. The new workflow descriptions capture user goals and the associated workflows. They specify how the system should support the user to complete his tasks/goals with the system. They encapsulate essential interaction steps, needed information and options for the user and thus specify system behavior without being too specific about the concrete details (UI). Entire Workflow Scenarios are considered as one requirement, meaning that the improved workflow to accomplish a task needs to be implemented in the system to allow the users to work successfully. The information and models describing the future system’s context of use serve as input, as do innovations, anticipated changes, fixes to known bugs and new features, all of which are molded into a description of a system that is “better” than the current one. The possible interaction with the system and the flow of information is described in use cases or similar artifacts (e.g. scenario descriptions). “A scenario is a concrete description of a specific flow of interaction, but one that is chosen to be typical or representative. [...] A use case is a generic scenario, describing one kind of interaction with a particular user interface.” [4] Cockburn states that each use case is the implementation of a stakeholder's goal. Multiple stakeholders participate in the use case and their interests should also be protected by the interactions defined in it. For Cockburn, the collection of use cases is the essence of the system's requirements, even though they don't represent all the requirements [3]. Workflow Requirements can be reviewed and verified with users to assess whether the current workflow description is accurate and if the optimized workflows are regarded as an actual improvement of their tasks. Later in the development process the requirements can be used to evaluate a prototype or system with regard to the workflow support, i.e. whether the interaction model meets the workflow requirements. A single scenario, e.g. the main scenario from a use case, could for example serve as an evaluation scenario during a usability test. The concrete descriptions of the system and the needed functionality within workflows can be used to determine what to build into the system. The workflow descriptions are used as test cases to verify if the workflows can be realized with the prototypes and the final system created during the Design Solutions phase.
2.3 UI Requirements
During the Produce Design Solution phase, both the Usability Requirements and the Workflow Requirements are synthesized into a set of conceptual models and solution elements. These usually consist of Information Architecture, Navigation Models, Wireframes, Sketches, high fidelity screen designs, or prototypes. They describe the logical model and physical properties of the system, i.e. specify how the widgets, UI patterns, and interaction elements shall look, behave, and interact. Ideally, they also provide all states, presets, and system reactions for concrete system screens. UI requirements are generated from prototypes, the actual system in the current stage, UI specification documents and also from general styleguides, information architecture or navigation models. They focus on different layers of the user interaction with the system and thus have sub-types:
• Information architecture and information flow requirements which define the overarching logical structure of the user interface.
• Presentation requirements, where the layout of an entire component or a single element is defined, e.g. a screen or a widget such as a dropdown combo box.
• User-system-interaction requirements, where the behavior of the UI elements is defined, e.g. status changes.
• Compound requirements, where the interaction between more than one element is specified, e.g. in the case of drag and drop functionality.
• Message requirements, which define when and how system generated notifications, e.g. errors or alerts, are presented to the user.
In order to ensure that the solution conforms to the previous analysis, the UI requirements are evaluated against usability and workflow requirements to check that they match the mental model that the user has of the “look & feel” of the system, as well as the users’ expectations regarding effectiveness and efficiency on a granular level, e.g. with regard to screen flips or mouse clicks.
Additionally, they should comply with general design guidelines as defined e.g. in DIN EN ISO 9241 Part 12, or styleguides. UI requirements are also used to evaluate the fulfillment of the solution design in the coded system. Within the development process, UI requirements can be used by Development as input for their system design, as well as by testing divisions for ensuring appropriate realization of the UI. They enhance/extend workflow requirements in a way that development can select the most appropriate implementation of a given workflow. UI Requirements are also used by internal experts (e.g. UI design, testing) to determine if the (coded) system complies with its specification. They are mostly of internal value, i.e. supporting system specification and verification. From an end user’s perspective they should almost be invisible, i.e. the users’ perception of the system should not recognize individual UI features, but a holistic experience that is compliant with their expectations.
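The relationship between the three requirement types described in Sections 2.1 to 2.3 can be sketched as a small data structure. The following is only an illustrative sketch, not a format prescribed by the paper or by CISU-R: the class name `Requirement`, its fields, and the example requirement texts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Requirement:
    """A user-related requirement (hypothetical structure for illustration)."""
    req_id: str
    kind: str  # "usability", "workflow" or "ui"
    text: str
    parent: Optional["Requirement"] = None  # the requirement this one refines

    def trace(self) -> List[str]:
        """Return the ids from this requirement up to its root requirement."""
        chain = []
        node: Optional["Requirement"] = self
        while node is not None:
            chain.append(node.req_id)
            node = node.parent
        return chain


# Hypothetical example content, invented for illustration only.
usability_req = Requirement("U1", "usability",
                            "Users complete a reservation efficiently")
workflow_req = Requirement("W1", "workflow",
                           "Reserve a book from the search result list",
                           parent=usability_req)
ui_req = Requirement("UI1", "ui",
                     "Each search result row offers a 'Reserve' button",
                     parent=workflow_req)

print(ui_req.trace())  # ['UI1', 'W1', 'U1']
```

Linking each requirement to the one it refines is one possible way to make a UI requirement traceable back to the workflow and usability requirements against which it is evaluated.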
3 Summary and Outlook
The authors introduced a framework of Usability-related requirements to align Usability Engineering with Software Engineering and Requirement Engineering
practices. Correlated to the phases of the DIN EN ISO 13407 process model and to the three levels of usability principles laid out in DIN EN ISO 9241 Parts 10 to 12, three requirement types were differentiated: Usability-, Workflow- and UI Requirements. Through the introduction of Workflow- and UI Requirements the authors extended the established concept of usability requirements, e.g. as described in CISU-R. These two additional types offer a more precise representation of UCD demands generated in the later phases of a development cycle. The three requirement types were explained, their interconnections described and details to their application and reuse were given. By differentiating the three types of requirements, development organizations can ensure a seamless, traceable hierarchy of user-focused requirements, starting with the high level needs pertaining to the general use of a system, going through the specific needs during the task performance towards requirements regarding the specific implementation of the system. In addition, the approach allows selecting the most appropriate evaluation method for different types of requirements, e.g. summative usability tests for Usability Requirements, or expert based reviews for UI Requirements. Guidance in selecting the appropriate evaluation method is provided by a poster by Freymann et al. [11] that differentiates evaluation methods by their result types and by their application within the development life cycle. Another benefit is scalability. Individual requirements are less monolithic than e.g. complete specification documents. Especially with regard to the emerging lightweight agile approaches to SE, e.g. Extreme Programming or Scrum, a less document driven approach to capture UCD outcomes could be more suitable. A detailed analysis of potential application areas for the three UCD requirement types in agile development is described by Düchting et al. [10]. 
The approach and the requirement framework presented in this paper need to be applied in the field and tried out. The requirement types need to be integrated within established requirement engineering approaches to evaluate whether the proposed granularity and differentiation of the requirement types is sufficient and where areas of improvement and refinement can be identified.
References
1. ANSI/IEEE Std. 729-1983. Standard Glossary of Software Engineering terminology (1983)
2. Beyer, H., Holtzblatt, K.: Contextual Design – Defining Customer-Centered Systems. Morgan Kaufmann, San Francisco (1998)
3. Cockburn, A.: Writing Effective Use Cases. Addison-Wesley, London, UK (2001)
4. Constantine, L.: What do users want? Engineering Usability into Software. Windows Tech. Journal 4(12), 30–39 (1995)
5. CISU-R. Common Industry Specification for Usability-Requirements, Draft Version 0.86, National Institute of Standards and Technology, 18.03.2006 (2006)
6. Cooper, A., Reimann, R.: About Face 2.0. Wiley, Indianapolis, Chichester, UK (2003)
7. DIN EN ISO 9241. Ergonomic requirements for office work with visual display terminals (VDTs). International Organization for Standardization (1998)
8. DIN EN ISO 13407. Human-centered design processes for interactive systems, International Organization for Standardization (1999)
9. DIN IEC TR 18529. Software engineering - Product quality. International Organization for Standardization/International Electrotechnical Commission (2001)
10. Düchting, M., Zimmermann, D., Nebe, K.: Incorporating User Centered Requirement Engineering in Agile Software Development. In: preparation, HCII 2007, Beijing (2007)
11. Freymann, M., Grötzbach, L., Nebe, K.: Selecting Appropriate Evaluation Methods for Different UCD Outcomes. In: preparation, HCII 2007, Beijing (2007)
12. IEEE STD 830-1998. IEEE recommended practice for software requirements specifications, International Organization for Standardization (1998)
13. ISO/IEC 9126. Software Engineering - Product Quality, International Organization for Standardization (2001)
14. ISO/IEC 25062:2006. Software product Quality Requirements and Evaluation (SQuaRE) Common Industry Format (CIF) for usability test reports (2006)
15. Kushniruk, A.W., Patel, V.L.: Cognitive and Usability Engineering Methods for the Evaluation of Clinical Information Systems, Journal of Biomedical Informatics 37 (2004)
16. Mayhew, D.J.: The Usability Engineering Lifecycle – A practitioner’s Handbook to User Interface Design. Morgan Kaufmann, San Francisco (1999)
17. Woletz, N.: Evaluation eines User-Centred Design-Prozessassessments - Empirische Untersuchung der Qualität und Gebrauchstauglichkeit im praktischen Einsatz. Doctoral Thesis. University of Paderborn, Germany (2006)
Part II
Usability and Evaluation Methods and Tools
Design Science-Oriented Usability Modelling for Software Requirements Sisira Adikari, Craig McDonald, and Neil Lynch School of Information Sciences and Engineering University of Canberra ACT 2601 Australia {Sisira.Adikari,Craig.McDonald,Neil.Lynch}@canberra.edu.au
Abstract. An identified key reason for degraded usability in software systems is the deficiencies of current RE practice to incorporate usability perspectives effectively into SRS. The explicit expression of user and usability aspects in SRS benefits designers, developers, and testers in ensuring optimal usability in software products. This paper presents the results of a design-science oriented user interface design study to validate the proposition that incorporating user modelling and usability modelling in SRS improves design. Keywords: User modelling, usability modelling.
1 Introduction
Despite the presence of various User-Centred Design (UCD) methods developed to produce usable information systems, usability-related issues are still detected late in software development, during testing and deployment [1]. One of the identified reasons for poor usability in products is that usability requirements are often weakly specified [2]. Traditionally, Requirements Engineering (RE) concentrates on functional requirements and ensuring that the developed products meet these requirements, rather than other non-functional requirements (NFR), which are considered less important [3]. Usability has been classified as one of the NFR in RE [4]. Designating usability as a less important NFR may lead to less attention being paid to it during the requirements definition stage, and a weaker focus on usability in requirements may propagate usability issues into end products. In this paper, we propose design science-oriented requirements modelling based on user modelling and usability modelling as an effective means of transforming usability aspects into Software Requirements Specifications (SRS). We explain the UCD process and a possible way to integrate it into a typical SDLC. We present a design science-oriented research design to test our proposal and some results to validate the proposition that incorporating user modelling and usability modelling in SRS improves design.
desired properties [6]. In a much cited paper, March and Smith [7] describe ‘build’ and ‘evaluate’ as two fundamental design science processes and four types of products in design science: constructs, models, methods, and instantiations. According to their definitions, constructs or concepts form the vocabulary of a domain, a model is a set of propositions or statements expressing relationships among constructs, a method is a set of steps used to perform a task, and an instantiation is the realisation of an artefact in its environment. The design science concepts reported by March and Smith have recently been developed further by a number of authors [8-10], who claim that design activities are central to the information systems (IS) discipline and present a conceptual framework for understanding, executing, and evaluating IS research combining behavioural science and design science paradigms. Figure 1 shows research as addressing both the rigour required of research and the practical environment of use.
Fig. 1. Information Systems Research Framework [10]
3 ISO 13407: Usability Engineering Process
The ISO 13407 standard intends to provide guidance on best practice in usability engineering [11]. Jokela et al. carried out an in-depth interpretive analysis of ISO 13407 and identified that the standard provides limited guidance for designing usability, describing user goals, usability measures and producing the various outcomes [12]. Although ISO 13407 provides general guidance on UCD activities, our analysis of the standard yields two important aspects which are not clearly visible:
• How to use evaluation feedback to improve the design and requirements.
• How ISO 13407 can be used to support a software development process.
4 ISO 13407: Evaluation Feedback to Improve the Design
In the ISO 13407 process model (see Figure 2), the output of process 4 feeds back to process 1. In practice, the outcome of the evaluation needs to be applied immediately to the design, with changes that improve the design prior to the next iteration. To produce effective design solutions, we argue that it is important to feed the output of process 4 (Evaluate design against requirements) back into process 3 (Produce design solutions) as well, and to update the requirements. Our suggested variation with additional feedback loops is also shown in Figure 2.
[Figure 2: process diagram. Identify need for human-centred design process; 1 Understand and specify the context of use; 2 Specify the user and organisational requirements; 3 Produce design solutions; 4 Evaluate design against requirements; exit when the system satisfies specified user and organisational requirements (FINAL DESIGN)]
Fig. 2. A variation to the ISO 13407 to improve the design
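The additional feedback loops suggested above can be sketched as a simple iteration. This is a minimal sketch under stated assumptions, not part of ISO 13407 or of the authors' method: the function names `run_design_cycle` and `sample_evaluation` are hypothetical, the evaluation is a stand-in callback, and the standard's original loop back to process 1 is left out for brevity.

```python
from typing import Callable, List


def run_design_cycle(evaluate: Callable[[int], List[str]],
                     max_iterations: int = 10) -> List[str]:
    """Return the ordered list of process steps visited during the cycle."""
    steps = [
        "1 understand and specify context of use",
        "2 specify user and organisational requirements",
        "3 produce design solutions",
    ]
    for iteration in range(1, max_iterations + 1):
        steps.append("4 evaluate design against requirements")
        findings = evaluate(iteration)
        if not findings:
            # The system satisfies the specified requirements.
            steps.append("final design")
            break
        # Suggested variation: evaluation findings update the requirements (2)
        # and feed directly into the design solutions (3).
        steps.append("2 update requirements")
        steps.append("3 revise design solutions")
    return steps


def sample_evaluation(iteration: int) -> List[str]:
    """Hypothetical evaluation: issues are found until the third iteration."""
    if iteration >= 3:
        return []
    return [f"usability issue found in iteration {iteration}"]


trace = run_design_cycle(sample_evaluation)
for step in trace:
    print(step)
```

With this stand-in evaluation the design is evaluated three times and revised twice before the final design is reached.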
4.1 Integration of ISO 13407 into Software Development Process
In Figure 2, when the system satisfies user and organisational requirements (Final Design), the requirements at stage 2 can be considered as the requirements of the best design solutions of the product. As illustrated in Figure 3, we suggest that feeding such requirements into the requirements definition stage of a typical SDLC will make the requirements definition process more user-centred from the beginning.
[Figure 3: the ISO 13407 cycle (Identify need for human-centred design process; 1 Understand and specify the context of use; 2 Specify the user and organisational requirements; 3 Produce design solutions; 4 Evaluate design against requirements; System satisfies specified user and organisational requirements, Final Design) shown alongside the Software Development Life Cycle stages Requirements Definition, Design, Build, Testing and Release]
Fig. 3. The integration of ISO 13407 process model into a typical SDLC
There are many significant advantages of integrating the ISO 13407 process model through requirements into a typical SDLC:
• Requirements definitions are more user-centred and task oriented.
• The requirements definition phase can be completed fairly quickly.
• The system and software design phase can be driven by concrete design solutions, leading to shorter turnaround time.
• The testing phase can also be user-centred because user-centred requirements specifications and design solutions are available to support usability-focused test specifications.
5 User and Usability Requirements
The key challenge of the UCD process is to communicate the user-centeredness effectively to the designer and developer. We propose the integration of user modelling and usability modelling into the software development process as important in filling this communication deficiency.
5.1 Conceptual User Model
A user model is a representation of information and assumptions about users [13] and can be viewed from three perspectives: modelling user knowledge, modelling user plans, and modelling user preferences [14]. Modelling user knowledge involves the accurate estimation of users’ background knowledge, skills, and experience. Modelling user plans aims to investigate the sequence of user tasks required to achieve user goals. Modelling user preferences primarily focuses on users’ information needs and preferences. Our proposed conceptual user model consists of seven user attributes as illustrated in Figure 4. In this research a user model of the existing system was created through user interviews, observations and persona development, resulting in a specification of the model.
[Figure 4: User Model with seven attributes: User Needs and Expectations; Existing Knowledge and Skills; Existing Experience; User Goals and Tasks; Physical Attributes; Cultural Factors; Attitude Information]
Fig. 4. Conceptual User Model
5.2 Conceptual Usability Attribute Model
Some of the important usability attributes published in the literature are: Learnability [15], Memorability [16 page 31], Functional Correctness [17], Efficiency [16 pages 30-31], Error Tolerance [16 pages 32-33], Flexibility [15 pages 260-270], and Satisfaction [16 pages 33-37, 11 page 6]. Figure 5 uses an Ishikawa diagram (fishbone diagram) to illustrate the conceptual usability attribute model and its measurable criteria. It shows that usability is a combination of seven usability attributes: Efficiency, Functional Correctness, Error Tolerance, Learnability, Memorability, Flexibility, and Satisfaction, and that each usability attribute is governed by several usability-related measurable aspects of the system or product. For example, the usability attribute “Efficiency” can be measured based on the evaluation of three components: E1 - Task completion in minimum time, E2 - User tasks are not misleading, and E3 - No workarounds are needed.
[Figure 5: fishbone diagram with Usability as the effect and the following attributes and measurable criteria as branches]
Efficiency
E1 – Task completion in minimum time
E2 – User tasks are not misleading
E3 – No workarounds are needed
Functional Correctness
FC1 – Task completion in minimum time
FC2 – User tasks are appropriate, effective and match the user requirement
FC3 – User spends minimal time on “Help”
Error Tolerance
ET1 – Appropriate error messaging for invalid conditions
ET2 – Ability to exit error conditions or unwanted states
ET3 – No workarounds are needed
Learnability
L1 – Clear visibility of current system status and a feel about what to do next
L2 – User tasks are not misleading
L3 – Task completion in minimum time
L4 – User spends minimal time on “Help”
Memorability
M1 – No memory recall to carry out tasks
M2 – User spends minimal time on “Help”
Flexibility
F1 – Multiplicity of ways to carry out user tasks
F2 – User control of task performance
Satisfaction
S1 – User desirability of the system and user tasks
S2 – User opinion about user experience
S3 – User opinion about frustration or confusion
Fig. 5. Conceptual usability attribute model and its measurable criteria
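The attribute model can also be written down as a plain mapping from attributes to criterion identifiers, which makes it easy to check an evaluation against it. The sketch below is illustrative only: the identifiers come from Figure 5, but the function `attribute_coverage` and the idea of scoring an attribute as the fraction of its criteria that were met are assumptions, not a measurement procedure defined in the paper.

```python
from typing import Dict, List, Set

# Attributes and criterion identifiers as listed in Figure 5.
USABILITY_MODEL: Dict[str, List[str]] = {
    "Efficiency": ["E1", "E2", "E3"],
    "Functional Correctness": ["FC1", "FC2", "FC3"],
    "Error Tolerance": ["ET1", "ET2", "ET3"],
    "Learnability": ["L1", "L2", "L3", "L4"],
    "Memorability": ["M1", "M2"],
    "Flexibility": ["F1", "F2"],
    "Satisfaction": ["S1", "S2", "S3"],
}


def attribute_coverage(met: Set[str]) -> Dict[str, float]:
    """Fraction of each attribute's criteria that were met (assumed scoring)."""
    return {
        attribute: sum(c in met for c in criteria) / len(criteria)
        for attribute, criteria in USABILITY_MODEL.items()
    }


# Hypothetical evaluation outcome, invented for illustration.
coverage = attribute_coverage({"E1", "E2", "E3", "M1"})
print(coverage["Efficiency"])    # 1.0
print(coverage["Memorability"])  # 0.5
```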
6 Research Design and Research Process
The aim of the research design was to test whether systems design quality was improved when functional specifications were explicitly enhanced with user modelling and usability modelling. Seven designers participated in the design process. We selected a web-based library information system (“existing system”) as our target system of study. We presented the functional specification for the existing system to each designer and asked them to produce a user interface based on the given functional specifications. We then gave each designer User and Usability Specifications and invited them to refine the design on the basis of the added information. The two designs were evaluated against the usability criteria and compared to detect the differences. This process was repeated with each designer. Results were aggregated to see where, and in what ways, the designers’ work differed and how the differences might have impacted on the quality of their design. Our research design is illustrated in Figure 6.
[Figure 6: flow diagram linking research tasks (T) and research products (P), with arrows for the flow between product and task (input or output). Inputs: Existing System; System Functions Specifications. Tasks: T1 User Modelling; T2 Usability Modelling; T3 Design Interface; T4 Redesign Interface; T5 Questionnaire; T6 Evaluation Process (Interface 1); T7 Evaluation Process (Interface 2); T8 Comparison. Products: P1 User Model; P2 Usability Model; P3 Interface 1; P4 Interface 2; P5 Evaluation Outcome (Interface 1); P6 Evaluation Outcome (Interface 2); P7 Research Findings]
Fig. 6. The Research Design
Summary of activities in the research design:
• T1 & T2: User-centred designers carried out user modelling and usability modelling, producing two artefacts: the user model and the usability model.
• T3: Systems designers designed and produced Interface 1 based on the system functional specification only.
• T4: Systems designers redesigned the interface and produced Interface 2 using Interface 1, the user model and the usability model.
• T5: Systems designers filled out questionnaires expressing their views, design experience and opinions on redesigning the interface.
• T6 & T7: Interface testers carried out an evaluation process against user requirements and usability requirements on both interfaces, involving end users.
• T8: The outcomes of the evaluations were compared to arrive at the research findings.
6.1 Functional Specifications
As functional specifications, a set of documents was provided to designers, namely: a scenario of use for a simple library system, a use case diagram, a database structure diagram, and a diagram detailing library tables and sample data. The scenario of use contained a narrative description of a typical library system user completing a number of user tasks as per the use case diagram shown in Figure 7.
6.2 Enhanced Requirements Specifications
The addition of the two models based on user modelling and usability modelling produced an improved version of SRS - the Enhanced Requirements Specification (ERS).
[Figure 7: use case diagram for the library system. Actor: Library System User. Use cases: Search for Book(s); Reserve Book(s); Review Personal Account; Reserve Book(s); Cancel Reserved Books]
Fig. 7. Use case diagram for library system
The research described here compares the designs resulting from the enhanced specification with those produced from only the functional specification, to test the proposition that an enhanced specification produces more testable designs that are better suited to their environment.
7 Interface Evaluation and Results
For the design of interface 1, designers were provided with only the functional specifications (FS). For redesigning the interface, designers were provided with enhanced requirements specifications (ERS) consisting of user modelling and usability modelling descriptions. We asked interface designers to fill out a questionnaire expressing their views, design experience and opinions on designing and redesigning the interfaces. The questionnaire used a five-point scale. A summary of the questionnaire findings with mean (µ) and standard deviation (SD) values is as follows.
For the design of interface 1 using only FS:
• How easy was it to design the interface with the specifications provided? (µ=3.14, SD=0.69)
• To what extent did you want additional information to create a proper design? (µ=2.0, SD=0.58)
• To what extent were the specifications helpful to create fields, buttons, tabs, menus, information content etc. in the design? (µ=3.43, SD=0.98)
• To what extent did you use your previous experience to create fields, buttons, tabs, menus, information content etc. in the design? (µ=1.43, SD=0.53)
For the design of interface 2 using ERS:
• How easy was it to design the interface with the additional specifications provided? (µ=4.43, SD=0.53)
• To what extent did you want FURTHER additional information to create a proper design? (µ=2.42, SD=0.98)
• To what extent were the additional specifications helpful to create fields, buttons, tabs, menus, information content etc. in the redesign? (µ=4.14, SD=0.69)
• To what extent did you use your previous experience to create fields, buttons, tabs, menus, information content etc. in the redesign? (µ=2.14, SD=1.35)
A user tester facilitated interface evaluations on both interfaces involving one user for each design. A summary of results is outlined below with details in relation to the usability evaluations on five focus areas.
The evaluation of Interface 1:
• Overall, the system was easy to use (µ=3.21, SD=0.63).
• Ability to complete tasks in a reasonable amount of time (µ=3.14, SD=0.62).
• Individual pages were well designed (µ=2.86, SD=0.63).
• The content of the system met user expectations (µ=3.07, SD=0.61).
• The organisation and terminology used in the system was clear (µ=3.29, SD=0.57).
The evaluation of Interface 2:
• Overall, the system was easy to use (µ=4.36, SD=0.75).
• Ability to complete tasks in a reasonable amount of time (µ=4.29, SD=0.76).
• Individual pages were well designed (µ=3.43, SD=0.53).
• The content of the system met user expectations (µ=3.5, SD=0.66).
• The organisation and terminology used in the system was clear (µ=3.57, SD=0.53).
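The µ and SD values above are summary statistics over responses on a five-point scale. The sketch below shows how such a summary can be computed, including the reversal of the scale that the Discussion mentions for negatively worded questions. The raw responses are not given in the paper, so the list `hypothetical_responses` is invented purely for illustration, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarise(responses: List[int], reverse: bool = False,
              scale_max: int = 5) -> Tuple[float, float]:
    """Return (mean, sample SD) of the responses, rounded to two decimals.

    With reverse=True each response r is mapped to scale_max + 1 - r,
    which flips a five-point scale (1 <-> 5, 2 <-> 4).
    """
    values = [scale_max + 1 - r for r in responses] if reverse else list(responses)
    return round(mean(values), 2), round(stdev(values), 2)


# Hypothetical responses from seven designers (NOT the study's data).
hypothetical_responses = [2, 3, 3, 3, 3, 4, 4]

plain = summarise(hypothetical_responses)
flipped = summarise(hypothetical_responses, reverse=True)
print(plain)    # (3.14, 0.69)
print(flipped)  # (2.86, 0.69)
```

Reversing the scale changes the mean but leaves the standard deviation unchanged, since every response is reflected around the scale midpoint.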
8 Discussion
Table 1 shows a summary of results for designers’ questionnaires. Criteria 1 and 3 show distinct improvement in designers' perception of the usefulness and helpfulness of the enhanced requirements specification. Criteria 2 and 4 asked the designer for “requirements of additional information” and “use of their previous experience” for design and redesign processes. These criteria too showed that the enhanced specification was more complete and relied less on previous designer experience (note that as these questions are of 'negative' nature, the 5 point scale was reversed for the responses so that the correct µ and SD could be calculated). The perception that designers relied less on previous experience in their second design could have been influenced by some level of experience designers gained through first interface design. The intent of the question was made clear during the data collection.
Table 1. Analysis of design questionnaire
Questionnaire Criteria                       FS µ   ERS µ   Diff µ
1. The extent of the usefulness of specs     3.1    4.4     1.3
2. The need of additional information        2.0    2.4     0.4
3. The extent of helpfulness of specs        3.4    4.1     0.7
4. The use of designer’s experience          1.4    2.1     0.7
Table 2 shows a summary of results relating to the usability evaluation of interfaces. The ERS was evaluated as better on all criteria.
Table 2. Analysis of interface evaluations

Evaluation Criteria                 FS µ   ERS µ   Diff µ
1. Ease of use                      3.2    4.4     1.2
2. Task completion                  3.1    4.3     1.2
3. Individual page design           2.9    3.4     0.5
4. Information content              3.1    3.5     0.4
5. Organisation and terminology     3.3    3.6     0.3
Table 3 summarises the two tables above. It shows that from the perspective of the designer and the perspective of product quality the design based on ERS was superior to that based on FS alone.

Table 3. Analysis of interface design and evaluation

Description              FS µ   ERS µ   Diff µ
Design Questionnaire     2.5    3.3     0.8
Interface Evaluation     3.1    3.8     0.7
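The arithmetic behind Tables 1 to 3 can be checked with a few lines of code. The sketch below uses only the means printed in Tables 1 and 2. It assumes that each row of Table 3 is the unweighted average of the criterion means in the corresponding table; the paper does not state how Table 3 was derived, and the names `differences` and `summary` are ours.

```python
# Criterion means as printed in Tables 1 and 2: (criterion, FS mean, ERS mean).
DESIGN_QUESTIONNAIRE = [
    ("Usefulness of specs", 3.1, 4.4),
    ("Need of additional information", 2.0, 2.4),
    ("Helpfulness of specs", 3.4, 4.1),
    ("Use of designer's experience", 1.4, 2.1),
]
INTERFACE_EVALUATION = [
    ("Ease of use", 3.2, 4.4),
    ("Task completion", 3.1, 4.3),
    ("Individual page design", 2.9, 3.4),
    ("Information content", 3.1, 3.5),
    ("Organisation and terminology", 3.3, 3.6),
]


def differences(rows):
    """Per-criterion improvement (ERS minus FS), rounded to one decimal."""
    return [round(ers - fs, 1) for _, fs, ers in rows]


def summary(rows):
    """Unweighted averages of the FS and ERS means, and their difference.

    Assumption: Table 3 was derived this way; the paper does not say so.
    """
    fs_mean = sum(fs for _, fs, _ in rows) / len(rows)
    ers_mean = sum(ers for _, _, ers in rows) / len(rows)
    return fs_mean, ers_mean, ers_mean - fs_mean


for name, rows in [("Design Questionnaire", DESIGN_QUESTIONNAIRE),
                   ("Interface Evaluation", INTERFACE_EVALUATION)]:
    fs, ers, diff = summary(rows)
    print(name, "per-criterion differences:", differences(rows))
    print(f"  FS {fs:.3f}  ERS {ers:.3f}  Diff {diff:.3f}")
```

Under this assumption the averages are 2.475 and 3.25 for the design questionnaire and 3.12 and 3.84 for the interface evaluation, which agree with the values printed in Table 3 once rounded to one decimal.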
9 Conclusion

In this paper, we have reported research into the use of UCD approaches to interface development through enhanced requirements specifications. This research shows that deploying user modelling and usability modelling specifications made a positive difference both to designer perception and to the design quality of an interface. Improved design can be expected to lead to the development of more usable, higher-quality systems. Developers will be able to incorporate usability aspects effectively into systems, and testers will be able to test systems effectively and efficiently to uncover functionality issues as well as usability issues. Such approaches will help ensure that any system that goes “live” has no or minimal usability issues, thereby enhancing the user experience.

There are three kinds of contributions made by this research. First, evidence has been collected as to the impact of user and usability specification on design. Second, techniques have been developed for the practical presentation of these specifications. Third, the research provides a reflection on the 'design science' research approach.
Prototype Evaluation and User-Needs Analysis in the Early Design of Emerging Technologies

Margarita Anastassova1, Christine Mégard2, and Jean-Marie Burkhardt3

1 CREATE-NET, Via Solteri, 38 - 38100 Trento - Italy
[email protected]
2 CEA/LIST, 18, route du Panorama, BP 6, 92265 Fontenay-aux-Roses Cedex, France
[email protected]
3 René Descartes University, Unité Ergonomie – Comportement & Interactions, 45, rue des Saints-Pères, 75270 Paris Cedex 06, France
[email protected]
Abstract. This paper presents two case studies of prototype evaluation as a tool for eliciting user needs for emerging technologies. In the first user evaluation, a high-fidelity virtual reality prototype is used. In the second, a low-fidelity mixed reality prototype is used. Our results show that prototypes may be a powerful tool for eliciting user needs, but that their potential depends on their fidelity. In our studies, users expressed more needs when working with the high-fidelity prototype. Furthermore, the information collected in this case was richer and more useful for design. We discuss these results as well as some factors that could help stakeholders elicit a greater number of needs for emerging technologies.

Keywords: Mixed Reality, Early Design, Emerging Technologies, Prototype Evaluation, User Needs Analysis, Virtual Reality.
approach to user needs seems quite interesting in the field of emerging technologies, which are also in constant evolution.

Of course, the limited interest in the utility of emerging technologies may be because there is no need for specific research on user needs analysis for emerging technologies. Such a statement implies that the results of empirical analyses of user needs for traditional technologies could be directly transposed to innovative applications. Moreover, it means that traditional needs analysis methods in HCI are also easy to use and suitable for innovation. However, there are many arguments against this assumption. These arguments, briefly discussed below, show that the design of useful emerging technologies challenges existing HCI knowledge and methodology.

1.1 Difficulties in Analysing End-User Needs in the Early Design of Emerging Technologies

A first argument supporting the assumption that eliciting user needs for emerging technologies is difficult stems from the fact that, by definition, such technologies express designers' striving for technical achievement. As a result, their development is essentially technology-driven and user needs often remain a minor concern for designers. Therefore, HCI specialists, if called upon at all, are mostly brought in during the later stages of development projects.

Furthermore, literature reviews in the field of VR and MR reveal that current research focuses on building systems ad hoc and on evaluating them in artificial or informal settings [6]. In the rare cases where it is explicitly mentioned, user-needs analysis is generally done by interviewing very few “task experts” (e.g. [7]), by quick field studies of future users' activity (e.g. [8]) or by questionnaires (e.g. [9]). Such practices and research focuses hinder the accumulation of HCI knowledge on user-needs analysis and partly explain the lack of a structured methodology for such analyses.
A third argument for the difficulty of user needs analysis for emerging technologies is that an innovation is still emerging and in search of potential applications. Consequently, it is barely known to its future users. Thus, users are unlikely to express their needs for an innovation, because they can hardly imagine and describe what might be possible with a potential future technology. In addition, “the more radical an innovation…the harder it is to understand how it should look, function, and be used” [10]. In fact, people are generally most prone to communicate needs of which they are particularly aware. Therefore, most of the HCI methods traditionally used for user needs analysis help elicit such conscious user needs, thus undoubtedly informing design and key industrial stakeholders. However, during the early design of emerging technologies, users are required to express their “undreamed of requirements” [3], and unless people are explicitly encouraged to think about such requirements, they are unlikely to appear until later in the development of a technology, when its potential applications become clear and evident [11].

1.2 Prototypes as a Tool for User-Needs Analysis in the Early Design of Emerging Technologies

Prototypes may be one of the powerful tools for encouraging people to express their “undreamed of requirements” for emerging technologies. Their principal advantages and disadvantages in this context are discussed below.
The main advantage of prototypes for user-needs analysis is that they are concrete physical representations of the future emerging technology as well as of some of its functional, aesthetic and interactional characteristics. Their concreteness facilitates users', managers' and, in some cases, designers' understanding of abstract, unfamiliar or fuzzy technological concepts. Furthermore, prototypes could be considered as “executable representations of (designer's) knowledge” [12]. As designers are usually more knowledgeable than users and managers about many technical aspects of an innovation, prototypes may help to transmit some of the designers' knowledge to other stakeholders. Thus, prototypes may support discussions about the functions of the future emerging technology and facilitate the refinement and elicitation of “latent” [11] and unconscious user needs. For these reasons, physical prototyping of emerging technologies may be an efficient way (1) to demonstrate, communicate and explore a number of possible design ideas and solutions; (2) to search for design alternatives and, hopefully, (3) to provoke further innovation [13].

However, some authors have suggested that prototyping may not always be useful for eliciting “emerging” needs. Though few empirical results are available, five major limitations of prototypes in this context have been put forward. First, there might be an important difference between the functional characteristics of a prototype and the characteristics of a final product, because the potential applications of emerging technologies are, by definition, ill-defined [14]. Second, prototypes might primarily express their designer's point of view on functions and interaction [15]. Third, especially if prototypes are high-fidelity ones, their concreteness might inhibit stakeholders' imagination as well as their search for innovation [16].
Fourth, low-fidelity prototypes might be “stigmatized” as less efficient, and even rejected, if stakeholders extend to them the expectations and mental images they hold of traditional and, plausibly, more elaborate technologies. Finally, some researchers claim that low-fidelity prototypes have limitations, since they cannot precisely represent a number of interactional and sensory aspects of future products (e.g. [17]).

Thus, an important issue is to clarify the advantages and disadvantages of evaluating both low- and high-fidelity prototypes as a method for user-needs elicitation in the early design of emerging technologies. In this paper, we compare the results of two case studies having this objective. The first one uses a high-fidelity prototype, while the second one uses a low-fidelity prototype. These were two prototypes of two different systems. We chose this solution instead of analysing two versions of the same prototype mainly because of practical and financial constraints. As the design of emerging technologies demands a lot of technical resources, it was impossible to have more than one prototype per system.
2 Case Study 1

The first case study concerns the design of a VR prototype with force feedback for upper-limb rehabilitation. The initial idea of the prototype was to provide various force-feedback-based exercises adapted to the motor and cognitive abilities of patients who had motor difficulties shortly after a neurological lesion. Starting from this promising idea, the objective of the study was to verify how the prototype
matched actual user needs. We also wanted to refine these needs and to extend them to other potential applications that might be supported by such a prototype.

2.1 Prototype Design Approach

End-User Participation. Eighteen users in two teams belonging to two rehabilitation centres participated in the design process. The first team was composed of one rehabilitation doctor and two occupational therapists. Four patients took part in the evaluation. The second team was composed of two physiotherapists, one occupational therapist and one rehabilitation doctor. Six patients participated in the evaluation. The other members of the project were one rehabilitation doctor, who served as an expert, and 4 designers.

Method and Resulting Prototype. At the beginning of the project, meetings were organized to gather user needs. Users felt that VR technologies with haptic feedback could be useful for rehabilitation, but they found it difficult to define and express their needs precisely. Therefore, it was decided to follow a prototype-based methodology to support the elicitation of user needs for both the hardware and the software aspects of the future rehabilitation system. Thus, the next step was to define a few concrete rehabilitation exercises together with the hardware and software applications to support them. Four exercises, mainly dedicated to rehabilitation of the shoulder, were defined by the therapeutic teams with the help of the designers, who intervened in order to ensure technical feasibility. These exercises were:
(1) Crank wheel rotation, which involved movements of the shoulder;
(2) Pong exercise, which consisted of catching a ball with a virtual racquet moving horizontally or vertically;
(3) Trajectory exercise, in which the path of movement was defined by the therapist and the patient had to follow it;
(4) Library exercise, in which the patient had to pick virtual books from a shelf and put them onto another shelf.
Different parameters could be controlled by the therapist, namely the level of difficulty of the exercises, the friction level of the robot, the weight of the virtual objects manipulated and the amplitude of the movements of the patient's arm.

The hardware of the prototype was based on a force feedback device called Virtuose from Haption Inc, built on technologies developed by CEA/LIST in France. It is a six-degrees-of-freedom arm with a working space of 40 cm per side, ending in a shovel-shaped handle to allow manipulation by spastic patients, i.e. patients with excessive muscle tone. This device can be considered a high-fidelity hardware prototype for VR applications. The general GUI developed for the purpose of the evaluation was rather simple. It was specified and implemented by a software partner.

2.2 Prototype Evaluation Approach and Needs Collection

Evaluation Approach. Each team could use the system for one week. No common methodology had been defined for the evaluation, because each team worked rather differently. Instead, it was decided that the therapists should freely choose their patients and the parameters of the exercises. Thus, the first team proceeded with a systematic evaluation of the 4 exercises for each parameter and with each patient. The therapists reported the problems they
encountered during the evaluation, as well as the positive aspects of the system, in a written document. The second team decided to use the prototype as regular rehabilitation equipment. The parameters were not tested systematically, but each patient could practise the exercises selected by the therapists according to her pathology and her capacities. Four patients used the system for a one-hour session each. Two patients used the prototype 3 times during the week in order to get an idea of the value of the device over a longer rehabilitation period. All evaluation sessions were filmed and analysed afterwards.

Data Analysis for Needs Collection. There were three steps in this analysis. First, the observations collected from the video data and from the written document produced by the first team were classified according to 3 exhaustive categories, namely (1) the problems encountered either by the therapist or by the patient; (2) the positive judgments expressed by the therapists during the evaluation and (3) the new ideas for the enhancement of the system, hereafter referred to as “Needs”. The comments of the patients were not analysed because they offered few criticisms and suggestions. Second, the observations were further coded in 7 classes according to the central aspect they concerned. The first class is hardware, covering aspects related to the robot, its handle and support. The second class concerns software aspects. The third class concerns all general settings (i.e. the overall accessibility for patients and therapists). The other classes concern the content of the four rehabilitation exercises defined. Last, a plenary meeting was organised to gather stakeholders' feedback on the evaluation and its results. The evaluation results were presented to the stakeholders and were used to elicit more suggestions on the possible enhancement of the system. The suggestions gathered during the meeting came mainly from the designers.
They were coded as “Needs”, since they corresponded to probable requirements constructed on the basis of users' feedback on the utility of the prototype and on the problems encountered during the evaluation.

2.3 Major Results

Eighteen observations are positive comments, distributed across the 7 classes. These observations show the great enthusiasm of the therapists. There are also 76 problems reported, which are not uniformly distributed across the categories. Most of the problems found (n=46, 61%) concern the exercises. Many also concern the hardware (n=18, 24%). They are mainly related to the handle of the robot and some security aspects. The general setting is also criticized (n=9, 12%). The therapists mainly criticize its accessibility and the installation of the patients. Software was rarely criticized (3 times, i.e. about 4%).

Seventy-four needs were collected during the meetings, from the observations and from the written report. Among these needs, 84% (n=62) are directly related to the concrete problems encountered during the evaluation and 16% (n=12) are needs expressed independently. Thus, the distribution of the needs closely follows the distribution of the problems encountered. User needs are not uniformly distributed across the classes. Most users' needs concern the exercises (45, i.e. 61%). This result shows
considerable creativity regarding the rehabilitation exercises, aimed at improving their efficiency and maintaining the patient's motivation. Twenty percent of the needs (n=15) concern hardware. Most of them are ideas for the improvement of the handle. However, no major modifications of the general design and setting have been suggested. The most important problem concerning the general setting is the lack of compensation for the weight of the patient's arm. In fact, many patients could not move their arm upward because they lacked arm strength, and the prototype could not compensate for this weight in order to provide gravity-free movements. The suggestions in this direction were provided by designers only. They proposed a new mechanical architecture in order to enhance the motor capacities of the system, hoping that this solution would solve the problem of compensating for the weight of the patient's arm.
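The shares reported in Section 2.3 follow directly from the counts. The sketch below recomputes them from the counts given in the text; the dictionary keys and the helper name `shares` are ours, and the breakdown of needs is limited to the two groups for which the paper gives counts.

```python
from collections import Counter

# Counts reported in Section 2.3 (76 problems in total, by class).
problems = Counter({
    "exercises": 46,
    "hardware": 18,
    "general setting": 9,
    "software": 3,
})

# Counts reported for the 74 needs, by origin.
needs_by_origin = Counter({
    "related to a problem": 62,
    "expressed independently": 12,
})


def shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


for label, counts in [("Problems", problems), ("Needs", needs_by_origin)]:
    print(f"{label} (total {sum(counts.values())})")
    for category, pct in shares(counts).items():
        print(f"  {category}: {pct:.1f}%")
```

Keeping the raw counts next to the percentages makes it easy to check that the categories are exhaustive, i.e. that they add up to the reported totals of 76 problems and 74 needs.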
3 Case Study 2

This second study concerned an evaluation carried out with a low-fidelity MR prototype for guiding train maintenance tasks. The MR prototype had to provide contextual repair information to inexperienced train maintenance technicians by overlaying computer-generated graphics and textual repair instructions on a real piece of equipment. This was a formative evaluation done during the early design of the prototype. Its initial objective was to rapidly assess the usability and the usefulness of the prototype for this type of task. The results of the evaluation were to inform the redesign of the MR prototype. Moreover, the evaluation had a second, indirect goal, which was to clarify the role of low-fidelity prototypes in user needs elicitation. It is this aspect that is mainly developed below. The usability aspects have been reported elsewhere [14].

3.1 Prototype Design Approach

End-User Participation. The participation of real future users was impossible for two main reasons. First, the industrial partner did not want to provide access to real end-users to participate in the design and test phase. Second, the prototype we evaluated possessed very few of the functions of the future finished product and we were afraid of a possible rejection of the technology.

Resulting Prototype. The MR prototype was a video see-through system in which a handheld tablet-PC was used as an augmented window. Thus, repair instructions in textual and graphical form, as well as 3D models and 2D graphics and animations, were overlaid on a real piece of equipment. The user interacted with the resulting multimedia content using a simple WIMP interface and a pointing device. Tracking and registration were done by a camera attached to the tablet-PC. The total weight of the prototype was 2 kg and it hung from the user's neck.

3.2 Prototype Evaluation Approach and Needs Collection

Subjects.
Ten subjects (6M, 4F), all of them computer and electronics engineers in one of the laboratories working on the project, participated in the study. A control group of 5 subjects used standard paper repair instructions. The experimental group
used MR-based guidance. The participants were aged from 26 to 46 (M = 30, SD = 6). Because of their professional background, they were all familiar with VR and MR technologies.

Experimental Task. The task was rather simple and consisted of removing nuts and washers from a real piece of equipment. The task comprised 9 steps, each requiring one or more user actions. The task and the real piece of equipment were identical for both the paper-guided and the MR-guided group.

Procedure and Data Analysis. All the user evaluation sessions were filmed. The evaluation required two evaluators, since one filmed the user's actions while the other helped the user when technical breakdowns occurred. Participants were allowed to explore the prototype freely. Then, they were asked to perform the maintenance task. The participants could ask any question about the prototype and its usage. After the test sessions, the subjects using MR-based guidance were also interviewed on their difficulties, needs and ideas on the potential applications of the prototype in train maintenance tasks. The data obtained (videotapes and verbal protocols transcribed verbatim) were analyzed. The video data analysis focused on total time for task completion (comprising the time for instruction comprehension and the time for manipulation of the real piece of equipment); number of steps successfully completed; number of deviations; eye movements made in searching for information, etc. The analysis of the verbal protocols focused on users' difficulties, needs and suggestions.

3.3 Major Results

The user evaluation helped to identify numerous usability problems (for more detailed results, cf. [14]). First, we found that the MR-guided group completed the repair task more slowly than the paper-guided group (the MR-guided group needed 20% more time to complete the task). At the same time, the participants made the same number of errors whatever the type of guidance.
As for postures and gesture activity, MR users stayed in the same position longer than the users of paper instructions. In general, MR users stood up, while the participants in the paper-based condition mostly squatted down next to the real piece of equipment.

In view of these empirical results, it seems logical that, in their post-experimental interviews, 4 subjects did not find the MR prototype useful for the maintenance task chosen as a possible application, since the task was too simple and could be done without any technological aid. The users' impression of the limited utility of the prototype was reinforced by the numerous registration problems encountered during the evaluation. In fact, the registration problems constitute the main class of problems mentioned in the interviews (33% of all the problems). Four of the 5 participants we interviewed did not even wait for a good registration because the process took too long. Thus, they did not use one of the main advantages of MR systems, namely rapid contextual guidance. Furthermore, they did not express any needs for such guidance or for better registration, as the task was too simple. In fact, these four participants declared having used mainly the information from the textual
instructions provided by the tablet-PC as well as some visual cues, which the real piece of equipment “afforded”. The design of animations was the second class of problems raised by our participants. This problem was mentioned in 25% of the cases, and by all participants. The users judged these animations to be too fast and semantically unclear. The other problems put forward by our test participants concern the usability of the MR prototype (its weight and bulkiness).

The MR users made very few suggestions for the future development of the MR prototype. The number of suggestions is three times smaller than the number of problems mentioned. These suggestions concern primarily the semantics and the colors of animations and the non-use of registration.
4 Discussion and Conclusions

The two case studies reported above provide qualitatively and quantitatively different pictures of prototype evaluation as a tool for needs elicitation for emerging technologies. These differences may be explained by several factors related to the design and the evaluation approaches chosen in our two case studies.

The first explanatory factor is the degree of fidelity of the prototype evaluated in each case. Our results show that the number of needs expressed by users is greater when working with the high-fidelity VR prototype than when working with the low-fidelity MR prototype. Furthermore, the information collected is richer. In this sense, our results do not support the hypothesis that the concreteness of high-fidelity prototypes would inhibit users' imagination and needs elicitation [16]. A possible interpretation of this result is that a low-fidelity prototype would provoke a focus on minor usability problems, and would thus hinder the emergence of ideas about redesign, possible applications or anticipated benefits of the technology. Therefore, an accurate representation of a future emerging technology and of what is technically feasible could be beneficial for user needs elicitation.

The second explanatory factor is the representativeness of users. The first case study was done with real future users, whereas their participation was not possible in the second one. This fact could explain the richness of data in the first case, where the actual future users were really motivated to contribute to the design of a potentially useful future emerging technology. Furthermore, the therapists used their knowledge of devices currently used in their daily work in order to provide suggestions for the improvement of the VR prototype. By contrast, in the second case, the MR technology was perceived almost as an amusing new toy, and the evaluation sessions as a new game.
Therefore, less effort was made to elicit user needs and suggestions for improvement. In the same vein, in the first case, the experimental tasks were real user tasks, whereas, in the second case, the task was too simple to be perceived as real. Moreover, as the participants were not representative of future users, they had only a vague idea of the users' real tasks. We may reasonably expect a greater number of needs to be elicited if the prototype evaluation is done with real future users and on real tasks.

The discussion about the representativeness of users and tasks naturally leads to some remarks on users' roles in the design of emerging technologies. In the first case study,
the design process was longer and richer (e.g. it comprised several meetings and evaluation sessions). Therefore, users were regularly involved in design in a participatory manner. This fact allowed a co-construction and a gradual evolution of their needs. In this sense, users acted as co-designers. In the second study, users intervened locally as test participants. Therefore, they elicited a small number of needs, which could not be further discussed and developed in direct interactions with designers.

Last but not least, the evaluation setting and approach differed between the two cases. In the first case, the prototype evaluations were done in a real work setting and the therapists could freely choose their patients and the parameters of the exercises. In the second case, the evaluation was done in a laboratory setting in a quite formal manner. Our hypothesis is that the artificial setting and the formal comparison between the traditional guidance and the MR guidance could partially explain (1) the poorer performance of the MR group and (2) the small number of needs elicited by test participants. We could reasonably expect better results if the evaluation were done in a real work setting, using less formal approaches (e.g. focus groups).

In conclusion, the evaluation of high-fidelity prototypes and iterative prototyping both seem to be efficient ways to point out some problematic aspects of design, which could further serve to elicit new needs. We think that this process would be most beneficial if done in close cooperation with designers, whose ideas on what is technically feasible are very important in the field of emerging technologies. In the future, we plan to evaluate more prototypes of different emerging technologies, and in different evaluation settings, in order to obtain more formal and convincing results on prototype evaluation as a tool for user needs elicitation for emerging technologies.
References

1. Benko, H., Ishak, E.W., Feiner, S.: Collaborative Mixed Reality Visualization of an Archaeological Excavation. In: Paper presented at the 3rd IEEE/ACM ISMAR 2004, Arlington, VA (November 2004)
2. Nielsen, J.: Usability Engineering. AP Professional, Cambridge (1994)
3. Robertson, S.: Requirements Trawling: Techniques for Discovering Requirements. Int. J. H.-C. St. 55, 405–421 (2001)
4. Brangier, E.: Besoin et Interface. In: Akoka, J., Comyn-Wattiau, I. (eds.): Encyclopédie des Nouvelles Technologies. Vuibert, Paris, pp. 1070–1084 (2006)
5. Kjeldskov, J.: Human-Computer Interaction Design for Emerging Technologies: Virtual Reality, Augmented Reality and Mobile Computer Systems. PhD Thesis, Aalborg University, Aalborg (2003)
6. Anastassova, M., Burkhardt, J.-M., Mégard, C., Ehanno, P.: L’ergonomie de la Réalité Augmentée pour L’apprentissage: une Revue. Le Tr. H (in press)
7. Gabbard, J.L., Swan II, J.E., Hix, D., Lanzagorta, M., Livingston, M., Brown, D., Julier, S.: Usability Engineering: Domain Analysis Activities for Augmented Reality Systems. In: Woods, A., Merritt, J., Benton, S., Bolas, M. (eds.): The Engineering Reality of Virtual Reality 2002. SPIE Vol. 4660, pp. 445–457 (2002)
8. Träskbäck, M., Haller, M.: Mixed Reality Training Application for an Oil Refinery: User Requirements. In: Paper presented at VRCAI 04, Singapore (June 2004)
9. Anastassova, M., Burkhardt, J.-M., Mégard, C., Ehanno, P.: L’ergonomie de la Réalité Augmentée pour L’apprentissage: une revue. Le Tr. H, vol. 70, pp. 97–125 (2007) 10. Leonard, D., Rayport, J.F.: Spark Innovation Through Empathic Design. Harv. Bus. Rev. 6, 102–113 (1997) 11. Sperandio, J.-C.: Critères Ergonomiques de L’assistance Technologique aux Opérateurs. In: Paper Presented at JIM’2001: Interaction Homme – Machine & Assistance, Metz, France (July 2001) 12. Schneider, K.: Prototypes as Assets, not Toys. Why and How to Extract Knowledge from Prototypes. In: Proceedings of IEEE ICSE-18 (1996) 13. Holmquist, L.E.: Prototyping: Generating Ideas or Cargo Cult Designs? Interactions, pp. 48–54 (March-April 2005) 14. Anastassova, M., Burkhardt, J.-M., Breda, J., Mégard, C.: Evaluation Ergonomique d’un Prototype de Réalité Augmentée par des Tests Utilisateurs: Apports et Difficultés. In: Paper Presented at ErgoIA 2006, Biarritz, France (October 2006) 15. Sutcliffe, A.: User-Centred Requirements Engineering. Springer, Heidelberg (2002) 16. Lindgaard, G., Dillon, R., Trbovich, P., White, R., Fernandes, G., Lundahl, S., Pinnamaneni, A.: User Needs Analysis and Requirements Engineering: Theory and Practice. Int. Comp. 18, 47–70 (2006) 17. Liu, L., Khooshabeh, P.: Paper or Interactive? A Study of Prototyping Techniques for Ubiquitous Computing Environments. In: Paper presented at CHI 2003, Ft. Lauderdale, FL, USA (April 2003)
Long Term Usability; Its Concept and Research Approach – The Origin of the Positive Feeling Toward the Product
Masaya Ando1 and Masaaki Kurosu2
1,2 The Graduate University for Advanced Studies (SOKENDAI) [email protected]
2 National Institute of Multimedia Education [email protected]
1 Introduction
Many people believe that a washing machine, for example, should provide only the minimum functions, and they feel no affection for such a machine. Today, however, some users feel affection for, or a positive attachment to, washing machines equipped with features such as a slanted drum and an antibacterial function.
− To date, users’ subjective evaluation of a product or artifact has been understood as customer satisfaction (CS). It is more important, however, to let users develop a positive feeling that goes beyond simple satisfaction; this corresponds to the goal of the concept of long-term usability.
− The fundamental question here is: “How do users come to feel affection for, or a positive attachment to, a product?”
Table 1. Positive feelings towards artifacts
2 Feelings Toward Artifacts
Table 1 summarizes positive feelings towards artifacts. Usability is usually regarded as a quality of the product, but the feelings listed in Table 1 differ from the concept of quality: they represent a positive involvement with the artifact. In other words, using the artifact triggers an intrinsic motivation in the user. Among the various theories of customer satisfaction, the adaptation level theory of Helson (1959) is prominent in expressing the positive involvement of users. According to this theory, satisfaction is determined by the balance between the assumed level of quality and the real performance of the artifact. However, the positive feelings listed in Table 1 include feelings that are unrelated to the degree of conformity to the assumed level. In other words, these feelings cannot be analyzed by the model of customer satisfaction.
3 A Survey on the Structure of Positive Feelings
The author conducted a survey on products that users had been using for a long time, applying the “cooperative drawing of usage history chart” method, in which a history chart is drawn on the basis of an interview. The informants were 9 people, including 2 females, and 27 product items were covered. After drawing the history chart from the initial use to the present time, informants were asked to draw a line graph of their satisfaction and affective feeling toward the product, and then to give the reasons for the changes in the graph. The major findings of this survey were as follows:
− Users do not have a positive feeling toward every product item.
− A positive feeling does not exist at the initial usage but grows during long-term usage.
− Users experienced changes in the way they used the product and in their feeling toward it for various reasons, including a change in the context of use, an accidental operation, and reference information.
− That change is related to the sympathy of the user toward the characteristics of the product.
− In other words, if users have a sympathetic experience with the characteristics or the concept of the product, they may develop a positive feeling toward a product that they had not evaluated positively because of its poor usability.
− This kind of positive feeling can trigger a positive cycle in which the user begins to use the product more intensively or to customize it.
4 Relationship Between the Theory of Flow Experience by Csikszentmihalyi and the Use of Product
Among the many theories of intrinsic motivation, the theory of Csikszentmihalyi is interesting because he studied the structure of the feeling of pleasure in autotelic (self-purposed) activities such as chess and rock climbing. He called the feeling of being involved in such an activity “flow”. Flow can be felt when there is a balance between the opportunity (challenge) of the activity and the skill of the person, as shown in Figure 1. It should be noted that challenge and skill are not objective concepts but are based on the person’s subjective perception. Wiedenbeck and Davis (2001) found that differences in interaction style and learning experience influence whether a flow experience occurs when operating a computer. They also found that the flow experience affects the perceived ease of use and usability of application software. It could be said that a positive feeling toward a product can be generated by a sympathetic experience with the product, i.e. a flow experience.
Fig. 1. Model of Flow State (Csikszentmihalyi 1975)
Csikszentmihalyi listed the elements that make up the model of flow experience as follows. He also described how to attain the flow experience.
− to have the insight to achieve the task
− to be able to concentrate on what one is doing
− to have a clear goal for what one is doing
− to have direct feedback on the task
− to be in deep but natural absorption
− to have the feeling of controlling one’s own behavior
− to lose self-consciousness
− to experience a changed sensation of time
5 Discussion
Wiedenbeck and Davis reported that the level of skill improved for those who were given a challenge level higher than their skill. Based on this evidence, we can consider a strategy that intentionally lets the user have a quasi-sympathetic experience. It might be possible to plan a new procedure through which many people attain a higher level of satisfaction and a positive feeling.
References 1. Csikszentmihalyi, M.: Beyond Boredom and Anxiety: Experiencing Flow in Work and Play. Jossey-Bass (1975); tr. by Imamura, H., Shisakusha (2000) 2. Shimizu, S.: New Consumer Behavior. Chikura-shobo (1999) (in Japanese) 3. Wiedenbeck, S., Davis, S.: Intrinsic Motivation, Ease of Use and Usefulness Perceptions as Mediators in Computer Learning. Proc. HCI International 2001 1, 1553–1557 (2001)
General Interaction Expertise: An Approach for Sampling in Usability Testing of Consumer Products Ali Emre Berkman METU - CADCAM / ŪTEST Product Usability Unit, Fac. of Architecture no. 21 Ankara – Türkiye [email protected]
Abstract. As digital technology flourished, modes of interaction pertaining to computer systems started to be utilized in consumer products. As a consequence, problems peculiar to software began to be observed in once simple-to-operate products. To overcome these problems, usability testing, one of the most versatile tools utilized during the design and evaluation stages of software development, was introduced to the domain of consumer products. However, both literature findings and the author’s personal experience show that there are problems with sampling, since participants’ prior experiences with digital interfaces seem to affect test results more in the case of consumer products. In this study, after a theoretical discussion, the measurement tool being developed to control for general interaction expertise (GIE) is presented. In preliminary studies of predictive validity, correlation coefficients of up to 0.76 were found between test scores and usability performance. Keywords: user expertise, usability testing, consumer products, sampling.
For professional products, it is usually possible to determine the characteristics of users and ‘choose’ the ones that best represent the actual population with the help of observable attributes.¹ In the case of consumer products, working on homogeneous ‘subsets’ is not plausible most of the time, given that such products are usually intended for a larger portion of the population.² Therefore, many user characteristics, which vary both quantitatively and qualitatively, should be considered.
1.1 Problems with Heterogeneity
The causes and consequences of this methodological phenomenon may best be illustrated with a speculative example. Suppose that during the development of an innovative cellular phone, the manufacturer wants to see whether the new interface is easy to use. Furthermore, the manufacturer wants to verify that basic functions can be easily used by all users. Although usability testing would be the right choice to fulfill those quite specific needs, the test would not be able to yield unambiguous results. First of all, the manufacturer would never know whether the sample was representative enough to infer that ‘basic functions can be easily used by all users’, regardless of the level of success observed in the tests. Even if the analysis concentrates on the types and frequencies of usability problems observed, instead of or together with effectiveness and efficiency data, the problem is not even slightly alleviated. The fact that the variance observed in performance may be explained by individual differences causes methodological problems, and is hard to neglect, especially in the case of consumer products. For example, in the speculative case provided, some participants may not be able to complete even a single task successfully; interpreting this result would be far from trivial. Was it the interface that caused too many problems for the participants? Was it the participants’ lack of expertise with such interfaces?
Were the participants representative enough of the intended users of the product? Although the need for representative sampling finds support in the literature, suggestions about the factors to be considered are divergent. The primary aim of any usability test should be to observe the effect of interface design on user performance and to eliminate all other interfering factors. Egan [4] states that variability in performance of up to 20:1³ can be explained by differences among users, regardless of design or other factors. Although how an interface is designed should have a strong determining effect on user performance in a usability test⁴, and usability practitioners keep informing participants that what they test is the interface and not the participants’ abilities, it is usually the participant’s familiarity with digital interfaces that is reflected in the results. Experiential factors, among other individual differences, are known to have a significant effect on performance (e.g. [5], [6]).
1 Attributes such as age, occupation, and level of education, instead of hard-to-measure latent traits.
2 Literally, apart from those for whom the product is intended, everybody who has access to the market is a potential user in the case of consumer products. For example, everybody in the world is theoretically a potential user of an mp3 player produced by a global company.
3 This difference in performance was observed with professional software, so ratios of more than 20:1 may be expected in the case of consumer products.
4 Actually, this is the very motive behind testing. In conditions where this assumption is violated, there is no possibility of turning test results into design recommendations.
1.2 Experience, Expertise, and Attitudes
In this study, the model suggested in Fig. 1 will be partially utilized for comprehending the relationship between what is experienced (experience) and what is retained, i.e. permanent cognitive changes (expertise and attitudes). The term suggested for expertise as it is formulated in this study is General Interaction Expertise (GIE) [7], which may be defined as: a general expertise acquired by experiencing several interfaces, which helps users to cope with novel interaction situations.
Fig. 1. Triad of acquisitions
This triadic model is in line with Bandura’s social learning theory [8]. According to the model, as users experience a diversity of interfaces⁵ for some time, they start to gain expertise and self-images are formed. Bandura [8] suggests that individuals possess a self system called self-efficacy, which enables them to influence their cognitive processes and actions. It may be stated that while GIE (expertise) is acquired through experience, a General Interaction Self-efficacy belief (attitude) is built at the same time.
2 Assessment of GIE
Like the assessment of constructs such as computer literacy, the assessment of GIE may be done in many ways. Bunz, Curry and Voon [9] argue that experience has actual and perceived facets. The latent construct of GIE can be observed either as competency in interaction performance (actual expertise) or in the form of self-perceptions (perceived expertise). Although both approaches are plausible, the rest of this study explores the assessment of GIE in accordance with so-called ‘actual’ expertise, in other words, how to assess GIE through the observation of performance.
5 See [7] for an elaborate discussion on how specific experiences with individual interfaces lead to a general expertise.
2.1 Recognition of Expert Behavior
According to Norman [10], human action consists of two main components. For our goals to be fulfilled, we should be able to perceive and evaluate the current state of the world. This is followed by a set of actions for changing the world so that our goals are accomplished.
Therefore, the steps of the cycle presented in Fig. 2 continuously follow each other until “the world” is transformed so that our goals are satisfied. However, whether the flow is smooth or constantly interrupted, and whether a single iteration is enough or the cycle is run many times, depends on the characteristics of the components of interaction. The cycle may be so internalized by the user that both the concretization of goals and the interpretation of the world become minimally crucial. Taken to the extreme, executions may dominate the cycle; that is, automatic processing may take place, minimizing even the need for perception in the form of feedback. At the other extreme, there may be cases where a sequence of actions is not readily available, or where “interpreting the perception” is not possible. This usually occurs when people are confronted with serious problems in a known system, or when they come across a totally novel interface. In such cases, translating the intention to act into a meaningful sequence of actions and transforming perceptions into evaluations may be problematic. In their seminal work, Human Problem Solving, Newell and Simon [11] argue that “[a] person is confronted with a problem when he [sic] wants something and does not know immediately what series of actions he can perform to get it” (p.72). According to them, together with the apparent qualities pertaining to experts, such as the extent and intensity of interface experience, efficacy in building
internal representations when the problem is ill-defined and flexibility in exploring a diversity of methods to obtain the desired outcomes seem to be distinguishing qualities of expert problem solving.
2.2 Development of Apparatus Tests
Following the theories discussed here and in the previous studies [7], it is possible to formulate GIE as consisting of two fundamental behavioral components: automatic loops of execution and evaluation, and controlled problem-solving behavior. These two distinct behavior categories constituted the framework for the development of the apparatus tests.
2.2.1 GIE_XEC
The task consisted of three simple sub-tasks, assumed to fall into the domain of automatic loops of execution and evaluation defined previously. Task content was deliberately reduced so as to eliminate the direct effects of specific experiences. Task difficulty and novelty were adjusted to a level at which indications of automatic processing would provide a partial estimate of participants’ GIE for the specific case. Before the administration of the test, step-by-step instructions were provided so that the task goals and the methods of achieving them were clear. Therefore, no problem-solving behavior was expected to be involved in completing the sub-tasks. The steps to complete one trial were as follows:
Sub-task 1: Navigate and choose modify (‘değiştir’).
Sub-task 2: Navigate and choose ‘P’.
Sub-task 3: Complete the required modifications and choose confirm (‘onay’).
Keystroke data⁶ were recorded for 6 successive trials, and the mean elapsed time to complete a trial (excluding trial 1) was assigned as the GIE_XEC score.
Fig. 3. Screenshots for each sub-task in GIE_XEC
6 Data consisted of keys pressed and keystroke latencies.
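The GIE_XEC scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes that the elapsed time of a trial is the sum of its keystroke latencies, and the function names and the latency values are hypothetical.

```python
from statistics import mean

def trial_elapsed_time(latencies_ms):
    """Elapsed time of one trial, assumed here to be the sum of its keystroke latencies (ms)."""
    return sum(latencies_ms)

def gie_xec_score(trials):
    """Mean elapsed time over trials 2..6; trial 1 is excluded, as in the text."""
    if len(trials) != 6:
        raise ValueError("expected 6 successive trials")
    return mean(trial_elapsed_time(t) for t in trials[1:])

# Hypothetical keystroke latencies (ms) for one participant, one list per trial.
example_trials = [
    [900, 700, 800, 600],  # trial 1 (excluded from the score)
    [500, 400, 450, 350],
    [480, 390, 430, 340],
    [450, 380, 420, 330],
    [440, 370, 410, 320],
    [430, 360, 400, 310],
]
score = gie_xec_score(example_trials)
print(score)
```

Since the score is an elapsed time, a lower value indicates faster, more automatic execution.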
When the 6 trials were treated as separate items, the odd-even⁷ reliability coefficient for the instrument was 0.96 (N = 71). However, there was a statistically significant learning effect, manifested in the difference between the mean scores for the odd and even groups (p < 0.01).
2.2.2 GIE_PS
Considering a pre-defined set of heuristics, one problem situation was chosen among many alternatives to be developed as an apparatus test. The task consisted of reproducing a pattern of shapes shown to participants, so that the pattern displayed on the interface screen exactly matched the goal pattern. The interface elements were a display and five push buttons. Three of the buttons were located under the screen, each coupled with a one-digit numerical display. A button labeled with an arrow pointing towards the screen was positioned on the right (redraw button). An auxiliary button labeled “tamam” (done) was positioned between the pattern card and the screen. By pushing that button, participants could indicate that the task was successfully completed (see Fig. 4). The parameters that could be manipulated were not described to participants. At the beginning of the test, the aim of the test was briefly described, together with some limited instructions about the task.
Fig. 4. Screenshot of the user interface for the GIE_PS
A typical sequence of actions taken by an expert user to accomplish the task would be as follows: (1) Select the slot to be filled with the leftmost button, (2) Modify the type parameter with the middle button, (3) Select the appropriate value for the color parameter with the rightmost button, (4) Press the redraw button to see the results, (5) After the goal state is reached, press the button labeled “done”.
7 Data were grouped as trials 1,3,5 - 2,4,6. Mean values for each group were computed and Pearson’s product-moment correlation coefficients were obtained.
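The odd-even procedure in footnote 7 can be illustrated as follows. The sketch assumes one list of six trial times per participant; the helper names and the four participants' data are hypothetical and serve only to show the grouping and the correlation step.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def odd_even_reliability(trial_times):
    """Correlate each participant's mean over trials 1,3,5 with the mean over trials 2,4,6."""
    odd = [mean(times[0::2]) for times in trial_times]   # trials 1, 3, 5
    even = [mean(times[1::2]) for times in trial_times]  # trials 2, 4, 6
    return pearson_r(odd, even)

# Hypothetical elapsed times (ms) for four participants, six trials each.
trial_times = [
    [2000, 1800, 1700, 1650, 1600, 1580],
    [3000, 2700, 2600, 2500, 2450, 2400],
    [1500, 1400, 1350, 1300, 1280, 1250],
    [4000, 3600, 3400, 3300, 3200, 3100],
]
reliability = odd_even_reliability(trial_times)
print(round(reliability, 3))
```

Because the odd group always contains the earlier trial of each pair, a learning effect shifts the odd and even means apart without necessarily lowering their correlation, which is consistent with the high coefficient and the significant mean difference reported together in the text.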
2.3 Preliminary Validity Study I
After pilot tests to fix bugs and operational problems, GIE_XEC was administered in a real usability test to explore whether there is a considerable correlation between usability performance and the independent variables gathered during observations. Usability performance data were gathered during a user test of a dishwasher with a menu-driven interface, which consisted of an LCD, one rotary knob, six shortcut pushbuttons, an on/off button and a flow control button. The total effectiveness score across the 7 task scenarios, applied to a sample of 15 participants, was assigned as the dependent variable representing user performance (effectiveness). Partial effectiveness scoring was avoided, since an objective way of determining partial scores seemed impossible. Therefore, in cases where participants could not fully complete a task as defined, effectiveness was scored as 0. GIE_XEC scores were assigned as the main independent variable. In addition, the number of times users sought visual feedback to re-orient their fingers or before pressing keys (feedbacks) and the number of errors made (errors) were included in the analyses to explore any other significant correlations. The results indicate a strong correlation between effectiveness and feedbacks (-0.60), and between effectiveness and GIE_XEC (-0.68). The correlation between errors and effectiveness was not strong (-0.17). Another finding concerned the type of relationship between GIE_XEC and effectiveness. During the analyses, it appeared that the relationship between the variables might be non-linear. After the GIE_XEC scores were log-transformed and re-analyzed, the correlation between GIE_XEC and effectiveness increased in magnitude to -0.74. However, it is too early to draw a conclusion from this result.
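The analysis steps in Study I (all-or-nothing effectiveness scoring, correlation with GIE_XEC, and re-analysis after a log transform) can be sketched as below. The data are invented for illustration and do not reproduce the study's values; the base-10 logarithm is an assumption, since the text does not state the base (the choice of base does not change a Pearson correlation).

```python
from math import log10, sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def effectiveness(tasks_completed):
    """All-or-nothing scoring: 1 point per fully completed task, 0 otherwise."""
    return sum(1 for done in tasks_completed if done)

# Hypothetical data: GIE_XEC scores (mean elapsed ms) and effectiveness out of 7 tasks.
gie_xec = [1200, 1500, 1900, 2600, 4000, 7000]
effect = [7, 6, 6, 4, 3, 2]

r_raw = pearson_r(gie_xec, effect)
r_log = pearson_r([log10(x) for x in gie_xec], effect)
print(f"raw: {r_raw:.2f}, log-transformed: {r_log:.2f}")
```

Both coefficients are negative because slower execution (a larger GIE_XEC score) goes with lower effectiveness, which is why the correlations in the text carry a minus sign.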
For the performance data collected, it can be stated that:
• Participants who were able to complete the tasks embodied in GIE_XEC more efficiently (more quickly) were more successful in using the digital interface tested.
• Participants who were able to complete the tasks embodied in GIE_XEC with less feedback were relatively more successful in using the digital interface tested.
• Although GIE_XEC seems to have predictive power, this does not necessarily mean that there is a causal relationship between these variables.
2.4 Preliminary Validity Study II
To gain further insight into the predictive validity of GIE_XEC and to obtain preliminary data with GIE_PS, tests were conducted alongside a comparative usability test whose aim was to comparatively evaluate 4 washing machines with digital interfaces. For this purpose, 24 participants were allocated to three test groups and each individual interacted with two different interfaces. The two apparatus tests were administered to participants just before or right after the usability test sessions. Whether participants took the tests before or after the sessions was not a controlled factor and was determined mainly by the restrictions imposed by the test conditions. The dependent variable representing user performance was the effectiveness attained by each participant across seven tasks and two interfaces. To eliminate the effects of differences in the distribution of effectiveness scores for each interface, the scores were standardized. For each apparatus test, elapsed time data were used to represent performance. The correlation coefficients between effectiveness and GIE_XEC scores, and between effectiveness and GIE_PS scores, were -0.69 and -0.46 respectively. After the initial analyses, it became clear that treating elapsed time as the GIE_PS score was quite problematic, because 8 out of 24 participants quit the task without succeeding. Whether or not a participant succeeded in completing the task was thought to better capture the essence of problem-solving behavior. Therefore, GIE_PS scores were converted into dichotomous pass-fail scores.⁸ The non-linear relationship hypothesized in Study I was also observed in the data set of Study II, and the correlation between effectiveness and log-transformed GIE_XEC scores increased in magnitude to -0.73. To explore predictive validity further, some ‘ex post facto’ analyses were done. Participants were seeded within a 2x2 matrix in accordance with their GIE_XEC and GIE_PS scores (see Table 1).
Table 1. 2x2 score matrix
             | High GIE_PS | Low GIE_PS
High GIE_XEC | A           | B
Low GIE_XEC  | C           | D
With this test design, the following hypotheses were tested:
• H1: Mean effectiveness values for participants seeded in the High GIE_XEC row (cells A and B) are higher than for those seeded in the Low GIE_XEC row (cells C and D), where the difference between the means is D1;
• H2: Mean effectiveness values for participants seeded in the High GIE_PS column (cells A and C) are higher than for those seeded in the Low GIE_PS column (cells B and D), where the difference between the means is D2;
• H3: Mean effectiveness values for participants seeded in cell A (High GIE_PS ∩ High GIE_XEC) are higher than for those seeded in cell D (Low GIE_PS ∩ Low GIE_XEC), where the difference between the means is D3;
• H4: The relationship between the differences between means is: D3 > D1 and D3 > D2.
All the related null hypotheses were rejected; that is, there were significant differences between the high and low GIE_XEC groups (D1 = 1.20, p<0.02)⁹, between the high and low GIE_PS groups (D2 = 1.87, p<0.01), and between cell A and cell D (D3 = 2.48, p<0.01)¹⁰. Lastly, the relationship hypothesized in H4 was also observed.
8 After this modification, GIE_PS became a test with only a single item, which is actually not acceptable in terms of reliability. However, since the study was exploratory in nature, the analyses were done even with a single item.
9 Note that D1 is the difference between effectiveness scores in standardized form.
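The seeding into the 2x2 matrix and the mean differences D1, D2 and D3 can be sketched as follows. Two points are assumptions made for illustration only: the text does not say how the high/low GIE_XEC split was made, so a median split on elapsed time is used here (a shorter time counting as high GIE_XEC), and the eight participants' values are invented. Significance testing is not shown.

```python
from statistics import mean, median

# Hypothetical participants: (GIE_XEC elapsed ms, GIE_PS passed, standardized effectiveness).
participants = [
    (1200, True, 1.4), (1300, True, 0.9), (1400, False, 0.2), (1500, False, -0.1),
    (2500, True, 0.3), (2600, True, -0.2), (2800, False, -1.1), (3000, False, -1.4),
]

def seed_cell(time_ms, passed, median_time):
    """Return the matrix cell: rows are GIE_XEC (high = faster than median), columns are GIE_PS."""
    if time_ms < median_time:
        return "A" if passed else "B"
    return "C" if passed else "D"

median_time = median(t for t, _, _ in participants)
cells = {"A": [], "B": [], "C": [], "D": []}
for time_ms, passed, z_eff in participants:
    cells[seed_cell(time_ms, passed, median_time)].append(z_eff)

d1 = mean(cells["A"] + cells["B"]) - mean(cells["C"] + cells["D"])  # high vs low GIE_XEC rows
d2 = mean(cells["A"] + cells["C"]) - mean(cells["B"] + cells["D"])  # high vs low GIE_PS columns
d3 = mean(cells["A"]) - mean(cells["D"])                            # cell A vs cell D
print(d1, d2, d3, d3 > d1 and d3 > d2)
```

With these invented values the pattern hypothesized in H4 (D3 larger than both D1 and D2) appears because cell A combines both advantages and cell D both disadvantages.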
The added value of GIE_PS was shown in the linear model derived:
$$Z = -0.5 \log x + 0.3\,p \qquad (1)$$
where Z is the standardized predicted value, x is the GIE_XEC score, and p is the GIE_PS score. With this model, the correlation between effectiveness and the value predicted by formula (1) was 0.76. Given this, it may be stated that the variance in GIE scores accounts for 58% of the variance observed in effectiveness (r² = 0.58). As a result, the partial conclusions drawn in Study I were supported in this study as well. In addition, it may be stated that GIE_PS augmented the predictive power obtained with GIE_XEC alone. However, GIE_PS should be revised so that performance assessment based on many pass-fail type items is possible.
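Formula (1) and the step from r = 0.76 to r² = 0.58 can be checked with a short worked example. The sketch assumes a base-10 logarithm and a 1/0 coding for pass/fail, neither of which is stated in the text; the function name and the input values are hypothetical.

```python
from math import log10

def predicted_z(gie_xec, gie_ps):
    """Formula (1): Z = -0.5 log x + 0.3 p, with log assumed base 10 and p coded 1 (pass) or 0 (fail)."""
    return -0.5 * log10(gie_xec) + 0.3 * gie_ps

# Under these assumptions, a tenfold slower GIE_XEC score lowers Z by 0.5,
# and passing GIE_PS raises Z by 0.3 at the same GIE_XEC score.
slow_fail = predicted_z(1000, 0)
fast_fail = predicted_z(100, 0)
fast_pass = predicted_z(100, 1)

# Share of variance explained by a correlation of 0.76.
r = 0.76
r_squared = r ** 2
print(round(r_squared, 2))
```

The squared correlation is 0.5776, which rounds to the 58% quoted in the text.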
3 Implications for Research and Future Studies
Preliminary evidence provided in this study indicates that, in its fully-fledged form, GIE would be a valuable tool for sampling. Measurement of GIE may be used as a means of justifying certain assumptions regarding participant profile, as a way of manipulating GIE as an independent variable, or for ascertaining that the effects of GIE on test results were kept to a minimum. Furthermore, after normative standards have been determined, the tool may also be used to evaluate the usability of interfaces in absolute terms. In other words, it would be possible to identify interfaces that require high levels of GIE and those that do not. A final merit of pre-evaluating participants would be to detect individuals who exhibit intolerable levels of test or performance anxiety before the actual usability test. In addition to the ‘performance observation’ approach presented in this study, attitudes should also be studied in order to provide an opportunity for triangulation and to embrace the social aspects of the phenomenon as well.
Acknowledgments. A part of the research was conducted in the testing facilities of METU – CADCAM ŪTEST. The author wishes to thank the researchers in ŪTEST, Ç. Erbuğ, B. Şener, E. Akar, Z. Karapars, P. Gültekin and A. Öztoprak, and all the participants involved in the corresponding usability tests.
References 1. Rosenbaum, S., Chisnell, D.: Choosing usability research methods. In: Proceedings of the IEA 2000/HFES 2000 Congress, pp. 569–572 (2000) 2. Gray, W.D., Salzman, M.C.: Damaged merchandise? A review of experiments that compare usability evaluation methods. Human-Computer Interaction 13(3), 203–261 (1998) 3. Potosnak, K.: Recipe for a usability test. IEEE Software, pp. 83–84 (November 1988) 4. Egan, D.E.: Individual differences in human-computer interaction. In: Helander, M. (ed.) Handbook of human-computer interaction, pp. 543–565. Elsevier, New York (1998) 5. Nielsen, J.: Usability engineering. Academic Press, Boston (1993)
6. Dumas, J.S., Redish, J.C.: A practical guide to usability testing. Ablex, Norwood- NJ (1993) 7. Berkman, A.E., Erbuğ, Ç.: Accommodating individual differences in usability studies on consumer products. In: Proceedings of the 11th conference on human computer interaction, vol. 3 [CD-ROM] (2005) 8. Bandura, A.: Social foundations of thought and action. Prentice Hall, New Jersey (1986) 9. Bunz, U., Curry, C., Voon, W.: Perceived versus actual computer-email-web fluency. Computers in Human Behavior, [article in press] 10. Norman, D.A.: The design of everyday things. Currency, New York (1990) 11. Newell, A., Simon, H.A.: Human problem solving. Prentice Hall, Englewood Cliffs (1972)
Are Guidelines and Standards for Web Usability Comprehensive?
Nigel Bevan1 and Lonneke Spinhof2
1 Professional Usability Services, 12 King Edwards Gardens, London W3 9RG, UK
2 Centre for Usability Research-K.U. Leuven, Parkstraat 45 bus 3605, 3000 Leuven, Belgium
[email protected], [email protected]
Abstract. A previous paper compared the 110 guidelines in ISO CD 9241-151 with the 187 guidelines produced by the U.S. Department of Health and Human Services (HHS) and found that 76% of the HHS guidelines and 54% of the ISO guidelines were unique. New versions of both the original 2004 documents were issued in 2006, but 71% of the HHS guidelines and 46% of the ISO guidelines are still unique. Neither set of guidelines is easy to use for an expert review of whether a web site complies with the guidelines. A more comprehensive checklist has been developed, based on the HHS and ISO guidelines, but extended to include additional research-based guidelines on privacy and security and e-commerce. It is complemented by a handbook describing each guideline in more detail, illustrated with an example, and with an explanation of how it should be tested and when compliance can be stated.
• A brief statement of the overarching principle that is the foundation of the guideline.
• Comments that further explain the research/supporting information.
• Citations to relevant web sites, technical and/or research reports supporting the guideline.
• A score indicating the "Strength of Evidence" that supports the guideline. These range from "Strong Research Support," indicating that there is at least one formal, rigorous study with contextual validity and agreement among experts to "Weak Research Support," indicating limited evidence and disagreement among experts.
• A score indicating the "Relative Importance" of the guideline to the overall success of a web site. These scores range from 1-5 and are intended to help guide usability experts and web designers to prioritize the implementation of these guidelines.
• One or more graphic examples of the guideline in practice.
ISO is developing an International Standard to provide recommendations for the user-centered design of web user interfaces. The recommendations cover much the same scope as HHS, but are documented in a more concise format appropriate for an international standard. The ISO document distinguishes between design, process and evaluation aspects of web development. However, since the development process and evaluation is already covered by other ISO standards, it focuses on the design aspects, and provides design guidance and recommendations in four major areas:
• High-level design decisions and design strategy.
• Content design.
• Navigation and search.
• Content presentation.
ISO 9241-151 primarily contains material that is unique to the web, so some topics covered by HHS are omitted from ISO 9241-151 as they are covered by more general ISO standards, in particular:
• Design Process and Evaluation: ISO 13407, ISO TR 16982, and ISO 9241-11.
• Accessibility: ISO TS 16071 and WAI Guidelines.
• Lists: partly covered by ISO 9241-12.
• Screen-based Controls (Widgets): ISO 14915-2.
• Graphics, Images, and Multimedia: ISO 14915-3.
This means that for complete guidance on the web, readers have to acquire additional standards and identify the parts that are relevant. This is not easy to do, particularly as some interpretation is needed to apply the material in other standards to the web. The previous paper showed that 76% of the HHS guidelines and 54% of the ISO guidelines were unique. In 2006, new versions were published of both the HHS and ISO guidelines. The new HHS document has added 22 new guidelines and updated 30 more, and the ISO document has been extensively revised with 31 additional recommendations. The 196 HHS guidelines (excluding usability testing) and 141 ISO recommendations are listed in Table 1, showing the ISO topics that appear to be most closely equivalent to
Are Guidelines and Standards for Web Usability Comprehensive?
each HHS guideline (or category of guideline). Partially corresponding ISO guidelines are shown in italics. Apparently conflicting guidelines are shown in bold.
Table 1. Comparison of HHS and ISO guidelines (* = HHS importance rating)
* HHS Guideline
ISO 9241-151 Recommendation
Design Process and Evaluation
5 1:1
Provide Useful Content
5 1:2
Establish User Requirements
5 1:3
Understand and Meet User’s Expectations
5 1:4
Involve Users in Establishing User Requirements
4 1:5
Set and State Goals
4 4 4 3 2 1
Focus on Performance Before Preference Consider Many User Interface Issues Be Easily Found in the Top 30 Set Usability Goals Use Parallel Design Use Personas
1:6 1:7 1:8 1:9 1:10 1:11
7.1.3 Appropriateness of content for the target group and tasks 7.1.4 Completeness of content 6.3 Analysing the target user groups 7.1.2 Designing the conceptual model 7.1.5 Structuring content appropriately 8.3.2 Choosing suitable navigation structures 6.2 Determining the purpose of a Web application 6.5 Matching application purpose and user goals
6.4 Analysing the users’ tasks 6.6 Prioritising different design goals 6.7 Prioritising different design goals 6.11 Coherent multi-site strategy 7.2.2 Independence of content, structure & presentation 9.3.5 Visualising temporal status 9.3.12 Consistency across related sites 10.6 Using generally accepted technologies & standards 10.7 Making Web user interfaces robust 10.8 Designing for input device independence
Optimizing the User Experience
5 4 4 4 4 4 4
2:1 2:2 2:3 2:4 2:5 2:6 2:7
4 2:8 4 4 4 4 3 3 3 2
Do Not Display Unsolicited Windows or Graphics Increase Web Site Credibility Standardize Task Sequences Reduce the User’s Workload Design For Working Memory Limitations Minimize Page Download Time Warn of ’Time Outs’ Display Information in a Directly Usable Format
2:9 Format Information for Reading and Printing 2:10 Provide Feedback when Users Must Wait 2:11 Inform Users of Long Download Times 2:12 Develop Pages that Will Print Properly 2:13 Do Not Require Users to Multitask While Reading 2:14 Use Users’ Terminology in Help Documentation 2:15 Provide Printing Options 2:16 Provide Assistance to Users
8.3.11 Avoiding opening unnecessary windows 9.6.4 Text quality
10.5 Acceptable download times 10.1.4 Using appropriate formats, units of measurement or currency.
10.2 Providing help. 9.3.15 Providing printable document versions 10.3 Error pages
Accessibility
5 5 5 4 4
3:1 3:2 3:3 3:4 3:5
Comply with Section 508 Design Forms for Users Using Assistive Technology Do Not Use Color Alone to Convey Information Enable Users to Skip Repetitive Navigation Links Provide Text Equivalents for Non-Text Elements
4 3:6
Test Plug-Ins and Applets for Accessibility
3 3:7 3 3:8 3 3:9
Ensure that Scripts Allow Accessibility Provide Equivalent Pages Provide Client-Side Image Maps
6.8 Conforming to content accessibility standards 9.3.9 Using Colour 7.2.3.2 Providing text equivalents for non-text objects 10.9 Making the user interface of embedded objects usable and accessible
N. Bevan and L. Spinhof Table 1. (continued)
3 3 2 2
3:10 3:11 3:12 3:13
Synchronize Multimedia Elements Do Not Require Style Sheets Provide Frame Titles Avoid Screen Flicker
9.3.10 Using frames with care 6.9 Conforming to software accessibility standards 9.6.7 Making text resizable by the user
Hardware and Software
4 4 4 4 3
4:1 4:2 4:3 4:4 4:5
Design for Common Browsers Account for Browser Differences Design for Popular Operating Systems Design for User’s Typical Connection Speed Design for Commonly Used Screen Resolutions
The Homepage
5 5:1
Enable Access to the Homepage
5 5:2
Show All Major Options on the Homepage
5 4 4 4 3 2 2
Create a Positive First Impression of Your Site Communicate the Web Site’s Value and Purpose Limit Prose Text on the Homepage Ensure the Homepage Looks like a Homepage Limit Homepage Length Announce Changes to a Web Site Attend to Homepage Panel Width
5:3 5:4 5:5 5:6 5:7 5:8 5:9
8.4.11 Linking back to the home page 8.3.9 Directly accessing relevant information from the home page 8.3.8 Informative home page
Place Important Items at Top Center Structure for Easy Comparison Establish Level of Importance Optimize Display Density Align Items on a Page Use Fluid Layouts Avoid Scroll Stoppers Set Appropriate Page Lengths Use Moderate White Space Choose Appropriate Line Lengths Use Frames When Functions Must Remain Accessible
6:3 6:4 6:5 6:6 6:7 6:8 6:9 6:10 6:11 6:12
1 6:13
4 7:1
Navigation
Provide Navigational Options
4 7:2
Differentiate and Group Navigation Elements
4 7:3
Use a Clickable ’List of Contents’ on Long Pages
4 7:4
Provide Feedback on Users’ Location
4 3 3 2 2 2 1
Place Primary Navigation Menus in the Left Panel Use Descriptive Tab Labels Present Tabs Effectively Keep Navigation-Only Pages Short Use Appropriate Menu Types Use Site Maps Use ’Glosses’ to Assist Navigation
7:5 7:6 7:7 7:8 7:9 7:10 7:11
1 7:12
Breadcrumb Navigation
9.3.2 Consistent page layout 9.3.3 Placing title information consistently 9.3.7 Avoiding scrolling for important information
9.6.5 Quantity of text per information unit/page 9.3.16 Use of “white space” 9.3.11 Providing alternatives to frame-based presentation 9.3.6 Making content fit the expected size of the display area 9.3.13 Using appropriate techniques for defining the layout of a page
8.4.3 Maintaining visibility of navigation links 8.4.5 Placing navigation components consistently 8.4.7 Splitting up navigation overviews 8.4.14 Subdividing long pages 8.2.2 Showing users where they are 8.4.4 Consistency between navigation components and content 10.4 Naming of URLs
9.4.5 Inferring the link target from link cues 8.4.8 Providing a site map 8.2.2 Showing users where they are 8.4.12 Going back to higher levels
Table 1. (continued) 8.2.1 Making navigation self-descriptive. 8.2.3 Supporting different navigation behaviours. 8.3.3 Breadth versus depth of the navigation structure 8.3.4 Organising the navigation in a meaningful manner 8.3.7 Superimposing different navigation structures 8.4.6 Making several levels visible 8.4.10 Making dynamic navigation components obvious 8.4.13 Providing a 'step back' function
Scrolling and Paging
5 2 2 2 2
8:1 8:2 8:3 8:4 8:5
Eliminate Horizontal Scrolling Facilitate Rapid Scrolling While Reading Use Scrolling Pages for Reading Comprehension Use Paging Rather Than Scrolling Scroll Fewer Screenfuls
5 4 4 4 4 4 3 2
9:1 9:2 9:3 9:4 9:5 9:6 9:7 9:8
Use Clear Category Labels Provide Descriptive Page Titles Use Descriptive Headings Liberally Use Unique and Descriptive Headings Highlight Critical Data Use Descriptive Row and Column Headings Use Headings in the Appropriate HTML Order Provide Users with Good Ways to Reduce Options
5 4 4 4
10:1 10:2 10:3 10:4
9.3.8 Avoiding horizontal scrolling
Headings, Titles and Labels
9.4.17 Page titles as bookmarks 9.3.1 General page information 8.2.2 Showing users where they are
Links
Use Meaningful Link Labels Link to Related Content Match Link Names with Their Destination Pages Avoid Misleading Cues to Click
4 10:5
Repeat Important Links
4 10:6 4 10:7
Use Text for Links Designate Used Links
3 10:8
Provide Consistent Clickability Cues
3 3 3 3 3 3
Ensure that Embedded Links are Descriptive Use ’Pointing-and-Clicking’ Use Appropriate Text Link Lengths Indicate Internal vs. External Links Clarify Clickable Regions on Images Link to Supportive Information
10:9 10:10 10:11 10:12 10:13 10:14
9.4.7 Using descriptive link labels
8.2.4 Offering alternative navigation paths 8.4.9 Providing cross linking to potentially relevant content 9.4.15 Redundant links 9.4.8 Highlighting previously visited links 9.4.2 Identification of links 9.4.3 Distinguishing adjacent links from each other
9.4.14 Link length 9.4.13 Distinguishable within-page links
8.4.16 Dead links 9.4.4 Distinguishing navigation links from transactions 9.4.9 Marking links to special targets 9.4.11 Marking links opening new windows 9.4.12 Distinguishing navigation links from action links 9.4.16 Avoiding link overload
Text Appearance
Use Black Text on Plain, High-Contrast Backgrounds Format Common Items Consistently Use Mixed-Case for Prose Text Ensure Visual Consistency Use Bold Text Sparingly Use Attention-Attracting Features when Appropriate Use Familiar Fonts Use at Least a 12-Point Font Color-Coding and Instructions Emphasize Importance Highlighting Information
9.3.9 Using Colour
Lists
4 12:1 4 12:2 4 12:3
Order Elements to Maximize User Performance Place Important Items at Top of the List Format Lists to Ease Scanning
[ISO 9241-12 5.7.1]
Table 1. (continued)
4 3 3 2 2 1
12:4 12:5 12:6 12:7 12:8 12:9
Display Related Items in Lists Introduce Each List Use Static Menus Start Numbered Items at One Use Appropriate List Style Capitalize First Letter of First Word in Lists
Distinguish Required and Optional Data Entry Fields Label Pushbuttons Clearly Label Data Entry Fields Consistently Do Not Make User-Entered Codes Case Sensitive Label Data Entry Fields Clearly Minimize User Data Entry Put Labels Close to Data Entry Fields Allow Users to See Their Entered Data 8.5.2.8 Search field size 9.5 Choosing interaction objects Use Radio Buttons for Mutually Exclusive Selections Use Familiar Widgets Anticipate Typical User Errors Partition Long Data Items Use a Single Data Entry Method Prioritize Pushbuttons Use Check Boxes to Enable Multiple Selections Label Units of Measurement Do Not Limit Viewable List Box Options Display Default Values Place Cursor in First Data Entry Field Ensure that Double-Clicking Will Not Cause Problems Use Open Lists to Select One from Many Use Data Entry Fields to Speed Performance Use a Minimum of Two Radio Buttons Provide Auto-Tabbing Functionality Minimize Use of the Shift Key 8.4.15 Explicit activation
[ISO 9241-12 5.7.6]
Screen-Based Controls (Widgets)
Graphics, Images, and Multimedia
4 14:1 4 14:2 4 14:3
Use Simple Background Images Label Clickable Images Ensure that Images Do Not Slow Downloads
4 14:4
Use Video, Animation, and Audio Meaningfully
4 14:5
Include Logos
4 4 4 3 3 3
Graphics Should Not Look like Banner Ads Limit Large Images Above the Fold Ensure Web Site Images Convey Intended Messages Limit the Use of Images Include Actual Data with Data Graphics Display Monitoring Information Graphically
14:6 14:7 14:8 14:9 14:10 14:11
2 14:12 Introduce Animation 2 2 1 1
14:13 14:14 14:15 14:16
7.2.3 Selecting suitable media 7.2.3.1 Selecting appropriate media objects 6.10 Identifying the site and its owner 9.3.14 Identifying all pages of a site
7.2.3.3 Enabling users to control time-dependent content changes
Emulate Real-World Objects Use Thumbnail Images to Preview Larger Images Use Images to Facilitate Learning Using Photographs of People
Writing Web Content
5 15:1
Make Action Sequences Clear
4 4 4 4 4 4
Avoid Jargon Use Familiar Words Define Acronyms and Abbreviations Use Abbreviations Sparingly Use Mixed Case with Prose Limit the Number of Words and Sentences
15:2 15:3 15:4 15:5 15:6 15:7
8.3.5 Offering task-based navigation 8.3.6 Offering clear navigation within multi-step tasks 8.3.7 Superimposing different navigation structures 8.4.2 Providing navigation overviews 9.4.6 Using familiar terminology for navigation links
Table 1. (continued) 3 3 3 3
15:8 15:9 15:10 15:11
Limit Prose Text on Navigation Pages Use Active Voice Write Instructions in the Affirmative Make First Sentences Descriptive 9.6.1 Readability of text 9.6.2 Supporting text skimming 9.6.3 Writing style
Content Organization
5 5 5 4 4 3 3 3 2
16:1 16:2 16:3 16:4 16:5 16:6 16:7 16:8 16:9
Organize Information Clearly Facilitate Scanning Ensure that Necessary Information is Displayed Group Related Elements Minimize the Number of Clicks or Pages Design Quantitative Content for Quick Understanding Display Only Necessary Information Format Information for Multiple Audiences Use Color for Grouping
9.6.2 Supporting text skimming
8.2.5 Minimising navigation effort
7.1.6 Level of granularity 7.2.4 Keeping the content up to date 7.2.5 Making the date and time of the last update available 7.2.7 Accepting online user feedback
Search
5 17:1
Ensure Usable Search Results
5 17:2
Design Search Engines to Search the Entire Site
4 4 4 3 3 3 2
Make Upper- and Lowercase Search Terms Equivalent Provide a Search Option on Each Page Design Search Around Users’ Terms Allow Simple Searches Notify Users when Multiple Search Options Exist Include Hints to Improve Search Performance Provide Search Templates
17:3 17:4 17:5 17:6 17:7 17:8 17:9
8.5.3.1 Ordering of search results 8.5.3.2 Relevance-based ranking of search results 8.5.3.3 Descriptiveness of results 8.5.3.4 Sorting search results 8.5.4.1 Scope of a search 8.5.4.2 Selecting the scope of a search 8.5.2.7 Availability of search 8.5.2.10 Error-tolerant search 8.5.2.3 Providing a simple search facility 8.5.2.6 Describing the search technique used 8.5.2.1 Providing a search function 8.5.2.2 Providing appropriate search functions 8.5.2.4 Advanced search 8.5.2.5 Full-text search 8.5.2.9 Shortcut to search function 8.5.4.3 Providing feedback on the volume of the search result 8.5.4.5 Showing the query with the results 8.5.5.1 Giving advice for unsuccessful searches 8.5.5.2 Repeating searches 8.5.5.3 Refining searches
Use an Iterative Design Approach Solicit Test Participants’ Comments Evaluate Web Sites Before and After Making Changes Prioritize Tasks Distinguish Between Frequency and Severity Select the Right Number of Participants Use the Appropriate Prototyping Technology Use Inspection Evaluation Results Cautiously Recognize the ’Evaluator Effect’ Apply Automatic Evaluation Methods Use Cognitive Walkthroughs Cautiously Choosing Laboratory vs. Remote Testing Use Severity Ratings Cautiously
Privacy and business policies
7.2.8.1 Providing a privacy policy statement 7.2.8.2 Providing a business policy statement
Table 1. (continued) 7.2.8.3 User control of personal information 7.2.8.4 Storing information on the user’s machine
Internationalization
9.6.6 Identifying the language used 10.1.1 General 10.1.2 Showing relevant location information 10.1.3 Identifying supported languages 10.1.5 Presenting text in different languages
Personalisation and user adaptation
7.2.9.2 Taking account of the users’ information needs 7.2.9.3 Making personalisation evident 7.2.9.4 Making user roles evident 7.2.9.5 Allowing users to see and change profiles 7.2.9.6 Informing about automatically generated profiles 7.2.9.7 Switching off automatic adaptation 7.2.9.8 Providing access to complete content
Only 56 of the HHS guidelines are in common (71% of the HHS guidelines and 46% of the ISO guidelines are unique). For the 101 HHS guidelines rated highest for importance, the proportion of unique guidelines drops to 62%. If the topics that only one of the two documents covers are excluded (Design Process; Evaluation; Hardware and Software; Lists; Screen-based Controls; Graphics, Images, and Multimedia; Privacy & Business Policies; Internationalisation; and Personalisation), the percentage of unique guidelines drops to 64% of the HHS guidelines (55% of those of highest importance) and 38% of the ISO guidelines. While the percentage of unique ISO guidelines in the new documents has reduced from 49% to 38%, the percentage of high priority HHS guidelines that are unique remains about 55%. (As some judgments had to be made for what constitutes equivalence, these figures are only approximate.)
Some HHS guidelines are not in the ISO draft because they are beyond the scope of software ergonomics, e.g.:
• Hardware and Software: browser and operating system (e.g. 4:1 Design for common browsers).
• 5:3 Create a Positive First Impression of Your Site.
Other types of HHS guidelines that are not included by ISO include:
• Home page design, e.g. 5:5 Limit prose text on the homepage.
• Scrolling & paging, e.g. 8:3 Use scrolling pages for reading comprehension.
• Headings, Titles and Labels: window titles and descriptive headings, e.g. 9:1 Use clear labels for categories of information that summarise the items within the category.
• Appearance, e.g. 11:4 Ensure visual consistency of website elements within and between web pages.
• Lists: headings, ordering and formatting, e.g. 12:2 Display a series of related items in a vertical list.
• Writing Web Content: jargon, abbreviations, and case, e.g. 15:5 Use abbreviations sparingly.
• Content Organisation: support scanning and display necessary information, e.g. 16:1 Organize information clearly: Structure the site to be meaningful to the user.
ISO provides more detail in areas specific to the web such as Navigation and Searching, and includes Privacy and Internationalization that are outside the scope of HHS. In total 65 guidelines are unique to ISO. Examples of apparently important guidelines within the scope of HHS, but unique to ISO include:
• 8.3.10 Avoiding unnecessary start (splash) screens.
• 9.4.2 Navigation links should be clearly distinguishable from links activating some action.
• 9.4.9 Links to other file formats should be clearly marked.
• 9.4.11 Links that open new browser windows should be clearly marked.
• 10.3 Error messages should clearly state the reason why the error occurred.
• 8.4.13 Provide a separate ‘back’ function if the standard function does not lead to a meaningful previous state.
These items may not have been included by HHS either because they were not included in the original set of guidelines that were reviewed (for example because there was no supporting evidence), or because they were subsequently judged “less important” and therefore eliminated from the published set.
Some differences were noticed in the content of some HHS and ISO guidelines:
• ISO 8.2.2 recommends use of breadcrumbs, while HHS 7:12 says that they are ineffective.
• ISO 9.6.5 recommends limiting the quantity of text per information unit/page, while HHS 6:10 recommends using an appropriate page length, and using longer scrolling pages when reading for comprehension (8:3).
• ISO 9.3.11 warns against using frames, while the HHS guidelines recommend frames in some circumstances (6:13 When functions must remain accessible) and suggest how they should be used (3:12 Use frame titles).
• ISO 9.4.15 warns against using redundant links, while HHS 10:5 recommends repeating important links.
• ISO 9.4.14 recommends that link names should not exceed one line of text, while HHS 10:11 recommends that link names should be long enough to be understood, but short enough to minimize wrapping.
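The headline percentages can be checked against the counts reported in the text: 196 HHS guidelines and 141 ISO recommendations were compared, 56 HHS guidelines have an ISO equivalent, and 65 ISO recommendations have no HHS equivalent. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative and not part of either document.

```python
# Counts as reported in the text of this paper.
HHS_TOTAL = 196      # HHS guidelines compared (usability testing excluded)
ISO_TOTAL = 141      # ISO 9241-151 recommendations compared
HHS_IN_COMMON = 56   # HHS guidelines with an equivalent ISO recommendation
ISO_UNIQUE = 65      # ISO recommendations with no HHS equivalent


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Unique HHS guidelines are those without an ISO equivalent.
hhs_unique_pct = percent(HHS_TOTAL - HHS_IN_COMMON, HHS_TOTAL)
iso_unique_pct = percent(ISO_UNIQUE, ISO_TOTAL)

print(f"Unique HHS guidelines: {hhs_unique_pct}%")
print(f"Unique ISO guidelines: {iso_unique_pct}%")
```

Both results agree with the 71% and 46% quoted above. The figures for the restricted topic set (64%, 55%, 38%) cannot be recomputed this way because the per-topic counts are not stated in the text.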
2 Guideline Based Inspections
As the use of design and usability standards in software development rises [5], so does interest in usability inspections. Previous research pointed out that the then-existing standards were not very easy for researchers and professionals to use [2,8]. Most current standards are still not readily usable in guideline reviews. Performing a thorough guideline review of a website can involve using a combination of different sets of guidelines, but when sets are combined ad hoc they are difficult to use and to interpret. This part of the paper describes: 1) the problems that occurred when the second author tried to use the HHS and ISO documents as a tool in guideline reviews, and 2) how the second author combined the different sets of guidelines into one checklist.
Guideline based inspections are commonly used by usability professionals in the User Centered Design process. ‘Guideline based inspections’ or ‘guideline reviews’ are considered usability inspection methods [11]. In a usability inspection, the usability-related aspects of the interface are examined. In contrast with formal usability tests, no end users are involved; the examination is performed by some kind of professional, such as a usability expert, a developer, or an experienced user [11]. The best-known inspection method is ‘heuristic evaluation’, in which an application is checked against a list of quite generally formulated heuristics (e.g. the ten heuristics of Nielsen). The outcome of a heuristic evaluation depends highly on the interpretation of the expert who performs the inspection. A guideline review checks an application against a more concrete set of guidelines (e.g. the HHS guidelines). The outcome of a guideline review depends less on the expertise of the expert (Jordan et al., 1996 [7]). Guidelines used in a guideline review can have very different abstraction levels: they can vary from quite system-specific guidelines to more general guidelines that can be applied to different kinds of systems. The higher the abstraction level, the more insight the expert needs to have into the system [3]. Design and usability guidelines are becoming more and more popular in software development processes [5]. The use of and compliance with usability standards is considered a good way to create a high degree of consistency across and within applications [5,13]. Increased consistency is thus one of the benefits of using guidelines. Consistency is considered one of the most important usability principles of Human Computer Interaction. Consistency within and between applications improves the overall usability of the application.
Interfaces that are consistent are easier to learn and lead to fewer errors, and therefore to higher user satisfaction [10]. More recent sets of usability guidelines such as the HHS guidelines and ISO 9241-151 also consider other aspects of usability, such as interaction design and information architecture. To test an application for compliance with a certain set of guidelines it is useful to develop a checklist listing all requirements the application should adhere to. A checklist helps the expert who checks for compliance to keep track of the requirements that need to be met. A good checklist consists of an exhaustive list of well-written requirements. A good requirement is necessary, verifiable, attainable and clearly formulated to avoid ambiguity [6]. Although usability guidelines tend to be more subjective in nature, these rules for good requirements should also be kept in mind while creating a usability checklist. A good set of guidelines combines more specific guidelines for the application at hand with more generic guidelines that refer to more general aspects of the interface. The set of guidelines needs to be well documented, including concrete examples illustrating the different guidelines [5]. The document itself should comply with all guidelines for good document design, such as a thorough table of contents and index, word lists and glossaries [13]. Although the focus of usability guidelines may differ, they are all developed to be used by developers when developing new applications and by usability experts when inspecting the usability of a system. Existing usability standards tend to be quite unusable for developers and even for usability experts who check for compliance with these standards [13]. Users of the available standards therefore still need to supplement them to make them usable in a guideline review.
The two standards discussed in this paper are not directly usable in a guideline review. Ambiguous guideline formulations occur in both sets of guidelines, not all guidelines are verifiable, and neither document provides a ready-to-use checklist (see Table 2). The ISO standard contains no illustrations to serve as examples for the requirements, and no index is provided. So both standards need to be supplemented to be usable as a checklist.
Table 2. Examples of guidelines transformed to make them verifiable
Original guideline from HHS: Let users know if a page is programmed to 'time out', and warn users before time expires so they can request additional time.
Unambiguous requirements:
• Are users warned of a page time out?
• Can users "ask" for more time to complete a task?
Original guideline from HHS: Provide users with appropriate feedback while they are waiting.
Verifiable requirements:
• Is progress feedback provided for processes that take longer than one second?
• If a process takes at most 10 seconds, is a visual indication used to indicate the progress?
• If a process will take more than 10 seconds, is a progress indicator used that shows progress toward completion?
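The second example in Table 2 turns a vague guideline into thresholds that can be checked mechanically. As a minimal sketch, assuming the one-second and ten-second boundaries quoted in the table, the expected feedback for a given waiting time could be expressed as below. The function name and the three return labels are hypothetical and are not taken from the checklist itself.

```python
def expected_feedback(duration_seconds: float) -> str:
    """Map a process duration to the feedback the requirements call for.

    Thresholds follow the verifiable requirements in Table 2; the
    returned labels are illustrative names chosen for this sketch.
    """
    if duration_seconds <= 1:
        # The requirements only ask for feedback beyond one second.
        return "none required"
    if duration_seconds <= 10:
        # Longer than one second, at most ten: a visual indication.
        return "visual indication"
    # More than ten seconds: an indicator showing progress toward completion.
    return "progress indicator"


for seconds in (0.5, 4, 30):
    print(seconds, "->", expected_feedback(seconds))
```

An inspector would compare the value returned for an observed waiting time with the feedback the site actually shows and answer the corresponding yes/no question accordingly.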
2.1 Creating a Web Usability Checklist
To perform a usability guideline review on websites we needed a complete checklist covering all topics concerning web usability. Neither of the existing sets of guidelines was complete and detailed enough to serve the purpose: a complete conformance test of websites against the existing and commonly used web design and usability guidelines. The HHS guideline set was the most complete and most nearly ready-to-use set of guidelines we could find. We therefore decided to take the HHS guidelines as the basis for the usability checklist we were looking for. We extended the scope of the HHS guidelines with two topics, privacy and security, and e-commerce, taken from other guidelines found in the literature (e.g. [12]). The requirements from the other topics were transformed into necessary, verifiable, attainable and unambiguous guidelines. All requirements are stated as yes/no questions. Some of the topics were combined into one topic, and other topics were complemented with extra requirements that in our experience are important in usability evaluations. All requirements in the checklist are research based. The whole set of guidelines was checked against ISO/DIS 9241-151 to be sure we included everything from this upcoming international standard. All requirements in ISO/DIS 9241-151 were covered in our set of guidelines, but often differently formulated. Due to the contextual nature of usability research it was not always easy to formulate requirements that are true for all websites and web applications. We therefore developed a structure in which the expert who does the inspection can decide which requirements are applicable in which situation. This was the only way to create a generally applicable web usability checklist that can be adjusted to each specific situation.
Fig. 1. Example from the handbook
2.2 Use of the Checklist
The checklist as developed here is intended to be used by usability experts during guideline reviews. As stated above, a checklist alone is not enough to create a usable review tool. All requirements in the checklist are documented in the “guideline handbook”. Each requirement in the checklist is described in more detail and, where possible, illustrated with an example (see Fig. 1). For each requirement there is a description of how it should be tested and when compliance can be stated. A glossary, a table of contents and an index are included to make the document usable for the user of the checklist. Not all requirements have the same impact on the usability of an interface [1], so all requirements have been prioritized on a scale from 5 (highest priority) to 1 (lowest priority). This prioritization is based on the “relative importance” scales used in the HHS guidelines. Adjustments to this scale have been made based on experience from former research and on feedback from a group of usability experts who used the different drafts of the checklist in usability research.
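The checklist structure described above (yes/no requirements, a priority from 5 to 1, and an applicability decision left to the expert) can be modelled with a small data structure. The sketch below is one possible representation, not the authors' tool: the class, its field names, the helper function, the priorities and the wording of the last two questions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Requirement:
    question: str                  # yes/no question taken from the checklist
    priority: int                  # 5 = highest priority, 1 = lowest
    applicable: bool = True        # decided by the expert for the site reviewed
    answer: Optional[bool] = None  # True = yes, False = no, None = not assessed


def failed_requirements(checklist: List[Requirement]) -> List[Requirement]:
    """Return applicable requirements answered 'no', highest priority first."""
    failed = [r for r in checklist if r.applicable and r.answer is False]
    return sorted(failed, key=lambda r: r.priority, reverse=True)


# Illustrative entries; priorities and some wordings are hypothetical.
checklist = [
    Requirement("Are users warned of a page time out?", priority=3, answer=False),
    Requirement("Can users ask for more time to complete a task?", priority=3, answer=True),
    Requirement("Is horizontal scrolling avoided?", priority=5, answer=False),
    Requirement("Is a site map provided?", priority=2, applicable=False),
]

for requirement in failed_requirements(checklist):
    print(requirement.priority, requirement.question)
```

Keeping applicability separate from the answer means a requirement the expert rules out for a given site is never counted as a failure, which mirrors the adjustable structure described in Section 2.1.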
3 Further Research
The complete review tool (handbook and checklist) needs to be user-tested before it can be used as a standalone test tool. Both the usability of the tool itself and the outcome of the test need to be evaluated. In the next phase the complete review tool will be tested in parallel with a formal usability test on the same application, so that the results of the guideline review can be compared with the results of the formal usability test. By asking several usability experts to participate in this test as evaluators, the usability of the tool itself will be evaluated at the same time.
References
1. Agarwal, R., Venkatesh, V.: Assessing a Firm’s Web Presence: A Heuristic Evaluation Procedure for the Measurement of Usability. Info. Sys. Research 13, 168–186 (2002)
2. Bevan, N.: Guidelines and standards for web usability. In: Human Computer International 2005, Proceedings HCI International 2005, Lawrence Erlbaum, Mahwah (2005)
3. Cockton, G., Woolrych, A., Hall, L., Hidemarch, M.: Changing Analysts’ Tunes: The Surprising Impact of a New Instrument for Usability Inspection Method Assessment. In: Proc. HCI 2003, pp. 145–162. Springer, Heidelberg (2003)
4. de Souza, F., Bevan, N.: The use of guidelines in menu interface design: Evaluation of a draft standard. In: Proceedings of the IFIP TC13 Third International Conference on Human-Computer Interaction, pp. 435–440. North-Holland Publishing Co (1990)
5. Henninger, S., Lu, C., Faith, C.: Using organizational learning techniques to develop context-specific usability guidelines. pp. 129–136. Amsterdam (1997)
6. Hooks, I.: Writing good requirements. In: Proceedings of the 3rd International Symposium of the National Council on Systems Engineering (NCOSE) (1993)
7. Jordan, W.P., Thomas, B., Weerdmeester, A.B., McClelland, L.I.: Usability evaluation in industry. Taylor & Francis, London (1996)
8. Mosier, J.N., Smith, S.L.: Application of guidelines for designing user interface software. Behavior and Information Technology 5(1), 39–46 (1986)
9. Nielsen, J.: Usability Engineering. Morgan Kaufmann Publishers, San Francisco, CA (1993)
10. Nielsen, J.: Coordinating user interfaces for consistency. Morgan Kaufmann Publishers Inc., San Francisco, CA (1989)
11. Nielsen, J., Mack, L.R.: Usability inspection methods. John Wiley & Sons, Inc, US (1994)
12. Nielsen, J., Molich, R., Snyder, C., Farrell, S.: E-Commerce User Experience. Nielsen Norman Group, Fremont, CA, USA (2001)
13. Thovtrup, H., Nielsen, J.: Assessing the usability of a user interface standard, pp. 335–341. ACM Press, New Orleans, Louisiana, United States (1991)
14. US Department of Health and Human Services: Research-Based Web Design & Usability Guidelines (2006). Available at http://www.usability.gov/guidelines/
The Experimental Approaches of Assessing the Consistency of User Interface

Yan Chen, Lixian Huang, Lulu Li, Qi Luo, Ying Wang, and Jing Xu

User Research & Experience Design Center of Tencent Technology (Shenzhen) Ltd., Shenzhen 518057
Abstract. Consistency, one of the most important features of usability, has been used as an important indicator in assessing usability. A number of recent studies have focused on how to create consistency within a single application, but few have considered how to create and evaluate consistency across the products of the same company. In this paper, we address the problem with two methods, the incomplete matching task and the method of paired comparison, to analyze how the products differ from competing products and to evaluate the consistency of the current products. The study finds that these two methods can identify the consistency performance of different products relatively quickly and can reveal some of the design elements that affect consistency. However, because the object of study in this experiment was limited to the login interface, the applicability of the methods needs further study.

Keywords: consistency; user experience; usability testing.
1 Introduction

One of the most important aspects of usability is consistency. Consistency should apply both within the individual application and across complete computer systems, and even across product families [1]. Consistency within a single application can reduce the user’s memory load and the risk of errors, while consistency among the different products of one company can strengthen the overall identification of those products and so distinguish them from competitors’ products. However, because products differ in positioning, application area, and target users, it is very difficult to maintain consistency in the interface design of all products. For a company with many products, it is even more difficult to analyze the factors that affect consistency among them. This study explores and develops an effective method, applicable to a large number of products, for analyzing the consistency performance of existing products.
consistency interface is one that follows the rules, such as using the same operation to select targets [2]. Another issue is to determine where consistency should be reflected in the interface and what should be kept consistent: is it the way objects are operated in daily life (called external consistency) or the way they are operated in the existing operating system (known as internal consistency) [2]? According to the definition in Ben Shneiderman's "Designing the User Interface" [3], consistency is mainly the unification of the general operating sequence, terminology, components, layout, color, and style sheets within the application. The former definition distinguishes external from internal consistency, while the latter emphasizes, from a procedural point of view, what should be kept consistent. Although these definitions differ, they all refer to the complete consistency of the experience across all products of the same company. Here, consistency includes two things: identification, that is, the differences users perceive between a company's products and competing products, which arise from the number, size, color, layout and other properties of the interface design elements; and the consistency among the different products within the same company.
3 Methods of Assessing Consistency

3.1 Premises of the Study

The above definitions of consistency contain two important factors: the identification of the product and the consistency among different products within the same company. If a product has high identification and the consistency among different products is also high, we can say that the complete consistency of the product experience is very good. Based on this premise, we use the matrix in Table 1 to describe complete consistency. Using experimental methods, we determine where the assessed products fall in this matrix and give the corresponding description.

Table 1. Matrix of Complete Consistency Performance Description

| No. | Identification | Consistency among products | Complete consistency | Description |
|---|---|---|---|---|
| 1 | High | High | Good | The complete consistency of the product experience is very good: the consistency among different products is high and the products have high identification. |
| 2 | Low | High | Common | The complete consistency of the product experience is common: the consistency among different products is high while the products have low identification. Possible reason: the products show no obvious differences from competing products, and the design of identification elements needs to be strengthened. |
| 3 | High | Low | Common | The complete consistency of the product experience is common: the products have high identification while the consistency among different products is low. Possible reason: some of the products have high identification, and there may be sub-products. |
| 4 | Low | Low | Bad | The complete consistency of the product experience is bad: the products have low identification and are highly similar to other products; the consistency among different products is low and the product design lacks unity. |

3.2 Methods of Assessing

Because the company has a large number of products in many categories, and because the login interface strongly influences users and is the most frequently used interface, the consistency performance of the login interface was selected as the object of study in this experiment. To analyze the two factors that make up complete consistency, we apply the incomplete matching method and the method of paired comparison, one to each factor.

3.2.1 Method of Paired Comparison

The method of paired comparison is an indirect method used in experimental psychology to construct an ordinal scale. It was first introduced by the psychologist J. Cohn in his study of color preference. The method pairs every object to be compared with every other object, presents the pairs to participants one at a time, and asks participants to compare the two objects of each pair on a given property and judge which of the two has more of it. The method forces participants to choose between two objects, and because every object is paired with every other object, all the objects are compared with one another; from the choices we can then derive how all the objects perform on the selected property.

In the experiment, the login interface of QQ2006 served as the standard: for each pair, participants selected the interface that was more similar to the QQ2006 login interface. Seven interfaces were selected for comparison and matched into 21 pairs, and in each pair participants made a forced choice between interface A and interface B.
Fig. 1. Depiction of Task of the Method of Paired Comparison (panels: login interface of QQ2006; interface A; interface B)
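The pairing step described above can be sketched in a few lines of Python. This is a minimal illustration only: the interface codes A-G follow the text, while the function name `build_trials`, the fixed seed and the shuffling of presentation order are assumptions of the sketch and are not details reported by the study.

```python
import random
from itertools import combinations

# Codes A-G for the seven pre-compared login interfaces (as in the text).
INTERFACES = ["A", "B", "C", "D", "E", "F", "G"]


def build_trials(items, seed=0):
    """Return every unordered pair exactly once, in shuffled order.

    Shuffling the order of the pairs and the position of the two
    interfaces within each pair is an assumption of this sketch.
    """
    rng = random.Random(seed)
    trials = []
    for first, second in combinations(items, 2):
        pair = [first, second]
        rng.shuffle(pair)       # randomise which interface is shown first
        trials.append(tuple(pair))
    rng.shuffle(trials)         # randomise the presentation order of the pairs
    return trials


trials = build_trials(INTERFACES)
print(len(trials))  # 21 pairs, i.e. 7 * 6 / 2
```

Each participant would then make one forced choice per trial, so 16 participants yield 16 judgments for each of the 21 pairs.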
If all the compared interfaces are highly consistent with the QQ2006 login interface, the evaluations should show no clear trend; that is, users' choices should have no obvious distribution pattern. The reasoning rests on the following analogy. Suppose N identical balls are compared by different users: the results should be random, with no trend in the users’ choices. If instead the N balls carry different numbers, some users may compare them by the size of the number and others by whether the number is odd or even, so the results show certain distribution trends. We can assess whether such trends exist using Kendall's consistency coefficient, also called "Kendall’s U coefficient". If there are distribution trends, U should be close to 1; if there are none, U should be close to 0. The Kendall formula is as follows:
$$
U = \frac{8\left(\sum \gamma_{ij}^{2} - K \sum \gamma_{ij}\right)}{N(N-1)\cdot K(K-1)} + 1 \tag{1}
$$

where N is the number of objects assessed (the number of classes), here N = 7 pre-compared interfaces, and K is the number of assessors, here K = 16 users.
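γij is the selection score for i > j (or i < j) in the record table of the paired comparison. In each comparison the selected interface is scored 1 and the non-selected interface is scored 0.

Table 2. Table of Kendall Coefficient Calculation

| Interface No. | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| A |  |  |  |  |  |  |  |
| B | 4 |  |  |  |  |  |  |
| C | 1 | 2 |  |  |  |  |  |
| D | 1 | 1 | 9 |  |  |  |  |
| E | 3 | 5 | 1 | 1 |  |  |  |
| F | 9 | 9 | 1 | 0 | 9 |  |  |
| G | 5 | 5 | 0 | 1 | 8 | 3 |  |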
3.2.2 Incomplete Matching Task (Name Matching Task)

The incomplete matching method assesses the identification of products. Users are asked to select a product name for the login interface of each product, and the accuracy of the interface-product match (AM) is then calculated. AM equals the number of interfaces matched correctly by the user divided by the total number of interfaces to be matched, that is, AM = AI (accurate interfaces) / TI (total interfaces). If the matching task yields a high score, we consider the products to have high identification.
Fig. 3. Depiction of Incomplete Matching Task
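The AM score defined in this section can be illustrated with a short Python sketch. The function name `matching_accuracy`, the interface identifiers and the product names are hypothetical placeholders introduced for illustration; only the ratio AM = AI / TI comes from the text.

```python
def matching_accuracy(answers, answer_key):
    """Compute AM = AI / TI for one participant.

    answers    -- the participant's choice per interface, {interface_id: product_name}
    answer_key -- the correct product per interface, in the same shape
    """
    total = len(answer_key)                          # TI: interfaces to be matched
    correct = sum(
        1 for interface, product in answer_key.items()
        if answers.get(interface) == product
    )                                                # AI: interfaces matched correctly
    return correct / total


# Hypothetical data: identifiers and product names are placeholders.
answer_key = {"ui1": "P1", "ui2": "P2", "ui3": "P3", "ui4": "P4", "ui5": "P5"}
participant = {"ui1": "P1", "ui2": "P3", "ui3": "P3", "ui4": "P4", "ui5": "P2"}

am = matching_accuracy(participant, answer_key)
print(am)  # 0.6 (3 of 5 interfaces matched correctly)
```

Averaging this score over all participants gives a group-level accuracy of the kind reported as a single percentage.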
4 Experiment Preparation

Users were screened by telephone. Drawing on the registration information held on the server, we conducted a quick telephone interview and invited users to participate. To ensure that users met our requirements, we asked them to recall the QQ products they had used and to give a brief description of each. In total, 16 users participated in the experiment: 10 men and 6 women; 13 were aged 21-30 and 3 were aged 20 or younger. To ensure that users were familiar with the QQ services, we mainly selected users who had used at least five of them; only a few participants had used fewer than five. Seven login interfaces of typical QQ services and three login interfaces of other instant messaging software (MSN, Yahoo Messenger and Popo) were selected for the incomplete matching task. All of these interfaces were processed in advance to remove the names and icons of the products.
Fig. 4. Depiction of Interface Modifying of Incomplete Matching Task (panels: login interface before modifying; login interface after modifying)
5 Results

The average accuracy of the 16 users in the incomplete matching task is 61%. This result shows that the identification of the seven interfaces assessed is high. However, the Kendall’s U coefficient calculated for the paired comparison task is approximately 0.28. There is a certain trend in the distribution of users’ choices, but the trend is not significant. Therefore, we can conclude that the overall consistency of 10 interfaces to be assessed is common. In the calculation, however, we also found that the choices for some pairs are concentrated, as shown in Table 3.

Table 3. Table of Paired Comparison Data

| Interface No. | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| A |  |  |  |  |  |  |  |
| B | 4 |  |  |  |  |  |  |
| C | 1 | 2 |  |  |  |  |  |
| D | 1 | 1 | 9 |  |  |  |  |
| E | 3 | 5 | 12 | 4 |  |  |  |
| F | 9 | 9 | 14 | 14 | 9 |  |  |
| G | 5 | 5 | 14 | 12 | 8 | 3 |  |
A-G are the codes of the seven interfaces assessed; each cell gives the number of times, out of 16, that the row interface was chosen over the column interface. In the pairs CE, CF and CG, interface C was hardly ever selected: 4 times (16 in total minus the 12 selections of E), 2 times (16 minus the 14 selections of F), and 2 times (16 minus the 14 selections of G), respectively. From the interviews we learned that the colors of the interfaces had a strong impact on users’ choices. The color of interface C differs from the colors of all the other interfaces, which is why the choices involving C are concentrated. In the pairs DF and DG, D was likewise selected very few times: 2 times (16 minus the 14 selections of F) and 4 times (16 minus the 12 selections of G), respectively. From the interviews we also learned that the sizes of the interfaces had a strong influence on users’ choices. The size of interface D is very different from the sizes of interfaces F and G. The size of the interface is therefore the main cause of the concentration of choices in these pairs.
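As a check on the reported coefficient, formula (1) can be evaluated directly on the 21 counts of Table 3. The sketch below is a plain Python transcription of the formula, with the counts entered column by column from the lower triangle of the table; the function and variable names are illustrative and not taken from the study. With N = 7 and K = 16 it yields U ≈ 0.28, in agreement with the value given in the results.

```python
def kendall_u(counts, n_objects, n_assessors):
    """Kendall's U for paired comparisons, following formula (1).

    counts      -- the gamma_ij values from one triangle of the record table
    n_objects   -- N, the number of objects compared
    n_assessors -- K, the number of assessors
    """
    expected_cells = n_objects * (n_objects - 1) // 2
    if len(counts) != expected_cells:
        raise ValueError(f"expected {expected_cells} counts, got {len(counts)}")
    sum_sq = sum(g * g for g in counts)          # sum of gamma_ij squared
    total = sum(counts)                          # sum of gamma_ij
    numerator = 8 * (sum_sq - n_assessors * total)
    denominator = (n_objects * (n_objects - 1)
                   * n_assessors * (n_assessors - 1))
    return numerator / denominator + 1


# Lower triangle of Table 3, keyed by column; values run down the rows.
table3_columns = {
    "A": [4, 1, 1, 3, 9, 5],   # rows B..G
    "B": [2, 1, 5, 9, 5],      # rows C..G
    "C": [9, 12, 14, 14],      # rows D..G
    "D": [4, 14, 12],          # rows E..G
    "E": [9, 8],               # rows F..G
    "F": [3],                  # row G
}
counts = [g for column in table3_columns.values() for g in column]

u = kendall_u(counts, n_objects=7, n_assessors=16)
print(round(u, 2))  # 0.28
```

Because each pair satisfies γij + γji = K, using the upper triangle instead of the lower one gives the same value of U.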
6 Conclusion

From the incomplete matching task we conclude that the identification of the products is high, and from the method of paired comparison we conclude that the consistency among the products is low. Based on the matrix given in the study premises, we arrive at the following consistency description for the eight interfaces assessed, as shown in the table below.

| No. | Identification | Consistency among products | Complete consistency | Description |
|---|---|---|---|---|
| 3 | High | Low | Common | The complete consistency of the product experience is common: the products have high identification while the consistency among different products is low. Possible reason: some of the products have high identification, and there may be sub-products. |
The eight interfaces assessed, including the QQ2006 interface used as the standard in the paired comparison task, all have high overall identification, while the consistency among the products is low. Design elements that differ greatly in some products, such as the size, color and icon of the interface, may be the cause of the higher accuracy in the matching task. The lower consistency among the products, in turn, supports the possibility that some products have a unique design. Such uniqueness can arise for many reasons; sub-products may be one of them.
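The matrix of Table 1 amounts to a lookup from the two assessed factors to a complete consistency rating, which can be written down as a small Python mapping. This is only a restatement of the table: the names `COMPLETE_CONSISTENCY` and `complete_consistency` are illustrative, and the sketch deliberately does not encode numeric thresholds for "high" and "low", since the text does not define them.

```python
# Keys are (identification, consistency among products), as in Table 1.
COMPLETE_CONSISTENCY = {
    ("high", "high"): "good",
    ("low", "high"): "common",
    ("high", "low"): "common",
    ("low", "low"): "bad",
}


def complete_consistency(identification, consistency_among_products):
    """Return the Table 1 rating for the two categorical factor levels."""
    key = (identification.lower(), consistency_among_products.lower())
    return COMPLETE_CONSISTENCY[key]


# The case reached in this study: high identification, low consistency.
print(complete_consistency("High", "Low"))  # common
```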
7 Discussion

This study finds that the incomplete matching method and the method of paired comparison can rapidly assess the consistency performance of products, and that an increase in the number of products does not affect the progress of the experiment; the two methods are therefore well suited to situations with a large number of products in many categories. However, the study involved only a login interface with simple design elements, so the applicability of the methods needs further study. In future studies we plan to add further user interviews and to explore the specific factors that affect consistency. Increasing the number of users and of experimental groups for comparison could also further verify the experimental conclusions.
References
[1] Nielsen, J.: Coordinating User Interfaces for Consistency. Academic Press, Boston, MA, pp. 35–55
[2] Preece, J., Rogers, Y., Sharp, H.: Interaction Design: Beyond Human-Computer Interaction. John Wiley & Sons, Inc., New York (2002)
[3] Shneiderman, B.: Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson Education (1998)
Evaluating Usability Improvements by Combining Visual and Audio Modalities in the Interface

Carlos Duarte, Luís Carriço, and Nuno Guimarães

LaSIGE – Faculty of Sciences of the University of Lisbon, Edifício C6, Piso 3, Campo Grande, 1749-016 Lisboa, Portugal
{cad,lmc,nmg}@di.fc.ul.pt
Abstract. This paper reports the findings of an evaluation of an adaptive multimodal application for reading of rich digital talking books. Results are in accordance with previous studies, indicating no user perceived difference between applications with and without adaptivity. The NASA Task Load Index was also used and showed that users of the adaptive application reported less workload. Results also include a comparison between tasks executed with electronic support and tasks executed with print support, and also what specific features in the interface benefited the most from the use of visual and audio modalities.

Keywords: Evaluation, Adaptive Interfaces, Multimodal Interfaces, Electronic and Print Reading, Digital Talking Books.
whose performance has increased over the years, are beginning to be deployed in general public applications, meaning more and more users have had contact with some kind of speech technology. However, most of these applications, like call centers, rely solely on audio. The combined use of two modalities remains outside the reach of the general public. In this paper we explore usability issues in an application using video and audio as input and output modalities. The following section briefly introduces the application used in the evaluation sessions. The next section describes the experimental setting and procedures. This is followed by the presentation of the evaluation results. Section 5 discusses the results, and the final section concludes the paper and presents future work.
2 Rich Book Player – An Adaptive Multimodal Digital Talking Book Player

The application used in the usability evaluation was the Rich Book Player, an adaptive multimodal Digital Talking Book player [4]. This player can present book content visually and audibly, in an independent or synchronized fashion. The audio presentation can be based on previously recorded narrations or on synthesized speech. The player also supports user annotations, and the presentation of accompanying media, like other sounds and images. In addition to keyboard and mouse inputs, speech recognition is also supported. Due to the adaptive nature of the player, the use of each modality can be enabled or disabled during the reading experience.

Figure 1 shows the visual interface of the Rich Book Player. All the main presentation components are visible in the figure: the book’s main content, the table of contents, the figures panel and the annotations panel. Their arrangement (size and position) can be changed by the reader, or as a result of the player’s adaptation. The other visual component, not present in figure 1, is the search panel. Highlights are used in the main content to indicate the presence of annotated text and of text referencing images.

The table of contents, figures and annotations panels can be shown or hidden. This decision can be taken by the user and by the system, with the system behavior adapting to the user behavior through its adaptation mechanisms. Whenever there is a figure or an annotation to present and the corresponding panel is hidden, the system may choose to present it immediately or may choose to warn the user of its presence. The warnings are given in both visual and audio modalities.

All the visual interaction components have a corresponding audio interaction element, with one exception. Since the speech recognizer currently used in the player¹ does not support free speech recognition, annotations have to be entered by means of a keyboard. All the other commands can be given using either the visual elements or vocal commands.

¹ This applies to the Portuguese version of the player, which was the one used in the usability study.
Fig. 1. The Rich Book Player’s interface. The center window presents the book’s main content. On the top left is the table of contents. On the bottom left is the annotations panel. On the right is the figures panel.
3 Experimental Setting

The usability evaluation was carried out in the context of an article reviewing assignment for a Hypermedia Systems course. The students had several such assignments over the semester, each consisting of preparing a summary and an oral presentation of a given article. The summary and the oral presentation were group tasks, typically done over a two-week period. With the students’ agreement, it was decided that one of those assignments would be done with support from the Rich Book Player, over a one-day period. The assignment consisted of reading the article “The Dexter Hypermedia” individually during the morning period, and preparing a group summary and answering a short test during the afternoon.

Over a period of four days, thirty-three students participated in the evaluation: six on the first day, and nine on each of the other days. Given the number of simultaneous participants and the length of each session, the experiment was not conducted in our regular usability evaluation laboratory; a special setting was prepared in another room instead. The room was set up with nine test stations. Each station consisted of a laptop computer with a larger screen, mouse, headphones, microphone and webcam attached to it. The Rich Book Player application was available on all stations. The application was endowed with logging capabilities, thus recording all of the participants' interaction with it. The stations also recorded the screen, voice inputs and webcam, thus allowing for a full backup of the experiment. In addition to the stations, two digital video cameras recorded other aspects of the interaction.
The experiment was divided into two periods. The morning period started with 30 minutes for application familiarization, followed by 120 minutes for article reading, and ended with a usability questionnaire. The afternoon period was composed of a 75-minute session for summary preparation, 30 minutes for answering a short test without access to the article, 30 minutes for the same test with access to the article, and, finally, another questionnaire. For the summary preparation task, the annotations of all the group’s members were merged, and the group worked on only one station.

To evaluate the effects of using a multimodal application on the task of reading an article, the students were divided into two major groups. The control group read the article printed on paper, and the test group read the article using the Rich Book Player. To investigate the effects of adaptation, the test group was further divided into two groups: one with some of the adaptation features turned off, and another with all the adaptation features on. In total, the control group had nine members, and the other two groups had twelve members each.

To reduce the effect of extraneous variables, the following controls were applied:
• The tasks were the same for each participant.
• The tasks had the same time constraints for all participants. The questionnaires were answered immediately after task completion.
• All test stations were equipped with laptop PCs of the same model (Sony VAIO TX3) and external monitors with the same dimensions. All stations were configured to use the same screen resolution, operating system version, applications and desktop configuration.
4 Evaluation Results

The experiment results consist of qualitative data, gathered from the different questionnaires answered by the participants, and quantitative data, gathered from the logs and from the screen and video captures. In this paper we present and analyze the results from the qualitative data. Three sets of questionnaires were answered during the experiment by the participants from both test groups, and only one set by participants from the control group.

4.1 NASA Task Load Index

The first questionnaire administered to the participants was the NASA Task Load Index (NASA-TLX) [5]. All participants answered this questionnaire, since it focuses on the task rather than the application. It was presented immediately after the completion of the article reading task. The NASA-TLX is a subjective workload assessment measure: a multi-dimensional rating procedure that derives an overall workload score from a weighted average of ratings on six subscales: Mental Demands, Physical Demands, Temporal Demands, Own Performance, Effort and Frustration.

The NASA-TLX was used in this experiment with the main goal of finding a difference between the scores of participants in the adaptive and non-adaptive groups, and between these groups and the control group. Previous findings [6,7] show that users do not perceive advantages in using adaptive interfaces over non-adaptive interfaces. Using a subjective workload assessment measure might reveal a difference not directly perceived by the participants, leading to the following hypothesis:

H1 Performing the article reading task with the adaptive application, the non-adaptive application, or with a paper article, will result in different perceived workload measures.

Measures were collected for all participants (12 in the adaptive group, 12 in the non-adaptive group and 9 in the control group). A one-way ANOVA was performed and revealed that the workload perceived by users of the adaptive application (M = 53.30, SD = 14.27), users of the non-adaptive application (M = 57.11, SD = 13.45), and users with only a paper article (M = 57.56, SD = 14.79) did not differ significantly, F(2, 30) = 0.31, p > 0.05. The statistical analysis does not support hypothesis H1, meaning that the perceived workloads do not differ significantly based on the support used for reading the article.

4.2 Usability Questionnaire

Following the NASA-TLX, participants in the adaptive and non-adaptive groups were asked to answer a second questionnaire. This 26-question questionnaire focused on feature usefulness and application usability, and was organized in the following groups: Navigation, Annotations, Images, Search, Adaptation (only for the adaptive application group), Presentation, Interaction and General Opinion. All the questions were answered on a 10-point scale.

The General Opinion was measured with three questions evaluating the participants’ opinion of and reaction to the application (figure 2). The correlation between the answers to the three questions was calculated, and all three were found to be significantly correlated (p < 0.001). Taking this significant correlation into account, it was possible to reach a single measure of opinion by adding the answers to the three questions for each participant. In accordance with what has been presented before, no significant difference was expected between the two groups, which led to the following hypothesis:

H2 The general opinion of users of the adaptive application is similar to the general opinion of users of the non-adaptive application.

To evaluate this hypothesis a t-test was performed on the data, showing that the opinion of people in the adaptive group (M = 18.92, SD = 6.05) was not significantly different from the opinion of people in the non-adaptive group (M = 17.33, SD = 5.02), t(22) = 0.70, p > 0.05.

For each of the other question groups in the questionnaire, t-tests were applied to the usability related questions, in order to understand how the use of multimodal output (visual and audio combined) contributed to the overall usability of the application. In the following paragraphs all reported t-tests take into consideration the necessary Bonferroni adjustment.
Fig. 2. Average of the answers per participant group to the three criteria on the General Opinion group of the usability questionnaire
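The two group comparisons reported for H1 and H2 can be re-derived from the published summary statistics alone. The Python sketch below does this under stated assumptions: the function names are illustrative, the group sizes, means and standard deviations are those given in the text, and the t-test is assumed to be the pooled-variance (Student) form, which is consistent with the reported 22 degrees of freedom but is not stated explicitly. Working from rounded means and standard deviations, the sketch gives F(2, 30) ≈ 0.31 and t(22) ≈ 0.70, matching the reported values.

```python
from math import sqrt


def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def pooled_t_from_summary(group_a, group_b):
    """Independent-samples t statistic (pooled variance) from (n, mean, sd)."""
    (n1, m1, s1), (n2, m2, s2) = group_a, group_b
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_error, df


# NASA-TLX workload: adaptive, non-adaptive, paper (n, M, SD) from Section 4.1.
workload = [(12, 53.30, 14.27), (12, 57.11, 13.45), (9, 57.56, 14.79)]
f_value, df_between, df_within = anova_f_from_summary(workload)
print(df_between, df_within, round(f_value, 2))  # 2 30 0.31

# General opinion: adaptive vs non-adaptive (n, M, SD) from Section 4.2.
t_value, df = pooled_t_from_summary((12, 18.92, 6.05), (12, 17.33, 5.02))
print(df, round(t_value, 2))  # 22 0.7
```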
Regarding navigation in the Rich Book Player, several features were offered, including navigation using the table of contents, going forward or backward by a word, sentence, paragraph or chapter, and direct selection in the main window content. The results indicate that the available mechanisms were considered usable, t(23) = 10.79, p < 0.001.

Annotation creation is one of the most difficult mechanisms to implement. Previous evaluations showed this [8] and prompted an alteration of the steps necessary to create an annotation. The procedure was redesigned to make more explicit the need to first select the part of the text being annotated, and only then to input the annotation. Better support for text selection was developed, including an initial suggestion of the current sentence and simple commands to expand this selection. However, neither the sequence of commands to create an annotation, t(23) = 1.79, p > 0.05, nor the commands for helping with the text selection, t(23) = 2.00, p > 0.05, reached statistical significance, meaning test participants did not consider them particularly usable.

Search results appear highlighted in the text. To improve context acquisition, the whole sentence containing the search term is also highlighted in a different color (lighter than the one used for highlighting the searched terms). This feature was considered to improve usability, t(21) = 4.21, p < 0.05.

The application also tried to minimize the movement of the main text window whenever another window appeared on or disappeared from the screen, by controlling the appearance point, the width of the windows, and the position of the remaining windows whenever a window was hidden. This feature was considered useful by the test participants, t(23) = 4.20, p < 0.05.

On an overall interaction rating, the Rich Book Player was considered usable by the participants, t(23) = 7.05, p < 0.001.
The awareness raising mechanisms made special use of the two modalities available, displaying text that had been annotated, or had an image associated with it, in different background colors, and also using verbal cues to signal the presence of such text. The current chapter was also highlighted in the table of contents, and after arriving at a new chapter, verbal cues indicated its number and name (whenever applicable). A series of questions concerned these features and tried to evaluate whether they helped users become aware of their place in the book and of what content existed around their current reading point. All the answers showed these features to be usable and effective awareness raising mechanisms, p < 0.05.

4.3 Comparing Electronic and Paper Reading

The final questionnaire, presented after the group summary writing task, asked the participants from the adaptive and non-adaptive groups to compare their experience of reading an article with the Rich Book Player application to that of reading printed articles. A questionnaire with eight questions comparing different aspects of the reading experience was prepared. Answers were given on a 5-point Likert scale. Once again, all the t-test results presented in the following paragraphs take into account the necessary Bonferroni adjustment.

The first question compared navigation in the electronic format to the printed format. The average of the answers was 3.79, and a t-test revealed that participants felt navigation in the electronic format was significantly easier than in the printed format, t(23) = 4.98, p < 0.05. The next question compared searching in both formats. The average answer was 3.96, and a t-test confirmed that participants felt that finding text in the electronic format is significantly easier than in the paper format, t(23) = 4.7, p < 0.05.

The two following questions dealt with annotation creation and annotation reading. Neither of these showed statistically significant results. Answers for ease of annotation creation averaged 3.00, and answers for annotation reading averaged 3.46. The next question dealt with how easy it was to acquire the context of an image in both formats. Once again the answer is not statistically significant, even though the average answer, 3.21, is above the scale’s mid-point. Questions six and seven asked in which format users felt the article was quicker to read and its contents easier to understand. The average for the first was 3.04, and for the second 3.13, with both failing to reach statistical significance. The last question asked which format is less tiring for reading the article. The average answer was 3.08, not reaching statistical significance.
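The per-question tests with Bonferroni adjustment can be sketched as follows. This is a hedged illustration rather than the authors' analysis: it assumes each question was tested with a one-sample t-test against the scale mid-point (3 on the 5-point scale), which is consistent with the reported t(23) for 24 participants but is not stated in the text; the ratings below are invented placeholder data, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    std_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / std_error, n - 1


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


# Hypothetical 5-point ratings for one question (not data from the study).
ratings = [4, 4, 3, 5, 4, 3, 4, 5]
t_value, df = one_sample_t(ratings, mu0=3)
print(df, round(t_value, 2))  # 7 3.74

# Eight questions in the questionnaire, family-wise alpha of 0.05.
adjusted_alpha = bonferroni_alpha(0.05, n_tests=8)
print(adjusted_alpha)
```

A question would then be declared significant only if the p-value of its t statistic, obtained from a t distribution with the given degrees of freedom, falls below the adjusted level.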
5 Discussion
The analysis of the results of the experiments conducted so far allows us to draw some conclusions regarding the usage of a digital book player endowed with multimedia and adaptive features: the comparison of an application with adaptive features turned on and off, the comparison of performing a task with electronic or printed support, and the improvements in usability gained from combining two modalities (in this case, visual and audio).
Evaluating Usability Improvements by Combining Visual and Audio Modalities
5.1 Adaptive Versus Non-adaptive Applications
When evaluating adaptive systems, additional problems have to be dealt with, in comparison to the evaluation of non-adaptive systems:
• The definition of a control group is difficult for those systems that cannot switch off the adaptivity to make a non-adaptive version, because it is an inherent feature of the system [9].
• Criteria for defining adaptivity success are not well defined. On the one hand, objective standard criteria have regularly failed to find a difference between adaptive and non-adaptive versions of a system. On the other hand, subjective criteria, standard in HCI research, have rarely been applied to the evaluation of adaptive systems [10].
• The effects of adaptivity in most systems are expected to be rather subtle in comparison to what may be expected from individual differences, and thus require precise measurements, potentially taking into account behavior and cognitive aspects of the users [11].
This study tried to deal with some of these aspects. By having some of the participants work with a print version of the article, it was possible to define a control group applicable to both adaptive and non-adaptive versions of the application. Furthermore, it was possible to turn off some of the application’s adaptive features without rendering it unusable, enabling a comparison between two versions of the application. The study also tried to establish a comparison between adaptive and non-adaptive versions of the same application using different subjective measures. The results, however, are in accordance with previous results in the literature, indicating no significant perceived differences between the adaptive and non-adaptive versions of the application, even though the opinion of the participants who worked with the adaptive version of the application was, on average, higher than that of the participants who worked with the non-adaptive version.
The same can be said about the perceived workload measured by the NASA TLX, where, once again, no statistical significance was found in the results. In this case, the comparison extended to the participants working with the print version, who achieved scores very similar to those of the participants working with the non-adaptive version of the application. The participants of the adaptive application group achieved lower scores on the NASA TLX, indicating a lower perceived workload. Although the difference was not statistically significant, it justifies further studies to investigate whether this indicator can identify a difference between adaptive and non-adaptive applications.
5.2 Reading in Electronic and Print Supports
Another aspect evaluated in this study was the participants’ opinion regarding the task of reading an article using an electronic medium offering multimodal output, compared to reading printed works. A somewhat surprising result was that the average answer to every question was at or above the mid-point of the 5-point Likert scale, meaning that no task was more difficult to perform in the electronic medium than in the printed medium. This
C. Duarte, L. Carriço, and N. Guimarães
was the expected result for some tasks, like searching, but not for other tasks like annotation creation. However, only two tasks were significantly easier to perform with the Rich Book Player than with printed articles: navigating and searching. While this was an expected result for searching tasks, given the digital support’s advantage, it is worth mentioning that navigation tasks also achieved the same level in the participants’ opinion. This is probably explained by the vast possibilities offered for navigation inside the application, allowing users to navigate to any point with ease.
5.3 Improvements from Multimodality
Multimodal output is used throughout the application: content is presented visually and aurally, awareness raising mechanisms combine both modalities, and the reading position is also presented in both modalities. Usability questionnaires assessed how the use of multimodality impacted the participants’ opinion of the application. The results show that combining visual and audio led to improvements not felt in other areas of the interaction, where the modalities were not used in combination. This was particularly felt in the participants’ opinion of the usability of the awareness raising mechanisms.
6 Conclusions and Future Work
This paper presented the results of an evaluation of an adaptive multimodal Rich Digital Talking Book Player. This player combines visual and audio modalities, both for input and output, and is also endowed with adaptive capabilities, so that the interface’s behavior adapts in response to changes in the user’s behavior. The evaluation experiment involved 33 participants, arranged in three groups: an adaptive application group, a non-adaptive application group, and a control group which worked with printed texts. Evaluation results showed no significant perceived differences between adaptive and non-adaptive applications. However, when considering the NASA Task Load Index, the workload felt was smaller for the adaptive application group. This result did not reach statistical significance, but nevertheless prompts the need for further experiments. When comparing tasks performed with the Rich Book Player and tasks performed with printed texts, the participants’ general feeling was that it was easier to perform tasks with electronic support. While for some tasks (e.g. searching) this was expected, for others it was somewhat surprising. The use of multimodality has also proven beneficial from the usability viewpoint, particularly for implementing awareness raising mechanisms. To gather further results that may shed some light on the effects felt with long term usage of an adaptive application, another experiment is currently underway, where the participants have the Rich Book Player at their disposal in their home environment for a period of two months.
References
1. Kalyuga, S., Chandler, P., Sweller, J.: Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology 13(4), 351–371 (1999)
2. Sawhney, N., Schmandt, C.: Nomadic radio: speech and audio interaction for contextual messaging in nomadic environments. In: ACM Transactions on Computer-Human Interaction, vol. 7(3), pp. 353–383. ACM Press, New York (2000)
3. Oviatt, S., Coulston, R., Lunsford, R.: When Do We Interact Multimodally?: Cognitive Load and Multimodal Communication Patterns. In: Proceedings of the 6th International Conference on Multimodal Interfaces, State College, PA, USA, pp. 129–136. ACM Press, New York (2004)
4. Duarte, C., Carriço, L.: A conceptual framework for developing adaptive multimodal applications. In: Proceedings of the 11th International Conference on Intelligent User Interfaces, Sydney, Australia, pp. 132–139. ACM Press, New York (2006)
5. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human mental workload, pp. 139–183. North-Holland, Amsterdam (1988)
6. Höök, K.: Evaluating the utility and usability of an adaptive hypermedia system. In: Moore, J., Edmonds, E., Puerta, A. (eds.) Proceedings of the 2nd International Conference on Intelligent User Interfaces, Orlando, Florida, United States, pp. 179–186. ACM Press, New York (1997)
7. Weibelzahl, S.: Evaluation of Adaptive Systems. PhD Dissertation. University of Trier, Germany (2003)
8. Duarte, C., Chambel, T., Simões, H., Carriço, L., Santos, E., Francisco, G., Neves, S., Rua, A.C., Robalo, J., Fernandes, T.: Avaliação de Interfaces Multimodais para Livros Falados Digitais com foco Não Visual. In: Proceedings of the 2nd Conferência Nacional em Interacção Pessoa-Máquina, Braga, Portugal (2006)
9. Höök, K.: Steps to take before intelligent user interfaces become real. In: Interacting with computers, vol. 12(4), pp. 409–426. Elsevier, Amsterdam (2000)
10. Weibelzahl, S., Lippitsch, S., Weber, G.: Advantages, opportunities, and limits of empirical evaluations: Evaluating adaptive systems. Künstliche Intelligenz 16(3), 17–20 (2002)
11. Karagiannidis, C., Sampson, D.G.: Layered evaluation of adaptive applications and services. In: Brusilovsky, P., Stock, O., Strapparava, C. (eds.) AH 2000. LNCS, vol. 1892, pp. 343–346. Springer, Heidelberg (2000)
Tool for Detecting Webpage Usability Problems from Mouse Click Coordinate Logs
Ryosuke Fujioka1, Ryo Tanimoto2, Yuki Kawai2, and Hidehiko Okada2
1 Kobe Sogo Sokki Co., Ltd, 4-3-8, Kitanagasadori, Chuo-ku, Kobe 650-0012, Japan
2 Kyoto Sangyo University, Kamigamo Motoyama, Kita-ku, Kyoto 603-8555, Japan
[email protected], [email protected]
Abstract. In this paper, we propose a method that detects inconsistencies between user interaction logs of a task and desired sequences for the task based on mouse click coordinate logs. The proposed method models two successive clicks as a vector and thus a sequence of operations in a user/desired log as a sequence of vectors. A vector is from the ith clicked point to the (i+1)th clicked point in the screen. To detect inconsistencies between user interactions and desired sequences, each vector from user logs is compared with each vector from desired logs. As cues of usability problems, the method detects two types of inconsistencies: unnecessary and missed operations. We have developed a computer tool for logging and analyzing user interactions and desired sequences by the proposed method. The tool is applied to an experimental usability evaluation of ten business/public organization websites. The effectiveness of the method is evaluated based on the application result. The proposed method contributes to finding 61% of the usability problems found by a manual method in a much smaller amount of time: the number of clicks analyzed by an evaluator with the proposed method is only 1/5-1/10 of that with the manual method. This result indicates the proposed method is efficient in finding problems.
Keywords: Automated usability evaluation, web, user interaction logs, mouse clicks, usability problem cues.
The existing methods require widget-level logs for the comparisons: the logs are required to include data of widget properties such as widget label, widget type, title of parent window, etc. This requirement degrades independence and completeness in logging user interactions with systems under evaluation. In this paper, we propose a method that detects inconsistencies between the user logs and the desired sequences based on mouse click coordinate logs. Coordinate values of clicked points can be easily and fully logged independently of what widgets are clicked. We have developed a computer tool for logging and analyzing user interactions and desired sequences by the proposed method. The tool is applied to experimental usability evaluations of websites. The effectiveness of the method in usability testing of webpages is evaluated based on the application result.
2 Method for Analyzing Mouse Click Coordinate Logs
2.1 User Logs and Desired Logs
A user log can be collected by logging mouse clicks while a user (who does not know the desired sequence of a test task) performs the test task on a computer in user testing. In our research, a log file is collected for a test user and a test task: if the number of users is N and the number of tasks is M then the number of user log files is N ∗ M (where all the N users complete all the M tasks). A “desired” log is collected by logging mouse clicks while a user (who knows well the desired sequence of a test task) performs the test task. For a test task, one desired log file is usually collected. If two or more different interaction sequences are acceptable as desired ones for a test task, two or more desired log files can be collected (and used in the comparisons described later).
2.2 Method for Detecting Inconsistencies in User/Desired Logs
The proposed method models two successive clicks as a vector and thus a sequence of operations in a user/desired log as a sequence of vectors. A vector is from the ith clicked point to the (i+1)th clicked point in the screen. To detect inconsistencies between a user log and a desired log, each vector from the user log is compared with each vector from the desired log. If the distance of the two vectors (vu from the user log and vd from the desired log) is smaller than a threshold, vu and vd are judged as being matched: the user operation modeled by vu is supposed to be the same operation as the one modeled by vd. The method defines the distance of two vectors as a weighted sum of the distance between start points and the size of difference (Fig. 1).

$$\text{Distance between start points} = w_x (x_1 - x_2)^2 + w_y (y_1 - y_2)^2 \qquad (1)$$

$$\text{Size of difference} = w_x (x_3 - x_4)^2 + w_y (y_3 - y_4)^2 \qquad (2)$$

$$\text{Vector distance} = w_p \,(\text{Distance between start points}) + w_v \,(\text{Size of difference}) \qquad (3)$$
Fig. 1. Two Vectors and Their Distance (the figure shows the vectors vd and vu with the labeled points (x1,y1), (x2,y2), (x3,y3) and (x4,y4))
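The vector model and Eqs. (1)-(3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the equations are taken as printed (weighted sums of squared coordinate differences, with no square root applied), (x1,y1) and (x2,y2) are read as the start points of the two vectors, and (x3,y3) and (x4,y4) are read as their components. The function names and the tuple layout are hypothetical; the default weights and threshold are the “Original” values of Table 1.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# A vector is stored as (start point, displacement to the next click).
Vector = Tuple[Point, Point]

def clicks_to_vectors(clicks: List[Point]) -> List[Vector]:
    """Model each pair of successive clicks as one vector."""
    return [
        ((x0, y0), (x1 - x0, y1 - y0))
        for (x0, y0), (x1, y1) in zip(clicks, clicks[1:])
    ]

def weighted_sq(a: Point, b: Point, wx: float, wy: float) -> float:
    """Weighted sum of squared coordinate differences, as printed in Eqs. (1)-(2)."""
    return wx * (a[0] - b[0]) ** 2 + wy * (a[1] - b[1]) ** 2

def vector_distance(vu: Vector, vd: Vector,
                    wx: float = 0.4, wy: float = 1.0,
                    wp: float = 0.5, wv: float = 1.0) -> float:
    """Eq. (3): weighted sum of start-point distance and size of difference."""
    start = weighted_sq(vu[0], vd[0], wx, wy)  # Eq. (1)
    diff = weighted_sq(vu[1], vd[1], wx, wy)   # Eq. (2), components assumed
    return wp * start + wv * diff

def is_match(vu: Vector, vd: Vector, threshold: float = 100.0, **weights) -> bool:
    """Two operations are judged the same if their distance is below the threshold."""
    return vector_distance(vu, vd, **weights) < threshold

# Hypothetical click coordinates for one user operation and one desired operation.
user = clicks_to_vectors([(100, 200), (300, 250)])
desired = clicks_to_vectors([(104, 202), (310, 251)])
print(vector_distance(user[0], desired[0]), is_match(user[0], desired[0]))
```

Because wx is smaller than wy, a horizontal offset contributes less to the distance than a vertical offset of the same size.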
The role of weight factors wx and wy used in the calculations of distance between start points and size of difference is as follows. Users may click on links shown in a web browser window. The width of a link is usually larger than the height of the link, especially for a text link. Therefore, the differences of clicked points for clicking on the same link are likely to be larger along the horizontal axis (the x coordinate values) than along the vertical axis (the y coordinate values). To deal with this, weights wx and wy are used so that the horizontal differences count less than the vertical differences. User operations to scroll webpages by mouse wheels should also be taken into account: scrolls by mouse wheels change widget (e.g., link) positions in the screen so that the clicked positions may not be the same even for the same widget. Our method records the amount of wheel scrolls while logging interactions. By using the log of wheel scrolls, coordinate values of clicked points are adjusted. Fig. 2 shows this adjustment. Suppose a user clicked the point (xi,yi) in the screen (Fig. 2(a)) and then clicked the point (xi+1,yi+1) (Fig. 2(b)). In this case, the vector derived from the two clicks is the one shown in Fig. 2(c). As another case, suppose a user scrolled down a webpage along the y axis by the mouse wheel between the two clicks and the amount of the scroll was S pixels. In this case, the vector derived from the two clicks is the one shown in Fig. 2(d).
Fig. 2. Adjustment of Clicked Point for Mouse Wheel Scroll (panels: (a) click at (xi,yi); (b) click at (xi+1,yi+1); (c) vu=(xi+1-xi,yi+1-yi); (d) vu=(xi+1-xi,yi+1+S-yi), with the second point adjusted to (xi+1,yi+1+S))
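The scroll adjustment can be sketched as follows. The record layout is an assumption made for the example: each click stores its screen coordinates together with the wheel scroll, in pixels and positive for scrolling down, accumulated since the previous click. The names `Click` and `adjusted_vectors` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Click(NamedTuple):
    x: float
    y: float
    scroll_since_prev: float = 0.0  # pixels scrolled down since the previous click

def adjusted_vectors(clicks: List[Click]) -> List[Tuple[float, float]]:
    """Return displacement vectors with the y component corrected for scrolling.

    For clicks i and i+1 with a scroll of S pixels in between, the vector is
    (x[i+1] - x[i], y[i+1] + S - y[i]), as in Fig. 2(d). With S = 0 this
    reduces to the unadjusted vector of Fig. 2(c).
    """
    return [
        (b.x - a.x, b.y + b.scroll_since_prev - a.y)
        for a, b in zip(clicks, clicks[1:])
    ]

# Hypothetical log: the user scrolls down 400 pixels between the two clicks.
log = [Click(100, 500), Click(120, 300, scroll_since_prev=400)]
print(adjusted_vectors(log))
```

In this example the second click is higher on the screen than the first, but after the adjustment the vector points downward in page coordinates.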
2.3 Two Types of Inconsistencies as Cues of Usability Problems
As cues of usability problems, the proposed method detects two types of inconsistencies between user interactions and desired sequences. We call them “unnecessary” operations and “missed” operations. Fig. 3 illustrates unnecessary and missed operations.
Fig. 3. Unnecessary Operations and Missed Operations (the figure aligns a desired log with a user log and marks missed operations (m), unnecessary operations (u), and pairs of operations judged as the same)
Unnecessary operations are user operations judged as not included in the desired sequences, i.e., unnecessary operations are operations in a user log for which no operation in the desired logs is judged as the same one in the comparison of the user/desired logs. The method regards such user operations as unnecessary because the operations may not be necessary for completing the test task. Unnecessary operations can be cues for evaluators to find usability problems in which users clicked on a confusing link when another link was the desired (expected) one for the task. Missed operations are desired operations judged as not included in the user interaction sequences, i.e., missed operations are operations in desired logs for which no operation in a user log is judged as the same one. The method regards such desired operations as missed because the operations may be necessary for completing the test task but the user finished the task without performing them. Missed operations can be cues for evaluators to find usability problems in which a link is not clear enough or not easy for users to find. Our method models an operation in a user/desired log by a vector derived from clicked point coordinate logs, so the method detects unnecessary/missed operations as unnecessary/missed vectors. Suppose two or more successive operations are unnecessary ones in a user log. In this case, the first operation is likely to be the best cue among the successive unnecessary operations. This is because the user probably deviated from the desired sequence with the first operation (i.e., the expected operation was not clear enough for the user) and then performed additional operations irrelevant to the test task until returning to the desired sequence.
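The two definitions, and the extraction of the first operation in each run of successive unnecessary operations, can be sketched as below. This is a minimal illustration and not the tool's implementation: the predicate `same` stands for the vector-distance test of subsection 2.2 and is passed in as a parameter, the function names are hypothetical, and missed operations are computed against a single desired log.

```python
from typing import Callable, List, Sequence, TypeVar

Op = TypeVar("Op")

def unnecessary_ops(user: Sequence[Op],
                    desired_logs: Sequence[Sequence[Op]],
                    same: Callable[[Op, Op], bool]) -> List[int]:
    """Indices of user operations that match no operation in any desired log."""
    return [
        i for i, u in enumerate(user)
        if not any(same(u, d) for log in desired_logs for d in log)
    ]

def missed_ops(user: Sequence[Op],
               desired: Sequence[Op],
               same: Callable[[Op, Op], bool]) -> List[int]:
    """Indices of desired operations that match no operation in the user log."""
    return [j for j, d in enumerate(desired)
            if not any(same(u, d) for u in user)]

def first_of_runs(indices: List[int]) -> List[int]:
    """Keep only the first index of each run of consecutive indices."""
    return [i for k, i in enumerate(indices)
            if k == 0 or i != indices[k - 1] + 1]

# Toy example: operations are labels and equality stands in for vector matching.
same = lambda a, b: a == b
user_log = ["a", "x", "y", "b", "z"]
desired_log = ["a", "b", "c"]

unnecessary = unnecessary_ops(user_log, [desired_log], same)
print(unnecessary, first_of_runs(unnecessary), missed_ops(user_log, desired_log, same))
```

In the toy example, "x" and "y" form one run of unnecessary operations, so only "x" would be shown to the evaluator as the cue for that deviation.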
The method can extract the first operations in the successive unnecessary operations and show them to human evaluators so that the evaluators can analyze usability problem cues (unnecessary operations in this case) efficiently.
2.4 Unnecessary/Missed Operations Common to Users
Unnecessary/missed operations common to many of the test users are useful cues for finding problems that depend less on individual differences among the users. The method analyzes how many users performed the same unnecessary/missed operation. The analysis of the user ratio for the same missed operation is simple. For each missed operation, the number of user logs that do not include the desired operation is counted. To analyze the user ratio for the same unnecessary operation, the method compares unnecessary operations extracted from all user logs of the test task. This comparison is achieved in the same way as operations (vectors) in user/desired logs are compared. By this comparison, unnecessary operations common among multiple users can be extracted (Fig. 4).
Fig. 4. Unnecessary Operations Common to Users (the figure shows the logs of User1, User2 and User3 compared with the desired log by vector comparison to extract each user’s unnecessary operations, which are then compared with one another to extract the unnecessary operations common to users)
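One way to compute the user ratio for unnecessary operations is sketched below. The grouping strategy is an assumption, since the text only says that unnecessary operations are compared in the same way as user/desired operations: here each operation is compared with the first member of every existing group. The predicate `same` again stands for the vector-distance test, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Op = TypeVar("Op")

def common_unnecessary(per_user: Dict[str, Sequence[Op]],
                       same: Callable[[Op, Op], bool]) -> List[dict]:
    """Group unnecessary operations judged the same and report the user ratio."""
    groups: List[dict] = []
    for user, ops in per_user.items():
        for op in ops:
            for g in groups:
                if same(op, g["op"]):  # compare with the group's representative
                    g["users"].add(user)
                    break
            else:
                # No existing group matched: this operation starts a new group.
                groups.append({"op": op, "users": {user}})
    n_users = len(per_user)
    return [{"op": g["op"], "ratio": len(g["users"]) / n_users} for g in groups]

# Toy example: labels stand in for vectors, equality for vector matching.
per_user = {"User1": ["x", "y"], "User2": ["x"], "User3": ["x", "z"]}
result = common_unnecessary(per_user, lambda a, b: a == b)
print(result)
```

An operation with a high ratio, such as "x" in the toy example, is performed by most users and is therefore a cue that depends little on individual differences.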
3 Evaluating Effectiveness Based on Case Study
3.1 Design of Experiment
Ten websites of business/public organizations are selected. For each site, a test task is designed. The average number of clicks in the designed sequences for the ten test tasks is 3.9. Five university students participate in this experiment as test users. Each test user is asked to perform the task on each site. They have enough knowledge and experience in using web pages with a PC web browser but they use the websites for the first time. The desired sequences of the test tasks are not told to the test users. Thus, if the desired sequences are not clear enough for the test users, the users will be likely to deviate from the desired sequences and unnecessary and/or missed operations will be observed. The interaction of each user for each test task is logged into a user log file. To avoid fatigue affecting the results, the time of experiment for each user is limited to 60+ minutes: each test user is asked to perform a test task within five or ten minutes depending on the task size. Fifty user logs (five users ∗ ten tasks) and ten desired logs (one log per test task) are collected. For each task, a computer tool implementing the proposed method analyzes the logs and extracts possible cues of usability problems (i.e., unnecessary/missed operations). An evaluator tries to find usability problems from the extracted cues.
3.2 Weight Factors and Thresholds for Vector Distance
Our method requires us to determine the values of weight factors wx, wy, wp and wv and the threshold value of vector distance (see subsection 2.2). To determine these values, we conducted a pre-experiment with another test user. Based on the analysis of the log files collected in the pre-experiment, we investigated appropriate values that lead to accurate results in detecting unnecessary/missed operations. The values in the row labeled “Original” in Table 1 show the obtained values.
In our method, the distance of two operations (vectors) is defined by Eqs. (1)-(3). In the case where wp = 0, the vector distance reduces to the weighted size of difference, so that two operations are compared by the size of vector difference only (i.e., the position on which the click is performed is not considered). Similarly, in the case where wv = 0, the vector distance reduces to the weighted distance between start points, so that two operations are compared by the clicked points only (i.e., a click in a user log and a click in a desired log are judged as the same operation if the clicked points are near). We evaluate these two variations of the method. Variations A/B in Table 1 denote those in which wp/wv = 0, respectively.
Table 1. Values of Weight Factors and Threshold for Vector Distance
Method        wx    wy    wp    wv    Threshold (pixel)
Original      0.4   1.0   0.5   1.0   100
Variation A   0.4   1.0   0.0   1.0   67
Variation B   0.4   1.0   1.0   0.0   34
3.3 Number of Problems Found
To evaluate the effectiveness of our method in finding usability problems, we compare the number of problems found by the method with the number found by a method based on manual observation of user interactions. In addition to recording click logs in user interaction sessions, the PC screen image was also captured to movie files (a screen recorder program is used on the PC). A human evaluator observes user interactions by replaying the captured screen movies and tries to find usability problems. This manual method requires much time for the interaction observation but it helps to find problems thoroughly. In this experiment, the evaluator who tries to find problems by the proposed method and the evaluator who tries to find problems by the manual method are different people, so that the result with one method does not bias the result with the other. Table 2 shows the number of problems found by each of the methods. The values in the table are the sums for the ten test tasks (sites). Eleven problems are shared among the four sets of problems, i.e., the proposed (original) method contributes to finding 61% (=11/18) of the problems found by the manual method. Although the number of problems found by the proposed method is smaller than that found by the manual method, the time for a human evaluator to find the problems by the proposed method is much less than the time required by the manual method. In the case of the manual method, an evaluator has to investigate all clicks by the users because user clicks that are possible problem cues are not automatically extracted. In the case of the proposed method, a human evaluator is required to investigate only the smaller number of clicks extracted as possible problem cues by the method. In this experiment, the number of clicks to be investigated in the case of the proposed method is 1/5-1/10 of the number in the case of the manual method.
Table 2. Number of Problems Found by Each Method
Method        #Problems
Manual        18
Original      15
Variation A   14
Variation B   13
This result of the case study indicates that the proposed method
• contributes to finding usability problems to a certain extent in terms of the number of problems, and
• is much more efficient in terms of the time required.
3.4 Unnecessary/Missed Operations Contributing to Finding Problems
Not all unnecessary/missed operations extracted by the proposed method necessarily contribute to finding usability problems. The larger the share of extracted unnecessary/missed operations that contribute to finding problems, the more efficiently the problems can be found. The contribution ratio is investigated for the proposed method and its variations (Table 3). Values in the “Counts (first)” column are the counts of unnecessary operations that are the first in two or more successive unnecessary operations (see subsection 2.3). For example, the original method extracted four missed operations in total from the log files of the ten test tasks, and 25.0% (one) of the four operations contributed to finding a problem. Similarly, the original method extracted 375 unnecessary operations in total, and 7.5% (28) of the 375 operations contributed to finding problems.
Table 3. Number of Unnecessary and Missed Operations Found by Each Method and the Ratio of Contribution in Finding Problems
Method        Missed operations    Unnecessary operations (all)    Unnecessary operations (first)
              Counts   Ratio       Counts   Ratio                  Counts   Ratio
Original      4        25.0%       375      7.5%                   51       49.0%
Variation A   8        37.5%       422      5.7%                   58       39.7%
Variation B   1        0.0%        299      9.4%                   72       37.5%
Findings from the result in Table 3 are as follows.
• In the three methods, the ratios are larger for unnecessary operations (first) than those for unnecessary operations (all). This supports our idea that an evaluator can find usability problems more efficiently by analyzing only the first operations in the successive unnecessary operations.
• In the result of unnecessary operations (first), the ratio for the original method is larger than that for either of the two variations. In the result of missed operations, the ratio for the original method is larger than that for variation B but smaller than that for variation A. This indicates that both the original method and its variation A are promising.
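The ratios discussed above are plain proportions. The short sketch below recomputes the three figures for which the text states both counts: the two contribution ratios given as examples for Table 3 and the 61% share from subsection 3.3. The function name is hypothetical.

```python
def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole

# Counts stated in the text for the original method.
missed_ratio = percentage(1, 4)           # missed operations contributing to a problem
unnecessary_ratio = percentage(28, 375)   # unnecessary operations (all)
shared_ratio = percentage(11, 18)         # problems shared with the manual method

print(round(missed_ratio, 1), round(unnecessary_ratio, 1), round(shared_ratio))
```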
4 Conclusion
In this paper, we proposed a method that extracts cues for finding usability problems from user/desired logs of clicked points. To detect inconsistencies between user and desired logs, the method compares operations in the logs. The method compares user/desired operations by modeling each operation as a vector derived from coordinate values of the clicked points and checking the distance between two vectors. The distance is defined as a weighted sum of the distance between start points and the size of difference for the two vectors. The method extracts two types of inconsistencies: unnecessary and missed operations. The effectiveness of the proposed method was evaluated based on a case study. We tried to find usability problems for ten websites by the proposed method and the manual method. The proposed method contributes to finding 61% of the usability problems found by the manual method in a much smaller amount of time: the number of clicks analyzed by an evaluator with the proposed method was only 1/5-1/10 of that with the manual method. This result indicates the proposed method is efficient in finding problems. In our future work, we will extend our method by utilizing log data of click time intervals. Timestamps are another kind of data that can be easily, independently and fully logged. By utilizing both logs of clicked points and time intervals, usability problem cues that are more likely to contribute will be obtained. Additional case studies are also necessary for further evaluations of our method.
A Game to Promote Understanding About UCD Methods and Process
Muriel Garreta-Domingo1, Magí Almirall-Hill1, and Enric Mor2
1 Learning Technologies Dept - Universitat Oberta de Catalunya, Av. Tibidabo 47, 08035 Barcelona, Spain {murielgd,malmirall}@uoc.edu
2 Computer Science, Multimedia and Telecommunication Dept - Universitat Oberta de Catalunya - Rambla del Poblenou 156, 08018 Barcelona, Spain [email protected]
Abstract. The User-centered design (UCD) game is a tool for human-computer interaction practitioners to demonstrate the key user-centered design methods and how they interrelate in the design process in an interactive and participatory manner. The target audiences are departments and institutions unfamiliar with UCD but whose work is related to the definition, creation, and update of a product or service. Keywords: Games with a purpose, game pieces, HCI education, HCI evangelization, user-centered design, role-playing, design games, experience.
The project is organized in 12 work packages and besides the coordination and methodological packages, all except two are technically-oriented and led by programmers. These two exceptions are the first work package, which consists of gathering user requirements, and the second work package, which is responsible for the prototyping and user testing of all developed modules. Therefore, these two work packages with the help of the methodology package are responsible for ensuring that the entire project and, consequently, all development teams follow a UCD approach. As Twidale and Marty state, usability professionals often have to combine the roles of usability advocates, educators and practitioners [10]. Bias and Mayhew [2] addressed this issue by putting together a collection of articles on cost-justification of usability. However, the argument of cost-justification by itself is not enough to introduce UCD in an organization. As Siegel [7] explains, “success will hinge not on a single convincing argument, but on the many interrelated ideas we introduce to our organizations, on the kinds of relationships we build with various stakeholders, and on how we demonstrate our value to them first hand.” At the Open University of Catalonia (UOC), one of the Campus project participating universities, we created a game as a tool to increase the understanding of UCD methods. Through a participatory and interactive manner, its purpose is to promote a better understanding of a good design process; showing the importance of knowing the end user and keeping the focus on the user as well as choosing the right methods for analyzing the users and evaluating the design. Several activities have been programmed in the context of the Campus project in order to proselytize and teach the project teams the importance of a UCD process and the best way to apply it. As a part of these activities, we decided to deploy the UCD game for the Campus project teams.
2 The UCD Game: Origin, Audience and Goals The UCD game idea was created after the celebration of World Usability Day (WUD) 2005. As part of the UCD diffusion goal at UOC, different activities were organized for the occasion. There were formal presentations about in-house projects that followed the UCD process. Outside the conference room, a set of independent stations was placed for visitors to receive an overview of the UCD process and methods, experience a usability test in a lab setup, and use a computer with a screen reader installed (JAWS) in order to understand the importance of accessibility. This accessibility station, where participants had to browse the Internet with the monitor off and only with the help of the screen reader, was the most successful of all the activities organized. The aim when designing the game was to obtain a set of engaging stations where participants can experience the different steps of a UCD process. It is structured as a team and participatory activity with a set of interrelated tasks because the goal is not only to show how each project phase is accomplished individually but also how the project is completed and how these different phases relate to one another. 2.1 The Game Goals The setup of the UCD game is similar to the Interactionary, a design exercise envisioned and organized by Berkun [1], however, the main difference is that unlike
M. Garreta-Domingo, M. Almirall-Hill, and E. Mor
Interactionary, our game is not an on-stage competition and does not address designers or an HCI audience. In this sense, the goals of the game show created by Twidale and Marty [10] are more closely related to our objectives. Yet, while their game illustrated usability evaluation methods, our game strives to illustrate the UCD process and techniques. Buchenau and Fulton [3], in their paper about experience prototyping, quote the Chinese philosopher Lao Tse: “What I hear, I forget. What I see, I remember. What I do, I understand!” Our game is a way of promoting understanding by doing. Besides Buchenau and Fulton’s paper, several others describe how to include “doing” in the design process through role-playing, informance design, interactive scenarios, participatory design, etc. [4,5,6,8,9]. However, these papers address a different problem than the UCD game and are therefore aimed at a different audience and pursue different goals. In discussing exploratory design games, Brandt [4] describes various kinds of games, one of which is similar in concept to the UCD game: “The primary aim with the negotiation and workflow oriented games is for the designers to understand existing work practice. Game boards and game pieces are produced in paper. The outcome of the game playing is often flow diagrams showing relations between people and various work task or tools.” In our case, we want the Campus project participants to understand UCD work practices using game pieces for each of the UCD phases and a game board to show the relations between the different phases and the end design. In summary, the purpose of the game is:
1) To show the key steps of a UCD process in an enjoyable and informal setting.
2) To help participants understand how these steps relate to each other.
3) To provide an overview of the main HCI techniques and methods.
4) To illustrate that the user target and the methods used affect the end design.
2.2 The Target Audience
We initially created the UCD game in the context of the UOC, a completely online university with more than 40,000 students that offers 19 official undergraduate degrees as well as several graduate programs. As a result, the virtual campus plays a key role at the UOC as it is the work tool for UOC employees, the teaching tool for faculty members, and the learning tool for students. In such a context, UCD should play a central role in all UOC departments that design products for the virtual campus users. Nevertheless, this is still not the case today. Although the introduction of usability and HCI concepts in the organization started in 2002, they are still not well understood and therefore not always properly applied. Hence, we created the game as a tool to promote a better understanding of a good design process in hopes of demonstrating the importance of understanding and focusing on the end user as well as choosing the right methods for analyzing the users and evaluating the design. The target audiences are organizational departments that participate in the creation, definition, and update of the virtual campus applications. Even though this target audience is formed by people familiar with the concept of usability and UCD, the goal is to ensure that the game is comprehensible even for people unaware of the existence of UCD.
A Game to Promote Understanding About UCD Methods and Process
Within the context of the Campus project, there are nine universities actively working on its development. Additionally, there are several other universities and public institutions that act as observers of the project and which, in the near future, may use the virtual campus as their learning management system. Therefore, the audience is much more diverse than at UOC since it includes active and passive management, project leaders and developers from both public and private institutions. For our purposes, the target audience of the game was management and project leaders since they are responsible for ensuring that their teams follow a UCD process. However, we also aimed to include developers, because the more participants understand the value of UCD, the greater the opportunities for UCD to be applied throughout the development processes.
3 The Game Structure
The game consists of four different stations, each representing a phase in the UCD design process: defining the users, analyzing the users’ needs, designing the artifact and evaluating the resulting artifacts. As in the exploratory design games [4], players do not compete. Each team goes through the stations and at the end, all game boards are shown together in a separate room so that participants and observers can evaluate the design solutions. The WUD promoted by the Usability Professionals’ Association was the first background for the UCD game. The Campus project was the context of our second application of the game. Groups of 3 to 4 people are created with participants from different institutions and departments. To begin, they read an overall description of the game and are given a one-page description of the design problem.
3.1 The Design Problem
Like Berkun [1], we decided that a non-web problem would work best with a large audience and that a physical design of a public, non-work-related object would be better, as these concepts are familiar to everyone and the details are broad enough for everyone to follow along. These issues played a key role when we considered a design problem; in the end, we opted for the design of an airport self-check-in machine. The initial design problem was to create a ticket vending machine. To narrow the scope, the machine was supposed to only sell tickets to the airport and it was to be placed in a central railroad station of Barcelona, Spain. For the self-check-in machine, we also narrowed the project to the flights between Barcelona and Melbourne or Sydney of a specific airline company.
3.2 The First Game Station: Defining the Users
The aim of this first station is to introduce the idea that good design is accomplished by thinking of the end user and that this end user is neither the designer, nor anyone else.
The team is presented with four groups of people, each containing four users with pictures and a short demographic description. Participants are asked to choose the group of users for whom they will design and write down their main characteristics.
Initially, all possible users were presented individually but we realized that participants needed a substantial amount of time to choose and group a set of users in order to build their primary user type. We have found that having the groups already formed is clearer for our audiences.
3.3 The Second Game Station: Analyzing the Users’ Needs
The aim of this second station is to show that designers use several quantitative and qualitative methods to gather data about the chosen target. Defining the users is the first step; in this phase, participants analyze the users’ needs, wants, contexts, and limitations by choosing a maximum of three methods from the UCD toolbox. After opening the envelopes of the selected methods, the team has to summarize the findings and write down a list of characteristics that should be considered when designing the artifact. For example, during the contextual inquiry method, the team watches a video of the Barcelona airport. For benchmarking, they have pictures of other self-check-in machines already in use at the airport. Other methods available in the toolbox are: in-depth interview, focus groups, surveys and log analysis. On the outside of each envelope there is a short description of the technique to help participants choose the ones that they consider most useful. Inside there is more information about the technique being applied to the design problem and the results of conducting it. For instance, for an in-depth interview, the inside page contains a list of possible interview questions and a list of possible answers given by users.
3.4 The Third Game Station: Designing the Artifact
The goal of the third station is to show that a successful design is focused on the end user. As a consequence, designers should not jump directly to the end design but should consider the output of the previous stations and follow an iterative design process. The team is also asked to use one of the evaluation methods available in another UCD toolbox.
Assuming the team has understood the UCD philosophy, the result of this station will be a simple prototype and a list of changes that should be made to it after applying an evaluation technique. The game organizers play the role of users for the evaluation techniques. For user testing, the team has to think of one or two tasks they would like the user to accomplish. The organizer will then perform these tasks using the prototype.
3.5 The Fourth Game Station: Evaluating the Designed Artifacts
At the end of the game, each team pastes the one-page output of each station on a horizontal game board. The board is separated into four quadrants: 1) photos of the target users and key characteristics, 2) required characteristics of the artifact according to the user analysis and the methods used, 3) the first low-fidelity prototype and a list of changes resulting from the evaluation of the prototype, and 4) the evaluation of the development process. Game boards are displayed in a room where participants and other observers can see the different designs and UCD processes. In order to evaluate the designs, participants and observers have a questionnaire that contains questions such as “Does the
design take into account the context of use?” or “Did the team evaluate their first design solution?”
4 Deploying the UCD Game
In order to test the game structure and its different stations, we initially ran a pilot of the game using a small group, half being HCI experts while the others were familiar with UCD but had never applied a full UCD process. The mixed groups were required to go through each of the four stations of the game: defining the target user, analyzing its needs, designing, and evaluating. It was very rewarding to see the groups make different decisions at each of the stations. Since the groups defined different user characteristics and target goals as well as selected different evaluation methods, the final designs varied greatly. In this sense, the pilot study showed that the game is useful in demonstrating how phases relate to each other and that designs depend on characteristics of the end user and the methods used. Through the post-game questionnaire, we concluded that all participants considered the game useful to show the value of UCD methods and process and that it was an enjoyable, refreshing and enriching experience. We also obtained feedback on areas to improve, such as a tighter control of time for each station and a less technical and less ambiguous description of the phases and methods. Our second application of the game was on November 14th during World Usability Day 2006. Around thirty people (8 groups) participated in the game. From observing the teams and the post-game questionnaire, we gathered that most participants enjoyed the experience and found it a successful tool to show UCD process and methods. Again the time spent on each station was perceived as too long despite the timers at each station and the organizers, who tried to encourage groups to move to the next station. Participants that had an interest in UCD wanted to do good work on each phase and therefore they took longer than the allowed time. The biggest problem caused by lack of time was that the teams were not able to view the other teams’ solutions.
Thus, visualizing how different target users and processes led to different results was one of the goals not accomplished in this application of the game. Running the UCD game for the Campus project participants was more challenging since most of them are not interested in UCD. This lack of interest reduced the time problem, yet the game remained useful as another tool to promote understanding about the UCD process and approach.
5 Conclusions
We created the game in order to show the UCD process and methods to an audience of non-experts whose tasks are related to the definition, creation, and update of a product or service. We have deployed the game on three different occasions with diverse contexts and audiences. The feedback given by the different types of participants told us that the game is perceived as enjoyable and useful for our purpose. Recalling our goals when creating the game (1) To show the key steps of a UCD process in an enjoyable and informal setting. 2) To help participants understand how these steps relate to each other. 3) To provide an overview of the main HCI
techniques and methods. 4) To illustrate that the user target and the methods used affect the end design.), we are confident that it manages to accomplish the four objectives in a short period of time. However, a new question arises: does it make a difference in the participants’ everyday work? Will they consider applying a UCD approach in their next project? Will they be more willing to include the results of UCD methods in their work? We plan to answer these questions by deploying the UCD game again in the Campus project context but with the real design problem. As mentioned, the first work package of the project is to gather user requirements. The output of the package will be personas, scenarios and needs of the future campus users. We will use the project’s main goal as the design problem and these outputs to prepare the game materials. With this new focus of the game, we expect to increase the developers’ involvement in the UCD process as well as to obtain interesting feedback from both developers and observers for the project development. The UCD game is a powerful and flexible tool that can be applied for different goals, in diverse contexts and for different audiences. Although each setting will require an adapted design problem, the overall structure of the game is a useful guide for all cases.
Acknowledgments. This work has been partially supported by a Spanish government grant under the project PERSONAL (TIN2006-15107-C02-01) and by the Campus project promoted by the Generalitat de Catalunya.
References
1. Berkun, S.: Interactionary: Sports for design training and team building. http://www.scottberkun.com/dsports
2. Bias, R.G., Mayhew, D.J. (eds.): Cost justifying Usability: An Update for the Internet Age. Morgan Kaufmann, San Francisco, CA, USA (2005)
3. Buchenau, M., Fulton Suri, J.: Experience Prototyping. In: Proceedings on Designing Interactive Systems, pp. 424–433. ACM Press, New York (2000)
4. Brandt, E.: Designing Exploratory Design Games: A Framework for Participation in Participatory Design? In: Proceedings Participatory Design Conference, pp. 57–66. ACM Press, New York (2006)
5. Burns, C., Dishman, E., Verplank, W., Lassiter, B.: Actors, Hairdos & Videotape – Informance Design. In: Proceedings of CHI 1994, pp. 119–120. ACM Press, New York (1994)
6. Klemmer, S.R., Hartmann, B., Takayama, L.: How Bodies Matter: Five Themes for Interaction Design. In: Proceedings on Designing Interactive Systems, pp. 140–149. ACM Press, New York (2006)
7. Siegel, D.: The Business Case for User-Centered Design: Increasing Your Power of Persuasion. Interactions 10(3), 30–36 (2003)
8. Simsarian, K.T.: Take it to the Next Stage: The Roles of Role Playing in the Design Process. In: Proceedings of CHI 2003, pp. 1012–1013. ACM Press, New York (2003)
9. Svanaes, D., Seland, G.: Putting the Users Center Stage: Role Playing and Low-fi Prototyping Enable End Users to Design Mobile Systems. In: Proceedings of CHI 2004, pp. 479–486. ACM Press, New York (2004)
10. Twidale, M.B., Marty, P.F.: Come On Down! A Game Show Approach to Illustrating Usability Evaluation Methods. Interactions 12(6), 24–27 (2005)
DEPTH TOOLKIT: A Web-Based Tool for Designing and Executing Usability Evaluations of E-Sites Based on Design Patterns
Petros Georgiakakis1, Symeon Retalis1, Yannis Psaromiligkos2, and George Papadimitriou1
1 University of Piraeus, Department of Technology Education and Digital Systems, 80 Karaoli & Dimitriou, 185 34, Piraeus. Tel.: 0030 210 414 2746
2 Technological Education Institute of Piraeus, General Department of Mathematics, Computer Science Laboratory, 250, Thivon & P. Ralli, 122 44 Athens, Greece. Tel.: 0030 210 5381193, Fax: 0030 210 5381351
{geopet,retal,papajim}@unipi.gr,[email protected]
Abstract. This paper presents a tool that supports a scenario-based expert evaluation method called DEPTH (usability evaluation method based on DEsign PaTterns & Heuristics criteria). DEPTH is a method for performing scenario-based heuristic usability evaluation of e-systems. DEPTH focuses on the functionality of e-systems and emphasizes usability characteristics within their context. This is done by examining not only the availability of a functionality within an e-system but also the usability performance of the supported functionality according to a specific context of use. The main underlying ideas of DEPTH are: i) to minimize the preparatory phase of a usability evaluation process and ii) to assist a novice usability expert (one who is not necessarily familiar with the genre of the e-system). Thus, we (re)use expert knowledge captured in design patterns and structured as design pattern languages for the various genres of e-systems. This paper briefly describes the DEPTH method and presents the way a specially designed tool supports it, along with the findings from an evaluation study.
Keywords: Heuristic evaluation, design patterns, reuse of design expertise.
account the user needs. The benefits anticipated are as follows: increased sales, customer satisfaction, customer retention, reduced support, stronger brand equity [13]. Usability evaluation of e-sites is not an easy task and requires a lot of effort [9]. One approach is the use of usability experts which raises the cost for the organization undertaking the task [2]. It is often difficult to find a usability expert who will be able to perform his/her tasks and pinpoint a lot of usability problems which stem from the general usability heuristics as well as to successfully determine usability problems which have to do with the specific context of use for the e-site. Not only is it difficult to find usability experts [10], but it is even harder to find genre specific e-sites usability experts. Digital genres are described as a classification system for kinds and types of digital products [12]. During the last years several digital genres of e-sites have been studied such as online newspapers, e-shops, e-travel sites, etc. Thus, a practical approach for solving the problem of finding usability experts for the specific genre of an e-site under evaluation could be to, accurately and efficiently, help a typical novice usability engineer in performing usability evaluation for that genre of e-sites. This can be achieved by transferring the expert knowledge to the novice usability engineers and guiding them to perform an e-site evaluation with the aid of systematic approaches and supported toolkits. Such an approach is the DEPTH method (usability evaluation based on DEsign PaTterns and Heuristics criteria). DEPTH is a scenario based expert evaluation method. It eliminates the difficulties of expert based evaluation described above and provides an integrated framework where the novice usability evaluator can find and (re)use expert knowledge for better performing the evaluation tasks of genres of e-sites. 
The innovative ideas behind the DEPTH approach are: i) the reuse of expert knowledge in the form of design patterns during the evaluation process (a design pattern describes a problem, a solution to it in a particular context, and the benefits or drawbacks from using that solution [1, 3]); ii) the use of scenarios specific to genres of e-sites. In this paper we describe the DEPTH toolkit, which is a prototype Web-based tool for designing and implementing usability evaluations of e-sites, based on the DEPTH usability method [4, 11]. The rest of the paper is organized as follows: Section 2 describes DEPTH in detail. Section 3 describes the application of DEPTH to two systems classified as Learning Brokerage Platforms (LBPs) in order to clarify the main points of the method. Finally, in Section 4 we discuss the current status of the method, as well as our future plans.
2 The DEPTH Approach
2.1 Principles of DEPTH
According to DEPTH the evaluation process of an e-site should focus on three dimensions: functionality according to genre, usability performance of the functionality according to context of use, and general usability performance according to heuristics criteria. We present the whole process in Figure 1 using an activity diagram which depicts the general activities and responsibilities of the elements that make up our method.
DEPTH TOOLKIT: A Web-Based Tool
The basic aim of our method is to provide a framework where an evaluator can find and (re)use expert knowledge in order to perform an evaluation that supports the above dimensions.
Fig. 1. The whole process depicting the general activities and responsibilities of the elements that make up the DEPTH method
The first swim lane presents the general steps/actions of the evaluator according to DEPTH. These steps are guided and supported by the “DEPTH-Repository”, which is the element that is constructed during the preparatory phase. The last element shows the deliverables of the execution phase of the evaluation process. Each evaluation study should start by first selecting the specific genre of the e-sites under evaluation. There are various checklists with the functionality of various genres of e-sites which one can easily use and re-use. If no such checklist can be found, the most well-known systems of the specific genre should be analyzed in order to determine their functionality and to provide a superset of all features, categorized in groups, as well as an analytical table. Such genres of systems, along with their analytical tables of the supported functionality, become part of the “DEPTH-Repository”. Having as input the analytical table of the functionality of the system under evaluation, the evaluator can easily perform the next step, which is a simple check of whether the system supports the underlying functionality. This step provides the first deliverable of our method, which is a functionality report. This report describes the functions supported by the selected system. At the next step the evaluator has to decide which of the supported functions will be further analyzed for usability performance. As we have already mentioned, the production of the functionality table alone is not enough for someone to select the right e-site. We may have systems of similar genre, like e-commerce systems, which may contain the same set of features but vary in usability. In other words, “It is not only the features of the applied technology but especially the way of implementation of the technology”, as Lehtinen et al. [6] say for a different genre of systems.
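The functionality check described above can be illustrated with a minimal sketch. It is not the toolkit's implementation: the names `FunctionalityReport` and `functionality_report`, the dictionary layout of the genre checklist, and the site name in the usage note are all assumptions made for illustration. The sketch compares the features observed on one e-site with the superset of features recorded for its genre and returns the supported and missing ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalityReport:
    """Hypothetical shape of the first deliverable: supported vs. missing features."""
    site: str
    supported: tuple[str, ...]
    missing: tuple[str, ...]


def functionality_report(
    site: str,
    checklist: dict[str, str],
    observed: set[str],
) -> FunctionalityReport:
    """Compare the features observed on a site with the genre checklist.

    `checklist` maps a feature id to its description (the genre superset);
    `observed` holds the ids of the features the evaluator found on the site.
    """
    unknown = observed.difference(checklist)
    if unknown:
        # A feature outside the checklist means the genre superset is incomplete.
        raise ValueError(f"features not in the genre checklist: {sorted(unknown)}")
    # Dictionaries preserve insertion order, so both tuples follow checklist order.
    supported = tuple(fid for fid in checklist if fid in observed)
    missing = tuple(fid for fid in checklist if fid not in observed)
    return FunctionalityReport(site, supported, missing)
```

For example, with an assumed checklist `{"F1": "Select preferred language", "F4": "Navigate a hierarchical structure", "F11": "Buy/use shopping basket"}` and observed features `{"F1", "F11"}`, the report lists F1 and F11 as supported and F4 as missing. The evaluator would then pick, from the supported features, those to analyze further for usability performance.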
P. Georgiakakis et al.
Evaluating the usability performance of the system involves two primary tasks: (a) evaluation in the small, i.e. at the specific context, and (b) evaluation in the large, i.e. evaluating the general usability conformance to well-defined heuristics criteria. The first task is the most difficult since it implies the use of domain experts and is therefore very expensive. Moreover, the availability of domain experts is very limited. At this point our method suggests the (re)use of domain knowledge through the design patterns and the underlying design pattern languages. Such a language can adopt issues from HCI design patterns since usability is of prime importance, while at the same time it will take into account the particularities of the genre under evaluation, and so forth [14]. So, at the next step, the evaluator can see a related scenario for each specific function (or set of functions) identified for usability performance. As we described above, one or more related scenarios are bound to specific design patterns during the preparatory phase and are part of the DEPTH repository. The evaluator may also decide to modify a related scenario to better suit his/her case. The next step is the execution of the underlying related tasks of the specified scenario. We have to stress here the essential role of the underlying usage scenario, which acts as an expert wizard guiding the evaluator. After the execution, the evaluator is motivated by DEPTH to see the ideal solution as it has been recorded in the related pattern(s). This is necessary because up to this point the evaluator has seen only the related usage scenario, not the solution. By seeing the actual solution, the evaluator can complement his/her findings about the e-site under evaluation and becomes better prepared to compose the evaluation report. The final evaluation report has two parts: a context specific part and a general part.
The first reveals/measures the usability performance of the system under evaluation according to its specific context of use, while the second presents the general usability performance according to the expert/heuristic criteria.
2.2 The DEPTH TOOLKIT
DEPTH TOOLKIT supports the tasks of two categories of users: i) the creators of the evaluation studies to be performed and ii) the novice usability engineers. The DEPTH toolkit from the creator’s point of view supports four main tasks for each genre of e-sites: i) specification of the features of genres of e-sites, ii) assignment of scenarios and appropriate tasks to features of genres, iii) editing of design patterns, as well as links between those patterns and specific features of the genre, and iv) management of evaluation sessions and recording of evaluation reports. From the novice usability engineer’s perspective, the toolkit supports the evaluation study in two phases, first a preparatory and then an execution phase. During the preparatory phase, the user (novice usability engineer) chooses the genre of the e-site, and selects from a list the features of the system that he/she is interested in evaluating. This list is generated by the toolkit and includes features related to systems of the same genre as the one specified. During the execution phase, the selected set of features is evaluated through the context-oriented scenarios that have been proposed and written by the creators of the evaluation studies for that specific genre of e-sites (Fig. 2). At the end of the evaluation process, a detailed report is automatically
produced, describing the usability performance of the examined system on the chosen function(s) at its specific context of use, along with the general usability performance of the examined e-site according to Nielsen’s heuristic criteria [8].
Fig. 2. DEPTH TOOLKIT interface
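As a rough illustration of the two-part report described in this section, the sketch below models a scenario bound to a feature and to design patterns, and assembles a report with a context-specific part (answers to the scenario questions) and a general part (notes per heuristic). This is a sketch under stated assumptions, not the toolkit's actual data model: `Scenario`, `build_report`, the dictionary layout and the sample answers in the usage note are hypothetical. The ten heuristic names are those of Nielsen [8].

```python
from __future__ import annotations

from dataclasses import dataclass

# Nielsen's ten heuristic criteria [8], used for the general part of the report.
NIELSEN_HEURISTICS = (
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
)


@dataclass(frozen=True)
class Scenario:
    """Hypothetical repository entry: a usage scenario bound to one feature."""
    feature_id: str
    title: str
    tasks: tuple[str, ...]
    questions: tuple[str, ...]
    pattern_names: tuple[str, ...]  # design patterns linked during preparation


def build_report(
    scenarios: list[Scenario],
    scenario_answers: dict[str, dict[str, str]],
    heuristic_notes: dict[str, str],
) -> dict[str, dict]:
    """Assemble the context-specific part and the general part of a report."""
    context_part: dict[str, dict[str, str]] = {}
    for scenario in scenarios:
        answers = scenario_answers.get(scenario.feature_id, {})
        unanswered = [q for q in scenario.questions if q not in answers]
        if unanswered:
            raise ValueError(f"{scenario.title}: unanswered questions {unanswered}")
        context_part[scenario.title] = {q: answers[q] for q in scenario.questions}
    # Heuristics the evaluator did not comment on are kept with an empty note.
    general_part = {h: heuristic_notes.get(h, "") for h in NIELSEN_HEURISTICS}
    return {"context_specific": context_part, "general": general_part}
```

A call with one scenario for feature F11, one answered question and one heuristic note would yield a dictionary whose `"context_specific"` entry is keyed by the scenario title and whose `"general"` entry always has ten keys. Keeping the two parts separate mirrors the distinction between evaluation in the small and evaluation in the large.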
3 Evaluating the DEPTH Method
3.1 Scope of Evaluation Study
In order to evaluate our method we conducted an experiment with non-expert usability evaluators. Twenty-three (23) graduate students of our Department, who had completed an introductory MSc course on Human-Computer Interaction (we call them novice usability engineers), were asked to evaluate two e-sites of a specific genre. In order to run an experiment related to their interests (they attend an MSc programme on e-learning technologies), we proposed the evaluation of two Learning Brokerage Platforms (LBPs), namely Premier Training Online (https://www.premiertrainingonline.com/default.aspx) and the Adobe Store of North America (https://store1.adobe.com/cfusion/store/index.cfm). All students had average knowledge of this genre of systems, and none of them claimed to be an expert in using (or designing) such systems. Actually, none of them had ever used either of the systems under evaluation. These e-sites allow the user to search, view, and purchase selected online learning objects that have to do with training in specific areas of interest. The Premier Training Online site offers distance learning programs in order to provide comprehensive home study courses for those pursuing careers in the health and fitness industry. The Adobe Store provides training about Adobe products via an online training center. This center gives access to many libraries full of engaging, interactive course contents, assessment features, and additional resources to maximize design and development of skills. These two e-sites had been carefully selected since various usability problems had been identified during expert-based evaluations previously organized by our group. We used DEPTH only from the evaluator’s point of view since we wanted to focus on this specific perspective. The main research questions of this evaluation
study were: Can the DEPTH method help novice usability engineers identify usability problems (especially complex ones)? Can the DEPTH method make novice usability engineers improve their ability to propose solutions to the identified usability problems? Is the DEPTH method easy to apply? Does the DEPTH method make the novice usability engineers’ evaluation process easier, more flexible and enjoyable? Does the DEPTH method make novice usability engineers feel confident that they performed a good evaluation study? Do the novice engineers appreciate the added value of Design Patterns for usability evaluation?
3.2 Evaluation Process
Several systems of the LBP genre have been thoroughly examined with respect to the features they provide, and a superset of those features is shown in Fig. 3.
Fig. 3. Features supporting online purchases
Selected design patterns (DP) from Martijn van Welie’s web design patterns repository (http://www.welie.com/) were related to a number of features (F) that an LBP may support, as shown in Table 1 below (each feature was related to one or more design patterns from Welie’s repository).
Table 1. Examples of relations between Features and Design Patterns
FEATURES (F)
F1. Select preferred language
F2. Directions to the right section of the website
F3. Know where you are in a hierarchical structure
F4. Navigate a hierarchical structure
For each of these features we created a related usage scenario. For example, for the functionality “F11: Buy/use shopping basket” we assigned the scenario “S11: Shopping Cart” as shown in Table 2.
Table 2. Task Scenario and Questions for a specific functionality
(S11): Shopping Cart
Description: Collect and purchase several items in one transaction.
Task: Locate the Shopping Cart / Shopping Basket. The basket is initially empty. Search for a product (either manually or with assistance from a search mechanism) and, if found, add it to the contents of the basket. Add to the basket a new product that is advertised in the home page. Search for a new product, other than those already included in the basket. Browse through the store. Delete one item from the shopping cart. Select another instance of a product included in the cart and add that instance to the contents of the cart. Select one of the products that are already in the cart and view its price. While viewing the shopping basket contents, try to locate a link related to shipping and handling costs and the calculation of their cost. While viewing the shopping basket contents, try to locate a link related to the return policy.
Questions:
• Was the name of the shopping cart used appropriately?
• Was the shopping cart easily located?
• Were you able to add in the basket a product advertised in the home page? How easy did you find this operation?
• While viewing search results were you able to see the contents of the shopping cart?
• Could the operation of searching for a new product other than those already included in the basket, be executed with zero or one click / move? Were you still able to view the contents of the shopping cart?
• Was it easy to delete items from the shopping cart? Was it easy to modify their quantity? Was the total price automatically recalculated?
• Was it easy to view the price of any product item included in the cart once you selected that item?
• Was it easy to locate the link related to shipping and handling and the calculation of their cost? Was the information provided satisfactory?
• Was it easy to locate the link related to the return policy? Was the information provided satisfactory?
The novice usability evaluators (i.e. the DPEs) had to carry out all the tasks of the proposed scenarios, and could consult the related design pattern that the DEPTH method proposes while doing so. After completing the inspection of the LBP, they had to express their overall opinion of the e-site according to Nielsen's heuristic criteria [8], namely: visibility of system status, match between system and the real world, user control and freedom, consistency and standards, error prevention,
recognition rather than recall, flexibility and efficiency of use, aesthetic and minimalist design, help users recognize, diagnose, and recover from errors, help and documentation. Finally, a report containing all the answers to the scenario questions and to Nielsen’s heuristic criteria is generated automatically. We not only analyzed the reports written by the DPEs but also conducted focus group interviews (in teams of three students) to gain better insight into their opinion of DEPTH and the DEPTH toolkit. The major advantage of conducting a focus group interview [5] was the ability to obtain detailed information through group cooperation. The findings led to important and promising conclusions, as shown below.
3.3 Evaluation Findings
Using novice usability engineers throughout this experiment helped us verify what we intended to prove: the DEPTH method can actually enable novice usability evaluators to perform evaluations of expert quality. After following the related task scenarios, they were able to identify simple usability problems, and they were also assisted in identifying complex problems that could not easily have been spotted if scenarios and design patterns had not been given. The novice usability evaluators clearly stated that the design patterns helped them understand good design practices for the various features of an LBP e-site. As an unexpected outcome, many of the evaluation reports that we received showed that the DPEs were also suggesting solutions for each of the problems identified. We, as reviewers of the experiment, wanted to know where this kind of knowledge came from. When we asked our students how they arrived at these suggestions, all of them mentioned the added value of the design pattern that accompanied each feature. By considering the solution given by the pattern and customizing it to the context of the specific e-site, they were able to offer clear solutions to the usability problems.
This assistance made them more confident, not only in indicating the usability problems, but also in proposing solutions for them. All students stated that the interface of the Toolkit made the evaluation process flexible and enjoyable. The use of specific task scenarios, along with the categorization of all features, provides a source of tasks and requirements that can be easily evaluated. Among other remarks, it was also mentioned that DEPTH can be used to evaluate isolated areas of interest by simply choosing only a few features. However, the method has some disadvantages. Few design patterns are available, so it is difficult to find mature pattern languages to support the variety of e-site genres. This became obvious from the collection of design patterns we proposed, as we deliberately chose some that are not yet mature. Even if we assume that a mature pattern language exists, will there always be a design pattern to validate every area of interest in a digital genre? A problem that occurred during the evaluation was that the Toolkit did not allow users to revise their report after they had submitted it. This problem is not difficult to solve, and the next version of the toolkit will include this functionality. Another major issue, related to the evaluation of the DEPTH method and depending mainly on the user of the DEPTH Toolkit, is the creation of genre-dependent scenarios. Who should create them? Will the scenarios be highly scripted or
loosely defined? What will the granularity of each scenario be? Experts are clearly needed to create these task scenarios. We may want to define scenarios that are very descriptive, or we may want to use scenarios that are more general. We need several scenarios of different granularities for each feature, so that the user can weigh cost against efficiency and choose the one most appropriate to the case under study.
4 Conclusions
In this paper we provided an overview of DEPTH, an innovative method for performing scenario-based expert heuristic usability evaluation of e-sites. It is innovative because it uses the added value of design patterns in a very systematic way in the usability evaluation process. The method can easily be used by a novice usability engineer. When DEPTH was used by non-expert engineers to evaluate LBPs with a supporting toolkit, the DEPTH toolkit, the results were satisfactory. The expert knowledge embedded in the form of design patterns and usage scenarios was readily available to the novice engineers, enhancing their testing methods and improving their perspective on the usability of each functionality being tested. As the field of design patterns grows and matures, this method will become very promising and highly applicable.
Acknowledgments. This work has been partially funded through the EU IST FP7 project Grid4All (http://grid4all.elibel.tm.fr/).
References
1. Alexander, C.: The Origins of Pattern Theory: the Future of the Theory, And The Generation of a Living World. In: Keynote speech at the 1996 ACM Conference on Object-Oriented Programs, Systems, Languages and Applications (OOPSLA) (1996), retrieved from http://www.patternlanguage.com/archive/ieee/ieeetext.htm
2. Dix, A., Finlay, J.E., Abowd, G.D., Beale, R.: Human-Computer Interaction, 3rd edn. Prentice Hall, Englewood Cliffs (2003)
3. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns – Elements of reusable object oriented software. Addison-Wesley, London, UK (1994)
4. Georgiakakis, P., Tzanavari, A., Retalis, S., Psaromiligkos, Y.: Evaluation of Web applications Using Design Patterns. In: Costabile, M.F., Paternò, F. (eds.) INTERACT 2005. LNCS, vol. 3585, Springer, Heidelberg (2005)
5. Krueger, R.A., Casey, M.A.: Focus Groups: A Practical Guide for Applied Research, 3rd edn. Sage Publications, Thousand Oaks, CA (2000)
6. Lehtinen, E., Hakkarainen, K., Lipponen, L., Rahikainen, M., Muukkonen, H.: Computer supported collaborative learning: A review of research and development (The J.H.G.I. Giesbers Reports on Education No. 10). Department of Educational Sciences, University of Nijmegen, Nijmegen, the Netherlands (1999)
7. Martín, G.Y.: Wiki tools for collaborative learning environments. Final project thesis, Telecommunications Engineering, Universidad de Valladolid (August 2005)
8. Nielsen, J.: Usability Engineering. Academic Press, London (1993)
9. Nielsen, C.: Testing in the field. In: Werner, B. (ed.) Proceedings of the third Asia Pacific Computer Human Interaction Conference, IEEE Computer Society, Los Alamitos, CA (1998)
10. Nielsen, J.: Designing Web Usability: The Practice of Simplicity. New Riders Publishing, Indianapolis (2000)
11. Sartzetaki, M., Psaromiligkos, Y., Retalis, S., Avgeriou, P.: Usability evaluation of e-commerce sites based on design patterns and heuristic criteria. In: 10th International Conference on Human-Computer Interaction, Heraklion, Crete (June 22-27, 2003)
12. Schmid-Isler, S.: The Language of Digital Genres: a Semiotic Investigation of Style and Iconology on the World Wide Web. In: Proceedings of the 33rd Hawaii International Conference on System Science, IEEE Press, CD-ROM, Hawaii (2000)
13. Stefani, A., Xenos, M.: A model for assessing the quality of e-commerce systems: PC-HCI 2001 Conference (2001)
14. Van Welie, M., Klaassen, B.: Evaluating museum websites using design patterns. Technical Report: IR-IMSE-001 (2004), available at: http://www.welie.com/articles/IR-IMSE001-museum-sites.pdf
Evaluator of User's Actions (EUA) Using the Model of Abstract Representation DGAUI
Susana Gómez-Carnero and Javier Rodeiro Iglesias
Escuela Superior de Ingeniería Informática de la Universidad de Vigo, Campus As Lagoas S/N, Ourense
{jrodeiro,susanagomez}@uvigo.es
Abstract. User interfaces play an important role in the success of an application. Because of the considerable time and economic cost of their development, it is necessary to obtain a highly acceptable and effective design. To be considered acceptable, a user interface must be friendly to the user, fulfil its objectives, and be easy for the user to use. In this paper an abstract specification model is presented that allows the acceptability of user interfaces to be evaluated. This is done in a semiautomatic way by validating the three criteria above. We also present a notation for user interface testing and a tool that lets the user execute user tasks on the graphical user interface prototype generated by the tool.
Keywords: user interface design, usability, user interface modelling, prototyping, user interface test.
Evaluating compliance with these criteria is very complex. The complexity stems from the subjective approach taken to this evaluation in most cases, which mainly relies on expert opinion or user questionnaires [7] [11] [9] [14]. Given this subjectivity, and the importance of personal human perception in qualitative user interface evaluation techniques, it would be valuable to have a more direct and discrete method that avoids these personal perceptions. One engineering approach is to let the user interact with a prototype generated from a specification model obtained in an earlier requirements analysis phase. With this approach the user gets a more tangible view of the user interface, and possible problems are identified early, before the industry spends time and money on them. Given the cost of developing software, and given that most evaluation methods are applied after development, it seems appropriate to perform the evaluation before implementation. A large amount of related research follows the criteria described above, but almost all of it remains theoretical and has no practical application. We examined existing representation and notation techniques to check whether they can be applied to complex interfaces for an objective evaluation based on these criteria, and the conclusion is negative: they are not suited to the needs defined above. For this reason we have proposed an abstract notation that represents the user interface using components, visual presentation (in graphical terms) and user interaction defined at component level. In this paper we present the part of the notation that represents the functionality of the user interface, from which user tasks are extracted; this allows a semiautomatic evaluation that minimizes the current development and evaluation costs. Section 2 presents the notation for representing user interface behaviour.
Section 3 presents the notation for defining user task tests. Section 4 presents the EUA tool, which allows the usability of the user interface to be evaluated dynamically through an interactive simulation of the interface. Section 5 presents the conclusions and future work of this research.
2 DGAUI Representation
We reviewed the user interface representation models in the literature. The review focused on models that propose a representation of both the visual appearance and the behaviour of the user interface [3]; given the problems we found in these representations, we need to present an alternative solution. The proposed representation, DGAUI, considers that the visual user interface is not a continuous structure. Instead, it is composed of discrete, finite elements: the interface is defined as a composition of individual elements called user interface components. These user interface components have a topological hierarchy, in which one component can be contained in another [12]. For the definition of the visual user interface the notation allows us to:
- Define the visual user interface components, using standard graphical primitives if the component has a visual representation on the user interface, or specific properties if the component is for information input or is only a container of other user interface components.
- Determine the topological composition of visual user interface components to construct the visual user interface with which the user interacts at a given moment.
- Represent the dialog between components, identifying the user events that can be applied to components and the response of the other user interface components when the interaction occurs.
XML is the best choice for structuring the notation, so we have created a DTD that makes the notation structure easy to parse. As for the semantics of the notation, one part holds the initial representation of the visual user interface, and the second part represents all the states of the user interface obtained from the interaction defined over user interface components, together with the transitions between states. Because the two parts differ in nature, we divide the notation into two DTDs. The first (called DGAUI-DEF) consists of a detailed definition of each of the user interface components that make up the whole interface. This separation allows component definitions to be reused in other visual user interface representations. The second (called DGAUI-INT) depends on the first, because it is calculated from it. DGAUI-INT contains all the states that the visual user interface can reach. This set of states can be calculated from the initial representation of the user interface. The initial state is formed from the properties in the user interface component definitions. After this, the possible individual events over the user interface components of this state are simulated, and the changes in the components determine a new state (one that already exists or one that is identified as new) and a transition between the current state and the new state. This applies the concept of state diagrams to a user interface, but the diagram is generated from the interaction on individual user interface components.
This notation is oriented to the state of each visual user interface component rather than the state of the whole visual user interface. A state of the visual user interface is obtained from the combined states of its components. The notation thus separates the presentation from the behaviour of the user interface. The presentation is located in the representation definition, while the functionality is located in the states and the transitions between them. These transitions are calculated using the possible user actions (events) on visual user interface components, together with system events. We define an interface state as the set of all visual user interface components that, given the values of their properties, the user can reach to interact with at a given moment. For the definition of the notation we consider that:
- User actions are not arbitrary.
- The set of visual user interface states is finite and can be described and evaluated.
- A visual user interface state depends on the components it contains and the properties of each of them.
- A state is a moment at which the visual user interface is waiting for a user action; it does not change while the user does not interact with it.
Each state is characterized by the values of the following four properties of its components:
- Visible: visibility of the component. The component is visible (T) or not visible (F) on screen.
- Activo: indicates whether the component responds to user actions (T) or not (F). If a component has Activo(F) in a state, it causes no transition to another state.
- InfI: activates data input from the user for the component. If this property has value True, the component accepts data given by the user.
- InfO: activates the data output function of the component. If this property has value True, the component displays the data sent from the “core” of the application to the user.
Events are user actions on the input hardware devices of the system. They are detected by the system, which responds as defined for each event. An event is a single user action; for example, drag and drop is the combination of three single actions or events: click, move and release. We can define pre-conditions and post-conditions. If an event is defined in the notation over a visual user interface component and it has no pre-condition, the changes to other components are always performed when the event occurs on this component. If a pre-condition exists, for example “ComponentOne:Activo(T)” (the property Activo of ComponentOne has value True) for the event RightClick over ComponentTwo, this user action over ComponentTwo will not be performed if the value of Activo in ComponentOne is F. Post-conditions define the property values that must be satisfied to reach the next state. The notation does not limit the events that can be defined for interaction. The HCI engineer can define the events considered necessary and communicate their meaning to the workgroup.
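The four properties and the pre-condition example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`ComponentState`, `parse_condition`, `event_allowed`) are hypothetical and are not part of the DGAUI notation or its DTDs; only the condition syntax `ComponentOne:Activo(T)` and the rule that a component with Activo(F) causes no transition come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentState:
    """The four boolean properties that characterise a component in a state."""
    visible: bool = False
    activo: bool = False
    inf_i: bool = False  # InfI: the component accepts input data from the user
    inf_o: bool = False  # InfO: the component displays data sent by the core


# Map property names as written in conditions to the dataclass fields.
FIELDS = {"Visible": "visible", "Activo": "activo", "InfI": "inf_i", "InfO": "inf_o"}


def parse_condition(text):
    """Parse 'ComponentOne:Activo(T)' into ('ComponentOne', 'activo', True)."""
    component, rest = text.split(":")
    prop, value = rest.rstrip(")").split("(")
    return component, FIELDS[prop], value == "T"


def condition_holds(state, text):
    """Check one condition against a dict of component name -> ComponentState."""
    component, field, expected = parse_condition(text)
    return getattr(state[component], field) == expected


def event_allowed(state, target, precondition=None):
    """An event is performed only if its target is Activo and any pre-condition holds."""
    if not state[target].activo:
        return False
    return precondition is None or condition_holds(state, precondition)


# Example from the text: RightClick over ComponentTwo requires ComponentOne:Activo(T).
state = {
    "ComponentOne": ComponentState(visible=True, activo=False),
    "ComponentTwo": ComponentState(visible=True, activo=True),
}
blocked = event_allowed(state, "ComponentTwo", "ComponentOne:Activo(T)")
unconditional = event_allowed(state, "ComponentTwo")
```

Here `blocked` is False because ComponentOne has Activo(F), while the same event without a pre-condition would be performed.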
Some examples of the basic events that we use are:
- LeftClick: click on the left mouse button
- RightClick: click on the right mouse button
- ReLeClick: release of the left mouse button
- ReRiClick: release of the right mouse button
- MouseOn: mouse pointer over a component
- Key(key of keyboard): a key or key combination on the keyboard
With DGAUI, the events over states can be calculated from the events over the visual user interface components. Thus, we can build a directed labelled state graph of the user interface and establish what the next state of the user interface is if we know which component is affected by an event. The vertices are the visual user interface states and the labelled arcs are the transitions between states. Two special states determine the functionality of the interface:
- Initial state: a vertex whose associated arcs are all outgoing and that has no incoming arc; no other state can be reached without passing through the initial state.
- Final state: a vertex whose associated arcs are all incoming, with no outgoing arc. An anomalous situation exists if two or more vertices have only incoming arcs, because the visual user interface would then have more than one final state.
The set of possible next states after a user action can be obtained from the initial state, by applying the events to the visual user interface components with property Activo(T) and making the associated interaction changes to the other components. From this first set of next states, applying the same process to each one yields the rest of the states, until a final state is reached in which no visual user interface component has value True in properties Activo and Visible. While the states are being built, transitions or arcs (labelled with the component and the event applied) to previously identified states may be found. Two states are equal (and therefore the same state) if all their visual user interface components have the same values in properties Activo, Visible, InfI and InfO. A visual user interface component belongs to a state if it has a functionality within that state, which can take one of the following forms. One is that the component has a visual appearance in the interface that provides relevant information to the user (in this case the component property Activo has value True). The other is that the component causes changes to the properties of other components when an event occurs on it. One of the advantages of this notation is that it allows the visual properties of a component to be modified without changing its behaviour (the user might not see the component, but its behaviour is maintained through intermediate states).
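The state-building process just described amounts to a breadth-first exploration from the initial state. The sketch below is one possible reading, not the authors' implementation: the function name `explore`, the tuple encoding of the four properties, and the toy two-component interface are assumptions made for illustration, and pre-conditions are omitted for brevity.

```python
from collections import deque


def explore(initial, interactions):
    """Breadth-first enumeration of interface states.

    initial: dict component -> (visible, activo, inf_i, inf_o)
    interactions: dict (component, event) -> dict component -> new property tuple
    Returns (states, transitions); transitions are (source, label, target)
    triples that refer to states by their index in the states list.
    """
    def freeze(state):
        # Two states are the same if every component has the same four values.
        return tuple(sorted(state.items()))

    start = freeze(initial)
    states = [start]
    index = {start: 0}
    transitions = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        props = dict(current)
        for (component, event), changes in interactions.items():
            if not props[component][1]:  # Activo(F): no transition from here
                continue
            following = dict(props)
            following.update(changes)
            key = freeze(following)
            if key not in index:  # a state not identified before
                index[key] = len(states)
                states.append(key)
                queue.append(key)
            transitions.append((index[current], f"{component}:{event}", index[key]))
    return states, transitions


# Hypothetical interface: a button that opens a dialog, and a dialog that closes everything.
initial = {"Open": (True, True, False, False), "Dialog": (False, False, False, False)}
interactions = {
    ("Open", "LeftClick"): {
        "Open": (True, False, False, False),
        "Dialog": (True, True, False, False),
    },
    ("Dialog", "LeftClick"): {
        "Open": (False, False, False, False),
        "Dialog": (False, False, False, False),
    },
}
states, transitions = explore(initial, interactions)
```

For this toy interface the exploration yields three states and two labelled arcs; the last state has every component with Visible and Activo set to False, which matches the description of a final state.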
If modifying the appearance of a visual user interface component changes its functionality, the result is a different visual user interface component. The user's interpretation of a component's appearance must be unique for each component and must also identify its functionality to the user. Otherwise the component is ambiguous and the visual user interface design is wrong. This situation arises in interactive models that use interactors. The specification of an interactor is defined to support the different interactor states reached through user actions. According to the traditional specification, each interactor state has a concrete functionality and a unique appearance. No model considers multiple rendering functions for a single interactor state, because the specification is based on the interactor dialog rather than on its visual appearance. In the DGAUI proposal a visual user interface component may have different appearances, caused by user actions (visual operations), while its behaviour stays the same. Visual operations are, for example, resizing or other changes to the size of visual user interface components. With DGAUI the consistency of the visual interface is maintained because two visual user interface components with the same appearance must have the same behaviour. However, if the appearance of a component is modified as a personal choice of the user, this modification does not affect its behaviour. Because this work is oriented to the early phases of prototyping, the DGAUI proposal does not consider the abstract representation of application data. The participation in the user interface of visual user interface components as elements that let the user choose input and output information is defined by including in version 3.04 a domain definition
of data or a mask for text input. If a user action drastically changes the appearance of a visual user interface component, the new appearance must be a new visual user interface component and therefore a different visual user interface state. If the visual user interface is correct, there is only one state in which all visual user interface components have the properties Activo and Visible with value False (the final state). The initial state can also be identified from the representation of the components in DGAUI-DEF by examining their properties. The component definitions, the topological composition, and the dialog between components are constant for a visual user interface. The information for each state of the visual interface is determined by the values of the component properties. Once the states of the visual user interface have been obtained, we use a state graph (a multidigraph) to represent the whole set of transitions between states. In it the vertices are the states and the arcs are the transitions between states. The arcs are labelled with the name of the visual user interface component and the event that causes the transition. The XML document (DGAUI-INT) contains the following information:
- Topological composition of the visual user interface components contained in other visual user interface components.
- Information about the visual user interface states. All states are defined by the description and properties of their components. The initial state is obtained from the description of the components, and the other states are obtained by an automatic process.
- The set of transitions between states, obtained during the automatic state identification process.
Information about the XML structure and examples of the DGAUI notation can be found at http://www.ei.uvigo.es/~susanagomez/hci.html
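The correctness conditions on the state graph (an initial state with no incoming arcs and exactly one final state) lend themselves to a simple automatic check. The following is a minimal sketch, assuming the transitions are available as (source, label, target) triples over numbered states; the function name `check_state_graph` and the message texts are hypothetical and do not come from DGAUI-INT.

```python
def check_state_graph(num_states, transitions, initial=0):
    """Return (final_states, problems) for a labelled state multidigraph.

    transitions: iterable of (source, label, target) with integer state ids.
    A final state is taken to be a vertex with no outgoing arc.
    """
    incoming = {s: 0 for s in range(num_states)}
    outgoing = {s: 0 for s in range(num_states)}
    for source, _label, target in transitions:
        outgoing[source] += 1
        incoming[target] += 1

    finals = [s for s in range(num_states) if outgoing[s] == 0]
    problems = []
    if incoming[initial] > 0:
        problems.append("initial state has incoming arcs")
    if len(finals) != 1:
        problems.append(f"expected exactly one final state, found {len(finals)}")
    return finals, problems


# A well-formed graph: 0 -> 1 -> 2.
ok_finals, ok_problems = check_state_graph(
    3, [(0, "Open:LeftClick", 1), (1, "Dialog:LeftClick", 2)]
)

# The anomalous situation: two vertices with only incoming arcs.
bad_finals, bad_problems = check_state_graph(
    3, [(0, "A:LeftClick", 1), (0, "B:LeftClick", 2)]
)
```

The second call reports two final states, which corresponds to the anomalous situation the notation is meant to expose before any code is written.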
3 User Test Definition
Once the visual user interface has been defined, evaluating it requires defining the tests to run over it. The objective is for the user to interact with the prototype while we record whichever parameters or actions we want. DGAUI provides the description of the components and the appearance of the states, which can be rendered on a standard rendering device. It also provides the user tasks and the states that the user can encounter when using the visual user interface. The first step towards automating the evaluation of a visual user interface is to define a notation that describes the atomic parts of the evaluation. As with DGAUI-DEF, we use XML to structure the notation, and we have created a DTD that makes it easy to parse. As many evaluations as desired can be defined. Each evaluation consists of a set of user tasks to perform. Each user task is described by the following information:
- A description (a textual description of the user task, for documentation).
- The parameters to evaluate and record during the evaluation process. These may be time parameters, for example the total time the user takes to perform the task, the time until the user starts interacting, the mean time between events,
the time to the first user mistake, etc. Other parameters may be counters, for example the number of user events, the number of user mistakes during the evaluation task, the number of times that an error occurs, etc. The last kind is the error parameter, which identifies and tracks types of user mistake, for example the most frequent user mistake, or which of the previously defined user mistakes occur in a given state.
- The visual user interface states that will be evaluated. This defines which states and transitions (visual user interface components and user actions) are presented to the user in the prototype.
Information about the XML structure and examples of the EUA notation can be found at http://www.ei.uvigo.es/~susanagomez/hci.html
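The time, counter and error parameters listed above can all be derived from a timestamped log of user events. The sketch below shows one way to compute them; the log format (timestamp in seconds, event label, mistake flag), the function name `task_metrics` and the sample data are assumptions for illustration, not the EUA notation itself.

```python
from collections import Counter
from statistics import mean


def task_metrics(start_time, events):
    """Compute evaluation parameters for one user task.

    events: chronological list of (timestamp_seconds, label, is_mistake).
    """
    times = [t for t, _label, _mistake in events]
    mistakes = [(t, label) for t, label, is_mistake in events if is_mistake]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mistake_labels = Counter(label for _t, label in mistakes)
    return {
        # Time parameters
        "total_time": times[-1] - start_time if times else 0.0,
        "time_to_first_event": times[0] - start_time if times else None,
        "mean_time_between_events": mean(gaps) if gaps else None,
        "time_to_first_mistake": mistakes[0][0] - start_time if mistakes else None,
        # Counter parameters
        "event_count": len(events),
        "mistake_count": len(mistakes),
        # Error parameter
        "most_frequent_mistake": (
            mistake_labels.most_common(1)[0][0] if mistakes else None
        ),
    }


# Hypothetical log of one task that started at t = 0.0 seconds.
log = [
    (2.0, "Menu:LeftClick", False),
    (3.5, "Close:LeftClick", True),
    (6.0, "Menu:LeftClick", False),
    (7.0, "Close:LeftClick", True),
    (10.0, "Save:LeftClick", False),
]
metrics = task_metrics(0.0, log)
```

Keeping the raw log and deriving the parameters afterwards means that new parameters can be added later without repeating the sessions with users.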
4 Evaluator of User's Actions (EUA)
The EUA tool lets the user evaluate the usability of the visual user interface dynamically. The evaluation is carried out through an interactive simulation of the interface. From the abstract DGAUI notation (specifically DGAUI-INT) we can build the visual appearance of the interface states and simulate the user actions on the components. The EUA notation makes it possible to define the user tasks that the user can try out. The simulation reproduces the visual appearance of the interface following the user tasks described in Section 3. The user's interaction with the simulation is recorded according to the parameters defined in Section 3. This information is stored in a database for later study and analysis. The tool lets the HCI engineer define as many evaluations as necessary and obtain quantitative information about a user's real use of the visual user interface. Normally, the HCI engineer explains to the user the objectives to reach while evaluating the visual user interface on the prototype. With the information obtained, the HCI engineer can then determine whether the interface has any problems before starting to code it. The EUA tool has a simple interface with two basic functions:
- Load Interface: selects an XML file that contains the visual user interface description (DGAUI-DEF and DGAUI-INT) and generates the visual appearance of the user interface states.
- Test Interface: to evaluate the visual user interface, the user task definitions that make up the evaluation must be selected (EUA XML file). The simulation is executed with this definition and with the configuration parameters for accessing the database. For each user interaction with the prototype, information about the interaction is stored in the database for study.
Fig. 1 shows a visual user interface state generated by the EUA tool from a DGAUI description. The example interface corresponds to a basic text processor.
Fig. 1. Text processor prototype
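The record-and-store loop of the simulation can be illustrated with Python's standard `sqlite3` module. This is a sketch under assumptions, not a description of the EUA tool: the paper does not specify the database or its schema, so the `interaction` table, the function `run_task` and the rule that an action without a matching transition is logged as a mistake are all hypothetical.

```python
import sqlite3
import time


def run_task(transitions, initial, actions, connection, task_id, clock=time.monotonic):
    """Replay user actions over a transition table and log each one.

    transitions: dict (state, "Component:Event") -> next state
    actions: iterable of "Component:Event" labels performed by the user
    An action with no transition from the current state is logged as a
    mistake and leaves the state unchanged. Returns the last state reached.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS interaction ("
        "task_id TEXT, at REAL, state_before INTEGER, action TEXT, "
        "state_after INTEGER, mistake INTEGER)"
    )
    state = initial
    for action in actions:
        target = transitions.get((state, action))
        mistake = target is None
        after = state if mistake else target
        connection.execute(
            "INSERT INTO interaction VALUES (?, ?, ?, ?, ?, ?)",
            (task_id, clock(), state, action, after, int(mistake)),
        )
        state = after
    connection.commit()
    return state


# Hypothetical session: the user first clicks a component that is not active yet.
connection = sqlite3.connect(":memory:")
table = {(0, "Open:LeftClick"): 1, (1, "Dialog:LeftClick"): 2}
reached = run_task(
    table, 0, ["Dialog:LeftClick", "Open:LeftClick", "Dialog:LeftClick"],
    connection, "task-1",
)
rows, mistakes = connection.execute(
    "SELECT COUNT(*), SUM(mistake) FROM interaction WHERE task_id = ?", ("task-1",)
).fetchone()
```

In this session the user reaches state 2 after three logged interactions, one of which is recorded as a mistake; the HCI engineer could then query the table to derive the parameters defined in Section 3.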
5 Conclusions and Future Work
In this work we present an abstract representation of user interfaces specially designed for visual interactive systems. The representation focuses on the visual aspect of the user interface because this is the part that matters most to the user. The most significant information that the user obtains from the user interface comes through its appearance, and the user's interaction with the interface is based on the meaning conveyed by it. Another contribution of this work is the concept of a visual user interface component with different appearances (in most cases size and position) for the same behaviour, which allows different rendering functions for the same component state. We showed that it is possible to describe a set of user interface tasks in a notation and to automatically create a prototype with which users can evaluate its behaviour and acceptability quantitatively. As future work we are working on exception definitions for the user interface behaviour, including system events and user interface responses to the results of database queries. Another line of work is to define metrics for the information obtained from the evaluation and to create a system that uses agents to find errors in the user interface prototype automatically.
Acknowledgements. This work has been funded by projects TIN2005-08863-C0302 and 05VI-C02.
References
1. Eason, K.: Information Technology and Organizational Change. Taylor and Francis, London (1988)
2. Gartner Group Annual Symposium on the Future of Information Technology, Cannes (November 7-10, 1994)
3. Gómez Carnero, S., Rodeiro Iglesias, J.: Aplicación de los sistemas de representación para la sistematización de la validación de interfaces de usuario. Technical Report TR-LSIGIG-05-2. Computer Science Department, University of Vigo (2005), http://www.ei.uvigo.es/~susanagomez/hci.html
4. IBM: IBM Dictionary of Computing. McGraw-Hill (1993)
5. ISO: Software product evaluation quality characteristics and guidelines for their use (1992)
6. ISO: Ergonomics Requirements for Office Work with Visual Displays Terminals: Guidance and Usability (1993)
7. Molich, R., Nielsen, J.: Heuristic evaluation of user interfaces. In: Proceedings of ACM CHI 1990, Seattle, WA, April 1990, pp. 249–256 (1990)
8. Myers, B.A., Nielsen, J.: Survey on user interface programming. In: Bauersfeld, P., Bennett, J., Lynch, G. (eds.) CHI’92 Conference Proceedings on Human Factors in Computing Systems, pp. 195–202. ACM Press, New York, NY (1992)
9. Nielsen, J.: Usability Engineering. Academic Press, London (1993)
10. Nielsen, J., Mack, R.L.: Usability Inspection Methods. John Wiley and Sons, New York (1994)
11. Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S.: Human-Computer Interaction. Addison-Wesley Publishing, Reading, MA (1994)
12. Rodeiro Iglesias, J.: Representación y análisis de la componente visual de la interfaz de usuarios. PhD Thesis. Universidad de Vigo (September 2001)
13. Shackel, B.: Ergonomics in designing for usability. In: Harrison, M.D., Monk, A. (eds.) People and Computers: Designing for Usability, Cambridge University Press, Cambridge (1986)
14. Wharton, C., et al.: The cognitive walkthrough method: a practitioner’s guide. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods, pp. 105–140. John Wiley & Sons, New York (1994)
Adaptive Evaluation Strategy Based on Surrogate Model
Yi-nan Guo, Dun-wei Gong, and Hui Wang
School of Information and Electronic Engineering, China University of Mining and Technology, 221008 Xuzhou, China
[email protected]
Abstract. Human fatigue is a key problem in interactive genetic algorithms: it limits both the population size and the number of generations. To address it, evaluation strategies based on surrogate models have been presented, in which some individuals are evaluated by a model instead of by the human. Most of these strategies adopt a fixed substitution proportion, which cannot alleviate human fatigue as far as possible. A novel evaluation strategy with a variable substitution proportion is proposed. The substitution proportion depends on the model’s precision and on the degree of human fatigue. Different proportions give rise to three evaluation phases: evaluation by the human only, mixed evaluation by the human and the model, and evaluation by the model only. In the third phase, the population size is enlarged. Taking a fashion evolutionary design system as an example, the validity of the strategy is demonstrated. Simulation results indicate that the strategy can effectively alleviate human fatigue and improve the speed of convergence.
were adopted in order to lower the complexity of evaluation and reduce the human burden. In all of the above research, surrogate models replace the human in evaluating all or part of the individuals in each generation, so as to reduce the number of individuals evaluated by the human. However, these approaches do not exploit surrogate models fully. First, the proportion of the population evaluated by surrogate models in each generation is fixed, which alleviates human fatigue only to a limited extent. Second, the population size is small and fixed throughout, which limits the performance of IGAs. Surrogate models compute the fitness of individuals by computer and need no human participation, so the population size can be enlarged when only surrogate models are used for evaluation. To solve these problems, a novel adaptive evaluation strategy based on a surrogate model is proposed. The number of individuals evaluated by the surrogate model is adaptively tuned according to the degree of human fatigue and the evaluation precision of the model, so as to alleviate human fatigue effectively. When the population is evaluated only by the surrogate model, the population size is enlarged so as to improve the speed of convergence. In the rest of the paper, the adaptive evaluation strategy is explained in Section 2. To validate the strategy, experiments based on a fashion evolutionary design system are described and the testing results are analyzed in Section 3. Finally, future work on introducing distributed neural networks into the surrogate model is outlined.
2 Adaptive Evaluation Strategy Based on Surrogate Model
When human preference regarding an optimization problem is stable, a surrogate model can be adopted to evaluate individuals instead of the human. Two problems must then be taken into account. First, the surrogate model must stay consistent with human cognition and preference in order to ensure the convergence of the algorithm, so obtaining a model with high prediction precision and good generalization is the basis of the strategy. Second, how the model is used in place of the human during evaluation influences the performance of the algorithm. This paper focuses on the latter. In the adaptive evaluation strategy, the two key problems are when the model is started up in the evaluation process and how many individuals are evaluated by the model in each generation. So far, little research has addressed them.
2.1 Startup Mechanism About Surrogate Model
The startup mechanism provides the conditions that decide when to start up the surrogate model in the evaluation process. That is, from the generation in which these conditions are satisfied, the population can be evaluated by the surrogate model in a proper proportion. In general, when the human feels tired and the surrogate model has learned human preference accurately, the model is adopted to calculate the fitness of individuals. So the startup mechanism includes two conditions: the condition of human fatigue and the condition of the model’s precision. When either condition is satisfied, the surrogate model is started up. This evaluation strategy is shown as follows.
$$F(P(t)) = \{F_m(I,t),\ F_u(I',t) \mid I \neq I',\ I, I' \in P(t)\},\quad F_m \neq \emptyset,\quad \forall\,\big((Fa(t) \geq \varepsilon) \vee (Tr_f(t) \geq \Psi)\big) \tag{1}$$
where Fm denotes the fitness value calculated by the surrogate model and Fu denotes the fitness value given by the human. I and I' denote individuals evaluated by the surrogate model and by the human respectively. P(t) denotes the population in the t-th generation. Fa(t) ≥ ε describes the condition of human fatigue, where Fa(t) expresses the degree of human fatigue and ε is the threshold for human fatigue. The degree of human fatigue reflects how tired the human is. Let v(t) denote the time that the human spends evaluating and β(t) denote the proportion of the population evaluated by the human. The degree of human fatigue is defined as follows [7].
$$Fa(t) = 1 - e^{-t\,v(t)\,\beta(t)\,S(t)} \tag{2}$$
where t is the generation index and S(t) is the similarity of the population, which describes the average similarity between individuals in the population:
$$S(t) = \frac{2 \sum_{i=1}^{|P|-1} \sum_{j=i+1}^{|P|} \sum_{l=1}^{n} \sigma_l\big(x_i(t), x_j(t)\big)}{|P|\,(|P|-1)} \tag{3}$$
where |P| is the population size and n is the length of an individual. σl(xi(t), xj(t)) expresses the similarity of the l-th bit between two individuals: σl(xi(t), xj(t)) = 1 if the l-th bit of xi(t) is the same as that of xj(t), otherwise σl(xi(t), xj(t)) = 0. The human spends more time evaluating when the population contains more similar individuals. Clearly, the human feels more tired when the total number of individuals evaluated by the human is larger and when more time is spent on evaluation in each generation. Trf(t) ≥ Ψ describes the condition of the model’s precision, where Trf(t) expresses the reliability of the surrogate model and Ψ is the threshold for the reliability of the model. The reliability reflects the consistency between the surrogate model and human preference. It is measured from the average Euclidean distance, over the individuals of a sampling population, between the fitness value calculated by the model and the fitness value given by the human, as follows.
$$Tr_f(t) = \frac{|P_s|}{\sum_{i=1}^{|P_s|} \big(F_u(I_i,t) - F_m(I_i,t)\big)^2} \tag{4}$$
where |Ps| is the sampling population size.
In short, whether the surrogate model is started depends on two conditions: whether the degree of human fatigue exceeds the fatigue threshold, and whether the reliability of the surrogate model exceeds the reliability threshold.
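The quantities of Section 2.1 can be sketched in a few lines of Python. This is only an illustrative reading of the formulas, not the authors' implementation: it assumes that the exponent of Eq. (2) is the product t·v(t)·β(t)·S(t), that Eq. (4) is |Ps| divided by the summed squared differences, and that individuals are bit strings. The function names are hypothetical, the unit of v(t) is left unspecified as in the text, and the default thresholds 0.7 are the values reported later in Table 2.

```python
import math
from itertools import combinations


def population_similarity(population):
    """S(t), Eq. (3): pairwise bit matches scaled by 2 / (|P| (|P| - 1))."""
    size = len(population)
    if size < 2:
        raise ValueError("need at least two individuals")
    matches = sum(
        sum(1 for a, b in zip(x_i, x_j) if a == b)
        for x_i, x_j in combinations(population, 2)
    )
    return 2.0 * matches / (size * (size - 1))


def human_fatigue(t, v_t, beta_t, s_t):
    """Fa(t), Eq. (2), assuming the exponent is the product of all four terms."""
    return 1.0 - math.exp(-t * v_t * beta_t * s_t)


def model_reliability(human_fitness, model_fitness):
    """Trf(t), Eq. (4), read as |Ps| over the summed squared differences."""
    squared = sum((fu - fm) ** 2 for fu, fm in zip(human_fitness, model_fitness))
    # A perfect match has zero error; treating that as unbounded reliability
    # is an assumption of this sketch, not something the text states.
    return math.inf if squared == 0 else len(human_fitness) / squared


def surrogate_started(fa, trf, eps=0.7, psi=0.7):
    """Startup condition of Eq. (1): either threshold being reached is enough."""
    return fa >= eps or trf >= psi
```

Keeping the four quantities as separate functions makes it easy to swap in a different reading of Eq. (2) or Eq. (4) without touching the startup test.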
2.2 The Proportion of Population Evaluated by Surrogate Model
The proportion of the population evaluated by the surrogate model decides how many individuals are evaluated by the model in each generation. Up to now, in most evaluation strategies based on surrogate models, this proportion is fixed, which limits the effect of surrogate models on performance. To address this problem, the proportion of the population evaluated by the model is varied adaptively. Two factors are taken into account in deciding the proportion. First, the more tired the human feels, the fewer individuals the human wishes to evaluate. Second, the higher the reliability of the surrogate model, the more individuals the model should evaluate instead of the human. So the proportion of the population evaluated by the model in the t-th generation is defined as
$$\rho(t) = Fa(t)\,\big(1 - e^{-Tr_f(t)}\big) \tag{5}$$
So the number of individuals evaluated by the model in t-th generation is
$$N_f(t) = \big\lfloor\, |P|\,\rho(t) \,\big\rfloor \tag{6}$$
2.3 Substitution Mechanism About Surrogate Model
In general, the evaluation process of IGAs that adopt evaluation strategies with a fixed proportion of the population evaluated by the surrogate model can be divided into two phases, in one of two ways. If ρ(t) = 1 once the conditions of the startup mechanism are satisfied, the two phases are evaluation by the human only and evaluation by surrogate models only. If ρ(t) < 1 once the conditions are satisfied, the two phases are evaluation by the human only and mixed evaluation by the human and surrogate models. In this paper, however, an adaptive proportion is adopted, so the evaluation process differs from the above cases. According to the number of individuals evaluated by the human in each generation, the evaluation process of IGAs has three phases: evaluation by the human only, mixed evaluation by the human and the model, and evaluation by the model only. The third phase is of particular interest. In this phase, the population size is enlarged, because human fatigue does not exist when the surrogate model is adopted as an implicit fitness function.
Phase I: Population is evaluated by human only
In this phase, all individuals are evaluated by the human and the surrogate model is not started up. So the evaluation mode and the number of individuals evaluated by the model are defined as follows.
$$F(P(t)) = \{F_u(I',t) \mid I' \in P(t)\},\quad \forall\, Fa(t) < \varepsilon,\ Tr_f(t) < \Psi \tag{7}$$
$$N_f(t) = 0 \tag{8}$$
Clearly, in this phase the human does not feel tired and the surrogate model cannot yet reflect human preference accurately. This situation typically appears at the beginning of evolution, so this evaluation mode is usually adopted in the early stage of evolution.
Phase II: Population is mixed evaluated by human and surrogate model
In this phase, the surrogate model is started up. Some individuals are evaluated by the human and the fitness values of the others are calculated by the model. So the evaluation mode and the number of individuals evaluated by the model are as follows.
$$F(P(t)) = \{F_m(I,t),\ F_u(I',t) \mid I \neq I',\ I, I' \in P(t)\},\quad \forall\, Fa(t) < \varepsilon,\ Tr_f(t) \geq \Psi \tag{9}$$
$$N_f(t) = \Big\lfloor\, |P|\,Fa(t)\,\big(1 - e^{-Tr_f(t)}\big) \Big\rfloor \tag{10}$$
In this phase, the degree of human fatigue does not exceed the threshold and the surrogate model has learned human preference accurately, so the number of individuals evaluated by the model increases. In the above two phases, the population size is fixed and small because the human participates in the evaluation process. In general, the population size in IGAs is less than ten in order to alleviate human visual fatigue.
Phase III: Population is evaluated by surrogate model only
In this phase, all individuals are evaluated by the surrogate model. So the evaluation mode is as follows.
$$F(P(t)) = \{F_m(I,t) \mid I \in P(t)\},\quad \forall\, Fa(t) \geq \varepsilon \tag{11}$$
This evaluation mode is adopted in the later stage of evolution, when the human often feels very tired. Because evaluation based on the surrogate model is done by computer, the evaluation process in this phase is the same as in traditional genetic algorithms, so the population size can be enlarged. How far to extend the population size is a key problem. The higher the precision of the surrogate model, the better its generalization, and the larger the population size can be. So the enlarged population size is defined as follows.
$$N_p(t) = |P|^{\left\lfloor \frac{1}{Tr_f(t_0)\,F_{\max}} + 0.5 \right\rfloor} \tag{12}$$
where Fmax denotes the upper limit of fitness and Trf(t0) expresses the reliability of the surrogate model when the evaluation strategy of Phase III is adopted. The value of the exponent in formula (12) may be 1, 2 or 3, so the population size may be enlarged to the corresponding multiple of |P|.
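The three phases and Eqs. (5) to (12) can be combined into a single decision step per generation. The sketch below is a hedged illustration rather than the authors' code. It assumes that formula (12) reads as |P| raised to the floor of 1/(Trf(t0)·Fmax) + 0.5, that the reliability passed in during Phase III is Trf(t0), and that Fmax defaults to 1.0. The lower bound of 1 on the exponent is an extra guard added here, and the function names are hypothetical.

```python
import math


def substitution_proportion(fa, trf):
    """rho(t), Eq. (5): share of the population handed to the surrogate model."""
    return fa * (1.0 - math.exp(-trf))


def evaluation_plan(fa, trf, pop_size, eps=0.7, psi=0.7, f_max=1.0):
    """Return (phase, individuals evaluated by the model, population size)."""
    if fa >= eps:
        # Phase III, Eq. (11): the model evaluates everyone and the
        # population is enlarged following the assumed reading of Eq. (12).
        exponent = math.floor(1.0 / (trf * f_max) + 0.5)
        exponent = max(1, exponent)  # guard added for this sketch only
        enlarged = pop_size ** exponent
        return "III", enlarged, enlarged
    if trf >= psi:
        # Phase II, Eqs. (9)-(10): mixed evaluation.
        n_model = math.floor(pop_size * substitution_proportion(fa, trf))
        return "II", n_model, pop_size
    # Phase I, Eqs. (7)-(8): the human evaluates the whole population.
    return "I", 0, pop_size
```

For example, with eight individuals, `evaluation_plan(0.5, 1.0, 8)` falls in Phase II and hands two individuals to the model, since 8 · 0.5 · (1 − e⁻¹) is about 2.5.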
3 Simulations and Analysis
3.1 Background for Simulations
In this paper, a fashion evolutionary design system is adopted as a typical background to validate the rationality of the adaptive evaluation strategy. The goal of the system is to find a dress that wins the favor of the human [8]. Visual Basic 6.0 is used as the programming tool for the human-machine interface and Microsoft Access as the database. Matlab 6.5 is adopted to train the surrogate model, which is based on artificial neural networks. In the fashion evolutionary design system, each dress is composed of a collar, a skirt and sleeves. Each part has two factors, pattern and color, each described by two bits. So each dress is expressed by 12 bits, which act as 6 gene-meaning-units (GM-units) [9]. Each gene-meaning-unit has four alleles. The meanings of the alleles of each gene-meaning-unit are shown in Table 1.
Table 1. The meanings of each allele in gene-meaning-unit
GM-unit             code 00          code 01          code 10          code 11
collar’s pattern    medium collar    high collar      wide collar      gallus
sleeve’s pattern    long sleeve      medium sleeve    short sleeve     nonsleeve
skirt’s pattern     long skirt       formal skirt     medium skirt     short skirt
color               pink             blue             black            white
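The encoding in Table 1 can be illustrated with a small decoder. This is a sketch under stated assumptions: the text does not give the order of the six GM-units inside the 12-bit string, so the layout used here (pattern then color for collar, skirt and sleeve, in that order) is hypothetical, as are the names `PART_ORDER` and `decode_dress`.

```python
# Allele meanings copied from Table 1.
PATTERNS = {
    "collar": {"00": "medium collar", "01": "high collar",
               "10": "wide collar", "11": "gallus"},
    "sleeve": {"00": "long sleeve", "01": "medium sleeve",
               "10": "short sleeve", "11": "nonsleeve"},
    "skirt": {"00": "long skirt", "01": "formal skirt",
              "10": "medium skirt", "11": "short skirt"},
}
COLORS = {"00": "pink", "01": "blue", "10": "black", "11": "white"}

# Assumed layout: (pattern, color) pairs for each part, in this order.
PART_ORDER = ("collar", "skirt", "sleeve")


def decode_dress(bits):
    """Split a 12-bit string into 6 two-bit GM-units and look up their meanings."""
    if len(bits) != 12 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 12-character string of 0s and 1s")
    units = [bits[i:i + 2] for i in range(0, 12, 2)]
    dress = {}
    for k, part in enumerate(PART_ORDER):
        dress[part] = {
            "pattern": PATTERNS[part][units[2 * k]],
            "color": COLORS[units[2 * k + 1]],
        }
    return dress
```

Under this layout, a string such as "110111011001" would decode to an all-blue dress with a gallus collar, a short skirt and short sleeves.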
3.2 Desired Objectives and Parameters in Experiments
In order to validate the rationality of the adaptive evaluation strategy and its influence on the performance of IGAs, two groups of experiments are designed. They have different desired objectives, which reflect different psychological requirements of the human. The desired objectives are as follows.
Experiment I: To find a favorite dress fitting for summer, without any limit on color.
Experiment II: To find a favorite dress fitting for summer whose color is blue.
In both experiments, an artificial neural network is adopted as the surrogate model. The values of the parameters of the model and of the evolution are shown in Table 2.
3.3 Analysis of Performance About Adaptive Evaluation Strategy
In order to validate the rationality of IGAs with the adaptive evaluation strategy (AES-IGAs), 30 persons are gathered to do two groups of experiments aimed at the desired objective of Experiment II.
Table 2. The values of parameters
Parameters about the evolution
crossover probability   mutation probability   population size   generation   ε     ψ
0.5                     0.01                   8                 40           0.7   0.7
Parameters about the model
input neurons   hidden neurons   output neurons   learning rate   epochs   error
6               15               1                0.09            15000    10^-2
Group I: Comparison of different proportion of population evaluated by surrogate model
Fixed and adaptive proportions of the population evaluated by the surrogate model are adopted in the experiments respectively. The testing results of all persons are integrated, as shown in Table 3.
Table 3. Comparison of the performance by different proportion of population
proportion of population          average generation   average number of individuals evaluated by human
ρ(t) = 0.5                        16                   100
ρ(t) = 1                          14                   72
ρ(t) = Fa(t)(1 − e^(−Trf(t)))     12                   52
The difference in average generation among the different proportions is small, but the difference in the average number of individuals evaluated by the human is large. First, if ρ(t) = 0.5, the number of individuals evaluated by the human equals half of the population size once the condition of the startup mechanism is satisfied, so the human has to keep evaluating individuals throughout. The human therefore feels more tired with this evaluation strategy than with the others. Second, if ρ(t) = 1, all individuals are evaluated by the surrogate model once the human feels tired and the model can reflect human preference accurately. Although the human evaluates fewer individuals than with the first strategy, the model is started later than in the adaptive evaluation strategy, so the number of individuals evaluated by the human is larger than with the adaptive strategy.
Group II: Comparison of different population size in Phase III
Fixed and enlarged population sizes are adopted in the experiments respectively. The testing results are shown in Table 4.
Table 4. Comparison of performance by different population size in Phase III
population size                       average generation   average number of individuals evaluated by human
|P|                                   13                   52
|P|^⌊1/(Trf(t0) Fmax) + 0.5⌋          12                   52
Different population sizes in Phase III do not influence the degree of human fatigue in evaluation, because the average number of individuals evaluated by the human is the same. But convergence with the enlarged population size is faster than with the fixed population size, because the exploration of the algorithm is better when the population size is larger.
3.4 Comparison of Performance About IGAs
In order to validate the improvement in performance of IGAs with the adaptive evaluation strategy, 30 persons are gathered. Each person does four experiments: Experiment I with IGA and with AES-IGA, and Experiment II with IGA and with AES-IGA. For each experiment, the testing results of all persons are integrated, as shown in Table 5.
Table 5. Comparison of performance with IGAs and AES-IGAs
Experiment                                                               I      I         II     II
Evaluation strategy                                                      IGA    AES-IGA   IGA    AES-IGA
Average generation                                                       28     9         40     12
Average number of individuals evaluated by human                         224    41        240    52
Average number of individuals evaluated by human in each generation      8      4         8      4
Generation when Fa(t) ≥ ε                                                -      7         -      9
Comparing the testing results in Table 5, AES-IGA reduces the number of generations by 68.9% relative to IGA, averaged over the two experiments, and reduces the total number of individuals evaluated by the human by 80% on average. These results indicate that the adaptive evaluation strategy can effectively alleviate human fatigue and speed up convergence, which reduces the human burden of evaluation and lets the human concentrate on more creative design work.
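The percentages quoted above appear to match the Table 5 figures when the per-experiment reductions are averaged over Experiments I and II. The short check below recomputes them; the dictionary layout and the function names are illustrative only, while the numbers are copied from Table 5.

```python
# Figures copied from Table 5.
TABLE_5 = {
    "I": {"IGA": {"generations": 28, "evaluated": 224},
          "AES-IGA": {"generations": 9, "evaluated": 41}},
    "II": {"IGA": {"generations": 40, "evaluated": 240},
           "AES-IGA": {"generations": 12, "evaluated": 52}},
}


def reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


def mean_reduction(key):
    """Average the per-experiment reduction of AES-IGA against IGA."""
    values = [
        reduction(row["IGA"][key], row["AES-IGA"][key])
        for row in TABLE_5.values()
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"generations: {mean_reduction('generations'):.1%}")
    print(f"individuals evaluated by human: {mean_reduction('evaluated'):.1%}")
```

Taken separately, the reductions in the number of generations are 19/28 for Experiment I and 28/40 for Experiment II, so the quoted figure sits between the two.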
Comparing the testing results of the two experiments, the generation at which Fa(t) ≥ ε is reached is lower in Experiment I than in Experiment II. This means that the human tires more easily when attending to more gene-meaning-units, which matches the physiological rules of the human.
4 Conclusion
In order to further alleviate human fatigue in interactive genetic algorithms, a novel adaptive evaluation strategy with a variable substitution proportion is proposed. A startup mechanism for the surrogate model is given, which considers the degree of human fatigue and the evaluation precision of the model. A variable proportion of the population evaluated by the surrogate model is proposed. Three phases are distinguished according to the number of individuals evaluated by the human in each generation: evaluation by the human only, mixed evaluation by the human and the model, and evaluation by the model only. In the third phase, the population size is enlarged. Taking a fashion evolutionary design system as a testing platform, the adaptive evaluation strategy is validated against different psychological requirements of the human. Compared with IGAs that use a fixed proportion of the population evaluated by the surrogate model or a fixed population size, AES-IGAs with the adaptive evaluation strategy proposed in this paper converge faster and leave the human less tired. Compared with canonical IGAs, AES-IGAs effectively alleviate human fatigue and improve the speed of convergence. A surrogate model based on distributed neural networks is the subject of future research.
Acknowledgements. This work was supported by the National Postdoctoral Science Foundation of China under grant 2005037225, the Postdoctoral Science Foundation of Jiangsu under grant 2004300, and the Youth Science Foundation of CUMT under grant OC 4465.
References
1. Biles, J.A., Anderson, P.G., Loggi, L.W.: Neural Network Fitness Functions for A Musical IGA. In: Proc. of the Symposium on Intelligent Industrial Automation & Soft Computing, pp. 39–44 (1996)
2. Takagi, H.: Interactive Evolutionary Computation: System Optimization Based on Human Subjective Evolution. In: Proc. of IEEE Conference on Intelligent Engineering System, pp. 1–6 (1998)
3. Zhou, Y., Gong, D.W., Hao, G.S., et al.: Neural Network Based Phase Estimation of Individual Fitness in Interactive Genetic Algorithm. Control and Decision 20, 234–236 (2005)
4. Wang, S.F., Wang, S.H., Wang, X.F.: Improved Interactive Genetic Algorithm Incorporating with SVM and Its Application. Journal of Data Acquisition & Processing 18, 429–433 (2003)
5. Lee, J.Y., Cho, S.B.: Sparse Fitness Evaluation for Reducing User Burden in Interactive Genetic Algorithm. In: Proc. of IEEE International Fuzzy Systems, pp. 998–1003 (1999)
6. Sugimoto, F., Yoneyama, M.: An Evaluation of Hybrid Fitness Assignment Strategy in Interactive Genetic Algorithm. In: 5th Workshop on Intelligent & Evolutionary Systems, pp. 62–69 (2001)
7. Guo, Y.N., Cheng, J., Dun, W.G.: Knowledge-inducing Interactive Genetic Algorithms Based on Multi-agent. In: Jiao, L., Wang, L., Gao, X., Liu, J., Wu, F. (eds.) ICNC 2006. LNCS, vol. 4221, pp. 769–779. Springer, Heidelberg (2006)
8. Kim, H., Cho, S.B.: Application of Interactive Genetic Algorithm to Fashion Design. Engineering Applications of Artificial Intelligence 13, 635–644 (2000)
9. Hao, G.S., Gong, D.W., Shi, Y.Q.: Interactive Genetic Algorithm Based on Landscape of Satisfaction and Taboos. Journal of China University of Mining & Technology 34, 204–208 (2005)
A Study on the Improving Product Usability Applying the Kano’s Model of Customer Satisfaction
Jeongyun Heo, Sanhyun Park, and Chiwon Song
MC R&D Center, LG Electronics Inc, Kasan-Dong, KumChon-Ku, Seoul, Korea
{jy_heo,sanghyun,chiwon}@lge.com
Abstract. User-centeredness is a popular approach for achieving users’ satisfaction. Nevertheless, when profit optimization under economic efficiency and a limited development period is considered, it is almost impossible to apply solutions to all the usability problems reported during testing. A strategic approach is therefore required to maximize the perceived usability under these limited circumstances. Physical User Interaction (PUI) is defined as the physical side of usability and as a broader concept of usability. In this research, we constructed UI guidelines for the PUI of mobile phones that reflect the user’s value. The research applied Kano’s model of customer satisfaction to classify the PUI guidelines into two groups. One is the design standards that must be satisfied to guarantee minimum satisfaction. The other is the value-adding criteria for holding a dominant position over competing products. With this categorization, the PUI design guidelines can be used not only for evaluating current product quality, but also for finding the direction of strategic value improvement.
Keywords: PUI (Physical User Interaction), Customer satisfaction, Classification of usability problems, Perceived usability, Kano’s model of customer satisfaction.
Physical User Interaction (PUI) is defined as the physical side of usability and as a broader concept of usability. PUI issues come not only from the ergonomic issues that are easy to think of, but also from the usage experience of similar devices, emotional preferences, and the usage context of each function. Imagine a camera phone that needs many clicks to take a photo because it has no quick-access key for the camera mode: you probably would not be satisfied with it. Likewise, if the buttons are too hard to press, the perceived usability will be bad no matter how high the usability of the other features is. Research based on real industrial cases has reported that usability evaluation and improvement activity that does not consider users’ value does not lead to product value enhancement in the real market. The main reason seems to be that evaluation during the product development phase usually focuses on detecting dissatisfaction factors and improving the detected issues. Moreover, the internal development environment does not allow all detected issues to be improved, because of economic constraints such as maximizing ROI (Return on Investment) or limits on development time. In other words, applying fixes for all the issues found is almost impossible. That is why a strategic approach, such as addressing issues by priority, is needed. The priority of usability issues is usually decided by considering the severity of the issue itself, and most of the gap between evaluated usability and user-perceived usability comes from this difference in priority. To enhance perceived usability, we should find a way to reflect users’ value in the priority of usability issues. Kano’s method (1984) offers a reasonable approach to reflecting users’ value and understanding customer-defined quality. He reveals the relation between customer satisfaction and product requirements. Furthermore, he characterizes the product requirements that influence customer satisfaction into three groups: must-be requirements, attractive requirements, and one-dimensional requirements. A must-be requirement is mandatory; without it users cannot be satisfied at all. An attractive requirement is optional: if it is provided, users may be attracted by the product, but without it users may not feel any inconvenience. A one-dimensional requirement is functionally related to users’ satisfaction: if it is not provided, users may be dissatisfied, and if it is provided, users may be satisfied. In this research, we applied Kano’s model to constructing UI guidelines for the PUI of mobile phones. Classifying usability issues according to their potential effect is the starting point of this research. We then applied Kano’s model of customer satisfaction to prioritize the issues in a way that reflects users’ value. This priority is the basis of strategic improvement. The research categorizes the UI guidelines into two groups. One is the design standards that must be satisfied. The other is the comparison criteria for holding a dominant position over competing products. In this way the UI design guidelines may be applied not only to evaluating product quality, but also to indicating the direction of value improvement. Furthermore, the use of UI design guidelines may expand from quality control to user satisfaction.
2 Kano Model: The Theory of Attractive Quality
Kano’s model (1984) is based on the two-factor theory of job satisfaction by Herzberg (1974), which suggests that the factors causing job satisfaction are different from the factors causing job dissatisfaction. According to Kano, quality cannot be explained by a one-dimensional view. For instance, people are very dissatisfied if they cannot make a call, but they are not especially satisfied if they can. The one-dimensional view of quality cannot explain this case. Kano et al. (1984) introduced a model that categorizes quality attributes based on customers’ satisfaction with the level of quality. This view is useful for understanding how customers evaluate a product. Kano defined customer expectations for product quality at five levels: 1) must-be, 2) one-dimensional, 3) attractive, 4) indifferent, and 5) reverse. Must-be quality is the minimum requirement for avoiding customer dissatisfaction. It is also referred to as the must-have level. One-dimensional quality is one-dimensional with respect to quality and satisfaction: users’ level of satisfaction is proportional to the quality provided. Attractive quality is the opposite of the basic quality. Although the absence of attractive quality does not cause dissatisfaction, these features can excite and delight users if provided. Indifferent quality is quality that results in neither customer satisfaction nor customer dissatisfaction. Although the Kano model explains the relationship between product quality and users’ satisfaction, it is also applicable to the relationship between usability and users’ satisfaction (Jolka, 2005). Fig. 1 shows the applied Kano model.
Fig. 1. Kano Model applied on the usability domain
We defined the characteristics of utility for users’ satisfaction at three levels: 1) basic, 2) opportunity, and 3) attractive. The terms are changed for clarity, and their meanings are almost the same as explained above. While a product is being developed, not all of the guidelines can be applied, because of environmental constraints such as time and resources. Even worse, there may be conflicts among guidelines, so we have to decide which ones to apply. By adopting Kano’s model of customer satisfaction, we can classify the PUI guidelines by priority, which helps in finding the direction of strategic value improvement.
3 Constructing PUI Guideline Using Kano’s Model
A guideline is a useful tool for an organization to systematically improve and monitor the usability of a product. A guideline can be structured by grouping the cases of PUI issues and then revising them against the applicable design principles. This section introduces the details of the suggested PUI guideline.
3.1 Constructing PUI Guidelines
A total of 106 design evaluation items were drawn up, based on design principles obtained from the usability problems reported for commercialized products. Some of these evaluation items are presented in the Appendix.
3.2 Adopting Kano’s Model to Categorize the Collected PUI Guidelines
Kano’s survey model asks for the users’ preferences both in the case where a feature is provided and in the case where it is not. The organized evaluation items were surveyed with sixty users. Gender was not considered in selecting these users, and the average age is between twenty-four and thirty-one. The Kano survey was designed as follows.
Fig. 2. A pair of requirement questions in a Kano questionnaire
The results can be classified into three categories according to the Kano’s evaluation standard shown in Fig. 3 below.
Fig. 3. Kano evaluation table adapted from Berger et al.(1993)
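The evaluation table of Fig. 3 is not reproduced in the text. As an illustration, the sketch below encodes the evaluation table commonly given for Kano questionnaires following Berger et al. (1993), on the assumption that Fig. 3 matches it. The answer labels are a hypothetical wording of the usual five-point scale, and the letter M stands for the must-be category.

```python
# Answers to the functional question (rows) and the dysfunctional question (columns).
ANSWERS = ("like", "must-be", "neutral", "live-with", "dislike")

# A = attractive, O = one-dimensional, M = must-be,
# I = indifferent, R = reverse, Q = questionable.
EVALUATION_TABLE = (
    "QAAAO",  # functional answer: like
    "RIIIM",  # functional answer: must-be
    "RIIIM",  # functional answer: neutral
    "RIIIM",  # functional answer: live-with
    "RRRRQ",  # functional answer: dislike
)


def classify_pair(functional, dysfunctional):
    """Map one respondent's pair of answers to a Kano category letter."""
    row = ANSWERS.index(functional)
    col = ANSWERS.index(dysfunctional)
    return EVALUATION_TABLE[row][col]
```

For instance, a respondent who likes having a feature and dislikes its absence yields `classify_pair("like", "dislike") == "O"`, a one-dimensional response.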
Kano’s classification defines the representative property as the one that receives the most votes. However, this method hardly reflects the differences between the preferences of individual users. To compensate, Berger (1993) proposed users’ satisfaction coefficients. However, if most of the responses are irrelevant ones such as indifferent, reverse or questionable, another index is needed for selecting meaningful properties. We propose an effective coefficient in order to separate effective responses from those that are not. The three coefficients are defined as follows, where A, O and E denote the numbers of attractive, one-dimensional and must-be responses:
Users’ satisfaction coefficient = (A + O) / Total responses   (1)
Users’ dissatisfaction coefficient = (O + E) / Total responses   (2)
Effective coefficient = (A + O + E) / Total responses   (3)
The user satisfaction coefficient is the ratio of positive responses. “One-dimensional” and “Attractive” are the positive responses because they are directly proportional to the increase in usability. The user dissatisfaction coefficient is defined as the ratio of negative responses. “One-dimensional” and “Must-be” are the negative responses because they tend to decrease users’ satisfaction when usability is inferior. Here, the total response is defined as the sum of the four kinds of response, excluding the irrelevant responses, which seem discreditable, and the questionable responses. The effective coefficient is defined as the ratio of meaningful properties: “Must-be,” “One-dimensional,” and “Attractive” are the elements that directly affect the user. In this research, we treat a property with an effective coefficient of 0.65 or higher as valid. Fig. 4 plots the valid properties, showing the relationship between their user satisfaction and dissatisfaction coefficients.
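A minimal sketch of coefficients (1) to (3) and of the 0.65 validity filter follows. Several points are assumptions of this sketch rather than statements of the paper: the total is taken as A + O + E + I (reverse and questionable responses excluded), satisfaction is plotted on the horizontal axis and dissatisfaction on the vertical axis, and the quadrants are split at 0.5. The function names are hypothetical.

```python
def kano_coefficients(counts):
    """Compute coefficients (1)-(3) from response counts keyed A, O, E, I."""
    a, o, e, i = (counts.get(key, 0) for key in ("A", "O", "E", "I"))
    total = a + o + e + i  # assumed total: reverse and questionable left out
    if total == 0:
        raise ValueError("no usable responses")
    return {
        "satisfaction": (a + o) / total,
        "dissatisfaction": (o + e) / total,
        "effective": (a + o + e) / total,
    }


def classify_property(counts, threshold=0.65, cut=0.5):
    """Return the quadrant label of a valid property, or None if it is filtered out."""
    c = kano_coefficients(counts)
    if c["effective"] < threshold:
        return None
    high_sat = c["satisfaction"] >= cut      # assumed horizontal axis
    high_dis = c["dissatisfaction"] >= cut   # assumed vertical axis
    if high_sat and high_dis:
        return "one-dimensional"  # first quadrant
    if high_dis:
        return "must-be"          # second quadrant
    if high_sat:
        return "attractive"       # fourth quadrant
    return "indifferent"          # third quadrant
```

With sixty respondents, counts of A = 5, O = 5, E = 40 and I = 10 would give an effective coefficient of about 0.83 and place the property in the must-be quadrant under these assumptions.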
Fig. 4. Three Utilities Classified Using Satisfaction Coefficients
According to Fig. 4, the “One-dimensional” properties are positioned in the first quadrant, “Must-be” in the second quadrant, and “Attractive” in the fourth quadrant. The properties in the third quadrant are the indifferent properties according to Kano’s model.
3.3 Strategic Use of Proposed PUI Guidelines
Properties related to fundamental quality are defined as “Basic Utility.” Basic Utility includes the screen resolution and size when watching DMB (Digital Media Broadcasting), or key settings that prevent user errors such as an outside interruption or unintended input. Emotional satisfaction factors relating to cell-phone design and images, such as a well-taken picture, are defined as “Opportunity Utility.” This categorization explains well that users’ expectations become higher as a cell phone’s functions and quality get better. Unexpectedly, however, “Attractive Utility” includes hardly any items, because the items used in the survey were extracted from preexisting usability problems and design principles. Likewise, improvement based on preexisting usability evaluation can rarely bring about an epoch-making reform of the product or of customers’ satisfaction. Based on the Kano model’s three types of property, the improvement strategy can be described as follows. Basic utility is defined as a product usability standard that must be applied to the product; otherwise the product would fail in the market because of users’ complaints. Opportunity utility is used as a comparative evaluation standard, because products need marketability in order to compete with those of other companies. The diagram below shows the suggested strategy for effectively improving the usability of a product.
Fig. 5. Strategy for Improving Usability considering the characteristic of Utility
In particular, since these properties are independent of one another, Basic Utility must be applied to the product, because users become dissatisfied when it is not fulfilled (Jokela, 2004). Moreover, this cannot be compensated for by adding Attractive Utility. The property classification also depends on time, so an Attractive Utility may become a Basic Utility after some years. This trend tends to appear in products with a short development period, such as cell phones; therefore, constant usability testing is necessary in order to identify new kinds of Attractive Utility and to apply them to the product.
4 Conclusion

In this research, we applied Kano’s model to the construction of UI guidelines for the PUI (Physical User Interaction) of mobile phones. PUI can be seen as the physical side of usability and as a broader view of usability. Moreover, PUI seems to be the most influential aspect of users’ satisfaction. PUI usability issues should be taken into account from the concept phase of the product, considering the characteristics of each issue. As the development process goes on, the feasible range of physical design changes shrinks rapidly. Guidelines are a useful tool for an organization to systematically improve and monitor the usability of a product. Classifying usability issues by their potential effect was the starting point of this research. We then applied Kano’s model of customer satisfaction to prioritize the issues according to users’ values. This priority is the basis of strategic improvement. This research categorizes the UI guidelines into two groups: design standards that must be satisfied, and comparison criteria for holding a dominant position over competing products. UI design guidelines may thus be applied not only to evaluating product quality but also to indicating the direction of value improvement. Furthermore, the use of UI design guidelines may expand from quality control to user satisfaction. The benefits of applying Kano’s model to the design process are summarized as follows:

1) The characteristics and criteria of a product that affect user satisfaction can be revealed. In addition, potential elements that users usually do not describe explicitly can be understood.
2) It helps to identify the effect of the designer’s intention on users’ satisfaction.
3) The categorization of characteristics may be used as a criterion for decision making. Especially when development resources are limited, the proposed categorization may be used to decide which characteristics to focus on.
4) It is easy to apply with a large number of users, whereas most common methods of collecting users’ needs, such as focus group interviews, are applicable only with small numbers of users.
References

1. Berger, C., Blauth, R., Borger, D., Bolster, C., Burchill, G., DuMouchel, W., Pouliot, F., Richer, R., Rubinoff, A., Shen, D., Timko, M., Walden, D.: Kano’s methods for understanding customer-defined quality. The Center for Quality Management Journal 2(4) (1993)
2. Bevan, N.: Quality in use: meeting user needs for quality. Journal of Systems and Software 49(1), 89–96 (1999)
3. ChiWon, S., JeongYun, H., SangHyun, P.: Evaluating elements of the physical user experience (usability) of mobile device. In: Proceedings of HCI2006, Korea (2006)
4. ChiWon, S., JeongYun, H., SangHyun, P.: Classifying emotional elements of mobile devices to evaluate physical interface usability. In: Proceedings of Korean Society for Emotion and Sensibility 2006, Korea (2006)
5. Herzberg, F.: Work and the Nature of Man (1974)
6. JeongYun, H., SangHyun, P., ChiWon, S.: A Study of Improving Product Usability Based on the Classification of Usability Problems Considering Users’ Satisfaction
7. Jokela, T.: When Good Things Happen to Bad Products: Where are the Benefits of Usability in the Consumer Appliance Market? Interactions, pp. 29–35 (2004)
8. Lofgren, M., Witell, L.: Kano’s Theory of Attractive Quality and Packaging. Quality Management Journal 12(3), 7–20 (2005)
9. Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality. The Journal of the Japanese Society for Quality Control 14(2), 39–48 (1984)
10. Matzler, K., Hinterhuber, H.H.: How to make product development projects more successful by integrating Kano’s model of customer satisfaction into quality function deployment. Technovation 18(1), 25–38 (1998)
11. Zhang, P., von Dran, G.M.: Satisfiers and Dissatisfiers: A Two-Factor Model for Website Design and Evaluation. Journal of the American Society for Information Science 51(14), 1253–1268 (2000)
Appendix: Part of PUI Guideline with Kano Classification

Part of the PUI guideline, consisting of the items that are valid according to the effective coefficient, is provided. The classification obtained by applying Kano’s model, together with the users’ satisfaction and dissatisfaction coefficients, is also included.

Table 1. Classification example of PUI guideline
The Practices of Usability Analysis to Wireless Facility Controller for Conference Room

Ding Hau Huang, You Zhao Liang, and Wen Ko Chiou

259 Wen-Hwa 1st Road, Kwei-Shan, Tao-Yuan, Taiwan, 333, R.O.C.
Chang Gung University
[email protected]
Abstract. Advantageous technical facilities and automated systems are increasingly visible in business conference rooms. One of the most advantageous media between the central system and its users is the wireless facility controller, which is expected to bring individuals more convenience and efficiency by helping them control many kinds of media. This paper discusses ‘usability analysis’ with a ‘scenario-based’ approach to ‘user-oriented’ design concepts early in the product design process, through a practical case study of the controller. This study suggests a practical approach to scenario and usability analysis through a simple, structured framework. The framework is outlined by three major components: the design strategy derived from analyzing competitors’ products with a scenario-based approach, with user, product, applications, and field of use as context variables; usability analysis with product interaction; and user observation with existing problems.

Keywords: Wireless facility controller, User-oriented design, Usability, Interaction, Innovation.
interface’ to control these controllable appliances, and when there are many different kinds of appliances to control, the user interface of the controlling device might be very challenging to use. Changes in the infrastructure may require software or even hardware changes to the user interface as well; therefore, controlling all of them efficiently requires much thought about user interface and usability issues [4]. Moreover, an important function of a smart business conference room is to help document meetings, i.e., to capture and index the various activities that occur during meetings, presentations, and teleconferences. Other functions include controlling room equipment and ambience, managing media streams, and providing networked electronic whiteboards and note-taking devices [1]. All of the above are highly interactive, user-oriented activities. If a universal wireless facility controller is to integrate all of these functions, then how to ‘understand’ the context of people interacting with them becomes a very important issue. A ‘scenario’-based approach is useful for stimulating user-centered ideation, illustrating issues, evaluating design ideas from a user's point of view, and showing the role of a product in a larger context of use. All these are important activities in establishing a common user-centered focus, particularly in the earliest phases of product design. Scenario building provides the human factors professional with an elective additional method for exploring, prototyping, and communicating human factors issues within a product design context [5]. Lim & Sato [3] created a method for generating scenarios through the use of aspect models based on the design information framework (DIF) structure. With this scenario generation technique, designers can effectively analyze complex use situations through multiple aspects and identify problems and requirements that lead to further design problem solving. Scenarios that clearly embed rationales for solutions become valuable reference sources of use-context information throughout the design process. The DIF was developed to enable designers to organize and manipulate information throughout a design process. In a design process, templates for archiving information into a DIF-structured database can be generated, and all types of design information, such as user study data, design concepts, models, scenarios, and prototype models, are then structured by those templates (see Fig. 1) [2].
Fig. 1. Design information framework [2]
During this research, we collaborated with ADVANTECH Co. Ltd, Taiwan, which wishes to develop the wireless facility controller. ADVANTECH is a leader in the industrial computing and automation market and has more than 20 years of experience [6]. It covers the full range of integrated solutions, from industrial automation to medical computing to home automation. Nevertheless, its engineering and marketing departments are concerned about user-oriented design (UOD) approaches that fit the new interactive generation. Thus, this paper discusses usability analysis with a scenario-based approach to UOD concepts early in the product design process through a practical case study. It describes some advantages and potential pitfalls of using scenario and usability analysis, and provides examples of how innovative concepts are developed and can be usefully applied.
2 Methods

This research took the form of a case study of an ongoing concept development process for a new interactive product: a wireless facility controller for conference rooms. The wireless facility controller is an interactive device with a wireless touch panel and fast operation keys that can control other automated devices. We applied usability analysis and a scenario-based approach to UOD concepts to create an innovative design concept, and synthesized the processes into an interactive product development framework. The detailed process was as follows. First, competitors’ products were analyzed to elicit all raw information, such as product functions, applications, user groups, and fields of use. Second, a deeper analysis was conducted using a scenario-based approach, which covered product functions to applications, field of use to scenarios, and scenario synthesis. The third stage was product usability analysis to define the product position, and the fourth was user observation. After appraising all the information, we reviewed all the processes to establish an interactive product development framework.

2.1 Monitored Competitor Offering

Competitors’ product analysis: The two most famous companies and worldwide leaders in advanced control and automation systems were chosen to elicit all raw information. The first, CRESTRON, is the world's leading manufacturer of advanced control and automation systems, innovating technology and reinventing the way people live and work. Offering integrated solutions to control audio, video, computer, IP, and environmental systems, CRESTRON streamlines technology, improving the quality of life for people in corporate boardrooms, conference rooms, classrooms, auditoriums, and their homes [8]. The second, AMX, is a worldwide leader in advanced control and automation technology for commercial and residential markets. The company’s hardware and software products simplify the way people interact with technology. This includes making it easier for system integrators to sell, program, and install AMX products (ranging from touch panels, keypads, and handheld remotes to customizable resource management tools), as well as making the overall AMX end-user experience intuitive and simple [7].
During this stage, a product ‘tree map’ was first used to identify all the related devices, to give an overview of the whole product line, and to decide which product types to focus on. All the raw product information, such as company, product type, prices, specifications, fields of use, characteristics, product semantics, and related accessories, was then recorded on product analysis cards, from which a product information database was built. This database can serve as a strong information bank.

2.2 Scenario Analysis and Product Classification

This stage was separated into three parts. The first was ‘product functions to applications’, in order to find which functions were supplied in the main market. For example, after analyzing competitors’ products we could elicit all the applications, including lighting automation, AV control, light control, presentation tools, window treatments, voting systems, drapes, shades & screens, climate control, internet access, remote manager, network campuses, media manager, security & intercoms, central control, security systems, automated bell towers, record manager, and electronic menus. The second was ‘field of use to scenarios’, to understand which applications fit each field of use, such as business, whole home, home theater, government, education, house of worship, MDU, private transportation, entertainment, healthcare, broadcasting, network operation, and retail hotels, and to choose the main target among these fields of use. For this research we took the business conference as an example. The third part was ‘scenario synthesis’, to factor in ADVANTECH’s strategy and to identify the main scenarios, fields of use, and their applications.

2.3 Usability Analysis and Product Positioning

In this stage, seven related products from seven different companies were subjected to usability analysis to understand their basic specifications, hardware interfaces, and accessories. Finally, a radar chart was used to synthesize and compare the product characteristics. From this we could define the product position.

2.4 User Observation with Existing Problems

During this stage, researchers observed the ‘field situation’ in an assembly hall and a business conference room where the central control system and wireless facility controller were installed. We conducted direct ‘user observation’ to discover how users interacted with the system, and asked questions focused on existing problems.

2.5 Constructive Demo Design

After each stage, ‘rough’ design issues were synthesized, leading to several ‘demo’ designs. However, the key design issues could be gleaned only after all processes were completed, and a ‘constructive demo’ design was then developed accordingly.
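The product analysis card and product information database described in Section 2.1 could be represented as a simple record type. The sketch below is only one possible layout: the field names follow the attributes listed in the text, while the class name, the helper function, and the two sample entries are hypothetical and do not come from the study.

```python
from dataclasses import dataclass, field


@dataclass
class ProductAnalysisCard:
    """One card per competitor product (hypothetical layout)."""
    company: str
    product_type: str
    price: str
    specifications: dict = field(default_factory=dict)
    fields_of_use: list = field(default_factory=list)
    characteristics: list = field(default_factory=list)
    product_semantics: str = ""
    related_accessories: list = field(default_factory=list)


def cards_for_field(cards, field_of_use):
    """Return the cards whose fields of use include the given one."""
    return [card for card in cards if field_of_use in card.fields_of_use]


# Invented sample entries; they are NOT the products analyzed in the study.
database = [
    ProductAnalysisCard(
        company="Company A",
        product_type="wireless touch panel",
        price="unknown",
        fields_of_use=["business", "education"],
    ),
    ProductAnalysisCard(
        company="Company B",
        product_type="handheld remote",
        price="unknown",
        fields_of_use=["home theater"],
    ),
]

business_cards = cards_for_field(database, "business")
print([card.company for card in business_cards])
```

Keeping the fields of use on each card makes the later ‘field of use to scenarios’ step a simple filter over the database.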
3 Results

The main findings are discussed as follows: first, the findings from ‘scenario analysis’; second, the findings from ‘usability analysis’; and third, the findings from ‘user observation’. Synthesizing all the results, the new design concept could be accepted by the development team members, and the innovative results were better than those of the traditional method.

3.1 The Results of Competitor’s Product and Scenario Analysis

First, the findings from scenario analysis were modularization, standardization, and diversification (see Fig. 2).
Fig. 2. The application of different field of use
By analyzing ‘product functions to applications’, ‘field of use to scenarios’, and ‘scenarios to interaction’, we found that many applications had the same key functions, so we needed to design a main type that has these key functions but can still add functions to fit other fields of use. The main type must have one base cover, which is different from the front cover, and has different fast keys, styles, and textures. According to these results, the ‘demo design’ shown in Fig. 3 was refined from ADVANTECH’s UbiQ350 (a kind of wireless facility controller). In this stage we also applied the ‘scenario approach’ to simulate the main user target and the different styles of conference room (see Fig. 4). All users’ characteristics and space ‘styles’ can serve as references early in the design process.
Fig. 3. The demo concept from competitor’s product and scenario analysis
Fig. 4. Scenario analysis of main targets and styles
3.2 The Results of Usability Analysis

The product hardware interaction interface was analyzed as shown in Fig. 5. The product radar chart has eight dimensions: product functionality, number of hotkeys, ease of handling, shape, price, battery life, overall product size, and LCD capability. The main finding of the usability analysis with the radar chart was that the controller should retain its multi-function aspect while remaining convenient, so we separated the touch panel and the fast keys into different parts. The independent fast key part can be used conveniently, while advanced settings are supplied through the touch panel.

3.3 The Results of User Observation

User observation revealed an existing problem: the handling should be improved. The improvement should include advanced settings and accident prevention (see Fig. 6).
Fig. 5. The product position analysis
Fig. 6. User observation situation
3.4 The Results of Final Design

Synthesizing all the results, the new design concept could be accepted by the development team members, and the innovative results were better than those of the traditional method. Seven design issues were synthesized:

(1) Modularizing the hardware and software to easily fit different fields of use;
(2) Making the number and function of the fast keys changeable for different applications;
(3) Providing more functions while retaining the same convenience;
(4) Personality and safety settings;
(5) Easy handling;
(6) Preventing unexpected start;
(7) One-touch scenario pre-setting.
Based on all of the above, the final design is shown in Fig. 7.
Fig. 7. Final design
The touch panel and the fast keys form separate parts. The tube of the fast key part can be used conveniently, while advanced settings are supplied through the touch panel.

3.5 Design Framework

The user-oriented innovation design framework (UOIDF) was developed by means of a practical design case. The UOIDF enables designers to organize and manipulate user data and user-oriented information throughout a design process. In a design process, templates for archiving information into a UOIDF-structured database can be generated, and all types of design information, such as product functions, applications, fields of use, observations, and interactions, are then structured by those templates.

[Framework diagram. Labels: Product strategy; Product positioning; Competitor offering; Usability analysis; Scenario-based approach (user, product, applications, field of use); User interaction; Product improvement; User observation; Scenario-based approach]
4 Conclusion

This study suggests a practical approach to scenario and usability analysis through a simple, structured framework. The framework is outlined by three major components: the design strategy derived from analyzing competitors’ products with a scenario-based approach, with user, product, applications, and field of use as context variables; usability analysis with product interaction; and user observation with existing problems. Based on this framework, this study established methods to specify interactive product features, to define the development context, and to measure usability. The effectiveness of the framework was demonstrated through a case study in which the usability of an interactive product was developed using UOD concepts.
5 Implications

This study is expected to help product design practitioners in the consumer electronics industry in various ways. Most directly, it supports product development teams in planning and conducting the development of new concepts in a systematic and structured manner. In addition, it can be applied to other categories of interactive consumer products (such as appliances, automobiles, and communication devices) with minor modifications as necessary.
References

1. Chui, P., Wilcox, L.: Kumo Interactive: A Smart Conference Room. DARPA/NIST/NSF Workshop on Research Issues in Smart Computing Environments
2. Lim, Y., Sato, K.: Development of design information framework for interactive systems design. In: Proceedings of the 5th Asian International Symposium on Design Research, Seoul, Korea (2001)
3. Lim, Y., Sato, K.: Describing multiple aspects of use situation: applications of Design Information Framework (DIF) to scenario development. Design Studies 27(1) (2006)
4. Ritala, M., Tieranta, T., Vanhala, J.: Context Aware User Interface System for Smart Home Control. In: HOIT 2003 Conference, Irvine, California (2003)
5. Suri, J.F., Marsh, M.: Scenario building as an ergonomics method in consumer product design. Applied Ergonomics 31, 151–157 (2000)
6. ADVANTECH: http://www.advantech.com/
7. AMX: http://www.amx.com
8. CRESTRON: http://www.crestron.com/
What Makes Evaluators to Find More Usability Problems?: A Meta-analysis for Individual Detection Rates

Wonil Hwang1 and Gavriel Salvendy2

1 School of Industrial & Information System Engineering, Soongsil University, 511 Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, South Korea
2 School of Industrial Engineering, Purdue University, 315 N. Grant St., West Lafayette, IN 47906, USA, and Department of Industrial Engineering, Tsinghua University, Beijing 100084, P.R. China
[email protected], [email protected]
Abstract. Since many empirical results have been accumulated in usability evaluation research, it would be very useful to provide usability practitioners with generalized guidelines by analyzing the combined results. This study aims at estimating individual detection rate for user-based testing and heuristic evaluation through meta-analysis, and finding significant factors, which affect individual detection rates. Based on the results of 18 user-based testing and heuristic evaluation experiments, individual detection rates in user-based testing and heuristic evaluation were estimated as 0.36 and 0.14, respectively. Expertise and task type were found as significant factors to improve individual detection rate in heuristic evaluation.
In this situation, that is, a situation in which many empirical results have been accumulated from previous usability evaluation research, providing usability practitioners with generalized guidelines by analyzing the combined results of this body of work will be more useful than conducting research that involves all the factors. The objectives of this study are (a) to estimate individual detection rate for user-based testing and heuristic evaluation through meta-analysis, and (b) to find hidden factors, which affect individual detection rates.
2 Related Literature

The individual detection rate, which is the ratio of the number of usability problems found by an individual evaluator or test user to the number of real usability problems that exist, is an important measure in usability evaluation research, because it reflects the individual's ability to detect usability problems in a certain situation. If the individual detection rate can be estimated more reliably, the optimal sample size issue, one of the most disputed issues in usability evaluation research, can be resolved. When Nielsen [14] suggested the so-called ‘4±1’ or ‘magic number five’ rule, which states that only 3 to 5 evaluators are needed to detect 80% of usability problems with the heuristic evaluation method, the underlying assumption was that the mean probability of an evaluator detecting a problem (i.e., the mean individual detection rate) lay between 0.32 and 0.42. However, much empirical research has reported that the mean individual detection rate lies not in this range but in a much lower one. For example, Law and Hvannberg [10] reported a mean individual detection rate of 0.14 when user-based testing with the think-aloud method was employed, and Andre, Hartson and Williges [1] reported a mean individual detection rate of 0.179 when heuristic evaluation was used. Since the results of usability evaluation experiments could not support the underlying assumption of Nielsen’s conclusion, there have been many arguments about the optimal sample size and the conditions under which Nielsen’s conclusion can be supported. Thus, in order to reach a more generalized conclusion about the optimal sample size issue, we need a valid and reliable estimate of the individual detection rate from the accumulated results of usability evaluation experiments.
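The link between the individual detection rate and the number of evaluators can be made concrete with a short calculation. The sketch below assumes the commonly used model in which each evaluator detects a given problem independently with the same probability p, so that n evaluators together find a proportion 1 − (1 − p)^n of the problems. This model is not spelled out in the text and is an assumption here; the function name is illustrative.

```python
import math


def evaluators_needed(p: float, target: float = 0.80) -> int:
    """Smallest n with 1 - (1 - p)**n >= target.

    ASSUMPTION: evaluators detect problems independently with equal rate p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Detection rates quoted in the text: the range implied by the 3-5 rule,
# and the lower rates reported in [10] and [1].
for rate in (0.42, 0.32, 0.179, 0.14):
    print(rate, evaluators_needed(rate))
```

Under this model, rates of 0.42 and 0.32 give 3 and 5 evaluators, which matches the 3 to 5 range above, whereas rates of 0.179 and 0.14 would call for 9 and 11 evaluators to reach the same 80% target.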
In addition, if the means of individual detection rates are significantly heterogeneous, we need to find hidden factors that affect the variability of individual detection rates, in order to suggest usability evaluation conditions under which evaluators’ ability to detect usability problems can be improved.

There are many scientific ways to summarize, integrate, and interpret independent studies. One of them is meta-analysis, a statistical methodology for combining findings from a selected set of studies. Lipsey and Wilson [11] indicated three conditions under which meta-analysis can be applied. First, meta-analysis applies only to empirical studies that produce quantitative results. Second, meta-analysis is conducted on statistics summarizing the research results rather than on the original data sets. Third, meta-analysis aggregates and compares the results of independent studies that deal with the same constructs and relationships and report their results in similar statistical forms. Because of these limits on its application, meta-analysis has historically been applied to combine the results of studies that have been repeated, such as astronomical and physical experiments and social science research [4, 6]. During the 1990s, researchers tried to combine the results of software engineering studies using meta-analysis methods [3, 12]. When statistical information for estimating effect sizes, including means and standard deviations, can be obtained, parametric estimation models such as the fixed effects model are used [9]. The underlying idea of these models is to combine the effect size estimates computed from each study, weighting the studies by the inverse of their variance. When a statistical test for the homogeneity of effect sizes shows a non-significant result, there is no problem in combining the effect size estimates. Otherwise, researchers need parametric models that explain the variance of effect sizes (or study variability) among individual studies. In general, the fixed effects model is employed when a researcher believes that there are systematic sources, acting as potential moderators, that can explain the study variability. Hedges [8] explained the fixed effects model as an ANOVA analog model, which is used when the study characteristic variables (independent variables) are categorical.
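The inverse-variance combination and the homogeneity test described above can be written compactly in standard meta-analysis notation. The symbols are not defined in the text and are introduced here only for illustration: k studies report effect sizes \hat{\theta}_i with variances v_i, and in the ANOVA analog the studies are split into p subgroups indexed by j.

```latex
\[
w_i = \frac{1}{v_i}, \qquad
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \left( \hat{\theta}_i - \hat{\theta} \right)^2
\]
\[
Q = Q_B + Q_W, \qquad
Q_B = \sum_{j=1}^{p} w_{j\cdot} \left( \hat{\theta}_{j} - \hat{\theta} \right)^2, \qquad
Q_W = \sum_{j=1}^{p} \sum_{i \in j} w_i \left( \hat{\theta}_i - \hat{\theta}_{j} \right)^2
\]
```

Here \hat{\theta}_j is the weighted mean of subgroup j and w_{j\cdot} is the sum of its weights. Under homogeneity, Q is approximately chi-square distributed with k − 1 degrees of freedom, Q_B with p − 1, and Q_W with k − p. A categorical moderator is supported when Q_B is significant while Q_W is not.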
3 Methods

3.1 Data Collection

We collected 103 usability evaluation experiments, published since 1990, from HCI-related journals, the proceedings of HCI-related conferences, technical reports, and books. Online academic databases and offline sources were used to search for relevant studies. Most of the references in the papers identified as relevant were also checked to make sure that no relevant studies were missed. As a result of this extended search without pre-selected sources, most of the major HCI-related journals, such as International Journal of Human–Computer Interaction, Behaviour & Information Technology, International Journal of Human–Computer Studies, and Human Factors, and the proceedings of major HCI-related conferences, such as the CHI Conference on Human Factors in Computing Systems and the Human Factors Society Annual Meeting, were included as sources of relevant studies. However, only 18 experimental results were used for the meta-analysis, because we selected usability evaluation experiments according to two criteria: (a) the user-based testing or heuristic evaluation method was employed in the experiments, and (b) the experiments reported the mean and standard deviation of individual detection rates. Because a paper may report multiple experiments, each yielding an independent experimental result, the 18 experimental results used for the meta-analysis in this study come from 10 usability evaluation papers published between 1990 and 2004. Although some of the experiments were conducted under the same experimental conditions and reported in the same publications, each experiment is considered an independent experiment that shares some experimental conditions, such as evaluated systems, task type, and report type, because each experiment was administered independently. All 18 experiments reported the mean and standard deviation of individual detection rates after the interfaces of software products or information systems were evaluated for usability problems by user-based testing (9 experiments) or heuristic evaluation (9 experiments) (see Table 1).
Table 1. Data collected for meta-analysis

Usability evaluation   Number of test users   Mean of individual   Standard deviation of        Reference
method                 or evaluators          detection rates      individual detection rates
User-based testing     20                     0.36                 0.0006                       [17]
                       12                     0.32                 0.0014                       [18]
                       20                     0.42                 0.0015                       [18]
                       17                     0.14                 0.07                         [10]
                       36                     0.4625               0.2032                       [13]
                       18                     0.0799               0.0269                       [16]
                       18                     0.0926               0.0595                       [16]
                       7                      0.1984               0.0707                       [16]
                       6                      0.2432               0.1172                       [16]
Heuristic evaluation   12                     0.084                0.038                        [19]
                       12                     0.094                0.064                        [19]
                       14                     0.19                 0.0199                       [5]
                       10                     0.179                0.032                        [1]
                       16                     0.203                0.075                        [2]
                       9                      0.14                 0.123                        [2]
                       16                     0.279                0.11                         [2]
                       11                     0.222                0.095                        [2]
                       18                     0.046                0.025                        [10]
3.2 Methods of Analysis
First, bubble charts were used to inspect the overall shape of the data before the meta-analysis was conducted. In the bubble charts, each bubble represents an experiment: the center of the bubble indicates the mean of individual detection rates plotted against the number of test users or evaluators, and the radius of the bubble indicates the standard deviation of individual detection rates. In keeping with the philosophy of meta-analysis, relatively small bubbles, which indicate smaller standard deviations, are given more weight than big bubbles, which indicate larger standard deviations, when the mean of individual detection rates is estimated. Second, the means of individual detection rates from the individual usability evaluation experiments were combined, using the inverse of the variance as the weight, in order to estimate the individual detection rate for user-based testing and for heuristic evaluation. The Q statistic [15], which is known to follow a chi-square distribution, was calculated to test the homogeneity of the effect sizes (i.e., the means of individual detection rates in this study) that were used to estimate the parameter (i.e., the individual detection rate in this study). Third, the fixed effects model was applied to find hidden factors that explain the variability of the effect sizes in case the means of individual detection rates were significantly heterogeneous. In practice, candidate hidden factors were selected, and homogeneity tests were then conducted repeatedly to check whether each candidate contributed to forming homogeneous subgroups, until hidden factors were identified as moderators in the fixed effects model.
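The second step can be written out as follows. This is a sketch under the assumption that the variance of experiment i is the square of its reported standard deviation s_i; the symbols x_i (reported mean), w_i (weight), and k (number of experiments) are introduced here for illustration and do not appear in the original text.

```latex
% Inverse-variance weighted estimate of the individual detection rate
w_i = \frac{1}{s_i^{2}}, \qquad
\bar{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}

% Standard error and 95% confidence interval of the combined estimate
\mathrm{SE}(\bar{x}) = \Bigl(\sum_{i=1}^{k} w_i\Bigr)^{-1/2}, \qquad
\bar{x} \pm 1.96\,\mathrm{SE}(\bar{x})

% Homogeneity statistic, compared against a chi-square distribution with k-1 degrees of freedom
Q = \sum_{i=1}^{k} w_i \,(x_i - \bar{x})^{2}
```

If Q exceeds the chi-square critical value for k - 1 degrees of freedom, the effect sizes are treated as heterogeneous and the search for moderators described in the third step begins.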
4 Results
4.1 Bubble Charts Analysis
To see the overall shape of the data used for the meta-analysis, two bubble charts were drawn, one for user-based testing and one for heuristic evaluation. In the bubble charts, the x-axis represents the number of test users or evaluators, and the y-axis represents the mean of individual detection rates. As shown in Figure 1 and Figure 2, the bubbles of user-based testing are more widely scattered along both axes than those of heuristic evaluation, and they also vary more in size. This means that the user-based testing data span a wider range of means and standard deviations of individual detection rates, and of numbers of test users, than the data collected from heuristic evaluation experiments. Thus, we can conclude from the bubble chart analysis that the means of individual detection rates reported from user-based testing experiments are fairly heterogeneous, whereas those from heuristic evaluation experiments are somewhat heterogeneous but show the possibility of being sub-grouped.
Fig. 1. Bubble chart of user-based testing data (x-axis: number of test users; y-axis: mean of individual detection rate)
Fig. 2. Bubble chart of heuristic evaluation data (x-axis: number of evaluators; y-axis: mean of individual detection rate)
4.2 Estimation of Individual Detection Rate
Meta-analyses were conducted to combine the results from the 9 user-based testing experiments and from the 9 heuristic evaluation experiments, respectively. As shown in Table 2, when user-based testing is used, the estimated individual detection rate is 0.36 and its 95% confidence interval is 0.361 ~ 0.363. When heuristic evaluation is used, the estimated individual detection rate is 0.14 and its 95% confidence interval is 0.115 ~ 0.164, which is not consistent with the assumption behind Nielsen's [14] conclusion. However, the homogeneity tests show that the means of individual detection rates from both user-based testing and heuristic evaluation are significantly heterogeneous. Thus, we need to employ the fixed effects model to find hidden factors that affect the heterogeneity of the means of individual detection rates.

Table 2. Estimated individual detection rates and homogeneity tests

Usability evaluation    Estimated individual    95% confidence interval of mean    Q statistic    Chi-square
method                  detection rate          Lower bound     Upper bound                       (d.f. = 8, α = 0.05)
User-based testing      0.3615                  0.3605          0.3625             2552.558       15.507
Heuristic evaluation    0.1393                  0.1150          0.1637             27.669         15.507
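The estimates and Q statistics in Table 2 can be recomputed from the means and standard deviations listed in Table 1. The sketch below is not the authors' code: it assumes that the first nine data rows of Table 1 are the user-based testing experiments and the last nine the heuristic evaluation experiments, and that each weight is the inverse of the squared standard deviation. The function name `combine` and the variable names are hypothetical.

```python
from math import sqrt

# (mean, standard deviation) of individual detection rates, copied from Table 1.
# Assumption: rows 1-9 are user-based testing, rows 10-18 are heuristic evaluation.
USER_BASED = [
    (0.36, 0.0006), (0.32, 0.0014), (0.42, 0.0015),
    (0.14, 0.07), (0.4625, 0.2032), (0.0799, 0.0269),
    (0.0926, 0.0595), (0.1984, 0.0707), (0.2432, 0.1172),
]
HEURISTIC = [
    (0.084, 0.038), (0.094, 0.064), (0.19, 0.0199),
    (0.179, 0.032), (0.203, 0.075), (0.14, 0.123),
    (0.279, 0.11), (0.222, 0.095), (0.046, 0.025),
]

# Critical value quoted in Table 2 (d.f. = 8, alpha = 0.05).
CHI2_CRITICAL_DF8 = 15.507


def combine(experiments):
    """Inverse-variance weighted mean, 95% confidence interval and Q statistic."""
    weights = [1.0 / sd ** 2 for _, sd in experiments]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, experiments)) / total
    se = 1.0 / sqrt(total)  # standard error of the weighted mean
    q = sum(w * (m - mean) ** 2 for w, (m, _) in zip(weights, experiments))
    return {"mean": mean, "lower": mean - 1.96 * se, "upper": mean + 1.96 * se, "q": q}


for label, data in (("User-based testing", USER_BASED),
                    ("Heuristic evaluation", HEURISTIC)):
    r = combine(data)
    verdict = "heterogeneous" if r["q"] > CHI2_CRITICAL_DF8 else "homogeneous"
    print(f"{label}: {r['mean']:.4f} [{r['lower']:.4f}, {r['upper']:.4f}] "
          f"Q = {r['q']:.3f} ({verdict})")
```

Under these assumptions both groups exceed the critical value, which is the situation that motivates the search for moderators. The very small standard deviations of the first three user-based experiments give them almost all of the weight in that group.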
4.3 Hidden Factors for Individual Detection Rate in Heuristic Evaluation
We considered experimental conditions, such as the expertise of test users or evaluators, task type, type of evaluated system, experimental duration, and report type, as candidate hidden factors that might explain the heterogeneity of the means of individual detection rates. For user-based testing, the means of individual detection rates were too heterogeneous to form homogeneous sub-groups based on these candidates. Thus, in this study we could not find hidden factors that affect the individual detection rate in user-based testing.
As for heuristic evaluation, evaluator expertise (expert vs. novice) and task type (scenario-based task vs. free exploration) were found to be significant factors that explain the variability of individual detection rates. Using these two factors, the heterogeneous data could be divided into three homogeneous sub-groups: heuristic evaluation done by experts, by novices with a free exploration task, and by novices with a scenario-based task (see Table 3). Because only one experimental result came from evaluation done by novices with a scenario-based task, no homogeneity test was conducted for that case. When novice evaluators conduct heuristic evaluation with a free exploration task, the estimated individual detection rate is the highest (0.19). This implies that, in order to improve evaluators' problem detection ability, the conditions of heuristic evaluation need to be set up similarly to those of user-based testing (novice + free exploration).

Table 3. Hidden factors for individual detection rate in heuristic evaluation

Hidden factors                      Estimated individual    95% confidence interval of mean
Expertise    Task type              detection rate          Lower bound     Upper bound
Expert       Mixed
Novice       Free exploration
Novice       Scenario-based
5 Conclusion and Discussion
We conducted meta-analyses to combine the results of user-based testing and heuristic evaluation experiments and thereby estimate the individual detection rates of the two methods. The estimated individual detection rates for user-based testing and heuristic evaluation were 0.36 and 0.14, respectively, but they need to be interpreted carefully because they were derived from heterogeneous data. For heuristic evaluation, however, the fixed effects model with expertise and task type as moderators yielded two individual detection rates from homogeneous sub-groups: 0.14 (when experts conduct heuristic evaluation) and 0.19 (when novice evaluators conduct heuristic evaluation with a free exploration task). These individual detection rates are not consistent with the assumption behind Nielsen's [14] conclusion. This study makes two contributions to usability evaluation research. First, it combined the results of user-based testing and heuristic evaluation experiments and estimated the individual detection rates of those usability evaluation methods. Usability practitioners can use these generalized estimates when deciding optimal sample sizes for usability evaluation. Second, it found significant factors, namely expertise and task type, that improve the individual detection rate in heuristic evaluation. Usability practitioners can consider these factors to improve the performance of usability evaluation. However, this study is limited in that the number of collected experiments is small. Not enough experiments have reported the statistical information needed for meta-analysis, which is one of the reasons why meta-analysis has often been abandoned in usability evaluation research [7]. In addition, the data collected from user-based testing were significantly heterogeneous, but we could not find significant factors that explain the variability of the means of individual detection rates. This issue is left for future study.
References
1. Andre, T.S., Hartson, H.R., Williges, R.C.: Determining the effectiveness of the usability problem inspector: a theory-based model and tool for finding usability problems. Human Factors 45, 455–482 (2003)
2. Baker, K., Greenberg, S., Gutwin, C.: Empirical development of a heuristic evaluation methodology for shared workspace groupware. In: Proceedings of the 2002 ACM Conference on Computer supported cooperative work, pp. 96–105. ACM, New York (2002)
3. Chen, C., Rada, R.: Interacting with hypertext: a meta-analysis of experimental studies. Human-Computer Interaction 11, 125–156 (1996)
4. Cook, T.D., Leviton, L.C.: Reviewing the literature: a comparison of traditional methods with meta-analysis. Journal of Personality 48, 449–472 (1980)
5. De Angeli, A., Matera, M., Costabile, M.F., Garzotto, F., Paolini, P.: Validating the SUE inspection technique. In: Di Gesù, V., Levialdi, S., Tarantino, L. (eds.) Proceedings of Advanced Visual Interfaces (AVI’2000), pp. 143–150. ACM, New York (2000)
6. Glass, G.V., McGaw, B., Smith, M.L.: Meta-analysis in social research. Sage Publications, Beverly Hills CA (1981)
7. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for evaluating usability evaluation methods. International Journal of Human-Computer Interaction 15, 145–181 (2003)
8. Hedges, L.V.: Fixed effects models. In: Cooper, H., Hedges, L.V. (eds.) The handbook of research synthesis, pp. 285–299. Russell Sage Foundation, New York (1994)
9. Hedges, L.V., Olkin, I.: Statistical methods for meta-analysis. Academic Press, Orlando FL (1985)
10. Law, L.-C., Hvannberg, E.T.: Analysis of combinatorial user effect in international usability tests. In: CHI Conference on Human Factors in Computing Systems, pp. 9–16. ACM, New York (2004)
11. Lipsey, M.W., Wilson, D.B.: Practical meta-analysis. SAGE Publications, Thousand Oaks CA (2001)
12. McLeod, P.L.: An assessment of the experimental literature on electronic support of group work: results of a meta-analysis. Human-Computer Interaction 7, 257–280 (1992)
13. Nielsen, J.: Finding usability problems through heuristic evaluation. In: CHI Conference on Human Factors in Computing Systems, pp. 373–380. ACM, New York (1992)
14. Nielsen, J.: Estimating the number of subjects needed for a thinking aloud test. International Journal of Human–Computer Studies 41, 385–397 (1994)
15. Shadish, W.R., Haddock, C.K.: Combining estimates of effect size. In: Cooper, H., Hedges, L.V. (eds.) The handbook of research synthesis, pp. 261–281. Russell Sage Foundation, New York (1994)
16. Spool, J., Schroeder, W.: Testing web sites: Five users is nowhere near enough. In: CHI ’01 extended abstracts on Human factors in computing systems, pp. 285–286. ACM, New York (2001)
17. Virzi, R.A.: Streamlining the design process: Running fewer subjects. In: Human Factors Society 34th Annual Meeting, pp. 291–294. Human Factors and Ergonomics Society, Santa Monica CA (1990)
18. Virzi, R.A.: Refining the test phase of usability evaluation: how many subjects is enough? Human Factors 34, 457–468 (1992)
19. Zhang, Z., Basili, V., Shneiderman, B.: Perspective-based usability inspection: An empirical validation of efficacy. Empirical Software Engineering 4, 43–69 (1999)
Evaluating in a Healthcare Setting: A Comparison Between Concurrent and Retrospective Verbalisation
Janne Jul Jensen
Department of Computer Science, Aalborg University, Fredrik Bajers Vej 7, E2-220, DK-9210 Aalborg East, Denmark
[email protected]
Abstract. The think-aloud protocol, also known as the concurrent verbalisation protocol, is widely used in the field of HCI today, but as technology and applications have evolved, the protocol has had to adapt. New variations of the protocol have therefore emerged. One example is retrospective verbalisation. To compare concurrent and retrospective verbalisation, an experiment was conducted in which a home healthcare application was evaluated by 15 participants, each using one of the two protocols. The results of the experiment show that each protocol has its own strengths and weaknesses, and that the two are equally good although very different.
the decrease of mental workload, as the participant is now free to focus on the task at hand. However, a drawback could be that participants quickly forget specific details of the task-solving process and are then unable to recall them afterwards [3]. To shed some light on the pros and cons of the two protocols, an experiment was conducted as a field evaluation in a home healthcare setting. This setting and type of evaluation were chosen to make the evaluation as realistic as possible, in order to investigate any effects the surroundings might have with regard to sensitivity: is it possible to observe any awkwardness in using the concurrent think-aloud protocol compared to the retrospective think-aloud protocol in a sensitive setting?
2 The Experiment
To compare concurrent and retrospective verbalisation in a healthcare setting and to test the appropriateness of each protocol, an experiment was conducted. It was set up as a field evaluation to create as realistic a setting as possible. The system chosen for evaluation was an application developed to aid home healthcare workers in their daily work. It is an electronic replacement for the existing paper-based system, which is currently in use in many municipalities in Denmark. It supports the current work procedure and also offers new functionality, such as wireless access to additional information about the elderly citizens and the progress of coworkers, information that was previously available only at the main office building.
2.1 Participants
15 participants were chosen with the help of the head of the group of home healthcare workers, with due consideration for work plans etc. All 15 were trained home healthcare workers, and their demographic data is shown in table 1.
Table 1. The demographic data of the 15 participants in the two protocols

Protocol                  Age     Experience local    Experience total    Experience computer (1-6)
Retrospective   Average   42.0    5½                  8¼                  3
                High      54      12                  13                  6
                Low       33      2½                  3¾                  1
Concurrent      Average   42.4    7                   10.3                3.9
                High      57      18                  23                  6
                Low       31      1                   1½                  1
The table shows the age, the experience as home healthcare workers in the municipality where the experiment took place, the experience as home healthcare workers in total, and the level of experience with computers on a scale from one to six, where 1 is most experienced and 6 is least experienced. For each of these variables, the high, low and average values have been calculated for each of the protocols.
2.2 Equipment
To support the field evaluation a mobile laboratory was used. It consists of small clip-on wireless mobile cameras (see figure 1), wireless microphones and a mobile digital video recorder. Running it all furthermore requires various types of batteries and receivers for the wireless technology. Only the camera and microphone are carried by the participant; the rest is carried by the test monitor, packed in a small bag (see figures 2 and 3).
Fig. 1. The small clip-on wireless mobile camera from the mobile laboratory
Fig. 2. The equipment in the mobile laboratory used for concurrent verbalisation
Fig. 3. The mobile laboratory packed up for use
Fig. 4. The setup for retrospective verbalisation
For retrospective verbalisation, the digital recordings from the mobile video recorder were played back to the participant and the retrospective verbalisation was caught using a camcorder (see figure 4).
2.3 Procedure
To gain the necessary insight into the field of home healthcare, a small ethnographic field study was conducted. Based on a thorough examination of the system and the insight gained from the field study, 8 tasks covering a wide range of the commonly used functionality in the application were designed, and the experiment was then designed in detail. With the design in place, a pilot was conducted for both protocols and the setup was adapted to address the minor issues discovered. 15 participants were recruited from a local municipality. 14 were female and one was male, which was representative of the employment situation, where women far outnumbered men. The actual experiment took six days and all evaluations were recorded on video. The evaluations took place in six different homes of actual elderly citizens, with the citizen present during the evaluation to further heighten the realism of the experiment. 7 of the 15 participants were assigned to evaluate using retrospective verbalisation, while the remaining 8 evaluated the application using concurrent verbalisation. Each participant was given a thorough introduction to the experiment, explaining the equipment and its function, what their contribution was, what was expected of them, what would happen, etc. They were also instructed thoroughly in how to apply the protocol assigned to them. They were then given 10 minutes to freely familiarise themselves with the system before trying to solve the tasks. After the introduction, the experiment itself took place in the home of an elderly citizen, where the participants attempted to solve the tasks handed out. 8 participants solved them while thinking aloud during the evaluation, whereas the other 7 had their test session played back to them on a screen afterwards and thought aloud during the replay. Upon completion of the evaluation each participant was debriefed.
All the raw video data was analysed afterwards and a list of problems was constructed. The severity of each problem was categorised according to the definition by Rolf Molich [5]. According to this definition, a problem experienced by a participant falls into one of three categories:
• Cosmetic: The user is delayed for less than one minute, is mildly irritated, or is confronted with information which to a lesser degree deviates from the expected.
• Serious: The user is delayed for several minutes, is somewhat irritated, or is confronted with information which to some degree deviates from the expected.
• Critical: The user's attempt to solve the task comes to a halt, the user is very irritated, or the user is confronted with information which to a critical degree deviates from the expected.
The categorisation was done by observing the video recording of each participant and then evaluating each situation according to the guidelines described above. A given problem is often not experienced as equally serious by each participant, and in those cases the problem is placed in the most severe category observed.
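The rule that a problem receives the most severe rating observed across participants can be sketched as below. The participant labels and ratings are hypothetical and only illustrate the rule; they are not data from the experiment.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three categories from the definition, ordered by severity."""
    COSMETIC = 1
    SERIOUS = 2
    CRITICAL = 3


def overall_severity(ratings):
    """Return the most severe rating any participant gave to one problem."""
    return max(ratings)


# Hypothetical example: one problem as experienced by three participants.
ratings_for_problem = {
    "P1": Severity.COSMETIC,
    "P4": Severity.SERIOUS,
    "P7": Severity.COSMETIC,
}
print(overall_severity(ratings_for_problem.values()).name)  # SERIOUS
```

Using an ordered enumeration keeps the comparison explicit, so the maximum is well defined even when participants disagree.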
3 Results
This section sums up the observations made from the list of problems extracted from the analysis of the raw video data.
3.1 Problems Revealed
In total, 105 problems were identified through the evaluation. Interestingly, the participants using concurrent verbalisation revealed a total of 87 problems, whereas the participants using retrospective verbalisation experienced only 61 problems in total. This is quite a big difference, and its origin is not clear. One explanation could be that the participants evaluating with retrospective verbalisation have an average computer experience level that is almost a point better (3.0) than that of the participants using concurrent verbalisation (3.9) on a scale from 1 to 6 (see table 2).
Table 2. Total number of problems, unique problems and the average computer skill of the participants
                               All     Concurrent verbalisation    Retrospective verbalisation
Problems revealed              105     87                          61
Unique problems*               44      30 (47)                     14 (33)
Average computer experience    3.4     3.9                         3.0

* Note that the number in parentheses refers to problems that are unique to that protocol and not necessarily unique in total.
3.2 Unique Problems
Looking at unique problems, the experiment reveals 44 in total. 30 of these are revealed by the concurrent verbalisation protocol, whereas the retrospective verbalisation protocol experiences only 14 of the 44. Even if we look at problems that are unique within each protocol, concurrent verbalisation discovers 47 problems that are unique to that protocol, whereas retrospective verbalisation encounters only 33 (see table 2). It has long been debated in the literature whether unique problems are real or “false” problems, since they are encountered by only one participant during the evaluation, and a problem found by only one participant seems increasingly likely to be “false” as the number of participants increases. If unique problems are indeed “false” problems, then this experiment could indicate that retrospective verbalisation is better at eliminating them. This could be because the protocol is of a recall nature: the participant simply recalls fewer of these “false” problems afterwards than would be verbalised in the situation, because they were not really problems after all.
3.3 “False” Problems – Do They Exist?
However, retrospective verbalisation finds only slightly more than half of the total number of problems, and the question is whether nearly half of the problems found can be considered “false” problems. When looking at severity, concurrent verbalisation finds more problems in all three categories. If the extra problems found by concurrent verbalisation were “false” problems, it would be fair to assume that they would appear mostly as cosmetic problems. However, it is difficult to dismiss problems that are categorised as critical as being false, so eliminating “false” problems can only partly explain why retrospective verbalisation finds only slightly more than half the problems. Another explanation might be that the participant forgets some of the problems in the short time between the evaluation and the retrospective verbalisation. Perhaps problems seem less frustrating when looking back than when in the middle of them. It is also possible that it is easier for the participant to keep an overview when sitting outside the situation looking in.
3.4 Problems Detected by Both Protocols
There are 43 problems that are registered by both protocols. In one, the participant did not enter a username and password before pressing the “login” button. In another, the participants did not understand the error message displayed to them. In a third, the participants thought that “Unplanned task” adds an extra task to the visit in progress. These three problems are typical of the 43 problems common to the two protocols, and an initial inspection does not reveal any connection between them that explains why exactly those problems were revealed by both protocols. The same is the case for the unique problems, which do not seem to have anything in common either. Examples are: a participant thinks TAB will move the cursor to the next text field; a participant is unsure how to end a visit in progress; and a participant is unsure what data the “search” button searches in.
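The counts reported so far fit together: 87 + 61 - 43 = 105, so the problems found by either protocol minus those found by both equals the total. The sketch below shows one way such tallies could be produced from per-participant problem lists. It reads "unique" as "experienced by exactly one participant", either among all participants or within one protocol, which is an interpretation of the note under table 2; the participant labels and problem IDs are hypothetical.

```python
from collections import Counter

# Hypothetical data: participant -> set of problem IDs experienced.
concurrent = {"C1": {1, 2, 3}, "C2": {2, 4}, "C3": {2, 5}}
retrospective = {"R1": {2, 3, 6}, "R2": {3, 4}}


def revealed(group):
    """All problems experienced by at least one participant in the group."""
    return set().union(*group.values())


def found_by_one(*groups):
    """Problems experienced by exactly one participant across the given groups."""
    counts = Counter(p for g in groups for problems in g.values() for p in problems)
    return {p for p, n in counts.items() if n == 1}


con, ret = revealed(concurrent), revealed(retrospective)
total = con | ret    # problems revealed by either protocol
shared = con & ret   # problems detected by both protocols
assert len(con) + len(ret) - len(shared) == len(total)  # inclusion-exclusion

globally_unique = found_by_one(concurrent, retrospective)
for name, group, found in (("Concurrent", concurrent, con),
                           ("Retrospective", retrospective, ret)):
    in_total = len(globally_unique & found)   # unique in total, seen in this protocol
    in_protocol = len(found_by_one(group))    # unique within this protocol only
    print(f"{name}: {len(found)} revealed, {in_total} ({in_protocol}) unique")
```

A problem that is unique within a protocol may still have been experienced by a participant in the other protocol, which is why the number in parentheses is never smaller than the number in front of it.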
3.5 Few or Many – Nothing in Between
It is notable that in concurrent verbalisation the participants seem to fall into one of two groups: they experience either few or many problems and not an average number in between, whereas the number of problems experienced by the participants in retrospective verbalisation is more evenly distributed. Three of the participants using concurrent verbalisation experience only a few problems (6-11) while the other five experience many (21-36), and none experience a number in between (12-20). This could be due to difficulties in verbalising concurrently with the task-solving, which has been reported as a drawback of the concurrent think-aloud protocol [7]. It can materialise either as very little verbalisation, due to the difficulty of verbalising while solving the task (few problems experienced), or as extra problems occurring due to a lack of concentration caused by the simultaneous verbalisation (many problems experienced). In retrospective verbalisation the numbers are much more even, because the mental workload is lowered by letting the participants concentrate on one thing at a time, and the differing number of problems experienced might simply be caused by their varying computer skills and their differing ability to recall their thought process in detail.
3.6 The Diverse Participants
Each participant in concurrent verbalisation revealed an average of 20.8 problems, whereas each participant in retrospective verbalisation discovered an average of only 16.0 problems (see table 3). This difference is not particularly big considering the large spread in the number of problems experienced by individual participants, and such a spread is probably to be expected in a group of participants as diverse as the present one. The group varied widely in both job experience and computer experience, so it would have been surprising if the number of problems experienced had been similar across participants.
Table 3. Average number of problems experienced in total and for each of the two protocols
                    Total    Concurrent verbalisation    Retrospective verbalisation
Average problems    18.5     20.8                        16.0
4 Discussion
Many attempts have been made to determine which of the two verbalisation protocols is better, but so far the results differ between studies. Nielsen et al. [7] discover quite a few weaknesses in concurrent verbalisation and propose that Mind Tape (a version of retrospective verbalisation) is a more viable option, whereas van den Haak et al. [9] rate the two protocols as equally good although clearly different. This study indicates that concurrent verbalisation finds more problems than retrospective verbalisation, and the smaller yield of retrospective verbalisation can be both a good and a bad thing: good if it means that the number of “false” (unique) problems is minimised; bad since it is not only “false” problems that go undiscovered. Concurrent verbalisation, on the other hand, seems to place a higher mental workload on the participants, causing them either to focus on the task-solving process and forget to verbalise, or to focus on the verbalisation and lose concentration on the task-solving. However, the reason that retrospective verbalisation finds fewer problems might be that, even in the short time between the actual evaluation and the retrospective verbalisation, things have already started to fade in the participant's memory and problems are being forgotten. Thus, the conclusion leans towards that of van den Haak et al. [9]: the protocols are equally good, but very different. As the observant reader might have noticed, the two protocols in the experiment had an uneven number of participants: 8 participants used concurrent verbalisation, while only 7 used retrospective verbalisation.
This of course influences the results in the subsection Problems Revealed of the Results section. However, even if the numbers are corrected to compensate for this (by taking all possible combinations of 7 participants out of the 8 and averaging the number of problems found by these combinations of 7 participants in concurrent verbalisation), concurrent verbalisation still reveals 81.125 problems to retrospective verbalisation's 61. This is still a notable difference and does not change the conclusions drawn. The same is the case in the subsection Unique Problems, where concurrent verbalisation still finds 27.3 of the globally unique problems (compared to 30) and 41.1 problems that are unique to that protocol (compared to 47) when the numbers are corrected for the extra participant as described above. Here too the differences are still noteworthy after the compensation, which therefore does not change any of the above. It may look odd to talk about a fraction of a problem, but the fractions simply illustrate the average number of problems that would have been experienced had we used only 7 participants instead of 8, regardless of which 7 of the 8 were chosen. With the corrected numbers, table 2 would look as shown in table 4.
Table 4. Table 2 as it would look with the corrected numbers for concurrent verbalisation
                               All     Concurrent verbalisation    Retrospective verbalisation
Problems revealed              105     87                          61
Unique problems*               44      27.3 (41.1)                 14 (33)
Average computer experience    3.4     3.9                         3.0

* Note that the number in parentheses refers to problems that are unique to that protocol and not necessarily unique in total.
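The correction for the extra participant amounts to averaging a count over every way of leaving one participant out. A minimal sketch follows, using hypothetical problem sets for four participants reduced to three; the function names are illustrative and the data are not from the experiment.

```python
from itertools import combinations
from statistics import mean


def problems_revealed(group):
    """Number of distinct problems experienced by a group of participants."""
    return len(set().union(*group))


def corrected_count(participants, size):
    """Average number of problems revealed over all sub-groups of the given size."""
    return mean(problems_revealed(subset)
                for subset in combinations(participants, size))


# Hypothetical data: each set holds the problem IDs one participant experienced.
participants = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 5, 6}]

print(problems_revealed(participants))   # all four participants together
print(corrected_count(participants, 3))  # average over every group of three
```

With 8 participants reduced to 7 there are exactly 8 such sub-groups, so the corrected figures are multiples of one eighth, which matches the reported 81.125.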
One purpose of the experiment was to look at the suitability of the protocols for sensitive settings, in this case healthcare in a field evaluation. Surprisingly, and contrary to expectations, there was no evidence that the participants using concurrent verbalisation were influenced by awkwardness or by the private nature of the information they were verbalising about. This indicates that sensitivity is not an issue that affects the test situation or the participant. It is, however, unclear whether this holds for other settings, and it would be interesting to explore whether what can be described as sensitive settings influence the suitability of verbalisation. This requires a definition of what makes a setting sensitive, such as surroundings, participants etc., and then an identification of application areas where this could pose a problem.
Acknowledgements. The research behind this paper was partly financed by the Danish Research Councils (grant number 2106-04-0022, the USE-project), without which it would not have been possible. I would also like to thank my supervisor for his continuously constructive comments on the paper. Finally, a thank you to the home healthcare workers of Aars kommune in Denmark, who agreed to participate in this experiment, and to the elderly citizens, who so willingly opened their homes to us.
References
1. Dix, A., Finlay, J., Abowd, G., Beale, R.: Human-Computer Interaction. Prentice Hall, Englewood Cliffs (1997)
2. Duncker, K.: On Problem-solving. In: Dashiell, J.F. (ed.) Psychological Monographs, vol. 58, pp. 1–114. The American Psychological Association, Inc. (1945)
3. Ericsson, K.A., Simon, H.A.: Protocol Analysis: Verbal Reports as Data. MIT Press, Cambridge, MA (1993)
4. Hackos, J.T., Redish, J.C.: User and Task Analysis for Interface Design. Wiley, Chichester, UK (1998)
5. Molich, R.: User Friendly Systems (in Danish). Teknisk Forlag (1994)
6. Nielsen, J.: Estimating the Number of Subjects Needed for a Thinking Aloud Test. International Journal of Human-Computer Studies 41(3), 385–397 (1994)
7. Nielsen, J., Clemmensen, T., Yssing, C.: Getting access to what goes on in people’s heads? – Reflections on the think-aloud technique. In: Proceedings of NordiCHI. ACM Press, New York (2002)
8. Preece, J.: Human-Computer Interaction. Addison-Wesley, London, UK (1994)
9. van den Haak, M., de Jong, M.D.T., Schellens, P.J.: Retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue. Behaviour and Information Technology 22, 339–351. Taylor & Francis Ltd, London (2003)
Development of AHP Model for Telematics Haptic Interface Evaluation Yong Gu Ji, Beom Suk Jin, Jae Seung Mun, and Sang Min Ko Yonsei University, 134 Sinchon-Dong, Seodaemun-gu, Seoul, Korea {yongguji, kbf2514jin, mjs, sangminko}@yonsei.ac.kr
Abstract. These days, the main focus in developing telematics systems is to promote safety by decreasing the workload of the driver. To achieve this goal, simplification of the interface as well as the resolution of GUI interaction problems must be worked on. For this research, objective and quantitative assessments are provided in the early steps of building the haptic interface model. The purpose of this research is to create an evaluation model that uses the Analytic Hierarchy Process (AHP) method to fulfill user requirements. This research developed an AHP evaluation model that can present recommendations, as well as the degree of importance, for haptic interface design with quantitative assessments of the prototype by finding out the absolute and relative importance for evaluation groups and factors in early design levels using AHP. Keywords: Analytic Hierarchy Process, Haptic Device, Haptic Interface, Telematics.
Therefore, simplifying complex GUI interactions to reduce driver workload and secure driver safety has become an important issue for next-generation telematics systems. New interaction methods are needed to overcome the limitations of some aspects of speech recognition and touch screen technology. Multifunction display and device control using a mental model would be one solution. The haptic interface will be an important part of intelligent vehicles, and drivers will be able to manipulate devices easily using a haptic interface [1]. Moreover, using the haptic interface will help us obtain core technology that can affect the future market of intelligent vehicles. If tactile feedback is supported in telematics devices, it would greatly reduce the chances of malfunctions in telematics systems caused by overload during driving. Furthermore, tactile feedback provides drivers with useful information on the functions of telematics. It will contribute to reducing driver distraction and thus further support driver safety. While the previous haptic interface model, with its difficult and complex manipulations, has been inefficient, the new telematics device will support efficient and intuitive manipulation using combinations of tactile feedback [11]. Due to the lack of quantitative evaluation in early design steps, customer needs have not been reflected sufficiently in the haptic device. In this study, to reflect customer needs properly and to offer an effective evaluation method, we developed an Analytic Hierarchy Process (AHP) evaluation model for Multi Criteria Decision Making. In the AHP evaluation model for an early prototype, quantitative values of each evaluation factor were computed using hierarchical analysis [14]. As a result, an objective and quantitative evaluation model for user-centered haptic device design was developed. It will be useful to both drivers and developers.
2 Literature Review
Previous studies about haptic interface and device evaluation were reviewed to collect the evaluation factors of the AHP model. As human-machine interaction within vehicles is becoming more and more complex, an advanced interface is needed. Interaction types like haptic interface, touch screen, and voice control have been developed to manipulate multi-functional systems, and these interaction types were compared in usability studies [2]. To study the relationship between practical applications and the haptic interface’s hardware/software, user perception and motor control in haptic mode were evaluated. E. Kirkpatrick (2002) researched the requirements of hardware/software for haptic interfaces [6]. A study on how to design a haptic user interface and application was performed by Steven Wall (2006) [15]. In this study, a heuristic guideline for improving usability was presented. The heuristic guideline was acceptable to users. Also, relations among the haptic device, human perception, and computer application were analyzed. As a result, a haptic interface to improve interaction between the user and computer was proposed. A study of interface attributes of navigation in vehicles shows that controllability (such as speed and accuracy) as well as ease of use are highly related to user satisfaction in car navigation systems. So, Robert (2003)
researched device design that considers users who have no experience manipulating complex devices [12]. E. Kirkpatrick (2001) researched predicting the performance of usability in the haptic environment by measuring the necessary time to perceive the shape of a physical object [7]. This study offered ways to improve user performance by using tactile feedback from the haptic interface. Mark Evans (2005) researched the usability evaluation study of tactile feedback devices to improve interactions in a virtual environment [10]. In this study, the limitations of commercial haptic feedback were described, and a guideline for haptic feedback to develop new products was provided. P. Richard (1999) evaluated controllability and accuracy related to object control using haptic feedback to evaluate user performance of haptic interaction in a virtual environment [13]. As a result, task performance decreased when visual, auditory, and tactile feedback were offered in a simple task. According to the study, multiple feedbacks are not necessary, and only one feedback is more effective in a simple task. Also, Camilla Grane (2005) compared haptic information, graphic information, and integrated information (haptic and graphic) in a simple menu selection, and Dario D. (2005) measured the effectiveness of different input methods in driving [3][4].
3 Methodology
To develop the AHP evaluation model for the haptic interface, previous studies on haptic interfaces were reviewed and evaluation indexes were extracted. After the review, 25 evaluation indexes were selected and classified into 7 evaluation groups by factor analysis. Then, using AHP, each index’s weight was generated and the hierarchical structure of evaluation indexes was organized.
3.1 Selection of Evaluation Indexes
The literature review was used to generate the evaluation indexes for the AHP evaluation model. Based on this review, and on studies of general device evaluation indexes and haptic interfaces, 50 usability evaluation indexes were extracted, including haptic interface evaluation indexes from previous studies, as criteria for the usability and functionality of the haptic interface model. These 50 indexes were reviewed twice to make the criteria more objective and clear. First, an expert group interview was performed for the first index selection. The criteria for selection, unification, and exclusion were decided in this step. Indexes with similar or duplicate meanings were merged, considering the research purpose and the characteristics of the object under evaluation. Ambiguous evaluation factors were excluded. Second, 25 indexes were chosen to evaluate the haptic interface model. The criteria were: scope of definition (inclusion of the index’s concept or scope of generality), hierarchical relation among concepts (one index’s concept is a subset of another’s), and correlation among concepts (causation or correlation between index concepts). Table 1 shows the 25 indexes and their descriptions.
Table 1. Evaluation indexes
| Index | Evaluation index definition |
| --- | --- |
| Learnability | The way of manipulation should be easy for novices to learn. |
| Memorability | The manipulation way of device controller should be easy for users to remember once learned. |
| Ease | The device controller should be easy for the selection/execution/level control of the function. |
| Flexibility | The manipulation way of device controller should be designed to be connected flexibly for each menu/mode. |
| Efficiency | Device controller should be worked efficiently. (Functions have to be executed by the minimum number of the key operations.) |
| Effectiveness | Device controller should be worked effectively. (Functions have to be executed to minimize the workload of users’ hands and brain.) |
| Performance | The user’s capacity for tasks should be excellent. |
| Fast | The search menu and list and the performance of the task should be manipulated fast. |
| Accuracy | The input information like the selection/execution/control of the level should be delivered to the system exactly through the device controller. |
| Controllability | Device should provide users with controllability. |
| Feedback | The feedback of errors, including tactile, visual, and auditory, should be indicated and provided so clearly that users can recognize errors easily. |
| Prevention | The user’s input error or incorrect operations are prevented in advance. |
| Recoverability | The exit or cancel should be provided in order that users can escape from wrong input or unwanted menus. |
| Visibility | The current state of the device/system should be given visually. |
| Durability | Device controller should be designed robustly so that malfunction or damage is prevented. |
| Safety | Device controller should exclude unnatural manipulation so that the overload of the users is minimized. |
| Size | Device controller should be designed considering the common user’s hand size. |
| Familiarity | Device controller’s features, like the shape/form/surface/elasticity/weight/tactile, should be designed to increase the sense of a grip. |
| Arrangement | Device controller should be set up within hand’s radius for action so that users have no difficulty in operation. (The control of the device location should be provided.) |
| Attractiveness | The way to manipulate or form type of device controller should attract user’s interest. |
| Complexity | The complex manipulation for a selection/execution/navigation of the functions should be excluded. |
| Simplicity | Device controller should be simply manipulated for selection/execution/navigation of the functions (to minimize user’s workload). |
| Cognition | Device controller should be designed to anticipate how to manipulate and which function to perform. |
| Consistency | The one that has a similar way to manipulate the function should be designed to be manipulated similarly. |
| Discriminability | The function that has a different manipulation should be designed to use the different manipulation or controller. |
3.2 Hierarchical Classification of Evaluation Indexes
For the AHP analysis, a hierarchical classification was conducted on the basis of the 25 indexes in Table 1. Classifying and grouping related indexes hierarchically allows a more effective and efficient evaluation of the haptic device model. The degree of relation among evaluation factors was scored by 9 evaluators: 2 points for high relation, 1 point for middle relation, and 0 points for low relation. Using factor analysis,
the evaluated points were used to organize and group evaluation indexes. Table 2 shows the results of factor analysis along with the degree of relation among evaluation factors. The 25 evaluation factors were classified into 7 groups in accordance with the result of factor analysis, and terminology was defined to represent each group. The 7 groups were “interaction support,” “function support,” “user support,” “information support,” “device capacity,” “device appearance” and “device control.” These were reclassified into “time,” “manipulation,” and “device” for the haptic model. These grouped evaluation indexes formed the basis of the AHP evaluation model to evaluate the haptic interface. Table 3 describes each group.
Table 3. Grouping of evaluation indexes
| Category | Evaluation group | Definition | Index |
| --- | --- | --- | --- |
| Manipulation | Interaction Support | Evaluation index group related to interaction between user and device for controlling the device | Learnability, Memorability, Ease |
| Manipulation | Function Support | Evaluation index group related to device function | Flexibility, Efficiency, Effectiveness |
| Manipulation | User Support | Evaluation index group related to user’s usability to perform a given task | Performance, Fast, Accuracy, Controllability |
| Manipulation | Information Support | Evaluation index group related to feedback or information about the state of the device | Feedback, Prevention, Recoverability, Visibility |
| Device | Device Capacity | Evaluation index group related to duration and capacity in design the hardware | Durability, Safety |
| Device | Device Appearance | Evaluation index group related to the physical features like shape, size, and arrangement of the device controller | Size, Familiarity, Arrangement, Attractiveness |
| Device | Device Control | Evaluation index group related to manipulation way for selection and operation of function using the device controller | Complexity, Simplicity, Cognition, Consistency, Discriminability |
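The relation scoring described in Section 3.2 (0, 1, or 2 points per pair of indexes, per evaluator) can be pictured as a small aggregation step that precedes the factor analysis. The following is only a sketch under assumptions: the function name `relation_matrix`, the dictionary layout of the ratings, and the example scores are hypothetical, and the factor analysis itself is not shown.

```python
def relation_matrix(indexes, ratings):
    """Sum 0/1/2 relation scores over all evaluators into a symmetric matrix."""
    position = {name: i for i, name in enumerate(indexes)}
    n = len(indexes)
    total = [[0] * n for _ in range(n)]
    for rating in ratings:  # one dict per evaluator: {(index_a, index_b): score}
        for (a, b), score in rating.items():
            if score not in (0, 1, 2):
                raise ValueError(f"score must be 0, 1 or 2, got {score}")
            i, j = position[a], position[b]
            total[i][j] += score
            total[j][i] += score  # relation is treated as symmetric
    return total


# Hypothetical scores from two evaluators for three of the 25 indexes.
indexes = ["Learnability", "Memorability", "Durability"]
ratings = [
    {("Learnability", "Memorability"): 2,
     ("Learnability", "Durability"): 0,
     ("Memorability", "Durability"): 0},
    {("Learnability", "Memorability"): 2,
     ("Learnability", "Durability"): 1,
     ("Memorability", "Durability"): 0},
]
scores = relation_matrix(indexes, ratings)
```

A matrix of this kind is one plausible input for grouping related indexes; how the authors actually prepared the data for factor analysis is not stated in the paper.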
3.3 Analytic Hierarchy Process Evaluation Model
The hierarchical structure generated from the factor analysis of the evaluation indexes defines the structural levels of the AHP evaluation model. To calculate the degree of significance, pairwise comparisons of indexes at the same level were conducted, and the relative comparison values were collected from the checklists of 10 evaluators. From these values, we computed the eigenvalue of the evaluation indexes, which was used to derive the relative significance of the criterion indexes. Expert Choice, an AHP tool, was used to generate the absolute significance of each index from the significance of its upper criterion.
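To illustrate how relative weights can be derived from pairwise comparisons, the sketch below approximates the principal eigenvector of a comparison matrix by power iteration and computes Saaty's consistency ratio. It is not the authors' procedure (they used an AHP tool); the function names and the example judgements are hypothetical, and the consistency check is standard AHP practice rather than something reported in this paper.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector (priority weights) by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]  # normalise to sum 1
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


# Saaty's random index values for matrices of order 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency_ratio(lambda_max, n):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    if n < 3:
        return 0.0
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical, perfectly consistent judgements for Learnability, Memorability, Ease:
# entry [i][j] states how much more important index i is than index j.
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights, lambda_max = ahp_weights(comparisons)
cr = consistency_ratio(lambda_max, len(comparisons))
```

A consistency ratio below about 0.1 is conventionally taken to mean the judgements are acceptably consistent. With several evaluators, the individual judgements would first have to be combined into one matrix; how that was done here is not described.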
Fig. 1. Hierarchical structure of evaluation indexes (Analytic Hierarchy Process evaluation model)
4 Results
The weights of the evaluation factors of the haptic interface were generated with the AHP model. The weights were divided into local and global results. A local result refers to the importance of an evaluation factor within its group, and a global result refers to the importance of an evaluation factor for the whole model.
Table 4. Local results
4.1 Local Results
Through analysis using the AHP model, both local and global results were generated. The local results give the relative importance of the evaluation groups and of the evaluation factors within each group. Table 4 shows the comparative importance of the evaluation groups and evaluation factors. In the haptic interface evaluation model, the “Manipulation” group is more important than the “Device” group. Within the Manipulation group, “Interaction Support” was most important, and within the Device group, “Device Control” was most important. This shows that the manipulation method and the information offered to the driver are more important than the device appearance.
4.2 Global Results
The global results refer to the importance of each evaluation group and factor for the haptic interface model as a whole. Table 5 shows these values for each evaluation group and factor. In the haptic interface evaluation model, “Interaction Support,” “Information Support” and “Function Support” were most important among the 7 evaluation groups, and “Memorability,” “Ease,” “Durability,” and “Learnability” were most important among the 25 evaluation factors.
Table 5. Global results
As a result, developing a simple interaction method for the haptic device that is easy to learn and use, and providing feedback and information about system status, will make the haptic device effective and efficient. Consequently, considering the core factors of the haptic device will improve its usability.
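The relation between the local and global results in Section 4 can be sketched as follows: a global weight is the product of an element's local weight and the weights of all its ancestors in the hierarchy. The group and index names below come from Table 3, but the numeric weights, the nested-dictionary layout, and the function name `global_weights` are hypothetical and serve only as an illustration.

```python
def global_weights(hierarchy, parent_weight=1.0, path=()):
    """Flatten {name: (local_weight, children)} into {path: global_weight}."""
    result = {}
    for name, (local_weight, children) in hierarchy.items():
        weight = parent_weight * local_weight  # global = local x parent's global
        here = path + (name,)
        if children:
            result.update(global_weights(children, weight, here))
        else:
            result[here] = weight  # leaf: an evaluation index
    return result


# Hypothetical local weights; siblings sum to 1 at every level.
hierarchy = {
    "Manipulation": (0.6, {
        "Interaction Support": (0.5, {
            "Learnability": (0.4, {}),
            "Memorability": (0.6, {}),
        }),
        "Function Support": (0.5, {
            "Flexibility": (1.0, {}),
        }),
    }),
    "Device": (0.4, {
        "Device Control": (1.0, {
            "Simplicity": (0.7, {}),
            "Consistency": (0.3, {}),
        }),
    }),
}
weights = global_weights(hierarchy)
```

Because the local weights of siblings sum to 1 at each level, the global weights of all leaf indexes also sum to 1, which makes indexes from different groups directly comparable.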
5 Conclusion
In this study, we offered a priority-based, quantitative evaluation generated from statistical analysis using qualitative evaluation from experts in early design steps of the haptic interface model. In conclusion, a developer can design a user-centered haptic device with important considerations, and a heuristic evaluation for a haptic interface’s prototype is possible using the AHP model. This will have a great impact on the advancement of haptic interface design and improvement.
References
1. Marcus, A.: The next revolution: vehicle user interfaces. Interactions 11(1) (2004)
2. Rydström, A., Bengtsson, P., Grane, C., Broström, R., Agardh, J., Nilsson, J.: Multifunctional Systems in Vehicles: A Usability Evaluation. In: Proceedings of CybErg 2005, The Fourth International Cyberspace Conference on Ergonomics, Johannesburg, International Ergonomics Association Press (2005)
3. Grane, C., Bengtsson, P.: Menu Selection with a Rotary Device Founded on Haptic and/or Graphic Information. In: Proceedings of the First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, IEEE (2005)
4. Salvucci, D.D., Zuber, M., Beregovaia, E., Markley, D.: Rapid Prototyping and Evaluation of In-Vehicle Interfaces. In: CHI 2005, Portland, Oregon, USA (April 2-7, 2005)
5. Electronics Information Center: Market trend of Telematic. Knowledge Research Group report (November 2003)
6. Kirkpatrick, E., Douglas, S.A.: Application-based Evaluation of Haptic Interfaces. In: Proceedings of the 10th Symp. On Haptic Interfaces For Virtual Envir. & Teleoperator Systs. IEEE, New York (2002)
7. Kirkpatrick, E., Douglas, S.A.: A Shape Recognition Benchmark for Evaluating Usability of a Haptic Environment. In: Brewster, S., Murray-Smith, R. (eds.) Haptic HCI 2000. LNCS, vol. 2058, pp. 151–156. Springer, Heidelberg (2001)
8. Gartner Inc.: Automotive Telematics Overview and Forecast (2002)
9. Isaksson, J., Nordquist, J.: Evaluation of haptic interfaces for in-vehicle systems. IEA (2003)
10. Evans, M., Wallace, D., Cheshire, D., Sener, B.: An Evaluation of Haptic Feedback Modelling during Industrial Design Practice. Design Studies 26(5), 487–508 (2005)
11. Payette, J., Hayward, V., Ramstein, V., Bergeron, D.: Evaluation of a Force Feedback (Haptic) Computer Pointing Device in Zero Gravity. In: Proceedings ASME Dynamics System and Control Division, DSC-vol. 58 (1996)
12. Llaneras, R.E., Singer, J.P.: In-Vehicle Navigation Systems. In: 2nd International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design (2003)
13. Richard, P.: Dextrous Haptic Interaction in Virtual Environments: Human Performance Evaluations. In: Proceedings of the IEEE International Workshop on Robot and Human Interaction, Pisa, Italy (September 1999)
14. Saaty, T.L.: The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. McGraw-Hill (1980)
15. Wall, S., Brewster, S.: Design of haptic user-interfaces and applications. Virtual Reality 9, 95–96 (2006)
How to Make Tailored User Interface Guideline for Software Designers Ilari Jounila University of Oulu P.O. Box 3000, 90014 University of Oulu, Finland [email protected]
Abstract. A large number of user interface guidelines and patterns have been developed by different researchers. These patterns and guidelines are, however, either too generic or too specific to use. In addition, the multitude of guides makes it difficult to find and use them effectively. Because of these problems, the existing guides are not useful enough, e.g. for software designers. This paper describes the experiences and findings of a case study project. As a result of an iterative development process, a tailored user interface guideline is presented. A further result was that the guideline was well received by the software designers. Keywords: User interface guidelines, software designers.
There have been several tools to make usability guides more accessible through different systems, for example a tool that supports the management of multiple guidelines via the Internet or locally [8]. The approach of this paper is a case study of a project whose purpose was to create a tailored and useful user interface guideline for software designers.
2 Related Work
The case project, Kätsy, involved our usability research group and a company whose business is software development. The company contacted our usability researchers because of its growing interest in usability issues. In the first meeting, the company listed its interests in usability, such as general usability knowledge and existing methods. It also had two expectations: a usability evaluation of its existing web-based Content Management System (CMS), and short-term and long-term usability benefits for the company. The company had developed the system without usability knowledge; e.g. its product development process had not included any participation of end users. The CMS consisted of several modules, such as content production, content editing, and content updating. A new version of the CMS was under development, and the company was therefore interested in how to improve the usability of the CMS before releasing it.
Improving the usability of the CMS consisted of several parts. First, the company trained our usability research group for a few hours in the use of the system. We also organised a workshop together with the company, in which the user groups of the system and their typical tasks were identified using a persona method. Based on a selected persona, usability test tasks were designed by our usability research group and the company’s software designers. After the workshop, the company constructed a dedicated test environment for the usability tests.
Defining the current state of the system’s usability started with an expert evaluation, which was carried out while the usability research group familiarised itself with the test environment. In addition, four usability test sessions, each with one participant, were conducted to gather user feedback on the system. Three participants were end users of the tested product, and one had previously used a similar product. The test facilitator and observers were all from our research group. The test participants were briefed on the test session and filled out a user profile questionnaire. After that, they had about five minutes to familiarise themselves with the system before the session began. The think-aloud method was used during all sessions. Each of the four test sessions took about one hour and was recorded. Finally, each participant filled out a post-test questionnaire.
Findings from all test sessions were analysed and combined into a test report. The test report consisted mostly of the usability problems of the system, but good design solutions were also included. The test report and a test video run-through were presented to the company in a test report meeting. The company was very satisfied with the results and the solutions provided. However, one thing came up in the meeting: the company needed some
concrete guides and rules to aid user interface development. At the end of the project, our usability research group produced two deliverables: a content analysis document based on the expert evaluation and the findings of the usability test sessions, and a user interface guideline document based on literature research, the expert evaluation, and the findings of the usability test sessions.
3 Development Process of the Guideline
The findings from the usability tests and the expert evaluation, together with the software development company’s needs, were the starting point for developing the user interface guideline. Generic design guidelines and design patterns were the basis for illustrating good and bad design solutions of the system. In this study, generic guidelines mean guidelines provided e.g. in textbooks such as GUI Bloopers [5], as well as the ISO 9241 standard, parts 10 to 17 [4], and the Research-based Web Design and Usability Guidelines [6]. In addition, design patterns, such as Patterns for Effective Interaction Design by Tidwell [11] and Interaction Design Patterns by van Welie [12], were included in this study. The guideline was written in Finnish because the company is a small national organisation.
The guideline was developed iteratively based on empirical findings. The first version of the guideline was implemented as text and picture examples taken from the generic design guidelines and design patterns. However, the use of such general examples caused problems: overly general text descriptions and example pictures were not easy to understand and use. The company pointed out that it preferred fewer general pictures from the literature and more picture examples from its own system. After three iteration rounds and three meetings (which served as evaluation sessions), the guideline was accepted by the company. Depending on the occasion, 4 to 5 researchers and two people from the company, a software designer and a development manager, participated in the meetings. In these sessions, all participants followed the content of the guideline on a wide screen. User feedback was gathered through the researchers’ observations and through discussions with the company. The final version of the guideline provided 28 individual guides. Each of them included a text description of the problem or good design solution, sample picture(s) from the company’s own system only, and a recommendation for a better design solution or a note that the existing system already has a good design solution.
3.1 Iteration 1: Preliminary Version
The first iteration started with a literature review of existing guidelines, principles, and patterns. Because a large number of different guideline collections were found, some exclusions were made. It was decided to include only well-known guideline and pattern collections in this guideline (e.g. GUI Bloopers [5], ISO 9241 standard [4], Research-based Web Design and Usability Guidelines [6], Patterns for Effective Interaction Design by Tidwell [11], and Interaction Design Patterns by van Welie [12]). A further purpose of this iteration was to find only those guidelines that the CMS did not follow.
The first version of the guideline was produced as text and picture examples taken from the original generic design guidelines and design patterns. This preliminary version consisted of only four guides, because user (company) feedback was needed before investing too much time-consuming work. Figure 1 shows an example of an individual guideline. The guideline was structured as a title, a description of how the design should be done, and finally an example picture. The title (number 1 in Fig. 1) was translated into Finnish, but the English equivalent was also included. The textual description (number 2 in Fig. 1) was translated from the original source into Finnish only. The example picture was included in its original form. This example was taken from the Research-based Web Design and Usability Guidelines [6].
Fig. 1. An example of the first version of the guideline. (1) Title of the individual guideline, (2) longer description of the followed guide (translated from English into Finnish from the original source), (3) an example picture from the original source. The numbers (1-3) were added to this picture to clarify the structure of the example.
The first version of the guideline was presented in a meeting with the company. The general examples in this version caused problems: overly general guidelines with general example pictures were not easy to understand and use. The company pointed out that it preferred fewer general pictures from the literature and more picture examples from its own system. Further feedback was that the guideline should be concrete and form a logical whole, and that more individual guides should be included in the next version.
3.2 Iteration 2: Restructured Version
After the feedback on the preliminary version, we considered how to organise the content of the guideline logically. The second version of the guideline was implemented in the same way as the first version, but using picture examples from the company’s own system. In addition, the structure of an individual guideline was made clearer and more logical, with numbered guides. Each individual guideline followed the same structure: (1) a numbered title of the guideline, (2) a description of the followed guideline, (3) a description of the problem found, (4) a description of the proposed solution, and (5) an example picture. In this version, the number of individual guidelines increased: the restructured version consisted of eight guides and four proposed guides with a title only. The guideline was still at proposal level in this iteration, because the company’s feedback was needed before continuing its development. Figure 2 shows an example of an individual guideline from the second iteration, based on the ISO 9241-12 standard [4].
Fig. 2. An example of the second version of the individual guideline: Initial position for entry fields. (1) The numbered title, (2) description of the followed guide, (3) problem found in the company’s system, (4) proposed solution, (5) an example picture from the company’s system. The numbers (1-5) were added to this picture to clarify the structure of the example.
The second version of the guideline was presented in a meeting with the company. The company found this style of implementation quite clear and illustrative. However, the company proposed increasing the number of individual guidelines, e.g. for pop-up menus, wizards, and error messages. After this iteration, it was also concluded that the structure of the individual guideline should be specified more precisely.
3.3 Iteration 3: Superfine Version
The primary focus of the third iteration was to add examples of different user interface elements. Instead of only poor solutions, examples of well-implemented solutions of the system were also included in the guideline. The structure of the individual guideline was also revised. This version of the guideline consisted of an abstract, an introduction, a table of contents, and several chapters of guidelines. It included 28 individual guides on 38 pages (in MS Word). The structure of an individual guideline was again made clearer and simpler. Each individual guideline followed the same structure: the title, an example from the system with a problem description and picture, and a guideline/solution for the problem. Figure 3 shows an example of an individual guideline from the third iteration, based on the ISO 9241-13 standard [4].
Fig. 3. An example of the final version of the individual guideline: “Error prevention and error messages”. (1) The numbered title, (2) an example from the company’s system including the problem and picture, (3) guideline/solution for the problem. The numbers (1-3) were added to this picture to clarify the structure of the example.
This version of the guideline was very well received and appreciated by the company. The representatives of the company commented in the meeting (translated into English): “[the document is] concrete guideline”, “[this guideline is] superfine because of using examples of our own system”, “we will also go through [the guidelines] with our business partner” and “the guidelines will ensure of basic quality of usability to our company”.
The third version of the guideline was accepted by the company. Only a few spelling errors had to be corrected for the final deliverable.
4 The Tailored Guideline The final version of the guideline is concrete presentation for a specific user interface design in a specific company. In this project, the deliverable form of the guideline was a Word-document by the email and also same document in a paper printout. Of course the deliverable form depends on the users needs. The tailored guideline supports development of user interface design in a small-sized software development company. 4.1 Structuring the Model of Guideline Proposed structure of guideline consists of the title with identified number, an example of the user’s own system with description of the problem/well-designed solution as well as the picture, and also short description of general guideline and solution to the specific problem (solution not needed if the example is well-designed). Figure 4 shows the simple model of the structure.
Fig. 4. A proposed simple model of an individual guideline
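The model in Fig. 4 can also be written down as a simple record. The following is a minimal sketch in Python and is not part of the original project. The class name, the field names and the example content (including the guideline number and the file path) are hypothetical and only illustrate the proposed structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndividualGuideline:
    """One entry of the tailored guideline, following the proposed structure."""
    number: int                      # identifying number shown in the title
    title: str
    example_description: str         # problem or well-designed solution in the company's own system
    example_picture: str             # path to a screenshot (hypothetical file name)
    general_guideline: str
    solution: Optional[str] = None   # omitted when the example is already well designed

    def render(self) -> str:
        """Return the guideline as plain text: title, example, guideline, optional solution."""
        parts = [
            f"{self.number}. {self.title}",
            f"Example: {self.example_description} [picture: {self.example_picture}]",
            f"Guideline: {self.general_guideline}",
        ]
        if self.solution is not None:
            parts.append(f"Solution: {self.solution}")
        return "\n".join(parts)


# Hypothetical content; only the title is taken from Fig. 3.
guide = IndividualGuideline(
    number=12,
    title="Error prevention and error messages",
    example_description="The save dialog reports a failure without saying why.",
    example_picture="screenshots/save_error.png",
    general_guideline="Error messages should state what went wrong and how to recover.",
    solution="Name the invalid field in the message and keep the entered data.",
)
print(guide.render())
```

Making the solution optional mirrors the remark that no solution is needed when the example shows a well-designed part of the system.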
4.2 Criteria for Making the Tailored Guideline
Proposed criteria for making the tailored guideline:
1. The form of the generic guidelines has to be brought closer to the user (examples have to come from the user's own system and include a description of the problem with a picture, and a guideline/solution for correcting it)
2. Guidelines have to be concrete
3. Guidelines have to be sufficiently extensive but not too long
4. Guidelines have to include poor solutions from the system as well as well-designed solutions
5. The form of the deliverable has to be decided case by case
6. Iteration is needed when developing tailored user interface guidelines
5 Discussion
One of the ideas of the case project was to provide long-term usability knowledge for the case company. For this reason, a tailored user interface design guideline was developed for the case company, although Mosier and Smith suggested that it is not appropriate to make specific guidelines [10]. However, this study suggests that specific guidelines are needed, at least for a small company. Because there is a large number of general guidelines, software designers have difficulty finding the right guidelines for their specific needs. The general guidelines are also often too general to be used in a specific context. The tailored individual guideline was built with a simplified structure and a fairly short length. The simplified structure was also in line with the expectations of the company. It is important to use examples from the company’s own system to describe the problems. It is also important to note that the picture examples in generic guidelines confused the developers, which is why sample pictures should come only from their own system. It was found that the form of the deliverable is case-specific. The deliverable could be a paper document, a Word document, a Web page, etc. Thus, no specific tools are needed.
This research found that developing a tailored user interface guideline is time-consuming, because the development is based on the findings of the product training given by the company, a requirement specification workshop, an expert evaluation, and four usability tests. The large number of existing general guidelines also made it challenging to find appropriate general guidelines for this work. However, the discussions with the company between iteration rounds were useful for deciding what to include in the guideline. The proposed criteria are useful when developing a guideline for a company without knowledge of usability issues, but they could perhaps be too restrictive for a company with usability knowledge (e.g. the guideline should be substantially more extensive than the one developed in this work). The proposed simple model of an individual guideline seemed to work in this project. However, it needs more study in further research. The initial criteria should also be defined more specifically. In the future, it would be interesting to see how useful the developed guideline is in the case project after six to twelve months. This study was one approach to teaching guidelines to software developers by making a tailored guideline. Future work will therefore also include studies on how existing guidelines and patterns should be taught to software designers and to other groups such as students.
6 Conclusions
This research concluded that adopting a tailored user interface guideline is more appropriate for software developers than generic guideline collections because of its understandability and expression. The most important thing is to include examples from the developers' own system in the guideline.
Acknowledgments. I would like to thank the Kätsy project for providing a research environment. I also thank Dr. Timo Jokela, Kari-Pekka Aikio, Niina Kantola, and Mauri Myllyaho for feedback and comments on the developed guideline. In addition, this work would not have been possible without the software designers at the software development organization.
References
1. Apple Computer Inc.: Apple Human Interface Guidelines (2006) Last accessed 2007-02-15, http://developer.apple.com/documentation/UserExperience/Conceptual/OSXHIGuidelines/index.html
2. Deng, J., Kemp, E., Todd, E.G.: Managing UI pattern collections. In: Proceedings of the 6th ACM SIGCHI New Zealand chapter’s international conference on Computer-human interaction: making CHI natural, July 07-08, 2005, Auckland, New Zealand, pp. 31–38 (2005)
3. Henninger, S., Haynes, K., Reith, M.W.: A Framework for Developing Experience-Based Usability Guidelines. In: Proceedings of DIS ’95, pp. 43–53. ACM Press, New York (1995)
4. International Standards Organization: ISO 9241: Ergonomic requirements for office work with visual display terminals. Geneva, Switzerland (1999)
5. Johnson, J.: GUI Bloopers: Don’ts and Do’s for Software Developers and Web Designers. Morgan Kaufmann, San Francisco (2000)
6. Koyani, S.J., Bailey, R.W., Nall, J.R.: Research-based Web Design and Usability Guidelines, Dept. of Health & Human Services, National Institutes of Health Publication 03-5424, National Cancer Institute, Washington, DC (2006) Last accessed 2007-02-13, http://www.usability.gov/pdfs/guidelines.html
7. Mariage, C., Vanderdonckt, J.: Creating Contextualised Usability Guides for Web Sites Design and Evaluation. In: Proceedings of 5th Int. Conf. on Computer-Aided Design of User Interfaces CADUI’2004 (Funchal, 12-16 January 2004), Kluwer Academics, Dordrecht (2004)
8. Mariage, C., Vanderdonckt, J., Pribeanu, C.: State of the Art of Web Usability Guidelines. In: Proctor, R.W., Vu, K.-P.L. (eds.) The Handbook of Human Factors in Web Design (Chapter 41), Lawrence Erlbaum Associates, Mahwah (2004)
9. Microsoft Corporation: The Windows interface guidelines for software designers. Microsoft Press, Redmond, WA (1995)
10. Mosier, J.N., Smith, S.L.: Application of Guidelines for Designing User Interface Software, in Behaviour and Information Technology, vol. 5(1), pp. 39–46 (January-February 1986)
11. Tidwell, J.: Designing Interfaces – Patterns for Effective Interaction Design (2006), Last accessed 2007-02-13, http://designinginterfaces.com/
12. Welie, M.v.: Patterns in Interaction Design (2001) Last accessed 2007-02-13, http://www.welie.com/index.html
Determining High Level Quantitative Usability Requirements: A Case Study Niina Kantola and Timo Jokela P.O. Box 3000 90014 Oulu University, Finland {niina.kantola, timo.jokela}@oulu.fi
Abstract. High-level quantitative usability requirements were determined for a public health care system. The requirements determination process was iterative, and the requirements were refined step-by-step. The usability requirements are categorized first through the main user groups, then by the services, and finally by specific usability factors.
Keywords: Usability requirements, health care systems, quantitative requirements.
1 Introduction
It is generally regarded as good project management practice to define quantitative requirements for system quality characteristics. Quantitative, measurable quality requirements provide a clear direction of work and acceptance criteria for a development project. In practice, usability requirements are quite seldom among the quantitative requirements in development projects. One of the consequences of not defining usability requirements is that other objectives dominate and usability is considered only as a secondary objective of a project. The obvious consequence is a product with usability problems. Our case study is a system development project where the city of Oulu is the purchaser of a system, and a consortium of two software development companies will develop the system. The system to be developed is a healthcare system. The goal is that it will be extensively used by the citizens. Healthcare professionals (doctors, nurses, etc.) would naturally also be users of the system. To make usability a true issue in the project, it was decided at the beginning of the project that measurable high-level usability requirements would be determined. In this paper, we present how we approached usability requirements determination in order to define the requirements at a high level of abstraction but still in a measurable way, and what the results were.
are we measuring? How many do we measure? How do we present the measures? What measures do we take? Topics of this kind were discussed, for example, in the special issue on measuring usability of Interactions magazine (Interactions Nov + Dec 2006). According to [5], discussions have recently recurred on which measures of usability are suitable and on how to understand the relation between different measures of usability. The literature recognizes several usability attributes that can be measured. ISO 9241-11 [6] defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use”. In the early phases of development, measures of effectiveness, efficiency and satisfaction should be selected, and acceptance criteria based on these measures established. These attributes are generally measured on different scales, such as task completion rates, average time to task completion and average task satisfaction scores. Acceptance criteria may include separate definitions of the target level and the minimum acceptable level [12]. It is also possible to use different scales, for example worst, planned, best and current levels [15]. Other measurable usability attributes include learnability, memorability, error, affect, helpfulness and control [11, 9]. There are also attempts to standardise traditional usability metrics on a uniform scale (e.g. [14]). Several measurable usability attributes exist, but there are no clear guidelines on how the determination of measurable, quantitative requirements should be organized and managed. Jokela [7] points out that the existing literature mainly focuses on describing and exploring the concepts and formats related to the definition of usability and the contents of a usability requirements document. Some guidelines are presented, for example, by Wixon and Wilson [16], Nielsen [11] and Mayhew [10].
Further, there exist only very few empirical research reports on quantitative usability requirements methods in practice. One of the few is the report by Bevan et al. [3], who conducted case studies on quantitative usability evaluations following the Common Industry Format for usability testing, CIF [1]. However, the methodological aspects are not discussed in detail in their report. Jokela et al. [8] describe a case study where quantitative requirements played a key role in the development of a new user interface for a mobile phone. Because of the limitations of existing methods, they developed tailored methods for determining and evaluating quantitative usability requirements. To support the process of defining usability requirements and usability criteria, a working group sponsored by the National Institute of Standards and Technology (NIST) has recently developed a Common Industry Specification for Usability - Requirements (CISU-R). It is still in draft form, and it aims to define the content of usability requirements rather than to require a specific process by which they are gathered [13]. CISU-R has three parts: the context of use, usability measures and the test method. Scenarios of use play an important role in the process. In the first part, scenarios are used to specify how users carry out their tasks in a specified context. In the second part, usability measures are provided for the defined scenarios of use. The tasks (scenarios) selected should be those that are the most frequent or the most critical to the business or the user [13]. Determining whether the quantitative requirements have been achieved can be done through a usability test. Also user preference questionnaires provide a subjective
metric for the related usability attribute. Usability can also be quantitatively evaluated with theory-based approaches such as GOMS and the keystroke-level model, KLM [4]. Hornbæk [5] has reviewed current practice in how usability is measured. He has also analysed problems with the measures of usability employed. According to his analysis, the problems include the following: measures of the quality of interaction are used in only a few studies; measures of learning and retention of how to use an interface are rarely employed; measures of users’ satisfaction with interfaces are in disarray; and readily available validated questionnaires are ignored. Based on his review, he proposes several challenges with respect to measuring usability [5].
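To make the measures of effectiveness, efficiency and satisfaction more concrete, the following minimal Python sketch computes a task completion rate, an average time to task completion and an average satisfaction score from test sessions, and compares a measure with a minimum acceptable level and a target level. The session data, the 1-7 rating scale, the threshold values and all names are hypothetical; the sketch illustrates the kind of calculation involved and is not a method taken from the cited literature.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    completed: bool       # effectiveness: did the participant complete the task?
    seconds: float        # efficiency: time spent on the task
    satisfaction: int     # satisfaction: post-task rating, assumed here to use a 1-7 scale


def summarize(sessions):
    """Return completion rate, mean time of completed tasks, and mean satisfaction."""
    done = [s for s in sessions if s.completed]
    return {
        "completion_rate": len(done) / len(sessions),
        "mean_time": mean(s.seconds for s in done) if done else None,
        "mean_satisfaction": mean(s.satisfaction for s in sessions),
    }


def judge(value, minimum, target):
    """Classify a higher-is-better measure against minimum and target levels."""
    if value >= target:
        return "target met"
    if value >= minimum:
        return "acceptable"
    return "below minimum"


# Hypothetical sessions of one test task.
sessions = [
    Session(True, 95.0, 6),
    Session(True, 120.0, 5),
    Session(False, 300.0, 2),
    Session(True, 85.0, 6),
]
result = summarize(sessions)
print(result)
print(judge(result["completion_rate"], minimum=0.7, target=0.9))
```

For a lower-is-better measure such as task time, the comparisons in the hypothetical `judge` function would have to be reversed.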
3 Flow of the Case Study
The case system is a health care system whose aim is to provide web-based health services to the citizens of the city. Because use of the services will be voluntary for the citizens, usability is a key success criterion for the system. Therefore, it is essential to explicitly define the usability requirements for the system. A typical way of defining usability requirements is by user task performance. Because there are typically many kinds of users and many user tasks, the number of requirements easily becomes large. The number of requirements should not be too large, but the requirements should still depict the required usability well. In practice, the requirements were determined in a qualitative and iterative way. The steps were:
1. The available documentation of the project was examined.
2. The key persons behind the project idea were interviewed; these persons held various managerial positions in the city and at the development companies. The goal of the interviews was to get an understanding of the planned use of the system and of the critical success factors.
3. Thereafter, an interpretation of the interviews was made, and the first proposal of usability requirements was produced. The requirements included (1) three main user categories and (2) two to three usability requirements for each category.
4. This first set of requirements was presented to a small working group of the project steering group. The discussion revealed that some updates were needed. For example, the appropriate number of user categories should be four (not three).
5. Based on the feedback from the working group, the requirements were revised.
6. The revised set of requirements was then presented to a larger steering group of the project, with a larger number of participants. Another problem in the requirements was noticed in the meeting. An additional requirement was added to the requirements “on-line”, and thereafter the requirements were approved.
In summary, the research method was an iterative and constructive process. An artifact (the requirements) was constructed, based on usability experience and on the data that had been gathered through documentation and interviews. The artifact was evaluated and refined two times before being accepted.
4 Result: The Usability Requirements
As a result, we have defined a set of usability requirements. The requirements identify the main user categories, and a set of quantitative usability requirements (= 1 to 4 requirements) is defined for each category. The main user categories are:
• “Customers”, i.e. the citizens of the city
• “Professionals”, i.e. the healthcare personnel of hospitals and health centers
In the following, we discuss the usability requirements for each of the two main user categories separately.
The “Customer” Category
In the “Customer” category, the following main services of the health care system were identified:
• Proactive healthcare: information about proactive healthcare issues, such as weight control, nutrition, and physical training
• Occasional healthcare problems: information for self-assisted care (how to act in occasional healthcare problems such as occasional fever, flu, small accidents, etc.)
• Chronic diseases: support for self-assisted care for diseases such as diabetes, asthma and arterial hypertension
Four main user groups within the “Customer” category were identified:
• Parents of small children
• Young people (teenagers, students)
• Adults
• Seniors
The relationships between the services and the user groups are illustrated in Table 1. One can see that the single most important service/user group segment is the “chronic diseases” service used by “seniors”. On the other hand, chronic diseases, for example, are quite rare in the younger user groups.
Table 1. Services and the users of the “Customer” category of the healthcare system. The higher the number, the more important the user group.
User category | Proactive healthcare | Occasional healthcare problems | Chronic diseases
Parents of small children | 1 | 2 | 1
Teenagers, students | 1 | 2 | 1
Adults | 2 | 2 | 2
Seniors | 2 | 1 | 3
The representative user groups for each service were chosen to be the ‘demanding’ ones (= if these user groups can use the service, then one can assume that the other user groups can use it, too)¹:
• Proactive healthcare: adults (including parents of small children)
• Occasional healthcare problems: adults (including parents of small children)
• Chronic diseases: seniors
The usability requirements for the services “Occasional healthcare problems” and “Chronic diseases” are shown in Table 2 and Table 3. One can see, for example, the importance of positive first-time usage in the “Chronic diseases” service: 9 users out of 10 should be able to use the system and have a positive first experience.
Table 2. Usability requirements: Occasional healthcare problems
Description | Goal | Measuring means
The user finds instructions to those typical sicknesses and accidents that one can care by him or herself (a separate specification of those sicknesses and accidents exists) | 50% of users find instructions without contacting the health care personnel | Usability tests; post-release follow-up

Table 3. Usability requirements: Chronic diseases
Description | Goal | Measuring means
The user needs to experience the system useful and easy to use | 9 users out of 10 can perform the routine tasks related to his/her sickness correctly, and find the experience positive | Usability test
The users should continue to use the system on daily basis | 9 users out of 10 regularly use the system | Follow-up studies
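Goals of the form “9 users out of 10” or “50% of users” can be checked mechanically once test or follow-up results are available. The following Python sketch shows one way to do this under simple assumptions. The goal shares come from the requirements above, whereas the short labels, the function name and the observed counts are hypothetical.

```python
from fractions import Fraction

# Goal shares taken from the requirements; the short labels are hypothetical.
GOALS = {
    "occasional problems: instructions found without contacting personnel": Fraction(1, 2),
    "chronic diseases: routine tasks correct and experience positive": Fraction(9, 10),
    "chronic diseases: regular use": Fraction(9, 10),
}


def meets_goal(successes: int, participants: int, goal: Fraction) -> bool:
    """Return True if the observed share of successful users reaches the goal."""
    if participants <= 0:
        raise ValueError("at least one participant is required")
    if not 0 <= successes <= participants:
        raise ValueError("successes must be between 0 and participants")
    # Exact rational comparison avoids floating-point rounding at the boundary.
    return Fraction(successes, participants) >= goal


# Hypothetical observations: (successes, participants) per requirement.
observed = {
    "occasional problems: instructions found without contacting personnel": (6, 10),
    "chronic diseases: routine tasks correct and experience positive": (8, 10),
    "chronic diseases: regular use": (18, 20),
}

for name, (ok, n) in observed.items():
    status = "met" if meets_goal(ok, n, GOALS[name]) else "not met"
    print(f"{name}: {ok}/{n} -> {status}")
```

The sketch only compares observed shares with the goals; it says nothing about how many participants are needed for such a comparison to be reliable.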
The service “Proactive health care” was identified as a separate service only later, and the goals and measuring means have not been determined yet (Table 4).
¹ It is assumed, however, that the users have used the internet (web).
Table 4. Usability requirements: Proactive health care
Criterion | Description | Goal | Measuring means
Easy-to-find programs | The user finds instructions to proactive health care programs, as appropriate to him/her | |
First time usage | Taking the system must be very easy | |
Every day use | The users should continue to use the system on daily basis | |
The “Professionals” Category
Several user groups were identified within this category:
• Doctors
• Nurses
• Public health nurses
• Other personnel
The usability requirements for these different user groups, however, were consolidated into one table, Table 5. At this stage, it was not found necessary to define the requirements separately for different services either.
Table 5. Usability requirements: Professionals
Criterion | Description | Goal | Measuring means
Learnability | Can be learnt without training | 9 experienced professionals out of 10 learn how to correctly carry out the routine tasks without training | Usability test
Efficiency | The users need to be able to quickly carry out time-critical tasks | Users can carry out time-critical tasks (which need to be identified) within the pre-defined time limits | Usability test
Subjective satisfaction | Pleasant to use regularly | 9 users out of 10 rate the system 1 point (scale 1…7) more pleasant to use than a reference system (= a system widely used in hospitals and health care centers) | Satisfaction measurement questionnaire
5 Conclusions
In this study, a natural way of determining the requirements was to start from the user categories. This is probably not very surprising: “who are your users” is the key question when designing for usability. In all, the quantitative usability requirements determined in this study fall into the following hierarchical categorization:
• First by the main user groups (“Customers”, “Professionals”)
• Then by the services (“Occasional health problems”, “Chronic diseases”, etc.)
• Finally by specific usability factors (“Learnability”, “Subjective satisfaction”, etc.)
We find that the usability requirements determined in this study have some new features:
• The overall idea of determining ‘high level’ usability requirements. The requirements outlined in Section 4 are defined quantitatively but abstractly, at the level of services. User-task-based usability requirements could be determined without a detailed user task analysis. The ‘routine tasks’ are not determined at this stage and need to be determined later.
• This kind of hierarchical categorization is, to our knowledge, quite new. Typically, usability requirements are “just” a set of individual requirements [2].
• The types of appropriate usability requirements are quite different between the different services and user groups.
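The hierarchical categorization can be illustrated as a nested mapping from user group to service to usability factor. The following Python sketch uses the criteria of the “Proactive health care” service and of the “Professionals” category as example content. The label “All services”, the shortened goal texts and the function names are assumptions made for this illustration, and `None` marks goals that have not yet been determined.

```python
# user group -> service -> usability factor -> goal (None = not yet determined)
REQUIREMENTS = {
    "Customers": {
        "Proactive health care": {
            "Easy-to-find programs": None,
            "First time usage": None,
            "Every day use": None,
        },
    },
    "Professionals": {
        # Hypothetical label: the requirements were not split by service for this group.
        "All services": {
            "Learnability": "9 of 10 experienced professionals learn the routine tasks without training",
            "Efficiency": "time-critical tasks are carried out within pre-defined time limits",
            "Subjective satisfaction": "9 of 10 users rate the system 1 point (scale 1-7) above a reference system",
        },
    },
}


def flatten(tree):
    """Yield (user group, service, factor, goal) rows from the nested mapping."""
    for group, services in tree.items():
        for service, factors in services.items():
            for factor, goal in factors.items():
                yield group, service, factor, goal


def undetermined(tree):
    """List the (user group, service, factor) entries that still lack a goal."""
    return [(g, s, f) for g, s, f, goal in flatten(tree) if goal is None]


for row in flatten(REQUIREMENTS):
    print(row)
print("Goals still to be determined:", len(undetermined(REQUIREMENTS)))
```

Flattening the hierarchy gives the “just a set of individual requirements” view, while the nested form keeps the grouping by user group and service visible.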
6 Discussion of Results
In this study, preliminary quantitative usability requirements for a public health care system were determined. Overall, this study is one of the few case studies on quantitative usability requirements. A meaningful set of quantitative, high-level usability requirements could be determined, which was not at all obvious at the beginning of the research. The requirements determination process was iterative, and the requirements were refined step-by-step. The usability requirements are categorized first through the main user groups, then by the services, and finally by specific usability factors. As research contributions, we find (1) the idea of having high-level usability requirements determined at the level of services; (2) the hierarchical organization of the requirements; and (3) the finding that the types of usability requirements may be quite different for different categories of users and services.
One should understand that the approach for defining usability requirements described in this paper is not proposed to be applicable as such to other development contexts. For example, the authors ended up with quite a different set of usability requirements in the context of a development project for a user interface of a mobile phone [8]. Another limitation of this study is that the health care system is still very much under development, and we do not yet have data on the appropriateness and usefulness of the requirements. These issues are the topic of other papers in the future.
For practitioners, the results indicate that the appropriate set of usability requirements is dependent on the specific application and development context. One should try to define requirements that truly depict the usability of the system or product under development. Research on quantitative usability requirements is quite limited. There is room and need for different kinds of research efforts, from better theoretical understanding to the development of effective practical methods.
References
1. ANSI. Common Industry Format for Usability Test Reports. NCITS 354-2001 (2001)
2. Bevan, N.: Practical Issues in Usability Measurement. ACM interactions 13(6), 42–43 (2006)
3. Bevan, N., Claridge, N., Athousaki, M., Maguire, M., Catarci, T., Matarazzo, G., Raiss, G.: Guide to specifying and evaluating usability as part of a contract, version 1.0. PRUE project. London, Serco Usability Services: 47 (2002)
4. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale (1983)
5. Hornbæk, K.: Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies 64(2), 79–102 (2005)
6. ISO/IEC. 9241-11 Ergonomic requirements for office work with visual display terminals (VDT)s - Part 11 Guidance on usability. ISO/IEC 9241-11: 1998 (E) (1998)
7. Jokela, T.: Guiding designers to the world of usability: Determining usability requirements through teamwork. In: Seffah, A., Gulliksen, J., Desmarais, M. (eds.) Human-Centered Software Engineering. Kluwer HCI series (2005)
8. Jokela, T., Koivumaa, J., Pirkola, J., Salminen, P., Kantola, N.: Methods for quantitative usability requirements: a case study on the development of the user interface of a mobile phone. Personal and Ubiquitous Computing 10(6), 357–367 (2006)
9. Kirakowski, J.: The Software usability measurement inventory: background and usage. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, I.L. (eds.) Usability Evaluation in Industry, pp. 169–177. Taylor & Francis, London (1996)
10. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (1999)
11. Nielsen, J.: Usability Engineering. Academic Press, Inc. San Diego (1993)
12. NIST. Proposed Industry Format for Usability Requirements. Draft version 0.62. 8-Aug-04 (2004)
13. NIST. Common Industry Specification for Usability - Requirements (2006) (Retrieved 16.2.2007) http://zing.ncsl.nist.gov/iusr/
14. Sauro, J., Kindlund, E.: A Method to Standardize Usability Metrics Into a Single Score. In: Conference on Human Factors in Computing Systems, Portland, Oregon, USA, pp. 401–409. ACM Press, New York (2005)
15. Whiteside, J., Bennett, J., Holtzblatt, K.: Usability Engineering: Our Experience and Evolution. In: Helander, M. (ed.) Handbook of human-computer interaction. Amsterdam, North-Holland, pp. 791–817 (1988)
16. Wixon, D., Wilson, C.: The Usability Engineering Framework for Product Design and Evaluation. In: Helander, M., Landauer, T., Prabhu, P. (eds.) Handbook of Human-Computer Interaction, pp. 653–688. Elsevier Science B.V, Amsterdam (1997)
Why It Is Difficult to Use a Simple Device: An Analysis of a Room Thermostat Sami Karjalainen VTT, P.O. Box 1000, 02044 VTT, Finland [email protected]
Abstract. A diversity of usability problems with office thermostats were found in a preceding study. In this paper, the reasons behind the problems are studied by analysing a room thermostat. The analysis shows that a substantial amount of information is needed to use a simple thermostat, and the system image of the thermostat does not deliver the information. From the viewpoint of the analysis, it is not surprising that office occupants have serious problems with thermostats. Keywords: thermostat, knowledge, information needs, user interface design.
This paper concentrates on the reasons behind the problems: why is it difficult to use a room thermostat? The paper presents an analysis of a room thermostat and the knowledge a user must have to be able to use the room thermostat with effectiveness and efficiency. The analysis is based on the experiences gained in interviewing 27 office occupants in 13 Finnish offices. Twenty-three of the occupants had a room thermostat in their office. All the room thermostats were non-programmable and simple. The interviewees had been working in their present rooms from one-and-a-half months to more than ten years, but they still had serious problems with the thermostats in the offices.
2 A Typical Example of a Room Thermostat
Many kinds of room thermostats have been designed. Many of them are very complex and it is clear that they cannot be used without a manual. This study, however, concentrates on simple room thermostats. I have chosen a typical example of a room thermostat for a closer examination (see Fig. 1). The model presented in Fig. 1 is common in Finnish offices. Several companies manufacture practically similar versions of the room thermostat.
Fig. 1. An example of a room thermostat
The room thermostat has a dial for adjusting room temperature set point. The scale presents no temperature values, but only the symbols "+" and "–". To increase the room temperature, the user should turn the dial to the "+" direction, and to decrease the room temperature the dial should be turned to the opposite direction ("–"). The room thermostat presents a light symbol in the upper right corner of the interface. If the light is red, the system is increasing the room temperature. Correspondingly, a green light means that the system is decreasing the room temperature at the moment. A blank light denotes a stable situation.
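The meaning of the light symbol can be summarised in a few lines of code. The following Python sketch is only an illustration of the behaviour described above. It assumes that the light depends on a comparison between the measured room temperature and the set point, and the function name, the tolerance band and its value are hypothetical rather than taken from the actual device.

```python
def indicator_light(room_temp: float, set_point: float, tolerance: float = 0.5) -> str:
    """Return the light colour for a measured temperature and a set point (both in °C).

    The tolerance band around the set point is an assumption made for this sketch.
    """
    if room_temp < set_point - tolerance:
        return "red"      # the system is increasing the room temperature
    if room_temp > set_point + tolerance:
        return "green"    # the system is decreasing the room temperature
    return "blank"        # stable situation


for temp in (19.0, 21.2, 23.0):
    print(temp, indicator_light(temp, set_point=21.0))
```

Note that the user of the real device sees neither the set point nor the room temperature as numbers, only the “+” and “–” symbols and the light.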
The room thermostat can be connected to a cooling or heating system, or it can be shared with both systems. Most typically in offices it is connected to a cooling system, for example, to a cooled beam system or a fan convector system. A separate heating system typically exists in Finnish offices. The heating systems typically include thermostatic valves for user adjustment.
3 Information Needs for Using the Room Thermostat
The use of the room thermostat (Fig. 1) is analysed in Table 1. The table presents the information needs and possible misunderstandings with the thermostat. It also presents the consequences of the misunderstandings.
Table 1. Information needs for use of the room thermostat
1. Information needed: What is the purpose of the device?
   Correct knowledge for use of the room thermostat: It is a user-adjustable thermostat.
   Possible misunderstanding: The purpose of the device remains unclear. It is not recognised as being for temperature control.
   Consequences of misunderstanding: The room thermostat is not used even in thermal discomfort.
2. Information needed: Are office occupants allowed to touch the device?
   Correct knowledge for use of the room thermostat: Yes. The room thermostat is for the use of occupants.
   Possible misunderstanding: The room thermostat is for service personnel only.
   Consequences of misunderstanding: As above.
3. Information needed: Is the room thermostat active or passive at the moment?
   Correct knowledge for use of the room thermostat: Depends on the cooling/heating system and the current conditions (e.g. season).
   Possible misunderstanding: Passive thermostat is considered to be active, or the other way around.
   Consequences of misunderstanding: Use of a passive system leads to dissatisfaction with the system (or to a placebo effect).
4. Information needed: What do "+" and "–" mean in the interface?
   Correct knowledge for use of the room thermostat: "+" means increase and "–" decrease in room temperature set point.
   Possible misunderstanding: "+" means increase and "–" decrease in cooling power.
   Consequences of misunderstanding: The dial is turned (i.e. the room temperature set point is adjusted) to the wrong direction. This may lead to dissatisfaction with the system.
5. Information needed: How much should the dial be turned to get the desired effect on room temperature?
   Correct knowledge for use of the room thermostat: There is no clear answer, because that depends on the characteristics of cooling/heating system and the current conditions.
   Possible misunderstanding: The adjustable range of room temperature may be understood completely wrong. For example, it may be thought that the room temperature is adjustable with a very large range.
   Consequences of misunderstanding: The dial is turned (i.e. the room temperature set point is adjusted) too little or too much. This may lead to unnecessary adjustments and dissatisfaction with the system.
6. Information needed: (After adjusting the thermostat) Has the room temperature changed to the desired level or is it still changing?
   Correct knowledge for use of the room thermostat: A red light means that the system is increasing the room temperature at the moment. The green light means that the room temperature is decreasing. A blank light means a stable situation.
   Possible misunderstanding: The light is not recognised at all, or the meaning of the light symbol is not understood.
   Consequences of misunderstanding: The user may think, for example, that the room temperature has reached the new level, although it is still changing. This may lead to unnecessary adjustments and dissatisfaction with the system.
All the misunderstandings presented in Table 1 are real and are taken from the contextual interviews with the occupants. Most of the misunderstandings are common in practice. The analysis shows that bad user interface design may easily lead to dissatisfaction with the system or to disuse of the system. The analysis also shows that a lot of information is needed to use a simple thermostat. From the viewpoint of the analysis, it is not surprising that office occupants have serious problems with the thermostats and that the significance of the thermostats for thermal comfort is low, as was found in [1]. Some of the information (Table 1) can be gathered by trial and error. For example, the meaning of "+" and "–" should be easy to learn by experience. However, it is clear that office occupants need instructions for use of the thermostats. Occupants need to understand, for example, whether the thermostat is connected to a cooling or heating system. If the thermostat is connected to a cooling system, the user of the thermostat needs to know when the cooling system is active. It is, however, unrealistic to suppose that office occupants would spend their valuable time on learning the way in which the building works.
4 Incoherent Mental Models

Norman [3] distinguishes three aspects of mental models: the design model, the user’s model and the system image. The designer creates the system image, i.e. the visible part of the system (the user interface including the labels and the documentation), according to the design model. The user is confronted with the system image. The user acquires all knowledge of the system from the system image, and the user’s model is the way the user believes the system operates.

We can drive a car without an understanding of how it actually works. Similarly, we should be able to use the thermostats with only limited knowledge of the cooling and heating systems. Unfortunately, the system image of the thermostat in Fig. 1 does not deliver the information that is needed to operate the thermostat. The design model is not similar to the user’s model, as it should be. The designer has not had a realistic view of the users but has supposed that office occupants have knowledge they do not have in reality.

Misunderstandings with thermostats have earlier been reported by Kempton [2]. He analyzed folk theories for home heating control and found two common theories of
how a thermostatic valve works: a feedback theory and a valve theory. In the feedback theory, the thermostat senses the room temperature. In the valve theory, this sensing is not understood, and the thermostat dial is thought to work like a gas pedal that directly controls the amount of heat.
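The difference between the two folk theories can be made concrete with a small simulation. The sketch below is illustrative only: the one-room heat model, all constants and the function name `simulate` are assumptions introduced here, not taken from Kempton [2] or from the thermostat studied in this paper. Under the feedback theory the dial selects a target temperature and heating stops once the target is reached; under the valve theory the dial is taken to set the heating power directly.

```python
def simulate(dial, theory, steps=600, outdoor=5.0, start=18.0):
    """Return the room temperature (deg C) after `steps` minutes.

    dial   -- dial position between 0.0 and 1.0
    theory -- "feedback" or "valve" (the two folk theories)
    All constants are arbitrary values chosen for illustration.
    """
    max_power = 0.10  # warming at full power, deg C per minute (assumed)
    loss = 0.004      # heat-loss coefficient per minute (assumed)
    temp = start
    for _ in range(steps):
        if theory == "feedback":
            # The dial selects a set point between 15 and 25 deg C;
            # the heater runs at full power only while the room is too cold.
            set_point = 15.0 + 10.0 * dial
            power = max_power if temp < set_point else 0.0
        elif theory == "valve":
            # The dial is read as a gas pedal: more dial, more heat.
            power = max_power * dial
        else:
            raise ValueError("theory must be 'feedback' or 'valve'")
        temp += power - loss * (temp - outdoor)
    return temp
```

In this toy model, the feedback reading settles close to the selected set point, and turning the dial further does not warm the room any faster during the first half hour; only the final temperature differs. Under the valve reading, the final temperature simply rises with the dial position.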
5 Improving the Design of the Thermostat

The analysis shows that for the successful use of the thermostat the user must have a lot of knowledge. Although room thermostats are common in offices, the office occupants do not have that knowledge. This has led to a situation where the thermostats are used very little.

It is clear that the user interface of the thermostat in Fig. 1 could be improved considerably. First, the thermostat should clearly present its purpose. Identifiability can be enhanced by symbols that refer to temperature, e.g. a degree sign, a thermometer, or red and blue colours (denoting warm and cool).

Many of the problems users have could be avoided by just two more modifications to the user interface. If the thermostat had an understandable temperature scale, there would be fewer problems in adjusting the thermostat. The other main improvement concerns the feedback the thermostat gives after a user adjustment. Users need to know whether the system is working to fulfil the request. The feedback is especially important since the rate of room temperature change is slow, because of the thermal inertia of the building materials and of the cooling/heating system itself. Many thermostats do not give any feedback, but the thermostat in question shows a light symbol when the room temperature is changing. However, the light symbols are not intuitively understandable; learning is needed to understand their meaning. The feedback should be presented more clearly (in one way or another), for example by arrow symbols that show the direction of the temperature change.
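As an illustration of the arrow-style feedback suggested above, the following minimal sketch maps the measured room temperature and the set point to a direction symbol. The function name `feedback_symbol`, the tolerance value and the choice of symbols are assumptions made for this example; they do not describe the thermostat in Fig. 1.

```python
def feedback_symbol(room_temp, set_point, tolerance=0.5):
    """Return a symbol showing which way the room temperature is heading.

    tolerance -- band (deg C) around the set point that counts as
                 "target reached"; an assumed value for illustration.
    """
    if room_temp < set_point - tolerance:
        return "↑"  # the system is working to warm the room
    if room_temp > set_point + tolerance:
        return "↓"  # the system is working to cool the room
    return "="      # the requested temperature has been reached
```

A display driven this way would tell the user both that the request has been registered and that the change is still in progress, which the coloured lights alone did not convey to the interviewed occupants.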
6 Conclusion and Future Work

Even a simple device can be very difficult to use if the user does not have the information needed for the use of the device. The designers often overestimate the knowledge the users have, and that overestimation leads to usability problems and dissatisfaction with the system or even disuse of the system, which is the case with the room thermostat analysed in this paper. No specific usability guidelines are available for room temperature controls in the literature. In future work I will concentrate on developing such a guideline.

Acknowledgments. I thank Raino Vastamäki for the picture of the thermostat in Fig. 1.
References

1. Karjalainen, S., Koistinen, O.: User Problems with Individual Temperature Control in Offices. Building and Environment (In Press)
2. Kempton, W.: Two Theories of Home Heat Control. In: Quinn, N., Holland, D.C. (eds.) Cultural Models in Language and Thought, pp. 222–242. Cambridge University Press, Cambridge (1987)
3. Norman, D.A.: The Design of Everyday Things. Basic Books, New York (1988)
Usability Improvements for WLAN Access

Kristiina Karvonen and Janne Lindqvist

Department of Computer Science and Engineering, Helsinki University of Technology, P.O. Box 5400, 02015 TKK, Finland
{Kristiina.Karvonen, Janne.Lindqvist}@tml.hut.fi
Abstract. Wireless Local Area Networks (WLANs) have become a commonplace addition to the normal environments surrounding us. Based on IEEE 802.11 technology, WLANs can now be found in the workplace, at homes, and in many cities’ central districts as open or commercial services. These access points in public areas are called “hotspots”. They provide Internet access in various types of public places such as shopping districts, cafés, airports, and shops. As the hotspots are being used by a growing user base that is also quite heterogeneous, their usability is becoming ever more important. As hotspots can be accessed by a number of devices differing in their capabilities, size, and user interfaces, achieving good usability in accessing the services is not straightforward. This paper reports a user study and usability analysis on WLAN access to discover users’ needs and to suggest enhancements that address the usability problems in WLAN access.

Keywords: WLAN, Usability, user interface design, security, accessibility, authentication.
expect their security to be in place and their privacy protected, and to be in control of what information is disclosed about them. A further difficulty in providing easy-to-use WLAN access is caused by the fact that hotspots can be accessed by a number of devices differing in their capabilities, size, and user interfaces. As a result, achieving good usability in accessing the WLAN and the services it can provide is by no means straightforward.

In this paper, we will look into the current work done in this area, covering both relevant user studies done on hotspots and on other types of WLAN access, such as public WLAN service or private home WLAN access, to discover users’ needs and current usage of the access points, and usability work done to enhance the current solutions. We will also present and evaluate the methodologies used to study the usability of the hotspots and other types of WLAN access, and the usability issues in the controllability and visualisation embedded in current approaches.

The main body of this work consists of a report on a user study and usability analysis conducted to determine the current level of usability of WLAN hotspot access. The work covers a representative selection of earlier usability work done in this area or an area related to it: the relevant user studies, and the usability studies of existing solutions and UIs. It will also look into how various usability methods have been applied to the study of the usability of WLAN hotspot access and discuss their feasibility. The novelty of the work lies in that it not only covers both the usability of the access points themselves and the usability issues in some of the most probable end devices to be used to access the WLANs, but also seeks to point out the usability of security issues involved in this access.

The paper is organized as follows: First, we will give a short presentation of usability and user-centred design in general in regard to mobile usage situations with small devices. We will proceed by presenting the relevant work done in this area, and discuss the state of the existing usability work. We will then present our user field studies, where we searched for and located publicly visible WLAN access points in several locations in two cities in Finland. Since providing access also means dealing with users’ privacy and security issues, we will complement the analysis with a short discussion on the privacy and usability of security in the area of WLAN access.
2 Background

2.1 Security Issues in WLAN Access

The standardized way to secure WLAN access is based on the link layer: the radio traffic between the access point and the user’s device is encrypted. The first version of the WLAN link-layer security architecture, Wired Equivalent Privacy (WEP) [16], was proven to be insecure [12], and a working attack tool was quickly implemented and published [33]. Today, there are free, easy-to-use attack tools downloadable from the Internet for e.g. Windows and Linux that can passively break the WEP protection in a few seconds [2]. The WLAN vendor community decided to solve the problem without the IEEE standardization body and formed an alliance to correct it. The result of the alliance was Wi-Fi Protected Access (WPA), which corrects the deficiencies of WEP. The standard security scheme is IEEE 802.11i [17] which is
also known as WPA2. Although WEP is very insecure, it is still widely used for backwards compatibility reasons.

Since the radio network is a shared medium, anyone in the proximity of the WLAN network can receive the broadcast traffic (with clear and unobstructed space, the WLAN signal can reach more than 500 meters). This allows for a technique called wardriving [8]. The attacker merely drives around the city looking for open or vulnerable networks. The equipment needed is again software downloadable from the Internet [4] and a laptop or even a PDA. A community of security professionals and hobbyists gathered worldwide data, which revealed that out of the 228 537 access points found, 61.6 % were not using link-layer encryption at all [3].

In addition to link-layer encryption, WLAN networks are commonly secured in two ways: MAC address based authentication and application-layer, usually Web based, authentication. MAC address based authentication is used to allow only known computers to the network. However, in practice it provides protection only against benevolent network visitors. Attackers can simply eavesdrop to learn which MAC addresses can access the network and reconfigure their WLAN devices accordingly.

Web based authentication is used to authenticate users to the network. The network might not even use link-layer encryption; instead, the authentication process is secured with TLS in the same way as secured Web sites. Before the connection is authenticated, all traffic originating from the unauthenticated WLAN device is forwarded to the authentication web page. On the page, the user is required to give a correct user name and password, and is then given access to the Internet. It is also common to bind the authentication to the particular MAC address of the device that performed the authentication. This results in problems that we describe in the discussion section.
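The Web based authentication flow and its binding to a MAC address can be sketched as a toy model. The class `CaptivePortal`, its method names, the portal URL and the one-device-per-account policy are hypothetical and introduced only for illustration; they do not describe any particular hotspot product or the networks examined in this paper.

```python
class CaptivePortal:
    """Toy model of Web based WLAN authentication bound to a MAC address."""

    LOGIN_PAGE = "https://portal.example/login"  # hypothetical address

    def __init__(self, accounts):
        self.accounts = dict(accounts)  # user name -> password
        self.bound_mac = {}             # user name -> MAC that logged in

    def login(self, mac, user, password):
        """Authenticate a user and bind the account to the device's MAC."""
        if self.accounts.get(user) != password:
            return False
        # Assumed policy: the account stays bound to the first device used.
        if self.bound_mac.setdefault(user, mac) != mac:
            return False
        return True

    def route(self, mac, url):
        """Forward traffic from authenticated MACs; redirect all others."""
        if mac in self.bound_mac.values():
            return url
        return self.LOGIN_PAGE


portal = CaptivePortal({"guest": "s3cret"})
first_device = "aa:aa:aa:aa:aa:01"
second_device = "bb:bb:bb:bb:bb:02"

before = portal.route(first_device, "http://example.org/")   # redirected
ok = portal.login(first_device, "guest", "s3cret")           # bound to MAC
after = portal.route(first_device, "http://example.org/")    # forwarded
refused = portal.login(second_device, "guest", "s3cret")     # other device
# A device that clones the first MAC looks identical to the gateway.
spoofed = portal.route("aa:aa:aa:aa:aa:01", "http://example.org/")
```

Under these assumptions the model reproduces both weaknesses mentioned in the text: a user who switches devices is locked out, while anyone who copies an authenticated MAC address is let through.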
2.2 User Studies and Usability Testing of WLAN Hotspot Access

The usability work done in areas related to the usability of WLAN access falls under several subcategories of work done in the field of Human-Computer Interaction (HCI). These areas include personal computing, pervasive computing, wireless computing, mobile HCI, and, of course, usability of WLAN access. However, most user studies that we know of focus on hotspot/WLAN access in a specific area, most notably on university campuses [14], [15], [24], [30], [31], in a building-wide local-area wireless network [34], or at a scientific conference [6]. This work does not, however, really give insight into the usability issues embedded in accessing public APs on the fly, since the users were not at all mobile in their usage behaviours. [9] have tracked how public APs were utilized in Manhattan, N.Y., finding some similarities with the campus studies: the users were repeatedly using the same APs, another proof of the relative “immobility” of the subjects. However, finding the APs was again not a part of this study.

[5], [7] have identified the current challenges and future directions for wireless hotspots. They mention, among other things, “device-independence” as one key goal in enhancing the usability of WLAN access; the need for this became obvious in this study also. A further challenge is presented by defining the identity of the user when accessing hotspots: which attributes to use to preserve privacy while at the same time preserving accountability. Existing solutions for easy-to-use WLAN hotspot access include the FriendZone usage study [10], which was, however, not tested with real users. Another interesting
piece of work is presented by [32], who bring up the issue that in mobile usability in general, the work on understanding the interaction between a device and its physical environment is still scarce, and that better utilization of geographical information could also prove beneficial to the locating and naming of WLAN APs. [31] deal directly with how hotspots are currently found, the key outcome being that “word-of-mouth” was currently serving as the primary information source about the location of available hotspots. They identify several usability problems in finding WLAN APs. These include the locating of distant networks, notification of new hotspots, finding and accessing the strongest signal (an approach that could easily be used for malicious purposes also, as reported in [29]), and getting information about hotspots. [11] write interestingly about free WLAN access as part of wireless commons, and what means to use to prevent its misuse.

The usability methods applied to studying WLAN hotspot access usability include the usage of questionnaires, as e.g. in [20], [21], usage tracking and analysis, e.g. [5], [10], [24], and observations and interviews of users [31]. On the basis of the related work, there are several areas in WLAN usability where usability issues can be detected. These are:

• The multitude of devices, differing in form, input mode, processing power, battery life, and screen size/resolution/colour depth [7], [32].
• The relative immobility of users in how the hotspots are currently used and why this is so [9], [26], [31], and how hotspots can be found [7], [31].
• Location privacy, tracking, transactions [1], [13], [15], [21], [23].
• Tradeoffs between usability and security. E.g. according to [23], users require that their transactions over public WLANs be safe, yet they want seamless, automated roaming without need for manual sign-on. What form of authentication is best from a usability point of view [7]?
[25] concentrate on privacy enhancements in AP usage. Further, work on home users’ network access behaviours can be found in e.g. [19], who have evaluated the usage behaviours and UI expectations in a smart home environment with several users over an extended period of time (6 months), where part of the network access was via WLAN. Three devices, a PC, a mobile phone, and a media terminal, were tested, and UI prototypes for these devices were designed and adjusted according to user feedback. It became clearly evident that the user expectations for each device were different, with the mobile phone becoming the most used device for controlling the smart home functionalities despite initial reluctance and suspicion towards its suitability for operating the home. These results may have repercussions for the work at hand, since the initial resentment of the small terminal of the mobile phone was later overcome and this device was preferred. In practice this means that users were willing to trade usability for mobility and personal possession in actual usage.
3 The Usability Study

3.1 Test Setting

We searched for and located publicly visible WLAN access points in several locations in two cities in Finland. The discovered access points were of several different kinds:
part of a publicly available WLAN provided by the city; WLAN access offered by a private vendor, such as a café, as a free service; company WLAN offered for outside visitors; private home WLAN; and WLAN offered by a public institute, such as a local university. The user studies were done by two researchers, one security expert and one usability expert, with three types of end devices: an Apple iBook G4 laptop with the Mac OS X operating system, a Nokia 770 series PDA device, and a Nokia 9500.

3.2 Test Procedure

A cognitive walkthrough method was used to simulate the steps and mindset of an actual mobile user, who would be moving within a district, trying to find and utilize existing WLAN APs. The cognitive walkthrough method is a well-established usability methodology that has been effectively used e.g. in the classical study by Whitten et al. [36]. As a further methodology, expert analysis, also including a heuristic analysis as reported e.g. by [27], was used to analyse the usability problems detected during the usage. The two test persons tested different locations, dispersed across the capital city region in Finland in two cities, Helsinki and Espoo. The searching and accessing procedures were repeated in each location with at least two different devices, in most cases with all three.
4 Discussion

On the basis of the study, we were able to detect several generic usability problems in how the current access points are provided and visualised to the end users, regardless of the device used. These usability problems include the naming of the WLANs available, their actual availability, visualisation of the security and access possibilities of the WLANs, as well as usability problems in managing the connections that arise from the dynamism of the WLAN search functionalities in the tested devices. Next, we will discuss each of these generic issues in detail.

Naming. In the study, it became obvious that there were no standard and intuitive ways to name the various WLANs available. For the most part, the WLANs were named according to the service provider (the company; the university; the city), according to the manufacturer of the WLAN device used (Linksys; Motorola), according to the WLAN owner (“pete”), or with a generic location name (“home network”; “home base”). Possible better naming policies could be derived from [22], where users were asked to name locations in a mobile use situation. The name classifications that came up in that study included 1) generic locations, 2) points of interest and 3) geographical areas. Since these may be the natural and intuitive location names for users, even though the focus of that work is on presence information, utilizing these categorizations in WLAN AP naming might prove beneficial to the overall usability of WLAN access and its understandability.

Actual availability. The found WLAN access points were often not really available. Also, the lists were not updated according to what APs were currently available.
Visualisation of the security and access possibilities. The various types of locks associated with different types of WLAN security features (WEP, WPA) were not intuitive to the users. Further, the WLAN APs were often visually suggestive of being openly accessible, when in reality they were not. In many cases, such as when trying to find open access in a public place, it is futile to show users the secured APs. An improvement would be to let the user choose to list only non-secured WLAN APs.

Managing the connections list. The tested devices had a built-in feature of searching for available access points constantly, regardless of what the user was doing at the same time. It was not possible for the user to stop the search when a desired AP had been found. It was also not possible to organize the list of access points in any way, except by naming policy. The only listing that was easily available (or available at all?) to the users was an alphabetical listing based on the default or user-specified name of the AP. In many cases, this was the worst order, since the list contained all WLAN APs added to the list at any point during usage, including, for example, APs found in another city. A usability improvement would be more advanced ways to arrange the listed APs, for example by preference or by most recent use. Further, the search should be stoppable, and restartable, by user command.

In addition to the generic usability problems discovered, each end device tested also had several usability issues of its own, adding to the usability issues embedded in the WLAN access itself. These concern how connections are established, shown, maintained, managed and accessed via the devices.

Nokia 770. The Nokia 770 was clearly the easiest to use for WLAN access, which is natural since WLAN is its major connectivity type, alongside others such as Bluetooth connections via a mobile phone.
Access to WLAN was rather straightforward, with one-step access from the main screen to the connections. However, the connectivity settings could be changed only via the control panel, not from the connection manager directly. A clear improvement would be to allow connectivity editing from the connection manager as well. Further, the dynamic search process with no ‘pause’ possibility made managing the found APs very difficult, since the list of connections was changing constantly. Adding a ‘pause’ button to the search would improve the experienced usability of the search and list handling. Further, the established and saved APs were listed as a single list in alphabetical order. Because of this, APs that were not accessible at the time could appear as the first items in the list. A clear improvement would be to include multiple ways to organize the list of APs, e.g. according to recent use, location, or current availability. Further, even though the device showed via an icon whether the AP was in fact reachable, the user was able to connect to any AP, go all the way through the process, get an acknowledgement of a successful connection, and only when opening e.g. a web browser get a notice of a failed network connection. A usability improvement would be not to allow connections to APs that the system has detected as unreachable.

Nokia 9500 (Communicator). The Nokia 9500 Communicator offers two ways to start using a WLAN access network. The main screen shows a white W sign on the left if WLAN access is available. The first is "EasyWLAN", where the user is shown a list
of available access points. However, the list does not provide any information about the access points, only the names of those currently available. To get more information, the user must explicitly configure the access point with the process shown in Figure 1. After the access point has been configured, the user can start e.g. a Web browser. The user is prompted to "Select an access point". By default, the list of available network connections also shows e.g. GPRS and other configured network access options. The user thus must remember where to connect.
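The list-handling improvements suggested in this section (showing currently reachable and recently used APs first, optionally listing only non-secured APs, and refusing connections to APs detected as unreachable) can be sketched as follows. The `AccessPoint` record, its field names and both helper functions are assumptions made for illustration; they are not the data model of any of the tested devices.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    name: str
    reachable: bool   # seen in the most recent scan
    secured: bool     # uses link-layer encryption (e.g. WEP or WPA)
    last_used: float  # time of last successful use, 0.0 if never used


def order_for_display(aps, open_only=False):
    """Reachable APs first, then most recently used, then by name."""
    shown = [ap for ap in aps if not (open_only and ap.secured)]
    return sorted(
        shown,
        key=lambda ap: (not ap.reachable, -ap.last_used, ap.name.lower()),
    )


def can_connect(ap):
    """Refuse to start a connection to an AP detected as unreachable."""
    return ap.reachable


saved = [
    AccessPoint("alpha-other-city", reachable=False, secured=False, last_used=500.0),
    AccessPoint("Zeta Cafe", reachable=True, secured=False, last_used=100.0),
    AccessPoint("home base", reachable=True, secured=True, last_used=900.0),
]
all_names = [ap.name for ap in order_for_display(saved)]
open_names = [ap.name for ap in order_for_display(saved, open_only=True)]
```

With a purely alphabetical list, the unreachable AP from another city would be shown first; with the ordering above it moves to the end, and the connection attempt to it is rejected before the user goes through the whole process.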
Fig. 1. The multiple steps required for handling WLAN access in Nokia 9500
With this device, the steps required for managing and starting the WLAN access were quite cumbersome, consisting of multiple non-straightforward steps. Further, the language and terminology used in the UI were quite technical, effectively diminishing an average user’s capability for any WLAN access management. A clear improvement for the usability of WLAN access with the Nokia 9500 would, then, include at least changing the UI language to be more user-friendly, as well as cutting down the steps required to form a WLAN connection in the first place. Further, major visualization enhancements would be desirable. The current icons used for actual access and for the strength of the signal will probably be incomprehensible for most users, especially since no learning effect can be expected if WLAN usage is infrequent.

Apple iBook G4 laptop with Mac OS X. Perhaps one of the biggest problems with the laptop, besides the obvious fact that a laptop-size connection device is not truly feasible in the use situation described in this study, was that it would find only a small percentage of the available APs at each location, as compared with the other two devices. This undermined its reliability as a means of WLAN access in the first place, and left the user frustrated and without any connections, since they would not
be found. In usability there is a saying, “if functionality is not found by the user, it doesn’t really exist”. This truly holds for trying to get WLAN access with a laptop.

A further difficulty was presented by changing the connecting device. In the case of restricted access with a temporary username and password to a publicly available WLAN such as the Helsinki city public WLAN, once the user had logged into the system with one device, it was not possible to change the end device. If the batteries ran out or the user decided to change to a device better suited for browsing the available services, the initial wrong choice would effectively stop the user from accessing the service at all, since it was not possible to log out of the service.

4.1 On the Privacy and Usability of Security in the Area of WLAN Access

The Privacy Enhancement Technologies (PET) address four essential requirements for privacy. These are: anonymity, pseudonymity, unlinkability, and unobservability [25]. [25] describe the current state of privacy protection in existing solutions for WLAN access as “complex, non-adaptive, intrusive for the user and not context-aware”. Further, these solutions use only a very limited set of possible user identification parameters for accountability, like ID, address, or location. In their approach, medical information of the user, collected from a body sensor network, is protected by automatic filtering when the user is using a hotspot service. However, the system described also allows for user control by enabling the user to assign preferred levels of privacy to the data. The different aspects of the data whose privacy needs to be protected include location, context, identity of the user, and private information available. The user of the AP should, then, be able to choose to reveal these different types of information about himself in different amounts and different combinations in different situations, and in an easy fashion.
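The idea of revealing different combinations of information in different situations can be illustrated with a minimal profile lookup. This is a sketch under assumptions: the function `disclose`, the profile layout and the situation names are hypothetical and are not the mechanism described in [25]; only the four attribute types (location, context, identity, private information) come from the text above.

```python
def disclose(user_data, profile, situation):
    """Return only the attributes the user agreed to reveal in `situation`.

    user_data -- mapping of attribute name to value
    profile   -- mapping of situation name to the set of attribute names
                 the user is willing to reveal there
    An unknown situation reveals nothing.
    """
    allowed = profile.get(situation, set())
    return {key: value for key, value in user_data.items() if key in allowed}


user_data = {
    "location": "Helsinki",
    "context": "travelling",
    "identity": "user-1234",
    "private_information": "sensor readings",
}
profile = {
    "cafe_hotspot": {"location"},
    "workplace": {"location", "identity"},
}
at_cafe = disclose(user_data, profile, "cafe_hotspot")
at_work = disclose(user_data, profile, "workplace")
elsewhere = disclose(user_data, profile, "unknown_hotspot")
```

Defaulting to an empty set for unknown situations keeps the behaviour conservative: a hotspot the user has never configured receives no personal attributes at all.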
5 Conclusions

On the basis of the usability analysis presented, it is clear that there are several serious usability issues in the current UIs for handling WLAN access management. On the basis of the analysis, we are in the process of implementing the suggested usability improvements on the Nokia 770 and then intend to do extensive usability testing with real users on the new design.
References

1. Ackerman, M.S.: Privacy in pervasive environments: next generation labeling protocols. Pers. Ubiq. Comput. 8(6), 234–240 (2004)
2. Anon: [Aircrack-ng]: Referenced 15.2.2007 (2007) Web page http://www.aircrack-ng.org/doku.php
3. Anon: The Official WorldWide Wardrive (2007) Referenced 16.2.2007 Web page available at http://www.worldwidewardrive.org/wwwdstats.html
4. Anon: Wardriving Tools, Wardriving Software, Wardriving Utilities (2007) Referenced 16.2.2007 Web page available at http://www.wardrive.net/wardriving/tools
5. Balachandran, A., Voelker, G.M., Bahl, P.: Wireless hotspots: current challenges and future directions. In: WMASH ’03: Proc. of the 1st ACM international workshop on Wireless mobile applications and services on WLAN hotspots, pp. 1–9. ACM Press, New York (2003)
6. Balachandran, A., Voelker, G.M., Bahl, P., Rangan, P.V.: Characterizing user behavior and network performance in a public wireless lan. In: SIGMETRICS ’02: Proc. of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pp. 195–205. ACM Press, New York (2002)
7. Balachandran, A., Voelker, G.M., Bahl, P.: Wireless hotspots: current challenges and future directions. Mob. Netw. Appl. 10(3), 265–274 (2005)
8. Berghel, H.: Wireless infidelity I: war driving. Comm. ACM 47(9), 21–26 (2004)
9. Blinn, D.P., Henderson, T., Kotz, D.: Analysis of a wi-fi hotspot network. In: Proc. of the 2005 workshop on Wireless traffic measurements and modeling, pp. 1–6. USENIX Association, Berkeley, CA, USA (2005)
10. Burak, A., Sharon, T.: Analyzing usage of location based services. In: Proc. of Human factors in computing systems, pp. 970–971. ACM Press, New York (2003)
11. Damsgaard, J., Parikh, M.A., Rao, B.: Wireless commons perils in the common good. Commun. ACM 49(2), 104–109 (2006)
12. Fluhrer, S., Mantin, I., Shamir, A.: Weaknesses in the Key Scheduling Algorithm in RC4. In: Vaudenay, S., Youssef, A.M. (eds.) SAC 2001. LNCS, vol. 2259. Springer, Heidelberg (2001)
13. Gruteser, M., Grunwald, D.: Enhancing location privacy in wireless lan through disposable interface identifiers: a quantitative analysis. In: Proc. of the 1st ACM international workshop on Wireless mobile applications and services on WLAN hotspots, pp. 46–55. ACM Press, New York, USA (2003)
14. Henderson, T., Kotz, D., Abyzov, L.: The changing usage of a mature campus-wide wireless network. In: Proc. of the 10th annual international conference on Mobile computing and networking, pp. 187–201. ACM Press, New York, USA (2004)
15. Hong, J.I., Ng, J.D., Lederer, S., Landay, J.A.: Proc. of the 2004 conference on Designing interactive systems: processes, practices, methods, and techniques, pp. 91–100. ACM Press, New York, USA (2004)
16. IEEE: 802.11-1999 Information technology. Telecommunications and information exchange between systems - Local and metropolitan area networks. Spec. req. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE, New York (1999)
17. IEEE: 802.11i-2004 IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks. Spec. req. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications. Amendment 6: Medium Access Control (MAC) Security Enhancements. IEEE, New York (2004)
18. Kanter, T.G.: Going wireless, enabling an adaptive and extensible environment. Mob. Netw. Appl. 8(1), 37–50 (2003)
19. Koskela, T., Väänänen-Vainio-Mattila, K.: Evolution towards smart home environments: empirical evaluation of three user interfaces. Personal Ubiquitous Comput. 8 (2004)
20. Lederer, S., Mankoff, J., Dey, A.K.: Who wants to know what when? privacy preference determinants in ubiquitous computing. In: CHI ’03: extended abstracts on Human factors in computing systems, pp. 724–725. ACM Press, New York, USA (2003)
21. Lederer, S., Hong, I., Dey, K., Landay, A.: Personal privacy through understanding and action: five pitfalls for designers. Pers. and Ubiq. Comp. 8(6), 440–454 (2004)
22. Lehikoinen, J.T., Kaikkonen, A.: PePe field study: constructing meanings for locations in the context of mobile presence. In: Proceedings of the 8th Conference on Human-Computer Interaction with Mobile Devices and Services MobileHCI ’06, vol. 159, pp. 53–60. ACM Press, New York (2006)
23. Matsunaga, Y., Merino, A.S., Suzuki, T., Katz, R.H.: Services: Secure authentication system for public WLAN roaming. In: Proceedings of the 1st ACM international workshop on Wireless mobile applications and services on WLAN hotspots WMASH ’03. ACM Press, New York, USA (2003)
24. McNett, M., Voelker, G.M.: Access and mobility of wireless pda users. SIGMOBILE Mob. Comput. Commun. Rev. 7(4), 55–57 (2003)
25. Mitseva, A., Imine, M., Prasad, N.R.: Context-aware privacy protection with profile management. In: Proc. of the 4th international Workshop on Wireless Mobile Applications and Services on WLAN Hotspots, pp. 53–61. ACM Press, NY (2006)
26. Nicholson, A.J., Chawathe, Y., Chen, M.Y., Noble, B.D., Wetherall, D.: Improved access point selection. In: Proc. of the 4th international conference on Mobile systems, applications and services, pp. 233–245. ACM Press, New York (2006)
27. Nielsen, J.: Usability engineering. Academic Press, Inc., Boston, USA (1993)
28. Palmieri, A., Sigona, F.: A QoS management system for multimedia applications in IEEE 802.11 wireless LAN. In: Proc. of the 5th international Conference on Mobile and Ubiquitous Multimedia MUM ’06, vol. 193. ACM Press, New York, USA (2006)
29. Potter, B.: Wireless hotspots: petri dish of wireless security. Comm. ACM 49(6), 5 (2006)
30. Päykkänen, K., Räisänen, H., Isomäki, H.: Mobile studying and social usability on a wireless campus. In: Proc. of the 8th Conference on Human-Computer Interaction with Mobile Devices and Services, vol. 159, pp. 269–270. ACM Press, New York (2006)
31. Roto, V., Laakso, K.: Mobile guides for locating network hotspots. In: Workshop on HCI in Mobile Guides (2005)
32. Ryan, C., Gonsalves, A.: The effect of context and application type on mobile usability: an empirical study. In: Proc. of the Twenty-eighth Australasian conference on Computer Science, pp. 115–124. Australian Computer Society, Inc. (2005)
33. Stubblefield, A., Ioannidis, J., Rubin, A.D.: Using the Fluhrer, Mantin, and Shamir Attack to Break WEP. In: Proc. of the Network and Distributed System Security Symposium. Internet Society (2002)
34. Tang, D., Baker, M.: Analysis of a local-area wireless network. In: Proc. of the 6th annual international conference on Mobile computing and networking, pp. 1–10. ACM Press, NY, USA (2000)
35. Venkatesh, V., Ramesh, V., Massey, A.P.: Understanding usability in mobile commerce. Commun. ACM 46(12), 53–56 (2003)
36. Whitten, A., Tygar, J.D.: Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0. In: Proc. of the 8th USENIX Security Symposium. USENIX (1999)
A New Framework of Measuring the Business Values of Software

In Ki Kim¹, Beom Suk Jin², Seungyup Baek³, Andrew Kim⁴, Yong Gu Ji³,∗, and Myung Hwan Yun¹

¹ Department of Industrial Engineering, Seoul National University
² Department of Industrial and Information Engineering, Yonsei University
³ Department of Industrial Engineering, Pennsylvania State University
⁴ Ubiquitous Computing Laboratory, IBM Korea
{lookat2, mhy}@snu.ac.kr, {kbf2514jin, yongguji}@yonsei.ac.kr, [email protected], [email protected]
Abstract. A new framework for measuring the business values of software is presented. The business values of software are categorized into two groups: tangible and intangible benefits. An implicit approach is used to quantitatively measure the intangible benefit of software by introducing two concepts, product attribute and quality attribute. The approach relates the quantitative values obtained from a usability test to the qualitative, intangible benefits of software. As an example, the proposed framework is applied to a software system in the development stage. By studying the usability test, we demonstrate the capability of the framework to quantitatively measure the intangible benefits of software as well as the tangible benefit.

Keywords: Software, Business value, Product attribute, Quality attribute, Usability test.
1 Introduction

In the development stage of business application software, many project managers need to minimize the risk that the investment fails. To do so, they usually conduct project reviews that assess the potential benefit of software usability in terms of business value. Many explanations of this potential benefit, however, have been qualitative and somewhat vague in essence. One reason is that the potential benefits of software usability are harder to quantify than the cost. In addition, it is difficult to find hard evidence to support the common-sense expectation that ease of use leads to improved productivity, or to identify the specific ease-of-use characteristics that truly make software easier to use for a majority of users, as cautioned by Fried [1]. Furthermore, when problems are recognized after implementation begins, they cost considerably more to find and fix than at the requirements and design stage [2].

There are, however, few systematic approaches to evaluating such problems early in software engineering [3]. Therefore, a framework is proposed for the quantitative measurement of the business value (BV) of software in conjunction with a usability test. To this end, an implicit approach is applied that introduces several concepts to relate the business value to the usability test, which can provide tangible data and the related financial value.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes the proposed framework in detail. The framework quantifies the intangible benefits among the business values by introducing two concepts, quality attribute and product attribute, and using them to relate the tangible data from the usability test to the intangible value. Section 4 presents and discusses the results of quantitatively measuring the business value of an example software system, named D-software, which is in the development stage. The software system has been developed to support the developers of RFID applications by providing a visual representation, namely the Graphical Composition Language (GCL). Finally, concluding remarks are drawn in Section 5.
2 Literature Reviews

According to Karat [4], usability engineering can help developers produce marketable products that will be useful to the organization, the users, and the customers of their products. In this paper, we assume that usability itself has significant advantages from the business perspective. This assumption leads us to develop metrics for quantifying software usability. The standard economic model, one of the traditional methods of estimating the value of software, assumes that software consists of various design attributes [3]. Under this assumption, the economic model compares the costs and benefits of the implementation properties to estimate the values of the design attributes. This model, however, requires thorough and long-term observation of the trajectory of sales and cost structure to estimate the value of software. In today's competitive industry environment, this is often neither practical nor possible. Thus, several alternative methods have been suggested for analyzing the costs and benefits of software attributes in information systems [5].

Mantei and Teorey [6] tried to tie human factors into the software development lifecycle. They calculated the tangible benefit and cost of applying a human factors approach to the software lifecycle by using the results of task analysis and usability tests. They found that, even on a small scale, task analysis and usability tests can greatly reduce the time and effort required to estimate the total tangible benefit and cost. In addition, a usability test performed prior to the release of the product to market also enables the stakeholders to understand the potential intangible benefit and cost. (See [6] for details of the intangible benefit and cost.)

Krishnan [7] utilized two concepts, QA (Quality Attribute) and PA (Product Attribute), to quantify the benefits from the results of task analysis and usability tests.
QA can be regarded as abstract values such as capability and reliability perceived by customers from a number of PAs. QA can be defined conceptually and the relative
importance of each QA can be estimated from the perspective of the business and the organization. There is, however, neither a predefined set of QAs nor a unique set of importance values for them. The reason is that a new QA can always be introduced as the market environment changes, which can in turn modify the original relative importance of the QAs. According to Bachmann and Bass [8], a QA arises from multiple interactions between multiple PAs and is reflected through specific user interfaces. In contrast to a QA, a PA is concrete and tangible, and can be a component and/or function of the software. Each individual PA is closely related to the result of the usability test.
3 Methods

In this section, a framework is proposed for the quantifiable evaluation of the business value of software in conjunction with task analysis, a questionnaire and a usability test.

3.1 Business Values of the D-Software

Survey and interview methods are used to select the key business values of the D-software. First, a pool of business values for RFID-related software is surveyed. Next, five factors for the D-software are identified from the list of business values through an interview with a sales expert (Table 1).

Table 1. Key business values

Business value proposition | Operational definition
Tangible benefit: Cost savings (BV1) | The amount of quantifiable savings in the system development and maintenance phase derived from usability-related factors
Intangible benefit: Ease (BV2) | Potential benefits due to ease of development, ease of use and ease of maintenance derived from usability-related factors
Intangible benefit: Quality (BV3) | Potential benefits in the quality of the system derived from usability-related factors
Intangible benefit: Flexibility (BV4) | Potential benefits of responding agilely to external environments and various user requirements, derived from usability-related factors
Intangible benefit: Extendibility (BV5) | Potential benefits that support the component-based extension of the system, in the light of usability-related factors
To develop a method for the quantifiable assessment of the key business values (BVi, i=1,…,5), the values are categorized into two kinds of benefit according to whether or not they are financially measurable: 1) tangible benefit, which is financially measurable, and 2) intangible benefit, which is not.

3.2 Usability Evaluation Framework

The D-software is a component-based solution to facilitate RFID applications composed of various physical and logical devices such as RFID tags, readers, motion sensors and servers. Since the D-software is in the stage of development and revision, it is difficult to estimate and quantify the potential benefit of the software system, owing to the lack of information such as sales data and market feedback. Thus,
the usability test and task analysis will be conducted to quantify the benefit of the D-software. A hierarchical structure model by Seffah et al. [9] is utilized to quantify a user’s performance in modeling an RFID application using the D-software (Fig. 1).
Fig. 1. Usability evaluation framework
From the usability test, which includes small tasks using the D-software, we can measure the time, errors and subjective responses that result directly from the D-software. It is important to divide the whole process of use into subtasks at a sufficiently fine level that each subtask matches one PA. For the usability test, a use case is selected that is used in many commercial RFID-based solutions for supply chain management.

3.3 Measuring the Tangible Benefits of Software

For the tangible benefit, two categories in the standard cost structure of software development are first considered as the main cost factors in developing an RFID application: 1) development cost and 2) maintenance cost. Development cost can be estimated either from the development scale or from the number of people and the period. Using the estimated number of people and period, the total development cost ($C_{\text{development}}$) is computed as follows:

\[
\begin{aligned}
C_{\text{direct labor}} &= C_{\text{engineer}} \times N_{\text{people}} \times N_{\text{day}} \times N_{\text{month}} \\
C_{\text{overhead}} &= C_{\text{direct labor}} \times w_{\text{overhead}} \\
C_{\text{technology}} &= (C_{\text{direct labor}} + C_{\text{overhead}}) \times w_{\text{technology}} \\
C_{\text{development}} &= C_{\text{direct labor}} + C_{\text{overhead}} + C_{\text{technology}}
\end{aligned}
\tag{1}
\]

where $C_{\text{direct labor}}$ is the direct labor cost, $C_{\text{engineer}}$ the unit cost of engineer payment, $C_{\text{overhead}}$ the overhead expenses, $C_{\text{technology}}$ the technology cost, $N_{\text{people}}$ the number of people, $N_{\text{day}}$ the average number of working days, $N_{\text{month}}$ the number of months, $w_{\text{overhead}}$ the ratio of $C_{\text{overhead}}$ to $C_{\text{direct labor}}$, ranging from 110 to 120%, and $w_{\text{technology}}$ the ratio of $C_{\text{technology}}$ to the sum of $C_{\text{direct labor}}$ and $C_{\text{overhead}}$, ranging from 24 to 40%.

The maintenance cost per year ($C_{\text{maintenance}}$) is estimated using MD (Maintenance Difficulty) and $C_{\text{development}}$. The maintenance difficulty (MD) is computed using TMC (Total Maintenance Complexity), which is calculated by measuring frequency of maintenance, frequency of data manipulation, interconnectivity to other systems,
required knowledge, and divided transaction. Then, the $C_{\text{maintenance}}$ induced by an RFID application is computed as follows:

\[
\begin{aligned}
MD(\%) &= 10 + \left( 5 \times \frac{[TMC]}{100} \right) \\
C_{\text{maintenance}} &= C_{\text{development}} \times MD
\end{aligned}
\tag{2}
\]
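As an illustration, Eq. 1 and Eq. 2 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function and parameter names are ours, every input value in the usage example is hypothetical, [TMC] is taken to be the plain TMC score, and MD is converted from a percentage to a fraction before it multiplies the development cost.

```python
def development_cost(c_engineer, n_people, n_day, n_month,
                     w_overhead, w_technology):
    """Eq. 1: direct labor + overhead + technology cost."""
    c_direct_labor = c_engineer * n_people * n_day * n_month
    c_overhead = c_direct_labor * w_overhead
    c_technology = (c_direct_labor + c_overhead) * w_technology
    return c_direct_labor + c_overhead + c_technology


def maintenance_difficulty(tmc):
    """Eq. 2, first line: MD(%) = 10 + 5 * TMC / 100, returned as a fraction."""
    md_percent = 10 + 5 * tmc / 100
    return md_percent / 100  # assumption: MD is applied as a fraction


def maintenance_cost(c_development, tmc):
    """Eq. 2, second line: yearly maintenance cost."""
    return c_development * maintenance_difficulty(tmc)


# Hypothetical inputs, chosen only to exercise the formulas.
c_dev = development_cost(c_engineer=200.0, n_people=3, n_day=20, n_month=2,
                         w_overhead=1.1, w_technology=0.25)
c_maint = maintenance_cost(c_dev, tmc=50)
print(round(c_dev, 2), round(c_maint, 2))
```

With these hypothetical inputs the direct labor cost is 24,000, the overhead 26,400 and the technology cost 12,600, so that the development cost is 63,000 and the yearly maintenance cost is 12.5% of that amount.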
3.4 Measuring the Intangible Benefits of Software

The benefit of a PA comes from two different aspects: the product and the task (T). Assuming a causal relationship between PA and the usability test, the benefit of PAj can be expressed as follows:

\[
B_{PA}^{j} = \frac{\sum_{k=1}^{K} \left( U_{T}^{k} \times X_{jk} \right)}{\sum_{j=1}^{J} \sum_{k=1}^{K} X_{jk}}
\tag{3}
\]

where $B_{PA}^{j}$ represents the benefit of PAj, J and K are the numbers of PAs and tasks, respectively, $U_{T}^{k}$ refers to the increased usability in a task Tk, which means increased effectiveness, saved time, or increased subjective satisfaction while conducting Tk, and $X_{jk}$ is the random variable representing the relevance between each PAj and Tk. The random variable is defined as follows:

\[
X_{jk} =
\begin{cases}
1 & \text{if } PA_j \text{ is fully used during a task } T_k \\
0 & \text{otherwise}
\end{cases}
\tag{4}
\]
Since the reference point of comparison is the state before the implementation of PAj, $U_{T}^{k}$ is expressed as a percentage. When PAj is only partially used during a task Tk, it is highly recommended that Tk be divided further into subtasks Tk1 and Tk2, so that either of the subtasks can be attributed to PAj. In short, the benefit of PAj is the averaged sum of error reduction, time savings or marginal satisfaction over the partitioned tasks involving PAj. The task analysis serves here to divide the whole set of tasks into subtasks involving specific product attributes. From the estimated benefit of PAj (j=1,…,J) in Eq. 3, the intangible benefit of each QA is calculated as a linear combination:

\[
B_{QA}^{l} = \sum_{j=1}^{J} \left( B_{PA}^{j} \times C_{lj} \right), \quad l = 1, \dots, L
\tag{5}
\]

where $B_{QA}^{l}$ represents the intangible benefit of QAl and $C_{lj}$ is the contribution of PAj to QAl, ranging from -1 to 1. The contribution is subjectively assessed by the stakeholders, who should have a thorough understanding of both the PAs and the concept of QA. The range [-1, 1] is intuitive (a negative value means that the PA lessens the QA, and vice versa), as used by Kazman et al. [10]. The assessed contribution values are then normalized to between 0 and 1. The benefit of a QA can sometimes exceed 100 because the contributions of the individual PAs are assessed independently of each other.
The benefit of QAl is the total of the reduced time, reduced errors or increased satisfaction from all the related product attributes. To integrate the benefits of all QAs into the possible intangible benefit, the relative importance of each QA should be assessed from the business perspective. It is advisable that a cross-functional group composed of project managers, salespeople, and decision makers assess the relative importance of each QA collectively. The total intangible benefit (TIB) of software is estimated as follows:

\[
TIB = \sum_{l=1}^{L} \left( B_{QA}^{l} \times I_{QA}^{l} \right)
\tag{6}
\]

where $I_{QA}^{l}$ is the relative importance of QAl, so that $\sum_{l=1}^{L} I_{QA}^{l} = 1$.
4 Results

4.1 Usability Test and Task Analysis of the D-Software

The usability test includes the evaluation of user satisfaction and performance by a questionnaire and by measurements in a small group test. The test focuses in particular on the modeling of an RFID application using the D-software. The subject group is composed of 32 undergraduate and graduate students in software engineering (28 male and 4 female). Subjective responses from the questionnaire are also collected. To set the reference point of usability, apart from the usability testing, 2 experts with MS degrees in software engineering and more than 3 years of experience in RFID programming carried out the same use case without the D-software (using manual coding). We analyzed user satisfaction (qualitative) and performance (quantitative) to understand the degree of usability of the D-software through the usability evaluation model. From the observation of modeling with the D-software and with manual coding, the results of the task analysis categorize the modeling stages as follows: 1) component searching, 2) component modeling, 3) component manipulation, 4) model searching, 5) error detection and correction, and 6) attaching related library supports. Table 2 below shows the metrics of the usability testing.

Table 2. Metrics of usability testing

Metrics | Description
Component searching time (CST) | Time spent searching for and choosing an appropriate component
Component manipulation time (CMT) | Time spent arranging components
Component connection time (CCT) | Time spent connecting components
Model searching time (MST) | Time the subject spends searching for a conceptual model to construct the system
Number of deleted component (NDC) | Number of components deleted during task completion
Frequency of code line deletion (FOCLD) |
Frequency of copy & paste of previously made codes (FOCP) |
Frequency of adding description (FOAD) |
4.2 Tangible Benefit of the D-Software
Table 3 below shows the time consumed during each modeling stage, calculated from the measured times and frequencies, where τdel is the unit time to delete a component or a line, τcp the unit time to copy and paste a block of code, and τdes the unit time to add one line of description.

Table 3. Consumed time during each modeling stage

Modeling stages
Component searching (CST)
Component modeling (CCT)
Component manipulation (CMT)
Model searching (MST)
Error correction (¹NDC×τdel; ²FOCLD×τdel)
Copy & Paste / Descriptions (²FOCP×τcp+FOAD×τdes)
The total modeling time is computed as the sum of the time consumed in the six modeling stages of the usability test. The total modeling time was 779.79 sec (=0.217 hr) using the D-software and 6645 sec (=1.846 hr) using manual coding. The total modeling time can be used to calculate the savings in the development cost Cdevelopment in Eq. 1. The daily working time is set to 8 hours. It is assumed that the decrease in productivity caused by increasing model size is relatively low in D-software modeling, and it is therefore not applied to the cost for the D-software. For a conservative estimate of the cost savings, we assume that a general use case amounts to 30,000 lines of code. To eliminate the cost effects of the programming language, we assume that GCL in the D-software is of a similar level of difficulty to Visual Basic, HTML, Delphi, etc. We also assume that relatively little Java code is used for D-software modeling. Then, the development cost Cdevelopment is calculated using Eq. 1. The overhead expenses Cover head usually amount to 110~120% of the direct labor cost. The technology cost Ctechnology usually amounts to 20~40% of the direct labor cost plus overhead expenses. As a result, the saving in the development cost Cdevelopment achieved by the D-software compared to manual coding in C# is estimated to be greater than 93% in the use case.

To estimate the maintenance cost Cmaintenance of the D-software system relative to manual coding, the total maintenance complexity (TMC) is set to 35 for the D-software and 40 for manual coding using C#. The major difference between the D-software and manual coding in terms of operation and maintenance is that the D-software requires much less knowledge of the hardware devices and of the code itself. As a result, the TMC of the D-software is 5 points less than that of manual coding.
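The arithmetic behind these figures can be checked with a short sketch. It uses only the measured times and the TMC values quoted above; the variable names are ours. Note that the ratio of modeling times computed here is not the reported saving in development cost, which additionally depends on the cost assumptions listed in the text, so the two figures need not coincide.

```python
# Measured total modeling times reported in the text (seconds).
t_dsoftware_sec = 779.79
t_manual_sec = 6645.0

# Conversion to hours, as quoted in the text.
t_dsoftware_hr = t_dsoftware_sec / 3600
t_manual_hr = t_manual_sec / 3600

# Relative reduction in modeling time (not the cost saving itself).
time_reduction = 1 - t_dsoftware_sec / t_manual_sec

# Maintenance difficulty from Eq. 2 for the two TMC values in the text.
md_dsoftware = 10 + 5 * 35 / 100   # in percent
md_manual = 10 + 5 * 40 / 100      # in percent

print(round(t_dsoftware_hr, 3), round(t_manual_hr, 3))  # 0.217 1.846
print(round(time_reduction, 3))                         # 0.883
print(md_dsoftware, md_manual)                          # 11.75 12.0
```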
Here, we assume that the number of maintenance operations is below 4 per year (score = 0), that the number of data transactions is below 500,000 per year (score = 10), that there are more than 3 interconnections to other systems (score = 10), and that divided transaction is in the integrated state (score = 10). We assume that the life cycle of the product is 5 years from the completion of development. The product is purchased at the completion of development, which is set to be year 0. Maintenance is repeated
every year except years 0 and 5. The discount rate is assumed to be 6% per year. The saving in the maintenance cost Cmaintenance achieved by the D-software compared to manual coding using C# is estimated to be greater than 93% in the use case.

4.3 Intangible Benefit of the D-Software
Before the intangible benefit of the D-software is estimated, the key PAs and QAs should be identified. Five PAs, the specific and representative characteristics embodied in the D-software, are identified by the developers: 1) GCL (Graphical Composition Language) (PA1), 2) State machine (PA2), 3) Component library (PA3), 4) Deployment (PA4), and 5) Code generator (PA5). Survey and interview methods are used to select the major QAs. First, candidate quality attributes used and emphasized in similar RFID-related software are collected. Next, four key QAs are identified through an interview with a sales expert and a project manager. The resulting list of QAs is the same as the four business values that make up the intangible benefit in Table 1.

The first step is to identify and quantify the relative importance $I_{QA}^{l}$ of QAl for all l in Eq. 6. The $I_{QA}^{l}$ can be imposed from the managerial perspective. That is, the value of each $I_{QA}^{l}$ is obtained from a focus group interview (FGI) with a sales manager. According to the sales manager, Ease in development, use and maintenance (QA1) is the most important of the four QAs. Quality (QA2) and Flexibility (QA3) are equally important, and less important than QA1. Finally, Extendibility (QA4) has the least importance. From these comments, the values of $I_{QA}^{l}$ for l=1,…,4 are obtained as follows:

Table 4. The relative importance of four qualitative attributes

QA | Ease (QA1) | Quality (QA2) | Flexibility (QA3) | Extendibility (QA4)
Relative importance | $I_{QA}^{1} = 0.5$ | $I_{QA}^{2} = 0.2$ | $I_{QA}^{3} = 0.2$ | $I_{QA}^{4} = 0.1$
Next, the contribution of each PAj to each QAl is derived on a scale of 0 to +1 from the subjective evaluation by a programmer. The resulting J × L matrix, where J = 5 and L = 4, is the contribution matrix $C_{lj}$ for j=1,…,5 and l=1,…,4 in Eq. 5.

Table 5. The contribution of PA to QA

$C_{lj}$ | QA1 | QA2 | QA3 | QA4
PA1 | 0.70 | 0.40 | 0.70 | 0.50
PA2 | 0.70 | 0.20 | 0.70 | 0.20
PA3 | 0.80 | 0.10 | 0.70 | 0.50
PA4 | 0.60 | 0.20 | 0.60 | 0.70
PA5 | 0.90 | 0.30 | 0.20 | 0.20
We analyzed the results for user satisfaction (qualitative) and performance (quantitative) to understand the degree of usability of the D-software through the usability evaluation framework. The results calculated from the experiments are as follows. To simplify the analysis, the set of tasks is assumed to be identical to the set of modeling stages shown in Table 3. The increased usability $U_{T}^{k}$ in a task Tk can therefore be computed as follows:

Table 6. The increased usability in a task Tk

Task | Increased usability
Component searching (T1) | $U_{T}^{1} = 0.85$
Component modeling (T2) | $U_{T}^{2} = 0.95$
Component manipulation (T3) | $U_{T}^{3} = 0.82$
Model searching (T4) | $U_{T}^{4} = 0.89$
Error correction (T5) | $U_{T}^{5} = -1.82$
Copy & Paste / Descriptions (T6) | $U_{T}^{6} = 1.00$
The random variable $X_{jk}$ defined by Eq. 4 is shown in Table 7. The benefit of PA, defined by Eq. 3, is computed using the $U_{T}^{k}$ in Table 6 and $X_{jk}$.

Table 7. The relevance between PA and T

$X_{jk}$ | T1 | T2 | T3 | T4 | T5 | T6 | $B_{PA}^{j}$
PA1 | 1 | 1 | 1 | 0 | 1 | 0 | 0.05
PA2 | 0 | 1 | 0 | 1 | 1 | 1 | 0.06
PA3 | 1 | 1 | 0 | 1 | 0 | 0 | 0.16
PA4 | 0 | 1 | 0 | 0 | 0 | 1 | 0.11
PA5 | 0 | 1 | 1 | 0 | 1 | 1 | 0.06
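The last column of Table 7 can be reproduced from Eq. 3 with the values of Table 6 and the relevance entries of Table 7. The following is a minimal sketch; the variable names are ours, and the denominator is taken to be the sum of all entries of the relevance matrix, as Eq. 3 is written.

```python
# Increased usability per task T1..T6 (Table 6).
U = [0.85, 0.95, 0.82, 0.89, -1.82, 1.00]

# Relevance X[j][k] between PA_j (rows) and task T_k (columns), Table 7.
X = [
    [1, 1, 1, 0, 1, 0],  # PA1
    [0, 1, 0, 1, 1, 1],  # PA2
    [1, 1, 0, 1, 0, 0],  # PA3
    [0, 1, 0, 0, 0, 1],  # PA4
    [0, 1, 1, 0, 1, 1],  # PA5
]

# Denominator of Eq. 3: the sum of all relevance entries (17 here).
total_relevance = sum(sum(row) for row in X)

# Eq. 3: benefit of each PA.
b_pa = [sum(u * x for u, x in zip(U, row)) / total_relevance for row in X]

print(total_relevance)               # 17
print([round(b, 2) for b in b_pa])   # [0.05, 0.06, 0.16, 0.11, 0.06]
```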
The intangible benefit of QA, defined by Eq. 5, is computed using the $B_{PA}^{j}$ in Table 7 and the $C_{lj}$ in Table 5. We can now quantitatively compute the total intangible benefit (TIB) of software, defined by Eq. 6, using the $I_{QA}^{l}$ in Table 4 and the $B_{QA}^{l}$ in Table 8. The resulting total intangible benefit (TIB) of the D-software is 0.25.

Table 8. The benefit of QA

QA | QA1 | QA2 | QA3 | QA4 | TIB
$B_{QA}^{l}$ | 0.33 | 0.09 | 0.27 | 0.21 | 0.25
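The values in Table 8 can be checked with a short sketch of Eq. 5 and Eq. 6. It is a sketch under two assumptions that we state explicitly: the benefit of each QA is taken as the sum over the PAs of the PA benefit times its contribution to that QA, and the rounded PA benefits of Table 7 are used as inputs. The variable names are ours.

```python
# Rounded PA benefits from Table 7 (PA1..PA5).
b_pa = [0.05, 0.06, 0.16, 0.11, 0.06]

# Contribution of PA_j (rows) to QA_l (columns), Table 5.
C = [
    [0.70, 0.40, 0.70, 0.50],  # PA1
    [0.70, 0.20, 0.70, 0.20],  # PA2
    [0.80, 0.10, 0.70, 0.50],  # PA3
    [0.60, 0.20, 0.60, 0.70],  # PA4
    [0.90, 0.30, 0.20, 0.20],  # PA5
]

# Relative importance of QA1..QA4 (Table 4); the weights sum to 1.
importance = [0.5, 0.2, 0.2, 0.1]

# Eq. 5 (assumed form): benefit of each QA as a sum over the PAs.
b_qa = [sum(b_pa[j] * C[j][l] for j in range(len(b_pa)))
        for l in range(len(importance))]

# Eq. 6: total intangible benefit as the importance-weighted sum.
tib = sum(b * w for b, w in zip(b_qa, importance))

print([round(b, 3) for b in b_qa])
print(round(tib, 2))  # 0.25
```

Under these assumptions the QA benefits come out at approximately 0.325, 0.088, 0.267 and 0.206, which agree with Table 8 to within rounding.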
5 Concluding Remarks

This paper has presented a framework for measuring the business value of software. An implicit approach that introduces five product attributes and four quality attributes
allows the intangible benefits of software, which have usually been assessed in qualitative ways, to be measured quantitatively. Therefore, all kinds of business values, which are classified into tangible and intangible benefits, can be measured quantitatively. This enables project managers to evaluate a software development project in a quantitative way. The results of an example, which tested the D-software in the development stage, showed that the proposed framework can be used to quantitatively evaluate the business value of software. This will help project managers to reduce the risk that the investment fails.
References

1. Fried, L.: Nine principles for ergonomic software. Datamation 28(12), 163–166 (1982)
2. Boehm, B., Basili, V.R.: Software Defect Reduction Top 10 List. Computer 34(1), 135–137 (2001)
3. Scaffidi, C., Arora, A., Butler, S., Shaw, M.: A Value-Based Approach to Predicting System Properties from Design. In: The Seventh International Workshop on Economics-Driven Software Engineering Research, Missouri, St. Louis. ACM Press, New York (2005)
4. Karat, C.M.: Usability Engineering in dollars and cents. IEEE Software 10(3), 88–89 (1993)
5. Sasson, P.G.: Cost benefit analysis of information systems: a survey of methodologies. ACM SIGOIS Bulletin 9(2-3), 126–133 (1988)
6. Mantei, M.M., Teorey, T.J.: Cost/benefit analysis for incorporating human factors in the software lifecycle. Communications of the ACM 31(4), 428–439 (1988)
7. Krishnan, M.S.: Cost, Quality and User Satisfaction of Software Products: An Empirical Analysis. In: CASCON ’93. Toronto, Ont., Canada, Nat. Res. Council of Canada (1993)
8. Bachmann, F., Bass, L.: Introduction to the Attribute Driven Design Method. In: The 23rd International Conference on Software Engineering, ICSE 2001. Toronto, Ont., Canada, IEEE Comput. Soc. (2001)
9. Seffah, A., Donyaee, M., Kline, R.B., Padda, H.K.: Usability measurement and metrics: a consolidated model. Software Quality Journal 14(2), 159–178 (2006)
10. Kazman, R., Asundi, J., Klein, M.: Quantifying the costs and benefits of architectural decisions. In: The 23rd International Conference on Software Engineering, ICSE 2001. Toronto, Ont., Canada, IEEE Comput. Soc. (2001)
Evaluating Usability Evaluation Methods: Criteria, Method and a Case Study

P. Koutsabasis, T. Spyrou, and J. Darzentas

University of the Aegean, Department of Product and Systems Design Engineering, Hermoupolis, Syros, Greece, GR-84100
Tel.: +30 22810 97100, Fax: +30 22810 97109
{kgp, tsp, idarz}@aegean.gr

Abstract. The paper proposes an approach to comparative usability evaluation that incorporates important relevant criteria identified in previous work. It applies the proposed approach to a case study of a comparative evaluation of an academic website employing four widely-used usability evaluation methods (UEMs): heuristic evaluation, cognitive walkthroughs, think-aloud protocol and co-discovery learning.

Keywords: Usability evaluation methods, comparative usability evaluation, case study.
2 Related Work

A comparative usability evaluation involves multiple evaluators or evaluation teams that employ one or more UEMs to carry out parallel evaluations of the same target system. There are few comparative evaluations in the HCI literature. Hertzum and Jacobsen [5] present a comparative study of eleven UEM evaluations carried out with three of the four methods studied in this paper, namely CW, HE, and T-AP. Their results show that the average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and that none of the three UEMs is in general more consistent than the others. Unfortunately, Hertzum and Jacobsen could not find studies where heuristic evaluation was performed by evaluators who aggregated the results of their individual inspections into a group output (which is the case for our study). Heuristic evaluations are usually applied by a group of inspectors or users and the individual results are then aggregated [12].

Van den Haak et al. (2004) compare T-AP and C-D in testing the usability of online library catalogues. The UEMs were compared on four criteria related to digital libraries: number and type of usability problems detected; relevance of the problems detected; overall task performance; and participant experiences. The study involved 80 students. Its main result was that the UEMs revealed similar numbers and types of problems that were equally relevant.

Molich et al. [9] report on the results of a comparative evaluation of a single web site (Hotmail) by nine professional teams. The goal of this study was to investigate the consistency of the results obtained. Each team was left free to select its own UEM and to carry out the evaluation according to its work practices. The results of this evaluation are quite surprising: a large proportion (75%, 232 of 310) of the usability problems identified were unique to a single participating team, while only two usability problems of the target system were reported by six or more teams.

Other comparative evaluations with different foci are presented in [3] [4] [6] and [10]. These comparative studies differ in terms of their goals and of the criteria used to compare evaluator performance and/or UEMs. Of particular interest for comparative evaluation work is the work of Hertzum and Jacobsen [5], who investigate the evaluator effect in usability evaluations. The term denotes the fact that multiple evaluators evaluating the same interface with the same usability evaluation method detect markedly different sets of problems [6]. They [5] propose three generic guidelines to minimize the evaluator effect:
▪ Be explicit on goal analysis and task selection.
▪ If it is important to the success of the evaluation to find most of the problems in a system, then we strongly recommend using more than one evaluator.
▪ Reflect on your evaluation procedures and problem criteria.

The work presented in this paper contributes to related work by synthesising a general set of criteria from previous work into a structured approach for comparative usability evaluations. Furthermore, it presents a case study of a comparative usability evaluation that provides various insights about the UEMs employed.
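To make the notion of agreement between evaluators concrete, the following minimal sketch computes an average pairwise agreement over the problem sets reported by several evaluators. It is an illustration under stated assumptions rather than the measure of any particular study: agreement between two evaluators is assumed to be the number of problems both report divided by the number of problems either reports, each pair is assumed to report at least one problem, and the problem identifiers in the example are hypothetical. The last lines check the proportion of unique problems quoted above for the Hotmail study.

```python
from itertools import combinations


def any_two_agreement(problem_sets):
    """Average overlap (intersection over union) across all evaluator pairs."""
    pairs = list(combinations(problem_sets, 2))
    overlaps = [len(a & b) / len(a | b) for a, b in pairs]
    return sum(overlaps) / len(overlaps)


# Hypothetical problem sets reported by three evaluators.
evaluators = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
agreement = any_two_agreement(evaluators)
print(round(agreement, 3))  # 0.333

# Proportion of unique problems in the Hotmail study: 232 of 310.
unique_ratio = 232 / 310
print(round(unique_ratio, 2))  # 0.75
```

In the hypothetical example the three pairs overlap by 0.5, 0.25 and 0.25, so the average agreement is one third, which lies inside the 5% to 65% range reported in [5].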
3 A Structured Approach for Comparative Usability Evaluations: Criteria and Process 3.1 Criteria for Comparative Usability Evaluations The criteria that can be taken into account for comparative usability evaluations can be distinguished by whether they refer to the evaluation target or to the UEMs themselves. An example of the first category of criteria is [17] that evaluate a web-based digital library focusing on: layout, terminology, data entry and comprehensiveness. However, criteria that are related to the target system are quite different for systems that follow different user interface paradigms. On the other hand there are also generic criteria that refer to the UEMs and not the target system. Among these (a useful review is provided by [4]), the paper identifies as most important the following: Realness (or relevance) refers to whether a usability finding is a real usability problem or not (or to what degree, i.e. a severe or not important problem). According to [4] the realness of usability findings can be determined by: a) comparing with a standard usability problem list; b) expert review and judgment; and c) by end-user review and judgement. Any approach includes advantages and drawbacks regarding applicability, cost-effectiveness and trustworthiness. In this respect further research includes: severity ratings [11] and combinations of severity and probability of occurrence [15]. Validity (or accuracy) can be defined as the ratio of the number of real usability problems with respect to the total number of findings (i.e. real and ‘false alarms’) for each application of UEM [4] [16]. Thoroughness (or completeness) is identified in [4] and [16] as the ratio of the number of (real) usability problems found by the application of a UEM with respect to the total number of usability problems that exist in the target system. 
Obviously, thoroughness requires that the total number of real problems has been identified through a detailed cross-examination of the results produced by all UEMs.
Effectiveness. The criterion of effectiveness for UEMs has been treated as synonymous with thoroughness and validity of usability findings by most related work [2] [4] [8]; this is also in line with the definition of effectiveness in the ISO 9241 standard for usability as the ‘accuracy and completeness with which users achieve specified goals’. Thus, the effectiveness of UEMs can be identified as the product of thoroughness and validity [4]. Some related work goes even further in the definition of effectiveness by adding the issue of the predictive power of UEMs in relation to the uptake of usability findings by development teams [7] [8]. The latter perspective has to cope with additional methodological considerations, not only about the persuasiveness of usability reporting but also about the nature of the usability findings themselves: e.g. ‘objective’ usability problems (such as broken links in a web site) are far more likely to be addressed by development teams than ‘subjective’ findings (such as findings related to terminology), which are the most difficult to explain in usability reporting anyway.
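The three ratio-based criteria can be written compactly. The symbols below are introduced here only for illustration and are not notation used in the cited work: for one application of a UEM, let F be the number of reported findings, R the number of those findings judged to be real problems, and P the number of real problems that exist in the target system.

```latex
\begin{align}
\text{validity}      &= \frac{R}{F}, \\
\text{thoroughness}  &= \frac{R}{P}, \\
\text{effectiveness} &= \text{thoroughness} \times \text{validity}
                      = \frac{R^{2}}{F\,P}.
\end{align}
```

Written this way, it is visible that effectiveness penalises a method both for missing problems (small R relative to P) and for reporting false alarms (small R relative to F).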
P. Koutsabasis, T. Spyrou, and J. Darzentas
Consistency has been related to reliability [4] and repeatability [13]. In our work, we use a working definition of the consistency of UEMs in terms of repeatability, as the extent to which multiple applications of different usability inspection methods produce ‘reasonably similar’ results. This working definition is similar to the approach of [9]. Again, means for a trustworthy interpretation of the similarity of usability findings are required, and this need may be addressed in the same ways as the realness problem.
3.2 Essential Process Steps for Carrying Out Comparative Usability Evaluations
The set-up and carrying out of any comparative usability evaluation needs first of all to control, as far as possible, the aspects of the experiment that are related to the evaluator effect, and thus to conform with the guidelines proposed by [5]. Furthermore, the processing of results needs to ensure effective decision making about the realness or relevance of the results and about the similarity of the results obtained by the parallel usability evaluations. In order to address the issues above, we propose the following guidelines for comparative usability evaluations:
Ensure common conditions when carrying out the usability evaluations: A number of issues related to the preparation and carrying out of the parallel usability evaluations need to be addressed uniformly for each evaluation. In particular: ▪ Select evaluators that have a similar level of experience in usability evaluations. This can be achieved by selecting professional evaluators for carrying out the experiments. When this is not possible and novice evaluators must participate, ensure that they work in teams and that they are closely supervised.
Having more than one evaluator to carry out a usability evaluation is also proposed by [5] to maximise the number of results that can be obtained; when novice evaluators are employed, then working in teams can also assist their interaction towards resolving issues about the carrying out of the usability evaluation provided they are supervised by an experienced evaluator. ▪ Assign UEMs to evaluators according to their experience. It is generally better to allow evaluators to select a method in which they have experience or feel most comfortable in using it. ▪ Provide a common set of tasks to carry out. Unless a common set of tasks is provided, there is no way to ensure that evaluators have examined the same or at least similar areas of the target system. ▪ Provide a common format for documentation - reporting of usability findings. As Hartson et al [4] remark “many UEMs are designed to detect usability problems but problem reporting is left to the developers/evaluators using the UEM... problem report quality will vary greatly according to the skills of the individual reporter at communicating complete and unambiguous problem reports.” Reporting of usability problems can aid significantly the processing of results, especially for the case of parallel usability evaluations.
Ensure effective decision making when processing the results of multiple usability evaluations: In particular, ▪ Select criteria for comparative usability evaluation: As discussed in related work, there are various criteria that can be considered for comparative evaluations. We make use of the criteria list presented in section 3.1 in order to draw more general conclusions about UEMs. However, aspects related to the target system affect the performance of UEMs, such as the user interface paradigm (e.g. hypertext, WIMP, 3D, etc.) and the level of maturity of the target system (e.g. application or prototype). For example, it has been argued that usability inspection methods may be more appropriate for finding problems in the early design stage of an interactive system [3]. Therefore, any conclusions drawn need to be interpreted carefully in the context of the particular class of target systems. ▪ Select a decision criterion for relevance of usability findings: decisions about the realness or relevance of usability findings have been addressed in various ways, as discussed above. We have addressed the relevance problem with a two-stage approach: first, the evaluation teams provided, as part of their documentation, their argumentation for each usability finding; secondly, all usability findings were rated by an experienced usability evaluator (the first author of the paper) on a three-level severity scheme: 0: not a problem; 1: minor problem; 2: serious problem. ▪ Select a decision criterion for similarity of usability findings: when all evaluations are available, the reports need to be examined in order to identify similar usability findings. Again, we followed an expert-based approach for this task, which is the most usual condition in parallel usability evaluations. In this case, however, it is generally advisable that more than one expert performs this task.
Van den Haak et al [17] used five experts to interpret the results of their comparative study. However, there are various practical problems with involving more than a single expert: going through all evaluation reports, processing the large pool of data in terms of relevance and similarity, and resolving ambiguities and disagreements requires a lot of synchronous work. Therefore, we have used a single expert to go through the data, as have others (e.g. [9]).
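The two-stage relevance decision described above can be sketched as a small data model. This is only an illustration under assumptions: the class and field names are hypothetical, and the example records are invented and are not findings from the case study.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Three-level severity scheme described in the text."""
    NOT_A_PROBLEM = 0
    MINOR = 1
    SERIOUS = 2


@dataclass(frozen=True)
class Finding:
    team: str            # label of the reporting team, e.g. "HE1"
    description: str     # finding as written in the team's report
    argumentation: str   # stage one: the team's own justification
    severity: Severity   # stage two: rating given by the expert


def real_findings(findings):
    """Keep only findings the expert rated as minor or serious."""
    return [f for f in findings if f.severity > Severity.NOT_A_PROBLEM]


# Hypothetical records, invented for illustration only.
findings = [
    Finding("HE1", "Course notes link is hard to locate",
            "Violates visibility heuristic", Severity.SERIOUS),
    Finding("HE1", "Logo colour", "Team preference",
            Severity.NOT_A_PROBLEM),
    Finding("CW1", "Unclear menu term", "Users hesitated",
            Severity.MINOR),
]

real = real_findings(findings)
print(len(real), "of", len(findings), "findings rated as real")
```

Keeping the team's argumentation next to the expert's rating makes it possible to revisit a rating later, which matters when a single expert makes the final decision.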
4 A Case Study of Comparative Usability Evaluation
4.1 Evaluation Object
The web site evaluated is that of the Department of Product and Systems Design Engineering (www.syros.aegean.gr), University of the Aegean, and has been operating since September 2000. The web site was designed to address the emerging needs of the new department and has since been extended by the addition of web-based subsystems (both open source and in-house developments) for the support of administrative and teaching tasks.
4.2 Participants
The usability evaluations were carried out by the MSc students of the department in partial fulfilment of their obligations for the course on interaction design.
The students have a wide range of design backgrounds, having graduated from departments such as arts, graphic design, industrial engineering and information systems. Only two (out of a total of 27) students had limited experience of usability from their bachelor studies and had carried out a usability evaluation before. However, all students had considerable knowledge of the web site since they had been using it repeatedly. Thus the selected subjects had a similar level of usability experience (novice) but a good knowledge of the target system. According to Nielsen [10], who reports in the context of heuristic evaluation: “usability specialists are better than non-specialists at performing heuristic evaluation, and “double experts” with specific expertise in the kind of interface being evaluated perform even better”. Thus the lack of previous experience of the selected subjects in usability evaluations was partly compensated by their good knowledge of the target system. Furthermore, the progress of the exercise was reviewed in weekly sessions with all teams in order to resolve queries and guide the smooth progress of the usability evaluations. Finally, the fact that each evaluation was carried out by a team rather than by a single novice designer encouraged critical discussion and group decision making about the findings of the usability evaluation.
4.3 Tasks and Methods Selected
The evaluation teams were each assigned one of four usability evaluation methods: heuristic evaluation (HE - 3 teams), cognitive walkthroughs (CW - 3 teams), think-aloud protocol (T-AP - 3 teams) and co-discovery learning (C-D - 1 team), according to their degree of confidence in carrying out a usability evaluation with each of these methods. All four methods are widely used in industry and academia for usability evaluation.
The evaluation teams were provided with an analytic template for documenting the results, which included a table of contents for the usability report and a predefined categorization of types of usability problems. The evaluation teams were asked to test the system by following two given user tasks: ▪ For a student, to locate information about a specific course: course description, instructor and online notes. ▪ For a visitor of the department, to locate necessary information about visiting the department at Hermoupolis, Syros, Greece: the map of the town, accommodation information and travel information. The evaluation teams were given a two-month period to organise, carry out and document the usability evaluation. Their main deliverables were the usability report and a presentation of their results in a discussion session open to all.
4.4 Results
Realness or relevance: The realness of usability findings (Table 1) is generally high for most methods, even reaching 100% in one case of HE. However, three applications of UEMs produced a rather large number of false (not real) usability findings: HE2, CW2 and C-D1. The fact that this variability appeared in three different UEMs leads to the conclusion that it cannot safely be attributed to intrinsic characteristics of the methods themselves, but rather to the inexperience of the evaluation teams.
Table 1. Realness of usability findings and severity ratings
UEM      Usability findings   0: not a problem (count)   0: not a problem (%)
HE1      18                   1                          5.6%
HE2      28                   8                          28.6%
HE3      14                   0                          0.0%
CW1      21                   3                          14.3%
CW2      24                   6                          25.0%
T-AP1    21                   1                          4.8%
T-AP2    18                   1                          5.6%
T-AP3    17                   3                          17.6%
C-D1     39                   10                         25.6%
Total    200
Validity: The validity of UEMs (Table 2) can be directly derived from the process of identifying the realness (or relevance) of usability findings. Bearing in mind that the evaluator teams had little experience in usability evaluations, the validity of UEMs was quite satisfactory, apart from the three applications of methods discussed above.
Table 2. Validity of Usability Evaluation Methods
UEMs HE1 HE2 HE3 CW1 CW2 T-AP1 T-AP2 T-AP3 C-D1
Thoroughness: Thoroughness can be specified as the total number of real usability problems identified by each UEM divided by the total number of real problems that exist in the system, which is the sum of the unique real problems identified by all methods. Eight out of nine UEMs demonstrated similar performance regarding the thoroughness measure (Table 3): they identified about 1/4 to 1/5 of the total number of usability problems found throughout the system. The last UEM (co-discovery learning) resulted in an impressive (in comparison to the other UEMs) 41.4% of usability problems identified.
Effectiveness: The effectiveness of UEMs can be identified as the product of thoroughness and validity (Table 4). The effectiveness of UEMs has demonstrated wide-ranging results: ▪ Five out of nine UEMs identified about 1/4-1/5 of the total number of usability problems effectively (HE1: 22.9%; HE3: 20%; CW1: 22%; T-AP1: 24.6%; T-AP2: 22.9%)
▪ Another three out of nine methods identified about 1/6 of the total number of usability problems effectively (HE2: 14.7%, CW2: 17.2% and T-AP3: 16.5%). ▪ Only one UEM identified almost 1/3 of the total number of usability problems effectively (C-D1: 30.8%). The overall results about the effectiveness of UEMs are unsatisfactory with regard to one of the central questions in usability evaluation: whether the application of a single UEM can identify a considerable number of usability problems. This was also shown by the comparative usability evaluation work of [9], which uses professional design teams. A second interesting result, regarding the comparison of the effectiveness of the UEMs themselves, is that the co-discovery learning method was significantly more effective than all other methods. Thus, it seems that this method helps young teams to perform better than the other three methods do. On the other hand, the fact that only one team selected this method limits the reliability of this conclusion, which can be further pursued in other comparative usability evaluations.
Table 3. Thoroughness of Usability Evaluation Methods
UEM      Total number of real usability problems
HE1      17
HE2      17
HE3      14
CW1      18
CW2      17
T-AP1    19
T-AP2    17
T-AP3    14
C-D1     29
Total number of usability problems that exist in the system
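As a worked check of the definitions of validity, thoroughness and effectiveness, the following sketch recomputes the three measures for two of the teams, using only figures reported in this paper: the number of findings from Table 1 and the number of real problems from Table 3. Taking the 70 distinct problems listed in Table 5 as the system total is an assumption about the denominator; it reproduces the 41.4% thoroughness and 30.8% effectiveness quoted for co-discovery learning. The function names are illustrative.

```python
def validity(real, findings):
    """Share of a team's findings that were judged to be real problems."""
    return real / findings


def thoroughness(real, total_real):
    """Share of all real problems in the system that the team found."""
    return real / total_real


def effectiveness(real, findings, total_real):
    """Product of thoroughness and validity."""
    return thoroughness(real, total_real) * validity(real, findings)


# Assumed system total: the 70 distinct problems listed in Table 5.
TOTAL_REAL_PROBLEMS = 70

# team -> (findings from Table 1, real problems from Table 3)
teams = {"HE1": (18, 17), "C-D1": (39, 29)}

for name, (n_findings, n_real) in teams.items():
    v = validity(n_real, n_findings)
    t = thoroughness(n_real, TOTAL_REAL_PROBLEMS)
    e = effectiveness(n_real, n_findings, TOTAL_REAL_PROBLEMS)
    print(f"{name}: validity {v:.1%}, thoroughness {t:.1%}, effectiveness {e:.1%}")
```

For HE1 this gives an effectiveness of 22.9%, matching the value quoted in the text.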
Consistency: The consistency of UEMs was not satisfactory (Table 5). About half of the usability problems found (50.0%) were uniquely reported by the application of just one UEM. Furthermore, another 22.9% of the usability problems (about 1/4-1/5 of the total) were reported by only 2 of the 9 teams. Moreover, there was not a single usability problem that was identified by all UEMs.
Table 5. Consistency across UEMs
                                      Number   %
Total number of usability problems    70
... found by 9 teams / UEMs           0        0.0%
... found by 8 teams / UEMs           1        1.4%
... found by 7 teams / UEMs           3        4.3%
... found by 6 teams / UEMs           5        7.1%
... found by 5 teams / UEMs           0        0.0%
... found by 4 teams / UEMs           5        7.1%
... found by 3 teams / UEMs           5        7.1%
... found by 2 teams / UEMs           16       22.9%
... found by 1 team / UEM             35       50.0%
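The percentages in Table 5 follow directly from the counts. The short sketch below recomputes them; the variable names are illustrative, and the counts are copied from the table.

```python
# Number of distinct usability problems reported by exactly k teams (Table 5).
problems_by_team_count = {9: 0, 8: 1, 7: 3, 6: 5, 5: 0, 4: 5, 3: 5, 2: 16, 1: 35}

total_problems = sum(problems_by_team_count.values())

# Share of all distinct problems that were reported by exactly k teams.
shares = {k: n / total_problems for k, n in problems_by_team_count.items()}

for k in sorted(problems_by_team_count, reverse=True):
    print(f"found by {k} team(s): {problems_by_team_count[k]:2d} ({shares[k]:.1%})")

# Problems reported by at most two of the nine teams.
low_agreement = shares[1] + shares[2]
print(f"reported by one or two teams only: {low_agreement:.1%}")
```

Adding the last two rows shows that nearly three quarters of the distinct problems were reported by at most two of the nine teams.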
4.5 Discussion
The main conclusions that stem from the case study are that: ▪ The employment of a single method is not enough for comprehensive usability evaluation. If it is important to find most problems, parallel evaluations can be carried out. ▪ No method was found to be significantly more effective or consistent than others. ▪ The realness and validity of evaluation results were considerably high for most teams, which speaks in favour of the supervised participation of young designers in usability evaluations. In the case study presented, we have followed the proposed approach to inform current practice regarding the use of UEMs. The educational setting in which the case study was carried out imposed restrictions regarding the selection of evaluators (i.e. supervised teams of novice evaluators), the assignment of UEMs (i.e. only one team felt confident to carry out the usability evaluation following co-discovery learning) and the processing of results (i.e. an expert-based approach was followed to make final decisions about the relevance and similarity of the usability findings). On the other hand, the educational setting was convenient for a number of other reasons, including that UEMs were applied according to a common set of lecture notes, evaluators followed a common format for reporting, and they followed the same tasks to evaluate the system. These conditions are hard to achieve in an industrial setting. For example, Molich et al [9] perform a comparative usability evaluation where the evaluator teams use different UEMs (actually combinations of UEMs that have evolved by practice) and different templates for reporting.
5 Summary and Conclusions
Comparative usability evaluations are important for the thorough identification of usability problems and the comparison of UEMs in particular contexts. The paper contributes to the understanding of criteria for comparative usability evaluation, both by providing a method for this task and by presenting a relevant case study for a web-based system. It is envisaged that the approach taken can be applied to other comparative studies as well. Also, the results of the case study can inform the selection of UEMs, particularly when young designers need to be employed in comparative usability evaluations.
References
1. Andre, T.S., Hartson, H.R., Belz, S.M., McCreary, F.A.: The user action framework: a reliable foundation for usability engineering support tools. Int. J. Human-Computer Studies 54, 107–136 (2001)
2. Cockton, G., Woolrych, A.: Understanding inspection methods. In: Blandford, A., Vanderdonckt, J., Gray, P.D. (eds.) People and Computers, vol. XV, pp. 171–192. Springer, Heidelberg (2001)
3. Doubleday, A., Ryan, M., Springett, M., Sutcliffe, A.: A comparison of usability techniques for evaluating design. In: Proceedings of Designing interactive systems (1997)
4. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for evaluating usability evaluation methods. International Journal of Human-Computer Interaction 15, 145–181 (2003)
5. Hertzum, M., Jacobsen, N.E.: The Evaluator Effect: A Chilling Fact about Usability Evaluation Methods. International Journal of Human-Computer Interaction 13(4), 421–443 (2001)
6. Jacobsen, N.E., Hertzum, M., John, B.E.: The evaluator effect in usability tests. In: Summary Proceedings of the ACM CHI 98 Conference, pp. 255–256. ACM Press, New York (1998)
7. John, B.E., Marks, S.J.: Tracking the effectiveness of usability evaluation methods. Behaviour and Information Technology 16(4/5), 188–202 (1997)
8. Law, E.L-C., Hvannberg, E.T.: Analysis of strategies for estimating and improving the effectiveness of heuristic evaluation. In: Proceedings of NordiCHI 2004, Tampere, Finland (October 23-27, 2004)
9. Molich, R., Ede, M.R., Kaasgaard, K., Karyukin, B.: Comparative usability evaluation. Behaviour and Information Technology 23(1), 65–74 (2004)
10. Nielsen, J.: Finding Usability Problems Through Heuristic Evaluation. In: Proceedings of CHI Conference on Human Factors in Computing Systems, pp. 373–380. ACM, New York (1992)
11. Nielsen, J.: Usability Engineering. Academic Press, San Diego (1993)
12. Nielsen, J.: Usability Inspection Methods. In: CHI’94, Boston, Massachusetts (1994)
13. Öörni, K.: What do we know about usability evaluation? - A critical view. In: Conference on Users in the Electronic Information Environments, September 8-9, 2003, Espoo, Finland (2003)
14. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. Morgan-Kaufmann, San Francisco (2002)
15. Rubin, J.: Handbook of Usability Testing. John Wiley & Sons, Inc., New York (1994)
16. Sears, A.: Heuristic Walkthroughs: Finding the Problems Without the Noise. International Journal of Human-Computer Interaction 9(3), 213–234 (1997)
17. Van den Haak, M.J., De Jong, M.D.T., Schellens, P.J.: Employing think-aloud protocols and constructive interaction to test the usability of online library catalogues: a methodological comparison. Interacting with Computers 16, 1153–1170 (2004)
Concept of Usability Revisited Masaaki Kurosu National Institute of Multimedia Education [email protected]
Abstract. Based on the historical review, a new model on the concept structure of usability and satisfaction was proposed. As a proposer of user engineering, the author redefined the concept of usability of which the usability engineering is responsible and linked the concept of satisfaction to the user engineering. It is based on the differentiation of the objective characteristics of artefact and the subjective impression of user. Keywords: usability, satisfaction, usability engineering, user engineering.
1 Introduction
Since ISO13407 was standardized in 1999, usability engineering has entered a new era, and an increasing amount of attention has been paid to usability, or quality in use. At least in Japan, the concept of usability is based on ISO13407, which cites the definition of ISO9241-11. As the alias “big usability” suggests, it covers a wider range of quality than the “small usability” originally proposed by Nielsen. However, the author has questioned the conceptual dependency among the sub-concepts in the definition of ISO9241-11. Here the author presents a revised version of the notion of usability and puts more emphasis on satisfaction as the ultimate goal of user engineering.
“usability”. As will be discussed later, the goal of our activity should not be limited to the traditional connotation of the term “usability” but should be more broadened.
3 Concept of Usability
In this section, some major definitions of usability will be reviewed and finally a new concept of usability and satisfaction will be proposed.
Nielsen
The formal and structural definition of the usability concept was first given by Nielsen, J. as follows. As is shown in this figure, usability is composed of such sub-concepts as learnability, efficiency, user retention over time, error rate and satisfaction. It should also be noted that utility is set apart as a concept mutually exclusive with usability. This concept structure may be related to the activity of Nielsen himself. As is well known, he proposed the heuristic evaluation method for evaluating usability, i.e., for detecting problems. Thus, for him, usability is an activity to improve the negative aspects of the artefact that are found by the evaluation method. In other words, it is a “non-negative” concept of usability, and it aims to improve the artefact from a minus level to the zero level, or the normal level. Looking back at the history of usability engineering, it started from evaluation activity using usability testing, inspection methods, etc. So it is quite natural that Nielsen focused on evaluation and proposed the concept structure of usability as such. But usability activity based on evaluation had some limitations. For one, the engineers and designers who designed the artefact would not easily accept the result of the evaluation, and would claim that users can use the artefact if they follow the procedure designed for them. For another, managers would not put emphasis on evaluation-based usability activity because just remedying defects does not contribute to sales. Of course, there were some engineers, designers, and managers who could understand the significance of the usability activity even though it is a “non-negative” approach. But most of them put their energy into the development of utility, i.e. the functionality and the performance. So it could be said that the “non-negative” concept of usability, sometimes called the “small usability”, is not sufficient, and something more should be considered.
ISO 9126 ISO9126 was standardized for defining the quality of software. As can be seen in the figure, there are many quality characteristics that include the usability as just a part. In this standard, the usability is considered to be consisting of the understandability, the learnability and the operability.
It is reasonable that this standard included usability as one aspect of software quality, but its definition is narrow and insufficient.
ISO9241-11
An influential definition of usability was proposed by ISO9241-11. The definition clearly specifies that usability is related to goal achievement, and it puts emphasis on the context of use. The sub-concepts of usability consist of effectiveness, efficiency and satisfaction. It is important that effectiveness and efficiency are related not only to the “non-negative” aspects but also to the “positive” aspects of the artefact. Regarding effectiveness, the artefact can become usable by minimizing the difficulty of use. But at the same time, the artefact can become usable by providing a function that solves the user’s problem and makes it easier to achieve the goal. Regarding efficiency, usability will be improved by changing the interaction procedure in order to shorten the time of operation. But it could also be improved by providing a faster CPU. In this sense, the definition of usability in ISO9241-11 is not just a “non-negative” one but also a “positive” one. In other words, it could be said that this definition includes both the usability and the utility of Nielsen’s definition, and is almost the same as his definition of usefulness. This definition is sometimes called the “big usability”.
This definition also includes the satisfaction as a promoting factor of the use of artefact. But this point is a bit controversial. The effectiveness and the efficiency are the property of the artefact so is the usability. But the satisfaction can be achieved as the result of the object property, i.e. the effectiveness and the efficiency, and is the subjective impression on the side of the user. Another point that should be carefully looked at is the use of the term “specified users”. It is quite natural that the manufacturer presupposes the user as the targeted
user. But it was frequently observed that the profile of the targeted user was based on the engineers and designers themselves. Thus, it is sometimes criticized that the profile of user is a male, aged around 30’s, having a certain level of knowledge of IT. As a result, the artefact they designed frequently becomes difficult to use by everyday people. This was sometimes pointed out by those who are working in the field of universal design and accessibility. Thus it should be redefined as to include the every possible type of users. Anyway, this definition of usability of ISO9241-11 was so influential that ISO13407, the core standard of usability, and other standards such as CIF, ISO18529, ISO16982, ISO20282, etc. are adopting this definition. Jordan Patrick Jordan put emphasis on the pleasure and proposed a three-level concept structure. The functionality is placed as the first level and the usability as the second level. He put emphasis on the pleasure as the third level, because it is inevitable for the artefact not just fulfilling the ease of use but also enhancing the emotional aspects. This corresponds to the current trend that focuses on the emotional aspect of the artefact as was proposed by Norman. In his definition of usability, he cited that of ISO9241-11. But this point is a bit confusing. As was mentioned above, the definition of usability of ISO9241-11 includes the satisfaction as its part. So it is difficult to clearly differentiate the satisfaction and the pleasure. Furthermore, his definition is a bit too simple and does not refer to other aspects of the artefact. Kurosu Considering the insufficiencies of past definitions, Kurosu proposed a new hierarchical model of usability. He differentiated the objective properties of artefacts (on the left hand side) and the subjective characteristics of users (on the right hand side). 
In the left half, the effectiveness and the efficiency are included as to be influenced by the utility and the (small) usability. The former consists of the functionality and the performance and the latter consists of the ease of operation and the ease of cognition. The ease of operation was once a main target of usability activity by applying the methods and knowledge of the human factors engineering and ergonomics. The ease of cognition later became the center of the concern of the usability professionals that was triggered by the advent of computer and its applications. It was put forward by applying the methods and knowledge of the cognitive psychology. The effectiveness, the efficiency and the satisfaction were regarded as the subconcepts of the usability in ISO9241-11, but Kurosu limits the range of usability concept only to the effectiveness and the efficiency on the left hand side of the figure, admitting the influence of these concepts to the satisfaction. It is based on the notion that the usability is the property of artefact.
Besides the effectiveness and the efficiency, such quality properties as the cost, the safety and the reliability are located in the property of artefacts that may influence the satisfaction. Some other properties such as the re-usability could be added to the list of quality properties if necessary. On the right hand side of the figure, the satisfaction is located as the top concept and some other subjective characteristics on the side of user such as the pleasure, the aesthetic impression, the attachment, the motivation, the drive and the value system are described as influencing the satisfaction. It is also suggested that the satisfaction, the supreme goal of the artefact, may be related to the user experience (UX), the customer satisfaction (CS) and the quality of life (QOL).
In addition to this structure, Kurosu points out that it is important to include the spatial dimension and the temporal dimension. He put emphasis for considering the diversity of user characteristics and the diversity of context of use as a spatial expansion of the concept. This could be the basis of the concept of universal usability as was originally proposed by Shneiderman. He also introduced the temporal dimension and put emphasis on the long term use or the prolonged use. It is a contrasting approach to the former usability engineering that focus on just a short time use as can be evaluated in the situation of the usability testing. These points will be explained in detail in later sections.
Characteristics: Age, Generation, Gender, Physical traits, Mental traits, Educational background, Social status, Knowledge and skill, Language, Culture, Communication style, Cognitive style, Learning style, Functional insufficiency
Situation: Life style, Economical situation, Political situation, Emotional status, Geographical environment, Historical background, Urgency
Value: Preference, Political attitude, Religion, Tradition
4 Concluding Remarks
Based on the notion of goal achievement, a few notable definitions of usability were reviewed, and the concepts of usability and satisfaction were each redefined, thus putting more emphasis on user engineering. Although this idea is not quite new to the author, he is now confident that satisfaction is the ultimate goal of people (users) living in this world. Artefacts include invisible systems such as the educational system, the local government system, the banking system, and the transportation system. The author’s concern has now broadened to consider how the educational system can satisfy people who are considering their life path and career path. In this sense, the usability of the educational system, which includes the hardware, the software, the humanware and their total system, should be inspected from the viewpoint of satisfaction. The system should support people in finally selecting their life path and empower them with the knowledge and skill to realize that goal. The effectiveness and the efficiency of the educational system are just a matter of usability, and they alone will not fulfil people’s goals in life. In this sense, the author is now interested in pursuing the differences of artefacts in time and in place. He is now conducting ethnographic research to find out how people invented and decided to use some specific form or pattern of artefact for supporting their life. This is called the “Artefact Development Theory” and will be presented at the next opportunity.
How to Use Emotional Usability to Make the Product Serves a Need Beyond the Traditional Functional Objective to Satisfy the Emotion Needs of the User in Order to Improve the Product Differentiator – Focus on Home Appliance Product
Liu Ning and Shang Ting
Abstract. A traditional definition of usability cites the successful attainment of some related control within a specified period of time and with a minimum number of errors. Therefore, most efforts have focused on the function of the product. At present, user-centered design is highly emphasized; in addition, more entertainment-oriented products have received considerable attention from consumers. So, whether or not a product can meet the emotional needs of the consumer is significant for the brand. This paper defines emotional usability on the basis of traditional usability research, shows through a case study how Haier, one of the most famous home appliance companies, uses it during product development, and describes a process for applying emotional usability so that a product serves a need beyond the traditional functional objective and satisfies the emotional needs of the user, thereby improving product differentiation.
key when people make decisions: for example, people choose to die for a certain belief, and people choose to be loyal to their lover because they are deeply in love. No doubt, people's decisions to pay for a product are also partly influenced by emotion. This emotion can come from various sources, such as requirements, impressions, trust and so on. Sometimes people even buy a product not because they need it, but because of an experience or because they are deeply moved. In general, people's requirements for product performance include function, aesthetics and quality. Functional usability means that the people who use the product can do so quickly and easily to accomplish their own tasks. People will never pay for a product that makes them spend a great deal of time figuring out how it works or that keeps leading them into errors. Customers easily abandon these kinds of products. For aesthetics, a dictionary provides the definition “the beautiful, in the taste and art.” This component includes the first and last impression of the product. Aesthetics has already become a significant component of the product. But sometimes a product with good aesthetics cannot meet customers' needs or is hard to use. This paper explains the definition of emotional usability in depth and shows how one of the most famous home appliance companies incorporates emotional usability into new product development.
2 Emotion Usability
So what is emotion usability really? Let us talk about usability first. Usability has two parts. One is functional usability, which focuses on how easy and efficient the product is to use. The other is emotion usability, which refers to the degree to which a product is desirable or serves needs beyond traditional functional usability. Take the refrigerator as an example. In the mid-1980s, when refrigerators first appeared in China, Chinese people regarded the refrigerator as a product that made food last longer. Later, people found this function insufficient; customers wanted the fridge to keep food fresher, save energy, maintain humidity and so on. At present, however, the refrigerator is not just a home appliance for them. Chinese customers expect the product to be a nice piece of furniture in their apartment. Therefore, traditional appearance components, such as white and silver colors, are obviously unattractive to customers, even when the products offer satisfactory quality and function. This case applies to many types of consumer electronic products, such as washing machines, TVs and IT products. Does this mean that quality is no longer important or no longer tempting for Chinese customers? Absolutely not. Let us go back to how people interact with the product. Figure 1 describes the cognitive judgment for products. Normally, customers interact with a product through five senses: smell, touch, taste, hearing and sight. Nevertheless, touch, sight and hearing are the main senses through which customers interact with home appliance products. Basically, hearing is connected with performance and quality, such as the degree of noise. However, part of touch and particularly sight are related to the customer's emotional reaction. Sight relates more to the aesthetic part and to whether or not the product can serve their needs, such as appearance, user interface design, and function. Touch relates more to the process of interacting with the product, such as usability.
Certainly, aesthetics is not the only thing that affects human emotion. Functional requirements mean that the product can meet people's potential needs and that the interactive process is enjoyable, not just easy. These factors all have a great influence on human emotion. Sometimes a product is unique enough but cannot meet customer needs, and the product will still meet its Waterloo. In the history of industrial design, innumerable well-designed products have failed in the marketplace. As we mentioned, serving potential needs beyond functional usability is the key point. This means that the most important thing is to address the needs that customers do not realize they have but really need the product to meet. For this, understanding the customer's way of thinking is essential.
[Diagram labels: Interaction; Objective; smell, touch, taste, hearing, sight; Cognitive judgment]
Fig. 1. Cognitive judgment for the product
Nowadays, many western companies are trying to win the huge Chinese market. However, there is a deep gap between Chinese and western people. Primarily, there are distinct differences between western culture and Chinese culture, and these differences are reflected in emotion. The core of Chinese culture is being implicit and moderate, and being afraid of losing face or of embarrassing other people. These are some of the important reasons why many Chinese pay attention to products that display their status, taste and personality. In contrast, simplicity and straightforwardness are the key points of western culture. Western people pay more attention to whether a product is simple and practical. As in the refrigerator case, Chinese people feel proud when guests come to their apartment and see a refrigerator with attractive features. Chinese people like to show a newly bought mobile phone to their friends, even if the old one was not bad at all. People feel bored or out of date using a refrigerator with a traditional design or an old mobile phone model, even if the product is of good quality. Hence, a product that successfully addresses the emotion usability of western people might not work in China.
3 The Process to Apply Both Functional and Emotion Usability in Haier
Haier is one of the most famous home appliance companies in China and has achieved compelling success there. In the beginning, the secret of Haier's success was meticulous after-sales support. At present, a further secret of its success is that it always addresses Chinese customers' potential needs. Haier has a practical process for addressing emotional usability in new products. First, let us introduce the principle of Haier design.
3.1 The Principle of Haier Design
In 2002, Haier developed a compact washing machine for summer clothes and other small items that a normal washing machine cannot wash quickly because they are too light. This product broke the rule that washing machines are relatively hard to sell in summer, and it also won the G-Mark in the Japanese design competition. In 2006, a Haier air conditioner again won an award, this time at the International Industrial Design competition (IF) in Germany.
In fact, the principle of Haier's design is that the product must satisfy users' needs. These include not only needs that have already been discovered; the most important part is to keep uncovering users' potential needs. Central to Haier's emotional usability are two related product attributes. First, the product must have appropriate aptness and first and lasting impressions; in other words, the product should catch the user's attention quickly and hold it for a long time. Second, the product should eliminate the machine feeling, which includes fear and a lack of humanity toward the user.
3.2 Process
At Haier, the usability team is involved from the beginning of new product development. Its responsibilities include discovering users' needs, building the user interface design, usability testing and other types of testing. In summary, the usability team plays a key role in the early new product development process. Figure 2 shows a snapshot of these activities.
Fig. 2. The process of combining functional and emotional usability
By conducting user research on previous products, Haier obtains direct customer feedback. To some extent, this also gathers user requirements for new products. Haier also conducts other types of research to discover potential customer needs. These needs are then passed to the new product development team to develop the new concept. In fact, the most important steps in discovering users' emotional usability needs are conducted in the pre-development stage or even earlier, because customers usually cannot provide very useful suggestions directly; mostly, customers do not really know what they want. Through the various studies in the pre-development stage, Haier gathers more answers about why than about how from customers, and the research team keeps working to figure out the story behind the reasons in order to discover people's potential needs. For example, in 2003 China suffered from SARS, especially in the south. This disease spreads easily, even through the air, and Chinese people were seriously afraid of it. Through user research, the product team found that people had started to sterilize their clothes after the washing machine had finished washing. However, the disinfectant left the clothes with a terrible smell. Because no washing machine was equipped with a disinfection function, some people first washed their clothes by hand with disinfectant and then put them into the washing machine. People obviously washed their clothes in this unusual way because of SARS. This potential need was quickly identified by the new product development team, and Haier very soon launched the first washing machine with a disinfection function on the Chinese market. This product brought Haier huge profits. In this case, it is easy to see that the product development team properly translated customer needs into a new product that achieved the first aim of Haier's emotional usability: the product must be engaging. Customers can be deeply moved by a unique design, and also by a product that provides something they have never seen but really need. Without doubt, aesthetic engagement is also important.
Once development reaches the concept stage, the product design and user interface design teams provide design concepts based on those needs. After the new concept is finished, Haier sends the prototype to the laboratory for one-to-one acceptance testing or uses other research methods to evaluate the features of the product. The opinions collected from customers are sent directly to the product development team so that the product can be improved quickly. At the same time, acceptance testing or other types of research are also conducted. Based on users' reactions to the product, the company can simply evaluate whether or not the product catches people's eyes. This includes the product design, usability, user interface and even the internal frame.
Take this red refrigerator as an example. After the design stage of this product was finished, Haier carried out competitive testing among Haier, Samsung, Panasonic and LG products in China. The result was that this model received a great deal of praise from customers. Once Haier launched this model, it achieved considerable success in the market. Meanwhile, Haier also sent this model to Germany to take part in the IF competition in 2006, where it won an award. In order to better achieve the aims of emotional usability, especially eliminating the fear of the machine, Haier emphasizes user interface design from the beginning of new product development. Actually, achieving functional usability is just a basic goal for Haier. The interface design team combines functional usability and visual appeal so that people are engaged with the product through an easy and enjoyable interaction between product and customer. Finally, the user interface design concept is usability tested with customers to evaluate whether or not the concept achieves the aims of usability.
• Weak point of the process
There is no omnipotent weapon in the world. This process has already had a strong positive impact on new product development at Haier. However, it has also shown weaknesses. First, human cognition keeps changing, and so do the opinions that customers give to the company; sometimes what they say is not carefully considered. Second, usability was adopted from the West and includes many methods. This brings us back to the earlier point that the West differs greatly from China. During research, Chinese people tend to be silent and implicit and to consider the company's face carefully, which is not good for the research. Moreover, many methods that work well in the West do not work ideally in China. Poor research results might seriously mislead the company's decisions. Therefore, finding proper methods and localizing them is another key issue now.
4 Conclusions
To summarize, the usability program at Haier is new and still growing. Haier has in fact only just started the fundamental usability process, which includes an early and continual focus on the consumer and perfecting an interactive design process that relies heavily on research and prototypes. Based on this process, Haier integrates its different departments into the new product development process in order to ultimately improve its differentiation in China. As the principle of the Haier design group states, emotional usability is the key to improving product differentiation and eventually achieving success in the marketplace.
Towards Remote Empirical Evaluation of Web Pages' Usability
Juan Miguel López1, Inmaculada Fajardo2, and Julio Abascal1
1 Laboratory of Human-Computer Interaction for Special Needs (LHCISN), Computer Science Faculty, University of the Basque Country, Manuel Lardizabal 1, Donostia - San Sebastian
2 Cognitive Ergonomics Group, Department of Experimental Psychology, University of Granada, Cartuja Campus, Granada
Abstract. The functional description of EWEB, a tool for automatic empirical evaluation of web navigation, is presented in this document. EWEB supports naïve evaluators in designing experiments, which specify the experiment type (within-subject, factorial, etc.), the web logs to be captured (time, visited pages, etc.), the task models (search task, free navigation) and the surveys (questionnaires, card sorting) to be performed by experimental participants. EWEB stores navigational data preserving the experiment structure and supports data analysis and interpretation, with the possibility of generating usability metrics. Requiring minimal installation on the client computer, EWEB can be used for both lab evaluation and remote evaluation in multiple browsers. One empirical web study, designed and performed by means of EWEB, is described in order to illustrate its validity as a research tool.
Keywords: web usability experiments, log capturing and analyzing, web navigation metrics.
that reason, tools that facilitate user recruitment and automate the process of designing, recording and interpreting web usability experiments are essential. To assist researchers, tools exist that automate these processes separately. Many useful tools for capturing user behavior during web interaction can be found in the literature. On the one hand, tools such as [3], [4] or [5] store data generated by the HTTP-level communication between a web server and a browser on a client machine. On the other hand, there are tools that capture data from the user interface, in this case a local web browser, such as [6], [7], [8], [9] and [10]. A subgroup in this category consists of tools that use a new or modified client browser specifically prepared for storing user navigation information, such as [11], [12], and [13]. There are also a number of tools for automating the administration of knowledge elicitation tasks (mainly card sorting) in the Web context, such as [14], [15], [16] and [17]. However, these tasks are not usually integrated into automated capture tools. [15] is an exception that combines automation of event capture and card sorting tasks, so that researchers can design the task by introducing the concepts to be classified by users and gather the result of this categorization. Once user actions are recorded, the next step consists of analyzing and interpreting them. The tools that allow recorded web navigation information to be analyzed include [10], which provides comparison between task models and user behaviour, [13] and [18]. Other analysis tools such as [19] and [20] do not record user logs but provide interesting navigation metrics, for instance the degree of disorientation (L index of Lostness [21]) and the St index of linearity and/or Cp index of complexity of the Web navigation route followed by the user [22].
In summary, there are numerous tools that automate some of the processes involved in an empirical evaluation of website usability, mainly capturing and analyzing. However, tools that aid the process of designing a web experiment are fewer and incomplete. Furthermore, to our knowledge there are no tools that jointly support the processes of designing complex web experiments, capturing user interaction, and analyzing and interpreting the captured data. This may lead evaluators to acquire different tools for each process and invest a large amount of resources in learning how to use them, which paradoxically would hinder rather than ease the empirical evaluation of web navigation. Given these shortcomings, and with the aim of meeting the need for one tool covering all the aspects mentioned above, the EWEB tool was developed. EWEB (an acronym of Experimentation in the WEB) is a tool for automatic empirical evaluation of web navigation. EWEB supports naïve evaluators in creating experiments, which specify the experiment type (within-subject, factorial, etc.), the web logs to be captured (time, visited pages, etc.), the task models (search task, free navigation) and the surveys (questionnaires, card sorting, etc.) to be performed by experimental participants. Additionally, EWEB stores web navigation data preserving the experiment structure and supports data analysis and interpretation, with the possibility of generating usability metrics such as Lostness or similarity to the optimum path [21]. Finally, EWEB can be used for both lab evaluation and remote evaluation in multiple browsers, requiring minimal installation on the client computer.
2 EWEB Tool: Technical and Functional Description
The EWEB tool (Experimentation in the WEB) consists of three different modules (see Figure 1): the Experimental Session Design module, the User Guidance and Monitor module and the Analysis module.
Fig. 1. Architecture of EWEB tool
An experimenter defines a session using the design module, which creates an XML file as output. This file defines the experiment session. The user guidance and monitor module uses the XML file to conduct the session on the user's computer while monitoring navigational data. These data are stored in a remote repository, from which they can be analyzed according to the experiment session. Each part of the architecture is described next.
2.1 Experimental Session Design Module
Any experiment can be described as a study that investigates the effect of X on Y. Therefore, when a Web experiment is carried out, it must be decided which variables are going to be manipulated (X or independent variables), controlled (extraneous variables) and observed (Y or dependent variables), and in which way. For instance, let's consider a company that wants to evaluate the impact of the background colour (blue, white or green) of its website on the time that users require to complete a search task. In the design module of EWEB, the evaluator must add one Independent Variable (Background colour) and specify its levels (blue, white or green). Furthermore, the evaluator must decide whether all users will perform the search task for the three levels of the Independent Variable (within-subject design) or each user will perform it for just one level (between-group design). As a result of these initial steps, the design module calculates the number of experimental groups and conditions automatically. In our example, the between-group design would have three experimental conditions and three experimental groups. The evaluator must include three different groups of users, and each of them would perform the search task with a unique background colour level. If the evaluator selected the within-subject design, the number of experimental conditions would also be three, but the number of experimental groups would only be one, since all users would perform all three experimental conditions. The next step is to define the task models and the procedure for each experimental condition. Currently, EWEB is designed to implement two types of web navigation tasks (search and free navigation) and two types of surveys (card sorting and questionnaires).
2.1.1 Search Task and Free Navigation Task
The search task consists of users searching for a series of targets in the web site within a time limit. The design module allows configuring instructions, number of search tasks, time limit for the task, target URL, initial URL for the task, data to be logged (accuracy, time to find the target, pages accessed, total time per page, order of pages accessed) and order of search trials (random or fixed order).
This last point is very important in experimentation in order to prevent practice or fatigue effects, which may mask the effects of the independent variables. The free navigation task consists of asking users to navigate freely through the web, starting from a specific web page, for a given time. The design module allows configuring the initial URL, the time limit (if any) and the navigational data to be logged.
2.1.2 Surveys
The card sorting task is used as a knowledge elicitation task and has been used extensively in Cognitive Psychology and Artificial Intelligence to study user learning or the so-called Mental Model [23]. The card sorting task consists of asking users to sort cards that contain task-relevant concepts. The output is a vector or matrix with the user data that can be compared graphically or statistically with a theoretical or expert matrix. The design module allows evaluators to introduce task-relevant concepts and specify the theoretical vector. The questionnaire option allows evaluators to design a set of questions to be answered by users. The number and type of questions (true-false, forced choice, scale, etc.) and the presentation format can be designed in this module. Finally, although the general instructions for the experimental conditions are not user tasks, they can be designed within each condition in order to facilitate the description of the procedure.
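As a rough illustration of how a participant's card sort could be compared with an expert one, the sketch below turns each sort into a co-occurrence matrix (1 when two concepts are placed in the same pile) and reports the proportion of concept pairs on which the two matrices agree. The function names, the example concepts and the agreement measure are assumptions made for illustration; the text does not specify which comparison EWEB applies.

```python
from itertools import combinations


def cooccurrence_matrix(concepts, piles):
    """Return an n x n 0/1 matrix: 1 if two concepts were sorted into the same pile."""
    pile_of = {concept: index for index, pile in enumerate(piles) for concept in pile}
    n = len(concepts)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if pile_of[concepts[i]] == pile_of[concepts[j]]:
                matrix[i][j] = 1
    return matrix


def pair_agreement(matrix_a, matrix_b):
    """Proportion of distinct concept pairs on which both matrices agree."""
    pairs = list(combinations(range(len(matrix_a)), 2))
    same = sum(1 for i, j in pairs if matrix_a[i][j] == matrix_b[i][j])
    return same / len(pairs)


# Hypothetical concepts for a bookshop website (not taken from the study).
concepts = ["novels", "poetry", "payment", "delivery"]
expert = cooccurrence_matrix(concepts, [["novels", "poetry"], ["payment", "delivery"]])
user = cooccurrence_matrix(concepts, [["novels", "poetry", "payment"], ["delivery"]])
agreement = pair_agreement(expert, user)
```

A simple pairwise agreement is used here because it needs no external libraries; a statistical comparison such as a matrix correlation would be an equally plausible choice.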
Continuing with the previous example, if the hypothetical company sells books, the speed at which users are able to find and match their targets is a relevant usability index. Therefore, if the evaluator started with the “blue background” experimental condition, he/she could select the search task. In the design module interface, the evaluator would select the number of search tasks (e.g. two search trials), the time limit (e.g. 20 seconds) and the initial and target URLs. As some of the experimental conditions can be identical to previously defined ones, EWEB allows the evaluator to copy tasks and procedures and later change some of their parameters. Finally, the procedure, that is, the presentation order of the sets of tasks for a specific experimental condition, can also be defined by the evaluator as random or fixed. For instance, if an experimental condition includes two tasks, search and card sorting, and the evaluator selects a fixed order, he/she should indicate which one must be performed first. This module produces as output an XML file with a specific format for the experiment design created. All the variables used and their conditions are coded in this file. The specification file is later used by the User Guidance and Monitor Module, which assigns the tasks users have to perform based on its information. The file is also used by the Analysis Module to facilitate the analysis of all user evaluation data.
2.2 User Guidance and Monitor Module
In order to perform an experiment, this module must be downloaded and run locally on the user's computer. As the module is developed using Java technology, the only requirement for the user's machine is that a Java Virtual Machine is installed. If so, the module runs by means of Java Web Start technology. The module is composed of two parts: the User Guidance Module and the Monitor Module.
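As a minimal sketch of the design step from Section 2.1, the code below derives the experimental conditions and the number of groups for the background colour example and serializes them to XML. The element and attribute names (`experiment`, `condition`, `level`, `design`, `groups`) are hypothetical, since the actual EWEB file format is not given in the text, and the sketch handles only fully within-subject or fully between-group designs.

```python
from itertools import product
import xml.etree.ElementTree as ET


def plan_design(variables, within_subject):
    """variables maps each independent variable name to its list of levels."""
    names = list(variables)
    # Every combination of levels is one experimental condition.
    conditions = [dict(zip(names, combo)) for combo in product(*variables.values())]
    # Within-subject: one group performs all conditions.
    # Between-group: one group per condition.
    n_groups = 1 if within_subject else len(conditions)
    return conditions, n_groups


def design_to_xml(variables, within_subject):
    """Serialize the design to XML (hypothetical schema, for illustration only)."""
    conditions, n_groups = plan_design(variables, within_subject)
    design = "within-subject" if within_subject else "between-group"
    root = ET.Element("experiment", design=design, groups=str(n_groups))
    for index, condition in enumerate(conditions, start=1):
        node = ET.SubElement(root, "condition", id=str(index))
        for name, level in condition.items():
            ET.SubElement(node, "level", variable=name).text = level
    return ET.tostring(root, encoding="unicode")


variables = {"background_colour": ["blue", "white", "green"]}
between_conditions, between_groups = plan_design(variables, within_subject=False)
within_conditions, within_groups = plan_design(variables, within_subject=True)
xml_text = design_to_xml(variables, within_subject=False)
```

With the single three-level variable from the example, this yields three conditions in both cases, three groups for the between-group design and one group for the within-subject design, matching the counts given in Section 2.1.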
2.2.1 User Guidance Module
The user guidance module is based on the experimental design created by the experimenter in the previous stage. It receives as input an XML file describing the design of the experiment. According to the design, the tasks to be performed by the user and their order are established. For instance, if the procedure of an experimental condition is defined as random, all tasks related to it are randomized when the user passes through this condition. Tasks are prepared and executed based on the given experimental design. The experiment file also provides the instructions for the different groups of tasks, as well as the texts of the error or OK messages that may appear depending on user actions.
2.2.2 Monitor Module
The Monitor Module is executed locally on the client machine, and its goal is to monitor all information related to user interaction while the given tasks are performed. In order to ensure that the evaluation is performed in a realistic scenario, data must be captured in such a way that the user is not aware of any difference from his/her usual web navigation in his/her browser. Therefore, this module defines no user interface. As almost all current browsers can connect to the web through a proxy server, this module uses a proxy, or intermediate software, to route all the incoming and outgoing web traffic of the client browser, so that relevant user navigation data can be captured. This approach allows the proxy to be used with almost all existing browsers and operating systems. In fact, EWEB has been successfully tested with different browsers such as Internet Explorer, Mozilla, Mozilla Firefox and Konqueror. The modularity of this approach allows the module to be adapted rapidly to new browsers or new versions of supported browsers. The module activates and deactivates proxy navigation automatically, so that the user does not notice changes in browser configuration. The mechanism is different for each browser, so a separate piece of code has been developed for each one. If the user's browser already has a proxy configured, a hierarchy is created in the Monitor Module proxy so that the incoming and outgoing web traffic is rerouted to the previously defined proxy. When the user session ends, browser settings are restored so that the user can navigate with the previously defined proxy. The use of cached information is disabled to ensure that all users perform the evaluation under the same conditions, because cached web pages can affect the results of the experiments. As user navigation data are captured locally by the proxy, network latency is not a problem and the data are more accurate than data obtained from a remote proxy server (millisecond accuracy can be achieved). Data from the tasks performed by the user can be stored either in a remote repository or in a local file, depending on the information specified in the experiment. All captured data are stored and labeled according to the design.
2.3 Analysis Module
Once users' data are stored in the remote data repository after an experimental session, either directly or by adding local files manually, they can be analyzed using this module.
In addition, data stored in the repository can be exported in plain text format so that they can be imported directly into tools such as Excel or Statistica for statistical analysis. The type of analysis that can be performed differs for each task type. In the case of the search task, this module allows analysis of parameters such as the number of correct trials per user (target found), the average number of correct trials per experimental condition, the total and average time elapsed in finding a target per user or per experimental condition, and the similarity of the user's path to the optimal path of the task or task model. The last parameter can be evaluated by means of the Lostness metric [21] per trial and subject, or the average Lostness per trial. The Lostness index ranges between 0 and 1; the greater the value, the more lost the user is. For the free navigation task, this module allows calculating the total time the user took to navigate through the website (if there was no time limit) and a matrix of the transitions between actions for the analysis of the user's navigation strategies [22] or of the coherence between accessed nodes [24].
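To make the Lostness metric concrete, the following sketch uses the formulation commonly attributed to Smith, in which S is the total number of pages visited (revisits counted), N is the number of different pages visited, and R is the minimum number of pages needed to reach the target. The text cites [21] without reproducing the formula, so whether EWEB computes exactly this form is an assumption, and the function name and page labels are hypothetical.

```python
from math import sqrt


def lostness(visited_pages, minimum_pages):
    """Lostness for one trial.

    visited_pages: ordered list of pages opened in the trial, revisits included.
    minimum_pages: R, the number of pages on the optimal path to the target.
    """
    if not visited_pages:
        raise ValueError("at least one visited page is required")
    s = len(visited_pages)       # S: total pages visited, revisits counted
    n = len(set(visited_pages))  # N: different pages visited
    r = minimum_pages            # R: pages on the optimal path
    return sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)


# Hypothetical trials: an optimal path, and a path with one revisit.
optimal_trial = lostness(["home", "catalog", "book"], minimum_pages=3)
revisit_trial = lostness(["home", "catalog", "home", "book"], minimum_pages=3)
```

In this formulation a participant who follows the optimal path without revisits scores 0, and the score grows as pages are revisited or as unnecessary pages are opened.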
3 Case Study
The material for the illustration comes from an experimental study carried out by [25] with the aim of comparing ten different websites in terms of accessibility (measured with the metrics proposed by [26]) and usability (measured by accuracy, effectiveness and satisfaction when searching for information on the Web). EWEB was used to design the experiment and to capture and analyze the experimental data. Twenty volunteers participated in the experiment (fourteen women and six men), with an average age of 25 years. They were asked to search for 54 targets in ten different websites, with the website indicated each time (six searches per website). The order of presentation of the websites and of the searches per website was randomized for participants. By means of the design module, researchers introduced the number and types of variables, and the program automatically calculated the number of experimental conditions and showed them to the researchers. In this case, the “Website” independent variable was manipulated within-subject and had ten levels, one for each website to be studied. Therefore, there were ten experimental conditions. Then, for each experimental condition, the experiment designers defined the tasks users should perform, in this case a search task and a satisfaction questionnaire. Finally, researchers defined the number and characteristics of each one of the ten search trials: instructions, time limit, target, randomization, etc. Figure 2 shows a piece of the XML form that EWEB generated as a result of this process.
Fig. 2. Piece of the XML form which contains the experiment characteristics
Once the researchers had defined the tasks and procedures for each of the ten experimental conditions, they started the User Guidance and Monitor Module on the experimental participant's local computer. Then, the researchers selected the experiment and specified the participant identification and the experimental group (the list of groups was automatically generated by EWEB). Since Website was a within-subjects variable, there was just one experimental group, and all participants had to perform the searches and complete the satisfaction questionnaire for each of the ten websites. The experiment then started under the control of the user guidance module, which asked participants to perform the tasks according to the specified procedure (in this case, the order of websites and search targets was randomized for each participant). In the meantime, the monitor module recorded and saved participants’ actions while they navigated.
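The two automated steps described in this case study, deriving the experimental conditions from the declared variables and randomizing their order for each participant, can be sketched as follows. This is only an illustration in Python under stated assumptions: the function names, the level names (`site_1` to `site_10`) and the idea of seeding the shuffle with the participant identifier are hypothetical and are not taken from EWEB.

```python
import itertools
import random

def experimental_conditions(variables):
    """Return every combination of levels, one dict per experimental condition."""
    names = list(variables)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(variables[name] for name in names))
    ]

def presentation_order(conditions, participant_id):
    """Shuffle the conditions reproducibly for one participant."""
    rng = random.Random(participant_id)  # seeding by participant is an assumption
    order = list(conditions)
    rng.shuffle(order)
    return order

# One within-subjects variable with ten levels gives ten conditions.
variables = {"website": [f"site_{i}" for i in range(1, 11)]}  # hypothetical level names
conditions = experimental_conditions(variables)
order_p1 = presentation_order(conditions, participant_id=1)
```

With more than one independent variable, the number of conditions is the product of the numbers of levels, which is what `itertools.product` enumerates.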
The data were saved into a remote repository based on the structure of the experiment. The data report was grouped by task, experimental condition, measures, etc. Table 1 shows the accuracy and efficiency data of one participant in the 3 trials of the search task for one of the ten websites. Results were merged and exported to a statistics program in order to perform the required analyses.

Table 1. Data calculated by EWEB for Participant 1 in the condition “W3C website”

User Code: Participant 1
Task: Search Task
Experimental Condition: W3C website

Trial   Response Time (ms)   Target Found   Lostness
0       3656                 1              0.25
1       60000                1              0.780625
2       60000                0              0
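As an illustration of the per-participant summaries that such an export makes possible, the sketch below aggregates the three trials of Table 1. It reads the table as four columns (trial, response time in ms, target found, lostness); this reading and the `summarize` helper are assumptions for illustration, not part of EWEB.

```python
from statistics import mean

# Rows transcribed from Table 1: (trial, response time in ms, target found, lostness)
trials = [
    (0, 3656, 1, 0.25),
    (1, 60000, 1, 0.780625),
    (2, 60000, 0, 0.0),
]

def summarize(rows):
    """Aggregate accuracy and efficiency measures over one participant's trials."""
    found = [row for row in rows if row[2] == 1]
    return {
        "trials": len(rows),
        "targets_found": len(found),
        "percent_found": 100 * len(found) / len(rows),
        "mean_time_ms": mean(row[1] for row in rows),
        "mean_lostness": mean(row[3] for row in rows),
    }

summary = summarize(trials)
```

The same function could be applied per experimental condition by grouping the exported rows by website before calling it.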
3.1 Results
The results for one website were removed from the analysis because data collection failed for some users. Consequently, the accessibility of the remaining nine websites was calculated by means of [26] and compared to the usability metrics (search time, percentage of targets found, lostness and satisfaction) calculated automatically by EWEB. The results showed that the nine websites differed significantly in usability and accessibility and, more interestingly, that accessibility and usability metrics were not correlated and produced different website rankings (see [25] for a full description of these experimental results). Therefore, based on the results, it was concluded that technical web accessibility is not a good predictor of usability.
4 Conclusions and Future Work
The experimental study carried out by [25] illustrates that EWEB jointly automates the processes of experimental design, data recording and data analysis. The design module automatically generates and interprets a complex XML file containing the characteristics of the experiment, without requiring researchers to be experts in XML. The design module safeguards the requirements for manipulation and control in an experimental study. In addition, the use of the experimental design facilitates the identification and analysis of users’ data by storing them according to the patterns defined in the experiment. Since the experimental design and the number of experimental conditions are defined in the design module, results are displayed as a function of the experimental structure. Identifying users by experimental condition has an additional benefit: it avoids mistaking different users for a single user, or vice versa, as can happen when IP addresses are used for user identification. Another advantage is that EWEB automatically calculates metrics such as accuracy and users' lostness in the search task by comparing the task models provided by the researcher in the design module with user behaviour. This means that EWEB can analyze high-level user behaviour and not only isolated events. In short, EWEB provides great versatility while reducing the evaluators’ investment of
time and resources. From a technical point of view, the use of Java technology makes it possible to implement an easily portable, multiplatform tool that eliminates network latency when measuring users' response times. Future work must consider the inclusion of new tasks and metrics such as Efficiency rating (E), Confidence rating (C), St (index of linearity) and Cp (index of strategy-related complexity [22]). In addition, it would be worthwhile to improve the data display by introducing graphical representations of information to facilitate visual analysis by evaluators.
References
1. Shneiderman, B.: Designing the user interface: Strategies for effective human-computer interaction, 2nd edn. Addison-Wesley, Reading, MA (1992)
2. Salmerón, L., Cañas, J.J., Kintsch, W., Fajardo, I.: Are expert users always better searchers? Interaction of expertise and semantic grouping in hypertext search tasks. Behaviour and Information Technology 24(6), 471–475 (2005)
3. AccessWatch (n.d.). Retrieved February 2007 from http://www.accesswatch.com/
4. Analog (n.d.). Retrieved February 2007 from http://www.analog.cx/
5. WebTrends (n.d.). Retrieved February 2007 from http://www.webtrends.com/
6. Ellis, R.D., Jankowski, T.B., Jasper, J.E., Tharuvai, B.S.: Listener: a tool for client-side investigation of hypermedia navigation behavior. Behavior Research Methods, Instruments & Computers 30(6), 573–582 (1998)
7. Etgen, M., Cantor, J.: What does getting WET (Web Event-logging Tool) Mean for Web Usability? In: Proceedings of the 5th International Conference on Human Factors and the Web, Gaithersburg (1999), http://zing.ncsl.nist.gov/hfweb/proceedings/etgen-cantor/index.html
8. Scholtz, J., Laskowski, S.: Developing usability tools and techniques for designing and testing web sites. In: Proceedings of the fourth conference on Human factors and the web, Basking Ridge, NJ (1998). Available at http://zing.ncsl.nist.gov/WebTools/
9. Gonzalez, M.: ANTS: An Automatic Navigability Testing Tool for hypermedia. In: Proceedings of the Eurographics Multimedia’99 Workshop, Milan, Italy. Springer, Wien, Austria (2000)
10. Paganelli, L., Paternò, F.: Intelligent Analysis of User Interactions with Web Applications. In: Proceedings of ACM IUI 2002, San Francisco, CA, pp. 439–445 (2002)
11. HotJava (n.d.). Retrieved 10 February 2007 from http://java.sun.com/products/archive/hotjava/index.html
12. WebWindow (n.d.). Retrieved 10 February 2007 from http://www.javio.com/webwindow/webwindow.html
13. Edmonds, A.: Uzilla: A new tool for web usability testing. Behavior Research Methods, Instruments and Computers 35(2), 194–201 (2003)
14. WebSort (n.d.). Retrieved February 2007 from http://www.websort.net/
15. WebCAT (n.d.). Retrieved 8 February 2007 from http://zing.ncsl.nist.gov/WebTools/WebCAT/overview.html
16. UzCardsort (n.d.). Retrieved 8 February 2007 from http://uzilla.mozdev.org/cardsort.html
17. Harper, M.E., Jentsch, F.G., Berry, D., Lau, H.C., Bowers, C., Salas, E.: TPL–KATS-card sort: A tool for assessing structural knowledge. Behavior Research Methods, Instruments and Computers 35(4), 577–584 (2003)
18. Carmel, E., Crawford, S., Chen, H.: Browsing in hypertext: a cognitive study. IEEE Transactions on Systems, Man, and Cybernetics 22(5), 865–884 (1992)
19. Richter, T., Naumann, J., Noller, S.: LOGPAT: A semi-automatic way to analyze hypertext navigation behavior. Swiss Journal of Psychology 62(2), 113–120 (2003)
20. Brunstein, A., Naumann, A., Krems, J.F.: The Chemnitz LogAnalyzer: A Tool for Analyzing Data From Hypertext Navigation Research. Behavior Research Methods 37(2), 232–239 (2005)
21. Smith, P.A.: Towards a practical measure of hypertext usability. Interacting with Computers 8, 365–381 (1996)
22. McEneaney, J.E.: Graphical and numerical methods to assess navigation in hypertext. International Journal of Human Computer Studies 6(5), 761–786 (2001)
23. Cañas, J.J., Antolí, A., Barquier, P., Castillo, A., Fajardo, I., Gámez, P., Salmerón, L.: Representación mental de los conceptos, objetos y personas implicados en una tarea realizada en una interfaz. Inteligencia Artificial 16, 107–113 (2002)
24. Foltz, P.W., Kintsch, W., Landauer, T.K.: The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes 25, 285–307 (1998)
25. Arrue, M., Fajardo, I., López, J.M., Vigo, M.: Interdependence between technical web accessibility and usability: its influence on web quality models. International Journal of Web Engineering and Technology 3(3), 307–328 (2007)
26. Arrue, M., Vigo, M., Abascal, J.: Quantitative metrics for web accessibility evaluation. In: Lowe, D.G., Gaedke, M. (eds.) ICWE 2005. LNCS, vol. 3579, Springer, Heidelberg (2005)
Mixing Evaluation Methods for Assessing the Utility of an Interactive InfoVis Technique
Markus Rester1, Margit Pohl1, Sylvia Wiltner1, Klaus Hinum2, Silvia Miksch3, Christian Popow4, and Susanne Ohmann4
1 Institute of Design and Assessment of Technology, Vienna University of Technology, Austria
[email protected]
2 Institute of Software Technology and Interactive Systems, Vienna Univ. of Technology, Austria
3 Department of Information and Knowledge Engineering, Danube University of Krems, Austria
4 Department of Child and Adolescent Psychiatry, Medical University of Vienna, Austria
Abstract. We describe the results of an empirical study comparing an interactive Information Visualization (InfoVis) technique called Gravi++ (GRAVI), Exploratory Data Analysis (EDA) and Machine Learning (ML). The application domain is the psychotherapeutic treatment of anorectic young women. The three techniques are intended to support the therapists in finding the variables that influence success or failure in therapy. To evaluate the utility of the three techniques, we developed, on the one hand, a report system that helped subjects to formulate and document, in a self-directed manner, the insights they gained when using the three techniques. On the other hand, focus groups were held with the subjects. The combination of these very different evaluation methods prevents jumping to false conclusions and enables a comprehensive assessment of the tested techniques. The combined results indicate that the three techniques (EDA, ML, and GRAVI) are complementary and should therefore be used in conjunction.
Keywords: Information Visualization, Evaluation, Utility, Focus Groups, Insight Reports, Methodology.
1 Introduction
Several authors have pointed out the importance of evaluation studies of Information Visualization (InfoVis) techniques (see e.g. [1], [2], [3]). In the past few years usability studies concerning visualization techniques have become more frequent, and valuable information about the design of such systems has been gathered. Nevertheless, as [4] mentions, there is still too little systematic information about the specific strengths and weaknesses of the features of InfoVis techniques. Studies presenting data from practical experiences with InfoVis techniques can help to develop a more systematic framework to support the decision about which InfoVis technique to use in a given context.
Medical data is a very interesting application area for Information Visualization. One of the reasons for this is the complex and time-dependent character of these data. For such data, interesting InfoVis techniques have been developed in the past few years.
J. Jacko (Ed.): Human-Computer Interaction, Part I, HCII 2007, LNCS 4550, pp. 604–613, 2007. © Springer-Verlag Berlin Heidelberg 2007
In the following, we describe a study analyzing several different methods used to assess the therapeutic treatment of anorectic young women. During the therapy process, a large amount of highly complex data is collected. Statistical methods are not suitable for analyzing these data because of the small sample size, the high number of variables and the time-dependent character of the data. The data result from extensive questionnaires that the young women and their parents have to fill in several times before, during and after the therapy. These questionnaires cover topics such as the young women’s propensity for depression, their social behavior or their attitude towards eating. The therapists want to find patterns in the young women’s behavior and try to isolate the specific factors influencing success or failure in the therapy (predictors). InfoVis techniques might be a valuable way to represent these data, but in agreement with the therapists we also chose two other potential techniques (Machine Learning and Exploratory Data Analysis).
Until now, evaluation in Information Visualization has centered on two variables: time and error. This approach has been criticized recently [5]. For many applications, the measurement of time and errors is too narrow. Many visualization methods support extensive exploration processes and the formulation of hypotheses. For an exploration process, the measurement of time does not make sense, and in the context of the development of hypotheses, errors in a narrow sense do not occur. In an ill-structured domain with no clear-cut results, such as psychotherapy, other approaches are necessary. Therefore, the concept of insights was introduced to make the results of exploration processes based on InfoVis techniques more tangible [6]. Unfortunately, there is no agreed-upon definition of insight, although cognitive psychology has dealt with this topic quite extensively (see e.g. [7]). Most authors define ’insight’ in a quite pragmatic manner. In addition, there are no general frameworks for categorizing insights. [8] points out that a starting point might be to use user tasks such as finding clusters or extreme values. Some general cognitive activities often appear as insight categories, for example finding detailed, factual information, identifying clusters, generalizations, identifying changes over time, etc. [6,9]. We developed our own classification system, partly based on the generic categories described above and partly adapted to the specific task for which our visualization method was developed. Finding predictors plays an important part in the therapists' work; it is therefore a central category of our analysis. Developing a theoretical framework for the concept of insights and defining relevant categories of analysis will be an important area of future research.
2 Compared Techniques
An interactive InfoVis technique named Gravi++ (GRAVI) was developed to support the therapists and clinicians in exploring the multidimensional, abstract, and time-dependent data [10]. GRAVI is based on a spring metaphor. The questions from the questionnaires are positioned on a circle. The icons representing the anorectic young women are arranged within this circle depending on the strength of attraction of the questions. The questions function, to a certain extent, like magnets. The final position of a patient's icon is a combination of the forces of all the answers given to the questions (see Fig. 1). GRAVI uses animation to deal with the time-dependent data: the positions of the patients’ icons change over time, which allows the changing values to be analyzed and compared. Various visualization options are available, such as Star Glyphs and attraction rings to communicate the exact value of each answer, or traces to show the paths of the patients’ icons over all time steps.
Fig. 1. GRAVI: Interactive InfoVis-Tool for Exploration of Multi-Dimensional Time Dependent Data (Typical Screenshot). Concept of Spring-Based Positioning Leads to Formation of Clusters.
We decided to compare GRAVI with the following techniques used so far for analyzing the data: Exploratory Data Analysis (EDA) and Machine Learning (ML) algorithms. In the case of EDA, boxplots, histograms, scatterplots, and statistical measures were used (e.g., Fig. 2). The ML algorithms were a C4.5 decision tree (e.g., Fig. 3) and a Support Vector Machine (SVM) trained by Sequential Minimal Optimization (SMO).
Exploratory Data Analysis (EDA) was developed by Tukey [11] and is based on statistics. It helps users to review and analyze data on a descriptive level. Tukey thought that the emphasis on statistical testing might be too narrow an approach. He therefore suggested EDA as a way to formulate hypotheses and assess assumptions. Subjects were given printouts of these techniques.
Fig. 2. Exploratory Data Analysis (EDA) Sample: Boxplots
Machine Learning is an area of AI concerned with the development of algorithms that enable computers to ’learn’. A Machine Learning technique learns from observed examples or data. In general, there are two types of machine learning algorithms: supervised and unsupervised. In supervised learning, a priori knowledge about the data is used; in unsupervised learning, no prior information is given regarding the data or the output. We used two supervised schemes from WEKA [12]: a Support Vector Machine with the Sequential Minimal Optimization algorithm [13,14] and a pruned C4.5 decision tree [15]. The output of these two techniques was again available to the subjects as printouts.
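The spring metaphor behind GRAVI can be illustrated with a small positioning sketch. It assumes, purely for illustration, that each question attracts a patient's icon with a zero-length spring whose stiffness equals the answer value, so that the resting position is the weighted centroid of the question positions. The actual force model of GRAVI is described in [10] and may differ; all names below are hypothetical.

```python
import math

def question_positions(question_ids, radius=1.0):
    """Place the questions evenly on a circle around the origin."""
    n = len(question_ids)
    return {
        q: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, q in enumerate(question_ids)
    }

def icon_position(answers, positions):
    """Resting point of springs whose stiffness equals the answer value.

    answers: question id -> non-negative attraction strength
    """
    total = sum(answers.values())
    if total == 0:
        return (0.0, 0.0)  # no attraction: rest at the centre (assumed convention)
    x = sum(w * positions[q][0] for q, w in answers.items()) / total
    y = sum(w * positions[q][1] for q, w in answers.items()) / total
    return (x, y)

positions = question_positions(["q1", "q2", "q3", "q4"])
balanced = icon_position({"q1": 1, "q2": 1, "q3": 1, "q4": 1}, positions)
pulled = icon_position({"q1": 3, "q2": 1}, positions)
```

Under this assumption, patients with similar answer profiles come to rest near each other, which is one way to read the cluster formation mentioned in the caption of Fig. 1. Recomputing the positions for each time step would give the frames of an animation.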
3 Evaluation Methods
An extensive evaluation of InfoVis has to take place at different stages. Important areas of interest can be: usability evaluation, insight study, case study, and transferability assessment (see [16] for details). For the results of a usability evaluation of GRAVI, see [17]. The methods used in the insight study were insight reports [16] and focus groups (cf. [18]).
A sample of 32 subjects participated in the study. They were computer science students and can therefore be described as domain novices. They therefore received a comprehensive introduction to the domain (data, real users’ tasks, etc.) and introductions to the three techniques to be used. The evaluation with insight reports took place in a laboratory setting and lasted 155 minutes overall. Equal time was allotted to each of the three techniques (GRAVI, EDA, ML). Subjects were divided into three groups, which used the three techniques in different orders. Every technique was used once in first, second, and third place (MEG, EGM, GME).
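The three orders listed above (MEG, EGM, GME) are the cyclic rotations of a single sequence, which is the simplest way to place every technique once in each position. A minimal sketch in Python follows; the letter-to-technique mapping (M = ML, E = EDA, G = GRAVI) is inferred from the abbreviations, and the function names are illustrative.

```python
def rotation_orders(items):
    """Cyclic Latin square: row k is the sequence rotated left by k positions."""
    return [items[k:] + items[:k] for k in range(len(items))]

def is_counterbalanced(orders):
    """True if every item occurs exactly once in every position across the orders."""
    items = set(orders[0])
    return all(
        {order[position] for order in orders} == items
        for position in range(len(orders[0]))
    ) and len(orders) == len(items)

orders = rotation_orders("MEG")  # M = ML, E = EDA, G = GRAVI (inferred mapping)
balanced = is_counterbalanced(orders)
```

This design balances position effects only. It does not balance which technique precedes which, since each technique is always followed by the same successor.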
Fig. 3. Machine Learning (ML) Sample: C4.5 Decision Tree
The subjects used a report system to formulate and document their findings during the exploration process in a self-directed manner. Whenever an insight occurred, they had to generate a report with this system. The following data were collected: material used, description of the finding, and confidence rating. The insight reports were later classified according to the following categories: the complexity of each insight, the plausibility of an insight, and whether an assigned insight had been elaborated in more detail and, if so, whether this elaboration was sound or not valid (see Fig. 4).
Fig. 4. Insight Report Documented by Subject with Classification and Categorization Options for Investigators
Focus groups can give interesting insights into the users’ attitudes and experiences, although they do not provide representative results [18]. [19] reports that focus groups are especially valuable for evaluating InfoVis techniques as they are able to uncover unexpected problems that cannot be perceived by other research methods. In this sense, they can be an interesting complement to other, more systematic methods. Focus groups with the same subjects were therefore held a week after the laboratory session. They lasted about 100 minutes each. Eight questions were discussed (e.g., ease of use and utility, major strength and weakness, similarity and difference of insights gained with the different techniques, appropriateness of combined use). The value of this method is that it reveals subjective impressions on questions not asked before, and that it gives a different perspective on the data collected in the experiment as well as arguments for its interpretation.
The discussion guideline consisted of eight questions. The first four questions were discussed by the subjects separately for each of the three techniques used (GRAVI, EDA, ML). Afterward, four more questions concerning all three techniques were put to them:
1. Appropriateness of the allowed time.
2. Ease of use and usefulness of the technique for gaining insights.
3. Overall confidence in insights gained with the technique.
4. Major strength and weakness of the technique.
5. Similarity and difference of gained insights using different techniques.
6. Assumed comprehension rates of the complex matter with each technique.
7. Appropriateness of combined use of the three techniques.
8. Order for best possible comprehension of the data.
4 Results
4.1 Insight Reports
The 32 subjects documented a total of 876 reports. In the classification process we defined 805 different insights, which were assigned 2166 times to the reports. A statistical analysis of the data collected in this experiment was carried out. The detailed results are currently under review and will be published in the near future.
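The counts reported above imply a few simple ratios that help in reading the scale of the data set. The short calculation below derives them from the stated figures only; the variable names are illustrative, and the ratios are averages that say nothing about how reports were distributed across subjects or techniques.

```python
subjects = 32
reports = 876
distinct_insights = 805
assignments = 2166

reports_per_subject = reports / subjects                    # average reports per subject
assignments_per_report = assignments / reports              # average insights assigned per report
assignments_per_insight = assignments / distinct_insights   # average reuse of each insight
```

On average, each subject wrote about 27 reports, each report was assigned roughly 2.5 insights, and each distinct insight was assigned about 2.7 times.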
To sum up, the results could lead to the conclusion that ML is not a recommendable technique. The subjects’ confidence ratings were low, the complexity of the gained insights was low, and few predictors were found. GRAVI, on the other hand, performed very well concerning insights with high domain value (finding predictors). Confidence ratings were also generally high. EDA lies somewhere in between. Histograms and scatterplots are well known, but the interpretation of boxplots and statistical measures requires some familiarity with these techniques. EDA seems especially suited (or, more precisely, it was used in particular) to analyze single values of individual patients at specific time steps. This may also be the reason why there were fewer wrong arguments with EDA. In contrast, there were many wrong arguments with ML.
4.2 Focus Groups
Appropriateness of the Allowed Time. Concerning ML, the subjects’ statements clearly show a connection to the position of ML in the order of techniques used: when ML was the first technique, all of the subjects stated that there was too little time for the tasks. If ML followed GRAVI, the allowed time was rated appropriate. Using ML last led to the assessment that there was too much time left. Many explanatory statements were along the following lines: subjects are not familiar with ML; ML is not a suitable technique to start with unless one is already a domain expert; ML is complex and confusing; and there were no new insights that had not already been gained with the other two techniques. In general, the time allowed for EDA was predominantly rated appropriate. Once more, only when EDA was used as the first technique would the subjects have needed more time for the tasks. The subjects pointed out their familiarity with EDA. Only the statistical measures from EDA were criticized as difficult to interpret.
The ratings for GRAVI are similar to those for ML, though not as pronounced, and follow the position of GRAVI: when GRAVI was used as the first technique, subjects would have needed more time to become familiar with both the technique and the domain. For GRAVI in second position, there was a trend towards too much time being available for the tasks. When it was used as the last technique, subjects rated the allowed time appropriate.
Ease of Use and Usefulness of the Technique for Gaining Insights. ML had the lowest scores regarding usefulness for gaining insights. 55% of all statements made by the subjects belong to the lowest category on this scale. Once again, unfamiliarity with ML and its complexity led to a high level of uncertainty. The assessment of EDA was twofold: scatterplots and histograms scored very well, whereas boxplots and statistical measures were not rated as useful. The former were favored for their simplicity and for being visualizations. The latter were criticized for being complicated and, in the case of statistical measures, additionally for not being a visualization. For GRAVI, too, the subjects appreciated some elements and disapproved of others. The interactivity of this technique in general and its powerful capability to handle the time-dependent data in particular were rated as very useful. Various visual details, such as the poor visibility of missing data, were mentioned as hindering usefulness.
Overall Confidence in Insights Gained with the Technique. ML received an even worse assessment in the focus groups than in the ratings given in the lab setting: 65.6% of the statements rated ML in the category “low confidence”. This high ratio is most likely due to peer pressure in one of the three groups, where all 12 participants unanimously rated ML as low confidence. Interestingly, EDA scored better than GRAVI in the focus groups. One possible explanation is that EDA received many high ratings in the focus group of those who used EDA last. So, on the one hand, there was probably a learning effect leading to more domain expertise, which also affects confidence in observations. On the other hand, EDA was the only technique the subjects were reasonably familiar with. It is therefore all the more noteworthy that GRAVI received only a few more ratings in the “low confidence” category than EDA.
Major Strength and Weakness of the Technique. Although the subjects could not make much use of ML, they believed that, for ML experts, this technique allows for very concise and valid insights. There was strong appreciation of the automaticity of the calculations and a high level of faith in the correctness of the results. The latter was reinforced by the confusion matrix, which was often mentioned positively and which is a self-evaluation of correctness provided by the ML algorithms. The visualization of the decision tree was rated a plus, whereas the formula of the SMO was described as confusing. The strengths of EDA that were mentioned were: visual elements (scatterplots, histograms), simplicity, familiarity, and clarity of the displayed data. The lack of interactivity, the impossibility of comparing patients and/or groups of patients, and problems with the exploration of time dependency are the downsides of EDA. GRAVI impressed with its interactivity, its many options for visualizing data in different ways, its handling of time-dependent data, its simplicity, and its intuitive interface.
Subjects saw the major weakness of GRAVI in the fact that visualizing a lot of data rapidly leads to cluttered displays. It is also important to check and re-check possible insights with different constellations; otherwise, false conclusions could easily be drawn.
Similarity and Difference of Gained Insights Using Different Techniques. The majority of subjects reported that they found the same insights with the three different techniques. Almost two thirds of the statements fell into this category. Nevertheless, the level of detail of the insights varied.
Assumed Comprehension Rates of the Complex Matter with Each Technique. ML proved to contribute very little to the comprehension of the provided data. This is clearly in accordance with the subjects' earlier statements. EDA and GRAVI, on the other hand, could be used well by the subjects.
Appropriateness of Combined Use of the Three Techniques. 45% of the statements held that the combined use of ML, EDA, and GRAVI makes perfect sense because all three techniques offer different views on the data and therefore facilitate a deeper understanding and extensive exploration. Another 45% of the statements argued for omitting ML because of its marginal contribution to comprehension of the data for subjects who were not familiar with this complex technique.
Order for Best Possible Comprehension of the Data. There were almost as many preferred orders for using the three techniques as there were subjects. But there were also some major similarities in the statements: ML is not suitable as the first technique but is more useful for rechecking insights gained with other techniques. GRAVI and parts of EDA (the simple visual parts: histograms and scatterplots) are viable techniques for a first exploration of the data. Another interesting outcome of the discussion was that the different techniques should not be used sequentially, as in the laboratory setting, but simultaneously. The different views they provide on the data, mentioned above, could add much more value in this way.
5 Conclusion
The use of diverse evaluation methods enables different views on the technology under investigation. Whereas insight reports can reveal strengths and weaknesses in the form of summative tests followed by statistical analysis, focus groups often give reasons and additional subjective opinions of subjects and therefore also help to ensure a correct interpretation of the former.
The outcome of the insight reports could lead to the conclusion that ML is not a recommendable technique because of low confidence ratings, the low complexity of the gained insights, and the small number of predictors found. GRAVI, on the other hand, performed very well. There were many insights with high domain value (predictors) and with high confidence ratings. EDA seems especially suited to analyzing single values of individual patients at specific time steps.
The outcome of the focus groups shows that GRAVI is useful for gaining insights with a high confidence rating because of its flexibility through interactivity, the ability to explore more dimensions simultaneously, and the straightforward navigation within the time-dependent data. Moreover, subjects rated GRAVI an appropriate visualization tool. ML should be omitted unless there is enough expertise with this technique. If there is, it can still be, and probably will be, a powerful technique for gaining insight. EDA rapidly leads to insights (although rather basic ones) owing to the general familiarity with this technique.
Combining these results, we see that all three techniques offer different views on the data, and that a combined use will therefore likely lead to more insight and comprehension.
Acknowledgments. The project “Interactive Information Visualization: Exploring and Supporting Human Reasoning Processes” is financed by the Vienna Science and Technology Fund [Grant WWTF CI038]. Thanks to Bernhard Meyer for the collaboration in the classification process.
References
1. Chen, C.: Empirical evaluation of information visualizations: an introduction. Int. J. Human-Computer Studies 53(5), 631–635 (2000)
2. Plaisant, C.: The challenge of information visualization evaluation. In: Costabile, M.F. (ed.) Proceedings of the working conference on Advanced visual interfaces, pp. 109–116. ACM Press, New York (2004)
3. Tory, M., Möller, T.: Human factors in visualization research. Visualization and Computer Graphics, IEEE Transactions on 10(1), 72–84 (2004)
4. Spence, R.: Information Visualization. ACM Press, New York (2001)
5. Stasko, J.: Evaluating information visualizations: Issues and opportunities (position statement). In: Bertini, E., Plaisant, C., Santucci, G. (eds.): Beyond time and errors: novel evaLuation methods for Information Visualization – Proceedings of BELIV’06, Venice, Italy, pp. 5–7 (2006)
6. Saraiya, P., North, C., Duca, K.: An insight-based methodology for evaluating bioinformatics visualizations. Visualization and Computer Graphics, IEEE Transactions on 11(4), 443–456 (2005)
7. Eysenck, M.W., Keane, M.T.: Cognitive Psychology. A Student’s Handbook. Psychology Press, Taylor and Francis Group, London, New York (2005)
8. North, C.: Toward measuring visualization insight. Computer Graphics and Applications, IEEE 26(3), 6–9 (2006)
9. Lanzenberger, M.: The Interactive Stardinates – An Information Visualization Technique Applied in a Multiple View System. PhD thesis, Vienna University of Technology, Vienna, Austria (September 2003)
10. Hinum, K., Miksch, S., Aigner, W., Ohmann, S., Popow, C., Pohl, M., Rester, M.: Gravi++: Interactive information visualization to explore highly structured temporal data. Journal of Universal Comp. Science 11(11), 1792–1805 (2005)
11. Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, Mass (1998)
12. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA (2005)
13. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schoelkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–210. MIT Press, Cambridge (1998)
14. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO Algorithm for SVM Classifier Design. Neural Computing 13(3), 637–649 (2001)
15. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, CA (1993)
16. Rester, M., Pohl, M., Hinum, K., Miksch, S., Popow, C., Ohmann, S., Banovic, S.: Methods for the evaluation of an interactive infovis tool supporting exploratory reasoning processes. In: BELIV ’06: Proceedings of the 2006 AVI workshop on Beyond time and errors, New York, NY, pp. 32–37. ACM Press, New York (2006)
17. Rester, M., Pohl, M., Hinum, K., Miksch, S., Ohmann, S., Popow, C., Banovic, S.: Assessing the usability of an interactive information visualization method as the first step of a sustainable evaluation. In: Proc. Empowering Software Quality: How can Usability Engineering reach these goals?, Austrian Computer Society, pp. 31–44 (2005)
18. Kuniavsky, M.: User Experience: A Practitioner’s Guide for User Research. Morgan Kaufmann, San Francisco (2003)
19. Mazza, R.: Evaluating information visualization applications with focus groups: the coursevis experience. In: BELIV ’06: Proceedings of the 2006 AVI workshop on BEyond time and errors, New York, NY, USA, pp. 1–6. ACM Press, New York (2006)
Serial Hanging Out: Rapid Ethnographic Needs Assessment in Rural Settings

Jaspal S. Sandhu¹,∗, P. Altankhuyag², and D. Amarsaikhan³

¹ College of Engineering, University of California, Berkeley, USA
[email protected]
² Asian Development Bank, Ministry of Health, Mongolia
³ Postgraduate Institute, Health Sciences University of Mongolia
Abstract. This paper presents an ethnographic method for assessing user needs in designing for rural settings. “Serial Hanging Out” consists of short-term participant observation with multiple, independent informants. The method is characterized by: (1) its short-term nature, (2) the use of participant observation supported by specific field techniques, and (3) the emphasis on user needs for design. It is discussed in relation to similar methodological work in associated fields. To ground the discussion, the method is presented in the context of ongoing work to develop improved information systems to support rural health workers in Mongolia.

Keywords: participant observation, ethnography, design, qualitative methods, user needs, rural, Mongolia.
1 Introduction
Rapid ethnographic methods play a critical role in human-centered design. They have been applied extensively not only in human-computer interaction (HCI) and computer-supported cooperative work (CSCW) [1][2][3], but also in product design [4][5], and marketing and consumer research [6][7]. The common motivation for the use of these methods across various disciplines is that they provide a much richer understanding of people, in context and from their own perspective. Blomberg et al. succinctly describe the key dimensions of ethnographic research: (1) it takes place in natural settings, (2) it is holistic, i.e. understanding is framed in systems larger than the immediate context, (3) it is descriptive, and (4) it strives to consider the member’s own perspective [8]. Rapid methods are necessary because projects operate under severe time and budgetary constraints; however, holism and the member’s perspective are often sacrificed in order to operate within these constraints [9]. While ethnographic methods have been used for requirements generation [10], the emphasis here is on design innovation in unfamiliar environments, specifically rural communities in the context of international development. The focus is not only on how technology can
be designed, but also on whether technology makes sense in the first place. In either case, the objective is to design holistic and systemic interventions that target real development problems. Such design innovation requires a deep understanding of target users, which is challenging to obtain using rapid methods. Dourish criticizes ethnographic practice in HCI on the basis that it tends to neglect the interpretive nature of ethnography [9]. Achieving a deep understanding and preserving the interpretive nature of ethnographic research are typically associated with long-term fieldwork, but it is unrealistic to propose long-term fieldwork in most applied design settings. The presented work addresses this issue through longer engagements with individuals than are typical in applied design settings, and by operating from an interpretive perspective, namely one that sits “between two worlds or systems of meaning – the world of the ethnographer … and the world of cultural members” [11]. In HCI, Rapid Ethnography [12] and Quick and Dirty Ethnography [3] have been presented, but these methods have not been sufficiently evaluated in practice [13]. While these forms of ethnographic research focus on a place – offices and air traffic control rooms, respectively – the method proposed here is focused on the individual-as-anchor; this makes more sense in many rural applications and is a major reason for framing this method as one for rural applications. Rapid Ethnography and Quick and Dirty Ethnography share an emphasis on focused studies. Such a premature focus may blind researchers to critical information in relatively unfamiliar, rural settings. Although most ethnographic work is cross-cultural to some degree, work in rural communities for international development represents an extreme because those involved in conducting the research often come from very different cultural backgrounds, even if they are from in-country. 
The settings of international development are necessarily more diverse than business settings, where “through years of experience, trained ethnographers build up a great deal of knowledge … about those segments of the population who are reliably of interest to business” [14]. Given the relative unfamiliarity of the context to those conducting the research, informal methods [15] are required, as is the need to “approach social life with a wide-angle lens” [16]. Prior methodological work in HCI has not focused on rural settings, so there is an opportunity for HCI to contribute to related work in the international development community [17][18]; however, the proposed methodology – although related to HCI and innovation – is more concerned with meeting specific development objectives than it is with achieving novel technology gains.
2 The Method
The proposed method for understanding user needs in the context of designing information systems for rural communities is Serial Hanging Out (SHO): sequential, short-term (2-4 days) participant observation with multiple, independent informants. The participant observation techniques are more sophisticated than the phrase
“hanging out” suggests.¹ Still, the metaphor of “hanging out” captures the essence of the work: participant observation is “a way to collect data in naturalistic settings … [to] observe and/or take part in the common and uncommon activities of the people being studied” [19]. SHO is related to Sanjek’s [20] network method in urban ethnography. His “network-serials” consisted of intensive interviewing of informants in order “to describe the behavior and purposes of the members … and to chart the range of interaction settings.” A key similarity between Sanjek’s network-serials and SHO is that they focus on informants as “anchors”, providing access to different activities, interactions, relationships, and actors. Despite the similarities, there are several key differences. First, the core field method in SHO is short-term participant observation rather than interviewing. Sanjek’s suggestion of “direct behavioral recording” as a plausible alternative to “intensive interviewing” highlights another important difference related to the respective goals of the two methods. Network-serials are intended to chronicle behavior, but SHO uses the concept of serial interactions and social networks to explore the interactions themselves. What is of interest in SHO is not simply that interactions occur at a particular time and place with particular actors, but the quality, content, and meaning of those human interactions. Moreover, SHO is not limited to interpreting interactions and movements – in particular, other elements of understanding arise from contextual, ethnographic interviews. In practical terms, SHO uses multiple teams of 1-2 researchers each in order to conduct a study. The need for parallel teams is driven by the significant time investment in each participant: not only the time with the informant, but also the transport time (often significant in rural areas, especially with disparate informants) and the time spent synthesizing field data. 
As Millen states, international research may require multiple researchers on a single team in order to “help with language and local cultural issues” [12]. With respect to parallel teams, he notes that multiple researchers can observe different groups. Researchers will always have some influence on the situations in which they are involved. This is understood and is in fact an integral part of interpretive research; however, the influence of more than two researchers is so disruptive that two is the maximum number of researchers to be involved with a single informant. SHO avoids Geertz’s criticism of "hit-and-run" ethnography [21] by concentrating on specific informants, methods, and themes of inquiry. Further, the focus on informants-as-anchors, the longer engagement, and the emphases on holism and the member’s point of view distinguish SHO from contextual inquiry [22].
¹ The phrase is adopted from Geertz [21], who cites Clifford as originating the phrase “deep hanging out”. Clifford’s tone is intended as an affront to traditional ethnography, but Geertz picks up the phrase, dusts it off, and wears it proudly.

3 Context
This paper presents a theoretical argument for SHO that informs the design of an applied research study in rural Mongolia. While the method draws from the primary
author’s past experience with design research in rural settings [23][24], it is being formally implemented and evaluated in the context of this Mongolia study. Mongolia is unique among developing countries in that one-quarter of the overall population is nomadic or semi-nomadic, and rural areas are extremely sparsely populated [25]. By many standards, the Mongolian health infrastructure is highly developed; however, facilities and human resources are increasingly limited in providing effective healthcare for rural populations that lie beyond aimag (provincial) capitals. Bagiin baga emch, rural health workers, provide services at the bag (smallest administrative unit) level by traveling from household to household by motorcycle or horse; however, there are significant unmet needs in continuing training of these paraprofessionals [26] and in providing support for their work practice. This research focuses on understanding the lives of bagiin baga emch in order to design improved information systems to support their work. The primary field research is being undertaken in partnership with an Asian Development Bank (ADB) project: “Information and Communication Technology for Improving Rural Health Services in Mongolia” (JFICT 9053-MON). The objective of this project is to improve the health of vulnerable, rural populations – especially mothers and young children – by using ICT (information and communication technology) tools to support health services delivery. Part of this project involves providing PDAs (personal digital assistants, or handheld computers) to bagiin baga emch, primarily in order to support data collection. PDA deployment and associated training began in spring 2007. In addition, this work will contribute to an International Development Research Centre (IDRC) pilot project that will provide PDAs to rural health workers in Nepal (via HealthNet Nepal) and Mongolia (via the Health Sciences University of Mongolia). 
The intention of providing PDAs is to support continuing education and decision-making at the point of care.
4 Sampling
Sampling issues are fundamental to ethnographic research, particularly in rural settings where potential users of systems may be geographically and culturally dispersed. In SHO, maximum variation sampling [27] is preferred because it provides maximal coverage of perspectives, behaviors, practices, interactions, and activities. Maximum variation sampling is the purposeful selection of informants representing a broad range on key dimensions. In the Mongolia case, such dimensions include geographic zone (e.g. arid steppe, forest steppe, mountain, desert), transport (motorcycle, horse, camel), and work experience. Bagiin baga emch are a formal part of the health care system, so the researchers have access both to the entire sampling frame and to key intermediaries at the aimag (provincial) and soum (county) levels. In other settings, access to informants, or information about them, may be more difficult to obtain. In such cases, other sampling strategies [27] may be more suitable. The goal of this method is as much to define intracultural variation [28] in apparently homogeneous groups as it is to define cultural patterns. By understanding variation in
meanings and practices within our sample of bagiin baga emch, we can develop strategies that have more universal appeal and can even take advantage of knowledge owned by a subset of the population. It can be argued in this regard that von Hippel’s work with lead users is primarily motivated by intracultural variation [29].
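The selection step described in this section can be illustrated with a short sketch. It is not the authors' procedure: it assumes the sampling frame is available as simple records, counts differences on the key dimensions, and picks informants greedily so that each new informant differs as much as possible from those already chosen. The function names, the record fields, and the example records are all hypothetical.

```python
def differences(a, b, dims):
    """Count the key dimensions on which two informant records differ."""
    return sum(a[d] != b[d] for d in dims)


def max_variation_sample(frame, dims, k):
    """Greedily pick up to k informants that differ from each other as much as possible.

    frame: list of dicts, one per informant in the sampling frame
    dims:  names of the key dimensions to vary
    k:     number of informants to select
    """
    if k <= 0 or not frame:
        return []
    chosen = [frame[0]]          # arbitrary seed: the first record in the frame
    remaining = list(frame[1:])
    while remaining and len(chosen) < k:
        # Take the candidate whose closest already-chosen informant is farthest away.
        best = max(remaining,
                   key=lambda c: min(differences(c, s, dims) for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical sampling frame; these records are invented for illustration only.
FRAME = [
    {"id": "A", "zone": "desert", "transport": "camel", "experience": "senior"},
    {"id": "B", "zone": "desert", "transport": "camel", "experience": "junior"},
    {"id": "C", "zone": "mountain", "transport": "horse", "experience": "junior"},
    {"id": "D", "zone": "forest steppe", "transport": "motorcycle", "experience": "senior"},
    {"id": "E", "zone": "arid steppe", "transport": "motorcycle", "experience": "junior"},
]
DIMS = ("zone", "transport", "experience")

if __name__ == "__main__":
    for informant in max_variation_sample(FRAME, DIMS, 3):
        print(informant["id"], [informant[d] for d in DIMS])
```

A greedy rule like this is only one way to operationalize "a broad range on key dimensions"; in practice the choice would also be shaped by access, travel, and advice from intermediaries at the aimag and soum levels.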
5 In the Field
Regarding specific field techniques, pens and paper notebooks will be used for field notes since, from prior experience, researchers will be highly mobile. The recommendations of Emerson et al. [30] will be followed closely for developing field jottings. In addition to their recommendations, sketching and diagramming will be used as mnemonic devices. Opportunistic digital audio recording will be used to capture unstructured interviews. Digital photographs and short digital video clips will be recorded on a limited basis to supplement observations and interviews. The primary output of an encounter with an informant will be a narrative – blending realist and impressionist styles [11] – supported by annotated photographs. The audio and video will be used primarily to support the construction of these narratives, but will also remain available for secondary analysis in later phases of the design process. The narratives will be written immediately after leaving the field in order to support maximal recall. In the Mongolia work, this will typically mean writing the narratives at soum health centers, within 24-48 hours of leaving the informant. To this point, SHO has been presented as a cleanly operationalized process; however, as with other ethnographic enterprises, this is simply not the case. Significant field preparation is required and has been undertaken with the Mongolia project. Review of previous research with bagiin baga emch and other rural health workers was an initial step in beginning to understand the culture and work conditions of bagiin baga emch. This is an imperative stage in the process although the literature may sometimes be non-existent [3] or misleading [15]. For the lone foreign researcher on this team, language preparation was critical (as it was for Sanjek’s work in Accra, Ghana [20]). 
The essential nature of language – and some basic cultural understanding – is why it is important for this work to be done by in-country researchers, or at the very least, in close collaboration with them. Other field preparation has included key informant interviews, observation of continuing training in aimag capitals, and pilot testing of the field protocol.
6 Time
There are many factors to consider in the design of rapid ethnographic research, but time-in-the-field is often a primary concern (Table 1). Although Table 1 seems to indicate significant differences in time of engagement, the differences are in fact less dramatic, as the cumulative sum of fieldwork days is greater for those methods that involve multiple sites or informants (Rapid Ethnography, Serial Hanging Out).
Table 1. Sample time recommendations for different rapid ethnographic methods

Author             Method                       Time of Engagement                                 Unit of Analysis
Millen [12]        Rapid Ethnography            1 day or less                                      Per site
Hughes et al. [3]  Quick and Dirty Ethnography  4 weeks each study, multiple studies over 3 years  Single site
Beebe [18]         Rapid Assessment Process     4-40 days                                          Single site
Handwerker [31]    Quick Ethnography            3-90 days                                          Single site
Sandhu et al.      Serial Hanging Out           2-4 days                                           Per informant
In the business world, “ethnographies (read: participant observation) can last a half a day or even less. How is this possible? Ethnographers working in business are generally PhDs and typically manage this seemingly impossible feat by applying their methodological skill and accrued knowledge of theories of human behavior and social interaction” [14]. There are two problems in applying such logic to international development: (1) such familiarity does not exist for rural international development, even for trained people who are from in-country, and (2) such an attitude shifts the power from informant to ethnographer – the informant becomes a subject, rather than participant, in the research. In Mongolia, the selection of 2-4 days per informant is motivated by the nature of bagiin baga emch activities – some activities take much longer than a half day, such as monthly visits to households (2-3 days), visits to soum health centers (1-3 days), and summons (duudlaga) to a patient’s home (half to full day). In addition, the strategy is to maximize the benefit gained for the relatively high time and monetary investment in rural travel.
7 Multiple Researchers and External Reliability
Issues of external reliability² are a primary concern in SHO given the use of multiple, parallel researchers. LeCompte and Goetz indicate that ethnographers “enhance the external reliability of their data by recognizing and handling five major problems” [32]. These five problems and the mechanisms for managing them in SHO are presented below:
1. Researcher status: Mobility along the participant-observer spectrum is limited by selecting similar informants (all bagiin baga emch), by selecting a research team with comparable abilities to one another, and by making explicit the desired status among the research team.
2. Informant choices: The informants are similar along several dimensions since they represent a single class of users (bagiin baga emch).
3. Social context: An engagement of 2-4 days will ensure access to multiple social settings and actors.
4. Analytic constructs: A single field protocol will be used by all field researchers.
5. Data collection/analysis: The field protocol and periodic team meetings will be used to manage data collection and preliminary analysis in the field, while a structured, team-based process will be used during the data analysis phase.

² “External reliability addresses the issue of whether independent researchers would discover the same phenomena or generate the same constructs in the same or similar settings” [32].
8 Data Analysis
Following the model of Griffin and Hauser [33], the narratives resulting from the SHO will be analyzed by teams of researchers and students, in this case students from the Health Sciences University of Mongolia. Part of the motivation for doing so is to evaluate the effectiveness of this method in uncovering user needs. User needs are one, but by no means the only, way to bridge ethnography and design. Urban and Hauser define needs as “statements in the words of the customer that describe the benefits they need, want, or expect to get from a product” [34]. SHO extends this definition in three ways. First, while statements are important, other aspects of ethnographic work are also included. Second, products include services, not just artifacts or technological systems. Third, in international development more than business development, such products or services as are being developed may not exist or may exist in radically different forms, so wants and expectations may be difficult to obtain. The needs analysis is a team-based process of identifying both explicit (stated) and implicit (latent or unarticulated) needs from the narratives. Upon completion of the needs analysis, all needs will be merged and redundant needs will be removed. The team will then use affinity diagramming³ to create a hierarchy of needs. Such a hierarchy makes the process of translating user needs into novel design concepts more tractable. Finally, the user needs will be tied to the context from which they came to preserve the richness of the design research. The resulting needs and associated data will be used to generate novel design concepts. The concepts will then be prototyped and tested with users, whether or not the prototyped systems are “technological”.
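The bookkeeping behind the merge step (pooling needs, removing redundant ones, and keeping each need tied to the narratives it came from) can be sketched as follows. This is only an illustration under strong assumptions: redundancy is judged here by normalized wording, whereas the analysis described above relies on team judgment, and the record fields and example statements are hypothetical.

```python
def normalize(statement):
    """Reduce a need statement to a comparison key (case, spacing, final period)."""
    return " ".join(statement.lower().split()).rstrip(".")


def merge_needs(needs):
    """Merge need records, dropping redundant statements but keeping provenance.

    needs: list of dicts with keys "statement", "kind" ("explicit" or "implicit"),
           and "source" (identifier of the narrative the need came from)
    """
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for need in needs:
        entry = merged.setdefault(
            normalize(need["statement"]),
            {"statement": need["statement"], "kinds": set(), "sources": []},
        )
        entry["kinds"].add(need["kind"])
        if need["source"] not in entry["sources"]:
            entry["sources"].append(need["source"])
    return list(merged.values())


# Hypothetical needs extracted by two analysts; invented for illustration only.
EXAMPLE_NEEDS = [
    {"statement": "Needs guidance at the point of care.",
     "kind": "explicit", "source": "narrative-01"},
    {"statement": "needs  guidance at the point of care",
     "kind": "implicit", "source": "narrative-02"},
    {"statement": "Needs a record that survives travel by horse.",
     "kind": "implicit", "source": "narrative-02"},
]

if __name__ == "__main__":
    for need in merge_needs(EXAMPLE_NEEDS):
        print(need["statement"], sorted(need["kinds"]), need["sources"])
```

Keeping the list of source narratives on every merged need is what allows each need to be traced back to its context after the hierarchy has been built.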
³ Affinity diagramming is a team-based process of grouping ideas, in this case user needs, based on the ideas themselves, rather than external categories. It is also known as the KJ method after Japanese ethnologist Kawakita Jiro, the inventor of the method.

9 Summary and Assessment
Serial Hanging Out (SHO) is a rapid, ethnographic method for uncovering user needs in rural settings. This method is particularly well-suited to the study of a geographically dispersed class of users [20], as may be found in rural institutions (in the present case, Mongolia’s rural health care system). Moreover, by selecting informants as anchors, and by experiencing interactions over a multi-day period, a rich sample of interactions and activities can be included in the development of ethnographic narratives. Although the method draws from Sanjek’s network-serial method, SHO is less concerned with behavioral mapping than it is with using spatial movements and interactions as a scaffolding for ethnographic inquiry. Also, as
opposed to network-serials, SHO emphasizes emic rather than etic⁴ perspectives, and operates in an interpretive frame of reference. Although there are some similarities to other rapid methods in HCI [3][12], this work is different in that it emphasizes the use of multi-day participant observation and that it has a less specific focus at the outset. In any case, none of these methods has been substantially evaluated in practice [13]. By formally implementing this method in the context of applied research in the rural Mongolian health sector, this method can be evaluated in situ, providing evidence as to its usefulness and to the key elements of study design. The evaluative component of this research will address the efficacy, efficiency, and quality of the methods. Details of the evaluation and procedures for evaluation will be presented in future publications. Although this research will have relevance to the design of information systems, it is also expected to have utility beyond design [9], in both applied and theoretical senses. The ethnographic results should provide a deeper understanding of bagiin baga emch for those developing health strategy for the bag/soum level in Mongolia and should also provide a unique view into the work culture of rural health professionals in a particular place and time. This plurality is a main driver of this research.

Acknowledgements. Thanks to those who have partnered in, or supported, the prior field research which serves as the basis for the concepts in this paper: Jonathan Hey, Catherine Newman, Alice M. Agogino, Teresa DeAnda, Jessica Granderson, Expedita Ramirez, and Kirk R. Smith. Countless discussions with colleagues, mentors, and friends (and a few non-academic strangers) have been instrumental in the development of these ideas. Special thanks on this front to Judd Antin, Michael Barry, Sara Beckman, Griff Coleman, Peter Lyman, and AnnaLee Saxenian. 
Preliminary fieldwork in Mongolia was supported by a Foreign Language Area Studies grant. The current research is funded by a Fulbright Fellowship and an NSEP Boren Fellowship. Mahad Ibrahim and Andrei Marin provided invaluable feedback on early drafts of this article. Finally, none of this would be possible without the cooperation of past and current research participants, who have invited us into their homes and daily lives. To them we are most indebted.
⁴ Emic refers to terms or concepts meaningful to the cultural member, while etic refers to terms or concepts meaningful to the external researcher.

References
1. Gilmore, D.: Business: Understanding and Overcoming Resistance to Ethnographic Design Research. Interactions 9(3) (May 2002)
2. Wasson, C.: Ethnography in the Field of Design. Human Organization 59(4), 377–388 (2000)
3. Hughes, J., Rodden, T., King, V., Anderson, H.: The Role of Ethnography in Interactive Systems Design. ACM Interactions 2(2), 56–65 (1995)
4. Rosenthal, S.R., Capper, M.: Ethnographies in the Front End: Designing for Enhanced Customer Experiences. Journal of Product Innovation Management 23, 215–237 (2006)
5. Squires, S., Byrne, B. (eds.): Creating Breakthrough Ideas: The Collaboration of Anthropologists and Designers in the Product Development Industry. Bergin and Garvey, Westport, Connecticut (2002)
6. Mariampolski, H.: Ethnography for Marketers: A Guide to Consumer Immersion. Sage, Thousand Oaks, California (2006)
7. Arnould, E.J., Wallendorf, M.: Market-Oriented Ethnography: Interpretation Building and Marketing Strategy Formulation. Journal of Marketing Research 31(4), 484–504 (1994)
8. Blomberg, J., Burrell, M., Guest, G.: An Ethnographic Approach To Design. In: Jacko, J.A., Sears, A. (eds.) The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, pp. 964–986. Lawrence Erlbaum Associates, Mahwah, New Jersey (2003)
9. Dourish, P.: Implications for Design. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montréal, Québec, Canada, pp. 541–550 (2006)
10. Sommerville, I., Rodden, T., Sawyer, P., Bentley, R., Twidale, M.: Integrating Ethnography into the Requirements Engineering Process. In: Proceedings of IEEE International Symposium on Requirements Engineering, San Diego, California, pp. 165–173 (1993)
11. Van Maanen, J.: Tales of the Field. University of Chicago Press, Chicago (1988)
12. Millen, D.R.: Rapid Ethnography: Time Deepening Strategies for HCI Field Research. In: Proceedings of the Conference on Designing Interactive Systems, New York City, New York, pp. 280–286 (2000)
13. Kujala, S.: User Involvement: A Review of the Benefits and Challenges. Behaviour and Information Technology 22(1), 1–16 (2003)
14. Plowman, T.: Ethnography and Critical Design Practice. In: Laurel, B. (ed.) Design Research: Methods and Perspectives, pp. 30–38. MIT Press, Cambridge, Massachusetts (2003)
15. Agar, M.: The Professional Stranger: An Informal Introduction to Ethnography. Academic Press, New York (1980)
16. Spradley, J.P.: Participant Observation. Holt, Rinehart and Winston, New York (1980)
17. Chambers, R.: The Origins and Practice of Participatory Rural Appraisal. World Development 22(7), 953–969 (1994)
18. Beebe, J.: Rapid Assessment Process. Altamira Press, Walnut Creek, California (2001)
19. Dewalt, K.M., Dewalt, B.R.: Participant Observation. Altamira Press, Walnut Creek, California (2002)
20. Sanjek, R.: A Network Method and Its Uses in Urban Ethnography. Human Organization 37(3), 257–268 (1978)
21. Geertz, C.: Deep Hanging Out. The New York Review of Books 45(16) (October 1998)
22. Beyer, H., Holtzblatt, K.: Contextual Design: Defining Customer-Centered Systems. Morgan Kaufmann, San Francisco (1998)
23. Sandhu, J.S., Hey, J., Newman, C., Agogino, A.M.: Informal Health and Legal Rights Education in Rural, Agricultural Communities using Mobile Devices. In: Proceedings of IEEE International Conference on Advanced Learning Technologies, Kaohsiung, Taiwan, pp. 988–992 (2005)
24. Granderson, J., Sandhu, J.S.: Efficiency and Design of Improved Woodburning Cookstoves in the Guatemalan Highlands. Technical Report Max-05-1, School of Public Health, University of California, Berkeley (2005)
25. Ebright, J.R., Altantsetseg, T., Oyungerel, R.: Emerging Infectious Diseases in Mongolia. Emerging Infectious Diseases 9(12), 1509–1515 (2003)
26. Directorate of Medical Service, Department of Human Resource Development: Survey Report on Training Needs of Bag Feldshers. Ulaanbaatar, Mongolia (2004)
27. Patton, M.Q.: Qualitative Evaluation and Research Methods. Sage, Newbury Park, California (1990)
28. Pelto, P.J., Pelto, G.H.: Anthropological Research: The Structure of Inquiry. Cambridge University Press, Cambridge, United Kingdom (1978)
29. von Hippel, E.: The Sources of Innovation. Oxford University Press, New York (1988)
30. Emerson, R.M., Fretz, R.I., Shaw, L.L.: Writing Ethnographic Fieldnotes. University of Chicago Press, Chicago, Illinois (1995)
31. Handwerker, W.P.: Quick Ethnography. Altamira Press, Walnut Creek, California (2001)
32. LeCompte, M.D., Goetz, J.P.: Problems of Reliability and Validity in Ethnographic Research. Review of Educational Research 52(1), 31–60 (1982)
33. Griffin, A., Hauser, J.R.: The Voice of the Customer. Marketing Science 12(1), 1–27 (1993)
34. Urban, G.L., Hauser, J.R.: Design and Marketing of New Products. Prentice Hall, Englewood Cliffs, New Jersey (1993)
Effectiveness of Content Preparation in Information Technology Operations: Synopsis of a Working Paper

A. Savoy¹ and G. Salvendy²

¹ Purdue University, West Lafayette, IN, USA
[email protected]
² Purdue University, West Lafayette, USA and Tsinghua University, Beijing, China
[email protected]
Abstract. Content preparation is essential for web design [25]. The objective of this paper is to establish a theoretical foundation for the development of methods to evaluate the effectiveness of content preparation in information technology operations. Past studies identify information as the dominant concern of users, and delivery mechanism as a secondary concern [20]. The best presentation of the wrong information results in a design with major usability problems and does not aid the user in accomplishing his task. This paper shifts the focus of existing usability evaluation methods. It attempts to fill the void in usability literature by addressing the information aspect of usability evaluation. Combining the strengths of content preparation and usability evaluation yields major implications for a broad range of IT uses.

Keywords: Content preparation, World Wide Web, Usability.
This paper unveils a theoretical foundation for the development of methods to evaluate the effectiveness of content in IT operations. Combining this with existing methods of usability evaluation will provide an overall effectiveness evaluation of IT systems with which humans interact. It attempts to fill the void in usability literature by addressing the information aspect of usability evaluation.
2 Content Preparation
The concept of content preparation emerged from a conference panel discussion in 2002. This panel assessed the preparation of content and its management for four elements of web design: knowledge elicitation, information organization, information retrieval, and information presentation [21]. Identifying specific information needed by users and/or customers is a main goal. Content preparation is a fairly new concept in web design with relevance to e-business and cross-cultural design issues [16]. Because the concept is so new, no evaluation tool for it or its principles has yet been studied or developed. Further, there has not been documentation of an information structure that aids the development of an evaluation tool. This study aims to equip designers with end-user information requirements. The focus of Content Preparation evaluation and that of traditional Usability evaluation differ considerably. The former concentrates on “what” information is needed and provided. The latter evaluates presentation and functionality [16]. Both evaluations are important to the design of human-centered interfaces. Currently, only usability testing is practiced. If Content Preparation evaluations were used as a supplement to traditional usability testing, interface evaluations would be more comprehensive. A similar concept of content preparation was established in earlier years. Its principles were captured in the production of catalogs, newspapers, product manuals, and paper-based bank statements. Studies have investigated differences between traditional and non-traditional media: printed versus online catalogs, printed versus online magazines [17], printed versus online newspapers, printed versus online references, and printed versus online presentation of information. As a result, the varying characteristics of printed and online materials influence modifications/additions for web-based content preparation.
3 Literature Review

Many research studies have addressed the issues of data and information quality [24], [14], [19]. In the Information Systems (IS), Human-Computer Interaction (HCI), and Web-design areas, there have been different approaches for classifying important factors and evaluating quality.

3.1 Information Quality

The majority of IS studies are founded upon the effectiveness of databases, search and retrieval algorithms, and content management systems [12]. Research in these areas tends to take an overall approach. Studies in IS were reviewed in the hope of discovering a construct that defines useful content for general use.
Research pertaining to information/data quality has been conducted covering traditional and non-traditional contexts of use and presentation. Common classifications of the dimensions include Quality Category, Assessment Class, Criteria, and Dimensions. The Quality Category classification stems from an information quality framework developed to allow IS managers to better understand and meet the needs of their information consumers [24], [12]. This framework consists of the following four categories: Intrinsic information quality, Representational information quality, Accessibility information quality, and Contextual information quality. The first three categories refer to general credibility issues. Intrinsic information quality suggests that information should have quality in its own right [11]. This category is defined further with related dimensions, among which accuracy is labeled the most important [22], [11]. It could be viewed as General Content information quality. Representational information quality is concerned with the format and presentation of information. This category deals with the visual design (i.e. the “how” aspect) of web design, which is not the focus of this paper. Accessibility information quality focuses on security and privacy [12]. This category could be viewed as Trust information quality. The final category is in line with this paper’s objective. Contextual information quality denotes that consumers’ need for information may differ according to their tasks. However, general categories (Relevancy, Value-added, Timeliness, and Amount of data) can be used for classification. Content requirements for these categories were not mentioned, but the framework demonstrated the use of general categories for classification. Findings from IS provided details on aspects of credible information rather than useful information. While useful information should be credible, credible information is not necessarily useful. A clear definition of useful content cannot be deduced from the studies in this area alone.

3.2 Internet Domains

Content preparation emphasizes information that will aid users in their decisions and tasks. Although studies show that user preferences and information needs change according to their purposes for using the web or different website domains [3], [26], the desire for useful information is common to all domains. A number of studies are restricted to specific domains, which include E-Commerce, Entertainment, Education, Advertisement, and Medical. The results of each study provide in-depth analysis of which aspects/features were important to usability and customer satisfaction; information content is rarely addressed.

E-commerce is ranked among the top two domains for visitation and use. Online shopping, which surfaced as a by-product of the internet, produces billions of dollars in revenue [8]. Customers need specific information that will aid in their decision-making. However, similar research often does not consider specific elements of information for e-commerce websites [8]. Because of the vast number of competitors, it takes more than simple convenience to persuade a shopper to buy from a particular website. The study by Detlor et al. (2003) was one of the first to address the challenge of identifying specific information elements of e-commerce websites. Their research
concerning browse vs. search for pre-purchase online information seeking is highly referenced. The elements were categorized into three groups: Product, Retailer, and Interface related dimensions of information content. The construct was developed after conducting research on consumers’ information preferences across browsing and searching activities. Some of the elements identified by Detlor et al. (2003) are listed in Table 1.

Table 1. Essential Domain Specific Information Components

Domain | Information components | References
E-commerce | Aesthetics, Product Specification, Price, Reliability, Delivery, Purchase Advice, Retailer Reputation, Retailer Service, Brand, Retailer Policy, Product Alternative, Availability, Manufacturer | Detlor et al., 2003
Advertisement | Price-Value, Quality, Availability, Special Offer, Packaging, Guarantees, Company-Sponsored Research, Performance, Components, Taste, Nutrition, New Ideas, Safety, Independent Research | Resnik and Stern, 1977
Education | Admissions, Alumni, Facilities, FAQs, Placement, Programs, Board Members | Griffin, 1999
Online advertisements have boosted revenues “from $1.9 billion in 1988 to $4.6 billion in 1999 alone” [8]. In addition, the role of information has established itself as a central factor in many discussions of how advertising works [9]. Advertising has many models and theories dedicated to explaining how a consumer searches for functional information to assist decision-making during the purchasing process. Advertisement research has roots that date back to the 1970s. Research conducted in the marketing area addressed needs for certain types of information in ads. Its focus on information cues has the closest relation to the proposed conceptual model. Information cues are defined as categories of information that are potentially useful to consumers [9]. The majority of studies cited refer to cross-cultural television advertisements. The most frequently cited set of cues is credited to Resnik and Stern (1977). These cues emerged from the results of a content-driven investigation of 378 television commercials. The set originally included 14 cues (refer to Table 1). The longstanding validity of Resnik and Stern’s (1977) information cues provides a strong foundation for the development of a definitional framework of useful content for IT operations.

According to studies conducted by Zhang et al. (2001), the education domain ranks among the top two (paired with E-commerce) based on user familiarity. After 9/11, universities noticed a dramatic drop in campus visits and needed a new method of recruitment. This and other factors have influenced the onset of e-recruitment techniques and tools. Among those tools, websites are the primary one. University websites now contain vast amounts of information to aid visitors in performing different tasks: selecting a university or course, retrieving personal records, and paying bills.
Most research in this area addresses the website as a recruitment tool. Prospective students view university websites for information to assist in their school selection process. Therefore, the preparation of the website content should deliver information that would market the school appropriately [8]. Griffin (1999) conducted a qualitative analysis of content provided on 16 web sites. Evaluation of these websites over a two-week period generated the list of informational cues cited in Table 1.
4 Conceptual Model

All domains should consider content preparation for the design of their websites. E-commerce, Advertising, and Education have a relatively strong internet presence. However, other domains (e.g. Financial, Government, Medical, and Entertainment) are attempting a transition to online environments. For example, e-government is the attempt to make government more citizen-friendly with well-designed websites [15]. The new domains need to identify and assess the content and functionality necessary to motivate their audience to use these websites [15]. Chan & Swatman (2002) attempted to improve university recruitment websites with lessons learned from e-commerce. They conducted a review comparing universities and their use (or potential use) of methods from the e-commerce domain. This comparison was founded on the application of Ho’s framework (1997) to the websites of universities in Australia and Hong Kong SAR. The results yielded 15 different information components for education websites. This research ties the information needs of the education domain to those of the e-commerce domain. Again, the demonstration of transferability encourages the development of a definitional framework of useful information for IT operations that is not limited by domain.

The elements listed by domains as important information elements are not mutually exclusive. They can be integrated to form an information structure that would provide a baseline definition for useful content. The categories have to be selected appropriately, to capture all the elements and to have relevance to web-based interfaces. After inspection of the elements discovered in the literature review, a subjective analysis suggested classification by eight categories:

1. Site Information – Information concerning the overall perceived quality of information provided by the website. Information content should be frequently updated and the users should be aware of when these updates are occurring [2].
2. Transaction Information – Information explaining different aspects of the purchasing process. This is supported by the E-commerce and Education domains. Users want to make informed purchase decisions [5], [22], [6].
3. Company Information – Information providing details on the many characteristics of a company. All domains support this predicted factor. The internet allows anyone to conduct E-business. Therefore, prospective consumers require information about company characteristics [4], [7], [6].
4. Security Information – Information describing measures implemented in the website to ensure that the transfer and storage of personal data is secure. The majority of websites request personal information. Websites should describe their efforts to secure users’ information [2].
Fig. 1. Conceptual Model
[Figure: the information needed by users and the information provided are classified, from generalization to specialization across domains, into eight categories with example components: Site (Revision Date, Creation Date, Date of Next Update); Product (Aesthetics, Price, Availability); Shipping (Delivery Date, Tracking Number, Shipping Cost); Company (Name, Mission, Sponsors); Security (Payment Information Security); Customer Service (Help, Contact Information, Refund Policy); Transaction (Taxes, Payment methods, Quantity); Membership (Account Status, History, Personal Information).]
5. Product Information – Information providing details about the product and/or services. This predicted factor has the most content requirements. It is important because obtaining products/services is the main purpose of interaction between the user and interface [6], [4].
6. Customer Service Information – Information describing the purchase assistance and/or after-sales support. Some aspects of traditional shopping (e.g. customer service) must be retained by E-commerce. Users are concerned with services during and after sale [7], [2].
7. Shipping Information – Information explaining the shipping process, payments, and tracking options. The content components in this predicted factor increase product awareness beyond initial interaction with the website [22], [6].
8. Membership Information – Information pertaining to customer account status, fees, purchase history, and preferences. Most sites allow users to register accounts. This affords the desired customized web experience [9], [7], [23].
The challenge is to develop a definitional framework depicting the characteristics of useful information. Figure 1 illustrates the classification of the specific content elements noted as essential in the literature review. Useful content is defined as information that is needed to aid a user in accomplishing his/her task. The conceptual model portrays the information needed by users, which is the same information that developers should provide. This illustration will equip web designers with a guide for basic content preparation for any domain. The information elements can be viewed at a general and a specific level. Please note that only a portion of the specific components is captured in Figure 1. Moreover, this model serves as a framework for the development of an evaluation tool for the effectiveness of content preparation. The tool will evaluate the developer’s interpretation and implementation of the content guide.
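To make the relationship between "information needed" and "information provided" concrete, the following minimal Java sketch stores the categories and example components of Figure 1 and reports which needed components a site fails to provide. It is only an illustration under stated assumptions: the class ContentGuide, its methods, and the simple coverage ratio are hypothetical and are not the evaluation tool envisaged in this paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical,
// not part of the evaluation tool envisaged in the paper.
class ContentGuide {

    // General categories mapped to the example components shown in Fig. 1.
    static Map<String, List<String>> needed() {
        Map<String, List<String>> guide = new LinkedHashMap<>();
        guide.put("Site", Arrays.asList("Revision Date", "Creation Date", "Date of Next Update"));
        guide.put("Product", Arrays.asList("Aesthetics", "Price", "Availability"));
        guide.put("Shipping", Arrays.asList("Delivery Date", "Tracking Number", "Shipping Cost"));
        guide.put("Company", Arrays.asList("Name", "Mission", "Sponsors"));
        guide.put("Security", Arrays.asList("Payment Information Security"));
        guide.put("Customer Service", Arrays.asList("Help", "Contact Information", "Refund Policy"));
        guide.put("Transaction", Arrays.asList("Taxes", "Payment methods", "Quantity"));
        guide.put("Membership", Arrays.asList("Account Status", "History", "Personal Information"));
        return guide;
    }

    // Needed components that the website does not provide, grouped by category.
    // Categories that are fully covered are left out of the result.
    static Map<String, List<String>> missing(Set<String> provided) {
        Map<String, List<String>> gaps = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : needed().entrySet()) {
            List<String> absent = new ArrayList<>(entry.getValue());
            absent.removeAll(provided);
            if (!absent.isEmpty()) {
                gaps.put(entry.getKey(), absent);
            }
        }
        return gaps;
    }

    // Share of needed components that are provided (between 0.0 and 1.0).
    static double coverage(Set<String> provided) {
        int total = 0;
        int found = 0;
        for (List<String> components : needed().values()) {
            for (String component : components) {
                total++;
                if (provided.contains(component)) {
                    found++;
                }
            }
        }
        return (double) found / total;
    }

    public static void main(String[] args) {
        Set<String> provided = new HashSet<>(Arrays.asList("Price", "Aesthetics", "Help"));
        System.out.println("Coverage: " + coverage(provided));
        System.out.println("Missing: " + missing(provided));
    }
}
```

A real evaluation would presumably weight components by task and domain rather than count them equally; the unweighted ratio is used here only to keep the sketch short.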
5 Conclusion

Usability evaluation has established its ability to improve a wide range of interactive systems over the years. However, fewer than five percent of these methods have addressed the information aspect of interface design for IT operations. Content preparation has been documented as an essential phase of website design [25]. The information provided by the website is a dominant concern of users [20]. Therefore, a tool for evaluating information content is greatly needed to assess the developer’s implementation and interpretation of Content Preparation guidelines. The lack of literature in this area prevents the immediate construction of such a tool. A clear structure of content-specific elements was necessary for its development. This paper delivers such a structure as the theoretical foundation for development of methods to evaluate the effectiveness of content preparation in IT operations.
References

1. Akoglu, C., Ozcan, O.: Usability evaluation of architecture based web sites. In: Proceedings of the Tenth International Conference on Human-Computer Interaction, 22-27 June 2003, Heraklion, Crete, Greece, pp. 743-747 (2003)
2. Alexander, J.E., Tate, M.A.: Web wisdom: how to evaluate and create information quality on the Web. Lawrence Erlbaum, Mahwah, NJ (1999)
3. Baierova, P., Tate, M., Hope, B.: The impact of purpose for web use on user preferences for web design features. In: Proceedings of the 7th Pacific Asia Conference on Information Systems, 10-13 July 2003, Adelaide, South Australia, pp. 1853-1872 (2003)
4. Barnes, S., Vidgen, R.: Assessing the quality of auction web sites. In: Proceedings of the 34th Hawaii International Conference on System Sciences, 3-6 January 2001, Maui, HI, p. 7055 (2001)
5. Chan, E.S.K., Swatman, P.M.C.: Web content and design: a review of e-Commerce/eBusiness program sites. In: Proceedings of the 13th Australasian Conference of Information Systems, 4-6 December 2002, Melbourne, Australia, pp. 49-60 (2002)
6. Detlor, B., Sproule, S., Gupta, C.: Pre-purchase online information seeking: Search versus browse. Journal of Electronic Commerce Research 4(2), 72–84 (2003)
7. Gehrke, D., Turban, E.: Determinants of successful website design: relative importance and recommendations for effectiveness. In: Proceedings of the 32nd Annual Hawaii International Conference on System Sciences, 5-8 January 1999, Maui, HI (1999)
8. Greer, J.: Evaluating the credibility of online information: a test of source and advertising influence. Mass Communication and Society 6(1), 11–28 (2003)
9. Griffin, G.: A typology of online positioning strategies among creative programs (1999). Available online at: http://www.ciadvertising.org/studies/student/99_fall/phd/griffin/ online paper/abstract.html (accessed 9 January 2006)
10. Hornbaek, K.: Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies 64(2), 79–102 (2006)
11. Huang, H., Lee, Y., Wang, R.: Quality Information and Knowledge. Prentice-Hall, Upper Saddle River (1999)
12. Ives, B., Olson, M.H., Baroudi, J.J.: The measurement of user information satisfaction. Communications of the ACM 26(10), 785–793 (1983)
13. Jones, M.Y., Pentecost, R., Requena, G.: Memory for advertising and information content: Comparing the printed page to the computer screen. Psychology and Marketing 22, 623–648 (2005)
14. Katerattanakul, P., Siau, K.: Measuring information quality of web sites: development of an instrument. In: Proceedings of the 20th International Conference on Information Systems, 12-15 December 1999, Charlotte, NC, pp. 279-285 (1999)
15. Krauss, K.: Testing an e-government website quality questionnaire: a pilot study. In: Proceedings of the 5th Annual Conference on World Wide Web Applications, 10-12 September 2003, Durban, South Africa (2003)
16. Liao, H., Proctor, R., Salvendy, G.: Content preparation for cross-cultural e-commerce: a review. Behaviour and Information Technology (2006)
17. Lu, M.Y.: Evaluating and selecting online magazines for children [Electronic Version]. Eric Digest. Available online at http://www.indiana.edu/ reading/ieo/digests/d180.html (accessed 25 March 2006) (2003)
18. Mueller: An analysis of information content in standardized vs. specialized multinational advertisements. Journal of International Business Studies 22(1) (1st Quarter), 23–39 (1990)
19. Pipino, L.L., Lee, Y.W., Wang, R.Y.: Data quality assessment. Communications of the ACM 45(4), 211–218 (2002)
20. Pitt, L.F., Watson, R.T., Kavan, C.B.: Service quality: a measure of information systems effectiveness. MIS Quarterly 19(2), 173–187 (1995)
21. Proctor, R.W., Vu, K.P.L., Salvendy, G.: Content preparation and management for web design: eliciting, structuring, searching, and displaying information. International Journal of Human-Computer Interaction 14(1), 25–92 (2002)
22. Resnik, A., Stern, B.L.S.: An analysis of information content in television advertising. Journal of Marketing, vol (January), pp. 50-53 (1977)
23. Salvendy, G., Fang, X.: Siemens report: guidelines and rules for design of e-business: Purdue University (2001)
24. Strong, D.M., Lee, Y.W., Wang, R.Y.: 10 potholes in the road to information quality. Computer 30(8), 38–46 (1997)
25. Vu, K., Proctor, R.W.: Web site design and evaluation. In: Salvendy, G. (ed.) Human Factors and Ergonomics, 3rd edn., John Wiley and Sons, Inc., New York, NY (2006)
26. Zhang, P., von Dran, G., Blake, P., Pipithsuksunt, P.: Important design features in different website domains: an empirical study of user perceptions. e-Service Journal 1(1), 77–91 (2001)
Traces Using Aspect Oriented Programming and Interactive Agent-Based Architecture for Early Usability Evaluation: Basic Principles and Comparison

Jean-Claude Tarby1, Houcine Ezzedine2, José Rouillard1, Chi Dung Tran2, Philippe Laporte1, and Christophe Kolski2

1 Laboratoire LIFL-Trigone, University of Lille 1, F-59655 Villeneuve d’Ascq Cedex, France {jean-claude.tarby, jose.rouillard, philippe.laporte}@univ-lille1.fr
2 LAMIH – UMR8530, University of Valenciennes and Hainaut-Cambrésis, Le Mont Houy, F-59313 Valenciennes Cedex 9, France {houcine.ezzedine, chidung.tran, christophe.kolski}@univ-valenciennes.fr
Abstract. Early evaluation of interactive systems is currently the subject of much research. Some of this work aims at explicitly coupling design and evaluation through various software mechanisms. In this paper we describe two approaches to early evaluation that exploit new technologies and paradigms. The first approach is based on aspect-oriented programming; the second proposes an explicit coupling between an agent-oriented architecture and evaluation agents. These two approaches are globally compared in this paper.

Keywords: Human-computer interaction, Early evaluation, Usability, Traces, Agent-based architecture, Aspect oriented programming.
effective tasks. The first approach exploits the paradigm of aspect-oriented programming to integrate trace mechanisms into interactive applications. The concept of trace has been the subject of various studies in HCI [16]. The second approach proposes an explicit coupling between the agents that make up an agent-based architecture and several evaluation agents. These two approaches are first described; then, they are compared.
2 First Approach for Early Usability Evaluation: Injection of Mechanism of Traces by Aspects

2.1 Aspect-Oriented Programming

Aspect-Oriented Programming (AOP), a programming paradigm that appeared in the mid-1990s, originated at Xerox PARC. AOP should be seen as an extension of Object-Oriented Programming: complementary generic mechanisms significantly improve the separation of concerns within applications [14]. In a traditional approach, the business objects locally manage their technical constraints (identification/authentication, security, transactions, data integrity...). Duplicating these crosscutting elements in the methods of classes scatters and tangles the system-level concerns and increases the complexity of the code. AOP allows these elements to be modularized by adding a new dimension of modularity, the aspect. The scope of the crosscutting concerns supported by AOP exceeds that of current solutions such as EJB. Join point, advice, aspect, and pointcut are the principal concepts introduced by AOP:
• A join point represents a particular location in the flow of the program instructions (beginning or end of a method execution, read or write access to a field...).
• Advices are methods which are activated when precise join points are reached: the weaving mechanism inserts the advice calls into the initial code either statically (at compile time) or dynamically (during execution). An advice can execute before, after, or around the join point.
• An aspect is a module which associates advices with join points by means of pointcuts.
• Pointcuts are used to define the set of join points at which an advice must be activated. Furthermore, a pointcut can capture the execution context of join points. For a method call, this context includes the target object, the arguments of the method and the reference to the returned object, all of which is highly useful for the injection of trace mechanisms.

Based on the principle of inversion of control (IoC), AOP thus extracts from the business code the dependencies on the technical concerns by locating them in the aspects and by managing them from outside through the weaving mechanism. It consequently becomes possible to focus on business logic.
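The concepts above can be illustrated without the AspectJ toolchain. The following minimal sketch is not AspectJ code and not the implementation used in this work: it substitutes a JDK dynamic proxy (java.lang.reflect.Proxy) for weaving, so that a "before" and an "after" advice run around every call to an interface method and capture the context mentioned above (method name, arguments, returned value). The names Cart, SimpleCart, TraceAdvice and weave are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TraceDemo {
    public static void main(String[] args) {
        TraceAdvice advice = new TraceAdvice(new SimpleCart());
        Cart cart = TraceAdvice.weave(Cart.class, advice);
        cart.add("book", 2);                       // business call, unaware of tracing
        advice.traces.forEach(System.out::println);
    }
}

// Hypothetical business interface and class; they contain no tracing code.
interface Cart {
    int add(String item, int quantity);
}

class SimpleCart implements Cart {
    private int total = 0;

    public int add(String item, int quantity) {
        total += quantity;
        return total;
    }
}

// Stand-in for an aspect: every method call on the proxied interface plays
// the role of a join point, and invoke() plays the role of an around advice.
class TraceAdvice implements InvocationHandler {
    private final Object target;
    final List<String> traces = new ArrayList<>();

    TraceAdvice(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // "before" part: the method name and arguments are the captured context
        traces.add("before " + method.getName() + Arrays.toString(args));
        Object result = method.invoke(target, args);
        // "after" part: the returned object is also available
        traces.add("after " + method.getName() + " -> " + result);
        return result;
    }

    // Stand-in for weaving: wraps the target behind the traced interface.
    static <T> T weave(Class<T> type, TraceAdvice advice) {
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, advice));
    }
}
```

Unlike real weaving, a proxy only intercepts calls made through an interface and cannot observe field accesses; it is used here solely because it runs on a plain JDK.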
Moreover, AOP proposes the mechanism of introduction, which allows the modification of existing classes, interfaces or even aspects: it is possible to inject a method or an attribute into a class, to add an inheritance relation, or to specify that a class implements a new interface. For example, in order to sort a collection of Java class instances automatically, an aspect will declare that the class implements the interface Comparable and inject the required method compareTo into it.

2.2 Traces by Aspects

Thanks to the principle of separation of concerns, AOP can inject trace mechanisms into existing applications (cf. Figure 1, step c) by writing aspects (step d) which on the one hand listen to user actions, method calls, changes in data values, etc., and on the other hand produce the traces. These aspects are then woven with the initial code (step e), which remains intact. The code produced by weaving then contains the initial code and the code of the aspects (step f). The initial application can be used completely normally without the aspects or be traced with them (step g). The trace mechanism can thus be disabled without any effect on the initial code.
Fig. 1. Injection of mechanism of traces by aspects
[Figure: diagram with steps c to j linking the following elements: Initial application, Aspects of trace, Formats of trace, Aspect weaving (AspectJ), Application to be traced, interactions, Traces, Data analysis.]
To produce a trace we need three types of information: data to be traced, when to produce the trace and where to store it. Traced data mainly relate to the functional core (and consequently the associated tasks) and the user interface (actions from the user, but also displayed data…). For example it is possible to trace the beginning, the end or the interruption of a task, the opening of a window, the selection in a dropdown list, etc. Because our work is use-oriented, it is easier to trace the actions of the user when the functional core and the user interface are built from a task oriented
design method. Thus, if the application is designed with an evaluation-oriented approach as presented in [23], it is easy to recover other data such as the context of execution of the tasks, the role of the user (in CSCW for example), etc. Most of the time, the traces are produced when a method is called or at the end of the execution of the method, and these methods may be associated with tasks. AOP provides all the services required for the production of traces (cf. the before and after keywords present in AOP). Moreover, it is very easy to parameterize the production of traces, for example to produce them in a dedicated thread, or only if a condition is true. Today the traces are generated in XML files (step h) whose contents are parameterized by a set of formats also written in XML (step d). This allows us to generate traces in different formats while emitting the same information from the traced application. Although we favor traces in XML format, the external definition of formats will make it possible to generate very compact textual (non-XML) files. With our approach the exploitation of traces is facilitated because we choose the data that we want to trace, as well as the format of the result, contrary to approaches based on log files. The analysis of traces (step i) produces statistics, task models (step j), filtered information, etc. This side of our work is not presented in this paper. At the moment this analysis is done after the production of traces, but we plan to carry out real-time analysis in the future (to adapt the application, to advise the user, etc.). Our work is similar to works such as [2,5,6,9,10,11,24]. It uses AspectJ [4] but it could be carried out with other languages supporting AOP such as [3,21,25].
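The separation between the traced information and its output format can be sketched as follows. This is a simplified illustration under assumptions: in the approach described above the formats are defined externally in XML, whereas here two formats are hard-coded as Java functions, and the names TraceEvent, TraceFormats, XML and COMPACT, as well as the attribute names of the trace element, are hypothetical.

```java
import java.util.function.Function;

class TraceFormatDemo {
    public static void main(String[] args) {
        TraceEvent event = new TraceEvent(System.currentTimeMillis(), "begin", "Select item");
        // The same trace information is emitted once and rendered in two formats.
        System.out.println(TraceFormats.XML.apply(event));
        System.out.println(TraceFormats.COMPACT.apply(event));
    }
}

// Hypothetical trace record: when it happened, what kind of event, which task.
final class TraceEvent {
    final long timeMillis;
    final String kind;
    final String task;

    TraceEvent(long timeMillis, String kind, String task) {
        this.timeMillis = timeMillis;
        this.kind = kind;
        this.task = task;
    }
}

final class TraceFormats {
    // Escapes the characters that are not allowed inside an XML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // XML rendering; element and attribute names are illustrative only.
    static final Function<TraceEvent, String> XML = e ->
            "<trace time=\"" + e.timeMillis + "\" kind=\"" + escape(e.kind)
            + "\" task=\"" + escape(e.task) + "\"/>";

    // Compact textual (non-XML) rendering of the same information.
    static final Function<TraceEvent, String> COMPACT = e ->
            e.timeMillis + ";" + e.kind + ";" + e.task;
}
```

Keeping the format as a value that is passed to the trace producer is one way to switch formats without touching the code that emits the events.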
3 Second Approach for Early Usability Evaluation: Interactive Agent-Based Architecture and Evaluation Module

3.1 Agent-Oriented Architecture for Interactive Systems

Several architecture models have been put forward by researchers over the past twenty years. Two main types of architecture can be distinguished: architectures with functional components (Langage, Seeheim, Arch and their derived models) and architectures with structural components (PAC and its derived models [7], the MVC model (Model-View-Controller, from Smalltalk) and its recent evolutions, AMF and its variants [20], H4 [8]…). The classic models of interactive systems distinguish three essential functions (presentation, control and application). Some models (such as the Seeheim and ARCH models) consider these three functions as three distinct functional units. Other approaches using structural components, in particular those said to be distributed or agent approaches, suggest grouping the three functions together into one unit, the agent. These architecture models share the same principle, based on separation between the system (application) and the interface. Thus, an architecture must separate the application and the interface, define a distribution of the services of the interface, and define an exchange protocol. The benefit of separating the interface from the application is that the interface can be modified without touching the application.

Fig. 2. An agent oriented architecture for interactive systems
[Figure: Application, application agents, dialogue control agents, interface agents, User.]

Figure 2 proposes a comprehensive framework for architecture [12,15], showing a separation into three functional components, called respectively: interface with the application (connected to the application), dialogue controller, and presentation (this component being in direct relation with the user). These three components group together agents:
− the application agents, which handle the domain concepts and cannot be directly accessed by the user. One of their roles is to ensure the correct functioning of the application and the real-time dispatch of the information that the other agents need to perform their tasks,
− the dialogue control agents, which are also called mixed agents; these provide services for both the application and the user. They are intended to guarantee coherency in the exchanges from the application towards the user, and vice versa,
− the interactive agents (or interface agents), which, unlike the application agents, are in direct contact with the user (they can be seen by the user). These agents coordinate among themselves in order to intercept the user commands and to form a presentation which allows the user to gain an overall understanding of the current state of the application. In this way, a window may be considered an interactive agent in its own right; its specification describes its presentation and the services it is to perform.

3.2 Principle of Coupling Between Architecture Based on Agents and Evaluation Agents

Our starting objective was to propose a tool for collecting objective data, adapted to agent-based interactive systems. This tool corresponds to an electronic informer; it consists of a program, invisible to the user (of the system to be evaluated), which transmits and records all the interactions (actions of the operator and reactions of the system) in a database. This database is then exploited to provide the evaluator with data and statistics enabling him/her to draw conclusions with regard to various aspects of utility and usability.
Fig. 3. Principle of coupling between agent-based architecture of the interactive system and its evaluation [26]
Since this informer is dedicated to the evaluation of agent-based interactive systems, it must be closely related to the architecture of the system to be evaluated [13,26]. We are particularly interested in the interactive agents. This electronic informer (Figure 3) consists of several informer agents derived from the architecture of the system to be evaluated, and more particularly from the multi-agent system concerning presentation. It is based primarily on the acquisition of information and data specific to the system to be evaluated (actions of the user and reactions of the system). These data make it possible to rebuild the tasks actually carried out by the user (a posteriori mode) and to confront them with the model of the tasks to be carried out (a priori mode), according to the confrontation principles described in [1]. Suppose that the presentation module is made up of 6 interactive agents (each one being able to interact with the user); 6 evaluation agents will then be instantiated and connected to the interactive agents. During the interactions with the user, the 6 evaluation agents record in real time the data concerning the interaction between the user and the 6 interactive agents. After the tasks have been carried out, these data are analyzed automatically; they are then presented to the evaluator offline, through a specific user interface dedicated to the evaluator. They can range from a low level, corresponding to simple user or system events, to higher levels (for example the task level). Examples are available in [26].
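The coupling described above can be sketched in a few lines of Java. This is a hedged illustration, not the informer presented in [26] (whose current version is written in C++): one evaluation agent is attached to each interactive agent and records its events in a shared list that stands in for the database, and a deliberately naive comparison lists the predicted steps that were never observed. All class and method names are hypothetical, and the comparison does not implement the confrontation principles of [1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EvaluationDemo {
    public static void main(String[] args) {
        List<InteractiveAgent> agents = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            agents.add(new InteractiveAgent("agent" + i));
        }
        List<String> observed = Informer.instrument(agents);   // 6 evaluation agents
        agents.get(0).userAction("open");
        agents.get(3).userAction("select");
        List<String> predicted = Arrays.asList("agent0:open", "agent1:confirm");
        System.out.println("Observed: " + observed);
        System.out.println("Predicted but not observed: "
                + Informer.missingSteps(predicted, observed));
    }
}

// Hypothetical interface agent that notifies the evaluation agents attached to it.
class InteractiveAgent {
    final String name;
    private final List<EvaluationAgent> observers = new ArrayList<>();

    InteractiveAgent(String name) {
        this.name = name;
    }

    void attach(EvaluationAgent agent) {
        observers.add(agent);
    }

    void userAction(String action) {
        for (EvaluationAgent agent : observers) {
            agent.record(name, action);
        }
    }
}

// Hypothetical evaluation agent; the shared list stands in for the database.
class EvaluationAgent {
    private final List<String> store;

    EvaluationAgent(List<String> store) {
        this.store = store;
    }

    void record(String agentName, String action) {
        store.add(agentName + ":" + action);
    }
}

class Informer {
    // Creates one evaluation agent per interactive agent and returns the shared store.
    static List<String> instrument(List<InteractiveAgent> agents) {
        List<String> store = new ArrayList<>();
        for (InteractiveAgent agent : agents) {
            agent.attach(new EvaluationAgent(store));
        }
        return store;
    }

    // Naive a priori / a posteriori comparison: predicted steps never observed.
    static List<String> missingSteps(List<String> predicted, List<String> observed) {
        List<String> missing = new ArrayList<>(predicted);
        missing.removeAll(observed);
        return missing;
    }
}
```

The observer relationship keeps the interactive agents unaware of what is done with the recorded events, which is the property the coupling relies on.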
4 Comparison Between the Two Approaches

A comparison of the two approaches is given in Table 1. The two approaches have common objectives: to gather data in order to compare predicted tasks and actual activity, and to highlight utility and usability problems. The ways used to obtain these data differ according to the approaches.
J.-C. Tarby et al.
From the point of view of integration into software engineering, the two approaches require particular specifications. Approach 1 (the AOP approach) needs to know the methods and data that can be traced, as well as the trace formats; this information can be collected during specification or after implementation. Approach 2 (the agent approach) requires the specification of the elements of the interactive system and of the evaluation agents. No particular architectural design is required for the AOP approach, whereas the agent approach requires that the architecture of the interactive system be based on interface agents and that connections be established between the interactive agents and the evaluation agents. Regarding implementation, the AOP approach automatically generates the code of the aspects and weaves it with the initial code of the application to be traced; the agent approach requires programming the services of the interactive system agents and of the evaluation agents. From the point of view of user-centred evaluation, both approaches can be coupled with other techniques such as interviews, eye tracking, etc., but they gather data in different ways: with the AOP approach, data are collected automatically when the code resulting from weaving the aspects into the initial application code is executed; with the agent approach, data are collected by the evaluation agents, which observe the interactions between the interface agents and the user. To be collected with the AOP approach, data must be accessible through a method (in the object-oriented programming sense); this method can be public, inherited, etc. Time is accessible in the same way. The data collected with the agent approach are potentially of many kinds (cf. Table 1). In their current versions the approaches use different languages: the AOP approach uses Java and AspectJ; the agent approach is based on C++.
In the future, it is expected that the AOP approach will be extended to other languages supporting AOP, such as PHP, C++, etc., and that the agent approach will use Java. Concerning the types of application, the AOP approach can currently trace any application written in Java and supporting AspectJ. However, the applications traced today are mainly interactive applications (WIMP applications, i.e. Window, Icon, Mouse, Pull-down menu). In the future, it is planned to apply the AOP approach to information systems, distance learning applications, and mobile applications. The agent approach is currently applied to information systems used for the supervision of a bus and tramway network. In the future, it should target any type of information system. The advantage of these two approaches is that they provide principles and mechanisms that facilitate and encourage early evaluation. In addition, the AOP approach leaves the initial code intact, so that the application and the trace mechanisms can be developed in parallel and/or one after the other. The disadvantages are as follows. With the agent approach, it is currently difficult to define the optimal number of evaluation agents (the first version contained one evaluation agent per interaction agent, and the new version will contain only one); there is also a need for new user interface design methods that envisage a coupling between interface agents and evaluation agents. To be more effective, the AOP approach needs design methods that integrate aspects for the evaluation. This means, for example, that any potentially traceable data must be accessible through object methods.

Table 1. Comparison between the two approaches

| | AOP approach (injection of trace mechanisms by aspects) | Agent approach (coupling of interface agents and module of automatic acquisition) |
| --- | --- | --- |
| Traditional stages of software engineering | | |
| Preliminary or feasibility study | Explicit consideration of early evaluation in the project | Explicit consideration of early evaluation in the project |
| Specification | Specification of: interactive system, parameters to be traced, formats of traces | Specification of: interactive system agents, evaluation agents |
| Architectural design | (empty) | Design of the interactive system architecture based on interface agents; connections between interactive system agents and evaluation agents |
| Coding | Generation of the code of the aspects and weaving between the code to be traced and the aspects | Coding of the services of the interactive system agents and evaluation agents |
| User-centred evaluation (simultaneously with other possible methods: interviews, eye tracking, questionnaire, etc.) | | |
| Interaction data gathering | Execution of the woven code | Observation by the evaluation agents of the interactions between interface agents and the user |
| Collected data | Any data accessible by a method (in the object-oriented programming sense) + time | User and system events, errors, task execution time, unused objects, number of help requests… |
| Goals | Depends on how traces are exploited: gathering data to compare predicted tasks and real activities, highlighting problems of utility and usability… | Same as for the AOP approach |
| Languages | | |
| Current | Java with AspectJ | C++ |
| Intended | Any language supporting AOP | Java |
| Types of application | | |
| Current | WIMP applications | Information systems used for the supervision of a bus and tramway network |
| Intended | Information systems, distance learning applications, mobile applications | Information systems |
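The idea of injecting trace mechanisms without touching the initial code can be illustrated as follows. This is only a rough analogue written in Python, not the Java/AspectJ weaving used by the AOP approach: the helper `weave_traces`, the `Counter` class and the trace record format are all hypothetical. The sketch wraps selected methods at run time so that every call is recorded with a timestamp, which mirrors the two constraints stated above: traced data must be reachable through a method, and the application source remains unchanged.

```python
import functools
import time


class Counter:
    """Stand-in for application code; it contains no tracing logic."""

    def __init__(self):
        self.value = 0

    def increment(self, step=1):
        self.value += step
        return self.value

    def reset(self):
        self.value = 0


def weave_traces(cls, method_names, trace):
    """Wrap the named methods of cls so that each call appends a record to trace.

    Hypothetical helper: a run-time analogue of aspect weaving, not AspectJ.
    """

    def make_wrapper(method_name, method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            trace.append({
                "time": time.time(),   # time is collected like any other data
                "method": method_name,
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result
        return wrapper

    for name in method_names:
        setattr(cls, name, make_wrapper(name, getattr(cls, name)))


trace = []
weave_traces(Counter, ["increment"], trace)  # the Counter source is untouched

counter = Counter()
counter.increment()
counter.increment(step=4)
counter.reset()  # not woven, therefore not traced
```

Only the methods named in the call to `weave_traces` are traced, which corresponds to specifying in advance the methods and data that can be traced; the resulting `trace` list could then be compared with the predicted tasks.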
5 Conclusion

Early evaluation is the subject of active research in the HCI community. For our part, we are working on two complementary approaches. The first is based on aspect
oriented programming; it allows trace mechanisms to be injected into existing applications. The second is based on new possibilities offered by agent-based approaches; it aims at ensuring a coupling between agent-based architectures and evaluation agents. Although directed towards the same evaluation objectives, these two approaches have different characteristics, advantages and disadvantages, which were compared in this paper. For both approaches, the research perspectives are numerous: it is important to study suitable design methods, to improve the current mechanisms, and to test them in various application domains.

Acknowledgments. The present research work has been supported by the “Ministère de l'Education Nationale, de la Recherche et de la Technologie”, the “Région Nord Pas-de-Calais” and the FEDER (Fonds Européen de Développement Régional) during the projects SART, MIAOU and EUCUE. The authors gratefully acknowledge the support of these institutions.
References

1. Abed, M., Ezzedine, H.: Vers une démarche intégrée de conception-évaluation des systèmes Homme-Machine. Journal of Decision Systems 7, 147–175 (1998)
2. Aksit, M., Bergmans, L., Vural, S.: An object-oriented language-database integration model: the Composition-Filters Approach. In: Madsen, O.L. (ed.) ECOOP 1992. LNCS, vol. 615, pp. 372–395. Springer, Heidelberg (1992)
3. aoPHP, Aspect Oriented PHP, http://www.aophp.net
4. AspectJ project, http://www.eclipse.org/aspectj/
5. Balbo, S., et al.: Project WAUTER (Website Automatic Usability Testing EnviRonment), http://wauter.weeweb.com.au
6. Champin, P-A., Prié, Y., Mille, A.: MUSETTE: Modeling USEs and Tasks for Tracing Experience. In: Ashley, K.D., Bridge, D.G. (eds.) ICCBR 2003. LNCS, vol. 2689, pp. 279–286. Springer, Heidelberg (2003)
7. Coutaz, J.: PAC, an Object-Oriented Model for Dialog Design. In: Bullinger, H.-J., Shackel, B. (eds.) Interact’87, 2nd IFIP International Conference on Human-Computer Interaction, September 1–4, Stuttgart, Germany, pp. 431–436 (1987)
8. Depaulis, F., Jambon, F., Girard, P., Guittet, L.: Le modèle d’architecture logicielle H4: Principes, usages, outils et retours d’expérience dans les applications de conception technique. Revue d’Interaction Homme-Machine (RIHM) 7, 93–129 (2006)
9. Ducasse, S., Gîrba, T., Wuyts, R.: Object-Oriented Legacy System Trace-based Logic Testing. In: Proceedings 10th European Conference on Software Maintenance and Reengineering (CSMR 2006). IEEE Computer Society Press, Washington (2006)
10. Egyed-zsigmond, E., Mille, A., Prié, Y.: Club (Trèfle): a use trace model. In: Ashley, K.D., Bridge, D.G. (eds.) ICCBR 2003. LNCS, vol. 2689, pp. 146–160. Springer, Heidelberg (2003)
11. El-Ramly, M., Stroulia, E., Sorenson, P.: Mining system-user interaction traces for use case models. In: Proceedings of the 10th International Workshop on Program Comprehension (IWPC’02), Paris, France (27–29 June 2002)
12. Ezzedine, H., Kolski, C., Péninou, A.: Agent oriented design of human-computer interface. Application to supervision of an urban transport network. Engineering Applications of Artificial Intelligence 18, 255–270 (2005)
13. Ezzedine, H., Trabelsi, A., Kolski, C.: Modelling of an interactive system with an agent-based architecture using Petri nets, application of the method to the supervision of a transport system. Mathematics and Computers in Simulation 70, 358–376 (2006)
14. Filman, R., Elrad, T., Clarke, S., Aksit, M.: Aspect-oriented software development. Addison-Wesley Professional, London (2004)
15. Grislin-Le Strugeon, E., Adam, E., Kolski, C.: Agents intelligents en interaction homme-machine dans les systèmes d’information. In: Kolski, C. (ed.) Environnements évolués et évaluation de l’IHM, IHM pour les SI 2, pp. 207–248. Éditions Hermes, Paris (2001)
16. Hilbert, D.M., Redmiles, D.F.: Extracting usability information from user interface events. ACM Computing Surveys 32, 384–421 (2001)
17. Ivory, M., Hearst, M.: The State of the Art in Automated Usability Evaluation of User Interfaces. ACM Computing Surveys 33, 173–197 (2001)
18. Jacko, J.A., Sears, A.: The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications (human factors and ergonomics). Lawrence Erlbaum Associates, London (2002)
19. Nielsen, J.: Usability Engineering. Academic Press, Boston (1993)
20. Ouadou, K.: AMF: Un modèle d’architecture multi-agents multi-facettes pour Interfaces Homme-Machine et les outils associés. Ph.D. Thesis, Ecole Centrale de Lyon (1994)
21. PHPAspect, http://phpaspect.org/
22. Sweeney, X.E., Sibertin-Blanc, M., Maguire, M., Shackel, B.: Evaluation user-computer interaction: a framework. International Journal of Man-Machine Studies 38, 689–711 (1993)
23. Tarby, J.C.: Evaluation précoce et conception orientée évaluation. In: Proceedings ErgoIA’ (Biarritz, France, 11–13 Octobre 2006), ESTIA and ESTIA.Innovation, Biarritz, pp. 343–346 (2006)
24. The Compose* project, http://janus.cs.utwente.nl:8000/twiki/bin/view/Composer/
25. The Java Aspect Components (JAC) project, http://jac.objectweb.org/
26. Trabelsi, A.: Contribution à l’évaluation des systèmes interactifs orientés agents: application à un poste de supervision du transport urbain (in French). PhD Thesis, University of Valenciennes and Hainaut-Cambrésis, Valenciennes, France (2006)
Usability and Software Development: Roles of the Stakeholders

Tobias Uldall-Espersen and Erik Frøkjær
Department of Computing, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen
{tobiasue, erikf}@diku.dk
Abstract. Usability is a key issue when developing software, but how to integrate usability work and software development continues to be a problem that the stakeholders must face. This study aims at developing a more coherent and realistic understanding of the problem based on 14 interviews in three case studies. The results indicate that usability during software development has to be considered with both a user interface focus and an organizational focus. In particular, techniques to support the uncovering of organizational usability are lacking in both human-computer interaction and software engineering. Further, the continued engagement of stakeholders who carry the vision about the purpose of change stands out as a critical factor for the realization of project goals.
1 Introduction

Integrating usability work into software development is not easy [3]. Proper integration requires a thorough understanding of usability work methods and software development practices, but this understanding seems insufficient when the aim is to improve end product usability. Despite heavy investments in information technology, we observe deficiencies in practical usability work and a significant lack of impact [4]. Even current research fails to explain why [7]. This paper reports on a study combining an organizational and an individual approach to understanding and exploring the problem. By selecting this approach we seek an understanding of how organizational issues and stakeholders in the organization influence end product usability.
development process by two stakeholders: a graphical user interface designer and a business representative responsible for requirement specification, test planning and user education. These two persons were interviewed as well. The main research question was how practitioners in software development projects work with usability, and what we can learn from their practices. All interviews had the same interview guide as their starting point, but there were significant differences in how they progressed. The interview guide covered four themes: (1) The software development process. (2) Software quality. (3) Developing usable software. (4) General experiences with the development of usable and useful software products. During the interviews, themes 1 and 3 were given the most attention, and themes 1, 2 and 3 were all discussed on the basis of one specific software development project significant to the interviewees and their organization. Each interview took 60-90 minutes. The interviews were transcribed and analyzed using elements from grounded theory [5]. During the analyses we looked for information that related directly or indirectly to usability. This information included, for instance, statements about stakeholders’ perception of usability, descriptions of usability-related activities, and non-usability-related issues that influenced end product usability.

2.1 Usability as a Concept

Our data suggests that usability is treated with different goals in mind in the various development projects and their organizational contexts. This leads us to look further into the relevance and practical conditions of conducting usability work in software development projects, in order to examine the various stakeholders’ roles and the possible risks regarding realization of the full potential of the solution.
The ISO 9241-11 standard defines usability as: “The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.” Using this definition, usability depends on four variables: a product, specified users, specified goals, and a specified context of use. Following our organizational approach, we observed how the specified goals had a significant influence on the handling of usability. We found this important, since these goals existed more or less independently of the product, the users and the context of use, the three variables on which traditional usability work often focuses. Various stakeholders formulated goals, and their direct or indirect roles informed each case significantly. We found it useful to distinguish between two groups of stakeholders: the users, i.e. persons who interact with the system, and the other stakeholders, i.e. persons who are directly or indirectly affected by the system or have important interests regarding it. Our data suggests that usability work is oriented towards two different dimensions, which are related to the various goals in the development project, among the stakeholders, and in the organization. The two dimensions found were: (1) Usability work oriented towards the user interface or user interests, which we refer to as user interface usability. (2) Usability work oriented towards the organization or other stakeholders, which we refer to as organizational usability. We observed incidents in which the two usability orientations had identical interests and incidents in which their interests differed, which supports our assumption about the importance of analyzing these two dimensions.
3 Results

The cases had both strong similarities and differences. All projects were based on web technology and were all considered quite successful by the interviewees. In relation to their organizations, the developed applications were innovative, and they both influenced and were influenced by their organizations. All the systems had various user groups and groups of people that were influenced by the systems. The systems were all initiated centrally, and anchoring the systems locally in the organizations was a challenge. By nature the systems were very different. Two systems were custom-developed by external contractors, and one system was developed by an in-house development team.

Case 1: Development of a new insurance sales tool. This case regards the development of a new sales tool for two groups of users, insurance agents and customer service persons. The tool was developed in-house over a period of 18 months. At most, 25 employees were working on the project. About 400-500 employees would be using the tool. The two user groups had significantly different requirements, as the insurance agents were selling at the customers’ locations, typically in their homes, whereas the customer service persons served customers over the phone. The project management team did not consider it possible to make two different interfaces, and considerable efforts were made to make one suitable interface. The sales tool was built as a front-end to two large insurance administration systems, and it was a challenge to avoid letting administrative procedures inform the design. A customer-centred approach was taken and all possible stakeholders were involved. The aim was to give the users the best possible tool, and the main improvements were better data quality and an improved general view of the customers and their households. The company had a strongly centralized organization rooted at the head office, but employees at five regional offices generated the majority of the sales.
A main challenge was to prevent the tool from becoming “another head office’s idea”, and a considerable effort was made to ensure that the tool was firmly anchored locally. The project was innovative and utilized new technology, such as wireless access to the back-end systems and other relevant systems, e.g. the national civil registration number register. The new technology also caused severe technological and usability problems. The company did not use a formal software development method, and usability was not prioritized initially in the project. Two stakeholders strongly insisted on taking usability seriously, and they gradually succeeded in making usability a significant and comprehensive part of the project. The project management team took a risk by yielding control of the process and allowing anyone involved in the project to have an opinion and express it. The software developer described the space for communication this way: “We had our arguments and we have been bloody angry at each other, close to physical fights, but it is like that to integrate systems if you ask me, and I find it great that we could ... we really could go directly to each other and say that this is really annoying. Can’t we ... I think this is foolish ... but I think this is foolish ... why aren’t you done now ... why shall I be done now, and so on. There we really had a very close collaboration.” So, space was made for rewarding discussions and iterations, but the downside was that much of the decision making became very time-consuming.

Case 2: Developing a new IT platform for a political organization. This case regards the development of a new IT-platform for a political organization. The
IT-platform was custom-developed by an IT contractor in close co-operation with the central office of the organization. The co-operation continued over several years, during which components were continuously delivered and put into use. The project team consisted of six or seven persons from the contractor and the customer’s organization. The organizational leaders had strong visions about modernizing the organization, and the new IT-platform was a key tool for fulfilling this vision. There were strong economic incentives in the project as well. The IT-platform was to serve two purposes. First, it was to replace an existing but outdated communication platform used by 2,000–3,000 members. Otherwise a costly renewal of the license for the old communication platform would have been needed, which was not a realistic option. Introducing a new platform was intended to help open up the organization and make it more attractive to new members. The new platform included an advanced CMS available to all members (about 50,000) and specific tools for running effective and professional election campaigns. Second, the IT-platform was to serve as a new tool for membership administration, which would be decentralized and handed over to the local chapters of the organization. Membership administration includes issues like collection of dues, signing members up for courses and the national congress, and internal polling functionality. The contractor applied a highly agile and strongly business-process-oriented approach to the development. This was a key success factor, since external events periodically removed the customer’s focus from the project completely and changed the short-term goals. A very special contract was made between the contractor and the customer’s organization. No formal requirement specification was agreed upon, but a vision was developed, thoroughly discussed in the management group, and written down.
The customer’s project manager describes it this way: “We ended up writing up a two-page contract and some enclosures, which essentially stated that we could put the deliveries into use when we were satisfied, and when we did so we paid. The whole issue of accepting that they had delivered what we needed was handed over to us, by stating ‘our experience is, that you only pay if you are satisfied, so let us put that into the contract.’ Thus, it was completely up to us to decide when things were approved, but it could not be put into use before it was accepted. This model does not work in all projects, but it was extremely operational in relation to what we were going through.” The agreement was governed by a fairness principle ensuring that the customer’s organization and the contractor treated each other respectfully, and this converted potential conflicts into win-win situations. According to one of the key persons, “enlightened despotism” dominated within the customer’s organization, and only three stakeholders were thoroughly involved in the project.

Case 3: Developing a coherent physical and electronic department store. This case regards the development of a new website for a department store with a number of local stores. The website was developed by an external contractor that specialized in user-centred web development. The customer’s organization did little to involve itself in the project. The customer considered the solution to be a high-class web solution, and it was technically efficient, but it was poorly anchored in the customer’s organization. The contractor’s information architect experienced the lack of anchoring this way: “ ... they might not all have much notion about what this website should be used for, and they also had different positions. The commercial
manager had another position than the marketing manager, who had another position than the loyalty manager. And then ... they need to clarify it internally, and then they can come to us, because we are going to make something they can use for what they have agreed the system to be used for.” The project was completed within five months, and five persons from the contractor were core project members. What was unique in this case was the idea of creating a coherent solution in which the physical and electronic worlds supplemented each other, in order to maintain a leading role in the physical department store market in Denmark and, if possible, also establish a position within the web-shop market. Two different goals were formulated. The first goal was to enable department store customers to buy articles in a traditional web-shop, and this was given the most attention by the development team. This was a limited success, since only about 1 out of 1000 articles from the physical stores was available in the web-shop when it opened. It proved to be a non-trivial task to add articles to the web-shop and to ensure that the organization was able to handle the logistics. The second goal was to present information and to inspire potential customers to buy articles in the physical shops, which was the primary goal according to the business representative. A large effort was put into unifying these two goals. A combined physical and web-based fashion magazine was created, and when customers searched for products on the website, the search function returned information including the physical placement of the articles in the department stores. The development process was split into three phases, each sold individually to the customer. This was an efficient way to keep the project on track, but some economic surprises did occur. Most significant was the surprise when the cost of following the strict HTML 1.0 standard was summed up.
This standard had not previously been followed, and the budget for the HTML development was exceeded without adding significant quality to the usability of the solution. Furthermore, the customer wanted to pay neither for a thorough analysis of the target group, i.e. department store customers, nor for a final user test. These cost savings watered down the user-centred process.

3.1 Cross-Analyzing the Cases

Our data suggests three different approaches across the cases, which we use as starting points for analyzing and comparing the cases. Each approach seems to have, or could have, a significant impact on the usability of the end product. The approaches are: (1) The existence or development of living visions or organizational goals in the organizations. (2) The technology used to implement the system and the technical context in which it was implemented. (3) The shaping of the software development process.

Approach 1: The existence or development of living visions or organizational goals in the organizations. All three cases were influenced by visions or organizational goals, but the effect of these was very different. In case 1, two main goals were important. (1) Tying customers closer to the company by selling products from more than one branch of the company. This goal was pursued by making the tool customer-centred and by making it easy to refer customers to other branches. (2) Following best practices when selling insurance products. This was done by never leaving customers with obvious needs that were not treated in the sales process. The treatment was documented in the printed policy
and signed by the customer. This was done to harmonize the expectations between the customers and the company and thereby avoid disappointed and complaining customers when a possible insurance event happened. The redesign of the printed policies introduced a problem with the clarity of the policy, since a normal policy that was handed over and signed by the customer ran to about 18-20 pages. Since the old tool produced a three-page policy, this change directly influenced the sales process. In case 2 there was a clear vision about modernizing and opening the organization to make it more attractive to new or younger members. Modernizing included revising the administrative processes in order to save money and strengthen the campaign machinery. For example, the new platform included a web-based publication module where members could create folders and posters from a set of templates and send them directly to the printing house without dealing with colour formats and other technical issues. One key to opening the organization was the design of an individual entry page called ‘my page’. My page was meant to give the members easy access to discussion boards, mailing lists, and relevant homepages, but the page suffered from a lack of user interface usability. It provided too much information and was difficult to use. This problem could be explained by a significant disagreement among the stakeholders about its purpose, functionality and design. In case 3 the buyer had a set of visions that was not clearly absorbed by the project team, and some of the project members expressed doubts about how realistic it was to fulfil the visions. The website was meant to inspire customers and attract them to the physical department stores, and to help build and maintain customer loyalty. Two means supported this.
First, the company developed an electronic and physical fashion magazine, which included various articles about fashion, showed various shopping articles, and linked to other text articles on the website. Second, the buyer introduced a special search concept. When customers searched for an article or a brand, the search result displayed the various available articles of that brand and where to find them physically in the department stores. Based on the three cases, we observe that fulfilling visions and goals in a project is strongly influenced by organizational usability. In all three cases the systems were important tools for creating loyalty or solidarity, but different approaches were chosen. In case 1 the utilization of the visions grew out of the comprehensive involvement of the various stakeholders, through workshops and formal or informal evaluations. In case 2 the design of the contract was an important factor in letting the understanding of the organizational usability develop, while the design and redesign of business processes were an important tool for its realization. The small project team with tightly cooperating members was well qualified for the job. In case 3 only one or a few key persons from the customer’s organisation understood the concept that was implemented, and they did not succeed in making the solution an integral part of the organisation. Furthermore, our data suggests that successful realization of visions and goals depends on a thorough and coherent understanding of the users and the situation of use. Thus, inadequate user interface usability constitutes a significant risk of not fulfilling the visions and goals.

Approach 2: The technology used to implement the system and the technical context in which it is implemented. All three cases relied on web technology and were dependent on the technical context, but the technical impact on usability was very
different. One important commonality across the cases was the centralized architecture, which made it easy and relatively inexpensive to fix errors and ‘roll out’ new corrected versions of the software. Compared to traditional software development, the test efforts were reduced because problems were easy to fix. In cases 1 and 2 less attention was directed at the deliveries when they were first put into use, and the organizations thereby failed to profit fully from the centralized architecture. The tool in case 1 was a Java application running on a number of Citrix servers, accessed through a traditional wired network or a high-speed mobile phone connection. At an early workshop the users were asked “What can we do to make your everyday better?” This provided important information about possible improvements of the tool, such as how online access to the national civil registration register could help the users form the household quickly and correctly while visiting the customers. The online abilities also made data validation possible through integration with the back-end systems. This drastically reduced the number of errors that required intervention from other employees after the sales were finalized. The wireless setup had a major performance problem: it took up to 17 minutes to print the policy, which preferably should be signed by the customer during the visit. Case 2 relied on a component-based, service-oriented architecture. This architecture made the solution extremely flexible to expand and modify and supported fast adaptation to changes in the short-term goals of the organization. For example, components of the existing infrastructure were easily integrated into the new solution, which made the solution usable from an early stage in the overall development process, and the ability to adapt quickly to changing goals proved very useful when internal and general elections were announced. Case 3 took the most conservative approach to technology.
The customer’s main focus was on getting a stable solution, which they got. The contractor put a lot of effort into delivering a strict html 1.0 compliant solution. This had no clear influence on the usability of the end product, but increased the cost of the solution significantly. Integration of the web-shop with the existing enterprise resource planning system was a major issue, which was postponed because the customer’s IT department lacked resources to assist with this work. This left the administrative and logistic processes to be carried out more or less manually and thereby exposed to human error. This caused concern among the stakeholders and would have been a major problem in the organization had the web-shop been a large success.

The technological comparison suggests a number of things. First, the ability to integrate with other systems can have a huge effect on both user interface usability and organizational usability, and failing to integrate can have severe consequences for the organization. Our data suggest that successful integration depends on continuously bringing experts together. Second, discovering and utilizing the technological abilities can be a learning process that needs space and time. Relying on well-known technology and solution patterns reduces the risk of technical issues, but might also reduce innovation in the solution and in the organization, which can reduce both the user interface usability and the organizational usability. New technology can be used to evolve usability and increase the usefulness of the end product, but with a greater risk. Third, relying on specific technology and standards can introduce limitations, formal and informal. This can be a reasonable overall decision, but the consequences for usability are hard to anticipate.
Usability and Software Development: Roles of the Stakeholders
Approach 3: The shaping of the software development process. In our three cases we see three very different software development processes. Case 1 relied on a human centred development process. The team aimed at putting the customer at the centre of the tool. All possible stakeholders within the company were involved, and anyone on the team was entitled to have an opinion and share it. Occasionally this made the process very time consuming and demanding to handle. The result of the development process was a solid all-round sales tool, where different orientations of usability were considered. Neither the user interfaces nor the processes were optimized, but both were designed well. Through a number of iterations involving various users, most parts of the user interface were tested before the final user tests.

Case 2 was a business process centred development process. The main focus was on identifying important business processes, describing the processes in detail, identifying stakeholders in the processes, and then implementing the processes. All main design activities started with drawing up and analyzing the involved processes, and the project organization saw it as their main task to “add electric current to the business processes”. The positive outcome of the process-oriented development was a system that supported a variety of processes in the organization and was well integrated with existing and new processes and components. However, it also resulted in a non-optimized user interface with serious flaws.

Case 3 had a user centred development process as its starting point. The user centred process was reduced due to economic limitations, since the customer did not want to pay for a target group analysis or a user test. This decision went against the contractor’s advice. In the development process, focus was on the front-end of the system, and the back-end was only minimally adjusted to the customer’s organization.
The customer took only a minimal part in the development project, and although the contractor paid some attention to the organizational issues, the integration with the existing business did not work well and introduced a serious risk to the project.

The comparison of the three different development processes suggests two main issues regarding usability. First, a process-oriented approach favours organizational usability, while a user centred approach, mainly considering direct users, favours user interface usability. The human centred approach of case 1, aiming at considering all possible stakeholders, places itself in between by promoting both organizational usability and user interface usability. Second, the human centred approach required lots of resources because of the broad discussions, which were deliberately avoided in cases 2 and 3. In both cases 2 and 3 the project managers were clearly aware of the risk of overloading the project and refrained from involving users in specific situations, while the project manager in case 1 aimed at ensuring that ‘the user involvement did not get out of hand’.
4 Discussion

We discuss possible means to improve integration of usability work and software development based on the three approaches.

Approach 1: The existence or development of living visions or organizational goals in the organizations. We find that the main issues regarding this approach are: (1) How is a living vision established, evolved, and maintained throughout the
development process? (2) How are visions and goals transformed into concrete and usable systems design? (3) How is usability of the systems design evaluated together with the visions? Participatory IT Design [2] and Contextual Design [1] suggest how to develop and utilize visions in systems design, but how to evolve, maintain and evaluate the vision and goals is not discussed. In our cases the visions and goals are initially anchored among the non-technical stakeholders, and it becomes their task as vision carriers to maintain and propagate the visions to the entire set of stakeholders, and particularly to anchor the visions and keep them alive together with the key technical stakeholders. This is for example carried out through workshops, and workshops are also used as a place where visions and goals can inform the concrete systems design. Cases 1 and 2 include a number of critical decision points where intervention by the vision carrying stakeholders was necessary to retain focus on the overall project goals, also in situations where fast and comprehensive reordering of priorities was urgent. We do not see this issue discussed in either the usability literature or the software engineering literature. Since goals and visions seem to have great influence on organizational usability, an iterative process with evaluations and redesigns taking shape in accordance with visions might be a way to better support organizational usability and thereby to better realize the full potential of the solution.

Approach 2: The technology used to implement the system and the technical context in which it was implemented. We find that the main issues regarding this approach are: (1) How do we best realize the technological possibilities regarding usability? (2) How do we visualize and evaluate the consequences of the technological choices regarding usability? (3) How do we evaluate the technical implementation regarding usability before it is too late?
Both Participatory IT Design [2] and Contextual Design [1] suggest that technology and the technical context are important when planning and designing new IT-systems, but the need for ongoing evaluation during development is not covered. Our cases show that key stakeholders are aware of how technology can support usability work, for example by making it easy and inexpensive to update web-based software on central servers, which should make it possible to fix a number of usability issues at a reasonable cost. Unfortunately, our data also show that this possibility is not properly utilized, since focus shifts to other important tasks, even though an insufficient or even defective system is put into use. Furthermore, it might be more difficult than anticipated to upgrade the systems after a large number of users have taken the system into use. We also observe how rigidly relying on standards can introduce new risks if the standards are not necessary and coherent with the visions. Adhering to standards can place considerable demands on scarce resources and remove focus from more critical issues.

Approach 3: The shaping of the software development process. We find that the main issues regarding this approach are: (1) How is the development process organized? (2) How do the stakeholders stay engaged in the development process? (3) What tools are advantageous and profitable to apply? We have not yet seen a process taking both organizational usability and user interface usability into account in a controlled and efficient manner. This applies to both the involvement of stakeholders and the use of methods and techniques. So far, methods and techniques in HCI primarily back user interface oriented usability. This is visible for instance in the many evaluation techniques such as Heuristic Evaluation, Cognitive Walkthrough and
Think Aloud Tests. Techniques for uncovering organizational usability issues are far fewer and less commonly used [6].
5 Conclusions

The study reports on three interview-based case studies of software development projects in which important web-based applications were implemented. We have aimed at describing different stakeholders’ contributions through cross analysis of the development projects. In all three cases the stakeholders appear as individuals without an archetypical role. They all have positions, interests, and competences that make them important individual contributors. The cases show how end product usability depends on various factors in the software development project, such as the presence of living visions, the technological choices, and the applied software development processes. Important usability contributors are found both at the user interface usability level and at the organizational level. While many techniques for developing user interface usability are employed, techniques to support the uncovering of organizational usability are lacking. Particularly important are the vision carriers, who are able to keep the project on track with clear focus on the organizational usability issues when plans have to be adjusted. Descriptions of work practices and techniques supporting this task are rare, both in human computer interaction and software engineering.

Acknowledgments. This work is part of the USE-project (Usability Evaluation & Software Design) funded by the Danish Research Agency through the NABIIT Programme Committee (Grant no. 2106-04-0022).
References

1. Beyer, H., Holtzblatt, K.: Contextual Design. Morgan Kaufmann, San Francisco (1998)
2. Bødker, K., Kensing, F., Simonsen, J.: Participatory IT Design. The MIT Press, Cambridge, Massachusetts (2004)
3. Juristo, N., Windl, H., Constantine, L.: Introducing usability. IEEE Software, 20–21 (2001)
4. Landauer, T.K.: The trouble with computers. MIT Press, Cambridge, MA (1995)
5. Strauss, A., Corbin, J.: Basics of Qualitative Research. SAGE Publications, Thousand Oaks, CA (1998)
6. Vredenburgh, K., Mao, J., Smith, P., Carey, T.: A survey of user-centered design practice. In: Proc. CHI 2002, Minneapolis, Minnesota, USA (2002)
7. Wixon, D.: Evaluating usability methods: Why the Current Literature fails the Practitioner. interactions 10(4), 29–34 (2003)
Human Performance Model and Evaluation of PBUI

Naoki Urano1 and Kazunari Morimoto2

1 SHARP Corporation, Nagaikecho 22-22, Abeno-ku, Osaka-shi, Osaka 545-8522, Japan
[email protected]
2 Graduate School of Science and Technology, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto-shi, Kyoto 606-8585, Japan
[email protected]
Abstract. We analyze and discuss a human performance model for PBUI (Push-Based User Interface) in this paper. PBUI is a user interface method in which a user performs a desired task by selecting a target object that usually represents the task itself. The candidate objects are sequentially and automatically presented to the user by the system. When a target object is presented, the user selects it by a simple action such as pushing a button. In this paper, we propose a human performance model of PBUI and discuss the characteristics of PBUI. We also evaluate the performance of PBUI by comparing it with GUI.

Keywords: user interface model, performance model, PBUI (push-based user interface).
for ordinary PC users to manipulate such scroll bars, but it is often hard for non-computer users, beginners, novice users, naïve users, or disabled users [2]. The following cases make GUI difficult to use.
1. Users do not know how to use a graphical pointer and manipulate GUI elements to perform a task.
2. Users have difficulty using a graphical pointer or cannot manipulate GUI elements smoothly to perform a task.
3. Users are in a limited environment in which they cannot use a graphical pointer to manipulate GUI elements.
In case 1, the users are usually beginners, novice users, or naïve users. Note that there are many non-computer users in the world. They usually have no opportunity to take lessons on how to use computer devices. Hence, it is sometimes difficult for them to manipulate computer devices actively. However, it is common for them to watch a display screen and react to it, as when watching TV. Therefore, if we provide a passive user interface, meaning one with fewer active operations, they might be able to use a computer more effectively. In case 2, the users are physically handicapped and have difficulty using graphical input devices freely. If we provide a user interface that incorporates a simpler input device, they can use a computer more smoothly. In case 3, the users cannot use graphical input devices because of their situation. It is sometimes difficult to manipulate windows or widgets in a limited environment. For example, your hands are not free to operate complex input devices when you are driving an automobile. In the above cases, a user interface that is simpler and easier than GUI helps users perform tasks that they cannot perform easily in GUI.

We propose the push based user interface (PBUI). It has the following characteristics.
1. Simpler input device
2. Fewer input operations
3. Passive user interface
Simple input devices with fewer input operations are important for beginners or novice users. In PBUI, the user performs an arbitrary task by just pushing a button to select a target object presented by the system. We call it a one-button interface. The target object is usually represented as an icon, image, or graphic whose meaning the user can easily recognize. The passive user interface is an important feature of PBUI. It guides the user to a designated task without any active input from the user. The interface prompts the user to select the target object, representing a task, that the system presents. In other words, the system pushes a suggestion to the user. That is why we call it push based user interface. In most GUIs, users need to manipulate an input device actively. Users move the pointer to the menu bar, open menus, point at the chosen menu item, and click to perform a task. Unless the user takes some action, the system does not change its state and the display does not change at all. In PBUI, users do not need to act first. Instead, they wait for the chosen menu or object to be presented to them. Users just select it by pushing the button when the designated
menu or object is displayed. The system changes the display to guide the user to perform a task. In the above photo image searching application, PBUI displays the photo images (i.e. candidate objects) sequentially, rather than displaying a group of photo images and waiting for the user's active input of pointing, dragging, scrolling, picking, etc. In PBUI, users just wait until a desired photo image (i.e. target object) is displayed. It is unnecessary to know how to use GUI window manipulators such as scroll bars. Rather than the user manipulating windows, the candidate objects change automatically. The user selects the target object when it is displayed. In this paper we present a human performance model for PBUI, identify important factors of PBUI, and discuss the following issues.
1. What is the right duration time for displaying candidate objects?
2. What is the right number of candidate objects to be displayed simultaneously to the user?
3. Is it affected by the complexity of objects?
4. What is an effective way to display candidate objects to the user?
5. Performance compared with GUI.
2 PBUI Human Performance Model

There are many possible ways to display candidate objects. The performance of PBUI depends strongly on how the objects are displayed to the user. If objects are displayed one by one, it is easy for the user to recognize and select each one. However, it takes a long time to reach the target object, so one-by-one presentation is very inefficient. On the other hand, if many candidate objects are displayed at once, it is very difficult for the user to find the right target among the many candidates, and the user often misses the target even though it has been displayed. To maximize efficiency, we need a human performance model for PBUI. Based on the model human processor [3], the total time to perform a task consists of perception time, cognition time, and motion time. The task operation is processed as follows.
1. Visual stimulus by displaying candidate objects.
2. Perception: Perceive the candidate objects.
3. Cognition: Recognize the target object.
4. Motion: Push a button to select the object.
The displaying time of an object is determined by the perception process, the cognition process and the motor process. Perception time, $T_{\text{perception}}$, is the time the user needs to perceive that candidate objects are displayed. Cognition time, $T_{\text{cognition}}$, is the time the user needs to recognize the meaning of an object. Motion time, $T_{\text{motion}}$, is the time the user needs to react to the target object for selection. Thus the total time, $T_{\text{total}}$, needed for the user to decide on selecting the target object is the sum of these times, as represented in equation (1).

$$T_{\text{total}} = T_{\text{perception}} + T_{\text{cognition}} + T_{\text{motion}} \qquad (1)$$
It is important for an application using PBUI to choose an appropriate duration for displaying candidate objects. We assume that the perception time and the motion time are constant for a particular PBUI system. The display duration should therefore be designed carefully around the cognition time to maximize human performance.
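As a worked illustration of equation (1), the sketch below simply sums the three stage times. The nominal values of 100 ms (perception), 70 ms (cognition) and 70 ms (motion) are the cycle times commonly quoted for the model human processor [3]; they are an assumption made here for illustration, not values measured in this study, and the function name is hypothetical.

```python
def total_time_ms(perception_ms, cognition_ms, motion_ms):
    """Equation (1): T_total = T_perception + T_cognition + T_motion."""
    return perception_ms + cognition_ms + motion_ms


# Nominal model human processor cycle times (assumed, not measured here).
nominal_total = total_time_ms(perception_ms=100, cognition_ms=70, motion_ms=70)
print(nominal_total)  # 240
```

Because the perception and motion terms are treated as constants for a given system, only the cognition argument would change from one kind of object to another in such a calculation.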
3 Experiments

3.1 Displaying Time

In this section, we investigate a suitable duration for displaying candidate objects. To find a reasonable displaying time for an application, we need to know how long it takes to process a candidate object. We have reported related experiments previously [4]. We used the simple objects shown in figure 1 for the experiments.
Fig. 1. Simple objects used in the experiments
Experiments were carried out as follows.
1. A target object is presented to the test subject, who remembers it.
2. Candidate objects are randomly displayed in front of the test subject.
3. If the target object is displayed, the test subject selects it by moving a finger from the home position to it. If the target object is not displayed, he or she just releases the finger from the home position to move to the next display.
The average time to process a candidate object was 0.42 sec. The average time to process a target object, i.e. when the presented candidate object was the target object, was 0.61 sec. We ran the experiment for 8, 16, and 24 objects. In the 8 object case, 8 objects are displayed simultaneously in a 2 x 4 layout (i.e. 2 rows and 4 columns). Correspondingly, candidate objects are displayed in a 4 x 4 layout for 16 objects and a 4 x 6 layout for 24 objects. More time is needed to process multiple candidate objects on the same display. Figure 2 shows the results. When the target object is on the display screen, the processing time is shorter than when it is not, because the test subject does not need to examine all the candidate objects. Equation (2) shows the elements of the time required for the user to process one display.

$$T_{\text{process}} = T_{\text{perception}} + n \cdot T_{\text{cognition}} + T_{\text{eye}}(n) + T_{\text{motion}} \qquad (2)$$

$n$: number of objects displayed simultaneously
$T_{\text{perception}}$: time required to perceive a display change
$T_{\text{cognition}}$: time required to recognize whether a candidate object is the target or not
$T_{\text{eye}}(n)$: time required to move the eyes from one candidate object to another
$T_{\text{motion}}$: time required to select the target or to move to the next display
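Equation (2) can be sketched as a small function that returns the processing time of one display. The text does not give the form of the eye movement term, so the version below assumes, purely for illustration, that it grows linearly with the number of eye movements (n - 1). All numeric values and function names are hypothetical placeholders rather than measurements from the experiments.

```python
def eye_movement_time(n, per_move=0.05):
    """Hypothetical T_eye(n): n - 1 eye movements of per_move seconds each."""
    return max(n - 1, 0) * per_move


def process_time(n, t_perception, t_cognition, t_motion, t_eye=eye_movement_time):
    """Equation (2): time in seconds to process one display of n candidate objects."""
    return t_perception + n * t_cognition + t_eye(n) + t_motion


# Illustrative parameter values in seconds (assumed, not taken from the paper).
params = dict(t_perception=0.1, t_cognition=0.2, t_motion=0.1)
for n in (8, 16, 24):
    print(n, round(process_time(n, **params), 2))
```

A system designer could call such a function with values obtained from a preliminary experiment to set how long each display stays on screen.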
[Figure 2: plot of displaying time (sec) against the number of displaying objects, with one series for the target object case and one for the no target object case]
Fig. 2. Displaying time for different number of objects
3.2 Appropriate Number of Objects

Based on the experiment and equation explained in the previous section, we designed an experiment to investigate the optimum number of simultaneously displayed objects. In this experiment, we measured the total time to perform the task for each number of simultaneously displayed objects. The test subject processes all candidate objects to select the target object. The total time required for the task is represented in equation (3).

$$T_{\text{task}} = \frac{N}{n}\left(T_{\text{perception}} + n \cdot T_{\text{cognition}} + T_{\text{eye}}(n) + T_{\text{motion}}\right) \qquad (3)$$

$N$: total number of candidate objects

[Figure 3: plot of task time (sec, axis from 0 to 100) against the number of displaying objects (1, 8, 16, 24)]
Fig. 3. Task time (total time to perform a task) for different number of displaying objects
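The trade-off expressed by equation (3) can be explored with a short sketch: the per-display overhead (perception and motion) is paid N/n times, while the cognition and eye movement terms grow with n. The eye movement term is assumed here to grow quadratically with n, which is only an assumption chosen so that the sketch shows an interior minimum; the parameter values and function names are hypothetical and are not fitted to the experimental data.

```python
def eye_time(n, coeff=0.004):
    """Hypothetical T_eye(n) in seconds; quadratic growth is an assumption."""
    return coeff * n * n


def task_time(total, n, t_perception=0.2, t_cognition=0.1, t_motion=0.2,
              t_eye=eye_time):
    """Equation (3): (N / n) * (T_perception + n*T_cognition + T_eye(n) + T_motion)."""
    per_display = t_perception + n * t_cognition + t_eye(n) + t_motion
    return (total / n) * per_display


N = 100  # total number of candidate objects, as in the experiment
times = {n: task_time(N, n) for n in (1, 8, 16, 24)}
best_n = min(times, key=times.get)
for n, t in times.items():
    print(f"n={n:2d}: {t:6.2f} s")
print("smallest modelled task time at n =", best_n)
```

With an eye movement term that is merely proportional to n, the modelled task time would keep decreasing as n grows, so in this sketch it is the superlinear term that produces a minimum at an intermediate n. Whether eye movement time actually behaves this way is not established by the text.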
In this experiment, the total number of candidate objects is 100. There were 6 test subjects. Each test subject had 80 trials for each number of displayed objects. The average times required for the task are 56.05 sec for 1 displayed object, 11.42 sec for 8 displayed objects, 16.32 sec for 16 displayed objects, and 15.06 sec for 24 displayed objects, as depicted in figure 3. The results show that the optimum number of simultaneously displayed objects is between 8 and 16.

3.3 Complexity of Objects

We discuss how the complexity of objects affects the number of displayed objects that yields good performance. We have reported related experiments previously [5][6]. We used similar objects, depicted in figure 4, to test the performance. The complexity was set based on the number of vectors included in a picture. In figure 4, the complexity of the picture increases from left to right.
[Figure 4 panels, left to right: Complexity 1, Complexity 2, Complexity 3]
Fig. 4. Sample objects with different complexity
Experiments showed that, as expected, the user needs more time to recognize an object if its complexity is high. However, the complexity does not affect the optimum number of simultaneously displayed objects. If the optimum number is n, displaying n objects is appropriate for any complexity of objects. The area that an object occupies on the screen might affect the recognition time. We suggest that the object should be large enough for the user to recognize.

3.4 Effective Way to Display and Performance Compared with GUI

In this section we discuss what is an effective way to display objects in PBUI. We suggest two types of PBUI for comparison: an automatic paging user interface and an automatic scrolling user interface. The display designs are depicted in figure 5. Each circle represents a position where a candidate object, such as a graphical object from figure 4, is placed. The automatic paging system was tested as follows.
1. When the user’s finger is placed on the home position, 12 candidate objects are displayed for the displaying duration time. The duration is derived from equation 2, based on a preliminary experiment to determine suitable values of each element of the equation for the objects.
2. After displaying the page for that time, the system automatically displays the next page, containing the next 12 candidate objects.
[Figure 5 panels: Automatic Paging, Automatic Scrolling; labels shown: Object, Home]
Fig. 5. Display designs of PBUI
3. When the test subject finds the target object, he or she moves the finger to the target object to select it.
4. The total time to perform the task is measured.
In the automatic scrolling system, candidate objects are presented to the user in a different way. Instead of changing the whole page at once, the objects move smoothly to the right. Each object is displayed for a fixed time. The method of selecting the target object is the same. Table 1 shows the average time to perform a task in which the target object appears on the tenth page or equivalent. Automatic paging and automatic scrolling showed about equal performance at every complexity. We do not conclude that one PBUI is better than the other, but it was reported that the accuracy of selecting the right target differs [6]: the accuracy of automatic scrolling is better than that of automatic paging. We suppose that, in automatic paging, the user has to move the eyes actively from one object to the next within a page. The user may easily miss the target object because all objects in the page must be scanned within the time set by the system. In automatic scrolling, rather than actively moving the eyes, the user can keep the eyes on roughly the same vertical line to scan all candidate objects. It is easy for the user to scan all objects, so the user seldom misses the target object. We think that the accuracy difference comes from the user’s scanning ability, and thus that automatic scrolling is suitable for naïve users. This is consistent with our assumption that PBUI is a user interface for beginners, novice users, or naïve users.

Table 1. The results of task performance for PBUI on different complexity of objects
Fig. 6. Graph of task performance for PBUI and GUI
3.5 Performance Compared with GUI

We provided two pilot GUIs for comparison. One is a manual paging user interface, and the other is a manual scrolling user interface. The display designs are depicted in figure 7. In the manual paging user interface, the test subject needs to move the finger to the arrow to go to the next page, which contains the next 12 candidate objects. In the manual scrolling user interface, the test subject needs to drag the scroll bar to navigate in the window and find the target object. Table 2 shows the average times.
[Figure 7 panels: Manual Paging, Manual Scrolling; labels shown: Object, Next Page, Home, Scroll Bar]
Fig. 7. Display designs of GUI
Table 2. The results of task performance for GUI on different complexity of objects
The performance of the two PBUI designs, automatic paging and automatic scrolling, is about equal to that of the manual paging user interface. These three show better performance than the manual scrolling user interface, which is widely used in GUI photo applications.
4 Conclusions

This paper explains the characteristics of PBUI and suggests some important factors of PBUI. PBUI is an alternative user interface for the users discussed in the introduction. We present the human performance model as an equation. Based on the human performance model, we discussed important factors of PBUI: the duration for displaying objects, the number of objects to be displayed simultaneously, the complexity of objects, and the displaying method. We summarize our answers to the issues raised in this paper as follows.
1. What is the right duration time for displaying candidate objects? The duration time should be derived from equation 2.
2. What is the right number of candidate objects to be displayed simultaneously to the user? For a simple application like an image exploring application, the number is between 8 and 16.
3. Is the number of simultaneously displayed candidate objects affected by the complexity of objects? The experiments show that it is independent of the complexity of objects.
4. What is an effective way to display candidate objects to the user? There are many ways to display candidate objects. The automatic scrolling user interface is a typical PBUI in which users do not need to scan the objects actively.
5. Performance compared with GUI. It depends on the application. If the application is very simple, like an image exploring application, PBUI performs as well as or better than GUI.
It is important to find suitable applications for PBUI. Demonstrating that PBUI is an effective user interface for real applications is left for future work.
References

1. Margone, S., Shneiderman, B. (eds.): A study of file manipulation by novices using commands versus direct manipulation. Twenty-sixth Annual Technical Symposium, pp. 154–159. ACM, Washington DC (1987)
2. Maulsby, D.L., Witten, I.H.: Inducing programs in a direct manipulation environment. In: Proc. CHI’89 Conference, Human Factors in Computing Systems, pp. 57–62. ACM, New York (1989)
3. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (1983)
4. Takekuni, T., Urano, N., Morimoto, K., Kurokawa, T.: Proposal of Push-based User Interface and its Operating Characteristics. In: 2003 Japan Ergonomics Society Kansai branch conference proceedings, pp. 146–149 (2003)
5. Takekuni, T., Urano, N., Morimoto, K., Kurokawa, T.: A Study on Number of Objects in Push-Based User Interface. In: Human interface symposium proceedings, pp. 109–112 (2004)
6. Li, Q., Urano, N., Morimoto, K., Kurokawa, T.: A study of the visual Push-Based User Interface that considers practicality. In: The 7th. human media workshop proceedings (2006)
Developing Instrument for Handset Usability Evaluation: A Survey Study

Ting Zhang, Pei-Luen Patrick Rau, and Gavriel Salvendy

Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
[email protected]
Abstract. The handset is transforming from a traditional cellular phone into an integrated content delivery platform for communications, entertainment and commerce. Its increasing capabilities and value-added features provide more utility and, at the same time, make the design more complicated and the device more difficult to use. An online survey was conducted to measure users’ perspective on the usability level of their current handset using a psychometric type of instrument. A total of 9 usability factors were derived from the results of exploratory factor analysis. These 9 factors explained 65.20% of the overall variance of the data. The average internal consistency in this study is 0.70.

Keywords: Handset; Usability; Usability measurements; Usability factors; Instrument; Survey.
end-users need to be considered when measuring the usability of handsets. Survey research allows large amounts of data to be gathered with relatively little effort and supports broad generalization of results [7]. 2) Mobility is quite difficult to simulate in a laboratory setting because of the changing context. The use of such devices in the context of doing other work also has implications for determining the context of use for usability testing [8]. 3) As an inquiry method, the questionnaire survey plays a major role in subjective measurements. However, past reviews of research have indicated a lack of survey studies and psychometric instruments that simultaneously measure multiple key concepts in the quality of experience in software systems [7, 9]. Furthermore, in many cases, the questions of standardized instruments are not specific enough to investigate handsets [10]. To fill this gap, the present study develops a usability instrument comprising specific design elements and structured usability factors unique to handset devices.

The objective of this study is to develop an instrument to measure the perceived usability of handset products. The study focuses on two research questions: (1) What are the most important usability factors for indicating the overall perceived usability of a handset? (2) How do the factors contribute to the overall perceived usability of a handset? The approach used in this study is expected to provide an innovative and systematic methodology for explaining and measuring the usability of handsets.
2 Literature Review

2.1 Usability of Handset

Many studies have focused on usability testing of individual handset design issues such as keystrokes [11, 12], content presentation [13, 14], battery duration [15], and menu structure [16, 17]. However, few of them contribute to identifying usability dimensions and design factors across multiple features with regard to users’ subjective feelings about handset use. Chuang, Chang, & Hsu [18] examined the relationship between users’ preferences for mobile phones and the phones’ form (hardware) design elements. Participants were asked to judge 26 mobile phone designs using a user preference rating scale for 11 image words. Han et al. [1] measured 88 specific design elements for 36 products using a measurement checklist. Another related study investigated the relationships among the design features of cell phones according to 1,006 college students’ preference ratings of their current cell phones [19]. According to the results, 5 design issues that significantly affect users’ overall satisfaction were identified: calling-related features, personal preferences, portability, durability, and aesthetics.

All of those studies were based on currently available cell phone models and focused on frequently used features, so their results can only tell us which of the current designs is best. However, mobile technology is developing very quickly. It is necessary to consider advanced features and functions that are uncommon now but will probably be popular soon. In further research, Ling, Hwang, & Salvendy [20] investigated the relationship between five advanced features and users’ preference
T. Zhang, P.-L.P. Rau, and G. Salvendy
level. The results showed that a color screen, voice-activated dialing, and Internet browsing can strongly predict users’ satisfaction levels. We must be aware that advanced features change quickly, which makes usability research difficult. For instance, the color screen and camera function may have been attractive features a few years ago. Today most cell phones have a color screen, and the camera function has also become a must-have feature for most mobile phone users.

2.2 Handset Usability Dimensions/Factors

Menu structure and user interfaces are critical design features related to basic communication functions: text entry, dialing and messaging, calendar, etc. The usability of those basic features can strongly influence users’ overall satisfaction with the product, so many usability criteria have been studied for them. Efficiency, effectiveness, simplicity (complexity), learnability, consistency, feedback, and memorability are essential and often addressed [21-26]. Ziefle [26] also indicated that predictability, familiarity, and generalizability (transfer of knowledge from a specific interaction to similar ones) are crucial user criteria for selecting mobile phones. Ji, Park, Lee, & Yun [27] developed a usability checklist for mobile phone user interfaces, consisting of five groups of usability factors based on 21 usability principles.

The efficient retrieval of information includes the organization of information, the method of accessing it, and the form of delivery. Information must be presented so that it is both easy to obtain and easy to integrate mentally. This is because performing navigation tasks on a handset places heavy cognitive demands on the user’s short-term memory, owing to the limited screen size, scrolling capabilities, and slower processing [21].
Research on user performance in information retrieval largely focuses on efficiency (speed and errors) and effectiveness (quality of outputs) [16, 22, 28]. Providing direct access to focused, valuable content and using simple hierarchies increases efficiency and reduces keystrokes and text entry [16]. Another approach to efficient information retrieval is described in [29], which provides a mobile phone feature named two-phase fetch for retrieving mail via Internet services. Other critical usability factors include network connectivity, flexibility [30, 31], and personalization [32].

The emerging mobile commerce (m-commerce) technology promises exciting possibilities, but the user experience and acceptance of this technology are not yet well understood. Bruner & Kumar [33] applied the technology acceptance model (TAM) to the consumer context and found that the “fun” attribute contributes to consumer adoption of handheld Internet devices even more than perceived usefulness does. It has been strongly suggested that personalization is essential to creating a positive mobile experience [34, 35]. Tarasewich [36] suggested that the context of use and security must be taken into account in the design and use of mobile commerce applications, which are affected by changing environmental conditions. Malloy, Varshney, & Snow [37] suggested that the reliability (dependability) of the wireless network infrastructure is necessary for the success of mobile commerce.
Koutsiouris, Vlachos, & Vrechopoulos [38] provided an evaluation framework tailored to mobile music services, which integrates the key variables (factors) involved in the user-mobile interaction process from both a business and a technical perspective.

Location-awareness provides mobile users with topical and personal content that may increase the appeal of mobile guides in different application fields. Based on the results of seven field studies, Kaasinen [39] identified utility and user trust as usability factors that strongly affect user acceptance of location-aware mobile guides. Fithian et al. [40] indicated that privacy would be the primary concern when using location-aware technologies, and that integration (with other functions, e.g. calling), understandability (of icons, labels, options, and how the application works), and feedback (on system status or confirmation of important actions) are crucial determinants of users’ performance and satisfaction with mobile location-aware applications. Beyond the technology aspects, Ciavarella & Paternò [41] addressed usability criteria for the graphical UI design of mobile guide applications with five concerns: web metaphor, navigation feedback, orientation support in the surrounding environment, minimal graphical interaction, and no redundancy in input commands. Howell, Love, & Turner [42] investigated the effect of interface metaphor and context of use (private/public) on the usability of a hierarchically structured speech-activated mobile city guide service. The results showed that visualization of the metaphor-based service significantly affected participants’ attitudes.
3 Survey Study

A psychometric instrument was developed to gather end-users’ subjective perceptions of the usability of their current handsets. First, handset usability dimensions were carefully selected, pruned, and integrated from various sources [3, 6, 27, 43, 44]. Then, initial items were generated from a series of published usability instruments [2, 45-51] and modified with particular consideration for the identified handset usability dimensions. The first version of the instrument consisted of 98 items. All 98 items were first examined by the author for correlation between items and centrality to the concept of usability. Critiques of the items were then obtained from three PhD students and three master’s students who were familiar with the research topic and with instrument design. Ten items were deleted, and wording was modified according to those critiques. The final version of the scale includes a list of 88 items, plus one global scale (item 89) measuring the perceived usability of the respondent’s current handset. The instrument asks respondents to indicate how strongly they agree or disagree with each item on a scale from 1 (strongly disagree) to 7 (strongly agree). The global scale can be used to analyze the criterion-related validity of the instrument. Not all questions were applicable to all handsets. If a question does not apply to the respondent’s handset, or the respondent perceives that it does not apply, the respondent marks “Not Applicable” (N/A) for that question.
The survey was administered online, using HTML forms, to a broad sample of individuals in China. Participants were recruited via personal contact, email, university BBS messages, blogs, and Web forum announcements. Demographic information including the user’s age, gender, job, and education level was collected. Experience with handsets and the manufacturer and model of the respondent’s current handset were also important to this study. On entering the survey website, respondents were first instructed to read a short introduction to handset devices and the general definition of usability. They were then asked to fill in a background questionnaire covering demographic information, experience with handsets, and the manufacturer and model of their current handset. After that, the list of items was presented.
4 Results Analysis

4.1 Respondents

A total of 408 users participated in the survey. Prior to the analysis, 42 cases were deleted because of incomplete or inconsistent responses or repeated submissions. In particular, it was assessed how many respondents fully agreed or fully disagreed with both items in each of two pairs of items that expressed opposite views (items 3 and 18, and items 37 and 46). Five respondents had fully agreed or fully disagreed in both pairs, and these 5 respondents were therefore also excluded from the later data analysis. After these procedures, a total of 361 valid cases (143 males and 218 females) remained for the analysis. Respondents averaged 24.9 years of age (SD = 3.10 years), 4.6 years of handset experience (SD = 1.53 years), and 2.9 years of experience with their current handset (SD = 1.34 years). More than 22 manufacturers were represented in the survey; the four most popular covered 71.2% of the sample: Nokia (34.9%), Motorola (15.8%), Samsung (13.3%), and Sony Ericsson (7.2%).

4.2 Factor Analysis

First, item response means were evaluated to determine whether a large percentage of participant responses created a “floor” or “ceiling” effect. Such an effect is observed when many of the individual scores lie at one or both extreme ends of the scale, suggesting that the scale may not have captured the actual variability in responses. All of the survey items had mean responses greater than 2.4 and less than 5.6; therefore no items were excluded because of “floor” or “ceiling” effects. A series of exploratory factor analyses was then run to identify the factor structure of the 88-item instrument. The sample data of 361 responses was examined using principal component factor analysis with the equamax rotation method.
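The two screening steps described above, excluding respondents who contradict themselves on the opposite-worded item pairs and checking item means for floor or ceiling effects, can be sketched as follows. This is a minimal illustration rather than the authors' procedure. The data layout (one dict per respondent, with N/A stored as `None`), the function names, and the 2.0/6.0 cut-offs are assumptions made for the example. The paper reports only that all item means fell between 2.4 and 5.6, not which cut-offs were applied.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # 1 = strongly disagree, 7 = strongly agree

# Item pairs that express opposite views, as reported in the text.
OPPOSITE_PAIRS = [(3, 18), (37, 46)]


def same_extreme(responses, a, b):
    """True if items a and b both received 1 or both received 7."""
    ra, rb = responses.get(a), responses.get(b)
    return ra is not None and ra == rb and ra in (SCALE_MIN, SCALE_MAX)


def is_inconsistent(responses):
    """Assumed reading of the text: the respondent fully agrees or fully
    disagrees with both opposite items in every listed pair."""
    return all(same_extreme(responses, a, b) for a, b in OPPOSITE_PAIRS)


def item_means(respondents):
    """Mean score per item, skipping N/A answers (stored as None)."""
    items = sorted({item for r in respondents for item in r})
    means = {}
    for item in items:
        scores = [r[item] for r in respondents if r.get(item) is not None]
        if scores:
            means[item] = mean(scores)
    return means


def floor_ceiling_items(means, low, high):
    """Items whose mean lies at or beyond the chosen cut-offs."""
    return sorted(i for i, m in means.items() if m <= low or m >= high)


# Made-up demonstration data, not survey data.
respondents = [
    {3: 7, 18: 7, 37: 1, 46: 1, 5: 4},     # contradicts itself in both pairs
    {3: 6, 18: 2, 37: 5, 46: 3, 5: None},  # item 5 marked N/A
    {3: 7, 18: 1, 37: 2, 46: 6, 5: 5},
]
valid = [r for r in respondents if not is_inconsistent(r)]
means = item_means(valid)
flagged = floor_ceiling_items(means, low=2.0, high=6.0)  # hypothetical cut-offs
```

Passing the cut-offs explicitly keeps the choice visible, since the source does not state a threshold.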
Item reduction was based on four commonly employed criteria: 1) eliminating items with factor loadings less than 0.50 on all factors or greater than 0.50 on two or more factors [52]; 2) eliminating single-item factors [53]; 3) the value of Cronbach’s alpha of each factor should not decrease substantially when the item within
that factor was dropped [54]; and 4) the derived structure should be simple and easy to interpret [55, 56]. As seen in Table 1, after an iterative sequence of factor analysis and item reduction, the final instrument consisted of 29 items. A total of 9 factors were derived, which explained 65.20% of the overall variance of handset usability. The first two factors, satisfaction (how satisfied the user is with the product and how much the user enjoys it) and controllability (the user’s ability to regulate, control, and operate the product), accounted for one third of the total variance of handset usability. The first five factors accounted for almost 50% of the total variance. The internal consistencies of the 9 factors ranged from 0.60 to 0.84, with an average of 0.70, indicating an acceptable level of internal consistency.

Table 1. Factors, eigenvalues, percentage of variance explained and internal consistencies
| Factor | Name | Initial eigenvalue | Total variance explained (%, cumulative) | No. of items | Internal consistency (Cronbach’s alpha) |
|---|---|---|---|---|---|
| 1 | Satisfaction | 6.5718 | 22.66 | 5 | 0.84 |
| 2 | Controllability | 2.9111 | 32.70 | 3 | 0.76 |
| 3 | Effectiveness | 1.8337 | 39.02 | 3 | 0.74 |
| 4 | Frustration | 1.4759 | 44.11 | 3 | 0.72 |
| 5 | Customizability | 1.4419 | 49.08 | 3 | 0.67 |
| 6 | Navigation | 1.3064 | 53.59 | 3 | 0.66 |
| 7 | Attractiveness | 1.2602 | 57.93 | 4 | 0.64 |
| 8 | Helpfulness | 1.0908 | 61.70 | 3 | 0.66 |
| 9 | Consistency | 1.0165 | 65.20 | 2 | 0.60 |
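The loading-based item-reduction rule and the summary columns of Table 1 can be illustrated with a short sketch. It is a sketch under stated assumptions, not the authors' analysis code. It assumes that the factor analysis was run on the correlation matrix of the 29 retained items, so that total variance equals the number of items; that loadings are compared by absolute value; and that a loading of exactly 0.50 counts as high. The function names and the small score matrix are hypothetical.

```python
from statistics import mean, variance


def keep_item(loadings, cutoff=0.50):
    """Criterion 1: keep an item only if it loads at or above the cutoff
    on exactly one factor (absolute values are an assumption)."""
    high = [x for x in loadings if abs(x) >= cutoff]
    return len(high) == 1


def cumulative_variance(eigenvalues, n_items):
    """Cumulative percentage of variance explained, assuming each
    standardized item contributes a variance of 1."""
    running, result = 0.0, []
    for ev in eigenvalues:
        running += ev
        result.append(100.0 * running / n_items)
    return result


def cronbach_alpha(rows):
    """Cronbach's alpha for one factor.
    rows: one list of item scores per respondent, no missing values."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Values copied from Table 1.
eigenvalues = [6.5718, 2.9111, 1.8337, 1.4759, 1.4419,
               1.3064, 1.2602, 1.0908, 1.0165]
reported_cumulative = [22.66, 32.70, 39.02, 44.11, 49.08,
                       53.59, 57.93, 61.70, 65.20]
alphas = [0.84, 0.76, 0.74, 0.72, 0.67, 0.66, 0.64, 0.66, 0.60]

computed = cumulative_variance(eigenvalues, n_items=29)
mean_alpha = round(mean(alphas), 2)

# Made-up scores for a two-item factor whose items agree perfectly.
demo_alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Under these assumptions the computed cumulative percentages agree with the Table 1 column to two decimal places, and the mean of the nine alphas rounds to the reported 0.70.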
5 Conclusion and Discussion

The proposed instrument for handset usability testing was partially derived from the experimental and theoretical base outlined in the published literature. The biggest difference between the proposed instrument and other published instruments is that its factors and items were selected with special consideration for handset characteristics. Its internal consistency in the present survey study is acceptable, but its construct validity and discriminant validity need further evaluation.

The contribution of this study is both theoretical and practical. Few studies focus on measuring subjective perceptions of handset usability. Most published instruments are limited to the traditional dimensions of software usability. Furthermore, the methodology for identifying relationships between usability factors and design features has not been systematically addressed. The results and methodology proposed in the present study address those gaps and contribute to the practice of handset design in industry. With increasing efforts on the side of
technology development, there is a lack of in-depth inquiry into the underlying phenomenon. The concept of mobility and mobile users are poorly understood. The approach used in this study is expected to provide an innovative and systematic methodology for explaining and measuring the usability of handsets. The survey results may help accumulate a base of knowledge on this topic and help designers recognize false assumptions and better ground their design choices.

The study was limited in several respects. First, the instrument was tested only in Chinese, which raises the problem of semantic validity under translation. Not all of the items were selected from established instruments, so content validity and criterion-based validity need to be tested in the future. Second, the survey sample is not large enough to support further statistical analysis. The nomological validity should be validated using structural equation modeling (SEM) in the future. Finally, the test-retest reliability of the instrument should be evaluated. Furthermore, because of the diversity of manufacturers and of models within each manufacturer, it is difficult to perform statistical analysis on individual models, owing to the small sample size for each model. Further investigation with usability experiments should be conducted to extract more specific design guidelines for improving specific features.
References

1. Han, S.H., et al.: Evaluation of product usability: development and validation of usability dimensions and design elements based on empirical models. International Journal of Industrial Ergonomics 26(4), 477–488 (2000)
2. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 13(3), 318–340 (1989)
3. Han, S.H., et al.: Usability of consumer electronic products. International Journal of Industrial Ergonomics 28(3-4), 143–151 (2001)
4. Jordan, P.W.: Human factors for pleasure in product use. Applied Ergonomics 29(1), 25–33 (1998)
5. Jokela, T., et al.: The standard of user-centered design and the standard definition of usability: Analyzing ISO 13407 against ISO 9241-11. In: Proceedings of the Latin American Conference on Human-Computer Interaction, Rio de Janeiro, Brazil. ACM Press, New York (2003)
6. Zhang, D., Adipat, B.: Challenges, methodologies, and issues in the usability testing of mobile applications. International Journal of Human-Computer Interaction 18(3), 293–308 (2005)
7. Kjeldskov, J., Graham, C.: A review of mobile HCI research methods. In: Chittaro, L. (ed.) Mobile HCI 2003. LNCS, vol. 2795, pp. 317–335. Springer, Heidelberg (2003)
8. Scholtz, J.: Usability evaluation (2004) [cited 2006 Oct. 27th]; Available from http://www.itl.nist.gov/iad/IADpapers/2004/Usability%20Evaluation_rev1.pdf
9. van Schaik, P., Ling, J.: Five psychometric scales for online measurement of the quality of human-computer interaction in Web sites. International Journal of Human-Computer Interaction 18(3), 309–322 (2005)
10. Lee, Y.S., et al.: Systematic evaluation methodology for cell phone user interfaces. Interacting with Computers 18(2), 304–325 (2006)
11. Klockar, T., et al.: Usability of mobile phones. In: Proceedings of the 19th International Symposium on Human Factors in Telecommunications, Berlin, Germany (2003)
12. Ziefle, M., Bay, S., Schwade, A.: On keys’ meanings and modes: The impact of different key solutions on children’s efficiency using a mobile phone. Behaviour & Information Technology 25(5), 413–431 (2006)
13. Bederson, B.B., et al.: A fisheye calendar interface for PDAs: Providing overviews for small displays. In: Proceedings of CHI’03 Conference on Human Factors in Computing Systems, ACM Press, Ft. Lauderdale, Florida, USA (2003)
14. Bederson, B.B., et al.: DateLens: a fisheye calendar interface for PDAs. ACM Transactions on Computer-Human Interaction (TOCHI) 11(1), 90–119 (2004)
15. Bloom, L., et al.: Investigating the relationship between battery life and user acceptance of dynamic, energy-aware interfaces on handhelds. In: Mobile Human-Computer Interaction - MobileHCI 2004, Proceedings, pp. 13–24 (2004)
16. Buchanan, G., et al.: Improving mobile internet usability. In: Proceedings of the 10th international conference on World Wide Web, ACM Press, Hong Kong (2001)
17. Ziefle, M., Bay, S.: Mental models of a cellular phone menu. Comparing older and younger novice users. In: Mobile Human-Computer Interaction - MobileHCI 2004, Proceedings, pp. 25–37 (2004)
18. Chuang, M.C., Chang, C.C., Hsu, S.H.: Perceptual factors underlying user preferences toward product form of mobile phones. International Journal of Industrial Ergonomics 27(4), 247–258 (2001)
19. Ling, C., Hwang, W., Salvendy, G.: A survey of what customers want in a cell phone design. Behaviour & Information Technology (2005)
20. Ling, C., Hwang, W., Salvendy, G.: Diversified users’ satisfaction with advanced mobile phone features. Universal Access in the Information Society 5(2), 239–249 (2006)
21. Albers, M.J., Kim, L.: User Web browsing characteristics using palm handhelds for information retrieval. In: Professional Communication Conference, 2000. Proceedings of 2000 Joint IEEE International and 18th Annual Conference on Computer Documentation (IPCC/SIGDOC 2000), IEEE, Cambridge, MA (2000)
22. Chittaro, L., Dal Cin, P.: Evaluating interface design choices on WAP phones: navigation and selection. Personal and Ubiquitous Computing 6(4), 237–244 (2002)
23. Christie, J., Klein, R.M., Watters, C.: A comparison of simple hierarchy and grid metaphors for option layouts on small-size screens. International Journal of Human-Computer Studies 60(5-6), 564–584 (2004)
24. Kjeldskov, J., Stage, J.: New techniques for usability evaluation of mobile systems. International Journal of Human-Computer Studies 60(5-6), 599–620 (2004)
25. Marila, J., Ronkainen, S.: Time-out in mobile text input: The effects of learning and feedback. In: Chittaro, L. (ed.) Mobile HCI 2003. LNCS, vol. 2795, pp. 91–103. Springer, Heidelberg (2003)
26. Ziefle, M.: The influence of user expertise and phone complexity on performance, ease of use and learnability of different mobile phones. Behaviour & Information Technology 21(5), 303–311 (2002)
27. Ji, Y.G., et al.: A usability checklist for the usability evaluation of mobile phone user interface. International Journal of Human-Computer Interaction 20(3), 207–231 (2006)
28. Jones, M., et al.: Improving Web interaction on small displays. Computer Networks 31(11-16), 1129–1137 (1999)
29. Rao, H., et al.: iMail: a WAP mail retrieving system. Information Sciences 151, 71–91 (2003)
30. Watters, C., Duffy, J., Duffy, K.: Using large tables on small screen display devices. International Journal of Human Computer Studies 58(1), 21–37 (2003)
31. Watters, C., Zhang, R.: PDA access to Internet content: Focus on forms. In: Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS’03) - Track 4, IEEE Computer Society, Hawaii (2003)
32. Anderson, C.R., Domingos, P., Weld, D.S.: Personalizing web sites for mobile users. In: Proceedings of the 10th international conference on World Wide Web, ACM Press, Hong Kong (2001)
33. Bruner, G.C., Kumar, A.: Explaining consumer acceptance of handheld Internet devices. Journal of Business Research 58(5), 553–558 (2005)
34. Ho, S.Y., Kwok, S.H.: The attraction of personalized service for users in mobile commerce: an empirical study. ACM SIGecom Exchanges 3(4), 10–18 (2002)
35. Venkatesh, V., Ramesh, V., Massey, A.P.: Understanding usability in mobile commerce. Communications of the ACM 46(12), 53–56 (2003)
36. Tarasewich, P.: Designing mobile commerce applications. Communications of the ACM 46(12), 57–60 (2003)
37. Malloy, A.D., Varshney, U., Snow, A.P.: Supporting mobile commerce applications using dependable wireless networks. Mobile Networks and Applications 7(3), 225–234 (2002)
38. Koutsiouris, V., Vlachos, P., Vrechopoulos, A.: Developing and evaluating mobile entertainment applications: The case of the music industry. In: Rauterberg, M. (ed.) ICEC 2004. LNCS, vol. 3166, pp. 513–517. Springer, Heidelberg (2004)
39. Kaasinen, E.: User acceptance of location-aware mobile guides based on seven field studies. Behaviour & Information Technology 24(1), 37–49 (2005)
40. Fithian, R., et al.: The design and evaluation of a mobile location-aware handheld event planner. In: Proceedings of Human-Computer Interaction with Mobile Devices and Services: 5th International Symposium, Mobile HCI 2003, Udine, Italy (2003)
41. Ciavarella, C., Paternò, F.: Design criteria for location-aware, indoor, PDA applications. In: Dignum, F.P.M., Cortés, U. (eds.) Agent-Mediated Electronic Commerce III. LNCS (LNAI), vol. 2003, pp. 131–144. Springer, Heidelberg (2001)
42. Howell, M., Love, S., Turner, M.: The impact of interface metaphor and context of use on the usability of a speech-based mobile city guide service. Behaviour & Information Technology 24(1), 67–78 (2005)
43. Folmer, E., Bosch, J.: Architecting for usability: a survey. Journal of Systems and Software 70(1-2), 61–78 (2004)
44. Hornbaek, K.: Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies 64(2), 79–102 (2006)
45. Chin, J.P., Diehl, V.A., Norman, K.L.: Development of an instrument measuring user satisfaction of the human-computer interface. In: Proceedings of SIGCHI ’88, Washington, DC: New York: ACM/SIGCHI (1988)
46. Lewis, J.R.: Psychometric evaluation of the post-study system usability questionnaire: the PSSUQ. In: Proceedings of the Human Factors Society 36th Annual Meeting, Human Factors Society, Atlanta, GA (1992)
47. Kirakowski, J., Corbett, M.: SUMI: The software usability measurement inventory. British Journal of Educational Technology 24(3), 210–212 (1993)
48. Lewis, J.R.: IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction 7(1), 57–58 (1995)
49. Lin, H.X., Choong, Y.Y., Salvendy, G.: A proposed index of usability: A method for comparing the relative usability of different software systems. Behaviour & Information Technology 16(4-5), 267–278 (1997)
50. Kirakowski, J., Claridge, N.: Website Analysis and MeasureMent Inventory (Web Usability Questionnaire) (1998) [cited 2006 Dec. 05th]; Available from: http://www.ucc.ie/hfrg/questionnaires/wammi/index.html
51. Muylle, S., Moenaert, R., Despontin, M.: The conceptualization and empirical validation of web site user satisfaction. Information & Management 41(5), 543–560 (2004)
52. Hair, J.E., et al.: Multivariate data analysis: with readings, 4th edn. Prentice-Hall, Inc, Upper Saddle River, NJ, USA (1995)
53. Stiggelbout, A.M., et al.: Ideals of patient autonomy in clinical decision making: a study on the development of a scale to assess patients’ and physicians’ views. Journal of Medical Ethics 30(3), 268–274 (2004)
54. Chiou, C.F., et al.: Development and validation of the revised Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument. Arthritis & Rheumatism-Arthritis Care & Research 55(6), 856–863 (2006)
55. Smith, B., Caputi, P., Rawstorne, P.: The development of a measure of subjective computer experience. Computers in Human Behavior 23(1), 127–145 (2007)
56. Wang, Y.-S., Liao, Y.-W.: The conceptualization and measurement of m-commerce user satisfaction. Computers in Human Behavior 23(1), 381–398 (2007)
Part III
Understanding Users and Contexts of Use
Tips for Designing Mobile Phone Web Pages for the Elderly

Yoko Asano¹, Harumi Saito¹, Hitomi Sato¹, Lin Wang², Qin Gao², and Pei-Luen Patrick Rau²

¹ Cyber Solutions Laboratories, Nippon Telegraph and Telephone Corporation, 1-1 Hikarinooka, Yokosuka-Shi, Kanagawa, 239-0847, Japan
{asano.yoko, saito.harumi, sato.hitomi}@lab.ntt.co.jp
² Department of Industrial Engineering, Tsinghua University, Shunde Building, Tsinghua University, Beijing, 100084, P.R. China
[email protected], [email protected], [email protected]
Abstract. This paper proposes tips for designing Web pages appropriate for the elderly. The characteristics of mobile phone Web pages and the effects of aging are elucidated. The elderly had difficulty in reading texts, finding the focus, operating pages and input, and understanding the contents in some cases. Tips for designing Web pages that are appropriate for the elderly are proposed based on our observations. Keywords: mobile phone Web pages, Web design, the elderly, aging effect.
This paper proposes tips for designing Web pages that are appropriate for the elderly. The characteristics of mobile phone Web pages and the characteristics of the elderly are surveyed. Behavior of the elderly when accessing the mobile phone Web is observed. Tips for designing Web pages appropriate for the elderly are then proposed based on the results of the observation.
2 Characteristics of Mobile Phone Web

The design and operation of mobile phone Web services differ around the world, and mobile phones also differ within Japan. Here we discuss the common characteristics of mobile phone Web services in Japan. The three major characteristics are shown in Table 1. The first is the small display. The second is that the interface is quite inflexible with respect to font size, color, and so on. The third is that there are few keys to operate.

Small Display. The small display has many negative effects. Only a little information can be displayed at a time, so text tends to be packed closely together. Lists are aligned. Abbreviated words, symbols, and icons are frequently used to shorten the text.

Many Restrictions on Interface Flexibility. Character size and font cannot be changed. Carriage returns and figures are used to indicate paragraphs. The layout tends to be simple. Color variation is often used to indicate information structure. Only the background color and character color can be changed. The cursor focus is generally indicated by color reversal. The color tones differ with the terminal type or usage. Moreover, the cursor only jumps from link to link.

Table 1. Characteristics of mobile phone Web

| Major characteristic | Characteristics of mobile phone Web |
|---|---|
| Small display | Only a little information can be displayed at once. Texts are closely displayed. Lists are aligned. Abbreviated words, symbols, and icons are frequently used. |
| Interface restrictions | Character size and font cannot be changed. Color variation is often used in lieu of other formatting techniques. The cursor point is generally indicated by color reversal. The cursor jumps from link to link. |
| Few keys available | Cursor jumps to the next link by down key input even if the link is to the right of the current cursor position. Display mode must be changed to input characters. Mode key must be pushed many times to change input character type. Same key must be pushed many times to input one character. |
Few Keys. The average mobile phone has only twenty keys or so, and all operations must be executed through these keys. The cursor is moved to the next link by the down key regardless of the direction of the next link. Even if the links are horizontally aligned, the user must push the down key to move the cursor to the next link. Another big problem is the difficulty of inputting characters. The display mode must be changed to input characters. Japanese uses many types of characters: hiragana, katakana, Chinese characters, alphabetic characters, numerals, and so on. The user may have to push the mode key many times to change the input character type. Moreover, more than 50 kinds of syllabic characters must be accessed through only twenty keys. The user sometimes has to push the same key five times to input one character.
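The cost of entering text through so few keys can be made concrete with a small sketch that counts key presses for hiragana input. The key layout below (one kana row per digit key, cycling through the row on repeated presses) is an assumption based on the common Japanese keypad arrangement and is not taken from the paper. The table omits voiced and small kana, and the count ignores mode changes and the extra cursor press needed between two consecutive characters on the same key, so it is a lower bound at best.

```python
# Assumed multi-tap layout: each digit key cycles through one kana row.
KEYPAD = {
    "1": "あいうえお",
    "2": "かきくけこ",
    "3": "さしすせそ",
    "4": "たちつてと",
    "5": "なにぬねの",
    "6": "はひふへほ",
    "7": "まみむめも",
    "8": "やゆよ",
    "9": "らりるれろ",
    "0": "わをん",
}

# Number of presses of its key needed to reach each character.
PRESSES = {
    ch: n
    for chars in KEYPAD.values()
    for n, ch in enumerate(chars, start=1)
}


def key_presses(text):
    """Total key presses to type text; raises KeyError for characters
    outside the simplified table."""
    return sum(PRESSES[ch] for ch in text)


single = key_presses("お")        # fifth character on its key
greeting = key_presses("おはよう")  # 5 + 1 + 3 + 3 presses
```

In this simplified model a four-character greeting already takes twelve presses, which matches the paper's observation that one character can require five presses of the same key.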
3 Aging Effect of the Elderly

Many abilities of the elderly decrease with age. Table 2 shows the aging effects that are related to the usage of mobile phones. They are divided into three groups: effects related to physical ability, cognitive ability, and mental load.

Physical Ability. The most important aging effect is poor sight. Eighty percent of people in their sixties and 90 percent of those in their seventies suffer from cataracts. Cataracts have several symptoms. Most patients suffer a decrease in eyesight. Everything appears fogged with a yellow tint. The ability to distinguish contrast also decreases [4]. Motor ability also decreases. The elderly are not good at detailed work.

Cognitive Ability. The most important impact on cognitive ability is a drop in the ability to make distinctions. The elderly take time to discover and recognize information. They tend to have difficulty in recognizing the difference between two things even when both are displayed simultaneously. Moreover, it is difficult for them to perceive a change over time. The elderly are also poor at forming mental models. They perceive their operations and information as low-level elements, so they cannot memorize and retrieve them easily. This leads to poor spatial ability.

Table 2. Aging effects of the elderly

| Category | Common aging effects |
|---|---|
| Physical ability | Eyesight decreases. Everything appears to have a yellow tint. The ability to distinguish contrast decreases. Weak at detailed work. |
| Cognitive ability | Takes time to discover and recognize information. Tends to have difficulty in recognizing differences. Tends to have difficulty in perceiving things that change over time. Poor at forming mental model. Difficulty in memorizing and retrieving information. Poor spatial ability. |
| Mental load | Decline in motivation and understanding. |
Mental Load. The decline in motivation to do anything causes a decline in understanding. Moreover, the elderly tend to give up easily when they are confronted with a challenge.
4 Behavior of the Elderly Using Mobile Web

We conducted an experiment to observe how the elderly interacted with some existing mobile phone Web sites. Behavior of the elderly and their comments were collected.

4.1 Method of Experiment

Subjects. Ten subjects participated in the experiment. Four were male and six were female. All subjects were over 55 years old. The average age was 65.4. All subjects had experience in using the telephone call function of mobile phones. Five of them had used the mail function. Only one of them had accessed a mobile phone Web site.

Equipment. One of the most popular mobile phones, the P901i, made by NTT DoCoMo, was used in the experiment. It had been on the market for one year and eight months. Its screen displays only 12 by 12 characters.

Objects. Thirteen mobile Web sites were used in the experiment. Two were portal sites, five were electronic commerce sites, two were air ticket reservation sites, three were stock exchange sites, and the other was a broadcasting service site.

Tasks. The subjects were instructed to access all mobile Web sites and perform a task specific to each Web site. For example, they were instructed to search for a specific flight and to reserve a seat on the air ticket reservation site. They were also asked to remark on the Web design and the problems encountered while using the Web sites.

Observation data. Behavior of the subjects as they used the mobile Web sites and their remarks were captured. Particular attention was paid to the behavior and remarks made when they committed some error or were at a loss.

4.2 Results and Considerations

The significant results are shown in Table 3. They were related to visibility, focus recognition, understanding, operation, and page structure.

Visibility. Most subjects remarked that the pages were not easy to read because the characters were too closely packed and because of their weak visual acuity.
We noted that white backgrounds made reading easier because the contrast between the characters and the white background tended to be high. In some cases, subjects did not recognize scrolling and blinking objects; it was hard for the elderly to recognize objects that changed rapidly, and they gave up easily.

Focus Recognition. In many cases, the subjects took too long to identify the focus point. This is because there were many color combinations and they could not easily identify which combination indicated the focus point. In some cases, the focus area was too small to identify easily because short words were used. In other cases, the focus color matched the surrounding color so well that users could not identify the focus easily.
Understanding. In many cases, the subjects skipped abbreviated words, foreign words, symbols, and icons because it was difficult for them to understand their meanings. Moreover, characters drawn in icons or figures were too small for the elderly to recognize. Several similar misinterpretation problems were observed. Many subjects thought, wrongly, that red text always meant a warning. This indicates poor ability in forming mental models.

Table 3. Significant problems encountered by the elderly

Category | Major problems
Visibility | Closely packed characters are difficult to read. White background color made reading easier. Scrolling and blinking objects were sometimes skipped.
Focus recognition | They could not easily discern which color was used for focus highlighting. The focus area was too small to recognize because short words were used.
Understanding | They skipped abbreviated words, foreign words, symbols, and icons because they were difficult to understand. They misunderstood red as always indicating a warning.
Operation | They were at a loss when entering characters. They often tried to jump to the next link (to the right of the cursor) by wrongly pressing the right key instead of the down key.
Page structure | They were apt to try to understand the content based on only the information displayed on the screen at a time. They sometimes had difficulty in choosing one among choices when not all of the choices could be displayed at the same time.
Operation. Most subjects were at a loss when entering characters. This was because they failed to form a mental model of the operation of inputting the characters. Moreover, many input operations imposed high loads on the elderly. A lot of erroneous operations were observed when the subjects intended to move the cursor to the next link; they tried to use the right key instead of the down key. This was caused by the mismatch between the directions of cursor movement and those of the operation key. Page Structure. Most subjects tried to understand the content based on only the information displayed on the current screen; they were not good at remembering content. They sometimes had difficulty in choosing one of several choices when not all of the choices could be displayed at the same time. Moreover, they often got lost on mobile phone Web sites because they could not remember how they reached the present page.
5 Tips for Designing Mobile Web for the Elderly

We propose tips for designing mobile phone Web pages for the elderly based on characteristics of mobile phone Web service, the aging effects of the elderly, and our observations of their behavior.
Layout. Use color variation and carriage returns to format content into chunks of information that are easier to follow. However, try to keep information density high so that as much information as possible can be seen without scrolling. Moreover, set choices in one screen so that users can compare all of them at once.

Visibility. Do not use scrolling or blinking text because text changes tend to be too fast for the elderly to recognize.

Color. Use only enough color variation to make the information structure understandable. Too many meaningless colors make it hard to recognize which color combination indicates the focus of attention. Only high-contrast color combinations should be used to indicate highlighting.

Words. Do not use abbreviated words, foreign words, symbols, or icons for important words or links because the elderly tend to skip over unfamiliar symbols. Moreover, avoid short words for linked text so that the focus of attention stands out.

Operation. Try to minimize the number of characters that must be input because input operations impose high loads on the elderly. Choosing one of a few choices is easier for them.
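The Color tip can be checked mechanically. The paper does not define "high contrast" numerically, so the sketch below borrows the relative-luminance and contrast-ratio formulas from the W3C Web Content Accessibility Guidelines (WCAG 2.x) as a stand-in; the 4.5:1 threshold is the WCAG level AA value for normal text, not a figure from this study, and the function names are hypothetical.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 (identical) up to about 21."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_high_contrast(foreground, background, threshold=4.5):
    """Assumed rule: treat a pair as 'high contrast' at or above the threshold."""
    return contrast_ratio(foreground, background) >= threshold


WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 255, 0)
print(is_high_contrast(BLACK, WHITE))   # black text on a white background
print(is_high_contrast(YELLOW, WHITE))  # yellow highlight on a white background
```

A check of this kind would flag a yellow focus highlight on a white page, a combination that is likely to be especially weak for users whose vision already has a yellow tint.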
6 Conclusion

The characteristics of mobile phone Web pages and the effects of aging were elucidated. The phones' small displays, interface restrictions, and small number of keys caused many problems when combined with the effects of aging on physical ability, cognitive ability, and mental load. We found that the significant problems of using mobile phone Web pages were related to visibility, focus recognition, understanding, operation, and page structure. Tips for designing Web pages appropriate for the elderly were proposed based on the results of our observations. The tips proposed in this paper suit the development of mobile Web sites that are applicable to various users, from the young to the old.

Acknowledgments. We would like to thank Ms. Mamiko Mori for conducting and managing the observations. She also gave us a lot of valuable recommendations.
References

1. Cabinet Office (ed.): Statistics of coverage of durable goods. Annual Report of Consumption Trend Investigation, March (2006)
2. Statistics Bureau (ed.): Information on the 2005 Population Census of Japan, Ministry of Internal Affairs and Communications (2006)
3. Mobile Society Research Institute (ed.): White Paper on Mobile Society, NTT Publishing, pp. 57–61 (2006)
4. Okajima, K., Takase, M.: Computerized Simulation and Chromatic Adaptation Experiments Based on a Model of Aged Human Lens. Optical Review 8(1), 64–70 (2001)
The Role of Task Characteristics and Organization Culture in Non-Work Related Computing (NWRC)

Gee-Woo Bock, Huei-Huang Kuan, Ping Liu, and Hua Sun

National University of Singapore, Department of Information Systems, School of Computing, 3 Science Drive 2, Singapore 117543
{bockgw, mkuan}@comp.nus.edu.sg
Abstract. Many organizations have scrambled to get control measures and discipline systems in place to deter employees from engaging in NWRC. Since control measures and discipline systems are insufficient to curb NWRC at the workplace, we propose to integrate the control perspective with task characteristics and organization culture. Thus, we examine the following research questions: How would the amount of NWRC control mechanisms affect employees’ NWRC behavior under different task characteristics? Does a match between the disciplinary approach and organization culture lead to more effective NWRC management? Two separate studies on full-time employees in various organizations revealed three important findings. First, NWRC control mechanisms were ineffective under a high degree of task non-routineness. Second, the fit between discipline systems and organization culture led to higher employee satisfaction with NWRC management, which subsequently led to less time spent on NWRC. Third, no single NWRC discipline system is best for every organization.

Keywords: Non-Work Related Computing, Task Characteristics, Organization Culture, Fit.
characteristics and organization culture in NWRC management. Thus, the research questions of this study are: How would the amount of NWRC control mechanisms affect employees’ NWRC behavior under different task characteristics? Does a match between the disciplinary approach and organization culture lead to more effective NWRC management?
2 Literature Review

A number of terms have been used synonymously with NWRC, such as “junk computing” and “cyberloafing”. Since we are interested in various forms of NWRC that involve Internet access, we define NWRC in this study as using the Internet during office hours for personal e-commerce (e.g., watching stock prices), personal communication (e.g., instant messaging with MSN), Internet browsing (e.g., reading news on the Internet), downloading files for personal purposes (e.g., movies and music) and Internet gaming (e.g., Yahoo! Games) [1]. As NWRC behavior is performed at the expense of organizational resources, much research has uncovered the negative impact of NWRC on corporate productivity [5]. Bock and Ho [5] attributed this result to the interruptive nature of NWRC activities, which were found in their study to consist predominantly of emails and Internet messaging. Personal emails and instant messaging are forms of distraction for employees, and the recovery time compounded from both these interruptions may result in a great deal of wasted time, leading to lower job performance.

2.1 NWRC Control Mechanisms

Control mechanisms exist to discourage employees from engaging in NWRC at the workplace [4]. Sharma and Gupta [7] reported that 45% of all firms and 17% of Fortune 100 companies use monitoring software of various kinds. The study also reported that software that records employees’ keystrokes is used by organizations such as Exxon and the U.S. State Department. Much of the research on control mechanisms is based on General Deterrence Theory [8], which suggests that organizational control approaches can deter employees’ abuse of computing resources by increasing the perceived costs of computer abuse. Although many of the controls have been employed by numerous organizations, they have failed to significantly lessen NWRC behavior.
Even with tighter controls, NWRC still persists in organizations in the banking industry, which is equipped with strong security controls [2]. Evidently, the implementation of control mechanisms alone is not sufficient to ensure the success of NWRC management. Organizations are unable to apply General Deterrence Theory adequately in their environments because it does not cover all the factors affecting the effectiveness of NWRC management. Many studies based on General Deterrence Theory did not consider employees’ task characteristics in the implementation of the control mechanisms.
2.2 Discipline Systems

Besides control mechanisms, several IS studies have also suggested discipline systems to deal with the misuse of organizational computing resources [2]. Discipline can be meted out in several ways. Organizations are reported to discipline computer abusers by internal sanctions such as suspension and dismissal, or even to report offenses to third parties such as the police or FBI. There are predominantly two kinds of disciplinary systems: progressive and positive [9]. Progressive discipline provides that increasingly serious punishments be meted out to members of the organization who fail to behave acceptably. Positive discipline, on the other hand, retains the idea of making progressively serious contacts with an employee when work problems arise but avoids rushing to use punishment as a means of getting the employee to adhere to rules and regulations. Instead, it seeks to prevent problems through formal and informal managerial practices and the recognition of good performance. Employees are allowed to participate in the disciplinary decision-making process, so they may be more responsible for their own behaviors and more willing to follow disciplinary policies [10]. As such, researchers generally believe that positive discipline is more effective than progressive discipline in curbing employees’ misconduct [9, 10].

2.3 Fit

Fit plays an essential role in strategic management and can be defined as a theoretically defined match between two variables [6]. Fit as matching is specified without reference to a criterion variable, although its effect on a set of criterion variables can subsequently be examined. The Task-Technology Fit model [11] argues that the relation between users’ task requirements and usage of organizational IS may play a pivotal role in determining the effectiveness of organizational policies and measures.
Goodhue [11] proposed that information systems (systems, policies, IS staff) have a positive impact on performance only when there is correspondence between their functionality and the task requirements of users. Therefore, we examine whether the fit between task characteristics and control mechanisms does indeed help to enhance NWRC management in organizations (Study 1). Prior research suggests that the match between discipline system and organization culture can help to curb NWRC. Crow and Hartman [12] revealed that health care organizations which neglect the detrimental elements of their culture can find themselves at risk of poor employee relations and ineffectiveness in applying discipline. Schwartz and Davis [13] have recommended that management should consider the cultural risk of implementing strategies. The lack of fit between organization culture and the discipline system may result in employees’ resistance towards the system, ultimately leading to the failure of the discipline system. Thus, we examine whether the match between disciplinary system and organizational culture has an impact on NWRC management (Study 2).
3 Study 1

Study 1 was conducted to examine the effects of fit between control mechanisms and task characteristics on NWRC management. In this study, tasks are broadly defined as the actions carried out by individuals in turning inputs into outputs [14]. The Task-Technology Fit model suggests that task-technology fit will lead to greater performance of the technology [11]. Goodhue [11] measured a two-dimensional construct of task characteristics: non-routineness (non-repetitive and non-analyzable search behavior) and interdependence (reliance on other organizational units). This paper focuses on these two dimensions of task characteristics because they are closely related to the requirement of information processing capabilities. As the objective of NWRC management is to reduce NWRC behavior, the dependent variable in this study is NWRC behavior.

3.1 Task Characteristics-Control Mechanisms Fit

Task non-routineness is defined in this study as the level of structuredness, analyzability, difficulty and predictability of a task [14]. Tasks which have a high degree of non-routineness require employees to engage in intensive analysis, discussion and research in order to minimize the task uncertainty and find a solution. Internet browsing can provide an immense amount of information as well as useful resources to support non-routine tasks. Belanger and Slyke [15] found that a certain amount of playful use of the Internet can lead to learning that may be of value to the organization. Thus, certain NWRC like Internet browsing could be perceived as useful for non-routine tasks, since these tasks require employees to acquire more skills, knowledge as well as up-to-date information. If the NWRC control mechanisms within the organization are tight, the employees may perceive the control mechanisms as a barrier to increasing their job performance and are hence more likely to ignore them.
Hypothesis 1: The higher the task non-routineness, the weaker the negative effect of NWRC control mechanisms on employees’ NWRC behavior.

Task interdependence is defined in this study as the degree to which the task is related to other organizational units and the extent to which coordination with other organizational units is required [16]. Coordination between interdependent parties needs to be supported by the organization’s information systems. Van de Ven et al. [17] found that departmental communication increased as interdependence among employees increased. Instant messaging software and email provide a good communication platform for employees to communicate. As these tools match the communication requirement necessary for tasks of high interdependence, the prohibition of these tools would conflict with employees’ task requirements. If NWRC control mechanisms within the organization are tight, the employees may perceive control mechanisms as a barrier to communicating with other employees for work purposes and are more likely to ignore the control mechanisms.

Hypothesis 2: The higher the task interdependence, the weaker the negative effect of NWRC control mechanisms on employees’ NWRC behavior.
3.2 Methodology

A survey was carried out to test the proposed hypotheses. We targeted full-time employees who have easy access to the Internet at work. Forty organizations were contacted and 26 of them finally participated in the survey. A total of 250 questionnaires were distributed either by mail or in person. After deleting the responses with missing data, there were 167 valid responses (effective response rate = 66.8%). All scale items were operationalized at the individual level. NWRC behavior was measured in terms of the self-reported time spent on NWRC [1]. Respondents were assured that any information they provided would be kept confidential to minimize underreporting of NWRC behavior.

Before proceeding to test the hypotheses, we tested the validity of the measures. Convergent validity was shown by item-total correlation coefficients above 0.40. Cronbach’s alpha coefficients ranged from 0.60 to 0.81, which shows acceptable reliability for exploratory research [18]. To test discriminant validity, factor analysis with Varimax rotation was performed, and the loadings showed that the constructs are distinct from one another.

3.3 Results and Data Analysis

Results were analyzed using SPSS 13.0. In performing hierarchical regression, we first entered the amount of control mechanisms as a predictor, followed by a task characteristic and finally the interaction term. The hierarchical regression equations show that neither NWRC control mechanisms alone nor NWRC control mechanisms along with non-routineness have a significant impact on NWRC behavior. When the interaction variable is added, the effects of control mechanisms and the interaction variable are significant at p = 0.012 and p = 0.035 respectively. ΔR² is 0.027 and the F statistic is 4.52, which is well above 1 [19]. This shows that task non-routineness moderates the relationship between NWRC control mechanisms and NWRC behavior. Thus, H1 is supported.
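The moderation test reported for H1 can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' SPSS analysis: the variable names, the simulated effect sizes, and the helpers `ols_r2` and `f_change` are assumptions introduced here. Only the sample size (167) and the three-step order of entry are taken from the text. The sketch fits the model without and with the product term and computes the change in R² together with the usual F statistic for that change.

```python
import random


def ols_r2(rows, y):
    """R^2 of an ordinary least squares fit of y on the predictors in rows."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
                   for x, yi in zip(X, y))
    return 1.0 - ss_resid / ss_total


def f_change(r2_reduced, r2_full, n, k_full, k_added=1):
    """F statistic for the R^2 gained by adding k_added predictors."""
    return ((r2_full - r2_reduced) / k_added) / ((1.0 - r2_full) / (n - k_full - 1))


# Synthetic, standardized scores; every effect size below is made up.
rng = random.Random(0)
n = 167
control = [rng.gauss(0, 1) for _ in range(n)]
nonroutine = [rng.gauss(0, 1) for _ in range(n)]
nwrc = [-0.3 * c + 0.2 * t + 0.5 * c * t + rng.gauss(0, 1)
        for c, t in zip(control, nonroutine)]

step2 = list(zip(control, nonroutine))        # main effects only
step3 = [(c, t, c * t) for c, t in step2]     # plus the interaction term
r2_step2 = ols_r2(step2, nwrc)
r2_step3 = ols_r2(step3, nwrc)
delta_r2 = r2_step3 - r2_step2
f_stat = f_change(r2_step2, r2_step3, n, k_full=3)
print(f"delta R^2 = {delta_r2:.3f}, F change = {f_stat:.2f}")
```

A positive coefficient on the product term corresponds to H1: the negative effect of controls on NWRC becomes weaker as non-routineness rises. In practice the F change would be compared against a critical value for (1, n − 4) degrees of freedom rather than against 1.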
The hierarchical regression equations also show that neither NWRC control mechanisms alone nor NWRC control mechanisms along with interdependence have a significant impact on NWRC behavior. When the interaction variable is added, the effects of control mechanisms and the interaction variable are not significant (p = 0.341 and p = 0.561 respectively). ΔR² is 0.002 and the F statistic is 0.34, which is below 1 [19]. This indicates that H2 is not supported.

3.4 Discussion of Results

The first finding shows a significant result for the interaction of task non-routineness and the amount of NWRC control mechanisms. The ineffectiveness of NWRC control mechanisms under high task non-routineness may result from the perceived usefulness of NWRC. Tasks of high non-routineness are typically more difficult to accomplish, and in order to accomplish them, the use of up-to-date information and various resources is essential. Since Internet browsing is the most efficient and effective way to get information in today’s workplace, any restriction on Internet browsing would affect the accomplishment of these tasks. As employees are required to browse the Internet to complete their tasks, tight control simply brings
inconvenience to their jobs, and they would still continue to engage in NWRC because of the usefulness of the Internet for their jobs and their personal agendas. Thus, control mechanisms would be ineffective under a high degree of non-routineness.

Task interdependence may not be a significant moderator between the amount of NWRC control mechanisms and NWRC behavior for two reasons. First, with high task interdependence, success at the workplace is contingent on one another’s output. As such, employees may devote more time to accomplishing their tasks so that their work unit can achieve its collective objectives effectively, instead of spending time on NWRC. Second, to collaborate with others within the organization, employees may use substitutes for communication such as phone calls and face-to-face meetings, which may be more effective for completing tasks. Thus, the control mechanisms for instant messaging software have little conflict with interdependence.
4 Study 2

Study 2 focuses on examining the fit between discipline systems and organization culture. In this study, we include another dependent variable, satisfaction with NWRC management, as managers are also interested in finding out whether the fit between discipline system and organization culture can improve satisfaction with NWRC management. Organizational culture has pervasive effects on an organization and is defined as a socially constructed, cognitive reality that is rooted in deeply held perceptions, values, beliefs or expectations that are shared by, and are unique to, a particular organization [20]. Although there are several classifications of organization cultures in previous literature, the Organization Culture Index [21] is the most appropriate for this study as it provides three distinct dimensions of organizational culture to minimize ambiguity: bureaucratic, innovative and supportive. Bureaucratic cultures are hierarchical and compartmentalized. They are usually based on control and power, with clear lines of responsibility and authority. Innovative cultures are exciting and dynamic. They are creative places to work in, filled with challenge and risk. Supportive cultures are warm and “fuzzy” places to work in, and employees are friendly and helpful to each other.

4.1 Organization Culture-Discipline System Fit

Management literature suggests that the fit between organization culture and discipline system is crucial for management. Any management idea (which includes discipline systems), no matter how good it is, will not work in practice if it does not fit the culture [13]. Commanducci [22] also stressed that even management simulations require proper fit with company culture. The disciplinary system adopted in an organization is a type of management practice which can affect employees’ satisfaction with the system and NWRC behavior.
Satisfaction with the discipline system is defined as a generalized positive or negative evaluation of the discipline system (adapted from [23]). The fit between organization culture and the discipline system may result in stronger employee compliance with the discipline system [12], which can ultimately lead to greater satisfaction with NWRC management and less NWRC behavior.
Hypothesis 3: Employees of an organization with a disciplinary system closely matched with its organizational culture will have higher satisfaction with its NWRC management.

Hypothesis 4: Employees of an organization with a disciplinary system closely matched with its organizational culture will engage in less NWRC behavior.

4.2 Methodology

A survey was carried out to test the proposed hypotheses. As in Study 1, we targeted full-time employees who have easy access to the Internet at their workplace. A total of 182 questionnaires were collected from 30 organizations, of which 174 were valid for data analysis (effective response rate = 95.6%). All scale items were operationalized at the individual level. NWRC behavior was operationalized in terms of self-reported time spent on NWRC [1]. As in Study 1, respondents were assured that any information they provided would be kept confidential.

The validity of the measures was also tested. Convergent validity was shown by item-total correlation coefficients above 0.40. Cronbach’s alpha coefficients ranged from 0.84 to 0.91, which shows acceptable reliability for exploratory research [18]. To test discriminant validity, factor analysis with Varimax rotation was performed, and the loadings showed that the constructs are distinct from one another.

A cluster analysis was conducted to identify homogeneous groups of cultural profiles. Three types of organization culture were identified, as the complete linkage dendrogram suggested a partition into three clusters [24]. Within each cluster, t-tests showed that the mean of one culture dimension differed significantly from the means of the other dimensions at p<0.01. The bureaucratic culture cluster’s mean of bureaucratic dimension is significantly different from its means of innovative and supportive dimensions with t = 9.19 and t = 10.30 respectively.
The innovative culture cluster’s mean of innovative dimension is significantly different from its means of bureaucratic and supportive dimensions with t = 4.30 and t = 3.27 respectively. The supportive culture cluster’s mean of supportive dimension is significantly different from its means of bureaucratic and innovative dimensions with t = 4.24 and t = 4.71 respectively. This shows that each cluster possesses a strong inclination toward one type of culture.

The clustering procedures described above were then carried out within each culture cluster to sub-cluster it into two groups based on the means of the scales measuring disciplinary approach. A higher disciplinary approach score indicates positive discipline. One-way ANOVA was carried out to determine whether significant differences in means existed between the two subgroups within the same cluster. The differences between the two disciplinary groups within the same culture are significant at p<0.001, providing evidence of the existence of positive and progressive discipline (see Table 1). For example, within the bureaucratic cluster, the mean of the disciplinary approach of sub-cluster 1 (1.82) was higher than that of sub-cluster 2 (1.29) with F = 101.46 (p < 0.001). This indicates that sub-cluster 1 has positive discipline while sub-cluster 2 has progressive discipline.
Table 1. Summary of within culture clusters one-way ANOVA test results
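The one-way ANOVA used to compare the two disciplinary sub-clusters can be sketched in a few lines. The scores below are invented for illustration and are not the study's data; `one_way_anova_f` is a hypothetical helper that returns the F statistic with its degrees of freedom, leaving the p-value lookup to a statistics package or an F table.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical disciplinary-approach scores for two sub-clusters
sub_cluster_1 = [1.9, 1.8, 1.7, 1.9, 1.8]   # higher scores: positive discipline
sub_cluster_2 = [1.3, 1.2, 1.4, 1.3, 1.2]   # lower scores: progressive discipline
f_stat, df_b, df_w = one_way_anova_f(sub_cluster_1, sub_cluster_2)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With only two groups the F statistic equals the square of the pooled two-sample t statistic, so the same comparison could also be reported as a t-test.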
4.3 Results and Data Analysis

The hypotheses of Study 2 were analyzed with MANOVA and ANOVA. We conducted a test of between-subjects effects with MANOVA to analyze which individual dependent variables contribute to the significant multivariate effect. A Bonferroni-type adjustment was applied to control experiment-wise error and decrease the chance of Type I error [24]. In this paper, the adjusted alpha is equal to 0.025 (0.05/2). Using this alpha level, we have a significant univariate main effect on satisfaction towards NWRC management with F = 3.882 (p = 0.022) only when organizational culture and disciplinary system are considered together, supporting H3. However, the univariate main effect on the time spent on NWRC is not significant with F = 0.641 (p = 0.528).

We also analyzed whether the disciplinary system has a significant effect on satisfaction with NWRC management within each culture group using one-way ANOVA (see Table 2). For the bureaucratic and supportive culture groups, satisfaction levels between the two disciplinary approaches are different with F = 6.36 and F = 3.20 respectively, significant at p < 0.10. For the innovative culture group, the satisfaction levels between the two disciplinary approaches are different with F = 8.90, significant at p < 0.05. These findings are consistent with the MANOVA results.

Table 2. Summary of one-way ANOVA Test Results within each Culture Cluster

Culture cluster | Mean of satisfaction, positive discipline | Mean of satisfaction, progressive discipline | F(SAT)
Bureaucratic (df 1:70, F>2.79, α=0.10) | 4.02 | 4.65 | 6.36
Innovative (df 1:40, F>4.08, α=0.05) | 4.48 | 3.63 | 8.90
Supportive (df 1:60, F>2.79, α=0.10) | 4.8 | 4.1 | 3.20
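The Bonferroni-type adjustment amounts to dividing the family-wise alpha by the number of dependent variables tested. The short sketch below applies it to the two univariate p-values reported in the text; the function and variable names are hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    return family_alpha / n_tests


# Two dependent variables, so 0.05 / 2 = 0.025 as stated in the text
adjusted_alpha = bonferroni_alpha(0.05, 2)

# Univariate p-values as reported in the paper
p_values = {
    "satisfaction with NWRC management": 0.022,
    "time spent on NWRC": 0.528,
}

significant = {name: p < adjusted_alpha for name, p in p_values.items()}
for name, flag in significant.items():
    print(f"{name}: p = {p_values[name]:.3f}, significant = {flag}")
```

Only the satisfaction effect clears the adjusted threshold, which matches the paper's conclusion that H3 is supported while the effect on time spent is not significant.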
4.4 Discussion of Results

The combination of organizational culture and NWRC disciplinary approach has a significant effect on employee satisfaction toward NWRC management (H3). We also examined which discipline system fits best with which organization culture by looking at satisfaction. From Table 2, positive discipline is more accepted in
innovative and supportive cultures, with higher means of satisfaction (4.48 and 4.8 respectively) compared to progressive discipline (3.63 and 4.1 respectively). Progressive discipline is more accepted in bureaucratic cultures, with higher satisfaction compared to positive discipline. This shows that positive discipline does not always produce better results than progressive discipline, although theoretically, positive discipline is superior as it encourages employees to participate in the management process.

The fit between organizational culture and NWRC discipline has no significant impact on time spent on NWRC (H4). This may be explained by the reasons why employees engage in NWRC, which include rational decision-making factors such as normative awareness regarding NWRC and the influence of peer acquiescence, as well as unconscious factors such as habit [25]. However, post-hoc linear regression shows that satisfaction with NWRC management negatively affects time spent on NWRC (p=0.002). Thus, we see that the fit between organizational culture and NWRC discipline system can exert an indirect effect on NWRC behavior through employees' satisfaction with NWRC management.
5 Conclusion and Directions for Future Research

Studies 1 and 2 offer some insights for NWRC management in organizations. First, the universal enforcement of control mechanisms, or simply increasing the amount of disciplinary action against offenders, cannot reduce NWRC behavior. The first study shows that it is essential to differentiate sanctions according to task non-routineness. The Internet activities of employees engaging in non-routine tasks should not be controlled rigidly. Instead, practitioners need to emphasize the balance of NWRC activities in employees’ daily use of the Internet. The second study shows that no single discipline system is best for every organization. The fit between the discipline system and organization culture leads to greater satisfaction with NWRC management and eventually to less time spent on NWRC.

This paper also offers contributions to research. It is a pioneering effort to consider the effects of task characteristics and organization culture on NWRC management. The majority of the NWRC literature has focused on examining the antecedents of NWRC behavior (e.g., [25]) or elucidating the consequences of NWRC in organizations [5]. Although “fit” has occupied a central role in the strategic management field [6], it has never been incorporated into NWRC research. However, since this research has not considered the role of industry norms and national culture, future research can extend this paper by examining the effects of industry norms and national culture on NWRC.
References
1. Siau, K., Nah, F.F.H., Teng, L.: Acceptable Internet Use Policy. Communications of the ACM 45, 75–79 (2002)
2. Lee, J., Lee, Y.: A Holistic Model of Computer Abuse within Organizations. Information Management & Computer Security 10, 57–63 (2002)
G.-W. Bock et al.
3. Lim, V.K.G.: The Moderating Effects of Neutralization Technique on Cyberloafing and Organizational Justice. In: The Proceedings of Academy of Management Conference, Denver (2002)
4. Urbaczewski, A., Jessup, L.M.: Does Electronic Monitoring of Employee Internet Usage Work? Communications of the ACM 45, 80–83 (2002)
5. Bock, G.W., Ho, S.L.: Non-Work Related Computing (NWRC): Is there a Productivity Payoff? Accepted and forthcoming in Communications of the ACM
6. Venkatraman, N.: The Concept of Fit in Strategy Research: Toward Verbal and Statistical Correspondence. Academy of Management Review 14, 423–444 (1989)
7. Sharma, S.K., Gupta, J.T.N.: Improving Workers' Productivity and Reducing Internet Abuse. Journal of Computer Information Systems 44, 74–78 (2003)
8. Beccaria, C.: On Crime and Punishments. Bobbs-Merrill, Indianapolis (1963)
9. Osigweh, C.A.B., Hutchison, W.R.: Positive Discipline. Human Resource Management 28, 367–383 (1989)
10. King, K.N., Wilcox, D.E.: Employee-Proposed Discipline: How Well is it Working? Public Personnel Management 32, 197–209 (2003)
11. Goodhue, D.L.: Understanding User Evaluations of Information Systems. Management Science 41, 1827–1844 (1995)
12. Crow, S.M., Hartman, S.J.: Organizational Culture: Its Impact on Employee Relations and Discipline in Health Care Organizations. The Health Care Manager 21, 22–28 (2002)
13. Schwartz, H., Davis, S.: Matching Corporate Culture and Business Strategy. Organizational Dynamics 10, 30–48 (1981)
14. Perrow, C.: A Framework for the Comparative Analysis of Organizations. American Sociological Review 32, 194–208 (1967)
15. Belanger, F., Slyke, C.V.: Abuse or Learning? Communications of the ACM 45, 64–65 (2002)
16. Thompson, J.D.: Organizations in Action. McGraw-Hill, New York (1967)
17. Van de Ven, A.H., Delbecq, A.L., Koenig, R.: Determinants of Coordination Modes within Organizations. American Sociological Review 41, 322–338 (1976)
18. Nunnally, J.: Psychometric Theory. McGraw-Hill, New York (1967)
19. Carte, T.A., Russell, C.J.: In Pursuit of Moderation: Nine Common Errors and Their Solutions. MIS Quarterly 27, 479–501 (2003)
20. Hofstede, G., Neuijen, B., Ohayv, D.D., Sanders, G.: Measuring Organizational Cultures: A Qualitative and Quantitative Study across Twenty Cases. Administrative Science Quarterly 35, 286–316 (1990)
21. Litwin, G.H., Stringer, R.A.: Motivation and Organizational Climate. Harvard University Press, Cambridge, Massachusetts (1968)
22. Commanducci, M.: Training Can Be Fun: Management Simulations Require Proper Fit with Company Culture. Canadian HR Reporter 11, 15 (1998)
23. Kidwell, R.E., Bennett, N.: Employee Reactions to Electronic Control Systems. Group and Organization Management 19, 203–218 (1994)
24. Coakes, S.J., Steed, L.G.: SPSS Analysis without Anguish: Version 10.0 for Windows. Wiley, Brisbane (2001)
25. Lee, O.K., Lim, K.H., Wong, W.M.: Why Employees do Non-work Related Computing: An Exploratory Investigation through Multiple Theoretical Perspectives. In: Proceedings of Hawaii International Conference of System Sciences, Hawaii (2005)
Searching for Information on the Web: Role of Aging and Ergonomic Quality of Website
Aline Chevalier, Aurélie Dommes, Daniel Martins, and Cécile Valérian
University of Paris X-Nanterre, Cognitive Processes and Interactive Behaviours Laboratory, 200 avenue de la République, 92001 Nanterre cedex, France
{aline.chevalier,adommes,daniel.martins}@u-paris10.fr
Abstract. Despite rapid growth in the number of websites, many of them still contain ergonomic problems that hinder the cognitive activities of web users. As cognitive aging is generally associated with a decrease in working memory capacity, a failure of inhibition and a slowing of processing speed, we argue that aging may have negative effects on information search activities, especially when the website contains ergonomic problems. In the present experimental study, we compare the performance of younger and older web users while searching for information in two websites: one that follows ergonomic recommendations and another with ergonomic problems. The results show that aging had negative consequences on users' information search activities (more time to find information, more steps required to find information and more cognitive resources involved in the activity). These consequences are greater for the non-ergonomic website than for the ergonomic site.
Keywords: Information search, Cognitive load, Ergonomics, Aging.
Recently, we have noticed that few researchers are interested in studying and determining the cognitive strategies and difficulties that older users experience while searching for information (see, e.g., [2], [33]). Nevertheless, further studies are needed to better understand the cognitive processes and cognitive difficulties involved in searching for information on the Web. To this end, we conducted an experimental study with younger and older web users. This study aimed at determining the influence of aging and of the ergonomic quality of websites on information search activity. The following section provides an overview of the relations between information search activity, cognitive load and cognitive aging. Section 3 presents the experimental study. The results obtained are discussed in Section 4.
2 Searching for Information: Cognitive Load and Aging
During the 1980s and 1990s, several attempts were made to model the cognitive activities involved in searching for information (see [1], [9], [11]). The first models described this activity as cyclical: the individual defines a (cognitive) goal, selects an information category, extracts information and integrates it with previously extracted information; the individual repeats this cycle until s/he reaches her/his search goal. Nevertheless, these models do not explain why users fail when searching for information. Rouet and Tricot ([25]; then [34]) defined a more complete cognitive model that includes different factors, such as the degree of precision of the user's objective (vague vs precise), the extraction of a single or several sources of information, and the experience of users. This model is close to those developed by Marchionini et al. [19] and Shneiderman et al. [32] for searching in electronic information systems, with one main difference however: the latter models did not consider, for instance, the specific differences between the Web and bibliographical database systems. The model developed by Rouet and Tricot describes an information search activity that is both cyclical (like Guthrie's model [11]) and close to text comprehension, problem-solving and decision-making activities. Accordingly, when an individual searches for information, s/he elaborates a cognitive goal, selects a set of information, extracts information and integrates it with the previously extracted information. The individual restarts this cycle until s/he reaches the search goal. Therefore, searching for information consists in transforming a representation of an information need into a request; its formulation depends on the contents and the constraints imposed by the system (here, the website). Next, the individual has to choose, among the sources supplied to her/him (e.g., a list of links or items presented on a website), those relevant to her/his information search, by assessing them with regard to her/his representation of the goal. If the document is relevant, the individual deepens her/his search; if the document contains partially relevant information or very little relevant information, the individual generally modifies her/his strategy and thus her/his request. Ultimately, if the document is irrelevant, the individual changes her/his request but also, sometimes, her/his representation of the goal. According to this model, searching for information is a very complex cognitive activity that requires many cognitive resources in working memory and therefore generates a high cognitive load.
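The cyclical activity described above (select a source, extract information, integrate it, and revise the request when a document is irrelevant) can be made concrete with a minimal sketch. It is an illustration only, not an implementation of the model of Rouet and Tricot: the names `Page`, `relevance` and `search`, the toy site, and the word-overlap estimate of relevance are all hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical web page: some text plus labelled links to other pages."""
    content: str
    links: dict = field(default_factory=dict)  # link label -> Page


def relevance(label, goal):
    """Toy estimate of a link's relevance: number of words shared with the goal."""
    return len(set(label.lower().split()) & set(goal.lower().split()))


def search(home, goal, answer, max_steps=10):
    """Run the select / extract / integrate cycle until the answer is found.

    Returns (number of links followed, list of integrated contents),
    or None if the search is abandoned after max_steps.
    """
    page, integrated, steps, tried = home, [], 0, set()
    while steps <= max_steps:
        integrated.append(page.content)      # extract and integrate information
        if answer in page.content:           # the search goal is reached
            return steps, integrated
        candidates = {label: target for label, target in page.links.items()
                      if id(target) not in tried}
        if candidates:
            # select the source judged most relevant to the goal
            best = max(candidates, key=lambda label: relevance(label, goal))
            page = candidates[best]
            tried.add(id(page))
        else:
            page = home                      # irrelevant document: change the request
        steps += 1
    return None


# Hypothetical two-level music shop; "ANSWER" marks the target information.
jazz = Page("Jazz albums ANSWER")
cds = Page("CDs and albums catalogue", {"Jazz albums": jazz})
tickets = Page("Show tickets")
home = Page("Homepage", {"CDs and albums": cds, "Show tickets": tickets})

result = search(home, "jazz albums price", "ANSWER")
print(result[0])  # number of links followed before the answer was found
```

In this sketch every page that is opened adds to the list of integrated contents, which is one simple way to picture why a longer search path places a heavier demand on working memory.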
Sweller [31] distinguishes two kinds of cognitive load:
1. Intrinsic cognitive load is linked to the task at hand. It depends on the difficulty of the content to be learned and on the amount of information that the individual has to process simultaneously in working memory. Intrinsic cognitive load decreases as knowledge in long-term memory increases. Consequently, a high intrinsic cognitive load corresponds either to highly complex material or to an individual's low expertise.
2. Extraneous cognitive load is linked to the presentation of information, which influences the cognitive resources involved.
Intrinsic and extraneous cognitive loads add up to a total cognitive load [22]. As just indicated, working memory plays a central role in information search, since it makes it possible to temporarily maintain and process several pieces of available information together with the individual's goals [3]. Working memory is also one of the factors identified in cognitive aging research as a possible predictor of the age-related declines in performance observed on a wide variety of cognitive tasks, such as reading or problem-solving. Older adults might experience difficulties in temporarily keeping and processing several pieces of information (for a review, see [35]): the amount of information that can be simultaneously processed and stored in working memory would decrease with aging. According to Hasher and Zacks [12], it is not so much the size of working memory that matters as the way in which the information in working memory is managed with regard to the goal of the task at hand. Age-related differences in memory and other cognitive functions would be attributable to a decline in attentional inhibitory control over the contents of working memory (for a review see [18]). Older adults might be less able than younger adults to suppress and inhibit irrelevant information. Irrelevant information could overload working memory and thus interfere with the task to be performed. Inhibition failures associated with aging have been observed in numerous cognitive activities, such as memory, language comprehension and reasoning [12], [18]. Last but not least, one of the most widespread theories of cognitive aging postulates a generalized decrease in the speed of executing processes, independently of the type or structure of the information being processed [28]. Many studies show that the slowing of processing accounts for a significant portion of the age-related variance on a large number of cognitive tasks. Salthouse [28] suggested that two mechanisms underlie this effect: (1) according to the limited time mechanism, the cognitive operations essential to the success of an activity are not all correctly executed by the elderly. Cognitive operations would be executed too slowly to be entirely accomplished in the allotted time, because much of the available time is taken up by early processes. (2) According to the simultaneity mechanism, the outcomes of early processes may be lost before they can be used by later processes. The decline in working memory capacity, inhibition failure and the generalized decrease in processing speed are the main effects of aging on cognition, and seem to explain the decline in older adults' performance observed on numerous simple as well as more complex cognitive tasks, such as information search.
A. Chevalier et al.
3 Experimental Study
3.1 Research Problem and Objectives
Information searching involves many cognitive resources, which depend on the individual's cognitive capacities and age, as well as on the characteristics and constraints of the website used. Among web interface characteristics, ergonomic quality plays a central role. Indeed, if the website does not fit the users' cognitive capacities, the cognitive load should increase. The possible consequences would be overload and lostness on the Web [10], [30]; for instance, Nielsen [23] noticed that half of the searches on the Web failed. Moreover, because of age-related changes in cognitive functioning, accessing websites could become more complex for older users, especially when websites do not fit the users' needs, i.e. sites containing (many) ergonomic violations. Accordingly, this study aims at determining the influence of the users' age and of the ergonomic quality of the website on:
(a) The time necessary to find information;
(b) The number of steps (i.e. the number of hyperlinks visited by the participants) required to find information;
(c) The amount of cognitive resources involved in finding information;
(d) The participants' usability satisfaction with regard to the visited site.
3.2 Procedure
Forty novice web users (in line with [13]) participated in this study: twenty younger users (M = 31 years old) and twenty older users (M = 64 years old). All of the participants had the same educational level (Bachelor's degree) and used the Internet occasionally for information searching and e-mail. Two versions of the same website, an e-shop selling music products, were created:
• An ergonomic version that was consistent with ergonomic recommendations for web interfaces (ergonomic site, hereinafter referred to as ES, see Figure 1).
• A non-ergonomic version (non-ergonomic site, hereinafter referred to as NES, see Figure 2) that included the main ergonomic problems identified by designers and users in a previous study [7].
Fig. 1. Homepage of the ES
Fig. 2. Homepage of the NES
We chose an e-commerce music site (selling CDs, show tickets, …) since those products are bought online by many people and do not require specific knowledge linked to the site content. The study was divided into two stages:
Stage 1: Participants had to search for information to answer three questions presented successively from the homepage; each question had only one correct answer. The order of presentation of the questions was counterbalanced across participants. The navigation activities of the participants (visited pages, etc.) were recorded for analysis. To respect the ergonomic recommendations and to compare the search time between the two sites, two steps (i.e. two hyperlinks) were necessary to find each of the three correct answers. To measure cognitive load, participants had to react, while searching for information, to auditory signals (from the Tholos software developed by [5]) by pressing a pedal with one foot (their hands remained free to use the computer). This made it possible to determine an average reaction time (in milliseconds). Each participant's baseline reaction time, measured during a training phase, was subtracted from the reaction time measured during the experimental task (information search task), thus providing "reaction time interference scores" (RT). Such scores allowed us to measure the participants' cognitive resources: the greater the reaction time, the more cognitive resources were involved (for more details, see [5]).
Stage 2: After searching for information, participants had to freely navigate the website in order to answer a usability satisfaction questionnaire comprising seventeen statements (based on the WAMMI [14]). For each statement, participants had to indicate the degree to which they agreed on a 5-point scale; the more satisfactory the participants found the site, the closer to 5 the rating was.
3.3 Results
All of the results are presented in Table 1.
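The reaction time interference score defined in Section 3.2 is a simple subtraction of the baseline from the reaction time measured during the search task. The following minimal sketch illustrates that arithmetic; the function name `interference_score` and the reaction times are hypothetical, and the sketch does not reproduce how the Tholos software computes its scores.

```python
from statistics import mean


def interference_score(task_rts_ms, baseline_rts_ms):
    """Mean reaction time during the search task minus mean baseline reaction time (ms)."""
    return mean(task_rts_ms) - mean(baseline_rts_ms)


# Hypothetical reaction times (in ms) for one participant, for illustration only.
baseline_rts = [300, 310, 290]   # training phase, no concurrent search task
task_rts = [450, 430, 470]       # pedal responses while searching for information

score = interference_score(task_rts, baseline_rts)
print(score)  # a larger score indicates more cognitive resources devoted to the search
```

Subtracting the baseline removes individual differences in simple reaction speed, so that the remaining delay can be attributed to the concurrent search task.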
Statistical analyses (ANOVAs) were conducted with Age (younger vs older users) and Site (ES vs NES) as factors. The analyses considered, in the following order, search time and number of steps (§3.3.1), cognitive load (§3.3.2) and usability satisfaction (§3.3.3).
3.3.1 Time and Number of Steps Necessary to Find Targeted Information
The time necessary to find targeted information (in sec.) was calculated from the moment the participant saw the homepage to the moment s/he said s/he had found the information. All of the participants succeeded in finding the three correct answers. The younger users needed significantly less time to find information than the older users (F(1,36)=11.151; p<.002): on average, 8.2 sec. and 11.56 sec., respectively. The ES users needed less time than the NES users (on average 8.28 sec. vs 11.49 sec., respectively; F(1,36)=10.149; p<.003). The Age*Site interaction is not significant. The post-hoc analyses show that the younger users found targeted information more quickly when using the ES than the NES (p<.006), whereas the older users did not (see Table 1). In addition, the younger users found information more quickly than the older users when navigating the ES (p<.005), whereas there is no significant difference between the younger and the older users when navigating the NES.
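The main-effect means reported in §3.3.1 can be related to the cell means of Table 1. The short sketch below averages the search-time cell means over each factor level; it assumes equal group sizes in the four Age × Site cells (an assumption, although one consistent with forty participants and F(1,36)), and the helper name `marginal_mean` is hypothetical.

```python
# Cell means for search time (in sec.), copied from Table 1.
cell_means = {
    ("younger", "ES"): 6.12,
    ("older", "ES"): 10.44,
    ("younger", "NES"): 10.28,
    ("older", "NES"): 12.68,
}


def marginal_mean(level, position):
    """Average the cell means that share one factor level (equal cell sizes assumed).

    position 0 selects the Age factor, position 1 the Site factor.
    """
    values = [m for key, m in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


for age in ("younger", "older"):
    print(age, round(marginal_mean(age, 0), 2))
for site in ("ES", "NES"):
    print(site, round(marginal_mean(site, 1), 2))
```

Under the equal-cell-size assumption, the averages agree with the means given in the text for the younger users, the older users and the ES; the NES average differs from the reported 11.49 sec. by about 0.01 sec., which plausibly reflects rounding of the cell means.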
Table 1. Means (and standard errors of the mean, SEM) of time and steps to find information, cognitive load, and usability satisfaction ratings according to the users' age and the ergonomic quality of the website

                                                   Ergonomic Site (ES)              Non-ergonomic Site (NES)
Measure                                            Younger users   Older users     Younger users   Older users
Mean time (and SEM) to find information (in sec.)  6.12 (0.14)     10.44 (1.04)    10.28 (1.37)    12.68 (1.02)
Mean number of steps (and SEM)                     2.1 (0.07)      2.27            2.23            2.85
Cognitive load (in ms)                             120 (17.21)     145.4 (19.68)   187.9 (9.46)    420.4 (91.95)
Mean rating (and SEM) out of 5 on the usability
satisfaction questionnaire                         3.79 (0.19)     3.67            2.83            2.85

SEM values for the remaining six cells (number of steps and satisfaction ratings), whose cell assignment is not recoverable, in extraction order: (0.14), (0.2), (0.1), (0.25), (0.27), (0.27)
Recall that the optimal number of steps was the same for the two sites (2 steps). Nevertheless, the younger users made significantly fewer steps than the older users (F(1,36)=5.709; p<.03): on average, 2.17 steps vs 2.56 steps, respectively. Moreover, the ES users made significantly fewer steps (on average 2.18) than the NES users (on average 2.54) (F(1,36)=4.778; p<.04). Post-hoc analyses show that the older users made significantly more steps than the younger users only when navigating the NES (p<.02). The older users made significantly fewer steps with the ES than with the NES (p<.02), whereas there is no significant difference due to the site for the younger users (see Table 1).
3.3.2 Cognitive Load
Recall that cognitive resources were measured using the Tholos software (for details, see [5]), which makes it possible to determine reaction time interference scores (RT) in milliseconds. The RT of the older users were significantly higher than the RT of the younger users (F(1,36)=7.207; p<.02): on average, 280.9 ms vs 153.95 ms, respectively. The ES generated lower RT than the NES (F(1,36)=12.74; p<.001): on average, 132.7 ms vs 302.15 ms, respectively. The interaction between Age and Site is significant (F(1,36)=4.647; p<.04): the older users involved fewer cognitive resources when navigating the ES than the NES (p<.0003), whereas there is no significant difference in the younger group. In addition, the younger users involved fewer cognitive resources than the older users when navigating the NES (p<.002), but there is no significant difference between the younger and the older users when using the ES (see Table 1).
3.3.3 Usability Satisfaction for the Two Web Sites
At the end of the experiment, the questionnaire presented seventeen statements. For each statement, participants had to indicate the extent to which they agreed on a 5-point scale. The more satisfied a participant was, the closer to 5 the rating was.
There is no significant effect of Age (on average, 3.26 for the older users and 3.31 for the younger users). In contrast, the website used by participants has a significant effect (F(1,36)=14.596; p<.0005; see Table 1): the users were more satisfied when navigating the ES (on average 3.73) than the NES (2.84).
4 Discussion and Conclusion
This experiment aimed at determining the effects of the ergonomic quality of the website on the information search activity of younger and older novice users. The results show that older web users needed more time than younger users to find target information. This finding corroborates the most widespread theories of cognitive aging, which suggest a generalized age-related decrease in the speed of executing cognitive processes, independently of the type or structure of the information being processed [28]. Thus, older adults need more time than younger adults to perform the same task (problem-solving, memorization, etc.). More precisely, and surprisingly, this age-related difference was significant only when participants navigated the ergonomic site. Although this website was user-friendly and fit the users' cognitive capacities, the older adults found the targeted information more slowly than the younger users. Moreover, the younger users found information more quickly when navigating the ergonomic site than the non-ergonomic site, whereas the time to find information did not vary significantly between the two websites for the older users. So, when the site contains numerous ergonomic violations, the younger adults need more time to process all the information presented in the layout and to select the most appropriate information for their task. In contrast, no significant difference appeared between the two sites for the older users. Consequently, the younger users would benefit from the ergonomic quality, contrary to the older users.
Concerning the number of steps needed to find targeted information, the younger users made fewer steps than the older users. This finding is in accordance with those obtained by Kubeck et al. [15] with younger and older novice web users. In their study, older participants performed more actions than younger participants to achieve similar performance. These results may reflect age-related limitations of the cognitive resources available in working memory [27]. Given both the limitations of cognitive resources and the decrease in the speed of executing cognitive processes, older adults may forget certain information, such as the web pages visited, previous results and the general goal of the information search. Consequently, older users may have to go back more often than younger users, and thus make more steps. This explanation is in line with the one suggested by Rouet et al. [26]: older adults may experience more difficulties in managing a set of goals (links), because of the decrease in working memory capacity. The results also show that the ergonomic website required fewer steps than the non-ergonomic one to find information. These findings corroborate the results on the time needed to find information. However, a significant effect of age appeared for the site containing ergonomic violations: the older users made more steps than the younger users to find information. The younger users seemed less disturbed by the ergonomic violations than the older users; the latter needed to open more web pages than the former. Because of an age-related decline in attentional inhibitory control over the contents of working memory [12], [18], older adults may experience more difficulties than younger adults in suppressing irrelevant (i.e. distracting) information and in not focusing attention on it. In line with this, our results show that the older users made more steps with the non-ergonomic site than with the ergonomic site, whereas no significant difference appeared in the younger group. Moreover, no significant age-related difference appeared in the number of steps made in the ergonomic site.
The results on cognitive load confirm these findings: the older users involved more cognitive resources than the younger users, and the non-ergonomic site also required a higher cognitive load than the ergonomic site. The interaction effect shows that the older users' cognitive load was particularly high while navigating the non-ergonomic site, whereas no significant difference appeared between the two websites in the younger group. Therefore, as just indicated, the non-ergonomic site may generate more difficulties for the older users than for the younger users: the older users may experience difficulties in selecting the relevant information with regard to the goal of the task to achieve. Because of a decline in inhibitory mechanisms, the older users may process more information than the younger users, so they need more time, more steps and more cognitive resources to find the target information. According to Sweller's theory [31], we can also suggest that the older users would be more sensitive than the younger users to the extraneous cognitive load (the load due to the ergonomic violations in the non-ergonomic site). Consequently, the older users would experience more difficulties than the younger users in inhibiting irrelevant information displayed on the non-ergonomic website. Finally, the younger and older users were more satisfied with the ergonomic site than with the non-ergonomic site; this finding is in line with the performance findings.
To conclude, based on these findings, we argue that a better understanding of the differences between younger and older users in searching for information is required, not only within a specific website but also on the Web with search engine tools. In addition, it seems necessary to help designers develop a user-centered design activity. At present, research is being conducted to help web designers better understand and consider future users' needs while designing websites [8]. Complementary research is therefore required to help designers focus on the specific needs of older users.
References
1. Armbruster, B.B., Armstrong, J.O.: Locating information in text: A focus on children in the elementary grades. Contemporary Educational Psychology 18, 139–161 (1993)
2. Aula, A.: Older adults' use of Web and search engines. Universal Access in the Information Society 4, 67–81 (2005)
3. Baddeley, A.: Working memory. Clarendon Press, Oxford (1986)
4. Bhatt, G.: Bringing virtual reality for commercial Web sites. International Journal of Human Computer Studies 60, 1–15 (2004)
5. Cegarra, J., Chevalier, A.: Using Tholos software for combining measures of cognitive load: towards theoretical and methodological improvements (in revision)
6. Charness, N., Schumann, S.E., Boritz, G.M.: Training older adults in word processing: effects of age, training technique, and computer anxiety. International Journal of Technology and Aging 5, 79–106 (1992)
7. Chevalier, A.: Evaluer un site Web: les concepteurs et les utilisateurs parviennent-ils à identifier les problèmes d'utilisabilité? Revue d'Intelligence Artificielle 19, 319–338 (2005)
8. Chevalier, A., Fouquereau, N., Vanderdonckt, J.: The Influence of a Knowledge-Based System On the Designers' Cognitive Activities: A Study involving Professional Web Designers. Behaviour & Information Technology (in press)
9. Dreher, M.J.: Searching for information in textbooks. Journal of Reading 35, 364–371 (1992)
10. Gwizdka, J., Spence, I.: Implicit Measures of Lostness and Success in Web Navigation. Interacting with Computers (in press)
11. Guthrie, J.T.: Locating information in documents: examination of a cognitive model. Reading Research Quarterly 23, 178–199 (1988)
12. Hasher, L., Zacks, R.T.: Working memory, comprehension, and aging: A review and a new view? In: Bower, G.H. (ed.) The psychology of learning and motivation, pp. 193–225. Academic Press, San Diego (1988)
13. Hölscher, C., Strube, G.: Web search behavior of Internet experts and newbies. Computer Networks 33, 337–346 (2000)
14. Kirakowski, J., Claridge, N., Whitehand, R.: Human Centered Measures of Success in Web Site Design. In: Proceedings of the Human Factors and the Web Workshop. Basking Ridge (1998)
15. Kubeck, J.E., Miller-Albrecht, S.A., Murphy, M.: Finding information on the world wide web: exploring older adults' exploration. Educational Gerontology 25, 167–183 (1999)
16. Ling, J., van Schaik, P.: The influence of font type and line length on visual search and information retrieval in web pages. International Journal of Human-Computer Studies 64, 395–404 (2006)
17. Lowe, G.S.: Computer literacy. Canadian Social Trends 19, 13–15 (1990)
18. Lustig, C., Hasher, L., Tonev, S.T.: Inhibitory control over the present and the past. European Journal of Cognitive Psychology 13, 107–122 (2001)
19. Marchionini, G., Dwiggins, S., Katz, A., Lin, X.: Information seeking in full-text end-user-oriented search systems: the roles of domain and search expertise. Library & Information Science Research 15, 35–69 (1993)
20. Marquié, J.C., Jourdan-Boddaert, L., Huet, N.: Do older adults underestimate their actual computer knowledge? Behaviour & Information Technology 21, 273–280 (2002)
21. Morrell, R.W., Mayhorn, C.B., Bennett, J.: A survey of World Wide Web use in middle-aged and older adults. Human Factors 42, 175–182 (2000)
22. van Merriënboer, J.J.G., Sweller, J.: Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review 17, 147–177 (2005)
23. Nielsen, J.: Designing Web Usability. New Riders Publishing, Indianapolis (2000)
24. Nielsen, J.: Web Usability for Senior Citizens: 46 Design Guidelines Based on Usability Studies with People Age 65 and Older. Nielsen Norman Group Report (2002)
25. Rouet, J.F., Tricot, A.: Task and activity models in hypertext usage. In: van Oostendorp, H., de Mul, S. (eds.): Cognitive aspects of electronic text processing. Ablex, Norwood, pp. 239–264 (1996)
26. Rouet, J.F., Ros, C., Jégou, G., Metta, S.: Chercher des informations dans les menus WEB: interaction entre tâche, type de menu et variables individuelles. Le Travail Humain 4, 379–395 (2004)
27. Salthouse, T.A.: Working memory as a processing resource in cognitive aging. Developmental Review 10, 101–124 (1990)
28. Salthouse, T.A.: The processing-speed theory of adult age differences in cognition. Psychological Review 103, 403–428 (1996)
29. Slone, D.J.: Internet search approaches: The influence of age, search goals, and experience. Library & Information Science Research 25, 403–418 (2003)
30. Smith, P.A.: Towards a practical measure of hypertext usability. Interacting with Computers 4, 365–381 (1996)
31. Sweller, J.: Cognitive load during problem solving: Effects on learning. Cognitive Science 12, 257–285 (1988)
32. Shneiderman, B., Byrd, D., Croft, B.: Sorting out searching: a user-interface framework for text searches. Communications of the ACM 41, 95–98 (1998)
33. Stronge, A.J., Rogers, W.A., Fisk, A.D.: Web-based information search and retrieval: Effects of strategy use and age on search success. Human Factors 48, 443–446 (2006)
34. Tricot, A., Rouet, J.-F.: Activités de navigation dans les systèmes d'information. In: Hoc, J.M., Darses, F. (eds.): Psychologie ergonomique: tendances actuelles. PUF, Paris (2004)
35. Zacks, R.T., Hasher, L., Li, K.Z.H.: Human memory. In: Salthouse, T.A., Craik, F.I.M. (eds.): Handbook of Aging and Cognition (2nd Edition). Lawrence Erlbaum, Mahwah, pp. 293–357
Creating Kansei Engineering-Based Ontology for Annotating and Archiving Photos Database
Yu-Liang Chi¹, Shu-Yun Peng², and Ching-Chow Yang²
¹ Dept. of Management Information System, Chung Yuan Christian University, 200 Chung-Pai Rd., Chung-Li (32023), Taiwan
² Dept. of Industry Engineering, Chung Yuan Christian University, 200 Chung-Pai Rd., Chung-Li (32023), Taiwan
[email protected], [email protected], [email protected]
Abstract. An ontology is built to establish a classification and conceptualization of a knowledge discipline. With the support of ontology technologies, users can retrieve information in a semantic manner. A primary task in ontology building is concept development. Typical approaches to constructing concepts are consulting experts or analyzing documents. However, ontology-based systems usually do not allow user involvement during ontology development. To acquire expertise from users, this study uses Kansei Engineering to translate human emotions, such as the perception, feeling, or impression of things, into the design elements of ontology concepts. The resulting ontology design then rests on a user-centric conceptual structure. This study is particularly interested in archiving photos by employing an ontology built with user involvement. Empirical lessons show that user involvement can reduce the gap between experts and users in defining concepts.
Keywords: Ontology, Knowledge, Affective Design, Kansei Engineering.
"a specification of a conceptualization" [7]. A conceptualization is an abstract, simplified view of the world that is used for representational purposes. That is, the ontology is a formal description of the concepts, attributes, and relationships involved in constructing common understanding for cognitions of real world events. One of the main utilizations of ontology-based applications is addressing semantic differential among physical expressions. Thus, this study employed the ontology technology in metadata to improve semantic representation. The typical strategy of ontology development was collecting abstract expertise from knowledgeable persons. The expertise was then translated into a concrete conceptual structure of ontology. Nowadays, ontology technologies were widely applied in various areas. In most cases, ontologies relied on experts-centric development procedures including concepts collection, formalization, and construction. Since users rarely participated in ontology building, they had to learn domain expertise before using the systems. Thus, current knowledge systems were far away from public needs. To facilitate user involvement during ontology building, approaches to translate user emotional feeling into design elements of ontology-based metadata schema were essential. This study employed Kansei Engineering in knowledge acquisition by following steps: gathering users’ intuition, merging similar intuitions to a perception, and concluding related perceptions to a conception. This paper is organized as follows. Section 2 describes the specific problem domain. We discuss the reasons of why user involvement is important in ontology building. Section 3 proposes a research design of the ontology-based system; three stages are designed to guide system development. Section 4 is detail designs of user involvement; the Kansei engineering is employed to gather user emotions and translate into ontology concepts. 
Section 5 discusses knowledge representation and query interface design; we propose two steps query procedures including an affective query interface and a temporal photo query interface. Section 6 concludes results of the paper.
2 Problem Description
In most traditional archiving systems, information management used data-level models such as the entity-relationship (ER) model to represent content structure or to design indexing schemas. Research has shown that, because of limitations in expressing dependencies, constraints, generalization, and so on, data models are insufficient to support rich semantics for in-depth applications [8]. The critical issue is that data-level models provide only text-based search functionality. Some systems claim to enable semantic search using synonym dictionaries. However, semantic issues involve not only synonyms but also in-depth implications and semantic divergence. For example, the plant Alocasia odora implies membership of the family Araceae, the order Alismatales, the class Liliopsida, the phylum Magnoliophyta, and finally the kingdom Plantae. That is, Alocasia odora inherits all features from its superior layers. Semantic divergence also often arises when a vocabulary is used in different domains. For example, ‘mouse’ might name an animal, a computer peripheral, or even a cartoon character. Distinct application domains clearly carry different cognition and knowledge.
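The implication chain in the Alocasia odora example can be illustrated with a short sketch. This is only a hedged illustration in Python, not part of the system described in this paper: the `PARENT` table and the functions `ancestors` and `matches` are hypothetical names, and the parent links are taken directly from the example above.

```python
# Hypothetical parent links copied from the Alocasia odora example in the text.
PARENT = {
    "Alocasia odora": "Araceae",
    "Araceae": "Alismatales",
    "Alismatales": "Liliopsida",
    "Liliopsida": "Magnoliophyta",
    "Magnoliophyta": "Plantae",
}


def ancestors(term, parent=PARENT):
    """Return every broader concept implied by `term`, nearest first."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def matches(term, query, parent=PARENT):
    """A term satisfies a query if it is the query itself or one of its descendants."""
    return term == query or query in ancestors(term, parent)


if __name__ == "__main__":
    print(ancestors("Alocasia odora"))
    print(matches("Alocasia odora", "Liliopsida"))
```

A plain text search for "Liliopsida" would miss a record labeled only "Alocasia odora", whereas `matches` finds it by walking the hierarchy. This is the kind of in-depth implication that a synonym dictionary alone does not capture.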
On the other hand, semantic-level models provide rich data representation types to describe the real world. Evidence in the literature suggests using semantic-level models for better and more flexible information system solutions [11][12]. The ontology approach is one of the semantic representation techniques; an ontology is built to establish a classification and conceptualization within a knowledge discipline. With the support of ontology technologies, users can retrieve information in a semantic manner. As shown in Fig. 1, ontological system development can be roughly divided into two phases: ontology engineering and inference applications. Ontology building is a set of engineering processes that comprise knowledge acquisition and knowledge representation. The principal task of expertise acquisition is creating concepts and organizing them into a hierarchical structure.
Fig. 1. Ontology-based system development includes ontology engineering and inference developing
Most ontology developers would agree that building a proper conceptual structure for an applied domain is challenging. Knowledge acquisition typically collects expertise from experts and knowledge engineers, or elicits knowledge from documents [3]. In most cases, however, ordinary users can hardly use the resulting knowledge systems because the systems were built by professionals. Ontology development depends heavily on the synthesis of three components: problem domains, applied fields, and human perspectives. Thus, similar but different ontologies often result when these components change. For example, if a knowledge system is designed for the public, user involvement must be considered during ontology development. The left-hand process sketched in Fig. 1 is labeled ‘affective design’ and serves as a preprocess of expertise acquisition. This study develops approaches to correlate human perspectives contributed not only by experts but also by ordinary users. To develop an affective ontology-based system, this study is particularly interested in gathering user emotions, such as intuitions, images, and perceptions of things, and translating them into the design elements of concepts. More detailed designs are described in the following sections.
3 Research Design
Ontologies are increasingly applied in industry; however, ontology development approaches are still at an early stage. Several studies have noted that ontology building is more of an art than a science [5]. As shown in Fig. 2, this study is divided into three stages: knowledge acquisition, knowledge representation, and knowledge application.
The first stage deals with constructing a conceptual structure within a specified domain of interest. For example, this study focuses on the vascular plant domain. The main task is acquiring the affective elements of users’ cognitions about a vascular plant. Another issue is how to translate affective elements into ontology concepts. To begin, this study examined the constituent parts of a concept to find room for connecting with affective design. A concept is a common understanding of things. An ontology concept concerns determining which definitions of being are fundamental and under what circumstances they hold. Therefore, a concept can be given a name and can contain several definitions. The affective design of this study is based on the Kansei Engineering (KE) method, which translates human perceptions, feelings, or images into concrete design elements that meet the needs of ontology concepts. Kansei Engineering was originally used as an ergonomic technology to capture consumers' psychological perceptions for product development [10]; the KE approach was revised to fit the needs of our design. More detailed implementations are described in Section 4.
Fig. 2. A design of an affective-aware ontology-based archiving system
The second stage builds an ontology knowledge base. Within ontologies, the knowledge base can be denoted as K = (T, A) [2]. The expression states that a knowledge base (K) is derived from intensional knowledge, the ‘T-box’ (T), and extensional knowledge, the ‘A-box’ (A). The T-box collects conceptual definitions into a terminology module (i.e., a taxonomy), and the A-box collects assertions about individual states into an assertional module, also called assertional knowledge. As the middle sketch in Fig. 2 illustrates, the T-box is the conceptual structure that results from the first stage. An annotation system helps developers describe the meaning of objects in a semantic manner. This study used digital photos of vascular plants as annotation examples.
The last stage implements information retrieval. As the bottom sketch in Fig. 2 illustrates, users retrieve information via a Web-based interface. The inference system is an application integrator based on a reasoning engine that manipulates the ontology knowledge base and then accesses the digital content system. More details are described in Section 5.
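The split of a knowledge base K = (T, A) into a T-box and an A-box can be sketched as a small data structure. The following Python sketch is an illustration under stated assumptions only: the class `KnowledgeBase`, its methods, and the class and photo names (`Fern`, `photo_001`, and so on) are hypothetical and do not come from the ontology built in this study.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # T-box (terminology): class name -> direct superclass.
    tbox: dict = field(default_factory=dict)
    # A-box (assertions): individual name -> asserted class.
    abox: dict = field(default_factory=dict)

    def superclasses(self, cls):
        """Follow T-box links upward and return all superclasses, nearest first."""
        result = []
        while cls in self.tbox:
            cls = self.tbox[cls]
            result.append(cls)
        return result

    def instances_of(self, cls):
        """Individuals asserted in `cls` or in any of its subclasses."""
        return sorted(
            ind
            for ind, asserted in self.abox.items()
            if asserted == cls or cls in self.superclasses(asserted)
        )


# Hypothetical taxonomy and annotated photos, for illustration only.
kb = KnowledgeBase(
    tbox={"VascularPlant": "Thing", "Fern": "VascularPlant", "SeedPlant": "VascularPlant"},
    abox={"photo_001": "Fern", "photo_002": "SeedPlant"},
)

if __name__ == "__main__":
    print(kb.instances_of("VascularPlant"))
```

Keeping the two parts separate mirrors the stages above: the T-box is fixed once the conceptual structure is agreed, while the A-box grows as each photo is annotated.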
4 User Involvement in Expertise Acquisition
To develop an emotion-aware ontology, expertise acquisition requires experts and users to work together. As illustrated at the start of Fig. 3, the domain concerns how to identify a vascular plant through users’ emotional descriptions. The affective-based ontology is proposed as an emotion-enabled repository that supports knowledge sharing and reuse across different applications. To do so, affective ontology building must capture semantic and property sets that can then be used to define the conceptual structure. A property set describes features of the domain objects. For example, plant properties may include the root, stem, flower, and so on. Domain experts contribute their expertise to the property set in this stage. A semantic set consists of vocabularies that express human emotions about specific properties. For example, vocabularies such as tufted, xylem, and stingy can be used to describe a plant stem. The Kansei Engineering process was implemented to collect, refine, and group user cognitions into critical vocabularies. User emotions are gathered in this stage. To link the semantic set and the property set, statistical methods were employed to produce Kansei words. ‘Kansei’ is a Japanese word meaning the psychological feeling or experience that people have in their minds [10]. For example, imagine we are going to eat a bowl of ice cream. Some people might feel tastiness and joy, while others might think of fattening. These emotions can be grouped as an abstract concept related to ‘ice cream’. The final stage identifies concept definitions and a conceptual structure. Knowledge engineers used analysis tools such as formal concept analysis (FCA) to make a prototype of the ontology.
Fig. 3. Knowledge acquisition and modeling process
To gather plant properties, several resources such as botany books, encyclopedias, and the Internet were consulted. Two botanists were invited to determine general plant properties, combine similar properties into categories, and identify the critical physical parts of plants. Eight parts of the vascular plant were finally determined. Building the semantic set, on the other hand, was much more complex because it required gathering emotions. The steps are described below:
1. Emotional vocabulary collection: Most emotional vocabularies were adjectives. Several sources such as magazines, online articles, dialogue, and even slang were consulted. Three assistants carried out the collection task over one week. More than 334 words were collected.
2. Data cleaning and grouping: To reduce the number of vocabularies, we first deleted infrequent vocabularies manually. Then, an affinity diagram tool was employed to assemble vocabularies with similar semantics and construct their natural relationships. The affinity diagram was originally used to organize brainstorming output into useful ideas. This study used it to sort emotional vocabularies into naturally related groups. 96 vocabularies were grouped and then formed into bipolar pairs, as illustrated in Table 1.
Table 1. Examples of partial bipolar vocabularies
Big --- Small
Shout --- Slender
Wide --- Narrow
Decorative --- Practical
Wild --- Plant
…
3. User survey: To understand the semantic differential (SD) of users when they apply the above affective vocabularies to describe a plant, a questionnaire survey was developed. The SD questionnaire measures human reactions to stimulus cognitions. Participants were required to rate bipolar vocabularies defined with opposite constructs at each end. A pre-questionnaire was completed by 15 participants. Some vocabularies were removed if their ratings were neutral. A revised questionnaire survey conducted via the Internet received valid responses from 573 people out of 812 polled in two weeks.
4. Data analysis: The results were analyzed using the statistical tool SPSS. This study used two functions, factor analysis and cluster analysis, to find the ‘Kansei’ words. Fourteen clusters were identified and then labeled as the Kansei words.
The synthesis stage used formal concept analysis (FCA) to recognize relationships between Kansei words and properties. A formal context in FCA is a triple (G, M, I), where G is a set of objects, M is a set of attributes, and I is a binary relation between G and M [13]. Here, G refers to the parts of a plant and M refers to the Kansei words. The context is usually represented by a context lattice table. As illustrated in Fig. 4, Kansei words are listed in the top row and features are arranged in the left column. A cross symbol indicates that a specific feature has the corresponding Kansei word. The analysis was implemented using the FCA tool Galicia. An interactive system asked developers about each uncertain implication in the context. Users then had to either confirm that the implication was always true or reject it by supplying a counterexample from the existing cases. This counterexample was then added to the formal context. The program stops once every uncertain implication of the context has been confirmed as valid in the universe.
Fig. 4. A context lattice table example
From the ontology design viewpoint, the items listed in both the top row and the left column are all concepts or classes. The cross symbol describes a relation between two concepts. For example, the concept stem has filler classes such as color, quantity, size, shape, and so on. The FCA line diagram functionality can further be used to draw the conceptual structure of the ontology.
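To make the formal context (G, M, I) concrete, the sketch below enumerates the formal concepts of a tiny cross table by brute force. It is a hedged Python illustration, not the Galicia tool used in this study: the context `CONTEXT` is invented for the example (plant parts as objects, a few adjectives as attributes), the function names are hypothetical, and closing every subset of objects is practical only for very small tables.

```python
from itertools import combinations

# Hypothetical context: objects G are plant parts, attributes M are Kansei-style words.
# Each entry lists the attributes that carry a cross in that object's row.
CONTEXT = {
    "stem": {"slender", "wild"},
    "leaf": {"wide", "decorative"},
    "flower": {"decorative", "slender"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    if not objects:
        return set().union(*context.values())
    return set.intersection(*(context[g] for g in objects))


def common_objects(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return {g for g, m in context.items() if attrs <= m}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    found = set()
    objs = sorted(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            found.add((frozenset(extent), frozenset(intent)))
    return found


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each resulting pair is a candidate ontology concept: the extent lists the plant parts it covers and the intent lists the words that define it. Ordering the pairs by extent inclusion gives the line diagram mentioned above.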
5 Knowledge Representation and Query System Design
Ontologies are used to model domain knowledge of interest. Several XML-based ontology representation languages such as RDF, DAML, and OWL provide different schemas and capabilities [9]. This study used the Web Ontology Language (OWL), which was the newest ontology language from the World Wide Web Consortium (W3C). An OWL-based ontology consists of classes, properties, and individuals. The Protégé OWL editor was employed to create the OWL ontology. An OWL class is the same as an ontology concept and contains formal definitions. An OWL property is used to link classes. A typical definition is a formula consisting of properties and filler classes that limit the scope of a class.
Fig. 5. An affective query interface allows the user to enter an emotional query
Moreover, description logic (DL) symbols can be added as modifiers to further restrict the definition. For example, an asserted definition of the class stem can be expressed as ∃hasShape Shape hasShape Size. ‘owl:Thing’ is the root class of an OWL ontology. The root class connects subclasses to create a hierarchical class structure, also known as the ‘T-box’ of a knowledge base. An OWL individual is also known as an instance of a class. In the annotation stage, developers filled in the values of each property according to the real circumstances of an individual. These asserted individuals are also known as the ‘A-box’. In total, 223 asserted individuals of vascular plants were inserted.
The major improvement of this study is bringing affective design into the conceptual structure of the ontology. Since the data repository employs an ontology as its knowledge base, the system can support the development of an affective user interface. In many cases, people do not really believe something unless they can see with their own eyes that it is true. Therefore, the system first retrieves possible items represented as photos. After the user identifies the target, a second query is sent to the server to get the corresponding information. This study designed a two-stage information retrieval system that supports an affective query interface and a temporal photo query interface.
Fig. 6. Information corresponding to a vascular plant is retrieved by clicking a photo
Here, suppose that a user wants to query a vascular plant without knowing the plant name. Information retrieval is implemented as the following two-stage process:
● As shown in Fig. 5, users follow the affective directions to select concepts that match their impressions of the search target. After the query is submitted, the knowledge system invokes a reasoner to infer possible answers.
● The qualifying items are presented as photos on a temporal Web page. The user identifies the plant by clicking the appropriate photo. As shown in Fig. 6, more detailed corresponding information is then retrieved and presented to the user.
This study developed an information retrieval mechanism based on a Java-based reasoner. The programmed mechanisms use Pellet and the Jena API to perform inference over OWL ontology bases. With the help of JavaServer Pages, this knowledge system can be used over the Internet.
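The two-stage process can be sketched in a few lines. The Python sketch below is an illustration under strong assumptions: a plain dictionary filter stands in for the Pellet and Jena reasoning used in the actual system, and the records in `PLANTS`, the file names, and the functions `affective_query` and `photo_query` are all hypothetical.

```python
# Hypothetical annotated individuals: plant -> photo plus affective words per plant part.
PLANTS = {
    "plant_a": {"photo": "a.jpg", "stem": {"slender"}, "leaf": {"wide"}},
    "plant_b": {"photo": "b.jpg", "stem": {"slender"}, "leaf": {"narrow"}},
    "plant_c": {"photo": "c.jpg", "stem": {"stout"}, "leaf": {"wide"}},
}


def affective_query(impressions, plants=PLANTS):
    """Stage 1: return (name, photo) for plants whose annotations match every impression."""
    hits = []
    for name, record in sorted(plants.items()):
        if all(word in record.get(part, set()) for part, word in impressions.items()):
            hits.append((name, record["photo"]))
    return hits


def photo_query(name, plants=PLANTS):
    """Stage 2: return the full record of the plant whose photo the user clicked."""
    return plants[name]


if __name__ == "__main__":
    candidates = affective_query({"stem": "slender"})
    print(candidates)                      # photos shown on the temporary page
    print(photo_query(candidates[0][0]))   # details after the user clicks one photo
```

Returning only photos in the first stage keeps the candidate page light, and the detailed record is fetched only for the plant that the user confirms by sight.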
6 Conclusion
The advantages of ontology-based systems include rich semantics, logic expressions, and knowledge sharing. Generally, ontology-based system development involves several knowledge engineering processes, including expertise collection, knowledge modeling, conceptual structure building, and inference mechanisms. In traditional ontology building, knowledge acquisition is carried out by knowledgeable persons with specialized domain expertise. However, it is doubtful that users and system designers share identical cognitions and viewpoints. In particular, expertise is the basis of ontology building and directly affects system performance. As a result, users may find expert-centered knowledge systems difficult to use. Thus, a crucial problem underlying ontology building is that users’ needs are often ignored. This study developed an approach that incorporates user involvement into the design of concepts. Kansei Engineering was introduced to gather user emotions and translate them into design elements of concepts. Several techniques were used to support expertise acquisition, namely a questionnaire test, an affinity diagram, and formal concept analysis. The empirical lessons related to user involvement are as follows. First, Kansei words are the primary summaries of user emotions about the domain of interest. These words can be used as ontology concepts. Second, formal concept analysis can be used to elicit relationships between affective words and properties. Both affective words and properties should be treated as concepts during ontology building. Consequently, user involvement during ontology development is necessary when application systems are designed for the public. Future work should develop more efficient ways to collect user emotions.
Acknowledgments. The authors would like to thank the National Science Council of the Republic of China, Taiwan for financially supporting this research under Contract No. NSC 95-2416-H-033-009.
References
1. Arora, J.: Network-enabled digitized collection at the Central library, IIT Delhi. Intl. Info. and Libr. Review 36, 1–11 (2004)
2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.: The description logic handbook. Cambridge University Press, Cambridge, UK (2003)
3. Chi, Y.-L.: Elicitation synergy of extracting conceptual tags and hierarchies in textual document. Expert Sys. with Apps. 32, 349–357 (2007)
4. Chi, Y.-L., Hsu, T.-Y., Yang, W.-P.: Ontological techniques for reuse and sharing knowledge in a digital museum. The Electronic Libr. 24, 147–159 (2006)
5. Compton, P., Jansen, R.: A philosophical basis for knowledge acquisition. Knowledge Acquisition 2, 241–258 (1990)
6. El-Sherbini, M.: Metadata and the future of cataloging. Library Computing 19, 180–191 (2000)
7. Gruber, T.R.: Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human-Computer Studies 43, 907–928 (1995)
8. Hull, R., King, R.: Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Computing Surveys 19, 201–260 (1987)
9. Horrocks, I., Patel-Schneider, P.F.: Reducing OWL entailment to description logic satisfiability. J. of Web Semantics 1, 345–357 (2004)
10. Nagamachi, M.: Kansei Engineering: A new ergonomic consumer-oriented technology for product development. Int. J. of Industrial Ergonomics 15, 3–11 (1995)
11. Sugumaran, V., Storey, V.C.: Ontologies for conceptual modeling: their creation, use, and management. Data and Knowledge Engineering 42, 251–271 (2002)
12. Uschold, M., Grueninger, M.: Ontologies: principles, methods and applications. Knowledge Eng. Review 11, 93–155 (1996)
13. Wille, R.: Concept lattices and conceptual knowledge system. Computers and Mathematics with Application 23, 493–515 (1992)
Influence of Avatar Creation on Attitude, Empathy, Presence, and Para-Social Interaction
Donghun Chung1, Brahm Daniel deBuys2, and Chang S. Nam3
1 School of Communication, Kwangwoon University, 447-1 Wolgye-Dong, Nowon-Gu, Seoul, Korea 139-701 [email protected]
2 Department of Communication, University of Arkansas, 417 Kimpel Hall, Fayetteville, AR 72701, U.S.A [email protected]
3 Department of Industrial Engineering, University of Arkansas, 4207 Bell Engineering Center, Fayetteville, AR 72701, U.S.A [email protected]
Abstract. The present paper focuses on the influence of avatar creation in a video game. More specifically, this study investigates the effects of avatar creation on attitude towards the avatar, empathy, presence, and para-social interaction among female non-game users. As a cyber-self, an avatar is a graphic character representing a user in cyberspace. Avatars are primarily used in the entertainment industry as high-tech novelties, controlled by game users, for high-end video games. Some games provide default game characters that users cannot change, while other games provide various options that gamers can choose from. What if game users can create their own avatars? Do they feel psychologically closer to their avatars as their cyber-selves? This study tested for differences in attitude, empathy, presence, and para-social interaction among female non-game users between an avatar creation group and a non-avatar creation group, and found no difference.
Keywords: Avatar, Attitude, Empathy, Presence, Para-Social Interaction, Wii.
Many games also provide users with a variety of avatars and options for creation. Avatars play an important role in gaming because gamers are visually exposed to their avatars. Consciously or unconsciously, gamers interact with their avatars and this interaction directly or indirectly affects entertainment. The role of game character/avatar creation was investigated in a few studies. For instance, Cordova and Lepper [4] found its importance in motivation and engagement in learning, and Lim [5] found that avatar choice leads to greater arousal and identification. In the continuation of those studies, the present research investigated the influence of avatar creation in interactive video games. More specifically, we examined attitude towards avatar, empathy, presence, and para-social interaction as outcomes of avatar creation.
2 Literature Reviews
Little research has examined the role of avatar creation, but identification with game characters has been studied [6] [7] [8]. This research has shown that various gaming situations (such as violent, story-based, and third-person point-of-view (POV) games) increase the level of identification with the characters. Most interestingly, Lim [5] found that avatar choice leads to greater identification. So what other outcomes will avatar creation bring about?
One possible variable is attitude. Many variables influence attitude formation. Attitude formation refers to the transition from having no attitude towards a given object to having some attitude towards it, either positive or negative [9]. Many researchers have examined cognitive processes to identify the factors that determine attitude formation. As a predisposition, an attitude is formed when people have feelings toward an object, so personal experience is one of the most important factors. Indeed, according to Breer and Locke’s task-experience theory [10], when someone works to achieve a goal, the nature of the task (easy or hard), the operations (individual or collective), and the outcome (success or failure) shape attitudes. When gamers create their avatars, they have a more unique experience than when they simply receive an assigned avatar. Indeed, if gamers enjoy the time and effort spent creating their avatars, find the process easy, and are satisfied with the result, they are more likely to have a positive attitude. Therefore, we hypothesized that:
H1. Gamers who create their own avatar will have a more positive attitude towards it than gamers who receive an avatar by default.
Tamborini and his colleagues [11] state that there are nearly as many definitions of empathy as there are individuals attempting to study it. However, as a multidimensional construct, empathy has been examined in affective and cognitive dimensions. The cognitive dimension is a process by which we imaginatively place ourselves in another person’s situation, as in perspective taking and fictional involvement. The affective dimension is a process associated with a tendency to experience strong emotional reactions to another person’s pain or misfortune, as in empathic concern and emotional contagion. Zillmann [12] noted that a viewer’s affective response to a media message depends on the veridicality of the portrayal of the circumstances that fostered the emotions of a character on screen.
Cummins’ [13] interpretation is that the audio-visual presentations characteristic of contemporary electronic media have the potential to generate the greatest sense of veridicality and thus have great potential for eliciting affective responses. From this perspective, people who create their own avatar should have a greater sense of veridicality, because they create it based on their own realistic ideas, and therefore greater affective responses. Therefore, we hypothesized that:
H2. Gamers who create their own avatar will have greater empathy than gamers who receive an avatar by default.
Scholars in various fields take different approaches to understanding presence. To explicate the concept, the International Society for Presence Research (ISPR), a community of scholars interested in the presence concept, has revised a conceptual definition of presence. According to ISPR [14], presence is defined as a psychological state or subjective perception in which, even though part or all of an individual's current experience is generated by and/or filtered through human-made technology, part or all of the individual's perception fails to accurately acknowledge the role of the technology in the experience. Lombard and Ditton [15] pointed out that presence is the perception of non-mediation. In the early stage of conceptualizing presence, Witmer and Singer [16] noted that involvement and immersion are necessary for experiencing presence. They defined involvement and immersion as similar psychological processes. Involvement is defined as a psychological state experienced as a consequence of focusing one’s energy and attention on a coherent set of stimuli or meaningfully related activities and events. Immersion, on the other hand, is characterized by perceiving oneself to be enveloped by, included in, and able to interact with an environment that provides a continuous stream of stimuli and experiences. The two are distinguished from one another in that involvement depends on focusing one’s attention and energy on a coherent set of stimuli, while immersion depends on perceiving oneself as a part of the stimulus flow. Witmer and Singer propose that a valid measure of presence should address factors that influence involvement as well as those that affect immersion. So what is the role of avatars in presence? Using a two-way ANOVA with avatar choice and visual point of view (POV) as factors, Lim [5] found that avatar choice leads to a greater sense of presence in the third-person point of view. Therefore, we hypothesized that:
H3. Gamers who create their own avatar will have a greater sense of presence than gamers who receive an avatar by default.
Horton and Wohl [17] conceptualized para-social interaction as the imaginary one-way relationship that viewers develop with people on television. Scholars have examined (imaginary) relationships between reporters and viewers, anchors and viewers, TV characters and viewers, and so on. Recently, Klimmt and his colleague [18] examined how people perceive their avatars as interaction partners from the para-social interaction perspective. The basic concept of para-social interaction is that media users psychologically interact with characters appearing on-screen, and much of the literature has shown that para-social interaction develops through frequent exposure. Therefore, if people are more exposed to their avatars by creating their own, the avatar creation group should have a greater sense of para-social interaction. Therefore, we hypothesized that:
H4. Gamers who create their own avatar will have a greater sense of para-social interaction than gamers who receive an avatar by default.
3 Console Games In the world of next-generation home gaming systems, the emphasis has been on increasing graphical capabilities and processing speed. The XBOX 360 was the first next-generation system to be released to the public (2005). It boasts a dual-layer DVD-ROM that enables HD quality graphics (http://www.gamespot.com/features/ 6125087/index.html?type=tech). Its processing speed is 3.2 GHz. The PlayStation 3 (PS3) was released at the end of 2006 and is struggling to catch up to the commercial success of the 360. It has a Blu-ray BD-ROM, which also enables HD quality graphics. Its processing speed is also 3.2 GHz, though with a different processor than the 360. Both the PS3 and XBOX 360 can play DVDs. In an attempt to get back into the highly competitive video game market, Nintendo realized that they couldn’t keep up with the hardware capabilities of the PS3 and the XBOX 360. Nintendo decided to approach home gaming from a different perspective, emphasizing playability and interactivity over enhanced visual capabilities. The Nintendo Wii is what emerged from such an attempt to change the way video games are perceived. The Wii has several significant differences from the other next-gen gaming platforms (wii.nintendo.com). For one thing, it is very small, at 8.5 cm x 6 cm x 2 cm and 3.84 lbs. It also has wireless connectivity. The Wii is backwardscompatible with the previous Nintendo system (the Gamecube) and many of the games from previous Nintendo systems will be available for download. The ability to play older games with limited graphical capabilities and relatively simplistic gaming features is consistent with Nintendo’s focus on gameplay over graphics. The most innovative feature of the Nintendo Wii is the controller, called the Wii Remote. It contains sophisticated motion-sensing technology that enables a variety of gaming functions. You can swing the controller like a tennis racket to play a tennis game. 
You can grab the controller with both hands and steer it like a steering wheel. You can point and shoot in first-person shooters. With an additional controller connected to the Remote, you can box an opponent by engaging in a punching and blocking motion using both hands. The Nunchuk is an additional controller for the other hand that is connected to the Remote. The Nunchuk has an analog joystick to control fine movements, so that motion can be controlled by the left hand while the right hand engages in swinging motions, jabbing movements, or whatever is appropriate for the game. The Nunchuk also contains motion-sensing technology, so controller movements can involve both hands at once. The Remote also contains other features that may contribute to a more immersive experience. It has a rumble feature to supply kinesthetic feedback. The Remote also has a speaker build into the controller itself. Both the Remote and the Nunchuk have an ambidextrous nature that allows for right- or left-handed players to use them with equal facility. The controllers are wireless and interact with the system
using Bluetooth technology via a sensor bar that perches on top of or in front of the television. The sensor bar can pick up motion from up to 30 feet away. However, unlike the XBOX 360 and PS3, the Wii can play only Wii discs and Gamecube discs, and its resolution is limited (480p) compared with the high definition available for the PS3 and XBOX 360 (1080i) (http://reviews.cnet.com/Nintendo_Wii/4540-6464_7-31355104-4.html?tag=sub). All three systems have wireless controllers.
4 Method

4.1 Participants and Procedure

Since many game studies have shown that gender and game experience are significant variables that lead to different outcomes, this research controlled for both factors. Therefore, only female non-game users were eligible for this research. Also, participants could not have a pre-existing avatar because avatar creation is the manipulation in this research. In the end, sixteen female undergraduate students with no game experience were drawn from a few communication classes at a large university. Ages ranged from 19 to 25 and the average age was 21.25 (SD=1.57). All participants filled out a consent form and then entered a game lab which had a TV set, Nintendo Wii, and home theater system. Participants who were randomly assigned to the avatar creation group (the experimental group) created their avatars for 6 minutes. Participants who were assigned to the non-avatar creation group (the control group) simply received a default avatar. Since the game provided 3 different default female avatars, we assigned the third female avatar to all participants in the control group to avoid any internal validity threat. Overall, each group had eight participants.

Many options are available to customize an avatar (known on the Nintendo Wii as a Mii). There are 45 initial random faces to choose from, and then participants can keep choosing from 9 altered versions of the face until “Use this face” is chosen. There are then 9 screens of options to choose from. The initial screen contains Nickname, Favorite (outfit color), Gender, Birthday, Favorite Color (12 options), Mingle (choose whether to let your avatar interact with other avatars), and Mii Creator. The second screen has to do with Body Type. Height and Weight can both be altered along a sliding scale. The third screen has to do with Face. Shape (8 options), Facial Characteristics (12), and Color (6) can all be altered. The fourth screen has to do with Hair.
Style (72), Color (8), and the Side of the Part can be altered. The fifth screen has to do with Eyebrows. Type (24), Color (8), Up/Down Placement, Size, Rotation, and Left/Right Placement can be altered. The sixth screen has to do with Eyes. Type (48), Color (6), Up/Down Placement, Size, Rotation, and Left/Right Placement can be altered. The seventh screen has to do with Nose. Type (12), Up/Down Placement, and Size can be altered. The eighth screen has to do with Mouth. Type (24), Color (3), Up/Down Placement, and Size can be altered. Finally, the last screen has to do with Accessories. Glasses (Type (9), Color (6), Up/Down Placement, and Size), Mustache (Type (4), Color (8), Up/Down Placement, and Size), Mole (Type (2), Up/Down Placement, Size, and Left/Right Placement), and Beard
(Type (4) and Color (8)) can all be altered. Participants in the avatar creation group experienced all these functions and created their own avatars. Once participants in both groups had their avatars, a researcher explained how to play a tennis game. The researcher showed them how to serve, hit a forehand, and hit a backhand. They were asked to play the tennis game for 5 minutes as a training session. All of the lights were turned off and the researcher helped guide them during that time. After this training session, participants played alone for 15 minutes. At the end of the gaming session, participants were given a main questionnaire that asked about attitude towards the avatar, presence, empathy, and para-social interaction.

4.2 Instruments

The participants were questioned about attitude towards the avatar, empathy, presence, and para-social interaction. Attitude towards the avatar had four items which measure participants’ general feeling of favorableness or unfavorableness towards their avatar. A five-point semantic differential scale was employed and all items were retained: I think that my avatar is “useless/useful,” “unimportant/important,” “foolish/wise,” and “unpleasant/pleasant.” The scale was found to be reliable (M=3.36, SD=0.71, α=.72). Empathy was operationally defined as feeling the same way that an observed avatar is feeling, and eight items were newly created, such as “when my avatar was happy, I was
Fig. 1. Overview of the experimental condition
happy,” “my emotional state affected the interaction of my avatar and myself,” etc. Two items were deleted and the measure was reliable (M=2.71, SD=0.73, α=.84). Presence had fourteen items based on Witmer and Singer’s presence measure (1998), which asked about involvement and immersion in the gaming environment. All of the items were newly created and retained. The measure was reliable (M=3.03, SD=0.83, α=.94). Lastly, the para-social interaction measure was revised from Rubin and Perse [19]. Para-social interaction between gamers and game characters was measured by eight items, all of which were retained, and the measure was reliable (M=2.46, SD=0.90, α=.91). Empathy, presence, and para-social interaction used a five-point Likert scale.

The Nintendo Wii was chosen for this research because of its greater interactivity compared with other systems. An LG 42-inch LCD TV was used with the Wii. It measures 46.3 x 30.2 x 11.8 in and weighs 90.4 lbs with the stand. The resolution is 1366 x 768 (Dot) and the television system is NTSC-M, ATSC, 64 & 256 QAM. For better sound, a Panasonic SA-HT940 home theater system was used. It has 5+1 channels.
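The α values reported for each scale are Cronbach's alpha reliability coefficients. As a minimal sketch of how such a coefficient is computed, the following Python function uses the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the ratings in `example_scores` are hypothetical and made up for illustration only; they are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents.

    scores: list of rows, one per respondent, each row holding that
    respondent's ratings on the k items of a scale.
    """
    k = len(scores[0])
    # Transpose so that each tuple holds all ratings for one item.
    items = list(zip(*scores))
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in items)
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical five-point ratings from five respondents on a four-item
# scale (illustrative values only, NOT data from the study).
example_scores = [
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [1, 2, 2, 1],
]

print(f"alpha = {cronbach_alpha(example_scores):.2f}")
```

Values closer to 1 indicate that the items vary together across respondents, which is the sense in which the scales above are described as reliable.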
Fig. 2. “I can’t see my avatar”
5 Results

Results showed that no hypothesis was supported. First, there was no significant difference in attitude between the avatar creation (M=3.31, SD=.61) and non-avatar creation groups (M=3.41, SD=.83), t(14)=-.26, ns, two-tailed. Second, there was no significant difference in empathy between the avatar creation (M=2.92, SD=.57) and non-avatar creation groups (M=2.5, SD=.85), t(14)=1.16, ns, two-tailed. Third, there was
no significant difference in presence between the avatar creation (M=2.82, SD=.67) and non-avatar creation groups (M=3.04, SD=1.04), t(14)=-.49, ns, two-tailed. Lastly, no significant difference was found in para-social interaction between the avatar creation (M=2.78, SD=.86) and non-avatar creation groups (M=2.14, SD=.88), t(14)=1.48, ns, two-tailed.
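The t statistics above can be approximately recomputed from the reported group means and standard deviations. The sketch below assumes a standard independent-samples t-test with pooled variance and eight participants per group (as stated in the Method section); the function name `pooled_t` is hypothetical, and because the inputs are the rounded values printed in the text, the recomputed statistics may differ from the reported ones in the second decimal place. The critical value 2.145 is the standard two-tailed .05 threshold for 14 degrees of freedom.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from summary stats."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Two-tailed critical value of t for df = 14 at alpha = .05.
T_CRIT_14 = 2.145

# (measure, creation M, creation SD, non-creation M, non-creation SD),
# copied from the Results text; n = 8 per group.
reported = [
    ("attitude", 3.31, 0.61, 3.41, 0.83),
    ("empathy", 2.92, 0.57, 2.50, 0.85),
    ("presence", 2.82, 0.67, 3.04, 1.04),
    ("para-social interaction", 2.78, 0.86, 2.14, 0.88),
]

for name, m1, sd1, m2, sd2 in reported:
    t, df = pooled_t(m1, sd1, 8, m2, sd2, 8)
    verdict = "significant" if abs(t) > T_CRIT_14 else "ns"
    print(f"{name}: t({df}) = {t:.2f}, {verdict}")
```

None of the recomputed statistics exceeds the critical value, which is consistent with all four comparisons being reported as non-significant.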
6 Discussion

The goal of the present study was to examine the influence of avatar creation on attitude, empathy, presence, and para-social interaction. In order to answer this question, we divided the participants into two groups, avatar creation and non-avatar creation, and compared the outcomes. The avatar creation group created their own avatars while the non-avatar creation group received a default avatar. The results showed that no hypothesis was supported. There may be a few reasons why there were no differences between the avatar creation and non-avatar creation groups.

First, the sample might not have been appropriate. Having only female participants might be a reason why there were no significant effects. According to Lim [5], the gender of the game player was a significant factor that determined many aspects of the game play experience, both physiologically and subjectively. More specifically, males exhibited more significant outcomes than females in arousal, heart rate, and valence, and females’ physiological responses did not depend on avatar choice. However, this research recruited only females because it was hard to find male non-game users. Since creating an avatar is the manipulation in this research, we sought people who did not play games and who did not have an avatar. In the preliminary study, we found that most male students enjoyed gaming and had an avatar, so we limited participants to females. Given that in the preliminary research the average video game self-efficacy score, which is a person’s judgment of her ability to play video games, was below the midpoint of the measurement scale (M=2.80, SD=.89), the data might show that female non-game users were not appropriate participants because they were not confident in playing the game.

Second, the tennis game itself was too simplistic. Playing a simple game can be an advantage as well as a disadvantage in gaming research.
We chose this game because the participants were female non-game users. We wanted them to be comfortable with an easy game immediately following the five-minute training session. Cordova and Lepper [4] found that with a more challenging game, greater use of complex operations and greater strategic play had significant outcomes in the participants’ motivations and engagement. Since the tennis game had simple operations, low difficulty, and no strategy, the two groups might not have differed significantly in presence, empathy, and para-social interaction. It may be a good idea to ask about “perceived ease of use” for gaming, which means the degree to which the user feels the game to be easy or free of effort.

Third, the manipulation might not have been strong enough. The experimental group was supposed to spend six minutes creating their avatars, but it was hard to know whether they really spent the whole time doing so. The experimental group was also told that all nine screens of options should be clicked and used, but again, we had no
way to verify this. Unfortunately, it is hard to know the role of the manipulation because we did not ask how they felt about creating their avatars.

Fourth, two very important problems existed in terms of avatars. First, the avatars were facing away from the participants during game play, so participants did not even see their creation’s face, and all of the characters looked pretty much the same from the back