Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6947
Pedro Campos Nicholas Graham Joaquim Jorge Nuno Nunes Philippe Palanque Marco Winckler (Eds.)
Human-Computer Interaction – INTERACT 2011 13th IFIP TC 13 International Conference Lisbon, Portugal, September 5-9, 2011 Proceedings, Part II
Volume Editors Pedro Campos Nuno Nunes University of Madeira 9020-105, Funchal, Portugal E-mail: {pcampos, njn}@uma.pt Nicholas Graham Queen’s University Kingston, ON K7L 3N6, Canada E-mail: [email protected] Joaquim Jorge Instituto Superior Técnico 1049-001 Lisbon, Portugal E-mail: [email protected] Philippe Palanque Marco Winckler University Paul Sabatier 31062 Toulouse Cedex 9, France E-mail: {palanque, winckler}@irit.fr
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-23770-6 e-ISBN 978-3-642-23771-3 DOI 10.1007/978-3-642-23771-3 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011935338 CR Subject Classification (1998): H.5.2, H.5.3, H.3-5, I.2.10, D.2, K.3-4, K.8 LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
Foreword

Advances in interactivity, computing power, mobile devices, large displays and ubiquitous computing offer an ever-increasing potential for empowering users. This can happen within their working environment, in their leisure time or even when extending their social skills. While such empowerment could be seen as a way of connecting people in their workspace, home or on the move, it could also generate gaps requiring larger effort and resources to fruitfully integrate disparate and heterogeneous computing systems. The conference theme of INTERACT 2011 was "building bridges," as we believe human–computer interaction (HCI) is one of the research domains most likely to contribute significantly to bridging such gaps. This theme thus recognizes the interdisciplinary and intercultural spirit that lies at the core of HCI research.

The conference had the objective of attracting research that bridges disciplines, cultures and societies. Within the broad umbrella of HCI, we were in particular seeking high-quality contributions opening new and emerging HCI disciplines, bridging cultural differences, and tackling important social problems. Thus, INTERACT 2011 provided a forum for practitioners and researchers to discuss all aspects of HCI, including these challenges. The scientific contributions gathered in these proceedings clearly demonstrate the huge potential of this research domain to improve both the user experience and the performance of people interacting with computing devices. The conference is also as much about building bridges on the human side (between disciplines, cultures and society) as in the computing realm.

INTERACT 2011 was the 13th conference of the series, taking place 27 years after the first INTERACT, held in early September 1984 in London, UK. Since INTERACT 1990 the conferences have taken place under the aegis of the UNESCO International Federation for Information Processing (IFIP) Technical Committee 13. This committee aims at developing the science and technology of the interaction between humans and computing devices through different Working Groups and Special Interest Groups, all of which, together with their officers, are listed within these proceedings.

INTERACT 2011 was the first conference of its series to be organized in cooperation with ACM SIGCHI, the Special Interest Group on Computer–Human Interaction of the Association for Computing Machinery. We believe that this cooperation was very useful in making the event both more attractive and more visible to the worldwide scientific community developing research in the field of HCI.

We thank all the authors who chose INTERACT 2011 as the venue to publish their research. This was a record year for the conference in terms of submissions in the main technical categories. For the main Technical Program there were a total of 680 submissions, including 402 long and 278 short papers, out of which we accepted 171 (111 long and 60 short submissions), for a combined acceptance rate of less than 25%. Overall, from a total of 741 submissions for all tracks, 290 were accepted, as follows:
– 111 Full Research Papers
– 60 Short Research Papers
– 54 Interactive Poster Papers
– 17 Doctoral Consortium Papers
– 16 Workshops
– 12 Tutorials
– 5 Demonstrations
– 6 Organizational Overviews
– 4 Industrial Papers
– 3 Special Interest Groups
– 2 Panels
Our sincere gratitude goes to the members of our Program Committee (PC), who devoted countless hours to ensure the high quality of the INTERACT conference. This year, we improved the reviewing process by moving to an associate chair model. With almost 700 submitted papers, it is impossible for the PC Chairs to read every paper. We recruited 103 Associate Chairs (ACs), each of whom handled up to 12 papers. The ACs recruited almost 800 external reviewers, guaranteeing that each paper was reviewed by three to six referees. ACs also provided a meta-review. Internal discussion among all the reviewers preceded the final decision between the PC Chairs and the AC. This herculean effort was only possible due to the diligent work of many people. We would like to thank you all for the effort and apologize for all the bullying required to get the work done on time.

In addition, sincere thanks must be extended to those whose contributions were essential in making it possible for the conference to happen and for these proceedings to be produced. We owe a great debt to the Conference Committees, the members of the International Program Committee and the numerous reviewers who had to review submissions from the various categories. Similarly, the members of the conference Organizing Committee, and the staff at INESC-ID, especially Manuela Sado, deserve much appreciation for their tireless help with all aspects of planning and managing the many administrative and organizational issues. We would like to especially thank Tiago Guerreiro for his dedication to the Student Volunteer program, and José Coelho, who worked tirelessly to make the online program a reality. Thanks are also due to Alfredo Ferreira for single-handedly maintaining the website, and to Pedro Campos and Marco Winckler for the superb work done on the conference proceedings. Finally, our thanks go to all the authors who actually did the scientific work, and especially to the presenters who took on the additional burden of discussing the results with their peers at INTERACT 2011 in Lisbon.

July 2011
Nicholas Graham Daniel Gonçalves Joaquim Jorge Nuno Nunes Philippe Palanque
IFIP TC13
Established in 1989, the International Federation for Information Processing Technical Committee on Human–Computer Interaction (IFIP TC13) is an international committee comprising 30 national societies and 7 working groups, representing specialists in human factors, ergonomics, cognitive science, computer science, design and related disciplines. INTERACT is its flagship conference, staged biennially in different countries in the world. IFIP TC13 aims to develop the science and technology of human–computer interaction (HCI) by encouraging empirical research; promoting the use of knowledge and methods from the human sciences in design and evaluation of computer systems; promoting better understanding of the relation between formal design methods and system usability and acceptability; developing guidelines, models and methods by which designers may provide better human-oriented computer systems; and, cooperating with other groups, inside and outside IFIP, to promote user-orientation and humanization in system design. Thus, TC13 seeks to improve interactions between people and computers, encourage the growth of HCI research and disseminate these benefits world-wide. The main orientation is toward users, especially the non-computer professional users, and how to improve human–computer relations. Areas of study include: the problems people have with computers; the impact on people in individual and organizational contexts; the determinants of utility, usability and acceptability; the appropriate allocation of tasks between computers and users; modelling the user to aid better system design; and harmonizing the computer to user characteristics and needs. While the scope is thus set wide, with a tendency toward general principles rather than particular systems, it is recognized that progress will only be achieved through both general studies to advance theoretical understanding and specific studies on practical issues (e.g., interface design standards, software system consistency, documentation, appropriateness of alternative communication media, human factors guidelines for dialogue design, the problems of integrating multi-media systems to match system needs and organizational practices, etc.). IFIP TC13 stimulates working events and activities through its working groups (WGs). WGs consist of HCI experts from many countries, who seek to expand knowledge and find solutions to HCI issues and concerns within their domains, as outlined below. In 1999, TC13 initiated a special IFIP Award, the Brian Shackel Award, for the most outstanding contribution in the form of a refereed paper submitted to and delivered at each INTERACT. The award draws attention to the need for a comprehensive human-centered approach in the design and use of information technology in which the human and social implications have been taken into
account. Since the process to decide the award takes place after papers are submitted for publication, the award is not identified in the proceedings. WG13.1 (Education in HCI and HCI Curricula) aims to improve HCI education at all levels of higher education, coordinate and unite efforts to develop HCI curricula and promote HCI teaching. WG13.2 (Methodology for User-Centered System Design) aims to foster research, dissemination of information and good practice in the methodical application of HCI to software engineering. WG13.3 (HCI and Disability) aims to make HCI designers aware of the needs of people with disabilities and encourage development of information systems and tools permitting adaptation of interfaces to specific users. WG13.4 (also WG2.7) (User Interface Engineering) investigates the nature, concepts and construction of user interfaces for software systems, using a framework for reasoning about interactive systems and an engineering model for developing user interfaces. WG13.5 (Human Error, Safety and System Development) seeks a framework for studying human factors relating to systems failure, develops leading– edge techniques in hazard analysis and safety engineering of computer-based systems, and guides international accreditation activities for safety-critical systems. WG13.6 (Human-Work Interaction Design) aims at establishing relationships between extensive empirical work-domain studies and HCI design. It promotes the use of knowledge, concepts, methods and techniques that enables user studies to procure a better apprehension of the complex interplay between individual, social and organizational contexts and thereby a better understanding of how and why people work in the ways that they do. WG13.7 (Human–Computer Interaction and Visualization) is the newest of the working groups under the TC.13. It aims to establish a study and research program that combines both scientific work and practical applications in the fields of human–computer interaction and visualization. It integrates several additional aspects of further research areas, such as scientific visualization, data mining, information design, computer graphics, cognition sciences, perception theory, or psychology, into this approach. New WGs are formed as areas of significance to HCI arise. Further information is available on the IFIP TC13 website: http://csmobile.upe.ac.za/ifip
IFIP TC13 Members
Australia Judy Hammond Australian Computer Society Austria Andreas Holzinger Austrian Computer Society Belgium Monique Noirhomme-Fraiture Federation des Associations Informatiques de Belgique Brazil Simone Diniz Junqueira Barbosa (TC 13 secretary) Brazilian Computer Society (SBC) Bulgaria Kamelia Stefanova Bulgarian Academy of Sciences Canada Heather O’Brian Canadian Information Processing Society China Zhengjie Liu Chinese Institute of Electronics Cyprus Panayiotis Zaphiris Cyprus Computer Society Czech Republic Vaclav Matousek Czech Society for Cybernetics and Informatics
Denmark Annelise Mark Pejtersen Danish Federation for Information Processing Finland Kari-Jouko Räihä Finnish Information Processing Association France Philippe Palanque (TC 13 vice chair) Societe des Electriciens et des Electroniciens (SEE) Germany Tom Gross Gesellschaft für Informatik Hungary Cecilia Sik Lanyi John v. Neumann Computer Society (NJSZT) Iceland Marta Kristin Larusdottir The Icelandic Society for Information Processing (ISIP) India Anirudha Joshi Computer Society of India Italy Fabio Paternò Italian Computer Society
Ireland Liam J. Bannon Irish Computer Society
South Africa Paula Kotzé The Computer Society of South Africa
Japan Masaaki Kurosu Information Processing Society of Japan
Spain Julio Abascal Asociación de Técnicos de Informática (ATI)
Kenya Daniel Orwa Ochieng Computer Society of Kenya Malaysia Chui Yin Wong Malaysian National Computer Confederation New Zealand Mark Apperley New Zealand Computer Society (NZCS) Nigeria Chris Nwannenna Nigeria Computer Society Norway Dag Svanes Norwegian Computer Society Poland Juliusz L. Kulikowski Polish Academy of Sciences Portugal Joaquim A. Jorge Associação Portuguesa de Informática Singapore Henry Been-Lirn Duh Singapore Computer Society
Sweden Jan Gulliksen (TC 13 chair) Swedish Interdisciplinary Society for Human–Computer Interaction (STIMDI) - Swedish Computer Society Switzerland Ute Klotz Swiss Association for Research in Information Technology SARIT The Netherlands Gerrit C. van der Veer Nederlands Genootschap voor Informatica UK Andrew Dearden British Computer Society (BCS) USA-based John Karat Association for Computing Machinery (ACM) Nahum Gershon The Computer Society, Institute of Electrical & Electronics Engineers (IEEE-CS) Expert members Nikos Avouris, Greece Paula Kotzé, South Africa Gitte Lindgaard, Canada Annelise Mark Pejtersen, Denmark Marco Winckler, France
Working Group Chairpersons WG13.1 (Education in HCI and HCI Curricula) Lars Oestreicher, Sweden SIG13.1 (Interaction Design and International Development) Janet Read, UK WG13.2 (Methodology for User-Centered System Design) Peter Forbrig, Germany SIG13.2 (Interaction Design and Children) Panos Markopoulos, The Netherlands WG13.3 (HCI and Disability) Gerhard Weber, Germany WG13.4 (joint with WG 2.7) (User Interface Engineering) Fabio Paternò, Italy WG13.5 (Human Error, Safety, and System Development) Philippe Palanque, France WG13.6 (Human-Work Interaction Design) Torkil Clemmensen, Denmark WG13.7 (Human–Computer Interaction and Visualization) Achim Ebert, Germany
INTERACT 2011 Technical Committee
Conference Committee General Co-chairs Joaquim A. Jorge, Portugal Philippe Palanque, France Honorary Co-chairs Larry Constantine, Portugal Don Norman, USA Annelise Mark Pejtersen, Denmark Technical Program Co-chairs Daniel Gonçalves, Portugal Nick Graham, Canada Nuno Nunes, Portugal
Technical Program Committee Demonstrations Co-chairs Verónica Orvalho, Portugal Greg Philips, Canada Doctoral Consortium Co-chairs Gitte Lindgaard, Canada Manuel João Fonseca, Portugal Full Papers Co-chairs Nick Graham, Canada Nuno Nunes, Portugal Industrial Program Co-chairs António Câmara, Portugal Miguel Dias, Portugal Stacy Hobson, USA Oscar Pastor, Spain Virpi Roto, Finland Interactive Posters Co-chairs Adérito Marcos, Portugal Monique Noirhomme-Fraiture, Belgium
Keynote Speakers Co-chairs John Karat, USA Jean Vanderdonckt, Belgium Organization Overviews Co-chairs Teresa Chambel, Portugal Mary Czerwinski, USA Panels Co-chairs Regina Bernhaupt, Austria Nuno Correia, Portugal Peter Forbrig, Germany Short Papers Co-chairs Daniel Gonçalves, Portugal Special Interest Groups (SIGs) Co-chairs Gerrit van der Veer, The Netherlands Teresa Romão, Portugal Student Design Competition Co-chairs Simone Diniz Junqueira Barbosa, Brazil Luis Carriço, Portugal Tutorials Co-chairs José Creissac Campos, Portugal Paula Kotzé, South Africa Workshops Co-chairs Julio Abascal, Spain Nuno Guimarães, Portugal
Organizing Committee Local Organization Co-chairs Alfredo Ferreira, Portugal Pauline Jepp, Portugal Manuela Sado, Portugal Multimedia Conferencing Co-chairs José Coelho, Portugal Lars Oestreicher, Sweden Publications Co-chairs Pedro Campos, Portugal Marco Winckler, France
Publicity Co-chairs Paula Alexandra Silva, Portugal Tiago Guerreiro, Portugal Student Volunteers Co-chairs Tiago Guerreiro, Portugal Xavier Ferre, Spain Effie Law, UK Website Co-chairs Alfredo Ferreira, Portugal
Associate Chairs - Full Papers Julio Abascal, Spain Jose Abdelnour-Nocera, UK Silvia Abrah˜ ao, Spain Vincent Aleven, USA Nikolaos Avouris, Greece Cecilia Baranauskas, Brazil Simone Barbosa, Brazil Patrick Baudisch, Germany Regina Bernhaupt, France Robert Biddle, Canada Jeremy Birnholtz, USA Kellogg Booth, Canada Gaelle Calvary, France Pedro Campos, Portugal Torkil Clemmensen, Denmark Nuno Correia, Portugal Enrico Costanza, UK Joelle Coutaz, France Jos´e Creissac Campos, Portugal Mary Czerwinski, USA Peter Dannenmann, Germany Andy Dearden, UK Anke Dittmar, Germany Ellen Do, USA Gavin Doherty, Ireland Andrew Duchowski, USA Henry Been-Lim Duh, Singapore Michael Feary, USA Peter Forbrig, Germany Nahum Gershon, The Netherlands Marianne Graves Petersen, Denmark
Phil Gray, UK Tom Gross, Germany Mark D Gross, USA Jan Gulliksen, Sweden Michael Haller, Austria Richard Harper, UK Andreas Holzinger, Austria Kasper Hornbaek, Denmark Horst Hortner, Austria Matt Jones, UK Anirudha Joshi, India Hermann Kaindl, Austria Evangelos Karapanos, Portugal Rick Kazman, USA Ute Klotz, Switzerland Vassilis Kostakos, Portugal Masaaki Kurosu, Austria Ed Lank, Canada Marta Larusdottir, Iceland Henry Lieberman, USA Panos Markopolous, The Netherlands Christian Muller, Germany Miguel Nacenta, Canada Laurence Nigay, France Monique Noirhomme, Belgium Eamonn O’Neill, UK Ian Oakley, Portugal Oscar Pastor, Spain Fabio Paterno, Italy Lia Patr´ıcio, Portugal Helen Petrie, UK
Nitendra Rajput, India Janet Read, UK Dave Roberts, UK Kari-Jouko Raiha, Finland Miguel Sales Dias, Portugal Jaime Sanchez, Chile Robert St Amant, USA Kamelia Stefanova, Bulgaria James Stewart, Canada
Wolfgang Stuerzlinger, UK Jan van den Bergh, Belgium Gerrit van der Veer, The Netherlands Jos van Leeuwen, Portugal Gerhard Weber, Germany Janet Wesson, South Africa Marco Winckler, France Volker Wulf, Germany
Associate Chairs - Short Papers Jose Abdelnour-Nocera, UK Elisabeth Andr´e, Germany Mark Apperley, New Zealand Nathalie Aquino, Spain Simone Barbosa, Brazil Alexander Boden, Germany Gaelle Calvary, France Robert Capra, USA Luis Carri¸co, Portugal Marc Cavazza, UK Teresa Chambel, Portugal St´ephane Conversy, France Nuno Correia, Portugal Tim Davis, USA Antonella de Angeli, UK Andy Dearden, UK Anke Dittmar, Germany Carlos Duarte, Portugal Achim Eber, Germany David Elsweiler, UK Danyel Fisher, USA Peter Forbrig, Germany Tiago Guerreiro, Portugal Jacek Gwizdka, USA Marc Hassenzahl, Germany Anirudha Joshi, India Hermann Kaindl, Austria Ute Klotz, Switzerland
Tessa Lau, USA Gitte Lindgaard, Canada Floyd Mueller, USA Lennart Nacke, Canada Yukiko Nakano, Japan Monique Noirhomme, Belgium Lars Oestreicher, Sweden Eamonn O’Neill, UK Dan Orwa, Kenya Tim Paek, USA Ignacio Panach, Spain Fabio Paterno, Italy Lia Patr´ıcio, Portugal Nitendra Rajput, India Francisco Rebelo, Portugal Dave Roberts, UK Teresa Rom˜ao, Portugal Virpi Roto, Finland Raquel Santos, Portugal Beatriz Sousa Santos, Portugal James Stewart, Canada Sriram Subramanian, UK Feng Tian, China Manas Tungare, USA Gerhard Weber, Germany Astrid Weiss, Austria Marco Winckler, France Chui Yin Wong, Malaysia
Reviewers Al Mahmud Abdullah, The Netherlands Ana Paula Afonso, Portugal Jason Alexander, UK Jan Alexandersson, Germany Dzmitry Aliakseyeu, The Netherlands Majed Alshamari, Saudi Arabia Margarita Anastassova, France Craig Anslow, New Zealand Caroline Appert, France Nathalie Aquino, Spain Pedro Arezes, Portugal Ernesto Arroyo, USA Mark Ashdown, UK Ching man Au Yeung, Japan Chris Baber, UK Paula M. Bach, USA Nilufar Baghaei, New Zealand Sebastiano Bagnara, Italy Gilles Bailly, Germany Martina Balestra, USA Emilia Barakova, The Netherlands Jakob Bardram, Denmark Shaowen Bardzell, USA Javier Bargas-Avila, Switzerland Louise Barkhuus, Denmark Pippin Barr, Denmark Barbara Rita Barricelli, Italy Gil Barros, Brazil Len Bass, USA Remi Bastide, France Rafael Bastos, Portugal Eric Baumer, USA Gordon Baxter, UK Michel Beaudouin-Lafon, France Nikolaus Bee, Germany Yacine Bellik, France Kawtar Benghazi, Spain Mike Bennett, USA Fran¸cois B´erard, France Olav W. Bertelsen, Denmark Nigel Bevan, UK Ganesh Bhutkar, India
Matthew Bietz, USA Mark Billinghurst, New Zealand Dorrit Billman, USA Fernando Birra, Portugal Mike Blackstock, Canada Marcus Bloice, Austria Marco Blumendorf, Germany Mark Blythe, UK Cristian Bogdan, Sweden Morten Bohoj, Denmark Matthew Bolton, USA Birgit Bomsdorf, Germany Rodrigo Bonacin, Brazil Sebastian Boring, Canada Aviaja Borup, Denmark Matt-Mouley Bouamrane, UK Doug Bowman, USA Giorgio Brajnik, Italy Pedro Branco, Portugal Willem-Paul Brinkman, The Netherlands Gregor Broll, Germany Christopher Brooks, Canada Judith Brown, Canada Steffen Budweg, Germany Lucy Buykx, UK Marina Buzzi, Italy Daragh Byrne, Ireland Cristina Cachero, Spain Jeff Calcaterra, USA Licia Calvi, The Netherlands Eduardo Calvillo Gamez, Mexico Maria-Dolores Cano, Spain Xiang Cao, China Cinzia Cappiello, Italy Robert Capra, USA Luis Carlos paschoarelli, Brazil Stefan Carmien, Spain Maria Beatriz Carmo, Portugal Ant´ onio Carvalho Brito, Portugal Luis Castro, Mexico Daniel Cernea, Germany Matthew Chalmers, UK
Teresa Chambel, Portugal Beenish Chaudry, USA Tao Chen, China Fanny Chevalier, Canada Keith Cheverst, UK Yoram Chisik, Portugal Yu-kwong Chiu, China Georgios Christou, Cyprus Andrea Civan Hartzler, USA Laurence Claeys, France Luis Coelho, Portugal Fran¸cois Coldefy, France Karin Coninx, Belgium Maria Francesca Costabile, Italy C´eline Coutrix, France Nadine Couture, France Anna Cox, UK David Coyle, Ireland Leonardo Cunha de Miranda, Portugal Edward Cutrell, India Raimund Dachselt, Germany Jos´e Danado, Norway Tjerk de Greef, The Netherlands Alexander De Luca, Germany Luigi De Russis, Italy Clarisse de Souza, Brazil Alexandre Demeure, France Charlie DeTar, USA Ines Di Loreto, Italy Eduardo Dias, Portugal Paulo Dias, Portugal Claire Diederich, Belgium Andre Doucette, Canada Carlos Duarte, Portugal Emmanuel Dubois, France Cathy Dudek, Canada Andreas Duenser, New Zealand Mark Dunlop, UK Sophie Dupuy-Chessa, France Matthew Easterday, USA Achim Ebert, Germany Florian Echtler, USA Amnon Eden, UK Serge Egelman, USA Linda Elliott, USA
Niklas Elmqvist, USA Alex Endert, USA Dominik Ertl, Austria Parisa Eslambolchilar, UK Augusto Esteves, Portugal Pedro Faria Lopes, Portugal Robert Farrell, USA Ian Fasel, USA Ava Fatah gen. Schieck, UK Jean-Daniel Fekete, France Xavier Ferre, Spain Mirko Fetter, Germany Sebastian Feuerstack, Brazil Nelson Figueiredo de Pinho, Portugal George Fitzmaurice, Canada Joan Fons, Spain Manuel J. Fonseca, Portugal Alain Forget, Canada Florian F¨orster, Austria Derek Foster, UK Marcus Foth, Australia Teresa Franqueira, Portugal Mike Fraser, UK Christopher Frauenberger, UK Andr´e Freire, UK Carla Freitas, Brazil David Frohlich, UK Dominic Furniss, UK Luigi Gallo, Italy Teresa Galv˜ao, Portugal Nestor Garay-Vitoria, Spain Roberto Garc´ıa, Spain Anant Bhaskar Garg, India Vaibhav Garg, USA Jose Luis Garrido, Spain Nahum Gershon, Canada Florian Geyer, Germany Werner Geyer, USA Giuseppe Ghiani, Italy Andy Gimblett, UK Patrick Girard, France Sylvie Girard, UK Leonardo Giusti, Italy Guilherme Gomes, Portugal Daniel Gon¸calves, Portugal
Jos´e Luis Gonz´ alez S´anchez, Spain Phil Gosset, UK Nitesh Goyal, USA Toni Granollers, Spain Anders Green, Sweden Collin Green, USA Saul Greenberg, Canada Olivier Grisvard, France Tiago Guerreiro, Portugal Sean Gustafson, Germany Mieke Haesen, Belgium Jonna H¨ akkil¨ a, Finland Martin Halvey, UK Judy Hammond, Australia Mark Hancock, Canada Morten Borup Harning, Denmark John Harris, Canada Kirstie Hawkey, Canada Elaine Hayashi, Brazil Brent Hecht, USA Steffen Hedegaard, Denmark Mathias Heilig, Germany Ruediger Heimgaertner, Germany Ingi Helgason, UK Sarah Henderson, New Zealand Bart Hengeveld, The Netherlands Wilko Heuten, Germany Michael Hildebrandt, Norway Christina Hochleitner, Austria Eve Hoggan, Finland Paul Holleis, Germany Clemens Holzmann, Austria Jettie Hoonhout, The Netherlands Michael Horn, USA Eva Hornecker, Germany Heiko Hornung, Brazil Horst H¨ ortner, Austria Juan Pablo Hourcade, USA Aaron Houssian, The Netherlands Andrew Howes, UK Dalibor Hrg, Germany Ko-Hsun Huang, Portugal Jina Huh, USA Tim Hussein, Germany Dugald Hutchings, USA
Junko Ichino, Japan Netta Iivari, Finland Emilio Insfran, Spain Samuel Inverso, Australia Shamsi Iqbal, USA Petra Isenberg, France Howell Istance, UK Linda Jackson, USA Robert Jacob, USA Mikkel Jakobsen, Denmark Jacek Jankowski, USA Hans-Christian Jetter, Germany Sune Alstrup Johansen, Denmark Jeff Johnson, USA Simon Jones, UK Martino Jose Mario, Brazil Rui Jos´e, Portugal Marko Jurmu, Finland Don Kalar, USA Vaiva Kalnikaite, UK Martin Kaltenbrunner, Austria Matthew Kam, USA Mayur Karnik, Portugal Hannu Karvonen, Finland Sebastian Kassner, Germany Dinesh Katre, India Sevan Kavaldjian, Austria Konstantinos Kazakos, Australia Pramod Khambete, India Vassilis-Javed Khan, The Netherlands Hyungsin Kim, USA Jayne Klenner-Moore, USA Christian Kray, UK Per Ola Kristensson, UK Hannu Kukka, Finland Andrew Kun, USA H. Chad Lane, USA Yann Laurillau, France Effie Law, Switzerland Marco Lazzari, Italy Karin Leichtenstern, Germany Juha Leino, Finland Barbara Leporini, Italy Sophie Lepreux, France Olivier Lequenne, France
Chunyuan Liao, USA Conor Linehan, UK Agnes Lisowska Masson, China Zhengjie Liu, China Sara Ljungblad, Sweden Claire Lobet, Belgium Steffen Lohmann, Spain Fernando Lopez-Colino, Spain Anja Lorenz, Germany Stephanie Ludi, USA Bernd Ludwig, Germany Andreas Luedtke, Germany Jo Lumsden, UK Kris Luyten, Belgium Kent Lyons, Canada Allan MacLean, UK Joaquim Madeira, Portugal Rui Madeira, Portugal Angela Mahr, Germany Stephann Makri, UK Sylvain Malacria, France Benjamin Mangold, Germany Javier Marco, Spain Gary Marsden, South Africa Mark Marshall, UK Hannah Marston, Canada Jean-Bernard Martens, The Netherlands Lynne Martin, USA Diego Mart´ınez, Spain C´elia Martinie, France Masood Massodian, New Zealand Sara Mastro, USA Maristella Matera, Italy Akhil Mathur, Canada Eva Mayr, Austria Davide Mazza, Italy emanuela mazzone, UK Gregor McEwan, Canada Kevin McGee, Singapore Marilyn McGee-Lennon, UK Indrani Medhi, India Gerrit Meixner, Germany Guy Melancon, France Eduarda Mendes Rodrigues, Portugal
Helena Mentis, UK Tim Merritt, Singapore Mei Miao, Germany Alex Mitchell, Singapore Robb Mitchell, Denmark Jose Pascual Molina Masso, Spain Francisco Montero, Spain Meredith Morris, USA Ann Morrison, Denmark Christiane Moser, Austria Omar Mubin, The Netherlands Florian ’Floyd’ Mueller, USA Christian Mueller-Tomfelde, Australia Michael Muller, USA Maurice Mulvenna, UK Dianne Murray, UK Lennart Nacke, Canada Peyman Nasirifard, USA David Navarre, France Ather Nawaz, Denmark Luciana Nedel, Brazil Vania Neris, Brazil Colette Nicolle, UK Femke Nijboer, The Netherlands Valentina Nisi, Portugal Leonel Nobrega, Portugal Sylvie Noel, Canada Manuel Noguera, Spain Marianna Obrist, Austria Johanna Renny Octavia, Belgium Amy Ogan, USA Michael O’Grady, Ireland Kenton O’Hara, UK Timo Ojala, Finland Eugenio Oliveira, Portugal Veronica Orvalho, Portugal Nuno Otero, Portugal Benoit Otjacques, Luxembourg Ana Paiva, Portugal Yue Pan, USA Jose Ignacio Panach Navarrete, Spain Alex Pang, UK Nadia Pantidi, UK Luca Paolino, Italy Eleftherios Papachristos, Greece
Narcis Pares, USA Andrew Patrick, Canada Celeste Lyn Paul, USA Sharoda Paul, USA Andriy Pavlovych, Canada Greg Phillips, Canada Lara Piccolo, Brazil Martin Pielot, Germany Emmanuel Pietriga, France franck poirier, France Benjamin Poppinga, Germany Christopher Power, UK Raquel Prates, Brazil John Precious, UK Costin Pribeanu, Romania Andreas Pusch, France Alexandra Queir´ os, Portugal Ismo Rakkolainen, Finland Dave Randall, UK Alberto Raposo, Brazil Stuart Reeves, UK Patrick Reignier, France Ren´e Reiners, Germany Malte Ressin, UK Bernardo Reynolds, Portugal Andy Ridge, UK Xavier Righetti, Switzerland Pierre Robillard, Canada Simon Robinson, UK Carsten R¨ ocker, Germany Yvonne Rogers, UK Markus Rohde, Germany Teresa Rom˜ao, Portugal Virpi Roto, Finland Anne Roudaut, Germany jose rouillard, France Mark Rouncefield, UK Nicolas Roussel, France Jaime Ruiz, Canada Pascal Salembier, France Antti Salovaara, Finland Nithya Sambasivan, USA Krystian Samp, Ireland Paulo Sampaio, Portugal Vagner Santana, Italy
Carmen Santoro, Italy Jos´e Santos, Portugal Teresa Sarmento, Portugal Cheryl Savery, Canada Dominique Scapin, France Thomas Schlegel, Germany Kevin Schneider, Canada Johannes Sch¨oning, Germany Eric Schweikardt, USA Gig Searle, Austria Thomas Seifried, Austria Marc Seissler, Germany Malu Seixas, Brazil Ted Selker, USA Abi Sellen, UK Dev Sen, Canada Andrew Seniuk, Canada Aaditeshwar Seth, India Leslie Setlock, USA Ehud Sharlin, Canada Aditi Sharma, South Africa Huihui Shi, Germany Aubrey Shick, USA Garth Shoemaker, Canada Bruno Silva, Brazil Frutuoso Silva, Portugal Hugo Silva, Portugal Klaus-Martin Simonic, Austria Mikael B. Skov, Denmark Roger Slack, UK David Smith, Canada Dustin Smith, USA Thomas Smyth, Canada William Soukoreff, Canada Kenia Sousa, Belgium Jan Stage, Denmark Danae Stanton Fraser, UK Gunnar Stevens, Germany Erik Stolterman, USA Markus Stolze, Switzerland Steven Strachan, USA Simone Stumpf, UK Sriram Subramanian, UK Ja-Young Sung, USA Alistair Sutcliffe, UK
David Swallow, UK Colin Swindells, Canada Gerd Szwillus, Germany Susanne Tak, New Zealand Anthony Tang, USA Charlotte Tang, Canada Michael Tangermann, Germany Franck Tarpin-Bernard, France Alex Taylor, UK Stephanie Teasley, USA Ant´ onio Teixeira, Portugal Michael Terry, Canada VinhTuan Thai, Ireland Harold Thimbleby, UK Martin Tomitsch, Australia Daniela Trevisan, Brazil Sylvia Truman, UK Manfred Tscheligi, Austria Nikolaos Tselios, Greece Simon Tucker, UK Markku Turunen, Finland Brygg Ullmer, USA Leon Urbas, Germany Teija Vainio, Finland Leonel Valbom, Portugal Egon L. van den Broek, Austria Thea van der Geest, The Netherlands Ielka van der Sluis, Ireland Erik van der Spek, The Netherlands Jean Vanderdonckt, Belgium Radu-Daniel Vatavu, Romania Manuel Veit, France Jayant Venkatanathan, Portugal Arnold P.O.S. Vermeeren,
The Netherlands Bart Vermeersch, Belgium Jo Vermeulen, Belgium Fr´ed´eric Vernier, France Roel Vertegaal, Canada Markel Vigo, UK Nadine Vigouroux, France Thomas Visser, The Netherlands Stephen Voida, USA Ivan Volosyak, Germany Jade Wang, USA Qing Wang, China Leon Watts, UK Astrid Weiss, Austria Peter Wild, UK Graham Wilson, UK Max Wilson, UK Heike Winschiers-Theophilus, Namibia Jacob Wobbrock, USA Peter Wolkerstorfer, Austria Chui Yin Wong, Malaysia Michael Wright, UK Min Wu, USA Peta Wyeth, Australia Alvin W. Yeo, Malaysia James Young, Canada Ray Yun, USA Loutfouz Zaman, Canada Panayiotis Zaphiris, Cyprus Martina Ziefle, Germany Juergen Ziegler, Germany Gottfried Zimmermann, Germany Martin Zimmermann, Germany
Sponsors (Gold, Silver and Bronze), Supporters and Organization: logos not reproduced here.
Table of Contents – Part II
Long and Short Papers Health I Finding the Right Way for Interrupting People Improving Their Sitting Posture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Haller, Christoph Richter, Peter Brandl, Sabine Gross, Gerold Schossleitner, Andreas Schrempf, Hideaki Nii, Maki Sugimoto, and Masahiko Inami
1
Exploring Haptic Feedback in Exergames . . . . . . . . . . . . . . . . . . . . . . . . . . . Tadeusz Stach and T.C. Nicholas Graham
18
Identifying Barriers to Effective User Interaction with Rehabilitation Tools in the Home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stephen Uzor, Lynne Baillie, Dawn Skelton, and Fiona Fairlie
36
Clinical Validation of a Virtual Environment Test for Safe Street Crossing in the Assessment of Acquired Brain Injury Patients with and without Neglect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Patricia Mesa-Gresa, Jose A. Lozano, Roberto Llórens, Mariano Alcañiz, María Dolores Navarro, and Enrique Noé
44
Health II Smart Homes or Smart Occupants? Supporting Aware Living in the Home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lyn Bartram, Johnny Rodgers, and Rob Woodbury
52
Input Devices in Mental Health Applications: Steering Performance in a Virtual Reality Paths with WiiMote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maja Wrzesien, María José Rupérez, and Mariano Alcañiz
65
“Acted Reality” in Electronic Patient Record Research: A Bridge between Laboratory and Ethnographic Studies . . . . . . . . . . . . . . . . . . . . . . . Lesley Axelrod, Geraldine Fitzpatrick, Flis Henwood, Liz Thackray, Becky Simpson, Amanda Nicholson, Helen Smith, Greta Rait, and Jackie Cassell Exercise Support System for Elderly: Multi-sensor Physiological State Detection and Usability Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jan Macek and Jan Kleindienst
73
81
Human Factors I Estimating the Perceived Difficulty of Pen Gestures . . . . . . . . . . . . . . . . . . Radu-Daniel Vatavu, Daniel Vogel, Géry Casiez, and Laurent Grisoni
89
On the Limits of the Human Motor Control Precision: The Search for a Device’s Human Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . François Bérard, Guangyu Wang, and Jeremy R. Cooperstock
107
Three around a Table: The Facilitator Role in a Co-located Interface for Social Competence Training of Children with Autism Spectrum Disorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Massimo Zancanaro, Leonardo Giusti, Eynat Gal, and Patrice T. Weiss
123
Human Factors II Moving Target Selection in 2D Graphical User Interfaces . . . . . . . . . . . . . . Abir Al Hajri, Sidney Fels, Gregor Miller, and Michael Ilich
141
Navigational User Interface Elements on the Left Side: Intuition of Designers or Experimental Evidence? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andreas Holzinger, Reinhold Scherer, and Martina Ziefle
Study on the Usability of a Haptic Menu for 3D Interaction . . . . . . . . . . . Giandomenico Caruso, Elia Gatti, and Monica Bordegoni
186
Interacting in Public Spaces Balancing Act: Enabling Public Engagement with Sustainability Issues through a Multi-touch Tabletop Collaborative Game . . . . . . . . . . . . . . . . . Alissa N. Antle, Joshua Tanenbaum, Allen Bevans, Katie Seaborn, and Sijie Wang Understanding the Dynamics of Engaging Interaction in Public Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peter Dalsgaard, Christian Dindler, and Kim Halskov Transferring Human-Human Interaction Studies to HRI Scenarios in Public Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Astrid Weiss, Nicole Mirnig, Roland Buchner, Florian F¨ orster, and Manfred Tscheligi
194
212
230
Interacting with Displays Comparing Free Hand Menu Techniques for Distant Displays Using Linear, Marking and Finger-Count Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . Gilles Bailly, Robert Walter, Jörg Müller, Tongyan Ning, and Eric Lecolinet
248
263
281
289
Interaction Design for Developing Regions A New Visualization Approach to Re-Contextualize Indigenous Knowledge in Rural Africa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kasper Rodil, Heike Winschiers-Theophilus, Nicola J. Bidwell, Søren Eskildsen, Matthias Rehm, and Gereon Koch Kapuire Design Opportunities for Supporting Treatment of People Living with HIV / AIDS in India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anirudha Joshi, Mandar Rane, Debjani Roy, Shweta Sali, Neha Bharshankar, N. Kumarasamy, Sanjay Pujari, Davidson Solomon, H. Diamond Sharma, D.G. Saple, Romain Rutten, Aakash Ganju, and Joris Van Dam In Class Adoption of Multimedia Mobile Phones by Gender - Results from a Field Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Elba del Carmen Valderrama-Bahamondez, Jarmo Kauko, Jonna H¨ akkil¨ a, and Albrecht Schmidt
297
315
333
Interface Design Scenarchitectures: The Use of Domain-Specific Architectures to Bridge Design and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nicholas Graham, Emmanuel Dubois, Christophe Bortolaso, and Christopher Wolfe Pattern Tool Support to Guide Interface Design . . . . . . . . . . . . . . . . . . . . . Russell Beale and Behzad Bordbar
341
359
Meerkat and Tuba: Design Alternatives for Randomness, Surprise and Serendipity in Reminiscing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Helmes, Kenton O’Hara, Nicolas Vilar, and Alex Taylor
376
International and Cultural Aspects of HCI Culture and Facial Expressions: A Case Study with a Speech Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Beant Dhillon, Rafal Kocielnik, Ioannis Politis, Marc Swerts, and Dalila Szostak Equality = Inequality: Probing Equality-Centric Design and Development Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rilla Khaled e-Rural: A Framework to Generate Hyperdocuments for Milk Producers with Different Levels of Literacy to Promote Better Quality Milking . . . . Vanessa Maia Aguiar de Magalhaes, Junia Coutinho Anacleto, Andr´e Bueno, Marcos Alexandre Rose Silva, Sidney Fels, and Fernando Cesar Balbino Designing Interactive Storytelling: A Virtual Environment for Personal Experience Narratives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ilda Ladeira, Gary Marsden, and Lesley Green
392
405
422
430
Interruptions and Attention Choosing Your Moment: Interruptions in Multimedia Annotation . . . . . . Christopher P. Bowers, Will Byrne, Benjamin R. Cowan, Chris Creed, Robert J. Hendley, and Russell Beale
The Role of Modality in Notification Performance . . . . . . . . . . . . . . . . . . . . David Warnock, Marilyn McGee-Lennon, and Stephen Brewster
572
Multi-User Interaction / Cooperation Co-located Collaborative Sensemaking on a Large High-Resolution Display with Multiple Input Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katherine Vogt, Lauren Bradel, Christopher Andrews, Chris North, Alex Endert, and Duke Hutchings Exploring How Tangible Tools Enable Collaboration in a Multi-touch Tabletop Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tess Speelpenning, Alissa N. Antle, Tanja Doering, and Elise van den Hoven Hidden Details of Negotiation: The Mechanics of Reality-Based Collaboration in Information Seeking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mathias Heilig, Stephan Huber, Jens Gerken, Mischa Demarmels, Katrin Allmendinger, and Harald Reiterer
589
605
622
Navigation and Wayfinding A Tactile Compass for Eyes-Free Pedestrian Navigation . . . . . . . . . . . . . . . Martin Pielot, Benjamin Poppinga, Wilko Heuten, and Susanne Boll Are We There Yet? A Probing Study to Inform Design for the Rear Seat of Family Cars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . David Wilfinger, Alexander Meschtscherjakov, Martin Murer, Sebastian Osswald, and Manfred Tscheligi Don’t Look at Me, I’m Talking to You: Investigating Input and Output Modalities for In-Vehicle Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lars Holm Christiansen, Nikolaj Yde Frederiksen, Brit Susan Jensen, Alex Ranch, Mikael B. Skov, and Nissanthen Thiruravichandran Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
640
657
675
693
Finding the Right Way for Interrupting People Improving Their Sitting Posture

Michael Haller1,3, Christoph Richter1, Peter Brandl1, Sabine Gross1, Gerold Schossleitner2, Andreas Schrempf2, Hideaki Nii3, Maki Sugimoto3, and Masahiko Inami3

1 Media Interaction Lab, Upper Austria University of Applied Sciences, Austria
2 Medical Technology, Upper Austria University of Applied Sciences, Austria
3 Keio-NUS Cute Center, Singapore/Japan
[email protected]
Abstract. In this paper, we present three different ways of interrupting people in order to guide their sitting posture. We developed an ergonomically adjustable office chair equipped with four sensors measuring the office worker’s posture. It is important that users are alerted after sitting in a bad posture and do some training; therefore, we implemented three different alert modalities (Graphical Feedback, Physical Feedback, and Vibrotactile Feedback), with the goal of finding out which of the techniques is the most effective interruption modality without causing a large disruption effect. To measure task performance, we conducted a formal user study. Our results show that the three interruption techniques differ in their effects on performance and disruptiveness. While the vibrotactile feedback might have higher information-awareness benefits at the beginning, it causes a strong intrusion side-effect. The physical feedback, in contrast, was rated less disruptive to the workflow than the other two feedback modalities. Keywords: Posture Care, Interrupts, Physical Feedback, Graphical Feedback, Vibrotactile Feedback.
Fig. 1. Sitting postures that can be classified by the intelligent office chair. 1. upright, 2. leaning back, 3. leaning forward, 4. sitting at the front edge, 5. leaning right, 6. right leg crossed over left leg, 7. left leg crossed over right leg, 8. slouching.
they spend on work activities while sitting statically [9]. In order to improve the sitting behavior of office workers, we developed an intelligent office chair, which allows us to classify the sitting posture and the corresponding time the person sits statically in any position (Fig. 1). The aim of the intelligent office chair is to guide the person, through effective feedback, to a more dynamic and healthy sitting behavior.

Our setup is based on a regular adjustable office chair, equipped with four independent, specially designed force transducers. The four sensors are located at each corner under the seating surface, making it possible to compute the coordinates of the Center of Pressure (CoP). To do this, the reference frame is located in the center of the seating surface. The coordinates of the CoP vary according to the posture in which the person is sitting on the chair, which in turn allows the system to classify the sitting posture (Fig. 1) and the time spent in the corresponding position. After inadequate sitting, our system provides feedback [9] and the office worker gets an alert to perform a training session. We implemented three different techniques, including a graphical, a physical, and a vibrotactile interruption feedback, with the goal of finding out which of the techniques is the most effective interruption modality without causing a large disruption effect (cf. Fig. 2).

Humans have cognitive limitations, which makes them susceptible to errors once interrupted [2]. On the other hand, we know from multiple resource theory that humans are able to perform different tasks in parallel as long as the tasks do not utilize the same cognitive resource [21]. Based on this theory, we assumed that office workers (who rely heavily on visual processing) would find both the graphical and the physical feedback alert more distracting and less acceptable than the vibrotactile feedback. On the other hand, the vibration might be harder to detect at the beginning, but it might also be harder to ignore once present [2]. We therefore expected the vibrotactile feedback, followed by the physical avatar feedback, to be less disruptive than the digital interruption modality. However, the digital technique might be the fastest way to get the users’ attention, because it is shown directly in the user’s field of view.
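The following sketch illustrates, in Python, how the CoP coordinates could be derived from the four corner sensors and mapped onto a coarse posture class. The sensor positions, the dead-zone threshold and the function names are illustrative assumptions made for this text; they are not the authors' actual implementation, which also distinguishes finer-grained postures such as crossed legs and slouching.

```python
# Illustrative sketch (not the authors' code): computing the Center of Pressure
# (CoP) from four force transducers mounted under the corners of the seating
# surface, with the reference frame located in the center of the seat.

# Assumed sensor coordinates (in metres) relative to the seat centre:
SENSOR_POS = {
    "front_left":  (-0.20,  0.20),
    "front_right": ( 0.20,  0.20),
    "rear_left":   (-0.20, -0.20),
    "rear_right":  ( 0.20, -0.20),
}

def center_of_pressure(forces):
    """forces: dict mapping sensor name to measured force (N); returns CoP (x, y)."""
    total = sum(forces.values())
    if total <= 0:
        return None                      # nobody is sitting on the chair
    x = sum(f * SENSOR_POS[name][0] for name, f in forces.items()) / total
    y = sum(f * SENSOR_POS[name][1] for name, f in forces.items()) / total
    return x, y

def classify_posture(cop, dead_zone=0.03):
    """Very coarse mapping of the CoP to a posture label (cf. Fig. 1)."""
    if cop is None:
        return "empty"
    x, y = cop
    if abs(x) < dead_zone and abs(y) < dead_zone:
        return "upright"
    if y > dead_zone:
        return "leaning forward"
    if y < -dead_zone:
        return "leaning back"
    return "leaning right" if x > dead_zone else "leaning left"
```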
Fig. 2. Graphical (1), Physical (2), and Vibrotactile Feedback (3) should alert users to perform a training session
Summarizing, all three techniques serve two different purposes: providing feedback about the office worker’s posture and triggering an alert message once the user should perform a relaxing training session. All techniques allow users to decide for themselves when they want to switch from the primary task to the training session, without the system capturing the user’s context to determine that they are not in a critical phase of a task.
2 Related Work

2.1 Posture Detection and Posture Guidance

There are three possibilities to train users to sit ergonomically correctly on chairs: a) an ergonomic chair with unstable support, b) triggering a training session, and c) providing direct guidance on the actual sitting position. First of all, an ergonomic chair with unstable support can be used, where users always have to balance their body on the chair, which keeps them active (e.g. Haider Bioswing), and/or users can be triggered to perform a training session (in this case a regular office chair1 can be used). WorkPace2 is one of the best-known applications for training users with respect to muscle fatigue and recovery. The application alerts users whenever exercises (e.g. stretching exercises) should be performed. In contrast to WorkPace, we track the chair over a longer period, thus getting permanent feedback from the user. Consequently, our software can trigger alerts more precisely and provide optimized training exercises. Another program is RSI-Shield3, a user-customizable application, which simply generates break events with a pre-defined frequency. During the break the user is advised to perform simple exercises, which can be done using a normal office chair. In contrast to the latter applications, the aim of our work is to detect whether the user
is sitting correctly or not and to interrupt the user only if an unhealthy sitting position is recognized by the sensors of the intelligent chair. Finally, Zheng and Morell [22] propose an ergonomic chair which guides office workers to sit in a pre-defined position by providing vibrotactile feedback. Force sensors placed on the sitting support as well as on the backrest of the chair are used to compute a feedback signal driving small vibration actuators. If the user is not sitting in a desired position, one or more actuators vibrate in order to direct the user away from the undesired position. In their paper, the authors postulate that the sporadic “buzzes” helped to guide users successfully into the desired posture. In contrast, our approach is to detect the sitting position of the user and the corresponding time the user sits in this position. Only if the user sits statically for longer than a pre-defined time period is an interrupt generated. Since, e.g., a slouching position is more harmful to the spine than leaning back using the backrest (compare position 8 vs. position 2 in Fig. 1), the corresponding time period has been adjusted accordingly.

2.2 Interruption and Feedback

In the last decade, a number of research groups have presented a lot of work around interruption and recovery, with the goal of achieving highly efficient interrupts with low intrusion [4, 7, 8, 12, 13, 14, 20]. Different feedback modalities for efficient interruptions have been explored by Selker and Arroyo. In [3], the authors present five different feedback modalities including smell, heat, sound, vibration, and light. In their paper, they come to the conclusion that using the right channel can evoke certain memories, which again might be optimal to use in a system. The implementation of the Graphical Feedback has mainly been influenced by Bailey et al. [5], where the authors propose a new way of interrupting people that provides an optimal balance of information awareness with very low intrusion. However, they embed the alert window in the browser, which limits its usage for everyday applications. In our setup, we also used vibrotactile feedback for posture guidance. This feedback modality was mainly inspired by Zheng and Morell [22]. In their paper, the authors postulate that the sporadic “buzzes” helped to guide users successfully into the desired posture. However, the authors did not compare the haptic feedback with other modalities. From multiple resource theory we know that non-competing channels might have less negative disruption effects on office workers. In this paper, we went one step further and compared the impact of haptic and graphical feedback on the workflow.
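To make this interrupt rule concrete, the sketch below shows one way posture-dependent time budgets could be implemented. The concrete budget values and the class and method names are assumptions made for illustration; the paper only states that more harmful postures (e.g., slouching) get shorter periods, and that 5 minutes of static sitting was used as the threshold in the pilot study described later.

```python
import time

# Hypothetical per-posture budgets (seconds of static sitting before an alert).
# The exact values are assumed; the paper only reports a 5-minute default and
# that harmful postures such as slouching get shorter budgets than leaning back.
MAX_STATIC_SECONDS = {
    "slouching":       3 * 60,
    "leaning forward": 4 * 60,
    "upright":         5 * 60,
    "leaning back":    6 * 60,
}

class StaticSittingMonitor:
    def __init__(self):
        self.current_posture = None
        self.since = None

    def update(self, posture, now=None):
        """Feed the latest classified posture; returns True when an alert is due."""
        now = time.time() if now is None else now
        if posture != self.current_posture:
            # A posture change counts as dynamic sitting: reset the timer.
            self.current_posture, self.since = posture, now
            return False
        budget = MAX_STATIC_SECONDS.get(posture, 5 * 60)
        return (now - self.since) > budget
```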
3 Feedback Modalities

3.1 Graphical Feedback

The Graphical Feedback is based on the ideas of Bailey et al., who postulate an “Adjusting Window” technique for notifying users of new information [5]. Using this technique, the primary (task) window shrinks and the alert window is docked to the right side of the main window.
Fig. 3. Additional alert messages can be visualized by shrinking the main window (left) and by docking a new window on the right side of the main window (center). Users can decide on the right timing for performing the training (right); in this case the main window shrinks again and becomes a secondary window docked to the training window.
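As a rough illustration of the window-management side of this technique, the snippet below shrinks the currently focused window on Microsoft Windows to free a strip on its right side where the alert window could be docked. It uses the Win32 API via ctypes; the 25% strip width is an assumed value, and this is only a sketch under those assumptions, not the authors' implementation.

```python
import ctypes
from ctypes import wintypes

user32 = ctypes.windll.user32  # Win32 user-interface API (Windows only)

def shrink_active_window(alert_fraction=0.25):
    """Shrink the currently focused window so that an alert window can be docked
    along its right side (cf. Fig. 3). Returns the freed dock area as
    (x, y, width, height). The 25% fraction is an assumed value."""
    hwnd = user32.GetForegroundWindow()
    rect = wintypes.RECT()
    user32.GetWindowRect(hwnd, ctypes.byref(rect))
    width = rect.right - rect.left
    height = rect.bottom - rect.top
    new_width = int(width * (1.0 - alert_fraction))
    # Resize the primary window; the strip freed on the right is where the
    # posture-care application would place its alert/exercise window.
    user32.MoveWindow(hwnd, rect.left, rect.top, new_width, height, True)
    return rect.left + new_width, rect.top, width - new_width, height
```

Restoring the window afterwards would simply call MoveWindow again with the stored original rectangle.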
Bailey et al. postulated in [5] that the Adjusting Window technique is an optimal compromise for notifying users about new information with very low disruption. In addition, a changeable icon in the taskbar represents the current status of the user’s posture: a small bending plant visualizes the posture of the user. In our setup, the main working window slightly shrinks, and the new alert window is wrapped along the right side of the primary window, see Fig. 3 (left, center). The docked window derives its size from the size of the actual working window. The main window remains in the adjusted state until the user starts the training session by clicking the “perform exercise” button inside the docked window. Once the user has done so, the exercise window becomes the new main window and the old main window animates into a small window docked along the left side of the exercise window, cf. Fig. 3 (right). A one-minute exercise video is presented to the office workers, which they should follow. Finally, once the exercise has been performed, the main window is animated back to its original size and the small docked window on the right side disappears again. In contrast to Bailey et al.’s approach, we did not embed the docked window inside one specific application (e.g. a browser); instead, we resized the actual working window by getting the active window handle from the top-used window [16].

3.2 Physical Feedback

In contrast to the Adjusting Window technique, which works only digitally, we also implemented a physical avatar (see Fig. 4). Instead of using a physical puppet [9], we used a plastic plant based on the Hanappa toy4. The original Hanappa plant, manufactured by Sega Toys, flexes its petal and/or leaves based on human speech input. We modified the plant in three different ways:

4 http://www.segatoys.co.jp/hanappa

Fig. 4. (left) The bending plant represents the office worker’s posture. (center) The image of the plant and detail of the actuator, which consists of a fixed and an SMA wire. (right) The bending shape of the actuator without the bloom (top: without power, bottom: with power).

• Connection over USB: While the original plant is a stand-alone plant without any bidirectional communication, we embedded a computer board to drive the actuator via USB.
• Replacement of the actuator module: The physical avatar uses Shape Memory Alloy (SMA) technology, which makes it possible to bend its shape. Similar to the
changing icon in the Windows taskbar, the plant can bend its shape to represent the user’s posture. Moreover, it can shake itself to motivate the user to perform a training session. In the modified version, we replaced the actuator module to improve both the bending angle and the bending time. The original Hanappa can bend its leaves and petal by an angle of ±10°. We changed the SMA by using a longer Biometal wire with a larger diameter (Biometal BMF-100 for the petal and Biometal BMF-75 for the leaves). Both Biometal wires are able to change their length by 5% (thus the wire gets longer and/or shorter) depending on the applied power. Therefore, the modified version is able to bend by an angle of 60°, so that users get the impression that the plant is flabby and droopy. Similar to the changing icon in the Windows taskbar, the plant can bend its shape to represent the user’s posture; moreover, it can shake itself once the user should be motivated to perform an exercise.
• Adding another leaf: Finally, we added another (second) SMA leaf to the original Hanappa, which can again be controlled via USB. While the bloom of the flower represents the user’s head, both leaves represent the arms.

3.3 Vibrotactile Feedback

Finally, we developed vibrotactile feedback with the aid of an actuator, so that users are motivated to change their seating behavior. To provide feedback about the status of their posture, users receive innocuous vibrations along with sporadic “buzzes”. The vibrations are created by the force-feedback unit of a Logitech Rumblepad 2. In order to alert the user about a wrong sitting position, we used “buzzes” lasting 0.5 seconds. The alert’s magnitude was increased whenever participants constantly ignored the feedback. We started with a light vibration using 30% of the feedback’s maximum strength (defined by the maximum force that the Rumblepad 2 could achieve) and increased the force stepwise (40%, 50%, 60%, 70%) up to finally 80%, i.e., six discrete intensity levels in total.
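The escalation scheme can be summarized in a few lines of code. In the sketch below, set_rumble stands in for whatever force-feedback call drives the gamepad actuator (the authors used a Logitech Rumblepad 2); the function name and its surroundings are assumptions made for illustration, while the 0.5-second buzz length and the 30–80% magnitude levels follow the description above.

```python
import time

# Vibration levels as a fraction of the Rumblepad's maximum force, following the
# escalation described above: start gently at 30% and step up to 80% the longer
# the feedback keeps being ignored.
RUMBLE_LEVELS = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80]

def buzz(set_rumble, ignored_count):
    """Emit one 0.5 s 'buzz'. `set_rumble(strength)` is a placeholder for the
    actual force-feedback API; `ignored_count` is how often the user has already
    ignored the alert."""
    strength = RUMBLE_LEVELS[min(ignored_count, len(RUMBLE_LEVELS) - 1)]
    set_rumble(strength)
    time.sleep(0.5)     # the buzz lasts 0.5 seconds
    set_rumble(0.0)     # stop vibrating between buzzes
```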
4 User Study
4.1 Pilot Study
In a pilot study with 6 participants, we tracked the participants' posture without any alert during a document-editing task. All participants had to extract words from a text from which all spaces had been removed. During the one-hour tracking session, episodes of static sitting were identified as periods in which the coordinates of the COP remained within a pre-defined region. On average, 5.8 (SD = 3.9) episodes of static sitting were identified, lasting 7.2 (SD = 13.5) minutes. The condition for an interrupt alert was satisfied if an episode of static sitting lasted longer than 5 minutes. Under this criterion, participants would have received 5.8 (SD = 4.9, MIN = 2, MAX = 13) "silent" interrupt alerts during the one-hour tracking session.
4.2 Experimental Design
Twelve participants from the local software park were recruited for the laboratory user study. The experiment consisted of three time-consuming tasks: editing a document, writing a transcript based on a video clip, and searching for and planning a trip. In summary, the study was a 3 (task) x 3 (feedback), counterbalanced within-subjects design, which took 1.5 hours (10 minutes for each task). During the study, we measured the participant's posture to trigger alerts. In addition, the results of the pilot study motivated us to interrupt participants at least once within every 10 minutes - even if they had not triggered an alert themselves (which in fact never happened). After each interrupt, participants could decide when to start a training session. If they ignored the training request for more than 15 seconds, the alert stopped and reminded them again after 30 seconds. This reminding sequence was repeated until the user finally started the training.
4.3 Tasks
Building on the experimental task classes suggested in [1], we devised three types of tasks for this study. The first type was a document-editing task. Articles from Wikipedia were converted into Microsoft Word documents and shortened to an average of 1,820 words. Afterwards, spelling errors were introduced, words were replaced or skipped, and some punctuation was removed. In all cases, mistakes were indicated by the built-in spellchecker and marked with comments. Three instances of the editing task were created, building on similar yet distinct articles. Participants were instructed to make the requested changes as quickly and accurately as possible. This task required work within a single application. The second type of task consisted of three news media clips, each about 2 minutes 20 seconds in length. Participants were instructed to produce a transcript of the narrator's text in Microsoft Word; thus participants had to work with two desktop applications in parallel. The third type of task was a combined web-search and planning task. Participants were asked to plan a short trip and search for information regarding transportation
and accommodation on the web, in line with the key data given in the assignment. Participants were asked to document their findings in a Word document. Destinations, types of accommodation and means of transportation were varied between task instances to counteract strategic learning effects. Thus, in the third task, participants had to work with multiple applications and make active use of information from various web sources. These tasks were chosen to cover a variety of task demands while still being meaningful examples of commonly performed everyday tasks. The tasks were intended to induce varying cognitive load on different mental resources, triggering different cognitive processes and differing in complexity. To ensure that time on task was the same for all participants, the length of the texts and videos as well as the key data for the short trip were adjusted so that full completion of a task without interruption would be unlikely within the time limits set.
4.4 Participants
Twelve participants (3 female), aged between 22 and 42 years (average 28 years), took part in the user study. All participants had good experience with both Microsoft Windows and Microsoft Office.
4.5 Apparatus
Fig. 5 (left) depicts the apparatus of our study, including a Tobii eye-tracking screen, a webcam on top for tracking the participants' faces and the physical Hanappa, placed on the right side of the LCD. All three feedback modalities (as described above) were used in the study on the same hardware. The experiments were performed using a 17" TFT with a screen resolution of 1280×1024 pixels.
4.6 Procedure
Participants were welcomed and introduced to the purpose of the study. They were then given instructions on the tasks they had to perform. In addition, they were informed that they would be interrupted periodically to do a training session. Participants were told to perform the tasks as quickly and as accurately as possible. After each task they completed a modified NASA-TLX survey.
4.7 Performance Measurements
We counted the number of training sessions that were postponed during a task. A training session was considered postponed if the participant did not react to the alert within 15 seconds. Every 30 seconds, the participant was reminded again to perform the training; if this reminder was ignored, the exercise was again counted as postponed. Furthermore, we measured the time until an exercise was started after the initial alert; if an exercise was postponed, this time was added to the overall time until the exercise was started. Finally, we measured the time for returning to the suspended primary task after a performed exercise. To this end, we logged the first mouse and/or keyboard input event on the primary window after the exercise was finished. To confirm the measured times, we additionally double-checked the transition times from
Fig. 5. (left) The apparatus of the user study, including the Tobii eye-tracking LCD. (right) The custom analysis tool allows a better analysis of how participants react to an interrupt. Capturing the participants' screen augmented with the gaze plot helped us to analyze the results.
the end of a training session to the resumed primary task using the Tobii eye-tracking screen (http://www.tobii.com). In addition to the gaze data, the system logged any input on the screen, allowing us to determine exactly when a user started to work on the primary task again. To further explore the participants' behavior in response to an interrupt, we implemented a custom analysis tool (cf. Fig. 5 (right)). The tool parses Tobii's log files and visualizes gaze data over time for a user-specified time interval. The timeline provides a time-span control that helps to analyze the period shortly before and after an interrupt happened. In addition to the recorded gaze data, the tool allows browsing through snapshots of the participants that were taken by the additional webcam mounted on top of the Tobii LCD. To provide additional information, we color-coded the gaze plots for the periods before and after the interrupt. Fig. 5 (right) depicts the timeline with an interrupt occurring after 29 seconds. The gaze blobs in the left screen of the figure, which occurred before the interrupt, are visualized with warm gradient colors (red to yellow). In contrast, all gaze blobs after the alert are coded with cold colors (turquoise to green).
4.8 Emotional State Measurements
To measure the effect of the interruptions on user experience and emotional state, a modified version of the NASA-TLX survey was administered to the participants after each trial. While the NASA-TLX survey [11] was originally designed to assess subjective workload, its scales are also relevant to the experience of interruption [1]. A particular advantage of the NASA-TLX is its short length, with 6 items in the original and 8 items in our modified version, which allowed us to present it as frequently as required in this study. The modified version used in this study was derived from the German translation of the TLX by [18]. The Physical Demand scale was omitted, and we added the following three items to obtain more specific information on the perceived impact of the continuous feedback and interruptions:
1. Workflow: How disturbing was the alert for the workflow?
2. Feedback: How disturbing was the continuous feedback?
3. Training: How disturbing was the alert to perform the training?
As suggested by [1], we administered the survey on paper rather than in electronic form to avoid interference with the experimental tasks.
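Before turning to the results, the sketch below illustrates one possible implementation of the static-sitting trigger from Sections 4.1 and 4.2: an alert fires when the chair's center of pressure (COP) stays within a small region for more than five minutes. The CopSample type, the 5 cm radius, and the event-based interface are assumptions for illustration, not the authors' implementation.

```csharp
// Hedged sketch of the static-sitting trigger: the COP must leave a small region
// around its reference point at least once every five minutes, otherwise an alert fires.
// CopSample, the 5 cm radius, and the event are illustrative assumptions.
using System;

public struct CopSample
{
    public DateTime Time;
    public double X, Y;   // center of pressure on the seat, e.g. in centimeters
}

public class StaticSittingDetector
{
    private readonly double regionRadius;          // how far the COP may drift
    private readonly TimeSpan maxStaticDuration;   // alert threshold (5 minutes)
    private CopSample reference;
    private bool hasReference;

    public event Action StaticSittingDetected;

    public StaticSittingDetector(double regionRadiusCm = 5.0, double maxStaticMinutes = 5.0)
    {
        regionRadius = regionRadiusCm;
        maxStaticDuration = TimeSpan.FromMinutes(maxStaticMinutes);
    }

    // Feed one COP sample at a time (e.g., once per second from the chair sensors).
    public void AddSample(CopSample s)
    {
        if (!hasReference)
        {
            reference = s;
            hasReference = true;
            return;
        }

        double dx = s.X - reference.X, dy = s.Y - reference.Y;
        if (Math.Sqrt(dx * dx + dy * dy) > regionRadius)
        {
            // The user moved: start a new episode from the current sample.
            reference = s;
        }
        else if (s.Time - reference.Time > maxStaticDuration)
        {
            StaticSittingDetected?.Invoke();   // condition for an interrupt alert
            reference = s;                     // reset so the alert does not refire immediately
        }
    }
}
```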
5 Results
5.1 Performance Measurement Results
The results of the performance measurements are depicted in Table 1 (top/bottom). A two-way within-subjects analysis of variance (ANOVA) was conducted to evaluate the effect of feedback condition and task type on the number of postponements, the time span from the training alert to the start of the training, and the time it took to resume the main task. These results are summarized in Table 1 (top). For all tests an alpha level of 0.05 was used. There were no significant interaction effects. The Greenhouse-Geisser correction was used when the assumption of sphericity was violated. Post-hoc analyses were conducted on the significant main effects. These consisted of paired-samples t-tests with the familywise error rate controlled across the tests using Holm's sequential Bonferroni approach. Significant differences between the means of pairs of conditions are presented in Table 1 (bottom).
Table 1. (top) Main effects for performance measures. (bottom) Significant mean differences along performance measures between pairs of conditions. Starred results indicate marginally significant results (0.05 > p > 0.0167).
Table 1 (bottom) values, in order: t(11) = -2.359, p = 0.038*; t(11) = -2.213, p = 0.049*; t(10) = 3.062, p = 0.012; t(9) = -3.443, p = 0.038*; t(10) = -4.748, p = 0.003. Measures listed: time until the first training session was started; time to resume to the main task.
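The familywise error control used for these post-hoc tests is Holm's sequentially rejective Bonferroni procedure. A minimal sketch of the step-down rule, applied to an arbitrary set of p-values, is shown below; it is our own illustration of the standard procedure, not code from the study.

```csharp
// Holm's sequentially rejective Bonferroni procedure (step-down correction).
// Sort p-values ascending; compare the k-th smallest against alpha / (m - k);
// stop at the first non-rejection.
using System.Linq;

public static class HolmBonferroni
{
    // Returns, for each input p-value, whether it is rejected at familywise level alpha.
    public static bool[] Reject(double[] pValues, double alpha = 0.05)
    {
        int m = pValues.Length;
        // Order hypotheses by ascending p-value.
        int[] order = Enumerable.Range(0, m).OrderBy(i => pValues[i]).ToArray();
        bool[] reject = new bool[m];

        for (int k = 0; k < m; k++)
        {
            double threshold = alpha / (m - k);   // step-down threshold
            if (pValues[order[k]] <= threshold)
                reject[order[k]] = true;
            else
                break;                            // first non-rejection stops the procedure
        }
        return reject;
    }
}
```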
5.1.1 Type of Task
The type of task had a significant main effect on the tendency to postpone a training session. The fact that fewer training sessions were postponed during the editing task (M = 1.63, SD = 1.76) than during the transcription task (M = 2.67, SD = 2.12) and the search and planning task (M = 2.79, SD = 1.68) might be explained by the fact that the editing task required participants to work on a single document only, making it easier to respond to the interrupt and return to the main task afterwards. For the other
two tasks, multiple documents had to be handled simultaneously, which made a switch to the training and the return to the main task more complex.
5.1.2 Type of Feedback
Even though the main effect for the type of feedback was non-significant, the tendency to postpone a training session appeared to be lowest under the vibrotactile feedback condition, cf. Fig. 6 (left). This might be because the vibrotactile feedback was assessed as the most disturbing one (see the NASA-TLX results). Furthermore, regarding the time span until a training session was started, the results indicate that participants took significantly longer to start the training under the graphical feedback condition (M = 42.22, SD = 28.54) than under the vibrotactile condition (M = 16.58, SD = 9.71). Analyzing the times to return to the main task after a training session, we found surprisingly short time spans. Again we found a main effect for the type of feedback, indicating that the time to return to the main task was longer under the graphical feedback condition (M = 6.42, SD = 1.65) than under the vibrotactile condition (M = 4.79, SD = 1.24) or the physical feedback condition (M = 3.97, SD = 1.71).
Fig. 6. Boxplots of the number of training sessions postponed for each task (left) and of the time span until the first training session was started after the initial alert (right)
Fig. 7. Snapshots of a participant looking at the physical avatar before starting the training. First, the user is working on the primary task (left); the shaking avatar attracts his/her attention (center); and s/he starts the training (right).
Using the Tobii screen recordings, we counted the number of times a user looked at the taskbar feedback icon in the digital avatar condition (cf. Fig. 7). We found that participants paid attention to the feedback, although some stated that they were not
aware of the feedback (Task 1: M = 3.41, SD = 3.34; Task 2: M = 3.58, SD = 1.56; Task 3: M = 3.92, SD = 3.29). This discrepancy between their self-assessment and the collected data might be caused by the fact that they frequently glanced at the icon when switching between two applications (Word and a browser, for example). In this situation, they might not have intentionally looked at the icon, but they nevertheless briefly checked the state of the graphical feedback.
5.2 Emotional State Measurement Results
The results of the NASA-TLX questionnaires are depicted in Table 2 and Table 3.
Table 2. Main effects for TLX dimensions. Starred results are significant for α = 0.05
TLX dimensions: Temporal Demand, Performance, Effort, Frustration, Workflow, Disturbance due to Feedback, Disturbance due to Training.
Table 3. Significant mean differences along TLX dimensions between pairs of conditions.
Table 3 columns: TLX Value, Pair, F, p. TLX dimensions listed (in order): Temporal Demand, Performance, Effort, Frustration, Workflow, Workflow, Disturbance due to Feedback, Disturbance due to Training. Pairs and test statistics (in order): Edit - Transcribe, t(11) = -4.930; Edit - Plan, t(11) = -5.438; Edit - Plan, t(11) = -7.910; Transcribe - Plan, t(11) = -3.893; Edit - Transcribe, t(11) = -3.443; Edit - Plan, t(11) = -4.748; Edit - Plan, t(11) = -4.415; Edit - Transcribe, t(11) = -3.273; Graphical - Physical, t(11) = 3.785; Graphical - Vibration, t(11) = -4.899; Physical - Vibration, t(11) = -4.529; Graphical - Physical, t(11) = 4.597.
A two-way within-subjects analysis of variance (ANOVA) was conducted to evaluate the effect of feedback strategy and task type on the various TLX measures. The dependent variables were continuous TLX ratings from 0 (low) to 20 (high). Results are summarized in Table 2. There were no significant interaction effects. Post-hoc analyses were conducted on the significant main effects. These consisted of paired-samples t-tests with the familywise error rate controlled across the tests using Holm's sequential Bonferroni approach. Significant differences between the means of pairs of conditions are presented in Table 3.
5.2.1 Type of Task
The type of task had a significant main effect on reported temporal demand, performance, effort and frustration (cf. Fig. 8).
Fig. 8. Boxplot of average NASA-TLX scores for the 3 task types (0 = low, 20 = high)
The task load was assessed as lowest for the editing task, while the planning task was assessed as the most demanding. Besides this, the type of task also had a significant main effect on the workflow and on the disturbance due to training alerts. The impact on the workflow and the disturbance due to training alerts were rated as most severe for the transcription task.
5.2.2 Type of Feedback
The type of feedback had a significant main effect on participants' reported interruption of the workflow (cf. Fig. 9). The physical avatar was rated as less disruptive to the workflow than the other two feedback modalities. The type of feedback also had significant main effects on the perceived disturbance due to both the continuous feedback on the sitting position and the alert for a training session, even though the patterns differ (cf. Fig. 10). While participants rated the vibrotactile feedback as more disturbing than the digital and physical feedback for the continuous feedback, digital and vibrotactile feedback were assessed as more disturbing than the physical feedback when providing alerts for training.
Fig. 9. Boxplot depicting participants' ratings of the impact on workflow (0 = low, 20 = high)
Fig. 10. Boxplot of the perceived disturbance due to continuous feedback and alerts (0 = low, 20 = high)
6 Discussion
The fact that the vibrotactile feedback resulted in quite low response times across all three task types (cf. Fig. 6, right) is in line with participants' comments that they intuitively tried to stop the continuous vibration feedback as quickly as possible because it was annoying to them. Four participants reported during the survey that the vibrotactile feedback was "extremely disruptive" during the task and that they "might switch it off if they had to use it in a long-term study". The vibrotactile feedback was harder for them to ignore than the graphical and physical feedback. In contrast to the graphical and physical feedback, the vibrotactile feedback was clearly noticeable even at the lightest feedback level (30% of the full vibration strength). With increasing strength (up to 80%), the feedback's disruptive effect also increased. The most obvious way to deal with the feedback was to simply react to the alert and start the training.
The generally short times we observed for returning to the main task seem to be caused by the type of interrupt we are dealing with; since there is low cognitive load during the physical training and the participants are informed about how long the task lasts (through the countdown timer), they already plan the next steps of the primary task. Fig. 11 depicts the tracking data of one participant, who looked at the browser's
Fig. 11. During the training session (1), the participant checks the current time in the lower right corner (2) and already plans an action in the primary task window (3). At the end of the training, the attention is focused on the close button (4). Immediately after closing the training window (5), the participant clicks the browser tab as planned before (6).
tab 19 s before the end of the training. After the training was finished, it took her only 2 s to click exactly this tab. This phenomenon was highly interesting, and it was observed with 6 other participants. Although the tendency to postpone a training session appeared to be lowest under the vibrotactile feedback condition, most participants assessed the vibrotactile feedback as the most disturbing one, which they said would lead them to stop using the system after a short period of use. In the participants' feedback comments, we repeatedly found remarks such as "I noticed the vibrotactile really fast – but it was so disturbing – I just wanted to turn it off". Similar to McFarlane's conclusion, we found that giving people control over when to react to an interrupt might cause the side effect that people always try to postpone interrupting alerts [17]. Finally, in our study, we also noticed that 7 of the participants did not notice the shaking physical avatar at the very beginning. However, they also mentioned that once they noticed it, it was less disturbing because it was not in their field of view.
7 Conclusion and Future Work
In summary, the proposed posture chair with the physical (ambient) interrupt motivates people to improve their sitting behavior – even if they have to work focused on a primary task. The comments and the data from our study also demonstrated that additional (visual) feedback is accepted – especially if it does not interfere with the working screen. The results of this first study motivate us to improve the current system. Moreover, we plan to conduct a long-term field study using 12 chairs over a period of three months. This study will be done in cooperation with physical therapists, with the overall goal of demonstrating the benefits of a posture chair setup.
Acknowledgments. This project is part of the Research Studio Austria NiCE, funded by the FFG, 818621. This project is also partially funded by a grant from the National Research Foundation administered by the Media Development Authority of Singapore (CUTE Project No. R-705-000-100-279). Moreover, it is also supported by the
government of Upper Austria and the Upper Austria University of Applied Sciences under the project name “PostureCare”. The authors would like to thank our anonymous reviewers for their very useful comments.
References 1. Adamczyk, P.D., Bailey, B.P.: If not now, when? The effects of interruption at different moments within task execution. In: Proceedings of CHI 2004, pp. 271–278. ACM, New York (2004) 2. Arroyo, E., Selker, T.: Arbitrating multimodal outputs: Using ambient displays as interruptions. In: Human-Computer Interaction: Theory and Practice (Part II) Proceedings of HCI International 2003, vol. 2, pp. 591–595 (2003) 3. Arroyo, E., Selker, T., Stouffs, A.: Interruptions as Multimodal Outputs: Which are the Less Disruptive? In: IEEE International Conference on Multimodal Interfaces, p. 479 (2002) 4. Bailey, B.P., Konstan, J.A., Carlis, J.V.: The effects of interruptions on task performance, annoyance, and anxiety in the user interface. In: INTERACT, pp. 593–601 (2001) 5. Bailey, B.P., Konstan, J.A., Carlis, J.V.: Adjusting windows: Balancing information awareness with intrusion. In: Kortum, P., Kunzinger, E. (eds.) Proceedings of the 6th Conference on Human Factors and the Web: Doing Business on the Web, Austin, TX (2000) 6. Beach, T.A., Parkinson, R.J., Stothart, J.P., Callaghan, J.P.: Effects of prolonged sitting on the passive flexion stiffness of the in vivo lumbar spine. Spine J. 5, 145–154 (2005) 7. Czerwinski, M., Cutrell, E., Horvitz, E.: Instant Messaging: Effects of Relevance and Timing. In: People and Computers XIV: Proceedings of HCI 2000, pp. 71–76 (2000) 8. Czerwinski, M., Horvitz, E., Wilhite, S.: A diary study of task switching and interruptions. In: Proceedings CHI 2004, pp. 175–182. ACM, New York (2004) 9. Daian, I., van Ruiten, A.M., Visser, A., Zubic, S.: Sensitive chair: a force sensing chair with multimodal real-time feedback via agent. In: Proceedings of the 14th European Conference on Cognitive Ergonomics: Invent! Explore!, vol. 250, pp. 163–166. ACM, New York (2007) 10. Ertel, M., Junghanns, G., Pech, E., Ullsperger, P.: Effects of VDU-assisted work on health and well-being. Research Report 762, Federal Institute for Occupational Safety and Health, BAuA (1997) 11. Hart, S.G., Staveland, L.E.: Development of a NASA-TLX (Task load index): Results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183 (1988) 12. Horvitz, E., Apacible, J.: Learning and reasoning about interruption. In: Proceedings of the 5th International Conference on Multimodal Interfaces, pp. 20–27. ACM, New York (2003) 13. Iqbal, S.T., Bailey, B.P.: Effects of intelligent notification management on users and their tasks. In: Proceeding of CHI 2008, pp. 93–102. ACM, NY (2008) 14. Iqbal, S.T., Horvitz, E.: Disruption and recovery of computing tasks: field study, analysis, and directions. In: Proceedings of CHI 2007, pp. 677–686. ACM, NY (2007) 15. Kingma, I., van Dieen, J.H.: Static and dynamic postural loadings during computer work in females: Sitting on an office chair versus sitting on an exercise ball. Applied Ergonomics 40(2), 199–205 (2009)
16. Lieberman, H.: Autonomous interface agents. In: Proceedings of CHI 1997, pp. 67–74. ACM, New York (1997) 17. McFarlane, D.: Coordinating the interruption of people in human-computer interaction. In: Proceedings of Interact 1999, pp. 295–303 (1999) 18. Pfendler, C.: Vergleichende Bewertung der NASA-TLX-Skala bei der Erfassung von Lernprozessen. Forschungsinstitut für Anthropotechnik, Wachtberg, Bericht No. 2 (1991) 19. Rivera, D.: The effect of content customization on learnability and perceived workload. In: Proceedings of CHI 2005, pp. 1749–1752. ACM, New York (2005) 20. Salvucci, D.D., Bogunovich, P.: Multitasking and monotasking: the effects of mental workload on deferred task interruptions. In: Proceedings of CHI 2010, pp. 85–88. ACM, New York (2010) 21. Wickens, C.D., Hollands, J.G.: Engineering Psychology and Human Performance. Harper Collins, New York (1992) 22. Zheng, Z., Morrell, J.B.: A vibrotactile feedback approach to posture guidance. In: 2010 IEEE Haptics Symposium, pp. 351–358.
Exploring Haptic Feedback in Exergames
Tadeusz Stach and T.C. Nicholas Graham
School of Computing, Queen's University, Kingston, ON, Canada
{tstach,graham}@cs.queensu.ca
Abstract. Exergames combine entertainment and exercise in an effort to encourage people to be more physically active. Although exergames require active input, interactions are less physical than those experienced in real-world exercise. Interactions can feel artificial, limiting the captivating experience exergames aim to provide. To address this problem, haptics have been proposed as a means of providing additional feedback to players through the sense of touch. However, there is very little empirical evidence supporting the benefits of haptics in exergames. To address this, we have identified and evaluated three ways in which haptic feedback can enhance exergames: by helping to balance group exercise among people of different fitness levels, by guiding players toward safe and healthy interaction, and by increasing peoples’ sense of virtual presence in exergames. We present three novel exergames incorporating haptic feedback, and report on experiments investigating their success. We find that haptics which are consistent with actions displayed on-screen increase immersion and improve enjoyment. However, we discover pitfalls when using haptics to represent phenomena that do not have a physical basis. These results allow us to present a set of design issues for haptic feedback in exergames. Keywords: Exergames, haptics, force-feedback, exercise video games, exertion interfaces, active games.
In this paper we help to address this gap by investigating how haptic feedback can be used to improve three aspects of exergaming: Balancing: A major advantage of exergames over traditional physical activity is that games can mediate the effort players must expend, allowing people of disparate physical abilities to play together. We demonstrate how haptics can aid in creating effective and enjoyable balancing in multiplayer exergames. Safe and Healthy Interaction: Another significant advantage of exergames is that they can guide players toward a level of exertion that is both safe and beneficial to their health. We demonstrate how haptic feedback can be integrated into gameplay to provide players with subtle cues as to when exertion limits have been reached. Presence: The key promise of exergames is that players’ enjoyment of physical activity can be enhanced by presenting it through an immersive virtual world. Players’ sense of presence is achieved through consistent visual and auditory feedback. We demonstrate that haptics can significantly increase players’ feeling of presence within a virtual world. To illustrate these applications of haptics, we have created three novel exergames. In the following sections, we explore the design of haptic feedback within these games, and report on experiments investigating their success. We found both positive and negative results for haptics. For example, haptics works well when reporting a physical phenomenon in the virtual world, and less well when tied to abstract properties that cannot be directly observed. We conclude by summarizing our lessons learned from exploring haptic feedback in exergames.
2 Background This work builds on and combines existing research in exergaming and haptics. Exergames, a form of “exertion interface” [21], aim to encourage physical activity by combining video games and exercise. Commercial exergaming platforms include the Nintendo Wii, Konami Dance Dance Revolution (DDR), Fisher-Price Smart Cycle, as well as the recently released Sony PlayStation Move and Microsoft Kinect. Numerous academic exergames have also been developed, such as Breakout for Two [21], Push’N’Pull [21], Frozen Treasure Hunter [33], and Swan Boat [1]. Most existing exergames provide at best limited physical output in response to actions occurring in the virtual world. Mueller et al. suggest that in order for exergames to simulate real-world activity, they must include force feedback [21]. For instance, in the Breakout for Two game, players kick soccer balls at a projected wall display to break virtual bricks [21]. This configuration allows the player to feel her foot making contact with a ball. In Remote Impact, players feel the moment of impact when punching and kicking a projected image of their opponent [20]. However, these systems are based on augmented reality and offer no force feedback in response to action occurring in a virtual world.
The majority of exergaming systems are based on virtual reality. For instance, in Heart Burn, players pedal an exercise bike in order to race their truck along a virtual track [28]. When racing a real car, a driver feels vibrations from rough terrain and forces from rapid turns at high velocity. In Heart Burn, these are represented visually through motion of the truck, but are not transmitted as physical sensations to the player. Similarly, in Swan Boat, players run on a treadmill and use arm motions to control their on-screen boat, but players receive no physical feedback when they collide with virtual objects [1]. As exergames require players to physically exert themselves, this lack of physical response to their actions is keenly felt. Haptic feedback conveys information to people via applied forces and/or vibrations. Haptic feedback may be tactile or kinesthetic [24]. Tactile feedback provides a sense of touch (e.g., texture and vibration), while kinesthetic feedback leads the user to perceive force (e.g., weight and resistance). Haptics have been used to improve interaction in a variety of contexts. For instance, haptic awareness in distributed groupware systems has been found to improve task performance and peoples’ sense of virtual presence [3, 25], and haptic warning systems integrated into automobile steering wheels have been found to decrease peoples’ reaction times [29]. Gamepads and joysticks often include hardware mechanisms to provide haptic feedback [7]. These have limited functionality and so can express only the most basic feedback. Gamepads typically support a vibration mode that can be turned on or off; “force feedback” joysticks dynamically adjust the resistance felt when moving the joystick. Most commercial exergaming equipment provides little or no haptic feedback; e.g., the tension setting on the PCGamerBike Mini is not under programmable control, and the Wii Remote provides a limited vibration mode similar to traditional gamepads. Haptic feedback has nevertheless been explored in some exergames. In Airkanoid, players swing hand-held paddles to hit virtual balls [9]; the player’s paddle vibrates when it hits an on-screen ball. The Push’N’Pull exergame uses a Powergrid Fitness Killowatt controller to provide resistance when players interact with virtual items [21]. Virku allows players to explore a virtual environment by pedaling on an exercise bike [17]. Tension on the bike pedals increases when a player climbs a virtual slope, and decreases as the player descends. There has been to-date little evaluation of the effectiveness of haptic feedback in exergames. One exception is Morelli and colleagues’ work on games for the visually impaired. The VI-Tennis exergame provides audio and tactile cues to notify players when to swing a Wii Remote to hit a virtual ball [18]. Haptic feedback was found to improve the in-game performance of visually-impaired players, and was preferred to audio feedback alone.
3 Applying Haptic Feedback to Exergames
Exercise brings people into intimate sensory contact with the physical world around them. A cyclist riding down a hill feels the vibration of the road in her hands, the side-to-side movement of the bike in her core muscles, and the wind in her face. It is important for computer-mediated exercise to preserve this physicality of real-world
activity in order to provide an engaging and immersive experience. Although it is widely assumed that haptics addresses this issue, to date there has been surprisingly little research to validate this assumption. We propose that haptic feedback can help to address three important areas of exergame design: balancing of group exercise so that people of different physical abilities can play together, guidance of players to safe and healthy levels of interaction, and provision of an immersive experience to increase the enjoyment of physical activity. These three areas capture the fundamental promise of exergames: that computer mediation can overcome the significant barrier in traditional exercise of not having people to work out with [13], that playing can be beneficial (e.g., meeting recommended levels for physical activity), and that game tasks can absorb players' focus and distract them from physical exertion [31]. In this section, we discuss how haptics can contribute to these three design questions, and present three novel exergames to illustrate example design solutions. In the following section, we present the results of a study showing the effectiveness and pitfalls of adding haptics to exergames.
3.1 Balancing Group Exercise
Grouping has been shown to be an important motivating factor in exercise [4]. However, it can be difficult for people of disparate abilities to exercise together [2]. Computer mediation can help balance exercise so less fit people can maintain a sense of competitiveness. We argue that haptics are a useful tool in such balancing strategies; for instance, haptics can be used to increase the physical workload of a winning player and decrease the demand on a losing player. Three main approaches are used for balancing players of different abilities: ladders and rankings, asymmetric roles, and dynamic difficulty adjustment. Many online games use ranking systems to group players of similar skill. This approach has the disadvantage of making it hard for friends and family to play together [33]. Another approach to balancing is to assign players different in-game roles. For example, in the Frozen Treasure Hunter exergame, one player uses an exercise bike to move an avatar, while a second player swings a Wii Remote to swat virtual snowballs [33]. Dynamic difficulty adjustment is a good fit with competitive games. This approach adjusts game parameters in real-time based on the player's performance [14]. Dynamic difficulty adjustment is used in the Age Invaders exergame to allow young and old players to play together [16]; parameters such as response time are dynamically adjusted based on the players' ages. The Heart Burn exergame and the Jogging over a Distance system provide dynamic difficulty adjustment by basing peoples' performance on their heart rate rather than their raw exercise power [19, 28]. An advantage of dynamic difficulty adjustment is that people need not be aware that balancing is taking place. We propose that haptics can complement existing approaches of using age or heart rate to dynamically adjust difficulty in exergames. The core idea is to use kinesthetic feedback to increase the game's difficulty for the stronger player. Sinclair et al. [27] have suggested a similar idea, but did not implement or test it. We illustrate this idea through the Truck Pull exergame, presented below.
Fig. 1. The Truck Pull game. Player 1 (red) has taken the lead with 6 seconds remaining
Truck Pull: In Truck Pull, two players engage in a virtual tug-of-war by pedaling on their respective stationary bikes. Each player is represented by an on-screen truck. Both players' trucks are connected by a big chain (see figure 1). The trucks move in the direction of the player who is pedaling with the higher cadence. After one minute, the player who has moved the trucks closer to her side of the screen is the winner. The pedal tension of a player's exercise bike increases as she moves the trucks to her side of the screen, and decreases as the trucks move to the opponent's side of the screen. This has the effect that the winning player must work significantly harder than the losing player. While the stronger person is still more likely to win, the haptic feedback keeps victories from becoming overly lopsided. As we shall see, the haptic version of Truck Pull leads to more balanced games, and players strongly prefer it to a non-haptic version.
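The balancing rule can be expressed compactly. The sketch below is our own illustration of the idea, not the authors' implementation; the IBike interface and the tension range are assumptions.

```csharp
// Hedged sketch of Truck Pull's kinesthetic balancing: pedal tension rises for the
// player who is winning and falls for the player who is losing.
public interface IBike
{
    void SetTension(double level);   // normalized resistance level in [0, 1]
}

public class TruckPullBalancer
{
    private const double MinTension = 0.2;   // assumed floor so pedaling never feels free
    private const double MaxTension = 1.0;

    // position: -1 = trucks fully on player 1's side (player 1 winning),
    //           +1 = trucks fully on player 2's side, 0 = start line.
    public void Update(double position, IBike bike1, IBike bike2)
    {
        double p1Share = (1.0 - position) / 2.0;  // 1 when player 1 is fully winning
        bike1.SetTension(Lerp(MinTension, MaxTension, p1Share));
        bike2.SetTension(Lerp(MinTension, MaxTension, 1.0 - p1Share));
    }

    private static double Lerp(double a, double b, double t) => a + (b - a) * t;
}
```

In a game loop, Update would be called whenever the truck position changes, so the leader's resistance rises smoothly rather than in sudden jumps.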
3.2 Guiding Players to Safe and Healthy Interaction
Aerobic exercise is often prescribed at a specific intensity level, both to provide athletes with optimal training [30] and to avoid over-exertion in the less athletic [11]. Training in indoor cycling classes (or Spinning) is often based on maintaining a certain pedal cadence (e.g., 80 RPM). Runners often attempt to maintain a particular racing pace (e.g., 6 minutes/km). Taking a cue from ergometers in fitness studios, exergames can help users maintain a desired pace by providing displays showing cadence, heart rate or power output. Such displays risk focusing players on the physical activity rather than the game, possibly reducing immersion in the game's world and activities. We propose that haptic feedback can subtly guide exergame players to healthier and safer interaction without requiring immersion-breaking performance displays. Current exergames do a poor job of guiding players towards appropriate levels of exertion. For instance, people playing Wii Sports do not achieve the exercise intensity recommended by the American College of Sports Medicine (ACSM) [12]. Alternatively, in more vigorous exergames, such as Heart Burn [28] or Swan Boat [1], it is possible for players to overexert themselves. In-game cues and dynamic difficulty adjustment have been proposed for regulating players' levels of exertion. For example, Ijsselsteijn et al. [15] developed an exergame with a virtual coach. The coach encourages a player to speed up if her heart rate is too low, or to slow down if her heart rate is too high. Similarly, exergames have been developed to adjust game difficulty based on a player's heart rate [6]. In this case, game difficulty increases if heart rate is low, or becomes easier as heart rate increases. While promising, the efficacy of these approaches has not been tested. Haptic cues integrated into exercise equipment have been found to be effective at signaling people to adjust exercise intensity [10]. The use of haptic cues in exergames to guide players to effective levels of exercise has not been explored. In order to illustrate this approach, we created the Balloon Burst exergame, described below.
Fig. 2. The Balloon Burst game. Player 1 (red) has shot a balloon for 25 points; Player 2 (green) has just shot two balloons for 50 points; 104 seconds remain in the game.
Balloon Burst: In Balloon Burst, players attempt to shoot as many on-screen balloons as possible (see figure 2). A player is awarded 25 points for each balloon hit. At the end of two minutes, the player who has accumulated the most points is declared the winner. A wireless Xbox 360 gamepad and a recumbent exercise bike are used to interface with the game. Pressing the "A" button on the gamepad fires the player's virtual gun. The speed at which a player pedals on the stationary bike determines how often balloons are launched on screen. The faster the player pedals, the more frequently balloons are launched, and therefore the easier it is to score points. In order to guide players to a safe level of exertion, a maximum pedal speed is set in the game. While the choice of maximum pedal speed is dependent on the user population and their exercise goals, for our experiment we chose an upper limit of 80 RPM based on recommendations for optimal bicycle training [5]. If a player exceeds this maximum pedal cadence, no balloons are launched. Balloons begin to launch again once pedal speed drops back below the maximum. It is therefore in the player's best interest to pedal as close to the maximum cadence as possible without exceeding it. Players receive tactile feedback indicating their cadence through the hand-held gamepad.
The gamepad produces a pulsing sensation every time a balloon is launched; therefore, the faster a player pedals, the more rapidly the pulses occur. If the maximum pedal cadence is exceeded, the gamepad vibrates continuously, giving the player a cue to slow down. In Balloon Burst, this haptic feedback allows players to focus on gameplay (i.e., trying to hit the balloons) rather than on a visual cadence monitor. The feedback is subtle, and therefore does not break the immersion of the game. Our experimental results will show that haptics are as effective as visual feedback for maintaining pedal cadence. However, the majority of players prefer the precision of visual feedback, and have trouble interpreting the gradations of haptic feedback.
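The cadence-to-feedback mapping just described can be sketched as follows; the IGamepad interface, the launch rate at the cap, and the per-frame structure are assumptions for illustration, not the authors' code.

```csharp
// Hedged sketch of Balloon Burst's cadence feedback: below the 80 RPM cap, balloons
// launch (and the gamepad pulses) more often as cadence rises; above the cap the pad
// vibrates continuously and no balloons are launched.
public interface IGamepad
{
    void Pulse();                           // short vibration pulse
    void SetContinuousVibration(bool on);
}

public class BalloonBurstFeedback
{
    private const double MaxCadence = 80.0;          // upper pedal limit (RPM)
    private const double MaxLaunchesPerSecond = 2.0; // launch rate at the cap (assumed)
    private double launchAccumulator;

    // Call once per frame with the elapsed time; returns how many balloons to launch now.
    public int Update(double cadenceRpm, double dtSeconds, IGamepad pad)
    {
        if (cadenceRpm > MaxCadence)
        {
            pad.SetContinuousVibration(true);         // cue to slow down; no balloons
            return 0;
        }
        pad.SetContinuousVibration(false);

        // Launch rate grows with cadence, so pedaling faster (up to the cap) pays off.
        launchAccumulator += MaxLaunchesPerSecond * (cadenceRpm / MaxCadence) * dtSeconds;
        int launches = (int)launchAccumulator;
        launchAccumulator -= launches;

        for (int i = 0; i < launches; i++)
            pad.Pulse();                              // one tactile pulse per balloon launched

        return launches;
    }
}
```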
3.3 Increasing Presence
Exergames aim to make exercise more enjoyable by shifting peoples' focus to gameplay, rather than the exertion of their physical activity. One way to achieve this goal is by increasing players' sense of presence in the game environment. Presence refers to the feeling of being in the virtual environment rather than the physical location [32]. Presence formalizes the more colloquial concept of "immersion" in virtual worlds. Physical feedback has been shown to influence presence in virtual environments [24]. A lack of haptic feedback can decrease presence and task performance in collaborative groupware [25]. It has been proposed that force-feedback is important in exergames to simulate real-world activity. As we have discussed, several exergames do provide physical feedback to actions occurring on-screen. However, to our knowledge, the effect of haptic feedback on presence in exergames has not been experimentally investigated. Currently, it is unclear if the benefits of haptics in desktop environments [25] are transferable to exergames. To help fill this gap, we have created the Pedal Race exergame. In this game, haptic feedback is used to enhance the experience of riding over different types of virtual terrain.
Fig. 3. The Pedal Race game. Player 1 (red trike) has completed 1 lap and is moving across the ice; Player 2 (green trike) has completed 2 laps and is in the mud.
Pedal Race: In Pedal Race, players race virtual tricycles around a circular track. The first player to complete three laps wins the game. A player powers her tricycle by pedaling on a recumbent exercise bike; the faster a player pedals, the faster her tricycle moves. Steering of the tricycles is handled automatically by the game. The virtual race track is made up of three types of terrain: asphalt, mud, and ice (see figure 3). The pedal tension of the exercise bike changes according to the terrain; mud has twice as much tension as asphalt, and ice has half the tension. The effect of this feedback is that when players enter the mud, they need to pedal harder to maintain the same speed. When they go over ice, the sudden reduction in tension evokes the feeling of spinning wheels. As we shall see, this haptic connection between the visual representation of terrain type and the change of pedal resistance leads to an increased sense of presence and increased enjoyment in the game.
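As a small illustration of the terrain-dependent resistance, the sketch below maps each terrain type to a tension level. The enum, the base tension value, and the helper are assumptions rather than the authors' implementation.

```csharp
// Hedged sketch of Pedal Race's terrain-dependent resistance: mud doubles the base
// pedal tension, ice halves it.
public enum Terrain { Asphalt, Mud, Ice }

public static class PedalRaceTension
{
    private const double AsphaltTension = 0.5;   // assumed base resistance level

    public static double TensionFor(Terrain terrain)
    {
        switch (terrain)
        {
            case Terrain.Mud: return AsphaltTension * 2.0;   // twice the asphalt tension
            case Terrain.Ice: return AsphaltTension * 0.5;   // half the asphalt tension
            default:          return AsphaltTension;
        }
    }
}
```

In the game loop, the returned value would be applied to the exercise bike's tension setting (for example, via a SetTension call like the one sketched for Truck Pull) whenever the tricycle crosses a terrain boundary.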
3.4 Summary
In this section, we have discussed three ways that haptic feedback can enhance exergames. Although haptics have been included in exergames, empirical evidence supporting the benefits of haptic feedback is currently lacking. Through our three novel games, Truck Pull, Balloon Burst, and Pedal Race, we have shown how haptic feedback can be used to help balance games for people of disparate physical abilities, to help guide players to effective and healthy levels of exercise, and to enhance presence in exergames. In the next section, we discuss the results of our experimental investigation of the effectiveness of these techniques. These results illustrate advantages and pitfalls in the use of haptics in exergames.
4 Evaluation
We performed a user study to determine the effectiveness of haptic feedback in our Truck Pull, Balloon Burst, and Pedal Race games. The study consisted of three experiments designed to explore how well these games balance group exercise, guide players to safe and healthy interaction, and increase virtual presence. Our study design is described below.
4.1 Participants
Twelve pairs of participants, 24 people in total, took part in the study. All participants were recruited from the university community. There were fifteen males and nine females, ranging in age from 18 to 42, with a mean age of 23. The majority of the participants reported playing video games at least a few times a month. Three people stated they exercised rarely or never, five stated they exercised once or twice per week, and 16 reported exercising at least three times a week. Most participating pairs had known each other for at least 2 months; however, 4 people did not know their partner prior to the study. Participants were chosen based on their ability to play video games using a gamepad and to operate a recumbent exercise bicycle. We used the Physical Activity Readiness Questionnaire (PAR-Q) to screen participants whose health made it potentially inadvisable to perform exercise [26].
4.2 Equipment
The three exergames were designed as distributed two-player games. The games were developed in C# using XNA 3.1. To interface with the games, players used a Tunturi E6R exercise bike attached to a Windows PC via a serial connection. In Balloon Burst, additional input was captured using a wireless Xbox 360 controller. The games were projected onto a large screen measuring 6' x 8' (see figure 4). Game music and sound effects were transmitted through 5.1 audio speakers. Two sets of equipment were used, allowing participants to be located in different rooms.
4.3 Method
Before completing the study trials, each participant was asked to complete a series of background questionnaires. Each pair of participants was then split into the two rooms containing the equipment described above. Participants played both a haptic and a non-haptic condition for each of the three exergames, for a total of six trials. The order of the exergames and the corresponding conditions was randomized. Prior to each trial, an experimenter described the game and condition to be completed. The game computer logged relevant game data such as pedal cadence and game score. The study was designed to investigate the effectiveness of haptic feedback in exergames, not the long-term exercise efficacy of the games. Therefore, each trial was short, lasting between 1 and 5 minutes. Participants were given time between each trial to cool down and return to a resting state. Following the completion of all six trials, the two participants were brought back into one room for a post-experiment interview and debriefing.
Fig. 4. Equipment setup. Left: player pedaling on the bike; Right: player's view of the game
5 Experiment 1: Balancing Group Exercise
Although grouping can be a motivating factor for physical activity, it can be difficult for people of different capabilities to exercise together. We used Truck Pull to test our
hypothesis that haptic feedback can help balance competition in exergames, and therefore enhance players' enjoyment. Kinesthetic feedback is a natural fit for balancing exergames specifically, since haptic force can exert more strain on peoples' muscles. The control version of Truck Pull was identical to the original, but with the haptic feedback removed (i.e., pedal tension was constant).
5.1 Experimental Method
Half of the participants completed the haptic trial first, while the other half played the control version first. Each game of Truck Pull lasted for one minute. After completing both versions of the exergame, participants were asked how they perceived the balance of competition, and to state their preference between the versions. The game was instrumented to record the position of the middle point between the players' trucks at one-second intervals. These data allow game balance to be measured by the average distance (in pixels) of the center of the chain from the start line. A small average distance means that the game was well-balanced, with neither player maintaining a large advantage over the other.
5.2 Results
For each pair of participants, we compared average distances to the start line for each condition using a paired-samples t-test. Average distances to the start line were significantly lower in the haptic case (M=39.98) than the control case (M=66.63) at the alpha=0.05 level: t(11)=2.41, p=0.035, d=0.98. The reported Cohen's d value is a measure of effect size [8]; the value of 0.98 indicates a large effect. Participants were split between conditions when asked which version of Truck Pull gave them the best chance of winning (10/24 haptic, 9/24 control, and 5/24 no difference). However, a majority of participants felt the haptic version allowed for more equal competition (15/24 haptic, 7/24 control, and 2/24 no difference), and it was the preferred case overall (15/24 haptic, 5/24 control, and 4/24 no difference).
5.3 Analysis
In order to better support group exercise, exergames need to balance competition between players of disparate abilities. By adjusting the physical demands of the game based on a player's in-game performance, exergames can allow for more balanced group exercise. Balance in the Truck Pull exergame is measured by the average distance of the midpoint between the trucks from the starting line. Our results indicate that competition was significantly more balanced in the haptic version than in the control version. Additionally, a strong majority of participants stated the haptic version allowed for more equal competition than the control game. In Truck Pull, ultimately the stronger player is more likely to win. However, the haptics lead to closer competition, reducing the likelihood of "blow-out" victories, and leading to greater enjoyment. When asked which version of Truck Pull they preferred, a strong majority of participants preferred the haptic case. In addition to the balancing effect, people found that haptic feedback added richness to the game. For instance, participants stated that the haptic version "requires more skill," "seemed to make the game more strategic," and "made the outcome more interesting." Two participants suggested that the haptic feedback added more realism to the game. One person said "it seemed more realistic for tension to change" and that the change in pedal tension "more simulated real life."
The small minority of participants who favoured the control version of Truck Pull cited the lower physical effort required. Comments included that “pulling the opponent to the end was much easier” and “made it more worthwhile to pedal hard and take the lead because in the [haptic version] the lead is easily lost.” This experiment validates that haptic feedback is a useful tool for balancing exergames for people of disparate physical abilities.
6 Experiment 2: Guiding Players to Safe and Healthy Interaction We used the Balloon Burst exergame to investigate the hypothesis that haptic cues can be effective at guiding players to safe and healthy levels of exercise. A control version of Balloon Burst was created in which the haptic representation of pedal cadence was removed, and cadence was instead displayed as a number on the screen. The games were instrumented to record cadence information at one-second intervals. 6.1 Experimental Method Half of the participants played the haptic version first, and the other half started with the control version. Each game of Balloon Burst lasted for two minutes. An engagement questionnaire [22] was used to measure the effects of the intensity cues on players’ gameplay experience. (Questions pertaining to attention/flow and endurability were asked after each trial; questions on esthetics, novelty, and usability were completed after both versions of the game had been played.) After playing both versions, players were asked their preference between the versions, and which condition they believed allowed them to better maintain constant pedal cadence. 6.2 Results To compare the effectiveness of the two forms of intensity cues at guiding players to desired exercise intensity, we compared the mean cadence for each condition using a paired-samples t-test. The mean cadence was slightly higher in the control case (M=66.73) than in the haptic case (M=66.06), but this difference was not significant at the alpha=0.05 level: t(23)=0.41, p=0.69, d=0.06. Post hoc analysis reveals that power was low (0.40). Game scores for each version of Balloon Burst were compared. The mean scores were higher in the control case (M=2,178.1) than the haptic version (M=1,983.3). However, this difference was not significant at the alpha=0.05 level: t(23)=1.650, p=0.113, d=0.4. Post hoc analysis reveals that power was low (0.264). Questionnaire results show that players preferred the control version of Balloon Burst (12/24 control, 8/24 haptic, and 4 no difference) and perceived that it allowed them to more easily maintain constant pedal cadence than the haptic game (18/24 control, 3/24 haptic, and 3/24 no difference). Scores from the engagement questionnaire were significantly higher for the control case (M=193.50) than the haptic version (M=184.20) at the alpha=0.05 level, t(23)=4.76, p<0.001, d=0.33. 6.3 Analysis Questionnaire results show that players preferred the visual display of exercise intensity, felt it provided a more engaging experience, and allowed people to better
maintain a constant pedal cadence than the haptic version. However, data logs from the game sessions demonstrate that there was no significant difference in pedal cadence or scores between the versions of Balloon Burst. Power was low for both the cadence and score comparisons, indicating the possibility of Type II errors, but the standardized differences between the means were small enough that even had significance been obtained, the difference would be inconsequential to gameplay. Player comments suggest that the visual display of exercise intensity was preferred due to the precision it offered. Participants stated that "seeing the numbers lets you know how close you are to 80RPM," and "with RPM displayed on screen, I can see if I have reached maximum RPM, therefore knowing my progress." Alternatively, some players found the haptic cues distracting because they had difficulty interpreting their meaning. Comments included that "the [haptic feedback] mapped to an otherwise unknown threshold. This was distracting and redundant rather than informative." and the visual cues were "less distracting from the [haptic feedback]… [which] allowed me to focus better." In cases where participants preferred the haptic feedback to the visual feedback, people felt it allowed them to focus more on the game task. For instance, comments included "it was easier to focus on hitting the balloons [with the haptic feedback]," and "balloons were easier to keep track of [with haptic feedback]." This suggests a possible advantage of using the haptic channel to convey intensity information, freeing the visual channel for playing the game. The central lesson from these findings is that people found it difficult to interpret the haptic cues, and therefore preferred the precise visual representation. Interestingly, the pedal cadence and score between the conditions were not significantly different, suggesting that the players underrated their ability to interpret the haptic feedback. We believe that two issues contributed to players' dislike of the haptic interface. First, the haptic device itself was imprecise, making it difficult to distinguish between different levels of haptic intensity. Second, the linkage between cadence and feedback (vibration in the hands) was not obvious, leading to a disconnection between the virtual world and physical feedback.
7 Experiment 3: Increasing Presence

We used the Pedal Race exergame to test our hypothesis that haptic feedback can enhance people’s sense of presence in a virtual game world. Presence is defined as “the subjective experience of being in one place or environment, even when one is physically situated in another” [32]. Specifically, we hypothesize that playing Pedal Race leads to higher presence than playing the non-haptic version of the game. In the non-haptic (control) version, players did not receive haptic feedback when changing terrain, but speed did change (doubled over ice, halved over mud).

7.1 Experimental Method

Half of the participants played the haptic version of Pedal Race first, while the other half played the control version first. In each trial, players completed a total of three in-game laps, regardless of whether they won or lost. Overall, the races lasted an average of 89 seconds.
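As a rough sketch of the terrain mechanic compared in this experiment (not the game's actual code): both versions change the avatar's speed with the terrain, while only the haptic version also drives the bike's pedal tension. The speed multipliers follow the text (doubled over ice, halved over mud); the tension values and their 0-1 scale are illustrative assumptions.

# Sketch of the terrain mechanic in the two versions of Pedal Race.
SPEED_MULTIPLIER = {"road": 1.0, "ice": 2.0, "mud": 0.5}
PEDAL_TENSION = {"road": 0.5, "ice": 0.0, "mud": 1.0}   # assumed 0-1 scale

def apply_terrain(terrain, base_speed, haptic_version):
    """Return (avatar speed, bike tension command) for the current terrain."""
    speed = base_speed * SPEED_MULTIPLIER[terrain]
    tension = PEDAL_TENSION[terrain] if haptic_version else PEDAL_TENSION["road"]
    return speed, tension

# e.g. crossing ice in the haptic version: speed doubles and the pedals go slack
print(apply_terrain("ice", base_speed=5.0, haptic_version=True))   # (10.0, 0.0)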
An abbreviated version of the Presence Questionnaire (PQ) was used to measure people’s sense of virtual presence [32]. This version of the PQ included the questions most relevant to the study. Questions from the subscales of involvement and adaptation/immersion were selected for their focus on interaction and movement within a virtual environment. After completing each trial, participants completed the abbreviated PQ. Once both the haptic and control cases had been performed, participants were asked to compare the sense of realism between the versions, and to express which version they preferred.

7.2 Results

To determine whether haptic feedback in exergames offers higher virtual presence than the non-haptic control case, we compared mean PQ responses for each condition using a paired-samples t-test. Mean scores were significantly higher in the haptic case (M=5.02) than the control version (M=4.53) at the alpha=0.05 level: t(23)=3.25, p=0.004, d=0.44. When asked which version of Pedal Race felt more realistic, the majority of participants chose the haptic case: 21/24 chose haptic, 2/24 chose control, and 1/24 chose “no difference”. The haptic version of Pedal Race was also preferred by most users (17/24 haptic, 4/24 control, and 3/24 no difference).

7.3 Analysis

The goal of exergames is to provide a captivating experience in order to make exercise more palatable. It is therefore desirable to increase players’ presence in the game’s virtual world. We would expect that adding haptic feedback in response to players’ actions increases this sense of presence. Results from the PQ suggest that players’ sense of presence was indeed higher in the haptic version of Pedal Race. This indicates that game designers should seriously consider haptics as a way of increasing players’ involvement in the game. Participants overwhelmingly responded that the haptic feedback made the game feel more realistic, and they preferred the haptic version of the game. These results indicate that it is possible to increase presence and realism in exergames by providing haptic feedback in response to in-game activity.

Post-experiment interviews indicated that most people preferred the haptic version of Pedal Race because of the improved realism it offered. Participants stated that the haptic game felt “how I imagine it would feel as you pedal through mud and ice” and that the “physics [of the exercise bike] made it feel more lifelike.” In the few cases where players preferred the control version of Pedal Race, the tension-based feedback of the haptic game was seen as more physically challenging. One participant commented that the control case “put less strain physically than the version with the change in pedal tension.” Another participant stated that “the change in tension made it more tiring” and “the [tension over the ice terrain] felt really weird because it was zero tension.” Although a small minority did not like the increased effort required by the kinesthetic feedback, on balance it had a significantly positive impact on virtual presence. While this experiment involved a single game, it seems reasonable to assume that similar results would be found in other exergames where kinesthetic feedback can be used to enhance the primarily visual feedback presented by the game.
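The analyses in Sections 6.2 and 7.2 report paired-samples t-tests with Cohen's d and post hoc power for 24 paired observations. As a hedged illustration of that pipeline (not the authors' actual analysis scripts), the sketch below computes the same quantities with SciPy and statsmodels; it assumes the common convention of dividing the mean paired difference by the standard deviation of the differences for d, which the paper does not specify.

import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.power import TTestPower

def paired_comparison(control, haptic, alpha=0.05):
    """Paired t-test, Cohen's d, and post hoc power for per-player means."""
    control = np.asarray(control, dtype=float)
    haptic = np.asarray(haptic, dtype=float)
    t, p = ttest_rel(control, haptic)
    diff = control - haptic
    d = diff.mean() / diff.std(ddof=1)   # d from the SD of the paired differences
    power = TTestPower().power(effect_size=abs(d), nobs=len(diff), alpha=alpha)
    return {"t": t, "p": p, "d": d, "power": power, "df": len(diff) - 1}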
8 Discussion

Each of our three exergames addresses an important problem in the design of exergames: balancing group exercise to allow people of different physical abilities to play together, guiding players to safe and healthy interaction, and providing an immersive experience to increase enjoyment of exercise. This indicates the diverse ways in which haptics can be applied to exergames. Because of the inherently physical nature of exergames, haptics are a natural fit, engaging a third sense in addition to traditional audio and visual feedback. These advantages make haptic feedback a valuable tool for exergame designers.

Players preferred the haptic version over the control version in two of our three games (Truck Pull and Pedal Race). This suggests that the use of haptics increased the enjoyment of the games. Both use haptic feedback to report a physical phenomenon occurring in the game: in Truck Pull, pedal resistance is mapped to difficulty in pulling; in Pedal Race, modulating pedal tension allows players to feel variations in the virtual terrain. Questionnaire results from Pedal Race show that haptics improved people’s sense of virtual presence. Similarly, players’ comments indicate that people experienced greater presence in the haptic version of Truck Pull.

In contrast, the haptic feedback in Balloon Burst is less obviously associated with a physical phenomenon – players cannot apply their experience in the real world to explain how cadence should map to vibration of a hand-held controller. Although players were able to maintain appropriate pedal cadence, engagement scores and player feedback suggest that haptics did not enhance presence as intended. This hints that in order for haptics to improve virtual presence, the feedback should obviously correspond to what a player sees and hears in the game.

It is interesting to note that while haptic feedback was overwhelmingly preferred in Truck Pull and Pedal Race, this preference was not unanimous. These games provided kinesthetic feedback by adjusting pedal tension on the exercise bike, in some cases increasing the effort required from the player. Some participants highlighted this increase in physical difficulty as a negative, and reported a preference for the non-haptic versions of the games. This shows that resistance-based feedback should not be so drastic as to frustrate players with the amount of physical effort required to win.

8.1 Limitations

Haptic feedback in our three exergames was provided through an exercise bike or a hand-held gamepad. However, haptics may be difficult to implement in games where there is no physical contact with hardware. For instance, camera-based systems, such as the PlayStation Eye and Microsoft Kinect, do not currently offer a way of providing haptic feedback.

Truck Pull and Pedal Race used kinesthetic feedback (sensation of force and weight), and the Balloon Burst game utilized tactile feedback (sense of touch). The two types of haptic feedback may not be interchangeable. In Pedal Race, players receive feedback on the terrain they are crossing in the virtual environment through kinesthetic haptics. It would be difficult to represent more tactile sensations, such as a “bumpy” terrain, using resistance on the exercise bike. Similarly, in Truck Pull it is possible to change the physical effort required from players using pedal resistance.
Achieving the same effect with tactile haptic feedback would be challenging. Alternatively, the Balloon Burst game allowed us to explore the application of tactile feedback to exergames. Several commercial active gaming devices do not support kinesthetic feedback, but have the ability to include tactile feedback. For example, the Wii Remote and the PlayStation Move accessory support rumble feedback.

8.2 Design Issues

The main lesson from this work is that designers should consider haptic feedback in exergames. Our exergames demonstrate that integrating haptics is feasible and can enhance interaction. The improved immersion seen in our third experiment, and the general preference for two of our three haptic games, provides support for increased use of haptic feedback in general. Our experience results in a set of issues for designers to consider:

Haptics can serve diverse purposes in exergames. Haptic feedback can help to address several critical aspects in the design of exergames. Specifically, we demonstrated that haptics can improve balance in multiplayer exergames, help guide people to healthy and safe levels of interaction, and increase virtual presence. Designers should consider these, and possibly other, applications of haptic feedback in order to improve the richness of interactions in exergames.

Haptic feedback should have a clear link to physical situations in the game environment. Our results from experiments with Truck Pull and Pedal Race show that haptics need to be consistent with physical actions occurring in the on-screen world. Designers should attempt to make logical mappings from observable events to haptic feedback in order to increase immersion. Our Balloon Burst game shows the pitfall of using overly abstract mappings that players find hard to interpret.

Strategies for integrating haptics are limited by the capabilities of current hardware. Designers of exergames need to recognize the limitations in haptic feedback provided by existing gaming platforms. The Wii provides only tactile feedback through vibration in the Wii Remote controller. The otherwise excellent PCGamerBike Mini bicycle peripheral provides no programmatic control over its tension setting. Microsoft’s Kinect platform is vision-based, and therefore can provide no haptic feedback at all. We encourage developers of active input peripherals to consider the importance of haptics in exergames. Even when haptic feedback is possible, it may not provide sufficient resolution for all purposes. As we saw, the Xbox gamepad has limited vibration settings. In Balloon Burst, we simulated intensity by pulsing the vibrations. As our experiment showed, players reported difficulty in distinguishing between these pulse speeds. Designers should therefore perform user testing with haptic strategies in order to ensure that they work with all intended hardware platforms.

8.3 Future Work

All three of the exergames used in this work were based on exercise bikes. As the variety of exergaming peripherals increases, it will be important to support haptic feedback across different hardware. For instance, kinesthetic feedback could be achieved by changing the incline on a treadmill, or adjusting the resistance on a
rowing machine. This will allow for better portability of haptic exergames across different platforms.

Allowing for customization of haptic feedback in exergames could improve its effectiveness. Physical strength varies from person to person, and therefore kinesthetic feedback may need to be tailored to each player. In Pedal Race, some people found the changes in tension not substantial enough, while others found them too drastic. Additionally, other forms of haptics could be investigated, such as feedback through temperature or air movement.

We were interested in investigating the effectiveness of haptic feedback in exergames, not the long-term health benefits of the games. In order to test our three hypotheses without exhausting the participants, each trial session was short. It would be interesting to run further experiments measuring players’ views of haptic feedback over long-term play.

We have not yet explored the combination of our three applications of haptic feedback. In future, it will be important to examine whether haptics can simultaneously provide effective multiplayer balancing, guide people to safe and healthy interaction in exergames, and increase virtual presence.
9 Conclusion

Exergames currently lack the physicality of real-world sports and exercise. We propose that haptics – feedback related to people’s sense of force and touch – can provide richer exergaming experiences. In our study, we found that haptic feedback can be applied to three areas relevant to exergames: supporting group exercise among players of different fitness levels, guiding people to safe and healthy interaction, and increasing players’ sense of presence in game environments. Our experiments showed both advantages and pitfalls in applying haptics, leading to a set of principles to assist designers in applying haptic feedback to exergames.

Acknowledgements. We gratefully acknowledge Eril Berkok for his assistance in running the experiments. This work was supported by the Natural Sciences and Engineering Research Council of Canada, and the GRAND Network of Centres of Excellence.
References 1. Ahn, M., Kwon, S., Park, B., Cho, K., Choe, S.P., Hwang, I., Jang, H., Park, J., Rhee, Y., Song, J.: Running or Gaming. In: Proc. ACE, pp. 345–348 (2009) 2. Allender, S., Cowburn, G., Foster, C.: Understanding Participation in Sport and Physical Activity Among Children and Adults: a Review of Qualitative Studies. Health Education Research 21(6), 826–835 (2006) 3. Basdogan, C., Ho, C., Srinivasan, M.A., Slater, M.: An Experimental Study on the Role of Touch in Shared Virtual Environments. TOCHI 7(4), 443–460 (2000) 4. Beauchamp, M.R., Carron, A.V., McCutcheon, S., Harper, O.: Older Adults’ Preferences for Exercising Alone Versus in Groups: Considering Contextual Congruence. Annals of Behavioral Medicine 33, 200–206 (2007)
5. Brisswalter, J., Hausswirth, C., Smith, D., Vercruyssen, F., Vallier, J.M.: Energy Optimal Cadence vs. Freely-Chosen Cadence During Cycling: Effect of Exercise Duration. Int. J. Sports Medicine 22(1), 60–64 (2000) 6. Buttussi, F., Chittaro, L., Ranon, R., Verona, A.: Adaptation of graphics and gameplay in fitness games by exploiting motion and physiological sensors. In: Butz, A., Fisher, B., Krüger, A., Olivier, P., Owada, S. (eds.) SG 2007. LNCS, vol. 4569, pp. 85–96. Springer, Heidelberg (2007) 7. Chang, D.: Haptics: Gaming’s New Sensation. Computer 35(8), 84–86 (2002) 8. Cohen, J.: Statistical Power Analysis for Behavioral Sciences, 2nd edn. Lawrence Erlbaum Associates, Mahwah (1988) 9. Faust, M., Yoo, Y.: Haptic Feedback in Pervasive Games. In: Proc. PerGames (2006) 10. Ferber, A., Peshkin, M., Colgate, J.E.: Using Kinesthetic and Tactile Cues to Maintain Exercise Intensity. IEEE Transactions on Haptics, 224–235 (2009) 11. Foster, C., Porcari, J.P., Battista, A., Udermann, B., Wright, G., Lucia, A.: The Risk in Exercise Training. American Journal of Lifestyle Medicine 2(4), 279–284 (2008) 12. Graves, L., Stratton, G., Ridgers, N.D., Cable, N.T.: Comparison of Energy Expenditure in Adolescents when Playing New Generation and Sedentary Computer Games: Cross Sectional Study. British Medical Journal 335, 1282–1284 (2007) 13. Hohepa, M., Schofield, G., Kolt, G.S.: Physical Activity: What do High School Students Think? Journal of Adolescent Health 39, 328–336 (2006) 14. Hunicke, R.: The Case for Dynamic Difficulty Adjustment in Games. In: Proc. ACE, pp. 429–433 (2005) 15. Ijsselsteijn, W., de Kort, Y., Westernik, J., de Jager, M., Bonants, R.: Fun and Sports: Enhancing the Home Fitness Experience. In: Rauterberg, M. (ed.) ICEC 2004. LNCS, vol. 3166, pp. 73–81. Springer, Heidelberg (2004) 16. Khoo, E.T., Lee, S.P., Cheok, A.D., Kodagoda, S., Zhou, Y., Toh, G.S.: Age Invaders: Social and Physical Inter-Generational Family Entertainment. In: Proc. CHI, pp. 243–246 (2006) 17. Mokka, S., Väätänen, A., Heinilä, J., Välkkynen, P.: Fitness Computer Game with a Bodily User Interface. In: Proc. ICEC, pp. 1–3 (2003) 18. Morelli, T., Foley, J., Columna, L., Lieberman, L., Folmer, E.: VI-Tennis: A Vibrotactile/Audio Exergame for Players who are Visually Impaired. In: Proc. FDG, pp. 147–154 (2010) 19. Mueller, F., Vetere, F., Gibbs, M.R., Agamanolis, S., Sheridan, J.: Jogging Over a Distance: The Influence of Design in Parallel Exertion Games. In: Proc. Siggraph (2010) 20. Mueller, F., Agamanolis, S., Gibbs, M., Vetere, F.: Remote Impact: Shadowboxing over a Distance. In: Proc. CHI, pp. 3531–3532 (2009) 21. Mueller, F., Stevens, G., Thorogood, A., O’Brien, S., Volker, W.: Sports Over a Distance. Personal and Ubiquitous Computing 11(8), 633–645 (2007) 22. O’Brien, H.L., Toms, E.G., Kelloway, E.K., Kelley, E.: Developing and Evaluating a Reliable Measure of User Engagement. In: Proc. American Society for Information Science and Technology, pp. 1–10 (2008) 23. Pollock, M., Gaesser, G., Butcher, J., Despres, J., Dishman, R., Franklin, B., Garber, C.: The Recommended Quantity and Quality of Exercise for Developing and Maintaining Cardiorespiratory and Muscular Fitness, and Flexibility in Healthy Adults. Med. & Sci. in Sports and Exercise 30(6), 975–991 (1998) 24. Ramsamy, P., Haffegee, A., Jamieson, R., Alexandrov, V.: Using Haptics to Improve Immersion in Virtual Environments. In: Proc. ICCS, pp. 603–609 (2006)
25. Sallnäs, E., Rassmus-Gröhn, K., Sjöström, C.: Supporting Presence in Collaborative Environments by Haptic Force Feedback. Trans. Comput. -Hum. Interact. 7(4), 461–476 (2000) 26. Shephard, R.J.: PAR-Q Canadian Home Fitness Test and Exercise Alternatives. Sports Medicine 5, 185–195 (1988) 27. Sinclair, J., Hingston, P., Masek, M., Nosaka, K.: Using a Virtual Body to Aid in Exergaming System Development. Comput. Graph. Appl. 29(2), 39–48 (2009) 28. Stach, T., Graham, T.C.N., Yim, J., Rhodes, R.E.: Heart Rate Control of Exercise Video Games. In: Proc. GI, pp. 125–132 (2009) 29. Suzuki, K., Jansson, H.: An Analysis of Driver’s Steering Behaviour During Auditory or Haptic Warnings for the Designing of Lane Departure Warning System. JSAE Review 24(1), 65–70 (2003) 30. Swain, D.P., Franklin, B.A.: Comparison of Cardioprotective Benefits of Vigorous Versus Moderate Intensity Aerobic Exercise. The American Journal of Cardiology 97(1), 141– 147 (2006) 31. Warburton, D.E.R., Bredin, S.S.D., Horita, L.T.L., Zbogar, D., Scott, J.M., Esch, B.T.A., Rhodes, R.E.: The Health Benefits of Interactive Video Game Exercise. Applied Physiology, Nutrition, and Metabolism 32(3), 655–663 (2007) 32. Witmer, B.G., Singer, M.J.: Measuring Presence in Virtual Environments: A Presence Questionnaire. Presence: Teleoperators and Virtual Environments 7(3), 225–240 (1998) 33. Yim, J., Graham, T.C.N.: Using Games to Increase Exercise Motivation. In: Proc. Future Play, pp. 166–173 (2007)
Identifying Barriers to Effective User Interaction with Rehabilitation Tools in the Home Stephen Uzor, Lynne Baillie, Dawn Skelton, and Fiona Fairlie Multimodal Interaction Research Group, School of Engineering and Computing, Glasgow Caledonian University. 70, Cowcaddens Road, Glasgow, UK {Stephen.Uzor,L.Baillie,Dawn.Skelton,F.Fairlie}@gcu.ac.uk Abstract. This paper presents the results from a user workshop that was undertaken to investigate the relationship between the nature of current home rehabilitation tools and the motivation to exercise. We also present a method of visual feedback which we hope will be an effective tool for informing users regarding important clinical measures associated with their recovery. Older adults over the age of 60 were involved in the study. The findings from the user workshop suggest that the relatively passive nature of current rehabilitation materials is less than ideal for sustaining motivation to exercise. Furthermore, our results suggest that visual feedback and more interactive methods can play an important role in engaging users in home rehabilitation. Keywords: falls prevention; user interaction; rehabilitation; visual feedback; user workshop.
1 Introduction

Rehabilitation is often used to improve a person’s ability to cope with physical activities in everyday life. In musculoskeletal disorders associated with ageing, rehabilitation usually involves exercises which are designed to increase functional capacity, elevate confidence, and encourage independence in older adults [17]. In rehabilitation for falls (and similar conditions affecting strength and balance performance), patients are often given exercises to perform in the home. However, evidence suggests that there is a problem with uptake of and adherence to these exercises in the home setting [15]. This could be attributed to a lack of motivation on the user’s part [12]. To inform the design of more useful home rehabilitation tools, it is necessary to understand the limitations of the current tools and explore ways in which technology can be used to encourage users to engage in rehabilitation. This paper describes the proceedings of a user workshop – facilitated by HCI and health researchers – to investigate the effects of user interaction with current home rehabilitation materials on motivation to exercise.
[16]. This system utilized a depth-sensing camera that captured the user’s image and placed it in a virtual environment where the users could interact with virtual objects. However, this study was limited to the laboratory setting as the equipment required was expensive and bulky. The study by [18] revealed how video-conferencing over an internet connection enabled users at home to attend a virtual exercise class with other users during pulmonary rehabilitation. The premise behind this study was that, because many patients lived far away from rehabilitation centers, patients (and in some cases, physiotherapists) were required to make frequent long journeys to the classes (or patients’ homes). The technology used in the study enabled active involvement in rehabilitation by the participants, who remained motivated.

Previous research has also investigated the use of visual feedback of clinical data to aid rehabilitation in various settings. The findings from these studies suggest that not only can visual feedback of progress be used during rehabilitation to encourage users to exercise, but it can also inform users about important clinical measures associated with their recovery. MacDonald et al. [11] investigated the use of visual feedback of biomechanical data to enhance older adults’ understanding of mobility problems that they faced in everyday life. Their visualizations included a ‘stick figure’ with joints that were illuminated – using green, amber and red colors – depending on the amount of functional demand placed on the users’ joints during certain tasks that they performed in everyday life. Their findings suggest that this visual feedback enabled clinicians to communicate effectively with the users about their therapy. In this way, the users felt more involved in their rehabilitation.
3 Methods In order to investigate how the current rehabilitation tools may affect the users’ motivation to exercise in the home, we held a user workshop with older adults. The aim of the workshop was to obtain a wide range of opinions on current rehabilitation tools used in the home, which could highlight possible factors affecting motivation and adherence to the programmes. The tools investigated in this study included illustrations and videos containing exercises for the rehabilitation of muscle strength and balance in older adults.
Fig. 1. An illustration of a strength training exercise used in falls rehabilitation in the home
One exercise – the “sit to stand” (Fig. 1) exercise – was chosen as an example. This exercise was available to the participants in both paper and video forms. Participants were recruited through posters and flyers which were placed at different locations around Glasgow Caledonian University. We recruited 9 healthy older adults (8 females and 1 male) between 62 and 75 years of age. The sample size was kept small in order to allow all the participants the opportunity to contribute to each of the phases of the workshop in the two-hour session. All the participants were educated to at least high-school level.

3.3 User Workshop

The workshop was divided into a number of phases which employed a range of qualitative techniques identified as effective – based on previous research [2][4] – in obtaining informative comments from participants. There were three groups of participants with a facilitator in each group. Notes were taken during the different phases by experts in the field of user interaction within the research team. The phases are detailed in Table 1.

Table 1. User Workshop Phases

Phase: Discussion
Rationale: To acquire a range of opinions regarding the problems with current rehabilitation tools
Procedure: Participants were shown examples of illustrations and videos from current rehabilitation tools. They were asked to discuss the issues that users may face while using them in the home
Duration: 10 minutes

Phase: Scenarios
Rationale: To provide participants with everyday usage scenarios that they can comment on for their applicability to a user
Procedure: User journey through rehabilitation using two personas. Participants were asked to discuss the issues that the personas may face in the relevant scenarios
Duration: 15 minutes

Phase: Demonstration of Visualizations
Rationale: To obtain feedback on the usefulness of visualizations to highlight progress over time
Procedure: Through the use of Scenario 3, the participants were asked to discuss the use of visual feedback, showing the results of a walking test at different stages of rehabilitation
Duration: 10 minutes
3.3.1 Discussions on Past Experiences with Rehabilitation

In this phase of the workshop, we investigated participants’ previous experiences with rehabilitation tools similar to the exercise illustrations and videos. It was important to do this, as previous experience with such tools may provide us with some clues regarding how their use affects motivation to exercise.
3.3.2 Scenarios

Scenarios are useful in conceptualizing and managing the performance of tasks in certain environments which could improve the usefulness and usability of technology [13]. In this phase of the workshop, the users were provided with 2 personas designed to encapsulate the characteristics of typical users of existing rehabilitation tools. The personas, along with the key characteristics of each one, are shown in Table 2 below.

Table 2. Personas

Persona 1 – Jack Bishop: 68 year old retired school coach; suffered stroke at 65; lacks confidence
Persona 2 – Agnes Newman: 82 year old retired office worker; suffered a minor hip fracture; has fear of falling
Each of the personas tackled the use of both of these types of media (illustrations and videos) separately. The participants were asked to discuss the issues and challenges facing each individual persona while they performed exercises in the home according to certain set scenarios. Three scenarios were used in the user workshop:

• Scenario 1 – Home rehabilitation session with illustrations of exercises.
• Scenario 2 – Home rehabilitation session with exercise videos.
• Scenario 3 – The use of visual feedback to show improvements in mobility during rehabilitation.
3.3.3 Feedback on Visualizations In this phase of the workshop, the participants were shown a simple animation (created by Glasgow School of Art [10]) of two stick figures walking side by side. One of the stick figures walked with an abnormal gait while the other walked normally. Both figures left a trail of footsteps behind that highlighted the ‘Stride Lengths’ (an important clinical measure associated with falls risk [6], [9]) of both stick figures. Through Scenario 3, the participants observed how this visual feedback could be shown to Agnes (persona 2) during occasional visits to her home by a physiotherapist. The participants were asked to discuss the usefulness of these visuals in showing the persona’s improvements in gait over time.
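As a rough illustration of the clinical measure behind the footstep trail in the animation (not the Glasgow School of Art implementation [10]), stride length can be computed from successive placements of the same foot; the coordinates below are made up purely for demonstration.

def stride_lengths(footsteps):
    """Distances between successive placements of the same foot (metres)."""
    return [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(footsteps, footsteps[1:])]

# Hypothetical foot positions for the two stick figures in the animation.
normal_gait = [(0.0, 0.1), (1.3, 0.1), (2.6, 0.1), (3.9, 0.1)]
reduced_gait = [(0.0, -0.1), (0.7, -0.1), (1.5, -0.1), (2.2, -0.1)]

print(stride_lengths(normal_gait))    # ~1.3 m strides
print(stride_lengths(reduced_gait))   # shorter strides, associated with higher falls risk [6], [9]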
4 Findings The numerous comments made by the participants in the various phases of the workshop were analyzed and arranged into themes. The following sections highlight some of the main issues shared by the participants across all the groups. 4.1 Discussion Phase None of the participants had previously used such rehabilitation tools. Approximately half of the participants attended exercise classes in a gym. These participants drew
comparisons between the rehabilitation exercises in the home and the exercise classes in the gym. The main issues shared by the participants in this section were related to the passive nature of the home rehabilitation materials. There were comments that music could engage the users better in the video exercises.

4.2 Scenario Phase

One of the main issues raised by the participants in this phase was the duration of exercise when using both the paper and video tools. The participants were aware of the fact that rehabilitation exercises needed to be performed for a minimum of 3 hours a week in order to be effective [19]. However, they suggested that the exercises be split into shorter sessions as the personas may quickly lose interest while using such passive rehabilitation tools. Table 3 highlights the main issues – facing the personas in the various scenarios – identified by the participants.

Table 3. Scenarios (with main issues identified by participants)

Persona: Jack
Scenario: Rehabilitation session in the home using exercise illustrations for 45 minutes
Participants’ comments:
1. The exercises should be broken down into smaller chunks as 45 minutes is too long to keep motivated.
2. The exercise video would be more interactive than the booklet and would engage him better.
3. As a retired coach, having to perform exercises in such a passive manner may reduce his confidence even further.

Persona: Agnes
Scenario: Rehabilitation session in the home using exercise videos for 45 minutes
Participants’ comments:
1. The pace of the video exercises never changes. Exercises with more repetitions will be helpful in later stages of rehab.
2. Progress cannot be tracked using video exercises. Agnes lives alone and needs to be constantly motivated.
3. Something to interact with would be useful to Agnes as the video exercises seem passive and dull.
The main themes that emerged from the discussions in this phase of the workshop were related to motivation, interactivity and charting progress during rehabilitation. All the participants agreed that the personas required a more interactive solution to home rehabilitation in order to keep them motivated. 4.3 Visual Feedback of Progress Prior to our explaining the concept behind the visual feedback method described in 3.3.3, the participants were asked whether they understood the purpose of the animation.
A majority of them said that they thought it showed a comparison between one person with abnormal gait and another person walking normally. Initially it was not clear to the participants that both stick figures represented the same individual and that the animation highlighted progress during rehabilitation. Interestingly, one of the participants mentioned that the video highlighted stride lengths (drawing the attention of the other participants in their group to the footprints left behind by the stick figures). Others identified specific features of abnormal gait such as poor balance, apparent muscle damage and knee problems. After the purpose of the animation had been explained, the participants agreed that Agnes’ (persona 2) ability to see improved gait over time – during rehabilitation – would improve her confidence and motivate her to exercise. They also mentioned that visuals such as this one would be useful to them if they were in rehabilitation.
5 Conclusion

Exercise illustrations and videos currently used in home rehabilitation contain exercises which have been shown in previous research to significantly reduce the risk of falling in older adults. It is necessary for users to perform these exercises regularly in order to obtain the maximum benefit that they can offer [19]. The findings from the workshop provided us with valuable information as to why users may not use such materials in the home setting. The three key issues that the participants in our workshop felt we should take into consideration when designing tools for home rehabilitation are: progress, interactivity and motivation. If the tools we design do not address these three factors, then we may encounter the same problem as with the current tools: they will not be used. Our findings also suggest that visual feedback can play an important role in home rehabilitation. We discovered that it is not only important that older adults in rehabilitation are constantly motivated to exercise, but also that their ability to receive feedback on progress during rehabilitation can improve adherence to home exercise.
6 Future Work

One of the limitations of this workshop was that we asked relatively healthy older adults to provide feedback on the use of rehabilitation tools which they had not used previously. We believe that there may be more issues, not identified by the scenarios and personas used in this study, that we should take into consideration when designing rehabilitation tools for older adults. In order to identify these issues, we intend to conduct a similar user workshop involving older adults with a history of falls who have previously used exercise illustrations and videos for rehabilitation in the home. By doing this, we anticipate that we will gain a better understanding of the use of these tools and the factors that contribute to a lack of adherence to exercise programmes. We also intend to recruit a larger sample (16) with an equal balance of male and female participants, as we feel that this would improve the quality of the findings in the next workshop.
Furthermore, by addressing the issues identified in 4.2, we aim to investigate how rehabilitation tools can be improved in order to enable effective interaction by users. Acknowledgments. I would like to thank Lee Morton, Mobolaji Ayoade, Glasgow School of Art and the Envisage research team, whose help made the user workshop possible. This research was funded by the MRC Lifelong Health and Wellbeing programme (LLHW).
References 1. Age UK official website, http://www.ageuk.org.uk/ (last referenced: April 04, 2010) 2. Baillie, L.: The Home Workshop. In: Home-Oriented Informatics and Telematics International Working Conference, California, USA (2003) 3. Benford, S., Bederson, B.D., Akesson, K.P., Bayon, V., Druin, A., Hansson, P., Hourcade, J.P., Ingram, R., Neale, H., O’Malley, C., Simsarian, K.T., Stanton, D., Sundblad, Y., Taxen, G.: Designing Storytelling Technologies to Encourage Collaboration Between Young Children. In: Paper presented at CHI: The Future is Here (2000) 4. Carroll, J.M., Rosson, M.B., Chin, G., Koenemann, J.: Requirements Development in scenario-based design. IEEE Transactions on Software Engineering 24(12), 1156–1170 (1998) 5. Gabbard, J.L., Hix, D., Edward Swan, J.: User-Centered Design and Evaluation of Virtual Environments. IEEE Journal of Computer Graphics and Applications 19(6) (November 1999) 6. Hausdorff, J.: Gait Dynamics, Fractals and Falls. Finding Meaning in the Stride-to-Stride Fluctuations of Human Walking. Hum. Mov. Sci. 26(4), 555–589 (2007) 7. Health Education Authority, Older people and accidents; Fact Sheet 2. HE, London (1999) 8. Kwakkel, G., Van Peppen, R., Wagenaar, R.C., Wood Dauphinee, S., Richards, C., Ashburn, A., Miller, K., Lincoln, N., Partridge, C., Wellwood, I., Langhorne, P.: Effects of augmented exercise therapy time after stroke: a meta-analysis. Stroke 35, 2529–2539 (2004) 9. Lord, S.R., Lloyd, D.G., Sek, K.: Sensori-motor Function, Gait Patterns and Falls in Community-dwelling women. Journal of Age and Ageing 25, 292–299 (1996) 10. Loudon, D., Macdonald, A.S.: Enhancing dialogues between rehabilitation patients and therapists using visualisation software. In: Proc. Envisaging the Future of Home Rehabilitation Workshop. Pervasive Health (2011) 11. MacDonald, A.S., Loudon, D., Rowe, P.J., Samuel, D., Hood, V., Nicol, C., Grealy, M., Conway, B.: Towards a design tool for visualizing the functional demand placed on older adults by everyday living tasks. Universal Access in the Information Society 6, 137–144 (2007) 12. Mataric, M.J., Eriksson, J., Feil-Seifer, D.J., Winstein, C.J.: Socially assistive robots for post-stroke rehabilitation. Journal of NeuroEngineering and Rehabilitation (2007) 13. McKim, R.H.: Experiences in Visual Thinking. PWS Publishers, Boston Mass (1972) 14. Newell, A.F., Arnott, J., Carmichael, A., Morgan, M.: Methodologies for Involving Older Adults in the Design Process. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4554, pp. 982–989. Springer, Heidelberg (2007), doi:10.1007/978-3-540-73279-2_110
15. Robinson, L., Dawson, P., Newton, J.: Promoting adherence with exercise-based falls prevention programmes. In: Vincent, M.L., Moreau, T.M. (eds.) Accidental Falls: Causes, Prevention and Intervention, ch. 12, pp. 283–298. Nova Science Publishers, New York (2008) 16. Silver Fit Soft Kinetic rehabilitation (2010), http://www.silverfit.nl 17. Skelton, D.A., Dinan, S.M.: Exercise for falls management: Rationale for an exercise programme aimed at reducing postural instability. Journal of Physiotherapy Theory and Practice (1999) 18. Taylor, A., Aitken, A., Godden, D., Colligan, J.: Group pulmonary rehabilitation delivered to the home via the Internet: feasibility and patient perception. In: Proc. CHI 2011 (2011) 19. Taylor, D., Stretton, C.: The Otago Exercise Program, An evidence-based approach to falls prevention for older adults living in the community. Journal of Primary Health Care 31(6) (2004)
Clinical Validation of a Virtual Environment Test for Safe Street Crossing in the Assessment of Acquired Brain Injury Patients with and without Neglect Patricia Mesa-Gresa1, Jose A. Lozano1, Roberto Llórens1, Mariano Alcañiz1,2, María Dolores Navarro3, and Enrique Noé3 1
Universidad Politécnica de Valencia, Instituto Interuniversitario de Investigación en Bioingeniería y Tecnología Orientada al Ser Humano, LabHuman, Camino de Vera, s/n, 46022, Valencia, Spain 2 Ciber, Fisiopatología Obesidad y Nutrición, CB06/03, Instituto de Salud Carlos III, Spain {pmesa,jlozano,rllorens,malcaniz}@labhuman.i3bh.es 3 Servicio de NeuroRehabilitación. Hospital NISA Valencia al Mar y Sevilla Aljarafe, Fundación NISA, Valencia, Spain [email protected]; [email protected]
Abstract. Acquired brain injury (ABI) is a complex disease that involves loss of brain functions related to cognitive and motor capabilities and that can produce unilateral spatial neglect (USN). The heterogeneity of the symptoms of these disorders causes a lack of consensus on suitable tools for evaluation and treatment. Recently, several studies have initiated the application of virtual reality (VR) systems as an evaluation instrument for neuropsychological disorders. Our main objective was to evaluate the validity of the VR Street Crossing Test (VRSCT) as an assessment tool. Twenty-five patients with ABI were evaluated with traditional tests and with the VRSCT. The results showed significant correlations between the conventional tests and the measures obtained with the VRSCT in non-negligent patients. Moreover, the VRSCT indicated significant differences in performance of negligent and non-negligent subjects. These pilot results indicate that ABI patients with and without USN can be assessed by the therapists using the VRSCT system as a complementary tool. Keywords: Acquired brain injury, unilateral spatial neglect, pencil-and-paper tests, cognitive assessment, virtual reality, rehabilitation.
patients may suffer cognitive difficulties such as problems with attention, memory, concentration, and executive functions (planning, judging, reasoning, etc.). Taking this heterogeneity of symptoms into account is essential to be able to carry out a proper assessment of the disease in order to ensure the success of the treatment. Unilateral Spatial Neglect (USN) is a frequent disorder that is detected after brain damage [1]. The main feature of USN is the inability of patients to pay attention to stimuli located on the contralateral side of the injury, and these symptoms are not related to sensorial or motor deficits [3][4]. This alteration may have important functional consequences for patients in their daily lives that make recovery more arduous. Patients who suffer symptoms usually fail to attend to their left/right side, thereby neglecting their personal space (for example, shaving only half of their face), peripersonal space (reading only one of the two pages of a newspaper), and extrapersonal space (bumping into objects when walking) [5]. The complexity and heterogeneity of the symptoms of USN cause a lack of consensus on suitable tools for evaluation and treatment [4]. Although many cases of USN can be detected by observation, certain specific diagnoses require a specific evaluation to measure the severity of symptoms and possible progression. Traditional tools for assessment are based on pencil-and-paper tests as well as behavioral batteries [5][6]. The effectiveness of evaluation tests of this kind has been proven, although there are some limitations that must be taken into account, such as the difficulty of interpreting the results, the extrapolation of results to daily life tasks, the difficulty of differentiating between sensory deficits and the lack of attention that is typical of USN, and the absence of assessment of changes in personal space [4][5]. In the last few decades, new procedures using virtual reality (VR) technologies have emerged. VR is a technology based on computer-generated stimulation that immerses the user in a realistic 3D world with multisensory stimuli and that offers the possibility to interact with the elements and receive feedback [4][7][8]. This technology, which has demonstrated its validity in the area of neuropsychology [2], can overcome some of the limitations of traditional evaluation methods. The main advantages of using VR are that it permits evaluation and treatment in a realistic environment that is related to daily life and that is safer and has intrinsic ecological validity [5][7]. VR can generate different environments that allow more interaction and sense of presence and that improve the motivation of the users while enabling a precise control of each session to be maintained [8]. In the same way, the performance of the subjects can be recorded through different measuring procedures, which allows for individualized and adapted sessions in accordance with the limitations of the subject and/or the previous results obtained by the VR system [7][8]. In the case of the assessment of USN, the use of visual trackers by VR systems enables the eye movements of patients and their visual search pattern to be evaluated, which is important in learning more about the characteristics of this disease [5]. In recent years, several studies have initiated the application of VR systems as an evaluation tool for neuropsychological disorders. Some studies have carried out virtual versions of the classic pencil-and-paper tests. Fordell et al.
[9] designed the VR-DiSTRO system which consists of a VR-test battery based on conventional tests for USN evaluation. The main results obtained in this study showed a high sensitivity of the VR system in detecting cases of neglect and a high level of agreement between measures of conventional tests and those obtained by the VR-DiSTRO. In another
study [10], a semi-immersive workbench with stereoscopic glasses and a haptic device were used with a virtual version of the cancellation test. In this study, the results revealed that negligent and recovered patients showed irregular exploration performance in the VR task, with the VR system being more instructive than the conventional test. However, other VR systems include innovative diagnostic tasks. Dvorkin and colleagues [11] designed a Virtual Environment for Spatial Neglect Assessment application consisting of a 3D room-shaped environment where patients had to respond when they detected a target (balls). The results indicated the sensitivity of the VR system for differentiating between negligent patients and control subjects, even though the similarity between the results of the VR system and a traditional test was less conclusive. More recently, Kim et al. [12] examined the efficiency of a 3D immersive virtual street crossing program for the assessment of post-stroke patients [7][13]. The VR system simulated a real street crossing, could evaluate extrapersonal space in patients, and showed significant differences between negligent and non-negligent patients. A previous study by Navarro et al. [14] has demonstrated that our VR Street Crossing Test (VRSCT) is perceived by the patients as being a usable and satisfactory system in the rehabilitation of ABI and USN. Therefore, the main objective of our study is to evaluate the validity of the VRSCT system in terms of cognitive assessment. More specifically, the study compares the scores obtained from patients with ABI (with and without neglect) between different pencil-and-paper tests of attention and the measures found in the VRSCT system. Finally, we have tried to verify the sensitivity of our VR system as a complementary tool in the neuropsychological assessment of attentional deficits in patients with ABI and with the diagnosis of USN.
2 Materials and Methods

Subjects. The participants of this study included twenty-five patients, 14 men and 11 women, aged 51.2 ± 12.62 years (mean ± standard deviation or SD), with a mean chronicity of 505.42 days (SD: 335.11) and a mean of 11.84 years of education (SD: 4.22). All the participants had sustained either a right or a left hemispheric brain lesion due to an ABI (hemorrhagic stroke: n=12, ischemic stroke: n=10, brain tumor: n=3). The participants in this study were selected based on their scores on the Mini-Mental State Examination (MMSE), which is a brief assessment of cognitive abilities, and an adapted version of the Mississippi Aphasia Screening Test (MAST), which evaluates comprehension of orders. All the subjects had a score greater than or equal to 24 on the MMSE and greater than or equal to 45 on the MAST. Therefore, all participants had the cognitive level and comprehension needed to be able to handle the software.

VRSCT system. The VRSCT system simulates a typical city with its buildings, streets, cars, crosswalks, traffic lights, etc. In this environment, the objective of the patient is to go to a specific place in order to confront these adverse elements. This virtual environment is programmed to offer the therapist the possibility to configure it according to the difficulty level that is most appropriate for the patient. In order to
offer the patient the virtual experience in an immersive way, we decided to use a conventional panoramic 47" LCD monitor and a conventional 5.1 surround sound system. In order to interact with the virtual environment in an easy, intuitive and non-invasive way, we offered the patient two wireless devices: a conventional joystick for navigation and interaction, and an optical tracking system (TRACKIR, a product of the NaturalPoint company) to track the patient's head movements. This last aspect is very important in the rehabilitation of neglect patients because they must get used to moving their heads to the neglected side of the space. The patient wears a cap that has three reflecting marks. The position and orientation of these marks are captured by a USB infrared camera that is placed on the panoramic LCD monitor. The device allows the patient to link his/her head movements in 3D space to the movement of the point of reference in the virtual environment. The device also allows these movements to be configured, for instance, linking small head movements of the patient in 3D space to magnified movements of the point of reference in the virtual environment; this creates a panoramic view of the virtual environment of up to 180º.

In the experiment, the patients were placed in a quiet room where they sat at a table in front of a widescreen monitor. A therapist trained in the virtual system was in the same room with the patient to give specific instructions about the assessment task and the software and to control the process. First, the patients received instructions on the use of the program. Then, the head tracking system was adjusted for each patient and a training session was administered to practice with the software and the hardware. For this training session, the patient had to navigate through the virtual streets and complete a single route without traffic or any other distractor. Once the patient had undergone the training session successfully, the assessment session began. The main goal of this session was to cross two-way roads in order to arrive at a destination point, a supermarket, and then return to the start point as quickly and as safely as possible. When an accident occurred, the patient received emotionally intense audiovisual feedback (for example, the sound of a car horn) and new instructions from the therapist. In this case, the program was automatically restarted from the initial point, without discounting the time already consumed. Taking previous studies into account, the session was considered completed when the patient performed two complete routes (each including arriving at the supermarket and getting back to the starting point) with no more than four accidents. Therefore, when the patient completed fewer than two routes or had more than four accidents, the patient was considered to have failed the task. The measures evaluated during the assessment session were the following: the number of times the participant looked to the left and to the right, the total time needed to finalize the task, the total number of accidents, and the completion/non-completion of the task (see Figure 1).
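A minimal sketch of the head-to-viewpoint mapping described above, in which small head rotations are amplified so that the view can sweep a panorama of up to 180º: the gain and clamping values below are illustrative assumptions, since the study does not report the actual tracker configuration.

def view_yaw(head_yaw_deg, gain=4.0, max_view_deg=90.0):
    """Amplify tracked head yaw into the virtual camera yaw, clamped to +/-90 degrees."""
    return max(-max_view_deg, min(max_view_deg, gain * head_yaw_deg))

# A 20-degree head turn pans the view 80 degrees; larger turns saturate at the
# edge of the 180-degree panorama.
assert view_yaw(20.0) == 80.0
assert view_yaw(-40.0) == -90.0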
Fig. 1. Overview of the VRSCT and description of the tracking system
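The assessment-session rules just described (restart after an accident without pausing the clock, completion after two full routes, failure after more than four accidents) can be summarized as a small event loop. The event vocabulary, function signature, and returned fields below are our own illustrative choices, not the VRSCT implementation or its log format.

MAX_ACCIDENTS = 4      # more than four accidents means the task is failed
ROUTES_REQUIRED = 2    # two complete out-and-back routes end the session

def run_session(events):
    """Summarize a session from a stream of (kind, timestamp_s) events."""
    accidents = routes = looks_left = looks_right = 0
    start = None
    for kind, t in events:
        start = t if start is None else start
        if kind == "accident":
            accidents += 1          # route restarts, but the clock keeps running
            if accidents > MAX_ACCIDENTS:
                return {"completed": False, "accidents": accidents, "total_time": t - start}
        elif kind == "route_done":
            routes += 1
            if routes == ROUTES_REQUIRED:
                return {"completed": True, "accidents": accidents, "total_time": t - start,
                        "looks_left": looks_left, "looks_right": looks_right}
        elif kind == "look_left":
            looks_left += 1
        elif kind == "look_right":
            looks_right += 1
    return {"completed": False, "accidents": accidents, "total_time": None}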
Instruments. The pencil-and-paper tests used in the study are described below. The Behavioral Inattention Test (BIT) is a behavioral test battery that is designed to evaluate USN. It is composed of 15 subtests, 9 of which evaluate aspects of daily life and 6 of which are traditional tasks that assess neglect. The subtests of daily living are assessed by tasks like tracing line drawings, dialing, reading a menu, reading an article, telling and setting the time, choosing currencies, copying statements and addresses, map orientation and choosing cards. The evaluation of neglect on these tests consists of letter cancellation, star cancellation, copying figures and shapes, line bisection and representative drawing [6]. To determine the presence of USN, the BIT scores were used, taking the score of 129 as the cut-off. Based on the scores obtained by the participants and the normative scales of the test, the subjects were classified as “Negligent subjects” if the score was less than or equal to 129, and “Non-negligent subjects” if the score was greater than 129. The second test used was the Color Trail Making Test (CTT). This is a behavioral test that is based on the evaluation of sustained visual attention, mental flexibility, sequencing, visual tracking, and graphomotor ability. The CTT is a color version of the Trail Making Test. This test has two different parts: 1) Part A: the subject must connect the 25 numbers that are randomly distributed on a sheet with consecutive lines; and 2) Part B: there are 25 duplicate numbers within circles of different colors, and the subject must connect consecutive numbers with lines while alternating colors. The variable evaluated in this test is the time (seconds) taken to perform each of the parts. Finally, the Conners’ Continuous Performance Test-II (CPT-II) is a computerized test that evaluates the sustained attention of the subject and the ability to inhibit inappropriate responses. In this test, the subjects must press a key when they detect letters other than the letter ‘X’. The assessment contains 6 blocks that vary in the rate of submission of the letters. After testing, the program generates an automated report that includes data on various variables such as the number of omissions (letters that the subject did not mark), the number of commissions (times that the subject marked ‘X’), reaction time (HIT Rt), and the capacity to adapt to the temporary demands of the task (HIT Rt ISI).

Procedure. All the patients were classified and selected based on their scores on the MMSE and the MAST. Once the 25 patients who participated in the study were selected, a cognitive evaluation was performed prior to using the VRSCT system. The cognitive assessment, which consisted of the BIT, CTT, and CPT-II, was conducted during the same week as the virtual training. Then, evaluation and training with the VRSCT were initiated. The virtual test consisted of a preliminary training session and an evaluation session based on the task explained above. The training session took approximately 10 minutes and the evaluation session lasted until the patient finished the task and/or the patient was considered to have failed the task.

Data analyses. All statistical analyses were performed with SPSS 17.0 for Windows. The Pearson parametric correlation was used to relate the BIT scores, which classified patients as negligent or non-negligent, to the total time for the completion of the task in the assessment session of the virtual program and to the number of accidents.
The Pearson parametric correlation for continuous variables was also used to relate the numerical scores obtained by non-negligent subjects on the tests of attention assessment (CTT and CPT-II) to the variables obtained by the VRSCT (total time to finalize the task and the number of accidents). The measures evaluated during the assessment session on the VRSCT were compared between negligent and non-negligent patients using the Mann-Whitney U-test. All the data are presented as mean±SD. In all cases, the significance level was set at p<0.05.
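The analyses were run in SPSS; as a hedged sketch of the equivalent computations (Pearson correlations within the non-negligent group and a Mann-Whitney U test comparing groups on a VRSCT measure), the snippet below uses SciPy. The field names are our own shorthand, not SPSS variable names or the study's data files.

from scipy.stats import pearsonr, mannwhitneyu

def analyse(non_negligent, negligent):
    """Each argument is a dict of equal-length per-subject score lists."""
    correlations = {}
    for test in ("bit", "ctt_a", "ctt_b", "cpt_hit_rt_isi"):
        for vr_measure in ("total_time", "accidents"):
            r, p = pearsonr(non_negligent[test], non_negligent[vr_measure])
            correlations[(test, vr_measure)] = (r, p)
    # Group comparison on a VRSCT measure (negligent vs non-negligent patients)
    u, p = mannwhitneyu(negligent["accidents"], non_negligent["accidents"],
                        alternative="two-sided")
    return correlations, (u, p)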
3 Results

Correlation between the BIT and the VRSCT system with negligent and non-negligent patients. According to the BIT scores, five of the twenty-five patients were classified as negligent subjects. The results obtained for non-negligent subjects (n=20) showed a statistically significant negative correlation between the score obtained on the BIT and the total time needed to finalize the VRSCT (r = -0.507, p<0.05) and the total number of accidents (r = -0.515, p<0.05) (see Table 1 and Fig. 2a). None of the VRSCT measures were significantly correlated with the BIT score in negligent subjects.

Correlation between the conventional tests for assessing attention (CTT and CPT-II) and the VRSCT system with non-negligent patients. On the CTT, the scores obtained by non-negligent patients in part A (r = 0.802, p<0.001) and part B (r = 0.506, p<0.05) correlated significantly with the total time needed to finalize the VRSCT (see Fig. 2b). On the CPT-II test, the only significant correlation was between the HIT Rt ISI variable of the test and the total time needed to complete the task of the VRSCT (r = 0.613, p<0.01). Other measures obtained with the CPT-II test (omissions, commissions, and HIT Rt) did not reach a significant correlation with the variables of the VRSCT.

Table 1. Correlation between the pencil-and-paper tests (BIT, CTT and CPT-II) and the variables of the VRSCT for negligent and non-negligent subjects

VRSCT variable        BIT (negligent)   BIT (non-negligent)   CTT-A (non-negligent)   CTT-B (non-negligent)   CPT-II HIT Rt ISI (non-negligent)
Total time (sec)      0.704             -0.507*               0.802**                 0.506*                  0.613**
Number of accidents   -0.231            -0.515*               0.243                   0.429                   0.151

*p<0.05; **p<0.01
Comparison of negligent and non-negligent patients’ performances on the VRSCT. The total number of accidents registered in the VRSCT was significantly higher in negligent patients than in non-negligent patients (U=17.000, p<0.05). Negligent patients had a higher number of accidents (3.00±1.41) than non-negligent patients (1.15±1.27). However, the total time needed to finalize the task in the VRSCT did not differ significantly between negligent and non-negligent patients (p=0.734).
Fig. 2. (a) Correlation between the BIT scores and the total time needed to finalize the VRSCT in non-negligent patients. (b) Correlation between CTT-A scores and the total time needed to finalize the VRSCT in non-negligent patients.
4 Discussion

In this study, we evaluated the validity of the VRSCT for the assessment of both negligent and non-negligent ABI patients. The results of the study showed that the VRSCT system was effective in terms of cognitive assessment. The measures obtained with the VRSCT correlated with the BIT score for non-negligent patients: subjects with lower BIT scores spent more time completing the VRSCT task and had a higher number of accidents. For the negligent patients, however, the results showed no significant correlation with the BIT data. Despite this, the relationship between the BIT scores and the VRSCT in non-negligent patients may indicate that the virtual system's measurements are sensitive to the subjects' attentional capacities. Future studies could make a more accurate comparison between the two types of diagnosis (i.e., traditional vs. VRSCT) by taking into account the subtasks of the BIT and by expanding the measures assessed by the virtual system, such as reaction time or the time required for specific parts of the task.

The results also show that, for non-negligent patients, the VRSCT correlates positively with other traditional tests used to assess attention. More precisely, patients who took less time to solve the CTT and showed a higher capacity to adapt to the temporal demands of the CPT-II had fewer accidents and spent less time completing the VRSCT task. With respect to differences in performance on the VRSCT between negligent and non-negligent patients, the VRSCT showed significant differences in the total number of accidents: negligent patients had a higher number of accidents than non-negligent patients. Moreover, the demands of the task make it possible to evaluate the patient's capacity to produce an appropriate emotional response to dangerous stimuli in daily life, as well as the capacity to react in an adequate time.

These pilot results indicate that the VRSCT can be used as a complementary tool for the diagnosis of ABI patients affected by USN. We hope that this study is a first step in the construction of a complete platform for the diagnosis and rehabilitation of ABI. Indeed, future studies and the implementation of improvements will allow the VRSCT system to respond to the needs of both patients and therapists.
Acknowledgments. This study was funded by the Ministerio de Educación y Ciencia (Spain), Project Game Teen (TIN2010-20187), and partially by projects Consolider-C (SEJ2006-14301/PSIC), the "CIBER of Physiopathology of Obesity and Nutrition, an initiative of ISCIII", and the Excellence Research Program PROMETEO (Generalitat Valenciana, Conselleria de Educación, 2008-157).
References

1. Ríos-Lago, M., Benito-León, J., Paul, N., Tirapu-Ustárroz, J.: Neuropsychology of Acquired Brain Injury. In: Tirapu-Ustárroz, J., Ríos-Lago, M., Maestú, F. (eds.) Neuropsychology Handbook, Viguera, Barcelona, pp. 311–341 (2006)
2. Rose, F.D., Brooks, B.M., Rizzo, A.A.: Virtual Reality in Brain Damage Rehabilitation: Review. Cyberpsychol. Behav. 8, 241–261 (2005)
3. Dawson, A.M., Buxbaum, L.J., Rizzo, A.A.: The Virtual Reality Lateralized Attention Test: Sensitivity and Validity of a New Clinical Tool for Assessing Hemispatial Neglect, pp. 77–82. IEEE, Los Alamitos (2008)
4. Tsirlin, I., Dupierrix, E., Chokron, S., Coquillart, S., Ohlmann, T.: Uses of Virtual Reality for Diagnosis, Rehabilitation and Study of Unilateral Spatial Neglect: Review and Analysis. Cyberpsychol. Behav. 12, 175–181 (2009)
5. Ting, D.S.J., Pollock, A., Dutton, G.N., Doubal, F.N., Ting, D.S.W., Thomson, M., Dhillon, B.: Visual Neglect Following Stroke: Current Concepts and Future Focus. Surv. Ophthalmol. 56, 114–134 (2011)
6. Peña-Casanova, J., Gramunt Fombuena, N., Gich Fullà, J. (eds.): Neuropsychological Tests. Elsevier, Barcelona (2006)
7. Katz, N., Ring, H., Naveh, Y., Kizony, R., Feintuch, U., Weiss, P.L.: Interactive Virtual Environment Training for Safe Street Crossing of Right Hemisphere Stroke Patients with Unilateral Spatial Neglect. Disabil. Rehabil. 29, 177–181 (2005)
8. Peñasco-Martín, B., De los Reyes-Guzmán, A., Gil-Agudo, A., Bernal-Sahún, A., Pérez-Aguilar, B., De la Peña-González, I.: Application of Virtual Reality in the Motor Aspects of Neurorehabilitation. Rev. Neurol. 8, 481–488 (2010)
9. Fordell, H., Bodin, K., Bucht, G., Malm, J.: A Virtual Reality Test Battery for Assessment and Screening of Spatial Neglect. Acta Neurol. Scand. 123, 167–174 (2011)
10. Broeren, J., Samuelsson, H., Stibrant-Sunnerhagen, K., Blomstrand, C., Rydmark, M.: Neglect Assessment as an Application of Virtual Reality. Acta Neurol. Scand. 116, 157–163 (2007)
11. Dvorkin, A.Y., Rymer, W.Z., Harvey, R.L., Bogey, R.A., Patton, J.L.: Assessment and Monitoring of Recovery of Spatial Neglect within a Virtual Environment, pp. 88–92. IEEE, Los Alamitos (2008)
12. Kim, D.Y., Chang, W.H., Park, T.H., Lim, J.Y., Han, K., Kim, I.Y., Kim, S.I.: Assessment of Post-stroke Extrapersonal Neglect Using a Three-Dimensional Immersive Virtual Street Crossing Program. Acta Neurol. Scand. 121, 171–177 (2010)
13. Weiss, P.L., Naveh, Y., Katz, N.: Design and Testing of a Virtual Environment to Train Stroke Patients with Unilateral Spatial Neglect to Cross a Street Safely. Occup. Ther. Int. 10, 39–55 (2003)
14. Navarro, M.D., Alcañiz, M., Ferri, J., Lozano, J.A., Herrero, N., Chirivella, J.: Preliminary Validation of Ecotrain-Cognitive: A Virtual Environment Task for Safe Street Crossing in Acquired Brain Injury Patients with and without Unilateral Spatial Neglect. J. Cyber. Ther. Rehabil. 2, 199–203 (2009)
Smart Homes or Smart Occupants? Supporting Aware Living in the Home Lyn Bartram, Johnny Rodgers, and Rob Woodbury School of Interactive Arts + Technology, Simon Fraser University Surrey, BC, CANADA V3T 0A3 {lyn,jgr3,rw}@sfu.ca
Abstract. Awareness of resource consumption in the home is a key part of reducing our ecological footprint yet lack of appropriate understanding and motivation often deters residents from behaviour change. The coming deployment of smart metering technologies, the increasing practicality of embedded devices, and the widespread use of Internet and mobile tools offer new opportunities for “greener” residents. We report on the design and implementation of a holistic interactive system that supports residents in awareness of resource use and facilitates efficient control of house systems to encourage conservation in daily activities. Initial response from two high-profile deployments in unique homes indicates this approach has great potential in engaging residents in sustainable living, but presents many challenges in how technology is integrated into the home environment. Keywords: Residential resource use, interaction design, ubiquitous computing, information visualization, sustainability, domestic design.
1.1 A Tale of Two Houses

Our insights and experience in building information and control systems for the aware resident arise from our involvement in the design and implementation of two sustainable homes: North House and West House. North House is a small solar-powered home that recently placed 4th overall in the 2009 Solar Decathlon. North House incorporates sophisticated custom energy systems, adaptive intelligent building envelope technologies, specialized lighting and climate systems, and automated optimization behaviour. No conventional control systems were integrated into the home: a digital control panel and an iPhone™ provided the only means for the resident to control, track and manage energy performance in the house. West House, our second and current project, addresses a different set of goals. It is a small, passively efficient house that uses electricity and natural gas from the public utilities and solar energy to augment heating, hot water and electricity production. It is presented as a conventional home, and typical controls (light switches, thermostats and security systems) are included throughout the house, so that digital and physical controls and feedback are intermingled. We built West House as part of our ongoing collaboration with the City of Vancouver, whose policy makers are keenly interested in how information technology, social media, alternative energies and building design can be combined to foster more sustainable living practices in "typical" houses.

During the 10 days of the Solar Decathlon, North House saw more than 60,000 visitors. 65,000 people visited West House during its public display at the Vancouver 2010 Winter Olympic Games. We used these opportunities to engage in conversation and informal interviews with many different people interested in the systems. These design cases provided us with pragmatic insights into where tradeoffs are likely to occur when deploying these systems into real-world environments. We are faced with a multitude of design constraints not typically associated with traditional approaches. The physical layout of a home introduces numerous constraints on placement, visibility, aesthetic choices, and interactive affordances. Similarly, designing for the idiosyncratic habits and expectations of users in home environments requires sensitivity to how such tools are likely to be used in everyday activities, and to how they cohere with other elements in the home. For example, residents have competing ideas about where visible technology should be located and who controls it. Simply installing software on the home PC or hanging a monitor on the wall is only going to help residents make some kinds of decisions — and only as long as it integrates coherently with their daily activities.
2 Related Work

The general promise of the smart home – that intelligent operation could be offloaded to a computational component – has been confounded by the human factor. The most daunting factors in smart home automation have not been technological capability but rather complexity and poor usability [7]. However, significant research has focused on the power of pervasive and networked computing to automate and enable supportive and adaptive services within smart homes. This work has largely been
targeted at enabling assistive environments for in-home care such as the Aware Home [13]. A recent Apple patent [8] extends home automation to support variable control of how devices are powered and to provide feedback on consumption at the device level. More recently, sensing networks have been proposed to analyse and react to user behaviour in the environment to optimize power use [11] and enable load shifting. The focus in these technologies is measurement, analysis and control of power in the home, with automation as an underlying principle. In a different approach, Weiss et al. developed a web-based application for monitoring home energy use that allows the resident to monitor consumption on a smart phone and turn individual appliances on or off [25]. Psychological research [1] shows that feedback is a central aspect of motivating resource conservation in the home. Recent web-based services partnered with power utilities approach this goal. Google PowerMeter™ and Microsoft Hohm™ allow residents to monitor and analyse aspects of their energy consumption using common "energy dashboard" displays and some description of energy use impacts. In-home displays such as Blueline's Power Cost Monitor™ show total electricity consumption in terms of kWh and money spent. Point-of-consumption tools such as the Kill A Watt™ are dedicated energy monitoring units attached to a particular appliance or outlet that display electrical and financial expenditure numerically. All of these tools present aggregate data related to overall use. In contrast with these traditional computing displays, recent work has brought considerable attention to the design of "eco-feedback" technologies [9, 19] and visualization techniques to promote resource use awareness [12]. While others have described the design of eco-feedback technology [9, 19] and eco-visualization at a high level [12], our work builds on design-driven exploration of working prototype solutions in this space, similar to [10, 14], and fits into the Persuasive Technology and Pervasive Sensing genres that DiSalvo distinguishes [6]. We go beyond much of the HCI research, however, in that we are exploring an entire system for human-home interaction, considering both feedback and control. How these technologies may aid or hinder residents in developing more sustainable behaviours within the home is an open question. As several researchers have found, simple data feedback is not enough. Awareness does not equate to behaviour change, and a diversity of motivations exists for conservation. Key issues [4] for residents are the lack of real-time information about consumption; comprehension of what the energy use units actually mean in terms of behaviour [26]; the complexity of energy-management devices such as programmable thermostats; poor location of feedback away from locations where resource use decisions are made [26]; and the need for motivational tools such as goal-setting abilities and social networking features [16, 18]. Perhaps the most widespread example of a poorly implemented human-house interactive tool is the programmable thermostat. It fails for two primary reasons. First, the interface is non-standard and invariably complex across products, so only a minority of users actually configures it successfully. More important, however, is its functional design. Most people's lives are more complex and variable than the simple schedules the thermostat accepts.
As a result, people tend to program it for the minimum case, and end up heating their homes unnecessarily in periods when they are
absent. This example is representative of the need for a more flexible and responsive design approach. Chetty advocates several design principles: make real-time information visible and comprehensible; design for individual and collective agency for motivation and reward; ensure technologies are attainable; and seek new ways of stimulating discussion and engagement [4]. Fitting these technologies into the home poses additional challenges. Residents have competing ideas about where visible technology should be located and who controls it. They also feel overburdened by the complexity and inflexibility of home technologies they already use [24]. This can be seriously aggravated by automation: humans have an uneasy relationship with automated control [29], as we discovered in North House.

2.1 Lifestyle and Human Factors

Social scientists provide some insight into what kinds of lifestyle factors contribute to residential energy use. Research indicates four variables influence behaviour: norms and beliefs; external persuasive forces (community pressure, advertising); personal knowledge and skills; and habit and routine [23]. Energy use affected by the last is driven by comfort and effort (time and actions required to execute): conservation activities depend on lowering both cost and effort [21,22]. In addition, this research pinpoints the role of knowledge; many people have incorrect estimations of how much energy is used in hot water and appliances, influencing their use. So while demographic factors such as gender and income play major roles, the more malleable psychological aspects of habit, effort and knowledge have a strong influence.
3 Design Rationale

These themes were repeated in our preliminary user workshops with people who described themselves as "interested" in sustainable living environments but who had no experience with them. Participants included students, professionals and blue- and white-collar workers. While the motivating models differed (some were more interested in positive financial outcomes, whereas others were more interested in their energy footprint and ecological impact), we found several common threads. The first was time: all of our "users" identified themselves as very busy people and were concerned about having to spend too much time and effort in managing the house. A related thread was place: people are very mobile, wanted appropriate information and controls accessible from wherever they were, and wanted localized and contextually appropriate access in the house itself. For example, none found the notion of a central control panel and dashboard for lights, shutters, etc. very useful, but liked the idea of information and controls in place. A third was related to complexity: whereas several indicated they would be interested in learning more about how the house actually worked, all wanted a simple interface with a low learning curve that would provide quick access to reasonable house configuration while allowing the more expert user to fine-tune settings. Finally, participants really wanted to know "how they were doing" in the context of their particular goals (financial, energy use) and how this changed over time and events.
The design dialogue in the development of efficient buildings has largely focused on smart automation of the building systems and components for optimal performance rather than on effectively supporting how people use them [15]. But the most daunting factors in smart home automation have not been technological capability but rather complexity and poor usability [7]: residents are not professional facilities managers. In contrast to the automated smart home populated with intelligent devices, we focus on the aware home with support for the smart occupant, and we focus on an integrated, extensible system as opposed to a loose collection of different tools. We aim to reduce the technological and cognitive effort required to make decisions about resource use and understand its impacts. Our rough "grounding equation" can be expressed as Cost > Benefit ≠ Change. That is, if the perceived cost (effort, time) of doing something outweighs the perceived benefits, people are unlikely to change their behaviour [22]. We propose that we need to reduce the overhead of performing conservation actions and increase the motivating benefits, including non-financial incentives. Additionally, we posit that a piecemeal approach introduces complexity, and that residents will benefit from a coherent ecosystem of information feedback and control tools that are integrated into systems they already use [2]. We based our system design on the following criteria.
Fig. 1. The ALIS software architecture
1. Rich, real-time feedback. Make real-time and cumulative resource use information available to support decision-making and information access at a variety of levels: in-the-moment awareness; lightweight monitoring; analysis and reasoning; consequential judgment and prediction.
2. Context. Present information in contextually appropriate ways: for example, energy use expressed in both financial terms and common usage ("enough to power a washing machine"). Embed information where decisions are made.
3. Individual and Social Motivation: Provide goal-setting capabilities and integration with social and community networks.
4. Control: Enable efficient resource use decisions by building a control hierarchy for optimizing resource conservation. Design and distribute controls appropriately: embedded in the home, remote or mobile.
5. Aesthetics: Respect the design sensibility of a home. Explore subtle ways to provide feedback.
6. Familiarity: Reduce complexity by leveraging tools people already use in their information landscape, such as calendars, browsers and clocks.

This combination of design goals is substantially more sophisticated than what available home automation systems or monitoring applications offer. In fact, it moves the design brief from that of smart home automation or energy use monitoring to a more comprehensive resource management system that is integrated with the home environment and residents' activities while respecting the dimensions of aesthetic and emotional comfort.
4 The Aware Living Interface System

Our system comprises three main components: a control backbone that provides fine-grained measurement, device control and automation logic, and is based on a heavily customized commercial application; a Web services layer that manages data and commands between components; and the Aware Living Interactive System (ALIS) that embodies the resident's interaction with the home (Figure 1). ALIS is built on a comprehensive information model incorporating control and device details, resource-specific production and consumption data in terms of standard units, pricing levels, and standard usage equivalencies, personal and shared goals, and a hierarchical model of energy-control settings to enable "one-step" optimization. It currently comprises different forms: a set of variably configured client interfaces running in web browsers on several platforms; a mobile application; and ambient, "informative art" displays embedded in the home.

Fig. 2. The Dashboard
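As a purely illustrative sketch (ALIS's actual schema is not given in the paper), the kind of information model described above might look like the following; every class, field and conversion factor here is an assumption made for the example.

```python
# Hypothetical sketch of the kind of information model ALIS is described as
# being built on; names, fields and conversion factors are assumptions, not
# the system's actual schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ResourceReading:
    resource: str            # "electricity", "water", "natural_gas"
    device: str              # e.g. "kitchen/dishwasher"
    timestamp: datetime
    amount: float            # standard units: kWh, litres, m3
    produced: bool = False   # True for solar production, False for consumption

@dataclass
class Goal:
    description: str         # e.g. "use 10% less electricity than last month"
    resource: str
    reduction: float         # target reduction as a fraction (0.10 = 10%)

# Example of a usage equivalency of the kind the Dashboard presents.
KWH_PER_WASHING_MACHINE_LOAD = 1.0   # assumed rough figure

def as_equivalent(kwh: float) -> str:
    loads = kwh / KWH_PER_WASHING_MACHINE_LOAD
    return f"{kwh:.1f} kWh - enough to power about {loads:.0f} washing machine loads"

reading = ResourceReading("electricity", "kitchen/dishwasher",
                          datetime(2010, 2, 20, 18, 30), 1.3)
goal = Goal("use 10% less electricity than last month", "electricity", 0.10)
print(as_equivalent(6.2))
```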
ALIS supports a variety of feedback displays and analytical tools. These displays are presented as web pages running in a browser on different platforms: while they can be served to any web-enabled device, they are optimized for the viewing experience of a "typical" computer screen. We currently link to and embed information related to conservation from a number of sources, including the public utilities, directly into the ALIS information framework. At the top level is the Resource Dashboard that expresses resource use in variable terms: as standard units, financial figures, by usage ("Today's water use is equivalent to two baths") and in relative terms ("25% less electricity than yesterday") (Figure 2). We are exploring appropriate contextual ways to present the information, as these vary not only by individual preference, but also by location and task: for example, in the garage, the resident may wish to see the power consumed by his/her electric car overlaid on the top-level Dashboard view, and to see it in terms of "kilometers earned." Detailed information on resource production and use is available in real-time and historical views, categorized in different ways (by type of device, by location in house, by time of use). We also make use of professional building energy analysis tools from the industry leader, Pulse™ Energy, for detailed performance analysis and prediction. The application integrates visualization components in the display of energy use data in multiple formats: as numerical financial data and graphs in the Overview, comparative and scalable performance data in the Resource Usage tool, and simplified comparative performance data in the Neighbourhood Network views. Residents can set personal milestones and challenges that can be measured by the system – for example, "use 10% less energy than last month." Figure 2 shows the progress towards one such goal on the Dashboard.

4.1 Social Interaction: The Neighbourhood Network

The community interface, which we call the Neighbourhood Network, encourages competition, comparison, and collaboration between community members. It is designed to take advantage of the widespread use of social networking software to enable individuals to connect with a wider community of people pursuing similar goals around sustainability, enabling them to share strategies, incentives and successes. It is available on both PCs and mobile devices as part of the web application. Occupants can see a historical view of their energy consumption compared to a community average and set conservation goals, allowing them to compete with others in achieving those goals while sharing tips and comments. Those who follow through with their commitments receive awards (in the form of digital trophies) that are displayed to others within the community. These eco-trophies are not only applied to challenges, but also recognize the degree to which one engaged with the community by posting conservation advice and replying to other members' feedback. We note that we are not developing a social network itself; rather we integrate existing forum
tools. A Facebook™ plug-in connects to the ALIS server to automatically collect data to publish (Figure 3); other, non-ALIS data we intend to collect using voluntary self-entered reporting. Note that an issue in sharing energy use information relates to normalization: simple numbers between dissimilar houses do not support effective comparisons, so we are instead relying on the more general metric of progress towards a goal.

Fig. 3. Facebook App

4.2 Informative Art

We are exploring the use of informative art visualizations in two contexts. The Ambient Canvas is an informative art piece embedded in the kitchen backsplash that provides feedback on the use of electricity, water, and natural gas. As opposed to typical graphical displays that may use numbers or charts to convey information, the Ambient Canvas combines LED lights and filters of various materials to produce light effects on the kitchen backsplash. This subtle feedback on performance and energy efficiency does not require active attention on the part of the resident, and integrates into the home cohesively. In addition, we are currently studying the use of landscape photographs that subtly change according to levels of water consumption. These are meant to serve as both aesthetic elements and informative views. In both these contexts, informative art is intended to promote awareness of resource use to assist and influence sustainable in-the-moment decision-making while respecting aesthetic constraints.

Fig. 4. The Ambient Canvas
We emphasize that the attraction of such imagery is idiosyncratic to each home, and we are actively exploring the utility and acceptance of different representations.

4.3 Control

As in standard home automation, ALIS enables the resident to control and monitor lights, shades, and climate settings. In addition, the resident can configure energy-optimizing "modes" as presets in ALIS controls: for example, turning off most lights and lowering the thermostat in a "Sleep" mode, or tuning settings and shutting down standby power in "Away" mode. These presets can be activated via one button from any ALIS control interface (Figure 4). Modes can also be scheduled. Modes are entirely user configurable, and coexist with individual control settings for fine-grained control when desired. Finally, we implemented a control hierarchy so that residents could easily access master controls and the most commonly used switches, or selectively drill down to fine-grained controls on demand.

Fig. 5. One view of the ALIS controls interface

4.4 Platforms

ALIS is currently implemented largely as a Web services application and delivered on four hardware platforms: embedded touch panel computers, personal computers, mobile phones, and the integrated informative art. The system can also accommodate expansion across further platforms. As implemented, the user interfaces provide control, feedback, and community features across the various devices, with each set of features contextualized for the location, application and form factor of the delivery hardware. The ALIS PC interface offers the most detailed access to feedback about resource consumption in the home. It provides the resident with standard full access to the entire ALIS system. Residents use the PC interface from any computer on the Internet to configure modes, set goals, and schedule house operation. Three touch panels are embedded into the structure of each home. Each display runs a browser in full screen kiosk mode. A large touch panel is placed in a central location: in the kitchen backsplash in North House, and on the wall in the central hallway in West House. Additionally, two smaller panels provide localized control and feedback throughout the home. In North House these are placed at the two entryways. In West House, one is installed in the garage and the other is placed beside the upstairs desk. Access to the full ALIS PC interface is possible at each access point, but complicated by the affordances and form factor of a touch screen. To address this, we have configured the interfaces on the panels to default to information and control views appropriate to their location.

Fig. 6. Mobile

For example, the main control panel allows easy access
to all house controls and the Resource Dashboard, while the garage panel provides lighting and climate controls for the garage area, one-touch home control presets, and community and transportation information. We anticipate that at this location the resident may be particularly interested in public transit schedules, carpooling options, and data on the performance of their electric car. The mobile device provides feedback and control to the residents of the home from their pocket – a simplified remote house control. Mobile applications were developed as an extension of the web application (Figure 5). They offer a subset of the features available in the PC user interface, with each feature redesigned for use on a mobile device. For example, the controls available from the mobile emphasize ease of use through logical groupings. These “master” controls allow the resident to adjust the lights for a whole room, or shades for a whole house façade, with a single control. More fine-grained control of individual fixtures is still available, but a hierarchy of control makes the most used items easily accessible. Graphing displays are simplified; a playful ‘Spinner’ interface allows residents to select pairs of performance variables through a slot-machine style interaction to compare and graph on demand. The mobile application provides alerts about house status and energy consumption and production thresholds and allows residents to keep in touch with their neighbourhood network, offering challenge monitoring and community chat.
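As a purely illustrative sketch of the one-step modes and control hierarchy described in Section 4.3 — device names, groupings and settings here are invented, not ALIS's actual configuration — a preset mechanism might look like this:

```python
# Hypothetical sketch of user-configurable mode presets layered over
# individual device controls; all device names and values are invented.

house = {
    "lights/kitchen":   {"on": True, "level": 80},
    "lights/bedroom":   {"on": True, "level": 60},
    "shades/south":     {"open": True},
    "thermostat/main":  {"setpoint_c": 21.0},
    "standby_circuits": {"on": True},
}

modes = {
    "Sleep": {"lights/kitchen": {"on": False},
              "lights/bedroom": {"on": False},
              "thermostat/main": {"setpoint_c": 17.0}},
    "Away":  {"lights/kitchen": {"on": False},
              "lights/bedroom": {"on": False},
              "shades/south": {"open": False},
              "thermostat/main": {"setpoint_c": 15.0},
              "standby_circuits": {"on": False}},
}

def activate_mode(name: str) -> None:
    """One-button preset: apply a mode's settings on top of the current state.
    Individual controls remain adjustable afterwards, as described for ALIS."""
    for device, settings in modes[name].items():
        house[device].update(settings)

activate_mode("Away")
print(house["thermostat/main"], house["standby_circuits"])
```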
5 Discussion

There are many issues and challenges with meeting the design criteria we have established, and numerous open questions and investigations to pursue. Recently, West House has become tenanted as a living lab, and we will explore the system through longitudinal studies in a real-world context over the coming years. This design exercise of building the house system has given us a first chance to assess the challenges and utility of the initial criteria. ALIS provides a variety of views that support all levels from in-the-moment awareness to detailed analytical visualizations. These are available both via the Internet and distributed throughout the house. However, while there has been enormous interest in and keen appreciation of the informative art approach, we have not yet established which mappings best communicate resource use, or even whether simple resource use data is the most appropriate thing to show. In North House we displayed bands of light for progress towards the resident's current goals as well as electricity production and use. In West House, the backsplash illuminates successively more as the day's usage of resources increases. We discovered immediately (and unanimously) that our visitors loved the Ambient Canvas fully lit – exactly the wrong reinforcement message to send! Similarly, we discovered in our water photos study that even some depictive images are ambiguous. While the informative art approach seems very attractive, and participants and visitors are very positive about it, careful attention will need to be paid to the representational strategies. We have only begun to explore contextual presentation and delivery. We do use common usage equivalents where possible to "demystify" energy use, but this is only the tip of the iceberg. Location affects meaning: water use in the laundry might be expressed in terms of loads of laundry and in the bathroom as bathtub equivalents. We have embedded
information in some places where decisions are made – notably the kitchen and the living room. We are beginning to prototype small, low-data displays for appliances to simulate different time-of-day and grid/water-load conditions, anticipating the introduction of sophisticated smart metering. ALIS's controls seamlessly integrate with the standard house devices of switches and thermostats and are ubiquitously available. Simple one-button presets allow house energy use to be finely tuned to user-configured modes, avoiding the programmable thermostat paradigm, as the resident can now simply control the house according to his or her current activity as opposed to a schedule. We are developing computational models of "effort" to evaluate how this might reduce and/or aggravate the overhead of using the house. We have not yet fully implemented social norming tools, although currently we support a simplistic personal challenge that allows the user to set simple quantifiable goals. ALIS data can also be published to a Facebook™ plug-in that enables a community to share and compare challenges, as well as allowing the resident to compare their performance against regional performance if that data is provided by the public utilities. Finally, we opted to build ALIS as a web application because it allowed us to easily connect to external services and tools already in use by residents, such as Google Calendar, localized weather APIs, social networks, and energy management software tools, and to provide a familiar interface and interaction paradigm to residents through the browser. Currently we are prototyping the integration of ALIS data with common information management tools such as calendars. With this, the resident can see his or her energy use in the context of daily, weekly or monthly activities. This same philosophy guided the decision to provide simple one-step controls to turn the house "on" or "off" – into active or automated states. People are already used to using light switches when they leave the house or go to sleep: we simply extend the reach of that simple habit to whole-home configurations.
6 Conclusion and Future Work

The Aware Living Interface System is an integrated monitoring and interactive control system to encourage more efficient resource use in the home. ALIS is based on an extensible control and feedback architecture that can accommodate a diversity of energy sources, hardware devices, and home control contexts. It provides a range of user interfaces on different platforms, from smart phones and web browsers to embedded and ambient displays, each designed to support sustainable decision-making in the home. We have operationally tested the system in two very challenging public showcases, where more than 120,000 people have visited and interacted with the live systems over a combined period of 4 weeks. Our design experience with these two unique homes has helped us refine a set of criteria for building more effective information systems for residential conservation. We offer these criteria as a contribution. Our future work will build on these lessons, and emphasize thorough evaluation of the system in a real-world context. Studying residents in this fully functional home in a vibrant neighbourhood will enable us to evaluate a variety of approaches, and
engage with occupants and community members to further develop our understanding of social motivations. We also intend to expand the range of visualization and control devices, and to explore adaptive display and lighting control that responds to viewing requirements and resident activity. We hope this research will help to improve the design and deployment of interactive interfaces for in-home technological systems that support sustainability.

Acknowledgments. MITACS Accelerate BC, Western Economic Diversification Canada, BC Hydro, and the City of Vancouver funded this research. We sincerely thank our many industrial and government partners for their extensive support.
References 1. Abrahamse, W., Steg, L., Vlek, C., Rothengatter, T.: A review of intervention studies aimed at household energy conservation. Journal of Environmental Psychology 25(3) (2005) 2. Bartram, L., Rodgers, J., Muise, K.: Chasing the negawatt: visualization for Sustainable Living. IEEE Computer Graphics & Applications 30(3), 8–14 (2010) 3. Chetty, M., Bernheim Brush, A.J., Meyers, B.R., Johns, P.: It’s Not Easy Being Green: Understanding Home Computer Power Management. In: Proc. of the 27th Intl. Conf. on Human Factors in Computing Systems, pp. 1033–1042. ACM Press, New York (2009) 4. Chetty, M., Tran, D., Grinter, R.E.: Getting to Green: Understanding Resource Consumption in the Home. In: Proc. of the 10th Intl. Conf. on Ubiquitous Computing, pp. 242–251. ACM Press, New York (2008) 5. Clevenger, C., Haymaker, J.: The impact of the Building Occupant on Energy Simulations. In: Joint International Conference on Computing and Decision Making in Civil and Building Engineering, pp. 1–10 (2006) 6. DiSalvo, C., Sengers, P., Brynjarsdóttir, H.: Mapping the Landscape of Sustainable HCI. In: Proc. of the 28th Intl. Conf. on Human Factors in Computing Systems, pp. 1975–1984. ACM Press, New York (2010) 7. Eckl, R., MacWilliams, A.: Smart home challenges and approaches to solve them: A practical industrial perspective. In: Tavangarian, D., Kirste, T., Timmermann, D., Lucke, U., Versick, D. (eds.) IMC 2009. Communications in Computer and Information Science, vol. 53, pp. 119–130. Springer, Heidelberg (2009) 8. Fadelli, A.: Apple Inc. Intelligent Power Monitoring. U.S. Provisional Patent Application No. 61/079,751 9. Froehlich, J., Findlater, L., Landay, J.: The Design of Eco-Feedback Technology. In: Proc. of the 28th Intl. Conf. on Human Factors in Computing Systems, pp. 1999–2008. ACM Press, New York (2010) 10. Gustafsson, A., Gyllenswärd, M.: The Power-Aware Cord: Energy Awareness Through Ambient Information Display. In: Proc. of the 23rd Intl. Conf. on Human Factors in Computing Systems, Extended Abstracts, pp. 1423–1426. ACM Press, New York (2005) 11. Harle, R.K., Hopper, A.: The potential for location-aware power management. In: Proc. of UbiComp 2008, pp. 302–311 (2008) 12. Holmes, T.: Eco-Visualization: Combining Art and Technology to Reduce Energy Consumption. In: Proc. of the 6th ACM SIGCHI Conference on Creativity & Cognition, pp. 153–162. ACM Press, New York (2007)
13. Kientz, J.A., Patel, S.N., Jones, B., Price, E., Mynatt, E.D., Abowd, G.D.: The Georgia Tech aware home. In: CHI 2008 Extended Abstracts on Human Factors in Computing Systems, CHI 2008, Florence, Italy, April 05- 0, pp. 3675–3680. ACM, New York (2008) 14. Kim, Y., Schmid, T., Charbiwala, Z., Srivastava, M.B.: ViridiScope: Design and Implementation of a Fine-Grained Power Monitoring System for Homes. In: Proc. of the 11th Intl. Conf. on Ubiquitous Computing, pp. 245–254. ACM Press, New York (2009) 15. Leaman, A., Bordass, B.: Assessing building performance in use 4: the Probe occupant surveys and their implication. Building Research and Information 29(2), 129–143 (2001) 16. Mankoff, J., Matthews, D., Fussell, S.R., Johnson: Leveraging Social Networks to Motivate Individuals to Reduce their Ecological Footprints. In: HICSS 2007, Hawaii (2007) 17. Pierce, J., Schiano, D.J., Paulos, D.: Home, Habits, and Energy: Examining Domestic Interactions and Energy Consumption. In: Proc. of the 28th Intl. Conf. on Human Factors in Computing Systems, pp. 1985–1994. ACM Press, New York (2010) 18. Petersen, J.E., Shunturov, V., Janda, K., Platt, G., Weinberger, K.: Dormitory residents reduce electricity consumption when exposed to real-time visual feedback and incentives. International Journal of Sustainability in Higher Education 8(1), 16–33 (2007) 19. Riche, Y., Dodge, J., Metoyer, R.A.: Studying Always-On Electricity Feedback in the Home. In: Proc. of the 28th Intl. Conf. on Human Factors in Computing Systems, pp. 1995–1998. ACM Press, New York (2010) 20. Rogers, Y., Hazlewood, W.R., Marshall, P., Dalton, N., Hertrich, S.: Ambient Influence: Can Twinkly Lights Lure and Abstract Representations Trigger Behavioral Change? In: Proc. of the 12th Intl. Conf. on Ubiquitous Computing, pp. 261–270. ACM Press, New York (2010) 21. Schuitema, G., Steg, L.: Percepties van energieverbruik van huishoudelijke apparaten (Perception of energy use of domestic appliances). In: Bronner, A., Dekker, P., de Leeuw, E., de Ruyter, Smidts, A., Wieringa, J. (eds.) Ontwikkelingen in Het Marktonderzoek. Jaarboek 2005 (Developments in Marketing Research. Yearbook 2005), pp. 165–180. De Vrieseborch, Haarlem (2005) 22. Steg, L.: Promoting Household Energy Conservation. Journal of Elsevier Science, 4449– 4453 (2008) 23. Stern, P., Dietz, T.: The value basis of environmental concern. Journal of Social Issues 50(3), 65–84 (1994) 24. Stringer, M., Fitzpatrick, G., Harris, E.: Lessons for the future: Experiences with the installation and use of today’s domestic sensors and technologies. In: Fishkin, K.P., Schiele, B., Nixon, P., Quigley, A. (eds.) PERVASIVE 2006. LNCS, vol. 3968, pp. 383–399. Springer, Heidelberg (2006) 25. Weiss, M., Mattern, F., Graml, T., Staake, T., Fleisch, E.: Handy Feedback: Connecting Smart Meters with Mobile Phones. In: Proc. of the 8th Intl. Conf. on Mobile and Ubiquitous Multimedia, Article No. 15. ACM Press, New York (2009) 26. Wood, G., Newborough, M.: Energy-use information transfer for intelligent homes: Enabling energy conservation with central and local displays. Energy and Buildings 39, 495–503 (2007) 27. Woodbury, R., Bartram, L., Cole, R., Hyde, R., Macleod, D., Marques, D.: Buildings and Climate Solutions. Pacific Institute for Climate Solutions (October 2008) 28. Woodruff, A., Hasbrouck, J., Augustin, S.: A Bright Green Perspective on Sustainable Choices. In: Proc. of the 26th Intl. Conf. on Human Factors in Computing Systems, pp. 313–322. ACM Press, New York (2008) 29. 
Woods, D.D.: Automation: Apparent simplicity, real complexity. In: Mouloua, M., Parasuraman, R. (eds.) Human Performance in Automated Systems: Current Research and Trends, pp. 1–7. Erlbaum, Hillsdale (1994)
Input Devices in Mental Health Applications: Steering Performance in a Virtual Reality Paths with WiiMote Maja Wrzesien1, María José Rupérez1, and Mariano Alcañiz1,2 1
Instituto Interuniversitario de Investigación en Bioingeniería y Tecnología Orientada al Ser Humano, Universidad Politécnica de Valencia Camino de Vera s/n, 46022 Valencia, Spain 2 CIBER, Fisiopatología Obesidad y Nutrición, CB06/03 Instituto de Salud Carlos III, Spain
Abstract. Recent studies present Virtual Reality (VR) as a potentially effective technology in the Mental Health (MH) field. The objective of this paper is to evaluate two interaction techniques (traditional vs novel) using a popular and low-cost input device (WiiMote) within the theoretical framework of the Steering Law. The results show that the WiiMote meets the requirements for MH technologies, and that the Steering Law remains valid on all three paths. This opens up a new range of possible research studies for the design and evaluation of interaction techniques in the MH field. Keywords: Mental health, virtual reality, steering law.
requirements regarding the input device design for VR environments have been proposed [6]. However, the input devices used in mental health VR applications should take into account some additional design recommendations, closely related to general guidelines for MH technologies [7]. First, the input device should be attractive and engaging. Since engagement is one of the critical factors in creating the client–therapist relationship, an attractive and fun input device might be a first step in engaging clients in the interaction with the technology, which in turn might help them become engaged in the treatment and in the relationship with the therapist. Second, the input device should be intuitive and easy to use. Indeed, since clients have different social and cultural backgrounds, the input device should be easy to use and to learn. Moreover, according to Doherty et al. [7], it seems beneficial for clients to use technologies that are familiar to them and easily available, since technology is rarely used in day-to-day client work or in therapist training. Third, the input device should be affordable. In fact, interacting in a VR environment often involves the use of devices intentionally made for that purpose. However, these devices are generally expensive and often not available to the average consumer. Finally, the input device should be flexible, allowing designers to program it in different ways. Indeed, an important requirement for MH technology to be of practical use is that the system can be adapted to the broad range of MH disorders [8]. Thus, the flexibility of the input device would allow designers to adapt it according to the needs of a specific application.
2 Evaluation

The Steering Law [9] is a model of human movement that predicts the speed and total time spent by a user who steers with a pointing device through a tunnel or along a path presented on a screen. It describes a linear relationship between the mean time of steering along the path and the index of difficulty of the path, which is the ratio between the length and width of the path. In other words, the longer the path and the smaller the width, the greater the mean steering time. This predictive model was chosen as a framework for this evaluation for the following reasons. First, it brings consistency to empirical evaluations of input devices by using a rigorous methodology. Second, it gives researchers the opportunity to compare their results in a cross-study comparison. Finally, it allows its application to be extended from 3D straight and circular paths [10] to a path that is more representative of MH applications (a maze path). The WiiMote input device [11] was chosen for this study for the following reasons. First, this type of device is a low-cost and currently used 3D input device. Thus, its use seems interesting in terms of its low financial impact and quite high popularity, which corresponds to the affordability, familiarity and attractiveness requirements. Second, the WiiMote presents many different possibilities in terms of interaction techniques (for 2D and 3D tasks). Finally, the WiiMote presents a wide range of possible functionalities to be programmed, which corresponds to the flexibility requirement. More specifically, in the case study presented in the following sections the WiiMote input device was programmed according to Bowman et al.'s classification [6]: (a) with both discrete components (buttons) for the direction/target selection task, and continuous components (gesture-based movement, i.e. wrist movements) for velocity/acceleration selection, which we can call a traditional
interaction technique; and (b) with continuous components (gesture-based movement, i.e. wrist movements) for both the direction/target selection task and velocity/acceleration selection, which can be called a novel interaction technique. The traveling task in a maze path was chosen as the task for this empirical study for the following reasons. First, traveling, which is a motor component of navigation [6], supports a cognitive component of the navigational process, as well as other interaction-related and therapy-related processes. Second, traveling is a basic task in many VR mental health applications in order to move from a current location to another target location, and to support other types of tasks. Finally, travel in a maze path seems to better reflect the characteristics of MH applications than the basic shapes previously used in this theoretical framework (straight and circular). The travel task in a maze path such as a supermarket environment or a city center can be applied in MH applications such as cognitive rehabilitation, preventive systems for dementia, or VR post-traumatic stress disorder treatment.
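For reference, the Steering Law relationship used as the framework here can be written as T = a + b·(A/W), where A is the path length, W its width, and a and b are empirically fitted constants. A minimal sketch (with purely hypothetical constants and path length) is:

```python
# Minimal sketch of the Steering Law prediction: T = a + b * (A / W).
# The constants a, b and the path length below are hypothetical, chosen only
# to illustrate how the index of difficulty drives the predicted time.

def steering_time(length_cm: float, width_cm: float, a: float = 0.5, b: float = 0.2) -> float:
    """Predicted mean steering time (s) for a path of given length and width."""
    index_of_difficulty = length_cm / width_cm
    return a + b * index_of_difficulty

# The six path widths used in the experiment, with an assumed path length.
for width in (100, 122, 144, 168, 190, 212):
    print(f"width={width} cm  ->  T={steering_time(2500, width):.2f} s")
```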
3 Methodology

A fully-crossed, within-subject factorial design with repeated measures was used. The independent variables were the following: (a) the path shape, with three modalities (P=straight, circular and maze); (b) the interaction technique (IT=novel vs traditional); and (c) the index of difficulty, defined by the same path length and six different path widths (100, 122, 144, 168, 190, and 212 cm). The steering time and error rates were the dependent variables. Moreover, users' short post-experience semi-directive interview data were collected. In order to avoid learning effects between trials, the order of presentation of the input device set-ups, the indices of difficulty, and the path shapes was balanced according to a Latin square pattern. Also, the entrance sides to the maze path and the turning direction for the circular path were changed between the two input device set-ups. Touching the walls of the paths resulted in an error being recorded. Following Zhai and Woltjer [10], only the trials without errors were taken into account in the data analysis. The error rates were analyzed separately. Eighteen people (seven men and eleven women; 19-36 years old (MD=25.50, SD=5.15)) participated in this study. The participants were selected according to previously fixed exclusion criteria (left-handedness, significant experience with three-dimensional or basic computer games, significant experience with 3D graphics, and significant experience in using the WiiMote). Following the recommendations of Doherty et al. [7], the evaluation was performed with a non-clinical population; however, the sample aimed to reflect the characteristics of the clinical population (i.e., diversity in social and cultural background, diversity in age, and little experience with innovative technologies). Three different path shapes were used in the experiment: a straight path, a circular path (see Figure 1 (a) and (b)), and a maze path (see Figure 1 (c)). Six different widths of the path were set for each path shape at 100, 122, 144, 168, 190, and 212 cm, simulating paths in which a human could move. The width of the avatar used to navigate in the virtual environment was 45 cm, simulating a human body. The avatar was placed equidistant from the two walls of the paths. The virtual environment for the experiment was developed under 3dGamestudio, version 7.77.
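The paper states only that presentation order was balanced with a Latin square; as an illustration, one standard construction of a balanced Latin square (not necessarily the authors' exact scheme) is:

```python
# Illustrative sketch of a balanced Latin square for ordering experimental
# conditions; this is one standard construction, not the authors' own code.

def balanced_latin_square(n):
    """Return n presentation orders over n conditions (indices 0..n-1).
    For even n, each condition precedes and follows every other equally often."""
    first, lo, hi, take_low = [0], 1, n - 1, True
    while len(first) < n:
        first.append(lo if take_low else hi)
        lo, hi = (lo + 1, hi) if take_low else (lo, hi - 1)
        take_low = not take_low
    return [[(c + shift) % n for c in first] for shift in range(n)]

# Example: orders for the six indices of difficulty used in the experiment.
for order in balanced_latin_square(6):
    print(order)
```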
Fig. 1. The three paths used in the experiment: straight (a), circular (b), and maze (c)
The image of the paths was projected on a 180x240 cm screen at a resolution of 1024 by 768. This output device was chosen because it is the one most often used for VR mental health applications in our research center. The task consisted of traveling through a virtual environment toward the clearly marked finishing point. No time pressure or constraints were imposed during the trials; however, the participants were asked to perform the task as fast as possible without touching the walls, a restriction needed to validate the Steering Law. Two different set-ups of the WiiMote were tested: a novel interaction technique involving no use of any buttons (only wrist movements; see Figure 2 (a)), and a traditional interaction technique involving use of buttons (see Figure 2 (b)).
Fig. 2. Two WiiMote set-ups (novel (a) vs traditional (b)). The bold blue arrows in Figure (a) indicate user motion while navigating and accelerating with the device involving no use of any buttons (only wrist movements); the bold blue arrows in Figure (b) indicate user motion while accelerating, and the thin blue arrows indicate the directions of the arrow buttons used for traveling.
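As a purely hypothetical illustration of the two set-ups shown in Figure 2 — not the authors' implementation — the controller input could be mapped to travel roughly as follows; the pitch and roll angles are assumed to be supplied by whatever WiiMote library the application uses, and all thresholds and gains are invented for the example.

```python
# Hypothetical sketch of the two WiiMote set-ups described above. Pitch and
# roll (in degrees) are assumed to come from the WiiMote library in use;
# gains and button names are illustrative only.

def traditional_update(buttons, pitch_deg, speed_gain=0.05):
    """Set-up (b): arrow buttons select direction, wrist pitch sets speed."""
    dx = (1 if "RIGHT" in buttons else 0) - (1 if "LEFT" in buttons else 0)
    dy = (1 if "UP" in buttons else 0) - (1 if "DOWN" in buttons else 0)
    speed = max(0.0, -pitch_deg) * speed_gain     # tilt forward to accelerate
    return dx * speed, dy * speed

def novel_update(pitch_deg, roll_deg, speed_gain=0.05, turn_gain=0.02):
    """Set-up (a): wrist movements alone control both direction and speed."""
    speed = max(0.0, -pitch_deg) * speed_gain     # tilt forward to accelerate
    heading = roll_deg * turn_gain                # tilt left/right to steer
    return heading, speed

print(traditional_update({"UP"}, pitch_deg=-20))
print(novel_update(pitch_deg=-20, roll_deg=15))
```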
4 Results

Figures 3(a–f) show the regression results between the index of difficulty and the mean completion time for both interaction techniques and for all paths. The index of difficulty was calculated by dividing the path length by each of the six path widths. As Figures 3(a) and 3(b) show, there was a strong linear relationship between the mean time and the index of difficulty for the straight path with the traditional
interaction technique (r2=0.996), and with the novel interaction technique (r2=0.985). Figures 3 (c) and 3(d) show a slightly different but still strong linear relationship between the mean time and index of difficulty for the circular path (r2=0.915 for traditional interaction technique, and r2=0.988 for the novel interaction technique). Finally, Figures 3(e) and 3(f) show a strong linear relationship between the mean time and the index of difficulty for the maze path with the traditional interaction technique (r2=0.991) and with the novel interaction technique (r2=0.962).
Fig. 3. Index of Difficulty vs trial completion time in the straight path with traditional (a) and novel interaction technique (b); in the circular path with traditional (c), and novel interaction technique (d); and in the maze path with traditional (e), and novel interaction technique (f)
The highest performance for the traditional interaction technique in terms of slope (less time increase per unit of ID, corresponding to 1/b in the Steering Law equation) was for the straight path (IP=1/0.162), followed by the maze path (IP=1/0.335), and the lowest performance was for the circular path (IP=1/0.507). The performance for the novel interaction technique in terms of slope was highest for the circular path (IP=1/0.107), followed by the straight path (IP=1/0.117), and then the maze path (IP=1/0.833).
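The slopes and indices of performance reported above come from fitting the linear Steering Law model to the observed mean completion times; a sketch of such a fit with SciPy (using invented numbers, not the study's data) is:

```python
# Sketch of the regression behind Figure 3: mean completion time is fitted
# against the index of difficulty, and the index of performance is 1/slope.
# All numbers below are hypothetical placeholders.
from scipy import stats

path_length = 2500.0                              # cm (assumed for the example)
widths      = [100, 122, 144, 168, 190, 212]      # cm
ids         = [path_length / w for w in widths]   # index of difficulty
mean_times  = [4.6, 4.1, 3.7, 3.3, 3.1, 2.9]      # s, hypothetical means

slope, intercept, r, p, stderr = stats.linregress(ids, mean_times)
print(f"T = {intercept:.2f} + {slope:.3f} * ID   (r^2 = {r**2:.3f}, p = {p:.4f})")
print(f"Index of performance IP = 1/b = {1 / slope:.2f}")
```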
Due to the differences in the programming set-up of the direction selection (input device set-ups with mixed components, using arrow buttons and wrist movements, vs input device set-ups with continuous components, using only wrist movements), the input device set-ups cannot be compared using the slope and the intercept derived from the Steering Law equation. However, the two interaction techniques can be compared in terms of error rates. The results show (see Figure 4(a)) that the error rate depends on the path width (F(5,85)=53.362, p<.001), with significantly more errors for the narrowest path than for the other paths (post hoc multiple comparison with Bonferroni correction, p<.0001).
Fig. 4. The error rates for both interaction techniques regarding different path shapes and different indexes of difficulties (a), and regarding different path shapes (b)
Fig. 5. The error rates for traditional (a), and for novel interaction technique (b), regarding different path shapes and different indexes of difficulties
In addition, as the results in Figure 4(b) show, the error rate depends on the interaction technique (novel vs traditional) (F(5,85)=5.923, p<.026), with significantly more errors for the novel interaction technique (where only the wrist was involved). More specifically, the error rates were significantly higher in the maze path than in the other two (post hoc multiple comparison with Bonferroni correction, p<.026). Finally, there was a significant interaction (F(10,170)=3.501, p<.001) between the path condition (straight, circular, maze) and the path width in both interaction techniques (see Figure 5(a) and (b)). The informal interview revealed that the participants preferred the novel interaction technique slightly more than the traditional interaction technique (57% vs 43%). The
majority of participants considered the novel interaction technique fun and intuitive ("I liked this technique because it was new and entertaining"; "It was fun to navigate like this"). Some of the participants who had difficulties using the novel interaction technique felt that once they got used to it, it was not so difficult ("Well, in the beginning it was hard, but once you get used to it it's quite fun"). On the other hand, the participants who chose the traditional interaction technique considered it easy to use and familiar ("I liked it because it was the same thing as playing computer games on the keyboard"; "I preferred this technique because it's like using the remote control"). In general, participants enjoyed traveling with both interaction techniques ("It was so fun to walk in virtual environments"; "Call me for the next study").
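For completeness, the repeated-measures error-rate analysis reported above could be reproduced along the following lines with statsmodels; the generated data are random placeholders rather than the experiment's measurements, and all factor names are chosen for the example.

```python
# Sketch of a repeated-measures ANOVA on error rates, of the kind reported
# above, using statsmodels' AnovaRM. Data are random placeholders.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subject in range(18):
    for technique in ("traditional", "novel"):
        for width in (100, 122, 144, 168, 190, 212):
            rows.append({
                "subject": subject,
                "technique": technique,
                "width": width,
                # narrower paths tend to produce more errors, plus noise
                "errors": max(0.0, 3.0 * 100 / width - 1 + rng.normal(0, 0.5)),
            })
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="errors", subject="subject",
              within=["technique", "width"]).fit()
print(res)
```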
5 Discussion and Conclusions
The objective of the study was to evaluate an input device adapted to MH applications in a case study of a traveling task. The results show that the WiiMote responds very well to all requirements. First, according to the informal interviews, the traveling task with the different WiiMote set-ups was enjoyed and perceived as fun by all participants. Since a positive emotional response such as enjoyment is an integral part of engagement [12], we can therefore consider that the WiiMote input device can engage users. Second, the WiiMote set-ups were intuitive to use and easy to learn. Although both interaction techniques were appreciated equally by the participants, the results show that there were significantly more errors for the novel interaction technique (involving wrist movements) than for the traditional interaction technique (involving use of arrow buttons and wrist movements). On the other hand, the error rates obtained with the traditional interaction technique were significantly greater for the narrowest circular path (see Figure 5a). Therefore, we can conclude that the choice between these two interaction techniques and their usefulness depend on the path shape in a particular application. Finally, the WiiMote responds to the flexibility requirement. In this study we evaluated two WiiMote set-ups; however, the WiiMote can be programmed in many ways. For instance, a different interaction technique with a more controlled bimanual interaction (according to Guiard’s model) can be proposed (see Figure 6).
Fig. 6. Bimanual interaction technique. The bold blue arrows indicate user motion while accelerating and the thin blue arrows indicate user motion while navigating
Moreover, our objective was to extend the theoretical framework to a maze path. The study confirmed that the Steering Law established for the circular and straight paths in 3D environments is also valid for a maze path. The correlations in this
experiment are slightly smaller than those obtained by Zhai and Woltjer [10]. These differences might be due to the fact that the population used in this study was inexperienced with new technologies. We can consider this study a first step toward better understanding the requirements for input devices used in a specific domain such as VR mental health technologies. We hope that this will motivate more researchers to carefully choose and design the input devices for their specific MH applications, which in turn will allow a more systematic typology of input devices for MH technologies to be created.
Acknowledgements. This study was funded by Ministerio de Educación y Ciencia Spain, Project Game Teen (TIN2010-20187) and partially by projects Consolider-C (SEJ2006-14301/PSIC), “CIBER of Physiopathology of Obesity and Nutrition, an initiative of ISCIII” and the Excellence Research Program PROMETEO (Generalitat Valenciana, Conselleria de Educación, 2008-157).
References 1. Frohlich, B., Hochstrate, J., Kulik, A., Huckauf, A.: On 3D Input Devices. IEEE Computer Graphics and Applications 26(2), 15–19 (2006) 2. Jin, W., et al.: Improving the visual realism of virtual surgery. In: Proc. Medicine Meets Virtual Reality, vol. 13, pp. 227–233 (2005) 3. Zudilova-Seinstra, E.V., de Koning, P.J.H., Suinesiaputra, A., et al.: Evaluation of 2D and 3D glove input applied to medical image analysis. International Journal of HumanComputer Studies 68(6), 355–369 (2010) 4. Botella, C., Quero, S., Baños, R.M., et al.: Virtual Reality and Psychotherapy. In: Riva, G., Botella, C., Légeron, P., Oplate, G. (eds.) Internet and Virtual Reality as Assessment and Rehabilitation Tools for Clinical Psychology and Neuroscience, IOS Press, Amsterdam (2004) 5. Zhai, S.: User Performance in Relation to 3D Input Device Design. Computer Graphics 32(4), 50–54 (1998) 6. Bowman, D.A., Kruijff, E., LaVoila Jr., J.J., Poupyrev, I.: 3D User Interfaces, Theory and practice. Addison-Wesley, Reading (2005) 7. Doherty, G., Coyle, D., Matthews, M.: Design and evaluation guidelines for mental health technologies. Interacting with Computers 22(4), 243–252 (2010) 8. Coyle, D., Doherty, G., Sharry, J.: PlayWrite: End-User Adaptable Games to Support Adolescent Mental Health. In: Proc. CHI (2010) 9. Accot, J., Zhai, S.: Beyond Fitts’ Law: Models for Trajectory-Based HCI Tasks. In: Proceedings CHI, pp. 295–302 (1997) 10. Zhai, S., Woltjer, R.: Human Movement Performance in relation to path Constraint–The Law of Steering in Locomotion. In: Proc. IEEE Virtual Reality Conference (2003) 11. Lee, J.: Hacking the Nintendo WiiRemote. IEEE Pervasive Computing 7(7), 39–45 (2008) 12. Shernoff, D., Csikszentmihalyi, M., Schneider, B., Schernoff, E.: Student engagement in high school classrooms from the perspective of flow theory. Sch. Psy. Quar. 2, 158–176 (2003)
‘Acted Reality’ in Electronic Patient Record Research: A Bridge between Laboratory and Ethnographic Studies
Lesley Axelrod1, Geraldine Fitzpatrick2, Flis Henwood3, Liz Thackray1, Becky Simpson6, Amanda Nicholson4, Helen Smith4, Greta Rait5, and Jackie Cassell4
1 Human Centred Technology Lab, Uni of Sussex, Brighton BN1 9QH, UK
2 HCI Group, Technical Uni of Vienna, Favoritenstrase 9-11, A-1040, Vienna, Austria
3 School of Applied Social Science, Uni of Brighton, Falmer, BN1 9PH, UK
4 Brighton and Sussex Medical School, Mayfield House BN1 9PH, UK
5 Dept. Primary Care and Population Health, UCL Medical School, London NW3 2NF, UK
6 PLAYOUT http://www.playout.co.uk
Abstract. This paper describes and reflects on the development and use of ‘acted reality’ scenarios to study variability in General Practitioners’ (GPs’) record keeping practices, particularly their use of free text and coded entries. With actors playing the part of patients and in control of certain elements of the interaction, the acted reality approach creates a bridge between the controlled but often unrealistic laboratory setting and the arguably more ‘realistic’ but often messy world observed in traditional ethnographic studies. The skills and techniques of actors were compelling, helping to develop and sustain interaction, whilst keeping the process on track and providing rich data. This paper discusses the benefits and challenges of working with actors in this specific context and argues that the acted reality approach might be applied elsewhere in HCI research, especially in contexts where there are multiple individuals involved, but where the behaviour of one user is of special interest. Keywords: acted reality, electronic patient records, HCI, virtual patient, drama.
of free text data to health researchers may lead to misleading estimates of incidence and prevalence of disease and subsequent need for provision of care [14,21]. This paper reports on a methodological innovation developed for use within the large multidisciplinary Patient Records Enhancement Programme (PREP), which aims to find ways to extract useful research data from the free text entries of primary care EPRs. Our project comprises four work streams, one of which involves HCI user studies in the field, to explore and understand the context in which records are constructed in primary care, and the factors affecting the variability of data entry practices, particularly with respect to coded and free text entries. This work stream puts users at centre stage, drawing on HCI and wider socio-technical research. It is aimed at better understanding of, and support for development and use of health information technologies. Such research has an increasingly important role to play in understanding how technologies function in practice and in informing the design of systems that are cost-effective, support work practice and help deliver improved care. HCI approaches used in health settings include large-scale studies and in situ observations of clinical practices with real patients, vital to show how computerised health record systems impact on communication in the consultation process [9, 13]. Ethnographic research, in particular, has been very instructive in pointing to the complexity and contingent nature of health care work [2] and has provided a useful counterbalance to laboratory-based studies that seek to control such variability and complexity. In this paper, we describe and reflect upon the development and use of what we term ‘acted reality’ scenarios which use professional actors playing the part of patients to help control variability in primary care medical consultations. We argue that the use of acted reality scenarios helps to bridge the gap between the controlled, but often unrealistic, laboratory setting and the arguably more ‘realistic’ but often messy world observed in traditional ethnographic studies. We describe the process used to develop acted reality scenarios and reflect on the benefits and challenges of using such scenarios in the context of an ethnographically-informed ‘user study’ and on how such an approach might be applied more widely within HCI research. The use of drama and actors is not new in HCI research, which has a long tradition of exploring complex topics in various settings, using a degree of artifice or dramatic interpretation [5], for example, paper, cardboard or video prototypes, mock-ups, roleplay [3,10,18], personas [6], storyboards, Wizard of Oz (WOZ), forum theatre [12] and Body and Place Storming [15]. Applied theatre [1] has explored concepts via audience immersion and participation (participatory theatre, e-drama, promenade theatre, interactive dramatic installations or enactive cinema [22]) and to elicit people’s views about ‘privacy and prejudice’ arising with the use of EPRs [16]. Newell et al [12] suggest one use of drama in HCI research is “a well briefed actor replacing a user within (particularly early) usability testing”. We have extended this method beyond the study of usability to seek an understanding of how actors might be used to better understand variation in work practice (here, record keeping in the health domain). 
In our case, the actor as patient is not the user of interest but is employed specifically to help reduce patient-side variability as we seek to explain variation in primary care doctors’ electronic record keeping practices. An alternative to the use of professional actors as patients could have been the use of a ‘virtual patient’ of some kind. Health professionals working with EPRs already collaborate across hybrid ecologies that involve physical and digital domains [7]. For
example, digitised manikins are used for learning resuscitation techniques, online scenarios may supplement work experience placements and ‘serious’ digital games can support training [19]. Virtual patients have levels of fidelity, for example, cartoons, text linked to real images, or embodied conversational agents (ECAs) [17]. Virtual world scenarios, with hospitals and clinics populated by virtual staff and ‘medbiquitous’ patients that have adaptable standards for different uses, are sometimes employed [11]. However, a particular challenge is the design of virtual patients that can respond appropriately to unexpected contingencies that often arise in medical consultations. Here professional role-playing actors can really add value. Work of professional actors in medical education is well established, particularly in examinations of medical students and doctors, where actors play the part of patients, providing a standardised repertoire of responses [4,23]. Method acting, [20] that involves actors taking on thoughts and emotions of the character and drawing on their own life experiences, can be particularly useful in medical contexts. Actors who are additionally experienced in role-play and able to operate without a director’s script, are able to react and ad lib in a scenario, rather than merely performing a role. We describe building on traditions from medical education and HCI research, working with professional role-playing actors to develop acted reality scenarios for use in our study, followed by reflections on how the acted reality approach played out.
2 Developing and Using Acted Reality Scenarios
PREP is concerned with the documentation practices of primary care health professionals, who create and use EPRs, making the resulting data available for secondary users, particularly health researchers. In choosing an approach, we dismissed using simulated laboratory settings that lack connection with authentic practices and the rich contexts in which health care staff develop and use recording systems. Studies of real consultations offer authenticity, but we would need very large numbers of similar patients in order to find enough with the same profiles and illnesses to enable us to compare GPs’ recording practices in the context of specific medical conditions. Ethical considerations prevent us from asking real patients to visit numerous GPs in order to facilitate such comparisons. Role-play by medical students was also considered, but their lack of acting skills might impair the process. We decided to work with professional actors, already experienced in both medical exams and role-play, to act as patients. We developed two patient personas with contrasting medical conditions, outlined in Table 1, to offer contrasts and challenges in the recording process. These were used by the actors in the acted reality scenario, where the same actor would consult with different GPs, enabling us to observe how GPs go about conducting and documenting the consultation using their particular EPR systems. Working with the National Institute for Health Research, Primary Care Research Network, South East (NIHR PCRN-SE), we recruited six GP practices with contrasting organisations and using different EPR systems. Four GPs from each of the six practices agreed to take part in the two simulated consultations (providing 48 simulated consultations in all). Both University and NHS Ethical and Research Governance approvals were obtained from the relevant authorities. The actors were
paid industry rates and we compensated for NHS staff time used. GPs were given information sheets that detailed the project and informed consent was gained. Two patient personas were developed to offer significant contrasts and challenges in the consultation and recording process (see Table 1). We iteratively developed and refined the persona scripts, with doctors and then with actors. As we could not be sure what questions GPs might ask, scripts had to be comprehensive before being sent to actors. Unclear or inconsistent details led to further refinement - for example, the script did not give non-verbal instruction and our ‘arthritic’ actor wanted to know how someone with such symptoms would move. We created ‘dummy’ medical histories, consistent with the personas and we worked with practice staff to install usable versions on each EPR system. In case GPs requested a physical examination, we provided actors with pictures that illustrated their condition (of swollen finger joints for rheumatoid arthritis and an inflamed throat for sore throat). We developed briefing notes for GPs telling them to act as normally as possible, to expect typical patients and to ask to examine patients if they so wished. A pilot consultation was videoed and studied by the research team and further refinements were made.

Table 1. Persona variables – symptoms by condition; values are given as arthritis persona / sore throat persona
Medical Symptoms: Rheumatoid Arthritis / Sore Throat
Gender: female / male
Medical history: more history / little history
“Red flag” symptoms: present / not present
National clinical guidelines: apply / do not apply
Condition type: chronic / acute
Follow up: likely to be required / may not be required
Acted reality consultations were captured on video, using two cameras placed to give a close up and full body views of the patient-GP combination. On-screen activity was captured with ACA Screen Recorder software. Post-consultation walkthroughs of EPR entries, with the GPs, sought to understand their rationale for data entry. Data relating to the GPs’ experiences and training were gathered. Resulting field notes, video and walkthrough and other data were collated and analysed. Video was edited with ‘picture in picture’ of screen capture, ready for input to NVivo, along with data from other sources such as staff interviews, observations, real consultations etc., ready for subsequent thematic analysis. While the development and use of acted reality scenarios was largely very successful in highlighting variability in GPs’ record keeping practices using EPRs, and reasons for that, the focus of the rest of the discussion is the analysis specific to the role of the actor and the acted reality aspects of the study method, not on EPR data recording practices.
3 Reflections on the Use of Acted Reality Scenarios Developing personas and scripts with our wider team, including actors themselves, proved to be very effective in producing realistic and robust scenarios through which actors could enact the reality of the selected medical conditions with GPs. Regular
monitoring, review and discussion were needed to check actors’ interpretation of roles. Through piloting (rather like a rehearsal) and later ‘performances’, the actors alerted us to points where the persona proved fragile, enabling us to improve upon it. The development of the dummy patient records on the different GP EPR systems also added to the convincing ‘stage set’ enabling roles to be played in context, although the decision to provide only minimal back history for the ‘sore throat’ patient worked less well as these consultations sometimes took longer as GPs spent time recording ‘new patient’ information. Providing briefings for GPs offered them much needed reassurance about the overall process and encouraged them to act as normally as possible. For example, we were aware that GPs often do physical examinations and were concerned they might feel constrained in doing so when knowing they were working with professional actors. We spent time explaining that they should proceed as they would in any other consultation and could ask to examine the actor-patients if they so wished. GPs’ experiences of engaging in acted reality scenarios varied. Most were impressed by the way the consultations worked and described the experience as feeling ‘realistic’ and ‘quite compelling’ and one GP ‘almost forgot it was not real’, although he said that it took a few moments to ‘get into it’. Although they knew they were dealing with actors, GPs often commented on ‘the patients’ needs’ and concerns. Some clearly found the interaction easy while others found the process caused trepidation. While it was recognised that actors would need a few quiet moments prior to starting sessions to study scripts and get ‘into role’, we realised that GPs also needed time to prepare themselves to perform effectively in the acted reality scenarios. While some GPs played along from the start, others were hesitant and, in these cases, the actors worked hard to draw the GPs into the scenario, using both spoken familiar phrases and body language to do this. The professional role-play actors played a critically important part in making the acted reality scenarios a success. They helped normalise the consultation; their confident handling included introducing themselves and greeting the GPs just as real patients do, showing their own vulnerability, and using their personalities to entice doctors to become players. The standardised personas were brought to life by the actors, who skillfully used both verbal and non-verbal means of communicating their symptoms to the GPs. For example, the actor-patient with arthritic symptoms held her hands as if they really were stiff and painful. In their real lives, the actors had personal experience of visiting doctors and either direct or indirect experience of the types of symptoms they were asked to present with. This added to the veracity of their performances and actors’ individualised interpretations meant they presented convincingly as patients. There were times when acted reality scenarios worked less well. Examinations of pictures of symptoms sometimes broke the ‘spell’ of the acted reality scenario and actors had to work particularly hard at drawing the GPs back into the performance at these points, often using a moment of exaggerated body language such as leaning in towards the GP to regain their attention. 
After the consultation, some GPs remarked that if the patient had been ‘real’ they might also have asked to make further physical examinations, for example of the feet, as well as the hands of the person with arthritic symptoms, but that they had proceeded without doing so on the basis that seeing the hands was enough. One GP broke into an ‘aside’ at the point of the physical examination. When presented with the photo, he stepped completely out of role,
looked away from the patient, the photo and the computer and addressed the camera, saying ‘this shows…… and of course if this were real I would also ask to examine……’. GPs sometimes followed unexpected lines of questioning, requiring the actors to think on their feet and ad lib an unscripted answer. We advised them to be vague if they were unsure of the best response, and this worked well as patients in real life sometimes give vague or imprecise answers to questions. In one early session, a GP asked the actor-patient with arthritic symptoms what the pain felt like. We had not scripted for this. She replied ‘I’m not really sure how to describe it’ – which is often the case in real life. Actors described emotional effects from the consultation process, and after the performance was complete, they still ‘felt’ the symptoms they had been simulating and expressed their feelings about the consultation. They needed several minutes to get ‘out of role’ and start to be themselves again. Doctors may also need time to step out of role in order to be ready to reflect on their experiences. Time for this must be built in to study design. Using professional actors in research such as this does carry costs, however. Roleplay is a specialist discipline within acting and adding actors to the multidisciplinary team takes time, money and effort. There is extra organisational work in getting actors, technicians and researchers in place in pressurised real world locations where setting aside rooms and doctors’ time is difficult and locations are not perfect.
4 Conclusions In this paper, we have described and reflected upon the process of developing and using a novel acted reality method of working with professional role-play actors, in our case as part of a large multidisciplinary research programme that aims to extract useful information from free text entries in primary care EPRs. We wanted to understand working practices but sought to avoid both the controlled and inauthentic scenarios offered in laboratory-based research and the arguably more authentic but much more ‘messy’ scenarios observed in traditional ethnographic studies. Working with actors to build and repeat acted reality scenarios was devised as a method that would enable us to control the ‘patient’ aspects of the consultation in order to focus on the doctors’ working practices and identify factors shaping documentation practice that are not related to differences in patients and their symptoms. Our experience of developing and using the acted reality approach suggests that drawing on the experience and skills of professional role-play actors, using a combination of dramatic methods, has much to offer in HCI research. Professional actors, experienced in role-play, can enrich scenarios and bring personas to life. Further, the actors’ professional skills play a crucial role in engaging the ‘users’ of interest (here, GPs), in the performance, helping them to play their part, alerting them to moments where they step out of role, and taking steps to repair breaks in fidelity. Our reflections suggest that, as computer systems become pervasive and often function in settings where there is more than one individual involved in interaction of interest, acted reality scenarios are a valuable method to help facilitate a degree of control over that interaction without resorting to a fully-controlled laboratory-based research design. In the acted reality scenarios described here, the actors are largely
controlled, allowing the research focus to remain on the doctors. This could be very valuable elsewhere in HCI research where there are multiple individuals involved in interaction, but where the behavior of one type of user is of particular interest. Many actors specialise in role-play in the medical domain, and there is no reason why some should not develop specialised acting skills that would be applicable in other HCI research domains. Our experience with acted reality scenarios also has implications for virtual patient design. For example, minor slips and errors and some vague responses could be built in to such systems to be used as adlibs or in response to unexpected actions by the user, without breaking with the storyline. In this paper, we have reflected only on the process of developing and using acted reality scenarios in HCI research on electronic patient record use. On-going analysis is being conducted to assess how far this approach has added to our overall understanding of these practices. In particular, data from these acted reality scenarios will be compared with data collected in consultations with real patients to see if and how documentation and recoding practices differ. The use of acted reality scenarios also raises interesting questions about the potential for a particular type of ‘Hawthorne effect’ (where people alter behaviour because they are aware of being observed). For example, it would be interesting to know whether actors’ presence in the consultation has an effect over and above any observation carried out by researchers via video recordings. Such knowledge will help contextualise any claims we might make on the basis of data derived from acted reality scenarios. Acknowledgements. This work is approved by NHS Ethics Committee Brighton West (REC 09/1111/45 SRG/04/05/39 and supported by The Wellcome Trust (086105). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Sussex NHS Research Consortium & PCRN-SE; all participants; the project team from Brighton and Sussex Medical School and Universities of Sussex and Brighton; and collaborators from PLAYOUT, University College, London, and the GPRD.
References 1. Ackroyd, J.: Applied Theatre: Problems and Possibilities. Applied Theatre Researcher 1 (2000) 2. Berg, M., Goorman, E.: The contextual nature of medical information. Int. J. Medical Informatics 56, 51–60 (1999) 3. Brandt, E., Grunnet, C.: Evoking the future: drama and props in user centered design. In: Proc. Participatory Design Conference, CPSR (2000) 4. Bosse, H., Nickel, M., Huwendiek, S., Jünger, J., Schultz, J., Nikendei, C.: Peer role play and standardized patients in medical training. BMC Medical Education 10, 27 (2009) 5. Carroll, J.M.: Making Use: Scenario-Based Design of Human-Computer Interactions. MIT Press, Cambridge (2000) 6. Cooper, A.: The Inmates are Running the Asylum. In: SAMS (1999) 7. Crabtree, A., Rodden, T.: Hybrid ecologies: understanding interaction in emerging digitalphysical environments. Pers. Ubiq. Computing 12, 481–493 (2008) 8. Fitzpatrick, G.: Integrated Care and the Working Record. Health Informatics J. 10, 251– 252 (2004)
9. Greatbatch, D., Heath, C., Campion, P., Luff, P.: How do desk-top computers affect the doctor-patient interaction? Family Practice 12(1), 32–36 (1995) 10. Light, A., Weaver, L., Healey, P., Simpson, G.: Adventures in the Not Quite Yet: using performance techniques to raise design awareness about digital networks. In: Proc. DRS, Sheffield (July 2008) 11. Medbiquitous virtual patient working group website, http://www.medbiq.org/working_groups/virtual_patient/ index.html (accessed April 7, 2011) 12. Newell, A., Morgan, M, Gregor, P. Carmichael, A.: Theatre as an intermediary between users and CHI designers. In: CHI 2006, pp. 111–117 (2006) 13. Newman, W., Button, G., Cairns, P.: Pauses in doctor-patient conversation during computer use: The design significance of their durations and accompanying topic changes. Int. J. Hum.-Comput. Stud. 68(6), 398–409 (2010) 14. Nicholson, A., Tate, A.R., Koeling, R., Cassell, J.: What does validation of cases in electronic record databases mean? The potential contribution of free text. Pharmacoepidemiology & Drug Safety 20(3), 321–324 (2011) 15. Oulasvirta, A., Kurvinen, E., Kankainen, T.: Understanding contexts by being there: case studies in bodystorming. Pers. Ubiq. Computing 7(2), 125–134 (2003) 16. RAE Privacy And Prejudice EPR views of young people (2010), http://www.raeng.org.uk/news/publications/list/reports/ Privacy_and_Prejudice_EPR_views.pdf (accessed April 7, 2011) 17. Salazar, V.L.: Reducing the effort in the creation of new patients using the virtual simulated patient framework. In: WMED IEEE Workshop, pp. 764–769 (2009) 18. Seland, G.: System designer assessments of role play as a design method: a qualitative study. In: Proc. NordiCHI 2006, vol. 189, pp. 222–231. ACM, New York (2006) 19. Stott, D.: Learning the Second Way. BMJ 335, 1122–1123 (2007) 20. Strasberg, L.: A Dream of Passion, http://www.leestrasberg.com/ (accessed April 7, 2011) 21. Tate, A.R., Alexander, M.G.R., Murray-Thomas, T., Anderson, S.R., Cassell, J.A.: Determining the date of diagnosis – is it a simple matter? The impact of different approaches to dating diagnosis on estimates of delayed care for ovarian cancer in UK primary care. BMC Medical Research Methodology 9(42) (2009); BMJ Open, doi:10.1136/bmjopen-2010-000025 22. Ticka, P., Vouri, R., Kaipainen, M.: Narrative Logic of enactive cinema. Digital Creativity 17(4), 205–212 (2006) 23. Ünalan, P., Uzuner, A., Çifçili, S., Akman, M., Hancıoğlu, S., Thulesius, H.: Using theatre in education in a traditional lecture oriented medical curriculum. BMC Med. Educ. 9(73) (2009) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2803161/ (accessed April 7, 2011)
Exercise Support System for Elderly: Multi-sensor Physiological State Detection and Usability Testing Jan Macek and Jan Kleindienst IBM Czech Republic V Parku 2294/4 Prague, Czech Republic {jmacek2,jankle}@cz.ibm.com
Abstract. We present an interactive system for physical exercise of older people and provide results of a usability study with target user group. The system motivates an elderly person to do regular physical activity based on an easy exercise in a monitored environment without a direct supervision from care-givers. Our system employs multi-modal interface including speech synthesis and speech recognition, as well as distance measurement using an ultrasound range finder. The system coaches the user through a sequence of body movements in the exercise utilizing an underlying human activity model. For evaluation of the performance of the user we present a statistical human activity model to estimate physical load of the user. The system tracks user load by monitoring heart rate and by scanning movement patterns using statistical estimators. At well-defined moments and when the scanning suggests there is a problem with the user, the user is asked to verify his ability to continue with the exercise. The system was tested on a set of elderly users to gather usability data and to estimate the acceptance of the system. While simplicity of the setup proved to work well for the users, suggestions for further extensions of the system were gathered. Usefulness of the concept was verified with a physiotherapist.
The system is composed of Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) components, as well as of an ultrasound sensor for distance measurements and a heart rate monitor. The advantage of the ultrasound sensor is the reduced need for costly pre-processing of the input signal, typically required for video input, before it is used in the actual system. The system is designed with high modularity in mind in terms of adding sensors and inputs to monitor the user. To make the addition of new sensors to the system easy, we use a classifier fusion approach in the user state model. In the last few years, the area of exer-gaming has become highly active, with devices like the Wii and Xbox offering new types of controllers and games controlled by whole-body movements. Although these devices have attracted interest among older users [1], they do not provide games tailored to their specific needs. There exists a system, SilverFit [2], tailored directly to the needs of the elderly. This system uses a 3D vision system and comes with a rather high price tag. We approach the problem with an expandable set of inexpensive and low processing cost sensors. Related work in the field of multimedia includes Lin et al. [3], who propose a meta-classifier approach based on classification of composed feature vectors. These vectors combine the predictions of classifiers from various modalities. Also, the approach copes with the asynchronous nature of the outputs from the classifiers. In medical care, wireless sensor networks have been developed [4]. Patients are equipped with wireless vital sign sensors to allow caregivers to monitor their status. The present paper expands on the work in [5] in the following directions. It adds the heart rate sensor to the inputs of the system. It evaluates various classifiers for the expanded set of attributes. And finally, it presents a more thorough usability study of the expanded system with the target user group. In the first part of the present paper, we demonstrate the performance of a selected set of features based on observation of the user’s movements using the ultrasound range finder and of the user’s heart rate. We use these features as inputs to four types of classifiers and we compare their performance. This allows us to draw conclusions on the usefulness of the measured variables in the framework of the user’s physiological state estimation. Further, the comparison of single and fused classifier performances allows us to show their value for the modular user activity model. In the second part of the paper, we present a usability study of our system on a target user group that evaluates various aspects of the design as well as its overall acceptance.
2 Technical Realization We present an extension of the system PHEASY [5], an interactive multi-modal exercise support system for elderly. We add a heart rate monitor to the setup which combines voice technology with a distance measurement sensor. This setup allows the user to interact with the system primarily using body movements, leaving voice as a complementary modality of interaction used when the system needs to verify the user state. An avatar-based text-to-speech interface is used to improve the naturalness of the interaction. The heart rate monitor improves recognition of the physiological state of the user which is then reflected by the system during the exercise.
Fig. 1. The PHEASY system in action
Fig. 2. Simplified state model of the exercise system
We used an ultrasound sensor (SRF08 Ultra sonic range finder) to measure the posture height of the user during the exercise. Distance measurements were sampled at 5Hz, a rate sufficient to provide enough information about the subject’s activity to the model. Further, we used a heart rate monitor that sampled the heart rate of the user every two seconds. The synchronization of the sonar and heart rate data was done by reading the variables at the same time (every 200ms) while the subroutine handling the reading from the heart rate sensor refreshed the underlying value every 2 seconds. We used the same microphone (AKG400) that was used in training of the acoustic model for the automatic speech recognition (ASR) in the Embedded ViaVoice system (EVV) [6]. For the speech interface, we used the EVV text-to-speech (TTS) engine embedded in a talking head avatar [7]. The interaction of the components and the logical control of the system were implemented in the Conversational Interaction Management Architecture (CIMA) dialog framework [8, 9]. The system setup with a user in action is shown in Figure 1. The logical design of the system is presented in Figure 2. The calibration of the ultrasound sensor is crucial for well defined transitions between the states of the exercise. In the start, the system performs calibration of the chair height and goes to the "idle" state waiting for a user. When the user comes and sits (natural to do in our setup) the system goes to the "user welcome" state and records height of the seated user while playing the welcome message and instructions to the user. Based on the two measured heights, the system is able to detect the body postures as well as the event of user leaving the system. After the welcome and instructions are played to the user the system moves to the state "user down" and starts the cycle of prompting the user to perform respective posture change and waiting for the user to finish the change. This forms the main line of the exercise resulting in switching between the two states "user down" and "user up". During switching between the states, the switch durations and state dwelling durations are recorded. These values then translate to input features used by the diamond highlighted “check load” box to detect "high load"
state of the user; if such an event occurs the system moves to the state "unwell". Here the user is asked whether he is feeling unwell and whether the exercise should terminate. If the user confirms, the exercise terminates; if the user indicates that he feels fine, the estimated state "unwell" is overruled and the exercise continues. When the user feels unwell or the final iteration has been reached, the system goes to the state "user finished". The goodbye prompt is played and, after the user leaves, the state "idle" takes over.
User Activity Model. The cornerstone of the system is the user activity model. It is used to predict the user's state based on the input features. The input features are extracted from the distance measurements and heart rate monitor readings during the exercise. In our experiments we use the following input features: sit-down duration/average sit-down duration ratio, stand-up duration/average stand-up duration ratio, sitting/standing duration, difference of going-up duration from average, difference of going-down duration from average, sit-down/stand-up ratio, heart rate, and heart rate difference from the user's average. The durations in the input features are measured from the moment the avatar prompts the user with the respective instruction until the user starts moving. The load estimation task is the highlighted diamond box in Figure 2. A classifier is used to estimate the user's physiological load and to classify it into the two classes "high" and "low".
Classifier fusion. Classifier fusion techniques combine classifiers that operate on distinct inputs. They work on a higher level of generalization than data fusion and feature fusion techniques (which we label "all attributes" in our result tables). The method of classifier stacking combines classifiers by training a meta-classifier that uses the outputs of the partial classifiers to give the final classification. Classifier ensemble methods use classifiers with identical inputs which are trained using various types of data reweighting, resampling or bootstrapping and which are then assigned weights in the final ensemble.
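Before turning to the evaluation, the following minimal sketch summarizes the control flow described in this section. The state names follow Figure 2, but the sensor and classifier interfaces, the timing, and the transition details are our own simplifying assumptions, not the actual CIMA-based implementation.

```python
import time

def run_exercise(sensor, classifier, iterations=10):
    """Simplified control loop for the sit/stand exercise (cf. Fig. 2).
    `sensor` is assumed to expose posture(), user_present(), features() and
    ask_unwell(); `classifier` maps a feature vector to 'high' or 'low' load."""
    state, done = "idle", 0
    while True:
        if state == "idle":
            if sensor.user_present():
                state = "user welcome"          # play welcome and instructions
        elif state == "user welcome":
            state = "user down"                 # user is seated after the welcome
        elif state in ("user down", "user up"):
            # Prompt the posture change and wait until it is completed
            target = "standing" if state == "user down" else "sitting"
            while sensor.posture() != target:
                time.sleep(0.2)                 # variables polled every 200 ms
            done += 1
            if classifier(sensor.features()) == "high":
                state = "getting feedback"      # estimated overload: verify verbally
            elif done >= iterations:
                state = "user finished"
            else:
                state = "user up" if target == "standing" else "user down"
        elif state == "getting feedback":
            if sensor.ask_unwell():             # user confirms feeling unwell
                state = "user finished"
            else:                               # estimate overruled, continue
                state = "user up" if sensor.posture() == "standing" else "user down"
        elif state == "user finished":
            break                               # play goodbye, then back to idle
```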
3 Evaluation
In this section we present experimental results which verify the possibility of recognizing various levels of physical load of the user. The physiological load is estimated using statistical methods based on outputs from the distance sensor and from the heart rate monitor. All experiments were performed using a group of younger users, different from the target user group (60+ years). Data for the target user group will be collected over a longer term once the system is initially tuned. To evaluate the performance of the proposed method, we collected a dataset of feature readings on eight subjects in their thirties. Two types of data were collected for this user group. In the first fold, the subjects came afresh to our measuring device and their performance during the exercise was recorded via the sonar distance measurements and heart rate sensor. In the second fold, the users were asked to perform physical exercises to achieve a significant load on the cardiovascular system prior to the exercise. As a result we obtained a dataset containing two classes of examples, the “low load” class and the “high load” class. We collected a total of 166 examples: 76 cases of the class “low load” and 90 cases of the class “high load”.
Table 1. Results of the baseline classification experiments. Precision, recall and F-measure shown per class. All attributes used.

Classifier                     | Accuracy | Precision (high/low) | Recall (high/low) | F-measure (high/low)
Random forest                  | 82.5%    | 0.802 / 0.862        | 0.900 / 0.737     | 0.848 / 0.794
C4.5 (decision trees)          | 78.9%    | 0.796 / 0.781        | 0.822 / 0.750     | 0.809 / 0.765
AdaBoost with decision stumps  | 76.5%    | 0.763 / 0.768        | 0.822 / 0.697     | 0.791 / 0.731
Bagging with REPTrees          | 72.9%    | 0.742 / 0.712        | 0.767 / 0.684     | 0.754 / 0.698
Naïve Bayes                    | 64.5%    | 0.607 / 0.905        | 0.978 / 0.250     | 0.749 / 0.392
Experiments. To verify our hypotheses (1) about the separability of the two classes in the data and (2) about the improvements in classification accuracy from adding the distance-based input features, we performed experiments by constructing the following types of classifiers. Initially, we trained two separate classifiers on the two types of data; i.e. one classifier just used the distance readings and the other one used heart rate readings. Next, we performed data fusion on the attribute level by merging the heart rate and sonar distance attributes into a single vector. Using this new merged data, we trained a combined classifier. Finally, we applied the classifier fusion technique. Here, we took the two classifiers trained on the two separate datasets and combined them using stacking into a single fused classifier [10]. As an addition and alternative to the previous schemes, we also experimented with an attribute selection algorithm that, using Best First search [11], selects a subset of attributes used to train a classifier. In the experiments we used WEKA [12] to train and evaluate the classifiers. As the base classifier, we used random forests, which performed best in comparison to the other classifiers. In these classifier selection trials we compared Random Forests (10 trees built on 5 random attributes) to Bagging with REPTrees (Reduced Error Pruning decision trees), C4.5 (decision trees), AdaBoost with Decision Stumps, and Naïve Bayes. The results of the baseline experiments are shown in Table 1.
Experimental Results. Table 2 shows the results for classification experiments with the selected type of base classifier over the range of attribute selection and classifier combination techniques. The results were obtained using 20-fold cross validation run on all 166 instances available. The attributes collected proved able to separate the two load levels. Taking the random forest built on all attributes as a baseline, the performance of a random forest classifier drops when it uses only the sonar attributes and improves when it is built on the heart rate attributes. Further improvement is achieved when using the Best First search for attribute selection and the random forest classifier (labeled as “attribute selection” in Table 2). Finally, the best performance was achieved with the stacking technique. Initially, two classifiers were built on the exclusive sets of sonar and of heart rate attributes. Then they were combined into a single classifier. The top performance of 4.3% error rate was achieved when logistic regression was used as the meta-classifier.
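For illustration, the stacking configuration described above can be sketched as follows. The original experiments used WEKA, so this scikit-learn version is only an analogous setup, and the feature column layout, data, and hyperparameters are placeholders; the structure, however, mirrors the best-performing configuration: one random forest restricted to the sonar attributes, one restricted to the heart-rate attributes, combined by a logistic regression meta-classifier and evaluated with 20-fold cross-validation.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# X: one row per exercise example; columns 0-5 stand in for the sonar-derived
# duration features, columns 6-7 for the heart-rate features (placeholder layout).
# y: 'high' / 'low' physiological load labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(166, 8))                 # stand-in for the real dataset
y = rng.choice(["high", "low"], size=166)

def forest_on(columns):
    """Random forest restricted to a subset of feature columns."""
    select = ColumnTransformer([("keep", "passthrough", columns)])
    return make_pipeline(select, RandomForestClassifier(n_estimators=10, random_state=0))

stack = StackingClassifier(
    estimators=[("sonar", forest_on([0, 1, 2, 3, 4, 5])),
                ("heart_rate", forest_on([6, 7]))],
    final_estimator=LogisticRegression(),
)
scores = cross_val_score(stack, X, y, cv=20)  # 20-fold CV as in the paper
print(f"mean accuracy: {scores.mean():.3f}")
```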
Table 2. Results of the classifier and attribute combination experiments. Precision, recall and F-measure shown per class.

Classifier                                                   | Accuracy | Precision (high/low) | Recall (high/low) | F-measure (high/low)
Random forest, all attributes                                | 82.5%    | 0.802 / 0.862        | 0.900 / 0.737     | 0.848 / 0.794
Random forest, HR only attributes (1)                        | 88.5%    | 0.890 / 0.880        | 0.900 / 0.868     | 0.895 / 0.874
Random forest, sonar only attributes (2)                     | 78.9%    | 0.772 / 0.815        | 0.867 / 0.697     | 0.817 / 0.752
Random forest, attribute selection                           | 85.5%    | 0.837 / 0.882        | 0.911 / 0.789     | 0.872 / 0.833
Stacking of (1) and (2), C4.5 meta-classifier                | 90.9%    | 0.941 / 0.877        | 0.889 / 0.934     | 0.914 / 0.904
Stacking of (1) and (2), logistic regression meta-classifier | 95.7%    | 0.966 / 0.948        | 0.956 / 0.961     | 0.961 / 0.954
4 Usability Evaluation
This section summarizes responses in a usability test performed on 13 users from the target user group (distinct from the group of users used in the previous section). The users chose values from semantic differential scales with an even number of options. The answers were clustered in the summary plots (Figures 4 to 12) for better legibility. The age distribution is shown in Figure 3. Most of the users were in their sixties. The users had no significant impairments. A few had minor hearing and vision problems. We asked the users if the exercise introduction and explanation were intelligible. At this point, the user was welcomed to the system and a detailed description of the exercise was presented. The explanation intelligibility is shown in Figure 4. The combination of spoken and animated UI elements proved to be confusing to some users when they focused solely on the video animation. This caused decoupling of their movement from the spoken guidance. Sometimes, the user started the exercise without waiting for the audio explanation. Although the system reacted properly to the movements of users, sometimes the commands were cut short due to the switch to the following state of the exercise. Further, we looked at the acceptance of the two modalities of navigation through the exercise. The subjective usefulness of the two modalities is presented in Figures 5 and 6, respectively. The comparison of the two modalities is in favor of the animation of the exercise, as indicated by 85% of the users. An interesting point was the speed of the exercise. The command to start the following movement is always issued immediately after the user reaches a particular state. This immediacy had a strong influence on the perception of the exercise speed. The subjective speed of the exercise is shown in Figure 7. Related to this issue is the perceived speed of synthesized speech as shown in Figure 8. For both qualities, we observed that most of the users felt the speeds were faster than they would prefer.
Figs. 3–11. Summary plots of the questionnaire responses (plots not reproduced): age distribution (Fig. 3), intelligibility of the exercise explanation (Fig. 4), subjective usefulness of the two navigation modalities (Figs. 5 and 6), perceived speed of the exercise (Fig. 7), perceived speed of the synthesized speech (Fig. 8), enjoyment of the exercise, attitude towards computer-guided exercise, and willingness to use the system at home (Figs. 9–11)
We asked the users if they enjoyed doing the exercise with the system and if they liked the general idea of performing exercise guided by a computer (Figure 9 and 10), and also if they would consider using the system in their homes (Figure 11).
5 Discussion and Conclusion
Our first goal was to validate the usefulness of the sonar range finder measurements of user movement and the output of the heart rate monitor for classification of the user's physiological load. To prove the usefulness of our approach, we demonstrated the performance of five types of classifiers trained on this type of data. The presented results support the addition of these combined modalities to the inputs of the user activity model. The best performance of 95.7% accuracy was achieved with the stacking technique, with logistic regression as the meta-classifier and random forests, trained on separate datasets for movement and for heart rate, as base classifiers. This allows us to combine the two modalities, the user's body movements and heart rate, as inputs to the user activity model. The usability study showed good acceptance of the system by the target user group. Although the nature of the exercise is straightforward, it was found to be an enjoyable way to exercise by most of the users. During the usability testing we had a chance to discuss with a professional physiotherapist the usefulness of our system for her daily work. The concept of the system was welcomed mainly due to its focus on the movements of
daily life. A suggestion was given to extend the system with further exercises with a similar focus on daily activities, and use in the company of other users was proposed. In future work, we will focus on dialog development as the cornerstone of fluent interaction. Further, improvements of the user activity model with larger datasets from real use will add to the system's performance.
Acknowledgments. We would like to acknowledge the support of this work by the European Commission under the IST FP6 integrated project NetCarity, contract number IST-2006-045508.
References 1. Theng, Y.-L., Dahlan, A.B., Akmal, M.L., Myint, T.Z.: An exploratory study on senior citizens’ perceptions of the Nintendo Wii: the case of Singapore. In: Proc. of i-CREATe 2009 (2009) 2. SilverFit (2011), http://www.silverfit.nl/ 3. Lin, W.-H., Jin, R., Hauptmann A.: Meta-classification of Multimedia Classifiers. In: Proc. of Int’l Workshop on Knowledge Discovery in Multimedia and Complex Data, Taipei, Taiwan (May 6, 2002) 4. Shnayder, V., Chen, B., Lorincz, K., Fulford-Jones, T.R.F., Welsh, M.: Sensor Networks for Medical Care. TR-08-05, Div. of Engineering and Applied Sciences, Harvard University (2005) 5. Macek, J., Kleindienst, J.: PHEASY – Physical exercise assistance system - Evaluation and Usability Study. In: IADIS Interfaces and Human Computer Interaction 2010 (2010) 6. IBM. Embedded ViaVoice (2010), http://www-01.ibm.com/software/pervasive/embedded_viavoice/ 7. Kunc, L., Kleindienst, J.: ECAF: Authoring language for embodied conversational agents. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS (LNAI), vol. 4629, pp. 206–213. Springer, Heidelberg (2007) 8. Cuřín, J., Kleindienst, J., Kunc, L., Labský, M.: Voice-driven Jukebox with ECA interface. In: Proc. of 13th International Conference "Speech and Computer" SPECOM 2009 (2009) 9. Potamianos, G., Huang, J., Marcheret, E., Libal, V., Balchandran, R., Epstein, M., Seredi, L., Labsky, M., Ures, L., Black, M., Lucey, P.: Far-field Multimodal Speech Perception and Conversational Interaction in Smart Spaces. In: Proc. of the HSCMA Joint Workshop on Hands-free Speech Communication and Microphone Arrays (2008) 10. Ghahramani, Z., Kim, H.-C.: Bayesian Classifier Combination. Gatsby Technical Report. University College London, UK (2003) 11. Hall, M.A., Smith, L.A.: Practical feature subset selection for machine learning. In: McDonald, C. (ed.) Proceedings of ACSC 1998, Perth, February 4-6, pp. 181–191 (1998) 12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)
Estimating the Perceived Difficulty of Pen Gestures Radu-Daniel Vatavu1, Daniel Vogel2,3, Géry Casiez2, and Laurent Grisoni2 1
Abstract. Our empirical results show that users perceive the execution difficulty of single stroke gestures consistently, and execution difficulty is highly correlated with gesture production time. We use these results to design two simple rules for estimating execution difficulty: establishing the relative ranking of difficulty among multiple gestures; and classifying a single gesture into five levels of difficulty. We confirm that the CLC model does not provide an accurate prediction of production time magnitude, and instead show that a reasonably accurate estimate can be calculated using only a few gesture execution samples from a few people. Using this estimated production time, our rules, on average, rank gesture difficulty with 90% accuracy and rate gesture difficulty with 75% accuracy. Designers can use our results to choose application gestures, and researchers can build on our analysis in other gesture domains and for modeling gesture performance. Keywords: gesture-based interfaces, pen input, gesture descriptors.
reasons, researchers have proposed a second strategy using formal user studies for participatory design and gesture set evaluation [2,16,17,19,25]. Involving users in any design process is a good idea, but the effort to plan, run, and analyze these kinds of studies is large compared to using a predictive model. We offer a practical solution in-between a model and a user study. Based on an estimate of actual production time, we found that designers can reasonably estimate user's perceived gesture execution difficulty. The notion of difficulty encompasses multiple criteria including the ease with which a gesture may be learned, remembered, and performed. This notion of difficulty has been mentioned in previous work [16,17,25], but there has been no previous attempt to examine it in detail or estimate it. In an experiment using single stroke pen gestures, we elicited a difficulty classification rating and a relative difficulty ranking from participants. Based on data from a second validation experiment, our results show that the difficulty ranking can be predicted with greater than 93% accuracy using measured production time and 87% using the Isokoski first-order predictive production time model [8]. Using a Bayes classification rule and measured production time, we can also classify the difficulty rating with 83% accuracy. Since the times predicted by the CLC predictive model [4] reduced the accuracy of our classification rule to 25%, we analyzed an alternative approach. We found that production time can be reasonably estimated by gathering a few samples of actual production time – a set of data which may already exist for the purpose of training a gesture recognizer. With three people supplying three gesture samples, our classification rule achieved 75% accuracy on average and increased the average accuracy of the estimated difficulty ranking to 90%. Our findings that gesture difficulty can be predicted from production time, together with our results regarding the reasonable estimation of production time based on a very small set of data, provide designers with a general measurement encompassing multiple criteria to assess gesture sets without a full formal user study.
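As a concrete illustration of the workflow proposed here — estimate production time from a handful of samples per gesture and use it to rank gestures by expected execution difficulty — the following sketch averages the measured times and sorts the gestures accordingly. The gesture names and timing data are hypothetical, and the paper's actual five-level Bayes classification rule is not reproduced.

```python
from statistics import mean

def rank_by_difficulty(samples):
    """samples: dict mapping gesture name -> measured production times (seconds),
    e.g. 3 samples from each of 3 people. Returns gestures ranked from easiest
    (shortest mean production time) to hardest."""
    mean_times = {g: mean(times) for g, times in samples.items()}
    return sorted(mean_times, key=mean_times.get)

# Hypothetical timing data (seconds) for three candidate gestures
samples = {
    "circle":   [0.61, 0.58, 0.64, 0.60, 0.57, 0.63, 0.59, 0.62, 0.60],
    "star":     [1.42, 1.35, 1.50, 1.47, 1.39, 1.44, 1.41, 1.48, 1.38],
    "question": [0.95, 0.90, 1.02, 0.97, 0.93, 0.99, 0.96, 1.01, 0.94],
}
print(rank_by_difficulty(samples))   # ['circle', 'question', 'star']
```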
2 Previous Work
Creating a successful gesture-based interface is challenging. Once a vocabulary of gestures moves beyond a small set of directional strokes, it becomes more difficult to learn, remember, and use [11]. Techniques exist which assist with recall and help to transition users from novice to expert: examples include crib-sheet diagrams [11] and dynamic path guides [3]. While these techniques are effective, they assume that a good gesture set has already been created.
2.1 Gesture Design Tools
One way to make the designer's job easier is to use a gesture design tool. An example is Appert and Zhai's Stroke Shortcuts Toolkit [1] which includes a simple tool with a predefined dictionary of stroke primitives. The hope is that a designer's creativity is stimulated with a "structured design space that can be systematically explored". Long et al.'s Quill gesture design tool [13,14] goes further by providing metrics to help designers evaluate potential gesture sets. The metrics relate to recognition rate, and are conveyed through values such as classification distance or visualized as confusion
matrices. Ashbrook and Starner's MAGIC tool [2] introduces gesture goodness as a metric. In an evaluation, this seemingly abstract metric was useful as a quantitative guideline compared to a specific breakdown of individual measures (such as interclass variability graphs). However, goodness is also closely related to recognition rate. Although participants were also asked to design gestures that would be easy to remember, perform, and be socially acceptable, MAGIC, like Quill, does not provide any quantitative feedback for these criteria. 2.2 Models Producing quantitative measurements to represent other criteria requires predictive models. For example, Long et al. [15] developed a model for predicting the perceived visual similarity of two gestures. Their model was generated by selecting a subset of geometric and dynamic features of gesture trajectories, and looking for a correlation with experimentally determined user rated visual similarity. The final model could predict visual similarity of two gestures reasonably well (correlated R2=.56 with ground truth). One application is increasing recognition rate by avoiding ambiguous gestures, but the authors also argue that a visual similarity metric may be used to improve a gesture's fit with its function. For example, designers could assign visually similar gestures to similar operations (such as scroll up and scroll down), and dissimilar gestures to more abstract tasks such as cut and paste. Isokoski [8] introduced a model to predict the relative ordering of gesture production times based on geometric complexity. The model sums the minimal number of straight segments needed to maintain a human recognizable shape in the gesture. This sum is interpreted as a complexity number and can be used as a firstorder ranking of gesture production time: the model ranked production times of Unistroke characters with R2=.85. Although there is ambiguity in the definition and calculation method, Isokoski's model has the advantage of being conceptually simple. Cao and Zhai's [4] Curves, Lines and Corners (CLC) model goes beyond Isokoski by attempting to predict the actual production time of a single stroke gesture. After decomposing a gesture into curved and straight segments, the model calculates individual production times for curves based on Viviani's 2/3 power law of curvature [22] and a simple power term based on the length of straight lines (no time is calculated for corners, they are only used to segment lines and curves). The authors found that CLC works very well as a first order predictor (correlations with test data had R2>.90), but over- or under-predicted arbitrary gestures times by 30% and overpredicted Unistroke and Graffiti gestures by more than 40%. Castellucci and MacKenzie also noted this type of performance for CLC [5]. Cao and Zhai attribute this behaviour to the model's inability to compensate for unfamiliar and little practiced gestures, or familiar and well-practiced gestures. 2.3 User Studies Rather than rely on predictive models, researchers have suggested that user studies should be used to assist in the design and evaluation of gesture sets. For example, Nielsen et al. [17] provide a user-centered procedure to design whole-hand gestures. The procedure requires two user studies, an initial study to gather user input to inform
design and a subsequent study to evaluate. In a case study application, they report they were able to obtain a good gesture set, but the procedure was very time consuming. Also, key stages such as the generation of scenarios must be carefully prepared or else results may be substandard. Wobbrock et al. [25] take a participatory design approach by eliciting a gesture set from users. Using wizard-of-oz techniques, they asked users to mimic the best multi-touch gesture to match a demonstrated action such as scale, rotate, move, etc. The study, as well as a follow-up [16], also gathered rankings for each candidate gesture's intuitiveness and ease-of-execution. Perhaps surprisingly, the authors report that gestures which experienced designers propose are not always preferred by users [16].

2.4 Summary

Ideally, the best way to design an intuitive and easy-to-perform gesture set is to involve users, as Nielsen et al. [17] and Wobbrock et al. [25] do, since even experienced designers cannot predict user preference [16]. But, faced with the large amount of effort required to plan, run, and analyze these studies [17], perhaps there is a way for designers to evaluate candidate designs using predictive models and/or minimal user data. Long et al.'s [15] visual similarity predictive model is interesting since it can guide designers with a gesture's fit with a function. Isokoski [8], and Cao and Zhai [4], have made progress towards estimating actual gesture production time, a measure which should directly relate to how efficient a gesture is to perform. However, Cao and Zhai [4] and Castellucci and MacKenzie [5] note that production time is a partial function of many factors and therefore an accurate predictive model remains elusive. Inspired by Long et al., as well as Ashbrook and Starner's success with a seemingly abstract post-hoc measure of goodness [2], we focus on a measure of execution difficulty.
3 Experiment 1: Measuring Execution Difficulty

The notion of execution difficulty (or the converse, ease-of-execution) is frequently mentioned [2,14,15,17] and has been measured for multi-touch gestures with post-experiment surveys [16,25], but there has been no attempt to estimate it a priori. Morris et al. associate difficulty with "carrying out the gesture's physical action" [16]. Carrying out an action refers directly to efficiency of performance, but also involves a cognitive process which relates to how easy a gesture is to learn and recall [4]. Thus, execution difficulty is a general quantitative measure which combines multiple design criteria: learnability, recall, and performance. More abstract measures, such as goodness [2] and general preference [26], may include additional criteria (such as social acceptability [19]), but the more general the measure, the more abstract it is, due to more complex relationships among the underlying criteria. The challenge is how to estimate execution difficulty given a candidate gesture or gesture set, with the knowledge that it encompasses criteria which are known to be difficult to predict. In the first experiment, we measure perceived execution difficulty for a set of single stroke gestures. If there is significant agreement across participants, then it is likely to be an intuitive measure suitable for a priori estimation. Using the participants'
movement logs and the geometric gesture shapes, we compute quantitative measures ("descriptors") and test these for correlations with the participants' responses. If well-correlated descriptors exist, and they can be estimated or computed directly, then designers have a way to estimate the perceived difficulty of candidate gesture designs.

Participants
14 right-handed people (3 females) participated in the experiment (mean age 21 years, SD 1). 11 out of 14 participants had no pen-based interface experience.

Apparatus
Gestures were entered using a 17 inch (431 mm) Wacom DTU-710 Interactive Pen Display running at a resolution of 1280 x 1024 px (pixel pitch 0.264 x 0.264 mm) and capable of capturing pen input at 133 Hz. The display was positioned horizontally to approximate a physical pen and paper context. A 2.4 GHz computer ran a C# full-screen application. Participants entered gestures in a 420 x 420 px (110 x 110 mm) square box centered in the display (Fig 1).
Fig. 1. Experiment Application: (a) current gesture to perform; (b) gesture input area; (c) post-entry choice buttons
Task
Each trial began with the path of the current gesture to be entered shown on the left side of the display (Fig 1a). Participants were instructed to enter a continuous stroke for the gesture and to balance speed and accuracy. After performing the gesture, two buttons were enabled, representing a choice between flagging the input as incorrect or continuing to the next gesture. Participants were instructed to flag a stroke as incorrect if the shape they entered was different from the target gesture, or if some accidental input occurred such as the pen slipping or moving unevenly. This was logged as an input error and the participant was asked to re-execute the gesture. Like Wobbrock et al. [25], we wanted our participants to decide whether a gesture was similar to the template, avoiding any confounding effects due to the behavior of a recognizer. As an extra precaution, all participant executions were visually inspected by the authors to confirm that they were correctly entered.
Gesture Set
There were 18 different single stroke gestures (Fig 2). The set contains 9 gestures designed to be familiar (i.e. letters and shapes used in everyday writing) and 9 gestures designed to be unfamiliar (e.g. the twirl-omega and flower shapes may appear familiar, but are unlikely to be practiced as a pen stroke, while steep-hill and triangles-chain are completely new shapes). As discussed earlier, Cao and Zhai [4] argue that familiarity affects actual performance time due to practice. The idea is that a more practiced gesture will result in a lower performance time in spite of high objective geometric complexity. For example, the letter g is a rather complex series of twists and 180-degree turns and would be difficult to reproduce initially; but, with practice, it can be executed very quickly. Since practice also relates to how easy a gesture is to learn and recall, familiarity is likely to relate to execution difficulty. We expected that more familiar gestures would be rated as easier to perform, even if they have high objective complexity.
Fig. 2. The 18 gestures used in the experiment: (a) left 9 designed to be familiar; (b) right 9 designed to be unfamiliar
Design
Each participant executed each gesture 20 times, with the 18 x 20 = 360 gestures presented in random order. The number of repetitions (20) was chosen to be larger than current practice when eliciting gestures from users, be it for training gesture recognizers [26] or for deriving performance models [4]. We purposely did this to ensure motor learning for all gestures so that participants would reach execution automaticity. Participants were allowed to take as many breaks as they wished. The experiment took approximately 40 minutes.

Post-Experiment Questionnaire
After the experiment, participants answered a short questionnaire regarding their perceived execution difficulty when performing the gestures. We gathered this information in two different ways: an individual execution difficulty Rating for each gesture using a 5-point Likert scale; and an ordered Ranking of all gestures according to relative execution difficulty. The 5-point Likert scale rating question (Table 1) was presented as a 5-column table: participants entered ratings for the 18 gestures in any order they chose. Participants were asked to enter the rating by drawing the gesture in the column corresponding to the desired Likert rating. We hoped this would allow
them to re-enact the gesture performance and make visual inspection easy. They could modify previous ratings at any time until they were confident of their final choices.

Table 1. Likert questions used to elicit execution difficulty Rating

Likert rating and associated explanation:
1. very easy to execute: I executed these gestures immediately and effortlessly, with absolutely no need to pay attention
2. easy to execute: I executed these easily, almost without paying attention
3. moderate difficulty: I occasionally paid special attention during execution
4. difficult to execute: I paid special attention with each execution
5. very difficult to execute: I had to concentrate for each execution. There were times when I did not get the right shape from the first attempt
The ordered ranking of all gestures according to ascending execution difficulty was completed after the Likert rating. This enabled participants to use the rating classes to assist with this otherwise difficult task. As before, we asked them to draw the gestures in order to revisit relative differences in difficulty as they completed the ranking. We also asked participants to explain their perception of gesture difficulty: what they found difficult or easy for each gesture execution. Finally, we asked them to identify which shapes they found Familiar (they had seen and practiced before) in order to test our choice for familiar and unfamiliar gestures.
4 Results

We found a high degree of agreement between participant Ratings of execution difficulty (Kendall's coefficient of concordance W=.781, χ2(17)=185.60, p<.001; W lies in [0..1], where 0 denotes no agreement at all and 1 represents absolute agreement). The agreement was even stronger for Ranking, which participants commented was a difficult task (W=.82, χ2(17)=195.17, p<.001). Both coefficients are well above 0.5, indicating that our sample size was appropriate, with a large Cohen effect. As expected, since Rating was designed to be used as a first approximation of Ranking, there was a significant correlation between their medians (ρ(N=18)=.97, p=.01). Fig 3 illustrates the median Rating and Ranking for each gesture. A repeated-measures Friedman's ANOVA was used to test the influence of gesture type (nominal with 18 cases) over Rating and Ranking. The results showed a significant effect of Gesture over both Rating (χ2(17)=185.60) and Ranking (χ2(17)=195.17), at p<.001. Across all 14 participants there were 17 deviations (6.7% of the total responses) from our gesture set's assumed Familiarity. 14 deviations concerned gestures assumed to be unfamiliar: 7 participants found the twirl-omega gesture familiar, 4 reversed-pi, 2 flower, and one participant said the sail-boat and steep-hill were also familiar.
Fig. 3. Left: median gesture Rating (higher Rating values were perceived to be more difficult to execute). Right: median gesture Ranking (higher numerical Ranking for gestures perceived to be more difficult to execute). In both graphs, gestures are ordered by ascending Ranking.
The latter participant also noted that the assumed-to-be-familiar gestures a and g were unfamiliar because the starting point was not in the same location where they usually start those letters. As part of their comments regarding their perception of gesture difficulty, three participants noted the same issue of starting position with a and g, and one participant with 8, but they did not feel this made them unfamiliar. This relates to the problem of allographic variation in handwriting, where individual differences in the formation of character shapes pose problems for handwriting recognizers [21]. Aside from twirl-omega, where Familiarity deviations occurred with half of our participants, our assumed gesture familiarity was reasonable. We could treat these deviations as outliers since they represent less than 4% of the total responses, but where possible, Familiarity-related analysis is based on actual participant responses. The median Ranking and Rating across all familiar and all unfamiliar gestures (Fig 3) are significantly different according to a Wilcoxon signed-rank test (z(N=14)=-3.402, p=.001 for Rating and z(N=14)=-3.400, p=.001 for Ranking, both with large effects, r=.64). These 9 assumed familiar gestures are among the 11 gestures assigned to the easiest Rating levels, and are among the lowest 10 gestures in ascending difficulty Ranking (Fig 3). The twirl-omega and reversed-pi (two of the three contentiously unfamiliar gestures) also share the two easiest median Rating levels, and reversed-pi has the same median ranking as the familiar gesture g.
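To make the agreement analysis above concrete, the following is a minimal sketch (not the authors' analysis code) of how Kendall's coefficient of concordance W and the associated chi-square statistic can be computed from a participants-by-gestures matrix of difficulty ratings; the function names and the use of NumPy/SciPy are our own assumptions, and ties are handled with average ranks without the tie-correction term.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kendalls_w(ratings):
    """Kendall's W for an (m raters x n items) matrix of difficulty ratings.

    Returns (W, chi_square, p_value). W is in [0..1], 0 meaning no agreement
    and 1 absolute agreement, matching the definition used in the text.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape                       # m participants, n gestures
    ranks = np.vstack([rankdata(row) for row in ratings])
    rank_sums = ranks.sum(axis=0)              # R_i for each gesture
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    chi_sq = m * (n - 1) * w                   # chi-square with n-1 degrees of freedom
    p = chi2.sf(chi_sq, df=n - 1)
    return w, chi_sq, p

# Hypothetical usage: a 14 x 18 array of Likert ratings from experiment 1.
# w, chi_sq, p = kendalls_w(rating_matrix)
```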
5 Towards Estimating Execution Difficulty

Given the high agreement of perceived execution difficulty Rating and Ranking in experiment 1, we can search for a way to estimate difficulty in the absence of a formal experiment. Essentially, if a correlation exists with one or more characteristic gesture descriptors, then those descriptors can be used to estimate execution difficulty. We examined many potential descriptors (Table 2): all of Rubine's static geometric descriptors and measured quantities [20], the additional geometric descriptors used by Long et al. [15], the Hu invariant spatial curve moments commonly used in image processing for contours and shapes [18] (p. 606), Isokoski's complexity measure [8], and the production time predicted by Cao and Zhai's CLC model [4].
Table 2. Descriptors (bold indicates significant correlation with Rating or Ranking)

Rubine's set [20]: Geometric
1. Cosine of initial angle (cosine1)
2. Sine of initial angle (sine1)
3. Size of bounding box (bbox size)
4. Angle of bounding box (bbox angle)
5. Distance between first and last points
6. Cosine of angle between first and last points (cosine2)
7. Sine of angle between first and last points (sine2)
8. Total length
9. Total turning angle
10. Total absolute turning angle (turn angle)
11. Sharpness (or energy)

Rubine's set [20]: Measured
12. Production Time (time)
13. Speed

Long et al.'s visual similarity set [15]: Geometric
14. Aspect
15. Total angle traversed / total length
16. Total angle / total absolute angle
17. Distance between first and last points (density1)
18. Size of bounding box (density2)
19. Openness
20. Area of bounding box (bbox area)

Hu invariant moments [18]
21-27. Hu moments 1-7 (Hu1-Hu7)

Model predictions
28. CLC Predicted Production Time [4]
29. Isokoski's complexity measure [8]
The calculation of the Rubine, Long et al., and Hu descriptors is straightforward to apply to the geometric shape of the gesture, given the descriptions and equations in the cited works. We computed these measurements using two representations of geometric gesture shapes. To approximate a design scenario where the gestures have been drawn, but not performed, we used the target gesture shapes displayed in the left panel of our experimental application (i.e. the vector drawings in Fig 1). We will refer to these as geometric descriptors using Drawn representations. We also computed mean descriptors using the actual gesture geometries as performed by the participants in our experiment. Theoretically, this is a best-case scenario for geometric descriptor performance, but with the potential issue of overfitting. We will refer to these as geometric descriptors using Performed representations. Both Drawn and Performed representations were preprocessed similarly to previous work [4,10,26] by normalizing without deformation, centering on the origin, and re-sampling uniformly into n=32 points. To calculate the CLC predicted production time, we used the PlayCLC program (available from http://www.cs.toronto.edu/~caox/PlayCLC/PlayCLC.htm). As noted earlier, the definition and calculation of Isokoski's complexity measure is ambiguous. By studying examples [8] (p. 360), we developed quantitative guidelines to perform the necessary reduction of arcs into line segments: if the angle α inscribed by an arc was greater than 270°, use 3 segments; if α<120°, use 1 segment; otherwise use 2 segments. We could verify these guidelines with our 3 and circle shapes, also included in Isokoski's examples. Note that all descriptors based on geometry are static and will not change with practice. For example, a geometrically complex but familiar gesture such as g may have a lower Rating compared to a geometrically simple but unfamiliar gesture such as sail-boat.
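As an illustration of the preprocessing and of the arc-reduction guideline described above, here is a minimal sketch. It is not the authors' code: the resampling follows the common $1-recognizer style (uniform arc-length resampling, uniform scaling without deformation, centering), which we assume matches the description in the text, and the function names are our own.

```python
import numpy as np

def preprocess(points, n=32):
    """Resample a stroke to n equidistant points, scale uniformly, center on origin."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n)
    resampled = np.column_stack([
        np.interp(targets, cum, pts[:, 0]),
        np.interp(targets, cum, pts[:, 1]),
    ])
    # Normalize without deformation: divide both axes by the larger bounding-box side.
    size = (resampled.max(axis=0) - resampled.min(axis=0)).max()
    resampled /= size if size > 0 else 1.0
    return resampled - resampled.mean(axis=0)   # center on the origin

def isokoski_segments(arc_angle_deg):
    """Number of line segments for an arc, per the guideline in the text."""
    if arc_angle_deg > 270:
        return 3
    if arc_angle_deg < 120:
        return 1
    return 2
```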
Rubine's Production Time and Speed descriptors are measured, i.e., they are computed from data gathered during actual gesture performance, so they include the effects of practice. Of course, using this type of post-hoc measure for a priori prediction seems paradoxical. Our initial rationalization is that some future model may be able to accurately predict these measures (such as an improved CLC model for Production Time), and we show later that the relevant measure of Production Time can be approximated with a very small set of informally gathered user data. All of the potential descriptors in Table 2 were tested for correlations with execution difficulty Rating and Ranking. This was done overall, as well as separately with the familiar and unfamiliar gesture groups. Descriptors with at least one significant Spearman correlation coefficient are listed in Table 3 (for geometric descriptors using Drawn representations in Fig 1) and Table 4 (for geometric descriptors using participant Performed representations).

Table 3. Correlations of geometric descriptors using Drawn representations. Spearman correlation of descriptor with median Rating and Ranking in descending order of overall Rating coefficients; coefficients are reported at p = .01 (**) and p = .05 (*) significance levels; N = 18 for all, N = 9 for familiar and N = 8 for unfamiliar gestures (twirl-omega was excluded). The largest coefficient in each column is shown in bold text (two bold coefficients in the same column are not significantly different). Descriptors listed: bbox size, bbox area, length, cosine2, density1, Hu2 (coefficient values not reproduced in this copy).
Table 4. Correlations of geometric descriptors using Performed representations. Correlations reported as in Table 3. Descriptors listed: time, speed (negative), length, bbox size, Isokoski, bbox area, density2, turn angle, Hu2, CLC, aspect, cosine2, density1, energy (coefficient values not reproduced in this copy).

Production time has the highest correlations with Rating and Ranking overall; and, in all but one case, it is among the highest correlations when tested separately with the familiar and unfamiliar gesture groups. Speed had the second highest (negative) correlation when all gestures were considered together, but was not significant when tested separately with familiar and unfamiliar. Note that there is evidence that production time should be scale invariant. Viviani and Terzuolo [23] found that execution times for single strokes in handwriting are scale invariant. If we accept that a single stroke gesture is similar, then scale invariance should not be problematic. Isokoski [8] also provides additional evidence with his observation that average velocity increases with longer strokes. In many cases, descriptors based on geometry had significantly lower correlation coefficients compared to measured values. An exception is length, which has all significant coefficients in Table 4 and all but one in Table 3. In the case of familiar gestures in Table 4, the coefficients for length, along with density2 and cosine2, are not significantly different from actual production time. Although not significantly highest, Isokoski's complexity and the two bounding box descriptors in Table 4 correlate reasonably well when all gestures were considered together, but are not even significant when tested separately with familiar and unfamiliar. In Table 3, some geometric descriptors such as the two bounding box descriptors and cosine2 correlate very well with unfamiliar gestures. Intuitively, larger gestures may be more complex, and thus more difficult to execute, but the high correlation of cosine2 is surprising. With low N values (8 for unfamiliar and 9 for familiar), there will be fewer significant differences between descriptors. The tendency for geometric descriptors to exhibit higher coefficients in either the familiar or the unfamiliar gesture group is most likely because they cannot adapt to the effect of practice. This is similar to reasons given for the under- or over-estimation behavior of the CLC model [4,5]. Visual inspection of the most promising descriptors provides some intuition for their relative performance in predicting difficulty (Fig 4). Gestures are listed by ascending median Ranking on the horizontal axis, so a monotonic trend would suggest that a descriptor is a good candidate for estimating Ranking. Actual production time ascends almost monotonically with Ranking, demonstrating that gestures rated as being more difficult to execute have a greater production time. The static geometric descriptors for the most part increase with difficulty overall, but irregularities are much more pronounced, suggesting a weaker fit. For example, the letters a, g and m have long lengths, yet they are rated as easy to execute. This again speaks to familiarity: despite objective complexity, practiced gestures are rated with lower execution difficulty. There are also significant correlations between descriptors. Production time is correlated with length (ρ(N=18)=.89, p=.01), and Isokoski's complexity and production time are correlated with length (ρ(N=18)>.70, p=.01). This suggests a partial correlation between these three descriptors, so it is appropriate to test for shared variance. When controlling for production time, the other parameters are no longer significant (p>.05). When controlling for all but production time, production time was still found to be highly correlated with Rating (ρ(N=18)=.73) and Ranking (ρ(N=18)=.67) at p<.01.
Fig. 4. Visual comparison of the four most promising measures and predictors (y-axes) with actual ascending median gesture Ranking (x-axis). A monotonic trend suggests the measure or predictor is a good candidate for estimating Ranking (e.g. mean measured time). NOTE: Error bars in all figures represent 95% CI.
For the familiar and unfamiliar groups, none of the correlations with Rating and Ranking were significant when either variable was controlled during partial correlations. The t-statistic for comparing coefficients [6] showed a significant difference between coefficients for Rating (t(15)=5.92) and Ranking (t(15)=4.02) at p<.01. The poor performance of the CLC predicted production time is somewhat surprising. Previous results found CLC to be highly accurate for first-order predictions when comparing relative ratios of gesture set production times [4,5]. So, we expected it would also perform well with a similar first-order prediction task for execution difficulty Ranking, but it has no significant correlations with Ranking at all. To investigate further, we directly compared the CLC predicted production times to actual production times. For magnitude, we found a significant, but low correlation (R2=.37, p=.01). For relative ranking, we also found a significant, but low Spearman correlation (ρ(N=18)=.53, p=.05). Production time is the best indicator of execution difficulty, but the CLC model is not able to accurately predict performance time for our purposes. So, we continue the development of execution difficulty estimation rules based on actual production time, with the assumption (and caveat) that we are at the moment using a post-hoc measured value. Later, we show that a small sample of data will provide suitable estimations of production time.
6 Difficulty Estimation Rules

We present two rules for estimating execution difficulty based on production time. The first is a simple rule which compares two candidate gestures according to relative
execution difficulty (as Ranking does), and the second uses Bayes' rule to classify a gesture into one of five categories of execution difficulty (such as those provided by the Rating measure).

Rule 1: Relative Difficulty Ranking
Gesture A is likely to be perceived as more difficult to execute than gesture B if the production time of A is greater than that of B:

time(A) > time(B) suggests Ranking(A) > Ranking(B)

To test this rule, we applied it to each pair of gestures (A,B) out of the (18 x 17)/2 = 153 possibilities in experiment 1 using the measured production time, and counted how many times the rule was correct out of the total number of classification attempts (Ranking accuracy). The rule predicted the relative ranking correctly with 93% accuracy (11 errors out of 153 tests).

Rule 2: Classifying Difficulty Rating
Mapping from production time to one of our five difficulty classes (Ci, i=1..5: very easy, easy, moderate, difficult, and very difficult) is a pattern classification problem where each gesture is represented by a single feature, in our case production time. A common technique in statistical pattern recognition is Bayes' rule, which minimizes classification error [24]. Bayes' rule uses each class-conditional probability density (i.e. the probability for a randomly chosen pattern x to lie in class Ci, denoted p(x|Ci)) together with the a priori probability of class Ci (how likely it is to observe a pattern from this class, denoted p(Ci)). Using this data, Bayes' rule computes the a posteriori probability of x belonging to each class, p(Ci|x), and assigns x to the class Cj for which the a posteriori probability is maximum:
x ∈ Cj ⇐ j = arg max_{i=1..5} { p(x|Ci) · p(Ci) }    (1)
In order to apply Bayes' rule, the conditional p(x|Ci) and a priori p(Ci) probabilities must be known for each of our 5 Rating classes. Normal parametric models are frequently assumed in practice (equation 2) for estimating the unknown conditional densities p(x|Ci) [24] (p. 34), for which the parameters (mean μi and standard deviation σi) can be easily computed from the training set (in our case, data from experiment 1).

p(x|Ci) = (1 / sqrt(2π σi²)) · exp( −(x − μi)² / (2σi²) )    (2)
The a priori probabilities p(Ci) are estimated from the training set as the percentage of samples falling into each class [24](p.34-39). In our case, μi are the mean production times for each Rating class (expressed in seconds); σi the standard deviations (seconds); and p(Ci) the percentages of samples belonging to each Rating class. Table 5 lists these parameters as computed from our training data (experiment 1) with an illustration of each normal model superimposed over the production time histogram.
Table 5. Left: Bayes’ Rule Parameters for the Rating Classification Rule. Right: Production time frequency histogram with superimposed time normal models for each Rating.
We tested Bayes' rule in order to see how well it fits our data. We counted how many times the rule was correct out of 18 classification attempts (the Rating accuracy) by applying it to each gesture in our set. The rule achieved an accuracy rate of 83% on its own set (15 gestures were correctly classified to their Rating category as indicated by the participants). The three errors occurred for the strike-through, turn-90, and sail-boat gestures, all of which were misclassified to the next lower class. This confirms, for now, a good model fit for our data, while Section 8 will show how the rule applies to new gestures in our validation experiment. The mean production times μi for each Rating level (see Table 5) could be approximated to more convenient values such as 1, 1.5, 2, 2.5 and 3.5 seconds (the μi simplified column in Table 5). These could represent more intuitive working estimates for each Rating class to be used by designers. When using these simplified mean values with the computed standard deviations as before, we also obtained 83% classification accuracy.
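The following sketch illustrates how the two rules could be applied in practice. It is not the authors' implementation: the simplified class means come from the text above, while the standard deviations and priors are placeholder assumptions, since the actual Table 5 values are not reproduced in this copy.

```python
import math

# Simplified mean production times (seconds) per Rating class, from the text.
CLASS_MEANS = {1: 1.0, 2: 1.5, 3: 2.0, 4: 2.5, 5: 3.5}
# Placeholder standard deviations and priors; replace with values measured
# from your own training data (the Table 5 parameters in the paper).
CLASS_STDS = {1: 0.3, 2: 0.3, 3: 0.4, 4: 0.5, 5: 0.8}
CLASS_PRIORS = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}

def rank_more_difficult(time_a, time_b):
    """Rule 1: A is estimated more difficult than B if it takes longer to produce."""
    return time_a > time_b

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def classify_rating(production_time):
    """Rule 2: assign the Rating class with the maximum a posteriori probability."""
    return max(
        CLASS_MEANS,
        key=lambda c: normal_pdf(production_time, CLASS_MEANS[c], CLASS_STDS[c])
        * CLASS_PRIORS[c],
    )

# Example: a gesture taking about 2.4 s on average falls near the "difficult" end.
# print(classify_rating(2.4))
```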
7 Estimating Production Time

Applying our rules using measured production time works very well, but we would like designers to be able to estimate production time without running such a formal experiment. Ideally, this could be done with predictive models. However, using times predicted by CLC, Ranking accuracy dropped to 67% and Rating down to 28%. Although Isokoski does not predict actual time, it can be used for relative Ranking, where it managed a prediction accuracy of 82%. Examining the data from experiment 1, we found that individual participant gesture production times are highly correlated with overall mean production times (ρ(N=18)=.96, p=.01; min .92, max 1.0). This consistency made us wonder if a designer could estimate difficulty based on only a few samples of measured production time. Instead of a long formal experiment, a few people could perform the candidate gestures a few times in a simple data gathering application. Moreover, this data is likely to already exist from training the gesture recognizer [12,20,26].
We first consider the minimal case of gathering data from a single person. Again using data from experiment 1, for each participant we randomly selected M out of the 20 execution samples for each gesture to calculate a mean production time. Using these mean times, we apply our rules and compute the prediction accuracy for Rating and Ranking. The random selection was repeated 100 times as M varied from 1 to 20: thus 14 participants x 18 gestures x 20 M values x 100 repetitions = 504,000 predictions. The mean accuracy for Rating begins to level out at M=3 near 53%, and Ranking also approaches 91% (Fig 5, left). The effect of M over Rating is significant (χ2(19)=476.4, p<.001). A Wilcoxon signed-rank test found significant effects between (1,20), (3,20) and (5,20) with a small Cohen effect (r<.3). The effect of M over Ranking was significant (χ2(19)=4140.54, p<.001), with significant differences between (1,20) (r=.52), (3,20) and (5,20) with medium effects (r<.5). With 3 samples, mean Rating accuracy was 53% (SD 18%) and mean Ranking accuracy 89% (SD 3%).
Fig. 5. Left: prediction accuracies for Rating and Ranking vs. number of execution samples. Right: Difficulty prediction accuracies vs. number of participants.
We continue our analysis by varying the number of participants N=1..14, given M=1, 3, or 5 individual gesture execution samples from each. Similar to before, we randomly selected the gesture samples 100 times for each N: thus 14 participants x 18 gestures x 3 M values x 100 repetitions = 75,600 predictions. The mean accuracy of Rating increases from 52% using one participant to 77% when data from all participants is used (significant, χ2(13)=496.45, p<.001) (Fig 5, right). The same trend is observed independently for M=1, 3, and 5 executions from each participant. The accuracy of Ranking increases from 88% to 93% (χ2(13)=715.1, p<.001). The effect of M was found significant for both Rating and Ranking (at p<.001), but the Wilcoxon signed-rank test showed small Cohen effects between (1,3) and (1,5) (r<.3) and very small effects between (3,5) (r<.15). With 3 participants and 3 execution samples, mean Rating accuracy was 66% (SD 14%) and mean Ranking accuracy 91% (SD 2%). With 5 participants and 3 execution samples, mean Rating accuracy was 70% (SD 13%) and mean Ranking accuracy 92% (SD 2%). In summary, on average, a designer could estimate a relative Ranking of execution difficulty with 89% accuracy using 3 gesture execution samples from a single person. To estimate Rating, 3 execution samples from 3 or 5 people are needed to achieve mean accuracies of 66% and 70%, respectively.
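A resampling analysis of this kind could be reproduced along the following lines. This is a hypothetical sketch, not the authors' code: times is assumed to be a participants x gestures x executions array of logged production times, and rank_accuracy / rating_accuracy stand in for evaluations of Rules 1 and 2 against the ground-truth Ranking and Rating.

```python
import numpy as np

def estimate_accuracy(times, ground_truth, m_samples, n_participants,
                      rank_accuracy, rating_accuracy, repetitions=100, rng=None):
    """Monte Carlo estimate of Rating/Ranking accuracy from a few execution samples.

    times: array of shape (participants, gestures, executions) of production times.
    ground_truth: per-gesture median Rating and Ranking from the experiment.
    rank_accuracy, rating_accuracy: callables evaluating Rule 1 and Rule 2 given
    estimated mean production times (assumed to be implemented elsewhere).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_total_participants, n_gestures, n_executions = times.shape
    rank_scores, rating_scores = [], []
    for _ in range(repetitions):
        participants = rng.choice(n_total_participants, size=n_participants, replace=False)
        samples = rng.choice(n_executions, size=m_samples, replace=False)
        # Mean production time per gesture from the small sample.
        est = times[np.ix_(participants, np.arange(n_gestures), samples)].mean(axis=(0, 2))
        rank_scores.append(rank_accuracy(est, ground_truth))
        rating_scores.append(rating_accuracy(est, ground_truth))
    return np.mean(rank_scores), np.mean(rating_scores)
```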
8 Experiment 2: Validation of Difficulty Estimation Rules

A second experiment, similar to the first, was used to validate our execution difficulty rules as well as our simple production time estimation technique. The same apparatus, task, and design were used, but with 20 different gestures (Fig 6) and 11 new participants: 11 x 20 x 20 = 4,400 executions.
Fig. 6. The validation set of 20 gestures
Results
We found the same high level of agreement between participants' difficulty Rating (Kendall's W=.78, χ2(19)=163.61, p<.001) and Ranking (W=.80, χ2(19)=166.79, p<.001). Rating and Ranking were again highly correlated (ρ(N=20)=.94, p=.01).

Estimates of Execution Difficulty
We first establish an accuracy upper bound using the actual measured production times logged in the experiment. To test the accuracy of estimating Ranking using Rule 1, we ordered the gestures in ascending order of production time and correlated the resulting ranks with the median participant Ranking. Again, there was a strong correlation (ρ(N=20)=.94, p=.01). Then, we applied Rule 1 to each pair of gestures (A,B) out of the (20 x 19)/2 = 190 possibilities and calculated an accuracy rate (how many times the estimate was correct). In this way, estimating Ranking using Rule 1 attained 93% accuracy: 14 errors out of 190 tests. For Rule 2, we used the simplified Bayes parameters generated from experiment 1 (Table 5). Estimating Rating using Rule 2 attained 90% accuracy: 18 of the 20 gestures were correctly classified according to the median participant Rating. The rectangle gesture was classified as easy instead of very easy to execute, and tree was classified as easy instead of moderate (both were shifted by one Rating class). Next, we tested the accuracy of our rules using an estimate of production time generated from a small number of samples. Based on our analysis in the previous section, we tested N=1, 3, 5 participants and M=3 gesture execution samples. Rating accuracies varied from 66.9% to 79.8%, while Ranking increased from 89.6% to 91.3%. Table 6 shows the accuracy rates obtained. We also re-tested using CLC and Isokoski as input to the rules. CLC still produced a low Rating accuracy of 25%, but it performed better for Ranking with 75% accuracy. Isokoski did very well with 87% for Ranking, but cannot be used to estimate Rating. Overall, our rules to estimate difficulty performed well with our validation data, even when using only three samples from three participants as an estimate of production time.
Table 6. Validation experiment results: Ranking and Rating estimation accuracies using both measured and estimated production times. Rows cover measured production time; production time estimated from 3 executions by 1, 3, or 5 participants; and production time predicted by Isokoski and CLC (accuracy values not reproduced in this copy).
9 Conclusions and Future Work

Reducing gesture execution difficulty is an often-mentioned goal of gesture set design. Our work provides support for this argument with empirical evidence showing that people tend to have similar perceptions of execution difficulty, that it is highly correlated with gesture production time, and that difficulty can be estimated using two simple rules for relative ranking and a classification rating. Because existing models cannot accurately predict the magnitude of production time necessary for our classification rule, we provide evidence that an estimate of production time using only a few execution samples from a few people is good enough. Moreover, this set of estimation data may already exist when designers train a recognizer. Designers can use our quantitative rules as they are when selecting from candidate single stroke pen gestures. However, we plan to make this process more automatic by incorporating our difficulty estimation into the popular $1 gesture recognizer [26]. As future work, we also plan to examine how execution difficulty relates to multi-stroke pen gestures and multi-touch gestures.

Acknowledgement. This paper was supported by the project "Progress and development through post-doctoral research and innovation in engineering and applied sciences - PRiDE - Contract no. POSDRU/89/1.5/S/57083", a project co-funded by the European Social Fund through the Sectorial Operational Program Human Resources 2007-2013.
References

1. Appert, C., Zhai, S.: Using strokes as command shortcuts: cognitive benefits and toolkit support. In: Proceedings of CHI 2009, pp. 2289–2298. ACM Press, New York (2009)
2. Ashbrook, D., Starner, T.: MAGIC: a motion gesture design tool. In: Proceedings of CHI 2010, pp. 2159–2168. ACM Press, New York (2010)
3. Bau, O., Mackay, W.E.: OctoPocus: a dynamic guide for learning gesture-based command sets. In: Proceedings of UIST 2008, pp. 37–46. ACM Press, New York (2008)
4. Cao, X., Zhai, S.: Modeling human performance of pen stroke gestures. In: Proceedings of CHI 2007, pp. 1495–1504. ACM Press, New York (2007)
5. Castellucci, S.J., MacKenzie, I.S.: Graffiti vs. Unistrokes: an empirical comparison. In: Proceedings of CHI 2008, pp. 305–308. ACM Press, New York (2008)
6. Chen, P., Popovich, P.: Correlation: Parametric and Nonparametric Measures. Sage, Thousand Oaks (2002)
7. Grange, S., Fong, T., Baur, C.: Moris: a medical/operating room interaction system. In: Proceedings of ICMI 2004, pp. 159–166. ACM Press, New York (2004)
8. Isokoski, P.: Model for unistroke writing time. In: Proceedings of CHI 2001, pp. 357–364. ACM Press, New York (2001)
9. Kratz, S., Rohs, M.: A $3 gesture recognizer: simple gesture recognition for devices equipped with 3D acceleration sensors. In: Proceedings of IUI 2010, pp. 341–344. ACM Press, New York (2010)
10. Kristensson, P.-O., Zhai, S.: SHARK2: a large vocabulary shorthand writing system for pen-based computers. In: Proceedings of UIST 2004, pp. 43–52. ACM Press, New York (2004)
11. Kurtenbach, G., Moran, T.P., Buxton, W.A.S.: Contextual animation of gestural commands. Computer Graphics Forum 13(5), 305–314 (1994)
12. Li, Y.: Protractor: a fast and accurate gesture recognizer. In: Proceedings of CHI 2010, pp. 2169–2172. ACM Press, New York (2010)
13. Long Jr., A.C., Landay, J.A., Rowe, L.A.: Helping designers create recognition-enabled interfaces. In: Multimodal Interface for Human-Machine Communication, pp. 121–146 (2002)
14. Long Jr., A.C., Landay, J.A., Rowe, L.A.: Implications for a gesture design tool. In: Proceedings of CHI 1999, pp. 40–47. ACM Press, New York (1999)
15. Long Jr., A.C., Landay, J.A., Rowe, L.A., Michiels, J.: Visual similarity of pen gestures. In: Proceedings of CHI 2000, pp. 360–367. ACM Press, New York (2000)
16. Morris, M.R., Wobbrock, J.O., Wilson, A.D.: Understanding users' preferences for surface gestures. In: Proceedings of GI 2010, pp. 261–268. Canadian Information Processing Society (2010)
17. Nielsen, M., Störring, M., Moeslund, T.B., Granum, E.: A procedure for developing intuitive and ergonomic gesture interfaces for HCI. In: Camurri, A., Volpe, G. (eds.) GW 2003. LNCS (LNAI), vol. 2915, pp. 409–420. Springer, Heidelberg (2004)
18. Pratt, W.: Digital Image Processing, 3rd edn. John Wiley & Sons, Inc., Chichester (2001)
19. Rico, J., Brewster, S.: Usable gestures for mobile interfaces: evaluating social acceptability. In: Proceedings of CHI 2010, pp. 887–896. ACM Press, New York (2010)
20. Rubine, D.: Specifying gestures by example. SIGGRAPH Computer Graphics 25(4), 329–337 (1991)
21. Schomaker, L.: From handwriting analysis to pen-computer applications. IEEE Electronics and Communications Engineering Journal 10(3), 93–102 (1998)
22. Viviani, P., Flash, T.: Minimum-jerk, two-thirds power law, and isochrony: converging approaches to movement planning. Journal of Experimental Psychology: Human Perception and Performance 21(1), 32–53 (1995)
23. Viviani, P., Terzuolo, C.: Space-time invariance in learned motor skills. In: Tutorials in Motor Behavior. Advances in Psychology, vol. 1, pp. 525–533. North-Holland, Amsterdam (1980)
24. Webb, A.: Statistical Pattern Recognition. John Wiley & Sons, Inc., Chichester (2002)
25. Wobbrock, J.O., Morris, M.R., Wilson, A.D.: User-defined gestures for surface computing. In: Proceedings of CHI 2009, pp. 1083–1092. ACM Press, New York (2009)
26. Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of UIST 2007, pp. 159–168. ACM Press, New York (2007)
On the Limits of the Human Motor Control Precision: The Search for a Device's Human Resolution

François Bérard¹, Guangyu Wang², and Jeremy R. Cooperstock²

¹ University of Grenoble, LIG, Grenoble-INP, BP 53 - 38041 Grenoble cedex 9, France
² McGill University, Centre for Intelligent Machines, Montréal, H3A 2A7, Canada
[email protected], {gywang,jer}@cim.mcgill.ca
Abstract. Input devices are often evaluated in terms of their throughput, as measured by Fitts' Law, and by their resolution. However, little effort has been made to understand the limit of resolution that is controllable or “usable” by the human using the device. What is the point of a 5000 dpi computer mouse if the human motor control system is far from being able to achieve this level of precision? This paper introduces the concept of a Device's Human Resolution (DHR): the smallest target size that users can acquire with an ordinary amount of effort using one particular device. We report on our attempt to find the DHR through a target acquisition experiment involving very small target sizes. Three devices were tested: a gaming mouse (5700 dpi), a PHANTOM (450 dpi), and a free-space device (85 dpi). The results indicate a decrease in target acquisition performance that is not predicted by Fitts' Law when target sizes become smaller than certain levels. In addition, the experiment shows that the actual achievable resolution varies greatly depending on the input device used, hence the need to include the “device” in the definition of DHR. Keywords: input device, target acquisition, accuracy, device's human resolution, resolution.
for the same displacement of one inch, a 200 dpi mouse can select at most 200 different positions (7.6 bits of information), whereas a 5700 dpi mouse can provide 12.5 bits of information. It is not clear, however, how such resolution can benefit users if they are not actually able to control the device to select any of the 5700 possible positions within one inch. Given that human motor control capabilities are bounded, the fundamental problem becomes: "How small can a target be, so that it remains selectable by a human, using the device, without an inordinate amount of effort?". Our claim is that, at some point, increasing the resolution of input devices is futile because the additional resolution is effectively unusable. We argue, therefore, that a device's physical resolution is fundamentally less relevant a characteristic for HCI than its human-capable resolution. Surprisingly, such a human-centered measure of resolution does not appear to have been studied explicitly in the literature. In this paper, we define a Device's Human Resolution (DHR) as the smallest target size that a user can acquire, using the device, with an ordinary amount of effort, that is, matching Fitts' [10] prediction. Studies have shown that human performance in device control depends on the muscle groups being activated [3,17]. As such, we expect that the achievable precision varies with the choice of device and mode of operation, as these collectively determine which muscle group is used. This motivates the definition of a Device's Human Resolution, rather than simply the more general "human resolution". The definition of the DHR, and its measurement, should have significant value when choosing which input device to use for a particular task: for example, 3D positioning may require great accuracy for CAD applications, but less for entertainment applications. The DHR will be paramount in the first case, whereas a high number of degrees of freedom may be more relevant in the second case. Similarly, the DHR can help in the design of novel input devices or interaction techniques by making explicit the achievable resolution of the device controlled by a human hand. Touch displays, for example, are obvious candidates for DHR evaluation. The need to interact with an increasing amount of information on progressively smaller displays, e.g., the latest iPod Touch, which packs 960 x 640 pixels into a 3.5 inch diagonal screen, has been the source of a large body of work [2,4,5,14]. In particular, techniques have been designed to improve the precision of selection with fingers [5], for example using two fingers. Evaluation of the DHR of a touch surface would inform us as to whether the sensing device or human precision is the limiting factor. More practically, knowledge of this DHR would indicate the upper limit on useful sensing resolution for touch surfaces. Indeed, the most direct use of the DHR is in directing the allocation of resources where benefits are most likely: no human performance improvement is to be expected by increasing the physical resolution of a device if it already exceeds its DHR. Conversely, an improvement is likely if the device does not yet saturate its DHR. In the work reported here, we present and justify our definition of a Device's Human Resolution (DHR). We then describe an approach designed to evaluate the DHR. Following this approach, we report on a study of three different devices: a mouse, a stylus attached to an articulated arm, and a free-space device.
This study confirms the existence of a DHR, and shows that it varies greatly depending on the
form factor of the device and its mode of operation. Moreover, the study provides estimates of the DHR for the three devices. We discuss the implications of this work, in particular with respect to ongoing developments in sensing technologies for input devices.
2 Previous Work

Fitts' Law [10,12] is one of the most widely used tools to study human performance in controlling input devices. The law predicts that the mean time (MT) to acquire a target increases linearly with the index of difficulty (ID) of the target. In its "Shannon" formulation [12], MT is predicted as:

MT = a + b log2(A/W + 1)    (1)
where the log part of the equation is the index of difficulty (ID), A is the initial distance to target (or amplitude), and W is the target width. The a and b parameters are estimated empirically for each device being studied. The Fitts' Law prediction has been shown to be highly accurate, with the correlation coefficient of the linear regression coming very close to 1.0 (MacKenzie [12] offers a review of several such experiments, and describes the computation of an effective width We for an improved prediction). One important benefit of a Fitts' Law experiment is that it estimates the rate of information that can be transmitted to the computer with a given device (Zhai [16] discusses the use of TP = ID / MT or IP = 1/b to estimate this rate). Fitts' Law experiments, however, are generally run under conditions where the law applies. In our case, we are investigating target acquisition at the limit of human motor control precision, i.e., in the range of target sizes where Fitts' Law may no longer apply. Fitts' Law allows estimation of the throughput of the tested device. Although throughput is fundamental to performance, this measure alone provides a limited understanding of the suitability of a device to a particular task. To obtain a better understanding of performance differences between devices, MacKenzie et al. [13] introduced seven new measurements, such as the number of target re-entries, meant to analyze the trajectory of the pointer during target acquisition. However, these measurements do not directly address the human resolution limits of the device. The effect on performance of the use of different muscle groups has been observed in several studies [3,17]. For example, for a six degrees of freedom docking task, a device held in the hand and rotated by the fingers is more efficient than one rotated by the whole hand [17]. However, here again, the performance was only studied in terms of throughput, not in terms of achievable resolution (i.e., the DHR). One of the very few examples studying the high accuracy attainable by the human hand is the work of Guiard et al. on "multi-scale pointing" [11]. Their experiment showed that users can efficiently acquire targets having very high Fitts' ID. The experiment includes a target with a width of three points at a distance of 14,500 points in motor space units, resulting in a task ID greater than 12 bits. They observed that the limiting factor in this case was vision, rather than motor control: in motor space, the target measures 0.06 mm and the amplitude is 289 mm. Such a small target would require a very high visual acuity if displayed directly. Instead, they introduced a multi-scale pointing technique in which pointer and targets were displayed at two different zoom factors. Participants first focused on the low-zoom display to bring the
pointer into the neighborhood of the target, then switched their visual focus to the high-zoom rendering to complete the acquisition. In motor space, however, there was no discontinuity: participants were able to acquire this "high ID" target in a single gesture. This was possible because of the high resolution of the input device, a 1270 lpi Wacom tablet, combined with the hand's ability to position the tip of a stylus efficiently in an area as small as 0.06 mm. When using a stylus on the Wacom tablet, the task followed the Fitts' Law prediction: IP remained at approximately 4 bits/s both for an easy target (ID = 4 bits, single-scale display) and a difficult one (ID = 12 bits, multi-scale display). However, when the same task was performed using a "puck", similar to a computer mouse, IP dropped from 5.17 bits/s for the easy target to 3.09 bits/s for the difficult one. One key observation in this experiment is that the stylus and the puck positions were both sensed with the same device (the Wacom tablet). Thus, they shared the same physical sensing resolution and noise in measurement. This rules out the sensing mechanism as the source of the performance difference, and points instead to the form factors of the devices themselves. Our interpretation is that the target size of 0.06 mm may have saturated human capacity when pointing with the puck, but not when pointing with the stylus. Thus, the stylus DHR exceeded 423 dpi (equivalent to 0.06 mm), but this is not the case for the puck DHR. This difference is to be expected since, as demonstrated by Balakrishnan et al. [3], the muscle group involved in control of a stylus yields greater efficiency than the muscle group (mostly the wrist) involved in control of the puck. We note that Guiard et al. were not searching for the limits of human resolution, which could have been done by experimenting with smaller target widths, e.g., of 1 point. However, their results provide strong motivation for the work reported here. While studying the effect of control/display (CD) gain on pointing performance, Casiez et al. [9] acknowledged the existence of "limb precision problems" when using high CD gains, but unrelated to effects of device quantization. They noted "accuracy problems" with 2 mm targets and a CD gain of 12, thus estimating the limb precision at approximately 0.2 mm. However, this accuracy problem was only observed as an increased overshoot: participants were still able to acquire the targets. We are focusing on smaller target sizes that participants are not able to acquire in an ordinary amount of time. Our work on the limit of human resolution has its origin in a previous study comparing devices for 3D object placement [6]. We found that 3D positioning was more efficient using a regular mouse than directly pointing to the target location in the air with a free-space device. In trying to explain this surprising result, we hypothesized that the same task, which was possible using the mouse, became too difficult to perform with the free-space device due to a difference in the hand's precision when moving in the air, as opposed to sliding on a table. We went on to conduct an informal evaluation of the static stability of the hand with the free-space device. In comparison, the current work formally acknowledges the existence of a "Device's Human Resolution", provides a definition for it based on the dynamic stability of the hand, and attempts to measure it for three devices through an empirical study.
Wacom measures their device resolution in lines per inch, which has a slightly different interpretation from dots per inch.
3 Defining the Device's Human Resolution

3.1 Device Resolution

According to The New Oxford American Dictionary, resolution is "The smallest interval measurable by a scientific (esp. optical) instrument" [1]. In the case of an input device, its resolution can be defined as the smallest displacement that will trigger a change in its reported position. For some devices, such as an optomechanical mouse, determining the resolution is easy. The measuring component itself, in this case a wheel with holes, provides a discretization mechanism that only reports measurements when the device has been moved by an amount of at least its resolution. Many input devices, however, provide a noisy raw measurement in which the reported output varies even when the device itself is static. Such devices, for example optical trackers or accelerometers, are not statically stable. In this case, the concept of resolution, as defined above, cannot be applied directly. Instead, the raw measurement is usually filtered until variations in measurement of a static object are not reported. One way to stabilize noisy device outputs is to estimate the standard deviation of the raw measurements of the position of a static object. Then, a threshold is defined as a multiple of this standard deviation. Variations in measurements are reported only if they exceed the threshold. Assuming Gaussian noise and a threshold of four standard deviations, for example, ensures that no variation is reported for a static object on 99.9937% of the measurements. Since the device only reports an output when it is moved by more than the threshold amount, this value defines the actual device resolution. This is the approach we followed to stabilize the output of our optical tracker for the free-space device.

3.2 Device's Human Resolution

Static stability. We initially considered a simple definition for the Device's Human Resolution (DHR), based on the static stability of users manipulating the device. This measure can be evaluated in a similar manner to the static stability of the device itself. Users are asked to hold the device as statically as possible for a given duration, and the positions reported by the device are recorded. The standard deviation of the recorded positions is computed and the static resolution of the hand can be defined as four times the standard deviation. Note that this is only feasible if the recording device resolution exceeds that of the participants. Otherwise, another higher resolution measuring device must be used. For example, to determine the human resolution of a low-resolution inertial tracker, the inertial tracker could be augmented with optical markers and tracked by a high-resolution motion capture system. However, a static stability definition of DHR is not ideal for two reasons. First, it would not be applicable to self-stabilizing devices such as a regular mouse, which are perfectly stable when users release their grasp, and hence provide a null standard deviation in their recorded values. Second, this measure estimates the capacity of users to hold the device stable for a duration, a much less frequent and more demanding task than that of target acquisition. We therefore chose another measure for DHR, inspired by the dynamic stability of the hand.
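Both the device-output stabilization of Section 3.1 and the static-stability measure just described reduce to the same computation: a threshold of four standard deviations of recorded positions of a static object (or hand). The following is a minimal sketch under that reading; it is our own illustration with hypothetical names, not the authors' implementation.

```python
import numpy as np

def noise_threshold(static_samples, k=4.0):
    """Threshold as k standard deviations of raw positions recorded while static."""
    return k * np.std(np.asarray(static_samples, dtype=float), axis=0)

class StabilizedOutput:
    """Report a new position only when it differs from the last reported one
    by more than the noise threshold on at least one axis."""

    def __init__(self, threshold, initial_position):
        self.threshold = np.asarray(threshold, dtype=float)
        self.position = np.asarray(initial_position, dtype=float)

    def update(self, raw_position):
        raw = np.asarray(raw_position, dtype=float)
        if np.any(np.abs(raw - self.position) > self.threshold):
            self.position = raw
        return self.position
```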
Dynamic stability. A dynamically stable system quickly recovers its steady state after being moved away from it. By extension, the dynamic stability of the hand, coupled with a device, can be measured as the time for the hand to bring the device within a specified range of a target position, starting at some initial distance from the target. In this case, the range is equivalent to the target width for a Fitts' acquisition task. Our assumption is that below some specific threshold of target width, we should observe a significant drop in the ability of the user to acquire the target quickly. This threshold is a good candidate for a measure of the DHR. We thus define the DHR as the smallest target size that a user can acquire with the device, given an ordinary amount of effort. Obviously, ability varies between users, so the DHR should be measured as an average over a population, much in the same way that the throughput of a device is computed.

3.3 On the DHR and the Control/Display Gain

It could be argued that human pointing precision can be increased at will, and thus, arbitrarily small targets can be acquired comfortably, simply by lowering the control/display (C/D) gain of the input device. Indeed, the C/D gain is a useful software mechanism that allows the system to adapt the input capabilities of users and devices to the work area. However, this improvement in precision comes at the cost of longer acquisition of distant targets: lowering the gain trades off speed of motion for accuracy. Users have to make larger motions, and clutch more often, to reach targets across the work space. As shown by Casiez et al. [9], low C/D gains afford the lowest performance in task acquisition, either using constant gain or pointer acceleration. Regardless, the DHR is a physical limit that exists only in motor space, not in display space. In particular, it is independent of any technique used to transfer the users' physical motion into movement of the pointer. We interpret the DHR as the quantity of information that the user can transmit by unit of physical motion to the system, with a particular input device. Clever transformations may be applied to this information (e.g., pointer acceleration or semantic pointing [7]), but these are irrelevant to the underlying notion of a physical human resolution.
4 Measuring the DHR
Assuming a fixed distance to the target, an increase of effort is expected when acquiring targets of decreasing sizes. In ordinary circumstances, Fitts' Law tells us that this additional effort should be proportional to the variation in task difficulty, as defined by the Fitts' ID. Our approach to measuring the DHR amounts to finding where this increase is no longer proportional. In other words, it would be pointless to analyze differences in mean acquisition time as target size decreases, since these times are expected to increase in any case. The slope of the line fitted to these data (the Fitts' regression line), however, should remain roughly constant, and this is why we chose to analyze the variation of this slope as a function of target size.
In addition, we expect the error rate to remain roughly constant before the DHR, but to increase thereafter. The experimental task we use is very similar to a Fitts' task, with the following variations:
− Inclusion of difficult targets: targets are varied from relatively large (easy) to sizes small enough that they are suspected to be below the DHR (difficult).
− Fixed amplitude: rather than vary the distance to the target, we chose a reasonable fixed value for this parameter. The target must be sufficiently far away to require a significant displacement of the device, but close enough that acquisition can be performed in a single gesture. In particular, we ensure that clutching is not required. In our pilot studies, we did not observe any effect of target distance on the DHR, but more formal studies would be required to provide strong recommendations regarding the optimal distance.
− Requested target width instead of effective target width: using the effective target width [12] in the Fitts' Law linear regression handles the speed-accuracy tradeoff and yields a better fit, and thus a better estimate of the device's throughput. However, the effective width method cannot be applied in our context: it would hide the difficulty participants have in acquiring targets whose size is below the DHR, by counting failures as successes on larger targets. Our goal is to witness a drop in performance (either an increase in time or in error rate) when participants actually try to acquire targets of the requested sizes. Moreover, the lack of an optimized fit of the regression line is not a problem in our context: in contrast with the throughput, the DHR is not a parameter of the regression line.
− Validation of acquisition: since we are focusing on the capability of the (dominant) human hand to position a device in a small area, we want to avoid any parasitic motion induced by a validation mechanism. We thus allocate the validation action, e.g., clicking, to the participant's other (non-dominant) hand.
− Parametrization in motor space: Fitts' experiments have been used to evaluate not only devices, but also interaction techniques, such as semantic pointing [7]. Depending on the study, the two parameters of task difficulty, namely target size and amplitude, can be defined in motor space, display space, or both. When measuring the DHR, these parameters are always defined in motor space.
5 Experimental Verification
We conducted an experiment, following the approach presented above, to verify whether a discontinuity does indeed appear in user performance as target size decreases to very small levels, and if so, to estimate these levels.
5.1 Input Devices
Three input devices were tested: a mouse, a free-space device, and a six degrees of freedom (DoF) stylus. All three devices were tested in conjunction with a keyboard, used by the non-dominant hand, for validation of target acquisition.
Fig. 1. The tracked free-space device
− mouse: a Logitech G500™ gaming mouse, with physical controls allowing its resolution to be configured from 200 to 5700 dpi. For our experiments, we used the mouse at its maximum resolution of 5700 dpi, placing it on a typical wooden desk surface as a reasonable compromise between sliding ability and friction. The mouse system driver was bypassed: raw data was used in order to eliminate any mouse acceleration function implemented in the system. We performed a quick verification of the advertised resolution by observing the output of the mouse when translated by a distance of 10 cm, as measured by a ruler. We verified that the final output difference corresponded to expectations, and that all in-between values were output.
− free-space device: a custom-built, comfortable, graspable rigid object with six attached reflective markers, as shown in Figure 1. Its position was obtained by a carefully calibrated motion capture system (a six-camera NaturalPoint Optitrack™), placed as close as possible to the workspace in order to maximize device resolution. This resolution was measured with the procedure explained in the section "Device resolution" above: static stability was evaluated at different locations of the workspace, with a maximal standard deviation of stddevmax = 0.00335 mm. The "tick size" (the minimal displacement before motion is reported, i.e., the resolution) was thus set at 4 × stddevmax = 0.0134 mm. However, pilot experiments indicated that human hand stability in free space was nowhere close to this resolution, making it exceedingly difficult to acquire a target of one tick width. We therefore filtered the motion capture output further, to a resolution of 0.3 mm, equivalent to 85 dpi.
− stylus: the stylus attached to a Sensable PHANTOM Omni™ force feedback device. The PHANTOM was used only for its input capability: the force feedback mechanism was not activated and remained passive. This provided some amount of self-stabilization compared to a free-space device. The device has a resolution of 450 dpi, which we checked with a procedure similar to that used for the mouse.
5.2 The Task
A mouse is designed for 2D tasks, whereas the free-space device and stylus are better suited for 3D tasks. Our experimental task, however, was a 1D target acquisition: in
measuring human resolution, we were looking for the smallest displacement controllable by hand. This displacement is measured as a 1D distance, even if performed in 2D space (for the mouse) or in 3D space (for the two other devices). As such, we believe that a 1D target acquisition with a constant gain of one is the most fundamental operation of any pointing device, regardless of interaction style. Moreover, this task has the additional benefit of uniformity across devices. Although the task was one-dimensional, we wanted to evaluate the DHR of each device in its natural operating space. Thus, we did not physically constrain any of the devices to 1D motion, as this would interfere with the usual 2D operating space of the mouse or the 3D operating space of the stylus and the free-space device. Moreover, we did not impose any particular posture for any of the devices: we expected participants to adopt whatever posture felt most comfortable for these very difficult target acquisitions. Indeed, for the 3D devices, we observed that participants tended to find a way to lock their forearm on the desk so that only the degrees of freedom of the wrist and fingers would be used. Participants were asked to acquire targets of width 1, 2, 4, 8, 16, 24, and 32 ticks at a distance of 250 ticks, all measured in motor space. For the mouse and stylus, ticks correspond to the smallest displacement measurable by each device, in other words, its resolution. For the free-space device, ticks were deliberately reduced in resolution, as described above. The corresponding Fitts' ID values, using Equation 1, were 3.14, 3.51, 4.06, 5.01, 5.99, 6.98, and 7.97. This range of Fitts' IDs is typical of many Fitts' studies reported in the literature, hence a very good fit of the regression line to the experimental data would be expected, were it not for the unusually small target sizes.
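Assuming that Equation 1 is the Shannon formulation of the index of difficulty, ID = log2(A/W + 1), the listed values can be reproduced directly from the 250-tick amplitude and the seven target widths (a minimal sketch of ours, not the authors' code):

```python
from math import log2

AMPLITUDE_TICKS = 250
TARGET_WIDTHS_TICKS = [32, 24, 16, 8, 4, 2, 1]

def fitts_id(amplitude, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return log2(amplitude / width + 1)

ids = [round(fitts_id(AMPLITUDE_TICKS, w), 2) for w in TARGET_WIDTHS_TICKS]
print(ids)  # [3.14, 3.51, 4.06, 5.01, 5.99, 6.98, 7.97]
```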
Fig. 2. The user interface of the experiment. Text labels have been added to the figure for clarity.
Graphical feedback was displayed on a 1920 x 1080 LCD monitor. As shown in Figure 2, a red pointer, a grey starting area, and a black target were all represented on a white background by vertical rectangles spanning the entire height of the display. The pointer and starting area widths were 1 and 40 ticks, respectively, and the target
width was varied between 1 and 32 ticks. To avoid potential issues of visual acuity, we used a tick-to-pixel factor of 4, i.e., the 1-tick pointer was displayed as 4 pixels wide. Participants had to bring the pointer to the starting area and validate by pressing the spacebar key with their non-dominant hand. This caused the next target to be displayed, centered exactly 250 ticks from the current pointer position. To terminate a trial successfully, participants then had to move the pointer to the target area and validate again. As visual feedback, the pointer color changed to green when above the target. If the pointer was outside the target upon validation, the system emitted a beep and the user was offered another chance. Upon either successful acquisition or three successive failed attempts, the target disappeared. (Requiring three successive attempts after a failure was intended to discourage participants from giving up on difficult targets, since rushing through a trial and failing incurred more target selection effort; only the first attempt at each acquisition was counted in the error rate and mean acquisition time.) Participants would then begin the next trial by validating within the starting area. The experimental setup is shown in Figure 3.
Fig. 3. The experimental setup: (a) mouse, (b) free-space device, (c) stylus
5.3 Experiment Design
Eighteen right-handed volunteers, ages ranging from 22 to 38, participated in the study. Each device was tested with six participants, five males and one female in each group. Participants ran the experiment on only one device in order to limit the effects of learning transfer. Each target size was tested for 20 acquisitions. The overall number of trials was thus: 3 devices × 6 participants × 7 target sizes × 20 repetitions = 2520 target acquisitions.
5.4 Results
Results of the experiment are provided in Table 1 and illustrated in Figure 4.
In order to analyze the deviation of the data from Fitts' prediction, we computed Fitts' regression lines on all subsets made of three successive IDs for every user (i.e., regressions were computed on sets of 3 × 20 trials, minus failures, for each of the 6 participants × 3 devices). In the table and figure referenced above, slopes are reported for the Fitts' ID at the center of the 3-ID subset; hence, there is no slope value for the first and last ID. If the mean acquisition time followed Fitts' Law, we would expect the slope to remain roughly constant. In our case, we expect the slope to increase significantly when the target size becomes too small. Slope. Three within-subjects ANOVAs were computed, one for each device. Each revealed a significant effect of ID on the slope (F(4,20) > 3.67, p < 0.05). Pairwise comparisons were computed for all pairs of IDs using one-sided paired t-tests, as we expected that slopes at higher IDs would be greater than those at lower IDs once past the DHR. For the mouse, the pairwise comparison found a significant difference only between the mean slope at ID = 6 on one side, and those at ID = 3.5, 4.1, and 5 on the other (t(5) < -4.1, p < 0.01). For the free-space device, all differences between mean slopes were significant (t(5) < -2, p < 0.05), except between ID = 6 and ID = 7. For the stylus, the only significantly different mean slopes (t(5) < 2.2, p < 0.05) were at ID = 3.5 on one side, and at ID = 5, 6, and 7 on the other. Failure rate. The within-subjects ANOVAs for the mouse and the stylus did not show an effect of ID on the failure rate, but the ANOVA for the free-space device did show a strong effect (F(6,28) = 13.61, p < 0.001).
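The sliding-window slope analysis can be sketched as follows (an illustrative reconstruction of ours, not the authors' code; the data layout is assumed): for each participant, a Fitts regression line is fitted to the successful trials of every three consecutive IDs, and the resulting slope is attributed to the central ID of the window.

```python
import statistics

def fitts_slope(trials):
    """Least-squares slope of acquisition time (s) against ID (bits),
    given (id_bits, time_s) pairs from successful trials."""
    xs, ys = [t[0] for t in trials], [t[1] for t in trials]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def sliding_slopes(trials_by_id, ids):
    """Slope for each window of three consecutive IDs, keyed by the
    central ID (no value for the first and last ID)."""
    slopes = {}
    for i in range(1, len(ids) - 1):
        window = [t for d in ids[i - 1:i + 2] for t in trials_by_id[d]]
        slopes[ids[i]] = fitts_slope(window)
    return slopes
```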
[Figure 4 comprises three panels, one per device (Mouse, Free Space, Stylus). The horizontal axis of each panel gives the Fitts' ID in bits (3.1, 3.5, 4, 5, 6, 7, 8) and the corresponding target width in ticks (32, 24, 16, 8, 4, 2, 1), in mm, and in dpi: mouse, 0.14 to 0.0045 mm (180 to 5700 dpi); free-space device, 9.6 to 0.3 mm (2.6 to 85 dpi); stylus, 1.8 to 0.056 mm (14 to 450 dpi).]
Fig. 4. Results of the experiment: mean time, mean failure rate, and mean slope of the Fitts' regression line (see text for details) as a function of Fitts' ID and target size for the three devices. Mean time (seconds) uses the left scale. Failure rate (%) and slope (dimensionless) share the right scale, but failure rate was multiplied by 10 for improved legibility. Mean slopes and failure rates are shown with standard error bars. Fitts' regression line for ID = 4 is shown as a black dashed line.
Pairwise comparisons were computed for the free-space device only. No significant increase was detected for any of the pairs formed from the first three IDs (ID = 3.1, 3.5, 4), nor between ID = 5 and ID = 6. All other differences were significant: t(5) < -2.1, p < 0.05 for (ID = 3.5, ID = 5) and (ID = 4, ID = 5); t(5) < -3.4, p < 0.01 for all remaining pairs of IDs.
5.5 Analysis
As expected, target acquisition performance collapses as target size becomes smaller. This can be observed in mean acquisition time, mean failure rate, or both. Mouse. In the case of the mouse, the slope remains roughly stable over the first three data points, but then increases significantly at ID = 6. Even though the curve shows a very steep slope for the last data point (ID = 7), the lack of a significant difference there is due to the high variability of the data at that point. This variability can be explained by considering that participants were asked to perform target acquisitions far beyond their motor control capabilities. The unexpected result was the mean failure rate, which remained below 4% across all targets, with no significant effect of ID. This came as a surprise, as we anticipated an increased error rate past the DHR. Participants managed to acquire the single-tick, 0.0045 mm target. The explanation lies in the self-stabilizing capability of the mouse: after several trials and errors, once the pointer was shown on target, participants could release the pressure on the mouse and validate with their non-dominant hand. However, successful acquisition through this trial-and-error approach comes at the expense of a large increase in acquisition time, and higher variability. Clearly, at this size level, target acquisition no longer appears to be an automatized motor control task. Based on the last point that does not significantly diverge from Fitts' Law, the DHR of this mouse can be estimated as a target size below 0.036 mm but above 0.018 mm, i.e., between roughly 700 and 1400 dpi. Free-space device. The data for the free-space device are quite different. Here, the mean time exhibits a curved shape that does not clearly reveal a discontinuity. This translates into a steadily increasing slope, with all differences but the last being significant. Here again, the last estimate of the slope (ID = 7) shows a much larger variability than the previous ones. For this device, the failure rate is far more informative of performance sensitivity to target size. The failure rate remains at a usable level, below 5%, with no significant difference among the first three target sizes. At target size 2.4 mm, however, the failure rate jumps to 12% (a significant increase from the three previous failure rates), and eventually climbs as high as 60% for the smallest target size. Clearly, the participants' strategy was different from that used with the mouse. The lack of self-stabilization made it very frustrating to try to hold the pointer inside the small targets for validation. When the target size became too small, participants gave up and opted to fail rather than fight for success. This demonstrates the complementarity of error rate and acquisition time in determining the DHR. In the case of the free-space device, the bottleneck is the error rate. The DHR appears to be bracketed in the range of 5-10 dpi.
Stylus. The stylus exhibits the same trend as the mouse: a very good fit of the mean time to the prediction on the first three points, and no significant difference detected in the slope for these points. At ID = 5, the slope is significantly higher than that at ID = 3.5. However, at ID = 5 and ID = 6, the mean time is less than 13% above the prediction from the previous points, and the error rate remains low, below 5%. The next smaller target size is more problematic, with a 10% error rate. We thus estimate the stylus DHR to be in the range of 100-200 dpi.
6 Discussion
One of the unexpected discoveries of our experiment is the benefit of self-stabilizing devices for very accurate target acquisition. Because such devices come to perfect rest when users release control, they allow a trial-and-error approach. Users make an attempt, release their grasp, and check the outcome, repeating as necessary until success. At every attempt, users get a chance to rest. This is in contrast with non-self-stabilizing devices, where users have to maintain firm control of the device to ensure that it remains on target at the moment of validation. Even though we did not set out to control this parameter, we realized a posteriori that our three devices cover a spectrum from completely self-stabilizing (the mouse) to not self-stabilizing at all (the free-space device), with the stylus providing a middle ground by way of its mechanical arm, which provides some amount of self-stabilization. This could explain the different trends observed in the data: failure rate remaining low for the mouse, but shooting up for the free-space device, and a middle ground observed with the stylus. While we were looking for a clear threshold in performance, what we observed was rather a gradual degradation. This is not surprising considering that we are estimating a human parameter, and that the estimate comes from a population of variable individuals. Similar to the processor cycle time and the short-term memory capacity in the Model Human Processor [8], a range is a better depiction of these parameters than a single value. Still, we were able to witness, in the case of the mouse and stylus, an initial alignment with the mean time prediction followed by a significant departure from it. In the case of the free-space device, it is the error rate that revealed the collapse in performance. The most direct application of the DHR is a better allocation of the effort devoted to improving the physical resolution of devices. In the case of the mouse, we found the DHR in the range of 700-1400 dpi. Even using the higher boundary, this raises the question of whether, for non-experimental purposes, there is any benefit beyond marketing for the 5700 dpi mouse we used. The results of our experiments suggest that further efforts to increase the dpi of the mouse would be wasteful. There are, however, new types of devices for which the DHR can be used to define how much improvement in resolution is required for optimal tracking. Mobile devices, for example, are typically equipped with accelerometers, allowing them to be used in new forms of gestural interaction. The peep-hole display [15], for example, is a clever technique that allows efficient navigation of a large virtual document with a small-screen device. This interaction technique requires sensing of the device's
position in space, using its translation to navigate the document. This is very similar to the free-space interaction we evaluated, for which we determined that the DHR was in the range of 5-10 dpi. Thus, the accelerometers or other sensors used to track the device position for the peep-hole display would offer no improvement in control from a sensing resolution notably higher than 10 dpi. Our work also confirms what was hinted at by Guiard et al. [11] in their work on multi-scale pointing: there is room for interaction techniques that exploit the accuracy of the hand when used in conjunction with appropriate devices, e.g., the high-resolution mouse used in our study or the stylus used in Guiard et al.'s study. Their experiment included a 12.2 bit target acquisition task with a target size of 0.06 mm. Our experiment shows that target size can be reduced as far down as 0.036 mm (a 13 bit target using the same 289 mm amplitude) while maintaining good performance, i.e., in line with the Fitts' Law prediction. We believe that the DHR provides an important measure of the information capacity of devices. It was shown that our stylus can be used efficiently to select targets of size down to 0.5 mm, whereas for a free-space device, the limit was 5 mm. Thus, for an equivalent operating range, the stylus can be used to select from ten times as many potential targets (in one dimension). However, the operating range of the stylus attached to the PHANTOM™ device is much smaller than the operating range of a free-space device (i.e., the space comfortably reachable around the user). Hence, the operating range of a device should also be taken into account when discussing its information capacity. The DHR and the operating range together define a kind of static ability of a device: how many different targets can fit inside the device's operating range. This should be studied in relation to the dynamic ability of the device: how efficiently these targets can be selected, i.e., the Fitts' throughput.
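As a back-of-the-envelope illustration of this notion of static ability (our own hypothetical numbers, not measurements from the study), the number of distinguishable 1D targets is simply the operating range divided by the smallest acquirable target size:

```python
def distinct_targets_1d(operating_range_mm, smallest_target_mm):
    """Number of distinguishable 1D targets within an operating range."""
    return operating_range_mm / smallest_target_mm

# Assuming, for illustration only, a common 300 mm operating range:
print(distinct_targets_1d(300, 0.5))  # stylus limit of 0.5 mm -> 600 targets
print(distinct_targets_1d(300, 5.0))  # free-space limit of 5 mm -> 60 targets
```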
7 Conclusion
While it could be anticipated that target acquisition performance collapses when reaching a human limit of accuracy, we are not aware of any work specifically studying this phenomenon and addressing the evaluation of this limit. Our experiment highlights the fact that there is a significant drop in human performance when the target size decreases below a measurable threshold. We call this threshold the Device's Human Resolution (DHR). The DHR is highly dependent on the form factor of the device and its mode of operation: our experiment revealed DHRs spanning three orders of magnitude. The DHR of the mouse was on the order of thousands of dpi, that of the stylus in the hundreds, and that of the free-space device in the tens of dpi. This work was a first attempt to demonstrate the existence of a Device's Human Resolution and to evaluate it for some specific devices. The DHR values that we estimated, however, were only one-dimensional. Further studies will be required to evaluate the variation of the DHR in higher dimensions, and to evaluate the DHR for other devices. We hope that our initial effort in this regard inspires ongoing research by others to further explore these issues.
References
1. Abate, F.R., Jewell, E. (eds.): The New Oxford American Dictionary, 2nd edn. Oxford University Press, New York (2005)
2. Albinsson, P.-A., Zhai, S.: High precision touch screen interaction. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 105–112. ACM, New York (2003)
3. Balakrishnan, R., MacKenzie, I.S.: Performance differences in the fingers, wrist, and forearm in computer input control. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 303–310. ACM, New York (1997)
4. Baudisch, P., Chu, G.: Back-of-device interaction allows creating very small touch devices. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 1923–1932. ACM, New York (2009)
5. Benko, H., Wilson, A.D., Baudisch, P.: Precise selection techniques for multi-touch screens. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 1263–1272. ACM, New York (2006)
6. Bérard, F., Ip, J., Benovoy, M., El-Shimy, D., Blum, J.R., Cooperstock, J.R.: Did "Minority Report" Get It Wrong? Superiority of the Mouse over 3D Input Devices in a 3D Placement Task. In: Gross, T., Gulliksen, J., Kotzé, P., Oestreicher, L., Palanque, P., Prates, R.O., Winckler, M. (eds.) INTERACT 2009. LNCS, vol. 5727, pp. 400–414. Springer, Heidelberg (2009)
7. Blanch, R., Guiard, Y., Beaudouin-Lafon, M.: Semantic pointing: improving target acquisition with control-display ratio adaptation. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 519–526. ACM, New York (2004)
8. Card, S., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (1983)
9. Casiez, G., Vogel, D., Balakrishnan, R., Cockburn, A.: The impact of control-display gain on user performance in pointing tasks. Human-Computer Interaction 23(3), 215–250 (2008)
10. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47(6), 381–391 (1954)
11. Guiard, Y., Beaudouin-Lafon, M., Mottet, D.: Navigation as multiscale pointing: extending Fitts' model to very high precision tasks. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 450–457. ACM, New York (1999)
12. MacKenzie, I.S.: Fitts' law as a research and design tool in human-computer interaction. Human-Computer Interaction 7, 91–139 (1992)
13. MacKenzie, I.S., Kauppinen, T., Silfverberg, M.: Accuracy measures for evaluating computer pointing devices. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 9–16. ACM, New York (2001)
14. Vogel, D., Baudisch, P.: Shift: a technique for operating pen-based interfaces using touch. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 657–666. ACM, New York (2007)
15. Yee, K.-P.: Peephole displays: pen interaction on spatially aware handheld computers. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 1–8. ACM, New York (2003)
16. Zhai, S.: Characterizing computer input with Fitts' law parameters–the information and non-information aspects of pointing. International Journal of Human-Computer Studies 61(6), 791–809 (2004); Fitts' law 50 years later: applications and contributions from human-computer interaction
17. Zhai, S., Milgram, P., Buxton, W.: The influence of muscle groups on performance of multiple degree-of-freedom input. In: ACM Conference on Human Factors in Computing Systems (CHI), pp. 308–315. ACM, New York (1996)
Three around a Table: The Facilitator Role in a Co-located Interface for Social Competence Training of Children with Autism Spectrum Disorder
Massimo Zancanaro1, Leonardo Giusti1, Eynat Gal2, and Patrice T. Weiss2
Abstract. In this paper we describe a co-located interface on a tabletop device to support social competence training for children with Autism Spectrum Disorder. The interface has been developed on the multi-user DiamondTouch tabletop device as a 3-user application for two children and a facilitator (therapist or teacher). It takes advantage of the DiamondTouch table's unique ability to recognize multiple touches by different users in order to constrain interactions in a variety of ways. This paper focuses on the support provided by the system to enhance a facilitator's management of the interaction flow, in order to increase its effectiveness during social competence training. We discuss the observations collected during a small field study in which two therapists used the system for short sessions with 4 pairs of children. Although limited by the number of participants to date, the interactions that emerged during this study provide important insight regarding ways in which collaborative games can be used to teach social competence skills. The children thus benefit from the motivational and engagement value of the games, while the facilitator gains access to new tools that intrinsically support and shape the session.
Keywords: Autism Spectrum Disorder, collaborative games, multi-user co-located interfaces, Cognitive-Behavioral Therapy.
normal IQ, and some even exhibit exceptional skill or talent in specific areas. Still, dysfunction in social interaction and difficulties in emotional expression and recognition are indeed considered to be among the core deficits associated with ASD [1]. Training to improve social competence is therefore important for children with ASD, particularly for those with HFA [3]. When collaborative games are used in a therapeutic setting for children with HFA, the therapist plays a central role in facilitating tabletop activities and in leveraging the educational value of the experience. Yet, the design of these systems seldom explicitly acknowledges and empowers the role of the therapist as an active user of the interface. In this paper, we present the Join-In Suite, an application developed to train and enhance social competencies, specifically collaboration, of children with HFA using a therapeutic approach based on Cognitive-Behavioral Therapy (CBT). In contrast to other work in the domain of technology for ASD, the Join-In Suite has been implemented on the DiamondTouch multi-user table [6] as a 3-user application for a therapist and two children. Three constraint patterns have been implemented as cooperative gestures on the interface; these are used extensively both to provide the therapist with specific controls to regulate the flow of interaction and to embed collaboration opportunities into the games. The evidence collected during a formative evaluation indicated that the use of these constraint patterns effectively allows the therapist to pace the flow of activities and to shape the behavior of the children, promoting and regulating the collaboration experience.
2 Related Work
Despite the well-known benefits of using technology for children with ASD [11,20], as well as the benefits of CBT interventions for those with HFA [3], the CBT model has not yet been systematically implemented via technology. Some encouraging results can be found in the literature on co-located applications for children with ASD. For example, Piper et al. [19] investigated how a four-player cooperative computer game running on tabletop technology was used to teach effective group work skills in a middle school social group therapy class of children with Asperger's Syndrome. Gal et al. [9] demonstrated the effectiveness of a three-week intervention in which a co-located tabletop interface was used to facilitate collaboration and positive social interaction for children with HFASD; significant improvements in key positive social skills were achieved. Similarly, Battocchi et al. [2] studied the ability of a digital puzzle game to foster collaboration among children with low and high functioning ASD; in order to be moved, puzzle pieces had to be touched and dragged simultaneously by the two players. Although all these systems assume that the facilitator (therapist or teacher) adopts the role of fostering and regulating the children's participation, this role has, to date, not been designed explicitly to be part of the interface. Tabletop devices support user interaction in unique ways. Although the accuracy of touch on a tabletop interface is inferior to that of mouse interaction [7], a touch modality appears to be preferable to a mouse and keyboard because it reduces the barrier
between the user and the graphical elements of the interface. Hornecker and colleagues [8] showed that, for collaborative tasks, a direct touch interface is more effective than the use of multiple mice, in particular with respect to higher levels of awareness, fluidity of interaction and spatial memory. Direct manipulation becomes particularly useful for children with different levels of motor coordination ability [19]. A further advantage of tabletop devices is that they can be made large enough to allow multiple users to collaborate without crowding; computer monitors are usually not big enough to allow an equal visual perspective and interaction by more than one or two persons.
3 Technology and Cognitive-Behavioral Therapy
CBT is based on three assumptions: (1) interpersonal cognitive processes and emotions can mediate interpersonal behavior; (2) social problem solving and recognition of emotions can be taught cognitively and can influence behavior; and (3) social problem solving and a more comprehensive understanding of emotions can lead to later successful social adjustment. It also presumes that a more efficient cognitive understanding of the social world will lead to successful social adjustment in future situations. There is good preliminary evidence of its potential for teaching social skills to children with ASD [3]. However, despite the well-known benefits of using technology for children with ASD [16], as well as the benefits of CBT interventions for those with HFA, there have been no attempts, thus far, to explore the ways in which CBT can be implemented via technology. In a typical CBT session, a facilitator (either a teacher or a therapist with specific training) involves one or more children in structured activities (the experience part) that allow the children to experience social constructs which are then reflected upon (the learning part). The two parts are interleaved such that either may precede the other. The role of the facilitator is to form and expand the child's conceptual understanding of collaboration, since merely experiencing the task will not by itself lead to an inner understanding of the concept. A technology designed to support a CBT session should therefore aim at three goals: (1) include the two parts: an experience part to involve the children and a learning part to help them reflect; (2) support the facilitator in controlling the activity flow and shaping the collaboration experience; and (3) embed specific interaction mechanisms to foster and promote collaboration between a pair of children.
4 Join-In Suite
The Join-In Suite was developed to address the area of social competence for children with HFA, and specifically their ability to collaborate with each other. It has been conceived as a 3-user application for a facilitator and two children and is intended to be analogous to a standard CBT session with the target population.
The application has been designed by a team of interaction designers, computer scientists, educators and occupational therapists. The design cycles included two workshops with teachers, occupational therapists and children with HFA from two schools which offer programs to include children with ASD in mainstream education. Following the tenets of CBT, the application is divided into a learning part, which realizes a structured version of the CBT social problem solving technique, and an experience part, based on the CBT behavioral reinforcement technique. The former presents a series of social vignettes depicting a social problem: the children suggest ways to solve the problem, consider the possible consequences of each solution, and then choose the solution that will lead to a positive social experience. The latter consists of a game that allows the children to directly experience the chosen solution, which is the social task to be acquired. The application is implemented on the DiamondTouch device [6], a multi-user table which has the capability of recognizing different users when they interact with the system, i.e., it can track who is touching where. The Join-In Suite takes advantage of the DiamondTouch table's unique ability to recognize multiple touches by different users in order to constrain interactions in a variety of ways. For example, in some cases, to operate the system the children and the facilitator need to tap on the surface together (the system is not activated unless all three of them touch the surface at specific places at the same time). In other cases, only the facilitator can activate certain functions (e.g., starting a game). Furthermore, multi-user devices allow social rules to be "hard-wired" within the logic of the system, which may be more effective than rules imposed by a human facilitator. For example, the gestures of more than one user may be interpreted by the system as a single, combined command or "cooperative gesture" [17]; the latter can be used to increase the children's sense of teamwork and facilitate control on large, shared displays. In our previous work [4], we focused primarily on the Constraints on Objects pattern: some elements of the system need to be operated by more than one user to be activated or used properly. We have now extended the constraint patterns available in the Join-In Suite to include the Different Roles pattern, in which individuals have to play different roles in order to perform a task (and the system prevents a single user from performing different roles at the same time), and the Ownership pattern, in which participants have ownership of objects such that the system does not allow a user to use or access others' objects without explicit consent. These three constraint patterns are used to foster collaboration between the children. They also enhance the therapist's functionality by affording specific control over the interface elements, thereby providing opportunities to intervene in the flow of activity and to mediate the collaborative experience (see below).
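The three constraint patterns can be thought of as simple predicates over user-attributed touches, as in the following simplified sketch (ours, not the DiamondTouch SDK or the actual Join-In implementation; class names, role names and the time window are illustrative). Since the table reports which user produced each touch, the application can require simultaneous touches from specific users, bind a widget to a single role, or check ownership before allowing a move, with the facilitator acting as a super-user in the latter two cases.

```python
from dataclasses import dataclass

FACILITATOR = "facilitator"

@dataclass
class Touch:
    user_id: str       # e.g., "child_A", "child_B", "facilitator"
    widget: str        # interface element being touched
    timestamp: float   # seconds

def cooperative_tap(touches, widget, required_users, window_s=0.5):
    """Constraints on Objects: the widget activates only if every required
    user touches it within the same short time window."""
    times = {t.user_id: t.timestamp for t in touches
             if t.widget == widget and t.user_id in required_users}
    return (len(times) == len(required_users)
            and max(times.values()) - min(times.values()) <= window_s)

def allowed_role(user_id, widget, role_of_widget):
    """Different Roles: each widget is bound to one user; the facilitator
    may operate any widget."""
    return user_id == FACILITATOR or role_of_widget.get(widget) == user_id

def may_take(user_id, item_owner):
    """Ownership: a user may act only on his own items, facilitator excepted."""
    return user_id == FACILITATOR or user_id == item_owner
```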
5 A Walk through the Interface
The interface is oriented in such a way that both children sit on one side and the facilitator sits on the other side, facing them. The facilitator controls the interface from
a panel on one side of the table while the children interact with the surface from the other side (see Figure 1). The application provides access to several social stories (currently three) that exemplify problematic social situations. In order to choose a story, both the children and the therapist have to tap on the related card, which displays a possible solution as a cartoon graphic. Each story has the same structure. During the learning part, the problem situation is presented together with five alternative solutions, one of which involves collaboration while the others offer selfish, non-collaborative or ineffective solutions to the problem; during the experience part, the children can play a game that represents the story. The facilitator can move between the two parts freely.
Fig. 1. Location of users during a Join-In session
Figure 2 displays a screenshot of the learning part. The children can explore the different vignettes by selecting them and listening to an audio clip. Together with the facilitator, they can discuss possible solutions. They then have to select one of the vignettes as the solution to the problem by tapping on it simultaneously (both children and the facilitator). If the selected alternative is not appropriate, the system provides a textual and auditory explanation and encourages the children to try something else. At any time during the activity, the therapist can decide to switch to the experience part (that is, the game).
5.1 The Three Stories
In the Apple Orchard story, the two children have to collect a number of apples for their mother to make jam, but the basket is too heavy to carry individually. In the
learning part, they are offered the alternatives of carrying the apples in their arms, asking their grandpa to carry the basket for them, and other ineffective or non-collaborative strategies. They are also offered the "right" solution, that is, to carry the basket together. The game implements the Constraints on Objects pattern and displays a basket that has to be moved to collect falling apples. The basket can be dragged by a single child, but then it moves too slowly; when both children drag it simultaneously, it moves much more quickly, so that the apples can be gathered before they fall to the ground. The level of difficulty varies according to how the apples fall from the trees and can be controlled by the therapist; they can fall from lower or higher branches (so the players have more or less time to collect them) and they fall either one at a time or two together at different places, so that the children need to negotiate which apple should be collected or lose both.
Fig. 2. A screenshot of the learning part of Apple Orchard
The Save the Alien story describes the situation of an alien starship that had a breakdown and made an emergency landing on Earth. The children are asked to help the starship by collecting shooting stars to be used as its fuel. Again, the children are offered 5 vignettes with 4 non-collaborative solutions and a collaborative one.
Fig. 3. A screenshot of the Save the Alien game
The gaming part of the story implements the Different Roles pattern: one of the children has to catch the stars by tapping on them to make them fall toward the sea, while the other child has to move a small boat to collect them (see Figure 3). The boat can accommodate only a small number of stars or it sinks; if a star falls into the sea, it causes waves that capsize the boat. The players thus have to carefully synchronize their actions in order to provide enough fuel (i.e., stars) to the starship before the time expires. The system prevents the child who drives the boat from also catching the stars, and vice versa. Finally, the Bridge story presents the situation of a fallen bridge that the two children have to rebuild. However, the pieces have been randomly strewn on the two banks of the river. The non-collaborative solutions include the suggestion that one child do all the work and the possibility of ignoring the problem, while the "correct" solution is for each child to share with his partner the pieces that have fallen on his own bank. The game part implements the Ownership pattern (see Figure 4). The system prevents the children from collecting pieces that are on the other child's side of the river, so they have to ask the other child to put them on a transport machine.
Fig. 4. A screenshot of the Bridge game
5.2 The Facilitator as a Primary User of the Interface
In designing a user interface, it is important to distinguish the primary users (those who actually perform actions on the interface) from the secondary users, who do not use the interface directly although they are key actors in deciding on the adoption and actual use of the system. In almost all educational and therapeutic systems, teachers and therapists are considered to be secondary users. In the Join-In Suite, we have explicitly acknowledged their role as an additional primary user who interacts with the children using specific functions. The constraint patterns introduced in the previous section have also been used in the design of the Join-In Suite to enable the facilitator to control the flow of interaction in the following manner: (1) some of the controls on the interface can be operated by the facilitator alone (Different Roles pattern), for example, the difficulty level of the game or the transition between the learning part and the game experience part (the same controls, when operated by the children, do not work); (2) some operations on the interface require a cooperative gesture (Constraints on Objects pattern) by all three users (the therapist and both children); in these cases the facilitator can refrain from contributing until there is more discussion by the children; (3) in each game, when a collaborative pattern is implemented, the facilitator can always act as a "super-user" (i.e., he can take either role in the Different Roles
pattern, have access to any resource in an Ownership pattern, and act as a replacement for either child in the Constraints on Objects pattern). In this way the facilitator can provide help to the children when they are not able to properly manage the games.
6 Field Study
A field study was conducted at a mainstream primary school that has three special classes for children with ASD. The evaluation involved 8 boys with HFA aged 9-12 years. All were enrolled in special education classes (Grades 2-5) within the elementary school. Two occupational therapists who work at the school were trained as facilitators to use the Join-In Suite application. Due to the large variability found amongst children with ASD, it is difficult to recruit a homogeneous group of children. Thus, studies on this population are usually made with smaller samples than is the case with some other user groups.
Fig. 5. Two children and the facilitator using the Join-In Suite
The study was organized as a qualitative/observational study to provide an initial assessment of the application along several dimensions. We aimed primarily at collecting evidence of the impact of the system on the therapist/children interaction. Each session was performed by one of two therapists and involved two children with HFA. First, the therapist introduced the Join-In Suite application to the children, who then were engaged in each of the three stories (Fig. 5). At the end of the session, the children were asked to rank the games and had a debriefing interview.
All the sessions were video-recorded. Two video cameras were used; one facing the children to record all peer interactions, the other placed above the table to record the operations on the interface. At the end of the study, the two therapists were debriefed to discuss their experience with the application as a therapeutic tool.
7 Results
In this section, we briefly summarize the evidence collected on site and via videotaped observations and therapist debriefing interviews. A qualitative analysis of the videotaped sessions and interview transcripts, based on grounded theory, was conducted. In this paper, the data are presented with a focus on the role of the facilitator; a general discussion of the dimensions of collaboration has been presented in [10].
7.1 Facilitator Use of the System to Organize the CBT Intervention
Both facilitators made use of the functionalities discussed in Section 5.2 to control the session. For example, during one of the sessions, one child was too dominant and the other child had to comply with his peer's commands. During the learning phase, the dominant child tapped the alternative he considered most appropriate and "ordered" the passive child to touch the selected alternative. The more passive child simply followed this instruction; however, since the system also required the touch of the therapist to perform this action, she was able to use the triple-tap requirement (Constraints on Objects pattern) to limit and control the dominant child and involve the passive one. She purposely did not acknowledge the dominant child's choice and involved the passive child in a conversation which encouraged him to suggest an appropriate solution. In this case, the therapist used the triple-tap to shape the collaboration dynamics between the two children. In another session, the facilitator used the triple-tap (Constraints on Objects pattern) to maintain control over the activity flow. One of the children was hyperactive, impulsively touching many buttons and boxes on the table surface. In this case, the triple-tap was used to keep the activity at the discussion phase of the various alternatives until both children were ready to select the best solution. Rather than operating directly on the collaboration dynamics, control was exerted over the pace of the activity. In a third case, the facilitator exploited the possibility of acting as a super-user by helping the more passive child to move the boat in the Save the Alien game. In this way, she reduced the child's frustration but did not disrupt the flow of engagement of the other child. Note that game performance is not, in itself, important; rather, it serves as a driver for the experience part. It is worth noting that when the facilitator acted as a primary user, the children's engagement was not reduced. On the contrary, the facilitator was often able to minimize a child's feelings of inadequacy due to poor performance that would otherwise divert his attention and make it more difficult for him to reflect on the collaboration task. During the debriefing interviews the facilitators recognized the importance of these mechanisms in controlling the flow of interaction and suggested the addition of
further ways to achieve explicit control. In particular, they suggested that the system was too structured in forcing a sequence of activities for the learning part, while they sometimes felt the need, for example, to temporarily remove the alternatives in order to focus on the story again. Furthermore, they recommended the addition of a general pause button that can be operated by the facilitator if the children become too distracted or engage in the repetitive behaviors typical of ASD.
7.2 Interleaving the Experiential Part in the Form of Games with Learning Objectives and the Learning Part
A major factor that distinguishes the Join-In Suite from commercial computer and video games is the therapeutic model behind it. Indeed, the children were always very focused during the games, but they tended, as expected, to be somewhat distracted during the learning part. The ability to interleave learning and gaming was very effective in maintaining the coherence and engagement of the whole experience. The two facilitators, in general, used two different approaches: one preferred to go from learning to experience in order to better explain the concept of cooperation, while the other sometimes went from experience to learning to introduce the concept of cooperation in a more natural manner via the game rules. Both approaches were effective, and it will be important to observe the strategies used by other facilitators over a longer period of time to understand and expand upon the system's flexibility in adapting to different approaches to managing the session. The facilitators appreciated that the multimedia cards were more engaging for the children than the conventional, paper-based vignettes they were used to. Still, they also expressed a concern that the system only offered exploration of a predetermined set of cards. Indeed, they did involve the children in discussions aimed at proposing their own solutions, but these discussions could not be incorporated into the interface in the form used during the field study.
7.3 Using Game Structure to Balance the Role of the Facilitator
As discussed above, the constraint patterns may be used to provide ways for a facilitator to exercise control over the interaction flow. Yet it is also important that the facilitator be able to "fade out" in order to foster direct interaction between the children. In general, our observations confirm the findings of Piper and colleagues [19] about the motivational and engaging role of constraint patterns embedded in the system for these types of users. All the games were engaging, even if several factors made the Apple Orchard less successful than Bridge and Save the Alien (i.e., the basket was difficult to grab together and collecting falling apples was not considered an exciting task by the children). (In fact, differences in how the children viewed the games were a positive factor, since they allowed the children to feel and affirm an element of choice.) The three games elicited different collaborative strategies. The Bridge, which of the three games was the most appreciated by the children, was very effective in eliciting a negotiation dialogue centered on the notion of sharing. During one session, one child who has limited visual matching abilities was helped by his peer, who was more adept at identifying the correct puzzle pieces. He explicitly asked for help
during the game. During the final interview, the child said: "In the bridge I learned that sometimes there is a need for another person to help on the other side". The Save the Alien game elicited mainly dialogue aimed at achieving real-time motor coordination to synchronize actions, such as "Now, I'm hitting a star, are you ready?" The constraints imposed by the system (a child cannot do both tasks) led naturally to this kind of interaction without the facilitator needing to determine and control the roles. These aspects were positively recognized by the therapists, who supported our view that verbalization of collaborative behaviors is a fundamental therapeutic issue. As discussed above, in some cases the facilitators decided to intervene in a game to support one of the children whose game performance was not adequate. They further suggested the possibility of directly controlling the duration of the game. Though the pressure of a time limit is an important aspect of engagement, it was sometimes frustrating for the children to fail the game because they were too slow. The facilitator is now able to extend the time simply by tapping on the clock.
8 Redesign of the Interface
The insights of the formative evaluation were used to refine the design of the Join-In Suite (Fig. 6). Concerning the role of the facilitator as a primary user, two areas of improvement were identified.
Fig. 6. A screenshot of the new interface of Join-In Suite
First, we provided a more effective way of controlling the activity, in particular by enabling the facilitator to move back and forth among the different sub-tasks of the learning part and between the learning and game parts. Second, we enabled the facilitator to involve the children in proposing alternative solutions rather than just discussing the ones previously determined by the system.
This redesign mainly consisted of making the sub-tasks of the learning activity more easily identifiable and navigable on the interface. We identified 5 steps in the learning activity, which are now presented as explicit steps in the new interface (including the new step for the recording of an alternative proposed by the children): (1) the choose step, where the social story is chosen amongst the three available; (2) the present step, where the social story is introduced as a social problem and discussed by the children and the facilitator; (3) the record step, where the children can record their own solution to the social problem (this step is optional and the facilitator can skip or postpone it); (4) the select step, where the children are asked to consider the 4 different alternatives provided by the system (and, if present, the one recorded by the children) in order to choose the most adaptive one, which best solves the presented social problem; and finally (5) the sum up step, where the social problem and the chosen solution are presented together and the facilitator can take the opportunity for further elaboration; this step is also optional.
8.1 The Control Panel
The facilitator control panel has been completely redesigned and now serves both as a tool to control the system and as a navigational aid to access the 5 steps and move between the learning and experience parts. As in the previous version, we exploit the multi-user capabilities of the DiamondTouch to prevent the children from operating the toolbar. The new control panel is displayed in Figure 7. Two buttons positioned at the center of the panel enable switching between the learning and game parts. When the Learn button is selected, the step selector is visible on the left side. When the Play button is selected, the step selector is hidden and a new panel appears on the left with some game-specific controllers (see below).
Fig. 7. Facilitator's new control panel: on the top there is the Learn configuration and on the bottom the Play configuration
In the Learn configuration, a step selector is displayed that lists the 5 steps as buttons. In order to move to a specific step, the facilitator presses the corresponding button; the selected button signals the current step. When a step cannot be accessed, the corresponding button is disabled, i.e., it is displayed with a light gray label and cannot be selected (for example, before the completion of the choose step, i.e., before selection of the social story, the other steps cannot be accessed). In the Play configuration, the game settings (the level of difficulty of the game and the required level of interaction between the children) can be adjusted, and the game is started and stopped with the appropriate buttons. Positioned between the Learn and Play buttons, there is a Pause button. It can be used at any time to freeze the application: when this button is selected, nothing can be done on the interface, and only the facilitator can press the same button again to resume the application.
8.2 The Record Card
When entering the record step, a small control panel for managing the recording appears below the corresponding step button in the control panel step selector (Figure 8). Again, only the facilitator can use this button to record the children narrating their own solution.
Fig. 8. A screenshot of the Record step in Join-In Suite
The system displays a card that is then shown together with the system alternatives in the Select step. This card can be recorded as many times as needed, but a new recording cancels the previous ones (though all the recordings are stored by the system
for logging and analysis). Once the card has been recorded, the facilitator can decide to move to the next step; the card will be shown together with the system generated alternatives.
9 Discussion

This initial formative evaluation of the Join-In Suite provided valuable insight regarding the possibility of using collaborative games as a basis for teaching social competence skills. Join-In Suite's strengths derive not only from the motivational and engagement value of the learning and gaming tasks but also from the provision of new tools that intrinsically support a facilitator while conducting a session.

A number of studies have demonstrated the ability of children to acquire skills when interacting independently with computer technology [21, 18]. The capacity of software programs to deliver stimuli, reinforce correct responses, and demonstrate errors, all in a reliable, controlled and motivating manner, makes this medium attractive as an educational and therapeutic tool. In addition, such programs may enable educational interventions to be delivered by staff and family members who have not had explicit training.

Much has been written about how to design software to help children with ASD improve their academic, social and communication skills [18]. Davis et al. [5] and Jordan [12] recommend that specific factors should be taken into account when designing learning environments for children with ASD in order to enhance their strengths while reducing the need for abilities that are more difficult for them. First, it is important that both the task and the actions of the facilitator be reliable, consistent and predictable. Second, the introduction of novel elements must be done in a gradual and controlled manner. Third, the computer-based learning activities should be challenging, but children with ASD should not be overly penalized for mistakes. It must be recognized that while experiencing a virtual environment they may apply strategies that helped them in the past in a real-world setting but that do not work as well in a computer-based environment; thus it is best when negative feedback is provided together with clear cues as to how to proceed. Fourth, it is important to use time as a motivating factor, but not in a way that adds pressure to the task or prevents sufficient practice to achieve mastery. The Join-In Suite has explicitly incorporated these design guidelines and, as demonstrated by both the focus groups and formative studies, they have proven to be appreciated by teachers, therapists and children with ASD, as we discussed in greater depth in [10].

Far less attention has been paid in the literature to the role of the facilitator as an active user of the computer game. Although there is an increasing amount of evidence about the beneficial effect of the presence of a facilitator in educational interventions for children with ASD (e.g., [14]), for most educational games the facilitator acts as a secondary user, moderating access to the game but without actually using the interface. In Join-In, because the interface has been designed to give the facilitators a fundamental and dynamic role in shaping the children's experience, they become primary users who directly operate on the interface. They set the pace of the activity, influence the dynamics between the children, and help them reach a reasonable level of performance. Just as a therapist acts as a moderator during
a conventional CBT session (controlling the pace, supporting the performance of activities, encouraging reflection), Join-In acknowledges this role and provides explicit support to achieve it.

The functionalities offered to the facilitator can be broadly classified into two categories: those that can be used to limit non-adaptive behavior by the children and those that may be used to empower them. ASD is a very broad classification that includes different levels and types of social difficulties. With Join-In, we include a sophisticated way of reducing the impact of social avoidance behavior and, at the same time, of enhancing the motivation to communicate with the other child. This is mainly achieved by the need to explicitly recognize the contribution of the peer (and the facilitator) in order to operate the interface. In addition, the facilitator, as a primary user, may exert this controlling effect when supported by appropriate constraints on the interface; the addition of a pause button allows the facilitator to freeze the application at any time, providing further control of non-adaptive behaviors of one or both children. In contrast, it is well known that social avoidance behavior in children with ASD may be reduced when they are involved in child-preferred activities [13]. Allowing the children a degree of flexibility in choosing and organizing their preferred activities may empower them and lead to more engagement in the task. It is important to note that the facilitators in our study were able to use the interface to exert this control and to engage the children. They also asked for more explicit controls to empower the children, for example the possibility of extending the time of the game.

By requiring the involvement of three players, one of whom is a moderator, Join-In provides a strong assistive tool for the teacher rather than replacing her. Our experience has shown that trained therapists were able to rapidly learn how to use Join-In to implement CBT-based social competence training. The Suite appears to be sufficiently usable that even less experienced operators could use it to provide additional training opportunities for the children.
10 Conclusion

In this paper, we presented a tabletop interface based on a multi-user device to support teaching social competence skills in children with ASD. The paper focused on the role of the facilitator as a primary user of the interface and on how the functionalities provided by the system enhance the management of the interaction flow and increase its effectiveness. In particular, two broad categories of functionalities are offered to the facilitator: those that may be used to limit the negative behaviors of the children and those that may be used to empower them. Our experience has shown that trained therapists were able to rapidly learn how to use Join-In to implement CBT-based social competence training.

Although limited by the number of participants to date, the interactions that emerged during this study provide important insight regarding ways in which collaborative games can be used to teach social competence skills. In particular, the use of the three constraint patterns in the design of the interface enabled the facilitator to effectively pace the flow of activities and to shape the children's behavior, promoting and regulating the collaboration
experience. We do not claim that these three patterns are sufficient for these types of interaction; rather, they demonstrate a proof of concept as a starting point for exploring further implementations and additional patterns.

Acknowledgments. This work was partially funded by the European project COSPATIAL (FP VII - 231266). We thank Nirit Bauminger for her key contributions to the application of the CBT model and to the design phase of the Join-In Suite; Sigal Eden for her help in the preparation of the formative study; Alberto Battocchi for the initial concept of Join-In; and Francesco Telch for the implementation of the prototype. Finally, we also want to thank the children and the therapists who participated in the study.
References

1. American Psychiatric Association: Diagnostic and statistical manual of mental disorders, 4th edn., text revision. Washington, DC (2000)
2. Battocchi, A., Ben-Sasson, A., Esposito, G., Gal, E., Pianesi, F., Tomasini, D., Venuti, P., Weiss, P.L., Zancanaro, M.: Collaborative Puzzle Game: a Tabletop Interface for Fostering Collaborative Skills in Children with Autism Spectrum Disorders. Journal of Assistive Technologies 4(1), 4–14 (2010)
3. Bauminger, N.: Group social-multimodal intervention for HFASD. Journal of Autism and Developmental Disorders 37(8), 1605–1615 (2007)
4. Bauminger, N., Gal, Y., Goren-Bar, D., Kupersmitt, J., Pianesi, F., Stock, O., Weiss, P.T., Yifat, R., Zancanaro, M.: Enhancing Social Communication in High Functioning Children with Autism through a Colocated Interface. In: Proceedings of IEEE Multimedia Signal Processing 2007 – Special Session on Multimedia Signal Processing for Education Applications, Chania, Greece (2007)
5. Davis, M., Dautenhahn, K., Powell, S.D., Nehaniv, C.L.: Guidelines for Researchers and Practitioners Designing Software and Software Trials for Children with Autism. Journal of Assistive Technologies 4(1), 38–48 (2010)
6. Dietz, P.H., Leigh, D.L.: DiamondTouch: A Multi-User Touch Technology. In: Proceedings of User Interface Software and Technology (UIST), Orlando, FL (2001)
7. Forlines, C., Wigdor, D., Shen, C., Balakrishnan, R.: Direct-Touch vs. Mice Input for Tabletop Displays. In: Proceedings of ACM Computer Human Interaction, CHI 2007, pp. 647–656 (2007)
8. Hornecker, E., Marshall, P., Sheep Dalton, N., Rogers, Y.: Collaboration and Interference: Awareness with Mice or Touch Input. In: Proceedings of ACM CSCW 2008, pp. 167–176 (2008)
9. Gal, E., Bauminger, N., Goren-Bar, D., Pianesi, F., Stock, O., Zancanaro, M., Weiss, P.L.: Enhancing social communication of children with high functioning autism through a colocated interface. Artificial Intelligence & Society 24, 75–84 (2009)
10. Giusti, L., Zancanaro, M., Gal, E., Weiss, P.L.: Dimensions of Collaboration on a Tabletop Interface for Children with Autism Spectrum Disorder. In: Proceedings of ACM Computer Human Interaction, CHI 2011 (2011)
11. Grynszpan, O., Martin, J.C., Nadel, J.: Designing Educational Software Dedicated to People with Autism. In: Proceedings of AAATE 2005, Lille, France (2005)
12. Jordan, R.: Education of Children and Young People with Autism. Guides for Special Education no. 10 (1997), http://unesdoc.unesco.org/images/0011/001120/112089eo.pdf (accessed January 2011)
13. Koegel, R.L., Dyer, K., Bell, L.: The influence of child-preferred activities on the children's social behavior. Journal of Applied Behavioral Analysis 20, 243–252 (1987)
14. Kroeger, K.A., Schultz, J.R., Newsom, C.: A Comparison of Two Group-Delivered Social Skills Programs for Young Children with Autism. Journal of Autism Dev. Disord. 37, 808–817 (2007)
15. Martinez, R., Kay, J., Yacef, K.: Collaborative Concept Mapping at the Tabletop. In: Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (2010)
16. Moore, D., McGrath, P., Thorpe, J.: Computer-aided learning for people with autism – a framework for research and development. Innovations in Education and Training International 37(3), 218–228 (2000)
17. Morris, R.M., Huang, A., Paepcke, A., Winograd, T.: Cooperative Gestures: Multi-User Gestural Interactions for Co-located Groupware. In: Proceedings of ACM Computer Human Interaction, CHI 2006, Montréal, Québec, Canada (2006)
18. Pennington, R.: Computer-Assisted Instruction for Teaching Academic Skills to Students With Autism Spectrum Disorders. Focus Autism Other Dev. Disabl. (August 2010)
19. Piper, A.M., O'Brien, E., Morris, M.R., Winograd, T.: SIDES: A Cooperative Tabletop Computer Game for Social Skills Development. In: Proceedings of CSCW (2006)
20. Putnam, C., Chong, L.: Software and technologies designed for people with autism: what do users want? In: Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (2008)
21. Roschelle, J., Pea, R., Hoadley, C., Gordin, D., Means, B.: Changing How and What Children Learn in School with Computer-Based Technologies. The Future of Children 10(2), 76–101. Packard Foundation, Los Altos (2001)
22. Tse, E., Greenberg, S., Shen, C., Forlines, C.: Multimodal multiplayer tabletop gaming. Comput. Entertain. 5(2), 12 (2007)
Moving Target Selection in 2D Graphical User Interfaces Abir Al Hajri, Sidney Fels, Gregor Miller, and Michael Ilich Human Communication Technologies Laboratory, University of British Columbia, Vancouver, BC, Canada {abira,ssfels,gregor,michaeli}@ece.ubc.ca
Abstract. Target selection is a fundamental aspect of interaction and is particularly challenging when targets are moving. We address this problem by introducing a novel selection technique we call Hold which temporarily pauses the content while selection is in progress to provide a static target. By studying users, we evaluate our method against two others for acquiring moving targets in one and two dimensions with variations in target size and velocity. Results demonstrate that Hold outperforms traditional approaches in 2D for small or fast-moving targets. Additionally, we investigate a new model to describe acquisition of 2D moving targets based on Fitts’ Law. We validate our novel 2D model for moving target selection empirically. This model has application in the development of acquisition techniques for moving targets in 2D encountered in domains such as hyperlinked video and video games. Keywords: Human performance modeling, Fitts’ Law, 1D Selection, 2D Selection, Moving target selection.
method we call Hold (also referred to as Click-to-Pause) which improves upon existing selection techniques for moving targets. Our motivation comes from live action sports footage, where both the objects of interest (players, adverts, etc.) and the camera can move, creating a difficult scenario for object selection. However, the problem of moving target acquisition applies generally to video games, rich media, web-site design and other dynamic interaction-based media.

Selecting a moving target is a challenging interaction task, especially compared with selecting static targets. A moving target in one dimension is either moving toward or away from the cursor: in the former a rendezvous is required to acquire the target, with different directions of motion for observer and target, and a possible reversal of observer motion if the target passes by the observer. The difficulty of the task increases if either the target size is decreased or the target speed is increased. The complexity of the problem grows further when attempting target selection in two dimensions, due to variations in target velocity, non-intersecting paths of observer and target motion, and the predictability of target motion. Thus, we have created our Hold technique to mitigate this challenge by helping the user select objects while watching the video, without needing to press a separate pause button every time or to chase a moving target, which can be very difficult. Hence, watching video while selecting moving hyperlinks does not feel like a video game.

We will initially review previous research, including Fitts' Law and its subsequent extensions, as well as various techniques proposed for target selection. We then describe our extension to Fitts' Law which models moving targets in 2D and introduce our solution to moving target acquisition, Hold. Following this we outline our experiment, which validates the model and evaluates Hold against a traditional approach; then we discuss the results of the experiment and how the selection techniques fit within our model. Finally, we conclude the paper by reflecting on the impact of the experiment and providing directions for future research.
2 Review of Previous Work

Rapid aimed movements can be characterized using two different motor control models: the iterative corrections model (Fitts' model [11]) or the impulse variability model (Schmidt et al.'s model [26]), depending on the task demands. Wright and Meyer [31] have indicated that the iterative corrections model applies to tasks with spatially constrained movements, while the impulse variability model applies to tasks with temporally constrained movements. Spatially constrained movement tasks are those where movements must end within a target region while trying to minimize the average movement time. Temporally constrained movement tasks are those where movements are initiated with a specified duration in mind and end near a target point, not a region. Since our approach involves spatially constrained movements, we characterized and derived our model by extending Fitts' law.

Fitts' Law [11] is the most commonly used approach to study new acquisition techniques, since it was the first successful model to predict the time required to complete an acquisition task. The index of difficulty (ID) is modeled on a logarithmic
scale depending on target width (W) and distance from cursor (D); movement time (MT) is modeled as a linear relation of ID:
ID = \log_2\left(\frac{D}{W} + 1\right)          (1)

MT = a + b \times ID          (2)
where a, b are empirically determined constants. This equation was initially proposed for stationary targets in 1D which also assumed that the direction of movement is collinear with W. However, it is not an applicable time predictor for 2D pointing tasks. To modify Fitts’ Law to account for 2D targets, several factors should be taken into consideration beyond the target width and distance constraints of the 1D Fitts’ Law. 2D pointing is constrained by the target area, the location of the target from the cursor (i.e. the 2D vector representing angle and target position in 2D). Bivariate pointing was first studied by MacKenzie and Buxton [21], where they tested five different formulae to model the index of difficulty and found two of them fit with their experimental results. Their first correlated formulation substitutes the magnitude of the target in the direction of movement (W’) for W and thus:
ID_{W'} = \log_2\left(\frac{D}{W'} + 1\right)          (3)
Their second formula, which is highly correlated with their experimental data, substitutes the smaller of the target's width (W) and height (H). The index of difficulty is then:
ID_{\min} = \log_2\left(\frac{D}{\min(W, H)} + 1\right)          (4)
Accot and Zhai [2] later identified problems with these formulations: Equation 3 ignores height (shown to have an effect by Sheikh and Hoffmann [27]), while Equation 4 considers only one dimension and ignores the angle of approach. Accot and Zhai proposed a weighted Euclidean model (Equation 5) which addressed the dimension issue.

ID_{WtEuc} = \log_2\left(\sqrt{\left(\frac{D}{W}\right)^2 + \eta\left(\frac{D}{H}\right)^2} + 1\right)          (5)
Accot and Zhai's model is similar to the Euclidean norm, with the addition of the parameter η, which weights the effect of height differently from the effect of width. However, Accot and Zhai's formulation does not account for the angle of the
target from the cursor and is constrained to rectangular targets. Therefore, Grossman and Balakrishnan [14] proposed a probabilistic model that generalizes to any target shape, size, orientation, location and dimension. For moving targets, Jagacinski et al. [19] and Hoffmann [17] investigated how targets moving at constant speed affect the index of difficulty in Fitts' Law. Jagacinski et al. [19] used empirical data from their study to derive an estimate of the index of difficulty for pursuing a moving target in 1D. Hoffmann [17] gave three different extensions to Fitts' Law: using a first-order continuous control system, a second-order continuous control system, and a discrete response model. For our 2D moving target selection models, we extend his first-order continuous control system model ID_{1st}:
ID_{1st} = \ln\left(\frac{D \pm \frac{V}{K}}{\frac{W}{2} - \frac{V}{K}}\right)          (6)
where K is an empirically determined constant. We also derive this same model using Card's [8] formulation. This is one of the models we empirically investigate to determine how well it fits actual human performance.

Researchers have also proposed interaction techniques to help users select targets by reducing ID as implied by the extended Fitts' models. One approach consists of decreasing the distance from the cursor to the target (D), such as moving targets closer to the cursor [4], skipping empty space between targets by jumping from one target to another [15, 3] or using empty space between targets to increase the effective width and decrease D [5]. These techniques are affected by the layout of objects on screen and tend to work best when targets are sparse on the screen. Several other methods have focused on modifying the effective width, either by increasing the target width [23, 32], the cursor area [10, 16, 20, 30] or the effective width (activation area) [13, 5, 9]. Expanding the target size or cursor area outperforms the regular technique for the selection of single isolated targets, but these methods do not perform well with clustered or dense areas of targets, as selection ambiguity and visual distraction arise. These techniques also do not address the speed of targets. There have also been efforts [6, 10, 29, 30] to improve target acquisition time by changing the control-display (CD) gain (the ratio between the distance moved by the physical input device and the distance moved by the visual cursor). By increasing the CD gain in empty space and decreasing it when the cursor is approaching or over targets, the motor-space D/W ratio is decreased. However, these methods face problems when distractors are present, as the distractors slow cursor movement, degrading performance compared to regular pointing. Empirical studies have evaluated the effect of similar techniques on selecting moving targets and on acquisition time [16, 24]. However, these techniques face problems similar to those with stationary targets, such as visual distraction and ambiguity in the case of multiple targets. In real applications such as games (e.g., real-time strategy games where the selection of moving objects is not the task of
the game) and interactive videos, this is an issue that could degrade selection performance. In this paper we use our approach of freezing target motion to allow selection without visual distraction. An overview of our methodology is provided in the following sections.
3 Modeling Target Acquisition

Our approach extends Fitts' Law to accommodate moving targets in both 1D and 2D. Fitts' model [11], shown in Equation 1, is valid for stationary objects in 1D. Jagacinski et al. [19] and Hoffmann [17] showed that this model fails to accurately predict the acquisition time for targets moving at a constant speed. They found that including target speed in the index of difficulty of a task yields an excellent fit with the mean acquisition time.
Fig. 1. Analysis of a moving target in 1D
Therefore, the model used to predict the acquisition time of moving targets must include target speed. To accommodate target speed in the model, we applied a model proposed by Card [8] and extended it to moving objects. Each acquisition task involves a ballistic phase and a corrective phase, which can be considered as smaller acquisition subtasks. Card used these subtasks to derive a model similar to the approach taken in the steering law for a straight path [1]. The acquisition time is determined by the human cognitive, perceptual and motor control systems. Each single movement needs perceptual processing time (τ_P), cognitive processing time (τ_C) and motor processing time (τ_M) to move the hand towards the target. Therefore, n correction movements will need n(τ_P + τ_C + τ_M) time to capture the target [8]. Applying Card's model, we track the remaining distance after each move, where the initial remaining distance (X_0) is the initial distance between the cursor and the target, D. Let the relative accuracy to reach a target in each move be α; then the remaining distance after the first move would be X_1 = αX_0 = αD. We must also take into account that our target is moving at constant speed V, so that by the end of the first move, after some time t, it will have moved Vt. Then X_1 will be αD + Vt as the target moves away from the cursor, as illustrated in Fig. 1. Continuing with this argument, we get the following for n moves:
X_n = \alpha X_{n-1} + Vt
    = \alpha^n D + Vt\left(1 + \alpha + \alpha^2 + \dots + \alpha^{n-2} + \alpha^{n-1}\right)
    = \alpha^n D + Vt\left(\frac{1 - \alpha^n}{1 - \alpha}\right)

using a geometric series. If the target is captured at this point, then X_n should be X_n = W/2, and solving for n gives:

n = -\frac{1}{\log_2 \alpha}\,\log_2\left(\frac{D \pm \frac{V}{K}}{\frac{W}{2} - \frac{V}{K}}\right)          (7)
MT = n\,(\tau_P + \tau_C + \tau_M)          (8)

   = \frac{-(\tau_P + \tau_C + \tau_M)}{\log_2 \alpha}\,\log_2\left(\frac{D \pm \frac{V}{K}}{\frac{W}{2} - \frac{V}{K}}\right)          (9)
where K = (1 − α)/t is empirically determined and the ± indicates the direction of movement, i.e., away from or towards the cursor. The resulting index of difficulty is then:
ID_{C1} = \log_2\left(\frac{D \pm \frac{V}{K}}{\frac{W}{2} - \frac{V}{K}}\right)          (10)
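For illustration only (this snippet is not part of the original paper), Equation 10 can be computed directly in Python; the value of K used below is a placeholder, since K is fitted empirically, and the sign argument encodes whether the target moves away from (+1) or towards (-1) the cursor.

import math

def id_c1(D, W, V, K, sign=+1):
    """Index of difficulty for a 1D moving target (Equation 10).
    sign=+1: target moving away from the cursor; sign=-1: moving towards it."""
    numerator = D + sign * V / K
    denominator = W / 2.0 - V / K
    if denominator <= 0 or numerator <= 0:
        raise ValueError("model undefined: target too fast relative to its width")
    return math.log2(numerator / denominator)

# Example with a placeholder K: a 20 px wide target, 300 px away, moving away at 175 px/s
print(id_c1(D=300, W=20, V=175, K=50))

As the formula predicts, raising V or shrinking W in this sketch increases the returned index of difficulty.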
This model coincides with Hoffmann's first-order continuous-control model [17] for moving targets in Equation 6. For our second model of moving target selection in 2D, we look first at how the acquisition time of stationary objects in 2D is modeled and then extend it to moving targets. We adopted Accot and Zhai's weighted Euclidean model (shown in Equation 5) [2]. However, as mentioned earlier, this model does not account for possible differences in performance due to varying movement angles. Therefore, we applied an approach proposed by Grossman and Balakrishnan [12] for pointing at targets in 3D. They accommodated angles by adding an additional empirically determined weight parameter f_{W,H,D}(θ) (for width, height and depth) to each component in the weighted Euclidean model. We applied their model by removing the third-dimension constraint (depth). Hence our modified weighted Euclidean model becomes:
For the sake of completeness, this model is compared with IDW’ (Equation 3) proposed by MacKenzie and Buxton [21] after extending it to accommodate the possible effects of different target dimensions and various movement angles. From this perspective, the model becomes:
Next, we need to include the target speed in the ID_{P2} model in Equation 11. Therefore, we combined the model we discussed for moving targets in 1D (ID_{C1}) and the model for stationary targets in 2D (ID_{P2}). We revised our ID_{C1} model and broke the target velocity V into x and y components as follows:
V_x = V\cos(\theta), \qquad V_y = V\sin(\theta)
Thus the resulting index of difficulty IDC2 incorporating target speed is
ID_{C2} = \log_2\left(\sqrt{f_W(\theta)\left(\frac{D \pm \frac{V_x}{K}}{\frac{W}{2} - \frac{V_x}{K}}\right)^2 + f_H(\theta)\left(\frac{D \pm \frac{V_y}{K}}{\frac{H}{2} - \frac{V_y}{K}}\right)^2} + 1\right)          (13)
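As a hedged illustration (not from the paper), Equation 13 can be sketched in Python. The angle-weight functions f_W(θ) and f_H(θ) and the constant K are empirically fitted in the study, so the simple placeholders below are assumptions, and the weighted-Euclidean combination of the two terms follows the form of Equation 5.

import math

def id_c2(D, W, H, V, theta, K, f_w=lambda t: 1.0, f_h=lambda t: 1.0, sign=+1):
    """Index of difficulty for a 2D moving target (Equation 13).
    f_w and f_h are angle-dependent weights (unit placeholders here);
    sign=+1 means the target moves away from the cursor."""
    vx, vy = V * math.cos(theta), V * math.sin(theta)
    term_w = f_w(theta) * ((D + sign * vx / K) / (W / 2.0 - vx / K)) ** 2
    term_h = f_h(theta) * ((D + sign * vy / K) / (H / 2.0 - vy / K)) ** 2
    return math.log2(math.sqrt(term_w + term_h) + 1)

# Example: circular 20 px target, 300 px away, moving at 175 px/s along 45 degrees, placeholder K
print(id_c2(D=300, W=20, H=20, V=175, theta=math.radians(45), K=100))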
Incorporating target velocity V in a similar manner into the ID_{WtW'θ} model results in
ID_{VWtW'\theta} = \log_2\left(f_{W'}(\theta)\,\frac{D \pm \frac{V}{K}}{\frac{W'}{2} - \frac{V}{K}}\right)          (14)
The ID_{VWtW'θ} model looks complicated as it is presented in the current formulas; however, by using vector notation, we could rewrite it in a much simpler way and it could even be updated to 3D with an update to the assumptions.
\vec{V} = \frac{1}{K}\begin{bmatrix} V_x \\ V_y \end{bmatrix}, \quad \vec{R} = \frac{1}{2}\begin{bmatrix} W \\ H \end{bmatrix}, \quad \vec{D} = \begin{bmatrix} D_x \\ D_y \end{bmatrix}, \quad \text{and} \quad \vec{F} = \begin{bmatrix} f_W(\theta) \\ f_H(\theta) \end{bmatrix}
By taking \vec{V} as the velocity vector, \vec{R} as the object vector, \vec{D} as the distance vector between the cursor and the target, and \vec{F} as the weight vector, the extended model can be written compactly in terms of these vectors.
From the ID_{VWtW'θ} model (Equation 14) we can see that the index of difficulty increases as the speed increases or the size decreases, other factors being kept constant. The model also predicts that targets moving towards the cursor (i.e., chasing behaviour) have a larger index of difficulty than those moving away (i.e., pursuit behaviour). To validate these models, we conducted a user study as described later in the paper. We did not use sports videos for our user studies, to avoid uncontrolled factors that could make validating the models challenging. Instead we ran controlled experiments following Fitts' protocol.
4 Moving Target Selection Technique

We created a selection technique called Hold to overcome some of the drawbacks of previous methods and to reduce the difficulty of moving target selection. Our approach removes speed as a contributing factor to the index of difficulty of selecting moving targets, thus reducing the task to a simple 2D static selection task. Hold works as follows: when a user presses the mouse button down, the moving targets temporarily pause while the user interacts with them; when the user releases the button, the targets start moving again. The active engagement of the motor system keeps users aware of the temporary nature of the paused state, which should reduce confusion [7, 22]. This approach removes target speed from the task of selecting a moving target, leaving only its distance from the pointer and its relative size. This is in contrast to the traditional chasing technique, which involves moving a cursor over a moving target and accurately selecting it before it moves out of the way (we refer to this technique as Chase). Pausing the interface theoretically removes the target speed, reducing the movement time of the pointer because the pointer speed no longer needs to be coupled to the target speed. For the same reason, we expect the error rate to be reduced. The main adjustment for users is that selection is done with a mouse-up event after mouse movement, which differs from the usual mental model for selection. Our experiment, described next, tests this hypothesis and compares our Hold technique with the usual Chase technique, as well as with a hybrid of the two, to see whether users can seamlessly and effectively use a combination of both.
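To make the interaction logic concrete, the following is a minimal sketch (an assumed reconstruction in Python; the study software itself was written in Flash) of how Hold can be wired to mouse events: button-down freezes all target motion, button-up selects if the cursor is over a target and otherwise resumes motion.

class HoldSelector:
    """Sketch of the Hold technique: mouse-down freezes all targets,
    mouse-up over a target selects it, otherwise motion resumes."""

    def __init__(self, targets):
        self.targets = targets      # list of dicts with x, y, vx, vy, radius
        self.paused = False

    def on_mouse_down(self, x, y):
        self.paused = True          # freeze target motion for the corrective phase

    def on_mouse_up(self, x, y):
        hit = next((t for t in self.targets
                    if (x - t["x"]) ** 2 + (y - t["y"]) ** 2 <= t["radius"] ** 2), None)
        self.paused = False         # motion resumes whether or not the selection succeeded
        return hit                  # the selected target, or None on a miss

    def tick(self, dt):
        if not self.paused:         # targets only move while the button is up
            for t in self.targets:
                t["x"] += t["vx"] * dt
                t["y"] += t["vy"] * dt

# Usage: one wisp moving right at 175 px/s
sel = HoldSelector([{"x": 100, "y": 100, "vx": 175, "vy": 0, "radius": 18}])
sel.on_mouse_down(0, 0)            # freeze
print(sel.on_mouse_up(100, 100))   # selects the (paused) target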
5 Empirical Validation of Moving Target Models

For our model validation, we ran a controlled experiment following the standard protocol used in Fitts' experiment (discrete point-select) for both 1D and 2D, where subjects are required to begin with the pointing device at a designated start location and
move to within the target with a controlled set of independent variables (selection type, distance, size, velocity, angle and direction). The test environment was structured as a game developed in Flash CS4 called "Catch the Wisp" (based on a previous game created by Ilich [18]) to test the three approaches: Chase, Hold, and Hybrid. In this game the three interaction techniques were abstracted as game objects that we developed as potions. The game's mental model was designed through an iterative design strategy to avoid confusing subjects and to establish a simple analogy that minimizes the training and confounding explanation that traditional Fitts experiments would require when directly applied to moving targets. The three techniques are:

Chase: The user pursues and selects a moving target by predicting its movement and clicking the left mouse button when the cursor is over it. In the state space shown in Fig. 2, State 0 represents a Button-Up state with no initial target selection, while State 1 represents a Button-Down state where an object has been selected.

Hold: The user is able to freeze all targets' motion by holding the left mouse button down. With the mouse button depressed, the user can move the cursor over the paused object and release the button to select it. Releasing the button while the cursor is not over a valid target will resume the motion of all targets in view. In the state space shown in Fig. 2, State 0 represents a Button-Up state with no initial target selection and the target in motion; State 1 represents a Button-Down state where the target has been frozen and a subsequent Button-Up event with the cursor over the target will result in target selection.

Hybrid: This technique is a hybrid of the previous two, in which the user is free to chase the target and reduce the initial amplitude of movement until they decide to click, hold the mouse button down and freeze the target for a final precision or corrective phase of movement. In this technique, the target can be acquired by either a Button-Up or a Button-Down event, provided the cursor is directly over the target. In the state space illustrated in Fig. 2, State 0 represents a Button-Up state with no initial target selection and the target in motion; State 1 represents a Button-Down state where the target has been frozen and selected if the cursor was over it. A subsequent Button-Up event with the cursor over the target will also result in target selection.
Fig. 2. State Transition Diagram for methods of interaction
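The two-state behaviour of Fig. 2 for the Hybrid condition can be encoded as a small transition function; this is an assumed sketch for illustration, not the experiment's actual implementation.

def hybrid_event(state, event, over_target):
    """State 0: button up, target moving. State 1: button down, target frozen.
    Returns (new_state, selected) for a mouse event in the Hybrid technique."""
    if state == 0 and event == "down":
        # Button-down freezes the target; if the cursor is already over it, select now.
        return 1, over_target
    if state == 1 and event == "up":
        # Button-up selects if over the target; otherwise motion resumes.
        return 0, over_target
    return state, False

print(hybrid_event(0, "down", over_target=False))  # (1, False): frozen, keep moving the cursor
print(hybrid_event(1, "up", over_target=True))     # (0, True): selected on release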
The user study consisted of two phases: we first compare the performance of Chase and Hold, and then observe user behavior in the Hybrid model (Chase-or-Hold) to see if users can effectively use both techniques together seamlessly. The challenge in creating a test environment for moving targets is that we needed to ensure that we could reliably place the moving targets in a predetermined position across conditions and between subjects. Thus, we used a game called "Catch the Wisp" where the target object was abstracted as a wisp, or a ball of light from folklore, that would start at a fixed distance from the cursor. To make sure that the cursor was at a constant distance from the target, users had to roll the mouse over a target start location, called a potion in the game metaphor, which is located at a predefined position relative to the target. Users were presented with three different colors for the start location: red, blue and green, representing the three different techniques Chase, Hold and Hybrid, respectively. The behaviour of each potion was framed as part of the game, such as paralysis or removing a wisp's shield, to provide a mental model consistent with the technique being tested. Using this game, we could observe the acquisition of a variety of targets while the users remained engaged and followed instructions without making mode errors with respect to technique.

5.1 Apparatus
The user study was conducted on a Toshiba laptop with a 2.10 GHz Core2 Duo CPU and 2 GB of RAM running Windows XP Professional. For the purposes of this experiment, "Enhance pointer precision" was disabled and the pointer speed was set to 6 (of 10). The laptop LCD display was used at a resolution of 1280 x 800 pixels with a refresh rate of 60 Hz. A Microsoft optical wheel mouse was used as the input pointing device, and the Adobe Air environment was set to 1024 x 640 pixels while running the Flash program.

5.2 Participants
Fifteen paid university students, eleven female and four male, participated in the experiment. Participants ranged in age from 19 to 36, were all experienced computer users and had either normal or corrected-to-normal vision. None of the participants was color blind. All participants controlled the mouse with their right hand. Participants reported playing computer games rarely or never.

5.3 Procedure
Participants played the "Catch the Wisp" game and were asked to capture a moving wisp using the mouse functions prescribed by the test condition, which was indicated by the color of the start location (i.e., the potion). The destination target, i.e., the wisp, is presented as a white circle protected by a shield which is disabled differently by each technique. For the red potion, simply rolling the mouse over the potion disables the shield, allowing users to capture the wisp by clicking on it; this is the Chase mode. For Hold, we use a blue potion. When rolling over the blue potion, a blue web (24 pixels by 20 pixels) appears at the potion location, and depressing the left mouse button over the blue web removes the wisp's shield and freezes its motion as if paralyzed. Keeping the mouse button down, users can then drag a thread from the
web to the wisp and release the button over the wisp to catch it. With the green potion, the shield is removed when they roll over it. At that point, users can decide to pursue the wisp and click on it to catch it or, if they prefer, they can hold the left button down, freezing the target and displaying a diagonal cross hair where the cursor is. They can then drag the diagonal cross hair over the wisp and release the button to catch it. These three methods are shown in Fig. 3, depicting the start mode (left panel) and the selection mode (right panel).
Fig. 3. Experiment Acquisition Types, Red Potion/Chase (a,b), Blue Potion/Hold (c,d) and Green Potion/Hybrid (e,f)
The game was structured as a series of trials; in each trial users were asked to select one moving target which starts from a predefined location at a constant distance from a start location. In each trial, the target has a constant size and moves at a constant speed along straight lines that bounce off the edges of the screen. Rolling over the start location activates the corresponding technique, causing the start target (i.e., the potion) to disappear and the target (i.e., the wisp) to start moving. Participants started by completing a tutorial of 24 trials that included on-screen cues showing the next action to perform, such as "Move the mouse over the potion" or "Click on the web to pause the wisp". After completing the tutorial, participants started the game, which consists of sixteen sets (8 red and 8 blue). Each set contains 36 trials, so in total the game has 576 trials. The sixteen sets alternated between Chase (red) and Hold (blue), with one type of condition presented in each set. Upon completion of the Chase/Hold sets, participants tested the Hybrid (green) condition. They did twelve practice trials with the Hybrid condition and then completed 72 Hybrid trials, presented as a bonus set to keep in line with the game metaphor of the experimental setup. For each trial, participants had a maximum of five attempts to select the target, after which the trial ended and they advanced
to the next trial. At the end of each set, a message was displayed informing the participants of the number of targets they had caught in that set, the total credit they had achieved up to that point and their best acquisition time. Participants were allowed to take breaks after the completion of each set. They were instructed to be as accurate and as fast as possible.

5.4 Experimental Design
The user study was run as two distinct experiments, divided between targets moving in 1D and in 2D, since having one subject do both conditions was deemed to take too long. To quantitatively facilitate a comparison of performance between Chase trials and Hold trials, a repeated-measures within-participant factorial design was used. The selection method, determined by the potion color, was the independent variable. In order to determine the relation between task difficulty and acquisition time, we used four other independent factors:

Size (W): The size of the wisp in each trial, as measured by the radius of its circle in pixels. The three levels used are 10 pixels, 18 pixels or 30 pixels.

Speed (V): The constant speed at which each wisp traveled. Three levels were used: 100 pixels/sec, 175 pixels/sec or 250 pixels/sec.

Direction: The direction of the wisp from the potion. We used two levels: toward or away from the potion.

Movement Angle (θ): The angle between wisp and cursor. The angles are 0 and 180 degrees in 1D, and 45, 135, 225 and 315 degrees in 2D, relative to a horizontal axis through the center of the potion.
These factors were fully counterbalanced between trials. Within each selection method, participants in the 2D experiment tried 3 x 3 x 2 x 4 conditions, carrying out 4 targeting trials for each condition, while participants in the 1D experiment tried 3 x 3 x 2 x 2 conditions, with each condition carried out 8 times. Therefore, for each dimension participants were presented with 576 trials distributed evenly between the two selection techniques: 288 Chase and 288 Hold. In order to compare the results with traditional Fitts tasks, different sizes were used while keeping the distance to the target constant. Varied speeds of movement were used to determine their effects on the index of difficulty of moving targets. Finally, we considered that objects moving away from the cursor may be easier (or harder) to select than objects moving towards the cursor, so we included trials for each to test this. For the second phase of the experiment, where participants used the Hybrid method (green potion), a qualitative measure was used to observe their acquisition behavior. In this phase, participants had the option to pursue the target, freeze the target, or use a combination of both in order to select the moving target. We were looking to see whether subjects optimize the two, since approaching a target that is moving towards you may achieve a faster acquisition time than pausing it; however, it may come at the price of accuracy, as well as poorer time performance if you miss the first rendezvous with the target.
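For illustration, the fully crossed 2D condition set described above can be generated with a few lines of Python; the counts follow the text (3 x 3 x 2 x 4 conditions, 4 trials each, 288 trials per technique), while the simple shuffling used here merely stands in for the study's full counterbalancing scheme.

import itertools, random

sizes = [10, 18, 30]                 # target radius in pixels
speeds = [100, 175, 250]             # pixels per second
directions = ["toward", "away"]
angles_2d = [45, 135, 225, 315]      # degrees

conditions = list(itertools.product(sizes, speeds, directions, angles_2d))
trials = conditions * 4              # 3 x 3 x 2 x 4 conditions, 4 trials each
random.shuffle(trials)               # assumed randomization; the study counterbalanced fully

print(len(conditions), len(trials))  # 72 288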
5.5 Performance Measures
For this study we used the acquisition time and the number of errors as our dependent variables. The acquisition time MT was measured from the time the start location (i.e., the potion) was activated, when the mouse rolled over it, until the time the moving target was captured. The number of clicks or mouse-up events that did not result in the moving target's capture was counted as errors. Moreover, the positions of the cursor and the moving target were recorded every frame. Every time a participant froze the target was also recorded, along with the total acquisition time, in order to study participant behavior.

For the Hybrid set, a qualitative analysis consisted of a statistical measure of the distribution of participant behaviors among four categories:

Chase: The participant chose to pursue the target without freezing it.

Hold: The participant chose to freeze the target immediately after the activation of the start location.

Hybrid: The participant chose to start pursuing the target and then freeze it when closer to it.

Error Correction: The participant missed the target and tried to correct the miss.

We categorized each trial into one of the above four categories as follows (see the sketch after this list): if the trial had an error, it was categorized as Error Correction; otherwise, if the participant did not Hold during the trial, Chase behaviour was selected; otherwise, if the participant did Hold, we compared the cursor position when the last Hold event occurred with the initial cursor and wisp positions: if the distance moved was less than the remaining distance to the wisp, it was considered Hold behavior; else it was Hybrid behavior.
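The categorization rules map directly onto a short function; this is a hedged sketch with assumed variable names, comparing against the initial cursor and wisp positions as the text describes.

import math

def categorize_trial(errors, hold_events, start_cursor, start_wisp, last_hold_cursor):
    """Classify a Hybrid trial as described above. Positions are (x, y) tuples;
    hold_events is the number of times the participant froze the target."""
    if errors > 0:
        return "error_correction"
    if hold_events == 0:
        return "chase"
    moved = math.dist(start_cursor, last_hold_cursor)
    remaining = math.dist(last_hold_cursor, start_wisp)
    # If the target was frozen after covering less than the distance still
    # separating the cursor from the wisp, treat it as an immediate Hold.
    return "hold" if moved < remaining else "hybrid"

print(categorize_trial(0, 1, (0, 0), (300, 0), (50, 0)))   # 'hold'
print(categorize_trial(0, 1, (0, 0), (300, 0), (220, 0)))  # 'hybrid'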
5.6 Results

Both a repeated-measures ANOVA and a Generalized Linear Mixed Models (GLMM) test were carried out to test the significance of the results. We report the GLMM results below, since our data do not have a normal distribution (they are positively skewed); both analyses yield almost the same significant effects. Selection technique, size, speed, angle and direction were taken as fixed factors in the GLMM test, while subject and trial were taken as random factors. Outliers were removed based on the acquisition time and number of errors, such that any data point with an extreme acquisition time (where its frequency dropped to zero) or with 5 errors (since subjects were allowed only 5 attempts in each trial) was not included. In total, 0.45% of the 1D data and 0.90% of the 2D data were removed as outliers. Six subjects participated in the 1D selection task experiment, and nine subjects participated in the 2D selection task experiment.

1D Selection Task

Phase 1: Our data were positively skewed, so we used a Generalized Linear Mixed Models (GLMM) test to analyze the dependent variable acquisition time. The analysis showed that the independent variables selection technique (F(1, 3099.89) =
260.26, p < 0.001), size (F(2, 3099.38) = 301.26, p < 0.001), speed (F(2, 3099.38) = 5.02, p = 0.007), direction (F(1, 3099.27) = 6.08, p = 0.014), and angle (F(1, 3099.37) = 101.36, p < 0.001) had significant effects on the acquisition time. Moving to the left (θ = 180) was found to be faster than moving to the right (θ = 0) for both techniques. Post-hoc comparison showed that the mean acquisition time for Hold (M = 981.838, SD = 69.015) was longer than for Chase (M = 835.525, SD = 63.931). This coincides with participants' feedback that the blue potion (Hold) gave them the impression of having enough time to select the target, so they did not have to rush as in the red potion (Chase) trials; hence it slowed their reactions and they took longer to get started after rolling over the potion. The technique x size x speed x angle interaction also showed a significant effect (F(4, 3099.31) = 3.94, p = 0.003) on time. The overall mean acquisition times for the techniques by size, speed, angle and direction are illustrated in Fig. 4(i). Errors were also analyzed using the same GLMM test as acquisition time. A significant effect on the number of errors was observed for selection technique (F(1, 3105.66) = 380.17, p < 0.001), size (F(2, 3101.72) = 17.35, p < 0.001), speed (F(2, 3101.69) = 3.33, p = 0.036) and angle (F(1, 3101.61) = 23.34, p < 0.001). Direction (F(1, 3100.85) = 0.182, p = 0.67) had no main effect on the number of errors. Chase produced more errors (M = 0.073, SD = 0.76) than Hold (M = 0.015, SD = 0.36), illustrating the speed-accuracy trade-off between the two selection techniques, consistent with users feeling they had to rush in the Chase mode.
Fig. 4. The effect of size, speed, angle and direction on mean acquisition time for (i) 1D and (ii) 2D
Phase 2: The 72 trials of the green potion were compared with the last 72 trials from each technique in the first phase of the experiment using a GLMM test. A significant effect was observed on acquisition time (F(2, 272.09) = 10.49, p < 0.001). Hybrid (M = 842.68, SD = 420.49) was faster than Hold (M = 1056.67, SD = 475.18) and Chase (M = 960.67, SD = 859.28). Moreover, a significant effect was observed on the number of errors (F(2, 267.21) = 35.15, p < 0.001). Hold (M = 0.017, SD = 0.397) had fewer errors than either Chase (M = 0.089, SD = 0.827) or Hybrid (M = 0.082, SD = 0.995).
2D Selection Task
Phase 1: For the 2D selection task experiment, a GLMM test was conducted to compare the total acquisition time for each technique. There was a significant effect (F(1, 4807.93) = 64.38, p < 0.001) in the overall comparison, and the mean acquisition time for Hold (M = 1239.874, SD = 67.079) was shorter than for Chase (M = 1450.053, SD = 127.114). The other independent variables size (F(2, 4807.58) = 668.66, p < 0.001), speed (F(2, 4807.30) = 65.50, p < 0.001), direction (F(1, 4807.14) = 156.26, p < 0.001), and angle (F(3, 4807.16) = 4.66, p = 0.003) showed significant effects on acquisition time. Pair-wise contrasts between angles showed a significant difference for the angle pair 45 and 315 degrees, which coincides with results found in [28]. This effect is most likely due to the fact that targets on either end of a vector are equally difficult to select and movement direction had no effect on acquisition time, as illustrated in [12]. The effect of angle on acquisition time depended on both size and direction, indicated by the size x direction x angle interaction (F(6, 4807.15) = 2.24, p = 0.037), and also on mode, size, speed and direction, indicated by the mode x size x speed x direction x angle interaction (F(12, 4807.14) = 2.29, p = 0.007). The mean acquisition times for each technique by size, speed, angle and direction are illustrated in Fig. 4(ii). In addition, significant effects were observed for size and speed combinations; Hold exhibited a lower mean acquisition time for the small targets as well as for the fast-moving targets, as shown in Fig. 5. A significant effect of technique was also observed on the number of errors (F(1, 4812.41) = 754.92, p < 0.001): Hold produced fewer errors (M = 0.009, SD = 0.014) than Chase (M = 0.083, SD = 0.098). The main effects of size (F(2, 4810.50) = 277.73, p < 0.001), speed (F(2, 4808.88) = 56.68, p < 0.001) and direction (F(1, 4808.01) = 5.56, p = 0.018) were also found to be significant. However, angle showed no significant main effect on errors.
Fig. 5. The 2D mean time by size and speed for Chase and Hold
Phase 2: A GLMM test was also conducted to analyze the 72 trials of the 2D Hybrid method. A significant effect was observed on acquisition time (F(2, 291.23) = 8.98, p < 0.001). Hybrid (M = 1162.07, SD = 635.63) was faster than Hold (M = 1198.16, SD = 499.15) and Chase (M = 1333.09, SD = 1343.40). Moreover, a significant effect was observed on the number of errors (F(2, 97.46) = 55.82, p < 0.001). Hold (M = 0.01, SD = 0.306) had fewer errors than either Chase (M = 0.096, SD = 0.96) or Hybrid (M = 0.054, SD = 0.76).
5.7 Model Fitting
By a least-squares fit, we estimated the coefficients of the models described earlier for Chase and Hold. We adopted the original Fitts' Law [11] in Equation 1 for trials involving Hold in 1D, while we tested Equations 11 and 12 for Hold trials in 2D. For trials involving Chase in 1D, we adopted Equation 10, while Equations 13 and 14 were adopted for trials in 2D. For 1D targets, the Fitts model (R2 = 0.9869) and ID_{C1} (moving away R2 = 0.9662, moving toward R2 = 0.9755) showed good fits with the experimental data. The ID_{P2} (R2 = 0.9717) and ID_{WtW'θ} (R2 = 0.9505) models for stationary targets in 2D also showed an excellent fit with the experimental data for Hold. For the 2D moving target models, ID_{VWtW'θ} (R2 = 0.9099) showed a good correlation with the data, while the ID_{C2} model exhibited poor correlation for some angles. The poor correlation can be explained, as discussed earlier, by the fact that the angle pairs (45, 225) and (135, 315) lie along the same diagonal vector. As in previous research, each pair can be considered as one movement angle; this was shown in [28], where a significant difference existed only between angles 45 and 315. When each pair was considered as one movement angle, the correlation improved (R2 = 0.9027) and exhibited performance similar to ID_{VWtW'θ}. Another factor could be the target dimensions: the ID_{C2} model gives different weights to the width and height of the target, whereas we used a circular target in our experiment, where height and width are equal. Fig. 6 illustrates the average acquisition time versus the index of difficulty for both the ID_{VWtW'θ} and ID_{C2} models.
Fig. 6. Mean Acquisition Time Vs. Index of difficulty for Chase in 2D using (a) IDVWtW’θ , and (b) IDC2 models
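For illustration, the linear fit MT = a + b * ID and its R^2 can be obtained by ordinary least squares, for example with NumPy; the ID and mean-time values below are placeholder numbers, not the study's data.

import numpy as np

# Placeholder (ID, mean acquisition time in ms) pairs, one per condition
ids = np.array([2.1, 2.8, 3.4, 4.0, 4.7, 5.3])
mts = np.array([620, 710, 800, 905, 990, 1080])

# Fit MT = a + b * ID by ordinary least squares
A = np.column_stack([np.ones_like(ids), ids])
(a, b), residuals, *_ = np.linalg.lstsq(A, mts, rcond=None)

# Coefficient of determination R^2
pred = a + b * ids
r2 = 1 - np.sum((mts - pred) ** 2) / np.sum((mts - mts.mean()) ** 2)
print(f"a={a:.1f} ms, b={b:.1f} ms/bit, R^2={r2:.4f}")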
6 Discussion

For 1D we observed from the first experiment that Chase exhibited a lower acquisition time than Hold in all conditions. This trend can be interpreted in different ways. The overhead of clicking on the web and sustaining the dragging motion outweighed the benefit of freezing the target. In addition, freezing the target gives an illusion that time is also paused; in this state, subjects did not rush to select the target and therefore took longer. Adding a score to the game based on time taken, target size and speed did not help some users mitigate this effect. We think that adding a time limit to capture a target, as some subjects suggested, would help. Questionnaire
results, which subjects filled in after the experiments, agreed with these findings. Subjects preferred Chase as they felt it was faster, and they did not take accuracy into account. Subjects also tended to take longer to precisely release the button over the target in Hold, which resulted in fewer errors but a longer acquisition time. For 2D, however, Hold showed a faster acquisition time for conditions involving a target size of 20 px (small) as well as conditions involving target speeds of 175 px/s (moderately fast) and 250 px/s (fast). This contradicts the results for the selection task in 1D. We believe this can be explained by the fact that the target's travel was restricted to a horizontal path in 1D. Because of this restriction, the 1D target was more likely to rebound off the end of its path and approach the cursor, while in 2D the target moves at an angle and thus takes longer to hit a wall and rebound towards the cursor, giving users less opportunity to take advantage of this. We confirmed that speed has little impact on Hold, as observed in Fig. 4. In both 1D and 2D, subjects sacrificed accuracy for speed in Chase by attempting to click on the target in rapid succession, while they sacrificed speed for accuracy in Hold by carefully positioning the cursor over the target before releasing the mouse button.

The results of the second phase of the experiment in 1D showed that Hybrid reduced acquisition time by 12% relative to Chase and by 20% relative to Hold, while in 2D it showed a reduction of 13% over Chase and 3% over Hold, suggesting that subjects perform this optimization seamlessly. The mouse and target position logs for each subject were analyzed to categorize the technique used for each trial. The results are summarized by the percentage of trials for which each technique was chosen, by target size, speed, angle, and direction, in Fig. 7. In 2D, subjects tended to use a Hybrid approach more often, while in 1D subjects tended to use Chase, as they thought it was faster. Subjects in the 1D experiment commented that they had used Chase more often in the second phase due to unfamiliarity with Hybrid, and they claimed that Chase was faster and would thus optimize their acquisition time.
Fig. 7. Technique chosen by size, speed, angle, and direction in (i) 1D and (ii) 2D
The distributions of the ratios Chase/(Chase + Hold) and Hold/(Chase + Hold) for the second phase of the experiment were also analyzed and are summarized by target size, speed, angle, and direction in Fig. 8. Users tended to chase the target most
Fig. 8. The ratio distribution of Hold and Chase using Hybrid in (i) 1D and (ii) 2D space
of the time across the different conditions. However, they froze the target more often as the target got smaller and faster, which indicates a seamless and effective use of the two techniques.
7 Conclusion and Future Work

We performed studies evaluating three selection techniques for moving targets and investigated the effect of target size, speed, movement angle, movement direction and their interactions on acquisition performance in both 1D and 2D. In 2D, our Hold method provides faster and more accurate selection performance for small or fast-moving targets. Building upon prior work on 1D and 2D selection tasks, we introduced and validated variants of Fitts' Law that model the selection of moving targets in both 1D and 2D. We have shown that performance time for moving targets in 2D can be predicted for most situations using our models. For some angle variants the correlation is not as accurate as we would like, motivating further studies to test various angles and adjust the model.

The work we have presented has been validated for linear movement with a mouse as the input device, which provides a foundation for future research and validation of pointing models for chaotic targets and other input devices (e.g., the finger). We anticipate that Hold will work well for complex movement types, since less predictable motion is more difficult to select with Chase. Moreover, we think the Chase technique will be disadvantaged for touch due to the repeated tapping and lifting required when a selection fails, such as when trying to select small, fast-moving targets. However, we believe Hybrid would work well because one touch anywhere pauses the video and users just need to slide their finger to the target. Thus, if the first touch misses, keeping the finger down and sliding it to the target will allow easy correction. We suspect that users will optimize the selection time by approaching the target for a rendezvous before touching the display. Further, we also foresee that
rapid aimed movements, such as moving target selection, may be described by a hybrid of an iterative correction model and an impulse variability model [25], suggesting a new area for future research. The first movement towards the target could be considered an impulse task, where the user tries to hit the target within a specific time they have in mind, while the subsequent moves are corrective movements. Thus, a hybrid model of Fitts' law [11] and Schmidt's law [26] may be an effective way to characterize rapid aimed movements. Moving targets are likely to be common in future interfaces, and in these environments accuracy is generally more important than speed. We have shown that our Hold and Hybrid approaches provide effective interaction techniques that can be easily integrated into interfaces with moving targets.
References 1. Accot, J., Zhai, S.: Beyond Fitts’ law: models for trajectory-based HCI tasks. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, Georgia, United States, pp. 295–302 (March 1997) 2. Accot, J., Zhai, S.: Refining Fitts’ law models for bivariate pointing. In: Proceedings of ACM Conference on Human Factors in Computing Systems (CHI 2003), pp. 193–200. ACM, New York (2003) 3. Asano, T., Sharlin, E., Kitamura, Y., Takashima, K., Kishino, F.: Predictive interaction using the Delphian Desktop. In: Proc. UIST 2005, pp. 133–141. ACM, New York (2005) 4. Baudisch, P., Cutrell, E., Robbins, D., Czerwinski, M., Tandler, P., Bederson, B., Zierlinger, A.: Drag-and-pop and drag-and-pick: Techniques for accessing remote screen content on touch-and penoperated systems. In: Proceedings of Interact, pp. 57–64 (2003) 5. Baudisch, P., Zotov, A., Cutrell, E., Hinckley, K.: Starburst: a target expansion algorithm for non-uniform target distributions. In: Proc. AVI 2008, pp. 129–137. ACM, New York (2008) 6. Blanch, R., Guiard, Y., Beaudouin-Lafon, M.: Semantic pointing: improving target acquisition with control-display ratio adaptation. In: ACM CHI Conference on Human Factors in Computing Systems, pp. 519–525 (2004) 7. Buxton, W.: A three-state model of graphical input. In: Proceedings of the IFIP TC13 Third International Conference on Human-Computer Interaction, pp. 449–456 (August 1990) 8. Card, S.K.: The model human processor -a model for making engineering calculations of human performance. In: 25th Annual Meeting on Human Factors Society, Rochester, NY, United States, pp. 301–305 (October 1981) 9. Chapuis, O., Labrune, J., Pietriga, E.: Dynaspot: speed-dependent area cursor. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009, Boston, MA, USA, April 04-09, pp. 1391–1400. ACM, New York (2009) 10. Cockburn, A., Firth, A.: Improving the acquisition of small targets. In: British HCI Conference, pp. 181–196 (2003) 11. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47, 381–391 (1954)
12. Grossman, T., Balakrishnan, R.: Pointing at trivariate targets in 3d environments. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vienna, Austria, vol. 6, pp. 447–454 (April 2004) 13. Grossman, T., Balakrishnan, R.: The bubble cursor: enhancing target acquisition by dynamic resizing of the cursor’s activation area. In: Proc. CHI 2005, pp. 281–290. ACM, New York (2005) 14. Grossman, T., Balakrishnan, R.: A probabilistic approach to modeling two-dimensional pointing. ACM Trans. Comput.-Hum. Interact. 12(3), 435–459 (2005) 15. Guiard, Y., Blanch, R., Beaudouin-Lafon, M.: Object pointing: a complement to bitmap pointing in GUIs. In: Graphics Interface, pp. 9–16 (2004) 16. Gunn, T.J., Irani, P., Anderson, J.: An evaluation of techniques for selecting moving targets. In: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2009, Boston, MA, USA, April 04-09, pp. 3329–3334. ACM, New York (2009) 17. Hoffmann, E.R.: Capture of moving targets: a modification of Fitts’ law. Ergonomics 34, 211–220 (1991) 18. Ilich, M.V.: Moving Target Selection in Interactive Video. Master’s thesis, University of British Columbia, Vancouver, Canada (December 2009) 19. Jagacinski, R.J., Repperger, D.W., Ward, S.L., Moran, M.S.: A test of Fitts’ law with moving targets. Hum. Factors 22(2), 225–233 (1980) 20. Kabbash, P., Buxton, W.: The ”prince” technique: Fitts’ law and selection using area cursors. In: ACM CHI Conference on Human Factors in Computing Systems, pp. 273–279 (1995) 21. MacKenzie, I.S., Buxton, W.: Extending Fitts’ law to two-dimensional tasks. In: Bauersfeld, P., Bennett, J., Lynch, G. (eds.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1992, Monterey, California, United States, May 03-07, pp. 219–226. ACM, New York (1992) 22. MacKenzie, I.S., Sellen, A., Buxton, W.: A comparison of input devices in element pointing and dragging tasks. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology, New Orleans, Louisiana, United States, pp. 161–166 (1991) 23. McGuffin, M., Balakrishnan, R.: Acquisition of expanding targets. In: ACM CHI Conference on Human Factors in Computing Systems, pp. 57–64 (2002) 24. Mould, D., Gutwin, C.: The effects of feedback on targeting with multiple moving targets. In: Proceedings of Graphics Interface 2004, London, Ontario, Canada, May 17-19. ACM International Conference Proceeding Series, vol. 62, pp. 25–32. Canadian HumanComputer Communications Society, School of Computer Science, University of Waterloo, Waterloo, Ontario (2004) 25. Rosenbaum, D.: Human motor control. Academic Press, San Diego (1991) 26. Schmidt, R.A., Zelaznik, H.N., Hawkins, B., Frank, J.S., Quinn, J.T.J.: Motor-output variability: A theory for the accuracy of rapid motor acts. Psychological Review 86(5), 415–451 (1979) 27. Sheikh, I., Hoffmann, E.: Effect of target shape on movement time in a Fitts task. Ergonomics 37(9), 1533–1548 (1994) 28. Whisenand, T.G., Emurion, H.H.: Some effects of angle of approach on icon selection. In: Proceedings of the CHI 1995 Conference on Human Factors in Computing Systems, pp. 298–299. ACM, New York (1995)
29. Wobbrock, J.O., Fogarty, J., Liu, S.-Y.S., Kimuro, S., Harada, S.: The angle mouse: target-agnostic dynamic gain adjustment based on angular deviation. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, Boston, MA, USA, pp. 1401–1410 (April 2009) 30. Worden, A., Walker, N., Bharat, K., Hudson, S.: Making computers easier for older adults to use: area cursors and sticky icons. In: ACM CHI Conference on Human Factors in Computing Systems, pp. 266–271 (1997) 31. Wright, C.E., Meyer, D.E.: Conditions for a linear speed-accuracy trade-off in aimed movements. Quarterly Journal of Experimental Psychology 35(A), 279–296 (1983) 32. Zhai, S., Conversy, S., Beaudouin-Lafon, M., Guiard, Y.: Human on-line response to target expansion. In: ACM CHI Conference on Human Factors in Computing Systems, pp. 177–184 (2003)
Navigational User Interface Elements on the Left Side: Intuition of Designers or Experimental Evidence?
Andreas Holzinger 1, Reinhold Scherer 2, and Martina Ziefle 3
1 Medical University Graz, A-8036 Graz, Austria, Institute for Medical Informatics, Research Unit Human–Computer Interaction, Auenbruggerplatz 2/V, A-8036 Graz, [email protected]
2 Graz University of Technology, Austria, Institute for Knowledge Discovery, Laboratory of Brain-Computer Interfaces, Krenngasse 37/IV, A-8010 Graz, [email protected]
3 RWTH Aachen University, Germany, Communication Science, Human Technology Centre (Humtec), Theaterplatz 14, D-52062 Aachen, [email protected]
Abstract. Humans tend to direct their attention toward the left half of their field of vision, a phenomenon known as visual pseudoneglect. Most navigational elements are placed on the left side. However, there is neither a theoretical reasoning nor empirical evidence why these elements should be placed on the left. In the present study we examined three independent variables: presentation side of elements (left, right), number of elements (one, three, five), and a visual cue prior to selection (with cue, without cue). Dependent variables were selection times and accuracy of task completion. 50 participants were exposed to elements consisting of single words in bubbles. After clicking on the start element in the middle of the screen, a number of elements were presented randomly on the left or right. In 50% of trials the presentation side was announced in advance by a visual cue. We tested whether and to what extent there is a preference for, and a performance (correct selection time) increase with, elements placed on the left side. When the cue was presented, performance increased; without cue information, elements on the left were selected faster. The use of cues resulted in no significant differences between the left and right side. A significantly better performance was found when only one element was presented on the left. With an increasing number of elements, the performance decreased. The results of this study suggest that the presentation of elements on the left side is advantageous for the speed of information processing only in the case of single elements. When selecting among several options (three, five), placing elements on the left does not affect the selection performance. Keywords: navigation, graphical user interface design, pseudoneglect, visual attention, performance, selection times.
1 Introduction and Motivation for Research It is common practice that user interface designers place navigational elements on the left side of user interfaces. Possibly, they are assuming an elementary benefit in information perception, encoding and cognitive processing. Moreover, although there is only little empirical evidence (e.g. [1]), many design guidelines exist which recommend putting navigational elements on the left side, e.g. [2], [3], [4]. However, an underlying theory along with an experimental proof seems not to exist. The "rule to arrange navigational buttons on the left" [5] seems to be a well-known designer intuition, not grounded in theory. However, we must be careful with the term "intuitive", which is not properly defined; any common definition of the term makes its use inappropriate in interface design [6]. Interestingly, we experienced that when designers placed navigational elements explicitly not on the left side, e.g. on the right side (as in figure 1), end users expressed that they "intuitively" did not like it – possibly because they are used to having the navigational elements on the left side. Motivated by these practical experiences, we began to look for a possible reason, a theory, and an experimental setting to answer the question. Consequently, this study critically investigates whether and to what extent the positioning of navigational elements actually benefits selection performance. In section 2 we provide some related work on which we build, in section 3 we provide some necessary theoretical background, in section 4 we present the questions addressed and the research logic of our experiment, in section 5 we describe our methods and materials, in section 6 we present our results, and in section 7 we draw conclusions and provide a future research outlook.
Fig. 1. After the "modern" designers placed the navigational elements on the right side, end users massively complained that they "do intuitively feel that this is wrong"
2 Related Work Kalbach & Bosenick (2003) [1] compared two Web page layouts (see figure 2): one with the main site navigation menu on the left of the page, and one with the main site
navigation menu on the right. N=64 participants (44 of them female) were divided equally into two groups and assigned to either the left-hand or the right-hand navigation test condition. Using a stopwatch, the time to complete each of five tasks was measured. The hypothesis that the left-hand navigation would perform significantly faster than the right-hand navigation was not supported. Instead, there was no significant difference in completion times between the two test conditions. However, this work questioned the prevailing Web design thinking as well as the common practice that the main navigation menu should always be left-justified. In their experiments, the participants were asked to carry out six different tasks, which required the use of the navigational elements. Interestingly, on average, the time to perform the tasks was about 3.7 seconds shorter when elements were placed on the left side (left M=30.5, SD=34.3; right M=34.2, SD=35.8). Kalbach & Bosenick performed an ANOVA for all tasks; however, again there was no significant difference between the two conditions.
Fig. 2. The Web pages tested in the experiment by Kalbach & Bosenick [1]
Van Schaik & Ling (2001) [7] empirically evaluated design guidelines concerning the arrangement of navigational elements. They wanted to discover possible effects of the frame layout (the arrangement of the navigational elements) along with the contrast of the background on visual search performance. There were two independent variables: background contrast (same vs. different) and frame layout (arrangement of the navigational elements: left, right, top, bottom). As experimental material they used a Web page, which consisted of two parts: a main frame and a menu frame. The latter contained five hypertext links in blue text colour with a word length of 5 to 10 letters. They presented the navigational elements in different areas of the page (left, right, top, bottom) and measured the accuracy and time of correct and incorrect answers and the subjective preference for a certain design. In their study they observed N=189 participants (146 of them female) and allocated 90 participants to the experimental condition "same contrast" and 99 participants to "different contrast". In the first experimental condition the navigational frame and the main frame both had a grey background; in the other condition the navigational frame was white and the main frame was grey. In each trial they first presented a blank white screen and then a word (the target, in black).
Afterwards, the participants had to find this word among the five hypertext links. The word was either present or absent. Only if the participants found the word did they press a button. Van Schaik and Ling analysed the hits and the misses and measured reaction times. A main effect was detected for hits [F(3,561)=4.473, p<0.01]: on the right side (M=97.47, SD=6.84) there were fewer hits than on top (M=98.88, SD=5.79) or on the left side (M=98.77, SD=5.17) (Tukey's Honestly Significant Difference, p<0.05). The reaction times (in milliseconds) also revealed a main effect [F(3,561)=39.910, p<0.001]; for the right side (M=1371, SD=269) and the bottom, much longer times were measured than for the top (M=1302, SD=300) or the left side (M=1305, SD=235) (Tukey's HSD, p<0.05). In conclusion the authors recommend arranging navigational elements on the left side or on top, and they attributed these effects to intercultural influences (e.g. reading culture).
3 Theoretical Background 3.1 Visual Pseudoneglect It is known that humans primarily attend to objects on the left side of space, as has been shown in classic cancellation tasks routinely used during neuropsychological testing (see figure 3, second from the left, and compare figure 4).
Fig. 3. Left (1-2): Settings to test effects on visual pseudoneglect [8]; Right (3-4): Settings where such effects were observed with humans [9]
Neglect and pseudoneglect are asymmetries of spatial attention, which are often assumed to have a fundamental theoretical and neurological relationship to each other, although there is much dispute about this. Basically, when neurologically normal individuals bisect a horizontal line as accurately as possible, they reliably show a slight leftward error. This leftward inaccuracy is known as pseudoneglect [10], [11], [12]. Diekamp et al. (2005) [8] (see figure 3 (left) and figure 4) tested birds in a task that closely matches these cancellation tasks: the birds were required to explore an area in front of them and to sample grains. The results showed that the birds displayed a clear bias toward the left side, as evident in the pecking activity and the order in which pecks were placed on the left or right side. Returning to humans: line bisection tasks are a commonly used procedure for testing spatial attention.
Several previous studies have found that patients with so-called left hemispatial neglect bisect long lines too far to the right but bisect short lines too far to the left, and some studies reported that normal participants bisect long lines too far to the left, presumably reflecting an over-estimation of the left side due to the role of the right hemisphere in attention [13]. Jewell & McCourt (2000) [11] provide a large-scale review of the literature concerning visual and non-visual line bisection in 2191 neurologically normal subjects. The meta-analytic results indicate a significant leftward bisection error in neurologically normal subjects, with an overall effect size of between −0.37 and −0.44 (depending on the integration method).
Fig. 4. The results from Diekamp et al. (2005) [8]
3.2 Fitts' Law and Hick's Law: Human Performance Principles No paper about navigation can avoid mentioning these two laws. According to Fitts (1954) [14], the average duration of responses is directly proportional to the minimum average amount of information per response. Fitts also discovered that movement time is a logarithmic function of distance when target size is held constant, and that movement time is also a logarithmic function of target size when distance is held constant; the time to acquire a target is therefore a function of the distance to and the size of the target. Fitts' law has been applied to user interface design [15]. The time for pointing tp is a function of both the distance d of the reach and the width w of the target object. This relationship can be expressed in the following formula, where a and b are empirical constants that vary depending on the device used:
tp = a + b * log2(2d/w)
The expression log2(2d/w) is, according to Shneiderman (1997) [16], also called the index of difficulty (ID). Understanding how manipulations of d and w affect the time tp is of considerable importance in developing efficient user interfaces. For our study, we had, within certain parameters (including the dimensions of the screen), full control over the size of target objects and the distance of the user from the screen, and we had experience from previous work [9], [17]. However, we must emphasize that the formula tp = a + b * log2(2d/w) is not quite correct in all cases. Consider the case where, e.g., your finger already happens to be where you wish to touch when a new screen comes up. In that case d = 0, and log2(0) is undefined, so the formula provides no result (it should give a as the time). Similar to Fitts' Law, the so-called Hick's Law [18] describes the time it takes to make a decision as a function of the number of choices available to the end user; consequently it assesses cognitive information capacity in choice reaction experiments. The reaction time rt rises logarithmically as the number of alternatives n increases:
rt = k * log2(n+1)
Both laws are surviving human performance principles based originally on Shannon and Weaver's (1949) Information Theory concept [19]. With the advent of user interfaces, these laws were used as fundamental design principles [20]. A good discussion of the impact of these two laws can be found in [21].
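As an illustration, the following Python sketch computes both predictions; the constants a, b and k are placeholders that would have to be fitted empirically for a given device and task, and the d = 0 edge case discussed above is handled by simply returning a.

```python
import math

def fitts_time(d_cm, w_cm, a=0.2, b=0.1):
    """Predicted pointing time (s) via Fitts' law: tp = a + b * log2(2d/w).

    a and b are empirical, device-specific constants (placeholder values here).
    For d = 0 the logarithm is undefined, so we return a, as noted in the text.
    """
    if d_cm <= 0:
        return a
    return a + b * math.log2(2 * d_cm / w_cm)

def hick_time(n_alternatives, k=0.15):
    """Predicted choice reaction time (s) via Hick's law: rt = k * log2(n + 1)."""
    return k * math.log2(n_alternatives + 1)

# Example: a 2 cm wide target at 11 cm distance, chosen among 5 alternatives.
print(round(fitts_time(11, 2), 3), round(hick_time(5), 3))
```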
4 Questions Addressed and Research Logic of the Experiment We conducted an experiment in which participants were instructed to select a target element, which was either placed on the left or the right side of the display. In addition, we examined if the directing of visual attention (using a cue) facilitates selection performance differentially for elements placed on the left and right side. Also we varied the number of distracting elements (the target stimulus had to be selected as a singleton as well as among three or five elements) and determined whether the number of elements asymmetrically influences the processing of information on the left and right side of the display.
5 Methods 5.1 Independent and Dependent Variables
Three independent variables were examined: (1) The first independent variable is the side on which the target is displayed, comparing elements on the left and the right side of the display. (2) The second independent variable is the presentation of a visual cue (directing visual attention), directly before the target had to be selected (both with cue, and without cue).
(3) The third independent variable is the number of elements, which were displayed on either side of the screen, differentiating one, three or five elements. The cuing as an independent variable, in combination with the side on which navigational elements are presented, allows us to understand whether the selection performance on both sides is equally high under the cueing condition (if directed visual attention is beneficial, then we should not find the left-side benefit). In the condition without cuing we would expect the left-side benefit. The arrow (the cue) did not disappear in the cuing conditions, but was not present in the non-cuing conditions (see figure 5). Dependent variables included the speed and accuracy of performance. For speed, the selection times (in ms) were determined (selections were executed with a computer mouse). Regarding accuracy, the percentage of correct answers was counted. 5.2 Hypotheses The following hypotheses underlay the experiment: I. A significant main effect for the side at which elements were displayed • Faster processing times for elements placed on the left side of the display compared to elements placed on the right side of the display • A higher accuracy of answers for elements placed on the left side of the display compared to elements placed on the right side of the display II. A significant main effect for directing visual attention • Cuing the position of the expected elements prior to displaying them increases performance in terms of speed and accuracy of selection, independently of the side at which elements are displayed. III. A significant main effect for the number of elements, among which the target stimulus has to be identified • The number of elements displayed on either side (right or left) decreases performance in terms of speed and accuracy of selection 5.3 Experimental Task The task of each participant was to select a target word, which was positioned either on the left or the right side of the display. They started each trial by clicking on a centrally positioned element (word). In 50% of the trials the central element contained a visual cue (a bold arrow) above the start element, which indicated the presentation side of the target element. After participants clicked on the start element it disappeared and the target element(s) – one, three or five elements – appeared. The trial was finished when participants had clicked on the correct target. In order to meet requirements of ecological validity, no time restrictions were given. Participants were instructed to execute the trials at a normal, easy and efficient working speed. In Figure 5, a screenshot of the cue (a bold arrow above the start element) is depicted, which indicates the side at which the navigational elements would appear. After selection, the start element (a word) disappeared and the target element (the same word) on the left or right side appeared and had to be selected. Figure 6 shows the display for a trial without a visual cue, and three elements on the left side (the central element is the target word). The distance from the start element to the
Fig. 5. Start element with a cue indicating the display side of the target element
first/third element (3-element condition) was 11.5 cm, and to the second/fourth element (5-element condition) 12.5 cm. The position of the target word was randomized across trials. All words were between 4 and 7 letters long and were highly familiar words in the German language. The different word lengths were chosen to meet requirements of ecological validity: normally, we process different word lengths when using websites and selecting objects. To simulate such a real-life scenario we also presented different word lengths in the study. As all experimental stimuli were highly frequent words, we can exclude confounding effects of different word frequencies on selection times (which are well known). As all participants received the same stimulus material, word length did not asymmetrically influence the processing across participants. In order to learn whether word length has an impact on reaction times, we analyzed this factor. No significant effect of word length was found; therefore we can exclude confounding effects of different word lengths.
Fig. 6. Screenshot for a trial without a visual cue, and three elements on the left side (the central element is the target word)
The distance from the center of the start element to the center of the target element (on the left or right side) was 11 cm (10.38 degrees of visual angle). 5.4 Participants A total of N=50 participants volunteered to take part in the experiment. To reflect the diversity of computer users, the age range varied (between 10 and 64 years of age, M = 32.4; SD = 11.8), as did the educational level. 16 participants had completed apprenticeship training, 4 persons had a vocational school degree, 22 participants reported having a general qualification for university entrance, and 8 persons had completed a university diploma. Participants were surveyed regarding their experience with computers and Internet usage as well as their self-reported computer literacy. All persons reported being frequent computer users with satisfactory experience in using computers and the Internet. 35 persons indicated using the Internet about 5-7 times per week, 9 persons reported using the Internet about 1-2 times a week, and 6 persons – the older ones – reported using the computer and the Internet to a lesser extent (about once a month). All persons had normal visual acuity. 26 out of 50 persons wore their corrective lenses during the experiment. Participants were instructed to use the computer mouse with their dominant hand (all participants were right-handed), as tested by a hand-dominance testing procedure. Participation was voluntary and the participants were not compensated for their efforts. 5.5 Apparatus and Materials For the equipment, a standard working place was used. Participants sat in front of a computer screen, using a computer mouse (an optical Logitech Wheel Mouse). At the beginning of the experiment, participants adjusted the height of the chair according to individual needs. In order to exclude different body movements and sighting conditions across participants, a chin rest was used, which did not restrict participants but allowed us to control for comparable viewing conditions (figure 7).
Fig. 7. The Experimental setting including a Tobii 1750 eye tracker with chin rest
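The eccentricity of the target elements is reported above both in centimetres (11 cm) and as a visual angle (10.38 degrees). The relation between the two depends on the viewing distance, which is not stated in the paper; the sketch below therefore uses a hypothetical viewing distance of about 60.5 cm, back-solved so that the two reported figures agree.

```python
import math

def visual_angle_deg(extent_cm, viewing_distance_cm):
    """Visual angle (degrees) subtended by an extent seen from a given distance."""
    return math.degrees(2 * math.atan(extent_cm / (2 * viewing_distance_cm)))

# Hypothetical viewing distance reproducing the reported 11 cm = 10.38 degrees.
print(round(visual_angle_deg(11, 60.5), 2))  # ~10.39
```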
The stimulus material consisted of disyllabic (German) words (4-7 letters). The selected words are common in everyday German language usage; thus we can exclude confounding effects from the processing of difficult or unfamiliar material. Moreover, the stimuli were presented in positive polarity (black letters on a white background). 5.6 Experimental Design and Procedure The experiment was based on a 3 (number of elements) x 2 (presentation side of navigational elements) x 2 (with vs. without cue) design for repeated measurements. All participants completed all conditions. The navigation side at which stimuli were located (right or left) and the number of elements (one, three or five) were varied randomly across trials. The cuing variable was blocked, with one half of the participants starting with the order "with cue, followed by without cue" and the other half vice versa ("without cue, followed by with cue"). As checked by ANOVA procedures, the order did not impact visual performance. Overall, 108 trials had to be carried out (54 trials with and 54 trials without a cue; within each block, 27 trials on the left and 27 trials on the right side). The whole experiment lasted about 30-40 minutes, depending on the individual working speed. In order to meet the high control demands in experiments, we did not use a real webpage, but used a simulation.
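For readers who wish to reproduce this kind of analysis, a minimal sketch of the 3 x 2 x 2 repeated-measures ANOVA on the logged selection times might look as follows; the column names and the synthetic data are ours, as the paper does not publish its analysis scripts.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic, balanced trial log standing in for the real data: one row per
# trial with subject id, factor levels and a selection time in milliseconds.
rng = np.random.default_rng(0)
rows = [
    {"subject": s, "side": side, "cue": cue, "n_elements": n,
     "rt_ms": rng.normal(1100 + 100 * n, 150)}
    for s in range(50)
    for side in ("left", "right")
    for cue in ("with", "without")
    for n in (1, 3, 5)
    for _ in range(9)            # 9 trials per cell -> 108 trials per subject
]
trials = pd.DataFrame(rows)

# 3 (elements) x 2 (side) x 2 (cue) repeated-measures ANOVA; multiple trials
# per cell are averaged per subject before the test.
result = AnovaRM(trials, depvar="rt_ms", subject="subject",
                 within=["side", "cue", "n_elements"],
                 aggregate_func="mean").fit()
print(result)
```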
6 Results Data were analyzed by analyses of variance for repeated measurements, assessing effects of (1) the presentation side, (2) the presence/absence of a visual cue and (3) the number of elements presented on the interface at a time. The significance of the F-tests was taken from Pillai values. The level of significance was set at p < 0.05. First, the main effects are reported, followed by interacting effects. The data revealed an accuracy level of 100%: none of the participants had difficulties executing the selection of elements, and all carried out the tasks perfectly. Therefore, the following analyses focus on the selection times. 6.1 Main Effect of Presentation Side Participants needed, on average, 1155.7 ms (SD = 223.6 ms, SE = 30.7 ms) for the selection of target elements presented on the left side, compared to 1181.5 ms (SD = 244 ms, SE = 33.2 ms) for the selection of elements placed on the right side of the display. The small mean difference of 26 ms proved to be statistically significant (Figure 8, left), yielding a main effect of presentation side (F(1,49) = 11.8; p < 0.05). 6.2 Main Effect of Cuing Visual Attention Cuing visual attention facilitates information processing: conditions in which a visual cue announced the presentation side of the elements were significantly faster
Fig. 8. Left: mean selection time in ms for the left compared to the right presentation side of elements; right: mean selection times in ms for trials with and without visual cue
processed (M = 1102.6 ms; SD = 240.3 ms, SE = 32.22 ms) compared to trials without visual cuing (M = 1234.6 ms; SD = 249.1 ms, SE = 32.74 ms). The mean benefit of 132 ms represented another significant main effect (F(1,49) = 89.3; p < 0.05). Outcomes are visualized in Figure 8 (right side). 6.3 Main Effect of the Number of Elements Presented at a Time on the Screen Furthermore, the number of elements among which the target element had to be selected also showed a significant impact on the identification and selection of navigational elements (F(2,48) = 378.4; p < 0.05). The outcomes are visualized in figure 9. Descriptive outcomes show that selection times were considerably faster for conditions with only one element (M = 978.9 ms; SE = 31.4 ms), compared to three elements (M = 1140.2 ms; SE = 31.9 ms) and five elements (M = 1386.7 ms; SE = 34.4 ms). The interacting effects between the three main factors are of considerable interest, as they show whether the (small) advantage of the left-side presentation of navigational elements also holds when the identification and selection of target elements is facilitated by cuing visual attention, and when the number of elements that have to be processed at the same time increases. At first sight, we expect that cuing of visual attention reduces the left-side benefit and that the left-side advantage profits from a small number of elements to be processed at a time. Outcomes of the interacting effects are reported in the next sections. 6.4 Interacting Effect of Visual Cueing and Presentation Side A significant interaction effect was found for the visual cuing of attention and the side on which navigational elements were positioned (F(1,49) = 11.3; p < 0.05, see figure 10).
Fig. 9. Mean selection time in ms for one, three and five elements presented on either side of the display
Fig. 10. Interacting effect of visual cue and presentation side (mean selection times in ms)
As can be seen from figure 10, the detrimental effect of having no visual cue that indicates the side of presentation is asymmetrical for the left and the right side: the selection times are slowest in conditions in which no visual cueing is present and the elements that have to be processed are on the right side. In other words, this can be regarded as indirect evidence for the benefit of the left-side presentation, as elements on the left side are disadvantaged to a lesser extent when there is no cuing of visual attention. 6.5 Interaction Effect of Visual Cueing and Number of Elements In this analysis we examined whether the cuing of visual attention is equally beneficial across the number of navigational elements presented at a time. Analyses revealed a significant effect (F(2,48) = 13.7; p < 0.05, figure 11): performance varied markedly depending on the number of elements and the presence of a visual cue.
Fig. 11. Interacting effect of visual cue and number of elements (mean selection times in ms)
The best performance occurred when a visual cue indicated the presentation of one element only (M = 875 ms; SE = 28.73 ms); the worst performance was found in the condition in which no visual cue was given and five elements had to be processed, among which the target element had to be selected (M = 1439.9 ms; SE = 27.73 ms). Visual cuing facilitates the processing of elements and reduces the detrimental effects of having more than one element. 6.6 Interacting Effect of Presentation Side and Number of Elements Finally, we analyzed whether there is a significant interaction of presentation side and the number of elements among which the target element had to be selected. As can be seen from figure 12, there is a significant effect (F(1,49) = 11.3; p < 0.05).
Fig. 12. Interacting effect of presentation side and number of elements that have to be processed at a time (mean selection times in ms)
From an ergonomic point of view, this interacting effect is most insightful, as it shows that the benefit of the left-side presentation of elements is only valid when a single element is depicted (which is rather unusual in real websites). Whenever the number of navigational elements increases (three, five elements), the benefit vanishes and the selection times for both sides are virtually identical.
7 Conclusion and Future Research Outlook The results of the presented center-out selection study show that placing elements on the left side significantly decreases the selection time only in cases where individual elements have to be selected. When users had to choose among several options, no significant differences between left and right were found. As expected, the use of cues decreases the overall selection time. The availability of a-priori information on the position, however, also eliminates the enhanced left-side performance for individual elements. The number of elements had a direct influence on the performance time: there was an increase in time to perform the task from one to three to five elements. As described by Hick's law, there is a correlation between the number of reaction alternatives and the reaction time: the more complex the task, the longer one needs for it. In terms of cognitive performance there were no differences between the two sides. This is an important finding, as it suggests that the performance of users who are engaged in a selection task, such as navigating a webpage with several menu elements, is not affected by the side on which the elements are shown on the screen. Phenomena like pseudoneglect or the intrinsic turning behavior of right-handers, i.e., turning toward the left side due to hemispheric dopamine asymmetries [22], might have suggested otherwise. Possible explanations for the enhanced performance during the perception of individual stimuli on the left side could be the biomechanics of the hand and the influence of the writing culture. As is well known, there are many differences between Western and Eastern cultures, which have a direct influence on designing user interfaces. This brings us directly to the limitations of this study, and consequently to our further research duties: this study was conducted only with German-speaking participants; hence the results cannot be generalized to other cultural areas – especially in terms of reading habits. It would be very interesting to conduct a similar study in a language with a completely different orthography (e.g. Japanese). In this way, we could learn whether and to what extent the effects are biased by alphabetic writing systems. Also, as the reading direction could also impact the left-side benefit, it would be insightful to replicate the study in a language with a reverse reading direction (e.g. Hebrew or Arabic). A further limitation is that we used a webpage simulation in a laboratory experiment, and thus in real life there could be many different influences with different additional effects. For example, due to experimental requirements the distance between user and interface was fixed and controlled. In contrast, in real usage settings, end users naturally move in front of the screen, resulting in different distances and angles.
In future work different target elements should be used, e.g. symbols. Currently we have developed an online experimental tool, with which we plan to replicate this study within an intercultural context on a large scale. Summarizing, the best performance was achieved when selecting individual elements on the left side of the screen. Apart from this, when end users have a cue or several options to choose from, the side does not significantly impact the performance. However, due to the fact that the majority of the population is right-handed, and right-handers tend to focus on the left side, placing navigational elements on the left side may be the most intuitive position. Besides this, end users are accustomed to it and they expect it there; consequently the first designers were intuitively right, and we can conclude that, yes, it was proper intuition – and it is not recommendable for "modern designers" to switch to the other side. Acknowledgements. We are grateful for the valuable comments of the anonymous reviewers. Many thanks to the students Sylvia Graf and Stefan Mayr for their work on programming and experimenting, and to Prof. Dietrich Albert for his support of this work.
References 1. Kalbach, J., Bosenick, T.: Web page layout: A comparison between left and right-justified navigation menues. Online Journal of Digital Information (2011), http://journals.tdl.org/jodi/article/view/94/93 (last access: January 10, 2011) 2. Koyani, S.J., Bailey, R.W., Nall, J.R.: Research-Based Web Design & Usability Guidelines (2010), http://usability.gov/pdfs/guidelines_book.pdf (last access: December 20, 2010) 3. Torres, R.J.: User Interface Design and Development. Prentice Hall, Upper Saddle River (2002) 4. Preece, J., Sharp, H., Rogers, Y.: Interaction Design: Beyond Human-Computer Interaction. Wiley, New York (2002) 5. Johnson, J.: GUI Bloopers 2. 0: Common User Interface Design Don’ts and DOS. Morgan Kaufmann Series in Interactive Technologies. Morgan Kaufmann, Amsterdam (2007) 6. Raskin, J.: Intuitive = Familiar. Communications of the ACM 37(9), 17 (1994) 7. van Schaik, P., Ling, J.: The effects of frame layout and differential background contrast on visual search performance in web pages. Interacting with Computers 13, 513–525 (2001) 8. Diekamp, B., Regolin, L., Güntürkün, O., Vallortigara, G.: A left-sided visuospatial bias in birds. Current Biology 15(10), 372–373 (2005) 9. Holzinger, A.: Finger Instead of Mouse: Touch Screens as a Means of Enhancing Universal Access. In: Carbonell, N., Stephanidis, C. (eds.) UI4ALL 2002. LNCS, vol. 2615, pp. 387–397. Springer, Heidelberg (2003) 10. Porac, C., Searleman, A., Karagiannakis, K.: Pseudoneglect: Evidence for both perceptual and attentional factors. Brain and Cognition 61(3), 305–311 (2006) 11. Jewell, G., McCourt, M.E.: Pseudoneglect: A review and meta-analysis of performance factors in line bisection tasks. Neuropsychologia 38(1), 93–110 (2000)
12. McCourt, M.E., Jewell, G.: Visuospatial attention in line bisection: stimulusmodulation of pseudoneglect. Neuropsychologia 37(7), 843–855 (1999) 13. Rueckert, L., Deravanesian, A., Baboorian, D., Lacalamita, A., Repplinger, M.: Pseudoneglect and the cross-over effect. Neuropsychologia 40(2), 162 (2002) 14. Fitts, P.M.: The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement. Journal of Experimental Psychology 47(6), 381–391 (1954) 15. MacKenzie, I.S., Tatu, K., Miika, S.: Accuracy measures for evaluating computer pointing devices. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 9–16 (2001) 16. Shneiderman, B.: Designing the User Interface. In: Strategies for Effective HumanComputer Interaction, 3rd edn., Addison-Wesley, Reading (1997) 17. Holzinger, A., Höller, M., Schedlbauer, M., Urlesberger, B.: An Investigation of Finger versus Stylus Input in Medical Scenarios. In: Luzar-Stiffler, V., Dobric, V.H., Bekic, Z. (eds.) ITI 2008: 30th International Conference on Information Technology Interfaces, pp. 433–438. IEEE, Los Alamitos (2008) 18. Hick, W.E.: On the rate of gain of information. Quarterly Journal of Experimental Psychology 4, 11–26 (1952) 19. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949) 20. Card, S.K., Moran, T.P., Newell, A.: The psychology of Human-Computer Interaction. Erlbaum, Hillsdale (1983) 21. Seow, S.C.: Information theoretic models of HCI: A comparison of the Hick-Hyman law and Fitts’ Law. In: Human-Computer Interaction, vol. 20(3), pp. 315–352 (2005) 22. Mohr, C., Landis, T., Bracha, H.S., Brugger, P.: Opposite turning behavior in righthanders and non-right-handers suggests a link between handedness and cerebral dopamine asymmetries. Behavioral Neuroscience 117(6), 1448–1452 (2003)
Pupillary Response Based Cognitive Workload Measurement under Luminance Changes*
Jie Xu, Yang Wang, Fang Chen, and Eric Choi
National ICT Australia, University of New South Wales
[email protected], {yang.wang,fang.chen,eric.choi}@nicta.com.au
* NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council.
Abstract. Pupillary response has been widely accepted as a physiological index of cognitive workload. It can be reliably measured with remote eye trackers in a non-intrusive way. However, pupillometric measurement might fail to assess cognitive workload due to the variation of luminance conditions. To overcome this problem, we study the characteristics of pupillary responses at different stages of cognitive process when performing arithmetic tasks, and propose a fine-grained approach for cognitive workload measurement. Experimental results show that cognitive workload could be effectively measured even under luminance changes. Keywords: Cognitive workload, eye tracker, luminance, pupillary response.
1 Introduction Cognitive workload measurement plays an important role in various application areas involving human-computer interface, such as air traffic control, in-car safety and gaming [2]. By quantifying the mental efforts of a person when performing tasks, cognitive workload measurement helps predict or enhance the performance of the operator and system. Physiological measures are one class of workload measurement techniques, which attempts to interpret the cognitive processes through their effect on the operator's body state [5]. In the past, physiological measures usually entailed invasive equipment. With the advance of sensing technologies in recent years, the measuring techniques have become less intrusive, especially those through remote sensing. As a physiological index, eye activity has been considered as an effective indicator of cognitive workload assessment, as it is sensitive to changes of mental efforts. Eye activity based physiological measures [1] [3] [4], such as fixation and saccade, eye blink, and pupillary response, can be detected unobtrusively through remote sensing. The fact that changes of pupillary response occur during mental activity has long been known in neurophysiology, and it has been utilized to investigate cognitive workload. In an early work, Beatty investigates the pupillary response through
experiments that involve tasks of short-term memory, language processing, reasoning and perception [1]. Pupillary response is shown to serve as a reliable physiological measure of mental state in those tasks. Usually, head-mounted eye trackers are used to measure pupillary response during the task. Only recently has remote eye tracking become a popular approach for cognitive workload measurement [6] [9]. In comparison with head-mounted eye trackers, a remote eye tracker enables the non-intrusive measurement of cognitive workload without interfering with the user's activity during the tasks. Moreover, remote video eye trackers have been shown to be precise enough for measuring the pupillary response. Though empirical evidence from the literature has demonstrated that eye activity based physiological measures are useful indicators of mental effort, they can be influenced by noise factors unrelated to the cognitive task. For example, it is reported that pupillary response is sensitive to illumination condition, fatigue, and emotional state [7] [8] [12]. These factors restrict the practical usage of pupillary response for cognitive workload measurement. In this paper we investigate the feasibility of measuring cognitive workload based on pupillary response even under luminance changes.
2 Related Work As a non-intrusive means of measuring cognitive workload, remote eye tracking has been demonstrated to be precise enough for recording detailed information on pupillary response. Klingner et al. examine the pupil-measuring capability of video eye tracking in [6]. In their experiment, cognitive workload is measured using a remote video eye tracker during tasks of mental multiplication, short-term memory, and aural vigilance. It has been observed that the remote eye tracker can detect subtle changes in pupil size induced by cognitive workload variation. Similarly, Palinko et al. also use remote eye tracking to measure cognitive workload in their experiment [9]. In a simulated driving environment, pairs of subjects are involved in spoken dialogues and driving tasks. The driver's cognitive workload is estimated based on the pupillometric measurement acquired from the remote eye tracker. The pupillometric measurement and the driving performance exhibit significant correlation, which suggests the effectiveness of cognitive load measurement by remote eye tracking. Although physiological measures based on remote eye tracking have demonstrated their usefulness for cognitive workload measurement, their performance can be affected by various noise factors. Luminance condition is especially known as an important factor that influences the pupil size. Pomplun and Sunkara compare the effects of cognitive workload and display brightness on pupil dilation and investigate the interaction of both factors in [10]. They design a gaze-controlled human-computer interaction task that involves three levels of task difficulty. In the experiments, each level of difficulty is combined with two levels of background brightness (black and white), which results in six different trial types. The experiment results show that the pupil size is significantly influenced by both the task difficulty and the background brightness. There is a significant increase in pupil size when the workload demand becomes higher under both background conditions. However, the pupil size corresponding to the highest workload under the white background is even smaller than that corresponding to the lowest workload under the black background.
3 Experiment 3.1 Participants and Apparatus Thirteen 24-to-46-year-old male subjects were invited to participate in the experiment. All the subjects have normal or corrected-to-normal vision. Each subject receives a small-value reward for his participation.
Fig. 1. Experiment setup
The pupillary response data of each subject is recorded with a remote eye tracker (faceLAB 4.5 of Seeing Machines Ltd), which operates at a sampling rate of 50 Hz and continuously measures the subject's pupil diameters. The skin conductance data is also recorded with a galvanic skin response (GSR) sensor (ProComp Infiniti of Thought Technology Ltd); however, the analysis of the GSR data is outside the scope of this paper. Visual stimuli are presented on a 21-inch Dell monitor with a screen resolution of 1024 by 768 pixels. The experiment setup is shown in Figure 1. 3.2 Experiment Design
Each subject is requested to perform arithmetic tasks under different luminance conditions. The arithmetic tasks have 4 levels of difficulty, and each level of task difficulty is combined with 4 levels of background brightness, which results in 16 different trial types in total. For each arithmetic task, the subject is asked to sum up 4 different numbers sequentially displayed at the center of the screen, and then choose the correct answer on the screen through mouse input. The task difficulty depends on the range of the numbers. For the first (lowest) difficulty level, each number is binary (0 or 1); for the second difficulty level, each number has 1 digit (1 to 9); for the third difficulty level, each number has 2 digits (10 to 99); for the fourth (highest) difficulty level, each number has 3 digits (100 to 999). Each number is displayed for 3 seconds, and there is no time constraint for choosing the answer. Before the first number appears, a string of "X" characters is displayed at the center of the screen for 3 seconds; the number of "X" characters corresponds to the number of digits used in the arithmetic task. During the experiment, the luminance condition varies while each subject performs the arithmetic tasks. To produce the different luminance conditions, the luminance
Fig. 2. Time setting of an arithmetic task
(grayscale value) of the background is set to 32, 96, 160 and 224 for the four levels of background brightness (L1, L2, L3, and L4), respectively. A black background is displayed for 6 seconds before each arithmetic task. The time setting for each arithmetic task is depicted in Figure 2. The experiment starts with a practice trial, the data of which are not analyzed. Subsequently, one minute of resting data with a black background is recorded before the test trials start. There are two tasks for each trial type, which results in 32 arithmetic tasks for each subject in the experiment. The tasks are presented in random order during the experiment. Once the subject finishes all the tasks, another minute of resting data is recorded. The whole experiment lasts about 25 minutes for each subject.
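As an illustration of how such a task schedule could be generated, the sketch below enumerates the 4 difficulty levels x 4 brightness levels (two tasks per trial type, 32 tasks in total) and draws the four summands for each task; the function and variable names are ours, not the authors'.

```python
import random

# Digit ranges for the four difficulty levels and grayscale values for the
# four background brightness levels, as described in the experiment design.
DIFFICULTY_RANGES = {1: (0, 1), 2: (1, 9), 3: (10, 99), 4: (100, 999)}
BRIGHTNESS_LEVELS = {"L1": 32, "L2": 96, "L3": 160, "L4": 224}

def make_task(difficulty):
    """Draw the four summands of one arithmetic task and its correct answer."""
    lo, hi = DIFFICULTY_RANGES[difficulty]
    numbers = [random.randint(lo, hi) for _ in range(4)]
    return numbers, sum(numbers)

# Two tasks per trial type -> 4 x 4 x 2 = 32 tasks, presented in random order.
schedule = [(d, b) for d in DIFFICULTY_RANGES for b in BRIGHTNESS_LEVELS for _ in range(2)]
random.shuffle(schedule)
for difficulty, brightness in schedule[:3]:  # print a few example tasks
    numbers, answer = make_task(difficulty)
    print(difficulty, brightness, numbers, "->", answer)
```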
4 Analysis
In this section we analyze the correlation of the pupillary response and cognitive workload under different luminance conditions from the experimental data.
Fig. 3. Subjective rating of task difficulty
Figure 3 shows the average subjective rating scores for the four levels of task difficulty across all subjects. The scores range from 1 to 9, corresponding to the easiest and hardest tasks respectively. There are significant differences between the subjective ratings of the different task difficulty levels (F=108.63, p<0.05 in
ANOVA test), which indicates the overall effectiveness of cognitive workload manipulation in the experiment. 4.1 Coarse-Grained Analysis For each subject, the pupillary response data of every arithmetic task during the experiment is examined. As a coarse-grained analysis, the average pupil diameter from the whole task period is used to characterize the cognitive workload. Figure 4 shows the average pupil diameters from that period under different levels of task difficulty and background brightness. It can be seen from the figure that the pupil diameter is influenced by the background brightness, in the sense that a smaller pupil diameter is usually observed under brighter background. On the other hand, the pupil diameter is also influenced by cognitive workload. For each background brightness level, the pupil diameter often increases when the task difficulty level becomes high. Together background brightness and cognitive workload could affect the pupil diameter. It can be observed that the pupil diameter at the highest task difficulty with highest background brightness is, in fact, smaller than that at the lowest task difficulty with lowest background brightness. This observation is consistent with previous empirical study that, luminance conditions take priority over cognitive demands in pupil diameter changes. Thus it is difficult to directly use the average pupil size or dilation to measure cognitive workload in the experiment.
Fig. 4. Pupil diameter under different task difficulty levels and background brightness conditions
The above analysis shows that the coarse-grained measures of pupillary response could not effectively measure cognitive workload under luminance changes. To overcome this problem, we propose a fine-grained analysis of pupillary response in the following section. It is expected that the dynamic characteristics of cognitive process could be reflected by the fine-grained measures of pupillary response, which will improve cognitive workload measurement under complex environments.
4.2 Fine-Grained Analysis For a fine-grained analysis of pupillary response, the 12-second task period is divided into smaller intervals. As shown in Figure 5, we examine five 3-second intervals corresponding to different stages of the cognitive process when performing the task. We denote X as the "X" displaying interval, and N1, N2, N3 and N4 as the four 3-second number displaying intervals, respectively. Additionally, we also examine two 6-second intervals based on N1, N2, N3 and N4: let M1 be the first 6-second number displaying interval, and M2 the second 6-second interval. The setting of the task intervals can be found in Figure 5. On the basis of these interval definitions, there are 6 intervals for each arithmetic task, and we measure the average pupil diameter in each of them.
Fig. 5. The setting of task intervals for fine-grained analysis
For each task, the average pupil diameter is measured for all the intervals. To reduce the influence of the luminance condition, we normalize the measurement values for intervals N1, N2, N3, N4, M1, and M2 using the average pupil diameter d_X from the X interval, as there is no cognitive workload involved in that interval. Let d_N be one pupil size measurement from one task interval; its normalized measurement value is defined as d_n = (d_N − d_X) / d_X.
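A minimal sketch of this normalization, assuming a 50 Hz pupil-diameter trace aligned to the start of the 3-second "X" interval and followed by the four 3-second number intervals (array layout and names are ours):

```python
import numpy as np

FS = 50  # eye tracker sampling rate in Hz

def interval_means(pupil_trace):
    """Mean pupil diameter per task interval of a 15 s trace (X + N1..N4).

    The first 3 s correspond to the X interval, followed by the four 3 s
    number intervals N1..N4; M1 and M2 are the two 6 s aggregates.
    """
    seg = lambda start_s, end_s: pupil_trace[start_s * FS:end_s * FS].mean()
    return {"X": seg(0, 3), "N1": seg(3, 6), "N2": seg(6, 9),
            "N3": seg(9, 12), "N4": seg(12, 15),
            "M1": seg(3, 9), "M2": seg(9, 15)}

def normalized(means):
    """d_n = (d_N - d_X) / d_X for every interval except X itself."""
    d_x = means["X"]
    return {k: (v - d_x) / d_x for k, v in means.items() if k != "X"}

# Example with synthetic data: 15 s of pupil-diameter samples at 50 Hz.
trace = np.random.normal(4.0, 0.1, 15 * FS)
print(normalized(interval_means(trace)))
```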
Figure 6 shows the distributions of the measurement values from M1 and M2 under different task difficulty levels, illustrating the characteristics of pupillary response in different time intervals. Even under the influence of luminance changes, the measurement values increase as the task difficulty increases. This trend is more significant in the measurement values from M2 (F=3.93, p<0.05 in ANOVA test). We further examine interval M2 by studying the measurement values from N3 and N4, which are shown in Figure 7. As shown in Figure 7, the trend of increasing measurement values with increasing task difficulty is more significant in N4 (F=3.43, p<0.05). In addition to the above analysis, cognitive workload classification has also been investigated using measurements from different intervals. We employ a decision tree-based classification scheme [11] to classify the cognitive workload. Specifically, given the measurement values from different classes, a threshold is estimated such
Fig. 6. Box plots of measurement values (sample minimum, lower quartile, median, upper quartile, and maximum) under different task difficulty levels: (left) M1, (right) M2
Fig. 7. Box plots of measurement values (sample minimum, lower quartile, median, upper quartile, and maximum) under different task difficulty levels: (left) N3, (right) N4
that maximum information gain can be achieved by splitting the data at that threshold. One threshold is needed for two-class classification, while three thresholds are required for four-class classification. We conduct both two-class classification (task difficulty 1, 2 vs. task difficulty 3, 4) and four-class classification of cognitive workload. The classification results are shown in Table 1. As shown in Table 1, M2 outperforms M1 for both two-class and four-class classification, and N4 achieves the highest performance in both tasks. The measurements from different intervals reveal the dynamic characteristics of pupillary response at different stages of the cognitive process, which can be utilized to improve the performance of cognitive workload assessment under complex environments.

Table 1. The classification results of different pupillary response measurements

Pupillary Measurements      M1      M2      N3      N4
Two-class Classification    59.3%   71.6%   68.9%   72.7%
Four-class Classification   36.6%   41.7%   43.0%   43.9%
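The threshold selection described above, splitting the normalized measurements so that information gain is maximized (in the spirit of the C4.5 scheme cited as [11]), can be sketched in a few lines. This is an illustrative reimplementation of the two-class case with a single threshold and made-up data, not the authors' code.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_threshold(values, labels):
    """Find the split value of a 1-D feature that maximizes information gain."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    base = entropy(y)
    best_gain, best_t = -1.0, None
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        t = (v[i] + v[i - 1]) / 2.0          # candidate: midpoint between adjacent samples
        left, right = y[:i], y[i:]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = base - cond
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Toy example: normalized N4 measurements with low/high workload labels (invented)
vals = [0.01, 0.03, 0.05, 0.08, 0.12, 0.15, 0.18, 0.22]
labs = ["low", "low", "low", "high", "low", "high", "high", "high"]
print(best_threshold(vals, labs))
```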
6 Conclusion
This work investigates the measurement of cognitive workload through remote eye tracking under the influence of luminance conditions. We study the characteristics of pupillary response by analyzing measurements acquired from different stages of the cognitive process. The experimental results demonstrate the feasibility of cognitive workload measurement in complex environments using the proposed fine-grained analysis. In future work we will apply machine learning techniques to further improve the fine-grained analysis for cognitive workload measurement.
References
1. Beatty, J.: Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin 91, 276–292 (1982)
2. Cain, B.: A review of the mental workload literature. Technical Report, Defence Research and Development Canada Toronto (2007)
3. Chen, S., Epps, J., Ruiz, N., Chen, F.: Eye activity as a measure of human mental effort in HCI. In: International Conference on Intelligent User Interfaces, pp. 315–318 (2011)
4. Fogarty, C., Stern, J.: Eye movements and blinks: Their relationship to higher cognitive processes. International Journal of Psychophysiology 8, 35–42 (1989)
5. Grootjen, M., Neerincx, M., Weert, J.: Task-based interpretation of operator state information for adaptive support. In: Schmorrow, D., Stanney, M., Reeves, M. (eds.) Foundations of Augmented Cognition, 2nd edn., pp. 236–242 (2006)
6. Klingner, J., Kumar, R., Hanrahan, P.: Measuring the task-evoked pupillary response with a remote eye tracker. In: Eye Tracking Research and Applications Symposium, pp. 69–72 (2008)
7. Kramer, A.: Physiological metrics of mental workload: A review of recent progress. In: Damos, D. (ed.) Multiple-Task Performance, pp. 279–328. Taylor and Francis, Abington (1991)
8. Marshall, S.: The index of cognitive activity: Measuring cognitive workload. In: IEEE Human Factors Meeting, pp. 7-5–7-9 (2002)
9. Palinko, O., Kun, A., Shyrokov, A., Heeman, P.: Estimating cognitive load using remote eye tracking in a driving simulator. In: Eye Tracking Research and Applications Symposium, pp. 141–144 (2010)
10. Pomplun, M., Sunkara, S.: Pupil dilation as an indicator of cognitive workload in human-computer interaction. In: International Conference on Human-Computer Interaction, pp. 542–546 (2003)
11. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco (1993)
12. Xu, J., Wang, Y., Chen, F., Choi, H., Li, G., Chen, S., Hussain, S.: Pupillary response based cognitive workload index under luminance and emotional changes. In: Annual Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1627–1632 (2011)
Study on the Usability of a Haptic Menu for 3D Interaction Giandomenico Caruso, Elia Gatti, and Monica Bordegoni Dipartimento di Meccanica, Politecnico di Milano, Via La Masa, 1 20156 Milano, Italy {giandomenico.caruso,elia.gatti,monica.bordegoni}@polimi.it
Abstract. The choice of interaction menu is an important aspect of the usability of an application. In recent years, different solutions related to menu shape, location and interaction modalities have been proposed. This paper investigates the influence of haptic features on the usability of a 3D menu. We have developed a haptic menu for a specific workbench that integrates stereoscopic visualization and haptic interaction. Several versions of this menu have been developed with the aim of performing testing sessions with users. The results of these tests are discussed to highlight the impact that these features have on the user's learning capabilities. Keywords: Mixed Reality, Haptic Interaction, Haptic Menu.
with the aim of performing several testing sessions with users. The tests have allowed us to assess the impact that these features have on the user's learning capabilities. The paper describes the functioning of our haptic menu, the testing sessions conducted to validate it, and the elaboration and synthesis of the collected data.
2 The 3D Haptic Menu
Fig. 1. A user during the interaction with the 3D Haptic Menu (montage)
The 3D Haptic Menu (3DHM) investigated in our study has been developed for use with a specific workbench. The workbench consists of a stereoscopic visualization system, based on mirrors and screens that project a 3D image over the haptic workspace [5], and of the HapticMaster (HM) device made by MOOG [6]. The end effector of the HM is linked to a 3D cursor that allows interaction with the 3D objects. The purpose of this workbench is to enable the user to haptically modify the shape of large 3D models at real scale. Consequently, an interactive menu is necessary to activate the different modeling functions. The shape of the 3DHM is based on pie menus, which, with the right combination of selection method and assistive forces, can provide an excellent solution for 3D haptic environments [7]. The effectiveness of the pie menu approach has also been validated by a study carried out with blind users [8]. In a pie menu, the items are arranged in a circular design around the perimeter of a circle. The items of our 3DHM are represented by six colored spheres, which have the same diameter as the HM end effector (44.5 mm) and are connected by a purple torus. Fig. 1 shows the stereoscopic workbench and a user interacting with the 3DHM. The haptic feedback of the 3DHM has been created based on the primitive entities provided by the HM. A haptic torus limits the user's movement around the menu
while each menu item is haptically rendered as a magnetic point (snap). The magnetic point is a haptic object made up of three haptic springs with the same application point but different stiffness and deadband. The first spring S1 is an omnidirectional haptic spring with stiffness K1 = 250 N/m and no deadband. The second spring S2 is another omnidirectional haptic spring with a higher, opposite stiffness K2 = −2500 N/m and a deadband D2 = 0.0250 m. With these two springs the attractive force remains constant outside the sphere that represents the snap volume, while, if the HM end effector enters the sphere, it remains locked, since S2 is stiffer than S1. The third spring S3 has stiffness K3 = −(K2 + K1) = 2250 N/m and deadband D3 = D2 · K2 / (K2 + K1) ≈ 0.0277 m. The role of S3 is to reduce the action field of the attractive force to a shell of thickness D3 − D2, and to cancel the sum of the other two haptic springs beyond D3. The final effect perceived by the user through the HM end effector is a magnetic attraction in the vicinity of a menu item that helps the user select it. We have elaborated four different versions of the 3DHM for our usability study, named as follows: no-haptic, normal, contextual and contextual no-snap. The no-haptic version provides no haptic feedback, and menu item selection is only highlighted by a different illumination of the item. The normal version returns only the magnetic points as haptic feedback. In the contextual version both torus and magnetic points are enabled and the position of the menu is defined according to the HM end effector position, i.e. the first menu item is placed at the HM end effector position. The high external stiffness of the torus compels the HM end effector to move only within the menu. The contextual no-snap version is similar to the contextual one, but only the torus haptic feedback is rendered. Table 1 summarizes the four versions of the 3DHM.

Table 1. The four versions of the 3DHM.

3DHM version          torus      snap       position
no-haptic             disabled   disabled   centered
normal                disabled   enabled    centered
contextual            enabled    enabled    relative
contextual no-snap    enabled    disabled   relative
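To make the snap behavior concrete, here is a small numerical sketch of the three-spring model. It assumes each HM spring exerts a force of stiffness × (distance − deadband) toward the shared application point once the distance exceeds its deadband, and zero force inside it; this linear spring-with-deadband model, the sign convention and the function names are illustrative assumptions rather than the HapticMaster API.

```python
K1, D1 = 250.0, 0.0        # S1: omnidirectional spring, no deadband (N/m, m)
K2, D2 = -2500.0, 0.0250   # S2: opposite stiffness, deadband at the snap radius
K3 = -(K2 + K1)            # 2250 N/m, cancels S1 + S2 far from the item
D3 = D2 * K2 / (K2 + K1)   # ~0.0278 m, outer limit of the attraction shell

def spring_force(k, deadband, d):
    """Force toward the application point of one spring at distance d (m)."""
    return k * max(0.0, d - deadband)

def net_force(d):
    """Net force of the magnetic point (positive = attraction) at distance d."""
    return sum(spring_force(k, db, d)
               for k, db in ((K1, D1), (K2, D2), (K3, D3)))

for d in (0.010, 0.024, 0.026, 0.030, 0.050):
    print(f"d = {d:.4f} m -> net force = {net_force(d):+.2f} N")
# Inside the snap sphere only S1 acts; between D2 and D3 the attraction fades;
# beyond D3 the three contributions sum to zero, as the text describes.
```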
3 Evaluation Tests
When people interact with a simple menu, the task they perform can be interpreted as a point-to-point movement, similar to those described in [9], involving sensorimotor skills. Visual information, such as the menu and cursor positions, and kinesthetic information, such as the position of the hand, are integrated to create a motor program that permits users to move the cursor to a predefined point. In neuroscience, the structure of this sensory integration and the building of this kind of motor program have been widely discussed [10-12]. In our study, we have assumed that the user first localizes the position of the menu item to select and the position of the cursor. He then encodes these coordinates in extrinsic space, and subsequently the coordinates have to be translated into intrinsic space, enabling the user's
motor system to execute the task. Once the transformation has been made, the user plans the trajectory needed to reach the menu item. All the steps of these coordinate transformations are well described in [9] and summarized in Fig. 2.
Fig. 2. Adaptive Internal Model of Intrinsic Kinematics [9]
The aim of the evaluation tests was to assess whether the position and the haptic features of the 3DHM can facilitate the building and the memorability of the motor program needed to reach a menu target. To perform the evaluation tests, four different user groups, one for each 3DHM version, were asked to perform a simple task (between-group experimental design). Each group consisted of 5 users (20 users in total), all male, right-handed and aged between 24 and 35. The users were naïve to the task and had no previous experience with the haptic menu. The tests were repeated in 6 different sessions, to evaluate whether and how the users' ability to interact with a particular version of the menu changed over time. The task consisted of a simple menu item selection. When a user selected one of the menu items, a white sphere appeared in the 3D space, in a position related to the selected item. The user then had to reach that sphere and select it with the cursor. The sequence of menu items to select was read from a list by the experimenter: after each selection, the white sphere disappeared and the experimenter read the number corresponding to the next item that the user had to select. The session ended when the user had selected all 30 items in the experimenter's list. The list was balanced per item (6 items × 5 repetitions) and randomized. The users completed all sessions within a week.
3.1 Results
The task was very easy and intuitive: none of the testers made selection errors during the testing sessions. The learnability of the menus was assessed by analyzing the selection time over the six sessions. Since we were not interested in the time needed to complete the whole task, but only in the time the testers needed to select the right menu item, only the time between the click on the white sphere and the subsequent menu item selection was analyzed. These intervals were summed for each session to obtain the total selection time, i.e. the time the user spent selecting the right menu items over the entire session. During the testing sessions the user's trajectory path was also recorded. Fig. 3 shows a trajectory path elaborated from a very brief part of a testing session. In particular, the red sphere indicates the position where the user activates the menu, the colored spheres are the menu items selected and the white spheres are the target spheres. This path has been used to extract some
Fig. 3. A trajectory path elaborated from a testing session
general information about users' movements. We have elaborated these data to assess whether the selection of a menu item can be assimilated to a generic reaching task, as we previously assumed. In particular, by analyzing the velocity profile obtained between two subsequent selections, we observed bell-shaped curves. In order to evaluate the usability and the memorability of the different menus, we evaluated how the total selection time varied per tester and per menu over the six sessions. The sphericity of the data set was assessed through the Mauchly test (p-value > 0.05), and then a two-way repeated measures ANOVA was applied to analyze the variation of the selection time across sessions. The ANOVA factors taken into account are the position of the menu (contextual vs. fixed) and the presence of the snap feedback (present vs. absent). The kind of torus has not been considered, because its conditions (absent vs. high) were collinear with the position factor. That was a forced choice, since enabling a torus in a fixed menu where users directly reached the items did not make sense, whereas in a contextual menu the presence of the torus could be useful (see discussion). Further studies will aim to disambiguate these conditions. The analysis showed a statistically significant interaction between the haptic feedback and the position of the menu between users (F1,16 = 5.4163, p-value < 0.05). In particular, post-hoc t-test comparisons at the 95% confidence level showed a highly significant difference between the fixed no-haptic menu and the other versions (p-value < 0.001 in all cases). A strong effect of repetition (F5,80 = 11.1619, p-value < 0.001) and effects of the interactions repetition × haptic feedback (F5,80 = 2.6106, p-value < 0.05) and repetition × menu position (F5,80 = 3.3479, p-value < 0.01) were found within participants. Fig. 4a shows how the users' performance changes along the repetitions for each group of users. Lines have been drawn only to highlight a general trend in the data. Fig. 4b shows the standard error for each kind of menu in each session. As a general comment, the contextual haptic menu seems to have a more homogeneous dataset, whereas the fixed standard haptic menu generally shows a greater variance in the results.
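As a rough illustration of this kind of analysis, the sketch below runs a mixed (split-plot) ANOVA on per-session total selection times, with session as the within-subject factor and menu version as a single between-subject factor. Collapsing position and snap into one four-level between factor is a simplification of the two-factor design described above, and the pingouin call, column names and randomly generated data are assumptions for illustration only, not the authors' analysis script.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
versions = ["no-haptic", "normal", "contextual", "contextual no-snap"]
rows = []
for version in versions:
    for s in range(5):                       # 5 users per group
        subject = f"{version}-{s}"
        for session in range(1, 7):          # 6 sessions per user
            # fake total selection time (s): slower for no-haptic, faster over sessions
            t = 60 + 10 * (version == "no-haptic") - 3 * session + rng.normal(0, 4)
            rows.append({"subject": subject, "menu": version,
                         "session": session, "selection_time": t})
df = pd.DataFrame(rows)

# Mixed ANOVA: within = session (repetition), between = menu version
aov = pg.mixed_anova(data=df, dv="selection_time", within="session",
                     subject="subject", between="menu")
print(aov[["Source", "F", "p-unc"]])
```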
Fig. 4. a) Means of each group in each session. b) Standard error for each group and session
We also collected some impressions from the users who interacted with the different versions of the menu. The main feedback we received concerned the haptic feedback modality, which was requested by users of the no-haptic menu. Moreover, most of the users found the snap attraction in the contextual modality disturbing.
4 Discussion
In the testing sessions described above, the learnability and usability of the different versions of our 3DHM were evaluated. The test was structured by assimilating menu item selection to a reaching task. Our assumption is supported by the analysis of the trajectories and velocity profiles, which are, in all cases, similar to those examined in [14] and show the characteristics of a reaching task. The two-way repeated measures ANOVA showed statistically significant differences between subjects belonging to different groups. In particular, post-hoc comparisons showed that with the no-haptic menu the performance times differ from (are higher than) those obtained with the other menu versions. This is due to the fact that the total absence of haptic feedback made it difficult for users to reach the menu items; in other words, they found it difficult to reach the items based only on visual feedback. In a 3D scene, indeed, it is important to have as much information as possible about the position of the object. The haptic modality provides feedback in this direction, whereas the visual modality can be (at least in the absence of external landmarks, as in our case) insufficient. Furthermore, within-subject effects showed that learning ability (the interaction between repetition and the other factors) depends marginally on the kind of snap haptic feedback (present/not present) but considerably on the position in which the menu appeared. Our initial hypothesis was that a haptic menu, which simplifies the motor program involved in a reaching task, could improve the usability and the learnability of the
menu itself. To understand how and to what extent the position and the haptic features influence this simplification of the motor program, we developed four versions of our 3DHM. Activating a menu linked to the position of the user's hand in 3D space facilitates the construction of a motor program. Linking the position of the menu to the position of the cursor indeed enables users to learn the position of the menu items in intrinsic space, since the position of the cursor is assumed to be already coded in intrinsic coordinates. Thus, learning the position of the menu items in intrinsic coordinates facilitates learning the menu: in that way the user can skip the coordinate transformation step during the creation of the motor program. Moreover, the presence of the torus could be decisive in minimizing errors in menu item selection during learning, by constraining the user's movement to the correct plane; further studies will aim to support this hypothesis. The snap haptic feedback also appears to be slightly significant in the learning process. This is because extra kinesthetic cues, such as force feedback, could help tune the user's attention to the proprioceptive cues, which are coded in intrinsic coordinates. Nevertheless, the best mean execution time across sessions was obtained by the contextual no-snap version. This is probably because in the contextual version the presence of the snap points, although it prevents faster execution of the task, is useful for remembering the item positions.
5 Conclusion
In this paper a new kind of haptic menu has been presented: the 3DHM. Starting from studies concerning the building of motor programs [9-12], the usability and learnability of 4 versions of the 3DHM have been tested, in order to assess whether the position and the haptic features of the 3DHM can facilitate the building and the memorability of the motor program needed to reach a menu target. The data collected highlight the effectiveness of our investigation approach, based on the cognitive features of the user. In general, haptic feedback improves a 3D menu in terms of both usability and learnability. Moreover, we have found significant differences related to the menu position in 3D space: a contextual position makes the menu less intuitive in the first sessions, but performance, in terms of task-execution times, improves considerably. We also observed an effect of the haptic feedback on both usability and learnability. We conclude that in both cases haptic feedback is a useful characteristic that should be implemented in 3D menus. Due to the high variance of the collected data among users, further studies are certainly necessary to achieve more significant results. In particular, we consider it interesting to conduct supplementary tests, in blind conditions, to assess the influence of different haptic feedback types on the memorability of a haptic menu and to clarify the role of the torus in the contextual version. Acknowledgments. The authors would like to thank all the participants in the testing sessions and in particular Daniela De Lucia and Eleonora Bartoli for their precious contribution.
References
1. Raymaekers, C., Coninx, K.: Menu Interactions in a Desktop Haptic Environment. In: Proc. Eurohaptics, Birmingham, UK, pp. 49–53 (2001)
2. Oakley, I., Adams, A., Brewster, S., Gray, P.: Guidelines for the Design of Haptic Widgets. In: Proc. 16th British HCI Group Annual Conference, London (2002)
3. Oakley, I., Brewster, S., Gray, P.: Solving Multi-Target Haptic Problems in Menu Interaction. In: Proc. CHI 2001 Conference on Human Factors in Computing Systems, Seattle, Washington, pp. 357–358 (2001)
4. Oakley, I., McGee, M.R., Brewster, S., Gray, P.: Putting the Feel in 'Look and Feel'. In: Proc. CHI 2000 Conference on Human Factors and Computing Systems, The Hague, The Netherlands, pp. 415–422 (2000)
5. Bordegoni, M., Covarrubias, M.: Direct Visuo-Haptic Display System Using a Novel Concept. In: Proc. 3rd Eurographics Symposium on Virtual Environments (2007)
6. HapticMaster: http://www.moog.com/products/haptics-robotics (accessed June 2011)
7. Komerska, R., Ware, C.: A study of haptic linear and pie menus in a 3D fish tank VR environment. In: Proc. 12th International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 224–231 (2004)
8. Sjöström, C.: Designing Haptic Computer Interfaces for Blind People. In: Proc. Sixth IEEE International Symposium on Signal Processing and its Applications, Kuala Lumpur, Malaysia (2001)
9. Imamizu, H., Uno, Y., Kawato, M.: Adaptive Internal Model of Intrinsic Kinematics Involved in Learning an Aiming Task. Journal of Experimental Psychology: Human Perception and Performance 24(3), 812–829 (1998)
10. Desmurget, M., Grafton, S.: Forward modeling allows feedback control for fast reaching movements. Trends in Cognitive Sciences 4(11), 423–431 (2000)
11. Plamondon, R., Alimi, A.M.: Speed/accuracy trade-offs in target-directed movements. Behavioral and Brain Sciences 20(2), 279–349 (1997)
12. Pelisson, D., Prablanc, C., Goodale, M.A., Jeannerod, M.: Visual control of reaching movements without vision of the limb. II. Evidence of fast unconscious processes correcting the trajectory of the hand to the final position of a double-step stimulus. Experimental Brain Research 62(2), 303–311 (1986)
13. Gazzaniga, M.S.: The new cognitive neuroscience. MIT Press, Cambridge (2000)
14. Digby, E., Gordon, B., Matthew, H.: The control of goal-directed limb movements: Correcting errors in the trajectory. Human Movement Science 18(2-3), 121–136 (1999)
Balancing Act: Enabling Public Engagement with Sustainability Issues through a Multi-touch Tabletop Collaborative Game Alissa N. Antle, Joshua Tanenbaum, Allen Bevans, Katie Seaborn, and Sijie Wang Simon Fraser University, Surrey, BC, Canada {aantle,jtanenba,alb19,kseaborn,swa50}@sfu.ca
Abstract. Despite a long history of using participatory methods to enable public engagement with issues of societal importance, interactive displays have only recently been explored for this purpose. In this paper, we evaluate a tabletop game called Futura, which was designed to engage the public with issues of sustainability. Our design is grounded in prior research on public displays, serious games, and computer supported collaborative learning. We suggest that a role-based, persistent simulation style game implemented on a multi-touch tabletop affords unique opportunities for a walk-up-and-play style of public engagement. We report on a survey-based field study with 90 participants at the 2010 Vancouver Winter Olympics (Canada). The study demonstrated that small groups of people can be immediately engaged, participate collaboratively, and can master basic awareness outcomes around sustainability issues. However, it is difficult to design feedback that disambiguates between individual and group actions, and shows the temporal trajectory of activity. Keywords: Public displays, sharable displays, digital tabletops, interactive surfaces, group interaction, multi-touch interaction, public participation, public engagement, social issues, sustainability, collaborative learning, serious games, simulations.
sustainable development was a key theme during this sports and cultural event. The topic has several features that make it an appealing choice when investigating the design space of public displays. Understanding the key elements of planning for a sustainable future in an urban area requires that the public understand the views of the different stakeholders involved, recognize the spatial nature of the issue, and have an awareness of the complex trade-offs involved in preserving the environment and developing facilities to support a growing population. In response to our research questions, this opportunity, and our prior work studying interaction on digital tabletops (e.g. [3]), we developed a digital media application implemented on a custom multi-touch tabletop called Futura: the Sustainable Futures Game. In this paper, we discuss the problem of designing for engagement and small group interaction with a walk-up-and-play tabletop display in a busy public venue. Our hypothesis is that a multi-touch tabletop interface to a fast-paced, role-based simulation game can support groups of participants to meaningfully engage with content about social issues (e.g. sustainability) in a public space. We report on our field study at the 2010 Vancouver Winter Olympics. The study utilizes observational notes summarizing the reactions, responses and interactions of hundreds of players and a semi-structured survey with 90 study participants that was designed to provide insights and reveal issues with our design solution. Prior research based on the Futura game has included two observational studies focusing on learning [4], and a comparative study focusing on the effects of tangible and multi-touch tools on collaboration [5].
2 Related Work
This research builds on recent work in the domains of sharable public displays, multi-touch tabletops, serious games and computer supported collaborative learning. We provide a summary of relevant articles here, and identify key concepts and issues in this design space.
2.1 Shareable Public Displays
Shareable displays are those interfaces that provide for multiple inputs and support group interaction [6]. While a large body of work focuses on the ambient nature of public displays (e.g. [7]), this is not our focus. Many other studies of sharable displays have occurred in semi-private spaces and involve groups of participants who know each other (e.g. [8]). Several findings are relevant. For example, an early study revealed the importance of supporting a walk-up-and-use style of interaction that is so self-explanatory that first-time users need no prior training [9]. An observational study on BlueBoard, a touch screen display for sketching that uses RFID to identify individuals, highlighted the importance of the visibility of actions, so that users can learn from watching each other, as well as the challenges of turn-taking [10]. While these findings may be relevant, our focus is on small groups of the public (who may be strangers) interacting with displays in large and busy public venues. Research that focuses on large, multi-user wall displays in public settings is relatively rare due to technical difficulties with lighting and engaging first-time users
[2]. However, several studies have revealed important factors to consider in design. One study raised the issue of how to encourage users to interact with the display in a busy setting [11]. The authors suggest positioning the display in a thoroughfare. Hornecker et al. identify entry and access points as important design considerations for displays in public spaces [12]. Entry points invite or entice people to interact with a display. Access points are interface characteristics that enable the user to interact, participate and join group interaction. Designs for public spaces must include both entry and access points to support walk-up-and-play. In a detailed ethnographic study of CityWall, the authors identify and address issues of conflict management of digital objects and difficulties in turn-taking [2]. They suggest that successful designs must be playful, easy to use and novel. In addition, the authors suggest that designers should consider how to support parallel interaction; how to support learning through observing others; how to enable a teacher-apprentice approach; and the impact of the deployment on the surrounding architectural space. In a follow-up study, the authors identified and addressed several challenges in designing for engagement and parallel interaction [13]. They suggest using a multiple, 3D virtual worlds metaphor to support parallel interaction, reduce conflict, and help manage territoriality.
2.2 Multi-touch Tabletop Displays
Much research has been conducted on multi-touch and surface computing. However, the focus of this work has largely been on the details of touch-based interaction in collaborative workspaces and labs (e.g. [14-16]). Some findings are relevant to the design of public tabletops. Hornecker makes the case for the need for immediate apprehendability in public displays for learning [1]. Scott et al. make recommendations based on Computer Supported Collaborative Work studies concerning personal, group and storage territories [17]. Global actions should be visible and transparent. Local spaces should be configured according to functionality. Personal territories should be oriented to the player's location to ease reading. Occlusion that may occur in personal spaces should be considered. Storage territories should be provided for discrete, private objects.
2.3 Serious Games
Serious digital games have been designed to support informal and formal learning and training. An excellent overview that we draw on here is presented in [18]. Digital games provide visual digital information to one or more players, take input from players, process that input according to a set of programmed game rules, and change the digital information displayed back to players. A defining characteristic of a game is its set of game rules. Developers of serious games have hoped to harness the motivational power of games for learning. However, many educational games appear to be neither entertaining nor particularly successful in teaching players. These games suffer from two fundamental flaws: the people who design them often do not have traditional game design experience [19], and they do not properly translate knowledge, facts and lessons into the language of games, namely mechanics, rules, rewards and feedback [20].
A number of serious games have been developed that have environmentally conscious themes [21]. Games can potentially support an understanding of natural processes due to their use of procedural rhetoric to engage players in problem solving activities. In addition, games facilitate thinking procedurally about the consequences of actions on the environment. A microworld is a simple but complete model of a domain or system that enables a person to "live" in that domain for some period of time. A simulation is a model of some domain [22]. An advantage of simulations for learning is that they provide direct access to subject matter or content that might not be readily accessible in the real world. A key assumption behind simulations is that users will "learn by doing". However, there are challenges in using game-based procedural rhetoric for learning [23]. Procedural rhetoric may elicit emotions that are inconsistent with the game's themes and goals, and in some cases the unexpected and emergent properties of the system can undermine the message of the game.
2.4 Computer Supported Collaborative Learning (CSCL)
Collaboration has been defined as "a process by which individuals negotiate and share meanings" and "a coordinated, synchronous activity that is the result of a continued attempt to construct and maintain a shared conception of a problem" [24]. This differs from cooperative activities in which learners may coordinate their efforts, but the work performed is primarily individual and in parallel [25]. Within education, the CSCL community has begun to research how multi-touch tabletops may support collaborative learning. Several studies have found that learners find tabletop interfaces engaging, enjoyable and playful [26]. However, findings have not always suggested learning gains. For further details of important elements required to support learning through collaboration, see [4, 5].
3 Designing the Futura Tabletop Game 3.1 Design Challenges The high level design goal was to create an application that facilitates a shift in awareness or disposition around issues of sustainability in urban planning and development. Specifically, we are interested in helping the public understand the complexity of meeting the needs of a growing population while preserving the environment. We wanted to support the notion that sustainability is not simply achieved by making the right choices, but rather involves a complex negotiation between different stakeholders, in which short term decisions to meet human needs often have detrimental long term effects on the environment. A second important high level goal was that the installation had to be suitable for a fast throughput of large numbers of users at the Olympics cultural site. This type of massive public event introduces design constraints around scale, mobility, robustness and throughput of the tabletop installation. In order to successfully enable groups to engage with interactive content about the issue of sustainable land use planning for an urban basin, we propose the following five specific design goals that must be met. Learning: Individual users should gain an improved understanding of the complexity involved in balancing human and natural factors in land use planning for a
sustainable future for an urban area. This outcome is conceptualized as something that is gained through a shift in awareness about the complexity of achieving sustainable development. It is not about learning concepts or facts.
Group Participation: The installation must promote active participation by small groups who may be strangers. Collaborative activity should involve equitable participation by each member of each group.
Walk-up-and-play: The activity must be immediately playable by small groups who can simply walk up and interact with the game, regardless of previous experience (or lack thereof) with new forms of interactive media.
Apprehendability: The content must be immediately understandable to people with a range of different knowledge levels about sustainability issues.
Appeal: The activity must appeal to a wide range of ages, cultures and preferences.
In previous work, we presented and discussed our design rationale specific to learning outcomes for the game [4]. In order to avoid duplication, in this paper we do not provide a detailed design rationale. Instead, we provide an overview of the game and focus on presenting the results from our survey study. We contribute suggestions for generalizable design guidelines and identify two issues that must be overcome when designing collaborative tabletop games that enable public engagement with social issues.
3.2 System Implementation
Futura runs on a multi-touch version of EventTable, a custom multi-touch and tangible tabletop prototyping platform (initially described in [3]). Our system is housed in a modified IKEA wood and metal table with telescopic legs and a custom metal undercarriage that supports the camera, PC and projector hardware. The surface of the table's wood frame supports an Endlighten™ acrylic surface. Our sensing system relies on a diffused surface illumination (DSI) technique with four infrared strips, one on each side of the rectangular surface. We capture touches with a single web camera with an infrared filter on a wide-angle lens embedded in a custom mount. The camera can cover an active sensing area of 85 by 68 cm. The camera and FireWire cable provide data capture at 30 frames per second (fps). We process camera data using Community Core Vision, an open source finger tracking application. The game application is written using an existing C# multi-touch library called Breezy. A single short-throw projector provides a resolution of 1024 × 768 pixels on the output surface of 103 by 68 cm. We use a vellum mat, attached below the Endlighten™ acrylic, as the projection surface, leaving the top surface of the acrylic exposed, which results in better touch tracking.
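Community Core Vision publishes detected touches as TUIO cursor events over OSC (by default on UDP port 3333), which a client application can consume. The following is a minimal, illustrative listener in Python using the python-osc package rather than the authors' C#/Breezy stack; the port, handler name and printed fields are assumptions based on the standard TUIO 1.1 /tuio/2Dcur profile, not details taken from the Futura implementation.

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_2dcur(address, *args):
    """Handle TUIO 1.1 2D-cursor messages ('alive', 'set', 'fseq')."""
    if not args:
        return
    command = args[0]
    if command == "set":
        # set: session_id, x, y (normalized 0..1), x_vel, y_vel, accel
        session_id, x, y = args[1], args[2], args[3]
        print(f"touch {session_id}: x={x:.3f} y={y:.3f}")
    elif command == "alive":
        print("active touch ids:", args[1:])

dispatcher = Dispatcher()
dispatcher.map("/tuio/2Dcur", on_2dcur)

# CCV's default TUIO/OSC target is assumed to be localhost:3333
server = BlockingOSCUDPServer(("127.0.0.1", 3333), dispatcher)
server.serve_forever()
```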
There are three distinct roles that players can take: food, shelter, or energy supply. Each role has an individual toolbar, one on each of three sides of the table (Figure 2). The goal of the game is to support the population living in the area without having a catastrophically negative effect on the environment. Players must decide what kinds of food, shelter, or energy producing facilities to construct, and attempt to achieve a balance in terms of population support: neither wasting resources nor failing to provide for the population's needs.
Fig. 1. Sustainability game on our custom tabletop
At the start of the game, there is a small base population present in the area. Over the course of the game, the population gradually grows. To meet the needs of the growing population, players drag facility tokens from the left side of their individual toolbars (e.g. housing, power plants, and farms) onto the map. For example, Figure 2 shows that single dwelling houses have been dragged from the left of the shelter toolbar onto the map. The game is won if players can add facilities in ways that meet the needs of a growing population without compromising the environment. Each facility has a cost to build, can support a specific number of people and has a specific effect on the environment. Some facilities will produce more as time passes (i.e. will support a larger population), and some will produce less (i.e. support a smaller population). Information about the different forms of environmental damage that may be caused by different facilities (e.g., physical waste, atmospheric pollution, chemicals in the water, pesticides in the food) is available through informational screens, which can be accessed before or during game play by touching and holding on each resource token. The map interface provides feedback both on the global game state and on individual tools. Each role has an icon that represents it on that role's toolbar and in the global display area (Figure 2). Food is a knife and fork. Shelter is a house. Energy is a lightning bolt. Each of the three toolbars has its role icon in the centre (left, bottom and right in Figure 2). A role toolbar is wide enough to be played by 1-2 players. Players use their toolbar to access information about facilities and can drag facilities from the toolbar to the map if they have enough money to pay for them. As the game progresses, the environmental and population impacts of all the facilities currently on the board add up to a global environmental and a population effect. A global status console area on the fourth side of the table (top in Figure 2) shows this cumulative state of the game in terms of the environment (anthropomorphic tree) and population
(face). The cumulative game state is also indicated through the changing colours of the main map interface (Figure 3) and the tone of the ambient soundtrack. Figure 4 shows the three states of global environment feedback. The tree changes its colour (green, yellow, red) and facial expression to reflect three levels of environmental damage. The population's facial expression indicates how well the current population is being supported.
Fig. 2. Map with individual toolbars (left, bottom, right) and global display console (top)
The global display console also shows each role's status in terms of its individual impact on the environment up to this point in the game (indicated by the colour of the role icon under the tree) and its support of the current population up to this point (indicated by the colour of the role icon under the face). For example, the red house on the left of the global display area indicates that the shelter player(s) have had a negative impact on the environment so far. The game timer (top left in Figure 2) shows the game time. The satellite is used to access game controls such as pause and reset. If the environmental impact becomes too negative, natural disasters and other negative events begin to occur (e.g. Figure 5). While the game runs in real time (there is no turn-taking), there is a time-based cycle to the game. At regular intervals (every 10-20 sec), the players each receive an additional amount of money (shown in Figure 2, to the right of the shelter toolbar, as an orange and black circle with a "$" dollar sign).
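The core simulation loop implied by this description (facilities with a build cost, a population-support capacity and an environmental effect, a periodic income tick, and a cumulative global state) can be sketched roughly as follows. This is not the Futura C# code; the class names, numeric values and tick timing are invented purely to illustrate the mechanic.

```python
from dataclasses import dataclass

@dataclass
class Facility:
    name: str
    cost: int          # money needed to build
    supports: int      # people supported
    env_impact: int    # negative values damage the environment

# Hypothetical facility catalogue for the shelter role (values invented)
SHELTER = [Facility("single dwelling", 20, 4, -1),
           Facility("apartment block", 60, 30, -4)]

class GameState:
    def __init__(self):
        self.population = 10       # small base population at the start
        self.money = {"food": 50, "shelter": 50, "energy": 50}
        self.board = []            # facilities currently placed on the map

    def build(self, role, facility):
        if self.money[role] >= facility.cost:
            self.money[role] -= facility.cost
            self.board.append(facility)

    def tick(self):
        """One time step: population grows, income is paid, global state updated."""
        self.population += 2                       # population gradually grows
        for role in self.money:                    # periodic income for each role
            self.money[role] += 10
        supported = sum(f.supports for f in self.board)
        environment = sum(f.env_impact for f in self.board)
        return supported >= self.population, environment

state = GameState()
state.build("shelter", SHELTER[1])
print(state.tick())   # (population supported?, cumulative environmental effect)
```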
Fig. 3. Microworld map feedback states (colour changes)
Fig. 4. Tree global environment feedback states (tree colour and facial expression)
Fig. 5. World Event: Crime Spree (negative)
At the end of the game, a display informs the players of how they did. Summary information about group and individual outcomes is given through graphics and simple text. The game is designed to be played over and over, in order to adjust strategies and learn from mistakes. It is possible, but non-trivial, to end with a balanced game world. A video of the Futura project is available at http://www.antle.iat.sfu.ca/Futura/.
4 Field Survey Study
In this paper we focus on the design space of public engagement with tabletop displays and report on our field survey study. The goal of the survey-based field study was to determine whether our version of a collaborative simulation game implemented on a multi-touch tabletop met our design goals related to public engagement with the social issue of sustainable development planning. We also wanted to identify insights or issues with our design in order to better understand the key design factors that impact public interaction with a social issue through interactive displays.
4.1 Participants and Procedure
We collected survey data from 90 participants aged nine and above. The age-nine constraint was introduced by our ethics guidelines; it was not a design constraint. Participants were a convenience sample from the general population of visitors to one of the cultural sites at the 2010 Vancouver Winter Olympics. Our goal was to gather survey information from 80 to 100 players selected to ensure a range of ages, genders, social group configurations (e.g. families, teen groups, older adults) and cultures.
4.2 Study Setting
On the Olympics Surrey cultural site, Futura was positioned in a busy and narrow thoroughfare, as suggested in [11]. This allowed people to walk up to it as part of their traversal of the space and then either play, observe, ask to play or join the queue if there was one. The variable lighting conditions meant that the game was only open after dark, roughly 5 – 8 pm.
4.3 Survey Data Collection and Analysis
Our survey focused on five core areas based on our design goals. We asked both closed (Likert scale) and open questions to address the following design questions:
• Did participants gain a better understanding of the key issues related to sustainable development? What did they learn?
• Did participants actively work together to play the game?
• Did this type of collaboration help them learn about key issues related to sustainable development?
• Did participants "walk up and play" with little instruction? Did participants have usability problems with the multi-touch tabletop?
• Did participants understand how their choices impacted the game state in real time without turn-taking?
• Did participants of all ages understand the different kinds of information presented through territories on the tabletop interface (e.g. global information, individual player information)?
• Did the experience appeal to participants? Did they enjoy playing? Would they play again?
A seven-point Likert scale was used in the survey, where 1 was "strongly disagree" and 7 was "strongly agree". Closed question responses were analyzed using descriptive statistics (mean, median, standard deviation) in order to determine how well our sustainability tabletop game met our design goals. We also analyzed participants' written responses to open questions to search for common themes that would provide context for the quantitative findings. We also informally interviewed participants when the queue wasn't too long. We restrict our analysis to the survey data, with occasional mention of findings from informal interviews where these provide context for our survey results. We searched for insights into what worked and sought to identify issues in areas where participants' responses were less positive.
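Computing the descriptive statistics reported in Table 1 from raw Likert responses is a short exercise with pandas; the sketch below uses randomly generated responses purely to show the shape of the computation (statement columns S1-S13, one row per participant), not the study's actual data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
statements = [f"S{i}" for i in range(1, 14)]

# Fake 7-point Likert responses for 90 participants (illustrative only)
responses = pd.DataFrame(rng.integers(1, 8, size=(90, 13)), columns=statements)

summary = pd.DataFrame({
    "Mean": responses.mean().round(1),
    "Median": responses.median(),
    "Std Dev": responses.std().round(2),
})
print(summary)
```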
4.4 Limitations
This survey-based study utilizes a convenience sample and may not be generalizable. We also recognize that the Olympics is a specific kind of public event that attracts a specific visitor population. The cultural event site was free for everyone but likely only attracted certain types of users. The survey was not validated, although we have used several of the statements in previous work. These factors limit our ability to generalize our results. However, for the purposes of initially studying our design, and identifying important factors and challenges, we feel that the sample and survey instrument were adequate.
5 Results
Our results focus on providing evidence that we did or did not achieve our five design goals. Results provide validation for some of our design decisions, reveal insights about this design space, and uncover issues that must be addressed. We provide the survey statements and the mean, median and standard deviation of the 90 participant responses for each closed question (Table 1). We follow this with quotes of written responses to open questions.
5.1 Participant Profile
Ninety participants completed our survey after playing our game one or more times. Forty-three percent of the players were between 14 and 18 years old, twenty-two percent were between 9 and 13, and another twenty-two percent were between 19 and 29, making the primary age group those aged between nine and 30 (i.e. youth and young adults). Sixty-four percent of the players were female and the remaining thirty-six percent male.
5.2 Overall Attitudes
Overall, participants found Futura fun and engaging to play and rated it a median value of 6/7 for enjoyment and repeat play (see statements S12, S13 in Table 1). Just fewer than half of the groups played more than once; this was determined in part by the length of the queue. All ages of audience, including children, teens and adults, enjoyed playing. For example, one adult player wrote, "It was educational yet very hands on and entertaining." A nine-year-old boy said, "It's fun. It's really challenging and that's why I like it."
5.3 Experiences Based on the Survey
Learning Outcomes. Most participants gained a somewhat better understanding of the importance of making sustainable land use decisions over time (S1) and gained a better understanding of how difficult it is to make sustainable land use choices over time (S2). Players' written responses to the question, "What did you learn?" varied. Many related to the theme of the difficulty of balancing human and natural needs, which was a learning outcome for the game. For example, participants gave the following written responses:
“How difficult it is to actually balance people vs. the environment and how easily the environment is damaged.” “It's hard to make both the environment and population happy.” “I learned that we need to help the environment and conserve every little thing we have. I also learned that building houses is expensive and difficult.” “That it’s hard to keep the world in balance.”
Group Participation and Interaction. Many participants somewhat agreed that they worked with the other players (S3), but there was some variation in responses. Participants were neutral about playing the game on their own, and there was some variation in responses (S4). Participants somewhat agreed that working with other players helped them learn something new (S5). Most participants enjoyed playing with others (S6). In response to “What was learned working with other players?”, one player wrote, “You learned how to act as a team.” Another wrote, “The groups responsible for each aspect of development need to communicate frequently and in great detail to make sure everything working well.” Others wrote, “Cooperation is key. One person can't make a difference.” and “Everyone needs to work together in order to gain some understanding.” We observed that some players who played the game more than once shifted from individual play to collaboration, including discussions with each other about how each player was doing in terms of supporting the population and minimizing the environmental impact, and would work out strategies together to keep the indicators in the green.
Walk-up-and-playability. Most participants could easily start to play by touching the screen and liked the multi-touch aspect of the game (S7). Many participants wrote that they enjoyed it because they could use their fingers to interact directly with the surface. Participants that had difficulty were usually young children, whose fingertips did not always register, or older adults who showed some hesitation to start. The variation in scores likely reflects some inconsistencies in touch tracking due to the angle of the sun in the late afternoon. Many participants understood and liked the real-time (no turn-taking) style of game play (S8).
Apprehendability. Many participants quickly understood what the icons and symbols on the food or shelter or energy player toolbars represented and how to use them (S9). Fewer participants also understood the role-based feedback in the global status console (S10) and/or understood how their choices affected the game over time (S11). We note here that this self-report might not be reliable, but we discuss this finding below (see 6.2 Issues). Although we didn’t explicitly ask about it, we observed that most participants quickly understood the world map metaphor. The map acted as a referential anchor for the players [27], enabling them to maintain a shared understanding of the game state by determining if they were winning or losing.
Appeal. Many participants wanted to play the game again (S12) and enjoyed playing the game (S13). Participants responded to “I enjoyed the game because __” with written responses such as: "Fast, easy, educational." "Graphics and colours, move with fingers." "I got to meet new people." "Fun + thought provoking."
Table 1. Survey results (Mean, Median, Std Dev of responses on a 7-point scale)

S1. By playing this game I gained a better understanding of the importance of making sustainable land use decisions over time. (Mean 5.3, Median 5, Std Dev 1.04)
S2. By playing this game I gained a better understanding of how difficult it is to make sustainable land use choices over time. (Mean 5.8, Median 6, Std Dev 1.05)
S3. I actively worked with the other players while I was playing the game. (Mean 4.9, Median 5, Std Dev 1.62)
S4. I played the game mostly on my own. (Mean 3.9, Median 4, Std Dev 1.88)
S5. Working with other players helped me learn something new about sustainable land use planning. (Mean 5.1, Median 5, Std Dev 1.34)
S6. I enjoyed playing this game with other players. (Mean 5.9, Median 6, Std Dev 1.09)
S7. I liked playing this game because I could use my fingers to move objects around on the tabletop map. (Mean 5.7, Median 6, Std Dev 1.52)
S8. I liked playing this game because the game was played out in real time without turn-taking. (Mean 6.1, Median 6, Std Dev 0.97)
S9. I used the food or shelter or energy player toolbar on my side of the table to help me play the game. (Mean 5.9, Median 6, Std Dev 0.91)
S10. I used the global impact information display at the top of the tabletop map to help me play the game. (Mean 5.3, Median 6, Std Dev 1.49)
S11. I could see how my choices affected the game over time by using both the toolbar and global impact display. (Mean 5.5, Median 6, Std Dev 1.32)
S12. If I had the chance, I’d like to play this game again. (Mean 5.7, Median 6, Std Dev 1.19)
S13. I enjoyed playing this game. (Mean 6.0, Median 6, Std Dev 0.97)
6 Discussion
Our goals were to understand whether a multi-touch surface application in a public space could support groups of users to walk up and engage with an activity about sustainability, and what design factors would be critical for its success. Our survey results are largely positive and provide evidence that it is possible to design a multi-touch tabletop game to engage the public with the issue of sustainability at a busy public event. While we find that the survey results are very positive, we interpret them cautiously. It is possible that the nature of the venue for the study, combined with the novelty of the game, led participants to be overly positive. Analysis of previous work in this design space, combined with examination of closed and open survey question responses, allows us to suggest five design features of importance that can be used to guide future designs, and to identify two issues with our design that must be overcome.
6.1 Important Design Features
Design choices result in specific design features, which in turn provide opportunities for interaction, which in turn create, shape, and constrain opportunities for public engagement with a tabletop game. Specific kinds of interactions ensure that the design meets the design goals related to learning, group participation, walk-up-and-playability, apprehendability and appeal. For example, choices about the approach and model for learning are made and instantiated in the features of the core mechanic of the game (e.g. learning through a simulation) and the user interface design (e.g. feedback providing playful guidance). Choices about the kind of group interaction enabled are made and instantiated in the game mechanic (e.g. cooperative), game rules (no turn-taking), and reward structure (winning requires collaboration). We propose that the following design features are critical to the success of a learning game that engages the public in a serious social issue such as sustainable development. We also suggest that this unique combination of design features works well together to enable walk-up-and-playability. Completely meeting the goal of apprehendability requires future work. We summarize some of our design decisions as general guidelines. In line with our exploratory survey methodology, we suggest that these guidelines are meant as considerations rather than prescriptive heuristics. They are also meant to be considered as a set, rather than individually. We suggest that it is the combination of this set of design features that leads to our positive results, rather than any one of them.
Entry through Fast Play in a Microworld. Drawing on Rieber's work in blending microworlds, simulations, and games [22], the game invites people to participate in the issue of sustainable development by directly experiencing a simulated land development game in a microworld. Multi-touch interaction on a horizontal surface enables fast play by multiple players. The action on a colourful map surface draws and sustains attention in a crowded space. The four sides of the table provide entry points [12] for players (three sides) or spectators (fourth side). A hand icon on the introduction screens may have promoted touching the displays, as reported in [1]. Hundreds of people played Futura during the event. We found that people played when there was no one playing (e.g. when it first opened), after taking a spectator role while waiting for a turn, and even when there was a queue. The only deterrent to entry seemed to be the length of the queue and the crowded space around the table.
Guideline. Combining a simulation microworld style of game with a multi-touch tabletop enables people to quickly and easily enter the play space.
Access through Roles and Territories. Futura invites people to participate in the game by providing access points [12] through roles and associated personal territories (i.e. toolbars) [17]. Some participants negotiated roles by referring to toolbars, which act as objects of shared negotiation [28], while others walked up and played the role on the side of the table where they stood. Survey data indicated that most players understood how to use the individual toolbars. Placing toolbars with a static orientation is contrary to the prevailing design guidelines that suggest content should be dynamically oriented to a user's position [17].
In Futura, a static orientation ensures that each player is responsible for one role and one corresponding toolbar territory in a collaborative activity.
Guideline. Providing individual roles and territories through role-based toolbars on three of the four sides of the tabletop enables people to quickly choose a role and access the game through that role and territory.
Encoding Learning in Game Structure, Rewards and Feedback. In the sphere of public engagement in social issues, outcomes are informal, and our survey data indicated that our goals were largely met. We suggest that a unique feature of our game is that the design addresses the problem of unreliable learning gain in collaborative games [26] by encoding learning in the game mechanics, as suggested in [20]. We use a replayable game structure where winning the game both requires and rewards achievement of learning outcomes. The mechanics of the game were balanced in such a way that it was only possible to win by understanding the impact of different development choices on the environment. However, losing the game also advances learning by confronting players with the complexity of sustainable development planning. The game is currently paced so that it completes within about three minutes of play. This allows players to play and lose multiple times, and to observe the impact of different choices and decisions during this process. By keeping the time investment short, we hoped to reduce the attachment to the outcome that comes from deep sustained play in video games. Our survey showed that many participants wanted to play over and over (S12). Another important design decision was that losing is not stigmatized in any way. Although most participants lost multiple times, they still responded that they enjoyed the game (S13). The world events, such as a crime spree (Figure 5), were viewed as cute and fun, and provided an indication that players were losing the game. The end-state feedback is presented in informative displays that support productive discussion about what is required to keep the world in balance in future play (Figure 7).
Guidelines. The combination of the following guidelines enabled learning from game play. We suggest that these guidelines are relevant to educational games in general, and that the affordances of a multi-touch tabletop enhance their effectiveness.
• Using a fast, repeatable game style encourages repeat play and motivates wanting to win.
• Designing the game mechanic so that winning requires learning outcomes to be met, and so that it is easy to determine whether they have been met.
• Providing playful feedback reduces the stigma of losing and motivates learning through repeat play.
• Designing the game mechanic so that losing also furthers learning outcomes ensures some learning even from losing.
Playing without Turn-Taking. A simulation-style game supports simultaneous play and avoids the issue of participants having to work out turn-taking protocols [17, 22]. Participants can drag and drop resources as fast as they can if they have enough money. The survey indicated that most players liked this style of play (S8). However, this style of game play results in some unforeseen effects described under Issues below.
Guideline. Using multi-touch interaction enables fast entry to simultaneous play, which is well suited to walk-up-and-play style games in public spaces.
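The Futura source code is not included in the paper, so the following is only a minimal, hypothetical sketch of the kind of win/lose structure described above: a short, timed round in which the group wins only if every role keeps its part of the world in balance, so that winning presupposes the intended learning outcome. The role names, threshold, and three-minute round length are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

ROUND_SECONDS = 180        # assumed: the roughly three-minute round described above
BALANCE_THRESHOLD = 0.6    # hypothetical cut-off for "keeping the world in balance"

@dataclass
class RoleState:
    """Cumulative state for one player role (e.g. housing, food, energy)."""
    name: str
    population_supported: float = 0.0  # how well this role supports the population
    environmental_impact: float = 0.0  # accumulated damage caused by this role

    def balance_score(self) -> float:
        # Toy metric: support achieved minus environmental cost.
        return self.population_supported - self.environmental_impact

def group_wins(roles: list, elapsed: float) -> bool:
    """The round ends after ROUND_SECONDS; the group wins only if *every*
    role is in balance, so no single player can win for the others."""
    if elapsed < ROUND_SECONDS:
        return False  # round still running
    return all(r.balance_score() >= BALANCE_THRESHOLD for r in roles)

# Example: one over-developed role is enough to make the whole group lose.
roles = [RoleState("housing", 1.2, 0.3),
         RoleState("food", 0.9, 0.2),
         RoleState("energy", 0.5, 0.4)]
print(group_wins(roles, elapsed=180))  # False: losing is itself a learning moment here
```

Because the group-win predicate requires every role to pass, the same sketch also reflects the collaboration mechanic discussed next, where any one player can cause the whole group to lose.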
Facilitating Collaboration through Game Mechanics and Interface Design. Players do not compete against each other in the game, but must work together. We balanced the game engine so that any one player had the ability to cause the whole group to lose. This meant that players who had mastered one aspect of the system had an incentive to talk to other players in order to win. By encouraging the players to explain their learning to each other, we provided them with an opportunity to reinforce that learning. Written comments indicated that some participants found it difficult to win through collaboration. However, the survey results indicated that they did largely work together (S3-S6), although there was variation in responses. Another design choice that supports collaboration is that at any point in the game, any player can look at the global or individual toolbar displays to see how the other players are doing, and how they are affecting the world. No information is hidden from players. In addition, the size of the tabletop makes it possible for most people to reach across the table and interact with each other's toolbars, which creates opportunities for negotiation [28] that we see as central to collaboration. However, the distances between the three toolbars make it impossible for any one player to reach all three toolbars simultaneously and control the game. In these ways, we support learning by observing others and a teacher-apprentice model, as suggested in [2]. We also suggest that conflict was minimized simply by not rewarding it in the game rules. Players who did not collaborate could not win.
Guidelines. The combination of the following guidelines enabled collaboration. We suggest that these guidelines are relevant to collaboration in general, and that the affordances of a multi-touch tabletop enhance their effectiveness.
• Requiring each player to individually win for the group to win facilitates collaboration.
• Providing toolbars reachable by two of the three players reduces single-player "take-over".
• Providing a single group display that is visible to all players supports group communication.
• Using a single large map world interface supports players in coming to a shared understanding of the game state.
6.2 Issues
Issues were identified by looking at survey statements with lower mean scores and higher variation, and related written comments that allow us to infer areas that may need improvement. We identified two important issues that must be resolved in this design space and provide some suggestions for resolution.
Balancing Individual and Group Game Feedback. Meeting our design goal of apprehendability requires that we reduce dependence on prior knowledge and make information understandable by players from a broad age range. To address this goal, we used visual symbols and simple images to communicate information. The size of the tabletop (103 by 68 cm) provides ample space for clear symbols and thumbnails, as well as for both individual toolbars and group display spaces. For example, we use interface metaphors including a world map, tree, and face that change colour to reflect
cumulative, global game states. We also use house, knife and fork (food), and lightning bolt (energy) symbols to represent how well the population is supported by each role or facility type. While most players understood and used the individual toolbars (S9), not all understood that the global display represented group progress (variation in responses for S10) and each role's individual contribution to group progress. The game requires individual play and rewards group participation. However, the real-time style of play made it difficult for players to distinguish between individual and group effects, a distinction that is an important meta-cognitive strategy required for collaborative learning [29].
Understanding Short- versus Long-Term Effects. Participants expected to see an immediate effect of each action taken. Many players did not understand the relationship between the history of their actions and the cumulative game state. This may account for the variation in responses for S10. Many participants mentioned this in our informal interviews. However, the correct simulation model is that all actions of the players impact the game state cumulatively over time. We encoded the correct model into the game mechanic, but found it difficult to find a simple way to communicate short-term impact and cumulative temporal effects to players through graphical interface feedback. In achieving apprehendability through simple colour changes of the map and through tree and face symbols that represent game states, we found it difficult to communicate the temporal trajectory of cumulative activity effectively to players. This trade-off may need to be reconsidered.
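As a purely illustrative model of the issue described above (not of Futura's actual engine), the sketch below accumulates every player action into a global state that also evolves over time, which is exactly why the immediate, per-action feedback that players expected diverges from the cumulative trajectory; the class name, decay rate, and magnitudes are assumptions.

```python
class CumulativeWorld:
    """Toy model: player actions feed a cumulative global state rather than
    producing one-to-one, immediate effects."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay   # slow decay: past actions keep mattering
        self.state = 0.0     # e.g. overall environmental health, 0 = neutral
        self.history = []    # per-tick record, usable for end-state displays

    def apply_action(self, impact: float) -> float:
        """Apply one action and return its immediate change, which is small
        compared with the long-run, accumulated effect."""
        self.state += impact
        return impact

    def tick(self) -> None:
        """Advance simulated time; the state keeps evolving even without new actions."""
        self.state *= self.decay
        self.history.append(self.state)

world = CumulativeWorld()
for _ in range(60):              # a minute of repeated small developments
    world.apply_action(-0.05)    # each action looks negligible on its own...
    world.tick()
print(round(world.state, 2))     # ...but the cumulative trajectory is clearly negative
```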
7 Conclusion
Research that contributes to understanding how to design for public "walk-up-and-play" tabletop displays has only just begun. Our research contributes to this effort through the design, implementation, and deployment of an exemplar public "walk-up-and-play" tabletop game. The use of such displays to engage the public in discussions about issues of societal importance is also a new field of study. Our findings from the field study of Futura demonstrate that a fast-paced, role-based simulation tabletop game can effectively engage the public in learning about the complexity of sustainable development. In our approach, we wove together knowledge from multiple domains to make design decisions about learning content and game mechanics, as well as information, interaction and interface design, to create a tabletop application that met goals about learning, group interaction, playability, apprehendability, and appeal. When designing for this space, we suggest that entry points can be achieved by using a colourful map-based microworld on a multi-touch tabletop platform. While previous work suggests dynamic orientation of territories [23], we use static personal territories to enable role-based play, which provides interface access points to encourage immediate playability and eliminates the need for turn-taking. However, our analysis revealed a tension between using real-time play and effectively supporting learning about how individual versus group actions affected the game outcomes. We encode learning content in the game mechanics of a simple simulation game where both winning and losing enable learning outcomes. We use a predominantly graphical style of information presentation to support apprehendability. We use a map as a
referential anchor and shared displays to support collaboration. However, our analysis revealed a tension between using a largely visual approach to information and interface design and effectively supporting learning about the cumulative effects that short-term actions have on sustainability. Future work is needed to address these design trade-offs. We suggest that our findings are applicable to researchers and designers involved in public display design and public engagement campaigns using new media channels.
Acknowledgments. This research was funded by an NSERC RTI and a GRAND NCE grant (Canada).
References
1. Hornecker, E.: "I don't understand it either, but it is cool" – Visitor interactions with a multi-touch table in a museum. In: Tabletop 2008, pp. 113–120. IEEE, Los Alamitos (2008)
2. Peltonen, P., Kurvinen, E., Salovaara, A., Jacucci, G., Ilmonen, T., Evans, J., Oulasvirta, A., Saarikko, P.: It's Mine, Don't Touch!: Interactions at a large multi-touch display in a city centre. In: CHI 2008, pp. 1285–1294. ACM Press, New York (2008)
3. Antle, A.N., Motamedi, N., Tanenbaum, K., Xie, L.: The EventTable technique: Distributed fiducial markers. In: TEI 2009, pp. 307–313. ACM Press, New York (2009)
4. Antle, A.N., Bevans, A., Tanenbaum, J., Seaborn, K., Wang, S.: Futura: Design for collaborative learning and game play on a multi-touch digital tabletop. In: TEI 2011, pp. 93–100. ACM Press, New York (2011)
5. Speelpenning, T., Antle, A.N., Döring, T., van den Hoven, E.: Exploring how a tangible tool enables collaboration in a multi-touch tabletop game. In: INTERACT (in press, 2011)
6. Sharp, H., Rogers, Y., Preece, J.: Interaction Design. John Wiley & Sons, New York (2007)
7. Vogel, D., Balakrishnan, R.: Interactive public ambient displays: Transitioning from implicit to explicit, public to personal, interaction with multiple users. In: UIST 2004, pp. 137–146. ACM Press, New York (2004)
8. Huang, E., Mynatt, E.: Sharable displays: Semi-public displays for small, co-located groups. In: CHI 2003, pp. 49–56. ACM Press, New York (2003)
9. Elrod, S., Bruce, R., Gold, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pedersen, E., Pier, K., Tang, J., Welch, B.: Liveboard: A large interactive display supporting group meetings, presentations, and remote collaboration. In: CHI 1992, pp. 599–607. ACM Press, New York (1992)
10. Russell, D.M., Drews, C., Sue, A.: Social aspects of using large public interactive displays for collaboration. In: Borriello, G., Holmquist, L.E. (eds.) UbiComp 2002. LNCS, vol. 2498, pp. 229–236. Springer, Heidelberg (2002)
11. Brignull, H., Rogers, Y.: Enticing people to interact with large public displays in public spaces. In: INTERACT, pp. 17–24. IOS Press, Amsterdam (2003)
12. Hornecker, E., Marshall, P., Rogers, Y.: From entry to access – How shareability comes about. In: DPPI 2007, pp. 328–342 (2007)
13. Jacucci, G., Morrison, A., Richard, G.T., Kleimola, J., Peltonen, P., Parisi, L., Laitinen, T.: Worlds of information: Designing for engagement at a public multi-touch display. In: CHI 2010, pp. 2267–2276. ACM Press, New York (2010)
14. Forlines, C., Wigdor, D., Shen, C., Balakrishnan, R.: Direct-touch vs. mouse input for tabletop displays. In: CHI 2007, pp. 647–656. ACM Press, New York (2007)
15. Wu, M., Balakrishnan, R.: Multi-finger and whole hand gestural interaction techniques for multi-user tabletop displays. In: UIST, pp. 193–202. ACM Press, New York (2003)
16. Rogers, Y., Hazlewood, W., Blevis, E., Lim, Y.K.: Finger talk: Collaborative decision-making using talk and fingertip interaction. In: CHI 2004, pp. 1271–1274. ACM Press, New York (2004)
17. Scott, S.D., Carpendale, M.S.T., Inkpen, K.M.: Territoriality in collaborative tabletop workspaces. In: CSCW, pp. 294–303. ACM Press, New York (2004)
18. Kirriemuir, J., McFarlane, A.: Literature Review in Games and Learning. Futurelab (2004)
19. Van Eck, R.: Digital game-based learning: It's not just the digital natives who are restless. EDUCAUSE Review 41, 16–30 (2006)
20. Jenkins, H., Hinrichs, R.: Games to Teach Project, http://icampus.mit.edu/projects/GamesToTeach.shtml
21. Chang, A.Y.: Playing the environment: Games as virtual ecologies. In: DAC 2009 (2009), http://www.escholarship.org/uc/item/46h442ng?display=all
22. Rieber, L.P.: Seriously considering play: Designing interactive learning environments based on the blending of microworlds, simulations, and games. Educational Technology Research and Development 44, 43–58 (1996)
23. Treanor, M., Mateas, M.: Newsgames: Procedural rhetoric meets political cartoons. In: DiGRA (2009), http://www.digra.org/dl/display_html?chid=http://www.digra.org/dl/db/09300.09505.pdf
24. Roschelle, J., Teasley, S.: The construction of shared knowledge in collaborative problem solving. In: CSCL, pp. 69–97. Springer, Heidelberg (1995)
25. Dillenbourg, P.: What do you mean by "collaborative learning"? In: Dillenbourg, P. (ed.) Collaborative Learning: Cognitive and Computational Approaches, pp. 1–16. Elsevier Science, New York (1999)
26. Piper, A.M., Hollan, J.D.: Tabletop displays for small group study: Affordances of paper and digital materials. In: CHI 2009, pp. 1227–1236. ACM Press, New York (2009)
27. Clark, H.H., Brennan, S.E.: Grounding in communication. In: Resnick, L.B., Levine, J.M., Teasley, S.D. (eds.) Perspectives on Socially Shared Cognition, pp. 127–149. American Psychological Association, Washington, DC (1991)
28. Suthers, D., Hundhausen, C.: An empirical study of the effects of representational guidance on collaborative learning. Journal of the Learning Sciences 12, 183–219 (2003)
29. Duffy, T.M., Dueber, B., Hawley, C.L.: Critical thinking in a distributed environment: A pedagogical base for the design of conferencing systems. In: Bonk, C.J., King, K.S. (eds.) Electronic Collaborators: Learner-centered Technologies for Literacy, Apprenticeship, and Discourse, pp. 51–78. Lawrence Erlbaum Associates, Mahwah (1998)
Understanding the Dynamics of Engaging Interaction in Public Spaces
Peter Dalsgaard, Christian Dindler, and Kim Halskov
CAVI & Centre for Digital Urban Living, Aarhus University, Aarhus, Denmark
{dalsgaard,dindler,halskov}@cavi.dk
Abstract. We present an analysis of three interactive installations in public spaces, in terms of their support of engagement as an evolving process. In particular, we focus on how engagement unfolds as a dynamic process that may be understood in terms of evolving relations between cultural, physical, content-related, and social elements of interactive environments. These elements are explored through the literature on engagement with interaction design, and it is argued that, although valuable contributions have been made towards understanding engagement with interactive environments, the ways in which engagement unfolds as a dynamic process remains relatively unexplored. We propose that we may understand engagement as a product of the four above-mentioned elements, and in our analysis we provide concrete examples of how engagement plays out in practice by analyzing the emergence, transformation and relations between these elements. Keywords: Urban computing, engagement, interaction design.
Turning to the literature on interaction design, the issue of engagement has been dealt with from a range of positions, from programmatic accounts arguing that engagement may be the centre of research agendas (e.g. [30]) to more detailed studies exploring particular design strategies for sparking engagement ([14], [4]), or concepts for understanding how particular attributes support engagement (e.g. [10]). While it has proved productive to study engagement with particular technologies, these technologies do not exist by themselves. Rather, they are parts of larger assemblies [19], wherein various technologies, physical properties, and forms of cultural practice shape people's engagement with them. From our perspective, a focus on the individual object is thus too narrow for understanding people's engagement with technology; an account is needed that is capable of capturing engagement as a product of relations between physical, cultural, social, and content-related elements. Moreover, while studies in interaction design have succeeded in identifying specific aspects of engagement, we see a need for studying the dynamics of engagement, as it unfolds in concrete situations, and with assemblies of technologies. To pursue this line of inquiry, our paper is structured around two concerns: first, we provide an account of engaging interaction, keeping our focus on how engagement evolves, not only as a relationship between people and single technologies, but in complex situations involving other people, cultural practices, content, and physical surroundings. Second, building on this account, we explore in more detail the dynamics of engagement as it unfolds in concrete situations. To explore these dynamics, we have brought together three cases from our work. Our first case, Aarhus by Light, was a large-scale, urban installation of an interactive media façade in a public space, whereas the other two cases took place in different settings. Our second case, the Hydroscopes, was an interactive installation exploring new potential for engaging experiences at museums and science centres. Third and last, the LEGO Table was an interactive installation that explored new potential for digital marketing at a department store. From an analysis of these three cases in conjunction with one another, we have found that engagement with interactive installations may be construed as a highly relational phenomenon, characterized by the interplay between physical and spatial conditions, cultural practices, social relations, and the content of the installations. In this paper, we focus on the dynamics between these four properties, and we will particularly focus on the transformations that occur during interaction, both between these aspects (for instance, transformations that concern both social and physical aspects) and within them (for instance, transformations of social aspects). Through the analysis of two specific instances from each of the three cases, we explore how engaging interaction may be understood as a relational, dynamic, and transformative phenomenon involving the above-mentioned four aspects as crucial elements. Before we turn to the specific analyses, we first introduce the four elements of engagement.
2 Elements of Engagement
Based on a study of the research literature, we have identified four elements of engaging experiences: cultural practices, physical conditions, the content of the installations, and social practices. In this section, we introduce these four elements, and address how they are important aspects of engaging experiences.
2.1 Cultural
By highlighting the cultural element of interaction, we wish to bring attention to how cultural aspects come to bear on engagement. The use of interactive technologies typically unfolds at institutions or in situations that embody particular forms of practice. In a sense, these embody expectations concerning the kinds of activities and actions that may occur, which in turn shape the perception of, and engagement with, interactive technologies. Cultural conventions and norms may relate both to particular forms of social interaction and to particular ways of using the physical environment. Whether going to a concert, visiting a museum, or walking through the city, particular practices and conventions unfold, and are more or less implicitly expected of people. People may follow these conventions to greater or lesser extents. De Certeau's [6] description of 'walking in the city' eloquently illustrates how people are influenced by the rules of power and culture determining spatial practices of moving through urban landscapes, but also the particular ways in which people circumvent and bend rules. Particular cultural conventions may be learned and shared among people, or may be manifest in artefacts and surroundings that are crystallizations of particular practices. Cultural-historical activity theory ([18], [26]) explores the concepts of 'institution' and institutional forms of practice in great detail, highlighting how various institutional settings promote particular activities, and how artefacts reflect cultural practices. Our understanding of how socio-cultural codes of behaviour affect public interaction with digital installations is further informed by the notion of performing perception [5]. This refers to the phenomenon that most people experience, of being consciously or unconsciously aware of the constant possibility of being observed, and adjusting their behaviour accordingly, which will usually affect how an installation will be used in practice. Often, designers take advantage of established forms of practice to improve the usability and usefulness of their products. In some situations, designers will deliberately adopt a strategy of breaking with convention, in order to spark reflection (e.g. [9]), or as part of research efforts to understand these practices, as in the use of breaching experiments [13] or provotypes [25].
2.2 Physical
Physical presence and actions are central to many, if not all, engaging experiences. Physical engagement may take a number of forms. In a straightforward sense, physical engagement covers the physical manipulations carried out when controlling input devices and handling tangible user interfaces, and the bodily movements that control playful, movement-based systems such as the Wii, for instance. This understanding of physical engagement is thoroughly documented in the literature (see [22] for an overview). However, our definition of physical engagement is expansive, and also covers aspects of embodiment, affect, interactive cognition, and intertwined action-reflection. As argued by Dourish [8], embodiment, 'the property of being manifest in and as a part of the world', has become a central feature of many recent theoretical developments in HCI. Our existence in the world as physical beings is central to the ways in which we make sense of the world, and this basic premise is inescapable when we examine how people's engagement with technology unfolds in
practice. Our physical presence in a given setting means that we are affected by our surroundings even before we consciously enter into sense-making processes, which is the basis for Fritsch's work on affective engagement in interaction design [12]. Furthermore, several strands of research have addressed the notion of sense-making through physical action: when we make sense of things, reflect upon, analyse, and make plans for our actions in the world, these processes are often supported by physical actions or manipulations, as explored in the fields of distributed cognition [20] and interactive cognition [15], for example. These schools of thought also stress the importance of understanding the role of physical materials in our interaction with our surrounding environment. In the field of interaction design, the intertwined nature of action–reflection and mental–physical has been explored from a number of angles, for instance, through the externalization and internalization of cultural-historical activity [26], through the exploration of aesthetic interaction [29], and the means of engagement in interaction design [4] based on pragmatist philosophy.
2.3 Content
When we interact with digital systems, or with other types of media, for that matter, there are various ways of engaging with the content of the medium or system. Our understanding of engagement with content is influenced by Dewey's pragmatist aesthetics [7], in that we consider engagement to occur when a person invests part of herself in the encounter with the content. It is not inherent in the content itself, although the content may be structured in ways that make certain encounters and experiences more likely. Engagement with content may take place both with regard to specific parts of the content, such as a discrete instance in a narrative, or as a response to the composition of longer passages of the content, or even the totality of the content. Often, an analysis of engagement with content must be longitudinal, and encompass the entire encounter, since some of the content may initially seem insignificant or troublesome, but may later gain new significance as more content is explored or presented. Much content comprises recognizable structures and elements, such as genres, archetypes, or narrative structures [16]. However, engagement is often sparked by elements of conflict that prompt an inquisitive attitude [3]. Crafting engaging content is, therefore, also a question of addressing the balance and tension between recognizable and perplexing elements. Often, when content is static or linear, engagement with it may occur as an internal process, although the nature of the subject of our research, interactive systems, frequently demands some form of overt action. This is particularly prominent in those cases in which the content is dynamic and has a non-linear structure that prompts interaction, for example, hypertext systems, responsive installations, interactive games, and so forth ([1], [23]).
2.4 Social
With regard to the social element of engaging interaction, we focus on the relation between users/potential users of the interactive installation. Our exploration of social aspects of engagement is inspired by contributions concerning computer supported cooperative work [17], which address the relationships between users and systems
in general, and more specific contributions concerning aspects such as social interaction [28] and co-experience [11]. In order to address engaging interaction, we have identified the following social relationships as crucial for understanding the various forms of engaging interaction. Social Interaction describes situations in which two or more people with no prior relationship interact, in contrast to Group Interaction, which refers to interactions initiated by two or more persons who approach the table as a group, for instance two friends, or a parent and his child. Individual Interaction describes interactions carried out by a single person. One particular kind of interaction is self-expression, wherein one or several individuals interact with an installation, with a particular focus on having others watch them [5]. Moreover, we distinguish between various forms of initiating and resuming engaging interaction (see [27]). We use the term Watch-and-join to refer to interactions initiated by people who watch the installation, then join those who are already interacting with it, in contrast to Watch-and-take-over, which describes situations in which people watch the installation, and wait until other users have left, before engaging in the interaction. Though not in itself a social element, we also distinguish between Walk-up-and-use, which applies when a person approaches the installation and immediately begins interacting with it, and Interact-and-run, in which case a person only briefly initiates the interaction before leaving the installation. Return describes the behaviour of people who return to the installation after interacting with it previously.
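As a purely illustrative aid (not part of the authors' method), the distinctions above could be written down as a simple coding scheme for annotating observed episodes, for example:

```python
from enum import Enum, auto

class Grouping(Enum):
    """Who is interacting (Section 2.4)."""
    INDIVIDUAL = auto()       # a single person
    GROUP = auto()            # people who approach together, e.g. a parent and child
    SOCIAL = auto()           # people with no prior relationship interacting
    SELF_EXPRESSION = auto()  # interacting with a focus on being watched by others

class EntryPattern(Enum):
    """How interaction is initiated or resumed (Section 2.4)."""
    WALK_UP_AND_USE = auto()
    WATCH_AND_JOIN = auto()
    WATCH_AND_TAKE_OVER = auto()
    INTERACT_AND_RUN = auto()
    RETURN = auto()

# Example annotation of one observed episode:
episode = {"grouping": Grouping.GROUP, "entry": EntryPattern.WATCH_AND_JOIN}
```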
3 Analysis
In the following, we use the above four aspects and their associated sets of distinctions as a platform for analyzing two specific vignettes from each of the three cases. In each of the cases, we have carried out extensive studies and analyses of user interaction in order to get a better understanding of engaging interaction. The vignettes presented here have been selected on the basis of said studies and analyses because they are representative of the types of interaction that occurred with the installations in question, and because they embody the dynamics of engagement in a way that can be conveyed in a relatively straightforward manner. The sections are structured such that we first present the case and our approach to studying it; we then present two vignettes for each case in which different aspects of engagement come into play. In our presentation and analysis of the vignettes, we will focus in particular on three ways in which the dynamics of engagement unfolds: how engagement emerges as an encounter between a person and an installation in situ; how engagement concerns relations between one aspect and another, for instance how social interaction in a situation is scaffolded by content; and how engagement unfolds as a series of transformations, for instance when individual engagement is transformed to social engagement, or when an embodied, individual state of engagement is suddenly transformed because cultural aspects come into play.
3.1 Aarhus by Light
Aarhus by Light was a two-month experiment with an interactive media façade at the Aarhus City Concert Hall (Musikhuset) in Aarhus, Denmark. The main component of
the installation was an interactive media façade designed to explore how digital technologies can affect and transform public behaviour and social interactions. It was developed by a research group at the Digital Urban Living research centre, in order to explore new possibilities of digital media in urban settings. The installation was designed for the large glass façade of the concert hall building, which was fitted with 180 square metres of semi-transparent LED screens. The LED screens were distributed in an irregular array behind the surface of Musikhuset, which faces a public park (Fig. 1).
Fig. 1. Musikhuset with the media façade installation and the three interaction zones
Visitors to the park were greeted by a spectacular view of animated creatures crawling around the structure of the glass façade, along with a constantly moving outline of the Aarhus skyline. When visitors walked through the park, they passed through three interaction zones marked by coloured carpets (pink, blue, and yellow). Using these zones allowed visitors to enter the world of the small, luminous creatures. The luminous creatures were social beings that were always (or usually) happy to see you. In the interaction zones, camera tracking translated the visitors' presence and movements into digital silhouettes on the façade, and through the silhouettes, visitors could caress, push, lift, and move the small creatures. The creatures would wave back, fight, sleep, climb, jump, kiss, and occasionally, leave and return, thereby establishing a relationship with the visitor that was not only physical and embodied, but also emotional and narrative. Our research methodology for studying Aarhus by Light consisted of collecting empirical data in a number of ways: we compiled system data logs of the number and duration of registered interactions; we carried out a number of in-situ observations during the two-month period; we conducted twenty-five structured interviews with users; we video-recorded approximately one hundred interactions with the installation; and finally, we made time-lapse videos compiled from photos taken from a nearby tower, both during and after the two-month period, in order to observe large-scale interaction patterns. In our subsequent analysis, we have combined these sources of data in order to obtain a richer understanding of how people engaged with the installation. Since the installation was placed in a prominent urban space, and was in
use for a long period, it provided us with the opportunity to explore patterns of interaction that extended beyond the individual encounter. In the following presentation and analysis of how people engaged with the Aarhus by Light installation, we present two vignettes that were typical of the use patterns we observed. It is evident that physical, cultural, social, and content-oriented aspects are intertwined in these cases. In the first of the two representative vignettes, it seems that physico-spatial and content-oriented aspects are the most prominent concerns. In the second vignette, cultural and social aspects are the focus.
Physical Space and Social Relations
A family, consisting of a father, mother, daughter, and son, enters Musikhuset's park from the main street. They walk down the path in which the interaction zones are located. The children start making gestures in the first zone they reach, indicating that they have noticed the interactive façade elements, and the connection between the interaction zones and the silhouettes on the façade. When they reach the interaction zone delineated by the carpet closest to the façade, the boy and girl move about and gesture, while keeping their eyes on the display. The parents do not interact, but walk slowly around in the vicinity of the interaction zones. The boy goes through a range of motions – jumping, spreading his arms and imitating a bird, waving, and trying out strange walks. When he is most active, the girl steps off the carpet. At one point, the boy takes off his jacket and waves it about, seemingly pleased with the effect it has on his silhouette and the creatures on the façade. The girl is less active, but also tries out different moves while watching the response on the façade. They occasionally turn around to look for their parents, who are also in the park, but most of the time their attention is directed at the façade. While the brother and sister are playing there, a mother and her daughter approach the carpet. The mother instructs her daughter to enter the carpet that marks the zone. The daughter waves her arms and wiggles briefly, then looks towards her mother, who nods, and they leave. A group of children arrives. The boy and girl look at the group, and move to the edge of the carpet, remaining in the interaction zone. The new group of children waves and shouts while passing through. A girl from the group remains on the carpet. She moves, jumps, and so on. The boy and girl who were originally on the carpet step off it, leaving it to the newly arrived girl. The boy watches the new girl and her silhouette for a while, then moves onto the carpet and makes a few gestures, but in a more subtle way than before.
If we focus on the recurring actors in the vignette, the brother and sister, it is clear that their engagement unfolds as a series of interconnected events – or a trajectory, in the words of Benford et al. [2] – as they move through the park in front of the Musikhuset, and explore the features of the installation. In at least three ways, the physico-spatial aspects seem very important to understanding their engagement: first, with regard to their physical movement through the park, among the interaction zones; second, with regard to their embodied, playful exploration of the installation; third, with regard to territorial aspects. Concerning the first issue, the installation encompasses not only the façade, but reaches into the park, and is visually very prominent, especially at night.
Looking at the family in the vignette, there is a flow between the different interaction zones as they approach the concert hall, and the interaction with the luminous creatures
becomes increasingly evident. The design of the installation allows for engagement to emerge in different ways, and for users to shift between different modes of engagement. It also allows for a social appreciation of the installation, since the individual's interactions are made visible and enhanced through the silhouettes on the façade. This leads us to the second concern, the embodied and playful exploration of the installation. We can clearly see the emergence of physical engagement as the children enter the different interaction zones, culminating in the zone nearest the building. The boy, in particular, seems very engaged in exploring and experimenting with all kinds of movements and gestures, and his engagement is transformed from an initial curiosity to a very active state. The girl is a bit more hesitant, but also tries out a variety of moves; her engagement also undergoes transformations, often towards a more observant state, because of her brother's activity. Throughout this period of gesturing, they are both very attentive to the façade, and there is a clear relation between the physical and content-oriented aspects of their engagement. Our observations and interviews generally indicate that the most intriguing elements of the content were the luminous creatures that responded to the silhouettes. Concerning the third issue, we see several territorial aspects at work. When the brother and sister are on the carpet, they move about freely. Then, as groups pass through, they make way, although they keep their attention on the façade. Finally, when the new girl arrives, the boy pays attention to her, and moves to the edge of the carpet, making way for her. During this passage, his gaze wanders from the girl to the façade, and back. He is clearly engaged with not just interacting himself, but also with the social relation brought about by the installation. These territorial issues may be understood as a combination of physical and social aspects of engagement. There seems to be an implicit understanding of sharing the space, first between the brother and sister, then, as they move aside for the larger group, while keeping their focus on the façade; the final episode involving the new girl is a bit different, for the boy seems just as interested in observing the girl, and his movements now implicitly signal to her that he recognizes her presence and wants to share the space. Another interesting episode that highlights the role of social aspects is that of the girl who is brought to the carpet by her mother. In our reading, one of the main reasons the girl interacts with the installation seems to be that the mother prompts her to do so. This girl seems equally preoccupied with satisfying her mother and interacting with the content through physical movements.
Cultural and Social Aspects
A woman moves back and forth across the carpet while watching the screen. A few metres away, two men watch the screen. One of the men glances at the woman occasionally, but then turns his eyes to the façade again. The woman walks away, and the man slowly approaches the carpet. He puts his hands in his pockets and turns his side to the façade. He smiles at the other man, and walks casually across the carpet, seemingly not paying much attention to it. He then walks around the edges of the carpet, making small wiggling motions, while keeping his hands in his pockets. As a couple approaches and passes through the zone, he stands back from the carpet.
After they have passed, he once again steps onto the carpet and moves about. His interest shifts between observing his silhouette on the façade, and observing how other people look at him while he interacts with the installation.
As was seen in the first vignette, the physico-spatial, social, and content-oriented aspects all come into play when we consider how the man in this vignette is engaged in the interaction situation. While he is obviously curious, and wishes to explore the content, as we see in his intentional gestures and movements while on the carpet, what is most striking is his attention to the social aspects. He waits some metres away from the carpet while the woman interacts. We can construe this as an emergent social engagement. Then, as she leaves the carpet, he takes his turn and we see the emergence of physical and content-oriented elements that transform his overall state of engagement, although this is very clearly related to and framed by the social and cultural elements of the situation. The man is curious, but his movements are more subtle than those we observed in the previous vignette, in which the boy's physical exploration was very adventurous and energetic. In this second vignette, the man's behaviour is more akin to how we would typically expect adults to behave. The spatial layout of the environment also comes into play when the man waits to take his turn, and makes way for others, which communicates a certain social etiquette; on the other hand, it is also apparent in the way in which he seems simultaneously curious enough to explore the content, yet aware that he should behave like an adult, and therefore restrain his movements. In this case, our understanding of engaged interaction with the installation cannot be decoupled from an awareness of the established practices of public behaviour. In this respect, Aarhus by Light highlights the importance of examining both the space in which an installation will be placed, and the shared social practices of that space.
3.2 Hydroscopes
Our second case derives from our work in the Interactive Experience Environments (IXP) project, aimed at exploring novel interactive installations for museums and science centres. Specifically, we focus on a prototype designed for the Kattegat Centre, which is a marine centre displaying marine life from all over the world. The centre is primarily composed of large aquaria with glass sides that allow visitors to explore the variety of marine life. As part of our research efforts, we designed a prototype installation for the centre, where visitors were invited to construct fish for a virtual ocean. Fish were constructed using a physical construction kit with embedded
Fig. 2. (A) Visitors constructing fish based on a physical construction kit and (B) exploring the virtual ocean using the Hydroscopes
RFID chips. The construction kit contained the heads, bodies, fins, and tails of a variety of existing species of fish. From these pieces, visitors could create imaginary fish that combined qualities of existing species (Fig. 2, A). After visitors created the imaginary fish, they were invited to release their fish into a virtual ocean that was inhabited by the fish created by other visitors. The only way to explore this ocean was by using digital Hydroscopes (Fig. 2, B). The Hydroscopes provided a view into the virtual ocean, which could be explored by pushing the Hydroscopes along the floor. The research methodology for the Hydroscopes consisted of in-situ observations over two periods of four days. Observations were supplemented by continuous video recordings of use situations. Furthermore, contextual interviews with visitors were conducted during the evaluation period. Together, these sources of data provided the material from which to gain a nuanced understanding of people's engagement with the installation. The use of the Hydroscopes at the Kattegat Marine Centre exemplifies the dynamics between several of the elements of engagement introduced in the previous sections. Here, we focus more specifically on two examples that illustrate transformations between individual and social interaction, and between exploration of content and physical engagement.
Individual and Social Interaction
The Hydroscope installation was placed next to a series of large windows that provide a view into one of the larger aquaria of the marine centre. The three images in Figure 3 are snapshots of a sequence in which a user (the girl in the purple sweater) moves between the Hydroscopes installation and the large windows.
Fig. 3. (A) The girl moving between the construction table, (B) the Hydroscopes and (C) the large aquarium
There are several things worth noting, with regard to the elements of engagement described in the previous section. During the first part of the user's interaction (Fig. 3, A), the girl is using the construction table by herself, and goes back and forth between the Hydroscopes and the construction table every time she has constructed a fish. Finding her own fish in the Hydroscopes inspires her to create new fish, which in turn prompts her to explore the digital ocean through the Hydroscopes. After approximately ten minutes, the girl loses interest in the Hydroscopes, walks to the large window of the aquarium, and spends a few moments looking at the fish (Fig. 3,
C). After having watched the aquarium for a few moments, the rest of her family enters the room, and her father begins to construct a fish at the construction table. The girl immediately joins her father, and they collaborate on building a new fish. As her father is apparently using the installation for the first time, on several occasions the girl instructs her father on how the installation works (Fig. 4, A). After a few moments, her attention turns to her mother and her younger brother, who are exploring the digital ocean through one of the Hydroscopes. She walks to the Hydroscopes, and spends time moving one of these around with her brother (Fig. 4, B), before returning to the large aquarium windows.
Fig. 4. (A) The girl creating a fish with her father and (B) using the Hydroscopes with her brother
This example illustrates how the girl's engagement with the installation goes through a transformation from being individual to being social. In particular, it is worth noting that the girl initially seems to lose interest in the installation, once she has created a few fish, and the installation offers no additional depth or intrigue to keep her engaged. When she returns to the installation, her engagement takes a different form, and is driven by her social interaction with her family. In this example, the transformation from individual to social elements prompts a new mode of engagement, where the relationship between the girl and the installation is transformed.
Content and Physical Interaction
In the second example, involving the Hydroscopes, we focus on transformations from being primarily engaged with the content to being engaged with the physical form of the Hydroscope. Figure 5 shows a snapshot from a sequence where a boy is using the Hydroscopes to explore various aspects of the digital ocean. The boy is visiting the marine centre with his family, who are also exploring the installations. The interesting aspect of this sequence is how the boy's engagement continuously fluctuates between a relatively calm and concentrated exploration of the digital fish that he can see in the Hydroscopes (Fig. 5, A), and playful activity where the boy attempts to sit on the Hydroscope, or use it as a stock car to ram into the other Hydroscopes (Fig. 5, B).
Fig. 5. (A) A boy exploring the content of the Hydroscopes and (B) using the Hydroscope as a stock car
During this sequence, the boy uses the Hydroscopes by himself, and moves them back and forth over the entire floor surface. At several points, he moves the Hydroscopes specifically towards his family members. The fluctuations between calmly exploring the content of the Hydroscopes, and the more physical and playful activity seem to rely on several elements. The physical design of the Hydroscope seems important in this sequence; as the boy attempts to spin the Hydroscopes as fast as he can, he uses them as trolleys, pushing them forward while sitting on them, and he uses them as stock cars, ramming them into each other. However, the boy's playful activities also embody a distinctly social element. During his play, he often looks up to see where the rest of his family is, and moves the Hydroscopes towards them. At several points during the sequence, the boy pauses in his playful activities, and spends a few moments looking more closely at the digital ocean and the fish that have emerged as he has moved the Hydroscopes.
3.3 LEGO Table
The LEGO Table is an interactive table, designed to market LEGO Bionicle figures in a retail setting ([27]). The physical design of the LEGO Table comprises an interactive surface, a thirty-five-inch monitor, and four boxes with Bionicle figures (Fig. 6). The content consists of the four Bionicle figures: the two heroes, the large Lewa Nuva and the smaller Tanma, in green, and the two villains, the large Radiak and the smaller Antroz, in red. The digital content is a high-quality animation associated with each of the four figures, each of which can stand, walk, hover, fly, and fight. Moreover, Tanma can connect to the back of Lewa Nuva, and Antroz can connect to the back of Radiak. The interaction works in the following way: when a physical Bionicle figure is placed on the table, a corresponding animated figure appears on the display, and as the figure is moved on the table, the digital figure moves (either flying or walking) in the virtual 3D world. If a red and green figure approach one another, they begin fighting. Moreover, figures of the same colour have matching base profiles (Fig. 7, A), and when physically interlocked, the small figure jumps on the back of the big figure in the virtual world. The interaction is implemented using reacTIVision software [21], together with visual markers on the bases of each of the boxes.
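The paper does not include the table's code; reacTIVision reports the position of each fiducial marker it sees (typically via the TUIO protocol), and the sketch below is only a schematic illustration of the behaviour described above: marker IDs map to figures, a nearby red/green pair triggers fighting, and an interlocked same-colour pair makes the small figure mount the large one. The marker IDs, distance thresholds, and function names are all assumptions.

```python
import math

# Hypothetical mapping from fiducial IDs (on the box bases) to figures.
FIGURES = {
    1: ("Lewa Nuva", "green", "large"),
    2: ("Tanma",     "green", "small"),
    3: ("Radiak",    "red",   "large"),
    4: ("Antroz",    "red",   "small"),
}
FIGHT_DISTANCE = 0.15   # assumed threshold in normalized table coordinates
STACK_DISTANCE = 0.03   # assumed: bases physically interlocked

def update(markers):
    """markers: visible fiducial id -> (x, y) position on the table.
    Returns the animation events the display should show this frame."""
    events = []
    ids = list(markers)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (_, col_a, size_a), (_, col_b, size_b) = FIGURES[a], FIGURES[b]
            dist = math.dist(markers[a], markers[b])
            if col_a != col_b and dist < FIGHT_DISTANCE:
                events.append(f"fight:{FIGURES[a][0]}-vs-{FIGURES[b][0]}")
            elif col_a == col_b and size_a != size_b and dist < STACK_DISTANCE:
                small = FIGURES[a][0] if size_a == "small" else FIGURES[b][0]
                large = FIGURES[b][0] if size_a == "small" else FIGURES[a][0]
                events.append(f"mount:{small}-on-{large}")
    return events

# Example frame: two figures of different colours are close together.
print(update({2: (0.50, 0.50), 4: (0.55, 0.50)}))   # ['fight:Tanma-vs-Antroz']
```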
Fig. 6. The LEGO table
Fig. 7. (A) The matching base profiles and (B) the LEGO Table at the department store
The LEGO Table was in operation for a four-week period at a Danish department store. During this period, the interactive table was located in one of the main corridors, and the fact that the table faced the top of the ascending escalator facilitated easy access to the table (Fig. 7, B). During one busy shopping Saturday, activities around the interaction table were video-recorded. The initial analysis reveals that a total of 124 people were observed interacting with the table, and in the six hours of observation data from the use of the LEGO Table we identified 94 interactions, ranging from six seconds to 25 minutes in length (see [27]). An analysis of the distribution of gender and age shows that a wide variety of people engaged in the interaction, though most were boys less than 16 years of age. The following two situations are from that particular day. The first situation represents a dynamic instance of engaging interaction including aspects of physical tangibility and social relations, whereas the second situation revolves around dynamic aspects of physicality and content.
Physical Tangibility and Social Relations
In one observed instance, two girls approach the LEGO Table, and one of them raises the large red box while watching the virtual world, then interacts with the two red figures in an exploratory, unstructured way, trying to make sense of the installation and of what she can do with it. The emergence of engaged interaction is enabled by the physicality of the table, the tangibility of the boxes, and the content of the installation in terms of the familiar Bionicle universe. Once she starts using the table, she shifts smoothly from being a regular visitor to the store, to a mode in which she plays with
the figures. Initially, she interacts individually, while the second girl watches the interaction, but in a few moments she too joins in, and starts exploring the two green figures on her own. Briefly, the girls move the figures about the table in an unstructured manner, and soon, acting together, begin banging them together playfully, watching the display to see what happens. We can observe a social transition from individually exploring the LEGO Table to playing together, enabled by the two sets of physical boxes but also by the content (Bionicle figures do fight). A moment later the first girl disengages (i.e. leaves), and the other girl remains at the table, investigating its functionality, and various ways of banging the figures together. A boy walks up to the table and starts watching the girl, as does another girl, but their mother comes to pick them up, without their having directly interacted with the table. The second girl manipulates the figures in various ways, at certain times so quickly that they are not fighting in the virtual world. At one point, she arranges the figures systematically on the table and leaves, but approximately two minutes later, the second girl returns to the table, and resumes her individual interaction.
To summarize, engagement and transformation of social relations are enabled by both content and physicality. Initially, the first girl is concerned with the content of the installation, in terms of both the physical boxes and the virtual figures. She immediately grabs the boxes, and starts exploring what she can do with them. Enabled by the physicality of the boxes and shared display, the two girls jointly inquire into the nature of the installation, explore what they can do, and the effects of their actions. The two sets of physical boxes enable the girls to explore the installation, both individually and together. We can observe several forms of social engagement. The first girl starts out on an individual basis, while the second girl watches her, but the second girl joins in, and begins exploring the installation in parallel with the first girl; however, in a few seconds, she switches to interacting with the other girl, by physically banging the boxes together. We also see how the first girl smoothly moves in and out of the situation, as well as the instance in which two other children are briefly spectators of the situation, without being directly engaged in the manipulation of the boxes.
Physicality and Content
In a direct continuation of the situation discussed above, the second girl starts lifting the figures, playfully exploring the packages of the large red and green figures. She then raises all four figures off the table, while looking for, and calling to, her friend, the first girl. The second girl carefully places each figure on the table horizontally, red figures on the right, green figures on the left. The first girl returns to the table and joins her friend, leaning on the table, beginning to explore the packages, and moving the two red figures about. As one more instance of the transition from individual to social engagement, the two girls enter the figures into combat, moving them quickly against one another. The second girl moves, and quickly lifts the figures, before replacing them on the table. The first girl couples and decouples the red figures, and inspects the package and the bottom of the large figure. The two girls engage the figures in combat while turning them around and lifting them.
The first girl moves to the left of the table, and moves the figures from there, before moving back to the front of the table. The second girl bangs the two small figures together, while the first girl watches passively for ten seconds. The large green and red figures are left untouched.
The two girls couple the figures; the first girl moves the red figures, the second the green figures. The second girl decouples the green figures, and tries to couple the small red and small green figure. Next, the girls couple their figures as they had previously. They start to choreograph the figures' movements in various ways, then suddenly decouple them, and start banging the small figures together. The figures in the virtual world float in the foreground, and do not fight. The girls' attention continuously shifts between the physical figures and the display. They resume the combat on the table, moving all the figures around. The second girl lifts the small green figure, and opens its lid. She abruptly removes both the green figures from the table. The first girl follows suit, and removes the red figures, while touching the table with her hand. The second girl also touches the table with her hand. The interaction lasts for about two and a half minutes (this paragraph is an excerpt from [27]). The interaction described above is a complex, dynamic flow of transformations between engagement with physical objects (both the boxes and the table), and play with the boxes. The embodied interaction is enabled by the physicality of the boxes: we can observe moments when one of the girls manipulates a single box, and others when, on her own, she bangs two boxes together, as well as instances when the two girls fight, using the boxes. Sometimes, they seem to focus on the physical boxes, rather than the relationship between the physical boxes and the virtual content. The process moves in and out of modes in which the girls seem to be playing with the boxes, owing to the well-known and easily recognizable Bionicle figures, and modes in which they seem, if not puzzled by the installation, then at least to be trying to determine how it works, for instance, by seeing what happens when they couple and decouple a small and a large figure. Eventually, they shift from focusing on the figures, to inspecting the table itself.
4 Discussion
Our analysis of selected instances and situations of the three cases presented has provided insight into the nature of the dynamics of engaging interaction, in particular, how engagement unfolds as developments in the relationships between the physical, social, cultural, and content-related aspects. In our analyses of the cases, we have employed the concepts of emergence, transformations and relations in order to describe the dynamics of engagement. We have done so to explain how engagement first occurs, how it unfolds and changes over the course of time, and how different elements are connected and affect each other in this process. Comparing the three cases, all of them illustrate how individual elements may at times play particularly prominent roles. Looking first at the physical and spatial aspects, the physical movements through the park in the case of Aarhus by Light, and the location of the LEGO Table, were essential to the emergence of engaging situations. In all three cases, we observed embodied, playful exploration of the installations. Our observations and interviews also show that content plays a role, even when it is not particularly complex. In the case of the LEGO Table, the familiar Bionicle Universe, with its easily recognizable cast of figures, provides an entry point,
enabling children to engage with the installation. In contrast, Aarhus by Light seemed to attract people with the unfamiliarity of the installation. Our example of the use of the Hydroscopes shows how the content, in terms of creating and exploring various kinds of fish, was capable of engaging visitors, but that often, as in our example, transformations occurred where visitors shifted their focus to the physical aspects, or engaged in social activities around the installations. The social dynamics were prominent in all our cases, and it was particularly striking to see how fluidly people moved in and out of various social constellations. The social dynamics seemed to be facilitated by the physical elements, for instance the two sets of Bionicle figures, or the open interaction space in front of Musikhuset. The content of the LEGO table in terms of Bionicle figures also facilitated both individual and social engagement. In the case of the Hydroscopes, we saw how, in one instance, a transformation from individual to social engagement involved a focus on the content of the installation, whereas in another instance, the transformation revolved around the physical qualities of the Hydroscopes. In the case of Aarhus by Light, we saw how understanding territorial issues involved a combination of physical and social aspects of engagement. Looking at the cultural elements, all three installations to various degrees broke with the conventions of the locations in which they were installed: at the department store, the children seemed comfortable breaking with the norm of not playing with the toys in the store, whereas one of the men in the park of Musikhuset seemed slightly reluctant to reveal to others that he was exploring the installation. In terms of exhibitions, the Hydroscopes challenged the idea that a marine centre is usually a place where you observe fish and read about their characteristics. For some visitors, this transformation in mode, from primarily being observers to being creators, met with some reluctance. This was due not only to the design of the installations, but was also evidence of the way institutions embody particular forms of practice; people know what to expect when they go to museums (aquaria), or use public spaces. These expectations are, however, far from unambiguous or stable; in the world of museums, recent years have seen the development and use of various kinds of interactive technologies to support exhibition concepts in which visitors relate to exhibitions through the means of construction and active exploration. Similar developments may be observed in urban areas, where a range of interactive services has become commonplace. Through these developments, the norms and structures of the kinds of activities that are expected are gradually transformed. Apart from providing insights into the dynamic nature of engagement, our cases also illustrate how engagement may evolve through relatively distinct transformations between the social, cultural, and physical elements. In many of our examples, these transformations were sparked by changing social conditions. In the case of the Hydroscopes, not only was the girl's engagement transformed from individual to social when her family entered the room, but this transformation also meant that she re-engaged with the installations in a distinctly social mode.
Where the Hydroscopes exemplified how these transformations may be relatively distinct, the LEGO Table and Aarhus by Light examples show how these transformations may be more fluid, as people's engagement fluctuates among physical, social, and content-related elements. Taken together, our cases demonstrate how we may conceptualize engagement as evolving relationships between physical, social, cultural and content-oriented
elements. Understanding the dynamics of engagement with any given interactive installation entails understanding how these elements are continuously re-shaped and formed into new constellations.
Acknowledgments. This research has been funded by the Danish Council for Strategic Research (Digital Urban Living, grant 09-063245).
References
1. Aarseth, E.: Nonlinearity and Literary Theory. In: Landow, G. (ed.) Hyper/Text/Theory. Johns Hopkins University Press, Baltimore (1994)
2. Benford, S., Giannachi, G., Koleva, B., Rodden, T.: From interaction to trajectories: designing coherent journeys through user experiences. In: Proceedings of CHI, pp. 709–718. ACM, New York (2009)
3. Dalsgaard, P.: Designing for Inquisitive Use. In: Proceedings of DIS 2008, Penn State, Pennsylvania (2008)
4. Dalsgaard, P., Dindler, C.: Peepholes as Means of Engagement in Interaction Design. In: Proceedings of Nordes 2009: the Third Nordic Design Research Conference, Oslo, Norway (2009)
5. Dalsgaard, P., Hansen, L.K.: Performing Perception - Staging Aesthetics of Interaction. ACM Transactions on Computer Human Interaction 15(3) (2008)
6. De Certeau, M.: The practice of everyday life. University of California Press, Berkeley (1984)
7. Dewey, J.: Art as Experience. Perigree, New York (1934)
8. Dourish, P.: Where the Action Is: The Foundations of Embodied Interaction. MIT Press, Cambridge (2001)
9. Dunne, A.: Hertzian tales: electronic products, aesthetic experience and critical design. Royal College of Art Research Publications, London (1999)
10. Edmonds, E., Muller, L., Connell, M.: On creative engagement. Journal of Visual Communication 5(3), 307–322 (2006)
11. Forlizzi, J., Battarbee, K.: Understanding experience in interactive systems. In: DIS 2004: Proceedings of the 5th Conference on Designing Interactive Systems, pp. 261–268. ACM, New York (2004)
12. Fritsch, J.: Understanding affective engagement as a resource in interaction design. In: Proceedings of Nordes 2009: the Third Nordic Design Research Conference, Oslo, Norway (2009)
13. Garfinkel, H.: Studies in ethnomethodology. Polity Press, Cambridge (1967)
14. Gaver, W.W., Bowers, J., Boucher, A., Gellersen, H., Pennington, S., Schmidt, A., Steed, A., Villars, N., Walker, B.: The drift table: designing for ludic engagement. In: Proceedings of CHI 2004, pp. 885–900. ACM Press, New York (2004)
15. Gedenryd, H.: How Designers Work. Lund University Cognitive Studies, Sverige (1998)
16. Genette, G.: Narrative Discourse: An Essay in Method. Cornell University Press, Ithaca (1983)
17. Grudin, J.: Groupware and social dynamics: eight challenges for developers. Communications of the ACM 37(1), 92–105 (1994)
18. Hedegaard, M.: Tænkning, Viden, Udvikling. Aarhus University Publishing, Aarhus (1995)
19. Hindmarsh, J., Heath, C., vom Lehn, D., Cleverly, J.: Creating Assemblies: Aboard the Ghost Ship. In: Proceedings of CSCW, pp. 156–165. ACM, New Orleans (2002)
20. Hutchins, E.: Cognition in the wild. MIT Press, Cambridge (1995)
21. Kaltenbrunner, M., Bencina, R.: ReacTIVision: a computer-vision framework for table-based tangible interaction. In: Proc. Tangible and Embedded Interaction (TEI 2007), pp. 69–74 (2007)
22. Klemmer, S.R., Hartman, B., Takayama, L.: How Bodies Matter: Five Themes for Interaction Design. In: Proceedings of DIS. ACM Press, New York (2006)
23. Manovich, L.: The Language of New Media. MIT Press, Cambridge (2001)
24. Massumi, B.: The Thinking-Feeling of What Happens. In: Massumi, B., Mertins, D., Spuybroek, L., Marres, M., Hubler, C. (eds.) Interact or Die: there is Drama in the Networks, pp. 70–91. NAI Publishers (2007)
25. Mogensen, P.: Towards a provotyping approach in system development. Scandinavian Journal of Information Systems 3, 31–53 (1992)
26. Nardi, B.A. (ed.): Context and consciousness: activity theory and human-computer interaction. MIT Press, Cambridge (1996)
27. Nielsen, R., Fritsch, J., Halskov, K., Brynskov, M.: Out of the Box – Exploring the Richness of Children's Use of an Interactive Table. In: IDC (2009)
28. O'Hara, K., Perry, M., Churchill, E., Russell, D.: Public and Situated Displays: Social and Interactional Aspects of Shared Display Technologies. Kluwer Academic, Dordrecht (2003)
29. Petersen, M.G., Iversen, O.S., Krogh, P.G., Ludvigsen, M.: Aesthetic interaction: a pragmatist's aesthetics of interactive systems. In: Proceedings DIS, pp. 269–276. ACM, New York (2004)
30. Rogers, Y.: Moving on from Weiser's vision of calm computing: Engaging UbiComp experiences. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 404–421. Springer, Heidelberg (2006)
Transferring Human-Human Interaction Studies to HRI Scenarios in Public Space
Astrid Weiss, Nicole Mirnig, Roland Buchner, Florian Förster, and Manfred Tscheligi
ICT&S Center, University of Salzburg, Sigmund-Haffner-Gasse 18, 5020 Salzburg, Austria
{firstname.lastname}@sbg.ac.at
Abstract. This paper presents the contextual analysis of the user requirements for a mobile navigation robot in public space. Three human-human interaction studies were conducted in order to gain a holistic understanding of the public space as interaction context for itinerary requests. All three human-human requirement studies were analyzed with the aim of retrieving guidelines for human-robot interaction. This empirical work should contribute by: (1) providing recommendations for a communication structure from a communication studies perspective, (2) providing recommendations for navigation principles for human-robot interaction in public space from a socio-psychological and an HRI perspective, and (3) providing recommendations regarding (confounding) contextual variables from an HCI perspective.
Keywords: Human-Robot Interaction, Human-Human Interaction, Public Space, User Study, User Requirement Analysis.
it involves prospective co-participants. The work presented in the following was performed with the goal of identifying characteristics and properties relevant for successful human-robot interaction in public environments. Three human-human interaction (HHI) studies were conducted in order to gain a holistic understanding of the public space as interaction context for itinerary requests. All three HHI studies were analyzed with the aim of retrieving guidelines for human-robot interaction. Thus, the data of these studies was analyzed in comparison with an HRI study and relevant HRI literature with the aim to inform the design of human-robot interaction and subsequent user studies. The human-human ways of interaction in public space are assumed to be the gold standard for the most intuitive social interaction in public space and therefore offer the basis for user-centered design in HRI; however, limitations and differences in the case of a robotic interaction partner need to be considered.
2 Motivation and Related Work
Human-robot interaction in public space is mainly short-term interaction between the user and the robotic system. Relevant aspects to improve this interaction for a proactive robotic system that asks for the way are (1) communication, (2) spatial arrangement, and (3) contextual/situational influence factors.
2.1 HRI and Communication
Regarding (natural language) communication in HRI, previous research has mainly focused on analyzing speech acts and turns [4] as well as miscommunication as such [5]. With the research presented in this paper, we try to tackle the problem of the restricted speech capabilities of a robot from a different angle, by retrieving those factors that influence the successfulness of a dialog. Therefore, both human-human and human-robot dialogs were analyzed to develop a set of influencing factors that were rated according to the frequency of their occurrence and correlated with the successfulness of the dialogs. The resulting communication structure, including guidelines, should provide a helpful means for future research in HRI regarding short-term interaction in public space.
2.2 Spatial Arrangement and HRI
Natural interaction in HRI is typically regarded as the actual interaction of input and output [6]. From a socio-psychological view, however, the interaction between humans starts at a much earlier point, including how people select interaction partners and how humans recognize and approach each other, to finally establish an interaction space [7]. Aiming at natural interaction between humans and robots, this pre-beginning phase of an interaction has to be taken into account, especially for encounters between unknown entities in public space. Research on the navigation of robots is still strongly oriented towards the design and study of delivery tasks or movement through domestic settings. Hereby, people are treated as dynamic obstacles, resulting in robots that avoid collision and keep certain distances to avoid unpleasant feelings for humans [8]. Possible ways to select a person for starting an
interaction were investigated, as well as how to subsequently approach the person in public space. The results are presented in several guidelines on navigation for natural human-robot interaction in public space.
2.3 Context and HRI
Research on contextual HRI has so far mainly focused on situatedness [9] in terms of the setting in which user studies are conducted, whereby field trials are assumed to provide more insights into natural reactions to robotic agents than lab-based trials [10]. Within the IURO project, contextual information before the actual interaction between the user and the robot is taken into account, by informing the development process of the robotic system and its interaction model. By means of analyzing videos from human-human studies and a human-robot interaction study, we tried to identify relevant context factors (in an HCI understanding of contextual/ubiquitous computing [11]) and to set up a context model for situated HRI in public space. This context model should help to inform the development of the IURO robot and the setup of the user studies in later stages of the project.
2.4 Bridging HRI in Public Space by Understanding HHI
Humans are remarkably good at coordinating their actions with each other to achieve outcomes that are difficult or even impossible to achieve alone, such as coordinating trajectories in public space and describing routes to unknown places. Such joint actions require coordination at multiple levels. Individuals must not only agree on a plan of action beforehand [12], but they must also continuously adjust their actions to one another to optimize time and space coordination ([13]; [14]). Understanding the mechanisms underlying human-human joint action has become a major goal in cognitive science in recent years [15]. Previous studies have focused on the role of language as a coordination device (e.g. [16]), on the role of shared representations (e.g. [17]), and on the importance of grounding (e.g. [18]). The results of these studies provide important implications, but also constraints, for the development of interactive and pro-active robots that perform tasks together with humans as naturally as possible. Moreover, evidence can be found in the literature that humans tend to respond socially to interactive systems [19] and that humans read social behavior patterns into animated objects [20]. The conclusions that can be drawn from all these prior findings in various domains and research approaches for the design of a pro-active robot interacting with humans in an unstructured environment are rather limited. We therefore consider an ethnomethodologically-oriented approach to identify human-human interaction patterns, which can serve as a bridging component in human-robot interaction. Thus, the three HHI studies, which were executed in the requirement-gathering phase of the IURO project, aimed to systematically analyze and link HHI data with the results of previous HRI studies in order to meet the requirements of both parties: the human as responder and the robot as asker. Thereby, guidelines should be developed on how to design the interaction with the IURO robot in terms of: (1) communication, (2) spatial arrangements, and (3) contextual/situational influence.
3 Exploring Itinerary Requests in Public Space
In order to investigate the importance of HHI patterns for HRI and thereby derive design guidelines for the IURO robot, three HHI studies were conducted: Study 1 – "Itinerary Requests", Study 2 – "Pedestrian Selection", and Study 3 – "Interaction Reasons". Additionally, footage from a field trial with the Autonomous City Explorer (ACE) robot was analyzed to bridge the gap between differences in HHI and HRI. The HHI studies were based on a scenario in which the IURO robot is sent to a pharmacy to buy medicine and deliver it to a patient. It is assumed that the IURO robot was instructed to buy the medicine at the "Old Pharmacy", which is located at "Old Market No. 6" in the old town of Salzburg. Figure 1 shows the location where the studies were conducted. Study 1 "Itinerary Requests" started in the street marked with "A". Study 2 "Pedestrian Selection" and Study 3 "Interaction Reasons" had the additional starting point marked with "B", as this spot is more densely crowded. As Figure 1 shows, there are several alternative routes to get to the pharmacy. To bridge the gap between HHI and HRI, we wanted to compare our results with data gathered in an HRI field trial (conducted in September 2008 in the city center of Munich, Germany) with the Autonomous City Explorer robot (subsequently called the "ACE" study). ACE is a robot that was developed in a nationally funded pilot study of the IURO project (see Figure 2). The ACE robot interacted via gesture and touch-screen input and speech and image output, whereas the interaction was controlled by a finite state machine. The interaction started with the robot greeting a pedestrian and making the itinerary request. The robot then asked the pedestrian to point in the direction of the designated goal in order to establish a reference point for the itinerary request. Then pedestrians could indicate further directions via buttons on the touch screen. During the interaction, ACE builds a topological route graph from the presented route information. At the end, the robot thanked the pedestrian and followed the route graph (see [21] for further details). On average the interaction lasted 63 seconds (SD: 34.96). The goal for the IURO robot is that speech and gesture are used as input and output.
Fig. 1. The HHI Study Setting
Fig. 2. The “ACE” Study
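To make the described interaction flow more concrete, the following minimal Python sketch shows how a finite state machine of this kind could drive an itinerary request and accumulate a topological route graph. It is purely illustrative: the state names, inputs and the RouteGraph class are our own assumptions for exposition and do not reflect the actual ACE or IURO software.

# Illustrative sketch only; not the actual ACE/IURO implementation.
class RouteGraph:
    """Topological route graph built step by step from pedestrian instructions."""
    def __init__(self):
        self.edges = []          # list of (from_node, to_node, instruction)
        self.current = "start"

    def add_step(self, instruction):
        nxt = "node_%d" % (len(self.edges) + 1)
        self.edges.append((self.current, nxt, instruction))
        self.current = nxt


def itinerary_request_fsm(pedestrian_inputs):
    """Run one greet/ask/collect/thank interaction over a sequence of inputs."""
    graph = RouteGraph()
    state = "GREET"
    inputs = iter(pedestrian_inputs)
    while state != "DONE":
        if state == "GREET":
            print("Robot: Hello! Could you tell me the way to the Old Pharmacy?")
            state = "ASK_POINTING"
        elif state == "ASK_POINTING":
            # The pointing gesture establishes the reference direction.
            graph.add_step(("reference_direction", next(inputs, "unknown")))
            state = "COLLECT_DIRECTIONS"
        elif state == "COLLECT_DIRECTIONS":
            step = next(inputs, None)        # e.g. a touch-screen button press
            if step is None or step == "done":
                state = "THANK"
            else:
                graph.add_step(("route_step", step))
        elif state == "THANK":
            print("Robot: Thank you very much!")
            state = "DONE"
    return graph


# Example: the pedestrian points north-east and then gives two route steps.
route = itinerary_request_fsm(["north-east", "straight_100m", "left_at_fountain", "done"])
print(route.edges)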
The procedure for gathering data on interaction and communication in public space was based on "ecological", "semi-experimental" encounters, initiated by a researcher or a recruited participant, in all three studies (all studies were conducted in German, which will also be the only language with which the speech processing will work in that specific project). This methodology can be found in various comparable studies (e.g. [21], [7]); for a discussion of the difference between "ecologically provoked data" and naturalistic data, see [23]. The data corpus of all three studies is constituted by social encounters in public space in the old city centre of Salzburg, Austria (pedestrian area). The encounters were video-recorded (with informed consent allowing the usage of all collected data) and all studies were supplemented by questionnaires and/or interviews. The source material (videos and transcripts) of the "Itinerary Requests", the "Pedestrian Selection", and the "ACE" study was coded in NVivo (www.qsrinternational.com) by two independent coders. The predefined coding schemes are explained in more detail in the corresponding sections and can be found in [24]. The duration of the footage was about 1 hour for the "ACE" study, 16 minutes for the "Pedestrian Selection" study, and about 4 hours for the "Itinerary Requests" study. Cohen's Kappa was calculated for the intercoder reliability, and all factors with a value above 0.40 were regarded as having sufficient agreement and were thus used for further analyses [25]. Details on the data analysis and interpretation for the specific areas of interest can be found in the following sections. As the ACE recordings did not offer enough material to explore spatial arrangements, we tried to bridge this gap by drawing on HRI literature.
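As an aside, the Cohen's Kappa check with the 0.40 cut-off can be computed as in the following short Python sketch; the codings shown are invented for illustration and are not data from the studies.

from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders who labelled the same items."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

# Hypothetical codings of one factor over 10 dialog segments (1 = present).
coder_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
kappa = cohens_kappa(coder_a, coder_b)
print("kappa = %.2f, sufficient agreement: %s" % (kappa, kappa > 0.40))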
4 Study 1: "Itinerary Requests"
In order to find out more about interpersonal communication in short-term interaction in public space, a participatory observational study was conducted to determine the influencing factors that are crucial for the successfulness of a conversation. These influencing factors were divided into two groups: (1) communication factors – all factors that are inherent in the verbal utterances of the dialog, and (2) context factors – all factors that relate to the user or the immediate surroundings of where the dialog takes place. The main focus of the study setting lay in the influencing communication factors. To guide our research on the communication factors, the following main research question was addressed:
RQ1.1: What are the important influencing communication factors for successful human-robot conversation in the context of asking for directions in public space?
From a total of 106 pedestrians who were asked during the study, 58 people could correctly describe the way to the "Old Pharmacy", 3 gave wrong directions (2 of which directed the researcher to a different pharmacy), 21 people did not know the directions to the "Old Pharmacy", and 18 people did not speak the right language. 26 people did not interact with the researchers at all; the remaining 80 people interacted at a mean interaction time of 19.74 seconds. Those participants who gave correct
directions interacted with the researchers at an average of 18.02 seconds, whereas the interaction with the participants who gave wrong directions lasted for 22.06 seconds on average (however, the difference is not statistically significant). We could also observe that 2 people who did not know the pharmacy tried to explain the way by using a city map or a tourist guide. Furthermore, 2 people were willing to guide the interviewer to the pharmacy. In the end, 41 of the 106 dialogs conformed to our specifications and were included in the following analyses. The results of this study were incorporated in the development of the communication guidelines (see Section 7.1) and the context influence factors (see Section 7.3). More details on the results can be found in the corresponding sections.
5 Study 2: "Pedestrian Selection"
For humans it is quite common to ask an unknown person for information in public space. However, if there are plenty of people around (as in a crowded city centre), how does one decide which person to approach? On which criteria does the person seeking information base the selection of a passer-by? The study aimed at revealing the decisive factors that account for which pedestrian is selected to ask for the way. Although not all preliminary factors derived from this study can be directly transferred into an algorithm for the robot architecture, they will inform the interaction model of how the IURO robot could choose and approach pedestrians. As a second focus of interest, the preliminary study was set up to investigate how the chosen people are approached in a crowded area. To support the design of navigation and interaction patterns of the IURO robot, the study investigated the following two research questions for human-human interaction:
RQ2.1: According to which criteria do people select passers-by to ask for directions?
RQ2.2: How is a selected person approached?
The study was conducted from June 10th to 18th, 2010. In total, 20 recruited participants (10 male and 10 female) randomly asked pedestrians for the way to the "Old Pharmacy". The 20 recruited participants needed to approach 47 pedestrians in total to successfully reach the "Old Pharmacy", with a minimum of 1 and a maximum of 5 passers-by approached. A total of 2 out of 20 participants did not succeed in finding the requested spot. In both cases the task was stopped after 10 minutes. To determine the criteria and characteristics people use to identify and select passers-by, interviews were conducted immediately after the destination had been found. All interviews were transcribed and annotated with an ethnomethodological understanding; subsequently the annotations were grouped into categories. However, this quantification of results should not be considered as a one-to-one functional representation, but as a trend or tendency. In a first step, reasons for approaching a person to ask for directions were extracted. The answers and explanations of the study participants were divided into criteria for selection and exclusion as well as categorized as procedures to select a person (an overview of all 11 selection and exclusion criteria can be found in [24]). Interestingly,
the participants more often named persons they actually excluded as a potential source of information (n=20), with tourists being the most frequently cited group, than criteria for selected persons (n=10). The reason for this is probably that the old town of Salzburg is crowded with tourists, accounting for most of the passers-by. As the most frequent reason to choose a person to ask for the itinerary, the participants named "ask a salesman of a shop". They stated that a salesperson is usually familiar with the surroundings of the shop and very often lives in the town. As the second most frequent reason/procedure, people named "find someone who is not a tourist", which corresponds with the exclusion reasons. Related to this is "find a local" as an approach. Other approaches were to randomly choose a passer-by, talk to the next person available, and approach a single person. In a second step, all characteristics of the selected passers-by as named by the study participants were analysed and categorized. In total, 16 characteristics were identified plus a category for exclusion criteria (see [24] for details). The most frequently mentioned characteristic described by the study participants was movement (n=10), followed by behaviour (n=8) and dress style (n=7). 9 characteristics that led to the exclusion of people were identified. Again the most frequently mentioned characteristic was movement (n=7). Three characteristics referred to objects that people carried with them and which led to an exclusion: a camera, suitcases, and a hat. All interaction recordings were then analyzed with respect to the direction from which the participant approached the passer-by. Four directions were classified according to the empirical findings: from behind, lateral, diagonal frontal, and frontal (see Figure 3). 17 of the approached pedestrians were moving towards the study participant and 16 were standing (most of them were salesmen or waiters in front of their stores or restaurants).
Fig. 3. Categorized approach directions
Diagonally moving pedestrians and sitting participants were only approached three times each, and a person walking in the same direction as the study participant was only approached once. If one considers the movement of the approached person and the direction of approach, several patterns emerge. Passers-by moving towards a participant were most of the time approached from diagonal frontal; only one participant approached in a frontal manner. Standing passers-by, however, were approached from all four directions; no clear preference could be observed. Finally, we observed how the study participants re-directed their trajectories in order to start an interaction with a passer-by. Across all the recorded interactions we could identify 3 different types:
• Crossing trajectory: The participant approached the person in a way that both projected trajectories crossed.
• Directly towards the person: The projected trajectory of the participant directly aimed at the selected passer-by.
• Trajectories not crossing: The participant approached in such a manner that his/her projected trajectory did not interfere with that of the passer-by.
Directly walking towards the person could mainly be observed for passers-by who were either standing or sitting or who held a role such as waiter or salesperson. However, some participants also approached passers-by that were moving towards them in a direct way. On the other hand, crossing trajectories as well as trajectories not crossing could almost only be observed when people were walking towards the participant and the approach took place from diagonal frontal.
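As an illustration of how the four approach directions described above could be operationalized, the following Python sketch classifies an approach by the angle between the pedestrian's heading and the direction from which the asker arrives. The angle bands are our own assumptions, not values reported in the study.

def approach_direction(pedestrian_heading_deg, asker_to_pedestrian_bearing_deg):
    """Classify from which side an asker approaches a pedestrian (sketch)."""
    # The asker is seen arriving from the direction opposite to its bearing.
    seen_from = (asker_to_pedestrian_bearing_deg + 180) % 360
    # Smallest angular difference to the pedestrian's facing direction.
    diff = abs((seen_from - pedestrian_heading_deg + 180) % 360 - 180)
    if diff <= 30:
        return "frontal"
    if diff <= 75:
        return "diagonal frontal"
    if diff <= 120:
        return "lateral"
    return "from behind"

# Pedestrian walks due north (0 degrees); the asker closes in from the north-east.
print(approach_direction(0, 225))   # -> "diagonal frontal"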
6 Study 3: "Interaction Reasons"
Stopping another person in a public space to ask for something is the so-called "opening sequence", which takes place before the actual start of a social interaction [7]. The mutual spatial arrangement happens in this phase, and it concludes with the transition from moving to standing. In this moment the so-called interpellation happens: an unfocused pedestrian turns into a focused "would-be-imminent-co-participant" [26]. To support and maintain the mutual orientation, people tend to give reasons for the encounter with an unknown person as quickly as possible [27]. However, it is still an open question to what degree the moment of mentioning the interaction reason influences the itinerary request situation. Thus, the following two main research questions guided this study.
RQ3.1: Does the timing of the reason mentioning influence the interaction?
RQ3.2: Can additional characteristics for pedestrian selection be identified?
The study was conducted in situ, in the old town of Salzburg, from June 17th to 19th, 2010. It was based on social encounters initiated by the researcher. A recruited student (female, 22 years old) took the initiative and asked for directions. She was confronted with the task of selecting the future addressee and organizing the entry of the co-participant in the interaction in accordance with two different conditions: condition A – subsequent reason, condition B – immediate reason. In condition A, the reason for the encounter was only communicated once an interaction space had been established (i.e. the co-participant had stopped walking and mutual gaze was established). In condition B, the reason for the encounter was communicated in the pre-beginning, before a common interaction space had been established. In total, the recruited student had to initiate 42 conversations to achieve 10 valid questionnaires. Half of the participants were male and the other half female; their age ranged from 16 to 79 years. The first verbal reaction of the co-participant towards the itinerary request of the recruited student was a confirming feedback statement "yes" (n=4) or a feedback statement combined with a politeness key word "yes, please" (n=4); the other two responses were "hello". None of these responses were introduced by an "euh" to gain thinking time. To achieve the corpus, 30 trials were necessary in condition A and only 11 in condition B. However, the retained itinerary descriptions were correct only twice in each condition. In condition A, two co-participants did not know where the
"Old Pharmacy" is located and one co-participant gave a wrong direction. In condition B, a wrong direction was given three times. The study showed that the timing of the reason mentioning influences the interaction. Condition B was the more appealing one for the co-participants. On a retrospective level it was clear to all co-participants in condition B that the reason for stopping them was an itinerary request. In condition A, two co-participants guessed "needing help" and "searching for something" as the reason. Only in condition B did one co-participant notice that the recruited student was going to need something before the conversation started, mentioning the "inquiring gaze" as the indicator. All co-participants of condition B retrospectively had the feeling that they were a conversation partner directly after the introduction words "Euh, excuse me". This indicates that the follow-up sentence "I need your help" (in condition A) did not foster the transition to a co-participant more than the immediate mentioning of the reason while walking. In condition B, all co-participants mentioned "stop walking" as the starting point for the transition. In condition A, this was only mentioned by two participants. In condition B, all co-participants chose almost all of the response options for how they tried to demonstrate their willingness to support the recruited student in her itinerary request: (1) stopping, (2) approaching, (3) answering, (4) smiling, (5) eye contact, (6) turning towards the asker. In condition A, only answers 1 and 3 were chosen, which indicates that the immediate reason mentioning in condition B supports the willingness of passers-by to become co-participants. Interestingly, the reason co-participants gave for why they wanted to help was the same in both conditions: the co-participants wanted to be polite. But on what grounds did the recruited student select the pedestrians? Regarding the degree of difficulty for the recruited participant, the reflection form revealed that it was easy to select someone if there were many people, moderately difficult if there were only a few people, and difficult if there were many tourists frequenting the public space. Regarding the reasons why a specific pedestrian was chosen as an interaction partner by the recruited student, it emerged that she frequently changed her selection strategies during the study and tried to ask younger people, older people, males, females, etc. Three main cues on which the recruited student mainly focused for the pedestrian selection could be identified: (1) pedestrians who looked friendly, (2) pedestrians who walked in a goal-oriented manner, and (3) items signalling that the person is a local.
7 Findings for HRI in Public Space
In order to develop guidelines for HRI in public space, we linked the HHI data described in the previous sections with data of the "ACE" study and related HRI literature, with the aim to consider both parties during the itinerary request, namely the human and the robot. This data analysis leads to (1) a communication structure, (2) principles on spatial arrangements, and (3) a context model for HRI in public space with the IURO robot, as described in the following.
7.1 A Communication Structure for HRI in Public Space
For the development of a communication structure for itinerary requests in public space, we first analyzed the dialogues of Study 1 "Itinerary Requests" and subsequently the data
of the "ACE" study. The results from both data sets taken together should help to arrive at a more profound communication structure that takes into account both interaction partners – the human and the robot. The following procedure was chosen to analyze the data: First, a coding scheme including the relevant factors that possibly influence the conversational process was developed. Second, two coders coded both data sets independently according to the coding scheme. Third, the results of the coding were analyzed and summarized in terms of a communication structure including guidelines for setting up successful human-robot itinerary requests. All circumstances that possibly influence the course of the conversation and that become manifest mainly in the verbal utterances are considered influencing factors for successful communication. The coding scheme was set up as a hierarchy, and the coded data was further analyzed with respect to the factors gender of the researcher, gender of the pedestrian, and politeness (to distinguish polite from impolite utterances, we identified three attributes: (1) salutation ("entschuldigung" – excuse me), (2) request ("bitte" – please), and (3) subjunctive ("könnten Sie" – could you)), as well as some basic statistical attributes. The guiding communication model underlying the communication structure presented in this section is the model of Shannon and Weaver [28], as the model describes the communication process from a technical point of view (not considering semantics), including the concept of the noise source as a reason for unsuccessful communication. Starting from the noise source, a communication structure for successful human-robot interaction including guidelines for future research design was established (also not considering semantics as such, but communication as a process of sending and receiving information). All retrieved factors from the HHI dialogs and the HRI footage were correlated with the attribute "successfulness", in order to identify which factors help to ask for directions effectively and which factors hinder the positive outcome of the conversation.
• "Feedback" was the most frequently occurring positive influencing factor for the successfulness of a dialog coded in the human-human dialogs.
• The factors "coherence" and "incoherence" tie in with "feedback" as, according to the coding scheme, they indicate instances where an utterance either matches or does not match the immediately following action – which is a form of feedback. So, if an instance is coded as "incoherent", it can be said that there was either no feedback, not sufficient feedback, no matching feedback, or feedback that was not timed well enough.
• "Fun" and "insecurity" might help to get a pedestrian to stop and help; "fun" during an interaction could help to enhance the tolerance of the pedestrian towards the robot (e.g. regarding non- or merely partial conformity to human conventions concerning conversation), whereas the absence of fun, curiosity, and tension might lead to an early abortion of the conversation.
• Providing the pedestrian with a map might also help to retrieve the desired directions (however, this approach is clearly not part of the IURO project, as it contradicts the basic assumption that the IURO robot has no map knowledge).
• People tend to unconsciously treat the autonomously navigating ACE robot like another human being and thus expect their conversational partner to act according to human-human communication conventions.
• Even though the condition "politeness" did not result in significant differences regarding the dialogs' success rate, it will not hurt to make the robot ask politely.
Concerning the directions coded in the human-human study, the following conclusions can be drawn on what a robot should be able to do from a user-centered perspective. A robot should:
• be able to process verbal directions in terms of what direction to go to, which are potentially completed with a gesture indicating the direction,
• be able to recognize landmarks (identify a traffic light, a church, a fountain, etc.),
• be able to process explicit distances ("100 meters") and implicit distances ("for a short way"),
• be able to interpret confidence in a route description by means of probability weightings of route descriptions.
Summarizing the conclusions on the factors and the robot's abilities concerning directions, the following communication structure for successful short-term human-robot conversation in public space can be established (see Figure 4): In itinerary requests in public space, the human will use verbal directions, gestures, reference points, context information and distance declarations as input modalities. The robot should provide timed feedback in a coherent manner, and offer a city map as output modality in case verbal communication is not successful. Moreover, if the robot expresses its neediness and the interaction is enjoyable for the pedestrian, it is more likely to be successful.
Fig. 4. Communication structure
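The correlation of coded factors with the attribute "successfulness" described above can be illustrated with a phi coefficient between a binary factor and binary dialog success, as in the Python sketch below; the codings are invented and only demonstrate the kind of computation involved, not the actual analysis pipeline used in the studies.

import math

def phi_coefficient(factor, success):
    """Phi coefficient between a binary coded factor and binary dialog success."""
    n11 = sum(1 for f, s in zip(factor, success) if f and s)
    n10 = sum(1 for f, s in zip(factor, success) if f and not s)
    n01 = sum(1 for f, s in zip(factor, success) if not f and s)
    n00 = sum(1 for f, s in zip(factor, success) if not f and not s)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical codings for 8 dialogs: was "feedback" present, was the dialog successful?
feedback = [1, 1, 1, 0, 1, 0, 0, 1]
success  = [1, 1, 1, 0, 1, 0, 1, 1]
print("phi(feedback, success) = %.2f" % phi_coefficient(feedback, success))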
7.2 Principles on Spatial Arrangements for HRI in Public Space
Most state-of-the-art HRI studies on navigation are conducted in a context in which the user gets familiar with the robot, e.g. a domestic setting [29]. In such a context the user can adapt to the moving patterns of the robotic platform. In public space, however, users will very often be novices, not having had any contact with the IURO robot before. This has to be taken into account when designing the robot's navigation. Usually, the navigation of robots is designed and studied for missions of delivery or
simple movement, in which people are treated as "dynamic obstacles" [6]. However, if navigation has the purpose of human-robot interaction, different constraints and requirements have to be taken into consideration, namely the initiation of the interaction, the spatial distance, and the coordination of trajectories. In general, we can distinguish between three types of "robot navigation": (1) autonomous navigation (moving from A to B), (2) navigation with the aim to start an interaction with a human (approaching a human), and (3) navigation during the interaction (adaptation of the interactional space). The principles on spatial arrangements derived from Study 2 "Pedestrian Selection" and Study 3 "Interaction Reasons" mainly address the second and partly the third type of robot navigation (a list of all principles can be found in [24]). In the following we present the concrete recommendations for the IURO robot, derived by combining the results from both studies with findings from the related HRI literature.
• It will be relevant that the IURO robot is equipped with "eyes" (these "eyes" do not need to be the actual cameras of the robot, but need to be cues that are interpreted as eyes by the user) and that these eyes reflect and follow the gaze of the pedestrian.
• It will be relevant that the IURO robot approaches pedestrians laterally or diagonally to initiate an interaction. This approach direction will also support pedestrians in perceiving whether the robot is (1) heading from A to B, (2) listening to a pedestrian, or (3) noticeably seeking help ("needy looking", e.g. when it gets stuck).
• It will be relevant that the IURO robot keeps the ideal spatial distance, whereby a distinction needs to be made between (1) the approaching distance (between 0.45 and 1.2 meters) and (2) the passing distance (assumed to be less than 1.2 meters); a parameter sketch follows this list. Further user studies will be necessary to investigate this aspect.
• It will be relevant that the IURO robot navigates at different speeds: lower for passing people and faster for approaching/selecting people and goal navigation (heading from A to B). This will also foster pedestrians' perception of whether the IURO robot needs help or is just navigating. Further user studies will be necessary to investigate this aspect.
• It will be helpful that the IURO robot maintains its position-alignment towards the pedestrian, so that the user has the feeling that the IURO robot has the same reference point. In other words, it will be helpful that the IURO robot keeps the reference point towards the user, so that they orient in the same direction during the interaction. Hereby the robot does not need to move its head or torso, but just move in the same direction as the user, if required.
• It will be helpful that the IURO robot selects pedestrians depending on their walking speed. As the HHI studies showed, pedestrians who walk "faster than average" or at "average speed" are locals who know the environment and can give valid route descriptions. Slowly walking people are searching for the route themselves or are non-locals who are sauntering through the streets. Thus, the IURO robot should select people walking at "average speed and faster". Further user studies will be necessary to verify these speeds.
• It will be helpful to foster the moment of transition from an unknown pedestrian to an IURO interaction partner in the dialogue by immediately stating the reason for stopping in the itinerary request, namely "I am searching for xy".
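The distance and speed recommendations above can be summarized in a small behaviour-selection sketch in Python. The distance band comes from the recommendations; the speed values and the walking-speed threshold are placeholders that would have to be fixed in the further user studies mentioned above, and the policy itself is only illustrative.

from dataclasses import dataclass

@dataclass
class Pedestrian:
    distance_m: float          # current distance between robot and pedestrian
    walking_speed_mps: float   # estimated walking speed of the pedestrian

APPROACH_DISTANCE_M = (0.45, 1.2)   # stop inside this band when asking
APPROACH_SPEED_MPS = 1.0            # placeholder for the "faster" approach speed
PASSING_SPEED_MPS = 0.5             # placeholder for the "lower" passing speed
MIN_TARGET_WALKING_SPEED = 1.2      # placeholder for "average speed and faster"

def select_and_approach(p: Pedestrian):
    """Rough policy sketch: whom to ask, how fast to move, when to stop and ask."""
    if p.walking_speed_mps < MIN_TARGET_WALKING_SPEED:
        # Slow walkers are likely non-locals sauntering; just pass them slowly.
        return ("pass", PASSING_SPEED_MPS)
    if p.distance_m > APPROACH_DISTANCE_M[1]:
        # Close in laterally or diagonally at the higher approach speed.
        return ("approach_diagonally", APPROACH_SPEED_MPS)
    # Inside the approaching band: stop and immediately state the reason.
    return ("stop_and_ask", 0.0)

print(select_and_approach(Pedestrian(distance_m=3.0, walking_speed_mps=1.4)))
print(select_and_approach(Pedestrian(distance_m=1.0, walking_speed_mps=1.4)))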
7.3 A Context Model for Situated HRI in Public Space
In order to get information on influencing context factors, the video recordings of three studies were annotated by two independent coders: the "Itinerary Requests" study, the "Pedestrian Selection" study, and the "ACE" study. Before setting up a coding scheme, a model of the human-robot context was developed. Hereby, three entities were considered to be essential for a basic context model:
Robot context factors: Factors which directly affect the robot when interacting with a user or trying to find an interaction partner, e.g. if the user is not able to recognize the content on the screen of the robot because of bright sunlight. This means that the factor sun negatively influences the interaction between the user and the robot.
Human context factors: Factors which affect the human during the interaction, e.g. other people watching the interaction, the user being a foreigner, etc.
Environment context factors: Factors which affect the whole interaction between the human and the robot, e.g. cobblestone, narrow space.
Fig. 5. Context factor models for the HHI studies
For the identification of the relevant contextual influence factors for HRI in public space, two coding schemes were developed, one for the videos of the "ACE" study and one for the two human-human studies ("Pedestrian Selection" and "Itinerary Requests"), which both consisted of the same four main categories: action sequence, interaction partner, passer-by context factor, and environment context factor. The passer-by context factor tier should depict which factors appeared during the interaction with the robot. Factors influencing the interaction on the robot's side could also be found in the environment context factors. Figure 5 depicts the context models derived for the "Itinerary Requests" study and the "Pedestrian Selection" study. The models were developed based on the video annotations of the two independent coders; only codings with a Cohen's Kappa value greater than or equal to 0.4 were taken into account. The numbers next to the arrows visualize the frequency of annotations.
The second model demonstrates the interdependency between user context factors and environment context factors (the intercoder reliabilities were unfortunately too low to see the same effect in the first model). Taking the results from the two human-human studies together, the following conclusions can be drawn concerning context factors which are relevant for human-robot interaction in public space.
User context factors:
• Local: The successfulness of the interaction is highly dependent on whether the asked person is a local or not. However, non-locals were also often willing to give advice (e.g. by using their city maps), but their advice was often incorrect, as the location of the "Old Pharmacy" is local knowledge and not included in official tourist guides. Thus, it is important to identify possible locals as interaction partners and to weight the information from a local higher than that of a tourist.
• In a group: There is not much difference in the success rate of the itinerary descriptions provided by pedestrians in a group compared to pedestrians who are on their own. The IURO robot does not need to distinguish between pedestrians walking alone or in groups when it is asking for the way.
• With an object: Whether a pedestrian keeps an object with him/herself during the interaction, e.g. a bicycle, a dog or a shopping bag, does not influence the success rate of the interaction. However, it has to be kept in mind that pedestrians can be limited in their interaction possibilities due to the object.
• Shortage of time and time-pressure: These aspects were only explored in the "Itinerary Requests" study by means of a questionnaire, but both factors turned out to be influential. Pedestrians who subjectively perceived time-pressure were less willing to help than people who had an actual shortage of time. However, if the time was too short, the pedestrians did not even stop for a request.
Environment context factors:
• Shadow: Whether the itinerary request happened in bright sunlight or in a shadowed area influenced the interaction. It could be noticed several times that pedestrians jointly moved to a shadowed area to give advice. Moreover, in the ACE study people often had to cover the robot's display with their hands to improve the poor readability caused by the sunlight.
• Passing people: In situations in which many other people walked past, the interaction between the questioner and the respondent was more often successful. This could be due to the fact that "public pressure" forces people to be more supportive during an itinerary request. Itinerary requests more often led to a successful communication if a pedestrian was selected in an environment where other passers-by were present.
• Store: People tried to get information from staff members in local stores in order to enhance the chance of getting the correct directions. These staff members were standing in front of the entrance of the store (e.g. a waiter in front of the restaurant entrance) or were owners of outdoor stores (e.g. ice cream vendors). Thus, it is recommended that the IURO robot, which is able to pass low doorsills, prefers asking staff members or storeowners for the way instead of passers-by.
Robot context factors: Figure 7 visualizes a context model for robot context factors that were derived from the "ACE" study. Again, the numbers next to the arrows show the coding frequencies, and only annotations with an intercoder reliability of Cohen's Kappa greater than or equal to 0.4 were taken into account for the model.
• Moving/not moving: More pedestrians were willing to interact with the robot when it was moving, e.g. approaching people, than when it was standing still. Moreover, the success rate was higher when pedestrians interacted with the robot after it had proactively approached them. The longest periods of no interaction occurred when the robot was stuck somewhere and did not move at all. It is advisable that the robot keeps moving somehow in such a situation to express its "neediness".
• Observing/not observing: Pedestrians paid more attention to the robot while it was moving. This implies that it is helpful to make the robot move slightly during the interaction with the user to increase the human's attention.
• User: Men were more often willing to interact with the robot in public space. Moreover, their interactions took longer (interestingly, in the human-human studies there was no difference in the length of the itinerary dialogues between men and women). Children were also willing to interact with the robot.
Fig. 7. Context factor model “ACE” Study
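A minimal way to represent the three-entity context model in software is a nested mapping from entities to factors and annotation counts, as in the Python sketch below; the counts are invented for illustration and do not reproduce the frequencies from the figures.

# Illustrative representation of the context model; factor names follow the text,
# the counts are made up for the example.
context_model = {
    "user": {"local": 12, "in_a_group": 5, "with_an_object": 4, "time_pressure": 3},
    "environment": {"shadow": 6, "passing_people": 9, "store": 7},
    "robot": {"moving": 10, "observed_by_pedestrians": 8},
}

def frequent_factors(model, min_count):
    """Return (entity, factor) pairs annotated at least min_count times."""
    return [(entity, factor)
            for entity, factors in model.items()
            for factor, count in factors.items()
            if count >= min_count]

print(frequent_factors(context_model, 7))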
8 Conclusion
In this paper, we presented the systematic analysis of data derived from three human-human observational studies and one human-robot interaction field trial in order to inform the interaction design of a proactive navigation robot in terms of (1) its communication structure, (2) its spatial arrangement principles, and (3) the contextual influence factors. These pilot studies, moreover, have the aim to inform later HRI user studies in the project IURO: Interactive Urban Robot. The IURO project considers a human-robot interaction scenario in which a robot is injected into a densely crowded public place without any previous topological information and has to find its way to pre-defined places, people or items in quickly changing environments through proactive communication with passers-by. All three HHI studies were set up in accordance with the interaction scenario the robot should perform later on in the project, namely an itinerary request in the city center in order to find the "Old Pharmacy". We could derive several relevant guidelines and recommendations for the human-robot interaction scenario, although some of them need validation through user studies. Based on the findings presented in this paper, and in conjunction with findings on morphology aspects of the robot (considering anthropomorphism and the uncanny valley effect [31]), the dialogue model, the global path planning, and the local collision avoidance system for the IURO robot are currently being developed. We are now planning to perform laboratory-based user studies to test the communication structure and the navigation principles on navigation speed and spatial distances in Wizard-of-Oz studies. Successive field trials will be conducted to prove the effectiveness of our strategy of informing the human-robot interaction scenario by means of human-human studies. Moreover, the aim of the successive laboratory-based studies and field trials is to identify key aspects that cannot or should not be directly transferred from human-human interaction to human-robot interaction, as they do not fit the users' cognitive model of the robot and create wrong expectations about its capabilities.
References 1. http://www.iuro-project.eu/ 2. Lee, M., Forlizzi, J., Rybski, P.E., Crabbe, F., Chung, W., Finkle, J., Glaser, E., Kiesler, S.: The snackbot: Documenting the design of a robot for long-term human-robot interaction. In: Proceedings of the 4th ACM/IEEE International Conference on Human Robot interaction, pp. 7–14. ACM, New York (2009) 3. Sung, J., Christensen, H., Grinter, R.: Robots in the wild: Understanding long- term use. In: Proceedings of the 4th ACM/IEEE International Conference on Human Robot interaction, pp. 45–52. ACM, New York (2009) 4. Lauria, S., Bugmann, G., Kyriacou, T., Klein, E.: Instruction Based Learning: How to instruct a personal robot to find HAL. In: Proceedings of the 9th European Workshop on Learning Robots (2001) 5. Koulouri, T., Lauria, S.: A WOz Framework for Exploring Miscommunication in HRI. In: Proceedings of the AISB Symposium on New Frontiers in Human-Robot Interaction (2009)
6. Sisbot, E.A., Alami, R., Simeon, T., Dautenhahn, K., Walters, W., Woods, S., Koay, K.L., Nehaniv, C.: Navigation in the presence of humans. In: 5th IEEE-RAS International Conference on Humanoid Robots (2005) 7. Mondada, L.: Emergent focused interactions in public places: A systematic analysis of the multimodal achievement of a common interactional space. Journal of Pragmatics 41, 1977–1997 (2009) 8. Koay, K., Dautenhahn, K., Woods, S., Walters, M.: Empirical results from using a comfort level device in human-robot interaction studies. In: Proceedings of HRI 2006: Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 194–201. ACM, New York (2006) 9. Suchman, L.: Human-Machine Reconfigurations: Plans and situated actions, 2nd edn. Cambridge University Press, Cambridge (2007) 10. Sabanovic, S., Michalowski, M.P., Simmons, R.: Robots in the wild: Observing humanrobot social interaction outside the lab. In: Proceedings of the 9th International Workshop on Advanced Motion Control (AMC 2006), pp. 596–601 (2006) 11. Dourish, P.: What we talk about when we talk about context. Personal and Ubiquitous Computing 8(1), 19–30 (2004) 12. Clark, H.H.: Language Use. Cambridge University Press, Cambridge (1996) 13. Bosga, J., Meulenbroek, R.G.J.: Joint-action coordination of redundant force contributions in a virtual lifting task. Motor Control 11, 234–257 (2007) 14. Knoblich, G., Jordan, J.S.: Action coordination in groups and individuals: Learning anticipatory control. Journal of Experimental Psychology: Learning, Memory, and Cognition 29(5), 1006–1016 (2003) 15. Galantucci, B., Sebanz, N.: Joint action: Current perspectives. Topics in Cognitive Science 1, 255–259 (2009) 16. Clark, H.H., Krych, M.A.: Speaking while monitoring addressees for understanding. Journal of Memory and Language 50, 62–81 (2004) 17. Sebanz, N., Knoblich, G., Prinz, W.: How two share a task: Corepresenting stimulus– response mappings. Journal of Experimental Psychology: Human Perception and Performance 31(6), 1234–1246 (2005) 18. Ros, R., Lemaignan, S., Sisbot, E.A., Alami, R., Steinwender, J., Hamann, K., Warneken, F.: Which one? Grounding the referent based on efficient human-robot interaction. In: IEEE RO-MAN 2010, pp. 570–575 (2010) 19. Reeves, B., Nass, C.: The media equation: How people treat computers, televisions, and new media like real people and places. Cambridge University Press, New York (1996) 20. Heider, F., Simmel, M.: An experimental study of apparent behavior. American Journal of Psychology 57, 243–249 (1944) 21. Weiss, A., Igelsböck, J., Tscheligi, M., Bauer, A., Kühnlenz, K., Wollherr, D., Buss, M.: Robots asking for directions – the willingness of passers-by to support robots. In: HRI 2010: Proceedings of the 5th ACM/IEEE International Conference on Human Robot Interaction, pp. 23–30. ACM, New York (2010) 22. Barberis, J.-M.: Indiquer son chemin au passant: role cognitif et discursif de orientation generale. In: Barberis, J.-M. (ed.) La Ville. Arts de Faire, Manieres de Dire, Langue et Praxis, pp. 77–97 (1994) 23. Mondada, L.: Deixis spatiale, gestes de pointage et formes de coordination de l’action. In: Barberis, J.-M., Manes-Gallo, M.C. (eds.) Parcours dans la Ville. Les Descriptions D’itineraires Pie´tons. L’Harmattan, Paris, pp. 261–285 (2007)
24. Buchner, R., Förster, F., Mirnig, N., Weiss, A., Tscheligi, M.: Deliverable 2.0@M6: Characteristics and properties of human-robot interaction in public environments. Tech. rep. (2010), http://www.iuro-project.eu/ 25. Byrt, T.: How Good Is That Agreement? Epidemiology 7, 561 (1996) 26. Althusser, L.: Ideology and Ideological State Apparatuses. In: Lenin and Philosophy and Other Essays. Monthly Review Press (1972) 27. Goffman, E.: Behavior in Public Spaces. Notes on the Social Organization of Gatherings. The Free Press, New York (1963) 28. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, US (1963) 29. Huettenrauch, H., Severinson-Eklundh, K., Green, A., Topp, E.A.: Investigating spatial relationships in human-robot interaction. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006 (2006) 30. Pacchierotti, E., Christensen, H., Jensfelt, P.: Evaluation of passing distance for social robots. In: Proceedings of IEEE Workshop on Robot and Human Interactive Communication, RO-MAN 2006 (2006) 31. Förster, F., Weiss, A., Tscheligi, M.: Anthropomorphic design for an interactive urban robot – the right design approach? In: HRI 2011: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM, New York (2011)
Comparing Free Hand Menu Techniques for Distant Displays Using Linear, Marking and Finger-Count Menus
Gilles Bailly1,2, Robert Walter1, Jörg Müller1, Tongyan Ning1, and Eric Lecolinet2
1 Deutsche Telekom Laboratories, TU Berlin
2 Telecom Paristech – CNRS LTCI
{gilles.bailly,robert.walter,hans-joerg.mueller,tongyan.ning}@telekom.de, [email protected]
Abstract. Distant displays such as interactive public displays (IPD) or interactive television (ITV) require new interaction techniques as traditional input devices may be limited or missing in these contexts. Free hand interaction, as sensed with computer vision techniques, is a promising interaction technique. This paper presents the adaptation of three menu techniques for free hand interaction: Linear menu, Marking menu and Finger-Count menu. The first study, based on a Wizard-of-Oz protocol, focuses on Finger-Counting postures in front of interactive television and public displays. It reveals that participants do not choose the most efficient gestures, either before or after the experiment. Results are used to develop a Finger-Count recognizer. The second experiment shows that all techniques achieve satisfactory accuracy. It also shows that Finger-Count requires more mental demand than the other techniques. Keywords: Finger-Counting, Depth-Camera, Public display, ITV, Menus.
user must find the phone and wait for the network connection). Moreover, in the case of interactive television, free hand gestures are convenient for selecting frequent commands as they avoid the need to reach (or search) for the remote control [8]. Many menu techniques have been specifically designed for personal computers [1, 15, 26, 29], mobile devices [12, 24] or tabletops [2, 19], but menu techniques for distant displays have received limited attention and we are not aware of any comparative studies. We propose three menu techniques for free hand interaction: Linear menu, Marking menu and Finger-Count menu. The Linear and Marking menus behave similarly to their respective versions on personal computers or interactive surfaces: the cursor on the distant display is controlled by moving the hand and the command is selected by closing the hand. The Finger-Count menu is an adaptation of the technique proposed in [2] for multi-touch surfaces. It is a two-handed, multi-finger interaction technique that allows the user to perform commands by expressing a pair of digits with fingers “in the air”: the left hand is used for choosing a category while the right hand serves to select an item in the corresponding category. We performed two experimental evaluations. The first one, based on a Wizard-of-Oz protocol, aimed to understand how users would perform Finger-Count gestures in our free hand scenarios and to develop our Finger-Counting recognizer. The second study compared the three free hand menu techniques in an interactive TV scenario. Results show that the accuracy rate of all techniques is above 93% and that the Finger-Count menu requires more mental demand. Our findings are relevant for the design of free hand menu selection for public displays and interactive television.
Fig. 1. Marking menus (Left), and Finger-Count menus (Right) on ITV
2 Related Work
Menu Techniques. Linear menus are widely used for exploring and selecting commands in interactive applications. Several alternatives have been proposed for desktops [1, 7, 15, 16, 26, 29], mobile devices [24] and interactive surfaces [2, 19]. Marking menus are certainly one of the best-known menu techniques. They combine Pie menus [7] and gestural interaction. In novice mode, the user selects commands in a circular menu. In expert mode, the menu does not appear and the user leaves a trail that is recognized and interpreted by the system. Marking menus are efficient as they
favor the transition from novice to expert usage: users perform the same gesture in both modes [5]. Multi-Stroke menus [29] are an improvement over hierarchical Marking menus [16]: users perform a series of simple marks rather than a complex mark. This strategy improves accuracy and reduces the total amount of screen space [29].
Distant Displays. Studies on distant displays can be split into two main categories depending on whether users can use physical remote controls or not. Interactive TV is a typical case where users manipulate a physical remote control. However, with the increasing number of services and multimedia data, users are forced to navigate deep hierarchies or to manipulate remote controls overcrowded with buttons [8]. Performing free hand gestures can serve as a complementary modality for selecting frequent or favorite actions [13, 17]. For instance, a prototype of Marking menus based on computer vision-based hand gestures has been proposed in [17] to control frequent actions. However, this technique requires two specific registration poses, which are not appropriate for novice users. Moreover, this technique has not been experimentally evaluated. Finally, Microsoft recently introduced the Kinect, a combination of an RGB and a low-cost depth camera that enables users to play video games by performing body gestures in front of their TV. While some public displays are multi-touch (e.g. CityWall [23]), this solution is not always appropriate as it forces users to stop walking in order to interact. Moreover, some users may refuse to touch the display as it can be dirty. For these reasons, some projects have investigated computer vision to enable interaction with public displays [5, 21]. These projects, however, do not focus on menu selection.
Hand Gesture Interaction. Several interaction techniques based on hand gestures have been proposed [4, 27], especially in the context of virtual environments [10]. However, they generally use expensive and inconvenient hardware such as gloves, which is not compatible with practical use. Some studies focus on computer vision based gesture recognition applications in HCI [18, 22]. However, they mainly focus on recognition algorithms [17], and only a few interaction techniques have been implemented for pointing [6, 25] and manipulating data [20, 3], and even fewer for command selection.
Multi-Touch Interaction. Multi-touch interaction and free hand interaction share several similarities as users can use both hands and several fingers [28]. Several interaction techniques exploit multi-touch capabilities [2, 19]. For instance, the Multi-Touch Marking menu [19] is a technique that combines a Marking menu and chording gestures for selecting commands in a 2-level menu. The Finger-Count menu [2] is a two-handed, multi-finger technique that only counts the number of fingers on the surface. This technique, which proved efficient on multi-touch surfaces [2], can be considered a good candidate for free hand interaction as it does not require distinguishing fingers. While a variety of menu techniques have been investigated for conventional interfaces and interactive surfaces, and a few free hand techniques have been implemented, we are not aware of any studies comparing free hand menu selection techniques such as Linear menu, Marking menu and Finger-Count menu.
Table 1. Summary of the main properties of Linear, Marking and Finger-Count menus
Property                             Linear      Marking     Finger-Count
Preview                              Yes         No          Yes
Expert mode & eyes-free selection    No          Yes         Yes
Fluid transition                     No          Yes         Yes
Direct access                        No          No          Yes
Gestures                             Dynamic     Dynamic     Static
Number of items                      About 8x8   About 8x8   5x5=25
3 Menu Techniques
We now present the three menu techniques that we designed for distant displays: the Linear menu, the Marking menu and the Finger-Count menu. Their main properties are summarized in Table 1.
3.1 Linear Menu
Users activate the menu by opening their hand with the palm facing the distant display. Items are organized vertically. The root menu is displayed on the left side, while the submenu is displayed on the right side when a category (parent item) is selected. Users control a cursor by moving their hand “in the air”. As soon as the cursor is over a category, the corresponding submenu appears on the right side. To execute the desired command, the user has to “grab” the corresponding item. Users perform this metaphorical gesture simply by closing their hand when the cursor is over the item. We chose this gesture as an end delimiter rather than a dwell time to leave the user in control of the system. Besides, delays are often perceived as too fast by novice users and too slow by expert users [9].
Linear Menu Properties. One important feature of Linear menus which is often underestimated is that they make it possible to preview submenus [1]. Users can quickly scan the content of submenus just by performing a vertical gesture over the categories. As the Linear menu is based on pointing, it requires visual feedback and is therefore not compatible with eyes-free selection.
3.2 Marking Menu
Our implementation of a Marking menu [15] is based on the Multi-Stroke menu [29] (Fig. 1, left): the root menu and the submenu are superimposed to prevent submenus from being displayed outside of the screen [29]. Moreover, users perform two simple strokes rather than a compound stroke to maintain a high level of accuracy [17, 29]. In novice mode, the menu is always displayed in the center of the screen. The cursor is automatically located in the center of the menu when the user opens the hand with the palm facing the TV set. Users select a category in the root menu by performing a first stroke in the direction of the corresponding item and then “grabbing” it. The corresponding submenu appears at the same location as the root
menu, and users follow the same mechanism to select the desired item. In expert mode, the menu does not appear and users only perform two straight strokes.
Marking Menu Properties. Marking menus have several advantages. First, they reduce the mean distance for selecting items thanks to their circular layout. Second, they make it possible to perform eyes-free selection as they are not based on positioning. Third, they favor a fluid transition from novice to expert mode as users perform the same gestures in both modes. Moreover, gestures are easy to learn thanks to spatial memory [2]. However, users cannot preview the submenus as the menus are superimposed [29].
3.3 Finger-Count Menu
The Finger-Count menu (Fig. 1, right) is an adaptation of Finger-Count shortcuts [2] from multi-touch surfaces to distant displays. The graphical layout is similar to the Linear menu except that the corresponding number of fingers to extend is displayed close to each item. In novice mode, the user selects a category in the root menu by exhibiting the corresponding number of fingers on the left hand and then selects the desired item in the submenu in the same way but with the right hand. The command is executed when the user closes both hands simultaneously. In expert mode, the user performs the same gestures except that the menu does not appear.
Finger-Count Properties. Finger-Count menus also have several advantages. First, these gestures are natural: users interact with the system much as basketball referees signal the number of the player called for a foul, simply by exhibiting fingers on each hand. Second, users can scan the different categories just by adding or removing fingers on the left hand. Third, users can perform eyes-free selection as they do not need the visual modality to show a given number of fingers. Like Marking menus, this technique favors a fluid transition from novice to expert usage as users perform the same gestures in both modes. Moreover, users have direct access to commands: experienced users can simultaneously exhibit fingers on both hands if they already know where the desired item is located. Contrary to Linear or Marking menus, users do not need to perform a dynamic gesture; a static posture is sufficient to be recognized and interpreted by the system. Finally, it is worth noting that this technique is little affected by counting behaviors: only the number of fingers is taken into account, not how digits are mapped to fingers. Moreover, Asian people use different finger-counting methods than Western people only for digits of 6 or more; the methods coincide for digits from 1 to 5.
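To make this mapping concrete, a minimal sketch is given below of how a recognized pair of finger counts could index a 5x5 menu, assuming the left-hand count selects the category and the right-hand count selects the item, as described above. The function name and the placeholder labels are hypothetical and not taken from the authors' implementation.

```python
# Sketch of the Finger-Count selection mapping: the left-hand count picks the
# category, the right-hand count picks the item within it (1..5 each).
# Illustrative only; names and menu contents are placeholders.

def finger_count_selection(left_fingers, right_fingers, menu):
    """menu: list of 5 categories, each a list of 5 item labels."""
    if not (1 <= left_fingers <= 5 and 1 <= right_fingers <= 5):
        return None                       # posture is not yet a valid selection
    category = menu[left_fingers - 1]
    return category[right_fingers - 1]    # executed when both hands close

# Example: a 5x5 menu as used in the studies (labels are placeholders).
menu = [[f"cat{c}-item{i}" for i in range(1, 6)] for c in range(1, 6)]
assert finger_count_selection(2, 3, menu) == "cat2-item3"
```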
4 Study 1: Hand Posture for Finger-Count
The experiment is based on a “Wizard-of-Oz” (WoZ) prototype. Users were led to believe that they interacted with a fully implemented system, while the display was in fact controlled by a wizard in another room, observing the user through cameras. We used this methodology to observe and understand how users would naturally perform gestures using a distant display. We did not study gestures for Linear and Marking menus because 1) pointing is quite common, 2) it is already implemented in commercial products and 3)
implementing a Wizard-of-Oz protocol for a technique with direct feedback would be quite difficult. For these reasons, we only focused on finger-counting gestures, where the best hand posture is not known. The purpose of the first study was 1) to find which hand posture users would naturally choose and prefer, 2) to identify the best position for the camera to recognize this posture, and 3) to optimize our computer vision algorithm according to the resulting perspective. We suspected that the context of use, e.g. interactive TV (users may lean back on a sofa) vs. public display (users stand in front of the device), can strongly impact how users perform gestures. Accordingly, we decided to compare Finger-Count hand postures for these two scenarios. Through an informal pre-study, the following postures (Fig. 2) were identified as promising and were tested during the experiment:
• Interactive TV scenario (sitting):
  o Palm: Palm facing the display
  o Back: Back of hand facing the display
  o Fingertips: Fingertips towards the display
• Interactive public display scenario (standing):
  o Palm: Palm facing the display
  o Back: Back of hand facing the display
  o Down: Arms hanging loosely, back of hand facing the display
Fig. 2. Palm (ITV), Back (ITV), Fingertips (ITV), Palm (IPD), Back (IPD), Down (IPD)
4.1 Experimental Design
The stimulus consisted of displaying two digits on the display. Once the stimulus appeared, the participant could show the corresponding number of fingers on each hand. Feedback occurred as soon as the wizard recognized a valid posture. For each scenario (ITV vs. IPD), the 3 different hand postures were assigned to 3 different blocks, which were counterbalanced between participants with a Latin square design. We also introduced a first and a last block in which participants could choose a hand posture freely. The first block was used to observe which hand posture users would choose naturally without instruction. The last block was used to know which hand posture is preferred after the experiment. So in total, each participant performed 5 blocks of 5x5=25 selections (all finger combinations). 10 European participants (age 22-26, mean 25.3) were recruited from an HCI lecture and assigned to the
ITV/IPD conditions randomly (5 for each condition). At the end of the experiment, participants filled out NASA TLX questionnaires to evaluate the mental and physical workload (100-point scale) for each technique and stated their preferred posture, and a short semi-structured interview was conducted. The display was a 52” display in landscape format with 1920x1080px resolution. A camera (Microsoft Kinect) was installed directly below the display. In the ITV condition the participant was sitting on a sofa at a distance of 1.6m from the display (recommended viewing distance). In the public display condition, the user was standing at a distance of 1.6m from the display.
4.2 Results
Intuitively Preferred Hand Posture. In the ITV condition, all five participants started (1st trial of the 1st block) with the palm posture. In the IPD condition, four participants started with the palm posture, while one participant started with the back posture. In the ITV condition, all participants chose the palm posture throughout the 1st block. In the IPD condition, the palm posture was chosen most often (67 times in total), followed by the back posture (34 times) and mixed postures (different postures of the left and right hand, 24 times).
Preferred Hand Posture after Experience. In the 5th block in the ITV condition, the palm posture was chosen 53 times, followed by the fingertips posture (45 times), mixed postures (19 times) and the back posture (8 times). In the IPD condition, the most popular posture was the back posture (54 times), followed by the down posture (38 times), the palm posture (19) and mixed postures (14).
Workload. In the ITV condition, the mental workload of the back posture was significantly higher than for the palm posture (51 vs. 41, p<0.05). Similarly, the physical workload of the back posture was significantly higher than for the fingertips posture (62 vs. 26, p<0.05). In the IPD condition, the palm posture scored worst. Its mental workload was significantly higher than for both the back and down postures (26 vs. 21 vs. 20, p<0.05). The physical and temporal workload of the palm posture was higher than for the down posture (45 vs. 18 and 38 vs. 16, p<0.05). Frustration with the palm posture was higher than for the down posture (37 vs. 12, p<0.05).
Preference. For sitting, the fingertips posture was preferred, followed by the back and the palm postures. For standing, the back posture was preferred, followed by the down posture and the palm posture.
Observation. In general, we could observe four different strategies of using Finger-Count. Mostly, people took their hands down between trials and opened the respective number of fingers while lifting their hands (for palm and back gestures). Sometimes, they first opened the correct number of fingers and only then lifted their hands. Sometimes, they left their hands lifted and closed all fingers between trials, and a few participants sometimes only changed the number of fingers between trials. In most cases, participants showed the fingers of the left hand first. Sometimes, especially when the same number of fingers was shown on both hands, both hands were shown synchronously. Only one user sometimes showed his right hand first.
Rarely, participants had to correct the number of shown fingers, and in very few cases they actively looked at their hands. While the gestures for 1, 2, 4 and 5 fingers were relatively consistent, users seemed to be unsure about which fingers to show when showing 3 fingers.
4.3 Discussion
There are two interesting observations from this study. First, while almost all users initially used the palm posture, it is not the most efficient one. Second, while the palm posture is chosen most often for the ITV condition, the back posture is chosen most often for the IPD condition (after training). Furthermore, although the palm posture is chosen most often for ITV, it is also the least preferred posture, yet it requires less mental demand than e.g. the fingertips posture. While both the fingertips and down postures required less physical demand from the participants, they were not performed very often. We believe that this may be because participants prefer a certain expressivity of their gesture towards the display. Even for the down gesture, users usually did not let their arms hang loose as indicated in the instructions, but rather moved their arms slightly forward towards the display. Similarly, for the fingertips gesture, users often expressively lifted their arms from their legs and pointed towards the display. This may be either because users believe that their gestures are recognized better when the hands are moved towards the display, or because they feel more comfortable when it is obvious to bystanders that their gestures are directed towards a display. From the results of this study, we decided that our implementation should recognize both palm and back gestures, so users could use either of them. Even though the palm posture is neither the preferred nor the most efficient one, it was chosen most often by participants both before and after training.
5 Implementation
We employed the recent and low-cost Microsoft Kinect sensor, which contains a depth camera providing 11-bit depth images in VGA resolution at a rate of 30 Hz. The server and client communicate via the widespread multi-touch protocol TUIO. Our recognition software uses the OpenNI framework, including the PrimeSense NITE middleware for hand tracking, and the OpenCV library for computer vision algorithms.
Linear and Marking Menu. The hand of the user is tracked in 3D space during the whole session. We use the point tracking capabilities of the NITE middleware, which is initialized by a focus gesture (a wave gesture in our experiment). The grabbing gesture is recognized by analyzing the contour of the user's hand (Fig. 3). To get the contour of the hand, we use the x,y position of the tracked hand point to segment the hand contour from the depth image by isolating the object that is within a depth range of 10 cm around the tracked point. We then determine the ratio between the area of the hand contour and the area of its convex hull. If the ratio exceeds 80%, we assume the hand is closed; otherwise it is open. Motion blur artifacts in the depth image may lead to incorrectly recognized grabbing gestures if the hand is moving too quickly. We therefore used a minimal grabbing time of 500 ms, which increases the overall time required for one selection but avoids false positive detections.
Finger-Count Menu. Contrary to the Linear and Marking menus, counting fingers does not require hand tracking, hence no focus gesture is required. To count the fingers, we first isolate the hands from the depth image by using a fixed threshold of 75 cm (see Fig. 3). This means that users must slightly extend their arms in the direction of the screen in order to interact with the system. In order to count the fingers, the contour of the hand is processed in the following steps (Fig. 3):
Fig. 3. Left: depth map; middle: isolated hands; right: fingertip detection: hand contour (black), simplified hand contour (gray), fingertips (gray circles), vertices marked with 4 or 5 were removed from the list of fingertips in the corresponding processing step
1. Approximate the hand contour using the Douglas–Peucker algorithm
2. Determine the convex hull of the simplified contour
3. Consider all vertices of the contour from step 1 that are also contained in the convex hull to be fingertips
4. Remove all vertices that hold large interior angles in the contour from step 1 from the list of fingertips (threshold of 57.3°)
5. Remove all vertices that are in the lower 10% of the hand contour from the list of fingertips.
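As a rough illustration of these five steps (not the authors' code), the following OpenCV sketch counts fingertip candidates on a binary hand mask. The Douglas–Peucker epsilon and some geometric details are assumptions; the 57.3° angle threshold and the lower-10% rule follow the text.

```python
import cv2
import numpy as np

# Sketch of the five fingertip-detection steps listed above, applied to a
# binary hand mask (e.g. obtained from the 75 cm depth threshold). The
# Douglas-Peucker epsilon and a few geometric details are assumptions; the
# 57.3 degree angle threshold and the "lower 10%" rule come from the text.

def count_fingers(hand_mask, epsilon_px=8, max_angle_deg=57.3):
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    contour = max(contours, key=cv2.contourArea)
    # Step 1: simplify the contour (Douglas-Peucker)
    approx = cv2.approxPolyDP(contour, epsilon_px, True).reshape(-1, 2)
    # Step 2: convex hull of the simplified contour, as vertex indices
    hull_idx = cv2.convexHull(approx.reshape(-1, 1, 2),
                              returnPoints=False).flatten()
    # Step 3: hull vertices of the simplified contour are fingertip candidates
    n = len(approx)
    ys = approx[:, 1]
    y_cutoff = ys.min() + 0.9 * (ys.max() - ys.min())   # start of lower 10%
    fingertips = []
    for i in hull_idx:
        p = approx[i]
        v1 = approx[(i - 1) % n] - p
        v2 = approx[(i + 1) % n] - p
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        # Step 4: drop vertices with a large interior angle
        # Step 5: drop vertices in the lower 10% of the hand contour
        if angle < max_angle_deg and p[1] < y_cutoff:
            fingertips.append(tuple(p))
    return len(fingertips)
```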
The limited resolution of the depth camera and the amount of noise make the detection of fingertips somewhat unstable. Since false positives and false negatives in fingertip detection may occur randomly, we decided to apply a histogram-based smoothing technique. For each frame (every 33 ms), a new value of counted fingers is written into a 60-element circular buffer. We then use the buffer to identify stable states by finding the most frequent values over the last period of time. This filter adds a delay of 1 to 2 seconds before a new count can be recognized, depending on the amount of image noise. Future improved depth sensors would enable faster recognition.
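The smoothing just described can be sketched as follows. This is an illustration under assumptions: the paper does not specify the exact stability criterion, so the sketch assumes a count becomes stable once it covers at least 60% of the buffer.

```python
from collections import Counter, deque

# Sketch of the histogram-based smoothing described above: per-frame finger
# counts are pushed into a 60-element circular buffer (about 2 s at 30 Hz)
# and a count is only reported once it dominates the recent history. The
# exact stability criterion (here >= 60% of the buffer) is an assumption.

class FingerCountFilter:
    def __init__(self, size=60, min_share=0.6):
        self.buffer = deque(maxlen=size)
        self.min_share = min_share
        self.stable = None

    def update(self, raw_count):
        self.buffer.append(raw_count)
        value, freq = Counter(self.buffer).most_common(1)[0]
        if freq >= self.min_share * self.buffer.maxlen:
            self.stable = value      # new stable finger count recognized
        return self.stable           # lags the raw input by roughly 1-2 s
```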
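For completeness, the grabbing gesture used by the Linear and Marking menus (the contour/convex-hull area ratio described at the beginning of this section) could be sketched along the same lines. Again, this is only an illustration, assuming a depth frame in millimetres and an externally tracked hand point; the 80% ratio and the 10 cm band come from the text.

```python
import cv2
import numpy as np

# Sketch of the open/closed hand test used for "grabbing": segment the hand
# within a 10 cm depth band around the tracked point, then compare the contour
# area with its convex-hull area (closed if the ratio exceeds 80%, per the
# text). Depth is assumed to be in millimetres; the hand point would come from
# an external tracker such as the NITE middleware.

def hand_is_closed(depth_mm, hand_xy, band_mm=100, ratio_threshold=0.8):
    x, y = hand_xy
    hand_depth = int(depth_mm[y, x])
    mask = (np.abs(depth_mm.astype(np.int32) - hand_depth) < band_mm)
    mask = (mask * 255).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)          # largest blob = hand
    hull = cv2.convexHull(hand)
    ratio = cv2.contourArea(hand) / max(cv2.contourArea(hull), 1e-6)
    return ratio > ratio_threshold                     # a closed hand fills its hull
```

On top of such a per-frame test, the 500 ms minimal grabbing time mentioned above would be enforced to suppress motion-blur artifacts.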
6 Study: Menu Techniques Comparison
The goal of this experiment was to compare the efficiency of three menu techniques for distant displays: the classical Linear menu, the Marking menu and the Finger-Count menu. Each menu was tested for novice and expert behavior. For this study we decided to focus on the interactive TV (i.e. sitting) condition.
6.1 Experimental Design
We used a 46” full-HD LCD display in landscape orientation. The distance between the user and the display was 1.6m. For the Finger-Count condition, the distance between the Kinect and the user was 1m. In the other conditions, the distance was 1.5m.
Task, Stimulus and User’s Behavior. The task consisted of selecting an item in a 5x5 menu hierarchy. We used the same labels as in [2], e.g., “Shape” for a category and “Line” for an item. The feedback was a green/red square for a correct/incorrect selection. We found it interesting to evaluate menu efficiency for two types of user behavior:
• Novice behavior. Users with novice behavior do not know the exact location of items. To simulate this behavior, the stimulus consists of showing the name of the target: users must navigate the hierarchy to find and select the target.
• Expert behavior. Users know where the desired command is located and how to select it. To simulate this behavior, all necessary information is displayed: the target category and the target item are highlighted in blue to indicate the path. Moreover, the gestures for Marking menus and Finger-Count are displayed: 2 strokes for Marking menus and 2 digits for Finger-Count.
Procedure and Design. 12 European participants (aged 24-36, mean 28) were recruited from a subject pool of people of various professions and computer experience. They were briefed (~5 min) with written explanations and videos of the techniques. We also allowed them to practice in order to be sure they understood how to use the techniques. For each behavior, participants had to perform 2 blocks of 25 selections. Novice and expert behavior were evaluated in this order, familiarity increasing with blocks. After the experiment, participants filled out NASA TLX and SUS (System Usability Scale) questionnaires for each menu technique and stated their preferred technique, and a semi-structured interview was conducted. We used three sets of contents to avoid learning effects between techniques. The order of techniques and sets was counterbalanced across participants with a Latin square design. For each block, the order of appearance of items was randomized. The independent variables of the study were menu and behavior. Dependent variables were speed, accuracy, workload (NASA TLX) and usability (SUS). To sum up, the experiment involved: 14 participants x 3 menus x 2 behaviors x 2 blocks x 25 items = 4200 selections.
6.2 Results
Accuracy. The accuracy was 94.2% for the Linear menu, 95.3% for Marking and 93.4% for Finger-Count (Fig. 4). There were no significant differences in accuracy between techniques or user behaviors.
Speed. Completion time was measured as the time from when the stimulus appeared to the time when the item was activated [1, 2, 29] (in our case, as soon as the system recognized a "grabbing gesture"). There was an interaction effect between menu technique and novice/expert mode (ANOVA, F2,22 = 28, p < 0.01), shown in Fig. 5. A post-hoc Tukey test revealed that in novice mode, Linear (6.6s) is significantly faster than Marking (7.2s), which is significantly faster than Finger-Count (8.5s). In expert mode, Linear (5.4s) is still significantly faster than Marking (5.8s) (p < 0.05). The mean selection time for Finger-Count is 5.7s.
Fig. 4. Accuracy by technique and user’s behavior (95% confidence interval indicated)
Fig. 5. Completion time by technique and user’s behavior (95% confidence interval indicated)
Questionnaires. A GLM repeated measures test on the NASA TLX data revealed no significant effects on all questions except for mental demand (F2,22 = 4.9, p < 0.03). A post-hoc t-test revealed that the Linear menu (24.4) required significantly less mental demand than the Finger-Count menu (44.6). The mental demand of the Marking menu was 28.1. A GLM repeated measures test revealed no significant effect of technique on usability (SUS). Participants stated they could learn the techniques very quickly (Linear: 4.6/5; Marking: 4.3/5; Finger-Count: 4.2/5) and found all techniques easy to use (Linear: 4.3/5; Marking: 3.8/5; Finger-Count: 3.7/5). Finally, a Friedman test (Chi-Square = 4.77, df = 2, p > 0.05) revealed no significant effect on technique ranking. Seven participants chose the Linear menu as their favorite technique, three the Finger-Count and two the Marking menu.
Observations. We observed that there were great differences in the ability of users to express Finger-Count gestures: while some users could easily express all gestures, others had surprising difficulties moving their fingers. Furthermore, our recognizer required fingers to be clearly separated, while it can be bio-mechanically difficult to strongly separate the middle and ring fingers. We also observed that some users seemed to be unsure about which fingers to show when showing 3 fingers.
Finally, we observed that most of the users had difficulty moving their hand in a 2D plane when interacting with the Linear and Marking menus: they generally moved their hand in a hemispherical area and sometimes accidentally left the threshold area of the recognizer.
7 Discussion
Accuracy. Results show that free hand gestures could be performed accurately during our experiment. Indeed, the three interaction techniques provide an accuracy rate above 93%. While the Kinect is already used for hand tracking in Xbox games, our experiment reveals that it can also be used to count fingers with our algorithm.
Speed. All techniques are much slower than their interactive surface counterparts. Selection time for Linear menus was 5.3s compared to 2.0s in [2], Finger-Count 5.7s (1.8s in [2]), and Marking menu 5.8s (2.4s in [2]). There were small differences between techniques for expert users, with a small advantage for Linear menus. This result can seem surprising as Marking and Finger-Count menus have been shown to be faster than Linear menus [2, 7]. A more detailed analysis of the implementation of the Linear menu can partially explain this result. Contrary to common Linear menu implementations, all submenus share the same location to reduce the real estate used to display the menu. So, for the third category, the submenu is placed at mid-height of the parent item. This improvement, which was proposed in [26], decreases the average distance for reaching items and thus decreases the mean selection time according to Fitts' law [11]. A deeper analysis also reveals that the Marking and Finger-Count performances are underestimated. Indeed, the Marking menu uses two grabbing gestures requiring 0.5s while the Linear menu needs only one. Similarly, Finger-Count uses a filter which causes a delay of about 1s to compensate for the noise and the low resolution of the camera. Better camera hardware or algorithms could improve the performance of these two techniques.
Finger-Count. Through the NASA TLX questionnaire and the open discussion, several participants mentioned that Finger-Count requires more mental demand than the other techniques. However, Finger-Count gestures did not require a high mental demand in the first study. One reason may be that the task in the first study was easier, as it did not involve menu selection. Moreover, in the first study, the recognizer was “perfect” and users never needed to make adjustments. Finally, free hand Finger-Counting seems to require more mental demand than the original version on multi-touch surfaces. In the latter case, users only need to touch the surface with the correct number of fingers and it does not matter which fingers are used. While this is also true for free hand counting, certain fingers must be extended and others folded. This makes the movement more difficult than just slightly moving the fingers to touch a surface. Due to cognitive and bio-mechanical constraints, certain finger postures may be difficult to perform.
Distant Displays. In the context of ITV, free hand gestures cannot replace the physical remote control, as tapping on buttons will remain easier and faster. However, several users mentioned that they would like to use free hand gestures as a complementary tool, especially for selecting “favorite actions” or “switch lamps”, for example. They
also mentioned that it is beneficial if they do not need to look for the TV remote control or move to reach it on the coffee table. One participant also mentioned that “only the guy with the physical remote control can interact with the TV; with free hand interaction everyone has the power”. As no technique is significantly preferred, we recommend letting users choose and configure the techniques they want to use. In the context of public displays, people can walk, making pointing or directional gestures difficult for the system to recognize. The Finger-Count technique seems promising in this context as it is only based on posture and is thus compatible with passing-by interaction. However, the high mental demand mentioned by participants is not compatible with immediate usability. We therefore recommend using Linear menus on public displays for novice users while leaving expert users the possibility to perform Finger-Count gestures, as these two techniques are compatible.
8 Conclusion
In this paper, we presented and evaluated three menu techniques for interacting with distant displays using depth cameras: Linear, Marking, and Finger-Count menus. In a first study, we compared different hand postures for Finger-Count. While the palm posture seems suitable and intuitive for an interactive TV scenario, the back posture seems much better suited for a public display scenario. In a second study, we compared Linear, Marking and Finger-Count menus. While all three techniques achieve satisfactory accuracy, they seem to be much slower than their multi-touch counterparts or remote controls. There are relatively small differences between the techniques: Linear menus are slightly faster than Marking menus, which are faster than Finger-Count in novice mode, while Finger-Count lies between Linear and Marking menus in expert mode. We believe that free hand menu techniques can be a valuable complement to touch and remote controls, and it may be best to leave users a choice of their individually preferred technique. For future work, it would be important to decrease the delay introduced by our filtering techniques. This could mainly be achieved by using a depth camera with higher resolution and less noise, and improved recognition algorithms. Further, we have evaluated menu selection in an interactive TV scenario, so evaluation in an (outdoor) public display scenario would be a next step. Finally, we would like to investigate more deeply the impact of age (elderly people or children) on the acceptance of these techniques.
Acknowledgements. This work was supported by the Alexander von Humboldt Foundation and the QUAERO project. We thank H. Maktoufi, F. Alt and A. Roudaut.
References 1. Bailly, G., Lecolinet, E., Nigay, L.: Wave menus: improving the novice mode of hierarchical marking menus. In: Baranauskas, C., Abascal, J., Barbosa, S.D.J. (eds.) INTERACT 2007. LNCS, vol. 4662, pp. 475–488. Springer, Heidelberg (2007)
2. Bailly, G., Lecolinet, E., Guiard, Y.: Finger-count & radial-stroke shortcuts: 2 techniques for augmenting linear menus on multi-touch surfaces. In: Proc. CHI 2010, pp. 591–594. ACM, NY (2010) 3. Benko, H., Wilson, A.D.: Pinch-the-sky dome: freehand multi-point interactions with immersive omni-directional data. In: Proc. CHI EA 2010, pp. 3045–3050. ACM, NY (2010) 4. Baudel, T., Beaudouin-Lafon, M.: Charade: Remote control of objects using free-hand gestures. Commun. ACM 36(7), 28–35 (1993) 5. Beyer, G., Alt, F., Müller, J., Schmidt, A., Haulsen, I., Klose, S., Isakovic, K., Schiewe, M.: Audience Behavior around Large Interactive Cylindrical Screens. In: Proc. CHI 2011. ACM, NY (2011) 6. Bolt, R.A.: ”Put-that-there”: Voice and gesture at the graphics interface. In: ACM SIGGRAPH Computer Graphics, vol. 14(3), pp. 262–270. ACM, NY (1980) 7. Callahan, J., Hopkins, D., Weiser, M., Shneiderman, B.: An empirical comparison of pie vs. linear menus. In: Proc. CHI 1988, pp. 95–100. ACM, NY (1988) 8. Cesar, P., Chorianopoulos, K.: The evolution of TV systems, content and users toward interactivity. Now Foundation and Trends in HCI 2(4), 373–395 (2009) 9. Cockburn, A., Gutwin, C., Greenberg, S.: A predictive model of menu performance. In: Proc. of CHI 2007, pp. 627–636. ACM, NY (2007) 10. Dachselt, R., Hübner, A.: Virtual Environments: Three-dimensional menus: A survey and taxonomy. IEEE Computers & Graphics 31(1), 53–65 (2007) 11. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47, 381–391 (1954) 12. Francone, J., Bailly, G., Lecolinet, E., Mandran, N., Nigay, L.: Wavelet menus on handheld devices: stacking metaphor for novice mode and eyes-free selection for expert mode. In: Proc. AVI 2010, pp. 173–180. ACM, NY (2010) 13. Freeman, W.T., Weissman, C.: Television control by hand gestures. In: Int. Workshop on Automatic Face- and Gesture-Recognition, pp. 179–183. IEEE, Los Alamitos (1995) 14. Jones, E., Alexander, J., Andreou, A., Irani, P., Subramanian, S.: GesText: accelerometer-based gestural text-entry systems. In: Proc. of CHI 2010, pp. 2173–2182. ACM, NY (2010) 15. Kurtenbach, G., Buxton, W.: The limits of expert performance using hierarchic marking menus. In: Proc. of INTERACT 1993 and CHI 1993, pp. 482–487. ACM, NY (1993) 16. Kurtenbach, G., Buxton, W.: User learning and performance with marking menus. In: Proc. ACM CHI 1994, pp. 258–264 (1994) 17. Lenman, S., Bretzner, L., Thuresson, B.: Using marking menus to develop command sets for computer vision based hand gesture interfaces. In: Proc. of NordiCHI 2002, pp. 239–242. ACM, NY (2002) 18. Lenman, S., Bretzner, L., Thuresson, B.: Computer Vision Based Recognition of Hand Gestures for Human-Computer Interaction. Technical report TRITA-NA-D0209 (2002) 19. Lepinski, G.J., Grossman, T., Fitzmaurice, G.: The design and evaluation of multitouch marking menus. In: Proc. of CHI 2010, pp. 2233–2242. ACM, NY (2010) 20. Malik, S., Ranjan, A., Balakrishnan, R.: Interacting with large displays from a distance with vision-tracked multi-finger gestural input. In: Proc. UIST 2005, pp. 43–52. ACM, NY (2005) 21. Michelis, D., Müller, J.: The Audience Funnel: Observations of Gesture Based Interaction with multiple large displays in a city center. Int. Journal of HCI (to appear)
22. Pavlovic, V., Sharma, R., Huang, T.S.: Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 677–695 (1997) 23. Peltonen, P., Kurvinen, E., Salovaara, A., Jacucci, G., Ilmonen, T., Evans, J., Oulasvirta, A., Saarikko, P.: It’s Mine, Don’t Touch!: interactions at a large multi-touch display in a city centre. In: Proc. CHI 2008, pp. 1285–1294. ACM, NY (2008) 24. Roudaut, A., Bailly, G., Lecolinet, E., Nigay, L.: Leaf menus: Linear menus with stroke shortcuts for small handheld devices. In: Gross, T., Gulliksen, J., Kotzé, P., Oestreicher, L., Palanque, P., Prates, R.O., Winckler, M. (eds.) INTERACT 2009. LNCS, vol. 5726, pp. 616–619. Springer, Heidelberg (2009) 25. Schick, A., van de Camp, F., Ijsselmuiden, F., Stiefelhagen, R.: Extending touch: towards interaction with large-scale surfaces. In: Proc. of ITS 2009, pp. 117–124. ACM, NY (2009) 26. Tanvir, E., Cullen, J., Irani, P., Cockburn, A.: AAMU: adaptive activation area menus for improving selection in cascading pull-down menus. In: Proc. of CHI 2008, pp. 1381–1384. ACM, NY (2008) 27. Vogel, D., Balakrishnan, R.: Interactive public ambient displays: transitioning from implicit to explicit, public to personal, interaction with multiple users. In: Proc. of UIST 2004, pp. 137–146. ACM, NY (2004) 28. Wu, M., Balakrishnan, R.: Multi-finger and whole hand gestural interaction techniques for multi-user tabletop displays. In: Proc. of UIST 2003, pp. 193–202. ACM, NY (2003) 29. Zhao, S., Balakrishnan, R.: Simple vs. compound mark hierarchical marking menus. In: Proc. UIST 2004, pp. 33–42. ACM, NY (2004)
Design and Evaluation of an Ambient Display to Support Time Management during Meetings
Valentina Occhialini, Harm van Essen, and Berry Eggen
Intelligent Lighting Institute, Department of Industrial Design, Eindhoven University of Technology, PO Box 513, NL 5600MB, Eindhoven, The Netherlands
[email protected], {h.a.v.essen,j.h.eggen}@tue.nl
Abstract. This paper presents explorative research investigating the opportunities of using light as a communication medium to provide peripheral information. An innovative ambient display has been developed that uses dynamic light patterns on the walls of the meeting room to support time management during meetings. Designed according to the principles of calm technology and information decoration, the system seeks a balance between aesthetic and informational quality. Two prototypes were created, and qualitative research methods were used to evaluate the concept and the efficacy of light in conveying information. The results confirm the value of our concept by showing that users appreciated the usefulness of the system and comprehended it well. The project led to insightful considerations on design guidelines and recommendations for the further development of ambient displays that use light to convey abstract information in a subtle, unobtrusive way. Keywords: Adaptive Interfaces, Ambient Display, Information Decoration, Novel User Interfaces and Interaction Techniques, Aesthetic Design.
a novelty in comparison to existing lighting solutions, since it combines decorative and informative aspects, being both pleasant and helpful for the potential users. A successful ambient display should provide relevant information in appropriate locations [17][25]. Although several work-related activities could be supported by our system, for this study we chose to focus on meeting situations. Meetings are essential activities within a working environment. Studies revealed that the number of meetings people attend and the time they spend on them have dramatically increased in past years. Although in an initial user study we found several issues (e.g. procedural issues, emotional issues, hierarchical issues, level of intimacy, etc.), time management was indicated as a crucial issue within a meeting situation. One of the main causes of dissatisfaction among meeting participants is the duration of the meeting, which is often too long [27]. Meetings often take longer than planned due to prolonged discussions and little control over individual contributions. This frequently leads to an unequal distribution of time, with the first speakers using more time than expected and the last ones hurrying because the end of the meeting is approaching. Moreover, different studies revealed that personal time estimation decreases in accuracy when individuals are simultaneously engaged in a cognitively demanding task [24][39]. This is often the case during meetings, when people are involved in presentations and discussions. Even when a moderator is in charge of guaranteeing that the schedule is respected, effectively managing each transition is not effortless. Technology can offer support for pacing and timing a meeting by providing interfaces that extend human capabilities. In addition, Suchman’s situated action [30] and Gibson’s ecological approach to perception [9] have shown that contextual information can support individuals in carrying out their task without the necessity to infer it cognitively. Therefore, an ambient display that makes people aware of their time consumption with respect to the overall time scheduled for a meeting could raise meeting efficiency in two ways: first, real-time indications would help each participant to better manage individual contributions, adjusting the speech to fit the schedule; second, being aware of consuming others’ time would invite individuals to be more accountable for their use of common time. Also, time awareness information can support decision making and enhance group coordination by providing people with contextual cues that can be used to guide personal actions. To be able to investigate this in context, we developed an adaptive ambient display [36] that uses dynamic light patterns to support time management during meetings. Our study relates to different research areas, since it combines principles from information decoration [6] and calm technology [34] with the insights provided by cognitive psychology and information visualization. Moreover, we followed an iterative user-centered design cycle, involving end-users at all phases of the study. The next sections provide detailed information on specific phases: section 2 presents earlier work and background knowledge on information decoration and ambient displays. Section 3 introduces the concept and design requirements, while section 4 describes two cycles of prototype implementation and the evaluation results. Finally, section 5 discusses the main outcomes and possible directions for future development.
The explorative nature of this project provided insightful considerations for possible interactions with ambient intelligent environments, and preliminary guidelines for using light as an information source in concrete user setups.
2 Information Decoration, Ambient Displays, and Related Work
The main goal of Information Decoration [6] is to design awareness systems that provide contextual information to their users without overburdening them. These systems seamlessly merge with the physical environment and seek a balance between informational and aesthetic quality. The information is presented by means of ambient displays [36], based on the principles of calm technology [34] and ubiquitous computing [35]. Although providing an exhaustive definition of ambient displays is difficult, they can be described as systems that display important, but not critical, information; are embedded in the environment; exploit the ability of users to move their focus of attention from the periphery to the center and back; use subtle changes to reflect updates in information without distracting the user from his primary task; are aesthetically pleasing and environmentally appropriate [22]; and might use different modalities (visual, sound, smell, etc.) to portray the information [36]. Several attempts have been made to provide guidelines for the design [17][22][31] and the evaluation [16][21] of ambient displays. Pousman and Stasko [22] compared several systems and developed a taxonomy based on four design dimensions:
─ Information Capacity: the number of discrete information sources that a system can represent.
─ Notification Level: the degree to which system alerts are meant to interrupt a user.
─ Representational Fidelity: the level of abstraction used for the visualization of the information.
─ Aesthetic Emphasis: the importance given to aesthetics.
We used this taxonomy to set initial design requirements and obtain a tradeoff between these dimensions. The literature provides a considerable number of examples of such systems, as well as similar approaches suggested by different disciplines, such as informative art [25]. Figure 1 shows the most relevant examples we found of ambient displays that use light as a means of communication: Daylight Display (2003) [17] consists of a floor lamp which changes its brightness according to the external light conditions. Hello.Wall (2003) [23] is a wall-size display which emits context-dependent information using light patterns to provide awareness information and atmospheres in organizational environments. The display reflects the identity and the distance of people passing by. Power-aware cord (2005) [10] is an electrical power strip that uses dynamic glowing patterns to display the amount of energy passing through it, in order to increase energy awareness among its users. Ambient Orb (2007) [1] is a commercial product consisting of a glass lamp that uses color to display information about weather forecasts, trends in the market, or the traffic.
Fig. 1. From left to right: Daylight Display; Hello.Wall; Power-aware cord; Ambient Orb
2.1 Related Work on Meeting Rooms
Previous studies have touched upon pacing interfaces and ambient displays for meeting rooms. Although the described systems are meant for different situations and utilize more traditional implementation techniques, they relate to our research either for their purpose (enhancing pacing ability for highly cognitively demanding tasks; inviting more socially acceptable behavior by providing feedback) or for their characteristics (shared, adaptive displays). Time aura (2001) [15] is a desktop-based application that helps people to adjust their pacing while they give a presentation. It provides an overview of the presentation’s structure, real-time information and suggestions about the progress of the task, and feedback on the speaker’s performance. Second messenger (2004) [4, 5] uses a shared display to visualize in real time how much each person has spoken during a meeting in relation to the others. Its aim is to influence group behavior towards a more balanced level of participation. Meeting Mediator (2008) [14] detects social interactions and provides persuasive feedback to enhance group collaboration in situations where balanced participation and high interactivity are desirable. Behavioral change is promoted by visualizing the interactivity level of the meeting (subjects’ active participation in the meeting) and individual speaking time. The feedback is provided to each participant on a personal display. Relational Cockpit (2009) [29] is a system that generates unobtrusive feedback on the social dynamics of a meeting. The feedback is presented on the top of the table; each participant has an individual visualization showing their cumulative speaking time since the beginning of the meeting, the duration of the current turn, the cumulative visual attention he/she gave to other participants as a listener, and the cumulative and current visual attention he/she got from the listeners. Conversation Clock (2009) [12] is described as a “social mirror” that visualizes the level of participation of individuals engaged in a group conversation. The system provides a historical overview of turn-taking and participation within a group. Finally, Rienks [26] presented an overview of agent-based systems in order to understand how attendees of a meeting respond to the presence of a virtual assistant, and how such an “assistant” should act and manifest itself. All the presented studies found evidence that technology is able to support cognitively demanding tasks during a meeting situation as well as to influence meeting processes. Further research is still needed to understand to what extent this influence positively changes actual meeting outcomes. Also, the same studies highlighted some issues and concerns to take into account for the design and implementation of such systems. These issues will be further discussed in the next section, which presents a list of design goals and requirements.
Fig. 2. Time aura; Second messenger; Relational Cockpit; Conversation clock
3 Design of the Ambient Display

This section describes the system's development, the concept, the goals we want to achieve with our design, and the solutions we adopted to meet our initial requirements. We emphasize the explorative nature of our research: as the first step in a long-term project, the goal of our research is not to design an ultimate time management system, but to investigate the opportunities of light to communicate information in an unobtrusive way. This work fits in ongoing research projects on developing new, innovative interaction styles for and with lighting in the context of the intelligent office. To be able to investigate this in context, we have selected meetings and time management as a specific case. In the prototype iterations, we focus on issues related to the visualization of information by means of light and the impact such a system might have on users. We are aware that there are other interaction-related issues: the design of an interface and interaction styles allowing potential users to interact with the system before and during the meeting. Before a system like this can actually be implemented, the software architecture supporting the data processes and the specification of devices and techniques [3, 20] for the collection of real-time, contextual information also need to be developed. Although these are important aspects, they are not addressed here.

3.1 Concept Description

We develop an adaptive ambient display which provides timing information inside a meeting room to support time management during meetings where several individual contributions are planned. Three representative scenarios were chosen for its further development: presentations, round-table meetings, and discussion meetings. These scenarios are characterized by one-to-many, sequential many-to-one, and random many-to-many communication patterns respectively, as well as by an alternation between roles: participants are in turn members of the audience or the main speaker. The system uses dynamic light patterns to provide a general overview of the meeting and its progress. The system also provides cognitive aid to the people who give an individual contribution to the meeting, for instance by displaying the time available for their presentation, or an indication of their elapsed and remaining time. Specifically, the system could display the following information:

─ meeting duration (total time programmed for the meeting);
─ meeting schedule (planned activities, such as agenda points, presentations, etc.);
─ activity duration;
─ progress of the meeting (overall time elapsed and time left to conclude);
─ feedback on individual contribution time (time scheduled and elapsed time);
─ notification of transitions between activities (the current activity is almost over and a new phase is starting).
As the meeting proceeds, the system tracks what is going on in the room and updates the schedule according to the duration of the previous activities. For example, if the first presentation lasts longer than planned, the system recalculates the optimal duration of the following ones and allocates them shorter time frames, so that the meeting still fits within its scheduled duration.
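As an illustration of this rescheduling behavior, the following sketch shows one way the remaining agenda items could be rescaled to fit the time left in the meeting. It is a minimal example under our own assumptions (the names and the proportional-rescaling rule are ours), not the implementation used in the prototypes.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Activity:
    name: str
    planned_minutes: float                  # duration originally scheduled
    actual_minutes: Optional[float] = None  # filled in once the activity has ended

def reschedule(activities: list, meeting_minutes: float) -> dict:
    """Rescale the remaining activities so the meeting still ends on time.

    Finished activities keep their actual duration; the remaining ones share
    whatever time is left, in proportion to their planned durations (a simple
    proportional rule assumed for this sketch).
    """
    used = sum(a.actual_minutes for a in activities if a.actual_minutes is not None)
    remaining = [a for a in activities if a.actual_minutes is None]
    planned_rest = sum(a.planned_minutes for a in remaining)
    time_left = max(meeting_minutes - used, 0.0)
    scale = time_left / planned_rest if planned_rest else 0.0
    return {a.name: a.planned_minutes * scale for a in remaining}

# Example: the first presentation overran by 5 minutes in a 60-minute meeting
agenda = [
    Activity("presentation 1", 20, actual_minutes=25),
    Activity("discussion 1", 10),
    Activity("presentation 2", 20),
    Activity("discussion 2", 10),
]
print(reschedule(agenda, 60))  # remaining slots are compressed to fit 35 minutes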
3.2 Design Goals and Choices

Referring to the fundamental principles of information decoration [6] and calm technology [34], to earlier work, and to the selected scenarios, we set five requirements to be achieved with the design of the ambient display:

1. Provide contextual information in an unobtrusive way. The system should support an "on-demand" interaction: the users decide when they want to retrieve information from the ambient display and are not required to use the display the whole time. Information is embedded in the environment. Unobtrusiveness implies that people who need to focus on primary tasks will not be distracted.
2. Reduce cognitive load for pacing and time keeping. The system should provide clear information, minimizing the cognitive demands for processing that information. Therefore, the display should always be visible, easy to interpret, and recognizable at a glance. Conventional tools, such as clocks, also provide time awareness; yet these devices might be experienced as disruptive by participants who are performing a cognitively demanding activity. Ambient displays take advantage of our ability to quickly switch the focus of attention from the center to the periphery and back while performing different tasks [34]. Still, the information capacity [22] and the complexity of the visualization are crucial factors for interpretation and cognition.
3. Support different roles and activities within a meeting. Speakers need to be aware of their progress within presentations, while other participants might have different needs according to the role they have at a particular moment; a rough indication of the progress within the meeting or activity might be sufficient for them to make decisions about their actions.
4. Invite meeting participants to a more accountable use of common time. The system should address individuals' accountability by visualizing the impact of their individual use of time on the common schedule.
5. Seek a balance between aesthetic and informational quality. In line with the information decoration [6] and visualization [33] principles and the general guidelines for the design of ambient displays [22], the system should portray information through a subtle visualization that is pleasant for everybody exposed to it. Although aesthetic judgment is subjective, integration with the architectural properties of the actual location and its application context needs to be achieved.

Based on these requirements, some basic design choices for the actual design of the display can be made as guidelines for the first implementations:

Shared display. We opt for (one or more) shared displays which will be visible at any time to everyone attending the meeting. We expect that individual participants will become more accountable in their use of time when everybody knows how individual behavior affects the common schedule. This expectation is based on the concept of social translucence [7], defined as a property of systems that enhance mutual awareness and accountability by displaying socially significant information to their users. Social translucence builds on theories on the formation of group norms [11] and on the self-regulation of behavior [3]. Although DiMicco's study [5] indicates that some negative reactions might occur because of the public display of personal feedback, we consider timing information not sensitive enough to cause users' discomfort.
Low information capacity. Since in a meeting situation every user is already involved in different primary tasks, a low information capacity is preferable in order to ensure effective retrieval of the information. Displays that encode large amounts of data might be more informative, but they certainly require more time to be read and understood.

Linear representation of time. Different visualizations can be proposed for the representation of time. A linear representation of time is preferable because of its similarity with a timeline. Following Yakura's definition [38], timelines are a graphical representation of a set of planned activities organized over a time interval. Unlike clocks and calendars, which measure time, timelines represent time in a narrative way, providing a concrete basis to organize and coordinate work. A linear representation requires an orientation to be interpreted; cognitive psychology tells us that scan patterns similar to reading (left-to-right, top-to-bottom) are applied intuitively by users even when they have no relevance to the task being performed [18].

Pre-attentive processing. The automatic detection of basic visual information without consciously focusing on the display is referred to as pre-attentive processing [32]. Thanks to this property, users who are engaged in another task can quickly catch elements in a visualization that differ in color, size, position, orientation, and other basic characteristics. We use pre-attentive processing to visualize crucial information, such as the progress within the meeting and the different activities.

Static status display, subtle transitions. We choose to visualize the meeting "status" and "notifications" for changes in status. A status represents a specific condition or activity of the meeting, for instance agenda items or presentations. A status is displayed statically, i.e., the display does not change during the activity. Notifications indicate transitions between different activities (statuses), i.e., when the time allocated to a specific activity is about to elapse. Notifications are displayed dynamically. These dynamic effects can be designed with different degrees of subtlety, depending on the level of attention that is required.

Multiple visualization modes to support different roles of participants. To optimally support the different roles participants have during a meeting, we chose to combine two visualization modes. The "overview mode" displays all the planned activities, their expected duration, and the overall progress within the meeting, allowing all the participants to get an overview of the meeting progress at a glance, including their possible upcoming personal contributions. When a specific activity is going on which requires more control, the system highlights the time slots allocated to that specific activity: the "presentation mode".

Convenient location for cognitively demanding tasks. To ensure good visibility of the display to every participant in all the possible scenarios, we decided to display the information on the meeting room's walls: multiple, identical displays could be used to allow easy access to the information for all the attendees. If only one display is used, it should preferably be placed opposite the presenter's position.

Light beams as information medium. Several light appliances and effects were considered for the actual implementation of the system.
Eventually, our project aims to integrate timing displays into state-of-the-art intelligent (office) lighting systems.
However, for prototyping purposes we preferred using more traditional luminaires, such as (halogen) spots and indirect light, because they do not too closely resemble ordinary graphical displays and because these fixtures are commonly used in public and professional environments. The prototypes are explorative research vehicles, not iterative designs of final systems. This is also the reason why the aesthetic quality of the displays (requirement 5) is not the subject of our present evaluation.

3.3 Evaluation and Research Questions

The evaluation of an ambient display can be complicated and costly because of the difficulty of defining its efficacy in measurable terms [16]. In order to reach our goal, i.e., to understand whether light visualizations can effectively convey information to people engaged in a meeting, we designed low-fidelity prototypes and set up experiments with real users, taking into account our design goals and previous studies on similar systems [4][12][14][29]. We applied qualitative research methods (observations, questionnaires, interviews) to collect data. No quantitative data were collected on whether our system increases the efficiency of meetings. Our evaluation is formative, aimed at better understanding and informing the design process; the results are less formal than in a summative evaluation. We identified three clusters in our evaluation: perception, interpretation, and experience. Each of these clusters has related research questions to be addressed, as shown in Table 1.

Table 1. Research questions
Perception
─ Pre-attentive processing: do people catch the information they are looking for at a glance?
─ Notifications: are the notifications noticeable? To what extent?
─ Location and visibility: is the display location comfortable for everyone to see it?
─ Distraction: does the ambient display distract people during the meeting?

Interpretation
─ Information clarity: do people quickly understand the information they catch at a glance? Is there any element of the visualization that creates confusion during its retrieval?
─ Information capacity: is the portrayed information enough to provide a satisfactory overview of the meeting progress? Are the participants able to remember the coding?

Experience
─ Participants' use of the ambient display: how do people actually use the ambient display?
─ Perceived usefulness: does the presenter find the system useful to pace his/her presentation? Do the other participants find the feedback useful to organize their actions?
─ Stress level: do the speakers feel more stressed because of the ambient display?
─ Social dynamics: does the presenter feel more accountable for his/her use of time?
─ Reaction to the shared display: how do people react to the public nature of the information displayed? Is the system too intrusive?
The questions in the perception cluster include cognitive aspects related to the identification of information from the display. The interpretation cluster relates to the comprehension of the portrayed information, while the questions in the experience cluster cover aspects associated with the users' reactions to the system, both as individuals and as a group. The research questions reflect issues that we expect to encounter during the implementation of our system. The main goal of this explorative study is to find out which of these topics are most relevant in relation to the design choices we made and how we can exploit them in the next iterations.
4 Implementation and Evaluation Results

In two iterative design cycles, two prototypes of the system were implemented and tested with actual users: first, a low-fidelity graphical visualization projected with a beamer on the wall; second, a full-scale mock-up using halogen lamps. The prototypes were implemented following the design choices described in the previous section and based on a specific presentation scenario: supporting meetings consisting of a series of presentations with Q&A sessions in between.

4.1 First Prototype: Low-Fidelity Graphical Implementation

This initial prototype consists of a dynamic graphical pattern projected on one of the meeting room's walls. The low-fidelity visualization consists of several bars that differ in color hue, color lightness, and height (see Figure 3). Each bar represents a 5-minute timeframe, and the number of bars indicates the total meeting duration (so 12 bars represent a one-hour meeting). To maintain a low information capacity, we considered a 5-minute resolution enough to provide the users with a sufficient overview of the overall schedule. This scale also accommodates the duration of the different activities (often planned to last 5, 10, or 15 minutes), as well as the overall meeting duration, which is on average between 30 and 90 minutes [13][19]. In line with our general design choices, the pattern changes to indicate the meeting status and to notify activity transitions. We coded the different information as follows:
Fig. 3. Impression of the low-fidelity visualization prototype (for explanation see text)
Status: Three variables (size, color, and color intensity) are used to represent system states and to easily trigger pre-attentive processing in our users.

─ Meeting progress – size: the highest bar, popping out of the pattern, is the present time slot. Its position in the pattern is an immediate cue to the overall progress.
─ Meeting schedule – color: light blue bars represent presentations, while yellow bars represent the discussions in between. Primary colors were chosen to ensure a high-contrast visualization also in difficult ambient lighting conditions.
─ Presentation mode – color intensity: darker blue bars identify the current, ongoing presentation slot. This visualization was preferred over other information visualization techniques, such as fisheye distortion [28], to maintain a clear overview of the complete meeting.

Notifications: Subtle changes in bar length and color indicate notifications that do not require a high level of attention from the users. Such slow modifications will not distract the subjects, but still add extra information: users can immediately notice that something is changing. More noticeable blinks are used for critical notifications.

─ Time slot almost elapsed – dynamic change in length: during the last minutes of the active time slot, the next bar starts growing towards the height of the active one. Then the new slot becomes active and the expired one shrinks again.
─ Upcoming change in activity – color transition: when 3 minutes are left before the next presentation starts, the yellow bar slowly turns blue.
─ Approaching end of presentation time – blink: when 3 minutes are left before the presentation is over, the blue bar starts blinking at a low rate. When 1 minute is left, the blink becomes faster. Blinking is preferred over static cues as it can persistently attract users' attention, ensuring the information will be noticed and constantly reminding the presenter of the upcoming end of his personal time.

4.2 First Prototype: In-Field Concept Evaluation and Results

The prototype was evaluated with the same group during two meetings. A total of 20 participants (18 students and 2 professors) took part in the study. During the first meeting (two hours) the results of four student projects were presented, each with a scheduled presentation time of 20 minutes followed by a 10-minute discussion. In the second meeting (one hour), four student projects were presented, all with a scheduled presentation time of 10 minutes followed by a 5-minute discussion. A beamer projection of the visualization was shown opposite the presenter's location; see Figure 4 for the test set-up. All notifications and the time-tracking features were preprogrammed for the test, allowing for manual real-time updates (when necessary) by the researcher. No specific instructions about how and when to use the system were given; only the information coding was explained to familiarize the students with the visualization. The objective of this experiment was to gain qualitative data about the actual impact that our system would have on the participants (experience cluster). We chose an in-field evaluation (a real presentation session) over a controlled study as we expected that testing our system in this context would provide more insights into the actual experience. Although the projection of the visualization does not use real light sources to convey information, initial insights on the perception and interpretation
clusters were collected and later used for the development of the second prototype. After each meeting all the participants filled in an open-ended questionnaire on their experience as audience members. The presenters were also asked to respond to a closed-ended Likert-scale questionnaire, followed by a 10-minute, semi-structured interview with a researcher. The questionnaires and raw data are not included in this paper. The results are summarized below for the three evaluation clusters.
Fig. 4. Set up first evaluation
The findings on the experience cluster show that the subjects comfortably used the display to be aware of the meeting’s progress, as well as to manage their available time. All participants reported to have looked at the display. As members of the audience “out of curiosity”, but mainly towards the end of the presentations to check whether the schedule was respected. Also the presenters reported they looked at display during their presentation time. Yet, few presenters used it while they were actually speaking. Most of the presenters checked the display before or right after their turn. Also, the system prompted some episodes of situated actions [30], (e.g. decide to ask a question; arrive late at the meeting and refer to the display to have an idea about the general progress of the meeting, etc.) confirming that increasing people’s awareness towards the actual progress of the common activity leads to a more appropriate personal behavior. Although timing of the presentations appeared not a main issue during the sessions, most subjects recognized the feedback to be helpful (“knowing how much time you speak, and how much you have left, that’s really helpful”) to improve their efficiency and to coordinate the whole group. No significant stress was reported in relation to the presence of the ambient display; instead, some participants appreciated its “calming” effect, and found it less stressful than being interrupted or notified by other people about their available time. Further research is needed to understand the system’s impact on the meeting’s social dynamics, especially to verify whether it increases the participants’ feeling of their own accountability. Finally, the little concern created by the public nature of the
display among the users confirmed our expectations that timing information is not too sensitive to be visualized on a shared display. The outcomes related to the perception questions illustrate that the system succeeded in providing peripheral information to the participants involved in a real meeting situation, without distracting or annoying them. The visualization successfully triggered pre-attentive processing, making it easy for people to spot the meeting's progress at a glance. Also, our choice of using subtle transitions as notifications to create an apparently static visualization was confirmed by the participants, who found this helpful for retrieving the information. The more intrusive blink notifications at the end of the presentations could not be evaluated, since most of the contributions took less time than planned. As expected, the display was considered "more natural" and less distracting than retrieving timing information from screen applications, personal devices, or other people. The location of the display was found convenient for and by the presenters, who could quickly glance at it while they were looking at the audience. However, the efficacy of a single display in supporting the audience was not verified: some of the listeners had to turn their head to actually see it. Good visibility might be a precondition for perception. The results from the interpretation questions indicate that participants could correctly understand the information portrayed: "I think the visualization is really insightful and helpful because it really shows information in terms of where you are and how much you have available." As observed in previous studies [4][36], the initial briefing and some experience with the display were necessary to help the participants properly retrieve the information. Furthermore, as in Sturm and Terken's experiment [29], the few users who did not comprehend the meaning of the visualization stopped using the display. Obviously, longitudinal studies are preferred over one-session experiments for further evaluations, because the consequent learning effect might solve the initial interpretation issues, and because multiple sessions might also be useful to overcome the novelty effects that usually accompany the introduction of a new technology. Finally, all the participants confirmed that the information capacity of the display was sufficient to support them in managing their time, and observed that adding more elements could compromise the display's readability, making it more difficult to recognize the coding. Still, a few participants found the 5-minute slots not accurate enough for the presenter and asked for more specific time information. Concluding, our users showed an appreciation for the way a tool like this could support time management during their meetings without being too intrusive or distracting. This confirms our initial expectations towards the system. Furthermore, this evaluation provided useful insights for proceeding with the next iteration cycle of implementation. Specifically, we decided to highlight the timing of the present activity (the ongoing presentation) more. Second, we decided to change the notification indicating the transition from one time slot to the next (within an activity such as a presentation), since several subjects reported being confused by the fact that the next slot already starts moving while the current slot is still active.
Moreover, the increasing length of the bars was considered misleading as a representation of the elapsing time. In the next implementation this transition is notified by a single time slot (the active one) which slowly becomes less visible, until it is completely dimmed and the next one becomes active. This implementation also matches better with actual lighting.
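The revised transition can be thought of as a simple mapping from the time left in the active slot to the brightness of that slot, combined with the blink notification near the end of a presentation. The sketch below is our own illustration of this behavior; the function names, the fade duration, and the sinusoidal blink are assumptions made for the example, not taken from the prototype software.

import math

def slot_brightness(seconds_left: float, fade_seconds: float = 60.0) -> float:
    """Brightness (0..1) of the active time slot.

    The slot is fully lit for most of its duration and fades out during its
    last fade_seconds, so the transition to the next slot is gradual (the
    60-second fade is an assumption for this sketch).
    """
    if seconds_left <= 0:
        return 0.0
    if seconds_left >= fade_seconds:
        return 1.0
    return seconds_left / fade_seconds

def presentation_blink(seconds_left: float, t: float) -> float:
    """Blink modulation near the end of a presentation.

    Below 3 minutes the active bar blinks at a low rate; below 1 minute it
    blinks faster, following the notification timing described in Sect. 4.1
    (the blink periods themselves are arbitrary). t is wall-clock time in
    seconds, used to drive the oscillation.
    """
    if seconds_left > 180:
        return 1.0                              # no blinking yet
    period = 2.0 if seconds_left > 60 else 0.5  # slow blink, then fast blink
    # A smooth sinusoidal blink between 0.3 and 1.0 instead of a hard on/off
    return 0.65 + 0.35 * math.sin(2 * math.pi * t / period)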
4.3 Second Prototype: Halogen Spots Implementation Building upon the findings of our first evaluation a second prototype has been created and tested. In this prototype, eight halogen spots combined with color filters create the colorful bars. Only a part of the display was implemented, allowing to perform a single session of presentation-Q&A instead of a complete meeting schedule (see Figure 5). A combination of white and orange beams is chosen to indicate the different activities. The information is coded as follows:
Fig. 5. Halogen spots prototype. Most left a picture of the real panel with all eight spots on. The other panels simulate from left to right a possible meeting scenario (see text for explanation).
Status: Direction (up/down), color, and intensity are selected to display status. ─ Meeting progress – direction: the bottom-to-top beams visualize the overall schedule, while the top-to-bottom beams indicate the currently elapsing slot. The direction of beams replaces beams’ length since beam length turned out to be hard to implement with the available light fixtures. ─ Meeting schedule – colors: white beams represent the Q&A and preparation phases while the orange beams represent the actual presentation time. ─ Presentation mode – intensity: during a presentation, the orange beams increase their intensity, while the white ones are dimmed. Notifications:
Again intensity and blink are used to indicate notifications.
─ Time slot almost elapsed – intensity: when 1 minute is left, the current upper beam starts dimming. When the time has fully elapsed, the next beam switches on.
─ Approaching end of presentation – blink: when 2 minutes of presentation time are left, the current time slot starts blinking. When 1 minute is left, the blink becomes faster.

4.4 Second Prototype: Controlled Context Evaluation and Results

The goal of this second study was to verify the efficacy of our light visualization in conveying clear information to the users. This relates to the questions in the perception and interpretation clusters. The collected data provided useful insights to better understand how to code information with light. So although the second prototype has a different purpose, we consider it a next design iteration because it takes into account the lessons learned from the first iteration and because it uses real light as a medium (indicating a higher fidelity).
The evaluation using the second prototype consisted of five experimental sessions carried out in a controlled condition in our lab, to ensure all the participants would use the display in similar ambient lighting conditions (illuminance level above 250 lux) and the display’s visibility would not affect the visualization. Five pairs of participants (10 in total) recruited among master students and employees took part in the evaluation. Each experimental session consisted of two times 12 minutes, in which each participant had 3 minutes to organize a speech about a topic of their choice, 6 minute to present it, and 3 minutes of Q&A session with the other participant and the researcher/observer, who followed the presentation as audience. Then the participants swapped their places, and the second one became the speaker. This specific timing was preprogrammed in the display and participants were asked to coordinate their actions accordingly. A short explanation about the system’s behavior and the information portrayed was given before the beginning of the actual test. The system behaved identically throughout the two sessions except for the notifications indicating the end of a presentation: in the first session the blinks were implemented as smooth brightness changes, while in the second session the blinks were implemented faster, the light smoothly dimming and suddenly becoming bright again. This difference was not explained to the participants. We expected the first notification to be less obtrusive, but also less noticeable by the peripheral view. After the experiment both participants had a semi-structured interview with the researcher to collect qualitative data about their experience with the display.
Fig. 6. Set up and impression of second evaluation
The results collected on perception illustrate that both the orientation of the beams and their different colors were effective to trigger pre-attentive processing, making it easy for the participants to check their progress without losing track of their speech. Also, the notifications indicating the end of the presentation time were found helpful to retrieve information from the display without actively looking at the visualization. Since none of our participants had specific remarks related to the different
notifications in first and the second iteration, we can conclude that neither the first blinks were too smooth to be noticed, nor the second ones were too obtrusive. None of the speakers reported distraction from the presence of the ambient display, but several participants found it distracting for the audience. This could be related to the novelty of the display (several participants reported to often look at the display out of curiosity, to see how it would behave), we expect this effect will fade when users will get more familiar with the display; also, given the situation with only two people attending the presentation, any distraction from the listeners was very noticeable during the test. The outcomes related to interpretation confirmed the efficacy of the general design of our visualization: again, all the participants found the display easy to interpret, and no particular difficulties were found in having an overview of the meeting progress. However, several subjects were slightly confused by the notifications, because the changes occurring in a single light beam were not sufficiently visible to have a precise indication of how much time they had left. This can be related to the lack of other reference elements in the visualization. Finally, although the evaluation did not specifically address questions on experience, several positive comments were gained in relation to perceived usefulness. All subjects recognized the value of the system in increasing awareness towards time management issues, as well as group coordination within a meeting situation. Also, the way our ambient display provides information was found unobtrusive and, in some cases, even “relaxing”. Concluding, the results with the second prototype confirm the positive attitude of the potential users towards a lighting system that supports time management during meetings and indicate that it is indeed possible to use light to convey peripheral information to users engaged in different primary tasks. We observe, however, limitations associated to the controlled condition in which the study was carried out. In contrast to our in-context first evaluation, during this second experiment several subjects were more focused on following the display than giving their presentations. Also the same remarks on longitudinal use and novelty effect can be made. We conclude that for further development and evaluations, in-field studies provide more insightful outcomes to assess the overall value of the concept.
5 Conclusions and Recommendations This study investigated the opportunities of light as a communication medium to provide peripheral information in working environments. An innovative ambient display was proposed which supports time management during meetings by visualizing time awareness information on the walls of a meeting room using decorative light beams. Such an adaptive system fits in the current trends of intelligent lighting solutions for the professional environment, which will become context aware and exploit light to support activities carried out in the working space and at the same time aesthetically enhance that specific environment. In an explorative research-through-design process, case-specific design goals and requirements were formulated and two different explorative prototypes were created. The outcomes from two experimental studies with these prototypes showed users
recognized the display as "useful" for supporting time management during group activities, both as a tool that assists speakers in pacing their presentations and as a system that increases the audience's awareness and enhances coordination in the whole group. The level of comprehension of the information portrayed confirmed that it is indeed possible to use light to convey information in a subtle, unobtrusive way: the visualization was found easy to read at a glance and simple to interpret, although a certain level of familiarization with the system was required, as is often the case with ambient displays [36]. Also, the system succeeded in maintaining its peripheral nature, without distracting or annoying the users. Participants in the studies appreciated its "calming" visualization and found it "more natural" than traditional methods of retrieving timing information while engaged in their primary tasks. Finally, we did not find privacy issues as long as the aim of the feedback is to improve overall efficiency rather than to assess individual performance. These results support the idea that light can be successfully used as a communication medium for the abstract visualization of peripheral information and that ambient displays which communicate through light are appropriate in meeting room contexts: they display important but not critical information, seamlessly merge with the surroundings, are useful when needed, and are still pleasant and not disturbing for the main activities going on in the same location. More generally, the project contributed to a better understanding of interactions with ambient intelligent environments and of the impact that interactive technology might have on group activities. The experience obtained during the explorative development provides new perspectives for the development of such systems. This encourages us to further develop the system, especially with respect to high-fidelity visualizations (i.e., more beautiful, using real lighting equipment, embedded in a real environment) and longitudinal in-context studies. In particular, we identified several important issues for further development. First, additional investigation is needed to better understand the level of information detail our display requires. The display aimed to give users an indication of their progress, not specific values. We assumed that overly detailed indications and notifications were neither necessary nor desirable for the efficacy of our system and chose arbitrary time slots. Although our system succeeded in making users aware of the meeting progress, several participants felt discomfort because of the lack of specific, detailed information about the elapsing time. Second, special attention needs to be paid to users' expectations towards the system's behavior: should the display be more adaptive, acting as a mirror of the actual situation, or should it be more proactive, indicating which action should be performed in order to respect the schedule? Third, an interface and interaction styles allowing users to interact explicitly with such an information decoration system before and during the meeting need to be designed. Finally, additional in-field evaluations are needed to better understand to what extent, and how, the system affects the participants' perception of their accountability.
Specifically, it is interesting to investigate whether social translucency [7] alone is enough to make participants more accountable towards their own actions, or that explicit persuasive cues [8] are needed to trigger a more substantial behavioral effect. Although our initial results indicate that “transparency” is an important factor for the acceptance of the public nature of the feedback, subtle persuasive mechanisms will enhance the effectiveness of the display, without making people uncomfortable.
References 1. Ambient Devices (2007), http://www.ambientdevices.com/cat/orb/orborder.html 2. Banerjee, S., Rudnicky, A.I.: Using simple speech-based features to detect the state of a meeting and the roles of the meeting participants. In: Proc. of the 8th Int. Conference on Spoken Language Processing - Interspeech 2004 – ICSLP (2004) 3. Carver, C.S., Scheier, M.F.: On the Self-Regulation of Behavior. Cambridge University Press, Cambridge (1998) 4. DiMicco, J.M., Bender, W.: Group reactions to visual feedback tools. In: de Kort, Y.A.W., IJsselsteijn, W.A., Midden, C., Eggen, B., Fogg, B.J. (eds.) PERSUASIVE 2007. LNCS, vol. 4744, pp. 132–143. Springer, Heidelberg (2007) 5. DiMicco, J.M., Pandolfo, A., Bender, W.: Influencing group participation with a shared display. In: Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, CSCW 2004, pp. 614–623. ACM, New York (2004) 6. Eggen, B., van Mensvoort, K.: Making sense of what is going on around: designing environmental awareness information displays. In: Awareness Systems. Human-Computer Interaction Series. Springer, London (2009) 7. Erickson, T., Kellogg, W.A.: Social translucence: an approach to designing systems that support social processes. ACM Trans. Comp.-Human Interactions 7(1), 59–83 (2000) 8. Fogg, B.J.: Persuasive technology: using computers to change what we think and do. Morgan Kauffmann Publishers, San Francisco (2003) 9. Gibson. J.J.: The ecological approach to visual perception. Lawrence Erlbaum Associates, Hillsdale (1986) 10. Gustafsson, A., Gyllenswärd, M.: The power-aware cord: energy awareness through ambient information display. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1423–1426. ACM, New York (2005) 11. Hackman, J.R.: Group influences on individuals in organizations. In: Dunnette, M.D., Hough, L.M. (eds.) Handbook of Industrial and Organizational Psychology (1992) 12. Karahalios, K.G., Bergstrom, T.: Social mirrors as social signals: transforming audio into graphics. In: IEEE Computer Society, vol. 29(5), IEEE, Los Alamitos (2009) 13. Kayser, T.A.: Mining group gold: how to cash in on the collaborative brain power of a group, 2nd edn., Irwin, Chicago (1995) 14. Kim, T., Chang, A., Holland, L., Pentland, A.S.: Meeting Mediator: enhancing group collaboration using sociometric feedback. In: Proc. of the Conference on Computer Supported Cooperative Work, CSCW 2008, pp. 457–466. ACM, New York (2008) 15. Mamykina, L., Mynatt, E., Terry, M.A.: Time Aura: interfaces for pacing. In: Proc. of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2001, pp. 144–151. ACM, New York (2001) 16. Mankoff, J., Dey, A.K., Hsieh, G., Kientz, J., Lederer, S., Ames, M.: Heuristic evaluation of ambient displays. In: Proc. of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2003, pp. 169–176. ACM, New York (2003) 17. Mankoff, J., Dey, A.: From conception to design, a practical guide to designing ambient displays. In: Ohara, K., Churchill, E. (eds.) Public and Situated Displays. Kluwer, Dordrecht (2003) 18. Megaw, E.D., Richardson, J.: Target uncertainty and visual scanning strategies. Human Factors 21(5 3), 303–316 (1979) 19. Monge, P.R., McSween, C., Wyer, J.: A profile of meetings in corporate America: results of the 3M meeting effectiveness study. Annenberg School of Communications, University of Southern California, Los Angeles (1989)
20. Nickel, K., Pardàs, M., Stiefelhagen, R., Canton, C., Landabaso, J.L., Casas, J.R.: Activity classification. In: Computers in the Human Interaction Loop, Part II. Human-Computer Interaction Series, pp. 109–119. Springer, London (2009) 21. Pousman, Z., Stasko, J.: Ambient information systems: evaluation in two paradigms. In: Workshop Ambient Information Systems, Pervasive 2007, Toronto, pp. 25–29 (2007) 22. Pousman, Z., Stasko, J.: A taxonomy of ambient information systems: four patterns of design. In: Proc. of Advanced Visual Interfaces, pp. 67–74. ACM, New York (2006) 23. Prante, T., Rocker, C., Streitz, N., Stenzel, R., Magerkurth, C., Alphen, D.V., Plewe, D.: Hello Wall - beyond ambient displays. In: Video and Adjunct Proceedings of UBICOMP 2003 (2003) 24. Predebon, J.: Time judgments as a function of clock duration: effects of temporal paradigm and an attention-demanding nontemporal task. In: Perceptual and Motor Skills, vol. 88, pp. 1251–1254 (1999) 25. Redström, J., Skog, T., Hallnäs, L.: Informative art: using amplified artworks as information displays. In: Proc. of Designing Augmented Reality Environments. ACM, NewYork (2000) 26. Rienks, R., Nijholt, A., Barthelmess, P.: Pro-active meeting assistants: attention please! In: AI & Society, vol. 23(2). Springer, Berlin (2009) 27. Romano, N.C., Numamaker, J.F.: Meeting analysis: findings from research and practice. In: Proc. of 34th Hawaii International Conference on Systems Science (2001) 28. Spence, R.: Information visualization: design for interaction, 2nd edn. Prentice-Hall, Englewood Cliffs (2007) 29. Sturm, J.A., Terken, J.M.B.: Relational Cockpit. In: Computers in the Human Interaction Loop. Human-Computer Interaction Series, pp. 257–270. Springer, London (2009) 30. Suchman, L.: Plans and Situated Action: the problem of human-machine communication. Cambridge University Press, Cambridge (1987) 31. Tomitsch, M., Kappel, K., Lehner, A., Grechenig, T.: Towards a taxonomy for ambient information systems. In: Workshop on the Issues of Designing and Evaluating Ambient Information Systems, Pervasive 2007 (2007) 32. Treisman, L.: Preattentive processing in vision. Computer Vision, Graphics, and Image Processing 31(2), 156–177 (1985) 33. Ware, C.: Information visualization: perception for design. Morgan Kaufman, San Francisco (2004) 34. Weiser, M., Brown, J.S.: Designing Calm Technology (1995), http://www.ubiq.com/hypertext/weiser/calmtech/calmtech.htm 35. Weiser, M.: The Computer for the 21st Century. Scientific American, 933-940 (1991) 36. Wisneski, C., Ishii, H., Dahley, A., Gorbet, M., Brave, S., Ullmer, B., Yarin, P.: Ambient displays: Turning architectural space into an interface between people and digital information. In: Yuan, F., Konomi, S., Burkhardt, H.-J. (eds.) CoBuild 1998. LNCS, vol. 1370, p. 22. Springer, Heidelberg (1998) 37. Wyszecki, G., Stiles, W.S.: Color science concepts and methods, quantitative data and formulae, 2nd edn. Wiley Interscience, New York (1982) 38. Yakura, E.K.: Charting time: timelines as temporal boundary objects. The Academy of Management Journal 45(5), 956–970 (2002) 39. Zakay, D., Block, R.A.: Temporal cognition, current directions. Psychological Science 6(1), 151 (1997) 40. Kuijer, H.: Light architecture in Bernheze Townhall; Gemert-Bakel townhall, and SD worx Wand Antwerpen, http://www.hermankuijer.com
Does Panel Type Matter for LCD Monitors? A Study Examining the Effects of S-IPS, S-PVA, and TN Panels in Video Gaming and Movie Viewing

Ki Joon Kim1 and S. Shyam Sundar1,2

1 Department of Interaction Science, Sungkyunkwan University, Seoul, Korea 110-745 [email protected]
2 Media Effects Research Laboratory, College of Communications, Pennsylvania State University, University Park, PA 16802 [email protected]
Abstract. As computer-based devices become the primary media via which users view movies and play interactive games, display technologies (e.g., LCD monitors) have focused increasingly on quality of video fidelity, with much debate surrounding the relative efficacy of different panel types of LCD monitors. A 3 (S-IPS panel vs. S-PVA panel vs. TN panel) x 2 (game vs. movie) between-subjects experiment was conducted to examine the effects of LCD panel type in facilitating regular viewing as well as enhanced interactive TV experiences. Data from the experiment showed that LCD panel and stimulus type as well as computer literacy were important factors affecting users’ viewing and interaction experiences. Limitations and implications for theory and ongoing research are discussed. Keywords: LCD panel, response rate, contrast ratio, viewing angle, computer literacy.
However, no empirical study has been conducted to examine the effects of LCD panel type on users' viewing and interaction experiences. As a result, it is unclear whether a particular panel is more effective in providing greater satisfaction, presence, and enjoyment while interacting with LCD monitors. The present study is a modest first attempt at investigating the psychology of LCD monitor users when exposed to three major panel types: S-IPS, S-PVA, and TN.

1.1 Panel Difference – From an Engineering Perspective

The TN (twisted nematic) panel has been the most widely used due to the advantages of its high transmittance, simple fabrication process, and relatively low production cost [1]. The panel's affordable price and fast response rate have made it the most suitable panel for typical office use and fast-paced gaming. The biggest downside of the TN panel, however, is its severe off-axis image deterioration, resulting in the worst viewing angle, color reproduction, and contrast ratio in LCD panel technology [2]. The panel is not recommended for movie viewing because, unlike 8-bit S-PVA and S-IPS panels that are fully capable of displaying 16.7 million colors in 24-bit true color, it only mimics the 16.7 million colors with 6 bits per channel (natively 2^18 = 262,144 colors). To overcome the drawbacks of the TN panel, the IPS (in-plane switching) panel was first developed by Hitachi in 1996 and later enhanced by LG Display with S-IPS (super in-plane switching) technology. The basic principle of the IPS panel was "to change the physical behavior of the liquid crystal layer by having the molecules move in parallel to the TFT and color filter layers rather than at oblique angles," which resulted in "significantly lessened light scattering, and thus improved the picture uniformity and color fidelity when viewed from wide angles" [3]. LG Display further developed the original IPS technology into a premium LCD panel, called S-IPS, with improved viewing angle, color fidelity, response time, and contrast ratio. Developed and manufactured by Samsung Electronics as an improved alternative to the existing PVA technology, S-PVA (super patterned vertical alignment) is a newer display technology providing image quality advantages over S-IPS, including high transmittance, a 2300:1 contrast ratio, and a wide viewing angle with no off-axis image inversion [2]. Samsung's new technology called "Magic Speed" claims to offer enhanced response time, rendering the S-PVA panel more suitable for gaming and other interactive applications than the traditional PVA panel.

1.2 Panel Difference – From a Psychological Perspective

Previous studies on the relationship between perceived viewing quality (e.g., attention, enjoyment, memory, and presence) and display screens have provided empirical explanations for consumers' preference for large screens and high-resolution display devices. Reeves, Detenber, and Steuer [4] showed participants short clips from action films on 35-inch and 70-inch screens, and found that the larger screen elicited a greater level of attention, sense of reality, and presence. Similar studies conducted by Lombard [5] and Detenber and Reeves [6] also found that participants experienced greater enjoyment and arousal and remembered content better when exposed to larger screens.
High-resolution display is another key factor contributing to greater presence. Bocker and Muhlbach found that participants exposed to higher resolution display in a video conferencing system elicited greater sense of communicative presence [7]. A later study conducted by Neuman also found that high resolution images evoked a higher level of self-reported presence than standard resolution [7]. In accordance with these previous studies that explored the effects of display characteristics on viewers’ psychology, the goal of the present study is to identify the effects of LCD panel type, another display characteristic that had never been studied before, on viewers’ perceived viewing quality. Therefore, we examine the following research question: RQ: For LCD monitor users, controlling for content and time spent on the monitor, what is the effect of LCD panel type and stimulus type (Independent Variables) upon viewers’ satisfaction with image quality and technical features (i.e., response rate, viewing angle, and contrast ratio) as well as perceived level of presence and enjoyment (Dependent Variables) of movies and games?
2 Method

2.1 Design and Participants

A fully crossed 3 (panel type: S-IPS vs. S-PVA vs. TN) x 2 (stimulus type: game vs. movie) between-subjects experiment was conducted to answer the research question. Data were analyzed from 60 undergraduate and graduate students from a four-year university in Seoul who signed up for the experiment through an online registration page posted on the university's main homepage. Exactly 30 men and 30 women (average age of 23) signed up for the experiment. All participants signed an informed consent form prior to their participation.

2.2 Apparatus

A 26-inch LG S-IPS panel monitor, a 27-inch Samsung S-PVA panel monitor, and a 26-inch Samsung TN panel monitor were connected to three high-performance desktop computers with identical hardware specifications (i.e., manufacturer, CPU speed, RAM, and graphics card) via DVI connections. Each computer was equipped with a Logitech 5.1-channel surround-sound headphone. The monitors' brand logos were masked in order to avoid potential effects of the manufacturers' brand reputation, and the monitors' user-changeable settings (e.g., color tone, brightness, and contrast) were set to the factory standard. Participants in one condition could not see the monitors used in the other two conditions. The above configurations remained the same throughout the experiment.

2.3 Stimulus Material

A pursuit scene in downtown Paris from G.I. Joe: The Rise of Cobra was selected for the movie-watching condition. The present study intentionally chose a speedy pursuit scene from the movie, instead of a slow, long-take scene, in order to allow participants to identify potential motion blur caused by differences in the response rate of each monitor. The film was played in 1080p high-definition Blu-ray quality.
The game used for the gaming condition was Burnout Paradise. This racing game was selected because it was easy to navigate the vehicle using a keyboard and its content was similar to the Paris pursuit scene from G.I. Joe. Burnout Paradise allows players to change the point of view (first or third person) and to select a vehicle that the player wishes to drive. The present study instructed participants to play the game in the first-person point of view based on previous game studies suggesting that playing a game in the first-person point of view resulted in greater involvement and immersion [8]. 2.4 Procedure Participants were randomly assigned to one of the six conditions. In the gaming condition, brief instructions about game controls and navigation were provided, along with an opportunity to test drive for one minute in order to build familiarity with keyboard-based controls. Participants were instructed not to change the vehicle and the first-person point-of-view mode while playing the game, but told that they could freely adjust the volume and sitting posture to their comfort level. Once the experimenter finished giving the instructions, participants were told to start playing the game for 10 minutes. The experimenter left the room. After 10 minutes, the experimenter re-entered the room and administered a paper-and-pencil questionnaire measuring participants’ level of satisfaction with the monitor they used as well as their perceived viewing/rendering quality. In the movie-watching condition, participants were told that they were going to watch a scene from G.I. Joe for 10 minutes alone in the room, and instructed not to modify any configurations on the computer while the movie was being played, except the volume and their sitting posture. The experimenter started the movie and then left the room for 10 minutes. The paper-and-pencil questionnaire was administered when the experimenter re-entered the room. After filling out the questionnaire, participants in both conditions were debriefed, paid five dollars, and asked not to discuss the experiment with others. 2.5 User Experience (UX) The present study created a questionnaire composed of 19 items measuring participants’ level of computer literacy and satisfaction with the monitors, as well as perceived level of presence and enjoyment. Participants responded to each item by marking on a 10-point Likert scale (1=“strongly disagree,” 10=“strongly agree”).
3 Results A series of General Linear Model analyses was conducted with the two manipulated independent variables (panel type and stimulus type) and one measured variable (computer literacy level) on each of the 19 UX variables. The analysis found interactions between stimulus type and computer literacy for perceived realism (measured by “During the game/movie, I felt I was in the world the game/movie created”; F(1,48)=7.65, p<.05), satisfaction with overall viewing experience (measured by “I was satisfied with the overall viewing experience provided by the LCD monitor”; F(1,48)=4.54, p<.05), and presence (measured by
“When the movie/game ended, I felt like I came back to the real world after a journey”; F(1,48)=4.62, p<.05). For participants in the gaming condition, computer literacy level was positively related to perceived realism, satisfaction with the overall viewing experience, and level of presence, regardless of the panel type used for playing the game. For participants in the movie condition, however, a reverse relationship was found, such that computer literacy was negatively related to these three UX outcomes. The analysis also revealed a three-way interaction predicting users’ satisfaction with contrast ratio (measured by “I was satisfied with the contrast ratio of the LCD monitor”; F(2,48)=4.12, p<.05). In the gaming condition, the relationship between computer literacy and satisfaction with contrast ratio was dependent on panel type, such that computer literacy was positively related to satisfaction with contrast ratio for those using the S-IPS and TN panels, and negatively related for those using the S-PVA panel (Fig. 1). For those in the movie-watching condition, computer literacy was negatively related to satisfaction with contrast ratio for those using the S-IPS panel, whereas it was unrelated for those watching with the TN and S-PVA panels.
Fig. 1. Three-way interaction for users’ satisfaction with contrast ratio
Another three-way interaction was discovered in predicting users’ satisfaction with the overall viewing experience (measured by “I was satisfied with the visual display quality of the LCD monitor”; F(2,48)=3.30, p<.05). For participants in the movie-watching condition, computer literacy was negatively related to overall viewing experience, whereas it was positively related for those in the gaming condition. This pattern was found only when using the S-IPS panel monitor; there was no such genre-based distinction in the relationship between computer literacy and viewing experience for those using the other two panels. A three-way interaction for presence (measured by “The game/movie-generated world seemed to me somewhere I really visited”) was also significant, F(2,48)=4.32, p<.05. In the gaming condition, computer literacy was positively related to presence for those using the S-IPS and S-PVA panels (Fig. 2). For participants in the movie-watching condition, however, computer literacy was positively related to presence for those using the S-IPS panel and negatively for those using the S-PVA panel, whereas it was unrelated for those using the TN panel in both conditions.
Fig. 2. Three-way interaction for users’ perceived level of presence
For enjoyment, a main effect for panel type was found, F(2,48)=3.47, p<.05, such that participants exposed to the S-PVA panel (M=8.76, SE=.37) scored significantly higher on the “I enjoyed playing/watching the game/movie” item than those exposed to the S-IPS (M=7.50, SE=.39) and TN panel (M=7.58, SE=.39). Another main effect revealed that participants in the movie-watching condition (M=8.47, SE=.31) scored higher on the item than those in the gaming condition (M=7.43, SE=.32), F(1,48)=5.38, p<.05. Yet another main effect showed that the higher the computer literacy level, the greater the enjoyment, F(1,48)=14.38, p<.001. In summary, the present study found that computer literacy is a powerful factor (perhaps even more important than panel type) affecting users’ perceived level of realism and presence. It also has effects on monitor users’ enjoyment and satisfaction with contrast ratio, a key technical feature of LCD monitors. In general, computer literacy was positively associated with viewing experience, presence and satisfaction with contrast ratio when the IPS panel-type monitor is used in an interactive mode, i.e., for playing games. It is however negatively associated with viewing experience and satisfaction when used in a non-interactive mode, i.e., watching movies. As for the S-PVA panel type, computer literacy is positively associated with presence when playing games but negatively associated with contrast-ratio satisfaction.
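For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of a general linear model with two categorical factors and a continuous covariate, including the three-way interaction, using Python and statsmodels. The column names are hypothetical and the data frame below contains synthetic placeholder values only, not the study's data.

```python
# Minimal sketch of a GLM with two categorical factors (panel, stimulus) and a
# continuous covariate (computer literacy), including the three-way interaction.
# Column names and the synthetic placeholder data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "panel": rng.choice(["S-IPS", "S-PVA", "TN"], size=n),
    "stimulus": rng.choice(["game", "movie"], size=n),
    "literacy": rng.uniform(1, 10, size=n),
    "contrast_satisfaction": rng.uniform(1, 10, size=n),  # placeholder outcome
})

# Full factorial model: main effects plus two-way and three-way interactions.
model = smf.ols("contrast_satisfaction ~ C(panel) * C(stimulus) * literacy", data=df).fit()
print(anova_lm(model, typ=2))
```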
4 Discussion
Although the present study did not find a particular LCD panel that elicited the highest level of satisfaction, enjoyment, and presence under all conditions, our findings suggest important guidelines and insights for future research on the psychological effects of display devices. In general, participants in the present study showed a slight preference for the S-PVA panel over the S-IPS or TN panel types in terms of enjoyment; participants who interacted on a monitor with the S-PVA panel perceived a higher level of enjoyment than their counterparts in the other two conditions. However, no main effect for panel type was found on other UX measures. This is because the effects of panel type are qualified by users' prior experience with computing technology.
This finding encourages consumers to ask a practical question: Is it worth the money? Monitors with PVA panels are generally more expensive than IPS and TN panel monitors. The PVA panel monitor used in the experiment (MSRP=$1,100) was nearly twice as expensive as the S-IPS (MSRP=$640) and TN panel monitors (MSRP=$480). Whether consumers are willing to pay that much more for PVA panel monitors for "supposedly" greater enjoyment is questionable, because consumers' preference is largely influenced by a combination of product price and visible differences noticed during in-store demonstrations (which may not always be fair comparisons). Given our findings, only high-end gamers who are highly computer-literate may find the presence afforded by S-PVA to be worth the added expense. But then, our interaction findings indicate that the IPS panel, which is about half the price, will do just as well for gaming (and presumably other interactive TV experiences) as long as the user is computer-literate. Unexpectedly, computer literacy turned out to be quite a significant predictor. For gaming, participants with a higher computer literacy level were more satisfied with their overall viewing experience and experienced greater realism and presence. However, a reverse relationship was found for movie-watching. We may interpret this as follows: individuals with a high computer literacy level tend to focus more on active interaction with the technology (i.e., playing the game using both mouse and keyboard simultaneously), whereas those with a low computer literacy level focus more on the content (i.e., the movie) delivered to them. This provides valuable insights for future studies on the relationship between computer literacy and technology acceptance. The implications of the present study suggest that display engineers and manufacturers should look more carefully at how panel characteristics (e.g., response rate, viewing angle, and contrast ratio) affect and interact with users' viewing/interaction experience and overall enjoyment. The main effect for stimulus type suggests that interactive use of TV display systems, such as in games, leads to inherently lower ratings of panel characteristics and reported enjoyment. This means designers of display technologies and related systems that enable interactive TV ought to work harder to satisfy users regarding the fidelity of video rendering on their screen. Newer display technologies such as LED (light-emitting diode) and AMOLED (active-matrix organic light-emitting diode) provide avenues for further research. Researchers from psychology and communication may also examine how individual differences such as computer literacy, educational attainment and age affect user acceptance of display technologies that offer slightly superior quality at a heavy price. A notable limitation is that the present study used a first-person point of view in the gaming condition. While playing a game in first person is likely to result in greater involvement and immersion in the game [8], the movie was shot and presented from a third-person point of view, thus producing a confound. A recent study on games and violence suggested, however, that playing a game in third person, not in first person, resulted in increased involvement and focus [9], which might explain why participants enjoyed watching the movie significantly more than playing the game.
Ongoing research will further probe the impact of panel type on viewing-angle satisfaction and content enjoyment, as well as the relationship between these dependent measures. We will also take into account the size of participants’ own
computer monitors and whether they are aware of LCD panel differences, and analyze these variables as covariates in the relationship between the panel type and viewing experience. Acknowledgments. This study was supported by a grant from the World-Class University program (R31-2008-000-10062-0) of the Korean Ministry of Education, Science and Technology via the National Research Foundation. The authors wish to thank Sungyeon Kim and Sanghun Kwak, Ph.D. students at the Department of Science, Sungkyunkwan University, for their collaboration in the data collection process.
References
1. Yoon, S., Won, T.: Electrode Structure for High Transmittance and Aperture Ratio in TFT-LCD. Journal of Materials Processing Technology 191, 302–305 (2007)
2. Lyu, J., Sohn, J., Kim, H., Lee, S.: Recent Trends on Patterned Vertical Alignment (PVA) and Fringe-field Switching (FFS) Liquid Crystal Displays for Liquid Crystal Television Applications. Journal of Display Technology 3, 404–412 (2007)
3. Cleverdis Special Report, http://www.cleverdispdfdownloads.com/pdf_files/spr_lpl_05.pdf
4. Reeves, B., Detenber, B., Steuer, J.: New Televisions: The Effects of Big Pictures and Big Sound on Viewer Responses to the Screen. In: 43rd Annual Conference of the International Communication Association, Washington, D.C. (1993)
5. Lombard, M.: Direct Responses to People on the Screen: Television and Personal Space. Communication Research 22, 288–324 (1995)
6. Detenber, B., Reeves, B.: A Bio-informational Theory of Emotion: Motion and Image Size Effects on Viewers. Journal of Communication 46, 66–84 (1996)
7. Lombard, M., Ditton, T.: At the Heart of It All: The Concept of Presence. Journal of Computer-Mediated Communication 3 (1997)
8. Tamborini, R., Eastin, M., Lachlan, K., Skalski, P., Fediuk, T., Brady, R.: Hostile Thoughts, Presence and Violent Virtual Video Games. In: 51st Annual Conference of the International Communication Association, Washington, D.C. (2001)
9. Farrar, K., Krcmar, M., Nowak, K.: Contextual Features of Violent Video Games, Mental Models, and Aggression. Journal of Communication 56, 387–405 (2006)
ModControl – Mobile Phones as a Versatile Interaction Device for Large Screen Applications
Matthias Deller1,2 and Achim Ebert1
1 University of Kaiserslautern, Gottlieb-Daimler-Straße 47, 67663 Kaiserslautern, Germany
2 DFKI GmbH, Trippstadter Straße 122, 67663 Kaiserslautern, Germany
[email protected], [email protected]
Abstract. Large, public displays are increasingly popular in today's society. For the most part, however, these displays are used purely for information or multimedia presentation, without the possibility of interaction for viewers. On the other hand, personal mobile devices are becoming more and more ubiquitous. Though there are efforts to combine large screens with mobile devices, the approaches are mostly focused on mobiles as control devices, or they are fitted to specific applications. In this paper, we present the ModControl framework, a configurable, modular communication structure that enables large screen applications to connect with personal mobile devices and request a set of configurable modules, utilizing the device as a personalized mobile interface. The main application can easily make use of the sophisticated interaction features provided by modern mobile phones. This facilitates new, appealing interactive visualizations that can be actively controlled with an intuitive, unified interface by single or multiple users. Keywords: Interaction framework, Distributed interfaces, Input devices and strategies, User-Centered Design.
The drawbacks of these devices are the limits imposed by hardware and physical restrictions. Although mobile processors are becoming faster, the need to save energy puts restraints on the complexity of computing tasks achievable with the device. Also, the memory available on smart phones is extremely limited, impeding work with large data sets. Finally, the size of the display places natural limits on complex visualizations. On the other hand, physically large displays are becoming more and more common, to the point of being almost omnipresent. Their forms range from large personal screens, like home theatre screens, to public screens for advertising or education. A problem with such screens is that the usual interaction metaphors with keyboard and mouse are no longer adequate. For one, large displays usually include an area in front of the display in which viewers can move to get an overview or to obtain a closer look at details. Consequently, interaction devices for large screens should support this kind of mobility, ruling out stationary devices like keyboards or mice. In the case of public screens, there is often the problem that the screens are not physically reachable by users, either because they are out of the user's range, or because accessibility is restricted to prevent vandalism. In these cases, touch interaction is not an option either. While there are approaches to combine large displays and mobile devices, they are mainly focused on either using the mobile as a pure controlling device for the display, or using the large screen as an auxiliary display for the mobile device. In this paper, we present ModControl, a fusion of both large display and mobile device interaction. With it, we provide a flexible and versatile interface for mobile devices to interact with large screen applications. Using an XML communication framework and module-based, configurable clients for mobile devices, we enable sophisticated and intuitive metaphors for large screen interaction, while providing users with personalized access to collaborative large screens. Possible application scenarios include electronic pin boards, private information retrieval for public displays, or appealing entertainment applications like collaborative games.
2 Related Work
The idea of combining large public displays with personal mobile devices has been around for some years now. In most cases, research can be divided into two categories, determined by the direction of the flow of information. In one case, the main part of the application is carried out on the mobile device, with the large screen as an auxiliary output device. In the other case, the mobile device is used as a control device for a large screen application or other systems that cannot be interacted with directly. The WebSplitter project [5] offers interaction in the output direction, by specifying different end-point devices as targets for web-based multimedia content. Based on rules created via an XML protocol, specific multimedia content can be sent to corresponding devices. Connected devices are in this case used only as output for the content, without being able to provide feedback to the system. Conversely, the "inverted browser" of Raghunath et al. [9] is a browser application running on the mobile device. A network service running on the computer driving the large screen enables the mobile device to push content from the browser application to be displayed on the large screen. This approach is motivated by the idea of symbiotic display environments [3].
A different approach for multiple display environments is used by the Universal Interaction Controller of Slay and Thomas [11]. The UIC consists of a spatially aware mobile device, the Ukey, an interaction manager, and a clipboard manager. With the Ukey, the user can select an active screen from the connected displays, and then interact with the contents of the display using the handheld device. Movement of objects on one display is possible, but the main benefit of the system is to be able to transfer objects between screens by copy and paste. Another type of connection between large screens and mobile devices is a method called “peephole navigation” [12]. This technique requires a spatially aware mobile device. By moving it around in two or three dimensions, the user can utilize it as a metaphorical “window” to different parts of a large virtual workspace. At the same time, the pen interaction of the PDA can be used to manipulate said environment. Myers et al. developed the Pebbles system, which consisted of several Palm Pilot PDAs attached to a single PC via serial connections, enabling several users to interact with the system simultaneously. However, interaction was restricted to low-level events sent from the PDAs to the PC. Later, the authors extended their approach with the “Semantic Snarfing” technique [7]. Here, users could interact remotely, using a combination of PDAs and Laser Pointers. In this manner, it was possible to select images, menus or text items on the PC screen, causing an adapted copy to appear on the PDA. Here, they could be interacted with appropriately and subsequently sent back to the main application. Another possibility adopted by several projects is to use the camera integrated in most mobile phones for interaction. Madhavapeddy et al. [6] use spot codes to tag objects displayed on the large screen. By analyzing the image of the phone’s camera, they can identify which tag the device is currently pointed at, as well as additional information encoded in the tags. Using this information, the user can navigate the application by selecting the appropriate objects/tags. A similar method is presented by Ballagas et al. in [2]. The point & shoot interaction technique is also based on visual codes embedded in the main visualization. In this case, however, the codes are not attached to objects, but rather form a grid covering the whole display area. Since the grid of visual codes is occluding a lot of information contained in the original visualization, the codes are only displayed for a short time span following the user triggering a selection. Another technique the authors describe is the sweep method. Here, the optical flow of the camera’s image resulting from movement of the device is analyzed to implement a relative movement control for the mouse cursor. A comprehensive list of possible interaction subtasks using mobile devices as well as different approaches to address these tasks can be found in [1]. More recent research, e.g. [10] and [4], investigates the performance of interfaces using the same approach as ModControl, namely distributing the interface between a large screen application and one or more small screens that are not only used to control the main application, but also to complement the large display with a small, private screen. 
The authors of [4] conducted a user study with three different applications to compare performance of interaction widgets that are distributed on large and small screens with widgets that are placed solely on a large display. While overall performance was equal in both cases, the degree of user satisfaction was higher and the error rates lower in the large screen/small screen scenario.
3 The ModControl Framework
Principally, the ModControl framework is a server-client setup. A message server is responsible for keeping track of the connected clients and passing messages between them. Received messages are put into a FIFO queue and then distributed to the corresponding clients, which can be categorized into application clients and interaction clients. Application clients are clients that require input or data from other clients, like a large public screen without inherent interaction possibilities. In the majority of cases there will be only one application client connected to the server, but the framework supports different application clients making use of the same network. The interaction clients, on the other hand, are modularized apps running on the user's personal mobile device. They enable users to interact with the main application by providing information or input data from the mobile device, or present specific and personalized data on it. For this, the main application can send requests for specific modules to the interaction clients and register for events sent by these modules.
3.1 Communication and Message Format
The communication messages of ModControl are in XML format, making it easy to add additional, application-specific messages. All messages have a root element of the type msg. This element has the required attributes class and type. Another attribute that can be set by the message server is the sender attribute, which contains the sending client's ID assigned when the client first connected. Additionally, the target attribute can be included to send messages to specific clients. The class attribute can be used by an application to group messages into specific subgroups, or to distinguish messages of different main applications using the same message server. The only exception is the sys class, which the ModControl framework reserves for network management such as pings and registering clients for specific message types. The order in which the clients connect to the server is irrelevant, but in most cases the application client will be the first client to connect and start listening for connecting interaction clients. In both cases, the connection is handled by the XMLClient class, which encapsulates all required low-level functionality. The application client should derive a customized class to provide callback functions for received interaction messages. As soon as an interaction client connects, the application client can send it a message of the type needModule to indicate which types of interaction modules are supported by the application.
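As a concrete illustration of this message format, the following Python sketch assembles the msg envelope and a sys-class registration request. The attribute names (class, type, sender, target) and the sys class follow the description above; the register message type and its payload layout are assumptions, since the paper does not give a full schema.

```python
# Sketch of the ModControl XML envelope and a hypothetical sys-class message
# registering the application client for specific message types. Only the
# msg element and its class/type/sender/target attributes come from the paper;
# the "register"/"event" names below are assumptions.
import xml.etree.ElementTree as ET

def make_msg(msg_class, msg_type, target=None):
    attrib = {"class": msg_class, "type": msg_type}
    if target is not None:
        attrib["target"] = target  # the server itself fills in the sender ID
    return ET.Element("msg", attrib=attrib)

# Register for touch events sent by interaction clients (hypothetical payload).
reg = make_msg("sys", "register")
for event_type in ("touchBegan", "touchMoved", "touchEnded"):
    ET.SubElement(reg, "event", attrib={"type": event_type})

print(ET.tostring(reg, encoding="unicode"))
```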
Fig. 1. Users can control applications without obstructing the main visualization
3.2 Modular Client and Implemented Modules
The interaction client running on the mobile device is essentially a wrapper taking care of server connection and communication, and housing the modules requested by the application client. This is done by creating tabs for each active module. Whenever a needModule message is received, a new tab for the module with the corresponding creation parameters is added to the client. This message has the required parameters hidden and priority, which determine whether the new module should be visible in the tab list and where it appears: the higher the module's priority, the higher up in the list it will appear. The needModule message can also define a module identifier and a label that will represent the module in the tab list. The main application can also switch to a specific tab by sending a showModule message to the interaction client. Changing the active tab on the client results in a shownModuleChanged message that is sent to all clients that have registered for that message type.
Connection Module: This is the initial module, always available on the interaction client. It is used to establish a connection to an XML message server or to end an existing connection. The tab for the connection module consists simply of two text input fields for the server address and the desired user name.
Text Module: As the name implies, this module is intended either to show fixed text information or to enable editing of a text field. The text for the module can be set by including a text element at creation in the corresponding needModule message, or by a separate setText message. If the user has finished editing or changed the active module, a textChanged message containing the new text is sent to registered clients.
Mouse Emulation Module: With this module, a touch-capable interaction client is used to emulate a touch pad. The display area of the tab tracks the user's finger movements, which are translated into relative movement events for the main application's cursor. At the creation of the module, the application client can include a numberOfButtons element in the needModule message to add one to three virtual mouse buttons to the touch pad area. A touchBegan message is generated whenever a new finger touches the pad area or the virtual buttons, containing the coordinates of the touch, a tap count, the timestamp and, if applicable, the index of the pressed button. If an existing touch is moved around the pad area, a touchMoved message with similar parameters is generated. Finally, a touchEnded message is created for each finger leaving the pad area.
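To show how an application client might consume these events, here is a small Python sketch that turns touchMoved messages into relative cursor movement. The message type names follow the text above; the payload element and attribute names are assumptions.

```python
# Sketch of an application-side handler for mouse-emulation messages.
# touchBegan/touchMoved/touchEnded are named in the paper; the <touch>
# element and its attributes are assumed for illustration.
import xml.etree.ElementTree as ET

cursor = {"x": 0.0, "y": 0.0}   # application cursor position
last_touch = {}                 # last known pad coordinates per touch ID

def handle_message(xml_text):
    msg = ET.fromstring(xml_text)
    touch = msg.find("touch")
    mtype = msg.get("type")
    if mtype == "touchBegan":
        last_touch[touch.get("id")] = (float(touch.get("x")), float(touch.get("y")))
    elif mtype == "touchMoved":
        tid = touch.get("id")
        x, y = float(touch.get("x")), float(touch.get("y"))
        px, py = last_touch.get(tid, (x, y))
        cursor["x"] += x - px   # translate pad motion into relative cursor motion
        cursor["y"] += y - py
        last_touch[tid] = (x, y)
    elif mtype == "touchEnded":
        last_touch.pop(touch.get("id"), None)

handle_message('<msg class="demo" type="touchBegan"><touch id="0" x="10" y="20"/></msg>')
handle_message('<msg class="demo" type="touchMoved"><touch id="0" x="14" y="26"/></msg>')
print(cursor)  # {'x': 4.0, 'y': 6.0}
```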
Image Selection Module: The image selection module can be used to display arbitrary images on the mobile device's display. When creating the module with a needModule message, the application client can include an imageData element containing the image encoded in Base64 (RFC 3548). In case the image is too large to fit in the mobile's display area, the application client can also specify x and y offsets to ensure that a specific part of the image is shown. Users can use the mobile's capabilities to rotate, pan or zoom. Whenever the position or scale factor of the image is changed, a corresponding message with the updated value is sent. By double-tapping on the currently shown image part, the user can generate an imageTapped message. The message includes the current orientation, offset and scaling factor of the image on the mobile's screen. Additionally, it contains a timestamp, the touch's position on the screen, the number of taps, and finally, the coordinates of the touch in the image's coordinate system. It is also possible to use the mobile client to facilitate multi-user picking functionality for the main application by sending a screenshot to the mobile device and using the returned tapping coordinates for object picking.
SwitchList Module: Another important module for the mobile client is the switch list. It is used to control a collection of on/off type parameters. The application client has to provide a list of elements that constitute the switches displayed in the list. The state of one or more entries can also be set subsequently with a setStates message. Whenever one of the states is toggled, the client generates a switchChanged message containing the name of the switch and the new state.
Accelerometer / Magnetometer Module: The accelerometer and magnetometer modules are the only modules that have a permanent hidden status, meaning they will not appear in the tab list. The only parameter that can be given for these modules is the desired update rate per second. The data itself is sent by an accelerometerData or magnetometerData message, respectively. Both messages include a data element containing the x, y, and z values for the corresponding sensor as well as a timestamp.
3D Navigation Module: The most specific of the currently implemented modules is a module for navigating 3D surroundings. It is intended to provide a fly-through or walk-through navigation metaphor for virtual environments, utilizing both touch recognition and accelerometer data. The module's tab on the client consists of a navigation dial indicating the major movement directions. Users can move forward and backward by sliding their finger along the vertical axis, which determines the direction and speed of the movement. By moving the finger along the horizontal axis, a strafing movement is controlled in the same manner. By tapping and holding the buttons at the ends of the horizontal axis, the virtual viewpoint can be rotated. To enable movement along the third axis, the user can activate the acceleration-controlled fly mode by tapping a button below the navigation dial. This causes the module to send additional accelerometer data that can be used to control the pitch and yaw orientation.
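The switch list and sensor modules can be consumed in the same way. The sketch below parses a switchChanged and an accelerometerData message; the message types and their listed contents (switch name and state; x, y, z values and a timestamp) come from the description above, while the payload element and attribute names are assumptions.

```python
# Sketch of handlers for switchChanged and accelerometerData messages.
# Message types follow the paper; the <switch>/<data> payload layout is assumed.
import xml.etree.ElementTree as ET

layer_visibility = {}   # toggled from the SwitchList module
latest_accel = None     # most recent accelerometer reading

def on_message(xml_text):
    global latest_accel
    msg = ET.fromstring(xml_text)
    if msg.get("type") == "switchChanged":
        sw = msg.find("switch")
        layer_visibility[sw.get("name")] = (sw.get("state") == "on")
    elif msg.get("type") == "accelerometerData":
        d = msg.find("data")
        latest_accel = (float(d.get("x")), float(d.get("y")), float(d.get("z")))

on_message('<msg class="demo" type="switchChanged"><switch name="buildings" state="on"/></msg>')
on_message('<msg class="demo" type="accelerometerData">'
           '<data x="0.02" y="-0.98" z="0.10" timestamp="1234.5"/></msg>')
print(layer_visibility, latest_accel)
```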
4 Demonstration Application
As an initial demonstration and evaluation scenario, we decided to use an existing real-world application that requires several visual interfaces even on a normal desktop computer. The application we decided on is a large screen visualization of the OpenStreetMap-3D dataset [8]. It offers a Web 3D service providing a Digital
Elevation Model of Germany as a 3D scene graph. Our initial application was an implementation that visualizes this data interactively on a large stereoscopic screen. To navigate the 3D environment, the user could use a fly-through metaphor. This was done with the standard combination of keyboard and mouse that is used in most 3D games. The speed of the movement was fixed at any given time and could only be adjusted using the keyboard. Another key would superimpose a map of Germany over the main visualization, highlighting the current position. By moving the mouse to a target location and clicking on it, the user could quickly teleport to that location. The overall scene graph of OSM-3D consists of several layers. In addition to the basic DEM data, there are layers for building models, labels for cities, and so on. Pressing another function key brought up a checklist where users could toggle the visibility of these layers in the visualization. Using hotkeys, the user could also display various status values of the application, such as an FPS counter or the currently loaded tiles. This led to several problems with regard to interaction with the application. For one, navigating the environment was rather awkward, forcing users to stand at a fixed position with keyboard and mouse. Also, they had to memorize keyboard shortcuts to invoke functions like the layer list or the teleport map. Furthermore, overlays like the map or the checklists occluded most of the actual visualization. To alleviate these problems, we applied the ModControl framework to create a new distributed interface and facilitate a more intuitive and flexible navigation for the large screen application. The user can now intuitively control the movement of the virtual camera using only one hand while moving around in front of the display. Also, the speed of the movement is controlled continuously, without the need to adjust it via additional buttons. The teleportation functionality is realized using the image selection module of the framework. When the corresponding tab on the mobile device is activated, the application client generates a texture of the map and sends it to the interaction client. There, the user can navigate on the map and select the target of the teleportation without interfering with the main visualization. By additionally employing the accelerometer module, the user can present the map to others by making a "pushing" motion towards the large screen, causing the map to be shown over the main visualization. The switching of layers and information overlays was transferred to corresponding switch list modules on the mobile client. Furthermore, several users with mobile interaction clients can control the system at the same time. To avoid confusion and conflicts in navigation, the features controlling the virtual camera are restricted to the first connected interaction client, while other interactions, like the toggling of layers, can be controlled by all users.
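As a sketch of how this demo could request its interface on the mobile side, the following Python snippet issues needModule messages for the modules mentioned above. The module and parameter names mirror the framework description; the identifiers, labels and extra parameters are hypothetical.

```python
# Hypothetical module requests the OSM-3D application client could send to a
# newly connected interaction client. Module names follow the framework text;
# identifiers, labels and extra parameters are illustrative only.
import xml.etree.ElementTree as ET

def need_module(name, identifier, label, hidden="false", priority="1", **extra):
    msg = ET.Element("msg", attrib={"class": "osm3d", "type": "needModule"})
    ET.SubElement(msg, "module", attrib={"name": name, "id": identifier,
                                         "label": label, "hidden": hidden,
                                         "priority": priority, **extra})
    return ET.tostring(msg, encoding="unicode")

requests = [
    need_module("3dNavigation", "nav", "Navigate", priority="3"),
    need_module("imageSelection", "teleport", "Map", priority="2"),
    need_module("switchList", "layers", "Layers", priority="1"),
    need_module("accelerometer", "push", "", hidden="true", updateRate="10"),
]
for r in requests:
    print(r)
```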
5 Conclusion and Future Work
In this paper, we presented ModControl, a flexible and versatile framework that enables personalized multi-user interaction with large screens using a module-based client on a mobile device. Applications can request specific modules on the mobile device and so take advantage of the sophisticated interaction possibilities of modern smart phones. By using a message server/client structure, an application can also address multiple clients at the same time, enabling personalized interaction for multiple users.
Currently, we are designing more specific evaluation scenarios to be used in a formal user study of our system. Also, we aim to provide additional interaction modules for other features of mobiles, e.g. supporting cameras or multitouch gestures.
References
1. Ballagas, R., Borchers, J., Rohs, M., Sheridan, J.G.: The Smart Phone: A Ubiquitous Input Device. In: IEEE Pervasive Computing, vol. 5(1), p. 70. IEEE Computer Society, Los Alamitos (2006)
2. Ballagas, R., Rohs, M., Sheridan, J.G.: Sweep and point and shoot: phonecam-based interactions for large public displays. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1200–1203. ACM Press, New York (2005)
3. Berger, S., Kjeldsen, R., Narayanaswami, C., Pinhanez, C., Podlaseck, M., Raghunath, M.: Using Symbiotic Displays to View Sensitive Information in Public. In: Proceedings of the Third IEEE International Conference on Pervasive Computing and Communications, pp. 139–148. IEEE Computer Society, Los Alamitos (2005)
4. Finke, M., Kaviani, N., Wang, I., Tsao, V., Fels, S., Lea, R.: Investigating distributed user interfaces across interactive large displays and mobile devices. In: Proceedings of the International Conference on Advanced Visual Interfaces, pp. 413–413. ACM Press, New York (2010)
5. Han, R., Perret, V., Naghshineh, M.: WebSplitter: A unified XML framework for multi-device collaborative Web browsing. In: CSCW 2000: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp. 221–230. ACM Press, New York (2000)
6. Madhavapeddy, A., Scott, D., Sharp, R., Upton, E.: Using Camera-Phones to Enhance Human-Computer Interaction. In: Adjunct Proceedings of the Sixth International Conference on Ubiquitous Computing. ACM Press, New York (2004)
7. Myers, B.A., Peck, C.H., Nichols, J., Kong, D., Miller, R.: Interacting at a Distance Using Semantic Snarfing. In: Proceedings of the 3rd International Conference on Ubiquitous Computing, pp. 305–314. Springer, Heidelberg (2001)
8. OSM-3D Germany, http://www.osm-3d.org/home.en.htm
9. Raghunath, M., Ravi, N., Rosu, M.-C., Narayanaswami, C.: Inverted Browser: A Novel Approach towards Display Symbiosis. In: Proceedings of the 4th IEEE International Conference on Pervasive Computing and Communications, pp. 71–76. IEEE Computer Society, Los Alamitos (2006)
10. Sas, C., Dix, A.: Designing and evaluating mobile phone-based interaction with public displays. In: CHI 2008 Extended Abstracts on Human Factors in Computing Systems, pp. 3941–3944. ACM Press, New York (2008)
11. Slay, H., Thomas, B.H.: Evaluation of a universal interaction and control device for use within multiple heterogeneous display environments. In: Proceedings of the 7th Australasian User Interface Conference, vol. 50, pp. 129–136. Australian Computer Society (2006)
12. Yee, K.-P.: Peephole displays: pen interaction on spatially aware handheld computers. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1–8. ACM Press, New York (2003)
A New Visualization Approach to Re-Contextualize Indigenous Knowledge in Rural Africa
Kasper Rodil1, Heike Winschiers-Theophilus2, Nicola J. Bidwell3,4, Søren Eskildsen1, Matthias Rehm1, and Gereon Koch Kapuire2
1 Department of Architecture, Design, and Media Technology, Aalborg University, Denmark
2 School of Information Technology, Polytechnic of Namibia, Namibia
3 CSIR-Meraka, Council for Scientific and Industrial Research, South Africa
4 Nelson Mandela Metropolitan University, Port Elizabeth, South Africa
Abstract. Current views of sustainable development recognize the importance of accepting the Indigenous Knowledge (IK) of rural people. However, there is an increasing technological gap between Elder IK holders and the younger generation, and a persistent incompatibility between IK and the values, logics and literacies embedded in, and supported by, ICT. Here, we present an evaluation of new technology that might bridge generations and preserve key elements of local IK in Namibia. We describe how we applied insights, generated by ethnographic, dialogical and participatory action research, in designing a structure in which users can store, organize and retrieve user-generated videos in ways that are compatible with their knowledge system. The structure embeds videos in a scenario-based 3D visualization of a rural village. It accounts for some of the ways this rural community manages information socially, spatially and temporally, and provides users with a recognizable 3D simulated environment in which to re-contextualize de-contextualized video clips. Our formative in situ evaluation of a prototype suggests the visualization is legible to community members, provokes participation in design discussions, offers opportunities for local appropriation and may facilitate knowledge sharing between IK holders and more youthful IK assimilators. At the same time, differing interpretations of scenarios and modeled objects reveal the limitations of our modeling decisions and raise various questions regarding graphic design details and regional transferability. Keywords: 3D visualization, indigenous knowledge, rural, Africa, design.
technological disparities between urban and rural, Western and Indigenous people lie deep tensions between the epistemologies of IK and those that underlie technology design [3]. Here, we step forward in this expedition by reflecting on our recent endeavors to build an IK management system with a rural community in Namibia, a Southern African country.
1.1 Rural-Urban Disparities
For many generations, rural communities in Southern Africa have acquired, produced and re-produced knowledge that sustains their lives and their environments. Practices and wisdom that respond to ecological and social contexts, and are locally validated, have enabled communities to successfully husband animals; cultivate and harvest plants; and process and conserve local resources. People share such Indigenous Knowledge (IK) orally by talking, telling stories and by participating in ordinary activities and rituals. However, various changes in Southern Africa, from education and employment to transport, have disturbed the processes of information transfer and threaten the persistence of elements of IK systems. Senior community members, or Elders, die without opportunities to pass on rural practices in ways that are accessible to younger members. At the same time, younger members encounter difficulties in undertaking activities that are essential for their well-being and survival and the health of their land without their Elders' supervision or advice. Namibia's mandatory education policies mean youth from remote areas are often sent to live with relatives in town, where they will remain for years and only return during their holidays [4]. This has several consequences for local IK systems. Firstly, formal education curricula and teaching practices differ significantly from the content and processes of IK. That is, knowledge is constituted within the social and ecological rhythms of daily life for the 12–25% of rural residents who have never been to school, but constituted in books and classrooms, according to subjects and study timetables, for those attending school. Secondly, in towns youth encounter modern technology and life-styles that contrast with those in their origin villages, which have poor sanitation, no grid electricity and sparse cell-phone coverage. After graduation, some people return to their villages to reassume roles in their origin communities but encounter an increasing divide [5,6]; for instance, while they have written literacy [7], other community members have a literacy about the land, and they might use technology to communicate with the "outside" world while other community members communicate according to local social protocols. Thirdly, many rural-to-urban migrants remain in cities for employment but save money in order to establish homes in their rural villages later on. However, ungrounded in the minutiae of rural living, a migrant's connection to rural habitat is shaped by globalization and urban power-relations; and, again, when they return with urban-generated assets, they re-contextualize rural practices. Now, keeping more livestock than before; now, travelling in vehicles, not on foot or horseback; now, listening to a radio and making detours to access a signal to use their phones, not listening to a storyteller around the fire [8]. It is hardly surprising, given differences between rural and urban literacies, that there are few reports about Southern African rural communities appropriating technologies to record or process their knowledge in text, electronically, graphically
or with videos by themselves. There are, of course, many interpretations of IK recorded by outsiders, such as historical and anthropological accounts and documentary videos. However, this type of media use is not constructed within the communication patterns of local people. Indeed, initiatives to locate technologies in rural knowledge practices are generally sparse, and the locale of technology production itself, sited in research labs and design studios in cities and industrialized regions, is a conduit for selective interpretations of rural life [9].
1.2 Acknowledging an Epistemological Gap
Various indigenous communities globally have appropriated multi-media technology to convey their local knowledge to wider audiences [8]. In doing so they respond to certain types of politics which privilege certain sets of social, technical and literary devices and establish certain design paradigms. Leveraging privileged sets can 'give' voice to marginalized peoples but, simultaneously, suppress and distort their knowledge traditions [10]. Consider how, to achieve 'development' agendas, people in 'underdeveloped' regions draw on formats derived from English-language journalism and project their lived world onto a 2D plane according to the affordances of cameras in digital storytelling (e.g. [11]), or re-present a set of oral stories in hypertext. These systems for inscription evolved beyond the IK systems of communities that share their knowledge orally, by talking and participating in everyday life, not by recording in print or electronically. Choices about what to record and how to represent and disseminate it are performative in producing knowledge. They are rarely domesticated into daily practice by rural communities and, thus, neglect, for instance, information residing in the performance, structure and form of oral practices or authoring relationships between teller and audience. Further, few design studies account for the situated dynamics as IK, narrative and representation entwine, or the ways people create meanings with, and about, new representations continuous with their cultural values, logics and literacies and their expectations about technology. The dominant paradigms embedded in ICT solutions re-produce urban and Western values, logics and literacies, and these are often incompatible with the values, logics and literacies of rural African communities. Different knowledge traditions organize and interact with information differently. That is, systems (from chronologies, taxonomies, and cartographies to authorship) do not merely translate knowledge between vocabularies but manifest a community's priorities and assumptions about reality. They draw upon implicit or explicit "theories" which encompass the kinds of relations and dependencies that do, or can, exist and their conditions of existence [12]. For instance, mainstream databases and representation and retrieval systems encode relations inherited from science and certain languages, such as hierarchies and tenses. These relations perpetuate particular perspectives on knowledge, whether through the structures embedded, ubiquitously, in computer filing systems, those that represent kin relations in family trees (e.g. in Facebook), or those that construct the world visually from an external point of view (e.g. Google Earth). And, through all of these, they are continuously shaped by writing traditions.
Dilemmas in designing technologies and media to serve marginalized knowledge traditions are not about whether local knowledge remains superficially the same but what values, logics and literacies are lost in transformation. Consider an Indigenous
Australian Elder's disappointment with a GPS system, which was designed to preserve his clan's knowledge on fire management but did not support the actions involved in "walking country" [8]. Consider also how a usability evaluation revealed that a sophisticated decision support system, based on ecological models, neglected the way that Herero farmers often draw upon their lived familiarity with their kinship in determining their trust of recommendations [13]. Over the three years of endeavors with the community mentioned here we have experienced similar incompatibilities between prototype technologies and members' information behavior. For instance, the rural Herero community was unenthusiastic about our attempt to use meta-data, extracted from their accounts, and printed text keywords to organize and retrieve video clips that they had collected [14]. Thus, we seek technologies that better align and reconcile with non-Western episteme, and alternative approaches to design for the ways the community normally communicates about knowledge.
1.3 Can Visualization Bridge the Gaps?
Studies on the use of a GUI by rural communities with strong oral traditions suggest that members can more easily identify cultural icons and visualizations than use text-based technologies [15,16]. Visualizations of culture have a history from the earliest humans (e.g. cave paintings), which suggests that modern visualizations, which go beyond graphic icons, offer opportunities to organize information in ways that are compatible with rural IK systems. For instance, consider how Native Americans explain current situations by drawing upon a collection of stories in which events always relate to places [17] and, then, consider how a 3D visualization of the places may offer an organizational structure that is compatible with the information conveyed in those stories. However, as noted in previous sections, for such visualizations to support the practices of IK we must account for the fundamental concepts on which they are built and the ways that community members interact with each other through the visualizations. While modern visualization tools can combine a plethora of photo-real, surreal and abstract elements in diverse visual realities, the compatibility of these elements and combinations with IK is only as good as their designers' understandings of local concepts. Any visualization of a place represents a selective set of abstractions, including logics about location and time, and these are by no means universal [17]. For instance, consider how 3D visualizations that separate geographical locations from temporality inadequately depict Arawakan people's stories about their journeys in Brazil [10]. Thus, producing a visualization of a place, such as a rural African village, requires compatibility with local concepts about location and time. Over the past decade a variety of visualizations have been created to depict IK. For instance, [18], amongst others, reports on an elaborate 3D geospatial representation built for traditional custodians of the land to tell their stories by allowing users to step, virtually, into the Aboriginal dream world. However, this visualization is mediated by design teams and lacks facilities for Aboriginal people to add their own stories [19]. Further, many evaluations of visualizations designed to assist communities assume that a visualization will be experienced by a user alone, rather than drawn into oral exchanges between several co-present users.
Here, we describe a prototype visualization of a rural village which aims to enable community members to organize information about local practices and wisdom. Our
design draws on detailed analyses of our extensive observations of these people’s spatial and temporal logics and literacies and interactions with each other and with media [20,12,21]. We begin by summarizing some of the ways our analysis informed designing the prototype. Then we describe how a 3D scenario-based visualization might enable local community members to upload, organize and retrieve their own video recordings of local stories and practices. Next, we present results from a formative evaluation of the visualization with the community; and, finally, we note insights on challenges of cross-cultural scenario-based visualization design that emerged in this endeavor.
2 Places and Representation
We developed the 3D visualization prototype as part of a long-term research programme which aims to implement Indigenous Knowledge Management systems to sustain the content, structure and communication of the IK of rural people of the Herero tribe. We chose a village in the Omaheke region in Eastern Namibia as the site for exploring the visualization because we can engage with this community continuously. The village consists of approximately 20 homesteads, each housing about seven people. The Herero include around 240,000 people living in Botswana and Angola as well as Namibia, where they are most numerous and constitute around 9% of the population. Our dialogical and participatory action research approach aims to involve community members in co-evolving the design space and exploring how multi-media technology might serve their knowledge system [21]. Thus, over the past three years, we have together acquired valuable design knowledge by mutual learning and discovery. We have undertaken a range of research activities including ethnographic observations, contextual interviews, participatory design sessions, technology probes and prototype evaluations. During this process we collected some 50 video clips, some recorded by researchers and some by community members. These videos include members telling stories, describing scenarios, demonstrating local practices and engaging in everyday activities. We have interpreted and reflected upon the videos with community members in various ways and also applied Grounded Theory to analyze their content independently [12,20]. In the following sections we summarize some of our insights on how this rural community manages information socially, spatially and temporally, which has implications for designing the visualization.
2.1 Social Significance of Places and Knowledge
Residents in the village identify with social elements of place and refer to locations almost exclusively in terms of social relationships [12]. They build their kin relations into the physical infrastructure (since they construct their homesteads themselves), and linking places to people seems to be a feature of Herero oral traditions and their Otjiherero language. For instance, praise-names or verses describe and pay tribute to the places Herero society inhabited before the German conquest, and have suggested a cartography of the landscape in relation to people or events [22]. It appears that a familiarity with social relations makes the environment legible. For instance, locations in and around villages are neither named nor signposted, although a vehicle
registration plate marks some homesteads. We also found that places, flora and other features are intelligible, and experienced, through their associations with social roles and daily activities. For instance, activities are gendered, so the fire is a place for men to eat and talk but for women to prepare food. Villagers also described their wisdom in terms of social relationships. They communicated about their knowledge using real, metaphoric or prototypical examples, and these always included relationships between people or between people and artifacts or settings. Residents frequently personalized information for the listener, explaining that "When you are telling a story with the intention of teaching you would want specific people to listen", and speakers judge the relevancy of information according to a listener's social roles. Conversely, people indicated that they trusted the integrity of information in relation to recognizing the speaker's pedigree. Further, those involved in recording video insisted that all clips should bear the village and participants' names. Determining the relevancy of information to a listener and a knowledge-holder's pedigree involves a deep acquaintance with an intricate web of trans-generational kin relations [20]. These trans-generational relations are reproduced in interactions with the environment. For instance, consider links between generations performed with respect to the "holy fire". This feature in some homesteads ritualizes Herero values about their society's coherence and respect for ancestors in giving life and guarding descendants. Men use the fire in rites (e.g. ceremonial slaughtering), healing and appealing to ancestors to address social tensions. The fire has distinctive locational characteristics, sited between the house and cattle corral and separated by stones or a hedge, but these vary between homesteads. The fire has a vital temporal dimension because men must ensure it is burning continuously to maintain favor with patrilineal ancestors, who may cause misfortune if displeased.
2.2 First Person Point-of-View
Community members refer to people and locations using points-of-view (POV) that are intrinsic, rather than extrinsic, to the world. This was very clear when we asked participants to spatially arrange thumbnail images taken from videos they made [12]. While they walked through dense bush directly to locations, they less confidently created a geospatially accurate aerial view despite the proximity of these locations. Community members scaled the map to the immediate area of the homestead where they sat and were reluctant to extend or re-scale it to include more of the village. They effortlessly sorted thumbnails to isolate those they wanted to arrange on their map, but spent more time gesticulating around the homestead and talking about people and activities in clips than mapping. That is, people related to their environment more acutely from a first-person POV. Indeed, Otjiherero often expresses the land as a continuum in which vegetation increases with distance from the speaker and a co-located listener. We propose that a POV that is intrinsic to the world may also relate to the social-relational space of the knowledge system. People often used spatial metaphors in speech when referring to relationships, such as describing people on "this side" or "looking in opposite directions", and this might reflect that their bodies orient spatially in social relations.
Certainly, community members' use of cameras recorded embodied relations between the cameraman and those recorded [20,12].
2.3 Spatial and Temporal Relationships
Villagers' paths inscribe knowledge into settings as they move between and beyond homesteads, and we have proposed that their movements structure their narratives [12]. Homesteads are approximately 2 km apart, separated by communally used pasture, and consist of huts within a fenced yard accessed by several gates and corrals. People undertake daily activities on foot or, occasionally, on horseback, and their movements often reflect the role of animals. For instance, men maintain camps to distribute cattle over large tracts of pasturage, and thus their movement between camps invests the setting with meanings. Closer to home, people can walk between their huts, the fire, on which women cook farmed animals, and the cattle corral, but must pass behind the hut to slaughter goats and cook game. That is, their paths are shaped by patterns of activities and, reciprocally, features and settings are legible via daily rhythms. For instance, a familiarity with livestock and daily and seasonal rhythms enables wayfinding beyond villagers' permanent homesteads. When uncertain of direction, villagers follow passages created by cows' paths, read the movement of cattle between rivers, villages and corrals, and recognize individual cows' footprints [12]. Often, when viewing videos, villagers made sense of their contents by drawing on oral cues that referenced seasonality, movement and sequence. Our analysis suggests that sequencing events, through movement, might enable remembering and recollecting personal and collective history, as villagers sometimes structured relationships between specific features in locations, people, livestock, and events within journeys. We also observed that community members recounted knowledge using spatio-temporal references and sequencing to order events and objects coherently. For example, one Elder referred, spatially, to a fruiting plant growing behind the homestead, before telling how to clean, cut and put its root in a calabash to sour milk, and also noted learning about a novel herb: "I went to a second place in the village and was looking after the goats that got lost" [12]. That is, villagers accumulate knowledge within sets of everyday activities, not by assembling separate observations at discrete times and locations.
2.4 Re-Creating Context
As noted in the previous section, we found that recording relationships between bodies, artifacts and settings is often difficult, and this constrains the knowledge a video can represent. Community members often indicated that the videos incompletely depicted knowledge. For instance, in watching a clip which shows, visually, the side on which a cow is milked, Elder-1 said it lacked information to explain "From what side, left or right?". Settings were not always legible, and villagers often found it easier to identify features from spoken sequences instead. Thus, while video records verbal descriptions, bodily actions and camera use, it excludes much information by projecting the world onto a 2D plane. Further, the action of recording interrupts social practices in information transfer. Indeed, one Elder reflected later that he should have interviewed more actively than he would have done in an ordinary exchange so that subjects would "explain to the video" what they were doing. Viewing clips provoked community members to note that they
needed to record more detailed explanations [20] or they added their own stories and more situational information to enrich the context for understanding. Villagers preferred to view video as a group and their exchanges about the videos further indicated the role of layering context orally to share their understandings.
3 Scenario-Based 3D Visualization Prototype
Our aim in creating a 3D visualization prototype is to produce a structure in which users can store, organize and retrieve an expanding corpus of user-generated videos in ways that are compatible with their knowledge system. The structure should enable rural Elders, who are considered by community members to be knowledge holders, to transfer information, asynchronously, to rural-to-urban migrants, whom we consider to be knowledge assimilators. To ensure that the information structure is compatible with the rural knowledge system, it must enable Elders, who may be textually illiterate and have little access to, and experience of, technology, to store and organize videos in ways that preserve important elements of social contexts. We use various mechanisms to afford elements of social context. Firstly, we embed videos in a recognizable 3D visualization of highly familiar features of the users' village and populate this with 'generic' models of people. We position videos at locations representing those where the video was filmed, and propose that users' familiarity with places in the village means that they will interpret represented locations within a social-relational space. For instance, knowing particular gender protocols associated with a location makes activities in that location intelligible and can guide a user's search for videos about those activities. Secondly, beyond graphically modeling the village, we create scenarios at the locations at which videos are embedded. The scenarios are animated models of people associated with an audio narrative, and we propose that, by simulating aspects of communication patterns, these provide resources for users to make appropriate connections in their social-relational space and restrict multiple interpretations. Thirdly, we did not prescribe paths between scenarios but rather modeled some of the paths that farmed animals follow through the village, to provide some of the patterns that villagers use to navigate their setting. In the final system, Elders will position new videos at the appropriate locations in the 3D environment for the younger rural-to-urban migrants to assimilate the content within the appropriate context. The current prototype was conceptualized and built by two new members of the project team (Authors 1 and 4) during their first visit to Namibia, in collaboration with researchers involved in the research programme from its inception, including a Herero man from the village (Author 6). We modeled the village from Author 6's specifications and by using reference photos and videos depicting specific landmarks [23], including Google Maps satellite images. We created the visualization using the Unity 3D game engine. The visualization consists mostly of trees and bushes and a few man-made objects, such as houses, fences, a water pump, fireplaces and car tracks (see Figure 1), which mark specific locations as seen in photos and videos. We populated the static environment with animals and people.
Fig. 1. The visualization including all the 3D objects constructed from references in the village
3.1 Resemblance and Animation

We focused on graphical resemblance of objects in the village and on geographical positioning to support community members’ recognition of locations. We assumed that community members were more likely to recognize homesteads if we modeled huts with high photo-realism, matching colors and shape and including details like flowers (e.g., see Figure 2). We invested considerable effort in modeling goats and cows, as both our ethnography and the activities participants most frequently chose to record suggest they are a focus of local knowledge. We attempted to model not only the forms of these animals but also their behavior, by animating their bodily movements and simulating the paths they routinely follow through the village. We also modeled more nuanced details, such as smoke from a burning fire.
Fig. 2. The house on the left is a 3D visualization of the one on the right
3.2 Scenarios and Embedded Videos

Scenarios, linking locations in the village to social activities and communication protocols, act as triggers to launch corresponding videos. The videos record community members telling stories, demonstrating everyday practices or engaged in ordinary rural activities. Community members, or various members of the team, recorded these videos over the past three years, and community members have reflected on and interpreted all the videos in various ways. Videos play on a 2D plane in the visualization (Figure 3). We selected scenarios based on the extensive amount of video depicting these activities. We incorporated video in five different scenarios, which encompassed tending or slaughtering goats, healing a calf, and milking or branding cows. Scenarios respond to some of the ways the local knowledge system contextualizes information in social relationships and particular communication patterns. Each scenario consists of an audio description, recorded by a community member, and models of four or five men grouped around the animal of concern. We reused multiple copies of a generic man model, but each instance behaves a little differently. We modeled the men’s gestures to suggest that they are pointing, indicating that a story is being told or that actions are imminent at that location. By clicking on the area where the men are gesturing, the user is presented with the recorded video playing on a 2D plane hovering over the area.
Fig. 3. Scenarios act as triggers to launch a video as a 2D plane in the visualization. Here the scenario includes Elders transferring information on branding cows and maintaining the herd.
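The prototype itself was built in Unity and the paper gives no implementation details, so the following Python sketch is only an illustration of the kind of mapping the scenarios imply: each scenario ties a village location, an audio narrative and a group of actor models to the video clip that a click launches. All names, coordinates and file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One scenario: a social activity staged at a location in the 3D village."""
    name: str             # e.g. "branding cows"
    location: tuple       # (x, z) position in the village model
    narrative_audio: str  # audio description recorded by a community member
    actors: list          # instances of the generic 'man' model grouped at the spot
    video_clip: str       # video launched on a 2D plane when the user clicks


# Hypothetical registry of the five scenarios described above.
SCENARIOS = [
    Scenario("tending goats", (12.0, 40.5), "tending_goats.ogg",
             ["man_1", "man_2", "man_3", "man_4"], "tending_goats.mp4"),
    Scenario("slaughtering a goat", (15.5, 38.0), "slaughter_goat.ogg",
             ["man_1", "man_2", "man_3", "man_4", "man_5"], "slaughter_goat.mp4"),
    Scenario("healing a calf", (30.2, 22.1), "healing_calf.ogg",
             ["man_1", "man_2", "man_3", "man_4"], "healing_calf.mp4"),
    Scenario("milking cows", (28.0, 20.0), "milking_cows.ogg",
             ["man_1", "man_2", "man_3", "man_4"], "milking_cows.mp4"),
    Scenario("branding cows", (33.7, 18.4), "branding_cows.ogg",
             ["man_1", "man_2", "man_3", "man_4", "man_5"], "branding_cows.mp4"),
]


def video_for_click(clicked_xz, radius=2.0):
    """Return the clip to launch when the user clicks near a scenario's actors."""
    for s in SCENARIOS:
        dx, dz = clicked_xz[0] - s.location[0], clicked_xz[1] - s.location[1]
        if (dx * dx + dz * dz) ** 0.5 <= radius:
            return s.video_clip  # played on a 2D plane hovering over the area
    return None
```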
3.3 3D Placement and Navigation

Users navigate around the village to the different scenarios using drag-able mouse interaction that controls the camera as a lifted first-person point of view. The camera is tilted to give users a better view of the social context without losing the first-person perspective. We also carefully positioned certain important objects, such as the holy fire, relative to the house, the kraal and the appropriate place for slaughtering.
4 Evaluation

We explored community members’ responses to our 3D scenario-based visualization tool during a three-day trip to their village by Authors 1, 4 and 6 and another member of our project team. We sought to discuss the system with residents and evaluate how accessible the system was for them. More specifically, we wanted to determine whether or not people recognized the graphical representation of their locale and could understand that the behaviors of the people models, within a depicted scenario, connected a location to activities and triggered narrative events. We also sought to gain insight into Elders’ opinions about the affordances of the scenario-based visualization and the associated embedded video in knowledge sharing, and whether young people’s use of the system might enable them to learn rural wisdom and skills. We scheduled two separate evaluation sessions, with Elder residents of the village and with younger rural-to-urban migrants who had returned to their village for the Christmas holidays. In both sessions we powered the laptop from a car battery, as the village has neither electrical power nor cellphone reception.

4.1 Community-Based Evaluation

We pursued a community-based evaluation as our studies show the highly collaborative nature of villagers’ knowledge practices [20,12,21]. As with all our work in the village, evaluation activities were conducted in Otjiherero (the villagers’ native language) and were facilitated by the researcher originating from the village (Author 6). Another Otjiherero-speaking researcher also translated, to provide updates on the flow of conversation to the developers (Authors 1 and 4), who do not speak Otjiherero. We recorded the entire sessions for post-situ analysis, with one static camera directed at participants and one free-floating camera. Author 6 translated the recorded videos in Windhoek, and an external person translated and transcribed the recordings to ensure comprehensive documentation.

We framed the evaluation with four main questions. Firstly, we sought to determine whether participants recognized that the visualization represented their village and could identify the site of the evaluation within the visualization. Secondly, could participants understand the scenarios in the visualization and relate these scenarios to the activities they represent; for instance, that men gather around a cow that is being milked? Thirdly, did participants understand that the people positioned in the scenarios were event triggers for the video, and did this logic make sense? Finally, we sought participants’ specific ideas or reflections on the visualization.
4.2 System Walkthrough with the Youth

Seven younger people, aged 10 to 18 years, agreed to participate in evaluating the system. All live in Windhoek most of the time but return to the village for holidays. The session lasted 28 minutes and commenced outside one of the newer homes in the village (see Figure 4). We encountered some dilemmas that were due to the setup. While amiable, participants were hesitant and stood too far from the laptop to see the application in the sunlight; but when we suggested moving into a house, we were reminded that people never sit and talk inside homes. After a little negotiation participants moved closer and were able to see the visualization; however, they remained reluctant to share their thoughts, and during the evaluation only two younger boys interacted with the system. Thus, we focused the evaluation on whether participants understood the 3D model and basic concepts about its use.

We began by exploring participants’ familiarity with computer use and introducing the aim of the system. Some said they had never used a computer before and others said they used computers only in schools in Windhoek. We broached the topic of sharing local knowledge by asking if participants knew what to do if a goat or cow is sick. A youth said: “We never heard a story for that”, so we (Author 6) pursued this by saying that the Elders are the holders of this knowledge and asking “if the Elderly die, what happens with this knowledge?”. The youth agreed that this knowledge “will die with them as well”.
Fig. 4. We demonstrated the system to seven younger rural-to-urban migrants
After their initial shyness the young people began to comment on the system and on what they saw. Their comments suggest that they connected the visualization to their imagery of various objects (or perhaps to pictures of objects), for instance, exclaiming
“People! Fire! Houses and trees!” We found that participants were able to navigate and ‘read’ the visualization and interpret objects using a tilted camera perspective. For instance, some of the youth recognized the 3D representation of the house outside which we conducted our evaluation: “This is here. It is this house.” Afterwards, they explained that they recognized it from the sides of the house and the fires in front of it. When participants panned over the cattle-branding scenario they immediately mentioned the branding irons, smoke and cattle, and then watched a video on branding cattle. We asked if the scenario was a story, and they replied that the cattle were in the kraal (corral) and that because a man was pointing, this indicated that he (the model) was telling a story. Another interesting discussion ensued after a youth panned over a scenario about slaughtering a goat. To begin with participants vacillated: “It’s a goat. It’s a sheep.”, but then suddenly agreed that sheep do not have small tails. We interpret this ambiguity as indicating both the importance of photorealistic quality and that the users’ and developers’ emphases on specific visual details differ. That is, we (Authors 1 and 4) emphasized man-made objects but not the very fauna that make for local livelihoods and settings.

4.3 Elders’ Comments on the System

We conducted a 24-minute evaluation session with Elders in the shade outside a home, where 2 village Elders, a group of 3 women aged over 35, and 3 younger people sat around the prototype. The Elders have no experience of computers and did not try to interact with the visualization, perhaps because of the set-up and composition of the group, or perhaps because the group was more used to watching videos on the laptop in our earlier work (Figure 5).
Fig. 5. We demonstrated the system to 8 rural residents including 2 Elders
The Elders observed the resemblance between the visualization and the place where the evaluation occurred; however, they commented on the visual accuracy of the features. For instance, when discussing the scenario near the cattle in the kraal, the Elders told us that planted trees do not grow inside the kraal and added: “It’s not a planted tree, it’s a wild tree (Otjindanda), just look the way it grows!” Our (the developers’) unfamiliarity with the local culture and environment meant we lacked an appreciation of the visual differences between cultivated and wild trees, or of where they might grow, and we made many assumptions when we drew upon reference images. Our failure to visually differentiate trees made the setting less legible to the Elders. This indicates that they not only understood the visualized objects but also recognized the importance of object placement. Lack of visual accuracy sometimes caused the Elders frustration and uncertainty about their own memory of the setting; for instance, based on photos we modeled one block of concrete, where villagers lay their fire, in front of a house. However, this hut has two blocks of concrete.

The Elders also noticed inaccuracies in the behaviors of modeled people. For instance, they understood the scenario about milking the cow but said that it looked as if the person was tying the cow. They also understood the goat slaughter scenario easily but, when we asked them who was telling the story, they said that it was the model in the middle of the group because “He is pointing and showing what to be done.” While this reassures us that the Elders reason about the visualized objects, it also indicates the need to invest substantially in designing and modeling fauna and objects to match local priorities. At the end of the session we asked for the Elders’ general impressions of the visualization and they said: “They are good in their look [quality] and in the sense that they will be kept there forever and they will never be forgotten.” We also asked their opinion about the system’s potential in teaching; they said: “Very much! Especially those township youth, they don’t know village stuff, they will learn from that.”
5 Reflections and the Way Forward

The evaluation suggests that re-contextualizing video clips in a recognizable 3D visualization of a familiar environment can potentially bridge epistemological, rural-urban, technological and generational gulfs. However, simultaneously, the evaluation shows differences between developers’ and users’ interpretations of visualized scenarios and objects. This leaves us with a number of challenges in the next phases of the research.

5.1 The Importance of Realistic Details

Both youth and Elders paid acute attention to specific details such as posture, gesture, animal features and behaviors, trees, and the positions of some objects, such as fire. These details seem to contribute to recognizing places as well as to the integrity of representing IK. In evaluations participants discussed, at length, the “wrong posture” of the milking people, who are supposed to kneel rather than stand. Such
conversations are critical in designing the system. Community members clearly indicated which details are important, which both re-emphasized local practices and identified the details that scenarios must include in order to enable sufficient social framing. References to specific details indicate that some items require greater accuracy, but our evaluation did not reveal the importance of photo-real properties, such as the color of houses or the size of the trees. Due to our unfamiliarity with the Namibian landscape, we (the developers, Authors 1 and 4) emphasized photo-real resemblance for man-made objects (e.g., the communal water pump, huts), assuming that these visually differentiated settings. However, as the Elders indicated, we neglected important features and distinctions between trees. This shows the vital importance of gathering data about people’s interactions with items in their environment, linguistically and visually, and translating this into the visualization.

5.2 Interpretations and the Need for Community-Driven Design

Interpreting visual representations depends on a viewer’s cultural background. For example, we intended the cow milking scenario to represent people milking cows and others watching. However, the posture of those milking and the gestures of those watching signified to community members that the scenario was about teaching milking. Drawing upon information sources such as photographs, videos and a community member’s description was a practical step in creating the visualizations; but, as indicated by the rich ethnographic depictions yielded by extensive stays in the village [12,20], they cannot match direct and collaborative engagement. This resonates with observations in various earlier activities about differences in visual recognition between community members and ourselves, and with community members’ comments that video incompletely depicts visual details. Thus, our next design steps are situated in the village, developing the visualization with the community members. These prolonged situated development periods will enable us to explore how to design the visualization to respond to the ways community members’ movements structure their narratives. Like various other visualization endeavors with indigenous groups [18], our current prototype allows users to explore the setting without forcing a linear narrative between scenarios. However, this might compromise the information contained in the sequences of movement that contextualize knowledge [12]. A situated approach to development not only provides continuous integrity checking but also insight into the dynamics of situated interactions between local IK and the visualization. Our previous analysis of community members’ interactions with video showed that the narrative structure of people’s comments and stories changed when local people viewed videos [12,20]. Such insights are important to avoid making design decisions that will suppress and distort local knowledge [10].

5.3 Further Features and Transferability

Our further development includes focusing on appropriate scenario design and on modeling the clothing, posture, gestures and movements of people. We will also explore time-location-activity relations, as our earlier, and ongoing, ethnography repeatedly shows the importance of daily rhythms to navigating the village and coordinating activities. We can realize shade movements easily in the game engine, and this will enable varying scenarios at locations according to time of day.
We will also layer audio of community members’ different interpretations of video clips, as our previous studies reveal that ambiguity decreases or increases when members add stories and information [20]. Currently, we propose that users click on one of the people in the scenario to access these perspectives. Such a feature might also contribute to personalizing information according to the user; however, we also need to explore differential access to scenarios, such that users interact with scenarios based on gender, kinship, age or other criteria [12] (see the sketch at the end of this section). Access to different layers is increasingly used in various visualization and mapping projects for Australian indigenous people, where the suitability of information transfer depends on place, age, clan and gender [8]. Investing in the specific details of a particular village has consequences for the transferability of the application to other villages. Our current prototype serves as a proof of concept, and we are eager to explore which features of the visualization can be re-used for other villages and for other regions. We will explore this by running further evaluations in other sites in Namibia.
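How such differential access might work remains an open design question; the sketch below is purely illustrative, assuming each recorded interpretation is tagged with simple attributes (minimum age, gender, kin group). The attribute names and rules are our own placeholders, not the community’s actual protocols.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One community member's recorded interpretation of a scenario."""
    speaker: str
    audio_file: str
    min_age: int = 0        # youngest listener the recording is intended for
    gender: str = "any"     # "male", "female", or "any"
    kin_groups: tuple = ()  # empty tuple means no kinship restriction


@dataclass
class User:
    age: int
    gender: str
    kin_group: str


def accessible_layers(user, layers):
    """Return only the interpretation layers this user may listen to."""
    allowed = []
    for layer in layers:
        if user.age < layer.min_age:
            continue
        if layer.gender not in ("any", user.gender):
            continue
        if layer.kin_groups and user.kin_group not in layer.kin_groups:
            continue
        allowed.append(layer)
    return allowed


# Example: a 15-year-old visitor hears only layers recorded for a general audience.
layers = [
    Layer("Elder-1", "milking_teaching.ogg"),
    Layer("Elder-2", "milking_ritual.ogg", min_age=18, gender="male"),
]
visitor = User(age=15, gender="male", kin_group="village")
print([l.audio_file for l in accessible_layers(visitor, layers)])
```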
6 Conclusion

Designing an appropriate visualization to support IK in rural Africa requires thoroughly understanding situated interactions between knowledge systems and audio-visual representations, and the conceptual frameworks bearing upon design. That is, to design digital infrastructures for currently unserved IK we must account for the transformations that take place as technology interacts with the lived experiences, actions and thought or spoken narratives that constitute knowledge [20]. We have shown the potential of embedding video recorded by, or with, communities within a scenario-based visualization to bridge some of the epistemological, generational, literacy and technological gulfs in Eastern Namibia. We claim that the prototype can help to re-instantiate IK by supporting information sharing. Embedding videos in a scenario-based 3D visualization is a first attempt to re-contextualize decontextualized representations in a locally accessible way. Merging information about activities and locations visually appears to be more intuitive for this African rural community than our earlier attempts with text-based and video-only retrieval. Our formative evaluation of the prototype in Namibia indicates that both Elders and younger people relate to the representation and can participate in discussions about improving and appropriating the system. We continue to explore some of the challenges that further development entails, such as the degree of detail, the scope of interpretations and transferability between communities. However, we hope our reflections on this novel structure for organizing video, according to a community’s familiarity and priorities, will stimulate international dialogue on the contribution that scenario-based 3D visualizations can make to other local knowledge systems.

Acknowledgements. We thank the residents of the village in Namibia for evaluating our prototype and for their continuous commitment and inspirational participation in the project. The work described in this paper was partly funded by Det Obelske Familiefond.
References

1. Winschiers, H., Fendler, J.: Assumptions Considered Harmful. In: Aykin, N. (ed.) HCII 2007. LNCS, vol. 4559, pp. 452–461. Springer, Heidelberg (2007)
2. Evers, V., Hinds, P.: The Truth about Universal Design: How knowledge on basic human functions, used to inform design, differs across cultures. In: Proceedings of the 9th International Workshop on Internationalisation of Products and Systems, Building Global Design Communities (July 2010)
3. Bidwell, N.J., Browning, D.: Pursuing Genius Loci: Interaction Design and Natural Places. Pers. Ubiq. Comp. 217 (2009)
4. Rumble, G., Koul, B.N.: Open Schooling for Secondary and Higher Secondary Education: Costs and Effectiveness in India and Namibia. Commonwealth of Learning, Vancouver, http://www.col.org/resources/publications/consultancies/Pages/2007-07S (accessed: November 2010)
5. Chinn, M.D., Fairlie, R.W.: The Determinants of the Global Digital Divide: A Cross-Country Analysis of Computer and Internet Penetration. Working Papers 881, Economic Growth Center, Yale University (2004)
6. Fuchs, C., Horak, E.: Africa and the digital divide. Telemat. Inf. 25, 99–116 (2008)
7. UNESCO: International Literacy Statistics: A Review of Concepts, Methodology and Current Data. Institute for Statistics, Montreal (2008), http://www.uis.unesco.org
8. Bidwell, N.J., Standley, P., George, T., Steffensen, V.: The Landscape’s Apprentice: Lessons for Design from Grounding Documentary. In: Proc. Designing Interactive Systems (DIS), pp. 271–280. ACM Press, New York (2008)
9. Bidwell, N.J., Browning, D.: Pursuing Genius Loci: Interaction Design and Natural Places. Pers. Ubiq. Comp. 217 (2009)
10. Green, L.J.F.: Cultural heritage, archives & citizenship: reflections on using Virtual Reality for presenting knowledge diversity in the public sphere. Critical Arts 2(21), 308–320 (2007)
11. Taachi, J., Kirran, J.: Finding a voice: Themes & discussion. Technical report, UNESCO (2008)
12. Bidwell, N.J., Winschiers-Theophilus, H., Koch Kapuire, G., Rehm, M.: Pushing personhood into place: Situating media in rural knowledge in Africa. International Journal of Human-Computer Studies (2011)
13. Winschiers, H., Fendler, J., Stanley, C., Joubert, D., Zimmermann, I., Mukumbira, S.: A Bush Encroachment Decision Support System’s Metamorphosis. In: Proceedings of the 20th Australasian Conference on Computer-Human Interaction: Designing for Habitus and Habitat, Cairns, pp. 287–290. ACM, New York (2008)
14. Kapuire, G., Winschiers-Theophilus, H., Blake, E., Bidwell, N., Chivuno-Kuria, S.: A revolution in ICT, the last hope for African Rural Communities’ technology appropriation. In: IDIA 2010, Cape Town (2010)
15. Parikh, T., Ghosh, K., Chavan, A.: Design studies for a financial management system for micro-credit groups in rural India. SIGCAPH Comput. Phys. Handicap. 73-74, 15–22 (2002)
16. Kuicheu, N.C., Fotso, L.P., Siewe, F.: Iconic communication system by XML language: (SCILX). In: Proceedings of the 2007 International Cross-Disciplinary Conference on Web Accessibility (W4A 2007), pp. 112–115. ACM, New York (2007)
17. Brewer, J., Dourish, P.: Storied spaces: Cultural accounts of mobility, technology, and environmental knowing. Int. J. of Human-Computer Studies 66(12), 963–976 (2008)
18. Pumpa, M., Wyeld, T.G.: Database and Narratological Representation of Australian Aboriginal Knowledge as Information Visualisation using a Game Engine. In: Proceedings of the Conference on Information Visualization (IV 2006), pp. 237–244. IEEE Computer Society, Washington, DC (2006)
19. Turner, J., Browning, D., Bidwell, N.J.: Wanderer Beyond Gameworlds. Leonardo Electronic Almanac 16(2-3), Part 1: Embodiment and Presence (2009)
20. Bidwell, N.J., Winschiers-Theophilus, H., Koch Kapuire, G., Chivuno-Kuria, S.: Exploring Situated Interactions Between Audiovisual Media & African Herbal Lore. Personal and Ubiquitous Computing (2010)
21. Winschiers, H., Bidwell, N.J., Blake, E., Chivuno-Kuria, S., Koch Kapuire, G.: Being Participated – A Community Approach. In: Proceedings of the 11th Participatory Design Conference, Sydney (November/December 2010)
22. Bubenzer, O., Bollig, M., Kavari, J., Bleckmann, L.: Otjiherero praises of places: collective memory embedded in landscape and the aesthetic sense of a pastoral people. Studies in Human Ecology & Adaptation 4, 473–500 (2009)
23. Haasbroek, J.L.: The challenges of utilizing intelligent human-computer interface technology in South Africa and other African developing countries. In: IEEE International Conference on Systems, Man and Cybernetics, Chicago, vol. 1, pp. 821–826 (1992)
Design Opportunities for Supporting Treatment of People Living with HIV/AIDS in India

Anirudha Joshi1, Mandar Rane1, Debjani Roy1, Shweta Sali1, Neha Bharshankar1, N. Kumarasamy2, Sanjay Pujari3, Davidson Solomon4, H. Diamond Sharma5, D.G. Saple6, Romain Rutten7, Aakash Ganju7, and Joris Van Dam8

1 Indian Institute of Technology, Bombay
2 YRG CARE, Chennai
3 Institute of Immunological Disorders, Pune
4 SHADOWS, Chirala
5 Catholic Medical Centre, Imphal
6 HHRF, Mumbai
7 Johnson and Johnson
8 Formerly Johnson and Johnson

{anirudha,mrane}@iitb.ac.in, {debjani.r,swetz21,ndm.madame,sanjaypujari,dimsh23,joris.w.vandam}@gmail.com, [email protected], [email protected], [email protected], {aganju,rrutten}@its.jnj.com
1 Introduction

Adoption of digital technologies, particularly mobile telephony, is on the rise in developing countries. For example, the number of mobile phones in India grew from 52 million in 2005 to 707 million in 2010 [1]. As penetration increases, a large number of new users can potentially get access to the advantages of digital technologies. Among other things, expectations are rising about what technology solutions can do to improve healthcare in these contexts [2-12]. However, as has been acknowledged before [13], [14], these new users, new cultures and new use contexts call for fresh approaches to the design of such solutions. Can new technologies support healthcare in developing countries? If so, what ought to be the nature of such a system, and under what constraints will it need to work? We investigate information gaps, problems, and opportunities for design of technology solutions for supporting treatment of people living with HIV/AIDS (PLHA) in India. Our objective is to develop a deep understanding of the ecosystem of a cross section of the PLHA in resource-limited settings, to inform the design of a technology solution that could be effectively used as a healthcare intervention tool, and to probe how such solutions could potentially influence patient behaviours.
2 Background

2.1 About HIV, AIDS, and ART

The spread of Human Immunodeficiency Virus (HIV) and Acquired Immunodeficiency Syndrome (AIDS) has reached epidemic proportions, particularly in developing countries. As of year 2009, an estimated 33.3 million people were infected by HIV, of which 22.5 million live in Sub-Saharan Africa and 4.9 million in Asia. India accounts for 2 to 3.1 million estimated cases [15]. The HIV virus spreads through exchange of infected body fluids, mainly during unprotected sexual intercourse, during blood transfusions, by using an infected syringe needle, or from a mother to her baby. HIV attacks the immune system of the infected person, causing a decrease in the number of CD4+ T helper cells in the blood, resulting in immunologic decline and consequent opportunistic infections (OIs). In the period immediately after the infection, a patient may suffer from some symptoms, followed by a clinically latent period, where the patient can remain asymptomatic for several months to years. However, over the long term, the patient’s plasma viral load (PVL) increases and correspondingly his CD4 count drops. When the CD4 count falls below 200 cells per mm3, the patient is especially prone to OIs and is considered to have entered the AIDS stage [16], [17].

As of now, HIV is not curable. Once infected, there is no way to eliminate the virus from the body of the patient. The treatment of HIV consists of antiretroviral therapy (ART), a set of drugs that act at various stages of the life cycle of HIV and work by interrupting the process of viral replication. After initiation of ART, patients may experience adverse symptoms. Some of these occur due to Immune Reconstitution Inflammatory Syndrome (IRIS), a condition where the body begins to resist the infections due to partial immune reconstitution (which is a positive sign, but
one that a patient could misinterpret). Some symptoms occur due to the side effects of the ART drug. Often, side effects are minor and patients overcome them in time, or with the help of medications. However, in some cases, the symptoms are severe or the ART drug is not effective, and the patient needs a change in the ART regimen [17]. Once a patient is stabilised on ART, his PVL reduces and his CD4 count increases gradually, and significant immunological improvement may be seen over the period of one year. Studies have indicated that with 95% adherence to ART for one year, viral suppression occurs in 80% of patients [18]. In developed countries, the average life expectancy of the PLHA has increased from 10.5 years in 1996 to 22.5 years in 2005 after the introduction of ART [19]. ART has managed to convert HIV from a “virtual death sentence” to a “chronic manageable disease” [17].

Over time, the virus develops resistance to a particular ART drug and the PVL increases again. Resistance may develop faster in case the adherence is poor or if the PLHA discontinues ART [20]. When this happens, the patient needs another line of ART drugs. Several lines of ART drugs have been developed, but the drugs of the “first line” are usually the cheapest and most widely available [17], [21]. The first regimen also has the best chance of being a simple regimen with long-term treatment success [22]. Hence, it is highly desirable that once a patient is initiated and stabilised on ART, he should take his pills regularly, not only for his own sake, but also to avoid the emergence of drug-resistant viral strains [23]. PLHA are counselled about potential side effects and the importance of adherence before initiation on ART [17].

2.2 Prior Studies on Treatment Effectiveness

Adherence is an important factor in the effectiveness of ART and has been a topic of many investigations [18], [20], [23-28]. Costs of ART and the financial status of the patient play a major role, particularly in resource-limited settings [25]. But giving away free medication may not be a silver bullet. A study in private clinics in India found that while financial reasons were the ones most commonly cited for breaks in treatment, respondents receiving free drugs had lower adherence [23]. Severe depression, less education, being unemployed, high CD4 count, hospitalisation, side effects, and pill burden were the other factors associated with lower adherence. Reasons for missed doses were “ran out of pills”, “travelling away from home”, “felt sick or ill”, “simply forgot”, or “busy with other things”. Factors that help in improving adherence include the patient’s knowledge about side effects, belief towards ART, having developed reminder tools for taking medication, and the patient’s trust and confidence in the doctor [26]. “Directly observed therapy” is known to have improved adherence and treatment effectiveness [27]. Studies have shown a positive correlation between family support and adherence [28], though family support is usually less in the case of PLHA because of the stigma.

The importance of adherence and the need to support the patient in improving adherence has also been researched in domains beyond HIV [29-31]. Klein et al. studied adherence to regimens for four chronic illnesses in natural settings and proposed a model for supporting adherence [31]. They describe an “adherence loop” in which, first, the patient needs to believe that he has an illness and develop a mental model of the condition and the therapy.
Next, he needs to know what he needs to do, when and how, while strengthening his mental model. Finally, he needs to act, and have the cognitive, physical, emotional, and financial ability to do so.
There have been several efforts to use technology in HIV treatment, particularly in resource-rich countries. After a meta-review of computer-based interventions for HIV prevention, Noar et al. conclude that such interventions have similar efficacy to human-delivered interventions at lower costs and with higher flexibility [11]. Interventions based on mobile phones have also been found to be effective [4], [8], [10]. On a dissenting note, one study shows negative results after the use of a dedicated portable alarm, mainly associated with device failure and confidentiality concerns [32]. Early on, mixed opinions were expressed about the potential of mobile phone interventions in resource-limited settings, mainly due to cost apprehensions [3]. As phone penetration improved, authors have become more optimistic. Two pilot experiments in Uganda and Kenya report that using phones for communication between PLHA, primary health workers, and the clinic staff improved adherence and gave the PLHA a feeling that “someone cares” [6], [7]. Project Masiluleke in South Africa reports increased calls to the HIV helpline after they sent SMSs to a large number of people [9]. Cell Life uses mobile phones for mobile data collection, and for sending messages for HIV prevention, positive living, linkages to clinics, text counselling, and supporting ART in South Africa [12].

There have been studies focussed on parts of the ecosystem of PLHA in developing countries. A survey of PLHA in Kenya found that though 89% of PLHA used phones, only 12% had ever used one for a healthcare purpose, but 54% indicated that they were comfortable receiving HIV information on the phone [2]. A qualitative study among PLHA on ART in Peru reports usage of the internet and cell phones [5]. It concludes that information on the internet could be overwhelming, and that health interventions based on mobile phones using SMS and voice reminders had many advantages in supporting adherence. (However, this study was not exactly in resource-limited settings. All participants were high school educated and were regular internet users.)

2.3 The Objective

While the use of technology solutions as healthcare intervention tools has been studied in a variety of settings, sufficient research has not been conducted to understand the settings in which such solutions would be used, and what such solutions should look like in those settings and for those patients to achieve an optimal effect. Particularly, there have been no context-of-use studies that look at the whole ecosystem of patients in resource-limited settings to inform the design of interventions. The objective of this study is to understand the ecosystem of the PLHA in India, including the financial implications, the operations of HIV clinics, the information needs of PLHA, their pill taking and adherence behaviours, the social, cultural, and economic issues affecting the treatment, and the factors affecting the access and acceptance of technology solutions.
3 Method

We started the project with visits to five private-sector HIV clinics across India. In each visit, we did a clinic walk-through followed by fly-on-the-wall observation of the clinic operations. We observed in the waiting rooms, doctors’ and counsellors’
rooms, the pharmacy, and the pathology labs. This was followed by detailed discussions with stakeholders, mainly the doctors and the counsellors. An ethics committee approval was obtained to interview PLHA. From the five participating clinics, 64 PLHA on ART were recruited. After taking an informed consent, we conducted a semi-structured interview with each PLHA in a language they preferred. Some of the moderators were local HIV counsellors, while others were part of the design team from a university. Female patients were interviewed in the presence of female moderators. The interviews focussed on the following:

• Management of ART regimen: How does the PLHA currently manage his ART regimen? What difficulties does he face doing so in different stages? Has the PLHA devised any techniques to improve his adherence?
• Information needs: How did the PLHA find out information about HIV / ART? What information gaps remained in spite of current efforts? How did the PLHA cope with the large amount of information? Which problems occurred due to lack of information?
• Socio-cultural issues: How did the PLHA deal with social issues such as stigma? What is the role of the ecosystem, i.e. clinics, relatives, friends, NGOs, chemists, family physicians, in treatment and adherence?
• Mobile phone usage: Does the PLHA have access to a mobile phone? What does he use it for? Does he use it in any way to improve adherence? Which opportunities and barriers exist in providing health information on a phone?

Given the sensitivity of the subject and stigma issues, the interviews were conducted in a consulting room at the clinic. If the PLHA consented, interviews were recorded in audio and some relevant artefacts (such as pillboxes, charts, prescriptions, and reports) were photographed. After the interviews, some PLHA agreed to a home visit by the moderators, where we observed relevant artefacts. Some PLHA who reported poor adherence were given adherence assistance. For example, some were given seven-day pillboxes. Some PLHA with a high pill burden were helped to draw charts to record their adherence. For PLHA with mobile phones, a Google Calendar SMS reminder was set up at their pill times (a sketch of the underlying idea follows at the end of this section). Interviews were translated if necessary and analysed using a grounded approach described by [33] to identify the important user statements, observations, insights, breakdowns, and design ideas.

After a gap of 1-2 months, 25 PLHA were re-interviewed. The objective of the second interview was to explore any treatment-related events that may have occurred since the first interview (including symptoms, missed pills, or delays) and to ask follow-up questions resulting from the analysis. The usage and perceptions of the adherence assistance provided to the PLHA at the time of the first interview were also probed. Within 3 months after the first interview, all personal details of the PLHA such as photographs, recordings, and contact information were removed from our records as per the requirements of the ethics committee. After the fieldwork and analysis, a bottom-up affinity of the findings was built to discover patterns, resolve differences, and identify higher-level insights and design opportunities. These were reviewed in two workshops with doctors, counsellors, NGO representatives, technologists, and designers.
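The study only states that Google Calendar SMS reminders were set at each participant’s pill times; it does not describe how these were configured (most likely through the ordinary Calendar interface). Purely as an illustrative sketch of the underlying idea, a recurring daily reminder per dose delivered by SMS, the snippet below computes the upcoming reminder instants and hands them to a hypothetical send_sms() gateway; it is not the study’s actual setup.

```python
from datetime import datetime, timedelta


def send_sms(phone_number, text):
    """Hypothetical SMS gateway; in the study, Google Calendar handled delivery."""
    print(f"SMS to {phone_number}: {text}")


def next_reminders(pill_times, days=7, now=None):
    """Yield (datetime, label) pairs for every dose in the coming `days` days."""
    now = now or datetime.now()
    for day in range(days):
        d = (now + timedelta(days=day)).date()
        for hour, minute, label in pill_times:
            due = datetime(d.year, d.month, d.day, hour, minute)
            if due > now:
                yield due, label


# Example regimen: one pill after breakfast, one before bed.
regimen = [(8, 30, "morning ART pill"), (21, 30, "night ART pill")]
for due, label in next_reminders(regimen, days=2):
    # A scheduler (cron, Google Calendar, etc.) would fire this at `due`;
    # here we only show what would be sent.
    send_sms("+91-XXXXXXXXXX", f"{due:%H:%M}: time for your {label}")
```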
3.1 Clinics and the PLHA Profiles

The clinics and locations in our study were identified in close collaboration with an expert panel of HIV clinicians from across India to ensure representation and diversity of the PLHA population. The locations included two large metros (Mumbai and Chennai, population 10 million+ each), a large city (Pune, population 3.5 million), and two small cities (Imphal and Chirala, population 200,000 each). Our locations covered 4 of the 6 states in India with high prevalence of HIV. All the clinics in the study are in the private sector, within which they represent different models of operation. Two clinics were consulting offices of senior HIV physicians, their assistant doctors, and some staff. Another was an HIV department in a 405-bed hospital, with 7 HIV specialist doctors, 5 counsellors and an HIV ward with 21 beds. The fourth was an HIV unit in a 150-bed charitable hospital that provides consultation for a nominal fee of USD 0.33. The fifth was located among a cluster of services that included a private clinic of a senior HIV physician, a private nursing home, a rural primary health centre, and a follow-up facility for government ART PLHA. Each clinic included a pharmacy to dispense ART drugs. A pathology facility was either a part of each clinic or was located near it. Some clinics had professional counsellors, while in others, the doctors did the counselling.

We also sought variability among the 64 PLHA interviewed. There were 40 men and 24 women. Age varied from 26 to 55 and the average was 37. Education varied from less than primary school (16, including 9 illiterates) to graduate or more (9). Annual family income varied from less than USD 1,333 (19) to more than USD 5,333 (9). Occupations included low-income salaried workers such as drivers and security guards (17), homemakers (17), those with white-collar jobs (9), shopkeepers (8), farmers (6), jobless (4), and daily-wage workers (3). A majority (41) were diagnosed HIV positive less than 5 years ago, 16 were diagnosed 5 to 10 years ago, and 7 more than 10 years ago. At the time of the first interview, 18 PLHA had been on ART for less than 3 months, 24 from 3 to 12 months, and 22 for more than 12 months. PLHA were reasonably well distributed across behavioural variables, including levels of adherence, family support, knowledge about HIV and ART, and mobile phone usage.
4 Findings and Implications for Design

4.1 Clinic Operations

Visiting Clinics is a Burden on PLHA’s Time
We found opportunities to speed up clinic operations and minimise the PLHA waiting time with the help of an information system. HIV clinics are crowded places. One reason for this is that good HIV clinics are few. The PLHA spend a long time in the waiting room. A consultation may turn out to be unpredictably long. At times, the doctor may determine that he requires additional diagnostics before he can make a decision. The counsellor may not be sure whether the PLHA knows enough, so she may repeat information that the PLHA already knows. A doctor remarked, “Our appointment system breaks down every day”.
Secondly, some PLHA consider a tertiary HIV clinic as a place of primary care. For example, a PLHA travelled 200 km to an HIV clinic because of a minor fever. All he was prescribed was a paracetamol, which he could have easily procured near his home. Could a phone-based system be used to help the patient decide whether they should go to the specialty clinic, or just to a local pharmacy or a family doctor? We found that the clinics are willing and very interested in providing such phone consultations for stable PLHA with a known history and minor complaints, but need technology support to do so.

The time that PLHA spend at a clinic may discourage follow-up. Some PLHA stable on ART have been purchasing pills from the pharmacy, but have skipped meeting the doctor for a long time. Solutions also need to be culturally grounded. Clinics usually do not pro-actively follow up with PLHA. This seems to be a cultural practice in the medical profession in the private sector. The doctor may suggest a date during consultation, but it is up to the PLHA to determine the actual date of his next visit. Asymptomatic PLHA tend to postpone their clinic visits. For example, a PLHA gave his blood for a CD4 test on the doctor’s advice, but did not turn up to collect the reports and follow up for 3 months. All clinics were equipped with computers, but only a few had systems for managing clinic visits. Information systems could speed up clinic operations and implement best practices across clinics (e.g., trigger automated follow-up reminders to PLHA for clinic visits) within the current cultural norms.

Poorly Maintained Health Records
Technology solutions can help digitise the reports and the prescriptions securely and make them available to doctors across clinics on demand. Many PLHA (particularly the educated ones) meticulously maintain their reports and prescriptions. On the other hand, some PLHA deliberately burn or tear up their papers due to stigma and disclosure issues. Others retain only the latest prescription. At times, some PLHA forget to bring their papers to the clinic and the doctors have no reference to treatment history. Some clinics maintain a digital or paper copy of the prescriptions and reports, but this leads to additional data entry, data duplication (and possible inconsistency), and slowing down of the clinic operation. In some cases PLHA hop clinics, because of treatment failures, financial constraints, or seasonal migration. Unfortunately, there are no arrangements for the PLHA to move their records digitally between clinics.

A related issue is the language of the reports and the prescriptions. Records are in English, and often cryptic, so they are primarily useful to the pathologists, the doctors, and the pharmacists. Many PLHA do not read English. If they were to maintain their records, these would be of little use to them personally. Digital records would not only make data available at all times, but also speed up the clinic operation and improve communication with patients. Versions of the reports and prescriptions can be translated, elaborated and simplified in a language that PLHA understand, and made available through a medium that they can use.

4.2 Financial Issues

Technology solutions need to operate under tight financial constraints, but can also empower the PLHA with prudent financial planning. The cost of ART varies from
very expensive to almost free, depending on where the PLHA gets his treatment. In the private sector, the current cost of the drugs could vary from USD 266 to USD 5,866 per year. Additional expenses are required for diagnostics (CD4 tests, viral load tests, resistance tests, etc.) and the doctors’ consultation. These can add up to a substantial percentage of the annual income for many. (India’s average nominal per-capita annual income is about USD 1,176 [34].) On the other hand, treatment has been available for free in government hospitals since 2004 [17]. (It must be pointed out that the PLHA in this study were recruited in private clinics. PLHA taking treatment exclusively at government hospitals were not represented. They might have different perceptions.)

As discussed above, free treatment may not be the solution, as free treatment was found to be correlated with poor adherence [23]. One reason for this could be non-awareness of treatment costs. Our sample did include PLHA who got free treatment. Among these, some PLHA took consultation (or second opinions) from a private physician, but got their ART drugs and tests from a government hospital. Others were associated with clinical trials at the private clinics and were entitled to free medication. Many of these PLHA were not aware of the cost of their medication. Further, even the PLHA paying for their treatment were not aware of the cost of the next line of treatment, which they would have had to take had they failed their current regimen. Technology solutions should make the costs of ART transparent, even when treatment is free. Some PLHA perceived that more expensive medicines were “serious” or good and, given a choice, chose these if they thought they could afford them. Doctors and counsellors pointed out that some people may agree to expensive arrangements to begin with, but cannot keep up as the expenses mount.

Costs other than the treatment are also significant. PLHA need to make at least 4 trips to the clinic every year [17]. For each trip, a PLHA has to bear the cost of travel and the loss of half a day’s to two days’ wages. Due to the stigma, some PLHA prefer to travel to a clinic further from their residence, in the hope of retaining anonymity. Of the 64 PLHA, 24 travelled for more than 1 hour each way to reach the clinic, of which 13 travelled for 4-8 hours each way. Further, PLHA on ART need to take good nutrition and improve their lifestyle. Unfortunately, these changes are beyond the means of some. When asked whether he drinks boiled water as advised by the counsellor, a PLHA from the lower-middle class found the idea laughable – it was way beyond his means to do so. Unfortunately, some people are not able to foresee and plan their expenses for lifelong ART. For example, a farmer on ART with very limited means, interviewed on his second trip to the HIV clinic, had had to sell a cow to meet the expenses of each of his trips. He had another 4-5 cows, but did not have a plan for what would happen after he sold them all. When probed, all he could say was “we shall see when the time comes”.

Technology solutions could help PLHA plan their finances better, but they need to operate under existing constraints. Adding devices, special mobile phones, or services may not be scalable, as resources are already scarce. Giving away devices may not work even if funding could be found, partly because of the possibility of a stigma on these devices, and partly because it would add another disruption to the life of the PLHA.
Solutions could instead try to leverage existing infrastructure (such as ordinary mobile phones) or shared resources (such as waiting rooms in clinics).
4.3 Awareness Issues and Information Gaps

Insufficient Counselling
Interactive learning tools could complement, support and personalise current counselling efforts. Counselling the PLHA is important on diagnosis and before initiation on ART. Clinics do counsel on both occasions, but counselling is difficult at these times as the PLHA is in a state of shock and is not in a condition to absorb information. Secondly, the field of HIV has too much information, not all of which may be relevant to an individual. Thirdly, counselling a large number of PLHA is tedious, and after a while, the counsellor could get fatigued. Doctors and counsellors spend a lot of effort repeating similar information to each PLHA. However, an individual PLHA gets very little time with the doctor or counsellor to absorb information and ask questions and doubts. There is always a long queue and time pressure. Further, counselling once is not enough, particularly for less educated and older PLHA, who require a lot of repetition and redundancy. Group counselling (as practiced in some government clinics) can reduce the load to some extent. Nevertheless, it is less effective than individual counselling, as PLHA do not speak up or raise doubts in a group.

It was observed that HIV brochures and other material in the waiting rooms of the clinics were often left untouched by PLHA. This could be partly because of low levels of literacy, but partly because the brochures are not designed to address the needs of individuals. We came across a government ART centre that recently started screening videos related to HIV and ART in the waiting room. This seemed more effective and it was observed that the PLHA were more attentive. Most clinics, however, screen TV channels for entertainment in their waiting areas. Informational videos and other forms of self-paced learning material could be good tools to prime the PLHA before the counselling and repeat the information after the counselling. These would optimise the time that the PLHA spend in the waiting rooms, and leave more time for personalised discussion during the counselling session. Tool-use logs could give hints to counsellors on where they need to focus.

Bombarding information in one direction may not be good enough. It was observed that asking PLHA probing questions about their knowledge of HIV during the interviews triggered their curiosity and encouraged them to ask the moderators questions. We suggest that the learning tools could quiz the PLHA about their knowledge. This would not only help evaluate the effectiveness of the counselling and identify weak areas, it would also prime the PLHA to absorb more information.

Information needs to be localised beyond language. PLHA use different terminologies to describe the same symptom. Food, meal timings, and lifestyle vary between states, and because of professions. For example, in Imphal, most people have lunch at 8 am and dinner at 8 pm, and no other meals in between. Nutritional advice and pill times will need to be adjusted accordingly. Technology could be used to repeat counselling, to personalise it according to literacy levels and medical conditions, to localise it in terms of language, food, profession and culture, and to make it available on demand and in a form that people, including those with less education, can follow. A technology-based solution could also ask questions, give feedback based on responses, and evaluate and track the
effectiveness of the communication. As we discuss below, it could also help PLHA connect with other experienced PLHA anonymously to get their questions answered.

Unfamiliar Terminology
Solution designers need to consider that some PLHA may not be aware of relevant terminology. It was found that some clinics referred to the ART drugs they prescribed simply as “HIV medication”, perhaps with the good intention of avoiding jargon and simplifying communication. As a result, though, some of the PLHA on ART had never heard of the term ART. Others had heard the term ART, but they associated it with the free medication available in government hospitals, or with the ART centre in government hospitals. Such a PLHA is likely to ignore all communications about “ART”, as he would consider these irrelevant. Similarly, many PLHA were not familiar with relevant terms such as CD4, viral load, or the virus. Not knowing common terms, or not understanding their precise meaning, prevents PLHA from communicating with each other (as we discuss below).

Issues with Pill-Taking and Adherence
Solutions are needed to help PLHA take their pills and improve adherence on a day-to-day basis. It was observed that the pharmacists often explain orally to PLHA how to take their pills while handing them over. They tend to do this if the PLHA is newly on ART or if he asks for an explanation. Yet, a few PLHA were not completely aware of how they should take their drugs. For example, one woman did not know that her daughter’s pills were dispersible, and had a difficult time making her swallow them. Another woman was not aware that her pill was supposed to be taken just before going to bed – she complained of giddiness, a side effect. Another woman cut her pill into two and had it twice a day (a wrong practice), because she had difficulty in swallowing it whole. Patients with complex regimens (with 5+ pills per day) needed visual support (such as a chart or timings written on pill bottles), especially in the first few weeks.

Many of the less educated PLHA could not recall the names of their drugs. They recognised and differentiated between their pills visually – mainly by colour, shape, bottle, or a few letters on the label. Confusions did arise if a company changed its pill design, or when the pharmacy provided a different brand of the same drug. In a system meant to support PLHA in pill taking, reading out drug names from the prescription may not be enough. Additional cues such as a photograph or the shape and colour of the pill should be communicated. These should be accompanied by pill-taking instructions, health tips, and answers to frequently asked questions.

Many PLHA reported having missed a few pills, but pill counts revealed that they often underestimated the missed pills. It was found that many PLHA faced difficulty in counting the days since their last visit to the clinic, accounting for the number of missed pills, and then estimating the number of pills left. One of the PLHA who found this task difficult was a carpenter, who was otherwise able to make complex calculations routinely. It was observed that some older, less educated PLHA faced difficulties in reporting their age or in counting cash. A system should work within these constraints and provide personalised data and feedback about pill counts and estimates. People for whom the SMS alarms were set up were happy to get their SMS beeps (though some never read the SMS – “no one I know would ever send me an SMS”).
These PLHA were disappointed when we informed them that their alarms had to be discontinued 3 months after the first interview (as per the ethics committee requirement). These personalised pill reminders had clearly helped the PLHA and were not considered intrusive. Even those who were already using mobile phone alarms were happy to receive SMS reminders, as the redundancy ensured that the system was fail-safe.

4.4 Knowing Facts and Procedures or Understanding Concepts?

Solutions need to help the PLHA build a mental model of HIV and ART, apart from informing them about facts and procedures. In spite of several awareness issues and information gaps, some factual and procedural knowledge about HIV and ART is slowly getting built up among PLHA. However, many have a poor conceptual understanding of HIV and ART. For example, some PLHA know that ART is the treatment for HIV (a fact), and that they should take their pills regularly (a procedure), but cannot explain why they need to take their pills lifelong, or what would happen if they missed their pills or took them late. As Klein et al. suggest [31], unless the PLHA develop a mental model, they would not believe in the therapy and would find it difficult to understand what to do, particularly in a new situation.

HIV – “Everybody knows, but nobody understands”
Through the awareness campaigns in several media, many have heard of HIV. However, few understand how HIV actually works. When asked what HIV was, the first reaction of a PLHA was “everybody knows what HIV is”. When the moderator insisted that he explain it to him as if he were a newly infected, naive person, he quickly replied that he “did not know that much”. People usually think about a disease in terms of its symptoms. They try to describe HIV in terms of darkening of the face, weight loss, body ache, death, etc. Unfortunately, in the case of HIV, different PLHA may face different symptoms, so they find it difficult to come to grips with this disease.

Most PLHA (though not all) had reasonable procedural knowledge. They knew about the modes of transmission of HIV (unsafe sex, injection needles, razors, blood transfusion, and mother-to-child transmission). However, they could not explain why HIV spreads through these modes and not through other forms of contact. Many PLHA betrayed their conceptual confusions (or possibly denial). One PLHA thought he was infected because he “donated blood”. A PLHA responded, “I don’t know how the virus came in our house – we always keep it clean”. Another wanted to know from the moderator whether she could cook for her family or serve food. Some had become overcautious about hygiene and avoided touching children, shaving in salons, sharing utensils, or having sex.

CD4 – “Batting Without Knowing the Score”
Many PLHA had heard about a CD4 count, though some had gotten it mixed up with viral load. Some remembered the numbers (“87”), but were not sure if more CD4 was good or bad, or what a good value should be. Others were not sure what CD4 was and why they needed to be tested for it so regularly. To draw an analogy with cricket, these people are batting without knowing the score.
Since it is usually not possible for clinics to proactively follow-up PLHA, the onus of going to the HIV clinic regularly is on the PLHA. If the PLHA is non-symptomatic and not yet on ART, he may postpone visiting the clinic though his CD4 count could be dropping. None of the PLHA was aware that the CD4 count has a possibility of 20% error and PLHA with counts below 250 should test frequently [17]. The duration between CD4 tests for many was longer than the prescribed 3-6 months. Many had OIs at the time of initiation of ART. To continue with the cricket analogy, these batsmen started trying to catch up on their run-rate after the asking-rate had gotten out of hand. ART, Adherence, “Scolding”, and “Side Effects” Many PLHA had factual and procedural knowledge about ART. They knew that ART “improves health”. They also knew that they have been told to be adherent to their pills and to take them on time. However, when we asked why they should be adherent, a frequent answer was, “otherwise the doctor will scold me”. Only procedural and factual knowledge is not enough when one has to handle a situation that one has not been prepared for. Many reasons for non-adherence seem to be related to a poor understanding of ART. For example, half an hour after the time for the dose, a PLHA remembered that he had not taken his pill for the day, so he skipped his dose. According to the doctors, an occasional half-hour delay would not have mattered. Another PLHA has been counselled that alcohol consumption has bad effects on ART. Therefore, on an evening when he plans to consume alcohol, he deliberately skips his ART pill. Another PLHA waits for the exact time to take his pill. His pill time is 8:30. He sets an alarm for 8:28, and when the alarm rings, he takes a pill out of the bottle and waits for the next two minutes before he takes it. Some PLHA were not sure whether they should take a double dose the next time if they miss a pill, or whether they should delay the current dose if the last dose was late. Some were unsure if they could share pills with their spouse. These situations could be avoided by developing a mental model of HIV and ART. In the period immediately after initiating ART, PLHA may experience symptoms and an apparent deterioration of health. A symptom could be a side effect, an OI, an IRIS condition, or an indication of a regimen failure. Each of these calls for a different medical follow-up. Most PLHA could not differentiate between these. They called each of these a “side effect” of ART. One PLHA believed that his wife and child died “because of ART” and was afraid that he was now put on the same medication. Poor conceptual understanding of ART could lead to self-treatment, discontinuation of ART, or the PLHA being lost to follow-up. Even in cases where the ART is successful and the health improves, conceptual understanding of ART is important. Some PLHA did not know that ART was meant to be taken for the rest of the life. After feeling better, they had felt that they had recovered and had discontinued ART. Others are “bored” of taking tablets daily. Unfortunately, missing a pill has no immediate visible effect that could help the PLHA build a mental model of how ART works. As a PLHA poignantly put it, “it makes no difference, whether I take the pill, or I skip it”. Missing a dose helps the virus build resistance and this leads to a treatment failure in the long term. 
Technology solutions can help externalise these abstract notions, and provide continuous feedback and encouragement for adherence and health status.
Conceptual Clarity Leads to a “Normal” Life Some PLHA did have a good conceptual understanding of HIV and ART and could explain it in their words. One PLHA remarked, “HIV is a virus, not a disease”. Another explained CD4 were the “germs of power” and OIs as “your power goes down and all diseases catch you”. Another explained that when ART is successful, “the virus becomes sleepy”. Most of these PLHA also demonstrated a positive approach to life and were adherent to their treatment. Experience of suffering coupled with conceptual understanding of the disease seemed to make people compliant with the treatment. PLHA who suffered a lot before feeling better with ART, and those who have seen others die of HIV seemed to have a better adherence. Technology solutions can communicate positive and negative experiences of other PLHA to help build a mental model and minimise the personal suffering of the individual. 4.5 Social and Emotional Aspects Family Support – The Haves and the Have-Nots Solutions need to be personalised, depending on the disclosure status of the individual within his family. Some of the PLHA who had disclosed to their families got both financial and emotional support. Such support is particularly valuable at the initiation of ART, when the pill burden is high and the regimen is complex. Many PLHA had disclosed not to the entire family, but to one or two members. Mature, open minded, supportive, and earning members of the family were preferred for disclosure. In many cases, PLHA preferred a same-gender disclosure, in other cases, disclosing to males was preferred, mainly due to the financial implications. The people who were often kept in the dark were the elderly (“why bother them”) and the children (“they might tell the neighbours”). Full disclosure is not necessary to gain support. Family members (especially children) are happy to support even without a full disclosure. Many PLHA had only disclosed the need for taking regular pills, while others had said that they suffer from a non-stigmatising chronic illness such as blood pressure or diabetes. Though desirable from a treatment perspective, many PLHA struggled to disclose their status to family members. In some cases, disclosure led to tension in the family, break up of a joint family, or separation from spouse. Due to such fears, a few PLHA have preferred to disclose to friends, but not to family members. Such PLHA tend to reject a solution that has a risk of accidental disclosure to the family. For example, one PLHA who also had tuberculosis preferred treatment in a private clinic because government clinics insist on DOTS (directly observed treatment, short course) – he did not wish to let anyone visit his home to observe him taking pills. Non-disclosure to family members creates obstacles for ART. A PLHA avoided using alarms because his family members might get suspicious and ask why he was setting an alarm for odd hours such as 9 am or 9 pm. Another PLHA took his pills at night, when everyone had gone off to sleep. Some tore off labels of pill bottles and hid them in a safe place. A PLHA complained of the feeling of being “alone”, because of such situations. A system meant to support ART has to be flexible. For a PLHA who enjoys high levels of family support, such a system should empower family members and help
them give better support. A PLHA who has not disclosed to family members needs support all the more, but such support needs to be more discreet and under the control of the PLHA. Similarly, a PLHA with partial disclosure would need other solutions. In no situation should such a system cause unintended disclosure, as it would have disastrous consequences for the PLHA and could also violate privacy laws.
Need for Anonymous Socialisation
Technology-mediated social networking has become common. We found that there is both a need and an opportunity to leverage technology to enable socialisation among PLHA, although in a new form. Socialisation is the usual way for people to acquire information in India. Even while looking up an address, Indians prefer to ask someone on the street for directions rather than consult a map or signage. However, we found that PLHA socialise less. To begin with, some PLHA had difficulty in locating a good HIV clinic. Being shy about asking someone and not knowing where to look does not help. Some PLHA avoid social functions in order to maintain their ART times. Others avoid socialisation because of stigma, depression, or because of the need to save money, as socialisation often leads to expenses. While PLHA socialise less, this lack of socialisation also gives them a feeling of being “all alone”.
We found that PLHA do have a lot to share. Some PLHA have come up with interesting ideas for improving ART adherence or for integrating it into their lives. A PLHA on two ART pills a day uses an interesting “two-pill strategy” to remember whether he has taken his current dose (a common problem). He takes his ART pills from a small pillbox. Every night, after he has taken his pill, he takes out two pills from the bigger bottle and keeps them in the pillbox. If the pillbox has one pill at night, it implies he has not taken his night pill yet, but if it has two or zero pills it implies that he has taken it. Conversely, if it has two pills in the morning, it implies that he has not taken his morning pill yet, but if it has only one pill, then it implies that he has taken it. Another PLHA with a travelling job ensures that he never forgets to carry his pills – he always carries three things, “phone, money, and ART”. Another PLHA thought of his ART as his religion. Some PLHA associate taking a pill with events in a day – sending children to school, or a meal. Each of these ideas seems to be worth spreading.
Informal social activities among PLHA seemed to be happening in some contexts. In a rural hospital, PLHA were aware of each other’s problems and there was exchange of information among them. It appeared that the hospital gave the PLHA anonymity and the knowledge that the others there were PLHA like them. In an urban government hospital, some ART-experienced PLHA were invited to share tips with others after a group counselling session. A PLHA who attended it reported that he valued such tips more than what the counsellor had told him. Unfortunately, such socialisation among the PLHA was not observed in the urban private clinics.
In some cities, formal networks of positive people and support groups exist outside the clinics. The activities of these networks seem to be growing and their members benefiting. However, only a few of the PLHA in our study were active in positive networks, hinting that these are not yet widespread. Positive networks may not be suitable for those who live in rural areas or for those who have travelling jobs.
Others may not prefer them because of stigma and face-to-face nature of their activities.
Typically, online social networking forums are meant to reach out to existing and new friends with one’s true credentials. Such social networking might not work for PLHA due to stigma issues. However, anonymous networks, possibly on mobile phones, and catalysed by the clinics, could help PLHA socialise.
Power Distance, Trust, and Fear
India is considered a society with a high power distance index, ranking 17 out of 68 countries and regions [35]. In a high power distance society, the less powerful accept and expect that power is distributed unequally. While characterising a country along a few dimensions could be critiqued as not being the best way to understand the rich textures of its society, high power distance is certainly visible between the PLHA and the doctors, and to an extent the counsellors. Not only did PLHA respect and trust the doctors, they seemed to revere them. A doctor mentioned that he observed some PLHA leave his clinic walking backwards, something that they would only do in a temple. A doctor showing quick results is particularly revered. No other source of information, including TV, radio, the internet, a positive network, or a friend, is trusted as much as the treating doctor.
Unfortunately, the high power distance also prevents communication. Many PLHA are afraid to ask the doctor too many questions, as it might seem that they are questioning the doctor’s judgement. The gender gap plays a similar role. Most HIV doctors in our study are men, and women are shy to ask them simple questions such as “can I cook for my family?” (A PLHA woman asked this question of the female moderator). Conversely, men are shy to talk with counsellors (who happen to be mostly women) about multiple sex partners. Time pressure in the clinic makes it harder to overcome the power distance. As has been said earlier [13], technology could possibly bridge some of these social gaps. It could provide a platform for the PLHA to ask their questions offline (or to just listen to others’ questions) and get the doctor’s answer. At the same time, the trust in the doctor could be leveraged by using the doctor’s voice and/or his photograph while giving medical advice.
4.6 Non-uniform Mobile Phone Usage
Of particular interest to us were the differences in mobile phone usage patterns among PLHA. Though penetration of mobile phones is quite high, mobile usage abilities of the PLHA varied a lot. Of the 64 PLHA interviewed, 42 had access to a personal or a shared mobile phone. Almost all of them could receive and make calls. Only 33 could look up missed calls, and 30 could look up a number in the phone book. Only 20 had ever read an SMS, and 9 had ever written one. Only 17 reported using non-communication functions of the phone such as the alarm, the radio, or the camera. This being a qualitative user study, the sample might not represent the population proportionately, but it certainly indicates the constraints that a technology solution would encounter. Even though a facility is available, not everyone might think of using it for ART. Many PLHA with mobile phones had never thought of using an alarm as a pill reminder. By the second interview, some had started using alarms in their mobile phones (possibly because of our first interview). Similarly, people who were
presented with a seven-day pillbox – a relatively simple intervention – said that they found it useful (“we can tell if we have missed a dose”) and reported that their adherence had improved. Once PLHA start using it, an intervention may prove to be useful. However, many PLHA may not initiate the use by themselves. The intervention has to be initiated at the clinic along with ART and a PLHA has to be walked through the first few steps. Phone sharing is another area that needs consideration. A phone is frequently shared between family members. In some cases, a child is more tech-savvy than its parents. For example, a PLHA said that his daughter could look up an SMS, while he could not. Even so, he did not mind receiving HIV related SMS. He said, “You send, madam. I will manage”. On the other hand, a tech-savvy PLHA preferred emails “because these are more secure” than SMS “which could be read by anyone”.
5 Conclusions and Ongoing Work
The problem of treatment of people living with HIV/AIDS in resource-limited settings such as India has moved, at least in part, from the medical and pharmaceutical domain to the information and communication domain. Many issues relate to low education, access to technology, access to, presentation of, and interpretation of information, behavioural aspects, lack of socialisation, less time with doctors and counsellors, high power distance between PLHA and doctors and counsellors, and information overload. This study establishes that there is a clear need and an opportunity to make technology-based, user-centred interventions in the management of anti-retroviral therapy.
Technology presents opportunities, but realising them may not be easy, given the usability problems and information-seeking behaviours. A specific, isolated intervention (such as only SMS reminders) could have a limited effect, or could even be counterproductive. This study points to the need for a holistic intervention. Gains can be achieved by considering not only the medical needs of the patient, but also the informational, social, and emotional aspects, and the abilities of the people. Given the variability across the PLHA, the system should have flexibility. The intervention should be personalised for the needs of an individual, based on education, knowledge, availability of family support, medical status, age, gender, and treatment. It should be localised for a community, based on language, food habits, profession, and culture. Many PLHA will need repetition of information as much of it is new to them; most will need help to build a conceptual understanding of HIV and ART.
The intervention should be grounded in the reality of a resource-limited setting. It should not add to the already heavy financial burden. It should not be a stand-alone, independent activity, but should closely complement the ongoing efforts in the clinic. Trust can be leveraged and, simultaneously, barriers of power distance can be broken with a usable interface between the doctor and the patient. At the same time, the system can improve the efficiency of the clinic, secure health records, and optimise the time of the PLHA in the clinic.
The last decade has brought digital technologies within the reach of new users from developing economies for the first time. This study brings insights about the needs and contexts of these users. Though our focus was on HIV and ART, our study
resulted in several broad-based findings as well. These can potentially be applied to supporting other chronic illnesses and conditions with technology solutions. Based on the insights from this study, we have undertaken the design of an interactive voice response system on the PLHA side, and a web-based system on the clinic side. Our next steps are to prototype such a system, evaluate it for usability, and perform a randomised controlled study among a cohort of PLHA to evaluate its effectiveness on ART adherence and pathological outcomes.
Acknowledgements. This project was funded through a grant from Johnson & Johnson Limited. We thank the doctors, counsellors, moderators, and the people living with HIV/AIDS for participating in this project and sharing their insights.
References 1. TRAI: Telecom Subscription Data as on (October 31, 2010), http://www.trai.gov.in/WriteReadData/trai/upload/ PressReleases/780/PRecodiv24dec10.pdf (accessed December 24, 2010) 2. Lester, R., Gelmon, L., Plummer, F.: Cell phones: tightening the communication gap in resource-limited antiretroviral programmes? (2006) 3. Kaplan, W.: Can the ubiquitous power of mobile phones be used to improve health outcomes in developing countries? (2006) 4. Vidrine, D., Arduino, R., Lazev, A., Gritz, E.: A randomized trial of a proactive cellular telephone intervention for smokers living with HIV/AIDS (2006) 5. Curioso, W., Kurth, A.: Access, use and perceptions regarding Internet, cell phones and PDAs as a means for health promotion for people living with HIV in Peru (2007) 6. Chang, L., Kagaayi, J., Nakigozi, G., Packer, A., Serwadda, D., Quinn, T., Gray, R., Bollinger, R., Reynolds, S.: Responding to the Human Resource Crisis: Peer Health Workers, Mobile Phones, and HIV Care in Rakai, Uganda (2008) 7. Lester, R., Karanja, S.: Mobile phones: exceptional tools for HIV/AIDS, health, and crisis management (2008) 8. Kevin Patrick, K., Griswold, W., Raab, F., Intille, S.: Health and the Mobile Phone (2008) 9. Fabricant, R.: Project Masiluleke. Interactions 16(6) (2009) 10. Winchester, W.: Catalyzing a Perfect Storm: Mobile Phone-Based HIV-Prevention Behavioral Interventions. Interactions 16(6) (2009) 11. Noar, S., Black, H., Pierce, L.: Efficacy of computer technology-based HIV prevention interventions: a meta-analysis (2008) 12. Benjamin, P.: Cellphones 4 HIV. In: mHealth Potential in South Africa: The Experience of Cell-Life (2010), http://www.cell-life.org/images/downloads/CellLife_Organisation_Poster.pdf (accessed 2010) 13. De Angeli, A., Athavankar, U., Joshi, A., Coventry, L., Johnson, G.: Introducing ATMs in India: a contextual inquiry (2004) 14. Joshi, A.: Mobile Phones and Economic Sustainability - Perspectives from India. In: Workshop on Expressive Interactions for Sustainability and Empowerment, Londan (2009) 15. AVERT: Worldwide HIV & AIDS Statistics Commentry, http://www.avert.org/worlstatinfo.htm (accessed 2009) 16. AVERT: AIDS , http://www.avert.org/aids.htm (accessed 2009)
17. NACO: Antiretroviral Therapy Guidelines for HIV-Infected Adults and Adolescents Including Post-exposure Prophylaxis. Antiretroviral Therapy Guidelines for HIV-Infected Adults and Adolescents Including Post-exposure.pdf, http://nacoonline.org/upload/Policies&Guidelines/1 (accessed May 2007) 18. Arnsten, J., Demas, P., Gourevitch, M., Buono, D., Farzadegan, H., Schoenbaum, E.: Adherence and Viral Load in HIV-infected Drug Users: Comparison of Self-report and Medication Event Monitors. In: Conference on Retroviruses and Opportunistic Infections, New York (2000) 19. Harrison, K., Song, R., Zhang, X.: Life Expectancy After HIV Diagnosis Based on National HIV Surveillance Data From 25 States, United States (2010) 20. Sherr, L.: Understanding Adherence (2000) 21. Bartlett, J., Shao, J.: Successes, challenges, and limitations of current antiretroviral therapy in low-income and middle-income countries (2009) 22. Panel on Antiretroviral Guidelines for Adults and Adolescents: Guidelines for the Use of Antiretroviral Agents in HIV-1-Infected Adults and Adolescents, http://www.aidsinfo.nih.gov/contentfiles/ AdultandAdolescentGL.pdf (accessed January 10, 2011) 23. Sarna, A., Pujari, S., Sengar, A.K., Garg, R., Gupta, I., Dam, J.: Adherence to Antiretroviral Therapy & its Determinants Amongst HIV Patients in India (2008) 24. Nischal, K., Khopkar, U., Saple, D.: Improving Adherence to Antiretroviral Therapy (2005) 25. Duraisamy, P., Ganesh, A., Homan, R., Kumarasamy, N., Castle, C., Sripriya, P., Mahendra, V., Solomon, S.: Costs and Financial Burden of Care and Support Services to PLHA and Households in South India (2006) 26. Wu, X.: Factors associated with adherence to antiretroviral therapy among HIV/AIDS patients in rural China (2007) 27. Goggin, K., Liston, R., Mitty, J.: Modified directly observed therapy for antiretroviral therapy: a primer from the field (2007) 28. Cauldbeck, M., O’Connor, C., O’Connor, M.B., Saunders, J. A., Rao, B., Mallesh, V., Kotehalappa, N., Kumar, P., Mamtha, G., McGoldrick, C., Laing, R.B., Satish, K.S.: Adherence to anti-retroviral therapy among HIV patients in Bangalore, India (2009) 29. Osterberg, L., Blaschke, T.: Adherence to medication, pp. 487–497 (2005) 30. PhRMA: Just What the Doctor Ordered: Taking Medicines as Prescribed Can Improve Health and Lower Costs, http://www.phrma.org/files/attachments/Adherence.pdf (accessed March 2009) 31. Klein, D., Wustrack, G., Schwartz, A.: Medication adherence: Many conditions, a common problem. In: 50th Annual Meeting on Human Factors and Ergonomics Society (2006) 32. Mannheimer, S., Morse, E., Matts, J., Andrews, L., Child, C., Schmetter, B., Friedland, G.: Sustained benefit from a long-term antiretroviral adherence intervention (2006) 33. Beyer, H., Holtzblatt, K.: Contextual Design. Morgan Kaufmann, San Francisco (1998) 34. IMF: Report for Selected Countries and Subjects, http://en.wikipedia.org/wiki/ List_of_countries_by_GDP_nominal_per_capita (accessed October 2010) 35. Hofstede, G.: Geert Hofstede Cultural Dimensions, http://www.geert-hofstede.com/hofstede_dimensions.php (accessed 2009)
In Class Adoption of Multimedia Mobile Phones by Gender - Results from a Field Study Elba del C. Valderrama-Bahamondez1, Jarmo Kauko2, Jonna Häkkilä2, and Albrecht Schmidt3 1
University of Duisburg-Essen, Paluno Institute, Gerlingstr. 16, 45127 Essen, Germany 2 Nokia Research Center, Visiokatu 1, 33720 Tampere, Finland 3 University of Stuttgart, VIS, Universitätsstr. 38, 70569 Stuttgart, Germany {elba.valderrama,albrecht.schmidt}@acm.org, {jarmo.kauko,jonna.hakkila}@nokia.com
Abstract. In this paper we share our findings from a field study conducted in Panama, focusing on the adoption of mobile phones in classroom settings. Our findings reveal that during the initial phase of use, boys adopt mobile phone usage faster and explore more functionality, while girls take more time to familiarize themselves with the phones. Girls seem to maintain a better focus on the learning activities using the mobile phones across all tasks. When the task implied an active role, boys also showed high concentration. The videos recorded by the children as part of the learning activities showed a remarkable difference in roles between girls and boys. These findings suggest that it is important to consider the different adoption and exploration strategies of girls and boys with new technologies when designing tools for mobile learning.
Keywords: Mobile phones, children, technology adoption, rural schools, developing countries, novelty effect, learning, boys, girls.
2 Related Work
Although mobile phone assisted learning is an emerging field, some earlier research exists in the area. The interaction between children and multimedia phones has been investigated in developed regions, e.g., in [5, 6], where Jarkievich et al. [6] studied the adoption of mobile phones by small children in kindergarten, and Puikkonen et al. [5] analyzed short videos created by girls with their mobile phones. However, our research focuses on the use of multimedia phones in developing countries. Here, pioneering work has been done for example by Kumar et al. [4], who studied the utilization of mobile phones in unsupervised learning environments in rural India; and Kam et al. [7], who explored the adoption of local game practices for learning games with mobile phones in schools, also in India. Hollow and Masperi [8] explored the use of video learning tutorials for handheld devices in primary schools in Malawi. In contrast to this work, our research looks at the opportunities that arise from using general multimedia mobile phones as generic learning tools – much like paper and pencil. Qualitative findings on how teachers adopt the use of multimedia for their own teaching style, and on the needs and strengths of their pupils, in a setting similar to the one used in the research presented in this paper, have been reported in [3].
3 Method
The starting point for the research reported in this paper was a set of interviews, discussions and surveys with teachers and children conducted previously in Panama [2,3]. The surveys and interviews were conducted to get an overview of the use of technology in schools, the acceptance of using mobile phones for learning, how teachers and children imagine mobile phones could be used for learning, and the learning needs that could be supported by mobile phones in Panama. The main findings revealed that teachers and pupils have a high level of access to mobile phones [2], and that they would welcome support especially for Math, Language arts and Nature [2,3]. It was desired that any learning application should include multimedia features. Teachers were open to using mobile phones in their classes, and they considered the school the most suitable place for using a mobile phone for learning [2,3]. Based on these findings we decided to conduct an explorative field study to understand how children and teachers adopt the use of mobile phones in class and how phones could support in-class learning in the context of rural areas in developing countries.
3.1 Setup of the Field Study
The field study was carried out in two schools in Coclé, a province located to the west of Panama City. The first study was conducted in a rural multigrade school in the village El Retiro. The participants attended 4th, 5th and 6th grade; pupils from the 4th and the 5th grade worked together in the same classroom under the supervision of the same teacher (multigrade). We have described the learning activities and general findings from the field study conducted in this school in [3]. The second school was Maria de Tirones, an urban school located in the village of Rio Hato. There we worked with two classes of the 5th grade. Although only one of the schools was classified as rural,
most of the pupils in both schools came from families with low or very low income. In total, 78 children participated in the study, 36 from the first school and 42 from the second one. Additionally, we worked with 6 teachers, including the two respective grade-teachers and the English teacher. The gender distribution was 41 girls and 37 boys, and the mean age was 10.5 years. As the learning activities using the mobile phones were planned by teachers to target the specific grade and learning goals, some activities were analyzed from a subset of all participants. The study ran for two working weeks in each school, between September and October 2010. As we wanted to explore the use of multimedia in class and to avoid any bias that might arise if children used their own phones, we provided each child and teacher with a Nokia 5530 XpressMusic equipped with a 4 GB SD card. With this model we were also able to explore touch-screen interaction, as touch-screen phones are currently the trend in the market. In both schools we gave a brief introduction to the teachers showing the multimedia features and applications of the mobile phones. Teachers were free to decide which multimedia features and applications they used during their classes. The learning tasks were designed exclusively by teachers and respected the planned lessons scheduled previously by them for the school term. Only three of the children had never used a mobile phone before, but none of them had prior experience with touch-screen interaction on mobile phones. Mobile phone features and applications were introduced to children when needed for a learning task. After the first week all the children knew well how to use the phone. The mobile phones were used only at school; teachers handed them out to the children at the start of class and collected them once the class was finished. On average, children used the phones between 2 and 4 hours daily. In addition to the instructed tasks designed by teachers, children had a chance to explore the mobile phones freely. The mobile phones were used only as portable computers, thus the ability to make phone calls was not available during the field study. To monitor the usage of the mobile phones, we installed a logger application on all the phones. The logger ran in the background and recorded the applications that had been opened by the pupils. The logger application created a daily text file with the phone IMEI, timestamp and name of the application launched. In addition, the logger took a screenshot of the active application every 20 seconds. The first author of this paper was present during the field study, making observations and giving support in case of technical problems.
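To make the shape of this log data concrete, the following sketch parses one daily log file into records. The separator, field order and timestamp format are assumptions made purely for illustration; the study only specifies that each entry contains the phone IMEI, a timestamp and the name of the launched application.

import csv
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class LogEntry:
    imei: str           # phone identifier
    timestamp: datetime
    application: str    # application brought to the foreground

def parse_daily_log(path: str) -> List[LogEntry]:
    """Parse one daily logger file into LogEntry records.

    Assumes one semicolon-separated line per application switch, e.g.
    '354678051234567;2010-09-21 09:15:32;Camera' -- this layout is a guess.
    """
    entries: List[LogEntry] = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=";"):
            if len(row) != 3:
                continue  # skip malformed lines
            imei, stamp, app = row
            entries.append(LogEntry(imei=imei.strip(),
                                     timestamp=datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S"),
                                     application=app.strip()))
    return entries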
4 Adoption of the Use of Multimedia Phones during the Learning Activities
One of the goals of this research was to explore gender differences in using multimedia features of the mobile phone in class. From our observations during the field study, boys seemed to be more distracted than their female peers while working on an assignment on the phone. For example, they explored other applications during an instructed task. Our observations also suggest that boys adopted the mobile phones faster than girls. As an initial step, we analyzed two learning tasks: watching a video and filming a video. These two tasks were chosen because they let us analyze the children in different roles: an active role (filming a video) and a passive role (watching a video).
Watching videos. Overall, there were six learning tasks that involved watching videos, and the videos varied in terms of topic and duration. To ensure comparability, we restricted our population to two 5th-grade groups who watched the same Math video (an introduction to the multiplication of fractions). The video was 5.7 minutes long and was watched by 21 girls and 18 boys. In our first analysis, see fig. 1 (left), we found a gender difference in how many times the pupils watched the same video. In general, boys watched the video on average 1.2 fewer times than their female peers. To confirm this difference statistically, we conducted a 2×2 between-subjects ANOVA on the net time used for watching the video. The analysis revealed a significant difference between groups 1 and 2 (F1,35 = 6.65, p < 0.05). However, the difference between genders was only marginally significant (F1,35 = 3.77, p = 0.060). As illustrated by fig. 1 (left), group 2 watched the video fewer times than group 1, while maintaining the trend between girls and boys.
Fig. 1. Number of repeated observations of the Math video by gender and by class group (left), and the percentage of interruptions in the video watching task (right). Group 1 and 2 represent the pupils of 5th A and 5th B respectively. Together refers to both groups.
Fig. 1 (right) shows the number of distractions while watching the Math video by gender. We define distractions as occasions when a pupil pauses or stops the video and opens another application on the mobile phone instead. Fig. 1 (right) illustrates that the number of distractions by boys (39%) is approximately double that of their female peers (19%). This also explains the shorter time used by boys for watching the video. We also note that in group 1 the girls showed no distractions, and the number of distractions by boys was considerably lower. One explanation for the variation between the two groups shown in both graphs is that in the first group the teacher used the video to introduce the concept (multiplication of fractions), while in the second group the topic had already been introduced and the video was used to reinforce the learning. Another explanation is a learning effect. Group 1 watched the video on the first day of the field study and group 2 on the fourth day. This means that pupils in group 2 were already familiar with the mobile device. For both groups, however, this was the first time they were introduced to playing videos using the phone.
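For readers who wish to reproduce this kind of analysis, the sketch below runs a 2×2 between-subjects ANOVA (gender × class group) on net video-watching time using statsmodels. The data frame and its values are hypothetical placeholders; this is not the authors' original analysis script, only an illustration of the test reported above.

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical per-pupil table: net time (minutes) spent watching the Math video.
df = pd.DataFrame({
    "gender":   ["girl"] * 4 + ["boy"] * 4,
    "group":    ["1", "2"] * 4,
    "net_time": [11.4, 7.9, 12.1, 8.5, 7.0, 5.8, 8.2, 6.1],
})

# 2x2 between-subjects ANOVA with gender, group and their interaction.
model = ols("net_time ~ C(gender) * C(group)", data=df).fit()
print(anova_lm(model, typ=2))   # F and p values for each effect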
Filming videos. Although there were around seven tasks involving filming a video, most of them were carried out in teams – every child had to film a video in only one learning activity. We analyzed the children’s behavior during this task. In this activity, pupils had to go out to the school garden to film and describe a part of a flower. Children were free to choose the flower(s). This task was performed by children from the 5th grade of the first school, comprising 8 girls and 4 boys. The task was conducted on the third day of the trial, but it was the first task involving video recording. In general, we noticed two major gender differences: (1) all the girls worked in pairs – one filming while the other presented, and vice versa; in contrast, all the boys worked alone and filmed their own videos; (2) boys had no technical problems when filming, while two girls had difficulties with the task.
Fig. 2. Screenshot of the video recorded by the children for Nature Science class
Boys maintained focus on the flower as the main subject (fig. 2, right), while girls were more concerned with acting in front of the camera (fig. 2, left and center). All the girls cut off their flowers and held them in their hands in front of the camera; in contrast, all the boys completed the task without cutting the flower. As the boys created their own videos, they did not appear in the videos in person. Six girls presented in front of the camera (full view), while two appeared only partially (hand and lower body). The zoom option was neither taught nor explored by the participants.
5 Learning/Adoption during Explorative Use
In addition to instructed learning activities, children were allowed to explore other functions of the mobile phone. Teachers reserved specific time slots (between 10 and 15 minutes) for free exploration, but unscheduled explorative usage also occurred during and between the instructed learning activities. We analyzed the explorative use of the phones mainly from log files containing active application history data. Findings from the quantitative log data were then contrasted with qualitative observations.
5.1 Log File Analysis
We did not have any a priori hypotheses, as the original focus of the study was on the learning activities. Due to the lack of control in the field trial method, the results are affected by unknown random and potentially confounding factors. Therefore, the following results should be interpreted merely as hypotheses for future experiments unless strongly supported by qualitative findings. To avoid bias in selecting metrics and comparisons, the log files were analyzed by a researcher who was not involved in the field trial.
Log files contained a time-stamped history of active applications. We analyzed the usage during the first 4 days, and excluded 3 users who participated for fewer than 4 days. First, the applications were categorized into system applications (e.g., home screen, application menu), learning applications (e.g., Camera, Paint Pad) and explored applications. A subset of the last category was further classified as game applications. Second, a set of metrics was extracted from the log files. We analyzed the results using a mixed-models ANOVA with gender and school as between-subjects factors and day as a within-subjects factor. Time used for learning applications varied between the schools and days, but there were no systematic differences. Overall, the mean time used for learning applications was 60.8% (sd = 21.3%). For explored applications, there were significant main effects for gender (F1,53 = 5.98, p < 0.05) and school (F1,53 = 6.86, p < 0.05). In the first school, the mean exploration time by boys was 8.5% (sd = 13.4%) and by girls 3.6% (sd = 3.7%). In the second school, the corresponding results were 14.8% (sd = 15.1%) for boys and 8.9% (sd = 8.9%) for girls. The remaining time was spent in system applications. Idle time was excluded from the analysis using a threshold of 1 hour without an application switch. By analyzing the explorative usage further, we found a significant gender difference for game applications (F1,53 = 5.99, p < 0.05). There were also significant school-day and gender-school-day interaction effects. A post-hoc analysis with Bonferroni correction revealed a significant day effect for the second school (F3,132 = 10.91, uncorrected p < 0.000005). This reveals that time used for games increased day by day, as illustrated by Fig. 4. Overall, the time used for games was considerably higher for boys (µ = 4.1%, sd = 9.6%) than for girls (µ = 0.3%, sd = 0.9%).
Fig. 3. Mean percentage time used for game applications
In line with these findings, during the field study we observed that girls played games notably less than their male peers. One girl commented to us, “I find this game [car racing] very boring and difficult”, while a boy stated, “I like this game [car racing] … is easy but I have to be concentrated”. The most played game was Global Race – Raging Thunder®, where the player drives a car in a racing competition. Most of the boys played alone, but four boys discovered that turning on Bluetooth allowed them to play against each other.
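A sketch of how the per-category time shares reported above could be computed from the logger data follows. It accepts the records produced by the earlier parsing sketch (any objects with timestamp and application attributes work); the category lists are illustrative assumptions, while the one-hour idle threshold mirrors the description in the text.

from collections import defaultdict

IDLE_THRESHOLD_S = 3600  # gaps longer than 1 hour are treated as idle and dropped

SYSTEM_APPS = {"Home screen", "Application menu"}            # illustrative names
LEARNING_APPS = {"Camera", "Paint Pad", "Video player"}      # illustrative names
GAME_APPS = {"Global Race - Raging Thunder"}                 # illustrative name

def categorize(app: str) -> str:
    # Note: in the study, games are reported as a subset of explored applications.
    if app in SYSTEM_APPS:
        return "system"
    if app in LEARNING_APPS:
        return "learning"
    if app in GAME_APPS:
        return "game"
    return "explored"  # everything else counts as free exploration

def daily_time_share(entries):
    """entries: chronologically ordered log records for one pupil and one day.

    Returns the fraction of non-idle time spent in each category.
    """
    totals = defaultdict(float)
    for current, nxt in zip(entries, entries[1:]):
        dwell = (nxt.timestamp - current.timestamp).total_seconds()
        if 0 < dwell <= IDLE_THRESHOLD_S:
            totals[categorize(current.application)] += dwell
    active = sum(totals.values()) or 1.0
    return {cat: t / active for cat, t in totals.items()}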
In order to analyze the scope of explored applications, we calculated the number of new applications discovered daily by each user. System and learning applications were excluded from this analysis. We found significant main effects for gender (F1,53 = 12.28, p < 0.001) and day (F3,159 = 5.28, p < 0.005). The mean number of discovered applications was higher for boys (µ = 3.5, sd = 3.0) than for girls (µ = 2.6, sd = 2.2). The discovery rate decreased moderately with time. Cumulative numbers of discovered applications are illustrated in Fig. 5 (left). We also analyzed exploration activity by calculating the frequency of switching the active application per hour. This metric includes learning and system applications, such as the application menu. We found a significant main effect on activity by day (F3,159 = 5.28, p < 0.005). Further analysis shows an increasing trend in activity during the first 3 days. Interestingly, the activity seems to decrease after the third day, as illustrated in Fig. 5 (right). The decreasing activity and the slightly decreasing discovery rate may indicate that users found the most interesting applications during the first three days.
Fig. 5. Cumulative amount of discovered applications (left), and application switch frequency (right)
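The two exploration metrics plotted in Fig. 5 can be derived from the same logs. The sketch below computes, per pupil, the number of newly discovered (non-system, non-learning) applications per day and a simplified application-switch frequency per hour of logged use; unlike the study's analysis, it does not subtract idle periods, and the data layout is again an assumption.

from typing import Dict, FrozenSet, List, Tuple

def exploration_metrics(entries_by_day: Dict[int, list],
                        excluded: FrozenSet[str] = frozenset()) -> Tuple[Dict[int, int], Dict[int, float]]:
    """entries_by_day: {day index: chronologically ordered log records for one pupil}.

    Returns (newly discovered applications per day, application switches per hour),
    excluding the applications named in `excluded` (e.g. system and learning apps)
    from the discovery count.
    """
    seen: set = set()
    new_apps_per_day: Dict[int, int] = {}
    switches_per_hour: Dict[int, float] = {}
    for day in sorted(entries_by_day):
        entries = entries_by_day[day]
        apps_today = {e.application for e in entries} - excluded
        new_apps_per_day[day] = len(apps_today - seen)
        seen |= apps_today
        if len(entries) >= 2:
            hours = (entries[-1].timestamp - entries[0].timestamp).total_seconds() / 3600
            switches_per_hour[day] = (len(entries) - 1) / hours if hours > 0 else 0.0
        else:
            switches_per_hour[day] = 0.0
    return new_apps_per_day, switches_per_hour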
6 Discussion and Conclusion
Mobile phones have great potential as learning tools. To use the mobile phone as a learning tool, children first need to learn its basic functions and a variety of its applications. Children acquire these skills quickly, especially if they find the device interesting. Games and media applications can play an important role in maintaining interest and introducing new functions. On the other hand, such applications can distract from the actual learning process. A design challenge for future mobile learning devices is to balance pragmatic learning activities and free exploration. Our results indicate that there is a notable gender difference in how children weigh these activities. Boys are more active in free exploration and in learning new applications, games in particular. Therefore, boys discover the basic functions of the phone faster. Girls are typically more focused on the learning task, but may be accidentally interrupted by problems in operating the device. One possible reason discouraging girls from exploration is that the existing games are more targeted towards boys.
The initial surveys and interviews [2,3], as well as the post-interviews after the field study, indicate that teachers in Panama would welcome the use of technology in the classroom. In particular, they welcome the use of mobile phones, as these are more commonly available and widely used than computers in Panama. Teachers with no prior experience of these technologies react more skeptically to the use of technology for learning. Because of the explorative and uncontrolled nature of our study, where each teacher designed their own learning activities, we received a variety of results and insights. As the learning tasks varied in content and form from grade to grade, the results are difficult to compare directly. The focus of this study was to capture the initial adoption of the mobile phones, a new technology for the children, thus the paper is limited to findings in this area. Nonetheless, we believe that our findings contribute to a better understanding of how children in underprivileged areas adopt the use of multimedia phones for learning with no preinstalled learning applications. Our future work will focus on longer-term usage trends and gender differences in both in-class and out-of-class settings. Also, we plan to conduct a controlled experiment to evaluate how using mobile phones actually affects learning performance. Finally, as touch-screen phones are a growing trend in the market, it is expected that in the near future the technology will become a commodity in low-end devices and come within reach of people living in developing regions. This will enable the use of touch-screen technologies and widen the spectrum of potential mobile applications, including those for learning purposes.
References 1. Contraloría General de la República de Panamá, http://www.contraloria.gob.pa/ 2. Valderrama Bahamóndez, E., Schmidt, A.: A Survey to Assess the Potential of Mobile Phones as a Learning Platform for Panama. In: Extend Abstracts on CHI 2010, pp. 3667– 3672. ACM Press, New York (2010) 3. Valderrama Bahamóndez, E., Winkler, C., Schmidt, A.: Utilizing Capabilities for Mobile phones. In: CHI 2011, pp. 935–944. ACM Press, New York (2011) 4. Kumar, A., Tewari, A., Shroff, G., Chittamuru, D., Kam, M., Canny, J.: An Exploratory Study of Unsupervised Mobile Learning in Rural India. In: ACM Conference on Human Factors in Computing Systems CHI 2010, pp. 742–752. ACM Press, New York (2010) 5. Puikkonen, A., Ventä, L., Häkkilä, J., Beekhuyzen, J.: Playing, Performing, Reporting – A Case Study of Mobile Minimovies Composed by Teenage Girl. In: OZCHI 2008, pp. 140– 147. ACM Press, New York (2008) 6. Jarkievich, P., Frankhammar, M., Fernaeus, Y.: In the Hands of Children: Exploring the Use of Mobile Phone Functionality in Casual Play Settings. In: Mobile HCI 2008, pp. 375– 378. ACM Press, New York (2008) 7. Kam, M., Mathur, K., Kumar, A., Canny, J.: Designing Digital Games for Rural Children: A Study of Traditional Village Games in India. In: Mobile HCI 2009, pp. 31–40. ACM Press, New York (2009) 8. Masperi, P., Hollow, D.: An evaluation of the use of ICT within primary education in Malawi. In: ICTD 2009, pp. 27–34. IEEE Press, Los Alamitos (2009) 9. UN-Panama: Erradicar la Pobreza Extrema y el Hambre, http://www.onu.org.pa/objetivos-desarrollo-milenio-ODM/ erradicar-pobreza-extrema-hambre
Scenarchitectures: The Use of Domain-Specific Architectures to Bridge Design and Implementation Nicholas Graham2, Emmanuel Dubois1, Christophe Bortolaso1, and Christopher Wolfe2 1 IRIT, University of Toulouse, 31000 Toulouse, France {Emmanuel.Dubois,Christophe.Bortolaso}@irit.fr 2 School of Computing, Queen’s University Kingston, Ontario, Canada K7L 3N6 {graham,wolfe}@cs.queensu.ca
Abstract. In this paper, we present scenarchitectures, a means of raising the level of design of advanced interactive systems. Scenarchitectures combine elements of scenarios and system architectures, and can be used during the user interface design process as an adjunct to other design tools such as textual scenarios and story boards. Meanwhile, scenarchitectures can be automatically transformed to system architectures, providing a link between design and implementation. Using two existing scenarchitectural notations, we investigate the role of scenarchitectures in the design process. We then show how model-transformation techniques can be used to automatically derive system architectures from scenarchitectures, and conclude with concrete examples of the application of the scenarchitectural approach to the design of a mixed-reality system. Keywords: User interface design methods, software architecture, scenarchitecture, adaptive groupware, mixed interactive systems.
Fig. 1. The gulf between the scenario view used by designers and the architectural view of implementers (adapted from Carroll [7])
Scenarchitectural styles are domain-specific. For example, ASUR helps in the design and implementation of mixed reality applications [15], while Fiia is customized for mobile groupware applications [24]. By focusing on a single domain, a scenarchitectural style can provide abstractions and implementation techniques tailored to that domain, while retaining a simple and easily learned syntax. Fiia and ASUR were developed separately, but share core concepts. In this paper, we use these two styles to illustrate the principles underlying scenarchitectures. We show how Fiia and ASUR support a common process, and show how their differences help to illustrate different aspects of the design space of scenarchitectural styles. We first survey the traditional relationship between scenario-based design artifacts and software architectures, illustrating the ever-widening gulf between design and implementation. We then introduce the concept of scenarchitectures, illustrated by our two examples of Fiia and ASUR. We explore how scenarchitectures fit within a design process, showing how they can be derived from scenarios, and how they can coevolve with interaction design artifacts during iterative refinement. Finally, we discuss how model transformation techniques can be used to automatically map scenarchitectures to implementation-level system architectures.
2 Traditional Approach There is increasing recognition of the significant gulf between the perspectives of designers and implementers of interactive systems [6,4]. This gulf is illustrated by two widely-used design notations: scenarios and software architectures. Scenarios are, according to Carroll “a narrative description of what people do and experience as they try to make use of computer systems and applications” [6]. Scenarios aid design by capturing how people use existing systems [3], and how they might use new systems. Scenarios may be expressed textually, using storyboards, or through video [19]. Software architectures, on the other hand, are a central design artifact used to document a system’s implementation [2]. They show how the system is decomposed into components and how these components interact. Numerous software architectural styles have been proposed for interactive systems, such as MVC [17] and PAC [9], and more recently special purpose architectures for groupware [11] and games [13]. These two examples capture what Carroll calls the scenario and establishment perspectives in software design [7] (figure 1), showing the differing perspectives of designers and implementers. Concrete examples become generic and abstract; instead of focusing on the user’s experience, the implementation focuses on the system-level components that implement the interaction; scenarios cover part of the system’s use,
while an architecture must comprehensively specify a complete design; scenarios are typically written in imprecise, free-form prose, while architectures are specified in a well-defined and precise notation; and scenarios express the expected (or “envisioned”) use of the system, while architectures are specifications. These multiple changes in perspective are what make it difficult to move from interaction design to implementation design. An alternative to the “traditional” approach is the model-based generation of user interfaces from high-level models. These approaches are mainly based on task models, following the CAMELEON framework [5]. As opposed to scenario-based design, generation of user interfaces from task models has largely been confined to the research lab. We argue in the following sections for a new approach, scenarchitectures, which combine aspects of scenarios and system architectures.
3 Overview and Examples Scenarchitectures are design-level documents written from the system perspective. They provide a notation that can be used in design sessions alongside traditional scenarios, user interface mockups and task models. Scenarchitectures contribute to iterative design by helping to make the design more concrete, and provide a bridge towards the implementation of an interactive system. Figure 2 shows how scenarchitectures fit into a design and development process. Scenarios capture how people interact with the system under design. Scenarchitectures make these scenarios more concrete by capturing the system’s components, relations between the components, and techniques used by the user to interact with the system. Scenarios and scenarchitectures are complementary: scenarios suggest system structures that should be captured in scenarchitectures; scenarchitectures in turn identify areas where scenarios are missing or lacking in detail. As we will show, scenarchitectures help with implementation, as model-driven engineering techniques can be used to transform scenarchitectures to system architectures.
Fig. 2. Fitting scenarchitectures into the development process
Fig. 3. Furniture layout application – “Sally” the salesperson lays out furniture on a tabletop, while “Clive” the customer sees the results in 3D on his PC
Scenarchitectures are diagrammatic notations capturing a conceptual view of an interactive system in the context of its use. Three defining features of scenarchitectures differentiate them from traditional software architectures as described using class diagrams:
• Scenarchitectures are concrete. As opposed to a class diagram which expresses the relationships between types of components, scenarchitectures show the actual instantiated components making up the system;
• Scenarchitectures are runtime snapshots showing the system at a particular point during the interaction. A scenario might be explained via a sequence of scenarchitectural diagrams, where each diagram captures a significant situation expressed by the scenario;
• Scenarchitectures are domain-specific, specialized to the type of scenario being expressed.
We now provide two examples of scenarchitectural styles, and discuss how they can be used to support the design of mixed-interactive systems and adaptive groupware systems. We draw two examples of scenarchitectures from the literature: ASUR [15] and Fiia [24]. Both are domain-specific, with ASUR targeted towards mixed-interactive systems, and Fiia supporting the design of adaptive groupware. While both ASUR and Fiia already existed, our framework helps show the commonality of their approach and illustrates the tradeoffs they make. We briefly describe these scenarchitectural styles, and then show how they can be used to support both the interaction design process and the model-based derivation of traditional software architectures. To illustrate these two styles, we use the scenario of a tool helping with the layout of furniture in an office (figure 3). ASUR and Fiia use different notations. To simplify presentation, we use a subset of their notations, in a unified syntax. This is summarized in figure 4 (left).
3.1 Example: The ASUR Scenarchitectural Model
ASUR helps bridge the design and implementation of mixed-interactive systems. Such systems seamlessly combine the physical and virtual worlds. To help “Sally” explore the layout of furniture, the application of figure 3 displays digital images of the furniture on a physical table. Sally touches and drags furniture items to move them around the room.
Fig. 4. Left: Legend showing scenarchitectural notation, and right: an ASUR scenarchitecture showing how “Sally” the salesperson manipulates digital furniture on a physical tabletop
ASUR scenarchitectural diagrams, such as the one shown in figure 4, capture snapshots of such mixed-interactive systems. Figure 4 (right) shows the table, digital furniture and digital map of the rooms. Adapters detect which furniture item is touched by the user on the table, and project the furniture layout on the table. Dataflow arcs express that the “User” selects and moves an item projected on the table, which in turn conveys a position and motion path to the input adapter, which finally delivers a position to the relevant “Furniture Item” before rendering it through the output adapter. Finally, the diagram specifies that the table, input and output adapters are all physically close (|=|) to the presenter. ASUR diagrams therefore capture the components of a system at a conceptual level, showing the system’s configuration at a particular point in time. ASUR is a domain-specific notation, tuned to answering questions pertinent to mixed interaction, and eliding details less relevant to that domain. As we shall see, this allows models to be created quickly and fluidly, allowing their use in design sessions. ASUR diagrams can be created at several levels of detail. When hand-sketched during a design meeting, details such as dataflow types are typically omitted. The model can be refined as part of the process of moving towards implementation. Also as part of the implementation process, decisions must be made such as how the input adapter should be implemented (e.g., as a camera tracking fiducial markers on the fingers, or via capacitive touch). ASUR scenarchitectural diagrams provide a bridge to implementation. Using the Guide-Me tool [14], an ASUR scenarchitecture can be semi-automatically transformed into an implementation architecture, where code stubs are provided for virtual components, and where adapters are automatically implemented [15].
3.2 Example: The Fiia Scenarchitectural Model
Fiia addresses the problem of developing adaptive groupware systems – systems that allow groups of people to work together, and that adapt to changes in users’ tasks, locations and devices. Figure 5 continues our furniture layout example, showing how salesperson “Sally” creates a furniture layout for a customer “Clive’s” office. Sally uses a
tabletop surface to manipulate furniture in a top-down view. Meanwhile Clive uses a 3D viewer to see how the furniture will appear in his office. Fiia captures the implementation design of groupware applications. The diagram of figure 5 shows the two participants’ settings (demarcated by dashed lines), emphasizing the different contexts of the collaborators. The “=” line shows data that is shared by Sally and Clive (the furniture layout – what furniture items are being used, and where they are located). Sally interacts with the furniture layout using a “2D layout editor”, which she manipulates via a tabletop surface. Meanwhile, Clive can view the scene on his PC using a “3D layout viewer”.
Fig. 5. A Fiia scenarchitecture showing how a salesperson and customer collaborate in the furniture layout task
Fiia is domain-specific in that it provides high-level constructs directly addressing groupware concepts. Settings collect people and their resources; sharing shows what resources are accessible to different users, and adapters (as in ASUR) explicitly show the devices that people use to interact with the system. Fiia diagrams represent a snapshot of the system’s use at a single point in time. As we shall see, a sequence of Fiia diagrams can be used to capture the flow of changing tasks, locations and devices, each showing the transition from one situation to another.
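To make these constructs concrete at the level of data structures, the following sketch shows one possible in-memory representation of such a snapshot. It is a minimal illustration of our own in Java; the class and field names (Scenarchitecture, Setting, and so on) are hypothetical and do not reproduce Fiia's actual metamodel.

import java.util.ArrayList;
import java.util.List;

// Illustrative only (not Fiia's real metamodel): a snapshot groups people and
// their resources into settings, and records the shared data stores and the
// adapters through which users reach components.
class Scenarchitecture {
    final List<Setting> settings = new ArrayList<>();
    final List<String> sharedData = new ArrayList<>();     // e.g. "Furniture layout"

    static class Setting {
        final String owner;                                 // e.g. "Sally"
        final List<String> components = new ArrayList<>();  // e.g. "2D layout editor"
        final List<String> adapters = new ArrayList<>();    // e.g. "Tabletop surface"
        Setting(String owner) { this.owner = owner; }
    }

    public static void main(String[] args) {
        Scenarchitecture snapshot = new Scenarchitecture();
        Setting sally = new Setting("Sally");
        sally.components.add("2D layout editor");
        sally.adapters.add("Tabletop surface");
        Setting clive = new Setting("Clive");
        clive.components.add("3D layout viewer");
        clive.adapters.add("PC with mouse and keyboard");
        snapshot.settings.add(sally);
        snapshot.settings.add(clive);
        snapshot.sharedData.add("Furniture layout");        // the "=" shared store of figure 5
        System.out.println(snapshot.settings.size() + " settings sharing " + snapshot.sharedData);
    }
}

A sequence of such snapshots, one per scenario step, is then enough to express the transitions discussed above.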
Fig. 6. How scenarchitectures bridge design and implementation
Fiia diagrams are usable not just to clarify the system’s design, but also as runtime artifacts used in the system’s implementation. Through a model transform algorithm [25], Fiia diagrams are automatically transformed to distribution architectures, suitable for
direct execution. Both the Fiia model and the distribution model are maintained at runtime, and changes to either are automatically reflected in the other. This enhances the value of the Fiia diagrams, as they not only help clarify design, but provide a significant step towards implementation. ASUR and Fiia serve as strong examples of scenarchitectural styles, showing how scenarchitectures expressed using these styles are concrete, runtime, domain-specific and help lead to implementation via model transformation. In the following two sections, we explore how scenarchitectures support design through their close linkage to scenarios, and how scenarchitectures provide a bridge to implementation through model-based derivation of system architectures. We then provide an example showing how ASUR has been used in an end-to-end design and implementation process.
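The runtime coupling between the two models can be pictured with a small sketch. This is a deliberate simplification of our own and not Fiia's incremental algorithm [25]; it only illustrates the idea that an edit to either model is propagated to, and reflected in, the other, with a guard preventing the notifications from looping.

import java.util.function.Consumer;

// Simplified illustration of keeping two models consistent at runtime.
class MirroredModels {
    static class Model {
        String state = "";
        Consumer<String> onChange = s -> {};
        boolean updating = false;                   // guard against echo loops
        void edit(String newState) {
            if (updating) return;
            updating = true;
            state = newState;
            onChange.accept(newState);              // propagate to the peer model
            updating = false;
        }
    }

    public static void main(String[] args) {
        Model scenarchitecture = new Model();       // design-level view
        Model distribution = new Model();           // runtime architecture view
        scenarchitecture.onChange = s -> distribution.edit("derived from: " + s);
        distribution.onChange = s -> scenarchitecture.edit("reflects: " + s);

        scenarchitecture.edit("Clive joins on his smartphone");
        System.out.println(distribution.state);     // change flowed design -> runtime
        distribution.edit("node failure: tabletop disconnected");
        System.out.println(scenarchitecture.state); // change flowed runtime -> design
    }
}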
4 Scenarchitectures in the Design Process
As we saw in figure 1, Carroll has identified the ways in which scenarios provide a different perspective on systems than the traditional architectural view. We now expand this to figure 6 to show how scenarchitectures are compatible with scenario-level design and also serve as a bridge to implementation. As motivated by Hornecker [16], this approach of combining formal representations and design sessions is part of a larger design approach in which participants' creativity benefits from the generative power [22] of design models. As with scenarios (and unlike system architectures), scenarchitectures are concrete and focused on instances. This is the fundamental reason why they are an appropriate tool within scenario-based design. For a given step in a scenario, a corresponding scenarchitecture can be created, showing the system's components. This helps situate the scenario in terms of the system being manipulated. When a user action is described in a scenario, the action can be clarified through the scenarchitecture – it is possible to see the instruments that the user will be manipulating, and trace the effects of these manipulations through the system. This view complements that of the scenario itself, which better captures the user's intentions, deliberations and mood. Because they are at the same level, designers can use both forms of description. Elaboration of one artifact may expose open questions in the other, leading to a coevolutionary refinement of both. Meanwhile, scenarchitectures form a bridge between design and implementation. Scenarios are work-driven (expressing users' actions and intentions) while system architectures are technology-driven (illustrating the system's software and hardware components), and scenarchitectures have elements of both (showing the system's components and how the user interacts with them). Scenarchitectures typically start their life during brainstorming sessions as open-ended and fragmentary, and are later refined to the complete and exhaustive state necessary for model-based implementation. Using scenarchitectures during a design session thus supports progressive movement from a very partial description, such as those offered by scenarios, to a complete description of the solution, including implementation recommendations such as those included in architectural models.
Scenarchitectures are also refined to bridge the gap between informality and formality: their use is flexible enough to allow discussions among non-experts but they also conform to a formal meta-model. Finally, scenarchitectures constitute specified solutions. Each scenarchitecture diagram describes how the system will work and how the user will interact with it. Scenarchitectures are not limited to an envisioned use as in the case of scenarios. We have identified four ways in which scenarchitectures help in the design process: supporting exploration and better understanding of tasks; refining scenarios from abstract to concrete; illustrating alternative ways of carrying out a task, and illustrating the steps of a scenario. We now illustrate these using examples from ASUR and Fiia.
4.1 Support Exploration of Tasks
Scenarchitectures help designers explore users' tasks. They help in the refinement of both task and scenario descriptions by providing a different viewpoint on the interaction and by helping to clarify how the user interacts with the system itself. If we consider our furniture layout example, we might state the initial task as "find appropriate furniture and layout for an office". The Fiia scenarchitectural diagram of figure 5 encourages us to think about a range of questions related to the task:
• What are the roles of the different participants? Sally is trying to sell furniture, and therefore her task is ultimately to find a furniture configuration that Clive is willing to purchase. Meanwhile, Clive's ultimate task is to assess how well Sally's proposals will fit his office.
• How do the participants interact with the system? To support her sales role, Sally will propose ideas for furniture and how it will be laid out. She therefore needs an editor allowing her to quickly manipulate the furniture. The editor provides a top-down view, and is based on a touch-surface where furniture can be easily dragged and rotated. Meanwhile, Clive might have a harder time understanding the abstract top-down view, and therefore sees the furniture in 3D, as it will appear in the office. The Fiia diagram clarifies Clive's use of a standard PC, where a mouse and keyboard are used to navigate the 3D scene, while Sally uses a specialized tabletop computer to manipulate the furniture positions.
• How do the participants communicate with each other? The Fiia diagram clearly illustrates the participants' points of communication. The furniture layout data is shared, and therefore Clive's view is updated in real time in response to Sally's edits. Both Sally and Clive share a voice over IP connection allowing speech communication.
The process of creating the Fiia diagram helps to address these questions, simply because the diagram explicitly contains the participants’ settings, devices and communication modalities. Creating the diagram therefore raises questions that designers must consider, helping to refine understanding of the underlying task. Scenarchitectural diagrams are complementary to traditional task models. If the design process uses task models, formalized links can be established between them and scenarchitectures [8].
Fig. 7. Abstract ASUR description of the furniture layout (left) and concrete description of tangible interface (right)
4.2 Refine Scenarios from Abstract to Concrete
The design of an interactive system is often initially sketched in an abstract form, where details are left for later discussion. Scenarchitectures support this transition from abstract to concrete form. This allows designers to work with a level of abstraction that is appropriate to the stage of the design process. We illustrate this refinement by example. Our example system allows the seller to move furniture items within a digital representation of the customer's office. As shown in figure 7 (left), ASUR can be used to provide a concise description of this very abstract definition of the system: ASUR entities represent the key concepts, and the ASUR arrows depict the communication channels required to perform the features. This abstract description, however, does not help designers understand the concrete specifics of interaction. The furnished room might, for example, be displayed on a screen or described via speech. Scenarchitectures help refine such abstract scenarios to concrete designs. One design option would be to provide a speech interface for the "select and move" action. As a result, the input sensor would have to be able to recognize speech. An alternative design would be to add a table in the seller's interaction space, as was seen in figure 4 (left). These items are then displayed by a device that is physically grouped (|=|) with the table and the sensor. Attributes of the communication channels between the table and sensor express that the touched position on the table is encoded in an IR Image (not shown), implying that the sensor must be an IR Camera. This concrete refinement is illustrated in figure 4 (right). This refinement within the scenarchitectural notation matches the needs of the design process. Designers can start with their choice of abstract or concrete design, and refine over the course of their design session, all within the same notation. The scenarchitectural diagram helps identify areas where decisions need to be made more concrete, helping to feed back into the design.
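The kind of inference described above can also be sketched in code. The candidate devices and attribute names below are hypothetical; the sketch only illustrates how an attribute on a communication channel (here, an "IR Image" encoding) narrows the set of concrete adapters that can realize an abstract input sensor.

import java.util.List;
import java.util.stream.Collectors;

// Toy illustration of refining an abstract input adapter into a concrete one:
// the data encoding carried by the channel restricts the candidate devices.
class AdapterRefinement {
    record Candidate(String device, String consumes) {}

    static List<String> refine(String channelEncoding, List<Candidate> candidates) {
        return candidates.stream()
                .filter(c -> c.consumes().equals(channelEncoding))
                .map(Candidate::device)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("IR camera", "IR Image"),
                new Candidate("Capacitive touch panel", "Touch Events"),
                new Candidate("Speech recognizer", "Audio"));
        // The table-to-sensor channel is annotated as carrying an IR Image,
        // so only the IR camera remains a valid refinement.
        System.out.println(refine("IR Image", candidates));
    }
}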
Fig. 8. Customer "Clive" moves to the office of his boss "Barry" and shows him his furniture purchase decisions
4.3 Illustrate Alternative Ways of Carrying out a Task An important part of ideation is rapid consideration of design alternatives. Scenarchitectural diagrams are high-level, tuned to a particular domain, and – as described in the last section – support refinement from abstract to concrete form. This makes them suitable for rapid sketching of alternative design ideas. Figure 4 showed one concrete design of the furniture layout editor. A second option (figure 7, right) uses a physical representation of every furniture item: a cube represents a table; a pyramid stands for a chair, and a cone for a lamp. Placing or dragging these tokens moves virtual furniture items. This physical–digital association is represented by an “X” symbol in the ASUR diagram. The localization of all physical tokens on the table is performed through an input adapter, still tightly coupled with the projector and physical table. To move from the digital tabletop solution (figure 4) to this tangible tabletop solution, the scenarchitectural diagram remains unchanged in its treatment of the digital world. The physical area, however, has been reworked to introduce new elements and correctly insert them in the interaction process. This example illustrates how scenarchitectures support the brainstorming of different ideas within the design process, helping to quickly illustrate the system perspective of different design ideas. 4.4 Illustrate Scenario Steps A scenario may involve changes to the system’s configuration, for example in response to new devices becoming available, users’ changing location, or partial failure of the underlying infrastructure. Such changes can be described as part of a prose scenario, and can also be precisely captured using a scenarchitectural diagram.
As a second step in our furniture layout scenario, Clive finishes his session with Sally, and wishes to discuss the results with his boss "Barry". He walks to Barry's office, and uses his Smartphone to show the resulting furniture layout. On the smaller-screen device, he uses a top-down view (similar to the one Sally used on her tabletop display). Using dragging and zooming operations, he explains the furniture purchase proposal to Barry. Figure 8 shows the Fiia diagram capturing this scenario step. This example shows how scenarchitectural diagrams can capture the steps involved in a scenario. In this case, the following has changed:
• The participants: Sally has left the collaboration, while Barry has joined;
• The participants' settings: the collaboration is now co-located (in Barry's office) rather than distributed;
• The participants' devices: instead of the PC and tabletop, the participants are now using a Smartphone;
• The task: the task has moved from finding a furniture layout to trying to convince the boss to release funds to proceed.
Real collaborative work typically involves such changes in participants, tasks, locations, devices and collaboration style. Scenarios are an excellent tool for documenting such changes. Because they express particular instances, scenarchitectures are suitable for capturing the steps identified in a scenario. They add to the information in the scenario by explicitly showing changes in participants, settings and devices in response to changes in the task.
4.5 Summing Up: Fitting Scenarchitectures into the Design Process
In this section, we have shown how scenarchitectures can contribute to the design of richly interactive systems such as adaptive groupware and mixed-interactive systems. We have discussed how scenarchitectures can help refine the designers' understanding of tasks, how they can help transition from abstract to concrete designs, how they can help enumerate design choices, and how they can help show transitions in users' tasks, devices and locations. By supporting these activities, scenarchitectures can be refined together with other design artifacts such as task descriptions and textual scenarios. Scenarchitectures are better suited to this role than traditional software architectures due to their high level of abstraction and their user focus. Returning to the table of figure 6, a fundamental property of scenarchitectures is that, like scenarios, they are concrete and focus on particular instances. This allows direct matching of system entities with the activities described in scenarios. In effect, the steps of a textual scenario act as captions describing the situations diagrammed in a scenarchitecture. Scenarchitectural diagrams also bridge gaps in formality and rigor. Diagrams can be informally sketched on a whiteboard or paper. As part of the process of moving to an implementation, they can later be refined to clean up their syntax or add detail. More fundamentally, moving from a scenarchitecture to a concrete implementation requires movement from the scenarchitecture's concrete and instance-focused representation to the software architecture's abstract and generic representation. As we will discuss, model transformation techniques can partially (or fully) automate this
process. We argue that this automation is crucial to the practicality of the scenarchitectural approach, providing developers with a concrete implementation benefit in addition to the less tangible benefit of an improved design process.
5 Deriving Implementations from Scenarchitectures
The key to the practical use of scenarchitectures is the ability to automatically derive all or part of the system architecture from the scenarchitectural descriptions (figure 2). This provides a seamless transition from the world of design to the world of implementation. It also adds significant value to the scenarchitectures themselves, as they not only aid in design, but also provide a gateway to implementation. The generation of system architectures from scenarchitectures is a model-transformation problem [20]. How this is done in ASUR and Fiia is described in detail elsewhere [14,25], but we briefly summarize their approaches. In Fiia, the system architecture contains the same components as the Fiia diagrams [24]. Adapters are replaced by components that interface with the specified devices. Connectors are replaced with system-level components that provide network endpoints, caches, broadcasting facilities, and concurrency-control and consistency maintenance managers. The Fiia runtime is responsible for dynamically modifying this system architecture in response to changes from one Fiia diagram to another (as described in section 4.4) [25]. Fiia produces only architectures, not complete programs; developers are responsible for programming the internals of the components in their Fiia diagrams. The Fiia runtime architectures are highly performant, executing faster than architectures programmed in native code [24]. ASUR's Guide-Me tool generates the part of the software architecture that is responsible for interaction [14]. Developers are responsible for integrating this sub-architecture with their application. Guide-Me helps developers choose between different interaction technologies, and automatically generates device-handling code over existing middleware. Interaction sub-architectures can be generated for each situation identified in an ASUR diagram, and can be manually linked. As these two examples suggest, there is a significant tradeoff space in how scenarchitectures can be transformed to system architectures. ASUR and Fiia's different points in this tradeoff space help to illustrate how different implementation choices are appropriate to different kinds of scenarchitectural style. We now enumerate the axes of this tradeoff space:
• Development-time versus runtime models: The transformation from a source to a target model may be a static process, where the scenarchitectures are used to create a system architecture. Alternatively, the transformation may be dynamically performed at runtime. ASUR's Guide-Me tool takes the static approach, with the advantage that it is considerably easier to implement [15]. This allows Guide-Me to use standard model transformation tools based on Eclipse's ATL Transformation Language [18]. The dynamic approach, as used by Fiia, gives the additional flexibility of allowing the program to consult the state of the scenarchitecture as the program executes. This is necessary for adaptive groupware, where execution involves transition from one scenarchitecture to another. Runtime models have considerably stronger
performance requirements: it is acceptable for a static transformation to take minutes to execute, while a dynamic transformation must typically be executed in milliseconds in order to avoid a perceptible pause in the application's execution. Fiia uses a custom model-transformation algorithm to achieve runtime speeds in transformations [25].
• Ability to edit at both levels: Typical model transformation techniques are one-way only, allowing a target model to be generated from a source model. Bidirectional transformations allow either level to be edited, and changes in one to be reflected in the other. ASUR uses a one-way transformation, meaning that if the transformation is executed again, any changes that have been made in the system architecture must be manually integrated into the newly generated architecture. With Fiia, bi-directional changes are supported through a novel rewriting transformation algorithm [25], allowing changes in the system architecture (e.g., due to partial failure in the distributed system) to be reflected in the scenarchitecture.
• Degree to which implementation requires hand-coding to make it executable: The system architecture that results from the transformation must be populated with code to allow its execution. As discussed above, Fiia and ASUR require application components to be hand-coded.
• Ability of developer/end-user to influence transformation: When the transformation is non-deterministic (i.e., multiple target models could result for a single source model), the transformation engine may not pick the best result. Different approaches exist allowing the developer to influence the transformation. Fiia uses semantics-preserving annotations on the source model, while ASUR engages the user during the transformation process.
In sum, ASUR uses classic forward engineering using standard tools, while Fiia is based on incremental, bi-directional runtime adaptation of models. The differences between the model transformation approaches taken by ASUR and Fiia show that a wide range of techniques are appropriate for different classes of architectural style. Specifically, forward engineering is a good match for ASUR, since the required transformation is from a static diagram to a system architecture. Forward engineering allows the use of stock tools (ATL) and allows an interactive generation process. Fiia, on the other hand, relies on transitions between scenarchitectures at runtime, and needs to support bidirectional transformation. This requires a novel transformation algorithm that can perform bidirectional updates at runtime speed. The tradeoff space that we have identified above helps to understand this range of implementation techniques for different kinds of scenarchitectural style.
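As a rough illustration of the forward-engineering style, the following toy pass maps a scenarchitecture-like source model onto system-architecture elements, replacing adapters with device components and expanding connectors into networking infrastructure. The element kinds and rewrite rules are our own simplification; real tools such as Guide-Me or the Fiia runtime operate over full metamodels rather than strings.

import java.util.ArrayList;
import java.util.List;

// Toy forward transformation from a scenarchitecture-like source model to a
// system-architecture-like target model.
class ForwardTransform {
    record Element(String kind, String name) {}             // source-model element

    static List<String> transform(List<Element> source) {
        List<String> target = new ArrayList<>();
        for (Element e : source) {
            switch (e.kind()) {
                case "adapter"   -> target.add("DeviceComponent(" + e.name() + ")");
                case "connector" -> {                        // expand into infrastructure
                    target.add("NetworkEndpoint(" + e.name() + ")");
                    target.add("Cache(" + e.name() + ")");
                    target.add("ConcurrencyController(" + e.name() + ")");
                }
                default          -> target.add("CodeStub(" + e.name() + ")");
            }
        }
        return target;
    }

    public static void main(String[] args) {
        List<Element> scenarchitecture = List.of(
                new Element("adapter", "tabletop"),
                new Element("component", "2D layout editor"),
                new Element("connector", "shared furniture layout"));
        transform(scenarchitecture).forEach(System.out::println);
    }
}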
6 Example of Collaborative Design Using Scenarchitectures We have used scenarchitectures to help design numerous concrete systems, including the furniture layout application described earlier in this paper [24] and the Raptor tool for game sketching [23]. To more concretely illustrate the concepts presented in this paper, we now explore our use of scenarchitectures in the development of CladiBubble, an augmented reality exhibit for the Toulouse Museum of Natural
History. CladiBubble is a walk-up exhibit that helps explain the cladistics biological classification system to museum visitors. Visitors are invited to manipulate a three-dimensional cladogram (i.e., tree of life) in order to modify the tree and move the species spatially close to each other. Then visitors are asked to group species (e.g. tree leaves) using a digital bubble (figure 9).
Fig. 9. Screenshots of CladiBubble: a 3D cladogram and its digital bubble for grouping species
Fig. 10. The CladiBubble design team collaboratively creating an ASUR model
The CladiBubble design team was multi-disciplinary, involving a museologist, computer scientists, ergonomists and domain experts. Each design session was two hours in length. During the session, we used scenarchitectures in the ASUR style to augment traditional idea capture devices (i.e., post-it notes, whiteboards and mockups). The core of the design session was an unconstrained iterative process of idea generation [22], diagramming the idea as an ASUR scenarchitecture, and exploring potential variations. Concretely, when a participant suggested an idea, he/she sketched it using ASUR (figure 10). Then the facilitator pointed out possible variations of relevant characteristics, helping the participants identify a new range of possibilities. We observed that elementary manipulations of a scenarchitecture diagram during design sessions helped designers to systematically explore the design space, as suggested in section 4.3. For example, an early design for manipulating the digital species of the tree required users to manipulate blocks, each representing a species. An alternative design suggestion was to add a physical element to the grouping task (with the bubble). The designers created a physical balloon whose behavior is directly linked
to the grouping bubble behavior (as illustrated by ASUR’s mixed proximity “X” group). In both of these examples, scenarchitectural diagrams were helpful in exploring the design space and prompting designers to envision new possibilities. We also observed that scenarchitectures eased the refinement of abstract system descriptions into concrete solutions (as suggested in section 4.2). For example, during the session a participant suggested the use of a remote pointer to select the species in the tree. He then added an input sensor to link his physical remote controller with the digital entities. At that point, the domain expert advised the group that wires were a poor choice in a museum context. As a result, the facilitator suggested the use of the light to convey information, leading the input adapter to be refined to an infrared camera. In this example, a domain constraint (i.e., wire-free environment) and an ASUR attribute drove participants to generate a new interactive solution. As a result, the scenarchitecture helped to refine an abstract attribute to a concrete solution. By the end of the session, 17 different scenarchitectures had been generated. These models captured a range of generic interaction techniques, which we classified as: 1) remote pointers, 2) gesture and pen based interaction, 3) tactile tabletops and 4) tangible interfaces. For each of these generic techniques, participants explored several metaphors. For example, to inflate the bubble two metaphors were proposed: a physical inflatable balloon whose behavior was analogous to the digital bubble, and a pump for inflating the digital bubble. This example of the design of CladiBubble illustrates how scenarchitectures assist in the exploration of the design space, and particularly highlights their usefulness in refinement of abstract attributes to concrete solutions, as well as showing their capacity for illustrating different ways of carrying out a defined task.
7 Discussion
This paper has introduced the concept of scenarchitectures as a notation bridging design and implementation. We have shown how designers can use scenarchitectures during design sessions, and how the resulting designs can be used as input to a model transformation process, helping to create a system architecture. We have discussed the space of model-transformation techniques that help in the derivation of system architectures from scenarchitectural models. We have illustrated how two existing models fit with the concepts, roles and impacts of a scenarchitecture on the development process of advanced interactive systems – Fiia for adaptive groupware, and ASUR for mixed-interactive systems. We have argued that scenarchitectural styles are by their nature domain-specific. An obvious question is whether it would be possible to create a single notation supporting interactive systems in general. Three points argue against this approach:
• To be usable in design sessions, scenarchitectural styles should be simple, easily taught, easily used, and should address the specific issues that commonly occur in the design of a particular class of interactive system. For example, ASUR focuses on the design of the adapters that people use to interact with mixed-interactive systems, while Fiia focuses on how people interact with each other through shared artifacts and shared communication modalities. Attempting to address too many interactional issues in a single notation would conflict with the need for notational simplicity.
• As discussed in section 5, ASUR and Fiia use very different approaches to generate system architectures. ASUR is based on static transformation involving interactive guidance from the developer. Fiia uses dynamic, bidirectional transformation at runtime to support real-time adaptation. A unified notation would make it difficult to support such varied implementation techniques.
• Finally, as new styles of interaction are developed, it seems unreasonable to expect a single notation to anticipate all possible styles.
Relatedly, we might speculate about what kinds of additional scenarchitectural styles could be interesting. A further example from the literature is MIM, which allows the development of mixed-interactive systems [10]. Another candidate where domain-specific notations might be of use is interfaces for the otherly-abled (allowing exploration of how an application might present interaction possibilities to people with a range of disabilities). Much future research is possible to find further domains and explore scenarchitectural notations aiding in their design. Another area for further exploration is what form of tools might be helpful to support the documentation of scenarchitectures during design sessions. As described in section 6, to date we have found the most practical approach to be traditional pen and paper, aided by post-it notes. This has the disadvantage, however, that designers are given no aid with the syntax of the notation, and that editing can become difficult as the design advances. Additionally, a facilitator must translate the scenarchitectures into an electronic notation following the design session. We hypothesize that it might be possible to build an interface based on a touch-sensitive table that balances the need for smooth and simple interaction with the benefits of a digital editor.
8 Conclusions
In this paper, we have introduced the concept of scenarchitectures, a class of design notations that bridge the scenario and architecture perspectives on interactive systems. We have shown how scenarchitectures help capture a high-level view of the system being designed, and show the system's interaction affordances. They complement other design notations (such as scenarios and UI mockups) by bringing the system perspective into design, and by helping to expose domain-specific concepts. We have shown several ways in which scenarchitectures help in the design process, including supporting exploration of the users' tasks, illustrating different ways of carrying out a task, refining scenarios from abstract to concrete, and showing the steps of a scenario. A concrete example of the use of scenarchitectures in the development of a museum installation helped demonstrate these concepts. Scenarchitectures bridge design and implementation by affording model-based generation of system architectures. In the two example styles we considered (ASUR and Fiia), these system architectures are represented as executable code. We explored the design space of model-based techniques for transforming scenarchitectures to system architectures, and detailed the two very different techniques used in our example notations.
Acknowledgements. This work was partially funded by a France-Canada research grant from the French Embassy in Canada, and by the Natural Sciences and Engineering Research Council of Canada.
References
1. Beaudouin-Lafon, M.: Designing interaction, not interfaces. In: AVI, pp. 15–22. ACM, New York (2004)
2. Bass, L., Clements, P., Kazman, R.: Software Architecture in Practice, 2nd edn. Addison-Wesley Professional, Reading (2003)
3. Bødker, S.: Scenarios in user-centred design – setting the stage for reflection and action. Interacting with Computers 13, 61–75 (2000)
4. Brown, J., Marshall, S.: Sharing human-computer interaction and software engineering design artifacts. In: Computer Human Interaction Conference, pp. 53–60 (1998)
5. Calvary, G., Coutaz, J., Thevenin, D., Limbourg, Q., Bouillon, L., Vanderdonckt, J.: A Unifying Reference Framework for Multi-Target User Interfaces. Interacting with Computers 15(3), 289–308 (2003)
6. Carroll, J.M.: Scenario-based design: envisioning work and technology in system development. John Wiley & Sons Inc., Chichester (1995)
7. Carroll, J.M.: Making use: scenario-based design of human-computer interactions. MIT Press, Cambridge (2000)
8. Charfi, S., Dubois, E., Feige, U.: Articulating interaction and task models for the design of advanced interactive systems. In: Winckler, M., Johnson, H. (eds.) TAMODIA 2007. LNCS, vol. 4849, pp. 70–83. Springer, Heidelberg (2007)
9. Coutaz, J.: PAC, an implementation model for dialog design. In: INTERACT, pp. 431–436 (1987)
10. Coutrix, C., Nigay, L.: Balancing physical and digital properties in mixed objects. In: AVI, pp. 305–308 (2008)
11. De Alwis, B., Gutwin, C., Greenberg, S.: GT/SD: performance and simplicity in a groupware toolkit. In: EICS, pp. 265–274 (2009)
12. France, R., Rumpe, B.: Model-driven development of complex software: A research roadmap. In: Future of Software Engineering, pp. 37–54 (2007)
13. Graham, T.C.N., Roberts, W.: Toward Quality-Driven Development of 3D Computer Games. In: DSV-IS, pp. 248–261 (2006)
14. Gauffre, G., Dubois, E.: Taking advantage of model-driven engineering foundations for mixed interaction design. In: Hussmann, H., Meixner, G., Zuehlke, D. (eds.) Model-Driven Development of Advanced User Interfaces. Studies in Computational Intelligence, vol. 340, pp. 219–240. Springer, Heidelberg (2011)
15. Gauffre, G., Charfi, S., Bortolaso, C., Bach, C., Dubois, D.: Developing Mixed Interactive Systems: a Model Based Process for Generating and Managing Design Solutions. In: The Engineering of Mixed Reality Systems, pp. 183–208 (2010)
16. Hornecker, E.: Creative idea exploration within the structure of a guiding framework: the card brainstorming game. In: TEI 2010, pp. 101–108. ACM, New York (2010)
17. Krasner, G.E., Pope, S.T.: A cookbook for using the Model-View-Controller user interface paradigm in Smalltalk-80. JOOP 1(3), 26–49 (1998)
18. Jouault, F., Allilaire, F., Bezivin, J., Kurtev, I.: ATL: A model transformation tool. Science of Computer Programming 72(1-2), 31–39 (2008)
19. McKay, W., Ratzer, A.V., Janacek, P.: Video Artifacts for Design: Bridging the Gap Between Abstraction and Detail. In: DIS 2000, pp. 72–82 (2000)
20. Sendall, S., Kozaczynski, W.: Model transformation: heart and soul of model-driven software development. IEEE Software 20(5), 42–45 (2003)
21. Smith, J.D., Graham, T.C.N.: Raptor: Sketching Games with a Tabletop Computer. In: FuturePlay, pp. 191–198 (2010)
22. Wilson, C.E.: Brainstorming pitfalls and best practices. Interactions 13(5), 50–63 (2006)
23. Wolfe, C., Smith, J.D., Graham, T.C.N.: A Model-Based Approach to Engineering Collaborative Augmented Reality. In: Engineering Mixed Reality (2011)
24. Wolfe, C., Graham, T.C.N., Phillips, W.G., Roy, B.: Fiia: User-Centered Development of Adaptive Groupware Systems. In: EICS, pp. 275–284 (2009)
25. Wolfe, C., Graham, T.C.N., Phillips, W.G.: An Incremental Algorithm for High-Performance Runtime Model Consistency. In: MODELS, pp. 357–371 (2009)
Pattern Tool Support to Guide Interface Design
Russell Beale and Behzad Bordbar
School of Computer Science, University of Birmingham
Edgbaston, Birmingham, B15 2TT, UK
{R.Beale,B.Bordbar}@cs.bham.ac.uk
Abstract. Design patterns have proved very helpful in encapsulating the knowledge required for solving design related problems, and have found their way into the CHI domain. Many interface patterns can be formalised and expressed via UML models, which provides the opportunity to incorporate such patterns into CASE tools in order to assist user interface designers. This paper presents an implemented tool-based approach for the discovery of an appropriate set of design patterns applicable to a high-level model of the system. The tool accepts a UML model of the system and presents a set of interface design patterns that can be used to create an effective implementation. The tool is aimed at providing designers with guidance as to which successful design approaches are potentially appropriate for a new interactive system, acting as a supportive aid to the design process. The use of high-level modelling approaches allows designers to focus on the interactions and nature of their systems, rather than on the technologically-driven details. Keywords: UML, Design Patterns, Modelling, Tools.
input devices, new functionalities such as digital cameras, and so on) and providing a consistent, coherent design solution in such a rapidly moving environment is a major challenge. These difficulties are worsened over the course of rapid cycles of software production - the user can be forgotten as the technology advances and all too often new features appear in originally well-designed systems that are unnecessary, unwanted, or simply inaccessible [2]. Even when well designed initially, systems can evolve away from users’ needs. Ideally, what is needed is a high level approach to designing systems that captures the requirements of the user but which is not directly linked to the underlying technologies. We also want to be able to do our interaction and user experience design and capture the resulting system in a way that enables us to instantiate it in alternative technologies, and which provides us with a framework to refer back to when revisiting the design, as we will inevitably do when new technologies come along. Essentially, we need to abstract many aspects of the design away from the low-level details, in order to allow us to reuse successful fundamentals in changing implementations. This paper, building on previous work in the formal modelling of HCI patterns [3], reports on the design of a tool that accepts a UML model of the system and then identifies appropriate HCI design patterns suitable for the implementation/refinement of different parts of the model. This aims to assist the designer of such system by enhancing the functionality of existing UML tools. The paper also reports and evaluates a prototype tool developed on the basis of the presented method.
2 Patterns in HCI
Design patterns build upon Alexander's pioneering work in architecture [4-6], in which he introduced patterns as an approach to framing and discussing architectural problems and possible solutions. Later the "Gang of Four" [7] popularised this approach for software development. They have been embraced by parts of the HCI community as an approach to design [8-10]. A pattern describes a recurring problem that occurs in a given context, and based on a set of guiding principles, suggests a solution. The solution is usually a simple mechanism: a certain style of layout, a particular presentation of information; techniques that work together to resolve the problem identified in the pattern. Patterns are useful because they document simple mechanisms that work; provide a common vocabulary and taxonomy for designers, developers and architects; enable solutions to be described concisely as collections of patterns; and enable reuse of architecture, design and implementation decisions. Patterns are useful as they allow us to capture the salient features of a design, and the accompanying issues associated with that choice. They give us a way of sharing concepts, an approach to discussing different options, and a repository of design practices. As well as in interface design for software, HCI design patterns have been extensively used in website development, and the consistency now observed in navigation bars, side menus and so on is down to the adoption of common approaches to solving the navigation and maneuvering issues encountered on the web; these have been encapsulated into sets of design patterns, e.g. Duyne, Landay and Hong [10]. Van Welie and van der Veer [11] provide a detailed discussion of HCI design patterns and their formalization into pattern languages. Tidwell [12] and the accompanying site
(www.designinginterfaces.com) provide a fairly comprehensive collection of design patterns describing: "what the pattern consists of", "when the pattern should be used" and "why the pattern is useful". There is also an illustration of "how the pattern can be implemented" and examples of user interfaces from real applications that implement each design pattern. At a high level, patterns are therefore highly useful constructs. Design patterns are not perfect, however. There is no commonly accepted pattern language, and those that exist provide a framework for textual descriptions. Design patterns are usually expressed as semi-structured free-form text: they have a regularised layout of name, uses, problems and so on, with the details of the patterns described in natural language [13]. Efforts are ongoing to devise a standard XML expression (e.g. CHI 2003 workshop, 2004 workshop) [9], which will provide a framework for effective sharing and exchange of HCI patterns. Being essentially textual, design patterns rely on large quantities of real-world knowledge to interpret and understand them, are not machine-understandable, and so are hard to apply without a great deal of craft knowledge. In itself, this does not limit the scope of patterns for describing solutions to problems, but it does make accessing relevant patterns particularly difficult. Pattern libraries have essentially to be browsed manually, with the rapid identification of a suitable pattern usually coming about only by the designer having extensive familiarity with the complete set. As different authors often create patterns, it becomes hard for any substantial pattern library to identify patterns that are in fact identical, or at least very similar. The textual nature also makes the different descriptions used by different designers more critical, in that this can confuse the search for a relevant solution to a problem. Some designers would argue that this is an inherent advantage of patterns: they provide the core of a solution to a problem but in a way that allows you to use it many times without ever repeating the exact same result, enabling them to express their creativity and reflect their specific understanding of the problems of the user. But for others less experienced, not being able to access individual solutions to problems without understanding the whole pattern set is too much of a hurdle to overcome. A pattern language offers a relatively familiar structure and so suggested solutions can be understood more quickly. But being able to identify the set of candidate patterns, to find related ones, to understand the constraints imposed by one choice over another, is an unsupported, difficult task. We are not advocating an approach that allows uncritical application of patterns to problems, but do believe that designers are in increasing need of support in their tasks. One of the successes of the Gang of Four's book [14] was to offer standardised approaches to common software engineering problems in a way that programmers could easily select from, comprehend, use and adapt, without them having to have detailed craft knowledge of all the other patterns as well. Clearly, expert designers who understand the details will produce consistently better solutions than those who do not, but such is the pace of technological development that many of our systems are being designed by non-experts, who may well benefit from any support we can give them.
2.1 Sketch of the Solution
The aim of this paper is to describe a method for the discovery of design patterns which are applicable to UML models of the system. Identifying patterns is the first step in automating the implementation of user interface design patterns. Figure 1 describes
such a tool. The diagram indicates that the tool receives a system design model as its input and a set of templates modelling user interface design patterns in the UML. The tool compares a number of HCI patterns with the model and returns a report highlighting where the particular design pattern matches the input model. In effect the tool proposes a set of suitable patterns for a portion of the model. The designer can decide on a suitable pattern from the presented list. Existing UML tools can be adopted to apply such patterns to a design and create an implementation. In this paper we solely focus on identifying patterns and do not deal with the automated implementation of chosen patterns, say in a programming language such as Java, an activity which has its own challenges [15].
Fig. 1. High-level design for a CASE tool to match HCI patterns to a system design
To create such a pattern-matching tool, two steps are involved. First, HCI patterns, which are described in an informal manner, say in Tidwell [12], must be modelled in a machine-readable form. Secondly, the pattern-matching tool must implement an algorithm that probes the model of the system to discover the parts of the model corresponding to each pattern. These two steps are described in the next two sections.
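A minimal skeleton for the matching step might look as follows. The interfaces and names are hypothetical and are not taken from the prototype described later; the sketch only shows the overall shape: a system model and a library of pattern templates go in, and a report of template/fragment matches comes out for the designer to review.

import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton of the pattern-matching tool.
class PatternAdvisor {
    interface SystemModel { List<String> fragments(); }      // portions of the UML model
    interface PatternTemplate {
        String name();
        boolean matches(String fragment);                     // structural test
    }
    record Match(String pattern, String fragment) {}

    static List<Match> advise(SystemModel model, List<PatternTemplate> library) {
        List<Match> report = new ArrayList<>();
        for (String fragment : model.fragments())
            for (PatternTemplate p : library)
                if (p.matches(fragment))
                    report.add(new Match(p.name(), fragment));
        return report;                                        // the designer chooses from this
    }

    public static void main(String[] args) {
        SystemModel model = () -> List.of("Mailbox->MessageHeader->Message", "Site->Division");
        PatternTemplate opd = new PatternTemplate() {
            public String name() { return "Overview Plus Detail"; }
            public boolean matches(String f) { return f.split("->").length == 3; }
        };
        advise(model, List.of(opd)).forEach(System.out::println);
    }
}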
3 Formalizing HCI Patterns in UML
Formal modelling of design patterns [7] via UML has received considerable attention. Sunye et al. [16] adopt a meta-programming approach, applying design patterns by means of successive transformation steps, though they do not address the issue of interaction and focus on static aspects. [17] and [18] address both static and interaction aspects of the specification of design patterns. [19] and [20] both make use of UML class diagrams and OCL statements and suggest extensions of the UML via a profile for the modelling of the patterns, and [21] studies the composition of design patterns. Our recent work deals exclusively with modelling of HCI patterns via the UML and describes models for a number of design patterns. To describe the process of modelling, we shall present two related design patterns and their formal modelling (taken from [3]).
3.1 Overview Plus Detail Tidwell [12] describes the Overview Plus Details (OPD) as follows: “Use when: You need to present a large amount of content - messages in a mailbox, sections of a website, frames of a video - that is too big, complex, or dynamic to show in a simple linear form. You want the user to see the overall structure of the content; you also want the user to traverse the content at their own pace, in an order of their choosing. Why: It's an age-old way of dealing with complexity: present a high-level view of what's going on, and let the user "drill down" from that view into the details … the overview can serve as a "You are here" sign. A user can tell at a glance where they are in the larger context…. How: The overview panel serves as a selectable index or map. Put it on one side of the page. When the user selects an element in it, details about that element - text, images, data, controls, etc. - appear on the other side. …” Examples of this can be seen in the Windows Explorer and typical email clients, shown in Figure 2. The Overview is shown in the pane on the left in Figure 2: Folders in Explorer, Mail folders in Mozilla, with details about the selected item on the right – files and folders in Explorer’s case, an email message header in Mozilla’s. In the Mozilla example, there is also a pane below the detail pane; the overview detail pattern is applied again to the contents of the mailbox, with a set of message headers in the top as the overview and the detail given in the pane below them. Tidwell’s description of the design pattern is typical of many – a flowing, clear, description in natural language that conveys the essence of both the circumstances under which it is appropriate and what it actually means for the interface. The full pattern also identifies other patterns that are related. The problem is that using these patterns requires very good craft knowledge, as there is no tool support or effective way of browsing or searching them. This makes sharing knowledge with up and coming designers more difficult. Common examples of the application of Overview Plus Details are the file browsers discussed earlier, but here we shall take an email client as our example. It keeps a selectable list of email in an overview pane and by clicking on each email the contents are shown in another pane.
Fig. 2. Overview Detail design pattern example - Mozilla
Figure 3 formalizes the overview plus details as a class diagram; each Window includes two panes: one pane is for the overview, which presents a high-level view of the data, and the second pane is for the detail, which depicts the details related to the high-level view. The overview is in correspondence with only one detail; this is depicted via a unary association connecting the two.
[Class diagram: a Window displays two Panes; the Overview pane displays one or more Selectable Index elements (activated : bool, select() : void); the Detail pane (load(item : Item) : bool) displays a single Item, and each Selectable Index is associated with exactly one Item.]
Fig. 3. High level representation of Overview Plus Details
Figure 3 depicts only a static view of the Overview Plus Detail. To complete the specification of the pattern, we have to specify the dynamic aspect of the pattern by specifying the interaction between the elements. To explain this, consider the mailer example. If the user select()s a Selectable Index, e.g. a mail header, its state is changed on the GUI: for example, the email within the Overview window gets highlighted. This results in the change activated = true. As a result, the corresponding Item is downloaded to the Detail (invoking load()). In case of success in displaying the item, true is returned; otherwise false is returned. As a result, as specified in load(item:Item):bool, load accepts an object item of type Item and returns a Boolean (bool). Such interaction aspects of the system can be represented via a sequence diagram [22] or an OCL statement. The sequence diagram, which represents a possible interaction of the metamodel elements, is shown in Figure 4.
[Sequence diagram: lifelines :Overview, :Selectable Index, :Item and :Detail; select() is invoked on the Selectable Index (setting activated := true), the associated item is retrieved, and load() is invoked on the Detail.]
Fig. 4. Sequence Diagram representing an interaction in the Overview-Detail pattern
For those unfamiliar with sequence diagrams, the diagram is read as follows. Semantically, the colon and underline around some text (:___) identifies an object. The vertical dotted line represents that object over time. The thin vertical bars represent the object within the system, with the vertical bar representing the lifeline of the object. The 'invokes' action is denoted by the horizontal arrow, and the label is the message or invocation of message or creation – i.e. the method call. The dotted line is the return of the message (the passing back of control). Progress of time moves from top to bottom. This diagram expresses that the Overview has a Selectable Index which is shown if selected (activated), and which causes the Detail to be loaded, in that order. We can also use OCL to represent the interaction between the metamodel elements of Figure 3. The OCL representation consists of three main parts, representing the expected behaviour of each method in the context of its related model element. OCL gives us a more precise explanation; it is a logical formalism that can be automatically transformed into code and incorporated into a software tool. The OCL is presented here for completeness, as shown in Figure 5, though for general explanation and usage the sequence diagram captures the main elements perfectly adequately. Comments in the OCL are prefixed by --.

context Overview::select()
-- There are SelectableIndex items to select
pre selectConstraint : self.displays -> size() > 0
post selectConstraint_1 :
  -- There is one item selected from the collection of "displays"
  self.displays -> select(s:SelectableIndex | s.activated = true) -> size() = 1
  and
  -- The selected item gets loaded in the Detail window. The select
  -- operation returns a Set, so we have to convert it to a sequence and
  -- retrieve the first item (which is the only item of the set and the
  -- selected one) so as to load it to the Detail window of the overview.
  -- That way we also specify that the Item related to the SelectableIndex
  -- is the same as the item shown by the Detail window.
  ( let selectedItem: Item =
      ((self.displays -> select(s:SelectableIndex | s.activated = true))
        ->asSequence->first).item
    in self.detail.load(selectedItem) )

context Overview
inv itemsSelected :
  -- There is at most one item selected at a time
  self.displays -> select(s:SelectableIndex | s.activated = true) -> size() = 0 or
  self.displays -> select(s:SelectableIndex | s.activated = true) -> size() = 1

context Detail::load(item: Item): boolean
post: if self.displays = self.displays@pre->including(item)
      then return = true
      else return = false
      endif
Fig. 5. OCL statement capturing Overview Plus Detail pattern interactions
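To make the formalized behaviour tangible, the following sketch (our own illustration, not part of the published models) implements the interaction that Figures 3 to 5 specify: selecting an index entry activates it, deactivates any other entry, and loads the corresponding item into the detail pane.

import java.util.ArrayList;
import java.util.List;

// Illustrative implementation of the Overview Plus Detail behaviour: at most
// one SelectableIndex is activated, and selecting one loads its Item into the
// Detail pane, mirroring the sequence diagram and the OCL above.
class OverviewPlusDetail {
    static class Item { final String content; Item(String c) { content = c; } }

    static class SelectableIndex {
        boolean activated = false;
        final Item item;
        SelectableIndex(Item item) { this.item = item; }
    }

    static class Detail {
        Item displayed;
        boolean load(Item item) { displayed = item; return true; }   // true on success
    }

    static class Overview {
        final List<SelectableIndex> displays = new ArrayList<>();
        final Detail detail = new Detail();
        void select(SelectableIndex chosen) {
            for (SelectableIndex s : displays) s.activated = false;  // invariant: at most one active
            chosen.activated = true;
            detail.load(chosen.item);                                // as in Figure 4
        }
    }

    public static void main(String[] args) {
        Overview mailbox = new Overview();
        SelectableIndex header = new SelectableIndex(new Item("Dear Clive, ..."));
        mailbox.displays.add(header);
        mailbox.select(header);
        System.out.println(mailbox.detail.displayed.content);
    }
}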
3.2 Modelling One Window Drilldown Having modelled one pattern, and shown that the concept works, we can extend this to look at a related pattern. One Window Drilldown (OWD) is an alternative to OPD. It is often used for the user interface of a device with tight space restrictions, such as a
handheld device such as a mobile phone. OWD can also be used in building interfaces for applications running on desktops or laptop screens, if complexity is to be avoided. In particular, if the user is not used to computers, they might have little patience with (or understanding of) having many application windows open at once. Users of information kiosks fall into this category; as do novice PC users. Figure 6 depicts the metamodel for the OWD. There is a single Pane to which a Current and Next data are loaded. On the selection of an item from the Selectable Index, the corresponding Item is loaded as the Next pane.
Fig. 6. High level model for specification of One Window Drilldown
To ensure that in the above model there is only one of Current or Next at a time, the following OCL constraint is added.

context Pane
invariant:
  -- There is either a current or a next item (or both);
  -- the if statement takes out the "both" possibility
  (self.current -> size() = 1 or self.next -> size() = 1)
  and
  (if self.current -> size() = 1
   then self.next -> size() = 0
   else self.next -> size() = 1
   endif)
This essentially says that, in the diagram, there could be 0 or 1 Current screen and 0 or 1 Next screen – and we can only display one at a time, hence the need for the constraint. The behavioural model of the OWD is exactly the same as in Figure 4, which we would expect since the interaction is very much the same. The OCL statement is essentially the same as well (with minor variations, not presented here). 3.3 Extensions to other Patterns Clearly, there are many more HCI design patterns than Overview Plus Detail and One-Window Drilldown. However, these two patterns show that we are able to capture both structural and behavioural aspects of the pattern, allowing us to represent the essence of the interaction formally. This ensures that this approach is general, able to capture salient aspects of other HCI patterns, for both existing patterns, and for new ones. This is critically important, for if the approach only captured one or other
aspect of the system, it would not allow us to apply it more widely. Three further patterns are given, very briefly, to show the general applicability of the approach (taken from [12]). 3.3.1 Card Stack The card stack interface is typically used when there are multiple pages of information to display that can be segmented into a relatively small number of meaningful categories. The meta-representation of the pattern is more or less the same as Overview Plus Details (OPD), shown in Figure 7.
Fig. 7. Meta-representation of card stack pattern
The data model of this is very similar to the data model of Overview Plus Detail, except for the following two points: the overview part in Card Stack must be small, consisting of one or two words (or small icons), and secondly that it is better to have under six cards. The data model is shown in Figure 8: similar to OPD, information represents the data, label is the same as overview and detail is not changed.
Fig. 8. Card stack data model
3.3.2 Cascading List
The meta-representation, shown in Figure 9, consists of a number of panes, each positioned to the left of 0 or 1 other panes. Each pane contains 1 or more Items. Each Item is related to a list of Items lower in the hierarchy. By clicking on an Item, the method LoadAtRightPane() is invoked.
Fig. 9. Cascading list meta-representation
The data model represents a hierarchy of information, in which each item has potentially many more subitems (Figure 10).
Fig. 10. Cascading list data model
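To make this concrete, the sketch below (ours, with invented names rather than the pattern's published meta-model) shows the kind of recursive data the cascading list expects, where selecting an item exposes the sub-items that a LoadAtRightPane() call would display.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: a hierarchical item of the kind the cascading-list
 * pattern expects, where each item may own further sub-items.
 */
public class HierarchyItem {

    private final String label;
    private final List<HierarchyItem> subItems = new ArrayList<>();

    public HierarchyItem(String label) { this.label = label; }

    /** Add a child item one level lower in the hierarchy. */
    public HierarchyItem addSubItem(String childLabel) {
        HierarchyItem child = new HierarchyItem(childLabel);
        subItems.add(child);
        return child;
    }

    /** The items a LoadAtRightPane() invocation would show for this item. */
    public List<HierarchyItem> loadAtRightPane() {
        return List.copyOf(subItems);
    }

    public String label() { return label; }
}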
3.3.3 Top-Level Navigation
Commonly seen in websites and other internet-based applications, the top-level navigation model puts tabs or links across the header of the page to provide primary access to the key areas of the site or application. The conceptual model of related data is captured in Figure 11. The emphasis is on: "Application had a number of main divisions" [12].
Fig. 11. Data model for top-level navigation
Top-level navigation implements the above information as follows (Figure 12). The UI has a single Top-level navigation bar and a single Content Area. The top-level navigation bar has a clickable affordance. The method Click(), if invoked, loads the Division into the Content Area.
Fig. 12. Top-level navigation meta-representation
3.4 Issues and Limitations with the Representation
The UML representation captures the behavioural and structural characteristics of an interaction artifact that provides a solution to an interface design problem. One criticism of the approach is that it only captures this aspect, and does not include issues such as the problems of the user – represented as the 'Use When' approach in Tidwell's formalism – or the set of other possible actions that they could undertake.
Our design pattern is constructed on the basis that, if the data has a specific structure, and you wish to display it to the user, then you could use this particular pattern to guide your solution. It does not identify whether you actually do want to display this part of the system to the user. This represents a subtle change in thinking about design: rather than patterns representing solutions to questions of the sort "I want to show this information to the user; how can I best achieve this?", we have instead "I have this sort of information; if I wanted to show it to the user, this is how I could do it". The designer is then faced with a series of choices of what to show from a set of possibilities, derived from applying the patterns to the system model. Notice too that aesthetic aspects are not captured in the abstract representation – nor are things such as the 'clickable affordance' mentioned earlier. There are two possible resolutions to this issue. The first is to recognise that the approach is focused on helping designers identify a set of potential solutions to a particular problem, but is not dictating specifics, allowing them to focus their efforts on developing appropriate, effective implementations for the identified areas. The second, more software engineering oriented, approach is to capture some of these specific aspects in a Device Profile Model which provides details of the instantiations of UI elements for specific platforms and devices. Our approach is aimed at formalising, structuring and supporting the use of design knowledge and past effective solutions: it is not intended as a machine-based replacement for the activity of designers, but tries to support them by suggesting areas on which to focus their efforts and outlining solutions for them to consider and modify. We do not claim that our system automates interface design, or removes the need for subjective, aesthetic expertise – but it does guide the designer and narrow down the multiplicity of patterns to only a relevant subset. How we achieve this is discussed in the next section.
4 A Tool for Recognizing HCI Patterns
If we examine the UML meta-representation for, for example, Overview Plus Details, we can see that it comprises both graphical aspects – the window, containing 2 panes – and interactional aspects: when an item is selected in the overview pane, the corresponding detail is shown in the detail pane. If we concentrate on the form of data that could be represented in this way, it can be seen that a suitable data structure must have the following general shape: a data type A (overview) has a list of items B (Selectable Index) and each item B is in one-to-one correspondence with a data type C (Item). Figure 13 encapsulates this concept.
Fig. 13. Type of data suitable for Overview Plus Details
It is this concept that forms the basis of the software tool: these pattern signatures, based on their datatypes, offer us a way of identifying which parts of a UML model may be suitable for representation by a pattern, as long as we can match the datatype signatures of the actual UML model of the system against those of the prototypical system.
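As an illustration of what such a signature amounts to, the sketch below encodes the Figure 13 shape as multiplicity constraints over two associations; the class, record and method names are our own assumptions for the example, not the tool's actual API.

import java.util.List;

/**
 * Illustrative sketch (not the authors' tool code): the Overview Plus Details
 * "data signature" of Figure 13 expressed as multiplicity constraints. A
 * candidate triple (a, b, c) matches if a holds a 1..* collection of b
 * (the Selectable Index) and each b is in one-to-one correspondence with a c.
 */
public class OpdSignature {

    /** Minimal stand-in for a UML association-end multiplicity; upper = -1 means "*". */
    record Multiplicity(int lower, int upper) {
        boolean isOneToMany() { return lower == 1 && upper == -1; }
        boolean isExactlyOne() { return lower == 1 && upper == 1; }
    }

    /** Minimal stand-in for a directed UML association between two classes. */
    record Association(String source, String target, Multiplicity targetEnd) { }

    /** Does the triple (a, b, c) exhibit the OPD signature a -1..*-> b, b -1-> c? */
    static boolean matchesOpd(String a, String b, String c, List<Association> model) {
        boolean aHasManyB = model.stream().anyMatch(as ->
                as.source().equals(a) && as.target().equals(b) && as.targetEnd().isOneToMany());
        boolean bHasOneC = model.stream().anyMatch(as ->
                as.source().equals(b) && as.target().equals(c) && as.targetEnd().isExactlyOne());
        return aHasManyB && bHasOneC;
    }

    public static void main(String[] args) {
        // Fragment of the email client of Figure 14: email -> identifier -> Content.
        List<Association> model = List.of(
                new Association("email", "identifier", new Multiplicity(1, -1)),
                new Association("identifier", "Content", new Multiplicity(1, 1)));
        System.out.println(matchesOpd("email", "identifier", "Content", model)); // true
    }
}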
4.1 Example System: An Email Client
Consider Figure 14, which depicts a system model for an email client. This contains the usual things we would expect in such a client – multiple users, multiple mailboxes, folders that contain email, with messages listed by date, time, title, priority and sender, and so on. This model essentially states that we have a Mail Server that can have many users. Each user can have one or more mailboxes, and each mailbox has both an Inbox and a Local Folder. Each of these can be subdivided into more folders. These folders contain email, and each email comprises one or more identifiers with a variety of fields, and associated with each identified chunk is some content. By examining the data types within the model, we can see that the part of the model including email, identifier and Content matches Figure 13, i.e. Overview Plus Details, as marked by the letters A, B and C respectively.
Fig. 14. Simplified email client
Also matching are User:Mailbox:Inbox and User:Mailbox:Local Folder – all three of these element sets are therefore potentially amenable to display using the Overview Plus Details pattern, as shown in Figure 15. We have seen examples of the data modelled as Part A of Figure 15 as both OPD and OWD. For example, OPD is used for popular desktop-based mailers such as Mozilla and Outlook – the right-hand side of Figure 2 shows this, for example. OWD is used in shell-based mailers such as pine and in mailers on PDA and mobile phone displays, which cannot use OPD because of the size restrictions of the screen. Part B is also seen in mail systems in which users have many email accounts and collect them together on one email server: by representing this in the display the user is able to identify which mail account they want to deal with. Part C is similar to B except that the user can see the Local Folder that is associated with the selected Mailbox. For parts B and C it is also clear from the diagram that, whilst these patterns are appropriate to display this information, only part of the system will be shown to the user, since Mailbox does have both an Inbox and a Local Folder. As well as being able to correctly identify the parts of the system that can be modelled this way, it is equally important that the system does not incorrectly
identify other parts that in fact cannot. This is the case, as the data signature does not map onto other aspects of the system. For example, Mail Server – User, User – Mailbox: this has a different data signature (1:*, 1:1..*) and so is not identified as appropriate. We can undertake a thought experiment in which we try to envisage an interface to a mailer in which these parts did appear in an overview-detail representation: we would have to modify our conceptions of the system – there would have to be at least one user, rather than possibly none, and we would have to consider the collection of users' mailboxes as one thing – and if we did that we would be altering the model, giving it a signature of 1:1..*, 1:1, which would clearly then be suitable for OPD or OWD. These changes represent substantial changes in our conception of the mailer, and whilst it would be possible for a designer to want to represent this information in this manner, this approach ensures that the underlying model is modified to fit the new concepts introduced.
Fig. 15. Parts of the email client that can be mapped to the Overview Plus Details pattern
There are two major observations from this. Firstly, it is possible to identify data models similar to Parts A-C automatically. In other words, it is possible to programmatically scan a UML class diagram and identify all parts of the model that can be refined via OPD and OWD. Identifying such patterns paves the way to creating tools that can prompt a designer, suggesting the application of suitable design patterns for relevant parts of the system. Secondly, using this approach already allows us to automatically identify all the design patterns that are potentially appropriate for representing different elements of the system and, equally, to rule out patterns that are unsuitable. This represents a step forward for the use of patterns – we do not have to be familiar with all the patterns in a library, but can rely on them being indexed and identified by the types of data that they can represent.
4.2 The Design Pattern Tool
We have seen that the idea behind the method is to create a tool to identify fragments of a system's UML model that match a design pattern. More specifically, this means recognising instances of one of the diagrams (typically the smaller diagram,
the HCI pattern in this case) that occur within the other diagram. Partial matches are of no interest, but multiple instances of the same pattern within a diagram are. The basic algorithm for matching patterns in the diagrams' structure works by attempting to use each element of the system model as a start point for comparison with the HCI pattern. The algorithm then attempts to compare the structure of the start point with a designated 'first' element from the HCI pattern. A comparison is successful if the two objects being compared have associations with the same properties pointing to them. For example, if the HCI pattern element under consideration has two associations pointing to it, then the system design element must also have at least two associations and, further, the multiplicity at the end of each association must be the same in both models. It does not matter if the system model element has other additional associations that do not match the HCI pattern. A simplified example of this is presented in Figure 16. The figure shows a UML representation of an HCI pattern and a system design model. It is clear that the HCI pattern's structure is replicated in the system model. In fact, there are two instances in which such a mapping could occur. The two possible mappings are: "Object X maps to Object A; Y to B; Z to D" and "Object X maps to Object D; Y to B; Z to A". The algorithm will recognize both of these instances, ignoring the fact that objects A and D in the system model have other associations that do not map to the HCI pattern. Using the approach outlined above, it is possible to recognize all of the direct mappings between the structures of HCI patterns and system models. This approach has been successfully implemented as a prototype tool [23], an Eclipse plug-in working with Omondo [24]. The UML models of the system, which are captured in XMI format by Omondo, are first translated into a collection of Java objects, over which the matching algorithm, implemented in Java, is run.
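The following is a minimal sketch of this matching idea, assuming both diagrams are reduced to directed edges labelled with multiplicities; the names, the edge-based representation and the traversal direction are our simplifications for illustration, not the published plug-in's code. Each system class is tried as a start point for the designated first pattern class, every pattern association must find a counterpart with the same multiplicity, and extra system associations are ignored.

import java.util.*;

/**
 * Illustrative sketch of the structure-matching idea (not the published Eclipse
 * plug-in code). Both the HCI pattern and the system model are reduced to
 * directed edges "source --multiplicity--> target". A mapping is accepted when
 * every pattern association reachable from the designated first pattern class
 * has a counterpart with the same multiplicity; additional system associations
 * are simply ignored.
 */
public class PatternMatcher {

    record Edge(String source, String target, String multiplicity) { }

    private final List<Edge> pattern;
    private final List<Edge> system;

    PatternMatcher(List<Edge> pattern, List<Edge> system) {
        this.pattern = pattern;
        this.system = system;
    }

    /** Try every system class as a start point for the designated first pattern class. */
    List<Map<String, String>> findMappings(String firstPatternClass) {
        Set<String> systemClasses = new HashSet<>();
        system.forEach(e -> { systemClasses.add(e.source()); systemClasses.add(e.target()); });
        List<Map<String, String>> mappings = new ArrayList<>();
        for (String start : systemClasses) {
            Map<String, String> mapping = new HashMap<>();
            if (matches(firstPatternClass, start, mapping)) {
                mappings.add(mapping);
            }
        }
        return mappings;
    }

    /** Recursively extend a pattern-to-system mapping, one pattern association at a time. */
    private boolean matches(String patternClass, String systemClass, Map<String, String> mapping) {
        if (mapping.containsKey(patternClass)) {
            return mapping.get(patternClass).equals(systemClass);   // already bound: must agree
        }
        mapping.put(patternClass, systemClass);
        for (Edge pe : pattern) {
            if (!pe.source().equals(patternClass)) continue;
            boolean satisfied = false;
            for (Edge se : system) {
                if (!se.source().equals(systemClass) || !se.multiplicity().equals(pe.multiplicity())) continue;
                Map<String, String> attempt = new HashMap<>(mapping);   // tentative copy, so failed branches can be discarded
                if (matches(pe.target(), se.target(), attempt)) {
                    mapping.clear();
                    mapping.putAll(attempt);
                    satisfied = true;
                    break;
                }
            }
            if (!satisfied) return false;   // a pattern association has no counterpart from this start point
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical pattern and system fragment (names and multiplicities invented for illustration).
        List<Edge> pattern = List.of(new Edge("X", "Y", "1..*"), new Edge("Y", "Z", "1"));
        List<Edge> system = List.of(new Edge("A", "B", "1..*"), new Edge("B", "D", "1"),
                                    new Edge("A", "Log", "*"));   // extra association, ignored by the match
        System.out.println(new PatternMatcher(pattern, system).findMappings("X"));
    }
}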
Fig. 16. Matching UML and HCI pattern models
5 Discussion and Further Work
This approach is not a panacea. A large number of HCI patterns are subjective, and it may not be possible to model such patterns in UML. Consider, for example, the "Intriguing Branches" design pattern [12]. In the pattern's
description, as part of the section on "how to implement the pattern", the following guideline is given: "Start with a deep understanding of your users. What might interest them? Where in the interface are they likely to take time to explore something further, and where do they just need to get something done?" Formal modelling of such patterns is a far more complex task than that dealt with using UML languages. Hence, our approach cannot deal with these forms of HCI patterns – though others have argued that these types of design approach should not be termed patterns anyway [13]. However, this should not be seen as a terminal problem for the approach – the system aims to guide designers through a morass of patterns, and whilst not all aspects of designs may be contained within the pattern set, an extensive and comprehensive pattern library is already in existence, for which guidance can be highly valuable. Another major issue is the level of complexity the application designer has adopted when drawing up their system designs. It is possible for two designers to represent the same system, in the same format (a UML class diagram, for example) and still come up with vastly different diagrams. One designer may wish to group small components together to form a single component that combines their functionality, whilst another may wish to express every component, however small, in their designs. As an example, imagine that the system designer had, instead of representing component 'B' as a single class, decided to represent 'B' as two separate components 'B1' and 'B2', linked by a 'one-to-one' association (Figure 17).
Fig. 17. Alternative system model to Figure 9, containing B1 and B2 elements
Clearly, the components 'B1' and 'B2', if viewed as a single component (enclosed in the box), result in the same diagram as that in Figure 9 and thus the same output from the pattern matching tool. However, as the diagram is presented, the tool would find no direct matches between the HCI pattern and the system design. This can be rectified by including additional runs of the algorithm in which elements of the system model are permitted to be 2, 3, 4... actual classes in size. This feature is not implemented in our prototype version [23] and remains a subject for further
work. The solution is known, however: this will be tackled by implementing a backtracking algorithm, in much the same way as string-searching utilities such as grep work when trying to find matches to complex expressions containing wildcard identifiers such as '*'.
6 Conclusions
The tool does offer identification of some relevant design patterns given a UML model of a system, and so takes us some way towards our goal. By having a high-level model, we can adapt our implementation to changing technological bases or to revisions in functionality relatively easily (and in some cases, automatically, using machine translations from platform-independent models to platform-specific ones). We can then use these models to give designers some guidance as to which parts of the model can be represented with which design patterns. Clearly, there will be parts of the system that require no such interface presence, and other parts in which multiple patterns will be identified, and so we see this as giving support and guidance to designers. By identifying potentially appropriate patterns, it reduces the barriers to wider use of design patterns, and so could promote more effective interfaces through the use of known solutions to problems. As well as producing better systems, this also speeds up the implementation phase, allowing even faster production of code, potentially keeping up with the pace of technological change.
References
1. Szyperski, C., Gruntz, D., Murer, S.: Component Software - Beyond Object-Oriented Programming, 2nd edn. Addison-Wesley/ACM Press (2002)
2. Thimbleby, H.: The computer science of everyday things. In: Second Australasian User Interface Conference Proceedings, AUIC 2001 (2001)
3. Beale, R., Bordbar, B.: Using modelling to put HCI design patterns to work. In: HCI International: 11th International Conference on Human-Computer Interaction. Lawrence Erlbaum Associates, Inc. (LEA), Las Vegas (2005)
4. Alexander, C.: A city is not a tree. Architectural Forum 122(1), 58–61; 122(2), 28–62 (1965)
5. Alexander, C.: Notes on the Synthesis of Form. Harvard University Press, Cambridge (1964)
6. Alexander, C., et al.: A Pattern Language. Oxford University Press, New York (1977)
7. Gamma, E., et al.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading (1994)
8. Dearden, A., Finlay, J.: Pattern Languages in HCI: A Critical Review. Human-Computer Interaction 21(1), 49–102 (2006)
9. Erickson, T.: Interaction Design Patterns Page, http://www.visi.com/~snowfall/InteractionPatterns.html (was http://www.pliant.org/personal/Tom_Erickson/InteractionPatterns.html) (cited September 18, 2008)
10. Duyne, D.K.V., Landay, J.A., Hong, J.I.: The Design of Sites: Patterns, Principles and Processes for Crafting a Customer-Centered Web Experience. Addison Wesley, Reading (2002)
11. van Welie, M., van der Veer, G.C.: Pattern Languages in Interaction Design: Structure and Organization. In: Human Computer Interaction (Interact 2003). IOS Press, IFIP, Tokyo, Japan (2003)
12. Tidwell, J.: Designing Interfaces: Patterns for Effective Interaction Design (2005), http://designinginterfaces.com/ (was http://time-tripper.com/uipatterns/index.php) (cited June 8, 2008)
13. Mahemoff, M., Johnston, L.J.: Usability Pattern Languages: The "Language" Aspect
14. Gamma, E., et al.: Design patterns: elements of reusable object-oriented software, p. 395. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (1995)
15. Akehurst, D.H., Howells, W.G.J., Maier, K.D.: Implementing Associations: UML 2.0 to Java 5. To be published in Journal of Software and Systems Modeling, 1–33 (2006)
16. Sunye, G., Guennec, A., Jezequel, J.-M.: Design patterns application in UML. In: Hwang, J. (ed.) ECOOP 2000. LNCS, vol. 1850, p. 44. Springer, Heidelberg (2000)
17. France, R., et al.: A UML-Based Pattern Specification Technique. IEEE Trans. Softw. Eng. 30(3), 193–206 (2004)
18. Kim, D., et al.: A UML-Based Metamodeling Language to Specify Design Patterns (2003), http://www.cs.colostate.edu/~georg/aspectsPub/WISME03-dkk.pdf
19. Mak, J., Choy, C., Lun, D.: Precise Modeling of Design Patterns in UML. In: ICSE 2004: Proceedings of the 26th International Conference on Software Engineering (2004)
20. Dong, J., Yang, S.: Extending UML To Visualize Design Patterns In Class Diagrams. In: Proceedings of the Fifteenth International Conference on Software Engineering and Knowledge Engineering (SEKE), San Francisco Bay, California, USA (2003)
21. Dong, J.: Representing the applications and compositions of design patterns in UML. In: SAC 2003: Proceedings of the 2003 ACM Symposium on Applied Computing (2003)
22. OMG: Unified Modelling Language, UML (2005), http://www.uml.org
23. Evans, M.J.: From System Models to HCI Design Patterns via the Model Driven Architecture. School of Computer Science, p. 44. University of Birmingham, Birmingham (2006)
24. Omondo: The Live UML company: Omondo Eclipse - Free Edition (2006), Software package available at http://www.omondo.com
Meerkat and Tuba: Design Alternatives for Randomness, Surprise and Serendipity in Reminiscing
John Helmes, Kenton O'Hara, Nicolas Vilar, and Alex Taylor
Microsoft Research Cambridge, 7 J J Thomson Avenue, Cambridge CB3
[email protected]
Abstract. People are accumulating large amounts of personal digital content that play a role in reminiscing practices. But as these collections become larger, and older content is less frequently accessed, much of this content is simply forgotten. In response to this we explore the notions of randomness and serendipity in the presentation of content from people's digital collections. To do this we designed and deployed two devices - Meerkat and Tuba - that enable the serendipitous presentation of digital content from people's personal media collections. Each device emphasises different characteristics of serendipity, with a view to understanding whether people interpret and value these in different ways while reminiscing. In order to explore the use of the devices in context, we deployed them in real homes. We report on findings from the study and discuss their implications for design.
Keywords: Reminiscence, Photo sharing, Serendipity, Interaction, Social Media, Robotics, Screens, Iterative design.
for people. But to use this as a point of departure for design, we must understand further some of the properties of random-led consumption of media that lead to values such as joyful serendipity. Leong's work is particularly significant here, articulating a number of key factors underpinning the values of randomness and serendipity in media consumption. First of all there is the issue of choice abdication. Deferring choice to a system through randomness not only obviates the practical need for interaction but can also be important when people become "paralysed" by an overwhelming number of choice possibilities [3]. Second, randomness introduces elements of chance, luck and surprise, and the related issue of anticipation. The value of these elements can be seen in many everyday activities such as games of chance and rituals such as opening presents [1]. Third, there is the notion of defamiliarisation [22], in which the placement of things in unfamiliar and unusual contexts can cause people to reexamine their assumptions around existing concepts and ideas. Juxtaposing content in new ways and against new contexts can lead us to reinterpret content with new and meaningful inscriptions. Key here are the interpretative acts that accompany this randomness. In part, this is about the interpretation of new relationships between content and mediated memories. But it also concerns interpretation of the significance of why these random relationships have been generated, with people sometimes looking to higher-level and unexplained forces to make sense of the randomness they are presented with. The importance of this interpretation as a value brings to mind the arguments of Gaver and colleagues around ambiguity and how one might design explicitly to encourage interpretive acts [10] [21]. In these acts of interpretation, there are also concerns with a sense of agency: that is, the extent to which these acts of randomness can be attributed to something meaningful and purposeful on the part of the system creating the events. Within this framework, then, it can be seen that there are a number of dimensions on which we can target the design of systems. This means going beyond simply the design of a randomness function, to thinking about additional factors such as facilitating anticipation and surprise, attributions of agency and, above all, the interpretive acts that accompany this. In particular, how can we do this in a way that fits neatly with people's everyday practices? In this paper, we explore some of these issues. We do this through a presentation of some novel technology prototypes, Meerkat and Tuba, whose designs relate to different elements from this framework of randomness and serendipity.
2 Related Work
With the increasing proliferation of digital capture and presentation technologies, there is a growing body of research that is exploring the relationship between personal digital content and practices of reminiscing. This in turn has drawn on work from anthropology that explores the material aspects of our reminiscing behaviours. For example, work by Chalfen [5] looked at the everyday social practices around photo displays in the home through which family norms, traditions and values are expressed and maintained. Drawing on this work within the context of HCI, Crabtree et al. [6] and Frohlich et al. [9] further articulated the values of collocated photo
sharing practices and the accompanying storytelling and reminiscing practices. Further reasons for collecting and keeping content, addressed in the work of Kirk et al. [14], refer to the importance of maintaining a link to one's history as a form of legacy and also to honouring certain special people. Kirk et al. [15] describe several practices people undertake with their photo collections prior to sharing them with others. Of interest to us specifically is how these practices shape and influence the possibilities for presenting and interacting with large digital collections. Of particular significance, as highlighted by Kirk et al. [15], and in spite of research and design effort to facilitate viewing, browsing [11] and searching [12], people infrequently look back through, let alone search, their collections. When they do, they are more likely to deal with recent content. Our approach to interacting with these large collections focuses on facilitating experiences that arguably try to fit better into people's existing rituals and daily practices, similar to the approach of dedicated digital photo frames that are increasingly making their way into family households. This is an area of great interest, explored widely within the HCI community, unpacking emergent practices and family values through interactions with novel photographic technologies in domestic life [19] [24]. A similar approach is undertaken in the work of Durrant et al. [8], in which a novel content-displaying prototype reveals several design implications for these types of devices, providing a new interaction mechanism and exploring what it means to present and interact with content from different identities on a single device. In this work, we attempt to further elaborate on the material properties of photo display devices [23] by exploring qualities like choice abdication, uncertainty, surprise and agency [16] for the devices' interaction mechanisms.
3 Design Approach
We have taken a practical stance towards the exploration of alternative devices for reminiscing practices. As this activity often manifests itself as a conscious interaction with our computers, mobiles and digital photo frames, it was our intention to engage people in novel ways with their digital content. The designed artifacts each addressed a different approach to representing personal digital content stored on our computers, without necessarily trying to superimpose any kind of specific usage [10]. The following sections describe our design goals and decisions in more detail.
Bespoke
It was our intention to design dedicated devices for reminiscing about digital personal content within a domestic setting. We explicitly wanted to steer away from reminiscing as a side-product of a multi-purpose device or system. In addition, the design had to be compact in order to support potential repositioning in different places within the home.
Engaging
We intentionally tried to engage people in new ways with their digital content, trying to steer away from using screens mostly as a static presentation entity. Hence, we
wanted people to more actively engage with them by trying to emphasize their materiality and tangible interaction qualities.
Serendipitous
We explored novel presentation and interaction mechanisms to allow moments of anticipation and surprise, e.g. by integrating elements of choice abdication, defamiliarisation and agency. By highlighting these characteristics, we hoped to further enhance people's engagement and interpretive practices surrounding their reminiscing experiences.
Effortless
By initiating an automated way of collecting personal content from people's digital repositories we wanted to make using and populating the devices with content as effortless as possible.
4 Designing the Devices
Apart from taking these general design goals into account, we further specified two different directions for the design of the devices in order to explore alternative interaction mechanisms in parallel. With our first design, called Meerkat, we explored the notion of pushing content towards the user, to grab the user's attention. In contrast, our second design, called Tuba, needs a deliberate action from the user in order to pull content from the device. Throughout the design of Meerkat and Tuba we applied an iterative design process in which explorations at several different fidelities led to the final instantiations of Meerkat and Tuba (see Figure 1). The next sections describe the two designs in more detail.
Fig. 1. Pictures of the iterative design process of Meerkat, some early sketches (a), cardboard exploration (b), acrylic exploration (c) and part of an exploded view in Solid Works (d)
4.1 Designing Meerkat
As illustrated in Figure 2, Meerkat is a robotic entity consisting of 3 displays on a moving mechanical arm for displaying combinations of photographs. Over time, as the name suggests, Meerkat randomly pops up with an eccentric movement of its arms and displays to reveal new combinations of pictures.
Fig. 2. Meerkat moving sideways (a), folded (b), moving up (c&d) and completely up, displaying content (e)
Broadly, Meerkat can be dissected into three parts. The base of Meerkat houses most of the electronic components and a servo, and is designed in a slightly curved fashion. This curvature allows its servo-enacted movements to amplify its empathic character. Its middle part consists of two arms that, equipped with another two servos, allow it to move up and down and in various other curious ways. The top of Meerkat incorporates three small screens for presenting digital content. Each screen has a servo attached to it in order to further increase its movement vocabulary. In addition, another servo allows the 3 screens to be tilted towards the user. Whenever in the folded position, Meerkat's screens do not display content. When it decides to pop up, it randomly selects and presents three images stored on the SD card. As well as popping up at random intervals, Meerkat also responds to people's presence. An embedded IR sensor detects close presence, which triggers it to start displaying content. Meerkat is designed in such a fashion that ignoring it will prompt it to pop up more frequently, asking for attention. In contrast, actively triggering it has the opposite effect, reducing the frequency with which it pops up. As such, the design dynamically plays with engagement levels. Meerkat has a total of 7 distinct, pre-programmed behaviours. It can also come up with subtle movements itself by moving a combination of its joints for random amounts of time and in random directions. Its final decision on how to move is semi-random, as it also depends on whether it has been interacted with recently and whether its IR sensor is actively receiving data. There are a number of features of the Meerkat design that relate to our concerns with regards to serendipity, reminiscing and surprise. The device is semi-autonomous
in its movements and presentation of content, not requiring direct interaction from a user in order for new content to be revealed. As well as providing a gentle way of alerting attention to the device, this autonomy gives the device a perceived sense of agency. The eccentric movement vocabulary of the device is also designed to enhance the sense of agency perceived in the device, and aims to affect interpretation of the device's randomness and serendipity. The movements were not designed with any immediate and obvious semantic qualities, though we wanted to see whether people might begin to recognise certain patterns of behaviour within Meerkat. As well as contributing to a sense of agency, then, we wanted to see whether the users' emerging understandings of the movement vocabulary might influence the interpretations of the photographs – that is, questioning why particular photo combinations were revealed in the context of particular movements. Finally, the display arrangement plays a role in a number of ways: first, the face-like qualities created offer additional opportunities for anthropomorphic interpretations of the device and perceptions of agency; second, it increases the movement vocabulary; third, the juxtaposition of 3 random photos provides a mechanism for defamiliarisation of the content and new possibilities for interpretation of serendipitous and coincidental presentation of content.
4.2 Designing Tuba
Our second device, called Tuba, is shown in Figure 3. Again, it is used for the random presentation of digital media from a person's media archive. In contrast to Meerkat, content is revealed and changed through a deliberate act on the part of the user. Our aim in the Tuba design was to embody anticipation and surprise in the interaction control mechanisms. As with artifacts such as gifts or trinket boxes, the excitement is in not knowing what is hidden inside. Anticipation is created through the process of revealing the hidden. With Tuba, the user needs to explicitly "open" the device to reveal the screen. Each time Tuba is opened up, a piece of digital content gets revealed. By means of this simple, conscious and physical interaction, Tuba creates anticipation, as it is a surprise what gets presented each time the display is revealed. Secondly, to further elaborate on the notion of anticipation and curiosity, we moved away from merely presenting one content type, as was the case with Meerkat. By increasing the variety of media types, we envisioned that the level of anticipation and curiosity would rise as well. As such, Tuba was populated not just with digital pictures, but also with music, Facebook wall postings and general knowledge facts. Again, as well as emphasizing qualities like randomness and surprise, some of the media types introduced defamiliarisation - for example, presenting historical Facebook posts outside their originally intended context. In addition, the device also contains a generic speaker, which, whenever Tuba is closed, points upwards. The part holding its screen and speaker can be tilted backwards in order to reveal the screen. Each time Tuba is opened up (the screen revealed) it presents a single media item. If this media item is a song or a more generic audio file, the user is prompted by an on-screen message to close Tuba again in order to listen to the music. We wanted users to close Tuba to listen to the music because in the closed position the speaker points upwards towards the person rather than away from them.
In addition to this practical reason, the distinct positions further emphasized Tuba's different media presentation capabilities (displaying content and playing audio files).
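A brief sketch of the reveal logic just described, reconstructed by us rather than taken from the deployed firmware: both devices abdicate choice by picking a random item, Meerkat adapts how often it pops up depending on whether it was recently interacted with, and Tuba asks for the lid to be closed before an audio item is played. All timing values and type names are assumptions for illustration.

import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch only (our reconstruction, not the deployed firmware) of
 * the content-reveal logic described for the two prototypes.
 */
public class RevealLogic {

    enum MediaType { IMAGE, AUDIO, FACEBOOK_POST, FACT }

    record MediaItem(String path, MediaType type) { }

    private final List<MediaItem> library;          // content scraped onto the SD card
    private final Random random = new Random();
    private long popUpIntervalMs = 10 * 60_000;     // Meerkat: baseline interval (assumed value)

    RevealLogic(List<MediaItem> library) { this.library = library; }

    /** Both devices: choice abdication via a uniformly random pick. */
    MediaItem nextItem() {
        return library.get(random.nextInt(library.size()));
    }

    /** Meerkat: being ignored makes it pop up more often; being triggered calms it down. */
    void onPopUpFinished(boolean wasInteractedWith) {
        popUpIntervalMs = wasInteractedWith
                ? Math.min(popUpIntervalMs * 2, 60 * 60_000)   // back off, at most hourly
                : Math.max(popUpIntervalMs / 2, 60_000);       // seek attention, at most once a minute
    }

    /** Tuba: an opened lid reveals one item; audio asks for the lid to be closed first. */
    String onTubaOpened() {
        MediaItem item = nextItem();
        return item.type() == MediaType.AUDIO
                ? "Close Tuba to listen to: " + item.path()
                : "Showing: " + item.path();
    }
}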
Fig. 3. Tuba opened up and displaying a Facebook post (a), in closed position (also for playing music/audio files) (b), Tuba being tilted backwards revealing the screen (c, d & e)
4.3 Content Scraper
In order to populate the devices with content we created a content scraper that automatically scrapes people's hard drives (or parts of them, based on a pre-defined folder path). Programmed in C# and making use of the Microsoft Windows search API, our scraper searches for a pre-defined number of .jpeg, .jpg, .bmp, .png and .mp3 files. In an ideal situation one could imagine running the scraper on a dedicated computer system, automatically updating the devices with content. Nevertheless, we decided not to fully implement this aspect of the system for two reasons. First, with the design and deployment of Meerkat and Tuba it was our goal to create an engaging and serendipitous experience without investing too much time in the hidden technological back-end. Second, people would still have to consciously select folders on their computers that they would be happy to display on the devices. More detail about the process of populating the devices can be found in the next section.
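The authors' scraper is a C# tool built on the Microsoft Windows search API; purely as an illustration of the same idea, the sketch below in Java walks a pre-defined folder and keeps a limited number of files with the listed media extensions. The folder path, the limit and the class name are illustrative assumptions, not part of the deployed system.

import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Illustrative sketch only: collect media files under a chosen folder, e.g. for
 * copying onto the devices' SD cards. The study's tool is a separate C# program
 * using the Windows search API.
 */
public class ContentScraper {

    private static final List<String> EXTENSIONS =
            List.of(".jpeg", ".jpg", ".bmp", ".png", ".mp3");

    /** Collect at most 'limit' media files found under 'root'. */
    static List<Path> scrape(Path root, int limit) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> {
                            String name = p.getFileName().toString().toLowerCase();
                            return EXTENSIONS.stream().anyMatch(name::endsWith);
                        })
                        .limit(limit)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Folder and limit are illustrative; in the study, folders were chosen by participants.
        List<Path> media = scrape(Path.of(args.length > 0 ? args[0] : "."), 630);
        media.forEach(System.out::println);
    }
}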
5 User Evaluation
In order to investigate people's behavioural practices around Meerkat and Tuba we conducted a series of qualitative studies in real-world settings and contexts. The devices were deployed within four different families for a period of two weeks each, during which each family received one Meerkat and one Tuba. During an introductory
session the families were interviewed about their current reminiscing practices and received an introduction to the devices. We solely provided the information necessary to interact with the devices, without steering towards any kind of intended use, especially given that we were most interested in the way people made sense of, perceived and used the devices from their own point of view. During the session participants were asked to locate content they would be happy to upload onto the devices. In order to practically realise this process, folders (containing collections of years of photos) were copied from people's computers onto our external hard drive. Using our laptop we ran the content scraper to collect the media from these folders and store it on the SD cards of the devices. Typically, we had the scraper collect a minimum of 600 image and 30 music files. We intentionally chose to deploy the devices within a variety of different families in order to gain insight into their impact on different social situations. It should be recognised, though, that there are other types of households that would also be of interest to explore. The families we chose, then, represent our initial explorations into the types of behaviours and responses to these devices.
Family 1: A couple in their thirties, just married and living in a small apartment with one area that functions as a living room and kitchen. Both work as scientists at a university. They store their content as a shared collection on a Mac, but do not interact regularly with it. Although they have a digital picture frame, they haven't found the time to set it up.
Family 2: A couple in their late 30s with two young children, a boy aged 4 yrs and a girl aged 6 yrs, living in a spacious house with a number of different rooms and a large kitchen area. The father works full time and the mother does voluntary work as well as home and childcare. Their shared digital content collection lives on a PC upstairs; they do not interact with it on a regular basis and have no other ways of displaying the content.
Family 3: A couple (in their thirties) with a baby. Similar to the first couple, their living room is connected to the kitchen, in a similar-sized apartment with a slightly different layout. The husband works as a radiologist in a hospital. The wife, trained as an anaesthetist, currently stays at home as a full-time mother. They store their content as a shared collection on a MacBook, and their screensaver displays the collection.
Family 4: A husband and wife in their early 40s with two young children, a boy aged 6 yrs and a girl aged 7 yrs. Both parents work. Similar to the other family with children, they live in a spacious house with a number of different rooms. Their digital content lives on a Mac in one of the downstairs rooms but does not seem to be accessed that often.
After our initial session, we left the families to use the devices for two weeks, during which we stayed in touch in order to assist in any way if necessary. The families were asked to keep a diary and/or make use of a small Flip-type camera to document any interesting social events, thoughts and interactions that emerged from living with the devices. At the end of the two weeks a final interview took place, during which we talked about particular episodes from those two weeks. In some cases the diary was used to allude to particular entries and trigger conversation around social events that emerged through their interactions.
The interviews were fully transcribed for the purpose of analysis.
Findings
The findings presented here represent an early pass through the data from the interviews and diary episodes. While the responses to the devices were quite varied, we have nevertheless attempted to draw out a number of key themes from the data that highlight some of the ways people oriented to the devices. In unpacking these themes, our aims are to link what we found in the interviews back to the initial design goals and the different features from the serendipity framework embodied in the prototypes. The key themes we explore in this initial work relate to engagement and interpretation, fit with daily routines, and the influence of different content.
Engagement and Interpretation
An important question for us to consider is the extent to which the devices and their particular design characteristics facilitated engagement from the users, and how this related to reminiscing and interpretations of the content. We begin our discussion with a look at Meerkat. Perhaps most significant to people's engagement with the device were its characteristic movements. The movements were seen as quirky and erratic in a humorous way. Accompanied by the mechanical noises of the servos, this gave the device a certain personality, leading to anthropomorphic interpretations. Several participants made reference to it being like having a kind of "pet" around the house that was fun to watch. During one of the interviews, Meerkat was on the table and, after popping up, it flopped over and rested on the arm of the participant:
M
“And that’s why I feel it is a bit pet like. Cause it does, it does look a bit like a face and a body and a… Sometimes [the movements] are quite endearing. Cause you know, when you sit next to it it’ll kind of flop onto you, a bit like a dog or something you know. Like that see [Meerkat falling onto M]. And you don’t want to keep it too close to the edge but it hasn’t fallen off yet but I keep thinking it might. It’s just so erratic isn’t it?”
The movements, then, were a source of engagement in themselves. Children, for example, would mimic the movements as they watched it. Through this engagement with the movement, attention was drawn to the photos being displayed. Having personal photos on there contributed to a sense of warmth towards the device: N: "You probably wouldn't have the same warmth towards it if it didn't have the pictures". What was apparent from the interviews, though, was that the movements were not really interpreted in any meaningful way. Participants did not observe or notice any patterns of movement behaviour over time and as such did not really ascribe any semantic significance to particular movement patterns. In this sense, while drawing attention to the device and prompting engagement, the movements did not contribute to any sense of interpretation for why particular photo combinations were being shown – they were perceived as essentially random.
N: "I don't really connect its movements to what it is showing."
C: "I don't know, I just find myself looking at it anyways with or without the pictures cause with the movements and stuff it just grabs your attention… It just is really fun…"
There were times too when Meerkat would simply be ignored and left to its own devices. Because it didn't require any deliberate interaction on the part of the user, it continued functioning as an ambient display, drawing attention intermittently depending on the context of other activities going on in the household. While most participants were happy to ignore the device, there were occasions when its unpredictable behaviour was distracting, leading to it being switched off. For example, one couple, who had a very young baby, would at times switch the device off because the sudden noises would potentially startle the baby. Some of these practical concerns, then, impact on the device's ability to function as an ambient display that could drift in and out of attention – in turn impacting on its role in promoting serendipitous reminiscing behaviours within these contexts. With Tuba, by contrast, the deliberate interaction required promoted a different sense of engagement with the randomly generated content. Of particular significance here was the opening mechanism. A number of participants commented on how this created a certain amount of anticipation, and a "magic spell" associated with the revelation of the media.
“… I have never bothered with a picture frame - well partly ’cause I don’t think they are that attractive, scrolling through a few pictures on the wall. Whereas this one, cause you got that like little magic spell of opening it up it’s… I find it more engaging.”
Important to some participants was how the device slowed down interaction and thereby encouraged more reflection on the content. This slow reflective behaviour was felt to be something that might be lost with more efficient interaction mechanisms that would allow more deliberate browsing of the materials in search of photos you liked. M “You would race through it then wouldn’t you - looking for something? Whereas this does slow you down definitely. I think it would get a bit too much, cause then you would be able to search for something.” Indeed, it was this slowness of interaction and the uncertainty of what would be revealed that in some instances promoted a sense of longer-term engagement with the device. In one of the families, the young daughter had been waiting for a particular photo of herself diving while on holiday “R has been waiting for a certain selection of photos to come up and hasn’t found that one yet.” Each time the device was opened, the daughter would hope to see the photo, sustaining her desire to keep interacting with Tuba. Throughout her interactions she
was building up excitement for that particular moment in time. Coincidentally, during the interview, we witnessed the moment when the particular photo in question finally came up on the display, much to the excitement of the daughter - this became an important social event.
[M plays with Tuba and opens it again; the diving picture comes up]
M: … "oowww, R, R!!"
J: "Is this the moment?"
[R comes back, walks up to M]
R: "Heej, look daddy, look look look look!"
K: "Is that you diving?"
R: "Yeah, with my friend M, who lives in Malaysia."
N: "Yeah, that was in March, that was a big deal."
K: "So you have been waiting a few weeks for that photo to come up?"
M: "We have looked at it quite a bit, so that is quite nice isn't it? We'll leave that we won't close it for a bit."
R: "That's my best friend."
M: "Careful cause if you turn it over it'll be gone forever."
K: "So that is another issue, the leaving open, has there been a lot of it?"
M: "Yes, yeah, don't touch it you know, when you were looking at it or something like that so we were all kind of sitting around and I would for example turn it over and R would say "don't turn it over yet", cause she wouldn't be finished with it yet. She hadn't quite digested the moment, the photo yet."
What this episode also points to are important behaviours around non-engagement with Tuba – namely leaving the device open. In this instance, the device is left open and unavailable for further interaction because the family did not want to lose the photo, which had acquired increasing importance over time. In other instances too, people would leave the device open on occasion, when arriving at a photo they liked or because they wanted someone else in the family to see it who was not there at the time. The reminiscing was extended across time and other people.
R: "What? More active? Yeah, I suppose so, but equally Tuba needs physical involvement doesn't it to open and shut it each time to get something out of it and actually for us we found that…that was something we found we didn't prioritize time for to do. So I'd often, with Tuba, find a picture I liked and think "och, beautiful picture or whatever, St Ives last year, lovely photo, really enjoyed seeing it, haven't seen it for years and I would just leave it there."
What this snippet also begins to point to is that, for some participants, the inefficiencies of the interaction mechanisms with Tuba were at times frustrating. For those participants who interpreted it as a simple content browsing tool, the interaction mechanism was laborious. This highlights some important tensions in the design and perhaps raises questions about the primacy of slow and serendipitous interaction in design. For these participants, their engagement with the device was periodic,
skipping through photos and other content to get to something they liked. Their concern was not so much with the slow and contemplative values enjoyed by other participants.
Fitting into Daily Routines
In understanding the experiences of serendipity in relation to media consumption with these devices, our concerns need to extend beyond the relationship between form factor and the ways in which randomness may be interpreted. What is also of significance here is how these devices and the consumption of content related to the everyday routines and structures of family life within the home. Encounters with the device and content were very much contingent upon the rhythms and flows of everyday family activity and their spatial and temporal organisation. In this respect, for example, the location of the devices very strongly orchestrated the kind of interactions that unfolded. After initially setting up the devices at the beginning of the trial, a number of the families repositioned the devices in different areas of the home, mostly based upon their own presence in certain areas of the home, but also in order to suit current activities and avoid intruding on others.
K N
“They were initially in the sitting room for a day/two days switched off. So I brought them in here and turned them on during breakfast time and meal times, like this.” “So it is more accessible in here?” “Yes more accessible, it goes for a good dozen times between my chores and washing up.”
The sitting room for this family was the place where they would sit and watch television to switch off and relax during the evening. With attention dominated by the television in this space, the devices were simply ignored in this spatial and temporal context of home life – any opportunities for serendipitous experiences were lost. Only by moving the devices to a different part of the house, namely an open plan kitchen and dining space, did the devices begin to fit with the kinds of family activities conducted in this space. This bounded interaction time and opportunities for serendipitous encounters to specific places, times and associated activities. Within the two families having children, the presence of the devices was a feature of daily family get-togethers, such as breakfast and dinner, in which they functioned as conversation starters through both the displayed content and the more materialistic affordances of their interactions. In that sense, the devices seemed to have a different effect compared to static frames, creating anticipation through their curious enactment and interaction features. R K R
“I would turn it on after feeding the cat, almost becoming like another routine in the morning…” “So you’d switch it off overnight and then switch it back on in the morning?” “Yeah, so it kind of became part of the sort of routine of the day.”
Important here is that the devices come to be used sometimes with secondary purposes. In one family for example, it was a way of keeping the children entertained and occupied while the mother managed the practicalities of meal production, preparations for school and other aspects of family management. But the devices nevertheless did offer occasion for reflection for the family, if sometimes fleetingly, talking about content brought up during a breakfast or dinner. Interesting for us here is to reflect upon the way their interaction mechanisms triggered a more active opportunity to “break free” from daily routine; creating anticipation through the integrated elements of serendipity and creating short moments of delight, being taken away (even for a few seconds) from everyday habits. J N
“What you just mentioned regarding using your laptop, was that a positive experience then?” “When I was doing the work on my laptop on the table? Yeah, cause the work is boring… ehm… cause I was desperate for distraction and it’s not that it interrupts the work at all. It doesn’t really get in the way it’s not like it was interrupting my flow of thought on my website repair.”
Content Type One of the features of Tuba was the ability to presentation different content types and as we see in the fieldwork, these different content types had an important impact on experiences with randomness and serendipity. First of all, variety of content added an additional element of uncertainty and surprise with regards to what would be encountered on any given interaction. This level of uncertainty, then, offered some value in terms of curiosity and surprise but at other times potentially frustrating when an undesired content type arose. This is a delicate balance to orient to in the design. In practice, what also appeared to impact on people’s serendipitous experiences at this level of content was the ratio of different content types. Typical in the installations was that photographs were the dominant media in terms of number, making them much more likely to appear than other media types. This ratio of media types then is important to orient to in the design of serendipitous encounters. But over and above concerns with surprise and serendipity is that people oriented to the content types in different ways, some people wanting more photos, others more Facebook content. A
A: "I think the music sounds better if you do it that way and the pictures look better if you do it that way. Erm, but I sort of like the idea that it was sort of one device that was playing with these different functions."
S: "Yeah."
K: "Was it good that you didn't know what content or content type it was going to come up with or was that just annoying?"
A: "No I think that actually was very good, because, yeah, if you wanted to see a certain picture or play a specific song you would do that on your iPod or something ehm, so coming up with a more random thing I thought that was quite good…those made it fun I suppose if it didn't do that than you would sort of loose your point I guess."
What was also significant about the different content types was the different ways people oriented to their presentation. Facebook posts, for example, provided some humorous reflection by those who had posted them and by their partners. But these posts also seemed to cause concern in terms of their presentation in public settings, for example when there were visitors to the homes. Because the content originated from a variety of different sources and had been written for different audiences, presenting it in a dynamic, social context was not so simple, in particular because of the need to account for and explain the information in the excerpts, or because of the need to present only certain facets of the self within the context of particular audiences. In the following excerpt one of the participants expressed her embarrassment when having visitors over.
A: "Yeah, when new people arrived and played with it a bit."
S: "Because it had so much embarrassing Facebook stuff I actually didn't show the mums…." [laughing]
K: "Why would you be embarrassed about having a heavy night?"
S: "It's just like I don't have to explain every single instance to them each time."
So the integration of a variety of different content types had both positive and negative effects. On the one hand it contributed to the element of surprise by abdicating choice and increasing the participants’ levels of engagement. On the other hand, the broad topic range and different content origins led to some mismatch in expectations as well.
6 Conclusion/Discussion
With the design of Meerkat and Tuba it was our goal to create devices specifically for reminiscing purposes, steering away from existing digital content presentation technologies. By introducing elements of agency, randomness, defamiliarisation, surprise and choice abdication we explored possibilities to further engage people with their personal digital content. The deployments gave us much more insight into people's acceptance and rejection of such technologies and why certain features worked well in specific contexts whilst having opposite effects in others. Though we took contextual aspects into account in the design of the devices, it was evident how aspects like people's stage in life, their family situation and daily rituals (or the lack thereof) had a very pronounced effect on the participants' perception of, and interactions with, the devices. Meerkat and Tuba either resonated well with existing socio-contextual qualities or they simply did not at all. Nevertheless, as alluded to in the previous sections, the exact reasons are complex. Therefore, it is essential to strike a careful balance with regards to the integration of the described attributes as main functionalities of an appliance. As with Tuba and Meerkat's manifestations, it was sometimes felt as if the systems were "trying too hard". People had a certain expectation whilst interacting with the devices, but that expectation seemed, to some extent, to decrease the chance for a serendipitous and delightful interaction to emerge.
If the presentation of the content had been a by-product of another meaningful interaction, the manifestation of the attributes for engagement would potentially have been stronger. We therefore believe that design for serendipitous experiences must, to some extent, also disappoint in order to be effective. The place of these devices within daily family routines is an interesting question, especially given the richness and diversity of those activities. Nevertheless, the devices’ flexibility to adapt, change and respond in various ways to specific situations is still limited, especially since differences in family setup call for quite significant differences in the objects’ autonomy and interaction mechanisms. In this sense, a way for users to tweak these aspects (e.g. the devices' level of “involvement”, from static frame to attention-seeking agent) would be very beneficial, not just to adapt to certain social situations but also to keep breaking daily routines, as certain interaction patterns might become too monotonous. This represents an opportunity for further exploration in future work. On a similar note, the fully automated content scraper also raised questions about manually influencing the content source. This was not so much a question of privacy as a practical one: because computers and hard drives are used as backups, they often contain multiple copies of similar content. Though manual management of digital content is often considered rather laborious, and is thus seen as a threshold that keeps people from actively doing something with it, especially since we are creating ever-increasing amounts of it, there is something about these manual actions that adds value to the collections. Bringing content together (e.g. in a scrapbook) increases its value, making it interesting to further explore the material qualities of such collections and the ways the introduced engagement mechanisms could feed into this stage.
References 1. Aczel, A.D.: Chance. Thunder’s Mouth Press, New York (2004) 2. Bentley, F., Metcalf, C., Harboe, G.: Personal Vs Commercial Content: The Similarities between Consumer Use of Photos and Music. In: Conference of Human Factors in Computing System, Montreal, Quebec, Canada (2006) 3. Botti, S., Iyengar, S.S.: The Psychological Pleasure and Pain of Choosing: When People Prefer Choosing at the Cost of Subsequent Outcome Satisfaction. Journal of Personality and Social Psychology 87(3), 312–326 (2004) 4. Bull, M.: Investigating the Culture of Mobile Listening: From Walkman to Ipod. In: O’Hara, K., Brown, B. (eds.) Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies, pp. 131–149. Springer, Dordrecht (2006) 5. Chalfen, R.: Snapshot Versions of Life. Bowling Green State Univesity Press, Bowling Green (1987) 6. Crabtree, A., Rodden, T., Mariani, J.: Collaborating around Collections: Informing the Continued Development of Photoware. In: Proc. CSCW 2004, p. 396. ACM Press, New York (2004) 7. Durrant, A., Taylor, A.S., Frohlich, D., Sellen, A., Uzzell, D.: Photo displays and intergenerational relationships in the family home. In: Proc. HCI 2009, p. 10. British Computer Society (2009)
8. Durrant, A., Frohlich, D., Sellen, A., Lyons, E.: Home curation versus Teen Photography: Photo displays in the family home. International Journal of Human-Computer Studies (2009) 9. Frohlich, D., Kuchinsky, A., Pering, C., Don, A., Ariss, S.: Requirements for photoware. In: Proc. CSCW 2002, p. 166. ACM Press, New York (2002) 10. Gaver, W.W., Beaver, J., Benford, S.: Ambiguity as a resource for design. In: Proc. CHI 2003, pp. 233–240. ACM, New York (2003) 11. Huynh, D.F., Drucker, S.M., Baudisch, P., Wong, C.: Time Quilt: Scaling up Zoomable Photo Browsers for Large, Unstructured Photo Collections. In: Proc. CHI 2005, pp. 1937– 1940. ACM Press, New York (2005) 12. Kang, H., Shneiderman, B.: Visualization Methods for Personal Photo Libraries: Browsing and Searching in the Photofinder. In: Proceedings of IEEE International Conference on Multimedia and Expo. (2000) 13. Kim, J., Zimmerman, J.: Cherish: Smart digital photo frames for sharing social narratives at home. In: Work in Progress CHI 2006, p. 953 (2006) 14. Kirk, D., Sellen, A.: On Human Remains: Excavating the Home Archive. In: Microsoft Research, Cambridge, UK 15. Kirk, D., Sellen, A., Rother, C., Wood, K.: Understanding photowork. In: Proc. CHI 2006, p. 761. ACM Press, New York (2006) 16. Leong, T.: Understanding Serendipitous Experiences when Interacting with Personal Digital Content. PhD Thesis, University of Melbourne (2009) 17. Levy, S.: The Perfect Thing: How the Ipod Shuffles Commerce, Culture and Coolness. Simon & Schuster, New York (2006) 18. Lindley, S., Monk, A.: Social enjoyment with electronic photograph displays: Awareness and control. International Journal of Human-Computer Studies 66(8), 587 (2008) 19. Lindley, S., Durrant, A., Kirk, D., Taylor, A.: Collocated social practices surrounding photos. In: Proc. CHI 2008, p. 3921. ACM Press, New York (2008) 20. Peesapati, S.T., Schwanda, V., Shultz, J., Lepage, M., Jeong, S., Cosley, D.: Pensieve: Supporting Everyday Reminiscence. In: Proc. CHI 2010. ACM, New York (2010) 21. Sengers, P., Gaver, W.: Staying Open to Interpretation: Engaging Multiple Meanings in Design and Evaluation. In: Proceedings of the 6th Conference on Designing Interactive Systems, pp. 99–108. ACM Press, University Park (2006) 22. Shklovsky, V.: Art as Technique. In: Davis, R.C. (ed.) Contemporary Literary Criticism. Modernism through Poststructuralism. Longman, New York (1917) 23. Swan, L., Taylor, A.S.: Photo displays in the home. In: Proc. DIS 2008, p. 261. ACM Press, New York (2008) 24. Taylor, A., Swan, L., Durrant, A.: Designing Family Photo Displays. In: Proc. ECSCW 2007, vol. 7, Springer, London (2007) 25. van Dijck, J.: Mediated Memories: Personal Cultural Memory as Object of Cultural Analysis. Continuum 18(2), 261–277 (2004a) 26. van Dijck, J.: Memory Matters in the Digital Age. Configurations 12(3), 349–373 (2004b) 27. van Dijck, J.: From Shoebox to Performative Agent: The Computer as a Personal Memory Machine. New Media & Society 7(3), 311–322 (2005) 28. van Dijck, J.: Mediated Memories in the Digital Age, Stanford, USA (2007)
Culture and Facial Expressions: A Case Study with a Speech Interface

Beant Dhillon1, Rafal Kocielnik1, Ioannis Politis1, Marc Swerts2, and Dalila Szostak1

1 Eindhoven University of Technology, Den Dolech 2, 5612 AZ Eindhoven, Netherlands
2 Tilburg University, Warandelaan 2, 5037 AB Tilburg, Netherlands
{B.K.Dhillon,R.D.Kocielnik,I.Politis,D.Szostak}@tue.nl, [email protected]
Abstract. Recent research has established the cultural background of users as an important factor affecting the perception of an interface’s usability. However, the area of cultural customization of speech-based interfaces remains largely unexplored. The present study brings together research from emotion recognition, inter-cultural communication and speech-based interaction, and aims at determining differences in the expressiveness of participants from Greek and Dutch cultures dealing with a speech interface customized for their culture. These two cultures differ in their tendency for Uncertainty Avoidance (UA), one of the five cultural dimensions defined by Hofstede. The results show that when encountering errors, members of the culture that ranks higher on the UA scale, i.e. Greeks, are more expressive than members of the culture that ranks lower, i.e. Dutch, especially when encountering errors in a low UA interface. Furthermore, members of the high UA culture prefer the high UA interface over the low UA one.

Keywords: Multicultural study, culture, cultural differences, uncertainty avoidance, expressiveness, speech interface.
cultural models by Hall [14], Trompenaars [31] and Hofstede [16] for applications in interface design. The present study is based on Hofstede’s model and thus relates to exploring differences between national cultures. The following section discusses some of the empirical studies carried out in the context of culture and interface design.
1.2 Culture and Interface Design
Culture has been shown to affect perception [25], judgement [12] and decision making [5]. Moreover, several studies have shown how the cultural background of users affects their perceptions of, as well as preferences for, interfaces. Cyr and Trevor-Smith [6] analyzed design elements such as the use of symbols and graphics, color preferences, and site features for 30 municipal websites in Germany, Japan, and the U.S. to examine the effect of culture on design elements. They concluded that layout, symbols, navigation, and the use of color are among the important design elements that vary across cultures and need to be taken into account for localization. Singh et al. [29, 30] provided empirical evidence that, in comparison to standardized websites, consumers from Germany, Spain, the Netherlands, Switzerland, Italy, China, and India preferred culturally adapted websites. As described, several studies emphasize the importance of culturally customizing graphical user interfaces to the cultural background of users [7, 27, 28], but research dealing with cultural customization of speech interfaces has not led to very conclusive results so far [24].
1.3 Facial Expressions, Emotion and Speech Interfaces
Speech-based interaction started with telephony, but people quickly found telephonic conversations with machines (voicemail, booking systems) rather uncomfortable. For this reason, systems with human facades which try to mimic human interaction are becoming a preferred way of interaction [26]. One way of making the interaction more human-like is to develop systems that can interpret the facial expressions of humans to determine their emotional state and adapt their behavior and responses accordingly (Affective Computing). There are technological solutions available that can already provide a robust estimation of a person’s emotional state by interpreting facial expressions [2], [11], [19]. Further, existing research suggests that interpretation of certain facial expressions (e.g. frustration) could help a system deal with errors more effectively [4]. Along the same lines, the present study aimed to explore whether members of different cultural backgrounds differ in their expressiveness towards speech-based interfaces adapted for culture. Based on the results, a few pointers are provided for the cultural customization of a speech-based interface. The following section describes the cultural dimension on which the interface design was based, as well as the rationale for choosing the Dutch and Greek cultures. Since Hofstede’s cultural model is one of the most well-known and widely applied models in both management and interface design [8, 21], it was used for the purposes of the current study.
1.4 Culture and Expressiveness
One of the cultural dimensions defined by Hofstede [16] that is related to frustration is Uncertainty Avoidance (UA). The Uncertainty Avoidance Index (UAI) is used as a
measure of UA, i.e., the higher the UAI of a country, the higher the UA tendencies of members of that culture. Cultures with high UA are associated with higher anxiety, stress, avoidance of ambiguity and a tendency to be normative, expressed in an emphasis on rules. Hofstede [16] also suggests that cultures rating high on UAI are emotionally more expressive. As he reports, this might be because people from cultures with high UA experience higher stress and have less internalized emotional control. In contrast, people living in cultures with low UA tend to experience lower stress and less conflict between norms and experience. Due to this “weak superego” and acceptance of deviation, people from these cultures tend to display less emotion [13]. According to Hofstede’s research, Greek culture has the highest UAI, i.e. 112, while Dutch culture has a considerably lower score on the same index, i.e. 53 (mean UAI = 66.4, min = 8, max = 112). Since the UAIs of the Greek and Dutch cultures differ substantially, representatives of these two cultures were chosen as participants in the study.
2 Hypotheses
As described above, research suggests that emotional expressiveness differs across cultures. Additionally, guidelines have been developed for adapting interfaces to different cultures [23], based on the assumption that such adaptation will lead to a more positive user experience. This study focused on determining whether there are differences in the facial expressiveness of members of different cultures when they interact with an error-prone interface. Further, the study tested whether the level of expressiveness differs when the interaction involves an interface which is adapted to one’s culture vs. one that is not. Thus, intensity of expression was the dependent variable. These expectations are justified by the researchers’ intuition that any deviation from the expected system behavior would elicit a reaction from the user, and that this reaction would very likely include a facial expression. This deviation was achieved by introducing errors into the system, which resulted in an error-prone interaction. Pilot studies by independent evaluators showed that intensity of expression was the variable that most clearly reflected the perceived difference between conditions in which an error was encountered and conditions in which there was no error. Two different versions of an interface that varied in the dimension of UA were designed. This enabled the researchers to assess differences in expressiveness during the interaction with each version. A more expressive response was expected for an interface that did not correspond to the cultural expectations of the user. Moreover, a general tendency towards higher expressiveness was expected in high UA cultures. The following hypotheses were formulated to be tested during the study:
• H1: When encountering errors, high UA cultures will overall be more expressive than low UA cultures.
• H2: When encountering errors in a low UA interface, high UA cultures will be more expressive than low UA cultures.
• H3: When encountering errors in a high UA interface, low UA cultures will be more expressive than high UA cultures.
3 Methods
3.1 Experiment Design
The study was set up as a 2×2 mixed factorial design with two independent variables: culture and type of interface. Culture was divided into Greek and Dutch, and type of interface was divided into high UA and low UA. The dependent variables were the frustration experienced by participants and facial expressiveness. UAI for each culture group was measured using the Values Survey Module 94 (VSM94) questionnaire, frustration was measured with the NASA Task Load Index (NASA-TLX) questionnaire, and expressiveness was measured through a perception test. A 2×2 mixed design has some disadvantages, e.g. learning effects and fatigue. However, the advantage of the design is that fewer participants are required. Additionally, statistical power increases since participants’ individual differences are kept constant between conditions.
3.2 Application Design
The study was conducted using a Wizard of Oz approach which involved interacting with a speech-based Train Booking System. One of the experimenters controlled the system’s responses to the users' utterances; the participants were asked to make a booking from London to Newcastle for a specified date and time. Two versions of the system were developed, one customized for a high UA culture and the other for a low UA one. The interfaces were developed using the guidelines available for cultural customization of graphical user interfaces [23], adapted for use in a speech-only interaction. Furthermore, during pilot testing, two independent reviewers were asked to interact with the interfaces and the interaction was modified according to the comments provided by the reviewers. For the high UA interface, the following recommendations were applied:
• Guided interaction with confirmations at every step.
• Explicit cues to prevent users from feeling lost.
• Restricted amounts of data.
• Limited choices.
The following is an example of the high UA interface used in the study:
System: Welcome to the Train Booker! Where do you want to depart from?
User: London.
S: Ok, you are departing from London. What is the destination? You can always go back or cancel.
U: Newcastle.
S: Ok then, what day do you want to travel between London and Newcastle?
U: Sunday.
S: And what time do you want to travel on Sunday?
U: 12 o'clock.
S: So you are traveling from London to Newcastle on Sunday around 12 o'clock, right? You can always go back or cancel.
U: Right.
...
For the low UA interface, the following recommendations were applied:
• Free interaction with fewer confirmations.
• More flexible navigation.
• Variety of choices.
The following example demonstrates the low UA interface used in the study:
S: Welcome to the Train Booker! You can start with places or times, which do you prefer?
U: Places.
S: Between which places do you wish to travel?
U: From London to Newcastle.
S: What day and time do you want to travel? You can always go back or cancel.
U: Sunday at 12 o'clock.
...
Two types of errors, i.e. misrecognition and repetition, were introduced. In the first case, the system would respond as if it had misheard the participant, e.g.:
S: What class would you like to travel with?
U: Second class.
S: First class then?
...
In the second case, the system would ask the participant to repeat what he/she had last said, e.g.:
S: Do you wish to take a direct or an indirect train?
U: A direct one.
S: Sorry, can you repeat your train preference?
...
An equal number of misrecognition and repetition errors were induced in both the high and low UA versions of the dialogue, two of each type for each interface. As a result, each user would encounter a total of four errors per interface.
3.3 Participants
Participants were recruited from the Eindhoven University of Technology. There were 16 participants, 8 Greek and 8 Dutch. All the participants selected for the study were university students, as the VSM94 questionnaire is validated only for participants from similar professional backgrounds. Their ages ranged from 18 to 30. The participants were asked to fill in the VSM94 designed by Hofstede [17] in order to measure the UAI and
make sure their responses were in line with Hofstede's findings. Participants were offered coffee/tea with a snack, as well as a small reward (a 5 Euro coupon), for participation. Details on the participants’ demographics can be found in Table 1.

Table 1. Demographics for participant study

Nationality   UAI*    Average age   Average time abroad   Gender (Male / Female)
Greek         105.0   24-29         1-3 years             5 / 3
Dutch         40.7    20-25         0-1 years             6 / 2

* UAI stands for Uncertainty Avoidance Index measured for this group
3.4 Questionnaires
Two questionnaires were utilized during the experiment: the Values Survey Module (VSM94) and the NASA Task Load Index (NASA-TLX). The VSM94 [17] is a 26-item questionnaire developed by Geert Hofstede for comparing cultures, based on region or country, across five dimensions: Power Distance, Individualism, Masculinity, Uncertainty Avoidance, and Long-term Orientation. Twenty questions (four for each dimension) are used for the dimensions, and the remaining six questions are demographic. Each question is scored on a five-point scale, and each dimension's index score is calculated from the mean scores of a group of respondents on the four related questions, using a formula provided with the questionnaire. For calculating UAI, Formula (1), as provided below, is used:

UAI = 25 × mean(Q13) + 20 × mean(Q16) − 50 × mean(Q18) − 15 × mean(Q19) + 120
(1)
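To make Formula (1) concrete, the sketch below computes a group-level UAI from VSM94 responses. It is a minimal illustration, not the authors' analysis code: the respondent data are invented, and only the four UA-related items (Q13, Q16, Q18, Q19, each scored on the five-point scale) are assumed, as described above.

```python
from statistics import mean

def uai_score(responses):
    """Compute the group-level Uncertainty Avoidance Index (Formula 1).

    `responses` is a list of dicts mapping the four UA-related VSM94 items
    to scores on the five-point scale; item means are taken over the whole
    group of respondents, as described in the text.
    """
    def m(q):
        return mean(r[q] for r in responses)
    return 25 * m("Q13") + 20 * m("Q16") - 50 * m("Q18") - 15 * m("Q19") + 120

# Hypothetical example: three respondents' answers to the four UA items.
group = [
    {"Q13": 3, "Q16": 3, "Q18": 3, "Q19": 3},
    {"Q13": 4, "Q16": 3, "Q18": 2, "Q19": 3},
    {"Q13": 3, "Q16": 4, "Q18": 3, "Q19": 2},
]
print(round(uai_score(group), 1))  # -> 96.7 for this invented group
```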
In the present study, the UAI score was found to be 105 for the Greek participants and 40.7 for the Dutch participants. The large difference between these scores supports the selection of Dutch and Greek participants on the basis of differences in UAI. The NASA-TLX is a tool that allows users to perform subjective workload assessments of tasks performed with human-machine systems [15]. The ratings are collected on six sub-scales: Mental Demand, Physical Demand, Temporal Demand, Own Performance, Effort and Frustration. The overall workload is computed from weighted averages of these ratings.
3.5 Procedure
Participants filled in the VSM94 questionnaire, after which they were briefed about the experiment by one of the investigators. While interacting with the interface, the participants were video recorded using the built-in webcam of the computer. The average duration of the interaction was 2:03 minutes for the low UA interface (Std. Dev. 0:17) and 2:22 minutes for the high UA interface (Std. Dev. 0:14). After having completed the task with the first interface, participants were asked to fill in the
NASA-TLX questionnaire, which was followed by the interaction with the second interface. The order of the interfaces was balanced across subjects to counteract learning effects. After the second interaction, participants were once more asked to fill in the NASA-TLX questionnaire, followed by a debriefing session as well as a short interview. The location and setup of the study were chosen in order to create a comfortable and friendly environment for participants. All the questionnaires were used in their entirety to preserve validity.
3.6 Perception Study
A perception study is a widely used method for evaluating facial expressiveness in terms of type and/or intensity of expression [3], [9], [10]. It involves showing a series of videos or pictures to participants and asking them what kind of expression they perceive and/or how intense that expression is. The videos recorded during the previous phase of the experiment were edited, and instances where a participant encountered an error with the system were extracted. An equal number of videos from Dutch and Greek participants with misrecognition and repetition errors were used. The situations considered as erroneous were only those where the users explicitly repaired the errors by repeating their command or correcting the system. At times, the participants did not correct the mistake made by the system, giving no evidence of having noticed a misrecognition error. In such cases, their videos were not included in the perception test. Also, for each participant, one video with a neutral expression, i.e. a facial expression in a neutral, non-erroneous context, was included in the perception test. The neutral videos were selected from dialogue fragments without errors, and without clearly perceivable muscle movements within a timeframe of 5 seconds, following best-practice procedures outlined in related studies [3]. Neutral videos were later used as a baseline in terms of expressiveness; the results from the perception study validated this set-up. The total number of clips presented during the perception test was 56. Each clip was about 2 seconds long. The participants of the perception study rated the intensity of the expression shown in each video on a scale of 1 to 5 (from neutral to very strong). All 17 participants in the perception study were employees of Eindhoven University of Technology (TU/e). In order to balance possible cultural differences in the perception of expressiveness, participants from at least 9 different cultures were selected. These participants did not have any information regarding the nationality of the participants in the videos or the context of the interaction. Furthermore, none of them had participated in the initial experiment.
4 Results
4.1 Quantitative
The ratings obtained from the perception study were averaged for each rater by video type and culture. This resulted in 6 average scores for each rater (2 cultures × 3 video types): 3 scores for the videos of the Greek participants (high UA, low UA and neutral), and the same for the videos of the Dutch participants. This procedure was followed for all 17 raters. These scores were analyzed using a 2×3 repeated-measures ANOVA
with 2 independent factors (nationality, 2 levels; video type, 3 levels). The results show a significant main effect of nationality, F(1,16) = 27.109, p < .001. Thus, hypothesis H1 is supported. Further, the main effect of video type, F(2,15) = 125.466, p < .001, and the interaction between nationality and video type, F(2,15) = 31.804, p < .001, were also significant, both for the low UA and high UA levels. To tease apart this interaction, post hoc pairwise comparisons (Bonferroni test) were conducted. These revealed that the perception ratings for the neutral video type differ significantly from the two other conditions (low and high UA), whereas the latter two do not differ from each other in this overall comparison. Furthermore, unlike the high UA or low UA conditions, the neutral videos scored close to 1 (no expression) for both cultures. The mean score, on a scale of 1 (neutral) to 5 (very strong), was 1.309 (std. err. 0.070) for the Greek participants and 1.322 (std. err. 0.092) for the Dutch participants. The neutral videos were indeed rated as having ‘no expression’, thus validating the set-up of the study. The bar graph (Fig. 1) presents the mean scores from the perception test for each video type with respect to each culture. The results of the pairwise comparison for the main effect of nationality and for the interaction between nationality and video type can be found in Tables 2 and 3. An analysis split by nationality confirmed that the differences between the UA interface levels are significant (separate t-tests), for the Dutch and Greek participants separately (p < .001). Thus, the null hypothesis can be rejected and hypotheses H2 and H3 are supported.

Table 2. Pairwise comparison for expressiveness by nationality

Table 3. Interaction between the variables nationality and context of interaction (video clips from the low UA interface, neutral, and the high UA interface, respectively)

Context of interaction   Mean    Std. error   Lower bound   Upper bound
Low UA interface*        2.230   0.094        2.031         2.429
Neutral                  1.322   0.092        1.126         1.518
High UA interface*       2.678   0.088        2.491         2.865

* Low/High UA interface: video clip from interaction with the low/high UA interface
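For illustration, the sketch below shows how the per-rater cell means and the 2×3 repeated-measures ANOVA described above could be computed. It is a hypothetical reconstruction rather than the authors' analysis script; the file name, column names and factor labels are invented for the example, and only the aggregation and analysis steps stated in the text are assumed.

```python
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per rater and clip, with the filmed
# participant's nationality and the clip's context (low_UA, neutral, high_UA).
ratings = pd.read_csv("perception_ratings.csv")  # columns: rater, nationality, video_type, rating

# Average each rater's 1-5 ratings per cell: 2 nationalities x 3 video types
# gives the 6 mean scores per rater described in the text.
cell_means = (ratings
              .groupby(["rater", "nationality", "video_type"], as_index=False)["rating"]
              .mean())

# 2x3 repeated-measures ANOVA with both factors treated as within-rater.
anova = AnovaRM(data=cell_means, depvar="rating", subject="rater",
                within=["nationality", "video_type"]).fit()
print(anova)

# Follow-up: compare low vs. high UA clips separately within each nationality.
for nationality, group in cell_means.groupby("nationality"):
    wide = group.pivot(index="rater", columns="video_type", values="rating")
    t, p = ttest_rel(wide["low_UA"], wide["high_UA"])
    print(nationality, f"t = {t:.3f}, p = {p:.4f}")
```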
Finally, the values from the NASA-TLX questionnaire were analyzed using a multivariate ANOVA. Culture and system were the independent variables, and workload, frustration and frustration weights were the dependent variables. No significant main effects were found in this analysis.
Fig. 1. Perception test results – expressiveness mean values for video excerpts from each interaction context (Low UA interface, Neutral, and High UA interface) and nationality
4.2 Qualitative
According to the statements made by users in the semi-structured interviews conducted after each experiment, five out of eight Greeks preferred the high UA interface, commenting that they “liked the step-by-step and uncomplicated structure and the clear confirmations provided”. Five out of eight Greeks disliked the low UA interface, describing it as “stressful and frustrating” and claiming that it presented them with “too few confirmations”. It is worth noting that none of the Greek participants preferred the low UA interface. Concerning the Dutch participants, the preferences were equally distributed between the interfaces, so no strong conclusions can be drawn for the Dutch from the interviews conducted. Only some of the participants expressed strong opinions about their preferences or dislikes when asked; others had no strong attitude towards either type of interface. Tables 4 and 5 summarize the main findings of this qualitative analysis.

Table 4. Preference towards a type of interface based on comments from interviews

Nationality   Total   Low UA interface   High UA interface   None
Greek         8       0                  5                   3
Dutch         8       3                  3                   2
Table 5. Dislike of a type of interface based on comments from interviews

Nationality   Total   Low UA interface   High UA interface   None
Greek         8       5                  0                   3
Dutch         8       3                  0                   5
5 Discussion
Based on the aforementioned results, it can be concluded that the three alternative hypotheses, i.e. H1, H2 and H3, are all supported. When encountering errors, cultures that score high on the UA index are more expressive than cultures with a low UA index. Also, when encountering errors in a low UA interface, high UA cultures are more expressive than low UA ones. This also holds for low UA cultures encountering errors in a high UA interface, but the results are not as strong. The results suggest that Greek participants, who belong to a high UA culture, prefer interfaces with more confirmations and step-by-step guided navigation, and have stronger reactions when encountering errors while interacting with an interface that offers fewer confirmations. This was confirmed by the exit interviews, in which five out of eight Greek participants commented that they disliked the low UA interface and five out of eight expressed a liking for the high UA interface. The Dutch participants seemed to have slightly stronger reactions when they encountered errors in the high UA interface. However, as discovered during the interviews, three Dutch participants disliked the low UA interface and three of them liked it, so no strong conclusions can be derived from this observation. Overall, eight out of sixteen participants disliked the low UA interface, suggesting that if no clear information regarding the culture is available, it is probably better to provide an interface designed for high UA. These findings can be helpful in building systems which detect the expressiveness of a user and adapt their features, e.g. a navigation style that suits the needs of the user. If an interface could detect the UA of its user (based on expressiveness), especially in the context of errors, it could adapt its communication style to enable a less frustrating and more efficient interaction. An example could be a virtual agent-based system for flight booking: if the system detects a high level of frustration in the user, it could adapt itself to another context more suited to the user, e.g. adapt its navigation style to become more flexible. Since such a system takes into account the individual differences of the user, this adaptation could even be independent of the cultural background of the user and focus instead on the individual reactions of the user, without binding users to cultural stereotypes. Since the present study was conducted using a speech interface, care should be taken when extending these findings to other kinds of interfaces. Another interesting finding was that the results from the NASA-TLX questionnaire provided minimal insight into participants’ preferences for an interface. There was no match between the participants' ratings for frustration in the NASA-TLX questionnaire and their comments or the way their expressions were perceived in the
perception test. This finding thus demonstrates possible limitations of quantitative methods when it comes to providing insights into human behavior, especially in relation to human-computer interaction.
5.1 Possible Bias
The results of this study could have been affected by various factors such as cultural adaptation and the lab setup. Since culture was an important aspect of the study, it is essential to note that the study was conducted in the Netherlands, and Greek participants living in this country could potentially have shown different behavior from Greeks living in Greece, although measuring the UAI might have taken care of this bias to some extent. While Hofstede's cultural model advises against making statements based on responses from fewer than 20 participants from a given country, the researchers found a substantial difference between the two cultures even with 16 participants. Finally, testing in a lab environment with a Wizard of Oz approach could have introduced bias in comparison to the usual context of interaction. Although the researchers are aware of the limitations of Wizard of Oz, it was still chosen as the best option for this type of research because it facilitated controlled testing of specific hypotheses. Since it was necessary to control the number and type of errors that the participants experienced during the interaction, a script-based interaction was a good fit. We note that 14 out of the 16 participants stated they were convinced they had interacted with a real system, even though they were briefed that the interaction dialogue was being tested and not the system. The remaining two participants did not know that their expressiveness would be analyzed, and their expressions did not appear fake or staged either. The sample size could be another point of concern in this study, but it should be noted that our initial work with a limited number of participants already displays observable differences. It can be expected that these promising effects will only become clearer when larger groups are analyzed.
5.2 Conclusion and Future Work
Facial expressions are very important in the field of human-machine interaction, not only as an output modality (e.g. avatars), but also as an input modality. Automatic speech recognition is known to improve when systems interpret information from multiple modalities, but facial expressions as a modality have not been explored in depth in HCI. Facial expressiveness constitutes a large part of human communication, and as language and gestures differ across cultures, so might facial expressiveness in the context of machine interactions. This study contributes to the area by showing that these presumed differences are indeed real, at least for the dimension related to expressiveness, and are not merely prejudices. Furthermore, this study demonstrates that such differences could be exploited to improve human-machine interactions by tuning interaction styles to specific groups of people. In this respect, the current study is both novel and relevant. An important future step for this research could be to include a larger number of participants for both the main experiment and the perception study. A larger number of participants would make the cultural difference statement stronger, regardless of the
strong statistical values that the current research shows. Additionally, testing other cultural dimensions with similar interfaces that produce controlled errors, or testing the same dimensions but in a real interface that is naturally prone to errors could also be of interest for further research. Finally, testing Greek participants in their native country and comparing the results to those of the described study could provide richer insights in this field.
References 1. Ailon, G.: Mirror, mirror on the wall: Culture’s Consequences in a value test of its own design. The Academy of Management Review, 885–904 (2008) 2. Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., Kollias, S.: Emotion analysis in man-machine interaction systems. In: Bengio, S., Bourlard, H. (eds.) MLMI 2004. LNCS, vol. 3361, pp. 318–328. Springer, Heidelberg (2005) 3. Barkhuysen, P., Krahmer, E., Swerts, M.: Problem detection in human–machine interactions based on facial expressions of users. Speech Communication 45, 343–359 (2005) 4. Barkhuysen, P., Krahmer, E., Swerts, M.: Audiovisual Perception of Communication Problems. In: Speech Prosody International Conference (2004) 5. Briely, D.A., Morris, M., Simonson, I.: Reasons as carriers of culture: dynamic vs. dispositional models of Cultural influence on decision making. Journal of Consumer Research 27 (2000) 6. Cyr, D., Trevor-Smith, H.: Localization of Web design: an empirical comparison of German, Japanese, and US Web site characteristics. Journal of the American Society for Information Science and Technology 55(13), 1199–1208 (2004) ISSN: 1532-2882 7. Cyr, D., Kindra, G.S., Dash, S.: Web site design, trust, satisfaction and e-loyalty: the Indian experience. - Online Information Review. Emerald Group Publishing Limited (2008) 8. Dhillon, B.: Effects of cultural background of users on interface usability. VDM Verlag Muller, London (2010) 9. Ekman, P., Friesen, W.V.: A new pan-cultural facial expression of emotion. Motivation and Emotion 10(2), 159–168 (1986) 10. Ekman, P., Friesen, W., O’Sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., et al.: Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology 53(4), 712–717 (1987) 11. Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Raouzaiou, A., Karpouzis, K.: Modeling naturalistic affective states via facial and vocal expressions recognition. In: International Conference on Multimodal Interfaces (ICMI 2006), Alberta, Canada (2006) 12. Gardner, W.L., Gabriel, S., Lee, A.Y.: “I” Value Freedom, But “We” Value Relationships: Self-Construal Priming Mirrors Cultural Differences In Judgment. Psychological Science 10(4), 321–326 (1999) 13. Gudykunst, W.: Bridging Japanese/North American differences. Sage Publications, Inc., Thousand Oaks (1994) 14. Hall, E.: The silent language. Anchor Books, New York (1990) 15. Hart, S.: Nasa-task load index (nasa-tlx): 20 years later. In: Human Factors and Ergonomics Society Annual Meeting Proceedings. Human Factors and Ergonomics Society, pp. 904–908 (2006)
16. Hofstede, G.: Culture and organizations. International Studies of Management & Organization, JSTOR, 15–41 (1980) 17. Hofstede, G.: Values Survey Module manual. University of Limburg, Maastricht (1994) 18. Hofstede, G.H., Hofstede, G.: Culture’s consequences: Comparing Values, Behaviours, Institutions, and Organisations Across Nations. Sage Publications, Inc., Thousand Oaks (2001) 19. Jarkiewicz, J., Kocielnik, R., Marasek, K.: Anthropometric Facial Emotion Recognition. In: Jacko, J.A. (ed.) HCI International 2009. LNCS, vol. 5611, pp. 188–197. Springer, Heidelberg (2009) 20. Khaslavsky, J.: Integrating culture into interface design. In: CHI 1998 Conference Summary on Human Factors in Computing Systems, Los Angeles, California, pp. 365–366. ACM Press, New York (1998) 21. Kondratova, I., Goldfarb, I.: In: Proceedings of EdMedia 2005 - World Conference on Educational Multimedia, Hypermedia & Telecommunications, Montréal, Québec, Canada, pp. 1255–1262, NRC 48237 (2005) 22. Kroeber, A.L., Kluckhohn, C.: Culture: A Critical Review of Concepts and Definitions (1952) 23. Marcus, A.: Cultural dimensions and global web user-interface design: What? So what? Now what?, http://www.tri.sbc.com/hfweb/marcus/hfweb00_marcus.html 24. Matousek, V., Mautner, P., Moucek, R., Tauser, K. (eds.): Proceedings of the 4th International Conference on Text, Speech and Dialogue, TSD 2001, Zelezna Ruda, Czech Republic, September 11-13, vol. XIII, p. 452 (2001) 25. Nisbett, R.E., Miyamoto, Y.: The influence of culture: holistic versus analytic perception. Trends in Cognitive Sciences 9(10) (2005) 26. Pitt, I., Edwards, A.D.N.: Design of speech-based devices: a practical guide. Hardcover, 179 (2003) ISBN: 978-1-85233-436-9 27. Reinecke, K., Bernstein, A.: Culturally Adaptive Software: Moving Beyond Internationalization. In: Proceedings of the 12th International Conference on HumanComputer Interaction. Springer, Beijing (2007) 28. Sheppard, C., Scholtz, J.: The effects of cultural markers on Web site use. In: Proceedings of the Fifth Conference on Human Factors & the Web, Gaithesburg, Maryland (1999) 29. Singh, N., Fassott, G., Zhao, H., Boughton, P.D.: A cross-cultural analysis of German, Chinese and Indian consumers’ perception of website adaptation. Journal of Consumer Behaviour 5(1), 56–68 (2006) 30. Singh, N., Pereira, A.: The culturally customized website – Customising websites for the global workplace. Elsevier - Butterworth Heinemann, MA, USA (2006) 31. Trompenaars, F., Hampden-Turner, C.: Riding the Waves of Culture: Understanding Cultural Diversity in Business, 2nd edn. Nicholas Brealey, London (1997) 32. Zeng, Z., Pantic, M., Roisman, G., Thomas, S., Huang, A.: Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39–58 (2009)
Equality = Inequality: Probing Equality-Centric Design and Development Methodologies Rilla Khaled Center for Computer Games Research, IT University of Copenhagen Copenhagen, Denmark [email protected]
Abstract. A number of design and development methods, including participatory design and agile software development, are premised on an underlying assumption of equality amongst relevant stakeholders such as designers, developers, product owners, and end users. Equality, however, is not a straightforwardly accepted feature of all cultural perspectives. In this paper, we discuss the situation of equality-centric methods in a culturally mixed setting. We present a case study of the Girl Game Workshop, a game development event intended to empower young women through game design and to promote diversity in game creation. While conducting the workshop, the organisers encountered numerous issues, which presented challenges to their assumptions of the desirability of an emphasis on equality during game design and development. In this paper, we focus on seven key themes relating to equality that emerged from an ethnography conducted during the workshop, including location, cultural and classroom hierarchies, gender, “girl games”, stakeholders and boundaries, and risk mitigation. Keywords: equality, culture, gender, participatory design, agile methodologies, game design.
private school in Copenhagen specialising in delivering instruction in Danish and Arabic to students between the ages of 6 and 15. Ten female DIA-skole students took part in the workshop, all of whom had Arabic cultural backgrounds and were approximately 14 years old. Using a methodology that featured elements of participatory design and Scrum, and with constant support from the organisers, the students developed two games from start to finish despite having no previous experience with game development. As one of the goals of the workshop was self-expression through digital games, the students were encouraged to make the majority of creative decisions. In the process of doing so, however, a number of embedded assumptions the organisers had made about the feasibility of applying methods premised on equality and flatness of hierarchy were challenged. In this paper, we overview the current trend of equality within design and development methodologies in light of the cultural characterisations of Scandinavian and Arabic culture. Next, we provide a summary of the events that took place during the workshop. We then discuss seven themes related to equality that emerged from an ethnographic analysis of the workshop.
2 Background Understanding the events that took place as well as our context of interpretation both during and after the workshop requires knowledge about several different topics. Here we present some general characterisations of the cultures of participants involved in the workshop. We also discuss participatory design and agile methodologies, both of which were very influential on our workshop process. 2.1 Culture Culture, which has been defined as “the software of the mind” [8], is omnipresent. It is shared, learned, and shapes the way we think, perceive, and act. Cultural characteristics can be difficult to articulate for members belonging to those cultures, as cultural beliefs and values are often so basic as to be effectively invisible. Over the last two decades, there has been an increased focus in HCI on culture and cultural differences, ranging from localisation issues and surface level conventions of culture [16] to deeper questions of how to address and incorporate cultural assumptions into design practice [22,12]. In many ways, this interest in culture parallels the growing consideration of situational and contextual factors in HCI, as supported by theories such as activity theory and distributed cognition [15,9]. Like culture, they point to the importance of context in making sense of action and use. Turning more specifically to the cultural backgrounds involved in our workshop, we can briefly characterise both Scandinavian and Arabic cultures in terms of the work of sociologist Geert Hofstede and psychologist Shalom Schwartz. Scandinavian cultures (including Denmark, Norway, and Sweden) have generally been described as individualistic, meaning that according to societal beliefs, the individual is more
important than the group [19,8]. Further, Scandinavian cultures are described as egalitarian and flat in terms of power distance, that is, people tend to perceive one another as existential equals regardless of role or rank. Scandinavian cultures have also been ranked as somewhat low on the scale of uncertainty avoidance, indicating more tolerance of ambiguous or under-defined protocols and situations. Gender roles have blurry divisions, so rather than clear expectations about the roles of men and women, these roles are fluid, and overlaps and interchanges are not unusual. Arabic cultures (including but not limited to Egypt, Morocco, Iraq, and Saudi Arabia) are generally strongly collectivistic, and place the importance of the group over that of the individual [19,8]. For example, when an individual has to choose between his own needs and those of the group to which he belongs, the cultural expectation is that he will prioritise the needs of the group. Correlated to collectivism, Arabic cultures feature rigid hierarchies and large power distances: unequal distributions of power and resources are accepted as legitimate, and the lack of upward mobility by society members with less power is accepted as cultural heritage. Arabic cultures also tend to be uncertainty avoiding, and place much emphasis on rule following and protocol adherence. Finally, in contrast to Scandinavian cultures, in Arabic cultures gender roles tend to be fixed and separated, in that there are clear expectations regarding acceptable roles and behaviour between the genders. 2.2 Participatory Design Participatory design (PD), originally known as cooperative design, is a methodology that first emerged in Scandinavia in the 1970s, and has since experienced wide uptake especially within Scandinavia, the United States, and the United Kingdom. Many of its practices stem from a set of core assumptions, including but not limited to the belief that actions must be understood within their context of occurrence, that values should be democratically negotiated, that end users should be involved in decision making throughout the design and developmental process, and that methodological steps must be taken to ensure fairness of representation of developers, designers, and end users [14,3,10]. PD emerged as a result of action research addressing the concerns of workers and trade unions regarding the introduction of IT to the workplace, and fears that it would result in a decrease in the power, decision making, and participation of workers. It is an inherently political methodology, partly stemming from, and embodying the concerns and expectations of Scandinavian workers. The incorporation of diverse concerns and perspectives of all stakeholders into the design process is central to PD. Wrapped up in these beliefs is an assumption of equality between stakeholders - not just between designers, developers, and end users, but also within these groupings. As discussed earlier, belief in egalitarianism is particularly characteristic of Scandinavian culture. The assumption of equality is reflected in many of the tools and methods used within PD. For example, mock-ups and lo-fi prototypes afford collective modification from end users and designers alike, as their lo-fi nature means that no specialised skill is required for suggesting modifications [4]. Here, there is an effort to flatten skill differences between designers and users. Following in the vein of co-design, in Future Workshops,
users are invited to take part in design tasks with designers, in which they suggest design solutions for current design problems [11]. As another example, during on-site visit tasks, designers specifically seek to understand the working contexts of users, to develop clear and balanced views of the user and to ensure that their own design assumptions and perspectives do not conceal and override those of users [4]. In the final method we mention here, PD games, design concerns are presented within the context of games. This has the effect of flattening power differences and hierarchies, and facilitating contribution, sharing, and collective problem solving, alongside the formation of mutual understandings and vocabularies [5,14]. While the literature reports many successful applications of these methods, and indeed the success of PD more generally, it is worth noting that much of this research focuses on applications in Scandinavian or North American contexts. While there has been much lively debate regarding differences in Scandinavian and North American PD practice (e.g. [13,20]), the underlying values of egalitarianism remain present in both interpretations. 2.3 Agile Development Methodologies Since the publishing of the Agile Software Development Manifesto in 2001 [2], agile methodologies have become very popular amongst industry-based game developers, and are also often used for academic game development [18]. While several variants of agile methodologies exist, including Scrum and extreme programming, most agile methods share a basic set of core principles. One of these core principles surrounds the nature of development, which is assumed to be changeable, in that requirements may change over the course of a project. As such, development is often approached in a just-in-time manner and is conducted iteratively and incrementally, with an emphasis placed on producing deliverable, functional software at each iteration. Another core principle is that teams are self-organising, allocating work amongst themselves and working on the assumption of a largely equal distribution of power between team members. Agile development teams do not pander to “star” developers and everyone must be aware of changes and developments. In some agile methodologies, development is approached though pair programming, where two programmers work together at one work station [1]. Collective code ownership is another characteristic principle of some agile methods, where code is owned by the team rather than any particular individual [1]. This is related to the cross-functional skills of team members, and the expectation that team members should be able to take over tasks previously worked on by others with minimal overhead. Thus, code quality and the maintenance of code functionality are prioritised, as code that has been checked into a collective repository may next be checked out by any other team member. The final principle that we mention here, which has strong overlap with PD, is that close, ongoing contact is maintained with the client or customer throughout the project. In extreme programming, for example, at least one individual representing the interests of the customer is expected to be physically present at the development site 40 hours a week for the entire duration of the project.
3 Case study: The Girl Game Workshop 3.1 Motivations and Objectives The concept of the workshop arose in connection with a desire to address the imbalance in gender representation within the game industry, and also to address the lack of gender and ethnic diversity represented within mainstream games. Further, the organisers wished to provide young women with an experience of game development, for reasons of both education and empowerment. As a response, the organisers came up with the concept of a “Girl Game Workshop”, an intensive three day female-only event, during which students would become familiar with different aspects of game development. Specifically, they would create games from start to completion featuring narrative themes, game play, and audiovisual assets of their own creation. Given that a major focus of the workshop was empowerment, both in terms of skills development and expression, it was important to the organisers that the students exercised a lot of creative and developmental control over the games. As such, the organisers settled on a design and development process featuring elements of PD and Scrum, but with design and development responsibility weighted heavily towards the students, and organisers limiting their input to guidance and facilitation. We point out that our use of the “workshop” concept differs from how workshops are often used within PD, that is, as means to facilitate stakeholders in communicating and committing to shared goals and outcomes [14]. Our intention was to use the workshop as a forum for support, and a space for creativity and game development. Further, we also note that by merging PD and Scrum elements into one process, it meant that we no longer had separation between end-user (player) roles and developer roles, as both roles were played by the students. 3.2 Host Organization: The DIA-Skole The DIA-skole is a private school located in Copenhagen, Denmark that specialises in providing bilingual educational instruction in both Danish and Arabic. Established approximately 30 years ago, it was formed by a group of Denmark-based Arabic parents who wanted their children to have the opportunity to learn Arabic to a sufficiently high standard that they would be able to transition back into Arabic society if the need arose. Over the years, the initial focus on transition was replaced by a focus on maintaining bilingualism within a Danish context. Approximately 90% of the student population come from Arabic countries, while the remaining students come from countries in which Arabic is recognised as an important language, such as Pakistan and Bosnia. After making contact with the principal and receiving preliminary approval to conduct the workshop with DIA-skole students on school grounds, two of the organisers presented the workshop concept to an assembly of students in order to gather interest. Following the presentation, students signed up to participate. 3.3 Workshop Process The Girl Game Workshop participants were the organisers along with 10 female students of approximately 14 years of age. The organisation team was made up of six individuals:
a game designer, an audio designer, a graphic designer, a game artist, a programmer, and the author who participated as a researcher. All three days of the workshop took place in the DIA-skole computer room, which was equipped with individual workstations, as well as other classroom equipment. Each morning, the workshop activities commenced with a daily Scrum standup meeting in which everyone present stated their expectations for the day. On the first day, the organisers introduced the students to some fundamental concepts in game design, focusing in particular on game objectives, core mechanics, obstacles, and resources. The students formed two teams, here referred to as Team A and Team B, and began a two-stage ideation process, which involved individual brainstorming around the concept of “home”, followed by team-level decisions connecting the brainstormed home associations to game concepts which were then paper prototyped. On the afternoon of the first day, the programmer presented the students with an introductory tutorial on Game Maker and fundamental concepts related to game programming. The students then began a series of programming exercises, which they completed individually at separate computers. During the exercise session, the programmer and other organisers circulated the room and assisted the participants. On the second day, the organisers presented three more tutorial sessions on audio design, graphic design, and more advanced programming. After the tutorials, in accordance with agile development, each team determined their own role and responsibility allocations for programming, music, graphics, and story-related tasks, then began development. Development continued for the rest of the day under the watch of the organisers, who stepped in to assist when students specifically asked for help, when they appeared unable to progress further unaided, or when they seemed unproductive and/or distracted. On the final day, the teams continued development work until the deadline in the late afternoon. During the morning, while Team A seemed on track for realising their game concept, it was less clear that Team B would finish without scaling down their creative vision and receiving significant assistance from the organisers. Thus, much of the morning was spent with the programmer and graphic designer closely directing and overseeing the work of Team B. Around midday, once it was absolutely clear that Team B would not be able to meet their targets, the programmer called a meeting with the team members and instructed them to make a prioritised list of features. Following a group decision, the team revised their vision and work plan. With about two hours remaining before the completion deadline, both teams focused on ensuring that minimal game functionality was complete, then turned their attention toward the creation of title screens and game narrative text. After the deadline had passed, the workshop wrapped up with a small party and an awards ceremony for the students. 3.4 Game Concepts Team A: Darbie Going Crazy. Darbie Going Crazy begins with a doll named Darbie, with many similarities to Barbie, having been discarded by a disgruntled girl. The game opens with Darbie waking up in a dumpster, dirty and alone, and with a desire to get home. The game play takes place within two levels that draw on maze game tropes, with the player’s objective being to get Darbie back to her house. During
the journey, the player can increase Darbie’s energy by picking up and eating strawberries, while sneaking past angry dogs, who are present at various points along the path home. The game also includes run-ins with characters inspired by Einstein (“Steinein”) and Elvis Presley, who give the player mysterious and sometimes useless information. See figure 1 for a screenshot from Darbie Going Crazy.
Fig. 1. Darbie in the dumpster in Darbie Going Crazy
Fig. 2. Choosing an outfit in Movie Night
Team B: Movie Night. Movie Night begins with the main character, a teenage girl, having been asked out on a date by the boy of her dreams. On arriving home, the character is told by her father that she must complete housework before she will be allowed to go out on the date. The game features four interactive scenes, each focused on a particular point in the day. The first scene involves the player navigating the main character home through a maze, the second presents an interactive conversation between the character and her father, the third involves a dish washing challenge in which the player has to clean food remnants off a plate with a sponge, and the fourth concerns getting ready for the date, in which the player must choose an outfit for the main character from a set of three outfits. In the final scene the character meets her dream boy at the movies. See figure 2 for a screenshot from Movie Night.
3.5 Research Method

The author was invited to participate in the project in the capacity of an ethnographer on the organisation team for the duration of the workshop. As culture is of central importance in this paper, we point out that the author is neither an ethnic Dane nor a Danish Arab. Prior to moving to Denmark, she spent most of her life in New Zealand, but grew up in a Muslim Bangladeshi household.

The author's objectives were to observe and capture the workshop process through participant observation and interviewing. She focused on maintaining an ongoing record of events, observing the effectiveness of the methodological process, and learning about the workshop participants and the DIA-skole. The interviews with the workshop participants were generally unstructured, and conducted at regular intervals in response to the changing state of the workshop. The author's specific objectives with the interviews included learning about the students as individuals, their relationship to games and media, establishing their reasons for workshop participation, and more generally, gaining impressions of their experiences of the workshop as it progressed. Due to the author's inability to speak Danish, all of the interviews were conducted in English. This progressed smoothly for some students and somewhat awkwardly for others, due to their varying levels of English and their comfort with speaking an unfamiliar language with an unfamiliar person.

After the workshop was over, all notes and observations made during the workshop were coded, analysed, and interpreted in accordance with grounded theory [21]. In grounded theory, during the theory building phase, the researcher attempts to avoid other related literature in order to maintain as clear and unpolluted a focus as possible on the specific data collected. In addition, an iterative approach is used for analysis, such that any emergent categories and themes arising from the codes can be repeatedly checked and cross-checked against the original data, to maximise the chances of a strong conceptual fit. A theory saturation stage is reached once core categories and their relations have stabilised over analysis iterations, and categories are able to accommodate new data. At this point, the important themes arising from the data are ready for discussion.
4 Results

Both of the methodologies that informed the workshop, participatory design and Scrum, rest on an underlying assumption of the importance of equality of perspectives between and within stakeholder groups. Over the course of conducting the workshop, we encountered various obstacles and mismatches in expectation related to our process. During the analysis phase, a picture emerged that connected the difficulties encountered to the core assumption of equality. In this section, we discuss seven key themes from the analysis.

4.1 Location

Site has been discussed in the PD literature as playing a key role in shaping how perspectives are shared and taken into account [14]. The choice of site usually
involves a decision between bringing designers to the workplace, or bringing workers to the design site. Bringing designers to the workplace has the effect of allowing workers to feel at ease and contextualise explanations of their practice through references to specific tools and processes. In contrast, bringing workers into the design site shifts focus to more general conceptualisations of issues, which may in turn create possibilities for new insights and innovation. Muller notes that a more subtle consequence of choice of site is the inclusion of perspectives of marginal participants, i.e. stakeholders who were not initially considered core stakeholders [14]. Typically, design activity is undertaken at both sites.

The Girl Game Workshop was conducted entirely at the DIA-skole, as we had been granted exclusive use of their computer room for all three days. As such, the workshop was held in the "workplace": it was a location that the students associated with daily practice and was unfamiliar to the organisational team. Conducting the workshop at the DIA-skole had a number of advantages, including reduced overhead costs for the workshop and reduced perceived barriers of entry for the students. While these reasons alone contributed greatly to the successful running of the workshop, the location as a workplace, and particularly as DIA-skole territory, also led to complications.

At no point during the workshop was it possible to forget that the workshop was taking place on school grounds. There were ever-present audiovisual reminders that we were in a classroom, including the room layout, the active PA system, and spontaneous visits from faculty members. In terms of equality of perspective, simply due to location, the DIA-skole perspective (not necessarily equivalent to the students' perspective) was given priority. As an example of this, whenever faculty members dropped by there was a mood change amongst the organisers. We felt an obligation to show that the school's decision to host the workshop had been justified. For each visitor, one or more organisers stopped what they were doing to explain what was currently happening, to show off the progress the students had been making, and to discuss how well the students were picking up key concepts. Thus, we found ourselves giving preference to the DIA-skole perspective and feeling an implicit need to obtain DIA-skole approval. Even though our objective was to conduct a workshop featuring girls who happened to be students of DIA-skole, inadvertently, the tone subtly shifted to being about DIA-skole students specifically.

But even if we were more affected by the location than expected, we did not uphold the cultural beliefs and norms of the DIA-skole. This was the result of the organisers and the faculty coming from very different cultural backgrounds, and a lack of planning for how to handle such situations. This clash was further intensified due to the unspoken rule often in operation when on unfamiliar territory, namely, that one should act in accordance with its norms. During one visit from the principal and the board members, for example, the principal paused to look over the shoulder of the graphic designer as she was retouching outfits drawn by a student that formed the outfit choosing sequence of Movie Night. The outfits were inappropriate by Muslim standards: one consisted of a mini skirt paired with a strapless top.
The principal did not comment, nor did we, but it was clear that the outfits did not embody the Islamic values that the principal stated in an interview formed a cornerstone of the school's profile. Although we had explained to the faculty members that the game
concepts were completely under the control of the girls, in the instance of the outfit, it was difficult not to feel that the principal viewed the workshop as a negative influence.

This situation exemplifies a tension between the cultural context of the site and the workshop goals and process. In particular, it emphasised the different standpoints of the DIA-skole and the workshop organisers in terms of power distance and deference to superiors. The DIA-skole, and more specifically, the outward facing image and cultural values of the DIA-skole stakeholders align more with high power distance, hierarchical cultural patterns. Our workshop, on the other hand, was specifically aimed at encouraging flat power distance, equality-focused freedom of expression through design. At a micro level, the same tension arose when the author asked the student who drew the outfits whether she would wear those outfits herself: her answer was that she would not. For this student, too, there was a contradiction between real life clothing choices and the aesthetics chosen for the fictional game world.

At the same time, the existence of a clash in expectations surrounding content showed that the students were able to contextualise their activity outside of the DIA-skole culture and general school culture associated with the site. We propose that this was a consequence of a number of factors, including the open atmosphere we worked hard to create, and the relatively large number of individuals in the room who were neither DIA-skole faculty nor students challenging the notion that "school culture" was operative. Further, as the students were all bicultural individuals, we can expect that they were adept at transitioning between cultural frames according to contextual relevance [6]. Based on our more mainstream dress and behaviour, that is, we suggest that the students intuited that a DIA-skole Arabic, Islamic cultural framing was less necessary given the context, and that they decided to rely more on a Danish cultural framing to guide their actions and responses to the workshop.

4.2 Cultural Hierarchies

The Scandinavian cultural value of egalitarianism pervades much of PD practice. For example, the practices of incorporating multiple stakeholders in key design decisions, conducting design work at both work and design sites, and use of mock-ups and lo-fi prototyping are all manifestations of attempts to flatten power relations and promote equality. For the most part, the underlying assumption of equality has been accepted within the HCI community.

As discussed previously, all of the DIA-skole students and faculty had an Arabic background, and we reiterate that Arabic and Scandinavian cultures are in many ways diametrically opposed. We would expect that equality-centric development processes that implicitly endorse egalitarianism would encounter hiccups when used in Arabic contexts, due to patterns of high power distance within Arabic cultures. If we had adapted our process to suit DIA-skole culture, we would have paid closer attention to hierarchies, we would have incorporated the ideas of faculty into the design process, and we would have left workshop-related decision making up to more "powerful" stakeholders (i.e. not the students). These changes would have contradicted workshop goals, however, and more generally, ideals of equality-centric methodologies.

In section 4.1 we discussed the implicit pressure to uphold school values.
Within the context of hierarchical cultures, this pressure may have been related to our own
position within a DIA-skole hierarchy. All of the faculty members we met were both older than us and DIA-skole insiders, thus outranking us in the DIA-skole hierarchy. As a consequence of being lower in the hierarchy, we were expected by all above us to comply with the will of those with more status. The students also formed part of this hierarchy, but were positioned below us, whereas their regular teachers were positioned above us. The hierarchy could partly explain why we sometimes had difficulty in getting the students to focus during workshop sessions. The students were accustomed to a more authoritarian style of teaching which we did not adopt and thus they viewed us as less powerful.

When collaborating with stakeholders, it serves to remember that by default people view others through their own cultural lenses. When stakeholders share the same culture, cultural assumptions are less problematic because they are shared by everyone. When stakeholders come from different cultural backgrounds, however, assumptions and accepted protocol become ambiguous, and this can lead to tension and misinterpretations.

4.3 Classroom Hierarchies

The PD literature reports instances in which PD has been successfully conducted with children as participating stakeholders (e.g. [17,10]). In most of these cases, however, parents or other senior stakeholders have also been involved in the activities involving children. The students in our workshop were all approximately 14 years old, theoretically old enough not to require chaperoning. By default, we treated the students as we would adults, extending to them the same equality of perspective, freedom, and expectations of conduct, as this was in keeping with our cultural beliefs about student-teacher relationships.

Over the weekend, however, we were frequently reminded that the students were not adults. This ranged from the students' inability to concentrate for extended amounts of time, to their difficulties with grasping abstract concepts and working individually, to occasional disrespect for tutorial presenters. On the afternoon of the first day, for example, the students became quite distracted during a programming lesson. Two students were being especially disruptive at the back of the room, talking, watching videos on their mobile phones, and hitting each other, instead of paying attention to the tutorial presentation. As we mentioned in section 4.2, our occasional lack of control over the students could have been related to cultural patterns of behaviour, and expectations on behalf of the students of more hierarchical student-teacher relationships.

We were conflicted about how to handle situations requiring assertions of authority over the students. On the one hand, the workshop concept was devised around the assumption that there would be equality and partnership between workshop participants. In addition, it was important to obtain the students' trust and create a positive, nurturing working environment, as the workshop goals involved students expressing themselves and relating comfortably to games and game development. On the other hand, when the students were distracted and disruptive, it created an ambience wholly incompatible with workshop progress and a positive workshop atmosphere. At the end of the first day of the workshop, during a review of the events of the day, we decided to adopt more authoritative roles and to be stricter about creating and
adhering to boundaries. We also agreed to be consistent with one another with regards to expectation setting surrounding levels of authority, as we had concerns about students developing divisions in perception between organisers who were deemed "easygoing" and "trustworthy" versus those considered "stern" and "untrustworthy". It was important to us to present a consistent and united front, for the sake of workshop cohesion. In making this decision, we were placed in the position of needing to renegotiate our beliefs on how we perceived the students, i.e. not as adults. Further, by deciding to assert more authority, we had to relinquish our principles of egalitarianism with the students, because equal partnership was not tenable.

Working with teenagers when using methodologies premised on equality can be complicated. Teenagers do not require chaperoning in the same manner as children, but we cannot assume that processes that have successfully been used with adults will have the same outcomes with teenagers. As we learned firsthand, conducting activities involving unaccompanied teenagers requires consideration of how and where one stands with regards to balancing equality, partnership, and trust. These are factors that are likely to arise when dealing with this age group, which in turn will have an effect on how the assumptions and methods associated with equality-focused methodologies can be operationalised.

4.4 Gender Representations in "Girl Games"

A key objective of the interviews conducted with the students on the first day of the workshop was to establish the nature of the students' relationship with games. The interviews revealed that games played a role in most of their lives: some preferred to play alone, others played games with their siblings, while others had members in their immediate family who were avid gamers. In terms of the games they played, almost all of the students identified the PC as their platform of choice, and over half identified "girl games" as their preferred genre. By girl games they essentially meant "pink games", i.e. games featuring heavily gender-stereotyped narrative and game play, for example related to dressing up and make up application. The discovery that the students were playing games with such limited perspectives on female gender identity served to emphasise our beliefs surrounding inequality of gender perspectives in digital games.

There are a number of possible reasons for why this particular group of students had gravitated towards girl games. One reason might be that they had a genuine interest in fashion and fashion design, but that the game industry has yet to provide interesting games pitched at an appropriate level of narrative and game play for a female, teenaged audience. Not finding anything within that niche, these students ended up playing girl games instead. Another similar reason relates to games and their branding. Despite the growing sophistication of the game industry, digital games are still considered more the domain of males than females, and games are generally more likely to draw on themes and game play more likely to appeal to males. For girls who want to play games but wish to avoid hyper-masculine games, one solution is to seek out games specifically angled towards girls. One of the games shown to the author by a student was linked from a game aggregator site girlsgogames.com [7] that housed a plethora
of similar girl games. In fact, performing a search on "games for girls" returns many sites like this one. According to the internet, at least, "games for girls" implies "girl games". As such, this is what young women encounter when they seek out games under these terms.

The final reason we propose relates again to culture. As discussed earlier, gender roles are fixed and delineated in Arabic culture: men and women are expected to conform to masculine and feminine stereotypes, respectively. Given their cultural backgrounds, these students may have been socialised to be more interested in expressing their female identity in conventionally feminine ways, for example, in terms of make up and dress. Perhaps they found the subject area of girl games genuinely appealing. That we were surprised that the students played these games was at least partly caused by cultural differences surrounding gender.

4.5 Replications of "Girl Game" Themes

As discussed previously, the students were given the theme of "home" as a design constraint. In terms of the workshop goals, however, we later felt we should have reconsidered the theme in light of its gendering connotations, particularly in the context of Arabic gender roles. While Team A's game concept was somewhat neutral in terms of gendering, Team B's concept was not. In essence, Team B developed a modern day rendition of Cinderella, featuring a controlling father in the place of an evil step-mother and reinforcing a rhetoric of male dominance. In the context of the workshop goals of empowerment and the introduction of minority perspectives to the game scene in terms of culture and gender, Team B's concept was problematic, and retold a narrative that the workshop was intended to counteract.

Why, when given freedom to create any kind of game, did Team B choose to mirror girl games? The easy answer to this question, that girl games are what they like, may not tell the whole story. For members of Team B, we suggest that their very concept of games and how they must be designed was tightly bound to girl games, that is, the games they were most familiar with. On finding out that their task was to create a game in a "Girl Game Workshop", they may have intuited that a girl game was what they were expected to deliver.

Furthermore, the concept developed by the students was undoubtedly influenced by the ideation method employed, which as discussed earlier, involved a stage of brainstorming associations to the concept of "home", and a second stage of using the associations to inspire game concepts. Using the brainstormed ideas to directly inform the game concepts was perhaps too literal and limiting as a creative process, and may have served more as a transferral step rather than as an inspiration. Less gendered game concepts may have emerged if an alternative ideation method had been used, and if we had worked with students to "unpack" the brainstormed theme associations. We note, though, that even assisting the students in "unpacking" design concepts might have resulted in too much influence on the game concepts. We tried not to intervene during any of the ideation stages, as we felt duty-bound to support the students in developing game concepts and narratives of their own choosing. Thus, by trying to ensure creative freedom, we created a perfectly safe environment for the
students to express gender inequalities, and were bound by our own rules of creative non-intervention to support it.

4.6 Stakeholders and Boundaries

When forming the workshop concept, the organisers decided to limit participation to females only, with the limitation applying to students and organisers. When organising the running of the workshop with the school, however, the "female only" atmosphere that we wanted was not foregrounded as strongly as it should have been. On all three days of the workshop, a DIA-skole teacher who also served as the school journalist dropped by with video recording equipment, staying for up to an hour each time to record footage of the workshop for a school news item. The first time he arrived, we were not sure who he was or why he was present. Given that we were unfamiliar with school protocol and felt that the school was already doing us a favour by hosting the workshop, we felt awkward about imposing conditions about room usage. As such, we never directly stated to the journalist that we preferred him not to drop by, and he continued turning up each day with his recording equipment.

His presence might have been less of a concern had it not palpably changed the atmosphere that we were trying to create. He and the students were on familiar terms with each other and each time he arrived the students sought his attention and vice versa, disrupting the workflow. As well as creating a distraction, he was opinionated about the game concepts and not hesitant about sharing his thoughts. For example, after playing one level of Movie Night, which involved the avatar eating candy to power up, he asked, "but won't that make her fat?" This comment was far removed from the atmosphere of creative freedom that we were attempting to foster. Had we told the journalist that we preferred maintaining an all-female environment for the duration of the workshop, he likely would have respected our wishes. This boundary was not established, however, and we felt powerless over these disruptions in our planned process and silently resentful during every visit.

While the journalist was not technically a workshop participant, as a DIA-skole faculty member and by virtue of his presence at the location, he became a peripheral stakeholder. While PD places importance on equality of representation of stakeholder voices, his was a perspective we were actively seeking to avoid. So, although balanced representation of diverse perspectives amongst stakeholders is a positive intention, it can also lead to conflicts of interest. To uphold the empowerment goals of the workshop, we would have, in fact, needed to specifically exclude some stakeholder perspectives, thereby supporting inequalities.

One solution for establishing boundaries amongst all stakeholders would have been to collectively develop a manifesto pertaining to expectations for the event. Explicitly stating expectations and goals allows stakeholders to discuss and debate expectations, opening further channels of reflection on objectives, participant involvement, fairness, and ethical conduct. Negotiation is a cornerstone of PD [13], and we emphasise its importance here. In our own case, while negotiated boundary setting may seem to contradict equality, it can break down existing power structures by creating a separate context for interaction. Specifically, it facilitates clarity, debate, and fairness, all of which serve to balance inequalities.
4.7 Risk Mitigation

In more traditional PD practice, users are heavily involved in design-related activity and much less so during development activity. In the workshop, we intended the students to serve as designers, developers, and also end users, with our own roles focused towards facilitation. As the workshop progressed, however, it became apparent that our roles would also need to encompass managerial activity if the workshop was to stay on track. At various points over the three days, the programmer successfully mitigated risk in the project by identifying potential development time sinks, estimating feature completion times, and helping the students conceptualise and prioritise their work. The importance of managerial intervention should not be downplayed in time-bounded projects and must be further emphasised in situations involving young, inexperienced developers, such as the students in the workshop.

An alternative and less preferable solution for dealing with deadlines occasionally adopted by the graphic designer was to intervene and complete tasks on behalf of students, who sometimes lacked the necessary skills, interest, or willpower. We had not clarified amongst ourselves our course of action if students were unable to complete on time. Faced with the probable outcome of one game not featuring any graphics, at least while the workshop was progressing, taking the tasks over from the students seemed to be the path of least resistance.

In both cases of organisers acting as managers and intervening to complete outstanding tasks, team self-organisation and decision making processes were left to the students. Further, the students were encouraged to make their decisions through a process of discussion and consensus. As well as staying true to the workshop process, this had the effect of fostering a sense of responsibility and ownership over the games as products of a process which they largely controlled. So, even though there was an inequality in the level of skill between organisers and students, the students steered the key creative and developmental decisions related to their games. As a result, they were all responsible for and owners of the final artefacts, and had approached the decision making processes from a position of empowerment.
5 Conclusions

Equality is an assumption underlying a number of the design and development methodologies currently in widespread practice. As our experience with the Girl Game Workshop shows, however, and as the cross-cultural research literature indicates, equality cannot and should not be taken for granted. It is a cultural value judgement that is by no means universal. While methodologies such as PD seek to accommodate cultural context within their paradigm, those premised on hierarchy are already in conflict with fundamental methodological assumptions.

In this paper, we detailed our reflections on a game development workshop run over three days at a bilingual school in Copenhagen that formally involved ten girls and the organisation team, but also informally included some school faculty as peripheral stakeholders.
At several points throughout the workshop, we encountered cultural and value clashes with our stakeholders that we had not foreseen. One such clash involved the game content developed by students not representing the cultural values of the school. Another centred around the existing hierarchies at the school conflicting with our process assumptions of working within a flat hierarchy. A further clash existed between the kinds of games that students played and our own beliefs about appropriate games. The final clash we mention here regards our own beliefs on what "empowerment" should look like compared to what the students actually produced.

If stakeholders come from diverse cultural backgrounds, it is worth considering what kinds of cultural clashes might occur, and then establishing a process for dealing with the various possible outcomes. The clashes that emerged in our workshop might have been predicted if, prior to the workshop, we had negotiated boundaries and expectations with our stakeholders. This negotiation would have served as a discussion forum for learning about each other and clarifying process ambiguities. Boundary setting may mean that strict equality will need to be downplayed, as it may entail the exclusion of certain perspectives and possibilities.

In terms of outputs of our workshop process, our intention was for the students to use the games as expressive vehicles for empowerment. But the game developed by one team, and the kinds of games that over half of the students played, referenced themes not in keeping with our own ideas about empowerment. Freedom of expression does not necessarily look like empowerment. In fact, by the end of our workshop, it seemed that equality and empowerment were occasionally at odds with one another. There was a contradiction between equality and empowerment regarding whose perspectives to prioritise amongst the DIA-skole stakeholders, and another similar contradiction regarding our own rules of non-intervention in the design process. Empowerment is about the act of giving or delegating power or authority, whereas equality is about parity and sameness, thus not granting anyone more power than anyone else. So, while equality of perspectives may seem like a means of empowerment, equality-centric methods may not be the best avenue for supporting empowerment, especially if the context of use is one in which egalitarianism is not the norm.
References

1. Beck, K., Andres, C.: Extreme Programming Explained: Embrace Change, 2nd edn. Addison-Wesley Professional, Reading (2004)
2. Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick, B., Martin, R.C., Mellor, S., Schwaber, K., Sutherland, J., Thomas, D.: Manifesto for agile software development (2001)
3. Bødker, K., Kensing, F., Simonsen, J.: Participatory It Design: Designing for Business and Workplace Realities. MIT Press, Cambridge (2004)
4. Bødker, S., Grønbæk, K., Kyng, M.: Cooperative design: techniques and experiences from the Scandinavian scene. In: Human-Computer Interaction, pp. 215–224. Morgan Kaufmann Publishers Inc., San Francisco (1995)
5. Brandt, E.: Designing exploratory design games: a framework for participation in participatory design. In: Proceedings of the Ninth Conference on Participatory Design: PDC 2006, vol. 1, pp. 57–66. ACM, New York (2006)
6. Cheng, C.Y., Lee, F., Benet-Martinez, V.: Assimilation and contrast effects in cultural frame switching: Bicultural identity integration and valence of cultural cues. Journal of Cross-Cultural Psychology 37(6), 742–760 (2006)
7. Games, S.: Girls go games, http://www.girlsgogames.com
8. Hofstede, G.: Cultures and Organisations: Software of the Mind. McGraw-Hill Education, New York (1996)
9. Hollan, J., Hutchins, E., Kirsh, D.: Distributed cognition: toward a new foundation for human-computer interaction research. ACM Trans. Comput.-Hum. Interact. 7, 174–196 (2000)
10. Iversen, O.S., Halskov, K., Leong, T.W.: Rekindling values in participatory design. In: Proceedings of the 11th Biennial Participatory Design Conference, PDC 2010, pp. 91–100. ACM, New York (2010)
11. Kensing, F., Madsen, K.H.: Generating visions: future workshops and metaphorical design, pp. 155–168. L. Erlbaum Associates Inc., Hillsdale (1992)
12. Khaled, R., Biddle, R., Noble, J., Barr, P., Fischer, R.: Persuasive interaction for collectivist cultures. In: Piekarski, W. (ed.) The Proceedings of The Seventh Australasian User Interface Conference (2006)
13. Kraft, P., Bansler, J.P.: The Collective Resource Approach: The Scandinavian Experience. Scand. J. Inf. Syst. 6, 71–84 (1994)
14. Muller, M.J.: Participatory design: the third space in HCI. In: The Human-Computer Interaction Handbook, pp. 1051–1068. L. Erlbaum Associates Inc., Hillsdale (2003)
15. Nardi, B.A. (ed.): Context and consciousness: activity theory and human-computer interaction. Massachusetts Institute of Technology, Cambridge (1995)
16. Nielsen, J. (ed.): Designing User Interfaces for International Use. Elsevier Science Publishers, Amsterdam (1990)
17. Ruland, C.M., Starren, J., Vatne, T.M.: Participatory design with children in the development of a support system for patient-centered care in pediatric oncology. J. of Biomedical Informatics 41, 624–635 (2008)
18. Schild, J., Walter, R., Masuch, M.: Abc-sprints: adapting scrum to academic game development courses. In: Proceedings of the Fifth International Conference on the Foundations of Digital Games, FDG 2010, pp. 187–194. ACM, New York (2010)
19. Schwartz, S.H.: A theory of cultural values and some implications for work. Applied Psychology: An International Review 48(1), 23–47 (1999)
20. Spinuzzi, C.: A Scandinavian Challenge, a US Response: Methodological Assumptions in Scandinavian and US Prototyping Approaches. In: Proceedings of the 20th Annual International Conference on Computer Documentation, SIGDOC 2002, pp. 208–215. ACM, New York (2002)
21. Strauss, A., Corbin, J.M.: Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, Thousand Oaks (1998)
22. Vatrapu, R., Suthers, D.: Culture and computers: A review of the concept of culture and implications for intercultural collaborative online learning. In: Ishida, T., Fussell, S.R., Vossen, P.T.J.M. (eds.) IWIC 2007. LNCS, vol. 4568, pp. 260–275. Springer, Heidelberg (2007)
e-Rural: A Framework to Generate Hyperdocuments for Milk Producers with Different Levels of Literacy to Promote Better Quality Milking Vanessa Maia Aguiar de Magalhaes1,2, Junia Coutinho Anacleto1, André Bueno1, Marcos Alexandre Rose Silva1, Sidney Fels3, and Fernando Cesar Balbino1 1
Universidade Federal de São Carlos (UFSCar) – São Carlos, SP – Brasil
2 Embrapa Gado de Leite
3 University of British Columbia
{vanessa_magalhaes,junia,andre,marcos_silva,balbino}@dc.ufscar.br, [email protected]
Abstract. We created and tested e-Rural, an approach that allows educators to dynamically adjust the target literacy level of their online learning content using a combination of three tools: PACO-T for planning, Cognitor for editing hyperdocuments and Simplifica for text simplification. PACO-T and Cognitor use the Brazilian Open Mind Common Sense knowledgebase (OMCS-Br) to provide access to commonly held understandings and beliefs on a diverse set of topics associated with a large range of Brazilian demographics, including people with low literacy. We tested our approach with 13 users who created hyperdocument-based learning content describing important methods for milk production. We chose milk production because it is one of Brazil's primary agricultural products, and yet it has been established that there is a wide gap between the content from researchers, with methods to greatly enhance the quality and economic power of milk production, and the tacit knowledge and procedures of the farmers who actually produce the milk, who often have low literacy levels, consistent with estimates that around 75% of Brazil's population lacks full literacy. Our experiments reveal that educators are able to produce milk-related learning content geared towards different literacy levels using our tools with very satisfying efficacy and efficiency. Thus, we believe that our approach, which introduces demographically sensitive common sense, holds promise to bridge the gap between high-literacy researchers with an evidence-based approach to milk production and tacit-knowledge-based, low-literacy milk producers, to better develop the milk industry in Brazil. Keywords: Accessibility, literacy, textual simplification, textual equivalents, W3C Recommendation.
compatible with the economic development of the country [1]. According to the National Indicator of Functional Illiteracy [2], approximately 75% of the Brazilian population between 15 and 64 years of age does not have a complete level of literacy. In other words, 75% of Brazilians do not have the complete writing and reading skills needed to understand information and use it in their work and daily life. Of this population, 7% are considered absolutely illiterate, 21% literate at a rudimentary level, 47% at a basic level, and only 25% have full literacy. This reality illustrates that Brazilians have difficulties understanding texts and, consequently, have limited access to information, knowledge and cutting-edge technology. Social, cultural, educational, perceptual, cognitive, and motor differences existing among people are cited as primary factors contributing to this reality.

A way to deal with the different needs and diversities in Brazil is to adapt the content of texts and information according to literacy level by using ICTs (Information and Communication Technologies). ICTs have been shown to be efficient and effective tools for work, study, entertainment, expression and communication among people of different ages, special needs, abilities, capacities and interests. Understanding how the previously cited factors influence the way people read and understand a text, as well as the way they access and use ICT, can be a pathway to friendlier and easier ways to use these technologies to present a particular adapted text, facilitating better understanding and usage by everyone, independent of their socio-economic, cultural, educational, and cognitive conditions. If successful, this approach improves the democratization of information and promotes the autonomy of citizens.

We have investigated creating hyperdocuments according to the literacy level of a specific target public, including the use of specific cultural knowledge that represents this public's knowledge, values, etc. The main objective of using contextualized documents is to allow the public to identify and understand what is written by taking into account their educational and cultural level. This information is integrated within a contextualized hyperdocument.

The use of these contextualized hyperdocuments in the context of Embrapa (Brazilian Company of Research on Agriculture and Livestock), specifically Embrapa Cattle and Milk, is a strategy to disseminate information and knowledge related to Normative Instruction 51 (NI 51) of the Ministry of Agriculture, Cattle Breeding and Supply (MAPA) of the Federal Government of Brazil. NI 51 is mandated to be transmitted to every person involved with milk production in Brazil, for the purpose of improving the quality of the milk produced in Brazil by establishing minimum requirements during production. The problem lies in that the information in the instructions is full of complex terms that are hard to understand and is written for an audience that is expected to be fully literate. As pointed out above, in Brazil this assumption is not valid, especially for the farmers who produce the milk, leading the instructions to be ignored by critical people in the milk production workflow. Thus, efforts to improve the quality of milk production through the dissemination of evidence-based information are thwarted.
To address this problem, we conducted an experiment using our framework for creating contextualized hyperdocuments, which considers the rudimentary literacy level and cultural knowledge obtained through a project called Open Mind Common Sense in Brazil (OMCS-Br), described in the next section. This paper is organized as follows: section 2 describes the common sense concept and the OMCS-Br project,
section 3 covers our framework and its characteristics, section 4 details our experiment and results, and section 5 presents final considerations.
2 Common Sense and OMCS-Br Project

Common sense is defined as the set of facts known by most people who live in a particular culture at a certain age, "arraying a wide part of the human experiences, knowledge about the spatial, physical, social, temporal and psychological aspects" [3]. In order to collect this kind of common sense knowledge and use it to develop contextualized technological applications, i.e., applications which consider cultural human knowledge in their interface and content, the Advanced Interaction Laboratory (LIA) of the Federal University of São Carlos (UFSCar), in collaboration with MediaLab-MIT, developed the OMCS-Br project, which is conducted in the Brazilian Portuguese language [6].

The project provides a website, www.sensocomum.ufscar.br, where Brazilians can record what they know, believe and think; in other words, they can tell about their common sense knowledge. To enter the website, prior enrollment is necessary, because the enrollment data allow the project to provide filters for queries over a particular profile (age group, geographical location, gender and level of academic training). The website contains nine distinct themes and twenty activities aimed at collecting the types of knowledge that compose people's common sense. For example, there are themes to collect what Brazilians know about colors, sex and slang, among others. A theme called "All about Milk" was created to collect information about milk production for the research described in this paper. Through this theme, it was possible to collect people's cultural knowledge and vocabulary about cows, milk production utilities, steps for milking and other useful information necessary to create hyperdocuments that take people's reality into consideration. It is important to say that this research also uses the knowledge collected from the other themes and activities.

Common sense is collected through templates, as shown in Fig. 1, a template from the "All about Milk" theme. Templates are sentences with a dynamic part (dashed green), a fixed part (outlined in yellow) and gaps (the second rectangle), to be filled in by people according to what they know or believe to be true. The dynamic part changes for each user interaction, harnessing the knowledge already collected in other interactions; thus, the website has a feedback system that uses knowledge from the base to collect new information.
Fig. 1. Example of a template from “All about milk” theme
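To make the template mechanism concrete, the following is a minimal, illustrative sketch of how a sentence with a fixed part, a dynamic part drawn from earlier contributions, and a gap could be presented, and how a new answer could be stored together with the contributor's profile for later filtered queries. All names and data here are hypothetical stand-ins, not the actual OMCS-Br site code.

import random

collected_facts = ["um balde", "a vaca", "leite fresco"]   # assumed earlier contributions

def build_prompt(template):
    """Fill the dynamic slot with a previously collected fact; leave the gap open."""
    dynamic = random.choice(collected_facts)               # feedback from the knowledge base
    return template.format(dynamic=dynamic, gap="______")

def store_answer(template, dynamic, answer, profile):
    """Store the completed sentence with the contributor's profile so later
    queries can be filtered by age group, region, gender or education."""
    return {"sentence": template.format(dynamic=dynamic, gap=answer),
            "profile": profile}

template = "Você usa {dynamic} quando quer {gap}"          # fixed part with two slots
print(build_prompt(template))                              # e.g. "Você usa um balde quando quer ______"

fact = store_answer(template, "um balde", "tirar leite",
                    {"age_group": "25-35", "region": "SP", "education": "basic"})
print(fact["sentence"])                                    # "Você usa um balde quando quer tirar leite"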
Initially, the collected knowledge is stored in the OMCS-Br knowledgebase in natural language. As computers do not deal well with natural language, a processing step is performed that generates a dynamic network called ConceptNet, based on concepts and on twenty relations proposed by Marvin Minsky. Minsky was a
researcher in the artificial intelligence area who studied how to map human knowledge into computational form [4]. ConceptNet communicates with computer tools, such as some of the tools in the e-Rural framework explained below, through a set of functions, an Application Program Interface (API), which was developed for this purpose.
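As an illustration of the role this API plays, the sketch below shows how an editing tool might query a ConceptNet-style network for concepts related to a term. The class, method and relation names, and the toy assertions, are assumptions for illustration only and do not correspond to the actual OMCS-Br API.

from collections import defaultdict

class ConceptNetSketch:
    """Toy concept network; the real ConceptNet is built from OMCS-Br contributions."""
    def __init__(self):
        self.edges = defaultdict(set)   # (concept, relation) -> set of related concepts

    def add_assertion(self, concept, relation, related):
        self.edges[(concept, relation)].add(related)

    def related(self, concept, relation):
        return sorted(self.edges[(concept, relation)])

net = ConceptNetSketch()
# Toy assertions in the spirit of the "All about Milk" theme (illustrative only)
net.add_assertion("ordenha", "UsedFor", "tirar leite")    # milking -> used for getting milk
net.add_assertion("ubere", "PartOf", "vaca")              # udder -> part of a cow
net.add_assertion("mastite", "IsA", "doenca da vaca")     # mastitis -> a cow disease

# An authoring tool such as Cognitor or Simplifica could then ask for related concepts:
print(net.related("ordenha", "UsedFor"))                  # ['tirar leite']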
3 e-Rural Framework

The e-Rural framework was created to support educators in dynamically adjusting the target literacy level of their online learning content using a combination of three tools: PACO-T, Simplifica and Cognitor. The framework describes step by step what educators need to do and which computer tools they need to use in order to produce hyperdocuments according to the target group's profile, i.e., the target literacy level. The main characteristic of the framework is to support educators who do not have a strong technological background in using several different, independent tools to produce a hyperdocument. Since this kind of background is not common among Brazilian educators, and since previous experiments using these tools independently showed that educators did not know when to use each tool, we integrated the three tools into a single framework. Some adjustments were made to the tools to improve their use and to make them work better together; one example is the insertion of a search over the common sense base into the Simplifica tool.

In the framework (see Fig. 2), the first tool used is PACO-T [9], designed to assist teachers in planning learning activities. It consists of seven steps, but we used it only up to step six, because in the seventh step we used the other two tools, Simplifica and Cognitor. Simplifica [7] aims to lexically and semantically simplify a text for people at rudimentary and basic levels of literacy. The lexical simplification process starts with the identification of difficult words in the text, that is, words not found in the PorSimples dictionary, which lists words from children's vocabulary and from daily newspapers. Words that are less frequent in children's daily lives, such as agricultural technique words, are considered complex. For each complex word found, we perform a search in other dictionaries and in the common sense knowledgebase to find synonyms, if there are any. After the lexical simplification, a stronger semantic simplification is performed. The third tool is Cognitor [8], which supports organizing and editing educational content. Cognitor allows teachers to create contextualized content considering concepts and analogies that are well known to learners.

For the context addressed in this paper, rural producers, who most often have low literacy, the use of texts with simplified structure and reduced length does not guarantee full access to the information, and other means of transmitting this knowledge are necessary. Thus, the development of applications for these individuals requires special care in the elaboration of the content. In order to make this content more accessible, culturally contextualized textual equivalents are proposed, as well as linguistic simplification to this public's level of literacy.
Fig. 2. e-Rural framework
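A minimal sketch of the lexical simplification step described above for Simplifica, assuming a toy "familiar vocabulary" list and a toy synonym table in place of the PorSimples dictionary and the OMCS-Br knowledgebase; the names and word lists are illustrative only, not the real tool's resources.

import re

familiar_words = {"limpar", "lavar", "balde", "leite", "vaca", "antes", "de", "o", "tirar"}
synonym_table = {"higienizar": ["limpar", "lavar"],   # assumed synonyms (dictionary / common sense)
                 "utensilio": ["balde"]}

def lexical_simplify(text):
    """Replace words outside the familiar vocabulary by a simpler synonym when one is known."""
    tokens = re.findall(r"\w+|\W+", text)              # keep word and non-word runs
    output = []
    for token in tokens:
        word = token.lower()
        if word.isalpha() and word not in familiar_words:
            candidates = synonym_table.get(word, [])
            output.append(candidates[0] if candidates else token)   # keep the word if no synonym exists
        else:
            output.append(token)
    return "".join(output)

print(lexical_simplify("Higienizar o utensilio antes de tirar o leite"))
# -> "limpar o balde antes de tirar o leite"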
4 Experiment

The experiment was conducted with 13 people organized into three groups: group A with computer science researchers, group B with computer science professionals and digital content creators, and group C with researchers and technicians specialized in milk quality. All users received a script to be followed and applied within the e-Rural framework. The script asked users first to plan learning content using the PACO-T tool. A text written at the full literacy level, part of NI 51 and chosen by the user, should then be transformed to the rudimentary level using Simplifica, and finally a hyperdocument should be created with the transformed text using the Cognitor tool.

Each user chose a text from the script (there were six texts); the readability level of the chosen text was then measured, and the text was lexically simplified and, when necessary, syntactically simplified. After the textual simplification process was complete (with the text readability at the rudimentary literacy level), the user created the hyperdocument with the simplified text. During the lexical simplification process the user could look for synonyms in the OMCS-Br cultural knowledge base, use his/her own knowledge, or use the replacement suggestions offered by Simplifica. In the syntactic simplification process, users could choose the best sentence from the suggestions offered by Simplifica and change it, as well as use their own common sense.

Hypotheses were formulated for the study regarding the process of creating contextualized hyperdocuments. Null hypothesis (H0): using the Simplifica tool alone is a sufficient resource for text simplification to take a text from the full literacy level to the rudimentary level; in this context, it is not necessary to use common sense, because it does not support Simplifica in lexical simplification. Alternative hypothesis (H1): common sense is useful knowledge for creating contextualized hyperdocuments with simple texts at the rudimentary level.
Hypothesis test. The purpose of this step is to verify, with some significance level (α), whether it is possible to reject the null hypothesis (H0) in favor of the alternative hypothesis (H1) based on the data set obtained. If it is not possible to reject the null hypothesis, nothing can be said about the study result.

Table 1 shows a summary of the textual simplification process carried out by the groups. The first column represents the number of words that were automatically identified by Simplifica as complex during the lexical simplification process. The second represents the number of words replaced by synonyms found in the OMCS-Br cultural knowledge base, available with the contextualized Simplifica tool, which was improved with this functionality in order to be inserted in the e-Rural framework. The third column represents the number of words replaced by the users' own suggestions, which we also considered as a result of the e-Rural framework. The fourth column represents the number of complex words that users replaced using the replacement suggestions from Simplifica (not the contextualized version). The last column represents the need for syntactic simplification to achieve the readability level of rudimentary literacy.

Table 1. Results of the textual simplification process by groups A, B and C

Texts      Nº of words in the text   OMCS-Br cultural knowledge (Ms)   Replacement suggestions from Simplifica (Md)
Group A    3                         26                                1
Group B    5                         15                                0
Group C    3                         13                                0
When comparing the synonyms obtained from the common sense knowledgebase with the synonyms suggested by the Simplifica tool in Table 1, there is a first indication that the use of the common sense knowledgebase helped in the creation of culturally contextualized hyperdocuments, allowing a hyperdocument to be transformed to the rudimentary level. To examine this effect statistically, we applied the paired t-test [6]. This parametric test is used to compare two related samples and check whether their averages are statistically different, thus showing that the hypothesised effect was demonstrated. In our experiment, the samples contain the numbers of complex words that were replaced by synonyms found in the common sense knowledgebase (or by the users' own common sense) and the numbers of complex words replaced by synonyms suggested by the Simplifica tool.

The paired t-test is based on the idea that, when looking for differences between two samples X and Y, the difference between their averages must be judged considering the dispersion, or variability, of the data that compose them. This is because the greater the variability of the data in the two samples compared, the greater the chance that the data overlap, even if the averages do not change. The test statistic is a ratio: the numerator is the average difference, and the denominator, also called the standard error of the difference, is a measure of the variability or dispersion of the data. The T value will be positive if the first sample average is greater than the second, and negative if it is smaller. After calculating T, the Student's t distribution table [10] should be consulted to check whether the calculated ratio is sufficiently large to attest that it is unlikely that the difference between the samples
was mere coincidence. We therefore establish a significance level α, which represents the "cut point" or "risk level" with which we are allowed to claim that there is a difference between the samples and to reject the null hypothesis. We set α = 0.025. This means that we assume a 2.5% risk of finding a significant difference between the sample averages even though the gap arose by chance (a false positive). The confidence level of the test result in this case is therefore 97.5%. It is also necessary to establish the degrees of freedom (fd) for the test, which correspond to the number of sample pairs minus one (fd = n - 1). Thus, taking T, α and fd, the t value should be looked up in the standard table to determine whether the calculated T value is large enough to be significant. The t value checked in the table indicates the probability of having obtained the calculated difference between the sample averages if both treatments were equivalent. For this study, it indicates the probability of obtaining the observed difference if the two samples were based only on the conventional replacement process from the Simplifica tool. If this probability is small enough, it can be concluded that the observed result is statistically significant. In this case, if the t value in the table for the chosen cut point and degrees of freedom is less than the calculated T, the null hypothesis is rejected. Otherwise, the null hypothesis is not rejected and no conclusions can be obtained from the study.

Since the effect on the dependent variable in the study (common sense effectiveness) involved a single aspect, synonym substitution, the paired t-test was applied to the sample data set in one step. We compared the related samples of the numbers of words replaced through common sense by each group. The null hypothesis test was based on the rejection criterion: if T > tα, H0 is rejected; otherwise, H0 is not rejected. Considering the data in Table 1, we obtain T = 4.7595, degrees of freedom fd = n - 1 = 3 - 1 = 2, and significance level α = 0.025. In the Student's t distribution table, the critical value is t0.025,2 = 4.303. Since T > t0.025,2, it was possible to reject the null hypothesis (H0) with α = 2.5%; in other words, there is a 2.5% chance of rejecting the null hypothesis when it is in fact true, and 97.5% confidence in the test result.
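For reference, a standard formulation of the paired t-test statistic over n paired observations (x_i, y_i), with differences d_i = x_i - y_i, written in LaTeX:

T = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}

with fd = n - 1 degrees of freedom. A quick numerical check of the values reported above, assuming the paired samples are the per-group counts from Table 1 (OMCS-Br replacements versus Simplifica suggestions); the sketch uses SciPy and is illustrative only:

from scipy import stats

# Per-group counts from Table 1 (assumed pairing: groups A, B, C)
omcs_replacements = [26, 15, 13]        # synonyms taken from the OMCS-Br common sense base
simplifica_suggestions = [1, 0, 0]      # synonyms taken from Simplifica's own suggestions

# Paired t-test over the three group-wise pairs
result = stats.ttest_rel(omcs_replacements, simplifica_suggestions)
print(round(result.statistic, 4))       # 4.7595, matching the T value reported in the text

# One-sided critical value for alpha = 0.025 and fd = 2
critical = stats.t.ppf(1 - 0.025, df=2)
print(round(critical, 3))               # 4.303, the tabled t_{0.025,2}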
5 Conclusions

Once the null hypothesis was rejected (i.e., the hypothesis that using only the suggestions provided by the Simplifica tool is enough to transform a technical text from the full literacy level to the rudimentary literacy level), it is possible to draw conclusions about the influence of the independent variables on the dependent variables, given that the experiment is valid and the threats to validity were treated. With the rejection of H0 in the study, we can state that the observed difference in effectiveness between using synonyms from the common sense knowledgebase to create hyperdocuments for farmers with a rudimentary level of literacy and using the synonyms offered by the substitution suggestions is statistically significant, i.e., the treatments applied (the two processes of lexical substitution) were the cause of the efficiency differences, rather than mere chance. As noted in the data presented in Table 2, the average number of words replaced using the common sense knowledgebase in the culturally contextualized hyperdocument creation process with e-Rural was higher than the number of synonyms used from the replacement suggestions of the Simplifica tool (μsSense > μsTips).
Finally, considering that the experiment was conducted in vitro under controlled conditions, it is important to emphasize that the conclusions about the observed results in this work are restricted to the scope of researchers in the field of computer science, and professors and expert researchers in the agriculture area, in a university environment and a research institution. For reasons of external validity, extending the generalization of the observed phenomenon to the broader context of other interested Embrapa units requires further studies in other environments and different contexts, in order to obtain a broader validation of the research hypotheses. The union of the tools and the support of the common sense knowledgebase in a single web computing framework, e-Rural, helps to reduce effort and avoid extra work when editing hyperdocuments, including editing technical text at the rudimentary literacy level.
References 1. IBGE - INSTITUTO BRASILEIRO DE GEOGRAFIA E ESTATÍSTICA (2009), http://www.ibge.gov.br/home/estatistica/populacao/condicaodevida/indicadoresminimos/sinteseindicsociais2009/indic_sociais2009.pdf 2. Indicador de Alfabetismo Funcional – INAF BRASIL (2009), http://www.acaoeducativa.org/images/stories/pdfs/inaf2009.pdf 3. Liu, H., Singh, P.: ConceptNet: A Practical Common Sense Reasoning Toolkit. BT Technology Journal 22(4), 211–226 (2004) 4. Minsky, M.: The Society of Mind. Simon and Schuster, New York (1986) 5. Singh, P.: The OpenMind Commonsense project. KurzweilAI.net, http://web.media.mit.edu/~push/OMCSProject.pdf 6. Anacleto, J.C., et al.: Can common sense uncover cultural differences in computer applications? In: Bramer, M. (ed.) AI In Theory And Practice, vol. 217, pp. 1–10. Springer, Heidelberg (2006) 7. Maziero, E., Candido Jr., A.C., et al.: Supporting the adaptation of texts for poor literacy readers: a text simplification editor for Brazilian Portuguese. In: Proceedings of the Workshop on Innovative Use of NLP for Building Educational Applications at NAACL 2009 (2009) 8. Buzatto, D., Anacleto, J.C., Dias, A.L.: Providing Culturally Contextualized Metadata to Promote Sharing and Reuse of Learning Objects. In: Proc. of the 27th ACM SIGDOC 2009, pp. 163–170. ACM, New York (2009) 9. Carvalho, A.F., Anacleto, J.C., Neris, V.P.A.: PACO-T: A Computational Framework for Planning Cultural Contextualized Learning Activities by Using Common Sense. In: Proc. of the IFIP World Conference on Computers and Education (WCCE 2009) (2009) 10. http://en.wikipedia.org/wiki/Student%27s_t-distribution
Designing Interactive Storytelling: A Virtual Environment for Personal Experience Narratives
Ilda Ladeira1, Gary Marsden1, and Lesley Green2
1 Department of Computer Science/ICT4D Lab, University of Cape Town {iladeira,gaz}@cs.uct.ac.za
2 Department of Social Anthropology, University of Cape Town Lesley [email protected]
Abstract. We describe an ongoing collaboration with the District Six Museum, in Cape Town, aimed at designing a storytelling prototype for preserving personal experience narratives. We detail the design of an interactive virtual environment (VE) which was inspired by a three month ethnography of real-life oral storytelling. The VE places the user as an audience member in a virtual group listening to two storytelling agents capable of two forms of interactivity: (1) User Questions: users can input (via typing) questions to the agent; and (2) Exchange Structures: the agent poses questions for users to answer. Preliminary results suggest an overall positive user experience, especially for exchange structures. User questions, however, appear to require improvement. Keywords: Virtual Reality, Digital Storytelling, User Experience.
2 Background 2.1 Digital and Virtual Storytelling Digital storytelling typically seeks to preserve and disseminate real-life, non-fiction stories. Some have focused on supporting user generated content [7, 8] and others on creating publicly accessible archives [4, 5, 6]. And, while some digital storytelling projects have provided interactivity in story creation and capture [7] and archive browsing [5], none, to the best of our knowledge, have explored interactive presentations of real-life stories. On the other hand, virtual storytelling projects have explored interactive storytelling extensively [9]. But it has been, almost exclusively, with fictional content where users can manipulate, evolve or completely author narrative plots [10, 11, 12]. Others have also explored allowing users to influence stories' emotional tone, such as Silva et al.'s system [13], where users could input how they would like the story told via tangible cards placed into an "influencing box". So, digital storytelling is devoted to presenting real-life stories whose plots listeners should, typically, not be able to edit, leading to story experiences with little or no interactivity. Meanwhile, virtual storytelling tends to give users agency over story plots, leading to highly interactive story experiences with little authorial control. We posit our work as something of an intersection, since we aimed to create digital, real-life stories where users can interact with story content, but not in ways that alter narrative plots.
2.2 Simulated Museum Guides In museums, there have been impressive efforts towards digital tour guides which simulate specific guide-visitor interactions. After conducting a brief survey of tour guide experiences, Yii & Aylett [14] created digital tour guides on hand-held devices. These mobile guides had different "personalities", which influenced the stories they told. Additionally, users were able to continually input their level of interest, which influenced the extensiveness of stories told. Yamazaki et al. [15] studied 15 instances of real guide-visitor interactions in an art museum to inform the creation of a robot capable of mimicking some of the interactions they observed. They placed special focus on the direction of a guide's gaze and their robot was able to detect human faces so as to direct its "gaze" in appropriate directions and able to respond to audience reactions and interruptions to a limited degree. The robot guide also periodically posed "involvement questions" where the robot would pose a question about a museum artifact, pause for a preset amount of time and then give the correct answer. However, responses to these questions were preset and, hence, non-interactive, serving rather to create the illusion of interaction. One of the interaction techniques in our storytelling prototype aims to improve on this by creating a more truly interactive form of involvement questions, which we term "exchange structures".
2.3 Personal Experience Narratives Personal experience narratives are those where a storyteller tells of something that happened to themselves or an acquaintance [16]. Labov [17] describes a, now well-
established, linguistic structure which personal experience narratives often follow wherein they are composed of a number of ordered components: abstract (signals a story's beginning); orientation (provides context); complicating action (the main event, usually something noteworthy or unexpected); resolution (the outcome); evaluation (commentary on why the story is interesting or noteworthy; this may appear near the end and throughout the story); and coda (signals the end). Martin and Plum [18] further identified distinct genres of personal experience narratives: recounts are exact descriptions of events akin to a courtroom testimony, exemplums serve to convey moral or pedagogical judgments and anecdotes convey emotional and/or humorous aspects of the teller's experience. With live oral storytelling, a storyteller conveys a story using words and body language and audiences may interact with the storyteller to influence a story's course or tone. Hence, each oral retelling of a story differs, with the extent of variation ranging from slight to cases where the audience might be considered co-storytellers [19]. However, where a storyteller is telling a personal experience narrative, variations would not alter the overall plot. Discourse analyses of classroom conversations between teachers and students shed light on the specificities of storyteller-audience interactions. A simple interaction might involve a student making a comment or asking a question to which the teacher responds. This can be extended to a longer interaction, called an exchange structure, which the teacher initiates by posing a question to which students may respond, leading to a dialog [16, 19, 20]. These exchanges are not unlike the "involvement questions" implemented in [15], but are, of course, interactive in that a teacher can respond to a number of different student answers. In our work, we used ideas for understanding personal experience narratives from [17] and [18] and insights on teacher-student interactions from [20] to analyze ethnographic data and design interactive storytelling agents.
3 Research Overview As [10] points out, real storytellers establish a special connection with audiences that is difficult to emulate digitally. Our approach was to study real-life oral storytelling ethnographically in order to observe how experienced storytellers connect with audiences and draw them into their personal stories. To this end, we chose to observe Joe and Noor, who tell their personal stories every day to District Six Museum visitors. We aimed to learn about their techniques for making their personal stories engaging, dynamic and interactive. Our findings inspired the design and implementation of a storytelling prototype as a virtual environment (VE) containing interactive storytelling agents. Most recently, we conducted a large user study to evaluate our prototype’s effectiveness.
4 Ethnographic Study of Storytelling Our research began with the rather broad aim of observing real-life techniques for making narratives engaging, dynamic and interactive. We wanted to learn how oral storytellers make content vivid and compelling when telling about personal experience
and examine real storyteller-audience interactions. The first author spent three months at the District Six Museum unobtrusively observing tours led by Joe, Noor, independent guides not employed by the museum, and, occasionally, other museum staff. We observed and took field notes for thirty-nine tours – of these we audio recorded (via lapel microphone) nine tours and video recorded snippets from a further four tours. Audio recorded tours ranged from nine to sixty minutes; these were transcribed and became a main focus of our analysis. Using the classification framework from Martin & Plum [18], we identified anecdotes and exemplums which appeared most often in Joe and Noor's tours. This led us to five stories (two from Noor and three from Joe), for which we had recorded three to five retellings each. We used Labov's structural framework [17] to identify each retelling's constituent components, totaling nineteen distinct discourse analyses. This allowed us to discern a general linguistic structure for each of the five stories and examine the storyteller-audience interactions closely. We found that the stories matched the structure reported in [17] almost exactly and that they, somewhat disappointingly, did not vary much across retellings. Instead there was remarkable consistency in their structure and content. We believe this is due to the frequency with which Joe and Noor tell their stories – their storytelling has become well-rehearsed [21]. Despite this, their stories were not completely static and the variations we did observe arose, chiefly, due to two types of storyteller-audience interactions, which we translated into our prototype design. First, audience questions: Audience members were free to ask questions at any point during a story. Typically, we noticed that they either waited for a pause in the storytelling or raised their hand and waited to be called on by the storyteller. Questions were also solicited by the storyteller explicitly inviting the audience to ask questions, usually at the end of a story. If no one asked questions immediately when invited, Noor especially would drop hints for questions they could ask, for example: "You can ask me anything about District Six... games, gangsters, you name it, right?" Second were exchange structures: an interaction initiated by the storytellers which matched the exchange structures defined in [20]. Periodically, a storyteller directed a question to the audience and then waited for responses. They were either looking for one or for a number of correct responses and the interaction would end when the audience gave enough correct answers. Storytellers usually handled incorrect answers by encouraging more answers. For instance, Noor often asked audiences to guess how he felt upon witnessing his home's demolition, whereupon audiences offered responses, such as "sad", until reaching the answer Noor was looking for: "angry". In another example, he asked the audience to name some Cape Town townships, soliciting a number of correct answers before ending the interaction. An important observation about audience questions and exchange structures was that they always took place at the end of a narrative component and before the next began.
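To make that last observation concrete, a retelling can be thought of as an ordered list of Labov components with interaction allowed only at component boundaries. The sketch below is purely illustrative (it is not the authors' analysis tooling, and the clip names are invented), but it captures how the ethnographic finding maps onto the prototype described in Section 5:

# Illustrative sketch (not the authors' tooling): a story as an ordered list of
# Labov components [17], with interaction points only at component boundaries.
story = [
    # (component, recorded audio clip, interaction allowed at the END of this component?)
    ("abstract",            "noor_home_intro.wav",    False),  # hypothetical clip names
    ("orientation",         "noor_home_context.wav",  True),   # audience questions invited here
    ("complicating action", "noor_demolition.wav",    True),   # exchange structure: "How did I feel?"
    ("resolution",          "noor_after_removal.wav", False),
    ("coda",                "noor_home_coda.wav",     True),
]

# The prototype only pauses for user questions or exchange structures once the
# component in progress has finished, mirroring what was observed live.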
5 Storytelling Prototype Our prototype design was based on the ethnography findings discussed in Section 4. We chose to implement a VE with interactive storyteller agents since we wanted a somewhat natural way for users to see the museum objects we were including and the
likeness’ of the Joe and Noor. However, the storytelling interactions we designed are orthogonal to a VE implementation and could be used in other storytelling applications. Fig. 1 shows the VE at the start: the user is part of a virtual audience facing two storytelling agents, modeled on Joe and Noor, and is able to move and look around using standard keyboard and mouse controls. The agents introduce themselves and, then, begin the first story, eventually telling five stories. A key design consideration in ensuring the prototype’s sustainability was to make it possible for the museum to edit and improve on it easily. Thus, we chose a free implementation platform and ensured that it ran on an affordable desktop machine. We used Microsoft’s XNA Game Studio and created the models in Blender 3D. The agents’ animations were based on Joe and Noor’s typical gestures and movements during storytelling. We noted these from video footage and had grown to know them well during the ethnographic study. The soundtrack was composed entirely of recordings gathered during the ethnography. While not our original intention, this allowed us to (a) present the stories as told spontaneously and (b) combine different retellings so that the VE presented, not one particular version of a story, but a composite version.
Fig. 1. The storytelling virtual environment including the storytelling agents modeled after Noor Ebrahim (left) and Joe Schaffers (right). Also visible, are other audience members and museum objects: around Noor are pictures of his grandparents, son and former home and near Joe, pictures of his former home, a former main street and Apartheid-era public signs.
5.1 User Questions Here we aimed to mimic the hand raising behavior observed in the museum. So, at any point during a story, the user may press the Space bar to ‘raise their hand’ and signal the desire to ask a question. The prompt, shown in Fig. 2, “You may press SPACE to put up your hand”, serves to remind users of this. When the user presses space bar, a hand icon, also shown in Fig. 2, is displayed to indicate that the user’s hand is up. The agent first finishes the narrative component that is in progress before acknowledging the user’s question. The typing window, shown in Fig. 3, then appears for the user to type their question. The agent has a reserve of possible question responses and we use simple keyword matching to find an appropriate response. If no matching response is found the agent responds by saying “I don’t know”. Occasionally the agent invites the user to ask questions by saying something like “If you have any questions, raise your hand”, and then waits for the user to press Space.
If the user does not do this within a certain timeframe, a virtual audience member asks a question instead. This way, the VE does not wait indefinitely, or unrealistically long, on the user. We also devised a way of supplying the user with question hints: if the user takes longer than a certain time to either type a question or 'raise their hand' when invited, the VE displays keyword hints for questions the agents can answer.
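The following sketch illustrates the "simple keyword matching" over a reserve of recorded answers described above, with the "I don't know" fallback. It is not the authors' implementation (the prototype itself was built with XNA Game Studio), and the keywords and clip names are invented for illustration:

# Illustrative sketch of keyword matching for user questions (not the authors' code);
# keywords and answer clip names are hypothetical examples.
RESPONSES = [
    ({"gangster", "gangsters"}, "noor_gangsters.wav"),
    ({"demolish", "demolition", "bulldozer"}, "noor_demolition.wav"),
    ({"games", "play"}, "joe_games.wav"),
]
FALLBACK = "i_dont_know.wav"

def answer_for(question: str) -> str:
    """Return the recorded answer whose keywords best match the typed question."""
    words = set(question.lower().split())
    best_clip, best_hits = FALLBACK, 0
    for keywords, clip in RESPONSES:
        hits = len(words & keywords)
        if hits > best_hits:
            best_clip, best_hits = clip, hits
    return best_clip

# e.g. answer_for("Were there many gangsters in District Six?") -> "noor_gangsters.wav"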
Fig. 2. While the Noor agent is telling a story the user is reminded that they may ‘put up their hand’ by pressing the space key (left screenshot). When the space key is pressed, a hand icon, right, appears and is displayed until the agent acknowledges the user’s question.
Fig. 3. The typing box in which users enter user question and exchange structure input
5.2 Exchange Structures The second interaction type, exchange structures, is initiated by an agent directing a question to the user. When this happens, the typing box is, again, displayed for the user to type an answer. Each exchange structure has a 'final correct answer'; if the user enters this answer, the agent gives positive feedback and the interaction ends. The agent is also able to give feedback to a number of non-terminating answers (which may be correct or incorrect answers). As before, keyword matching is used to recognize the user's input. Whenever the user enters a non-terminating or unrecognized answer, the agent encourages another answer, saying something like "No, try again", and the typing box is again provided. To ensure that this interaction does not cycle indefinitely, the user has three tries at answering a question before a virtual audience member answers. Following on from the exchange structure described in Section 4, we created an exchange structure initiated by the Noor agent asking "What do you think? How did I feel?"; if the user types "sad", he responds
with “Sad” and waits for another answer; if the user types “angry”, he responds “Angry! That’s the word! I was so angry!” and the interaction ends. If the user types anything else, he responds “And what else?” and waits for another answer.
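The exchange-structure logic just described (final correct answer, feedback for non-terminating answers, a cap of three tries, then a virtual audience member answering) could be expressed along the following lines. This is a hedged sketch rather than the authors' code, and the prompts follow the "How did I feel?" example:

# Illustrative sketch of the exchange-structure loop from Section 5.2 (not the authors' code).
def run_exchange(ask, max_tries=3):
    """ask() collects one typed answer; the loop ends on the final correct answer
    or after max_tries, when a virtual audience member gives the answer instead."""
    final_answer = "angry"
    feedback = {"sad": "Sad..."}          # non-terminating answers get specific feedback
    for _ in range(max_tries):
        answer = ask().strip().lower()
        if final_answer in answer:        # keyword matching, as for user questions
            return "Angry! That's the word! I was so angry!"
        print(feedback.get(answer, "And what else?"))
    return "(a virtual audience member answers: 'angry')"

# e.g. run_exchange(input) reads answers from the console in place of the typing box.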
6 Preliminary Results and Conclusion While it is impossible to simulate all the nuance and interactivity of a real storyteller, we believe our VE recreates two interactions common in oral storytelling. To test the effectiveness of adding user questions and exchange structures to a digital storytelling experience, we conducted a user evaluation with 150 university students. Participants signed up voluntarily, and were each paid 50ZAR. Each participant first experienced a training VE to practice keyboard and mouse navigation as well as user questions and exchange structures, which were presented textually. Participants then experienced the storytelling VE with an unobtrusive experimenter in the room making ad-hoc observations. Preliminary results indicate a surprisingly positive response with many participants reacting to the prototype by giggling, speaking, exclaiming and nodding. Many approached the experimenter afterward to find out more about the prototype. Participants’ comments suggest particular enjoyment of the VE’s storytelling nature and the exchange structures. However, user questions were less successful with numerous participants reporting frustration at the agents’ inability to answer their questions. A superficial examination of usage logs revealed that a large proportion of the user questions were, indeed, unsuccessfully answered. Currently, the prototype is limited by the number of recordings we had of Joe and Noor answering questions. We have since recorded Joe and Noor answering all those questions which the agents were unable to answer. These will be added to the VE to increase the agents’ repertoire of answers. The user questions could also be made more effective using artificial intelligence techniques more sophisticated than our simple keyword matching. The final step in this project is a deployment of our prototype as a trial exhibit at the District Six Museum, the original setting that inspired it. We have found that observing real-life storytelling there was an invaluable design starting point. Linguistics and discourse analysis helped us identify structure in the stories we recorded and well-defined interaction patterns, allowing us to ultimately produce a prototype that users responded to positively. This paper describes only the first part of our work; we look forward to conducting more in-depth analyses of our user evaluation and forthcoming museum deployment towards understanding which aspects of our design were truly effective and how it could be improved. Acknowledgments. Our thanks to the District Six Museum, especially Joe Schaffers and Noor Ebrahim. Research partially supported by South Africa NRF Grant.
References 1. Maynes, M.J., Pierce, J.L., Laslett, B.: Telling Stories: The use of Personal Narrative in the Social Sciences and History. Cornell University Press, Ithaca (2008) 2. Welsh, D.: The Rise and Fall of Apartheid: From Racial Domination to Majority Rule. Jonathan Ball Publishers, South Africa (2010)
3. Ebrahim, N.: Noor’s Story: My Life in District Six. IM Publishing, Charlesville (2009) 4. BBC, Capture Wales: Digital Storytelling, http://www.bbc.co.uk/wales/audiovideo/sites/ galleries/pages/capturewales.shtml 5. USC Shoah Foundation Institute, Visual History Archive, http://dornsife.usc.edu/vhi/ 6. University of Cape Town, Centre for Popular Memory, http://www.popularmemory.org.za 7. Bidwell, N.J., Reitmaier, T., Marsden, G., Hansen, S.: Designing with Mobile Digital Storytelling in Rural Africa. In: 28th International Conference on Human Factors in Computing Systems (CHI), pp. 1593–1602. ACM Press, New York (2010) 8. Jones, M., Harwood, W., Buchanan, G., Frohlich, D., Rachovides, D., Lalmas, M., Frank, M.: Narrowcast yourself: Designing for Community Storytelling in a Rural Indian Context. In: 7th Conference on Designing Interactive Systems (DIS), pp. 369–378. ACM Press, New York (2008) 9. Cavazza, M., Pizzi, D.: Narratology for Interactive Storytelling: A Critical Introduction. In: Göbel, S., Malkewitz, R., Iurgel, I. (eds.) TIDSE 2006. LNCS, vol. 4326, pp. 72–83. Springer, Heidelberg (2006) 10. Brooks, K.M.: Do story agents use rocking chairs? In: ACM Multimedia 1996, pp. 317– 328. ACM Press, New York (1996) 11. Charles, F., Mead, S.J., Cavazza, M.: User Intervention in Virtual Interactive Storytelling. In: 1st Conference on Autonomous Agents and Multiagent Systems (AAMS), pp. 318– 325. ACM Press, New York (2002) 12. Mateas, M., Stern, A.: Structuring Content in the Façade Interactive Drama Architecture. In: 1st Annual Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 93–98. AAAI Press, Marina del Rey (2005) 13. Silva, A., Raimundo, G., Paiva, A.: Tell Me That Bit Again... Bringing Interactivity to a Virtual Storyteller. In: Balet, O., Subsol, G., Torguet, P. (eds.) ICVS 2003. LNCS, vol. 2897, pp. 146–154. Springer, Heidelberg (2003) 14. Yii Lim, M., Aylett, R.: Narrative Construction in a Mobile Tour Guide. In: Cavazza, M., Donikian, S. (eds.) ICVS-VirtStory 2007. LNCS, vol. 4871, pp. 51–62. Springer, Heidelberg (2007) 15. Yamazaki, K., Yamazaki, A., Okada, M., Kuno, Y., Kobayahsi, Y., Hoshi, Y., Pitsch, K., Luff, P., von Lehn, D., Heath, C.: Revealing Gauguin: Engaging Visitor’s in Robot Guide’s Explanation in a Art Museum. In: 28th International Conference on Human Factors in Computing Systems (CHI), pp. 1437–1446. ACM Press, New York (2010) 16. Pridham, F.: The Language of Conversation. Routledge, London (2001) 17. Labov, W.: The Transformation of Experience in Narrative Syntax. In: Labov, W. (ed.) Language in the Inner City: Studies in the Black English Vernacular. University of Philadelphia Press, Philadelphia (1972) 18. Martin, J.R., Plum, G.: Construing Experience: Some Story Genres. J. Narrative and Life History 7(1/4), 299–308 (1997) 19. Bauman, R.: Story performance and event: Contextual studies of oral narrative. In: Burke, R. (ed.) Cambridge Studies in Oral and Literate Culture. Cambridge University Press, London (1986) 20. Sinclair, J.McH., Coulthard, R.M.: Towards an Analysis of Discourse: The English used by teachers and pupils. Oxford University Press, Oxford (1975) 21. Ladeira, I., Nunez, D.: Story worlds and virtual environments: Learning from oral storytelling. In: 10th Annual Workshop on Presence, pp. 257–264. ISPR (2007)
Choosing Your Moment: Interruptions in Multimedia Annotation
Christopher P. Bowers, Will Byrne, Benjamin R. Cowan, Chris Creed, Robert J. Hendley, and Russell Beale
School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
{C.P.Bowers,W.F.Byrne,B.R.Cowan,R.J.Hendley,R.Beale}@cs.bham.ac.uk
Abstract. In a cooperative mixed-initiative system, timely and effective dialogue between the system and user is important to ensure that both sides work towards producing the most effective results, and this is affected by how disruptive any interruptions are as the user completes their primary task. A disruptive interaction means the user may become irritated with the system, or might take longer to deal with the interruption and provide information that the system needs to continue. Disruption is influenced both by the nature of the interaction and when it takes place in the context of the user’s progress through their main task. We describe an experiment based on a prototype cooperative video annotation system designed to explore the impact of interruptions, in the form of questions posed by the system that the user must address. Our findings demonstrate a preference towards questions presented in context with the content of the video, rather than at the natural opportunities presented by transitions in the video. This differs from previous research which concentrates on interruptions in the form of notifications. Keywords: annotation.
A mixed-initiative multimedia annotation system might be performing its own analysis of the contents of a document in order to support the user by suggesting annotations to the user or applying its own annotation. Dialogue with the user might take the form of queries designed to help the system with its analysis, perhaps by resolving conflicts between different interpretations of some parts of the content. Responses to these queries will influence the main annotation task. The system as a whole will perform better if queries are answered by the user promptly and accurately, so an effective dialogue should take into account the best times to present particular queries to the user. However, system-driven interactions can also have a negative effect if they disrupt [12, 17] or irritate [1, 16, 19] the user during their primary task. So as well as a benefit (user input), a query will also have a cost [12, 2, 9, 10] which might be detrimental to the overall outcome of the task.
2 Previous Work Interruptions in human-computer interaction have received much attention from researchers. Findings show that interrupting users from their primary task can have a negative impact on performance [12, 8, 17], and produce higher levels of frustration, stress and anxiety [19, 1, 16]. Other studies have found that interruptions can also have a detrimental effect on how long it takes to complete a task [14, 7], decision making ability [18], emotional state [19, 1], and the number of errors users make during a task [12]. Researchers have been investigating whether interrupting users at “opportune” moments might reduce some of these negative effects. This approach is largely guided by Miyata and Norman [15] who argued that interruptions should be made at times of lower mental workload. They also argued that these periods typically occur at subtask boundaries of task execution. Bailey and Iqbal [5] also found that interruptions can have lower cost if made during periods of lower mental workload. Pupil dilation (a recognised measure of workload) was measured whilst performing a number of different tasks (route planning, document editing, and email classification) and it was largely confirmed that signs of decreased workload occurred at subtask boundaries. This work suggests that, if possible, interruptions should be deferred to a breakpoint in the task to reduce costs and negative effects. Iqbal and Bailey [11] also found that presenting notifications at break points in problem solving tasks (i.e. a programming task and diagram editing) reduced frustration and resumption time of the task when compared with presenting the notifications immediately. It was also found that the relevance of the notification content influenced at what time the notification should be presented. In particular, it was found that notifications relevant to the user’s current work should be presented at medium or fine breakpoints (i.e. during lower level activities such as editing code or adding shapes). Conversely, it was found that less relevant or generic notifications should be presented at coarser breakpoints (i.e. higher level events such as the user switching to their mail or instant messaging client).
Adamczyk and Bailey [1] also examined the effects of interrupting a user at different stages of a task (document editing, writing summaries of videos and web searching tasks). The selection of the points at which users were interrupted was based on their predicted cognitive load. There were two primary conditions where interruptions were presented: presumed best and presumed worst times. The other conditions were random and no interruptions. The presumed best times tended to be at the completion of subtasks whilst the presumed worst were when the user was performing work on their primary task. The presumed best condition resulted in reduced annoyance, frustration, and mental effort, while the presumed worst condition resulted in the poorest ratings on each of these measures. Mark et al. [13] examined the impact of interruptions in an email management task and found that in contrast to the results of the other studies, participants who were interrupted completed the task in less time than those who were not interrupted. The authors suggest that people compensate for interruptions by working more efficiently, but also that this increased efficiency comes at an extra cost. Whilst participants that experienced the interruptions completed the tasks faster, they also experienced increased levels of stress, frustration, time pressure, and effort. In this paper, we describe an experiment to measure the effect of interrupting users performing a video annotation task by asking them questions in a range of contextual conditions, and discuss how our results compare to the existing body of research.
3 The CASAM System - Interruptions and Disruptions The experimental data was collected during trials with a prototype of the CASAM (Computer Aided Semantic Annotation of Multimedia) system. As well as a user interface in which the user can view video and perform the usual playback functions, CASAM incorporates a video analysis engine that extracts concepts from video and a reasoning engine that constructs and maintains an ontology to manage these concepts. The reasoning engine makes inferences about possible relationships between concepts and generates queries to help it resolve ambiguities. These queries can sometimes seem trivial to the user. However, answering the query might have a significant impact on the content and structure of the ontology and therefore any future inferences made from it. This will, in turn, affect the efficiency of the reasoning component and the future dialogue with the user. The user interface component of the system is responsible for transforming these queries into a human-readable form and deciding how and when to present them to the user. Concepts from the ontology are then used by the system to annotate the video in collaboration with the user. The operation of these analysis and reasoning components is otherwise transparent to the user. If a system such as CASAM presents a query while the user is engaged with another task or simply concentrating on the video then the user will be interrupted, which is likely to have a negative effect on the user’s work, or at least make the user think it does. How disruptive the user finds the interruption depends to a large extent on when it occurs relative to what they are doing [1]. We can identify three broad contextual conditions, with respect to the user's current activity: In context, Out of context, and Opportune.
For the In context condition queries are presented immediately after the relevant part of the document, which the query refers to, is viewed by the user. In the Out of context condition queries are presented at a point at which the user is engaged with a part of the video which is unrelated to the content of the query, and the Opportune condition accounts for those interruptions which take place at times when the user is not engaged with any particular task or section of the video. In a video annotation system this might be at the boundary between two consecutive shots. Opportune interruptions are generally considered preferable.
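The three contextual conditions can be stated as a simple scheduling rule against the video timeline. The sketch below is illustrative only (it is not the CASAM implementation; the function, parameter names and tolerance are assumptions), showing how a query's presentation time could be classified relative to the segment it refers to and the shot boundaries:

# Illustrative sketch (not the CASAM implementation) of classifying when a query is presented
# relative to the video segment it refers to and to shot boundaries. Times are in seconds.
from enum import Enum

class Context(Enum):
    IN_CONTEXT = 1      # shown right after the segment the query refers to
    OPPORTUNE = 2       # shown at a shot boundary, away from the related content
    OUT_OF_CONTEXT = 3  # shown while unrelated content is playing

def classify(present_at, segment_end, shot_boundaries, tolerance=2.0):
    """present_at, segment_end, shot_boundaries and tolerance are assumed inputs."""
    if 0 <= present_at - segment_end <= tolerance:
        return Context.IN_CONTEXT
    if any(abs(present_at - b) <= tolerance for b in shot_boundaries):
        return Context.OPPORTUNE
    return Context.OUT_OF_CONTEXT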
Fig. 1. Screenshot of the prototype interface
Bailey et al. [3, 4] find that users are more annoyed when presented with peripheral tasks when they are engaged with another primary task than when they are not, and that users perform slower on an interrupted task than when uninterrupted. They go on to show that interrupting the user at Opportune moments (that is, when users are not engaged with another task) lessens the disruptive effect. Experimental data on interruptions may be used to formulate some measure of the cost of those interruptions, both in terms of the user's perception of the disruption the interruptions cause and the measurable effects they have. However, it should be noted that any assessment of cost would ideally need to be linked to some measure of the actual effect of the interruptions on the outcome of the user's task. We do not pursue that in detail here, focusing instead on the perceived and measurable effects of the interruption itself.
4 Experiment Design The prototype interface used for the experiments consists of a video player and a panel in which queries about the content of the video are displayed and answered. The video player allows the user to replay previous sections of the video, and the query panel allows them to tick check boxes to choose an answer to a query (Figure 1).
Table 1. Time and content of questions presented to the user for Context and Question Type conditions

Condition 1. Times: In context 00:06; Out of context 02:12; Opportune 00:23. Important: Are these flies, bees, or snow? Trivial: What colour is the wall?
Condition 2. Times: In context 00:35; Out of context 02:04; Opportune 00:41. Important: Is this man a journalist? Trivial: Is this a field?
Condition 3. Times: In context 00:45; Out of context 01:48; Opportune 00:58. Important: Is this man wearing a protective suit? Trivial: Is this person wearing a hat?
Condition 4. Times: In context 01:05; Out of context 01:31; Opportune 01:21. Important: Is this Dave Hackenburg? Trivial: Are these boxes in the background?
Condition 5. Times: In context 01:25; Out of context 00:06; Opportune 01:31. Important: What object is the person in this frame using? Trivial: What is the colour of this person's jacket?
Condition 6. Times: In context 01:31; Out of context 00:35; Opportune 01:42. Important: Is the person on the left a researcher? Trivial: Does the man on the left have dark hair?
Condition 7. Times: In context 01:48; Out of context 00:45; Opportune 01:54. Important: Is this a hive? Trivial: Are there trees in the background?
Condition 8. Times: In context 02:04; Out of context 01:05; Opportune 02:09. Important: Is this person angry? Trivial: Is this person wearing a shirt?
When a query arrives in the interface the video is paused, interrupting the user, and the query is highlighted using a red border. Once answered, the video continues to play. Queries presented to the user only concern the content of the video and were hand designed to replicate the form, complexity and context of queries that would be generated by the automated reasoning engine of the CASAM system. Two between subject variables were used in the study. The first (Question Type) consisted of two levels; Important and Trivial questions. This classified queries according to how relevant they were to a high-level description of the content of the video. For example, an important query might be: “What object is the person in this frame using?” while a trivial query might be: “What colour is this person’s jacket?”. The second (Context) varied the context within which the interruption occurred. The
Table 2. Post-experiment questionnaire items (Description), each rated on a visual analogue scale from 0 to 100; the rating scale anchors are shown in parentheses

How successful were you in accomplishing what you were supposed to do? (Failure – Perfect)
How well did the interface support the annotation task? (Very Low – Very High)
To what extent do you think the system's questions focused on the key points of the video? (Very Low – Very High)
How irritated were you with the questions the system asked? (Very Low – Very High)
How usable was the interface? (Very Low – Very High)
Did the interface behave in ways which you expected? (Very Low – Very High)
How much effort did it require for you to answer the system's questions? (Very Low – Very High)
What is your overall perception of the interface? (Very Poor – Very Good)
How mentally demanding was the task? (Very Low – Very High)
How well do you think the system will have annotated the video given the nature of the questions asked? (Very Badly – Very Well)
How responsive was the interface? (Very Low – Very High)
How useful do you think the questions were in helping to enhance the system's processing? (Very Low – Very High)
Did you understand what was going on during the annotation task? (Very Low – Very High)
How hurried or rushed was the pace of the task? (Very Low – Very High)
conditions within this variable differed on whether the question was asked In context, Out of context, or in an Opportune condition. In the In context condition a question must be presented to the user immediately after the relevant information has been presented in the video. In the Opportune condition a question must be presented to the user at a point of transition between shots in the video. In these cases the timing of the questions are predetermined by the content of the video and the question. For the Out of context condition, questions are presented at a considerably different time to the relevant content within the video. In combination this makes a total of 6 Question Type-Context condition pairs (e.g. Trivial – In Context) within the experiment. Participants were randomly allocated to one of these 6 possible pairings. The video the users were shown contained 8 shot breaks, and each user was asked 8 questions about the video. For the experiment the content and presentation of the questions were both hand coded. The full set of questions and conditions used is shown in table 1. For each query the time it took for the user to respond was recorded. After completing the experiment users completed a questionnaire designed to assess their perception of both the content and usefulness of the queries, their effect on the task and how challenging they were to deal with. The post experiment questions are shown in table 2, and were scored on a visual analogue scale with 100 being very high and 0 being very low. Participants were told before starting that they would be asked questions about the video as it progressed. 92 participants took part in the study and were recruited from the body of students at the University of Birmingham using an email request for participation. The experiment was conducted online. The data for two of the participants was excluded
entirely from the dataset due to high amounts of missing data and the existence of extreme outliers. The video they were shown was a news report about falling bee populations in the United States with a running time of two minutes. The same video was shown to all participants through the same interface. Only the question condition pairings were varied for each participant.
5 Results 5.1 User Perceptions Two-way (2×3) between-subjects ANOVAs were used to analyse the effects of the Question Type and Context variables on the user perception dependent variables measured in the post-experiment questionnaire. Before reporting the findings of the statistical analysis it is worth noting that, due to the amount of analysis conducted, there is an increased probability of Type 1 error (a false positive). This has been controlled within each ANOVA by the use of Bonferroni post hoc tests; however, given the number of ANOVAs conducted, readers must interpret the findings while being aware of the potential for Type 1 error in this analysis. Additionally, due to the amount of analysis conducted, only significant effects are reported in the main body of this paper. A full description of the analysis is included in table 12.
Interaction Success Participants in the In context condition rated the perception of their success in accomplishing the task as higher than the Out of context condition. There was a significant main effect of Context with respect to the perceived success of the interaction [F(2, 84) = 6.97, p<0.01]. Participants in the In context condition rated the interaction as significantly more successful than those in the Out of context condition (p<0.001). There was no significant difference between the In Context and Opportune (p>0.05) and Out of Context and Opportune conditions (p>0.05). There was no significant main effect of Question Type on interaction success [F(1,84) = 0.17, p>0.05]. There was also no significant interaction between Question Type and Context on ratings of interaction success [F(2,84) = 0.11, p>0.05]. The means related to this analysis are displayed in table 3.
Table 3. Mean scores of interaction success by Context and Question Type
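As a minimal sketch of the kind of between-subjects ANOVA reported in this section (not the authors' analysis scripts; it assumes a long-format statsmodels/pandas workflow with one row per participant and columns score, context and question_type), the pattern would be:

# Illustrative sketch of a 2x3 between-subjects ANOVA like those reported above
# (not the authors' analysis scripts).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def two_way_anova(df: pd.DataFrame):
    """df has one row per participant with columns: score, context, question_type."""
    model = smf.ols("score ~ C(context) * C(question_type)", data=df).fit()
    return anova_lm(model, typ=2)   # F and p for both main effects and the interaction

# Pairwise post hoc comparisons (e.g. In context vs Out of context) could then be run
# as t-tests with a Bonferroni-corrected alpha, as described in the text.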
Support Participants in the In context condition perceived that the interface better supported the task compared to those that experienced the Out of context and Opportune conditions. There was a significant main effect of Context on how supported participants felt during the annotation task [F(2,84) = 11.82, p<0.001]. Participants in the In context condition felt significantly more supported than those in the Out of context (p<0.001) and Opportune conditions (p<0.05). There was no significant main effect of Question Type [F(1,84) = 0.26, p>0.05] or significant interactions of Question Type and Context on participants' feeling of support during the task [F(2,84) = 2.13, p>0.05]. The means for this analysis are displayed in table 4 below.
Table 4. Mean score of support for the annotation task by Context and Question Type
Key Points Those participants in the Important condition perceived the questions to address the key points of the video more so than those in the Trivial question condition. There was a significant main effect of Question Type with regard to the extent that participants perceived the questions to focus on the key points of the video [F(1,84) = 22.15, p<0.001]. Participants in the Important condition perceived the questions to focus on the key points of the video significantly more when compared with participants in the Trivial condition (p<0.001). There was no significant main effect in relation to the Context [F(2,84) = 0.75, p>0.05] and no significant interactions [F(2,84) = 0.18, p>0.05]. The means for this analysis are displayed in table 5.
Table 5. Mean scores of extent to which questions address the key points of the video by Context and Question Type
Irritation Irritation was shown to be lower in those participants in the In context condition as compared to those in the Out of context condition. There was a significant main effect of Context on how irritated participants felt during the interaction [F(2,84) = 4.40, p<0.05]. Those in the In context condition felt significantly less irritated than those in the Out of context condition (p<0.05). There was no significant difference between the Out of context and Opportune (p>0.05) and the In context and Opportune conditions (p>0.05). There was no significant main effect of Question Type [F(1,84) = 0.18, p>0.05] or interaction of Context and Question Type [F(2,84) = 0.48, p>0.05] on how irritated participants felt. The means are displayed in table 6.
Table 6. Mean scores of irritation with the questions by Context and Question Type
Usability Usability of the interface was rated higher by those in the In context condition than those in the Out of context condition. There was a significant main effect of Context on how usable participants perceived the interaction [F(2,84) = 4.39, p<0.05]. Those in the In context condition rated their interaction higher in usability than those in the Out of context condition (p<0.01). There was no significant difference between usability scores gained in the In context and Opportune (p>0.05) and the Out of context and Opportune conditions (p>0.05). There were no significant effects related to Question Type [F(1,84) = 0.80, p>0.05]. There was also no significant interaction [F(2,84) = 0.39, p>0.05]. The means are shown in table 7.
Table 7. Mean scores of usability of the interface by Context and Question Type
Expected Behaviour The participants rated the interface as behaving as expected more so in the In context condition than in the Out of context condition. There was a significant main effect of Context in terms of participants' judgements of whether the system behaved in ways that the user expected [F(2,84) = 6.54, p<0.001]. Participants in the In context condition rated the expected behaviour of the system significantly higher than those in the Out of context (p<0.01) condition. There was no significant difference between In context and Opportune (p>0.05) and Out of context and Opportune (p>0.05) conditions. There was no significant main effect of Question Type [F(1,84) = 0.28, p>0.05] or interaction effect [F(2,84) = 1.83, p>0.05] for this dependent variable. The means for the analysis are displayed in table 8.
Table 8. Mean scores of interface behaving as expected by Context and Question Type
Effort There was a significant main effect for Context [F(2,84) = 3.70, p<0.05]. Participants in the In context condition rated the effort required as significantly less than those in the Out of context condition (p<0.05). There was no significant main effect of Question Type [F(1,84) = 0.0, p>0.05]. There were no significant interactions [F(2,84) = 0.62, p>0.05]. The means for the analysis are displayed in table 9.
Table 9. Mean perception of effort required by Context and Question Type conditions
Overall Perception The overall perception of the interface was rated higher for those in the In context condition than for those in the Opportune condition. There was a significant main effect of Context in overall judgement of the system [F(2,84) = 3.84, p<0.05]. Participants in the In context condition rated the system overall significantly higher than those in the Opportune condition (p<0.05). There was no significant difference between the Opportune and Out of context (p>0.05) as well as the Out of context and In context conditions (p>0.05). There was no significant effect of Question Type on participants' judgements of the system overall [F(1,84) = 0.12, p>0.05]. There was no significant interaction effect [F(2,84) = 0.96, p>0.05]. The means are displayed in table 10.
Table 10. Mean scores for overall perception of the system by Context and Question Type
There were no significant main effects for Context or Question Type or any significant interactions for any of the remaining questions. 5.2 User Response Times Measuring query response times is an objective way of quantifying the impact upon user performance. Figure 2 shows the quartile distribution of the response time for each of the test conditions and table 11 depicts the table of means used in the statistical analysis for participant response times. For interruption questions there was a significant main effect of Context [F(2,84) = 11.09, p<0.001]. There was no significant main effect of Question Type condition [F(1,84) = 0.08, p>0.05] and no significant interaction [F(2,84) = 0.42, p>0.05]. Participants consistently answered queries In context faster than either the Opportune (p<0.01) or Out of context (p<0.001) cases.
Table 11. Mean scores for participant response time by Context and Question Type
Fig. 2. Quartile distribution of response times for all questions in each condition
6 Discussion The results show that the time at which the system presented questions to the user had a strong influence on their perceptions and the amount of effort that was required to complete the annotation task. Participants in the important question condition perceived that questions focused on the key points of a video significantly more than those in the trivial question condition. However, there were no significant main effects in the question conditions in all other cases. This suggests that whilst participants could clearly distinguish between the two different types of questions they did not fully understand how the questions influenced the system’s processing. The In context conditions were generally rated more positively than the other conditions. Participants in the In context conditions rated the interaction as more
successful, supportive, and usable. They also rated the interaction as being less irritating, requiring less effort and behaving in a way that was closer to what they were expecting. Similar results are found when looking at user response times. Participants in the In context condition responded to questions on average more quickly than those in the Out of context or Opportune conditions. Based on these findings we propose that, in the context of cooperative, mixed-initiative multimedia annotation, it is better to interrupt a user with questions when they are in context with the multimedia content. 6.1 Comparison to Previous Research Results This finding differs from much of the related research in the field. In other studies, it has been found that presenting questions at Opportune moments significantly reduces the negative impact of an interruption [1, 5]. It is instructive to contemplate why we achieved a different result. The interruptions used in this work are qualitatively different from those typically considered in other work. Typically experiments within the interruptions domain explore interruptions that are gross, often in the form of notifications, and represent a complete change in context to the main task. In the work presented here the interruptions are more subtle and closer to the context of the main task. There are two potential explanations for this observed behaviour: • Interruptions may be similar to the main task and thus have a reduced impact on cognitive load. • The user may perceive that responding to an interruption may have a positive impact on their performance on the main task and thus reduce the perceived cost of the interruption. However, neither of these explanations can be examined within the bounds of the experiment reported here and this is principally due to a number of limitations. 6.2 Limitations and Further Work Our participants’ only task was to watch the video and answer questions: they did not perform any specific annotation themselves. Interruptions may have been more costly if the user had to perform some annotation of their own. Conversely, if the user had been more aware of the impact of responding to questions upon their performance on the task, perhaps through changes to the annotation state, then the perceived cost may have been lower. A more detailed discussion of the cost of interruptions should be based around a more engaging primary task for the user, preferably one for which there is an easily quantifiable outcome. This is because a more demanding task might alter the users’ perception of disruption as well as how quickly they respond to it. Moreover, in order to assess an objective cost for a disruption we need to measure the actual impact on the users’ task performance as well as their impact perceptions. It may well be the case that in our more limited scenario cognitive load of users may have been low and thus the differences between the Context conditions less pronounced. This could account for the lack of impact of the Opportune condition. It is also possible that a significant overlap between the In-Context and Opportune
condition might confound the results. This may occur if queries are presented immediately after the relevant portion of the video and this happens to correlate with a clear shot boundary. However, as can be observed in table 1, there is only one case in which a clear overlap occurs between the Opportune condition for question 5 and the In-context condition in question 6. However, the test performed in this experiment is still representative of the form of interruption in a system like CASAM. It is therefore a useful initial step in understanding the impact of interruptions in such a system. Ideally, the user would not need to do any annotation, and the system would only ask the user questions when necessary. As such, the experiment described in this paper is a useful test to see how users respond. Other limitations include the use of a single short video. It would be useful in future studies to get participants to interact with the software for much longer period using a range of videos. Additionally, it would be interesting to run some longitudinal studies where participants used the tool for an hour or two on a daily basis. The current prototype would obviously need to be enhanced for longitudinal studies, but this type of test over time would be useful in understanding the true cost of interruptions and how they impact the user’s experience. It would also be interesting to examine how the urgency of a question influences user perceptions. There may be times when the system has an important question that is critical to the automated annotation process. How should the system communicate this urgency to the user? Is it acceptable for the system to immediately interrupt the user from their current task and present the question? Does communicating the importance of the question lower the negative impact of the interruption? Or should the system always wait for the most opportune time to interrupt users?
7 Conclusions This study demonstrates that, in a video annotation task, users tend to prefer interruptions that are In context, that is, at the appropriate moment in the video, regardless of whether this is an Opportune moment or not or whether the query is Important or Trivial. In addition, they also answer them more quickly. This outcome differs from previous research, which may be related to our specific experimental setup, or may be more intrinsically related to the specific task of video annotation. It certainly demonstrates that, if the aim is to have an ‘intelligent’ system undertaking most of the work, and relatively inexperienced annotators using it whose main function is to respond to the system’s questions, then presenting such questions In context is the most effective approach. There is also a broader implication of these results. Although we argue that the interruptions presented to the user are qualitatively different from those typically used in the interruptions literature, we also believe that this form of subtle interruption in context with the overall task is representative of a wide range of domains and thus warrants further study.
Table 12. Table of full analysis. For each measure (Interaction Success, Usability, Overall Perception, Irritation, Expected Behaviour, Effort, Support, Key Points, Mental Demand, Quality, Response, Question Usefulness, Understanding, Rushed, and Response Times) the table reports the F statistic for the main effect of Context, F(2,84), the main effect of Question Type, F(1,84), and their interaction, together with the significant post hoc comparisons (including In Context > Out of Context, In Context > Opportune, In Context < Out of Context, Important > Trivial, and In Context < Opportune). * p<0.05 ** p<0.01 *** p<0.001 ns p>0.05
Acknowledgment. This work was supported by EU grant FP7-217061.
References 1. Adamczyk, P.D., Bailey, B.P.: If Not Now When? The Effects of Interruptions at Different Moments Within Task Execution. In: Proc. of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2004, pp. 271–278. ACM, New York (2004) 2. Allen, J.F.: Mixed-Initiative Interaction. IEEE Intelligent Systems and their Applications 14(5), 14–23 (1999) 3. Bailey, B.P., Konstan, J.A., Carlis, J.V.: Measuring the Effects of Interruptions on Task Performance in the User Interface. In: IEEE Conference on Systems, Man, and Cybernetics, pp. 757–762 (2000) 4. Bailey, B.P., Konstan, J.A., Carlis, J.V.: The Effects of Interruptions on Task Performance, Annoyance, and Anxiety in the User Interface. In: Proc. of INTERACT 2001, pp. 593–601 (2001)
5. Bailey, B.P., Iqbal, S.T.: Understanding changes in mental workload during execution of goal-directed tasks and its application for interruption management. ACM Trans. Comput.-Hum. Interact. 14(4), 1–28 (2008) 6. Creed, C., Lonsdale, P., Hendley, B., Beale, R.: Synergistic Annotation of Multimedia Content. In: Proc. of the 2010 Third International Conference on Advances in ComputerHuman Interactions, pp. 205–208 (2010) 7. Cutrell, E., Czerwinski, M., Horvitz, E.: Notification, Disruption, and Memory: Effects of Messaging Interruptions on Memory and Performance. In: Proc. of INTERACT 2001, pp. 263–269 (2001) 8. Czerwinski, M., Cutrel, E., Horvitz, E.: Instant Messaging and Interruption: Influence of Task Type on Performance. In: OZCHI 2000 Conference Proceedings, pp. 356–361 (2000) 9. Horvitz, E.: Principles of Mixed-Initiative User Interfaces. In: Proc. SIGCHI Conference on Human Factors in Computing Systems, pp. 159–166 (1999) 10. Horvitz, E., Kadie, C., Paek, T., Hovel, D.: Models of attention in computing and communication: from principles to applications. Comm. ACM 46(3), 52–59 (2003) 11. Iqbal, S.T., Bailey, B.P.: Effects of Intelligent Notification Management on Users and Their Tasks. In: Proc. of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI 2008, pp. 93–102 (2008) 12. Latorella, K.A.: Effects of modality on interrupted flight deck performance: Implications for data link. In: Human Factors and Ergonomics Society Annual Meeting Proceedings, Aerospace Systems, vol. (5), pp. 87–91 (1998) 13. Mark, G., Gudith, D., Klocke, U.: The Cost of Interrupted Work: More Speed and Stress. In: Proc. of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI 2008, pp. 107–110 (2008) 14. Mcfarlane, D.C.: Coordinating the Interruption of People in Human-computer Interaction. In: Proc. of Human-Computer Interaction (INTERACT 1999), pp. 295–303. IOS Press, Amsterdam (1999) 15. Miyata, Y., Norman, D.A.: Psychological Issues in Support of Multiple Activities. In: User Centered System Design: New Perspectives on Human-Computer Interaction, pp. 265–284 (1986) 16. Monk, C.A., Boehm-Davis, D.A.: The Attentional Costs of Interrupting Task Performance at Various Stages. In: Proc. of the Human Factors and Ergonomics Society 46th Annual Meeting, pp. 1824–1828 (2002) 17. Rubinstein, J.S., Meyer, D.E., Evans, J.E.: Executive Control of Cognitive Processes in Task Switching. Journal of Experimental Psychology: Human Perception and Performance 27(4), 763–797 (2001) 18. Speier, C., Valacich, J.S., Vessey, I.: The influence of task interruption on individual decision making: An information overload perspective. Decision Sciences 30(2), 337–360 (1999) 19. Zijlstra, F.R.H., Roe, R.A.: Temporal factors in mental work: Effects of interrupted activities. Journal of Occupational and Organizational Psychology 72, 163–185 (1999)
Attention and Intention Goals Can Mediate Disruption in Human-Computer Interaction Ernesto Arroyo1 and Ted Selker2
1 Department of ICT, Universitat Pompeu Fabra, Barcelona, Spain [email protected]
2 Carnegie Mellon Silicon Valley, Moffett Field, California, USA [email protected]
Abstract. Multitasking environments cause people to be interrupted constantly, often interfering with their ongoing tasks, activities and goals. This paper focuses on the disruption caused by interruptions and presents a disruption-mediating approach for balancing the negative effects of interruptions against the benefits of interruptions relevant to the user’s goals. Our work shows how a Disruption Manager that utilizes context and relationships to user goals and tasks can assess when and how to present interruptions in order to reduce their disruptiveness. The Disruption Management Framework was created to take into consideration the motivations that influence people’s interruption decision process. The framework predicts the effects of interruptions using a three-layer software architecture: a knowledge layer including information about topics related to the ongoing activity, an intermediate layer including summarized information about the user’s tasks and their stages, and a low-level layer including implicit, low-granularity information, such as mouse movement, context switching and windowing activity, to support fail-safe disruption management when no other contextual information is available. The manager supports implicit monitoring of ongoing behaviors and categorizes the possible disruptive outcome given the user and system state. The manager monitors actions and uses commonsense reasoning in its model to compare communication stream topics with the topics of files that are active on the desktop. Experiments demonstrate that the Disruption Manager significantly reduces the impact of interruptions and improves people’s performance in a multi-application desktop scenario with email and instant messaging. In a complex order-taking activity, the Disruption Manager yielded a 26% performance increase for tasks prioritized as important and a 32.5% increase for urgent tasks. The evaluation shows that the modulated interruptions did not distract or trouble users. Further, subjects using the Disruption Manager were 5 times more likely to respond effectively to instant messages.
1 Introduction The increase in information overload and the continuous requests for user attention [20] make interruptions a common occurrence in computing environments. Users might delegate to another person or to a computerized intelligent agent the tasks to perform on their behalf, in order to avoid cognitive overload and successfully perform multiple tasks. With enough delegation, however, managing the delegation can itself cause interruption [27]. This paper focuses on the disruption caused by interruptions presented to users by computer technologies as the result of a request for synchronous or asynchronous communication. Careful design of interrupting notifications might help reduce the effects of disruption on people’s performance on ongoing activities. But in order to keep up with the ever-increasing demands for user attention, future computing environments will require active mediators that are able to interpret and recognize the value of communication. Indeed, mediating approaches have been helpful in developing successful software agents, such as spam filters, that act on behalf of the user. Our work extends such systems into the area of mediating people’s communications with friends and colleagues so that people are able to maintain concentration amidst their busy lifestyles. While it is possible to restrict all interruptions so that they do not disrupt the ongoing task, sometimes interruptions represent important corollary tasks, opportunities one never wants to pass up, or responsibilities of even higher priority, such as health. We argue that user goals and motivations should take precedence over micro-task benefits in considering which interruptions to block or attempt to block. To illustrate this argument, we present a Disruption Management Framework designed as a modeling method for designing computer-based disruption managers. The framework outlines some of the factors needed to mediate disruption in computing activities regarding the interruption context and its relationships to the user’s goals. 1.1 Interruptions, Disruption, and Distraction Interruptions are an everyday and normal part of human behavior. People frequently interrupt communication dialogue, such as with an unanticipated request for topic switching while having a conversation [29]. An interruption can be defined as an unanticipated request for task switching from a person, an object, or an event while multitasking [29][30][24][15]. Interruptions typically request immediate attention and insist on action [10], and they reduce productive focus. If an interruption is allowed to distract the user into action, it escalates into disruption [23]. Thus, disruption is defined as the negative effect on a primary task from interruptions requiring transition and reallocation of attention focus from one activity to another. This paper will use the term disruption for a situation in which a request from another task has been accepted, causing a negative effect on the ongoing activity. Although the main focus of this paper is disruption, it is important to mention that distractions and interruptions are similar in that they can both occur while the user performs a primary activity. For instance, distraction conflict theory represents a research stream investigating how distractions can be ignored or processed at the same time as the primary activity [17]. Unlike distractions, interruption-disruption shares the same sensory channel as the primary activity and encompasses a task that
could be performed. Our main interest is in interruptions that result in capacity and structural interference and that disrupt the ongoing activity, negatively affecting human performance [18]. 1.2 Interruption Management Coordinating interruptions involves one or more of a person’s modes of activity: cognition, perception, or physical action. Interruption coordination is defined as the method by which a person shifts their focus of consciousness from one processing stream to another [29]. How efficiently and effectively interruptions are handled by users might depend on characteristics of the notification itself and characteristics of the ongoing activity [7]. However, people’s reactions to interruptions and perceived disruption are principally affected by the goal-oriented strategies adopted to evaluate incoming interruptions in order to accomplish their goals [24], [28], [5]. For example, a person who works at an office is more likely to take an incoming phone call from a co-worker while at work than when on the way home. It is common for people to juggle several competing goals at one time, while their priorities might change depending on various factors. This is exemplified in a diary study of office work, which reported frequent and deliberate task-switching activities [14]. Similarly, residential interviews and self-reports revealed that willingness to handle interruptions varies across individuals with current location, as well as with current activity [31]. Can a computing system recognize and mediate relative to the underlying factors that influence people when dealing with interruptions? Our work addresses this question and shows how a disruption manager utilizing context and relationships to user goals and tasks can reduce the disruptive qualities of interruption requests. 1.3 Disruption Management One of the key questions for the understanding of human disruption is identifying the factors that play a role in people’s decision process regarding interruptions. Previous research has focused on task complexity and its influence on user performance [4][11], the coordination method used to handle interruptions [29], the point at which interruptions arrive [11][21], the similarity between ongoing and interrupting tasks [15], and the interruption modality [24]. We expand this to focus on the factors that influence people’s decision process in turning interruptions into actual disruption. A factor strongly correlated with how people react to interruptions is the level of goal commitment, that is, the importance of the task and the belief that the goal can be accomplished [26][28]. The importance of the goal to the individual will affect the subsequent reaction to an interruption. We explore the hypothesis that people’s reactions to interruptions and perceived disruption are principally affected by the goal-oriented strategies users adopt to evaluate incoming interruptions in order to accomplish their goals. We argue that incoming interruptions are evaluated with respect to the ongoing process’s goals and priorities. Therefore, it is possible for people to be influenced by their level of commitment to a task [21][7][36]. If a task is almost completed, people can opt to finish it before accepting an interruption, or to switch to the interrupting task, as task priorities are constantly changing.
2 Disruption Mediation Framework The Disruption Mediation Framework (DMF) introduces a multilayer software model that separates important constituents of disruption and aids in creating a computer-based disruption manager. Earlier interruption models [24][32][1][20][25] have not supported continuously monitoring the user and dynamically adapting timing and interrupting modality, among other parameters. Indeed, current interruption models focus on the user, disregarding the context from interrupting applications. Based on our work and existing research, we take the approach that goal concepts and task context serve as important factors in predicting disruption. The DMF is influenced by the existing literature, and is especially an extension of our explorations in three areas: tracking user activity, identifying ways to interrupt people, and understanding the factors that influence the interruption decision process (task completion level, task type, task complexity, task-switching strategies, etc.) [3][4][12][22][35]. The DMF identifies a decision process that evaluates incoming interruptions with respect to the current state of the ongoing activity or situation. It determines the priority of each interruption to decide when to execute the task associated with it. Previous work investigating interrupting modalities has shown that the decision process is affected by people’s individual differences, such as prior experiences, motivations, and psychological factors [1][15][19]. The DMF also identifies goal priorities, goal commitment, and goal relevancy as other factors affecting the interruption-decision process. The DMF introduces goals, and relevance to these goals, as a main factor for mediating interruptions. The interruption process is closely related to attention (according to the information-processing model). Attention also determines which goal concepts are relevant. These concepts can in turn determine which tasks are important or have higher priority. For this reason the DMF uses concepts to provide a cognitive representation and offer insight into the user’s attention. The DMF also includes user activity and task context as other important factors that allow incoming interruptions to be examined in order to evaluate their potential for disruption (see Fig. 1). Our approach is meant to be usable and easily implemented for any interactive digital system. 2.1 Relationships between Goals and Concepts SuwatanaPongched [34] classified interruptions into three categories: 1) Interruptions relevant to the primary task that assist completion of the primary task. 2) Interruptions irrelevant but related to the primary task, although not contributing to completion of the primary task. 3) Interruptions irrelevant and unrelated to the primary task. These interruption categories are useful; however, people often perform more than one task at a time in order to accomplish their goals, as several tasks can be grouped as being part of a single goal. The above interruption classification schema can therefore be extended to include the relationship to the user’s goals. That is, interruptions irrelevant to the primary task can be related to one of the user’s goals, thus contributing to its completion. Some of these goals are unique in that the user might be willing to sacrifice a certain amount of primary-task attention in order to achieve them.
Literature agrees that irrelevant, unrelated interruptions can be harmful to the primary task, and that they elicit frustration and anxiety [11][13][15][29]. Our work has also shown that people can benefit from interruptions if they are relevant to the ongoing task or the user’s goals [35].
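To make the extended classification concrete, the following sketch (our illustration, not the authors' code; the function names, the keyword-overlap measure, and the threshold are assumptions) scores an interruption against the primary task and against the user's wider goals, and maps it onto the categories discussed above.

def overlap(words_a, words_b):
    """Fraction of words in `words_a` that also occur in `words_b` (0.0-1.0)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a) if a else 0.0

def classify_interruption(interruption, primary_task, user_goals, threshold=0.2):
    """Extend the three interruption categories with goal relevance.

    `interruption` and `primary_task` are lists of content words;
    `user_goals` is a list of word lists, one per active goal.
    """
    task_rel = overlap(interruption, primary_task)
    goal_rel = max((overlap(interruption, g) for g in user_goals), default=0.0)

    if task_rel >= threshold:
        return "relevant to the primary task"
    if goal_rel >= threshold:
        return "irrelevant to the task, but related to a user goal"
    return "irrelevant and unrelated"

# Illustrative use only: the words below are invented for the example.
print(classify_interruption(
    ["quarterly", "budget", "deadline"],
    ["slides", "presentation", "design"],
    [["budget", "approval"], ["holiday", "booking"]]))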
Fig. 1. The Disruption Management framework (DMF) identifies goal concepts, user activity, and task context: the three main components to mediate and filter disruption from implicit metrics
Current approaches for reasoning about the user’s goals have focused on sensing user actions directed at achieving a domain-specific goal [16]. This approach depends on a previous examination of the desired domain and is limited to known domains or the domain itself. Instead, the DMF uses concepts, such as the topics of documents and communication derived from the user’s environment, as a way to reason about the underlying user goals. Tools such as WordNet (http://wordnet.princeton.edu) and ConceptNet (http://csc.media.mit.edu/conceptnet) can support context-oriented inferences over real-world texts [33], by supporting semantic similarity computations and by performing query expansion from a given concept. These semantic knowledge-base systems make reasoning about goals through concepts a practical approach. 2.2 Activity Monitoring The DMF focuses on supporting people’s goals as a means to reduce disruption. The degree to which interruption-mediated interfaces support the user’s goals is a key factor that determines their success. Success is also determined by inferences generated from interpreted knowledge, which we call virtual sensors [25]. Thus, the goal is to identify “sensors” that generate sufficient knowledge independently from domain-specific sources. These domain-independent sources provide the basis for mediating disruption when no other data sources are available. Low-level factors, such as micro-tasks and HCI interactions (mouse and keyboard behavior, windowing activity, etc.), can be used for mediating interruptions, as they provide a fail-soft system response when no domain-specific data is available. 2.3 Tasks The benefit from accepting interruptions should be balanced with respect to the ongoing task. The challenge consists of balancing interruptions at a task level while
supporting the user’s goals. Our approach extends and makes use of some constructs defined by previous research in the area of interruptions. This research indicates that disruption is related to several task factors: 1) Task complexity and similarity of the primary or interrupting task [15][22], 2) Interruption relevancy to the primary task, task stage when interrupted [13], 3) Interruption coordination method [29], 4) Modalities of the primary task and interruption [2][24]. In our framework task context includes Number of Tasks, Task level, Task Type, Task Complexity, Task Difficulty, and Task priority.
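As one way to picture the task-context factors just listed, the sketch below (not from the paper; the field names, the 0-to-1 scales, and the simple averaging are our assumptions) bundles them into a record a disruption manager could consult when costing an interruption.

from dataclasses import dataclass

@dataclass
class TaskContext:
    """Task-context factors named in the framework (values are illustrative)."""
    number_of_tasks: int      # how many tasks the user is juggling
    task_level: str           # e.g. "subtask" or "top-level"
    task_type: str            # e.g. "text editing", "order processing"
    task_complexity: float    # 0.0 (trivial) .. 1.0 (very complex)
    task_difficulty: float    # 0.0 .. 1.0
    task_priority: float      # 0.0 .. 1.0

    def interruption_cost(self) -> float:
        """Crude illustrative cost: busier, harder, higher-priority work
        makes an interruption more expensive."""
        load = min(self.number_of_tasks / 5.0, 1.0)
        return (load + self.task_complexity + self.task_difficulty
                + self.task_priority) / 4.0

ctx = TaskContext(3, "subtask", "order processing", 0.7, 0.6, 0.9)
print(round(ctx.interruption_cost(), 2))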
3 Disruption Manager We developed a test-bed to evaluate dynamic interruption systems. The test-bed allows the examination of the relation between ongoing behaviors, task actions, goals and interruptions. The application is implemented as a three-layer architecture (see Fig. 2). A low-level layer includes implicit, low-granularity information such as keystrokes and mouse movement activity. An intermediate layer includes the activities and information into which some of the low-granularity data can be extracted and summarized, such as reading, switching tasks, paying attention, etc. A top layer, or knowledge layer, includes the information or concepts related to the user’s goals.
Fig. 2. Disruption manager three-layer architecture: a low-level layer includes implicit, low-granularity information (top); an intermediate layer includes the user activities and information into which some of the low-granularity data can be extracted and summarized (middle-left); and the knowledge layer includes the information or concepts related to the user’s goals (right).
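The sketch below is one possible reading of this three-layer organization in code; the class names, the toy activity heuristic, and the wiring are ours, not the system's actual implementation.

class LowLevelLayer:
    """Implicit, low-granularity signals: keystrokes, mouse, window switches."""
    def __init__(self):
        self.events = []          # raw (timestamp, kind) event stream
    def record(self, event):
        self.events.append(event)

class ActivityLayer:
    """Summarizes low-level events into activities such as reading or switching."""
    def __init__(self, low):
        self.low = low
    def current_activity(self):
        kinds = [k for _, k in self.low.events[-20:]]
        if kinds.count("scroll") > kinds.count("key"):
            return "reading"
        return "typing" if "key" in kinds else "idle"

class KnowledgeLayer:
    """Concepts related to the user's goals, e.g. topics of open documents."""
    def __init__(self):
        self.goal_concepts = set()
    def add_document_topics(self, topics):
        self.goal_concepts.update(topics)

class DisruptionManager:
    """Wires the three layers together for later interruption decisions."""
    def __init__(self):
        self.low = LowLevelLayer()
        self.activity = ActivityLayer(self.low)
        self.knowledge = KnowledgeLayer()

dm = DisruptionManager()
dm.low.record((0.0, "key"))
dm.low.record((0.1, "scroll"))
dm.knowledge.add_document_topics({"budget", "order"})
print(dm.activity.current_activity(), dm.knowledge.goal_concepts)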
The disruption manager monitors the user state (current activity), concepts surrounding the user’s goals: history of recently accessed documents, web pages and search queries, the interrupting message relevance to these concepts, and concept priority. The manager then identifies messages that should be allowed to interrupt the user or delayed to an appropriate time within task execution. The disruption manager
uses monitoring modules to track the user state, the concepts surrounding the user’s goals, and the interrupting message concepts. The system has one module for each context (goals, tasks, and activities) in which the interruption content is examined, and a decision module that mediates interruptions on multiple services based on the evaluations of the context modules. The mediator uses several auxiliary modules for interfacing with an Instant Messaging client to both read in the Instant Message (IM) content and manipulate the timing and presentation of the IM. The context modules use natural language on the goal-level layer; experience, interest, and reading on the intermediate layer; and mouse-tracking, task-stage, and task-exit on the low-level layer. These architectural elements of the disruption manager are described in the following section. 3.1 Context Modules Each Context Module is responsible for evaluating a particular aspect of the interruption, the system, and the user. These modules are derived from aspects of the disruption model. The modules convey their evaluations to the Decision Module as the appropriateness of showing the IM at a given moment. The manager’s top-level monitoring layer uses the Google Desktop (http://desktop.google.com/) and ConceptNet [37] engines as services running on the user’s computer. Google Desktop keeps an up-to-date index of files and documents and their contents. ConceptNet is a commonsense knowledge base with facts from the Open Mind Commonsense corpus [33]. Its concise semantic network contains 200,000 assertions and supports practical textual-reasoning tasks over real-world documents. Natural Language Module (NL). The NL module implements the part of the disruption model concerning the relatedness of the content of the IM to the documents the user is working with. It uses natural language processing and commonsense reasoning to develop an understanding of the interruption and the documents, and attempts to compare the interruption to each document. These comparisons are aggregated into a relevance score, indicating the relevance of the IM to all documents examined. A separate score focuses on relevance to the current document. The NL Module has three major components. The first is responsible for obtaining the text of an IM (this is done through the IM Interface). The second component is responsible for locating files of interest and obtaining the contents of those files. The manager application queries the Google Desktop engine for recently accessed documents, files of interest (PDF, DOC, PPT, etc.), emails, IMs and web pages, and parses them using ConceptNet and a natural language processing engine. Files of interest are the files open on the user’s computer, as well as recently viewed documents and webpages. The system also obtains a list of open files using VBScript and the Microsoft PsTools library (http://www.microsoft.com/technet/sysinternals/utilities/pstools.mspx). The system then uses Google Desktop [39] to locate and read those files, to search the web cache for web pages viewed in the past hour, and to find and read the documents in the user’s My Recent Documents folder. The third component uses document-level functions in ConceptNet (text normalization, commonsense-informed part-of-speech tagging, semantic recognition, chunking, surface parsing, thematic-role extraction, and pronominal resolution) to extract
the verb-subject-object-object frames from recently accessed documents. The entire contents of both the IM and all of the retrieved documents are individually fed into the MontyLingua5 natural language processing suite. The MontyLingua suite provides both lexical parsing of text and commonsense reasoning through the OpenMind6 commonsense database The NL Module extracts from MontyLingua interpretation key words and concepts in the texts, uses a thesaurus to find possible synonyms for those words and concepts, and then counts the number of times the important words, concepts, and synonyms from the IM appear in the other documents. The NL module then extracts all the concepts in a document, assigns them saliency weights based on lightweight syntactic cues, and computes their weighted contextual intersection. Concept connections in ConceptNet’s semantic network allow the contextual neighborhood around a concept to be found by spreading activation radiating outward from a source concept node. The more frequently the number of important words or concepts appears, the more relevant the content of the IM is likely to be. The output of the NL module is the average number of times a key word or synonym in the IM appears per sentence in all the searched documents. This module allows the manager to summarize text of active documents, identify the documents gist topics, evaluate notifications, capture and classify incoming messages, detect if actions are required from the user, keep track of topics related to ongoing and past goals, and determine if incoming interruptions should be presented to the user. Mouse-tracking Module (MT). This MT module is composed of three components and represents the portion of the disruption module concerning the user’s familiarity with and their depth of involvement in their task. This component records the user’s mouse movements and reasons about the user by comparing his or her mouse movements to a database of many people’s mouse movements. The Mousetracking Module serves to determine the experience of a user with a website, the interest level of a user in a website, and whether the user is reading or scanning a website. This module is based on experimental data monitoring mouse activity while browsing websites [8]. The manager’s low-level proxy-based layer installed on the user’s computer monitors and categorizes mouse movement activity into low granularity behaviors (scrolling, menu, text input, clicking) and user states (reading, deciding, scanning, and waiting). The manager uses a naïve Bayes classifier and wrapper based feature selection because it is computationally inexpensive, which is important when running on a web-browser due to its limited resources. The mouse tracking module outputs data representing the percent experience, the percent interest, and the percent likelihood the user was closely reading a webpage. The Decision Module uses these heuristics to estimate the depth of user involvement with their current task, with the idea being that the more deeply involved a user is with their current task, the more costly it is to interrupt the user. Task Stage Module (TS). The TS module is responsible for determining if a user is at the beginning, in the middle, at the end, or between tasks. It does this by looking for discontinuities or changes in keyboard and mouse usage and windowing behavior. This is used to determine the user cognitive load based on HCI interactions. The TS 5 6
looks for significant changes to the number of keystrokes per minute, mouse time per minute, or windowing behavior indicating that the user is changing tasks, or subtasks, as at those moments interruptions are more likely to be less disruptive [21]. Existing Tasks Module (ET). The ET module attempts to gain an understanding of persisting tasks the user may have, even though they are not currently working on them, and corresponds to the part of the disruption module which determines if interruptions relate to other tasks the user has but may not be currently working on. The ET module returns the percentage to which an interruption is appropriate for past tasks. This information is a heuristic for how often the user cares about what the interruption is about, and thus also on how significant tasks related to the interruption are to the user. Decision Logic Module (DL). The DL module is the central component of the disruption manager. Whenever an IM arrives, DL determines the appropriateness of that message. It polls all Context Modules for their evaluations of the IM, and decides how to proceed. Once the disruption manager decides an interruption should be presented, it delays the interruption until an appropriate time in order not to disrupt the ongoing micro-task or activity. If the interruption is relevant to the user’s goal, the manager gives priority to this interruption, and presents it as soon as possible; while minimizing disruption on the ongoing task. Delaying standards are lowered over time to guarantee the message will be displayed within 2 minutes. In addition, the manager’s decision rules are based on findings showing that interruptions related to topics the user has worked on in the past, have the potential to be significant to the user’s goals and should be allowed [5][22][28]. Thus, the manager limits the number of unrelated interruptions in order to reduce perceived disruption. The manager also limits interruptions whenever this ratio increases and allows interruptions related to prioritized topics whenever confidence values are above predefined thresholds.
Fig. 3. Disruption manager’s Decision Logic Module: a layered filtering process determining the stages that each interruption must go through before being delivered to the user
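A minimal sketch of the layered evaluation that Fig. 3 depicts, assuming each context module reports an appropriateness score in [0, 1]. The averaging, the threshold value, and the way the delay standard is lowered toward the 2-minute cap mentioned above are our assumptions, not the module's actual rules.

import time

class DecisionLogic:
    """Illustrative re-creation of the Decision Logic module's filtering."""

    MAX_DELAY_S = 120            # the text guarantees display within 2 minutes

    def __init__(self, context_modules, allow_threshold=0.6):
        self.context_modules = context_modules   # name -> callable(msg) -> 0..1
        self.allow_threshold = allow_threshold

    def evaluate(self, message):
        """Poll every context module and average their appropriateness scores."""
        scores = {name: module(message)
                  for name, module in self.context_modules.items()}
        return sum(scores.values()) / len(scores), scores

    def handle(self, message, arrived_at):
        score, _ = self.evaluate(message)
        waited = time.time() - arrived_at
        # Delaying standards are lowered over time: the bar drops as we wait,
        # reaching zero at the 2-minute cap so the message is always shown.
        effective_threshold = self.allow_threshold * max(0.0, 1 - waited / self.MAX_DELAY_S)
        return "present" if score >= effective_threshold else "delay"

# Hypothetical context modules: goal relevance, task-stage breakpoint, involvement.
dl = DecisionLogic({
    "goal_relevance": lambda m: 0.8,
    "task_breakpoint": lambda m: 0.3,
    "involvement": lambda m: 0.5,
})
print(dl.handle({"text": "price update for item 42"}, arrived_at=time.time()))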
The decision module for the disruption manager is implemented as add-on to Trillian™; a stand-alone chat client that supports AIM, ICQ, MSN, Yahoo Messenger, and IRC. This allows the disruption manager to be easily deployed and integrated into current systems without requiring the user to lose existing contacts, learn a new interface. The IM client provides unique customization functionality, such as, contact message history, and an advanced automation system to trigger events based on anything that happens in the client. This allows the disruption manager to “catch” incoming interruptions and control them (see Fig. 3).
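The NL module's output is described above as the average number of times a key word or synonym from the IM appears per sentence in the searched documents. The sketch below reproduces that counting scheme with a plain tokenizer and a hand-written synonym table standing in for the MontyLingua/ConceptNet pipeline; every name and simplification here is ours.

import re

def sentences(text):
    """Very rough sentence splitter (stand-in for the NLP suite's parsing)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def nl_relevance(im_text, documents, synonyms=None):
    """Average occurrences per sentence, over all documents, of the IM's
    key words and their synonyms (the NL module's relevance score)."""
    synonyms = synonyms or {}
    keys = set(tokens(im_text))
    for w in list(keys):
        keys.update(synonyms.get(w, []))

    sentence_list = [s for doc in documents for s in sentences(doc)]
    if not sentence_list:
        return 0.0
    hits = sum(sum(1 for t in tokens(s) if t in keys) for s in sentence_list)
    return hits / len(sentence_list)

# Illustrative call with invented text and a one-entry synonym table.
score = nl_relevance(
    "new price for the laptop order",
    ["The customer asked about laptop pricing. Order 17 is pending."],
    synonyms={"price": ["pricing", "cost"]})
print(round(score, 2))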
4 Evaluation An experiment evaluated Disruption Manager’s effectiveness in mediating interruptions compared to existing interfaces. The experiment elicits disruption by presenting interruptions while the user is engaged in a task. These interruptions present themselves as notifications that carry a message associated with them, requiring the user to switch to another task. In addition, the message associated with each interruption can either be related or unrelated to the user’s goals. The experiment evaluated Disruption Manager with respect to productivity and user satisfaction. Productivity refers to objective performance metrics such as overall goal completion and the time taken to finish an activity, task or goal, while user satisfaction refers to subjective metrics designed to evaluate the perceived user satisfaction for given tasks and overall goals. The main dependent variables were performance and perceived disruption. Several other variables were used to confirm the task was performed properly. These variables included task time, number of notifications attended to, time spent on each email, and STAI (State-Trait Anxiety Inventory for Adults) score. 4.1 Hypotheses 1. People using Disruption Manager will have higher performance than people in the No-Manager condition. 2. People using Disruption Manager will be more efficient in their task than people in the No-Manager condition. 3. People using Disruption Manager will report less perceived disruption than people in the No-Manager condition. 4.2 Method The effectiveness of Disruption Manager was assessed using a between-subjects experimental design. Subjects were randomly assigned to one of two conditions: 1) communications mediated by Disruption Manager, or 2) communications presented normally as the IM application receives them (No-Manager). In the Disruption Manager condition, the manager controls the email notifications presented based on whether the email is related to the ongoing activity and several other factors. Fig. 3 shows the filtering stages that each interruption must go through before being delivered to the user. The manager allows people to complete the task without unnecessary
distractions. That is, related IMs are presented (almost) right away so that the subjects can benefit from the IM. On the other hand, irrelevant IMs are delayed until a subtask is finished. The manager’s behavior can be summarized with the following rules (a schematic encoding of these rules follows the list):
• Relevant IMs are presented after small changes in activity, such as quick task switches, or after the subject has finished finding an item, updating values, entering text, etc.
• Irrelevant IMs are presented after subjects finish gathering data for one customer, or finish sending an email.
• Allow Instant Message notifications if relevant to the current email/customer request (active email, document, or webpage):
  o Relevant messages are presented almost immediately.
  o Wait until a finished task or a task switch.
• Delay Instant Message notifications if relevant or moderately relevant to the current email/customer request (active email, document, or webpage):
  o Wait until a task break.
  o Wait for a task switch only.
• Delay Instant Message notifications if not relevant to the current email/customer request (active email, document, or webpage):
  o Wait until the email is sent or the task is completed.
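Read literally, the rules above amount to a mapping from an IM's relevance to the active email/customer request and the user's current state onto a presentation decision. The sketch below encodes that mapping; the relevance bands and state names are labels we introduce only to make the mapping explicit, not the manager's actual implementation.

def schedule_im(relevance, user_state):
    """Return when an incoming IM should be shown, following the rules above.

    relevance: "relevant", "moderate", or "unrelated" to the active
               email/customer request.
    user_state: "micro_break" (quick task switch, item found, value updated),
                "subtask_done" (customer finished, email sent), or "busy".
    """
    if relevance == "relevant":
        # Present almost immediately, at the next small change in activity.
        if user_state in ("micro_break", "subtask_done"):
            return "present now"
        return "present at next task switch"
    if relevance == "moderate":
        # Hold until a task break or a task switch.
        if user_state in ("micro_break", "subtask_done"):
            return "present"
        return "delay until task break"
    # Unrelated: wait until the email is sent or the whole subtask completes.
    return "present" if user_state == "subtask_done" else "delay until subtask completes"

for r in ("relevant", "moderate", "unrelated"):
    print(r, "->", schedule_im(r, "busy"))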
User Task Scenario. The scenario consisted of customer service and order processing activity for an e-commerce site. We simulated a typical small business environment where customer service representatives take email orders from several customers and process each order individually, trying to satisfy the customer’s demands and complete a sale. This represents the most commonly reported task for 77 million workers who used computers at work in 2003, according to a survey by the U.S. Bureau of Labor Statistics [6]. Subjects were told that customer service representatives obtain a commission based on their sales and were instructed to play the role of a customer representative. Adding this role guaranteed that subjects would perform the task to the best of their abilities and encouraged subjects to obtain a bigger profit. The customer service scenario serves as a good proxy for representing high-level goals. Rather than specifying detailed task-based goals, the scenario presents an abstract goal requiring subjects to create their own definition of the goal based on the task constraints. The abstract task goal is to satisfy the customer demands based on their requirements. This type of setting addresses the situation where a system might not be able to identify the exact representation of the user’s goals, and relies instead on the specific concepts surrounding the user’s goals. Subjects were instructed that the experimental setup evaluated an automated system that filtered customer e-mails and an Instant Message (IM) system for price updates and requests from fellow employees. Centering the experiment on the email manager system focused subjects’ attention on the email content, rather than on their own reactions to IM. This ensured that people would react to interruptions as they normally would. User Task Description. The tasks presented in the scenario are similar to those used in a customer service task done at a large supply company in the purchasing department. Small to medium e-commerce sites still require human intervention and process customer orders in a similar way. The main task is focused on processing email orders; the task
includes different types of customers, and the importance of satisfying customer’s requests. The tasks are controlled by two systems for improving productivity: 1) An automated system that assigns different type of customers to different employees through the day, as to maintain a balanced workload. The system classifies and sorts customer emails depending on the type of service requested and the customer’s demands (high accuracy, fast response, high volume, and low-volume clients). 2) An instant messaging system that allows its employees to share pricing information with one another. Sharing information benefits the company and participants receive a bonus based on the company performance. User Task details. Each order included a short email script where customers described which items they wanted to buy and why. The short email script included enough information to convey the expectations and motivations from real clients and hint the condition type. Each email script was designed so that it would remind subjects how they should process the email and to reduce the task completion time. Subjects worked with an email client that sorted customer emails and placed them in separate folders. Each of these folders had to be completed before moving to the next one. The task required subjects to scan their email folders, decide which type of customers they might be working with and what type of service to provide these customers. The subjects then found the listed price for the items requested by each customer, and arranged the products so that the customer was able to buy as many items as possible while accommodating their preferred products; all within the customer’s budget. Subjects were encouraged to use their own intuition and taste in order get them involved in the task. Two out of four emails had artificially introduced budget errors (similarly to a customer error). Errors required subjects to drop one or two items from the order, correct the item price, and recalculate the order quantities based on the customer budget restrictions. Emails with no errors received prices updates so that the update could be reflected on the email price. The errors ensured subjects devoted enough attention to the task and to keep the task from becoming monotonous. The number of items per email order varied from 3-5 items to keep the scenario more realistic and short enough not to overwhelm subjects, and to provide enough time for interruptions in the middle of the task. Pilot Studies. Several pilot studies showed that we had indeed created a cognitively effortful scenario and that subjects should perform a minimum of two email orders before becoming familiar with the task. As a result of these pilots several changes were introduced: 1) the length of each condition was reduced to 10-15 minutes long. 2) The number of emails to be processed by condition was reduced from an earlier experiment with 6 emails to 4 emails so that subjects wouldn’t be overwhelmed with too many orders needing processing. 3) The number of interruptions was reduced from 5 interruptions per email to 3 in order to allow subjects to successfully complete the order without excessive interruptions. These pilot explorations also helped choose the quality and urgency tasks to focus on in the formal experiment. 4.3 Protocol Each mediator condition was presented in three stages, an introduction stage, a quality stage, and an urgent stage. The introductory stage served to fully familiarize subjects
with the experimental task. This within-subjects condition explored how the task is performed when the task is highly prioritized. The quality and urgent conditions were selected because they exhibited similar traits in pilot experiments. 40 subjects were randomly assigned to two conditions: Disruption Manager and No-Manager. Subjects were first presented with the interface and a walkthrough of the task based on a script previously rehearsed by the experimenter. In order to obtain a consistent response to interruptions, the walkthrough included an exemplification of potential interruptions and how they should be dealt with. A practice session allowed participants to become familiar with the interface, the content, and the interrupting messages. The practice session also allowed subjects to identify the benefits of attending to interruptions. The practice session lasted until subjects completed all questions and were satisfied with their answers. On a second practice run, timed sections were introduced so that subjects could experiment with different navigation techniques. 4.4 Results The hypothesis regarding performance was confirmed. Mediating interruptions yielded higher performance than no mediation, as supported by planned comparisons indicating a significant difference in performance based on the manager type, F(1, 37) = 473.92, p<.001. The disruption manager conditions showed a 26% and a 32.5% performance increase over no mediation for the quality and urgent tasks, respectively. In addition to the increases in individual performance, Disruption Manager subjects were better able to share pricing information by replying to IMs, thereby improving overall goal completion, or company profits. This confirmed the hypothesis that subjects under the Disruption Manager condition would be more efficient in their task. The ratio of IMs responded to was about 5 times higher for the disruption manager condition. This indicated that the manager did much better at presenting interrupting messages (relevant information) at the right time. Participants in the manager condition responded to 58% and 51% of the IMs received for quality and urgent tasks, whereas they responded to only 12% and 8% of the messages in the No-Manager condition (see Fig. 4).
Fig. 4. Instant Message Response Ratio: Subjects responded to instant messages 5 times more when using Disruption Manager while dealing with quality and urgent goals
Perceived Disruption. There was no main effect of manager type on perceived disruption, F(1,37) = .089, p = .7, nor were there any significant contrasts between Quality and Urgent tasks. Thus, our third hypothesis, that subjects would perceive less disruption, was not confirmed. Both manager categories demonstrated a similar moderate (0.5) disruptive effect across all task categories: 0.5–0.6 with no manager and 0.5 with the disruption manager. While more data might show a main effect here, the sheer fact that automated disruption management did not increase perceived disruption, and that subjects were not aware of the system being different, is clearly exciting.
5 Discussion We have presented a Disruption Management Framework and a Disruption Manager system that reduce disruption by taking into account the user’s goals, tasks, and activities. Our results show that a general computer interface capable of evaluating incoming interruptions in relation to their benefits to the user’s goals and the disruption to the ongoing task improves performance and overall productivity. Our evaluation shows a 26% and 32.5% performance increase for tasks prioritized by quality and urgency. The manager also increased goal and task efficiency by presenting relevant information at the right time and allowing participants to respond to 5 times as many interrupting messages. Furthermore, participants showed no perceived difference in disruption for IMs delayed by our intelligent system. Delaying IMs by variable amounts of time did not interfere with the user’s goals and did not increase perceived disruption. The disruption manager can in fact reduce unnecessary disruptions without reducing a person’s perceived control. This work represents a new style of interface, one that can recognize and mediate important communication effectively. The approach improves a user’s ability to work quickly or accurately by attending to what the computer knows will be more relevant.
6 Conclusions and Future Work In an increasingly complex information world, mediating information is critical, and new tools must be developed to gather information about attention from low-level data. However, we must consider that as new sensors are developed, their implementation is time consuming and user acceptance grows slowly. Therefore, domain-independent sources of information remain usable as a means to provide fail-safe disruption mediation. In this paper we used several tools developed to investigate attention and user interest from domain-independent low-level metrics. These tools demonstrate that a disruption manager is able to rely on Goals, Tasks, Implicit Metrics, Activity, Mediator Filters, Sensors and virtual sensors. These tools are bound to improve; every day, new technologies and approaches provide much-needed insight into people’s computing activities. More and more software acquires the ability to decide when it is appropriate to accept or project information to and/or from particular people, particular places, and particular projects. This will have a profound influence on the social dynamics between people and on people’s ability to accomplish their work, their homework, their social responsibilities, and even their own goals. In addition to mediating interruptions based
on goal-related information, it is important to understand how people manage interruptions and what makes interruptions be perceived as disruptive. Future work should investigate the factors that influence people’s decision process at turning interruptions into actual disruptions. For instance, interruptions relevant to user’s goals might not be perceived as disruptive, even if unrelated to the ongoing task. In these situations people might be more likely to switch to the interrupting task in order to accomplish their goals, or not. Further information is needed regarding the user task priorities and their influence on perceived disruption: How do changes in task priority affect people’s reactions to interruptions? What are the people’s strategies and reactions for dealing with interruptions?
References 1. Altmann, E.M., Gray, W.D.: Managing attention by preparing to forget. In: Proceedings of the IEA 2000/HFES 2000 Congress, pp. 152–155. Human Factors and Ergonomics Society, Santa Monica (2000) 2. Arroyo, E., Selker, T., Stouffs, A.: Interruptions as Multimodal Outputs: Which are the Less Disruptive? In: IEEE International Conference on Multimodal Interfaces (ICMI 2002), Pittsburgh, PA, pp. 479–483 (October 2002) 3. Arroyo, E., Selker, T.: Self-adaptive multimodal-interruption interfaces. In: Proceedings International Conference on Intelligent User Interfaces IUI 2003 (2003) 4. Bailey, B.P., Konstan, J.A., Carlis, J.V.: The effect of interruptions on task performance, Annoyance, and Anxiety in the User Interface. In: IEEE International Conference on Systems, Man, and Cybernetics (2000a) 5. Bailey, B.P., Iqbal, S.T.: Understanding Changes in Mental Workload During Execution of Goal-directed Tasks and Its Application for Interruption Management. ACM Transactions on Computer Human Interaction (TOCHI) 14(4), 21–56 6. Bureau of Labor Statistics Current Population Survey for Computer and Internet Use at Work, http://www.bls.gov/news.release/ciuaw.nr0.htm 7. Burmistrov, I., Leonova, A.: Do interrupted users work faster or slower? The microanalysis of computerized text editing task. In: To appear in: Proceedings of 10th International Conference on Human-Computer Interaction (HCI International 2003) (2003) 8. Chen, M., Anderson, J.R., Sohn, M.: What Can a Mouse Cursor Tell Us More? In: Correlation of Eye/mouse Movements on Web Browsing. In: Ext. Abstracts CHI 200. ACM Press, New York (2001) 9. Claypool, M., Le, P., Waseda, M., Brown, D.: Implicit interest indicators. In: Proceedings of the 6’4 International Conference on Intelligent User Interfaces (IUI 2001), USA, pp. 33–40 (2001); Covey: pp. 150–152 (1989) 10. Covey, S.R.: The Seven Habits of Highly Effective People. Simon and Schuster, Inc., New York (1989) 11. Cutrell, E., Czerwinski, M., Horvitz, E.: Notification, Disruption and Memory: Effects of Messaging Interruptions on Memory and Performance. In: Hirose, M. (ed.) HumanComputer Interaction INTERACT 2001, Tokyo, July 9-13, pp. 263–269. IOS Press(for IFIP), Amsterdam (2001) 12. Czerwinski, M., Cutrell, E., Horvitz, E.: Instant Messaging and Interruption: Influence of Task Type on Performance. In: Paris, C., Ozkan, N., Howard, S., Lu, S. (eds.) OZCHI 2000 Conference Proceedings, pp. 356–361 (2000-B)
13. Czerwinski, M., Cutrell, E., Horvitz, E.: Instant Messaging: Effects of Relevance and Time. In: Turner, S., Turner, P. (eds.) People and Computers XIV: Proceedings of HCI 2000, vol. 2, pp. 71–76. British Computer Society (2000) 14. Czerwinski, M., Horvitz, E., Wilhite, S.: A diary study of task switching and interruptions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vienna, Austria, pp. 175–182 (2004) 15. Gillie, T., Broadbent, D.: What makes Interruptions Disruptive? A study of length, Similarity and Complexity. Psychological Research 50, 43–250 (1989) 16. Gievska, S., Sibert, J.: Examining the Qualitative Gains of Mediating Interruptions during HCI. In: The Proc. of HCI 2005 (2005) 17. Groff, B.D., Baron, R.S., Moore, D.L.: Distraction, attentional conflict, and drivelike behavior. Journal of Experimental School Psychology 19, 359–380 (1983) 18. Hess, S.M., Detweiler, M.C.: Training to reduce the disruptive effects on interruptions. In: Proceedings on the Human Factors and Ergonomics Society 38th Annual Meeting, pp. 1173–1177 (1994) 19. Horvitz, E., Apacible, J.: Learning and reasoning about interruption. In: Proceedings of the Fifth International Conference on Multimodal Interfaces, Vancouver, BC, Canada, vol. 29 (November 2003) 20. Horvitz, E., Kadie, C., Paek, T., Hovel, D.: Models of attention in computing and communication: From principles to applications. Communications of ACM 46(3), 52–59 (2003) 21. Iqbal, S.T., Horvitz, E.: Disruption and Recovery of Computing Tasks: Field Study, Analysis and Directions. In: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2007), San Jose, California, USA, pp. 677–686 (2007) 22. Iqbal, S.T., Bailey, B.P.: Understanding and Developing Models for Detecting and Differentiating Breakpoints during Interactive Tasks. In: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2007), San Jose, California, USA, pp. 697–706 (2007) 23. Jackson, T.W., Dawson, R.J., Wilson, D.: The cost of email interruption. Journal of Systems and Information Technology 5(1), 81–92 (2001) 24. Latorella, K.: Effects of Modality on Interrupted Flight Deck Performance: Implications for Data Link. In: Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, Chicago, IL (October 1998) 25. Lieberman, H., Selker, T.: Out of Context: Computer Systems that adapt to, and learn from, context. IBM Systems Journal 39(3-4), 617–632 (2000) 26. Locke, E.A., Latham, G.P.: Building a Practically Useful Theory of Goal Setting and Task Motivation: A 35-year Odyssey. American Psychologist 57, 705–717 (2002) 27. Maes, P.: Agents that Reduce Work and Information Overload. Communications of the ACM 37(7), 31–40 (1994) 28. Scott, M.D., Chewar, C.M.: Attuning Notification Design to User Goals and Attention Costs. Communications of the ACM 46(3), 67–72 (2003) 29. McFarlane, D.C.: Coordinating the interruption of people in human-computer interaction. In: Sasse, A., Johnson, C. (eds.) Proceedings of Human-Computer Interaction (INTERACT 1999). IFIP, pp. 295–303, page 121. IOS Press, Amsterdam (1999) 30. McFarlane, D.C., Latorella, K.A.: The scope and importance of human interruption in human-computer interaction design. Human-Computer Interaction 17(1), 1–61 (2002) 31. Nagel, K.S., Hudson, J.M., Abowd, G.D.: Predictors of availability in home life contextmediated communication. 
In: Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, Chicago, Illinois, USA, November 06-10 (2004)
32. Rasmussen, J.: Information processing and human-machine interaction. An approach to cognitive engineering, p. 215. North-Holland, New York (1986) 33. Singh, P., Lin, T., Mueller, E.T., Lim, G., Perkins, T., Zhu, W.L.: Open Mind Common Sense: Knowledge acquisition from the general public. In: Chung, S., et al. (eds.) CoopIS 2002, DOA 2002, and ODBASE 2002. LNCS, vol. 2519. Springer, Heidelberg (2002) 34. SuwatanaPongched, P.: A More Complex Model of Relevancy in Interruptions (2003) 35. Tsukada, K., Okada, K.I., Matsushita, Y.: The Multi-Project Support System Based on Multiplicity of Task. In: Eighteenth Annual International Computer Software and Applications Conference (COMPSAC 1994), pp. 358–363. Institute of Electrical and Electronics (IEEE), New York (1994) 36. Van Bergen, A.: Task interruption. North-Holland, Amsterdam (1968)
Again?!! The Emotional Experience of Social Notification Interruptions Celeste Lyn Paul, Anita Komlodi, and Wayne Lutters University of Maryland Baltimore County 1000 Hilltop Circle, Baltimore, Maryland, United States {cpaul2,komlodi,lutters}@umbc.edu
Abstract. This paper describes a post-hoc analysis of the relationship between the socialness of an interruptive notification and the emotional tone of the words used to describe the experience through a One-Word-Response (OWR). Out of the 89 responses analyzed, 73% of participants used emotional words to describe their notification experiences. There was a significant relationship between the emotional tone of an OWR and the socialness of an interruptive notification experience, and participants were 3.2 times more likely to describe social interruptive notifications with positive words than with negative words. Keywords: Emotion, interruption, One-Word-Response, methodology, notification, user experience.
1 Introduction Previous work informs us about when during an activity it is best to interrupt users [4,15], how best to design software to manage this [3,13,14], and how to interrupt users in specialized domains [1,17]. The literature does not help us understand for what reasons users should be interrupted. Understanding “why” depends highly on understanding the notification and its context, and there is little research in this area. We set out to investigate these contextual factors through an exploratory survey. The study asked participants to report a recent interruption experience and their reaction to it. Among other questions, they were asked to describe the experience in one word. Emotion is an important aspect of the user experience and influences how users understand, interpret, experience, and interact with technology [2,6]. However, emotion can be difficult to study because it is context dependent. There is no single method that is best for studying emotion, and the best method is often determined by the scope of the study and the type of data desired [10,12]. There is a range of methods available for studying emotion, including pictogram surveys such as the Self-Assessment Manikin (SAM) [11] and Emocards [5], multimedia tools such as Experience Clip [16] and 3E [9] that allow participants to create media to express themselves, and physiological measurements such as EKG, heart rate, blood pressure, respiratory rate, and galvanic skin response [12]. The One-Word-Response technique described in this study (see Methodology) is most similar to the participant self-assessment methods used to study emotion.
2 Methodology
The purpose of the study described in this paper was to explore the interruptive notification experience, with a focus on understanding the context of the experience, and to help identify areas for further study. An interruptive notification is a notification that displays in a way that actively draws the user's attention. Emotion was discovered as an emergent factor during analysis, which inspired further post-hoc analysis. This paper presents the results of the post-hoc analysis of the relationship between the emotional tone of words elicited from a One-Word-Response question and the socialness of the interruptive notifications collected from the main study.
The main study collected 139 responses regarding a participant's recent notification interruption experience using Amazon Mechanical Turk (AMT). AMT was used as a way to gain access to a more representative population sample and as an alternative to the common approach of using college students. Participants were recruited and paid through the AMT system. Participation was limited to participants in the United States to help control for English language skills, and to participants who had at least a 95% Human Intelligence Task (HIT) approval rating (the percentage of HITs correctly completed) in order to screen out participants known not to follow instructions and to provide low-quality responses. The study instrument was a web-based form that asked participants to describe a recent interruptive notification experience through a combination of open-ended (text) and closed-ended (selection or scale) questions about the details of their experiences:
1. Describe the most recent pop-up notification you received (text)
2. Describe what you were doing at the time of the notification (text)
3. How long ago did you receive the notification? (selection)
Table 1. Description of study rounds for data collection

                                     Round 1   Round 2   Round 3   Total
All-Notifications                       20        20        20       60
Social-Notifications                    32        20        27       79
Total collected responses                                            139
Total valid responses                                                122
Total OWR emotional tone responses                                    89
4. Describe the type of notification message (e.g. new email message, software updates available) (text)
5. What application or service did the notification come from (e.g. AOL Instant Messenger, Facebook, Windows Updates)? (text)
6. Did you feel that you needed to take action or respond to the notification? (selection + text)
7. Did you take action or respond to the notification? (selection + text)
8. Rate the notification based on the following qualities: Important, Interesting, Urgent, Useful, Valuable (scale)
9. Which of these qualities is the most important to you when receiving any notification? (selection)
10. How often would you want notifications like the one you received in the future? (selection)
11. Using one word, how would you describe the notification you received? (text)

The main study was conducted in two parts. The All-Notifications part asked participants to respond based on any recent interruptive notification experience. The Social-Notifications part asked participants to respond based only on a recent social interruptive notification experience. The study instruments were the same for both parts, except that the instructions for the Social-Notifications part specified social-only interruptive notification experiences. Each study part was conducted over three rounds of AMT studies until 20 responses that followed instructions were collected. Separating the studies into multiple rounds works best for AMT studies because newer studies requiring fewer responses have better response rates than older studies requiring many responses. Table 1 describes how the study rounds were conducted.
One-Word-Response. One-Word-Response (OWR) is a word association technique consisting of a short, direct question that asks the participant to respond with a single word. Word association is an elicitation technique that aims to capture an immediate reaction rather than a thought-out response. OWR differs from other survey question types in that it is a simple question with a simple response that requires no deliberation. This question type is common, but we formalize the technique in this study. Although the study instructions asked participants to provide a "description", to our surprise, many of them gave emotional words as responses. We report the analysis of these emotional responses in this paper. Word association techniques can be susceptible to priming [8], a memory effect due to a previous influence or exposure, so the responses must be considered within the context of other influences in the study. The OWR in this study was worded as: "Using one word, how would you describe the notification?"
Table 2. Positive and negative OWR responses with frequency

Positive Words: Easy (1), Fast (1), Fun (1), Good (1), Great (1), Happy (1), Humorous (1), Necessary (1), Polite (1), Remindful (1), Simple (1), Thankful (1)

Negative Words: Annoying (15), Unimportant (3), Boring (2), Irritating (2), Again?!! (1), Bad timing (1), Bothered (1), Distracting (1), Forgettable (1), Pestering (1), Time waste (1), Untimely (1), Useless (1)
Data Quality. Responses were reviewed to ensure the collected data was about a recent notification experience and not an experience similar to, but not the same as, interacting with a notification. Each response was evaluated against two rules: first, whether the notification was part of the main task and required a response before the user could continue this main task (for example, a browser security alert requiring the user to accept a cookie before continuing); and second, whether the response was a web-based pop-up (for example, an advertisement). Responses that met either rule were discarded since those participants did not follow instructions. Of the 139 responses collected, 17 met one of these rules and were discarded, for a total of 122 valid responses.
Data Coding. Coding rules for notification socialness were developed ad hoc as part of the study design, while OWR emotion coding rules were developed post-hoc in response to the need for additional analysis.
Notification Socialness. Participants were asked to describe a recent notification experience, including what they were doing at the time of the notification and details of the notification. These details contained information about the application or service that sent the notification and what the notification was about, e.g. "An incoming message from a friend on Facebook" and "A notification that I needed to update my anti-virus software". From these descriptions, responses were coded as either social or non-social. If the notification came from a social application or service it was coded as a social notification; otherwise it was coded as a non-social notification.
OWR Emotion. OWR responses were coded based on an emotional dictionary that defined the positive or negative tone of the emotion [7]. OWRs were coded as positive for words with a positive emotional tone, negative for words with a negative emotional tone, or descriptive for words that had no emotional tone and simply described the experience. Words with an unclear emotional tone were further investigated by evaluating the context of the open-ended responses. For example, two cases of "reminder" were clearly descriptive, while one case of "remindful" was clearly a positive word when the context of the open-ended responses was considered.
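The coding step can be thought of as a dictionary lookup with a fallback to manual review of the open-ended answers. The following sketch illustrates the idea in Python; the word lists are illustrative stand-ins, not the emotional dictionary from [7].

# Minimal sketch of the dictionary-based OWR coding step; the word lists below
# are illustrative only, not the actual emotional dictionary used in the study.
POSITIVE = {"informative", "useful", "helpful", "important", "exciting", "great"}
NEGATIVE = {"annoying", "unimportant", "boring", "irritating", "useless"}

def code_owr(word: str) -> str:
    """Return 'positive', 'negative', or 'descriptive' for a One-Word-Response."""
    w = word.strip().lower()
    if w in POSITIVE:
        return "positive"
    if w in NEGATIVE:
        return "negative"
    # Words with no clear emotional tone are treated as descriptive; in the study,
    # unclear cases were resolved by reading the open-ended responses.
    return "descriptive"

responses = ["Annoying", "Informative", "Reminder", "Useless"]
print([(r, code_owr(r)) for r in responses])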
Table 3. Statistical analysis of OWR emotion x notification socialness

Notification Socialness   OWR Emotion: Positive   Negative   Total
Social                            35                 10        45
Non-social                        23                 21        44
Total                             58                 31        89

Chi-square: Value (df) = 6.376 (1), Sig. = .012
Phi: Value = 0.268, Sig. = .012
3 Results and Discussion
Even though the OWR question asked participants for a "description" of their experiences, most responses contained emotion. Of the 122 participant responses, 89 words (73%) had a positive or negative emotional tone, with 58 positive words (24 unique) and 31 negative words (13 unique). "Annoying" was the most popular negative word (n=15, 48% of all negative words). There was no single popular positive word, the top five positive words being "informative" (n=9, 15% of all positive words), "useful" (n=7, 12%), "helpful" (n=6, 10%), "important" (n=6, 10%), and "exciting" (n=4, 7%). Table 2 lists the positive and negative OWR responses. The prevalence of emotion suggests that the interruptive notification experience had a significant emotional effect on participants, such that their reactions to the interruptive notifications were emotional. Perhaps participants found it easier to draw on an emotional word to describe their experience, which supports design literature that includes emotion as part of the interaction experience.
A Chi-square Test of Independence showed a significant relationship between the OWR emotional tone (positive/negative) and the socialness of the interruptive notification (social/non-social) (6.376, df=1, p=.012). A Phi coefficient showed a significant but weak relationship between emotional tone and socialness (r=.268, p=.012). A theta odds ratio showed that social interruptive notification experiences were 3.2 times more likely to be described with a positive OWR word than non-social interruptive notification experiences. Table 3 provides a summary of the statistical analysis.
Social Interruptive Notifications. Notifications from Facebook were the most frequently reported source of social interruptive notifications (n=31). These included interruptive notifications from all features of Facebook, including chat, new mail messages, and other notices. Email (Gmail, Inbox.com, Outlook, Thunderbird, Yahoo!, but not including Facebook messages) was the second most reported source of social interruptive notifications (n=18), followed by chat (AOL Instant Messenger and MSN Messenger, not including Facebook chat, n=5). More positive words were used to describe social interruptive notifications than negative words: of the 45 social interruptive notification experiences, 35 were described using a positive emotional word compared to 10 described using a negative emotional word. Many participants expressed the social benefits of social interruptive notifications; as one participant stated, "I like knowing when someone quotes me [on a website]" (OWR "informative"). Another participant described a social obligation, "I didn't want to make my friend wait; [I responded immediately because] it was polite" (OWR "humorous"). These results suggest that social interruptive notifications are likely to be a positive experience.
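The reported statistics follow directly from the 2x2 counts in Table 3. The sketch below, assuming SciPy is available, recomputes the Chi-square (without Yates correction, which matches the reported value), the Phi coefficient, and the odds ratio.

# Sketch reproducing the Table 3 statistics from the 2x2 counts; assumes SciPy.
import math
from scipy.stats import chi2_contingency

table = [[35, 10],   # social:     positive, negative
         [23, 21]]   # non-social: positive, negative

chi2, p, df, _ = chi2_contingency(table, correction=False)  # no Yates correction
n = sum(sum(row) for row in table)
phi = math.sqrt(chi2 / n)
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

print(f"chi2={chi2:.3f} (df={df}), p={p:.3f}")   # ~6.38 (1), p ~ .012
print(f"phi={phi:.3f}")                           # ~0.268
print(f"odds ratio={odds_ratio:.1f}")             # ~3.2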
Non-social Interruptive Notifications. Software and security updates were the most common non-social interruptive notifications. Operating system updates from Windows (versions XP, Vista, and 7) and Mac OS X were the most frequently reported source of non-social interruptive notifications (n=18). Security software (Avast, AVG, Immunet, Kaspersky, McAfee, Microsoft Security, Norton) was a close second (n=15), followed by various software-related update services (Adobe n=7, HP n=4, Java n=4). However, there was no difference in the ratio of positive to negative words used to describe non-social interruptive notifications: of the 44 non-social interruptive notification experiences described, 23 were described with positive words and 21 with negative words.
A further look into the experience of non-social interruptive notifications revealed mixed feelings about receiving notifications about software updates and security services. Some participants did not mind non-social notifications because, "It is always good to know your virus protection is working" (OWR "great"). Other participants expressed dislike because, "They interfere with what I was currently doing" (OWR "irritating"). Other factors may influence why one interruptive notification can be a better experience than another. For example, the frequency of non-social interruptive notifications may be an important contextual factor. As one participant explained, "It's important to know that everything is working okay, but maybe not every day" (OWR "boring"). Another participant stated, "The less often I get [notifications], the more likely I am to listen to them" (OWR "annoying"). This suggests two things about non-social interruptive notifications. First, non-social interruptive notifications may be an emotional experience, but not an overwhelmingly positive or negative one. Second, non-social interruptive notifications may have additional factors that influence the overall experience; that is to say, context matters more for understanding the non-social interruptive notification experience.
3.1 Study Limitations
The purpose of the main study described in the Methodology was to collect information about recent experiences of interruptive notifications. The nature of our methodology may have led participants to recall rather than report their experiences. A study design with a different methodology could better support reporting and result in different OWR responses or different emotional content. As previously stated in the Methodology, emotion was not a planned study factor. Due to our main study methodology and the post-hoc nature of our analysis, we were unable to capture a baseline of participants' emotional state before and after the interruptive notification experience. This type of comparison is common in emotional design research. Finally, while the OWR and other study questions were descriptive in the way they were presented to participants, there was possible evidence of word priming. Some of the qualities participants were asked to rate in the main study appeared in the OWR responses as emotional words (e.g. useful and interesting). However, priming is not always a negative effect and could be used as a methodological strategy. For example, in order to control the context of responses, participants could be primed in a way that influences the scope of how they respond.
4 Conclusions
We found a significant relationship (Chi-square 6.376, df=1, p=.012) between OWR emotional tone (positive/negative) and the socialness of an interruptive notification (social/non-social). Although the overall strength of the relationship was weak (Phi r=.268, p=.012), a positive word was more likely (theta odds ratio, 3.2) to be used to describe a social interruptive notification experience than a negative word. Social interruptive notifications stand out as a particularly interesting result since many users' computing experiences are shifting from a work-related to a more social environment.
Our results are important for two reasons. First, they identify emotion as an important factor in the interruptive notification experience. Second, they reinforce existing literature on emotion as an important factor in interaction design. While socialness was planned as a factor to investigate further, the results presented in this paper have also led us to include emotion as a factor to examine in future work.
We plan to continue investigating the contextual factors of the interruptive notification experience. A future field study will follow a small group of users who know and interact with each other over social communication tools. The primary research method will be a software-supported diary study with secondary methods to log user and system behavior. The diary study is unique in that the software will help negotiate common challenges found in diary study methodology, such as consistency in event sampling, consistency in participant data entry, and consistency in participation over time.
References
1. Avrahami, D., Fussell, S.R., Hudson, S.E.: IM Waiting: Timing and Responsiveness in Semi-Synchronous Communication. In: Proc. ACM CSCW 2008, pp. 285–294 (2008)
2. Boehner, K., DePaula, R., Dourish, P., Sengers, P.: How Emotion is Made and Measured. Int. J. of Human-Computer Studies 65(4), 275–291 (2007)
3. Cadiz, J.J., Venolia, G., Jancke, G., Gupta, A.: Designing and Deploying an Information Awareness Interface. In: Proc. ACM CSCW 2002, pp. 314–323 (2002)
4. Czerwinski, M., Horvitz, E., Wilhite, S.: A Diary Study of Task Switching and Interruptions. In: Proc. ACM CHI 2004, pp. 175–182 (2004)
5. Desmet, P.M.A., Overbeeke, C.J., Tax, S.J.E.T.: Designing products with added emotional value; development and application of an approach for research through design. The Design Journal 4(1), 32–47 (2001)
6. Forlizzi, J., Battarbee, K.: Understanding Experience in Interactive Systems. In: Proc. Designing Interactive Systems 2004, pp. 261–268 (2004)
7. Hein, S.: Feeling Words List (2010), http://www.eqi.org/ (retrieved April 2011)
8. Hines, D., Czerwinski, M., Sawyer, P.K., Dwyer, M.: Automatic Semantic Priming: Effect of Category Exemplar Level and Word Association Level. J. Exp. Psych.: Human Perception and Performance 12(3), 370–379 (1986)
9. Isomursu, M., Kuutti, K., Vainamo, S.: Experience clip: method for user participation and evaluation of mobile concepts. In: Proc. Participatory Design Conference, pp. 83–92 (2004)
10. Isomursu, M., Tahti, M., Vainamo, S., Kuutti, K.: Experimental evaluation of five methods for collecting emotions in field settings with mobile applications. Int. J. Human-Computer Studies 65, 404–418 (2007)
11. Lang, P.J.: Behavioral treatment and bio-behavioral assessment: computer applications. In: Sidowski, J.B., Johnson, J.H., Williams, T.A. (eds.) Technology in Mental Health Care Delivery Systems, pp. 119–139. Ablex, Norwood (1980)
12. Lopatovska, I.: Researching Emotion: Challenges and Solutions. In: Proc. iConference 2011, pp. 225–229 (2011)
13. Mankoff, J., Dey, A.K., Hsieh, G., Kientz, J., Lederer, S., Ames, M.: Heuristic Evaluation of Ambient Displays. In: Proc. ACM CHI 2003, pp. 169–176 (2003)
14. Matthews, T., Dey, A.K., Mankoff, J., Carter, S., Rattenbury, T.: A Toolkit for Managing User Attention in Peripheral Displays. In: Proc. ACM UIST 2004, pp. 247–256 (2004)
15. Miyata, Y., Norman, D.A.: Psychological Issues in Support of Multiple Activities. In: Norman, D.A., Draper, S.W. (eds.) User Centered Systems Design: New Perspectives on Human-Computer Interaction, pp. 265–284. Lawrence Erlbaum Associates, Mahwah (1986)
16. Tahti, M., Arhippainen, L.: A Proposal of Collecting Emotions and Experiences. In: Interactive Experiences in HCI, vol. (2), pp. 195–198 (2004)
17. Vastenburg, M.J., Keyson, D.V., de Ridder, H.: Considerate home notification systems: a field study of acceptability of notifications in the home. Personal and Ubiquitous Computing 12, 555–566 (2008)
Do Not Disturb: Physical Interfaces for Parallel Peripheral Interactions
Fernando Olivera, Manuel García-Herranz, Pablo A. Haya, and Pablo Llinás
Dept. Ingeniería Informática, Universidad Autónoma de Madrid
C. Fco. Tomás y Valiente, 11, 28049 Madrid, Spain
{Fernando.Olivera,Manuel.GarciaHerranz,Pablo.Haya,Pablo.Llinas}@uam.es
Abstract. Interaction is, intrinsically, a multi-thread process. Supported by our various senses, our ability to speak, and the structure of our body and mind, we can get simultaneously involved in multiple interactions, using different resources for each of them. This paper analyses natural interactions and the impact of using parallel channels in peripheral interactions. Applying a similar approach to human-computer interaction, we present a Tangible User Interface proof of concept to analyze the advantages and weaknesses of parallel interaction in computer-based systems. To this end, two tangible applications (one to control profile status in social networks and one to control an Intelligent Room) are compared to their usual graphical counterparts, presenting the results of a user study and analyzing the implications of its results.
Keywords: tangible, subtle interaction, calm computing, fiducial marker, peripheral interaction, parallel interaction.
between the two". In the same way, we consider it the ground for a new interaction technology that can move back and forth between the center and the periphery of our interaction too, using periphery, as Weiser and Brown did, "to name what we are attuned to without attending to explicitly" [1].
Following the path opened by Matthews et al. [2], we have previously explored Subtle Interaction [3]. Rather than focusing on information visualization, Subtle Interaction uses a concise piece of information to build up a dialog upon context. Peripheral Interaction explores an analogous concept from a different perspective: how much of our attention do we put into the interaction itself? Both are meant to be fast, but for different reasons. In the former, the message to be sent is short (i.e. a concise piece of information to trigger a richer dialog), thus the interaction is brief. Nevertheless, while winking at our loved one might be a brief act, our interaction is (and we want it to be) completely focused on it. Conversely, Peripheral Interaction is brief because our interaction focus is somewhere else and, as in the "hand over the mug" reaction, we want to deal with it without strongly affecting the main one.
Our five senses are naturally designed for different types of interaction. Thus, our binocular stereoscopic vision is designed to focus on a particular object (contrary to the wide-range monocular vision of most prey animals), while our somatosensory system is distributed all along our body to simultaneously collect diverse stimuli from different sources. Moreover, different parts of our brain control and process each of them. Contrary to Edge and Blackwell [4], who already used the term to explore and "exploit the affordances of physicality to facilitate the engaging user experiences", we pursue a parallel non-engaging interaction channel. Thus, instead of looking for an extra channel to enrich a particular domain of interaction, we pursue a parallel one for users to deal with simultaneous domains, allowing them to naturally distribute the interactions among the channels according to the nature and importance of the tasks.
The potential of this parallel architecture is lost when most human-computer interfaces are designed for our vision (and bits of hearing) and thus collide and crowd into a single channel. As human-computer interactions expand through our life, penetrating every bit of it (e.g. planning, social networking, leisure, working or communicating), the HCI community faces the problem of dealing with itself: how many of our thoughtfully designed interfaces coexist in users' daily life. In this sense, it is critical to reduce the interaction overhead where it is not needed nor wanted. For instance, a status update in Skype (so as not to be interrupted in the middle of a movie) requires a considerable amount of interaction (i.e. opening Skype's window and selecting from a dropdown menu) compared to leaving a nearby phone off the hook. This overload can be reduced either by cutting down the interaction needed or by minimizing its impact, transferring it to underused interaction channels to reduce the load of overused ones. Physicality opens up a parallel interaction channel and thus, due to the nature of our somatosensory system, extra threads for interaction that may require low attention. Therefore, it allows performing tasks concurrently as well as moving some of them to an interaction periphery.
In addition, physical objects have a strong and inherent directedness, both for performing actions and for retrieving feedback. As when hanging a "Do not disturb" sign at the room's door, physical objects stress a direct mapping between commands and actions and stay there to remind us of their state. This article explores the possibilities of Peripheral Interaction through physical and simple actions.
2 Peripheral Interaction through Physical and Simple Actions
To study physical simple interaction we have augmented physical objects using Fiducial Markers (FM) - i.e. 2D codes designed to improve the accuracy of automatic image recognition systems - to create a special tangible object, the PolyTag (i.e. a polyhedron with FMs printed on its faces), as a proof of concept for Peripheral Interaction. Like multi-faced dice, PolyTags are human-computer interfaces in which half of their faces hold a human-readable command (a word or an icon) while the opposite faces hold an FM to be read by the computer (see Figure 1). Thus, PolyTags are objects that can be interpreted both by humans and by computers, as in Magic Cards [5], where this kind of system is used to control domestic robots.
The number of possible commands a PolyTag can hold is determined by half its number of sides: opposite to each human-readable icon there is an FM to be read by the camera embedded in the table when the human-readable side faces up to the user. Since the camera can recognize the FM identity, location and orientation, interaction can be further enriched with placement and manipulation information. Additionally, in a rear-projected tabletop, dynamic feedback can be shown to the user in the form of halos or associated labels.
Fig. 1. PolyTags are easily printed and constructed, and can be personalized with user-selected actions
Our proof-of-concept prototype consists of a glass coffee table, a webcam and a CPU (see Figure 2a). The webcam is positioned under the table, pointing upward to capture through the glass surface any object placed on the table.
When an FM is identified, a TUIO [6] message is sent to the end-user application using the reacTIVision [7] computer vision framework. Using this framework we developed two case studies, for social networking and ubiquitous control respectively. The former is intended to explore Peripheral Interaction as an unobtrusive means of parallel control. Mimicking the "hand over the cup" reaction, we analyze fixed and simple physical interactions as parallel channels for controlling out-of-focus applications, in this case controlling the status of our social network profile. The latter explores the potential of this type of interface to perform more complex interactions. Using an Intelligent Environment as the scenario, we analyze tailored physical actions enriched with movement information to control home appliances.
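reacTIVision publishes fiducial events as TUIO messages over OSC, so a thin listener is enough to map recognized marker IDs to application commands. The sketch below assumes the python-osc package and reacTIVision's default TUIO port 3333; the /tuio/2Dobj message layout follows the TUIO 1.1 specification, and the ID-to-action table is purely illustrative, not the mapping used in the prototype.

# Minimal TUIO listener sketch (assumes the python-osc package and reacTIVision
# sending TUIO 1.1 messages on the default port 3333).
from pythonosc import dispatcher, osc_server

ACTIONS = {12: "status:happy", 13: "status:upset", 20: "power", 21: "select"}  # illustrative

def on_2dobj(address, *args):
    # TUIO /tuio/2Dobj "set" messages carry: session id, fiducial id, x, y, angle, ...
    if args and args[0] == "set":
        session_id, fiducial_id, x, y, angle = args[1:6]
        action = ACTIONS.get(fiducial_id, "unknown")
        print(f"marker {fiducial_id} at ({x:.2f}, {y:.2f}), angle {angle:.2f} -> {action}")

disp = dispatcher.Dispatcher()
disp.map("/tuio/2Dobj", on_2dobj)
server = osc_server.ThreadingOSCUDPServer(("127.0.0.1", 3333), disp)
server.serve_forever()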
Fig. 2. (a) The fiducial table has a clear surface and the webcam is positioned below looking upwards. (b) The intelligent environment schema drawn on the table contains a representation of all configurable devices. To interact with a device, the 'Actions' PolyTag must be placed on top of its representation.
2.2.1 Social Networking
The 'Mood' PolyTag (see Figure 1), a twelve-sided dodecahedron, is intended to set the user's status on social networks. Each human-readable face holds a different emoticon expressing a feeling (happy, upset, angry, and so on). When the PolyTag is placed on the table, the status of different social platforms such as Twitter, Facebook, or Buzz is updated according to the emoticon facing up. This task, which usually requires opening the social app on a mobile phone and typing in a message, is performed using the 'Mood' PolyTag with minimal attention, thus minimizing the loss of focus suffered in the main activity (e.g. talking, reading or watching TV). Since the FMs are responsible for the actions, different users can have their own replica of the 'Mood' PolyTag. While these PolyTags will share the same icons, their different FMs will allow the system to identify not only the action to be performed but also the user commanding it.
2.2.2 Ubiquitous Control
The 'Actions' PolyTag is a six-sided die used to control the environment. For this prototype, a schematic representation of the intelligent space is drawn on the table's surface (see Figure 2b), showing all interactive elements that can be controlled with the PolyTag. The PolyTag allows for three actions: 'Power', 'Select' and 'Off'. Power and Select are
movement-enriched actions, allowing respectively to turn the volume/intensity up and down and to change the channel/mode by rotating the PolyTag once it is placed on top of the representation of the element to be controlled. As an example, to turn on the TV and search for Channel 4, the user places the PolyTag, with the 'Power' face looking up, on top of the TV icon on the table. On posing the PolyTag, the TV turns on, and by rotating it the volume is adjusted. Then, turning the PolyTag so the 'Select' face looks up, the user can channel-hop by rotating it as when adjusting the volume. When similar operations are conducted over other elements such as heaters or lights, the Power and Select operations adapt their meaning to them, thus controlling temperature or light intensity and changing between cooling and heating modes or selecting light color. Integration with the intelligent environment is achieved using the context-based architecture for intelligent environments described in [8].
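Conceptually, the movement-enriched actions reduce to a small dispatch step: the face that is visible selects the operation, and the change in the marker's angle since the last update drives the adjustment. The sketch below illustrates this under illustrative assumptions; device names, step sizes, and the angle handling are not taken from the actual prototype.

# Illustrative sketch of dispatching movement-enriched PolyTag actions.
import math

class ActionDispatcher:
    def __init__(self):
        self.last_angle = {}  # session id -> last observed angle (radians)

    def update(self, session_id, face, device, angle):
        delta = angle - self.last_angle.get(session_id, angle)
        # unwrap the angle difference across the 0 / 2*pi boundary
        if delta > math.pi:
            delta -= 2 * math.pi
        elif delta < -math.pi:
            delta += 2 * math.pi
        self.last_angle[session_id] = angle

        if face == "power":
            print(f"{device}: adjust volume/intensity by {delta:+.2f} rad")
        elif face == "select":
            steps = round(delta / (math.pi / 6))  # one step per 30 degrees of rotation
            print(f"{device}: change channel/mode by {steps} steps")
        elif face == "off":
            print(f"{device}: off")

d = ActionDispatcher()
d.update(7, "power", "tv", 0.0)   # PolyTag placed on the TV icon, 'Power' face up
d.update(7, "power", "tv", 0.6)   # rotating it adjusts the volume
d.update(7, "select", "tv", 1.1)  # 'Select' face up: rotation channel-hops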
3 Evaluation
We performed a preliminary experiment to show how tangible interfaces can be used to reduce the attention load of certain tasks. The experiment was conducted with 6 males and 4 females from 23 to 28 years old. The population was chosen so that all of them had programming skills and experience using the Internet and social networks. We chose a within-subjects design in which each participant had to perform an attention-demanding task interrupted by a secondary task that had to be accomplished using two different user interfaces, presented one after the other in random order.
Subjects were asked to count how many times a particular vowel appeared in a given text, to force them to focus on an attention-demanding task. While counting vowels, they were asked to carry out some extra task related to the aforementioned scenarios, such as changing their status on Twitter or controlling an intelligent environment (turning the lights on/off, browsing TV channels, and so on). These interruptions were randomly distributed over each trial and accomplished using either the tangible interface proposed in this article or a graphical user interface (a browser logged into a Twitter account and a web-based application for controlling the smart home [9]). The order in which each participant performed the trials was counterbalanced. In addition, to minimize learning effects, different texts of similar complexity were used in each trial. As dependent variables, both error rate (ratio between errors and number of words) and performance (words per second, WPS) were measured (see Table 1).

Table 1. Experiment results of the ten subjects including the mean and standard deviation of the whole group

Subject              #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Mean (Std. Dev)
Fiducial Version
  Error Rate (%)     0     3     1.5   5     1     0     2     4     6     1.5   2.4 (±2.1)
  WPS                1.4   0.7   3.3   1.1   1.1   0.8   1.3   1.0   1.4   1.6   1.4 (±0.7)
GUI Version
  Error Rate (%)     6.4   0.6   10.4  6.9   1.7   1.7   2.9   7.5   5.2   2.9   4.6 (±3.2)
  WPS                1.4   0.8   2.5   0.9   0.9   0.5   1.1   0.9   1.2   1.3   1.2 (±0.6)
Since the data were non-normally distributed, the Wilcoxon Signed Ranks Test for related samples was chosen. The analysis shows that participants made marginally significantly fewer errors in the main task (counting vowels) when using the tangible interface than when using the GUI, Z(10) = -1.886, p = 0.059, r = 0.242. Additionally, performance in the tangible trials was significantly better than in the GUI ones, Z(10) = -2.514, p = 0.012, r = 0.974.
Both GUIs were selected to require an amount of user interaction time similar to the PolyTag interface. On the other hand, the attention load they require pays its price in the number of errors users make in the main task, since users lose their concentration more easily when using the GUI interface than when using the tangible one. While the two scenarios present an increasing degree of complexity in the interaction, further investigation is required to study the limitations and implications of complex interactions on users' attention.
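The paired comparison can be reproduced in outline from the per-subject values in Table 1. The sketch below assumes SciPy; note that scipy.stats.wilcoxon reports the W statistic and a p-value rather than the Z score quoted above, so the outputs are comparable in direction but not numerically identical.

# Sketch of the paired Wilcoxon comparison using the Table 1 values; assumes SciPy.
from scipy.stats import wilcoxon

fid_error = [0, 3, 1.5, 5, 1, 0, 2, 4, 6, 1.5]
gui_error = [6.4, 0.6, 10.4, 6.9, 1.7, 1.7, 2.9, 7.5, 5.2, 2.9]
fid_wps   = [1.4, 0.7, 3.3, 1.1, 1.1, 0.8, 1.3, 1.0, 1.4, 1.6]
gui_wps   = [1.4, 0.8, 2.5, 0.9, 0.9, 0.5, 1.1, 0.9, 1.2, 1.3]

print(wilcoxon(fid_error, gui_error))  # error rate: fiducial vs. GUI
print(wilcoxon(fid_wps, gui_wps))      # words per second: fiducial vs. GUI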
4 Related Work
HCI researchers have previously explored subtle interaction in peripheral displays [2][4]. These ambient displays have shown how computers can give information in a subtle way. Similarly, as we have discussed throughout this article, we believe that users should also be able to give simple orders to computers in an unobtrusive way, allowing them to command simple actions while keeping their focus on other tasks.
Fiducial-based approaches have been used previously in the field of Augmented Reality [10] as well as for transforming physical objects into tangible interfaces (usually as physical icons, called phicons in the field of TUI). D-Touch [11] showed how to develop low-cost tangible interfaces for musical applications using fiducial tags. Other musical systems have used other passive technologies such as RF tags [12] or Near Field Communication (NFC) to send messages (usually an ID). While RF tags are considered a low-cost solution, they are not as affordable as print-based technologies such as Fiducial Markers. In this sense, further research in tabletops explores the use of Fiducial Markers, usually as a way to easily integrate object recognition in GUIs, as shown in the reacTIVision framework [7], which was also developed to explore musical tangible instruments. Besides music, Fiducial Markers have previously been used to control information in ambient displays of intelligent environments too [13]. Nevertheless, our contribution follows a different direction, looking for subtle ways to communicate with computers rather than for subtle ways in which they communicate with us.
Finally, tangible interfaces' relationship with social networks has also been explored in the form of simple peripheral displays [14] or through limited interfaces [15]. Rather than aiming to retrieve information from or interact with our digital social networks in enhanced ways [16], we seek new channels to interact with them, channels that can parallelize some of the simple interactions they require (such as updating our status) with other concurrent activities. This kind of parallel channel has been briefly explored in art and school projects using wearable technologies, such as shoes sending information about our steps to Twitter [17] or registering events on Facebook through RFID-tagged shoes [18].
5 Conclusions
This paper analyses the diversity of interactions in our nature and, focusing on Peripheral Interaction, aims to provide a similar approach to human-computer interaction. Exploiting the physicality of tangible user interfaces, we explore the necessity of, and means to, enrich human-computer interaction with parallel threads to allow peripheral interactions to take place without strongly affecting our focus. We have argued the benefits and limitations of physicality and based our design on the directedness of its actions and feedback (i.e. straight and explicit commands and constant feedback of its state).
We have then presented the PolyTags which, using Fiducial Markers, remove any need for a power supply, reducing both unit and maintenance costs. Additionally, they are easy to use and modify and can be built from paper or cardboard at a very low cost, thus being a customizable and versatile solution. Since the same icon can be assigned to different Fiducial Markers, we can preserve a human-readable image across different PolyTags while allowing the computer to distinguish between them. Therefore, as an example, members of a family can each use a 'Mood' PolyTag in which the emoticons are constant while the system is able to recognize who its owner is.
Focusing on the input capabilities of Peripheral Interaction rather than on the output ones, we have built two demonstrators: a) a status updater for social networks, and b) a tangible UI for controlling an intelligent environment. We have evaluated them as a means for parallel interaction in terms of how they affect the effectiveness (number of errors) and efficiency (time spent) of a concurrent task and compared these results with traditional GUIs. The results show a clear increase in the effectiveness of the concurrent task and a small improvement in terms of efficiency. While the weak correlation in effectiveness requires further experiments to make stronger assertions, the results incline us to think that the parallel channel provided by the PolyTags helps in reducing attention needs, thus improving the effectiveness and efficiency of the main task. Additionally, we hypothesize this attention reduction is due to taking advantage of an underused interaction channel (the somatosensory system) to reduce the load of the main one (sight), since while the identification of the action to perform remains visual, positioning (contrary to the mouse cursor of the GUI approach) is tactile-supported.
This article has shown that through physical interaction we can open natural interaction channels to distribute the interaction load of simultaneous tasks, resulting in an equally fast interaction with less loss of attention.
Acknowledgments. This work has been partially funded by the following projects: ASIES: Adapting Social & Intelligent Environments to Support people with special needs (TIN2010-17344), Vesta (TSI-020100-2009-828) and e-Madrid (S2009/TIC-1650).
References
2. Matthews, T., Dey, A.K., Mankoff, J., Carter, S., Rattenbury, T.: A toolkit for managing user attention in peripheral displays. In: 17th Annual ACM Symposium on User Interface Software and Technology, pp. 247–256. ACM, New York (2004)
3. Olivera, F., García-Herranz, M., Haya, P.A.: Subtle Interaction for Ambient Assisted Living. In: II International Workshop on Ambient Assisted Living, IWAAL 2010, CEDI 2010 (2010)
4. Edge, D., Blackwell, A.: Peripheral Tangible Interaction by Analytic Design. In: TEI 2009, pp. 69–76 (2009)
5. Zhao, S., Nakamura, K., Ishii, K., Igarashi, T.: Magic cards: a paper tag interface for implicit robot control. In: CHI 2009, pp. 173–182. ACM, New York (2009)
6. Kaltenbrunner, M., Bovermann, T., Bencina, R., Costanza, E.: TUIO: A protocol for tabletop tangible user interfaces. In: 6th Int'l Workshop on Gesture in Human-Computer Interaction and Simulation (2005)
7. Kaltenbrunner, M., Bencina, R.: ReacTIVision: a computer-vision framework for table-based tangible interaction. In: TEI 2007, p. 74. ACM, New York (2007)
8. Haya, P.A., Montoro, G., Alamán, X.: A prototype of a context-based architecture for intelligent home environments. In: Chung, S. (ed.) OTM 2004. LNCS, vol. 3290, pp. 477–491. Springer, Heidelberg (2004)
9. Gómez, J., Montoro, G., Haya, P.A.: iFaces: Adaptative user interfaces for Ambient Intelligence. In: IADIS International Conference on Interfaces and Human Computer Interaction (2008)
10. Rekimoto, J., Ayatsuka, Y.: CyberCode: designing augmented reality environments with visual tags. In: DARE 2000 on Designing Augmented Reality Environments. ACM, New York (2000)
11. Costanza, E., Shelley, S.B., Robinson, J.: D-touch: A consumer-grade tangible interface module and musical applications. In: HCI 2003, pp. 8–12 (2003)
12. Patten, J., Recht, B., Ishii, H.: Audiopad: a tag-based interface for musical performance. In: New Interfaces for Musical Expression. National University of Singapore (2002)
13. Bovermann, T., Hermann, T., Ritter, H.: A tangible environment for ambient data representation. In: 1st International Workshop on Haptic and Audio Interaction Design, vol. 2, pp. 26–30 (2006)
14. McPhail, S.: Buddy Bugs: A Physical User Interface for Windows Instant Messenger. In: Western Computer Graphics Symposium, Skigraph 2002 (2002)
15. Peek, N., Pitman, D., The, R.: Hangsters: tangible peripheral interactive avatars for instant messaging. In: TEI 2009, pp. 25–26. ACM, New York (2009)
16. Kalanithi, J.J., Bove Jr., V.M.: Connectibles: tangible social networks. In: TEI 2008, pp. 199–206. ACM, New York (2008)
17. O'Nascimento, R., Martins, T.: Rambler, http://www.popkalab.com/ramblershoes.html
18. Lemhag, H., Naslund, M., Andersson, A., Bengtson, K., Madonia, P., Gustafsson, B.: WESC Karmatech Concept, http://projeqt.com/piermadonia#lsi8859ci2134q
Information to Go: Exploring In-Situ Information Pick-Up "In the Wild"
Hannu Kukka1, Fabio Kruger1, Vassilis Kostakos2,1, Timo Ojala, and Marko Jurmu1
1 MediaTeam Oulu, University of Oulu, Finland
[email protected]
2 Madeira Interactive Technologies Institute, University of Madeira
Abstract. This paper presents a case study on the iterative design of a system for delivering in-situ information services to users' mobile devices using proximity-based technologies. The design advances from a questionnaire study of users' attitudes and needs toward such information services, via several incremental prototypes evaluated in a usability lab and at a university campus, to the final version subjected to longitudinal evaluation "in the wild" in a city center. The final prototype is a hybrid interface where users can select from an interactive public display the information services to be downloaded to their personal mobile devices over a no-cost Bluetooth connection. The results include an empirical comparison of different models for delivering such information services, and a quantitative analysis of the usage of the system by the general public over a period of 100 days. Our findings suggest that multiple environmental factors strongly affect the usage of the system. Furthermore, the usage varies distinctly between different contexts, and there is a strong correlation between location and usage patterns. Finally, we present a number of guidelines for designing and deploying this type of hybrid user interface.
Keywords: public interactive displays, smartphones, longitudinal study, Bluetooth, urban computing, ubiquitous computing.
This paper presents the iterative design of an effective interaction model for in-situ "information pick-up" utilizing Bluetooth as the main technology, and an understanding of users' appropriation of such technology over a long period of time. The presented system, called BlueInfo, has evolved over a series of prototypes and user tests described in this paper. An initial questionnaire study to uncover attitudes and existing practices was conducted at bus stops, and the results were used as a backbone for the prototypes. The first prototype, tested in a usability lab, utilized a pull-based interaction model of content acquisition employing parameterized textual commands sent from a mobile device to a Bluetooth access point. The second prototype, tested on a university campus, utilized a push-based interaction model where users could selectively request relevant content to be pushed to their devices when passing a Bluetooth access point. The third prototype, also trialed on campus, featured a proximity-based interaction model, where physical artifacts in the environment were used to push content to users who brought their device very close to the artifact, thus functioning as a hybrid push/pull interaction model. The fourth and final prototype is a hybrid interface comprising interactive public displays and mobile devices. It was deployed for a period of 100 days "in the wild" using a network of 12 large public displays called UBI-hotspots (later: hotspots; [25], Fig. 1), situated in highly public indoor and outdoor locations around downtown Oulu, Finland. The final prototype allows users to select which content they want to download from a UI on an interactive public display, and to continue the interaction on their mobile devices once the selected content is delivered. The displays are available for 24/7 use by the general public, and offer a number of information and entertainment services to users through their touchscreen interface. The focus of this paper is not on the public display architecture or service selection, however, but rather on the iterative design of the BlueInfo service and, more importantly, on the longitudinal real-life deployment and evaluation of the final version of the system.
Fig. 1. Outdoor UBI-hotspot (left) and hotspot locations on map (right)
The main contributions of this paper are: i) an empirical evaluation of several incremental prototypes utilizing different interaction models for Bluetooth-based information retrieval; ii) identification of multiple environmental factors, such as the day of the week, hour of the day, and weather conditions, that strongly affect the
usage of the deployed real-world system; iii) demonstration that usage varies distinctly between indoor and outdoor hotspots set in different types of locations; iv) a number of guidelines and lessons learned for the design and deployment of such hybrid interfaces.
The rest of this paper is structured as follows: Section 2 covers related work in the field of interactive public displays, hybrid interfaces utilizing mobile devices and public displays, and Bluetooth-based information retrieval. Section 3 introduces the BlueInfo system, and briefly discusses the infrastructure on top of which it is deployed. Section 4 describes different BlueInfo prototypes and their evaluation in different settings. Section 5 presents the results and Section 6 discusses the implications of the findings. Section 7 concludes the paper.
2 Related Work
Large public displays have gained popularity in industry and research due to their reduced cost and high visual impact [33] in relation to other elements of the urban infrastructure [13]. Such displays can be categorized into reference displays and interactive displays. Reference displays are designed for unidirectional broadcasting of digital information and signage. They require relatively little setup effort, but often suffer from short attention spans [10] and so-called display blindness [21]. Interactive displays are mostly research-driven projects, although some commercial interactive display installations such as the BBC Big Screens (http://www.bbc.co.uk/bigscreens) exist.
A number of important findings have been established in relation to public displays and their in-situ use. For instance, their use typically gives rise to emerging social interaction patterns, with interaction roles such as mentoring and ad hoc collaboration [28]. In addition, ongoing interaction on a public display serves as an attention incentive for attracting other users, a phenomenon known as the honeypot effect [4, 11, 28]. Studies on the use of public displays during pedestrian navigation [20] have shown that displays were useful in the planning stage (due to the displays' increased capacity), as well as when straying from the planned route, suggesting that public displays support information foraging [27]. Furthermore, studies have shown that networks of public displays used for broadcasting [30] must emphasize location-aware presentation of the content, thus adhering to the calm aesthetics principle [34].
A sub-category of public interactive displays is information kiosks, which usually feature certain input mechanisms in addition to touch screens, such as keyboards. Examples include the Kimono kiosk [9], and the work presented in [22]. The system presented in this paper has similarities to such information kiosks, but external input is limited mainly to that from nearby mobile devices [2].
A substantial body of work has considered how mobile devices can cooperate and interact with public displays. The increased visual capacity of a large display combined with the mobile device functioning as a private GUI and input channel has been investigated from several perspectives, including distributed multi-user access to a single public display with personal mobile phones [29]; migratory user interfaces capable of traversing among different devices, maintaining the state of the application and enabling continuous interactivity regardless of the terminal used [3]; transitioning of UI elements between heterogeneous device types through UI rendering engines and
high-abstraction description languages [23]; and application composition, where architectures commonly make independent decisions on the physical and logical composition of the application and its logical parts, including the presentation layer, during runtime rather than at design-time or compile-time [26].
In addition to serving as complementary interaction spaces, mobile devices can also act as a control mechanism in conjunction with proximity-based wireless technologies. More specifically, Bluetooth has often been suggested as a suitable technology for controlling public displays, due to its high market penetration and consumer awareness: 46.7% of mobile phones in 2007 had Bluetooth transceivers, and approximately 81% of consumers are aware of Bluetooth technology [5]. Continuous scanning of Bluetooth devices has been utilized as presence information [12, 24], and along with the friendly name of a device was used to create instant places, which are identities of users in the same space visualized on a public display. The idea of using Bluetooth friendly names as a control channel was developed further in [8], where users could request several different services on public displays in a campus setting using command parameters in their Bluetooth friendly names.
Proximity technologies such as Bluetooth and NFC [14] have also been utilized for exchanging data with users. In [7] users were able to exchange photos with a public display, and the study's findings suggest that users react positively to the idea of being able to upload and download photos from/to a situated display. Similarly, the work in [6] describes a rule-based context-aware system that delivers information to smartphones using Bluetooth. In general, wireless proximity technologies are well suited for contextual content dissemination, defined as a process achieved either at the user's request (pull) or on the source's own initiative (push). Reportedly, the push model is more efficient when multiple clients are present, and the pull model is more appropriate for a small number of clients; however, a combination of both is typically proposed [1, 32].
3 Conceptual Design and Implementation
The system presented in this paper proposes a hybrid interface for delivering contextual information to pedestrians in a city centre. This system draws inspiration from much of the work described above. Specifically, this work is partly inspired by findings which suggest that users are happy to download files from public displays [7], as well as by work using proximity-based technologies to infer the presence of users [8, 15]. In addition, it builds on the fact that displays are well suited for information pick-up, and also attempts to capitalize on the honeypot phenomenon to entice further users to use the system. The result is a service, BlueInfo, where users, via direct manipulation of a public display, can request a small dataset to be sent directly to their phones via Bluetooth. In this model, users are initially enticed by the display's information foraging affordances [10], must physically interact with the displays (thus enticing additional users) [4, 11, 28], are able to take advantage of the display's increased visual capacity for planning purposes in preparing their dataset [10], and can use their mobile devices as extended interaction spaces to take the information with them. This model of interaction is akin to in-situ information pick-up.
BlueInfo is an architecture that employs an in-situ information pick-up model of interaction, allowing for multiple types of information delivery mechanisms including push, pull, and hybrid. The architecture bridges WPAN (wireless personal area network) and WAN (wide area network) connectivity in a way that allows for real-time content delivery from Internet sources to the personal mobile device of a user, with the last leg of delivery using a cost-free Bluetooth connection. Thus, the content is dynamic (i.e. not static content uploaded to the BT access point), and due to well-formed APIs, third-party content can be easily integrated. BlueInfo fetches service content in real time from the origin servers on the Internet, and once delivered to the personal device the content is available for viewing at any time, without need of further connectivity.
In our implementation we use Bluegiga's Bluetooth Access Server 2293-56-EXT, which has three Bluetooth transceivers with external antennas, runs embedded Linux, and can support up to 21 simultaneous users. This hardware is connected to the Internet via WiFi or LAN. Using the same hardware configuration, multiple interaction models were developed and iterated over a number of prototypes described in the next section.
4 Iterative Design
The design of BlueInfo progressed as follows. First, a questionnaire study was conducted at bus stops aimed at understanding users' information needs and preferences during idle time and their preferences regarding push vs. pull modes of delivery. Then, three incremental prototypes with alternative delivery modes were developed and evaluated in different experimental settings in a usability lab and at a university campus. The fourth and final prototype was deployed in a real-world setting in a city center for a longitudinal study "in the wild".
Questionnaire study. The study was conducted at four different public transport stops. These were selected due to the perceived idle time people experience while waiting for a bus, thus possibly being open to receiving content on their phones. People were asked to fill in a questionnaire collecting demographic information and information about mobile device use while at the bus stop (e.g. browsing the web, calling, SMS). Finally, the questionnaire probed users regarding how they would prefer to access services such as those provided by BlueInfo [6].
On-campus prototypes. Three prototypes were evaluated at a university campus: BlueInfo Pull, BlueInfo Push, and the "Easter egg" probe.
BlueInfo Pull was tested in a usability lab to evaluate usability and user acceptance. The interaction model of this prototype required users to send text-based requests (not SMS) over Bluetooth to a BlueInfo access point, which would parse the request and subsequently respond with the requested information fetched from Internet sources. The comparative study required users to complete five information-seeking tasks using a mobile Internet browser with a WiFi connection, and then perform the same tasks by pulling the information from BlueInfo [18].
BlueInfo Push was tested as an open trial on a university campus. Users were provided with a website to sign up and pre-create their own 'daily message', comprising
492
H. Kukka et al.
of selected services, which would be then pushed to their device when passing a BlueInfo access point. Users could request the information to be sent during AM hours, PM hours, at any time, and select the number of times per day that he/she would like to receive a message, for example once in the morning and once in the afternoon, only once per day, or even every time he/she passes an access point. Seven BlueInfo access points were deployed in the restaurants, cafés, and corridors around the campus. The study was advertised on student email-lists and with posters around campus. The data collected addressed performance and user preferences. Follow-up interviews were conducted with the participants after the study. Tests with the ”Easter egg” probe were conducted just before the Easter break. An Easter egg (Fig. 2) enclosing the BlueInfo access point was built from paper maché and chicken net wrapped in a colorful paper. The egg was placed on a table at a busy cafeteria on the campus. The purpose of this study was to evaluate recognizable and fun physical user interfaces for enabling users to request content via proximity-based interaction. To do so, users had to place their mobile device very close to the egg, which used received signal strength indicator (RSSI) to detect such events. When signal strength exceeded a predetermined threshold, an image wishing "Happy Easter" was sent to the phone. Participants from this study were recruited by the sheer presence of the egg itself, as well as leaflets distributed in the cafeteria.
We also constructed and deployed another proximity-based probe for a student party on campus. This probe was a dummy built to resemble a student (Fig. 2), and users could interact with the probe by placing their phone in the dummy's pocket. The experiment was cut short, however, as the dummy was taken to a sauna by some students during the first night of the trial and has not been seen since.

Longitudinal deployment in the wild. The final prototype of BlueInfo provides users with a hybrid interface for an in-situ information pick-up model of interaction. In this prototype, users select from the interactive display of a hotspot the information they wish to download to their personal device over a Bluetooth connection. To achieve this, the system continuously scans for nearby Bluetooth devices, and
populates an on-screen list of discovered devices. Users can browse a directory of available information services on the interactive display, including bus schedules, weather forecasts, news, TV programming, and movie listings, and indicate which services they want to download to their phone (Fig. 3). At any time during their use of BlueInfo, users may click the on-screen "Download to my device" button. They are then instructed to identify their phone from the list of discovered devices, which initiates an OBEX Push connection to the specified device. Users are also advised that if their device is not present in the list of discovered devices, they should set their Bluetooth device to discoverable mode.
Fig. 3. Hotspot user interface (BlueInfo in lower left quadrant)
BlueInfo was deployed for 24/7 use by the general public on 12 hotspots situated in highly public locations around the downtown area of the city of Oulu (Fig. 1). The data presented in this paper was collected during a period of 100 days of usage by the general public (i.e. not by recruited test users). The data reflects the types of information downloaded, along with the date, time and location of these interactions. Additionally, the BlueInfo access point in each hotspot collected traces of passing-by Bluetooth devices, regardless of whether they actually used the BlueInfo service. The traces allow us to explore how often each device (user) visited a hotspot in general and with respect to their BlueInfo usage. Further, three researchers individually went through the list of Bluetooth friendly names of the devices, classifying each device (user) as either male, female, unknown, or factory default based on the friendly name. Thus, devices with a friendly name such as "John's phone" were classified as male, devices with a name such as "Jane's phone" as female, devices with a name such as "Angry badger" as unknown, and devices with a name such as "Nokia N97" as factory default. While this method of identifying users is somewhat inaccurate, it can be used to gain further insight into gender-specific usage patterns.
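The trace collection can be illustrated with a short sketch using the PyBluez library; the CSV trace format shown here is an assumption and not necessarily the format used on the access points.

import csv
import time
import bluetooth  # PyBluez

def scan_once(writer):
    """Run one inquiry scan and append (timestamp, address, friendly name) rows."""
    nearby = bluetooth.discover_devices(duration=8, lookup_names=True)
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for address, friendly_name in nearby:
        writer.writerow([timestamp, address, friendly_name])

if __name__ == "__main__":
    with open("bt_traces.csv", "a", newline="") as trace_file:
        scan_once(csv.writer(trace_file))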
5 Results

5.1 Questionnaire study

A total of 105 respondents (51 female) completed the questionnaire, with more than half of them being 20-30 years old. Most respondents (80%) claimed to use their mobile device while waiting for the bus. Overall, 60% claimed to use their device for messaging purposes, 41% for making phone calls, 35% for entertainment (music, games), and 5% for online access. While most respondents had a Bluetooth-enabled device (73%), only 11.5% had their Bluetooth turned on and set as discoverable. Of those who explicitly disabled their Bluetooth, 50% claimed security concerns and 37% power consumption concerns, while 13% gave no reason. Regarding their preferred way of receiving digital content on their mobile devices, 31% claimed they wanted to retrieve the content themselves (i.e. pulling content), 32% claimed they wanted to be pushed information if they had explicitly registered beforehand, 4% wanted to be pushed information without prior registration, and 33% claimed that they did not care as long as the information was relevant to them.

A chi-squared analysis revealed a significant association between demographics and device usage practices. Specifically, those more likely to use their mobile phones for texting while waiting at a bus stop were women (χ2=6.507, df=1, p<0.05) and those aged 10-20 (χ2=21.69, df=4, p<0.01). In general, participants aged 10-30 were much more likely to use their mobile device while waiting for the bus (χ2=11.04, df=4, p<0.05). Further analysis revealed a significant association between waiting time and use of mobile devices, with those waiting only between 5 and 10 minutes less likely to use their devices (χ2=12.83, df=4, p<0.05).

5.2 On-campus prototypes

BlueInfo Pull. The comparative task-based study conducted in a usability lab assessed the relative performance of a mobile web browser versus BlueInfo in five information-seeking tasks such as finding a bus timetable. On average, BlueInfo Pull reduced the number of required clicks by 42% across all five tasks. Similarly, task execution times were reduced by 35% across the five tasks, and the success rate improved by an average of 54%. After completing the tasks, test users were asked to fill out a questionnaire with 15 statements on a 5-point Likert scale. Users indicated a strong tendency to accept the system in their questionnaire answers. The statement "the service was useful" scored an average of 4.4, the statement "learning to use the service was easy" scored an average of 4.5, and the statement "the option to use the service on a mobile phone is a good thing" an average of 4.7. Users also considered the transfer times of messages to be fast, as the average response and download time was between 20 and 30 seconds, or up to 120 seconds with video trailers. For a detailed report on the BlueInfo Pull prototype see [18].

BlueInfo Push. A total of 33 participants took part in this study for a period of one month. Overall, 336 daily messages were sent during the study. On average, each user received 10 messages during the testing period, with the most active user receiving 31 messages and the least active user receiving 4. The level of activity
varied throughout the study, with the maximum number of messages sent during one day being 37, while the minimum was 5. The available information and entertainment services, along with the number of users who included these services in their preferences, are listed in Table 1.

Table 1. BlueInfo Push services and users

Category                Service/provider     # users
Word of the Day         Merriam-Webster      11
Word of the Day         Urban Dictionary     18
News                    YLE                  2
News                    Reuters              8
News                    Kaleva               20
Weather                 FMI                  21
TV programs             telkku.com           22
Comics                  PhD Comics           3
Comics                  Calvin and Hobbes    4
Comics                  Viivi & Wagner       7
Other: Music charts     last.fm              10
Other: Horoscope        horoscope.com        5
During follow-up interviews participants expressed satisfaction with the service, especially noting the fact that the message would remain in the phone's inbox so that the information would be available for later reference. Participants often remarked that the news stories made for good reading during boring lectures or on the bus ride home. One participant even mentioned that the daily message made for perfect restroom reading material. A source of discontent was the requirement to pair the mobile phone with each individual BlueInfo access point: if the device was not paired with the access point, a confirmation prompt would be shown on the device, and the message would not be received unless the user explicitly gave permission. As the prompt does not alert the user with any audio/tactile feedback, it can easily be missed if the phone is carried in a pocket or bag. Another issue was that, due to technical restrictions, the system could not accurately verify whether a message was successfully received; this could happen when users moved beyond the access point's range during message transmission. This resulted in the system marking the message as sent even though the user had not received it. This issue was fixed after the first week of testing.

"Easter egg" probe. The Easter egg probe was operational for four days, during which a total of 104 messages were sent to 97 unique devices. During the study, we observed the colorful egg raising interest in passers-by, many of whom approached the egg to read the usage instructions printed on leaflets around the object. People approached the egg in company more often than alone, and preferred to interact with it while others were present. However, due to system delays, not all attempts at interaction were successful, as people withdrew their phone before the message could be transmitted. The egg also served as a conversation piece, with people gathering around it to chat and, at times, to test the boundaries of the system by slowly inching their device closer to the egg to see when a message would be sent. Overall, the egg as an artifact appeared to encourage playful interaction, and also served as a temporary point for social behavior.
5.3 Longitudinal Study in the Wild

The longitudinal study in the real-world setting began in February 2010 and lasted for 100 days. In total, 7268 downloads ("messages") were made by 1338 unique devices, i.e. on average 73 messages per day and 5.4 messages per device. The most active user downloaded 91 messages utilizing 8 of the available 10 services, meaning that on average s/he downloaded every service 11 times. On average, a user used the system 5.4 times utilizing 3.4 distinct services. Out of the 1338 unique devices, 74 used BlueInfo in more than one hotspot. Users of more than one hotspot were more active in downloading content than those using only a single hotspot: the 74 devices downloaded a total of 836 messages, or 11.3 messages per device on average (min 2, max 91), from an average of 2 hotspots (min 2, max 6). The most active user used the system on 10 separate days. There is a positive correlation (r=0.5) between the number of days used and messages downloaded. A significance test showed that the correlation is statistically significant (r(1325)=0.5, p<0.0001).

The number of unique users per hotspot varied considerably (Table 2). The most popular hotspot, located in the main swimming hall of the city, attracted a total of 538 unique users (38.2 % of all users). Similarly, a hotspot at another sporting facility (Ouluhalli) had 164 users (11.7 %). Overall, indoor hotspots were more popular in terms of users, with a total of 965 users (68.6 %).

Table 2. BlueInfo usage at each hotspot (WS = Walking Street, outdoor hotspots in italics)

Hotspot            Average downloads per user
Main library       3.5
Ouluhalli          6.3
Swimming hall      6.8
Office building    2.0
WS: Intersection   2.9
WS: East           4.2
WS: South          4.1
WS: West           2.8
Main square        3.0
Science center     3.5
Market place       3.5
Culture center     5.2
A similar trend was observed in the number of actual downloads (Table 2). The swimming hall hotspot recorded a total of 3665 downloads (50.4 %), and Ouluhalli 1025 downloads (14.1 %). Indoor hotspots recorded 5737 downloads (78.9 %), and outdoor hotspots 1531 downloads (21.1 %). The average number of downloaded messages per user was also the highest in the two most popular hotspots, with 6.8 and 6.3 messages, respectively. The average number of messages downloaded per day
across all hotspots is 74, with a maximum of 267 messages downloaded on a single day. The average number of users per day is 16, with a maximum of 38 users on a single day. Fig. 4 depicts the evolution of the number of users and downloads during the study. There is a strong positive correlation (r=0.76) between the daily number of unique users and the number of downloads. A significance test showed that the correlation is statistically significant (r(97)=0.76, p<0.0001).
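The reported correlation test corresponds to a standard Pearson correlation on the daily counts; the sketch below shows the computation with SciPy on hypothetical values, not the actual study data.

from scipy.stats import pearsonr

# Hypothetical per-day counts; the real daily figures are not reproduced here.
daily_unique_users = [12, 18, 9, 22, 15, 30, 11]
daily_downloads = [40, 75, 31, 96, 60, 130, 45]

r, p_value = pearsonr(daily_unique_users, daily_downloads)
print("r = %.2f, p = %.4f" % (r, p_value))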
Fig. 4. Evolution of number of unique users and downloads (y-axis) over time (x-axis)
An ANOVA showed that the variation of downloads at each location was significantly affected by the day of week (F(6,7303)=21.371, p<0.0001) and hour of day (F(23,7303)=59.192, p<0.0001). The variation of downloads across weekdays (Fig. 5) shows a rising trend that peaks on Wednesdays, and declines again towards the end of the week. Further, when looking at the variation across each day, we see a rising trend towards the afternoon hours, peaking at 5 pm, and declining thereafter.
Fig. 5. Total downloads per weekday (left) and per hour of the day (right)
The number of times each service was downloaded, along with the number of users downloading it, is shown in Table 3. From the table we can see that the news service provided by Kaleva (the local main newspaper) has been by far the most popular with 1732 downloads (23.8 % of all downloads). Overall, news services (Kaleva and Reuters) have been very popular with 2397 downloads (33.0 % of all downloads).
Table 3. Downloads and users per service

Service
News (Kaleva)
Weather
Bus schedules
TV programs
Daily message
News (Reuters)
Movie service
Event calendar
Wappu
City info
The categorization of the Bluetooth friendly names resulted in 388 (24.8 %) ‘male’ names, 160 (11.9 %) ‘female’ names, 499 (36.6 %) unknown names, and 366 (26.8 %) factory default names. Given this gender categorization, women used the outdoor hotspots more actively than men, as 27.5 % of all downloads by females occurred at outdoor hotspots, while the same number for males was 13.9 %. Both genders preferred indoor hotspots, as 86.1% of all downloads by males and 72.4 % by females occurred in indoor hotspots. When looking at the downloads of individual services, women preferred the bus service slightly more than men (12.6 % for females vs. 11.3% for males), as well as the movie service (10.5 % vs. 8.1 %), while for all other services the difference between genders was within 1 %.
6 Discussion

This paper highlights several factors related to user preferences with different models of information acquisition. The questionnaire study conducted at bus stops revealed that people are open to receiving content over Bluetooth, with push and pull interaction models receiving equal support. The need for relevant data was apparent, as 33 % of respondents claimed not to care how the information was delivered as long as it was relevant to them. Interestingly, only 11.5 % of the 105 respondents reported having Bluetooth turned on in their device, even though market reports suggest that more than 90 % of devices are Bluetooth capable. This finding highlights the limitations of such self-reported sampling: during the period from March 2009 to August 2010, the 12 BlueInfo access points scanned 118162 unique Bluetooth devices, which is a significant number given that Oulu has 140000 inhabitants. This large number indicates that Bluetooth could be a feasible wireless delivery mechanism for mobile information services. With these considerations in mind, we set out to explore alternate interaction models implemented atop Bluetooth technology and evaluated in different experimental settings.

6.1 Models of In-Situ Information Delivery

The first prototype, BlueInfo Pull, where content is pulled with textual keywords, was successful in terms of reducing task execution times and the number of clicks in
comparison to a smartphone's web browser. User acceptance of the system was high, with most users claiming that information retrieval was easier and faster than using a mobile browser. The main constraint with the keyword-based interaction model, however, is the high initial cognitive load imposed on users, as they have to memorize several keywords and parameters, and the somewhat unfamiliar way of sending textual messages from the phone's notepad application instead of using SMS. To address this issue, the next prototype, BlueInfo Push, employed a push-based interaction model, which was evaluated in a month-long study on a university campus. The relatively small number of participants (33) in this study is due to the requirement for a high-end mobile device with a browser that supports the composite HTML format used to create the daily messages; the penetration of such high-end models among university students was still low. However, participants who did use the system were mostly happy with it. The follow-up interviews showed that users appreciate the push-based model as it is a very low-effort way of receiving up-to-date information and entertainment content. However, users expressed discontent with the fact that the access points were not visible in the environment, and there was no way of telling when they were within range to receive a message. Users explained that they had only paired their device with one or two access points, usually those situated within their favorite restaurant or cafeteria, and would always go to those places when they wanted to receive a message. Moreover, users expressed the desire to be able to change their preferences even when they could not directly connect to the configuration page of the system's website. The idea of a simple interactive interface in the proximity of an access point was very appealing to them. With this emergent behavior, push-based information delivery can begin to take on characteristics of pull-based interaction, where users explicitly go to a certain location to get content.

The Easter egg probe provided a middle ground between the two previous prototypes. Users did not need to register before using the system, removing keyword-based messages reduced interaction time, and the system employed a proximity-based mode of interaction by requiring the user to explicitly place her device very close to the egg, thus clearly marking the area where content could be downloaded. The downside was that only a single message could be made available at a time, i.e. users were not able to select which content they wanted to receive. However, we considered the Easter egg to be a successful experiment given the number of sent messages and unique devices. Since the message was a simple animated GIF, even older devices were capable of showing it. Also, the physical user interface seemed to be a strong incentive for people to try the system, as it encouraged playful and social interaction.

6.2 In-Situ Information Delivery in the Wild

The final prototype utilized the 12 hotspots located in indoor and outdoor locations in a city center. In this study users were able to select which information services would be sent to their devices using the touch screen interface of the hotspot. This added feature addresses the finding from the push experiment, where users wished to select services while on the move, without having access to the configuration webpage to do so.
After deployment the system was left "in-the-wild" for the general public to use, without researchers' intervention or supervision. The point of this type of study is to see if a deployed system can survive on its own in a highly public setting, and
whether or not it will entice people to start using it. Additionally, because BlueInfo was offered as part of a large set of different applications, it was interesting to see if people would find and use this service, as it was among the more technically challenging applications in the hotspots (users were required to begin the interaction sequence on the hotspot, and continue it on their mobile device). Over the course of the 100-day study, 1338 unique devices were used to interact with BlueInfo. We consider this to be a highly encouraging number, as it shows real users' willingness to adopt such a service. The number of users utilizing the service on several hotspots was rather low, however, indicating a high amount of 'curiosity usage' where people try the system once and never return to it, which is to be expected in a real-life setting. The group of lead users who did use the service on several hotspots became very active in downloading content, indicating that they had adopted the service as part of their everyday information-seeking behavior. The number of users and downloads illustrates that people do have a need for easily accessible information services that they can take with them. Observations from the use of the hotspots indicate that the proportion of people utilizing their mobile phones for on-line information access (i.e. web browsing), or even owning suitable smartphones, is still rather low, especially among older people. In many studies mobile on-line services are offered as the solution for information acquisition while on the move, but the longitudinal observations and data from this study show that alternative ways of content delivery need to be explored. Marketing research shows that high-end smartphones with large touch screens suitable for extended on-line access are still somewhat rare among the general population. According to a recent report [35], only 19 % of mobile phones globally are smartphones. Furthermore, people still often quote cost as a major reason for not doing mobile browsing. Bluetooth-based services are a viable option, as WPAN connectivity is free of charge, and browsing the downloaded content does not require further (IP-based) connectivity.

6.3 Towards Understanding Users' In-Situ Behaviour

Over the course of the longitudinal study, 965 (72.1 %) of the 1338 devices used BlueInfo at indoor hotspots to execute 78.9 % of all downloads. This may be explained by the time of year the data gathering took place. The arctic winter conditions in Northern Europe do not encourage outdoor use of mobile devices, which often requires the user to remove his/her gloves to operate the keypad. There is a (somewhat low) correlation between the mean daily temperature and the number of users at outdoor hotspots (r=0.3). A significance test shows that this correlation is statistically significant (r(91)=0.3, p<0.004). The highly public nature of the outdoor hotspots may also discourage first-time users from attempting to use the system, as they may be afraid of failing in front of a lot of people. Regarding the most active days and hours of use, Wednesdays, Tuesdays, and Fridays stand out as usage peaks. Similarly, usage peaked around afternoon hours between 5 pm and 7 pm, and other times of high activity were 2 pm and 11 am. This cycle reflects the daily and weekly rhythm of the city, where there is high activity at lunchtime (around 11 am), when youngsters get out of school (around 2 pm), and when people leave work (around 5 pm).
The lower number of users during weekends
is partially explained by the fact that some of the indoor locations are closed and thus not accessible during weekends. When looking at the usage of individual hotspots, the hotspot at the municipal swimming hall clearly stands out. This hotspot attracted 38.2% of all users and 50.4% of all downloads. Similarly, the hotspot located in another large sporting facility (Ouluhalli) attracted substantial use. From this data we can draw the conclusion that sporting facilities work well as settings for interactive displays. This may be due to several factors, such as a high number of children and teenagers, but also the extent to which the physical environment instills trust in users, as shown in [17]. From our empirical evidence we have found that youngsters often have an open view of new technology and are willing to try it out without encouragement or external reward. In fact, emerging technologies are often a driver of younger users' behaviour, shaping multiple aspects of their everyday life [16]. Older people, on the other hand, often have a more reserved view, and require explicit training or encouragement before starting to interact with a device or service. This was confirmed during another study utilizing the hotspots, where public events were arranged to educate people on the use of the hotspots and services. A majority of participants in these training events were middle-aged or elderly (45% over 50 years of age), while teenagers and young adults often declined to participate, claiming to already be familiar with the technology. Further, people in sporting facilities often have idle time, either waiting for their training shift to begin or, as parents, waiting for their children to finish practice. This plausibly encourages people to experiment with new technology.

There is a correlation between the number of distinct days people visited a hotspot (as determined by the number of times their device was scanned by a Bluetooth access point) and the number of those days on which they actually used BlueInfo. Overall, we find a regression between how many visits people made to a location with a hotspot and how many times they used BlueInfo with a=0.15, suggesting that people use the service once every 7 days. People who did not use BlueInfo at all visited a hotspot on average on 5 separate days during the study, showing that they did not "make the threshold" of 7 days which would likely cause them to use BlueInfo at least once. For the swimming hall hotspot we find a similar pattern with a=0.2, suggesting people use the hotspot every 5 days. The people who did not use the hotspot visited it on average on 2.8 separate days, hence again not "making the threshold".

6.4 Implications for Design

Different locations attract different types of use. Outdoor locations, usually characterized by their highly public context, do not necessarily attract first-time users if the offered service is perceived as technically challenging. Further, weather conditions have a considerable effect on outdoor usage. Indoor displays, on the other hand, seem to attract heavy usage, especially with hotspots situated in locations geared towards leisure, such as the two sporting facilities used in this study. This would suggest that other projects planning to deploy public displays should look into the possibility of having at least one in such a context.
Related to this, there appears to be a 'threshold' of how many times a person has to visit a hotspot before becoming an active user: in the case of the swimming hall hotspot this threshold was 5 days. This is an interesting phenomenon that requires further study before explicit implications for design can be drawn.
Bluetooth is a viable but underused channel for delivering in-situ information. The very large number of Bluetooth devices scanned by the hotspots, the high number of downloads using BlueInfo, along with the results from the different studies presented here, suggest that a substantial number of people a) own Bluetooth-capable devices, b) have the Bluetooth radio turned on and set to discoverable, c) do not care how information is transferred to their devices, as long as the information is relevant to them, and d) appreciate the option to select information services from a public display for immediate pick-up. This implies that low-effort, fast-to-use interaction models, such as the one presented here, work well because people do not have to install additional client software on their devices prior to using a system, which in turn encourages curiosity usage that eventually turns into continued use for some proportion of users.

In-situ information pick-up demonstrates strong daily, weekly and seasonal patterns. These patterns intuitively appear to reflect the rhythm of the city. For design, this implies that users are more willing to interact with systems found in the environment during these times. Context-aware services supporting activities normally found around these times would have the potential to become highly used and valuable to people. Interestingly, similar usage patterns were found in a recent study on Twitter usage [31]. The highest number of 'tweets' is found on Tuesdays, Thursdays and Fridays. Further, the number of 'tweets' peaks at 5 pm, and between 10 pm and 11 pm. This further confirms that these are optimal times for offering services, and the correlation provides fascinating possibilities for further research.

Push-based services seem to be well received by people. However, users need to be made aware of the location and range of access points offering push-based content. Otherwise, users may begin to re-appropriate their interaction model by always going to a place where they know information will be pushed to them, thus shifting the interaction paradigm more towards pull-based interaction. Related to this, a physical artifact such as the Easter egg probe functions well in communicating interaction possibilities to users. These types of artifacts could be employed to make users aware of access points which could push content to them.
7 Conclusion

We presented a set of prototypes for in-situ information delivery over Bluetooth. Given our findings, we conclude that Bluetooth is a viable option for delivering mobile information services, and that the general public is willing to utilize such services together with interactive public displays. We showed that usage differs between hotspots placed in varying contexts and that several environmental factors affect the usage of services. We explored the relationship between how many times a person visits a space augmented with the hotspots and how likely they are to become active users. Finally, we presented a set of design and deployment guidelines that should be useful for anyone planning similar installations.

Acknowledgments. The authors wish to thank the Finnish Funding Agency for Technology and Innovation, the Academy of Finland, the ERDF, the City of Oulu and the UBI (UrBan Interactions) consortium for their valuable support.
References

1. Acharya, S., Franklin, M., Zdonik, S.: Balancing Push and Pull for Data Broadcast. In: Proc. ACM SIGMOD Conference on Management of Data, pp. 183–194 (1997)
2. Ballagas, R., Borchers, J., Rohs, M., Sheridan, J.: The Smart Phone: A Ubiquitous Input Device. IEEE Pervasive Computing 5(1), 70–77 (2006)
3. Berti, S., Paterno, F., Santoro, C.: A taxonomy for migratory user interfaces. In: Proc. DSV-IS 2005, Newcastle upon Tyne, UK, pp. 149–160 (2005)
4. Brignull, H., Rogers, Y.: Enticing people to interact with public displays in public spaces. In: Proc. INTERACT 2003, Zurich, Switzerland, pp. 17–24 (2003)
5. Bluetooth SIG press release, http://www.bluetooth.com/Bluetooth/Press/SIG/AWARENESS_OF_BLUETOOTH_WIRELESS_TECHNOLOGY_CONTINUES_TO_CLIMB.htm (accessed January 24, 2011)
6. Camacho, T., Kostakos, V., Mantero, C.: A Wireless Infrastructure for Delivering Contextual Services and Studying Transport Behavior. In: Proc. ITSC 2010, Funchal, Portugal, pp. 943–948 (2010)
7. Cheverst, K., Dix, A., Fitton, D., Kray, C., Rouncefield, M., Sas, C., Saslis-Lagoudakis, G., Sheridan, J.: Exploring Bluetooth based Mobile Phone Interaction with the Hermes Photo Display. In: Proc. MobileHCI 2005, Salzburg, Austria, pp. 47–54 (2005)
8. Davies, N., Friday, A., Newman, P., Rutlidge, S., Storz, O.: Using Bluetooth Device Names to Support Interaction in Smart Environments. In: Proc. MobiSys 2009, Kraków, Poland, pp. 151–164 (2009)
9. Huang, A., Pulli, K., Rudolph, L.: Kimono: Kiosk-Mobile Phone Knowledge Sharing System. In: Proc. MUM 2005, Christchurch, New Zealand, pp. 142–149 (2005)
10. Huang, E., Koster, A., Borchers, J.: Overcoming assumptions and uncovering practices: When does the public really look at public displays? In: Indulska, J., Patterson, D.J., Rodden, T., Ott, M. (eds.) PERVASIVE 2008. LNCS, vol. 5013, pp. 228–243. Springer, Heidelberg (2008)
11. Holleis, P., Rukzio, E., Otto, F., Schmidt, A.: Privacy and curiosity in mobile interactions with public displays. In: Proc. CHI 2007 Workshop on Mobile Spatial Interaction, San Jose, USA (2007)
12. José, R., Otero, N., Izadi, S., Harper, R.: Instant Places: Using Bluetooth for Situated Interaction in Public Displays. IEEE Pervasive Computing 7(4), 52–57 (2008)
13. Kostakos, V., Nicolai, T., Yoneki, E., O’Neill, E., Kenn, H., Crowcroft, J.: Understanding and measuring the urban pervasive infrastructure. Personal and Ubiquitous Computing 13(5), 355–364 (2009)
14. Kostakos, V., O’Neill, E.: NFC on mobile phones: issues, lessons and future research. In: Adjunct Proceedings of the Fifth IEEE International Conference on Pervasive Computing and Communications (PerCom 2007), Pervasive RFID/NFC Technology and Applications Workshop (Pertec 2007), White Plains, NY, March 19-23, pp. 367–370 (2007)
15. Kostakos, V., O’Neill, E.: Capturing and visualising Bluetooth encounters. In: Adjunct Proceedings of CHI 2008, Workshop on Social Data Analysis, Florence, Italy (2008)
16. Kostakos, V., O’Neill, E., Little, L., Sillence, E.: The social implications of emerging technologies. Interacting with Computers 17(5), 475–483 (2005)
17. Kindberg, T., O’Neill, E., Bevan, C., Kostakos, V., Stanton Fraser, D.S., Jay, T.: Measuring Trust in Wi-Fi Hotspots. In: Proc. CHI 2008, Florence, Italy, pp. 173–182 (2008)
18. Kukka, H., Kruger, F., Ojala, T.: BlueInfo: Open architecture for deploying web services in WPAN hotspots. In: Proc. ICWS 2009, Los Angeles, CA, USA, pp. 984–991 (2009)
19. Masinter, L.: The ”data” URL scheme, RFC 2397, http://www.faqs.org/rfcs/rfc2397.html (accessed January 24, 2011)
20. Müller, J., Jentsch, M., Kray, C., Krüger, A.: Exploring Factors that Influence the Combined Use of Mobile Devices and Public Displays for Pedestrian Navigation. In: Proc. NordiCHI 2008, Lund, Sweden, pp. 308–317 (2008)
21. Müller, J., Wilmsmann, D., Exeler, J., Buzeck, M., Schmidt, A., Jay, T., Krüger, A.: Display blindness: The effect of expectations on attention towards digital signage. In: Tokuda, H., Beigl, M., Friday, A., Brush, A.J.B., Tobe, Y. (eds.) Pervasive 2009. LNCS, vol. 5538, pp. 1–8. Springer, Heidelberg (2009)
22. Mäkinen, E., Patomäki, S., Raisamo, R.: Experiences on a Multimodal Information Kiosk with an Interactive Agent. In: Proc. NordiCHI 2002, Århus, Denmark, pp. 275–277 (2002)
23. Nylander, S.: Approaches to achieving device independent services - an overview. Technical Report T2003-16, SICS, Sweden
24. O’Neill, E., Kostakos, V., Kindberg, T., Schiek, A.F.g., Penn, A., Fraser, D.S., Jones, T.: Instrumenting the city: Developing methods for observing and understanding the digital cityscape. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 315–332. Springer, Heidelberg (2006)
25. Ojala, T., Kukka, H., Linden, T., Heikkinen, T., Jurmu, M., Hosio, S., Kruger, F.: UBI-hotspot 1.0: Large-scale Long-term Deployment of Interactive Public Displays in Authentic Setting in City Center. In: Proc. ICIW 2010, Barcelona, Spain, pp. 285–294 (2010)
26. Paluska, J.M., Pham, H., Saif, U., Chau, G., Terman, C., Ward, S.: Structured decomposition of adaptive applications. Pervasive and Mobile Computing 4(6), 791–806 (2008)
27. Pirolli, P., Card, S.: Information Foraging. Psychological Review 106, 643–675 (1999)
28. Peltonen, P., Kurvinen, E., Salovaara, A., Jacucci, G., Ilmonen, T., Evans, J., Oulasvirta, A., Saarikko, P.: ”It’s mine, don’t touch!”: Interactions at a large multi-touch display in a city centre. In: Proc. CHI 2008, Florence, Italy, pp. 1285–1294 (2008)
29. Scheible, J., Ojala, T.: MobiLenin - Combining a multi-track music video, personal mobile phones and a public display into multi-user interactive entertainment. In: Proc. ACM Multimedia 2005, Singapore, pp. 199–208 (2005)
30. Storz, O., Friday, A., Davies, N., Finney, J., Sas, C., Sheridan, J.: Public ubiquitous computing systems: lessons from the e-Campus display deployments. IEEE Pervasive Computing 5(3), 40–47 (2006)
31. State of the Twittersphere, http://www.hubspot.com/Portals/53/docs/01.10.sot.report.pdf (accessed January 24, 2011)
32. Tan, K., Ooi, B.: Data Dissemination in Wireless Computing Environments (Advances in Database Systems). Springer, Heidelberg (2000)
33. Terrenghi, L., Quigley, A., Dix, A.: A Taxonomy for and Analysis of Multi-Person-Display Ecosystems. Personal and Ubiquitous Computing 13(8), 538–598 (2009)
34. Vogel, D., Balakrishnan, R.: Interactive Public Ambient Displays: Transitioning from Implicit to Explicit, Public to Personal, Interaction with Multiple Users. In: Proc. UIST 2004, Santa Fe, NM, USA, pp. 137–146 (2004)
35. http://communities-dominate.blogs.com/brands/2010/08/smartphone-bloodbath-report-card-at-halfpoint-of-year-2010-for-all-major-brands.html (accessed October 10, 2010)
IntelliTilt: An Enhanced Tilt Interaction Technique for Mobile Map-Based Applications Bradley van Tonder and Janet Wesson Department of Computing Sciences, Nelson Mandela Metropolitan University PO Box 77000, Port Elizabeth, South Africa, 6031 {Bradley.vanTonder,Janet.Wesson}@nmmu.ac.za
Abstract. Current interaction techniques for mobile map-based applications suffer from several usability problems. Tilt interaction provides an alternative form of interaction which combines the benefits of one handed interaction with intuitive physical gestures. Research has shown that tilt interaction suffers from a lack of controllability, high mental demand and practical concerns. In this paper, the design and evaluation of a new tilt interaction technique, called IntelliTilt, is described. IntelliTilt incorporates several intelligent techniques to address the shortcomings of tilt interaction. IntelliTilt was compared to a basic tilt interaction technique using a prototype mobile map-based application in an experiment. The results of this experiment showed that IntelliTilt was preferred by the participants and that it offered significant advantages in terms of mental demand, perceived efficiency and controllability. Keywords: Tilt interaction, mobile map applications, sensor-based interaction.
technique, called IntelliTilt, which incorporates several design features to address the shortcomings of basic tilt interaction. These features include making use of visual and vibrotactile feedback to improve the ease with which users can control panning and selection operations. Attractor mechanisms are included to make it easier to settle the cursor on a particular icon. Sensitivity adaptation is included to compensate for variability in accelerometer data while walking. A gesture zooming technique was developed to provide an intuitive method of zooming. These features were designed to address the shortcomings of basic tilt interaction identified earlier. A user study was conducted to compare IntelliTilt to basic tilt interaction incorporating SDAZ. The purpose of the user study was to determine whether the design features of IntelliTilt helped to address the previously identified shortcomings of basic tilt interaction. Participants were required to perform typical mobile map-based tasks while seated and while walking. Touch-screen interaction was not considered in this research as our focus was on investigating one-handed interaction techniques for mobile devices. This paper begins with a discussion of related work. The design of the IntelliTilt interaction technique is then described. A prototype mobile map-based application, called MapExplorer, which was used for evaluation purposes, is described. Experimental results are then presented, followed by a discussion of the implications of these results. Finally, conclusions and ideas for future work are presented.
2 Related Work

2.1 Mobile Map-Based Applications

Mobile map-based applications typically support users in performing the following tasks [7, 8]:

• Locating: Identifying the position of something (e.g. where am I?);
• Searching: Identifying facilities matching certain criteria (e.g. where is the nearest hotel?);
• Navigating: Finding a route between two points or navigating along a route;
• Checking: Determining the condition of a person or place (e.g. operating hours of a business); and
• Identifying: Identifying and recognizing people, places or objects (e.g. the name of a business).

These five high-level tasks are accomplished through several low-level operations. Irrespective of the interaction technique used, three low-level operations are commonly used, namely panning, zooming and selection [9].

2.2 Tilt Interaction

Tilt interaction was first proposed almost 15 years ago as an experimental form of interaction for mobile devices [10]. Research into the use of tilt interaction has since been extended to a range of domains including menu navigation, text entry, mobile museum guides, photo browsing and interacting with mobile map-based applications [3, 4, 11]. Most existing research has focused on the use of tilt as a means of performing panning, where the tilt angles along the x and y axes are mapped onto
panning speeds in the horizontal and vertical directions. Most of this research relied on accelerometer data to determine tilt angles. Recently, gyroscopes and digital compasses have also started to be integrated into mobile phones, allowing for more accurate measurement of pitch and roll angles [12].

Tilt interaction techniques allow for one-handed interaction with mobile devices. This is desirable in a mobile context of use and allows the user an unobstructed view of the display. Furthermore, one-handed interaction requires less visual attention than bimanual control [13]. Existing studies have shown that users like tilt interaction because of its natural, expressive and intuitive nature [3]. Unlike keypad interaction, tilt interaction allows for fine-grained control over panning speed.

Different implementations of tilt interaction differ in terms of the mapping between tilt input and the corresponding effect. Some implementations identify discrete inputs, where tilting past a certain threshold is interpreted similarly to a button press [14]. Other implementations allow for continuous input, where tilt input is used to continuously perform panning [15]. The mapping between tilt input and panning effect can be done using position control, rate control or inertial control [14]. Rate control maps the angle at which the device is tilted, relative to the neutral position, onto the rate at which the display pans. For example, the further the device is tilted to the left, the faster the display pans left [16]. The function mapping tilt input onto panning speed does not need to be linear. It has been suggested that a linear mapping for small tilt angles allows for fine-grained control, while the amplification of larger tilt angles makes it easier to pan faster over long distances [15]. This approach has previously been used in a mobile map-based application [17]. Discretization can be used to improve the level of control users have over tilt interaction. Different tilt angle intervals are mapped onto different panning speeds to minimize jitter and allow for finer-grained control. Linear, quadratic and sigmoidal discretization are all possible. Existing research has provided evidence that quadratic discretization allows for the highest level of user control [18].

Some mobile map-based applications allow panning and zooming to be performed simultaneously. Most of these applications implement Speed Dependent Automatic Zooming (SDAZ). SDAZ was proposed to address some of the problems and shortcomings of traditional zooming and panning techniques [19]. These problems include blurring (and disorientation) when panning at high speeds and the need for the user to switch focus between the display and user interface controls [19, 20]. SDAZ adjusts the zoom level automatically in response to the speed at which the user pans the display. SDAZ has since been implemented for touch-screen mobile devices [21]. SDAZ has been implemented using tilt interaction to perform combined zooming and panning in a mobile document browser [22]. Kratz and Rohs [1] combined an element of manual control over the zoom level with a tilt-controlled SDAZ implementation. Their technique, called semi-automatic zooming (SAZ), allows users to control the zoom level using a touch-screen slider control. SAZ was shown to offer significant efficiency, workload and user satisfaction advantages over SDAZ.
2.3 Problems and Shortcomings

An experiment was previously conducted by the authors to compare tilt interaction (incorporating SDAZ) and keypad interaction [6]. Participants were required to
perform typical mobile map-based tasks (locating, navigating and checking) using a prototype mobile map-based application, called MapExplorer. This experiment took the form of a lab-based user study and involved 32 participants. The results of this experiment allowed several shortcomings of basic tilt interaction to be identified:

• Controllability: Tilt interaction was shown to be difficult to control precisely, particularly for performing selections. This problem is not restricted to our implementation and is a well-documented shortcoming of tilt interaction [2]. Tilt interaction used for panning also tends to suffer from overshooting problems, where users pan past their intended target and have to reverse direction [23].
• Zooming: The use of SDAZ in MapExplorer yielded mixed results. While SDAZ did allow users to pan longer distances more rapidly, it was criticized because it was too easy to trigger accidentally and took control away from the user.
• Mental Demand: Tilt interaction was reported to be more mentally demanding than keypad interaction, particularly for locating and checking tasks.
• Sensitivity: Several participants felt that the sensitivity of the tilt interaction did not match their expectations (either too sensitive or not sensitive enough).
• Practicality: Some participants commented that they felt that tilt interaction was unlikely to be feasible in other contexts of use, such as while walking. This is due to the fact that tilt interaction requires users to exert control over their physical movements while interacting with the mobile device.
Several design modifications were made to the basic tilt interaction technique used in the above experiment in order to address the shortcomings identified. The design of this modified tilt interaction technique, called IntelliTilt, is described below.
3 IntelliTilt: Design and Implementation

IntelliTilt was designed to address the shortcomings of basic tilt interaction discussed above. Table 1 summarizes the relationships between the problems identified in basic tilt interaction and the features of IntelliTilt designed to address these shortcomings.

Table 1. Shortcomings of tilt interaction and IntelliTilt features designed to address these shortcomings
Shortcoming        Feature(s)
Controllability    Visual and vibrotactile feedback, attractor mechanisms
Zooming            Gesture zooming
Mental Demand      Visual and vibrotactile feedback
Sensitivity        Sensitivity adaptation
Practicality       Sensitivity adaptation, dwell-time selection
The design of each of the above features of IntelliTilt is now described in more detail.
3.1 Visual and Vibrotactile Feedback

The results of the previous experiment showed that participants struggled to control tilt interaction. Qualitative comments revealed that some participants found the exact effects of tilt gestures (in terms of panning speed and direction) difficult to predict. In order to accurately control tilt interaction, users need to be aware of the effect the current orientation of the mobile phone has on the direction and speed of panning. Various modalities have previously been employed to aid in controlling tilt interaction, including visual, vibrotactile and audio feedback [1, 24, 25]. Audio feedback was considered to be impractical because it is likely to be annoying to other people (and the use of headphones is not always feasible). Previous research has only made limited use of visual feedback to indicate cursor position and to control automatic zooming [1]. IntelliTilt makes use of visual feedback in the form of arrows attached to the cursor, showing the vertical and horizontal panning speeds (Figure 1). The length of these arrows is used to denote speed (the longer the arrow, the faster the panning speed). A simple linear mapping between panning speed and arrow length is used.
Fig. 1. Design of visual feedback in IntelliTilt (left) and an example of its use (right)
IntelliTilt also makes use of vibrotactile feedback to reinforce the visual feedback channel. This form of feedback is particularly useful in situations where the user may be partially distracted (e.g. while walking). Previous research regarding the use of vibrotactile feedback in conjunction with tilt interaction for menu navigation showed that users liked the use of short vibration pulses to indicate movement from one menu item to another [24]. IntelliTilt uses short vibration pulses (250 ms) to indicate when the cursor moves over point of interest (POI) icons and route markers. This duration was selected as it has previously been employed for similar purposes and experimentation revealed it to be long enough to be noticeable, but short enough to not irritate users [24]. The available map area is divided into grid cells based on latitude and longitude, with POI icons indexed into these cells using a hash table. This enables IntelliTilt to efficiently determine whether the cursor is close enough to a POI or route marker icon to trigger vibrotactile feedback.
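A minimal sketch of such a grid-cell index is shown below; the cell size and POI tuple layout are assumptions rather than the values used in IntelliTilt.

from collections import defaultdict

CELL_SIZE = 0.001  # degrees of latitude/longitude per grid cell (assumed)

def cell_key(lat, lon):
    """Map a coordinate to its grid cell."""
    return (int(lat // CELL_SIZE), int(lon // CELL_SIZE))

def build_index(pois):
    """Index (name, lat, lon) POI tuples by grid cell for fast proximity lookups."""
    index = defaultdict(list)
    for name, lat, lon in pois:
        index[cell_key(lat, lon)].append((name, lat, lon))
    return index

def nearby_pois(index, lat, lon):
    """Collect POIs from the cursor's cell and its eight neighbouring cells."""
    row, col = cell_key(lat, lon)
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            result.extend(index.get((row + dr, col + dc), []))
    return result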
3.2 Attractor Mechanisms

One of the most significant problems encountered during the use of tilt interaction in mobile map-based applications is that users struggle to settle the cursor on a particular target icon. In a previous system using tilt interaction to browse photo collections on a mobile phone, attractors were employed to make it easier to settle on a particular photo [3]. This idea was extended to the two-dimensional domain of map browsing. The attractors are designed to work in conjunction with a discretization approach, which splits tilt input into discrete speed levels. If both the current horizontal and vertical panning speeds are slow (in which case it is likely that the user is trying to select an icon), the algorithm determines whether any selectable icons are within a specified distance. Nearby icons are identified using the same indexing method used for vibrotactile feedback. If a nearby icon exists, the position of the cursor is adjusted vertically and horizontally to draw the cursor towards the centre of the nearest icon. Figure 2 illustrates the use of attractors to draw the cursor (shown in blue) towards the nearest POI within range of the cursor.
Fig. 2. Example of the use of attractors to aid selection
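The attractor adjustment can be sketched as follows; the speed threshold, attraction range and step size are illustrative assumptions, and nearest_icon would be obtained from the same grid index used for vibrotactile feedback.

SLOW_SPEED = 2.0      # pixels per frame below which attraction is applied (assumed)
ATTRACT_RANGE = 25.0  # pixels (assumed)
ATTRACT_STEP = 0.3    # fraction of the remaining distance covered per frame (assumed)

def apply_attractor(cursor_x, cursor_y, speed_x, speed_y, nearest_icon):
    """Nudge the cursor towards the nearest selectable icon while panning slowly."""
    if nearest_icon is None or abs(speed_x) >= SLOW_SPEED or abs(speed_y) >= SLOW_SPEED:
        return cursor_x, cursor_y
    icon_x, icon_y = nearest_icon
    dx, dy = icon_x - cursor_x, icon_y - cursor_y
    if dx * dx + dy * dy > ATTRACT_RANGE ** 2:
        return cursor_x, cursor_y
    return cursor_x + ATTRACT_STEP * dx, cursor_y + ATTRACT_STEP * dy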
The resulting effect is one of drawing the cursor slowly towards selectable icons. The effects of the attractors are easily overcome by increasing panning speed, so as to avoid being drawn onto icons when this is not desired. The use of attractors eliminates the need to position the cursor exactly, as the user is able to position the cursor approximately and is assisted by the attractor to perform an exact selection.

3.3 Sensitivity Adaptation

One of the problems identified in the previous experiment was that users felt that tilt interaction would be impractical to use in a mobile context due to the sensitivity of this form of interaction [6]. If tilt interaction is not sensitive enough, it is inefficient, while if it is too sensitive it is difficult to use while walking. Sensitivity adaptation was implemented in IntelliTilt to address this problem. Streams of accelerometer data along the x, y and z axes are constantly recorded and monitored. Accelerometer data is sampled at approximately 20 Hz and a sliding window of data from the last second (the last 20 samples) is used to identify whether the user is currently stationary or mobile. Such a sampling rate has previously been
successfully employed in a similar mobile application [1]. Several data samples were recorded from users of MapExplorer while stationary and while walking in order to measure the baseline acceleration variances for the three axes. While performing tilt operations, spikes in acceleration along one or more of the axes are recorded, so a single spike in acceleration is not sufficient to identify when the user is walking. Variances were calculated for acceleration along the three axes, as these provide a more accurate measurement of variability in the acceleration data than the raw values themselves. Analysis of the recorded data showed that while walking, all three acceleration variance values were typically greater than 0.125 (where acceleration is measured in g-forces). This is a result of the linear acceleration detected by the accelerometer while the user is walking. While seated, acceleration is largely the result of deliberate tilting gestures, which typically involve acceleration along only one or two axes. As a result, all three values rarely exceeded this threshold at the same time while the user was seated. This observation allows a computationally inexpensive method of identifying when the user is walking that can be performed quickly enough not to negatively affect system performance. Figure 3 illustrates the acceleration variances using actual recorded data while seated and walking. In order to improve the accuracy of the algorithm, all observations within a second need to indicate a change in order for the algorithm to transition between walking and stationary states. Preliminary testing revealed this method to be extremely accurate at identifying when users were walking (> 98% accuracy in informal trials).
Fig. 3. Comparison of the number of axes recording acceleration variances of greater than 0.125 while seated (left) and while walking (right). Acceleration measured in g-forces.
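A simplified sketch of this walking detector is given below; the 20-sample window and the 0.125 variance threshold follow the text, while the surrounding structure is an assumption.

from collections import deque
from statistics import pvariance

WINDOW = 20                 # roughly one second of samples at 20 Hz
VARIANCE_THRESHOLD = 0.125  # per-axis variance threshold, acceleration in g-forces

class WalkingDetector:
    def __init__(self):
        self.samples = {axis: deque(maxlen=WINDOW) for axis in "xyz"}

    def add_sample(self, x, y, z):
        """Feed one accelerometer sample; returns True while the user appears to be walking."""
        for axis, value in zip("xyz", (x, y, z)):
            self.samples[axis].append(value)
        if len(self.samples["x"]) < WINDOW:
            return False
        # Walking is signalled only when all three axes exceed the variance threshold.
        return all(pvariance(self.samples[axis]) > VARIANCE_THRESHOLD for axis in "xyz")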
In order to compensate for increased acceleration variability when walking, IntelliTilt decreases the sensitivity of tilt interaction (Figure 7). Tilt interaction techniques typically employ a "dead zone", where small tilt angles are ignored to allow the user to maintain a stable cursor position. IntelliTilt also increases the size of the dead zone when the user is walking. IntelliTilt constantly observes the variation in the acceleration data in order to identify when to implement sensitivity adaptation.

3.4 Gesture Zooming

Previous implementations of tilt interaction have largely relied on SDAZ, where the zoom level is linked to the panning speed [1, 6, 26]. This form of zooming, however,
has been shown to be frustrating for users who feel that it takes control away from them. In an attempt to find a form of zooming that would offer better usability and user satisfaction, a gesture-based zooming technique was implemented in IntelliTilt.
Fig. 4. Zooming out using gesture zooming in IntelliTilt
Gesture zooming in IntelliTilt relies on physical movement of the mobile phone. Zooming out was implemented using a backward movement of the device (Figure 4), while zooming in was implemented using a forward movement of the device, relative to the z-axis (which is perpendicular to the phone's screen). This mapping was designed to be conceptually the same as moving a camera away from or towards the map. Such an approach has previously been used in a camera-controlled mobile map-based application [27]. These gestures are identified by constantly examining the z-axis acceleration data received from the accelerometer to identify spikes in the data.
Fig. 5. Z-axis acceleration data showing zoom gestures. The dotted line shows the baseline acceleration which changes depending on the orientation of the phone.
Acceleration data from users performing the zooming gestures on a Nokia N97 was captured and stored. Figure 5 shows an example of the z-axis acceleration data captured. The spikes in the data show where zooming gestures were performed. Zooming in and zooming out gestures can be differentiated from each other based on
the order in which the lower and upper spikes occur (this can be used to infer the direction in which the phone is moving). Experimental trials identified changes of 0.15g either side of the baseline z-acceleration (when the phone is at rest) as suitable to identify zooming in and zooming out gestures. While walking, this interval is increased to allow for the greater variability in z-axis acceleration data. The acceleration values reported during zooming gestures must take the starting orientation into account. As shown in Figure 5 (dotted line), the baseline z-acceleration value will differ depending on the orientation of the phone. As a further measure to avoid accidental zooming, zooming gestures have to be completed within 300ms (normal movement is generally more gradual). This threshold was determined after experimentation to identify the most suitable interval.

3.5 Dwell-Time Selection

During the previous experiment, it was observed that users would often accidentally pan away from a target icon while touching the screen to perform a selection operation [6]. Furthermore, the original design required both hands to perform selection operations. In order to combat this, dwell-time selection was implemented in IntelliTilt. If the cursor is over an icon for longer than 2.75 seconds, the icon is selected without the user having to do anything. Vibration pulses of increasing durations (25ms, 50ms and 75ms) are used to indicate to the user that a selection operation is about to be performed. The duration of 2.75 seconds was selected after experimentation and was set relatively long as the timing starts when the user is within 25 pixels of any icon, rather than directly over an icon. Shorter durations resulted in unintentional selections taking place. Dwell-time selection can be deactivated when not desired, and manual touch-screen selection is always possible.
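Dwell-time selection can be sketched as a small state machine; the vibration and selection callbacks are placeholders, and the pulse scheduling shown is an assumption rather than the exact IntelliTilt behaviour.

import time

DWELL_SECONDS = 2.75
PULSE_DURATIONS_MS = (25, 50, 75)  # escalating warning pulses before selection fires

class DwellSelector:
    def __init__(self, vibrate, select):
        self.vibrate = vibrate  # callable(duration_ms): platform-specific vibration
        self.select = select    # callable(icon): performs the actual selection
        self.icon = None
        self.start_time = 0.0
        self.pulses_sent = 0

    def update(self, icon_under_cursor):
        """Call once per frame with the icon currently under (or near) the cursor."""
        now = time.time()
        if icon_under_cursor is not self.icon:
            self.icon, self.start_time, self.pulses_sent = icon_under_cursor, now, 0
            return
        if self.icon is None:
            return
        elapsed = now - self.start_time
        # Spread the warning pulses evenly over the dwell period (assumed schedule).
        pulses_due = int(elapsed / (DWELL_SECONDS / (len(PULSE_DURATIONS_MS) + 1)))
        while self.pulses_sent < min(pulses_due, len(PULSE_DURATIONS_MS)):
            self.vibrate(PULSE_DURATIONS_MS[self.pulses_sent])
            self.pulses_sent += 1
        if elapsed >= DWELL_SECONDS:
            self.select(self.icon)
            self.icon = None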
4 Prototype: MapExplorer
A prototype mobile map-based application supporting both basic tilt and IntelliTilt was developed for experimental purposes. The prototype, called MapExplorer, was implemented in Java ME and Python and tested using a Nokia N97. Microsoft’s Bing Maps service was used as the source of the maps in the application, with map tiles being saved to the phone’s memory to improve performance. MapExplorer allows users to browse maps at ten different zoom levels (from 1 pixel = 4891.97m to 1 pixel = 9.56m). A caching system is used, whereby the most recently loaded map tiles are stored in memory, allowing these tiles to be efficiently loaded the next time they are needed. When the maximum cache size is reached (40 tiles), the least recently used tiles are removed from memory to make space for the new tiles. The basic functions of MapExplorer are first described, along with the implementation of tilt interaction. The implementation of SDAZ, used in the basic tilt technique, is also described.
4.1 Functionality
MapExplorer supports four of the five typical mobile map-based tasks (locating, navigating, identifying and checking). Searching was not implemented in order to ensure that users interacted with the map-based display.
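The tile cache described above behaves as a standard least-recently-used (LRU) cache bounded at 40 tiles. A minimal Python sketch of such a cache follows; the key layout and the loader callback are assumptions for illustration, not the MapExplorer implementation.

from collections import OrderedDict

class TileCache:
    """Keeps the most recently used map tiles in memory, evicting the least
    recently used tile once the maximum cache size is reached."""
    def __init__(self, load_tile, max_tiles=40):
        self.load_tile = load_tile   # callback that fetches a tile from disk or the map service
        self.max_tiles = max_tiles
        self.tiles = OrderedDict()   # (zoom, x, y) -> tile image, oldest first

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key in self.tiles:
            self.tiles.move_to_end(key)             # mark as most recently used
        else:
            if len(self.tiles) >= self.max_tiles:
                self.tiles.popitem(last=False)      # evict the least recently used tile
            self.tiles[key] = self.load_tile(zoom, x, y)
        return self.tiles[key]

Because lookups move a tile to the end of the ordered dictionary, the tile at the front is always the least recently used one and can be evicted in constant time.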
Three categories of POIs are included in the system, namely restaurants, hotels and tourist attractions, each denoted by a different icon (Figure 6). Selecting a POI icon displays detailed information such as a description, contact details and opening hours. MapExplorer also allows users to plan and follow routes. A route planning web service was used to calculate routes and retrieve routing instructions. By moving the cursor over route information markers, users are able to read routing instructions (Figure 6).
Fig. 6. Navigating using MapExplorer
Tilt interaction in MapExplorer is implemented using a rate control approach. The further the device is tilted in a certain direction, the faster the display is panned in that direction. The user is able to finely control the speed and direction of panning by adjusting the tilt angle in any direction. The accelerometer in the Nokia N97 is able to detect acceleration between -2g and 2g relative to the x, y and z axes. The Python sensor module was used to access the acceleration data. Accelerometer values along the x, y and z axes are used to calculate pitch and roll angles [28].
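The pitch and roll computation itself is not reproduced in the paper, which cites Freescale's application note [28]. One common formulation from that note, shown here purely as an illustration of what such a computation looks like (axis conventions and signs vary between devices), is:

import math

def pitch_and_roll(ax, ay, az):
    """Estimate pitch and roll in degrees from accelerometer readings (in g).
    A common formulation from AN3461; signs and axes may need adjusting
    for a particular phone's coordinate system."""
    pitch = math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, math.sqrt(ax * ax + az * az)))
    return pitch, roll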
Fig. 7. Discretisation where the user is (a) stationary and (b) mobile
Discretization is used to map pitch and roll angles onto panning speeds (Figure 7). This process consists of splitting the range of motion into intervals, with tilt angles falling in the same interval resulting in the same effect. Discretization intervals are increased for larger angles, as previous research has shown that this approach allows for the highest level of user control. Panning speed increases linearly for the first two
discretisation intervals and quadratically for the last three. Previous research has shown that this allows for fine control with small tilt angles and faster panning with larger angles. A dead zone is included to avoid unintentional panning as a result of minor orientation adjustments and hand tremor. Tilt control is activated and deactivated by tapping anywhere on the display. This allows the user to lock the display and prevents accidental panning from taking place. The neutral orientation, relative to which tilt angles are calculated, is the orientation of the phone when tilt interaction is activated.
4.2 SDAZ Implementation
The zoom level is linked to panning speed in order to prevent the problem of extreme visual flow, where the display can become blurred when panning at high speeds [19]. In order to combat this problem, the display is automatically zoomed out to prevent blurring when the panning speed exceeds a certain threshold. The zoom level is a function of the panning speed. According to the original implementation of SDAZ, the zoom level is calculated as the normalized displacement between the current panning speed and the minimum panning speed which triggers zooming [19] (Equation 1), where s0 is the minimum scale, dy is the current panning speed, d0 is the minimum panning speed which triggers zooming out and d1 is the maximum panning speed. When automatic zooming takes place, a red rectangle is displayed to denote the area that will fill the screen when the panning speed is decreased again (Figure 8). A zoom level indicator is also temporarily displayed while the user is zooming. Automatic zooming was limited to only zooming out a single discrete zoom level to minimize user disorientation. Manual zooming can still be performed using the on-screen controls.
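A sketch of how these pieces fit together is given below: a dead zone, five discretisation intervals (linear speed growth over the first two, quadratic over the last three) and a speed-dependent zoom level. The interval boundaries, the speed values and the exact form of Equation 1 are illustrative assumptions; the paper specifies the overall shape of the mapping and the variable definitions, but not these constants.

def tilt_to_speed(angle_deg, walking=False):
    """Map a pitch or roll angle to a panning speed using a dead zone and five
    discretisation intervals. Boundaries and speeds are illustrative only;
    the dead zone and intervals widen while the user is walking."""
    dead_zone = 4.0 if walking else 2.0
    bounds = [8, 16, 26, 38, 55] if walking else [6, 12, 20, 30, 45]
    if abs(angle_deg) <= dead_zone:
        return 0.0
    level = next((i + 1 for i, b in enumerate(bounds) if abs(angle_deg) <= b), len(bounds))
    # linear growth over the first two intervals, quadratic over the last three
    speed = 2.0 * level if level <= 2 else 4.0 + 3.0 * (level - 2) ** 2
    return speed if angle_deg > 0 else -speed

def sdaz_scale(pan_speed, s0=0.5, d0=15.0, d1=40.0):
    """One plausible reading of Equation 1: the normalised displacement of the
    current panning speed dy between d0 and d1 interpolates the scale from
    1 (no zoom-out) down to the minimum scale s0."""
    if abs(pan_speed) <= d0:
        return 1.0
    t = min((abs(pan_speed) - d0) / (d1 - d0), 1.0)   # normalised displacement
    return 1.0 - (1.0 - s0) * t

In MapExplorer, the resulting zoom-out would additionally be clamped to a single discrete zoom level, as described above.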
Fig. 8. Panning at a high speed using basic tilt interaction and SDAZ
5 User Study
In order to determine whether IntelliTilt helped to address the shortcomings identified in the previous experiment, a user study was conducted to compare the tilt interaction
technique developed in the original version of MapExplorer (referred to hereafter as basic tilt) with the modified interaction technique, called IntelliTilt.
5.1 Method
Participants. Sixteen participants (eleven male, five female) took part in the experiment. Nine participants had prior experience with tilt interaction, but in most cases this was only limited use of mobile games or the Nintendo Wii. All participants were computer science students between the ages of 20 and 29. Four participants were left handed and most indicated occasional use of mobile map-based applications.
Experimental Design and Tasks. A within-subjects approach was used with all participants using both interaction techniques. Two similar task sets were used to offset any order effects and the order in which the task sets were used was counterbalanced across participants. The order in which participants used the two interaction techniques was counterbalanced to offset any learning effect. The two techniques were simply named “version 1” and “version 2” (depending on the order in which they were used) in all questionnaires and documentation to avoid any bias. Participants were required to complete three types of tasks:
• Locating tasks: participants were required to find a particular POI icon (e.g. Locate the Ritz Hotel). Locating tasks therefore incorporate identify tasks.
• Navigating tasks: participants were required to plan and follow a route from a particular start point to a particular end point (e.g. Plan a route from the beach to the airport. Follow the route from start to finish).
• Checking tasks: participants were required to check specific information for a particular POI (e.g. check the opening hours of Starlight Restaurant).
Participants had to complete two tasks of each task type while seated followed by two tasks of each type while walking. Walking tasks were included in order to evaluate the two interaction techniques in situations in which there is likely to be more variability in the accelerometer data as a result of movement. Walking tasks were conducted indoors in a laboratory environment. All tasks started at the scale 1 pixel = 38.22m. Targets were located in a major metropolitan area.
Metrics. The following three sets of metrics were collected:
• Perceived workload: Participants were required to complete a post-task questionnaire after using each interaction technique. This questionnaire included the six perceived workload measures from the NASA-TLX questionnaire (mental, physical and temporal demand, performance, effort and frustration) [29].
• User satisfaction: Six questions were included in the post-task questionnaire to measure perceived efficiency, effectiveness, ease of use, controllability, ease of performing selections and ease of use while walking. This section was based on the standard After-Scenario Questionnaire (ASQ) [30].
• Performance: Tasks were built into the system, allowing for easy measurement of task times, which were recorded in a log file stored on the phone.
The post-test questionnaire required participants to rate their preferred interaction technique while seated and while walking and to identify their preferred zooming technique. Participants were also asked to rate the usefulness of the visual and vibrotactile feedback and to identify positive and negative aspects of each interaction technique. Seven point semantic differential scales were used throughout.
5.2 Perceived Workload and User Satisfaction Results
Figure 9 shows the mean perceived workload ratings for the two interaction techniques. Perceived workload was lower (better) in all categories for IntelliTilt. These differences were significant for mental demand (Wilcoxon Z = 2.78, p = 0.005), temporal demand (Wilcoxon Z = 2.20, p = 0.03) and effort (Wilcoxon Z = 2.55, p = 0.01).
Fig. 9. Mean perceived workload ratings for the two interaction techniques
Fig. 10. Mean user satisfaction ratings (95% confidence intervals shown) (n = 16)
Figure 10 shows the mean user satisfaction ratings for the two interaction techniques. Participants were asked to rate the two techniques on a seven point semantic differential scale in terms of effectiveness, efficiency, controllability, ease of use, ease of selection and ease of use while walking. The results show that IntelliTilt received higher user satisfaction ratings for all six satisfaction questions. These differences were statistically significant for effectiveness (Wilcoxon Z = 2.19, p = 0.03), efficiency (Wilcoxon Z = 2.71, p = 0.01), controllability (Wilcoxon Z = 2.17, p = 0.03), ease of selection (Wilcoxon Z = 2.82, p = 0.01) and ease of use while walking (Wilcoxon Z = 2.13, p = 0.03). Table 2 shows the participant preference ratings from the post-test questionnaire. In Table 2, a value of 1 indicates a preference for basic tilt and 7 indicates a preference for IntelliTilt. Participants preferred IntelliTilt both while seated and while walking. The preference for IntelliTilt was stronger when participants were walking (mean of 5.88 while walking as opposed to 4.94 while seated). The difference between the mean values and the neutral value was statistically significant in both cases. Gesture zooming was preferred over SDAZ, with a mean value of 4.44. The difference between this mean value and the neutral value was not, however, statistically significant.

Table 2. Participant preferences (1 = Basic tilt, 7 = IntelliTilt). Standard deviation values are shown in parentheses (n = 16)

                                   Mean          T-stat    P-value
Preferred Technique (seated)       4.94 (1.69)   2.22      0.03
Preferred Technique (walking)      5.88 (1.45)   5.16      < 0.01
Preferred Zooming Technique        4.44 (2.00)   0.88      0.39
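The comparisons reported in this section are Wilcoxon signed-rank tests on the paired ratings and one-sample t-tests of the preference scores against the neutral scale value of 4. For readers who want to run the same style of analysis, a minimal SciPy sketch follows; the arrays hold placeholder values, not the study's data.

from scipy import stats

# one seven-point rating per participant and technique (placeholder values)
basic_tilt = [4, 3, 5, 4, 2, 3, 4, 5, 3, 4, 2, 3, 4, 3, 5, 4]
intellitilt = [5, 5, 6, 5, 4, 4, 6, 6, 5, 5, 4, 5, 6, 5, 6, 5]

# within-subjects comparison of the two techniques
w_stat, p_value = stats.wilcoxon(basic_tilt, intellitilt)
print("Wilcoxon signed-rank test: p = %.3f" % p_value)

# preference ratings tested against the neutral midpoint of the 1-7 scale
preference = [5, 7, 6, 4, 6, 7, 5, 6, 7, 5, 6, 4, 7, 6, 5, 6]
t_stat, p_value = stats.ttest_1samp(preference, 4)
print("One-sample t-test vs. neutral value 4: t = %.2f, p = %.3f" % (t_stat, p_value))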
Participants were also asked to indicate whether they felt the vibration feedback and visual feedback were useful. Mean values of 6.25 and 6.00 were recorded for vibration and visual feedback, indicating that participants found these forms of feedback to be very useful. The difference between the mean and the neutral value of four was statistically significant for vibration feedback (t15 = 7.32, p < 0.01) and for visual feedback (t15 = 6.32, p < 0.01).
5.3 Performance Results
Table 3 shows the mean task times while seated and walking for the two interaction techniques. IntelliTilt proved slightly more efficient for seated tasks, while basic tilt proved slightly more efficient for walking tasks. The mean total time for both walking and seated tasks was almost identical. None of the differences between the two techniques were statistically significant using Wilcoxon ranked paired tests and alpha values of 0.05. Mean task times for different task types are shown in Table 4. IntelliTilt achieved marginally better task times for locating and checking tasks, while basic tilt was slightly more efficient for navigating tasks. None of these differences were statistically significant using Wilcoxon ranked paired tests and alpha values of 0.05.
Table 3. Mean task times (in seconds) for seated and walking tasks (standard deviation shown in brackets) (n = 16)

              Seated            Walking           Total
Basic tilt    275.73 (60.49)    267.74 (66.38)    543.47 (113.13)
IntelliTilt   259.81 (82.54)    284.90 (91.66)    544.71 (156.98)
Table 4. Mean task times (in seconds) for different task types (standard deviation shown in brackets) (n = 16)

              Locating          Navigating        Checking
Basic tilt    124.38 (20.15)    314.16 (27.20)    104.93 (19.85)
IntelliTilt   116.47 (16.92)    328.56 (44.90)    99.68 (19.26)
5.4 Qualitative Feedback
Positive feedback about IntelliTilt focused on improved controllability. Several participants found the attractor mechanisms to be useful when performing selections. Participants also liked the visual and vibrotactile feedback. Positive feedback regarding basic tilt was received from some participants who felt that the use of SDAZ allowed them to be more efficient when locating POIs, particularly while seated. Some participants also liked the relative simplicity of this technique. Negative feedback regarding IntelliTilt mainly centered on issues with gesture zooming, with some participants struggling to get used to this form of interaction. Negative feedback about basic tilt focused on the controllability of this technique, particularly while walking. Participants complained that SDAZ was not required when following a route and took place too easily while walking. Participants who preferred gesture zooming found this form of zooming to be more intuitive than SDAZ. Several participants stated that they found SDAZ to be difficult and disorienting to use while walking. Participants who preferred the SDAZ zooming technique stated that they found this form of zooming to be more efficient and some found gesture zooming difficult to learn to use.
6 Discussion
IntelliTilt provided statistically significant improvements in three aspects of perceived workload, namely mental demand, temporal demand and effort. The improvement in mental demand is particularly encouraging, as this is one of the areas in which tilt interaction was previously identified as being inferior to keypad interaction [6]. Given the strongly positive comments regarding the use of attractors and visual and vibrotactile feedback, it is reasonable to assume that these features contributed to the improved mental demand. The user satisfaction results were all higher for IntelliTilt. Of particular interest is the fact that users felt that it was easier to perform selection operations accurately
using IntelliTilt. Given that the additional feedback and attractor mechanisms were designed to improve selection, this result is encouraging. It is also encouraging to note that participants felt that IntelliTilt was significantly easier to use while walking, since sensitivity adaptation was designed for this purpose. Participants were noticeably more relaxed when using IntelliTilt while walking and were generally able to maintain their normal walking pace. Using basic tilt, participants frequently had to slow down to maintain control. Participants made frequent use of dwell-time selection, even when both hands were available. There was very little difference in terms of efficiency for the two interaction techniques and overall task times were almost identical. No significant trends were evident in the task times for the different types of tasks. Despite the fact that there was little difference between the task times of the two interaction techniques, the participants perceived IntelliTilt to be more efficient. Given that controllability has previously been identified as a problem with tilt interaction, it is encouraging to note that participants found IntelliTilt easier to control. User feedback regarding the use of visual and vibrotactile feedback was very positive. Participants rated both forms of feedback as very useful. Qualitative comments regarding these forms of feedback were also very positive, with only one participant remarking that vibrotactile feedback might become annoying after prolonged use. IntelliTilt was preferred both while seated and while walking. This preference was stronger while walking. Several possible reasons for this exist. Firstly, IntelliTilt included sensitivity adaptation which makes allowance for the greater variability in accelerometer data while walking. Secondly, it is likely that the attractor mechanisms and feedback played a greater role in improving controllability while the user was mobile. Finally, SDAZ was perceived to be difficult to use while walking. Opinions regarding the preferred zooming technique were divided. While some users quickly mastered the gesture zooming technique, others found the learnability of this technique to be problematic. Gestures were sometimes not detected due to participants moving the phone vertically, rather than backwards or forwards along the z-axis (which is dependent on the phone’s orientation). SDAZ also resulted in divided opinions. SDAZ was identified as being useful for locating POIs, but difficult to use when following a route. Participants also found SDAZ to be difficult to use when walking.
7 Design Implications
Analysis of the above results provided insight into the design of tilt interaction for mobile map-based applications. The following list contains some implications for the design of tilt interaction for mobile devices:
1. Mechanisms should be provided to compensate for the inexact nature of tilt interaction and facilitate selection by providing automated assistance to the user.
2. Multimodal feedback can be used to improve controllability and reduce mental demand. Vibrotactile feedback allows for non-visual reinforcement that is useful in a mobile context.
3. Compensation should be made for variability in accelerometer data while walking by decreasing the sensitivity of tilt interaction.
4. One-handed selection can be facilitated using tilt interaction. Dwell-time selection can provide one possible implementation.
8 Conclusions and Future Work
This paper proposed an enhanced tilt interaction technique called IntelliTilt, which was designed to address the current shortcomings of basic tilt interaction. IntelliTilt was compared to basic tilt interaction using a prototype mobile map-based application, which allowed participants to browse maps in a wide geographic area at a wide range of zoom levels. The inclusion of walking tasks allowed IntelliTilt to be evaluated in a realistic context of use which is essential for mobile interaction research. IntelliTilt was designed to address specific shortcomings of basic tilt interaction that were previously identified. Statistically significant improvements were achieved in several important areas, including perceived mental demand, controllability and ease of selection. These results provide empirical evidence that the use of visual and vibrotactile feedback and attractor mechanisms can aid users in controlling tilt interaction and performing selections. Participants also indicated that they preferred IntelliTilt, particularly when walking, suggesting that sensitivity adaptation was beneficial in this regard. Results regarding gesture zooming were less conclusive, however, with participants divided between SDAZ and the new gesture zooming technique. Future work will involve investigating improvements to gesture zooming. The use of gyroscope and magnetometer sensors in conjunction with accelerometers will be investigated in order to improve gesture recognition and responsiveness.
References
1. Kratz, S., Brodien, I., Rohs, M.: Semi-Automatic Zooming for Mobile Map Navigation. In: Proc. MobileHCI 2010, pp. 63–72. ACM Press, New York (2010) 2. Karlson, A., Bederson, B., Contreras-Vidal, J.: Understanding Single-Handed Mobile Device Interaction. Technical Report HCIL-2006-02 (2006) 3. Cho, S.-J., Sung, Y., Murray-Smith, R., Lee, K., Choi, C., King, Y.-B.: Dynamics of Tilt-Based Browsing on Mobile Devices. In: Proc. CHI 2007, pp. 1947–1952. ACM Press, New York (2007) 4. Kratz, S., Rohs, M.: Navigating Dynamically-Generated High Quality Maps on Tilt-Sensing Mobile Devices. In: Proc. of Workshop, MEIS 2008 (2008) 5. Dong, L., Watters, C., Duffy, J.: Comparing Two One-handed Access Methods on a PDA. In: Proc. Mobile HCI 2005, pp. 235–238. ACM Press, New York (2005) 6. van Tonder, B., Wesson, J.: Is Tilt Interaction Better Than Keypad Interaction for Mobile Map-Based Applications? In: Proc. SAICSIT 2010, pp. 322–331. ACM Press, New York (2010) 7. Reichenbacher, T.: Mobile Cartography - Adaptive Visualisation of Geographic Information on Mobile Devices. Institut für Photogrammetrie und Kartographie, PhD Thesis, Technischen Universitat (2004)
8. Von Hunolstein, S., Zipf, A.: Towards Task-Oriented Map-based Mobile Guides. In: Proc. International Workshop “HCI in Mobile Guides”, Mobile HCI 2003 (2003) 9. Looije, R., te Brake, G., Neerincx, M.: Usability Engineering for Mobile Maps. In: Proc. Mobility 2007, pp. 532–539 (2007) 10. Rekimoto, J.: Tilting Operations for Small Screen Interfaces (Tech Note). In: Proc. UIST 1996, pp. 167–168. ACM Press, New York (1996) 11. Mantyjarvi, J., Paternò, F., Salvador, Z., Santoro, C.: Scan and Tilt –Towards Natural Interaction for Mobile Museum Guides. In: Proc. MobileHCI 2006, pp. 191–194. ACM Press, New York (2006) 12. Apple. iPhone 4, http://www.apple.com/iphone 13. Eslambolchilar, P., Murray-Smith, R.: Interact, Excite, Feel. In: Proc. the Second International Conference on Tangible and Embedded Interaction (TEI 2008), pp. 131–138. ACM Press, New York (2008) 14. Oakley, I., Ängeslevä, J., Hughes, S., O’Modhrain, S.: Tilt and Feel: Scrolling with Vibrotactile Display. In: Proc. EuroHaptics 2004, pp. 316–323 (2004) 15. Dachselt, R., Buchholz, R.: Throw and Tilt – Seamless Interaction across Devices Using Mobile Phone Gestures. In: Proc. MEIS 2008, pp. 272–278 (2008) 16. Hinckley, K., Pierce, J., Horvitz, E., Sinclair, M.: Foreground and Background Interaction with Sensor-Enhanced Mobile Devices. ACM Transactions on Computer-Human Interaction 12(1) (2005) 17. Mock, M., Rohs, M.: A GPS Tracking Application with a Tilt- and Motion-Sensing Interface. In: Proc. Workshop Mobile and Interactive Systems, MEIS 2008 (2008) 18. Rahman, M., Gustafson, S., Irani, P., Subramanian, S.: Tilt Techniques: Investigating the Dexterity of Wrist-based Input. In: Proc. CHI 2009, pp. 1943–1952. ACM Press, New York (2009) 19. Igarashi, T., Hinckley, K.: Speed-dependent automatic zooming for browsing large documents. In: Proc. 13th Annual ACM Symposium on User Interface Software and Technology, pp. 139–148. ACM Press, New York (2000) 20. Cockburn, A., Savage, J.: Comparing speed-dependent automatic zooming with traditional scroll, pan, and zoom methods. In: Proc. People and Computers XVII: British Computer Society Conference on Human Computer Interaction, pp. 87–102 (2003) 21. Jones, S., Jones, M., Marsden, G., Patel, D., Cockburn, A.: An evaluation of integrated zooming and scrolling on small screens. International Journal of Human-Computer Studies (63), 271–303 (2005) 22. Eslambolchilar, P., Murray-Smith, R.: Tilt-Based Automatic Zooming and Scaling in Mobile Devices - a state-space implementation. In: Proc. Mobile HCI 2004, pp. 120–131 (2004) 23. Cho, S.-J., Murray-Smith, R., Kim, Y.-B.: Multi-Context Photo Browsing on Mobile Devices Based on Tilt Dynamics. In: Proc. Mobile HCI 2007, pp. 190–197. ACM, New York (2007) 24. Oakley, I., O’Modhrain, S.: Tilt to Scroll: Evaluating a Motion Based Vibrotactile Mobile Interface. In: Proc. of EuroHaptics 2005, pp. 40–49 (2005) 25. Eslambolichilar, P., Williamson, J., Murray-Smith, R.: Multimodal Feedback for Tilt Controlled Speed Dependent Automatic Zooming. In: Proc. UIST 2004. ACM Press, New York (2004) 26. Eslambolchilar, P., Murray-Smith, R.: Control centric approach in designing scrolling and zooming user interfaces. International Journal of Human-Computer Studies 66(12), 838– 856 (2008)
27. Mooser, J., You, S., Neumann, U.: Large Document, Small Screen: A Camera Driven Scroll and Zoom Control for Mobile Devices. In: Proc. I3D, pp. 27–34. ACM Press, New York (2008) 28. Tuck, K.: Tilt Sensing Using Linear Accelerometers, Freescale Semiconductor (2007), http://www.freescale.com/files/sensors/doc/app_note/AN3461.pdf 29. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (task load index): results of empirical and theoretical research. Human Mental Workload, 139–183 (1988) 30. Lewis, J.R.: Psychometric evaluation of an after-scenario questionnaire for computer usability studies: The ASQ. SIGCHI Bulletin 23(1), 78–81 (1991)
Tensions in Developing a Secure Collective Information Practice - The Case of Agile Ridesharing
Kenneth Radke1,2, Margot Brereton1, Seyed Mirisaee1, Sunil Ghelawat1, Colin Boyd2, and Juan Gonzalez Nieto2
1 School of Design, 2 Information Security Institute, Queensland University of Technology, Australia
{k.radke,m.brereton,s.mirisaee,s.ghelawat,c.boyd,j.gonzaleznieto}@qut.edu.au
Abstract. Many current HCI, social networking, ubiquitous computing, and context-aware designs, in order to function, have access to, or collect, significant personal information about the user. This raises concerns about privacy and security, in both the research community and mainstream media. From a practical perspective, in the social world, secrecy and security form an ongoing accomplishment rather than something that is set up and left alone. We explore how design can support privacy as practical action, and investigate the notion of collective information practice through the privacy and security concerns of participants in a mobile social software system for ride sharing. This paper contributes an understanding of HCI security and privacy tensions, discovered while “designing in use” using a Reflective, Agile, Iterative Design (RAID) method.
Keywords: Usable privacy and security, user experience based approaches, trust, design, HCI, participation.
1 Introduction
The growth in use of tracking and data mining technologies, in order to support human activities, increasingly raises concerns about privacy and security. In this paper we explore how to address these issues through a case study of agile ridesharing, in which we are investigating how to grow the practice of ad hoc shared vehicle rides arranged in real time through mobile social software. Ridesharing software stands to benefit from tracking and data mining technologies, but the decision of whether to share information and share rides is inherently situated in social and cultural perspectives. Dourish and Anderson argue for a move away from narrow and technically focused views of privacy and security toward a holistic view of situated and collective information practice [7]. Here, we attempt to build on this view by exploring how to evolve a secure collective information practice in the ongoing design of a successful ridesharing system. In explicating the need to consider collective information practice, Dourish and Anderson [7] consider alternative views:
• Privacy as economic rationality – the trade-off between risk and reward;
• Privacy as practical action – the practical detail of what people do to maintain privacy; and
• Privacy as discursive practice – the way in which notions of privacy and security are used to categorize activities, events and settings, separating acceptable actions from unacceptable ones.
In defining collective information practice, Dourish and Anderson argue for a greater focus on the latter two approaches, explaining that rational actor economic models “are inadequate as sole explanations of privacy and security practices because they fail to capture other symbolic and social values.” Further, various studies have shown that humans do not make use of such rational decision making with respect to privacy and security [19,1]. Dourish and Anderson explain collective information practice as “collectively reproduced understandings of the ways information should be shared, managed and withheld.” The shift from privacy and security as disconnected and abstract technical practices, that can be set up and left alone, to ones that are performative, ongoing accomplishments, calls into question the separation between configuration and action that characterizes most interactive systems for privacy and security management. A collective information approach poses the question of how configuration and action can be achieved together and collectively evolved. However, the story is complicated by the issue of technical infrastructure [20,2], because social and mobile software applications for ridesharing typically need to operate across existing internet based technical infrastructures, where security is protected under a model of risk and reward. Given this situation, we pose the question, “What is a practical way to evolve a collective information practice for ridesharing building on existing infrastructures?” The approach that we propose is firstly a methodological one, drawing upon a reflective agile iterative development (RAID) method to grow a ridesharing culture, where the practicality of security can be devised collectively in the doing. Secondly, we examine the tensions in the existing approaches wherein technical capabilities may go beyond, be insufficient for or introduce conflicts with human needs, and explore how we might resolve these through evolution of collective information practice.

Table 1. Tensions between Technical and Cultural Practices

Technical practices           Cultural practices
Precise tracking              Imprecision (negotiated and necessary disclosure)
Prior disclosure and setup    Action over time and in the moment
Moderation                    Referral and reputation
Underlying infrastructure     Accountability and transparency
A potential reason for the difference between technical practices and cultural practices, outlined in Table 1, is the disparity between the necessary probabilities of success for a cyber attack, compared with an attack in a social environment. Taking a rational economic view, an attack in a social environment (such as a robbery) may need a success rate of, at least, one in ten to be worthwhile for the perpetrator; whereas in the cyber-world a one-in-a-million attack can be seen as successful [17].
Privacy and security are a pervasive aspect of how a system is designed and they cannot simply be grafted on [7]. We propose that, from a practical perspective, in the social world, security is an ongoing accomplishment rather than something that is set up and left alone, in agreement with Dourish and Anderson. However, it was not clear how a design approach can work to support the ongoing accomplishment of security and privacy. In this paper we examine a design case study in order to explore how design can support privacy as practical action.
2 Background
Agile ridesharing aims to utilise the capability of social networks and technologies such as mobile phones and web applications to facilitate people sharing vehicles and journeys. Social technologies, such as SMS, email and web applications, provide the opportunity for people to offer and request impromptu rides in real time. Previously, most mobile phone and social-network-supported ridesharing, such as Zimride, Avego and GoLoco, has been limited, due to being based on a particular phone or social network platform, due to insufficient ride matches, and due to following a standard carpooling paradigm of regular shared rides which is impractical for many people in many circumstances. An investigation of existing rideshare approaches in 2009 identified that there was a need, and potential based on new technologies, to create a system which allowed people to arrange ride sharing based on extended social networks in real time, rather than through a static matching program [4].
3 Related Work
The range of social network designs, all with privacy and security issues, is very broad. Some examples are covered by the empirical work by Patil and Lai, who investigated the privacy settings of MySpace users [13]. Privacy is generally approached as a social consideration, whereas security is seen as a technical concern, though they are closely related [7]. We argue that technical security decisions in the interface and underlying infrastructure of internet communication have such an impact on privacy that privacy needs to be considered from the perspective of the technical infrastructure and interaction with it, in order to ensure that the privacy expected from the social perspective is achieved. While we do not retreat from attempting to better support privacy as ongoing practice, the practicalities of interaction and design with technical infrastructure also need to be addressed. Lampe, Ellison and Steinfield, in their study of 1085 Facebook users, made important contributions in exploring users’ expectations of privacy [12]. They found that 90% of participants believed that no one from outside their university would read their Facebook page, and that 97% of participants believed that no law enforcement agency would look at their Facebook page. Schechter et al. created a study in which bank websites were progressively changed to become less and less secure, and the researchers determined whether the participants continued to enter their passwords into the website (which they did) [15]. Similarly, De Keukelaere et al. and Sotirakopoulos et al. examined the effectiveness of security warnings, and their work provides succinct, credible evidence that users were largely untrained in security and would not notice shortfalls in security [6,18].
The conclusion is that privacy must be designed in, and that the default privacy settings need to be both sufficient to ensure the expected privacy, without user education and input, and sufficient to allow the socio-technical system to work effectively, which creates a tension. Sasse et al. argued that existing HCI techniques are sufficient to address security issues in the design of systems [14]. This being the case, it is important that the critical questions and concerns are identified. Our study outlines the range of security and privacy issues identified in an ongoing, location-specific, social networking application and draws attention to particular tensions.
4 Iterative “Design in Use” Approach
From a methodological perspective, figuring out how to accomplish privacy and security in the doing points toward an approach that combines ethnographic study and iterative design. For this reason our agile rideshare project has adopted a reflective agile iterative development (RAID) method to explore the design requirements for an agile rideshare system [10]. In summary, the design approach aims to:
• Understand community practices through ethnographic fieldwork;
• Explore key design hypotheses by designing and deploying working investigatory prototypes for use by a segment of the community;
• Gather fragments of ethnographic data from the prototype in use;
• Build communities of use as the prototype is refined and extended;
• Understand the factors that persuade or dissuade others from joining.
The approach uses the simplest functioning technology prototypes deployed over an extended period, to understand how people use them in their daily lives to augment their activities. Thus the approach emphasizes understanding of use over feature provision and the functionality is extended in order to address pressing needs and emergent opportunities. We have employed a gradual growth strategy, as is recommended to ensure successful customer interaction [5], and in order to ensure due consideration is given to these issues. The rideshare prototype was initially developed for use among a small group of people who knew each other in order to understand basic aspects of the interaction paradigm, as reported in [9]. Following use of the simple prototype this group was able to consider practical aspects of sharing, privacy and security through use, and to consider how this needed to be enhanced in order to successfully grow the ridesharing community among known friends and also potentially to strangers. The prototyping approach is supported by interviews and group discussion. The initial rideshare prototype was designed to operate using a web browser, so that it could be accessed using any web-enabled phone, laptop and desktops, thus maximising the number of people who could participate in sharing using their own equipment. The prototype had a very limited functionality in that it only allowed people to send ride messages and information about seeking and offering rides. Even over a short four week trial of the first interface we observed a wide variety of practices and adoption, sharing and evolution of practices. In the beginning, most people sent formal ride messages by filling out form fields and few sent informal text messages because they were not revealed on the main page of the interface. However, once one participant realised
that if no formal fields were filled out, only the informal text message would be revealed in the main page of the interface, the practice of informal messaging grew and it became the predominant form of communication. Collective information practice was at work.
5 Emerging Security and Privacy Tensions in Agile Ridesharing
A number of emerging tensions between technical and cultural practices have been identified through collective information practice in our agile rideshare case study. These privacy and security tensions are listed in Table 1.
5.1 Precise Tracking versus Imprecision
While there are immense technical capabilities to track people’s location, participants had concerns about who could see their location, even at fleeting moments. For example, providing journey start and end times would allow others to identify when they were away from their home and their car. Of particular concern was when both start and end of day rides, in opposite directions, were entered a day in advance, providing a clear understanding of when the participant would not be at home. However, a participant observed that even if return journeys were entered just once, then anyone with access to that data at any point in the future would find out a potentially ongoing commitment for the participant. These concerns have been identified as real, and are similar to concerns raised in the mainstream media in Norway regarding the EU Data Retention Directive for data from mobile telephones, which allows for tracing of the user [21,22]. The ability to be imprecise was valued for other reasons. Through use of the prototype we have observed that people often give only scant information, as much as is sufficient to open a conversation. This allowed them to make a vague proposition to a broad audience and then to discuss specifics with a few people on a need-to-know basis. Further to practices that we can see developing in the field, matchmaking literature, such as [16], offers lessons in obscuring or hiding the respective parties’ personal information, while still providing relevant connections between appropriate parties, when technical assistance to do this is needed. But, most importantly, participants were able to control this information themselves in the doing, because the interface allowed such vagueness, by supporting free text messaging, rather than forcing specifics in formal fields. The tension between providing data which helps an application to function, such as journey start times and locations, and the need to obscure what is shown, both immediately and in the total picture created over time, must be investigated in many social networking and sensor-enabled applications.
5.2 Prior Disclosure versus Action over Time
There is a tendency in technical systems to ask for prior disclosure of profile information from people with a view that other users want this information to make decisions about sharing. However, in direct conflict with this, many people do not wish to provide personal information about themselves, especially to strangers, and ideally not even to “potential ride sharers”, but rather only to the person who is going
to be in the car with them. The accuracy of profile information is anyway questionable. People are more likely to come to trust other people either through social connection, referral and reputation, or through their actions over time. Another facet of this tension is the need to acquire information to allow for greater privacy and security. For example, some female participants stated they would only wish to ride with women. This is in keeping with the traditional practice that women can wait for other women to use rideshare across the East Bay of San Francisco bridge, a rideshare process which has been in place for 30 years [23]. Therefore, gender may be a reasonable question to ask potential rideshare participants. However, there was a strong reluctance by participants to enter gender and other information on the profile form, to the extent that we have now removed these questions from the profile form and provided a free text field instead (seen in Figure 1 below). This was seen as giving the users more control over what they chose to share.
Fig. 1. Original profile page (left) and current profile page (right)
Interestingly, although a key feature of agile ridesharing is the ability to bring unrelated people together, in the case of established groups, simply a recognisable nickname may be all that is required to fully detail a prospective journey. This introduced the human ethics consideration of “only collecting from the users the data required.” Sometimes participants already know all details concerning gender, address, how to meet and what the vehicle looks like, and hence this data should not be compulsory to enter, since the system does not need to know this information, only the riders do. Thus we see again collective information practices at work.
5.3 Moderation versus Referral and Reputation
Moderators, people who would vet potential new participants and scrutinise the rides posted to ensure acceptability, were discussed with the participants and the idea was discarded due to there being too much responsibility placed on one person. Possible
issues range from loss of life through to smaller offences such as pickpocketing or unsafe driving [23]. A participant used the example of a referral chain from her babysitting circle. In this case, a circle member could recommend a new potential circle member, and the circle would make a decision. Having made the decision to include the new member, each participant chose who from the group they would let babysit their children, so there was another level of individual decision and control. Referral chains may have a similar responsibility problem in much the same way as moderators, though the relationship is more direct and hence the risk is reduced. Instead of moderators and referral chains, allowing people to make their own groups gives individuals the greatest control [9]. Also discussed was the use of reputation systems [11]. While the agile ride share system makes strong use of social networking technologies on the internet, the people who are travelling together may be work colleagues, neighbours, or family members. Therefore, there were mixed feelings about reputation systems, in which the person travelled with is assessed and the score advertised on the rideshare system, although this does have potential as a method that reveals community trust built over time [9].
5.4 Underlying Infrastructure versus Accountability and Transparency
At the lowest infrastructure level was the question, “Who has access to the database of user and ride information?” Attention was drawn to the concern that although the current interface protected the privacy of the individual, once the information was in the database a future design may make the information accessible. This led to questions of how to hide the information even from the designers, while allowing filtering and searching. Possible solutions, such as predicate cryptography (which allows users with the relevant attributes to view a message, while all other users may not), create a tension due to a lack of flexibility with future designs and the lack of control and visibility by the creators of the application. A further tension exists with the expectation participants had that the service would be provided free of charge. For a prototype application with a small group of users, the design, development, maintenance, sending of SMSs to participants regarding rides, and connection and storage infrastructure costs may be included in a research budget. For a large application with millions of users, these costs would be considerable, and would typically be offset by either advertising with tracking cookies, or else by accumulating information about the participants and providing that information to interested parties, which may be the participants themselves (such as ride predictions for best times and places for rides). Both scenarios impact privacy. Finally, an issue common amongst social networking applications was identified: it is difficult for users to view what others see about them. Further, there was the realisation that participants had no control over what other participants post.
For example, even though participant RLady consciously made a decision never to publish when she was not going to be home, another participant could write “Picking RLady up from her place at 10am.” The above three concerns are indicative of tensions between individual or collective social practices that are accountable on a small scale and the implications of supporting these practices with an underlying technical infrastructure that has the capability to easily support large-scale sharing of information.
6 Conclusion
We developed an agile ridesharing prototype mobile social software system and trialled it in order to explore the collective information practices that might be developed through its use in organising ridesharing. The paper contributes a practical example of designing for collective information practice, an approach proposed by Dourish and Anderson. A number of emerging security and privacy tensions between technical and cultural practices have been identified through examining the collective information practice in ridesharing. These tensions are: precise tracking versus imprecision; prior disclosure and setup versus action over time and in the moment; moderation versus referral and reputation; and underlying infrastructure versus accountability and transparency. A key aspect of design involves paying attention to people’s practices and matching the system technical capability to these practices, so that technical capabilities support growth of collective information practice and do not introduce conflicts with human needs.
Acknowledgements. We gratefully acknowledge the contributions of our participants and reviewers.
References 1. Acquisti, A., Grossklags, J.: Privacy and rationality in individual decision making. In: Security & Privacy. IEEE, Los Alamitos (2005) 2. Bell, G., Dourish, P.: Yesterday’s tomorrows: notes on ubiquitous computing’s dominant vision. In: Personal and Ubiquitous Computing. Springer, Heidelberg (2007) 3. Brereton, M., Ghelawat, S.: Designing for participation in local social ridesharing networks: grass roots prototyping of IT systems. In: PDC. ACM, New York (2010) 4. Brereton, M., Roe, P., Foth, M., Bunker, J.M., Buys, L.: Designing participation in agile ridesharing with mobile social software. In: OzCHI. ACM, New York (2009) 5. Carlson, R.C.: Anatomy of a systems failure: Dial-a-ride in Santa Clara County, California, Transportation. Springer, Heidelberg (1976) 6. De Keukelaere, F., Yoshihama, S., Trent, S., Zhang, Y., Luo, L., Zurko, M.: Adaptive security dialogs for improved security behavior of users. In: Gross, T., Gulliksen, J., Kotzé, P., Oestreicher, L., Palanque, P., Prates, R.O., Winckler, M. (eds.) INTERACT 2009. LNCS, vol. 5726, pp. 510–523. Springer, Heidelberg (2009) 7. Dourish, P., Anderson, K.: Collective Information Practice: Exploring Privacy and Security as Social and Cultural Phenomena. In: HCI. Lawrence Erlbaum Assoc., Mahwah (2006) 8. Dourish, P., Grinter, B., Delgado de la Flor, J., Joseph, M.: Security in the wild: user strategies for managing security as an everyday, practical problem. In: PUC (2004) 9. Ghelawat, S., Radke, K., Brereton, M.: Interaction, privacy and profiling considerations in local mobile social software: a prototype agile ride share system. In: OzCHI. ACM, New York (2010) 10. Heyer, C., Brereton, M.: Design from the everyday: continuously evolving, embedded exploratory prototypes. In: DIS 2010. ACM, New York (2010) 11. Josang, A., Ismail, R., Boyd, C.: A survey of trust and reputation systems for online service provision. In: Decision Support Systems. Elsevier, Amsterdam (2007) 12. Lampe, C., Ellison, N., Steinfield, C.A.: Face (book) in the crowd: Social searching vs. social browsing. In: Computer Supported Cooperative Work. ACM, New York (2006)
13. Patil, S., Lai, J.: Who gets to know what when: configuring privacy permissions in an awareness application. In: SIGCHI. ACM, New York (2005) 14. Sasse, M.A., Brostoff, S., Weirich, D.: Transforming the ‘weakest link’- a human/computer interaction approach to usable and effective security. BT Tech. Journal 19(3), 122–131 (2001) 15. Schechter, S.E., Dhamija, R., Ozment, A., Fischer, I.: The Emperor’s New Security Indicators: An evaluation of website authentication and the effect of role playing on usability studies. Security and Privacy (2007) 16. Shin, J.S., Gligor, V.D.: A new privacy-enhanced matchmaking protocol. In: NDSS. Citeseer (2007) 17. Shostack, A., Stewart, A.: The New School of Information Security. Addison-Wesley Professional, Upper Saddle River (2008) 18. Sotirakopoulos, A., Hawkey, K., Beznosov, K.: “I did it because I trusted you”: Challenges with the Study Environment Biasing Participant. In: SOUPS (2010) 19. Spiekermann, S., Grossklags, J., Berendt, B.: E-privacy in 2nd generation E-commerce: privacy preferences versus actual behavior. In: EC-2001. ACM, New York (2001) 20. Star, S.L.: The ethnography of infrastructure. American behavioral scientist, Sage Publications, Thousand Oaks (1999) 21. http://eur-lex.europa.eu/LexUriServ/ LexUriServ.do?uri=CELEX:32006L0024:EN:NOT 22. http://www.zeit.de/digital/datenschutz/ 2011-03/data-protection-malte-spitz 23. http://www.ridenow.org/carpool/#locations
Choose Popovers over Buttons for iPad Questionnaires
Kevin Gaunt1, Felix M. Schmitz2, and Markus Stolze1
1 Institute for Software, University of Applied Science Rapperswil, CH-8640 Rapperswil-Jona, Switzerland
2 Institute of Medical Education, Medical Faculty of the University of Bern, CH-3010 Bern, Switzerland
{kevin.gaunt,markus.stolze}@hsr.ch, [email protected]
Abstract. When designing questionnaires for iPad, an important design decision is whether to use popover listings or button listings for representing single-choice selections. In this paper we examined the effects of each listing method on performance and subjective preferences when performing a non-linear selection task. A quantitative experiment (N = 39) with the two within-subject factors (1) listing method (popover versus button) and (2) task completion time (15s versus 7s versus 5s) was conducted. Results show subjects performing significantly better when using popovers, which they also strongly preferred. We attribute this to lower extraneous cognitive load and shorter forms, ultimately requiring less scrolling. Results also show the expected effect of task completion time on performance: the longer the allotted time, the higher the test scores.
Keywords: popover listing, button listing, single-choice questionnaires, cognitive load, performance, iPad.
that by choosing BL one step can be eliminated from the task. Thus, when using the iPad as a survey instrument (i.e. for single-choice questionnaires), PL will require more “tap” interactions when choosing a value. Additionally, by using BL all of the questions’ choices are visible at one glance. This might indicate BL to be the better performing design. However, BL also increases the chance of spurious selections occurring while interacting with the device: buttons might be pressed when the user only intended to scroll the form. Further, by displaying fewer choices, PL obviously decreases the screen’s information density at state 0 (see Figure 1). In this context, mental processes like searching for the right item when performing a non-linear selection task could be fostered because of lower extraneous cognitive load [4]. This is an indicator that PL’s design might prove to be more accurate. The question of which listing type offers a better fit for iPad questionnaires still remains: to the best of our knowledge, no empirical research has been done on this topic so far. However, related work in the field of web surveys shows radio buttons outperforming drop-down menus [2,3]. In this paper, we investigate how these findings translate to touch-screens when questionnaires are completed in a non-linear fashion.
Fig. 1. Steps needed to select a choice depending on listing type. State 0: No Selection; State 1: Displaying Popover; State 2: Choice Selected.
2 Hypotheses
We have already established that “Popover Listing” (PL) requires more “tap” interactions than “Button Listing” (BL). Thus BL could help the user to perform better. On the other hand, PL might make spurious selections less likely and could reduce cognitive load. As both listing types have their advantages, it is unknown which of them offers better performance while conducting a non-linear selection task (see H1).
H1: Performance differs depending on listing type.
Performance can be regarded as the number of successfully completed tasks within a given period of time. Therefore, we are also interested in how performance will vary depending on how much time is available for task completion. It is generally accepted that, after a certain threshold, a shorter task completion time should lead to inferior performance (see H2).
H2: Lower task completion times reduce performance.
Consequently, we are interested in whether listing type and task completion time interact. As we expect both conditions to have an effect on performance, an interaction seems possible (see H3).
H3: Depending on task completion time, the listing type influences performance.
Finally, we are interested in the users’ perception of both listing types. Good indicators for this are: the users’ speed perception, confidence perception, and their overall preference. We expect subjective perception to differ (see H4).
H4: Subjective perception differs between listing types.
3 Method
3.1 Subjects
We performed the experiment with 40 subjects (31 male, 9 female). All subjects were third year computer science students (mean age was M = 24.5 (SD = 3.27)) attending a course on human computer interaction. 20% of the subjects indicated having no prior experience with touch-screen devices (iOS or Android), 15% indicated having some experience, and 65% indicated being daily users of iOS or Android devices. All subjects were recruited during a lesson on advanced user interfaces and were fluent in German.
3.2 Design
To test our hypotheses, we performed a quantitative experiment. The subjects’ task was to identify an object of a given colour and register its occurrence on an iPad questionnaire (see Material). The experiment was a 2 x 3 within subject design. The independent variables are listed in Table 1. The dependent variables were test score (as an indicator of performance) and 3 survey ratings (as indicators of subjective preference). The test score was determined by counting every correct choice per trial. For this to be possible, the test software recorded every one of the subjects’ choices. Per trial each subject could reach a test score between 0 and 10 points. In order to avoid a sequence effect, we counterbalanced the listing type order (see Procedure). The survey ratings were compiled from a survey administered after the experiment. Besides questions regarding the subjects’ age, gender, and iOS/Android experience, the survey contained questions to determine which application the subjects perceived as faster, which application the subjects felt more confident with, and which application they preferred overall. The choices for the latter three questions were all either “Popover Listing” (PL) or “Button Listing” (BL).

Table 1. Overview of the independent variables

Independent Variable      Levels
Listing Type              Popover Listing (PL), Button Listing (BL)
Task Completion Time      15s, 7s, 5s
3.3 Material
The task-related stimulus material was selected from a pool of 500 objects of varying colours. Ten qualifying objects made up a trial. For each trial, the objects were drawn at random ahead of time. This means that all subjects were presented with the exact same series of objects. To improve discoverability within the questionnaire, objects were classified into five groups (tools, transportation, animals, sports, fruits). Every object was coded to be of one of five colours (red, purple, yellow, blue, black). The language used to express all categories, objects and colours was German. All trials (see Procedure) were assembled in a single PowerPoint presentation, which was then displayed on a 2.5 m screen using a classroom LCD projector. An introductory slide informed the participants how much time they had to select the given object before the next one would be displayed. The subsequent ten slides were the randomized objects that formed a single trial. Subjects performed the experiment on iPad (2010) devices, for which two separate applications were developed. They differed only in listing type. Figure 2 shows the application developed for "Popover Listing" (PL) and for "Button Listing" (BL). Neither application provided any long-term feedback when choosing a value, to ensure equal conditions for each trial's task. The size of the buttons or table cells representing the available colours for each object was equal for all choices and was determined by the largest label. Although iPads can be used in both portrait and landscape orientation, we made sure that the developed applications supported only portrait orientation. The devices were sufficiently charged and had all power saving features disabled.
Fig. 2. Comparison of both iPad applications, “Popover Listing” (left) and “Button Listing” (right) differing only by listing type
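The Material section above states only that each trial's objects were drawn at random ahead of time so that every subject saw exactly the same series. Purely as a hedged illustration of that procedure (the pool layout, the seed and the function name are assumptions, not details from the paper), such fixed trials could be pre-generated like this:

```python
# Hedged sketch: pre-draw the ten objects of each trial once, with a fixed
# seed, so that every subject is presented with the same series. The pool
# structure and the seed value are illustrative assumptions only.
import random

def draw_trials(object_pool, n_trials=6, objects_per_trial=10, seed=2011):
    """object_pool: list of (object_name, colour) pairs, e.g. ("Hammer", "blau")."""
    rng = random.Random(seed)  # fixed seed -> identical series for all subjects
    return [rng.sample(object_pool, objects_per_trial) for _ in range(n_trials)]
```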
3.4 Procedure Participants were divided into groups of four (totalling 10 groups). Each group was separately led to the prepared room. There they were instructed to seat themselves at any of four test beds positioned three meters from a 2.5-meter screen (angular diameter 45.24°). Before starting the experiment, the test supervisor gave a short introduction. Both the experiment procedure and its tasks were explained. Additionally, participants were told not to expect any visual feedback for their
selections. To ensure all subjects had understood the instructions, a quick training session was held. Participants were informed that the first trial would allow them 15 seconds per object to find and select the correct choice. As soon as everybody was ready, the first trial was started. An auditory signal marked the moment when a new object was displayed. This procedure repeated itself through trials 2 and 3, using task completion times of 7 and 5 seconds respectively (see Table 2). Pauses were permitted between trials if participants felt they needed them. After completing trial 3, participants were instructed to switch their applications. Specifically, participants needed to open whatever application they hadn't used in the first three trials ("Popover Listing" versus "Button Listing"). After switching applications, another training session was held. Trials 4 through 6 followed the same procedure as trials 1 through 3 (see Table 2). Finally, a brief survey on the participants' impressions of the two listing types was administered.

Table 2. Schedule for a subject starting with the "Popover Listing" application
Trial   Listing Type       Number of Tasks   Task Completion Time
1       Popover Listing    10 tasks          15 s
2       Popover Listing    10 tasks          7 s
3       Popover Listing    10 tasks          5 s
4       Button Listing     10 tasks          15 s
5       Button Listing     10 tasks          7 s
6       Button Listing     10 tasks          5 s
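As a small consistency check on the physical setup described in the procedure above, the quoted angular diameter of the screen follows directly from the 2.5 m screen width and the 3 m viewing distance:

\theta = 2 \arctan\left(\frac{2.5/2}{3}\right) \approx 2 \times 22.62^{\circ} \approx 45.24^{\circ}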
4 Results
One subject had to be excluded from the results due to failing to comprehend the experiment's tasks (N = 39). Initial analysis of the data showed an exceptionally high number of errors related to the colours blue and violet. We attribute this to the relatively small difference in colour when displayed using a standard projector. To remove this irregularity, we treated objects of the colours blue and violet as if they were of a single colour. The grand mean of test scores was M = 9.67 (SD = 0.45). Specifically, the "Popover Listing" (PL) type had the highest overall test score mean at M = 9.80 (SD = 0.36). This compares to a test score mean of M = 9.55 (SD = 0.62) for the "Button Listing" (BL) type. Regarding task completion time (TCT), the test score mean for 15 seconds was M = 9.81 (SD = 0.37), for 7 seconds it was M = 9.68 (SD = 0.72) and for 5 seconds M = 9.51 (SD = 0.75). The largest difference in test score means was observed at an interval duration of 5 seconds: PL's test score M = 9.75 (SD = 0.55), BL's test score M = 9.28 (SD = 1.15). Figure 3 illustrates the mean test scores for each factor level. Subjects using the application with the PL type had consistently higher test scores than when using the BL type application. This difference is strongly statistically significant (F(1,38) = 10.065; p = 0.003). Therefore, we accept our first hypothesis (H1). Data also showed that subjects with short TCTs (i.e. when allotted only 5 seconds per task) were more likely to commit errors than with longer TCTs. In order to test if this observation is significant, we computed within-subject contrasts for TCT and found a linear effect on
performance (F(1,38) = 4.798; p = 0.035). Surveying the mean scores for the corresponding TCT levels (see above), we accept our second hypothesis (H2). Furthermore, no interaction between listing type and task completion time could be found (F(2,76) = 2.618; p = 0.08). Consequently, we reject our third hypothesis (H3).
Fig. 3. Mean test scores per condition
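The analysis above is a 2 (listing type) x 3 (task completion time) repeated-measures ANOVA on the test scores. The paper does not state which statistics package was used; purely as a hedged illustration, an equivalent analysis could be reproduced in Python with the pingouin library, assuming a long-format table with the column names shown in the comments:

```python
# Illustrative re-analysis sketch (not the authors' code): a 2 x 3
# repeated-measures ANOVA on per-trial test scores. The file name and
# column names (subject, listing, tct, score) are assumptions.
import pandas as pd
import pingouin as pg

df = pd.read_csv("scores_long.csv")  # one row per subject x listing x tct

aov = pg.rm_anova(data=df, dv="score", within=["listing", "tct"],
                  subject="subject", detailed=True)
print(aov[["Source", "F", "p-unc"]])
```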
The survey gauged which application the participants felt more confident with (confidence perception), which application enabled them to fill in the questionnaire faster (speed perception) and which of the applications they preferred (overall preference). The results show that participants were more confident using the PL type (χ2(1) = 21.56; p < 0.001) and preferred the PL type overall (χ2(1) = 21.56; p < 0.001). Participants also perceived PL to be faster. However, the difference in speed perception is not statistically significant (χ2(1) = 3.60; p = 0.058). Based on these data, we partially accept our fourth hypothesis (H4). No significant difference between answers from subjects with different levels of expertise was observed. The survey results are listed in Table 3.
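The three preference questions are forced choices between PL and BL, so each reported χ2 value is a one-degree-of-freedom goodness-of-fit test against an even split. As a hedged illustration with hypothetical counts (the actual counts are those reported in Table 3), such a test can be computed as follows; a 34-versus-5 split of 39 respondents happens to reproduce a χ2(1) of 21.56:

```python
# Illustration only: chi-square goodness-of-fit for a forced PL-vs-BL choice,
# tested against a 50/50 expected split. The counts below are hypothetical.
from scipy.stats import chisquare

pl_votes, bl_votes = 34, 5            # hypothetical example counts
chi2, p = chisquare([pl_votes, bl_votes])
print(f"chi2(1) = {chi2:.2f}, p = {p:.4f}")  # chi2(1) = 21.56 for this split
```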
Table 3. Ratings for confidence (“which listing type did you feel more confident with?”), speed (“which listing type did you perceive as being faster?”) and overall preference (“which listing type did you prefer?”)
5 Discussion
Our experiment provides the following four results regarding our initial hypotheses: first, the listing type does have a significant effect on the subjects' performance (H1 accepted). In particular, when using the "Popover Listing" (PL) design, users perform significantly better compared to the "Button Listing" condition. Second, lower task completion times (TCT) decrease the subjects' performance significantly (H2 accepted). Third, no interaction between listing type and TCT could be observed (H3 rejected). Fourth, a statistically significant difference in the participants' subjective perception regarding their confidence and overall preference exists, favouring PL. However, the difference in participants' speed perception is not significant (H4 partially accepted). The superior performance of PL seems to contrast with prior research regarding web-based questionnaires [2,3]. However, contrary to the published research, our experimental setup did not require participants to fill in the questionnaire in a linear fashion. Thus, finding the next choice was a search task that would regularly involve scrolling. We believe this to be a major reason for BL's lower performance. The total length of the questionnaire implementing PL was half that of the questionnaire implementing BL. The use of popovers causes forms to be shorter, as the individual choices are presented only on demand. We conclude that a design that requires less scrolling (PL) outperforms a design that requires less tapping (BL). We were also interested in whether any spurious choices occurred when interacting with the device using BL's listing type. This might happen when users intend to scroll the form and accidentally trigger a choice. We investigated this by manually analyzing all BL-related error situations. In this way we identified three instances that strongly suggest the erroneous choices were the result of handling problems.
6 Conclusion
In this paper we have taken the first steps towards determining which listing type is better suited for questionnaires on the iPad. Our results are limited to single-choice selections in questionnaires that are completed in a non-linear fashion. For this context, we can confidently report that it is strongly advisable to choose popovers over buttons for iPad questionnaires. As iPads and other tablets continue to gain traction and questionnaires remain popular, the future implications of our research are compelling.
Acknowledgments. The University of Applied Science Rapperswil (HSR) and SWITCH funded this research. We would like to acknowledge Philippe Zimmermann, Hans Rudin, Stephan Schallenberger, and Michael Graf for their contributions.
References
1. Apple, Inc.: iOS Human Interface Guidelines: iOS UI Element Usage Guidelines (March 3, 2011), http://developer.apple.com/library/ios/#documentation/UserExperience/Conceptual/MobileHIG/UIElementGuidelines/UIElementGuidelines.html (retrieved June 10, 2011)
2. Johnsgard, T.J., Page, S.R., Wilson, R.D., Zeno, R.J.: A Comparison of Graphical User Interface Widgets for Various Tasks. In: Proceedings of the Human Factors and Ergonomics Society, USA, vol. 39, pp. 287–291 (1995)
3. Heerwegh, D., Loosveldt, G.: An evaluation of the effect of response formats on data quality in Web surveys. Social Science Computer Review 20(4), 471–484 (2002)
4. Mayer, R.E.: Multimedia Learning. Cambridge University Press, Cambridge (2001)
Developing and Evaluating a Non-visual Memory Game
Ravi Kuber1, Matthew Tretter1, and Emma Murphy2
1 UMBC, 1000 Hilltop Circle, Baltimore MD 21250, USA
2 Dublin City University, Glasnevin, Dublin 9, Ireland
[email protected]
Abstract. This paper describes the development of a non-visual memory game based on the classic game ‘Simon™’, where users are presented with a sequence of stimuli, which they need to replicate in the same order to progress to the next level. Information is presented using a combination of speech, non-speech audio and/or haptic cues, designed to aid blind users who are often excluded from mainstream gaming applications. Findings from an empirical study have revealed that when haptic feedback was presented in combination with other modalities, users successfully replicated more sequences, compared with presenting haptic feedback alone. We suggest that when developing a non-visual game using an unfamiliar input device, speech-based feedback is presented in conjunction with haptic cues. Keywords: Audio, blind, haptics, memory games, multimodal, speech.
1 Introduction The recent surge in the availability of brain and memory training games suggests that individuals are interested in keeping their minds active, and strengthening their motor skills through interacting with the software. While these games are typically popular with children and younger adults, research has shown that cognitive training can in certain instances benefit older individuals (Lustig et al., [15]), some of whom may experience levels of memory loss. As much of the information presented via memory game interfaces is visual in nature, individuals who are blind can experience some difficulties accessing content when using their existing assistive technologies. Screen readers, which are often used to convey information from a graphical user interface through speech, inadequately handle graphics, and can omit structural information, making the process of playing a game more complex. In order to provide a more inclusive experience, a need has been identified to replace the missing structure of the interface through the use of alternative channels, such as audio and haptics.
(AudioBattleShip) which provides the user with awareness of current position, and informs him/her of key events (e.g. the resulting outcome of dropping a bomb in a cell of the contender’s battlefield). Input is made using a tablet, with a pen device aiding the triggering of events. More recently, arcade-style games have been modified specifically for blind users. Examples include an accessible version of ‘Rockband’ (Allman et al., [1]), where the user is presented with vibrations on the arm and ankle to represent the drum cues, while auditory information is used to provide feedback on performance. Morelli et al. [18] have developed ‘VI Bowling’, an exergame where blind individuals interact with a wireless controller which provides vibrotactile cues to the user’s hand. The user is able to identify the direction of the virtual pins using this feedback. Findings showed that participants were able to throw a virtual ball with an average error of 9.76 degrees, demonstrating the promise of using the ‘tactile dowsing’ method developed, to aid interaction with a gaming interface. In terms of memory games, Sjostrom [21] developed a non-visual interface specifically enabling children who are blind, to match pairs of sonified buttons together. Haptic feedback is also produced to provide structural cues to users, presented via the PHANTOM device. Once correctly identified, the pair disappear from the interface leaving a smaller selection of buttons to choose from. Wang et al. [24] developed a similar game using tactile feedback presented via the STRESS² display. Using a range of tactile rendering techniques, the researchers created a set of discernable effects, which the user can explore using his/her sense of touch, to locate matching stimuli. Evreinova et al. [6] developed a memory game targeted to the needs of deaf and hard of hearing users, where participants explore vibrotactile patterns using the Logitech IFeel mouse [14]. The parameters of frequency and duration were modified to develop the set of effects for the game. Interestingly, rather than committing the whole tactile patterns to memory, participants were noted to recall the number of bursts of tactile information associated with each tactile icon. Other strategies were also developed by users to aid game play. 2.2 Non-visual Memory While research has traditionally focused on human abilities to recognize visual stimuli, less is known about our memory for both haptic and auditory (non-speech) items. In terms of audio, parameters of pitch and tempo of melodies can be effectively recalled by users [12, 13]. In terms of touch, estimates suggest that we are able to remember between two to six pieces of tactile information (Watkins and Watkins, [25]; Mahrer and Miles, [16]). Kuber and Yu [9] found that a sequence of four pin-based tactile icons could be recognized from a wider range presented. Participants were provided with an extensive training period to commit tactile stimuli to memory, which aided retention over the month-long period. However, findings from a follow-up study showed that after a gap of four months without practice of tactile passwords, rates of accurate identification reduced to 58.3% [10]. 2.3 Non-visual Design Considerations Blind users often rely on keystrokes to navigate around an interface using a screen reader (e.g. up arrow to move up and down arrow to move down), rather than using a mouse.
Interacting with a mouse requires a strong level of hand-eye coordination, so when developing a non-visual interface using such a device, it is essential to provide additional non-visual support to aid the targeting process. Furthermore, additional assistance is needed to remain positioned upon targets after locating them, as unintentional deviations may be made. Oakley et al. [19] found that attraction effects (gravity wells) and recess effects were the most effective methods to aid the targeting process, as it was difficult to slip away from a particular object mapped to these properties. Vitense et al. [22] highlighted the benefits that force-feedback cues can bring when presented alone and in combination with other forms of feedback when exploring an interface. The researchers suggested that conditions providing haptic effects were more quickly recognized than conditions that did not provide haptic feedback. To ensure that abstract auditory cues (earcons) can be appropriately perceived when integrated with an interface, Brewster et al. [2] have suggested keeping the pitch no higher than 5 kHz and no lower than 125-150 Hz. In addition, manipulating the spatial location of the earcon can be beneficial to distinguish between effects. Findings from a later study revealed that earcons can be recalled over longer periods of time, although training techniques were found to affect recall rates (Brewster, [3]). Studies have shown that auditory feedback can be used to augment haptic cues, due to the mechanical limitations associated with force-feedback devices (McGee et al., [17]). However, research has yet to focus on the memorability of multimodal cues within memory games. The research described in this paper examines how effectively sequences of non-visual effects can be replicated, with the long-term goal of developing design guidance for interface developers interested in improving access to memory games through the use of non-visual feedback. More specifically, we have aimed to determine whether haptic feedback can be recalled more effectively if presented independently or in conjunction with other forms of feedback.
3 Development of Memory Game
'Simon™' [7] is an electronic game where the user is presented with a sequence of flashing colored lights and tones from an electronic device. The user is required to replicate the sequence by pressing the colored buttons on the device in exactly the same order as originally presented. A multimodal game has been developed based on Simon™ (Figure 1). This was designed with the aim of being accessible to users who are blind. The user is presented with the following forms of feedback: speech, non-speech audio, haptics, and graphics. The user can use one or more forms of each type of feedback to play the game. Blind users have the choice of using the keyboard, or using both the keyboard and the Logitech Wingman force-feedback mouse (Figure 2), to interact with the game. The interface contains four different-colored buttons (labeled 'red,' 'blue,' 'green' or 'orange') arranged in the shape of a cross. Each button is mapped to an earcon or speech icon, which is presented for a period of 100 ms at 60 dB using recommendations from [5] (Table 1). The buttons were recreated in the auditory domain using non-speech sound by manipulating the pitch and spatial position of a pure sine tone. Upward movements are indicated by a 1-second sine tone with an upward frequency glissando (220 Hz-880 Hz)
and a sine tone with a downward glissando indicates a downward movement (880 Hz-220 Hz). Left and right positions were implemented by adjusting the spatial position of a 1-second sine tone (440 Hz) earcon accordingly. Haptic spring effects were developed and integrated with the interface to provide guidance towards a target (a button labeled 'red,' 'blue,' 'green' or 'orange'). For example, to prompt the user to move rightwards, the mouse gently guides the user's hand towards the right-hand side of the page.
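As a hedged sketch of how the directional earcons described above could be synthesized (the sample rate and the constant-power pan law are assumptions, not details given in the paper):

```python
# Sketch: 1-second sine-tone earcons with a linear frequency glissando
# (220 Hz -> 880 Hz for "up", 880 Hz -> 220 Hz for "down") and a stereo pan
# for "left"/"right". The sample rate and pan law are illustrative choices.
import numpy as np

FS = 44100  # assumed sample rate (Hz)

def glissando(f_start, f_end, duration=1.0, fs=FS):
    """Phase-continuous sine tone with a linear frequency sweep."""
    t = np.linspace(0.0, duration, int(fs * duration), endpoint=False)
    # instantaneous phase = 2*pi * integral of f(t) dt for a linear sweep
    phase = 2 * np.pi * (f_start * t + (f_end - f_start) * t ** 2 / (2 * duration))
    return np.sin(phase)

def pan(mono, position):
    """Constant-power pan: position -1.0 = hard left, +1.0 = hard right."""
    angle = (position + 1.0) * np.pi / 4.0
    return np.column_stack((mono * np.cos(angle), mono * np.sin(angle)))

up_earcon    = glissando(220.0, 880.0)            # "move up"
down_earcon  = glissando(880.0, 220.0)            # "move down"
left_earcon  = pan(glissando(440.0, 440.0), -1.0) # 440 Hz tone panned left
right_earcon = pan(glissando(440.0, 440.0), +1.0) # 440 Hz tone panned right
```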
To play the game, the user is presented with a sequence of buttons which flash and/or play one or more of the non-visual effects described in Table 1. To complete the stage, the user must successfully select the buttons in the same order as originally presented. The system presents speech-based feedback to the user to indicate his/her completion and whether the attempt was successful or not. Other buttons are presented on the interface to enable the user to customize the forms of feedback which he/she wishes to use, and to manipulate the complexity of the game (Figure 1).

Table 1. Mappings to non-visual stimuli
             Audio                                             Speech   Haptic
Move Up      Increase in pitch (upward frequency glissando)    Up       Continuous directed guidance (upwards) to target using attraction effect
Move Down    Decrease in pitch (downward frequency glissando)  Down     Continuous directed guidance (downwards) to target using attraction effect
Move Left    Sine tone panned left                             Left     Continuous directed guidance (leftwards) to target using attraction effect
Move Right   Sine tone panned right                            Right    Continuous directed guidance (rightwards) to target using attraction effect
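The haptic column of Table 1 describes an attraction (gravity-well style) effect that continuously pulls the cursor towards the target button. A minimal sketch of how such a per-frame force could be computed for a force-feedback device; the gain and the force cap are illustrative assumptions, not values from the paper:

```python
# Hypothetical attraction/spring effect: the force is proportional to the
# remaining distance to the target and clamped so the device never saturates.
def attraction_force(cursor_xy, target_xy, gain=0.4, max_force=1.0):
    fx = gain * (target_xy[0] - cursor_xy[0])
    fy = gain * (target_xy[1] - cursor_xy[1])
    magnitude = (fx * fx + fy * fy) ** 0.5
    if magnitude > max_force:                 # clamp the force magnitude
        fx, fy = fx * max_force / magnitude, fy * max_force / magnitude
    return fx, fy                             # force to send to the device
```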
4 Main Study
The study aimed to address the following hypotheses:
• H1. Participants would achieve greater levels of accuracy replicating sequences when haptic cues are presented in composite with audio and/or speech, rather than on their own.
• H2. Participants would be able to navigate faster using a mouse while confirming actions using keystrokes, compared to using the keyboard for both activities.
Due to practical difficulties associated with obtaining a large number of blind participants, sighted participants were included within this exploratory study. Ten sighted participants (aged between 20 and 30) were blindfolded when interacting with the game, to simulate conditions of being blind. Two legally-blind individuals (one congenitally blind, one adventitiously blind, aged between 20 and 25) were also selected to participate. Participants were provided with a ten-minute period of training to familiarize them with the non-visual icons developed. Both blind participants were provided with an additional five minutes of training using the mouse, due to their lack of experience with the device. Participants were then asked to play the memory game by working through a series of different levels, ranging from 'easy' (two stimuli) to 'complex' (seven stimuli), which they needed to replicate in order to progress to the next stage. Each condition presented in Table 2 was randomized to reduce the occurrence of an order effect. A pilot study had revealed that presenting separate speech and auditory effects in composite could cause confusion. Icons were therefore developed presenting speech with variations in pitch to indicate upward or downward movement, or speech panned in a particular direction to convey to the user to move left or right. Each condition was presented twice, to gain a more comprehensive overview of results. Auditory and speech-based cues were delivered through noise-cancelling headphones. For purposes of the study, participants were asked to 'think aloud', describing their experience after performing each condition, and suggesting improvements for the feedback presented. At the end of the study, they were asked to rate perceived levels of cognitive workload experienced under each condition using a Likert scale (1-5).

Table 2. Conditions presented to participants
Method of Interaction   Feedback                     Abbreviation
Mouse with keyboard     Haptics                      MH
Mouse with keyboard     Haptics and speech           MHS
Mouse with keyboard     Audio and haptics            MAH
Mouse with keyboard     Audio, haptics and speech    MAHS
Keyboard                Speech                       KS
Keyboard                Audio                        KA
Keyboard                Audio and speech             KAS
To determine the usability of the game, the time taken to complete each level and the number of successful attempts were automatically logged by the software. Participants
were also asked to rate the level of enjoyment experienced (1 to 5), and asked to provide any comments on the methods of interaction used (e.g. use of keyboard vs mouse with keyboard).
5 Results
5.1 Replication of Non-visual Sequences
The percentage of successful attempts to replicate sequences of stimuli is shown in Figure 3. A repeated measures ANOVA showed that the level of accurate replication of sequences varied by condition (F(2.901, 31.910) = 7.023, p=0.01, Greenhouse-Geisser adjusted). While results from Figure 3 suggested that the KS, MHS and MAHS conditions could be replicated most accurately (i.e. conditions employing speech-based feedback), post-hoc analysis (Bonferroni corrected) could only confirm significant differences between the following conditions (MH vs MHS, MH vs MAH, MH vs MAHS, MH vs KS, MAHS vs KA). Findings indicated that the number of accurate replications of sequences did not differ widely depending on the method of interaction (Keyboard – M: 74.3%, SD: 25.7%; Mouse with keyboard – M: 74.5%, SD: 28.2%), and no significant effect could be detected using statistical analysis.
Fig. 3. Percentage of successful attempts by condition
The percentage of successful attempts by level is shown in Table 3. Results suggested that as the level of the game increased, accuracy declined from 95.2% when replicating a sequence of two stimuli to 32.7% for a sequence of seven stimuli (F(4.245, 55.181) = 72.160, p=0.00, Huynh-Feldt adjusted). The percentage of successful attempts increased upon the second attempt for Levels 3 (+1% difference) and
4 (+9% difference). However, as the game became more complex, resulting in an increase in the number of objects to target, the number of accurate replications on the second attempt appeared to decline.

Table 3. Percentage of successful attempts and time taken by level
Level (sequence length)   Average Time Taken (Seconds)
2 stimuli                 2.8 (SD: 1.2)
3 stimuli                 3.5 (SD: 1.4)
4 stimuli                 4.6 (SD: 2.0)
5 stimuli                 5.8 (SD: 2.5)
6 stimuli                 8.2 (SD: 3.3)
7 stimuli                 10.1 (SD: 3.8)
5.2 Time Taken to Perform Tasks
Figure 4 shows the time spent replicating sequences by condition. A significant effect was detected by condition (F(4.034, 479.92) = 5.227, p = 0.00, Greenhouse-Geisser adjusted) and by level (F(3.244, 450.929) = 294.940, p = 0.00, Greenhouse-Geisser adjusted). Results suggested that participants were on average faster performing tasks under KS, MH and MHS. Post-hoc analysis (Bonferroni corrected) confirmed differences between all levels. However, significant differences were only found between the following conditions: MH vs KA, MHS vs MAH, MHS vs KA, MAH vs KS, KS vs KA. Although the average time taken to replicate sequences was not found to vary widely between different methods of interaction with the system (Keyboard – M: 5.9 seconds, SD: 3.5 seconds; Mouse with keyboard – M: 5.8, SD: 3.7), further analysis would be needed to confirm the presence of an effect.
Fig. 4. Time taken to replicate sequences by condition
5.3 Performance of Blind and Sighted Participants Table 4 shows the results of both blind and sighted groups. The two blind participants were on average found to replicate sequences more successfully (78.0%) when compared with their sighted counterparts (73.7%). Results indicated that blind participants were able to perform tasks within a shorter period of time (4.7s, SD: 2.4s) compared to sighted participants (M: 6.1s, SD: 3.8s), when both using the keyboard or a mouse with keyboard. Table 4 shows that additional time was spent replicating sequences containing non-speech auditory feedback. Results indicated that the time taken was found to increase for both groups while progressing through levels, with a steady decrease in success when replicating sequences of stimuli. However, further study would be needed to identify whether these claims can be validated statistically. Table 4. Time taken by condition and level of sight (1-easier to 7-difficult)
Condition   Blind            Sighted
MH          4.4 (SD: 2.1)    5.8 (SD: 4.0)
MHS         4.5 (SD: 2.3)    5.8 (SD: 3.4)
MAH         5.0 (SD: 2.5)    6.5 (SD: 3.6)
MAHS        4.7 (SD: 2.1)    6.4 (SD: 4.4)
KS          4.2 (SD: 2.7)    5.7 (SD: 3.6)
KA          5.3 (SD: 2.7)    6.6 (SD: 3.8)
KAS         4.8 (SD: 2.3)    6.0 (SD: 3.7)
5.4 Usability of Interface Results indicated that levels of workload varied by condition, with larger levels experienced when using the mouse with keyboard, in comparison to solely using the keyboard (Table 5). When recalling the sequence of objects presented, participants reported the most difficulties when using the mouse and keyboard with no audio (MH), and when using the keyboard with auditory feedback (KA). Lower levels of workload were reported for conditions where speech-based feedback was presented (e.g. MHS, KS and KAS). When asked to rate each form of feedback, nine out of the twelve participants agreed with the statement that haptic and speech based cues were appropriate for use. However, only seven agreed with the same statement for audio. Four out of the twelve participants agreed with the statement that using the keyboard with the mouse, was more usable than using the keyboard alone. Nine of the twelve suggested that they were able to enjoy the experience of playing the games developed.
Table 5. Ranking of Ease of Recall and Cognitive Workload by Condition (1 = easier/lower to 7 = difficult/higher)
Condition   Recall           Workload
MH          5.9 (SD: 1.5)    5.8 (SD: 1.9)
MHS         3.7 (SD: 1.5)    3.6 (SD: 1.6)
MAH         4.9 (SD: 1.5)    4.8 (SD: 1.8)
MAHS        3.7 (SD: 2.2)    4.2 (SD: 1.6)
KS          2.6 (SD: 1.9)    2.3 (SD: 1.6)
KA          4.5 (SD: 2.2)    3.9 (SD: 2.1)
KAS         2.7 (SD: 1.3)    3.3 (SD: 1.4)
6 Discussion 6.1 Haptic Feedback vs Multimodal Conditions Findings confirmed that participants were able to replicate sequences more accurately when additional feedback was presented alongside haptic feedback (M: 78.2%, SD: 24.3%) compared to the unimodal presentation of the haptic effects (M: 62.5%, SD: 27.4%), providing support to H1. Although not statistically significant, results suggested that more time was taken when additional feedback was presented alongside the haptic cues (M: 5.9s, SD: 3.6s), in contrast to haptics presented by itself (M: 5.5s, SD: 3.7s). The large levels of deviation experienced would be worthy of further study. Lower levels of perceived cognitive workload were experienced under conditions where speech was presented (MHS and MAHS), compared with MH and MAH. When questioned about their performance, participants suggested that the presentation of speech would provide the most informative cues which they could commit to memory, while the haptic feedback would provide supplementary feedback to aid retention and provide valuable guidance towards a target. 6.2 Method of Interaction with Game Although findings suggested that the number of accurate replications of sequences and time taken did not differ widely depending on method of interaction, no effects could be detected. Therefore, H2 could not be supported. When questioned about their performance, participants suggested that constraints of the device had slowed their progress. For example, haptic directional effects could be missed, unless the force-feedback mouse was aligned towards the center rather than the edge of its base. Absolute positioning was suggested as one method of improving interaction with the device. Lower levels of cognitive workload were reported for keyboard-only interaction (M: 3.2, SD: 1.7), compared to when using the keyboard in conjunction with the mouse (M: 4.6, SD: 1.3). This was in part attributed to the sensitivity of the mouse, meaning that small movements made using the device, would translate into larger on-screen movements which could cause confusion for the users.
As the blind participants were unfamiliar with using a mouse, it was anticipated that sighted users would be faster in terms of targeting objects using this method of interaction. Findings showed that blind participants completed tasks on average 1.4 seconds faster (SD: 1.4 seconds), with greater levels of accuracy (4.3%), compared with their sighted counterparts. Blind participants were noted to make careful, controlled movements to interact with the device. The blind participants were asked to comment on the condition which most effectively supported mouse use. The combination of speech and audio with haptic feedback (MAHS) was found by both participants to aid interaction with the game. One blind participant mentioned that the "accessibility of the game would enable her to take turns with her sighted siblings, and not be at a disadvantage due to her visual abilities." This is particularly encouraging, as research suggests that there are few immersive and collaborative games for blind and sighted users (Sanchez, [20]).
6.3 Auditory and Multimodal Feedback
Findings from our study revealed that more time was spent using a combination of audio and haptic cues (MAH - M: 6.2 s, SD: 3.5 s) compared to the majority of other conditions presented in Figure 4. Cockburn and Brewster [4] found that audio and tactile cues presented individually improved targeting times by 4.2% and 3.5%, while the combination of audio and tactile feedback reduced normal targeting times by only 1.7%. The researchers suggested that combining tactile feedback with 'stickiness' (adjustment of control-display gain) could benefit the targeting process significantly more than alternative combinations of feedback. Subjective comments received from participants in our study suggested that they experienced difficulties processing both sets of cues in tandem, accounting for the larger levels of cognitive workload expended (M: 4.8, SD: 1.8). More specifically, participants suggested that the pitch mappings used could on occasion cause confusion, requiring them to think more carefully about the direction of movement which they were meant to convey. Walker and Kramer [23] highlighted that mental models and experiences from previous metaphors can influence a user's perception of pitch movement and associated mappings. This may explain why some users did not intuitively associate the upwards or downwards glissando with the intended movements. In future versions of the system, the auditory and haptic feedback will be more carefully designed to complement one another, with the aim of reducing the levels of cognitive workload experienced.
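Cockburn and Brewster's 'stickiness' [4] mentioned above is an adjustment of control-display gain: device movement is mapped to smaller on-screen movement while the cursor is over a target, making it harder to slip off. A minimal, hedged sketch of the idea (the gain values are illustrative, not taken from [4]):

```python
# Sketch of a 'sticky' target via control-display gain: while the cursor is
# inside a target, device motion produces less on-screen motion.
# The gain values are illustrative assumptions.
def cursor_step(cursor_xy, device_dx, device_dy, over_target,
                base_gain=1.0, sticky_gain=0.3):
    gain = sticky_gain if over_target else base_gain
    return (cursor_xy[0] + gain * device_dx,
            cursor_xy[1] + gain * device_dy)
```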
Participants suggested that for longer sequences, they would on occasion attempt to guess the order of the last few remaining stimuli in order to complete the level. While participants suggested that the presence of speech aided their retention of sequences, Watkins and Watkins [25] remain cautious about the role which verbal labels play, favoring further study in the area.
6.5 Participant Selection
When evaluating accessible interfaces, researchers often encounter challenges recruiting target users. This is in part attributed to the small size of the blind community and the variability in the levels of residual sight, which may impact the way the interface is used. In response, blindfolded sighted users are recruited for preliminary or exploratory studies. Studies have shown that no major differences were identified in response times between groups of sighted and blind users (Ferres et al., [8]). However, results from the authors' study revealed that blind participants were found to select more commands compared to their sighted counterparts. As many blind users opt to use the keyboard rather than a mouse, they may require additional training to use this type of input device. Although we thought that this might create difficulties for the blind users who were unfamiliar with the mouse, results in Section 5.3 have revealed that their careful, controlled movements enabled blind participants to complete tasks with and without a mouse in a faster time than their sighted counterparts. However, we acknowledge this may not be the case for all blind users. Furthermore, the mental structural representation of interfaces differs between sighted and blind groups (Kurniawan et al., [11]). This is largely due to the restricted output from a screen reader, leading individuals who are blind to perceive that objects on an interface are spatially presented along one dimension. Because sighted users knew from previous experience with web pages that new content is predominantly in the center of the page, they would move the mouse cursor towards this point to locate targets. The case was more difficult for blind users, who were observed moving around the page searching for a landmark to orientate their position, and then searching for the particular target within the sequence. Speech-based cues were described by the blind participants as being essential for aiding interaction with an unfamiliar device such as a mouse. Further study will also need to be performed with representative users, to ensure that the system meets the diverse needs of the blind community.
7 Conclusion and Future Work
This study has demonstrated the potential of non-visual feedback in the design of accessible memory games. Observations revealed that blind users were able to interact with the Logitech Wingman force-feedback mouse and complete the games in a time comparable to using the keyboard alone. Findings have shown that blind users would benefit from haptic feedback providing directional information; however, more successful attempts at replicating sequences were found when haptic cues were presented in combination with other effects, particularly speech-based cues. This was thought to offer additional support when using an unfamiliar input device, such as a
force-feedback mouse. In the future, we aim to examine whether participants are able to strengthen their auditory and haptic memory abilities through playing the game, and to see whether with more distinctive forms of feedback, participants are able to replicate sequences within a shorter period of time. Acknowledgements. We thank Dr. Henry H. Emurian (UMBC) for his input to this work.
References
1. Allman, T., Dhillon, R.K., Landau, M.A., Kurniawan, S.H.: Rock Vibe: Rock Band® Computer Games for People with No or Limited Vision. In: ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2009), pp. 51–58. ACM Press, New York (2009)
2. Brewster, S.A., Wright, P.C., Edwards, A.D.N.: An Evaluation of Earcons for use in Auditory Human-Computer Interfaces. In: INTERACT 1993 and CHI 1993, pp. 222–227. ACM Press, New York (1993)
3. Brewster, S.A.: Navigating Telephone-Based Interfaces with Earcons. In: BCS 1997, pp. 36–39. Springer, Heidelberg (1997)
4. Cockburn, A., Brewster, S.A.: Multimodal Feedback for the Acquisition of Small Targets. Ergonomics 48(9), 1129–1150 (2005)
5. Dangerous Decibels, http://www.dangerousdecibels.org
6. Evreinova, T.G., Evreinov, G., Raisamo, R.: An Alternative Approach to Strengthening Tactile Memory for Sensory Disabled People. Universal Access in the Information Society 5(2), 189–198 (2006)
7. Hasbro, http://www.hasbro.com
8. Ferres, L., Lindgaard, G., Sumegi, L.: Evaluating a Tool for Improving Accessibility to Charts and Graphs. In: ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010), pp. 83–90. ACM Press, New York (2010)
9. Kuber, R., Yu, W.: Feasibility Study of Tactile-based Authentication. International Journal of Human-Computer Studies 68, 158–181 (2010)
10. Kuber, R., Yu, W.: Tactile vs Graphical Authentication. In: Kappers, A.M.L., van Erp, J.B.F., Bergmann Tiest, W.M., van der Helm, F.C.T. (eds.) EuroHaptics 2010. LNCS, vol. 6191, pp. 314–319. Springer, Heidelberg (2010)
11. Kurniawan, S.H., Sutcliffe, A.G., Blenkhorn, P.L.: How Blind Users’ Mental Models Affect their Perceived Usability of an Unfamiliar Screen Reader. In: INTERACT 2003, pp. 631–638. IOS Press, Amsterdam (2003)
12. Levitin, D.J.: Memory for Musical Attributes. In: Levitin, D.J. (ed.) Foundations of Cognitive Psychology: Core Readings. MIT Press, Cambridge (2002)
13. Levitin, D.J., Cook, P.R.: Absolute Memory for Musical Tempo: Additional Evidence that Auditory Memory is Absolute. Perception & Psychophysics 58, 927–935 (1996)
14. Logitech, http://www.logitech.com
15. Lustig, C., Shah, P., Seidler, R., Reuter-Lorenz, P.A.: Aging, Training, and the Brain: A Review and Future Directions. Neuropsychology Review 19(4), 504–522 (2009)
16. Mahrer, P., Miles, C.: Memorial and Strategic Determinants of Tactile Recency. Experimental Psychology 25(3), 630–643 (1999)
17. McGee, M.R., Brewster, S.A., Gray, P.D.: The Effective Combination of Haptic and Auditory Textural Information. In: Brewster, S., Murray-Smith, R. (eds.) Haptic HCI 2000. LNCS, vol. 2058, pp. 118–126. Springer, Heidelberg (2001)
18. Morelli, T., Foley, J., Folmer, E.: VI-Bowling: A Tactile Spatial Exergame for Individuals with Visual Impairments. In: ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2010), pp. 179–186 (2010)
19. Oakley, I., Adams, A., Brewster, S.A., Gray, P.D.: Guidelines for the Design of Haptic Widgets. In: BCS 2002, pp. 195–212. Springer, Heidelberg (2002)
20. Sanchez, J.H.: AudioBattleShip: Blind Learners Cognition through Sound. In: Fifth International Conference Series on Disability, Virtual Reality & Associated Technologies, ICDVRAT 2004 (2004)
21. Sjostrom, C.: Using Haptics in Computer Interfaces for Blind People. In: Extended Abstracts of CHI 2001, pp. 245–246. ACM Press, New York (2001)
22. Vitense, H., Jacko, J.A., Emery, V.K.: Multimodal Feedback: Establishing a Performance Baseline for Improved Access by Individuals with Visual Impairments. In: ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2002), pp. 49–56. ACM Press, New York (2002)
23. Walker, B.N., Kramer, G.: Mappings and Metaphors in Auditory Displays: An Experimental Assessment. ACM Transactions on Applied Perception 2(4), 407–412 (2005)
24. Wang, Q., Levesque, V., Pasquero, J., Hayward, V.: A Haptic Memory Game using the STRESS Tactile Display. In: Extended Abstracts of CHI 2006, pp. 271–274. ACM Press, New York (2006)
25. Watkins, M.J., Watkins, O.C.: A Tactile Suffix Effect. Memory and Cognition 5, 529–534 (1974)
Playing with Tactile Feedback Latency in Touchscreen Interaction: Two Approaches
Topi Kaaresoja1, Eve Hoggan2, and Emilia Anttila3
1 Nokia Research Center, Helsinki, Finland
2 Helsinki Institute of Information Technology HIIT, University of Helsinki, Finland
3 Nokia, Espoo, Finland
[email protected], [email protected], [email protected]
Abstract. A great deal of research has investigated the potential parameters of tactile feedback for virtual buttons. However, these studies do not take the possible effects of feedback latencies into account. Therefore, this research investigates the impact of tactile feedback delays on touchscreen keyboard usage. The first experiment investigated four tactile feedback delay conditions during a number entry task. The results showed that keypads with a constant delay (18 ms) and the smallest feedback delay variation were faster to use and produced fewer errors compared to conditions with wider delay variability. The experiment also produced an unexpected finding – users seemed to perceive buttons with longer delays as heavier, requiring greater force when pressing. Therefore, another experiment was conducted to investigate this phenomenon. Seven delay conditions were tested using a magnitude estimation method. The results indicate that different latencies can be used to represent tactile weight in touchscreen interaction.
Keywords: Touchscreen, latency, tactile feedback, weight.
This paper will address the following questions: How do constant and variable tactile feedback latencies affect performance and perceived pleasantness? How does the audio feedback that is inherently part of the tactile feedback affect the perception of tactile feedback latencies? It must also be noted that certain perceptual consequences, such as illusions, may occur when latency is introduced to tactile feedback. Therefore, our final research question asks whether illusions created through latency can be used as a design parameter for an interface when carefully selected and controlled. This paper reports the results of two experiments focusing on tactile latency. The first experiment investigated four different tactile feedback delay variation conditions during a number entry task on a touchscreen keypad. The second experiment tested seven different delay conditions for virtual buttons using a magnitude estimation method. This experiment was used to determine whether different latencies can be an effective way to represent tactile weight as a parameter in touchscreen interaction.
2 Background Work
The following section will briefly describe some of the existing research in the areas of tactile feedback for touchscreens and latency tolerance in relation to this work.
2.1 Touchscreens with Tactile Feedback
Recent research has mainly focused on the technology used to provide the tactile feedback from touchscreens on mobile devices and also on the advantages of augmenting visual displays with an additional modality through which information can be transmitted. Poupyrev et al. [9] introduced tactile feedback for mobile touchscreen devices. They embedded TouchEngine, a thin, miniature low-power tactile actuator, in a PDA. The actuator was designed specifically for use in mobile interfaces; it is a piezoelectric actuator that bends when a signal is applied. In this case, Poupyrev et al. used the tactile feedback as an ambient background channel of information for several applications. They conducted a formal user study into the use of tactile feedback with tilting devices, for example. Participants were required to scroll through a text list using gestures. The results of the study showed that, on average, participants could complete the tasks 22% faster when provided with tactile feedback. Nashel and Razzaque [10] added tactile cues simulating real buttons to virtual buttons displayed on mobile devices with touch screens. As the user’s finger moves over a virtual button, a ‘pop’ is presented as the finger enters a button region; a low-amplitude vibration is presented when the finger pushes the button; a short pulse is presented as the finger leaves the button area; and no feedback is given when the finger is between buttons. The experiments conducted found that all participants were able to differentiate between vibration (finger over a button) and no vibration (finger not over any button). Kaaresoja et al. [11] presented a touchscreen mobile device augmented with piezoelectric tactile feedback. The actuators are positioned under a resistive touchscreen and can provide tactile feedback to a stylus or finger. The authors suggest four applications for the touchscreen tactile feedback: numerical keypad, text selection, scrolling, and drag
and drop. Lastly, Hoggan et al. [12] presented a study investigating the use of tactile feedback for touchscreen keyboards on PDAs. The tactile feedback added to the standard touchscreen buttons was made up of simple tactons, which are abstract, structured vibrotactile icons that can be used to encode multiple dimensions of information [13]. The research mentioned above shows that tactile feedback can be an effective addition to touchscreen interaction in terms of performance and user satisfaction. However, even though most tactile hardware involves at least a small amount of latency, the majority of this research does not take this into account. The promising results of the studies discussed above may be drastically reduced if users are not tolerant of latencies.
2.2 Latency
Adelstein et al. [14] experimented with the threshold for detecting a difference between two latency conditions between audio and tactile stimuli. The participants tapped a brick with the rubber tip of a hammer twice, and were asked to judge if there was a difference in the lag of the two stimulus pairs. The results suggest that the just noticeable difference is 24 ms, which was unaffected by the length of the audio stimulus. Mäki-Patola and Hämäläinen [15] studied the tolerance of a human performer for latency in gesture-controlled musical instruments. The threshold for latency perception was between 20 and 30 ms. They also found that the playing style affected the detection of latency: if slow passages of music with vibrato were played, high latencies were tolerated. Younger participants also detected the latency more easily. MacKenzie and Ware [16] reported a study on the effect of latency in a target acquisition task using the Fitts’ law paradigm. They introduced four latency conditions: 8.3, 25, 75 and 225 ms. The task was to move a small rectangle to a target with a mouse on a computer screen. They modified the target width and the distance from the target. The results showed that the maximum latency of 225 ms significantly slowed down the task, by 64% compared with the 8.3 ms reference, with an error rate 214% higher (3.6% vs. 11.6%). In our previous work [8] we investigated the effect of constant tactile feedback latency in touchscreen keypad interaction. We experimented with latencies between the virtual button press and the tactile feedback from 18 ms to 118 ms. The task was to enter numbers on a number keypad and short sentences on a QWERTY keypad. The latency was unchanged during a test block of number or character entry tasks. The results showed that users tolerate the tactile feedback latency well: performance did not drop significantly. However, an effect was found on subjective satisfaction, which drops as the delay gets longer. Although a small amount of research has been conducted in the area of tactile feedback latency for touchscreens, there has been no investigation of variable latencies or of the potential design opportunities provided by such latencies.
3 Experiment 1: Variable Delay The first experiment in this research investigated the effect of tactile feedback latency variation on user performance in touchscreen interaction. The aim of the study was to compare the efficiency, accuracy and subjective perceived pleasantness of the touchscreen use between different tactile feedback latency variation conditions.
3.1 Participants A within subjects design was used with 12 participants. Six of them were female and six were male. The age varied from 18 years to 47, average age being 28 years. All the participants were employees of Nokia Research Center. They were all right-handed. Ten of the participants had used a mobile phone for over five years and two of them for about 3-5 years. All participants claimed they write an average of three text messages per day. Four participants said they use a touchscreen device daily. Five participants said that they use touchscreen devices once in a while, and the last three participants had never used a touchscreen device. 3.2 Equipment and Stimuli Experiment 1 was conducted on a Nokia 770 Internet Tablet (Fig. 1) augmented with tactile feedback [17]. The 4-inch (90 x 54 mm) touchscreen displayed a virtual numeric keypad (Fig. 2). The experiment application was written in Python. Tactile feedback was generated with two piezo discs which were placed under the display module [17].
Fig. 1. Nokia 770 Internet Tablet enhanced with tactile feedback
Fig. 2. The touchscreen displayed a virtual keypad
We decided to study button-down feedback, as opposed to button-down and button-up, to keep the number of experimental conditions manageable and because an experiment with button-up feedback was conducted separately [8]. Button-up feedback on touchscreens can be complicated because the user’s finger tends to have been removed from the display before the feedback is presented, meaning that tactile feedback can often go unnoticed, at least when the latency is higher. The tactile pulse used in Experiment 1 is shown in Fig. 3. The rise time of the pulse was 1 ms and the fall time 4 ms, with the displacement of the display module being about 6 µm. When the user pressed a key, this pulse was given after a delay, as described below.
Fig. 3. The tactile feedback pulse used in Experiment 1. The figure shows the displacement of the display module.
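As a hedged sketch of the drive envelope described for this pulse (approximating the 1 ms rise and 4 ms fall as linear ramps; the sample rate is an arbitrary choice for illustration):

```python
# Sketch: displacement envelope of the tactile pulse, approximated as a
# 1 ms linear rise to ~6 um followed by a 4 ms linear fall back to rest.
import numpy as np

def tactile_pulse(rise_ms=1.0, fall_ms=4.0, peak_um=6.0, fs=20000):
    rise = np.linspace(0.0, peak_um, int(fs * rise_ms / 1000), endpoint=False)
    fall = np.linspace(peak_um, 0.0, int(fs * fall_ms / 1000))
    return np.concatenate((rise, fall))   # displacement in micrometres
```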
Four different tactile feedback latency variation conditions were selected. The first condition was the reference, 18 ms (the minimum latency achievable with the hardware used) with no variation. The second condition was latency variation between 18 and 36 ms (randomized equal numbers of 18 and 36 ms latencies), the third between 18 and 72 ms (randomized equal numbers of 18, 36 and 72 ms latencies), and the fourth between 18 and 144 ms (randomized equal numbers of 18, 36, 72 and 144 ms latencies). As can be seen, the maximum latency was doubled for every condition. The latency variation conditions were named 18, 18-36, 18-72 and 18-144. The delay was validated with a microphone and an oscilloscope.
3.3 Procedure
The piezo technology inherently produces audio feedback in addition to tactile feedback. In order to find out if the audio feedback has an effect, we decided to run the experiment both with and without headphones. In the first condition, typical street noise was played in the headphones to mask the audio feedback from the device. These two environmental conditions were named Tactile (with headphones) and Tactile&Audio. The task was to key in the three numbers, which appeared on the display at once, and lastly press the OK button on the virtual keypad (Fig. 2). If the participants
made an error, they could correct it with the C-button. There were altogether 55 tasks in one block. The block was repeated in all four latency conditions. There were 440 tasks in Experiment 1 altogether (55 tasks, 4 latency conditions and 2 environmental conditions). All the blocks and conditions were counterbalanced. The experiment application measured the time from the first keypress to the keypress of the OK button, and it also wrote the numbers the test user keyed in to a results file. Before starting the actual experiment, the participants were able to practice keying in a few three-digit number series. The participants were instructed to hold the device with both hands and press the virtual buttons on the touchscreen with their right thumb (Fig. 4).
Fig. 4. The participants were instructed to hold the device with both hands and press the virtual buttons with their right thumb
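The latency variation conditions defined in Section 3.2 mix equal numbers of their constituent latencies in random order within a block. Purely as a hedged illustration (the per-block number of key presses and the seed are assumptions, not values from the paper), such a schedule could be pre-generated like this:

```python
# Hedged sketch: build a randomized per-block latency schedule containing an
# (approximately) equal number of each latency of the chosen condition.
import random

CONDITIONS = {
    "18":     [18],
    "18-36":  [18, 36],
    "18-72":  [18, 36, 72],
    "18-144": [18, 36, 72, 144],
}

def latency_schedule(condition, n_presses=220, seed=None):
    values = CONDITIONS[condition]
    rng = random.Random(seed)
    schedule = values * (n_presses // len(values))  # equal share of each value
    rng.shuffle(schedule)
    return schedule  # delay (ms) to wait before each tactile pulse
```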
There was a questionnaire concerning the subjective satisfaction after each block. The participants were asked to rate their degree of agreement with the following statements using a 1-7 rating scale, where 1 meant totally disagree and 7 totally agree:
• This keypad is pleasant to use.
• I felt myself comfortable when using this keypad.
• Pressing the keypad buttons felt just like pressing physical (“real”) buttons.
• I always knew that the device received my keypress.
• I would like to buy a device with this kind of keypad.
There was also an interview after the completion of all the test blocks, containing the following questions:
1. Did you notice any difference in the feedback?
2. Did you notice that some of the feedback came after a delay?
3. What did you think about the feedback delay? Was it distracting? Did the feedback delay have some effect on your performance?
It took approximately 30 minutes to complete the whole experiment.
3.4 Results

Fig. 5 shows the average task time with 95% CI in all latency variation conditions for both the Tactile and Tactile&Audio conditions. A two-way repeated measures ANOVA showed a significant effect of the latency variation condition [F(3,33)=2.72, p=0.049]. Post-hoc Tukey HSD analysis showed that the statistically significant differences were between [Tactile&Audio 18] and [Tactile&Audio 18-144] (p<0.05), [Tactile&Audio 18-36] and [Tactile&Audio 18-144] (p<0.05), [Tactile&Audio 18] and [Tactile 18-144] (p<0.05), and [Tactile&Audio 18-36] and [Tactile 18-144] (p<0.05). In other words, the participants performed significantly faster with the reference latency (18 ms) and the smallest latency variation (18-36 ms) without headphones than with the largest latency variation (18-144 ms) either with or without headphones. The analysis did not show a statistically significant difference between the Tactile and Tactile&Audio conditions [F(1,11)=0.51, p=0.51].
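For readers who want to reproduce this kind of analysis, the sketch below shows one way to run a two-way repeated-measures ANOVA on the task times using statsmodels, with long-format data (one mean task time per participant, latency condition and environmental condition). The column names, the input file and the use of a plain Tukey HSD over the condition cells are illustrative assumptions; the paper does not state which software was used.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical long-format data: columns participant, latency ('18', '18-36',
# '18-72', '18-144'), audio ('Tactile', 'Tactile&Audio'), task_time (ms),
# with exactly one row per participant and condition cell.
df = pd.read_csv("task_times.csv")

# Two-way repeated-measures ANOVA: latency variation x environmental condition
anova = AnovaRM(df, depvar="task_time", subject="participant",
                within=["latency", "audio"]).fit()
print(anova.summary())

# Post hoc comparison over the 8 latency x audio cells
# (a plain Tukey HSD ignores the repeated-measures structure; shown only as a sketch)
df["cell"] = df["audio"] + " " + df["latency"]
print(pairwise_tukeyhsd(df["task_time"], df["cell"]))
```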
Fig. 5. The average task time with 95% CI, with and without headphones
Fig. 6 shows the average error rate with 95% CI in all latency variation conditions for both the Tactile and Tactile&Audio conditions. A two-way repeated measures ANOVA showed no significant effect of the latency variation condition [F(3,33)=2.43, p=0.07] or of the Tactile versus Tactile&Audio condition [F(1,11)=1.39, p=0.24].
Fig. 6. The average error rate with 95% CI, with and without headphones
Fig. 7 and Fig. 8 show the mean values and standard deviations of the degree of agreement with the satisfaction questionnaire statements, on a 1 to 7 (totally disagree to totally agree) scale, in the Tactile and Tactile&Audio conditions. In both conditions the keypad with the largest delay variation was clearly rated lowest on all five statements, so we can conclude that it was less pleasant to use than the keypads with smaller delay variation. In the interview, the users commented that without the audio feedback (with the headphones on) the significance of the tactile feedback was quite obvious. They noticed differences in the keypad's sensitivity and accuracy. Participants stated that the keypad with the largest delay variation required more force to press the buttons and felt somewhat unreliable. Only a couple of users noticed the delay in some of the feedback; the other 10 out of 12 participants did not notice it at all. The keypad with the shortest delay was said to be the most pleasant to use; it was fast and felt natural. Users without the headphones (Tactile&Audio condition) commented that they noticed differences in the keypad's accuracy and reliability. The keypad with the shortest delay was said to be more sensitive and reliable, and it did not require as much force to press the buttons as the other keypads. Only two of the test users said that they noticed that some feedback came after a delay; the rest did not notice it at all. However, almost all users commented that there was something seriously wrong with the keypad with the largest delay variation and that it was horrible to use, but they could not figure out the reason. Users commented that they needed to press the buttons very hard before the keypad registered their keypress.
Fig. 7. Mean values and standard deviations of subjective ratings in Tactile condition
Fig. 8. Mean values and standard deviations of subjective ratings in Tactile&Audio condition
3.5 Discussion

This study clarified the effect of tactile feedback delay variation on user performance when using a touchscreen device. The study compared the efficiency,
accuracy and subjectively perceived pleasantness of touchscreen use between different tactile feedback delay variation conditions. The experiment also investigated whether audio feedback biases the effect of tactile feedback delay variation on user performance and on the evaluation of the pleasantness of the tactile feedback. It was found that the keypad with a feedback delay variation of 18-144 ms was the slowest to use, although there was no significant increase in the error rate. The results also showed that in the satisfaction questionnaire the keypad with the largest delay variation, 18-144 ms, clearly received the lowest score on all five statements compared to the other delay conditions. It is therefore suggested that users can tolerate a tactile feedback delay varying from 0 ms to 36 ms, and even 72 ms, when using touchscreen buttons. Wider feedback delay variations, however, impact user performance, with reduced accuracy and increased task times, and are thereby unacceptable. The same conclusion can be drawn from the pleasantness evaluations, where users strongly disliked the keypad with the biggest delay variation. The pairwise comparisons between the Tactile&Audio and Tactile conditions showed no statistical differences. These results indicate that the inherent audio feedback does not have an effect on user performance or on the perceived pleasantness of touchscreen buttons incorporating tactile feedback delay variations.
4 From Performance Measurements to Illusions

The experiment described next takes the issue of tactile latency one step further by examining perceptual illusions. In the areas of HCI and UI design, delays in feedback for a user's actions are usually considered negative and to be avoided at all costs. However, the results of the experiment detailed above show that tactile feedback latency varying between 0 ms and 36 ms is acceptable in touchscreen virtual button use. In fact, in Experiment 1, most users (10 out of 12) were not aware of the delay. Interestingly, when participants were asked in post-study questionnaires about their experience during the experiments, several mentioned aspects related to the size and weight of the buttons. For example, in Experiment 1, one participant stated, "especially the last one (18-144 ms) required much more force to press the buttons" and "I needed to press the buttons very hard". A previous study [8] on constant feedback latencies produced similar findings: participants felt that they needed to apply more force to the touchscreen buttons when the delay was longer, even though the activation threshold of the touchscreen was not altered in any way. It appears that introducing different delays to tactile feedback does not so much affect latency perception as the perception of the size and weight of the button.
5 Experiment 2: Perceived Weight Study

Finding design parameters for tactile feedback to encode information is a difficult task, as the set of parameters that can be manipulated is much smaller than in sound,
for example, which reduces the number of stimuli that can be created. Researchers are trying to find effective new parameters for the design of distinguishable tactile feedback. Previous studies have investigated parameters such as rhythm, spatial location and texture [13, 18, 19]. The post-study questionnaire results in Experiment 1 suggest that latency may be a potential new parameter for tactile feedback design. Introducing latency may be sufficient to create a full-strength illusion of weight or depth. In an effort to establish weight (created using latency) as a tactile feedback design parameter, this section presents the results of an experiment investigating whether users perceive buttons with a longer delay as being heavier or larger. We present practical implementation details and guidelines based on the results, focusing on the design of heavy/light touchscreen buttons and, more specifically, on the most effective parameter design. Drawing on the results, we argue that weight can be used as an additional parameter when designing tactile feedback for touchscreens.

5.1 Related Work

Traditionally, most research into the perception of weight has concluded that the visual sense dominates the other senses. In particular, the most famous area of study is the size-weight illusion [20-22], in which, when lifting two objects of different volume but equal weight, people judge the smaller object to be heavier. More recently, however, it has been shown that the sense of touch can also affect the perception of weight. Ellis and Lederman [23] showed that the size-weight illusion is primarily a haptic phenomenon, in which haptic volume cues can create the size-weight illusion as well as, if not better than, visual volume cues. Amazeen [24] also showed a dependence of perceived heaviness on perceived volume (perceptual independence). This volume can be perceived through the sense of touch, for example by allowing users to grasp virtual objects using force feedback devices. These results suggest that it might be beneficial to investigate the use of haptic volume cues in touchscreen interaction too. Although the objects cannot be physically grasped, it may be possible to create a similar illusion of weight or volume without using visual cues.

5.2 Equipment and Stimuli

The equipment was identical to Experiment 1. The tactile pulse used in Experiment 2 is shown in Fig. 9. The rise time of the pulse was 5 ms and the fall time 4 ms, with a display module displacement of about 25 µm, producing the feeling of a soft tactile click. When the user pressed a key, this pulse was given after a delay. The delays used were 18, 38, 58, 78, 98, 118, 138 and 158 ms. Given the results of Experiment 1, participants did not wear headphones to block out the audio feedback from the actuators, as this had no statistical effect on the accuracy of the number entry tasks.
Fig. 9. The tactile feedback pulse used in Experiment 2. The figure shows the displacement of the display module.
Fig. 10. Screenshot of interface in Experiment 2
5.3 Methodology

The aim of this experiment was to investigate whether users perceive buttons with a longer delay as being heavier. A magnitude estimation method was used, in which participants assigned an arbitrary number of their own choice to the weight of a baseline stimulus. Participants were then presented with a series of further stimuli (buttons plus delay) and asked to assign a number reflecting the perceived weight of each relative to the baseline. Overall there were 28 tasks in the experiment, using a random baseline stimulus (chosen from the seven clicks with different delays). For example, a participant could be presented with the 18 ms delay stimulus as the baseline and asked to assign a weight value to the 158 ms delay stimulus in comparison to the baseline (Fig. 10). The experiment took approximately 30 minutes to complete. Fourteen participants took part: 8 male and 6 female, all employees of Nokia Research Center. The experiment used a within-subjects design in which each participant completed all magnitude estimation tasks for all seven delay categories. All tasks were counterbalanced.
5.4 Hypothesis

The hypothesis for this experiment was as follows:

1. Longer delays in tactile feedback will result in the perception of heavier buttons.
5.5 Results

During the experiment, the experimental software recorded the participants' magnitude estimations of each stimulus, as shown in Fig. 11. Given that participants could assign any weight rating they wished, all data were normalized prior to analysis.
Fig. 11. Average weight estimate for each tactile feedback delay (with standard deviations)
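The paper does not state which normalization was applied. A common choice in magnitude-estimation studies is to rescale each participant's ratings so that participants who used different numeric ranges become comparable; the sketch below divides each participant's estimates by that participant's geometric mean. This specific scheme and the column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def normalize_estimates(df):
    """Rescale each participant's weight estimates by their own geometric mean.

    Assumed columns: participant, delay_ms, estimate (raw magnitude estimate > 0).
    """
    def per_participant(group):
        gmean = np.exp(np.log(group["estimate"]).mean())
        group = group.copy()
        group["estimate_norm"] = group["estimate"] / gmean
        return group

    return df.groupby("participant", group_keys=False).apply(per_participant)

# Example usage: average normalized estimate per delay
# df = pd.read_csv("weight_estimates.csv")
# print(normalize_estimates(df).groupby("delay_ms")["estimate_norm"].mean())
```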
From the average weight estimates for each tactile feedback delay we found a significant correlation of 0.81 (r = 0.532, p = 0.05), showing a positive relationship between weight and latency. This correlation suggests that longer feedback delays result in the perception of heavier buttons. An ANOVA on the weight estimates for each tactile feedback delay showed a significant difference (F=13.69, df=6, p<0.05). As summarised in Table 1, post hoc Tukey tests with Bonferroni-corrected significance levels showed that buttons with delays of 78 to 158 ms are perceived as significantly heavier than those using an 18 ms delay (p = 0.05). Buttons with delays ranging from 98 to 158 ms are perceived as significantly heavier than those with a 38 ms delay (p = 0.05). Buttons with delays of between 118 and 158 ms are also perceived as significantly heavier than those with 58 to 78 ms delays (p = 0.05), and a 158 ms delay is also perceived as significantly heavier than a 98 ms delay (p = 0.05). Lastly, there were no significant differences in the perceived weight of touchscreen buttons with delays between 18 and 58 ms or between 118 and 158 ms.
Table 1. Summary of tactile latencies (in milliseconds) perceived to create heavier buttons
        18    38    58    78    98    118   138   158
  18
  38
  58
  78    *
  98    *     *
  118   *     *     *     *
  138   *     *     *     *
  158   *     *     *     *     *

(An asterisk indicates that the delay in the row was perceived as significantly heavier than the delay in the column.)
These results indicate that users can interpret delays as affecting the perceived 'weight' of touchscreen buttons: button presses with tactile feedback delayed by 18 ms, 78 ms, and 118 to 158 ms respectively are perceived as significantly different in weight.

5.6 Post-study Questionnaire Results

At the end of the experiment, participants were given the opportunity to try all the buttons with all delays again and were asked which button they thought was the heaviest and which the lightest. The results are shown in Fig. 12. Participants did not rate any of the buttons with short delays as 'heavy'. The 158 ms delay was voted the heaviest 57% of the time and 118 ms 43% of the time, suggesting that the perceived weight difference between 118 and 158 ms is negligible. Conversely, no long delays were chosen as the lightest buttons. The shortest delays of 18 and 38 ms were voted the lightest (with one surprising vote for 78 ms; further investigation with a larger number of participants is required).
Fig. 12. Total number of votes for the heaviest and lightest button based on delay
In the post-study questionnaire, participants were also asked to rate levels of pleasantness, comfort, realism and confidence on a scale of 0 to 7. As can be seen in Fig. 13, all buttons with all delays were rated fairly highly on all measures.
Fig. 13. Average rating for each delay (out of 7) with standard deviations
Analysis of the questionnaire results showed no significant differences except for confidence levels. A Kruskal-Wallis test (H = 12.87, df = 6, p = 0.05) followed by post hoc Dunn's tests (p = 0.05) confirmed that participants experienced significantly lower confidence when using buttons with 158 ms feedback delays compared to those with 18 and 38 ms delays. Some participants stated "it (158 ms) seems unresponsive" and "sticky (118 ms and 158 ms)", which helps to explain these low confidence levels.

5.7 Discussion

Overall, when using this range of delays, three significantly different weights can be produced using 18, 78 and 118 or 158 ms. One participant mentioned, "only C (158 ms) is unpleasant to use". Fortunately, given that feedback delays of 118 ms are also perceived as significantly heavier than 78 ms and 18 ms delays, there is no need to use a delay of 158 ms when aiming for three distinctive weights. In the future, tactile latency could be incorporated into touchscreen widgets such as soft keyboards. For example, buttons with important or crucial functions such as 'delete' or 'send' could use a longer feedback delay to create the illusion of weight.
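As a design illustration of this idea, a soft keyboard could attach a per-button tactile delay to suggest light, medium and heavy keys. The sketch below uses the three delays suggested by the results (18, 78 and 118 ms); the key names and the idea of mapping 'delete' and 'send' to heavy feedback are hypothetical choices, not part of the study.

```python
# Hypothetical mapping of perceived button weight to tactile feedback delay,
# based on the three distinguishable delays reported above.
WEIGHT_TO_DELAY_MS = {"light": 18, "medium": 78, "heavy": 118}

# Which keys should feel heavier (illustrative choice, not from the paper)
KEY_WEIGHTS = {"delete": "heavy", "send": "heavy", "shift": "medium"}

def feedback_delay_ms(key: str) -> int:
    """Return the tactile feedback delay for a key press (default: light)."""
    return WEIGHT_TO_DELAY_MS[KEY_WEIGHTS.get(key, "light")]

assert feedback_delay_ms("send") == 118
assert feedback_delay_ms("a") == 18
```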
6 Overall Discussion and Conclusions

This paper reported the results of two experiments investigating tactile feedback latencies in touchscreen interaction. The two experiments approached latency from different perspectives. The first experiment took a traditional stance, aiming to establish the levels at which variable tactile feedback latencies become detrimental to the performance and perceived pleasantness of touchscreen keypads. The second experiment focused on the potential benefits of latency as a design parameter.

The first experiment produced some interesting findings. Firstly, tactile feedback latencies with small variations between 18 and 72 ms were found to be acceptable to most users. The results show that the keypad with a feedback delay variation of 18-144 ms was the slowest to use, although there was no significant increase in the error rate. The majority of participants indicated in the satisfaction survey that high levels of variation (18 to 144 ms) were neither pleasant nor preferred. In real-world systems, users are more likely to encounter variable feedback delays than constant delays, so it is important for designers to take this into account and ensure that such variations do not exceed 72 ms. Unexpectedly, the results of the first experiment also indicate that audio feedback does not have an additional effect on the performance and perceived pleasantness of tactile feedback with variable latencies. This is a very positive finding given that most devices produce some intrinsic audio feedback when the tactile actuators vibrate, and it is reassuring to know that this extra feedback does not affect performance. It must be noted, however, that the additional audio feedback could cause annoyance to users, especially in discreet situations and environments such as libraries.

Experiment 2 was conducted to establish whether users perceive buttons with a longer tactile feedback delay as being heavier. The results show that adding tactile feedback latencies to button presses does indeed create the illusion of added button weight. More specifically, to create a set of three buttons with different perceived weights, the tactile feedback latencies should be set to 18 ms, 78 ms and 118 ms or more. It may seem unusual to encourage the introduction of feedback delays, but in this case our research has shown that tactile feedback delays often go unnoticed by users and that these delays instead create the illusion of weight. This may explain the high tolerance of users to the tactile feedback latencies in the first experiment discussed in this paper and in our previous work on constant feedback latencies [8]. In order for this new design parameter to be used effectively, it will be necessary to combine it with other tactile parameters to see whether there are any negative effects when they are used in combination. Furthermore, future studies will investigate similar touchscreen widget weight illusions with other combinations of modalities, such as audio and visual. The thresholds of this parameter will also be investigated to establish the maximum and minimum delays that can be introduced to create the illusion of weight without disturbing performance. Given the relatively low number of design parameters in the tactile modality, compared to audio for example, the addition of a weight/latency parameter can greatly increase the amount of distinguishable tactile feedback that can be created.
This new parameter may be used to enhance many different types of interface widgets and also to represent the weight of 3D objects.
References

1. Wright, M., Cassidy, R.J., Zbyszynski, M.F.: Audio and Gesture Latency Measurements on Linux and OSX. In: International Computer Music Conference, Miami, FL, USA, pp. 423–429 (2004)
2. He, D., Liu, F., Pape, D., Dawe, G., Sandin, D.: Video-Based Measurement of System Latency. In: IPT 2000, International Immersive Projection Technology Workshop, Ames, IA, USA (2000)
3. Mine, M.R.: Characterization of End-to-End Delays in Head-Mounted Display Systems. University of North Carolina (1993)
4. Zampini, M., Brown, T., Shore, D.I., Maravita, A., Roder, B., Spence, C.: Audiotactile temporal order judgements. Acta Psychologica 118, 277–291 (2005)
5. Zampini, M., Shore, D.I., Spence, C.: Audiovisual temporal order judgements. Experimental Brain Research 152, 198–210 (2003)
6. Lee, J.-H., Spence, C.: Spatiotemporal Visuotactile Interaction. In: Ferre, M. (ed.) EuroHaptics 2008. LNCS, vol. 5024, pp. 826–831. Springer, Heidelberg (2008)
7. Kaaresoja, T., Brewster, S.: Feedback is...Late: Measuring Multimodal Delays in Mobile Device Touchscreen Interaction. In: ICMI 2010. ACM, Beijing (2010)
8. Kaaresoja, T., Anttila, E., Hoggan, E.: The Effect of Tactile Feedback Latency in Touchscreen Interaction. In: World Haptics 2011. IEEE, Istanbul (2011)
9. Poupyrev, I., Maruyama, S., Rekimoto, J.: Ambient Touch: Designing Tactile Interfaces for Handheld Devices. In: UIST 2002, pp. 51–60. ACM, Paris (2002)
10. Nashel, A., Razzaque, S.: Tactile Virtual Buttons for Mobile Devices. In: CHI 2003, pp. 854–855. ACM Press, Ft. Lauderdale (2003)
11. Kaaresoja, T., Brown, L.M., Linjama, J.: Snap-Crackle-Pop: Tactile Feedback for Mobile Touch Screens. In: Eurohaptics, Paris, France, pp. 565–566 (2006)
12. Hoggan, E., Brewster, S.A., Johnston, J.: Investigating the Effectiveness of Tactile Feedback for Mobile Touchscreens. In: CHI 2008, Florence, Italy (2008)
13. Brown, L.M., Brewster, S.A., Purchase, H.C.: Multidimensional Tactons for Non-Visual Information Display in Mobile Devices. In: MobileHCI 2006. ACM, Espoo (2006)
14. Adelstein, B.D., Begault, D.R., Anderson, M.R., Wenzel, E.M.: Sensitivity to haptic-audio asynchrony. In: 5th International Conference on Multimodal Interfaces, pp. 73–76. ACM, Vancouver (2003)
15. Mäki-Patola, T., Hämäläinen, P.: Latency Tolerance for Gesture Controlled Continuous Sound Instrument Without Tactile Feedback. In: International Computer Music Conference (ICMC), Miami, USA (2004)
16. MacKenzie, S., Ware, C.: Lag as a Determinant of Human Performance in Interactive Systems. In: CHI 1993. ACM, Amsterdam (1993)
17. Laitinen, P., Mäenpää, J.: Enabling mobile haptic design: Piezoelectric actuator technology properties in hand held devices. In: HAVE 2006 (Haptic Audio Visual Environments), Ottawa, Canada (2006)
18. Hoggan, E., Brewster, S.: New Parameters for Tacton Design. In: CHI 2007. ACM, San Jose (2007)
19. Ternes, D., MacLean, K.E.: Designing Large Sets of Haptic Icons with Rhythm. In: Ferre, M. (ed.) EuroHaptics 2008. LNCS, vol. 5024, pp. 199–208. Springer, Heidelberg (2008)
20. Murray, D.J., Ellis, R.R., Bandomir, C.A., Ross, H.E.: Charpentier (1891) on the Size-Weight Illusion. Perception and Psychophysics 61(8), 1681–1685 (1999)
21. Ross, H.E., Gregory, R.L.: Weight Illusions and Weight Discrimination - A Revised Hypothesis. Q. J. Exp. Psychol. 22(2), 318–328 (1970)
22. van Mensvoort, I.M.K.: What you see is what you feel: exploiting the dominance of the visual over the haptic domain to simulate force-feedback with cursor displacements. In: DIS 2002, pp. 345–348. ACM, London (2002)
23. Ellis, R.R., Lederman, S.J.: The Role of Haptic Versus Visual Cues in the Size-Weight Illusion. Perception and Psychophysics 53(3), 315–324 (1993)
24. Amazeen, E.L.: Perceptual Independence of Size and Weight by Dynamic Touch. Journal of Experimental Psychology 26(3), 1133–1147 (1999)
The Role of Modality in Notification Performance David Warnock, Marilyn McGee-Lennon, and Stephen Brewster Glasgow Interactive Systems Group, Department of Computing Science University of Glasgow, Glasgow, G12 8QQ, UK {warnockd,mcgeemr,stephen}@dcs.gla.ac.uk http://MultiMemoHome.org
Abstract. The primary users of home care technology often have significant sensory impairments. Multimodal interaction can make home care technology more accessible and appropriate, yet most research in the field of multimodal notifications is aimed not at the home but at office or high-pressure environments. This paper presents an experiment that compared the disruptiveness and effectiveness of visual, auditory, tactile and olfactory notifications. The results showed that disruption of the primary task was the same regardless of the notification modality. It was also found that differences in notification effectiveness were due to the inherent traits of a modality, e.g. olfactory notifications were the slowest to deliver. The results of this experiment allow researchers and developers to capitalize on the different properties of multimodal techniques, with significant implications for home care technology and technology targeted at users with sensory impairments.

Keywords: Multimodal interfaces, accessibility and usability, technology in healthcare.
1 Introduction
With an ageing population throughout Europe and similar trends emerging in other countries [25], it is apparent that preparations need to be made to support a larger aged population. While the post-war baby boom is often cited as the primary reason for the predicted surge in people over retirement age, modern medicine has also increased life expectancy, suggesting that this trend will continue. As the population ages, many will develop health problems that will require care, which is usually provided by family or paid carers. Increased demand for care services will make care at home uneconomical, which could lead to people being moved from their own homes into dedicated care homes, where the quality of life is often perceived to be lower [18]. Home care technology can prevent this by helping care recipients to look after themselves, reducing their dependence on carers. This will enable them to continue living in their own homes with dignity and independence. A home reminder system is one type of home care system which can deliver alarms, warnings and information to the user. There are many applications for such reminders in the home: personal care reminders might remind the user to take
medication or eat dinner; lifestyle reminders could remind users about events and appointments; and home management reminders might remind users to lock doors and windows at night. Users of home care technology are very likely to have one or more sensory impairments such as sight or hearing loss [12]. Multimodal interaction uses one or more of the senses in order to convey information; as such it is ideal for creating more accessible systems for people with sensory impairments. Multimodal systems have the potential to allow interaction in a more appropriate way by altering the interaction modality based on factors such as user activity, social context, sensory impairment, preference, message complexity, message urgency and message sensitivity. For example, a potentially embarrassing toileting reminder might normally be delivered as a speech message, but could instead be delivered by private tactile notifications in a social setting. In order for the designers of home care systems to use a range of modalities in this manner they need to know about the relative properties, capabilities and shortcomings of a variety of multimodal technologies. This paper empirically investigates the properties and effects of notifications delivered through different modalities in order to inform the design of multimodal technology for the home.
2 Related Work
Notifications can provide valuable information, aid multitasking, deliver time-sensitive data and alert the user to important events. They can also be disruptive and irritating. McBryan & Gray [21] describe a hypothetical scenario where notifications are delivered to a user, Fred, via his mobile phone. The mobile phone notifications irritate Fred when he is at home, so the notification system re-routes messages to be delivered via his television. Context sensitivity alongside a choice of modalities has stopped Fred from disabling his phone to stop the notifications. If the phone is legitimately unavailable, or Fred is not responding to the phone notifications, important messages can still be delivered using an alternative modality. Many researchers have advocated using a range of modalities to provide more appropriate and acceptable interactions [1,2,16,19,21,22,28]. However, no guidelines or models exist to help developers include them, and most research in this area is directed towards the office or high-pressure environments such as aeroplane cockpits. In order to develop such guidelines, research is needed that reveals how disruptive certain modalities are, along with their general effectiveness as notifications. A multimodal reminder system should not be disruptive in the home environment. If a notification interrupts the user it could create stress [11,20], generate annoyance and anxiety [3], cause the user to make mistakes in the interrupted activity [17], alter the perceived difficulty of a task [3] and cause a person to speed up their activities [8,10,20]. Monk et al. [24] found that interruptions as short as 0.25 of a second were just as disruptive as interruptions 5 seconds long. Berg et al. [5] investigated the cause of slips, trips and falls in older adults and found that most were considered 'avoidable' and blamed on 'hurrying too much'. Home care developers must be aware of these adverse effects in order to minimise their impact.
The most comprehensive study into modality disruptiveness was performed by Arroyo, Selker & Stouffs [1]. Their work investigated the disruptive properties of five notification modalities: heat, smell, sound, vibration and light. Disruption was measured subjectively by the participants. The authors reported that their results were not statistically significant, providing only anecdotal evidence for differences between notification modalities. A later experiment by Arroyo & Selker [2] that considered heat and light based notifications attempted to demonstrate that disruptiveness differs with modality. Subjects were asked to play a text-based adventure game with disruptiveness measured primarily by the number of mistakes made post-notification. The data reported by the authors did not reveal significant differences between the disruptiveness of the modalities, although the authors asserted that light was superior to heat in their conclusions. Latorella [19] performed a study into the properties of multimodal notifications in aeroplane cockpits. Latorella reported that there were subtle differences between the visual and audio alerts, but that there were also interaction effects between the modalities of the task and the notification. Latorella concluded that the modality of the notification had an effect on response time, errors in the notification task, and errors in the primary task post-notification. Arroyo, Selker & Stouffs [1] suggested that previous exposure to a modality may reduce the negative effects of the notification. Other research has shown that training can reduce the disruptiveness of notifications [9], as can familiarity with the primary task [10] which may support their conclusion. This would suggest that common methods of notification (such as an audio beep) should be less disruptive than uncommon methods (such as an olfactory notification). There is a conflict in some of the literature regarding the relationship between modality and disruptiveness, and the study by Arroyo, Selker & Stouffs [1] did not provide knowledge useful to multimodal system developers. A comprehensive study is needed that will investigate how disruptive and effective notifications delivered in different modalities really are; this will help developers to include these modalities when developing multimodal systems.
3 Design and Method
An experiment with a within-subjects repeated-measures design was carried out in order to evaluate the disruptiveness and effectiveness of notifications in different modalities. Participants attempted to complete a primary task while notifications instructed the participant to carry out a brief secondary task. The primary task was a card-matching game and the secondary task was to receive a notification and acknowledge it by pressing one of three buttons. The notifications were presented in visual, audio, tactile and olfactory modalities. The independent variable was the modality of the notification and the dependent variables were the disruption in the primary task and the speed and accuracy of responses in the secondary task. The experiment was carried out with 27 participants (14 male and 13 female). The participants included 20 people in the 18-30 age group, 4 people aged 31-45 and 3 aged 46-60.
Fig. 1. A card-matching game in progress. The player has already matched 6 cards and is attempting to match the ’umbrella’ card. The timer at the bottom of the screen shows the player has 32 seconds remaining.
3.1 Primary Task
A primary task was desired that encouraged the type of cognitive workload experienced by a person in their own home. This would provide a more realistic picture of how multimodal notifications might interfere with home life. McGee-Lennon, Wolters & McBryan [22] used a digit span test to evaluate serial recall, and Arroyo & Selker [1] used a proof-reading test for their primary task. While these tasks have been repeatedly employed in experiments to build a mental workload, the type of workload they encourage is more likely to be encountered in an office environment than at home. A later experiment by Arroyo & Selker [2] used a computer game, which was more likely to produce the desired type of cognitive workload; however, their game was complex, time-consuming and difficult to interpret. The task chosen was a simple card-matching game called 'Concentration' (also known as Memory or Pairs). In Concentration, pairs of cards are presented face-down to the player. The player can then turn over two cards per turn in an attempt to find the pairs and remove them from the game (see Fig. 1). Psychologists have used this task in experiments to better understand how the brain processes information; in particular, the game has been used to evaluate how and why adults can outperform children despite the accepted view that children
Fig. 2. Icons used in the primary task
have superior visual-spatial memory skills [4,13,26]. Schumann-Hengsteler [26] suggests that the game of Concentration may not be entirely visual-spatial, arguing that adults have the ability to re-encode the information on the cards. A picture of a boat can be remembered by the picture itself, the verbal label 'boat', or simply by understanding what a boat is. In doing this, adults make more efficient use of their memory, which provides a greater performance benefit than the superior visual-spatial memory of a child player. Simple icons were displayed on the cards to allow for the 're-encoding' phenomenon described by Schumann-Hengsteler [26]. While there are various ISO standard pictogram sets, they are generally used for sign-making and most lack easily identified verbal labels. Instead, the game used simple A-Z icons taken from an online speech therapy resource website1, as shown in Fig. 2. A voluntary pilot study hosted on the project website2 was run to evaluate the primary task and identify a suitable configuration. The experiment needed to be performed quickly enough that multiple trials and conditions could be completed, but there also had to be enough cards to ensure that the task was sufficiently challenging. The pilot consisted of 37 anonymous participants who each played three games of Concentration with 16, 24 and 36 cards. The 24-card game gave an average completion time of 63.9 seconds, suggesting that a 24-card game with a 60-second time limit would allow many players to complete the game. This configuration was used in the experiment, as shown in Fig. 1. Concentration is a simple leisure activity that might well be carried out at home; it is a well-known game with very simple rules and it can quickly build a mental workload. The pilot study demonstrated that it was effective for the needs of the experiment and also allowed for the validation of the experimental measures as discussed in Sect. 3.4.
1 Speech Teach UK, http://www.speechteach.co.uk
2 The MultiMemoHome Project, http://MultiMemoHome.org
3.2 Secondary Task
Gillie & Broadbent [14] demonstrated that the complexity of the secondary task and its similarity to the primary task are the main factors in determining how disruptive a notification is. To isolate the effects of modality, the secondary task should be simple and as different from the primary task as possible. The secondary task in our study required a participant to press one of three buttons in response to a notification. These were large, physical, coloured buttons placed directly in front of the participant and fixed to the desk. Physically separate buttons encouraged the participants to look away from the screen and remove their hand from the mouse; this separation was intended to reinforce the idea that participants must 'stop' the primary task to deal with the secondary one. Similar to the experiment carried out by McGee-Lennon, Wolters & McBryan [22], the buttons were labelled with the terms "Heating", "Lights" and "Telephone" to provide home-related context, as shown in Fig. 3. This provided additional semantic links between the notifications and buttons where the delivery method could support it.
Fig. 3. Button colour & labelling
3.3 Notifications
To evaluate the differences between notifications in different modalities, a wide range of unimodal notifications was designed for the experiment. These included common notification techniques such as text and speech, along with less common notification modalities such as olfaction and an abstract visual display. Eight unimodal notifications were developed for the experiment, as shown in Table 1. The text and pictogram notifications were delivered directly into the game window at the top of the play area; no additional hardware was required for this configuration. The abstract visual display was created with a short-throw projector positioned to project a coloured light against the wall adjacent to the participant. The projector was deliberately aligned so that the projection lay in the peripheral vision of the participant, as shown in Fig. 4a. In all the audio conditions the notifications were delivered through a pair of Sennheiser HD 25-1 II closed-back headphones, as shown in Fig. 4b. These headphones helped to prevent background noise from causing interference.
Table 1. The unimodal notifications used in the experiment, grouped by sensory apparatus

Visual
  Pictogram:       Heating: Thermometer (IEC-60878) | Lights: Light (IEC-60878) | Telephone: Telephone (ISO-7001). Taken from two international standards; IEC-60878 and ISO-7001.
  Text:            Heating: "Heating" | Lights: "Lights" | Telephone: "Phone". A simple one-word message displayed in a large bold font above the game.
  Abstract Visual: Heating: Yellow Light | Lights: Green Light | Telephone: Blue Light. Projector used to shine a coloured light against the wall. The colour of the light matched the colour of the correct button.

Auditory
  Voice:           Heating: Spoken "Heating" | Lights: Spoken "Lights" | Telephone: Spoken "Phone". Created using the same synthetic voice that was used by McGee-Lennon, Wolters & McBryan [22].
  Earcon:          Heating: Acoustic Grand Piano | Lights: Clarinet | Telephone: Marimba. The Earcons had the same rhythm and varied in the sound of the instrument; taken from an experiment by McGee-Lennon, Wolters & McBryan [22].
  Auditory Icon:   Heating: Gas Ignition | Lights: Light Switch Click (x2) | Telephone: Phone Dialing Beeps. Auditory Icons of 1 second each, taken from an online sound effect archive.

Tactile
  Tacton:          Heating: multiLP | Lights: textLP | Telephone: voiceLP. Tactons varied in rhythm and were taken from an experiment by Brewster & Brown [6].

Olfactory
  Aromacon:        Heating: Dale Air "Dark Chocolate" | Lights: Dale Air "Riverside" | Telephone: Dale Air "Raspberry". Smells were selected based on information from an experiment by Brewster, McGookin & Miller [7].
Fig. 4. Experimental configurations: (a) Abstract Visual, (b) Audio, (c) Tactile, (d) Olfactory
Tactile notifications were delivered via an Engineering Acoustics Inc. C2 vibrotactile actuator powered by a small amplifier. This was secured to the top of the wrist of the participant's non-dominant hand with a stretchable bandage, as shown in Fig. 4c, in order to simulate notifications delivered from a watch. The device has a very low latency and was able to create precise tactile messages. The olfactory notifications were delivered using a Dale Air Vortex Active smell device, which has the capacity for 4 different smells. Scents are stored on 1-inch disks, which are blown by a fan to deliver the smell. Delivery times are much longer than for the other devices (as the smells take longer to reach the nose), so to ensure the smells were delivered in a reasonable time frame the delivery device was placed directly in front of the participants, as shown in Fig. 4d. The notifications used in the study were all powered by off-the-shelf technology, so any commercial product providing interaction in these modalities would be likely to offer it at a similar quality. The notifications used in the experiment are defined in Table 1, which also shows how the unimodal notifications were grouped by sensory apparatus.
3.4 Experimental Measures

Other experiments into disruptiveness have examined errors introduced into the primary task and the change in activity rate post-notification [1,2]. Latorella [19] observed that activity-rate and error-rate disruptions have distinctly different properties. Other experiments investigating the properties of Concentration have used various techniques to measure performance: Schumann-Hengsteler [26] measured the number of missed opportunities, Baker-Ward & Ornstein [4] measured the number of perfect matches, and Gellatly, Jones & Best [13] observed player strategy. Another measure was created by combining perfect matches and missed opportunities, which we called superfluous views. This is a measure of how many decisions (i.e. cards clicked) were not productive. When a card was viewed, that card was marked as 'seen'. Every subsequent viewing of that card which failed to match it to another card was considered a superfluous view. A high number of superfluous views suggested that the participant found it difficult to remember which cards were where. This metric was expressed as superfluous views per turn to prevent it being influenced by player speed or game completion. The activity rate of the participants was measured as turns per second. These measures of performance were tested as part of the pilot study described in Sect. 3.1. Superfluous views per turn was found to provide the most accurate, reliable and stable measure of error rate in the primary task, while turns per second was confirmed to be an adequate measure of activity rate.

Notification performance was measured by response accuracy and response time. Response accuracy simply checked that the button pressed corresponded to the notification delivered. Response time was measured as the time between starting notification delivery and the participant pressing a button. This does not reflect the true response time, as it is difficult to separate the delivery time from the processing time, particularly for olfactory notifications. Delivery time was considered to be an inherent trait of the modality, and as such separating it from the overall response time would not be representative of real-world performance.
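To make the superfluous-views metric defined above concrete, the sketch below counts superfluous views from a log of card views: every repeat view of an already-seen card in a turn that does not produce a match is counted as superfluous. The log format (a list of turns, each turn a pair of card identifiers plus a matched flag) is an assumption about how the game data might be recorded, not the paper's actual data format.

```python
def superfluous_views_per_turn(turns):
    """turns: list of (card_a, card_b, matched) tuples, in play order.

    A view is superfluous when a card that has already been seen is turned
    over again in a turn that does not produce a match.
    """
    seen = set()
    superfluous = 0
    for card_a, card_b, matched in turns:
        for card in (card_a, card_b):
            if card in seen and not matched:
                superfluous += 1
            seen.add(card)  # both cards of a turn are marked as seen
    return superfluous / len(turns) if turns else 0.0

# Tiny example game: 3 turns, one repeat view that fails to match
game = [("c1", "c7", False), ("c3", "c7", False), ("c7", "c7b", True)]
print(superfluous_views_per_turn(game))  # 0.333... (one superfluous view over 3 turns)
```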
3.5 Hypotheses
In order to explore the relationship between the effectiveness and disruptiveness of notifications and the modality used to deliver them, the following hypotheses were tested by the experiment.

H1: Error rate in the primary task will vary with the notification modality (measured as superfluous views per turn). Based on evidence provided by Arroyo, Selker & Stouffs [1], it was hypothesised that people will be more disrupted by notifications that they do not encounter regularly. As olfactory notifications are very uncommon, it is likely they will prove to be the most disruptive.

H2: Activity rate in the primary task will vary with the notification modality (measured as turns per second). Other work has shown that introducing notifications has a significant impact on the speed at which the primary task is performed [17]. It is hypothesised that some modalities will have a more significant impact on the speed at which the primary task is performed than others.
H3: The accuracy of notification responses will vary with notification modality (measured as the percentage of responses correct, incorrect and missed). It is hypothesised that inherent differences between the delivery methods will have a significant impact on how easy the notifications are to interpret.

H4: The speed at which participants respond to notifications will vary with the notification modality (measured as the average response time). Latorella [19] showed that audio notifications were responded to more quickly than visual ones.

3.6 Procedure
At the beginning of each trial participants were given an information sheet and consent form, then asked to take a short demographic survey to collect gender and age information. Participants were also asked to self-assess their sensory ability on a 21-point Likert scale. Each participant then completed a control condition (which had no notifications) and 4 experimental conditions. During the visual and audio conditions each participant saw only one of the delivery methods described in Sect. 3.3. Participants were counterbalanced to ensure even coverage of the different delivery methods: of the 27 participants in the visual condition, 9 received textual notifications, 9 received pictograms and 9 received abstract visual notifications (and similarly for audio). As the tactile and olfactory conditions both used a single delivery method, the same configuration was used for all 27 participants. The control condition always came first, while the experimental conditions were delivered in a random order, so the experimenter and participant were both blind to the order and type of the experimental conditions. Each experimental condition consisted of a training segment and five games (except the control condition, where training was not required). At the start of each condition a screen described the type of notification and the hardware was configured as required (see Sect. 3.3). Each participant was given the opportunity to make minor comfort adjustments (such as volume) before each condition began. Participants were trained by introducing each notification in turn and associating it with the correct button. Notifications were then delivered randomly until the participant had correctly acknowledged 6 sequential notifications; the participant was informed and corrected if a mistake was made. This training helped to ensure that each participant had fully understood the links between the notifications and buttons at the start of each experimental condition. With the training complete, the participant then played five games of Concentration. In each game 3 notifications were delivered at random points, with buffers between them to prevent overlap and a large buffer at the end of the game to ensure that quick players could not finish before all the notifications had been delivered. At the end of every game the participant was given the opportunity to rest before the next game started. Once all the conditions had been completed participants were paid and offered the opportunity to ask questions. The experiment required around 50 minutes per participant.
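A minimal sketch of how the three notification onsets per game might be drawn at random while keeping a buffer between them and before the end of the game is shown below. The specific buffer sizes are assumptions, since the paper only states that buffers were used.

```python
import random

def notification_times(game_length_s=60, n=3, min_gap_s=10, end_buffer_s=15, seed=None):
    """Pick n notification onsets with at least min_gap_s between them
    and none later than game_length_s - end_buffer_s (rejection sampling)."""
    rng = random.Random(seed)
    latest = game_length_s - end_buffer_s
    while True:
        times = sorted(rng.uniform(0, latest) for _ in range(n))
        gaps = [b - a for a, b in zip(times, times[1:])]
        if all(g >= min_gap_s for g in gaps):
            return times

print(notification_times(seed=7))
```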
4 Results
H1: Error rate in the primary task will vary with the notification modality (measured as superfluous views per turn).

Fig. 5 shows a comparison between error rate and the notification modality. A within-subjects one-way ANOVA was performed on superfluous views per turn for each modality, which showed significant differences between the means of the conditions (F(4, 101) = 16.57, p < 0.0001). Post hoc Tukey tests revealed that the significant differences were between the control and the experimental conditions; there were no significant differences in error rate between any of the conditions with notifications. The data did not support the hypothesis, suggesting that there is no connection between task disruption and the modality of the notification.

H2: Activity rate in the primary task will vary with the notification modality (measured as turns per second).
Fig. 6 shows the relationship between activity rate and notification modality. A within-subjects one-way ANOVA showed a significant effect of notification modality on the speed of the player (F(4, 101) = 8.76, p < 0.0001). Post hoc Tukey tests identified two groups, one containing the visual and audio conditions and the other containing tactile and olfactory; these two groups were significantly different from each other.
Fig. 5. Graph of Error Rate and notification modality. No notifications were delivered in the control condition. Superfluous views per turn has a theoretical range of 0-2.
Fig. 6. Graph of turns per second and modality. No notifications were delivered in the control condition.
Fig. 7. Graph of responses by modality
Fig. 8. Graph of response time and modality
Olfactory and tactile notifications introduced a temporal disruption, slowing the players, while the visual and audio conditions appeared to prompt the player to speed up, resulting in a higher rate of activity. The evidence supports the original hypothesis that notification modality will influence the rate of activity in the primary task.

H3: The accuracy of notification responses will vary with notification modality (measured as the percentage of responses correct, incorrect and missed).

The number of correct responses, as shown in Fig. 7, was evaluated with a Kruskal-Wallis one-way ANOVA, which found a significant effect of modality on correct responses (χ2(3) = 41.2, p < 0.0001). The audio and visual modalities had a high correct acknowledgement rate, with means of 90% and 96% respectively. Tactile and olfactory did not perform as well, scoring 76% and 70% respectively. Modality was also found to have an effect on incorrect responses (χ2(3) = 24.96, p < 0.0001) and missed responses (χ2(3) = 31.77, p < 0.0001). The visual notifications had the lowest number of incorrect responses at 2.5%, while the tactile condition had the highest at 16.7%. More olfactory notifications were missed than any other modality, while very few of the visual and audio notifications were missed. The evidence in this case strongly supports the hypothesis that there is an effect of modality on notification response accuracy.

H4: The speed at which participants respond to notifications will vary with the notification modality (measured as the average response time).
This hypothesis was evaluated with a within-subjects one-way ANOVA over the average response time for each modality, which found a significant effect of modality on response time (F (3, 75) = 322.09, p < 0.0001). Average response time was calculated using the response times for both correct and incorrect acknowledgements and is shown in Fig. 8.
5 Discussion
The study presented here shows that the introduction of notifications was disruptive to the primary task, causing a significant jump in the error rate. However, there were no significant differences between the modalities, implying that the notifications were all equally disruptive. Activity rate was not significantly different from the control condition for any of the experimental conditions, despite the increased time demand on the participant. However, the visual and audio conditions did result in a significantly higher activity rate than the tactile and olfactory conditions. Arroyo, Selker & Stouffs [1] had tested disruption subjectively, but found no significant differences in any objective measures. This study agrees with their results, suggesting that there is no relationship between notification modality and error rate. Cellier & Eyrolle [10] suggested that an interrupted person takes a 'mental snapshot' of the task at hand when interrupted and then uses that information to resume the task later. The results of this experiment suggest that the modality of the notification does not affect the process of pausing and resuming a task and that, as a result, the modality of the notification does not influence the error rate in the primary task. Anecdotal evidence suggests that disruption could be greater for modalities with which the user is unfamiliar [1,2]; this is backed up by research showing that disruption effects can be reduced through experience or training [9,14,23]. This experiment did not measure familiarity, but the difference between visual and auditory activity rates and tactile and olfactory activity rates could be explained by it. Given the ubiquity of visual and audio devices and the relative scarcity of tactile and olfactory ones, it seems reasonable to assume that participants were more 'familiar' with visual and auditory notifications. This might provide an increased ability to intercept and process these notifications; however, further experiments would be needed to evaluate this. The effectiveness of the notifications seems to be attributable to their individual properties. The olfactory notifications, which took the longest to deliver, showed a very long response time. Figure 7 shows that olfactory and tactile notifications, which lacked semantic links to the buttons, had the lowest number of correct responses. Tactile notifications, which are targeted and attention-grabbing, had a lower 'miss' rate than olfactory notifications but were more likely than olfactory to be misinterpreted. This suggests that the individual properties of each modality are directly linked to their effectiveness as notifications. The visual and auditory notifications performed well, as they were quick to deliver and provided a semantic link to the buttons (with the exception of the Earcons). The tactile notifications showed an unexpected 'response lag', as shown
in Fig. 8. In addition, many of the participants said that they had forgotten the button/notification associations at the start of the tactile condition, which did not occur with the other notifications. Hoggan & Brewster [15] carried out a similar experiment but did not report any problems with tactile training or recall, suggesting that this issue could be the result of poor notification design. However, the tactile notifications had more correct responses than the olfactory ones, which could simply be the participants expressing a lack of confidence in their ability to correctly identify the notifications; this could also explain the increased response time. The olfactory notifications could be said to have performed the poorest, as they had the longest response time and the lowest number of correct responses, and they also caused a decrease in player speed compared to visual or audio notifications. Smell technology is rarely examined in multimodal research; smells are difficult to control, they linger after delivery and the technology relies on limited-lifespan chemical components. For these reasons it is often unfairly dismissed as completely impractical, yet in reality artificial smells are common in the home, from air fresheners to perfumes and scented candles. This suggests that people might also be willing to accept smell-based technology in their homes. If a home care system were equipped with the same modalities used in this experiment, it would be able to adapt to a range of situations. For example, a reminder notification to lock doors may be considered non-urgent in the afternoon and be displayed textually; once night falls it becomes more important, so audio notifications might be used instead to alert the target user. Another example would be an elderly person living alone who must take certain medicines daily. To remind the person to take their medicine, a message could be sent to their TV screen; however, if the reminder system detects a social situation, any potentially embarrassing notifications could be re-routed through a private tactile device or an ambient visual display. If this person also had a visual impairment, then Auditory Icons or Earcons could be substituted for speech in the same situation. The study presented here has shown that there is no 'best modality'; all of the unimodal notifications examined in the experiment could be effective. We agree with other researchers [1,21,22,28] who have argued that a range of modalities should be used to provide more appropriate and effective interactions. Notification modalities should be chosen to suit the situation based on factors such as user preference, message sensitivity, social setting, message complexity, message urgency and user impairment.
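To illustrate how a home care system could act on these observations, the sketch below routes a notification to a modality based on urgency, social context and sensory impairment. The rules and categories are illustrative only; they are one possible reading of the scenarios described above, not a model proposed by the authors.

```python
def choose_modality(urgency, social_setting, impairments, sensitive=False):
    """Pick a notification modality for a home reminder.

    urgency: 'low' | 'high'; social_setting: True if other people are present;
    impairments: set drawn from {'vision', 'hearing'}; sensitive: privacy-sensitive message.
    """
    if sensitive and social_setting:
        return "tactile"            # private and attention-grabbing
    if urgency == "high":
        return "audio" if "hearing" not in impairments else "tactile"
    if "vision" in impairments:
        return "audio" if "hearing" not in impairments else "tactile"
    return "text"                   # low urgency, default on-screen text

print(choose_modality("low", social_setting=False, impairments=set()))        # text
print(choose_modality("high", social_setting=True, impairments={"hearing"}))  # tactile
```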
6 Conclusion and Future Work
The study presented here has shown that there is a wider range of modalities available to home care developers than is commonly used. The results of the study will help developers to include these modalities, allowing them to create home care technology that is more appropriate, acceptable and effective. This knowledge will also help to expand upon models of interruptibility in the home, such as that developed by Vastenburg, Keyson and Ridder [27]. Their model considers message urgency and user engagement to evaluate notification
acceptability, which is a step in the right direction. However, their model does not include modality or social acceptability, and it does not make provisions for sensory impairment. New models are required that satisfy the needs of home care patients. In addition, more research is needed to fully understand the acceptability of different notification modalities when delivering messages of varying urgency or importance. In conclusion, the results of this experiment show that it is the notification itself, rather than its modality, that disrupts the primary task. In addition, the results highlight how effective and practical different notification modalities can be. This will help to guide future development of multimodal technology by encouraging developers and researchers to take advantage of a wider range of modalities, allowing the adoption of techniques which are more effective, appropriate and practical. Acknowledgments. This work was funded by the EPSRC (grant number EP/G069387/1). We would like to thank the anonymous reviewers for their feedback and insights, those who participated in the study and Miss M. Vernigor for modelling the experiment.
References
1. Arroyo, E., Selker, T., Stouffs, A.: Interruptions as multimodal outputs: which are the less disruptive? In: Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, pp. 479–482 (2002)
2. Arroyo, E., Selker, T.: Self-adaptive multimodal-interruption interfaces. In: International Conference on Intelligent User Interfaces, pp. 6–11 (2003)
3. Bailey, B.P., Konstan, J.A., Carlis, J.V.: The effects of interruptions on task performance, annoyance, and anxiety in the user interface. In: Proceedings of INTERACT, pp. 593–601 (2001)
4. Baker-Ward, L., Ornstein, P.A.: Age differences in visual-spatial memory performance: do children really out-perform adults when playing Concentration? Bulletin of the Psychonomic Society 26(4), 331–332 (1988)
5. Berg, W.P., Alessio, H.M., Mills, E.M., Tong, C.: Circumstances and consequences of falls in independent community-dwelling older adults. Age and Ageing 26(4), 261–268 (1997)
6. Brewster, S., Brown, L.M.: Tactons: structured tactile messages for non-visual information display. In: ACM International Conference Proceeding Series, vol. 53, pp. 15–23 (2004)
7. Brewster, S., McGookin, D., Miller, C.: Olfoto: designing a smell-based interaction. In: Conference on Human Factors in Computing Systems, pp. 653–662. ACM, New York (2006)
8. Burmistrov, I., Leonova, A.: Do Interrupted Users Work Faster or Slower? The Micro-analysis of Computerized Text Editing Task. In: Proceedings of HCI International on Human-Computer Interaction: Theory and Practice, pp. 621–625 (2003)
9. Cades, D.M., Trafton, J.G., Boehm-Davis, D.: Mitigating disruptions: Can resuming an interrupted task be trained? In: Human Factors and Ergonomics Society Annual Meeting Proceedings, vol. 50(3), pp. 368–371. Human Factors and Ergonomics Society (2006)
10. Cellier, J.-M., Eyrolle, H.: Interference between switched tasks. Ergonomics 35(1), 25–36 (1992)
11. Cohen, S.: Aftereffects of stress on human performance and social behavior: A review of research and theory. Psychological Bulletin 88(1), 82–108 (1980)
12. Department of Health: National Service Framework for Older People (2007)
13. Gellatly, A., Jones, S., Best, A.: The Development of Skill at Concentration. Australian Journal of Psychology 40(1), 1–10 (1988)
14. Gillie, T., Broadbent, D.: What makes interruptions disruptive? A study of length, similarity, and complexity. Psychological Research 50(4), 243–250 (1989)
15. Hoggan, E., Brewster, S.: Designing audio and tactile crossmodal icons for mobile devices. In: Proceedings of the 9th International Conference on Multimodal Interfaces, pp. 162–169 (2007)
16. Hoggan, E., Crossan, A., Brewster, S., Kaaresoja, T.: Audio or tactile feedback: which modality when? In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 2253–2256. ACM, Boston (2009)
17. Kapitsa, M., Blinnikova, I.: Task performance under influence of interruptions. In: Operator Functional State: the Assessment and Prediction of Human Performance Degradation in Complex Tasks, pp. 323–329. Ios Pr. Inc., Amsterdam (2003)
18. Kane, R.A.: Long-Term Care and a Good Quality of Life: Bringing Them Closer Together. Gerontologist 41(3), 293–304 (2001)
19. Latorella, K.A.: Effects of Modality on Interrupted Flight Deck Performance: Implications for Data Link. Technical report. NASA Langley Technical Report Server (1998)
20. Mark, G., Gudith, D., Klocke, U.: The cost of interrupted work: more speed and stress. In: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 107–110 (2008)
21. McBryan, T., Gray, P.: A Model-Based Approach to Supporting Configuration in Ubiquitous Systems. In: Proceedings of Int. Conf. on Design, Specification and Verification of Interactive Systems, pp. 167–180 (2008)
22. McGee-Lennon, M.R., Wolters, M., McBryan, T.: Audio Reminders in the Home Environment. In: Proceedings of the 13th International Conference on Auditory Display, pp. 437–444 (2007)
23. Miyata, Y., Norman, D.: Psychological issues in support of multiple activities. In: User Centered Systems Design: New Perspectives on Human-Computer Interaction, ch. 13, pp. 265–284. L. Erlbaum Associates Inc., Mahwah (1986)
24. Monk, C.A., Boehm-Davis, D.A., Trafton, J.G.: Very brief interruptions result in resumption cost. In: Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society, p. 1606 (2004)
25. Office for National Statistics: National Population Projections (2008)
26. Schumann-Hengsteler, R.: Children’s and adults’ visuospatial memory: The game concentration. Journal of Genetic Psychology 157(1), 77–92 (1996)
27. Vastenburg, M.H., Keyson, D.V., Ridder, H.: Considerate home notification systems: A user study of acceptability of notifications in a living-room laboratory. International Journal of Human-Computer Studies 67(9), 814–826 (2009)
28. Warnock, D.: A Subjective Evaluation of Multimodal Notifications. In: To Appear in Proceedings of Pervasive Health 2011, Dublin, Ireland (May 2011)
Co-located Collaborative Sensemaking on a Large High-Resolution Display with Multiple Input Devices

Katherine Vogt1, Lauren Bradel2, Christopher Andrews2, Chris North2, Alex Endert2, and Duke Hutchings1

1 Department of Computing Sciences, Elon University, Elon, NC 27244, USA
{kvogt,dhutchings}@elon.edu
2 Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, USA
{lbradel1,cpa,north,aendert}@cs.vt.edu
Abstract. This study adapts existing tools (Jigsaw and a text editor) to support multiple input devices, which were then used in a co-located collaborative intelligence analysis study conducted on a large, high-resolution display. Exploring the sensemaking process and user roles in pairs of analysts, the two-hour study used a fictional data set composed of 50 short textual documents that contained a terrorist plot and subject pairs who had experience working together. The large display facilitated the paired sensemaking process, allowing teams to spatially arrange information and conduct individual work as needed. We discuss how the space and the tools affected the approach to the analysis, how the teams collaborated, and the user roles that developed. Using these findings, we suggest design guidelines for future co-located collaborative tools.

Keywords: Visual analytics, sensemaking, co-located, CSCW, large high-resolution display.
Fig. 1. Study setup, two users with their own input devices in front of the large display
1 Introduction

Large, high-resolution workspaces (such as the one shown in Fig. 1) are beneficial to intelligence analysis in that they allow for spatial information organization to act as an external representation or memory aid [5]. This advantage has been shown to help individual intelligence analysts, who were able to spatially organize and reference information. This work explores how such a workspace, allowing for these spatial strategies, can impact the strategy and workflow of a team of two users working collaboratively on an intelligence analysis task. In this environment, we provide users with a social setting in which to perform their analysis, and a shared representation in which to organize their thoughts. We analyze their process in terms of the activities and roles exemplified during their task, their use of space, and their level of collaboration. In such co-located settings (versus remote settings), it has been shown that teams experience a greater quality of communication because of subtle physical interaction cues and a stronger trust that develops with the shared experience [6]. Also, given that analysts often work with large collections of electronic documents, it is worthwhile to explore how the design of tools on large, high-resolution displays could facilitate collaboration during analysis. Further, if this environment supports collaborative work, the potential for team-based sensemaking of large document collections increases. To investigate the collaborative use of a large, high-resolution display environment, we have completed an exploratory study of two visual analytic tools: Jigsaw [7], and a simple multi-window text editor. The study we present involves synchronous, co-located collaborative sensemaking. Here, we define co-located work as multiple users, each with his or her own input devices (mouse and keyboard), working on the same computer display.
2 Related Work

Design tensions exist in collaborative tools between “individual control of the application, and support for workspace awareness” [8]. Some previous groupware tools have had difficulty achieving a balance between these extremes, either supporting the
group through consistent view sharing (“What You See Is What I See” – WYSIWIS) or the individual through relaxed view sharing [9]. However, Gutwin and Greenberg feel that a solution to this tension exists, stating that “the ideal solution would be to support both needs – show everyone the same objects as in WYSIWIS systems, but also let people move freely around the workspace, as in relaxed-WYSIWIS groupware” [8]. Single display groupware provides an interface to achieve this balance.

Single Display Groupware (SDG) concerns face-to-face collaboration around a single shared display [10]. Early SDG systems include Liveboard [11], Tivoli [12], and the Digital Whiteboard [13]. When compared to co-located multi-display groupware, SDG resulted in increased collaborative awareness [14]. Stewart et al. continued to investigate SDG systems in subsequent work ([15, 16]). They proposed that the multi-user nature of SDG systems on early displays with limited screen size “may result in reduced functionality compared with similar single-user programs” [16], although this concern can be alleviated by increasing the physical size (and resolution) of the SDG display. SDG systems using multiple input devices have been found to increase interaction between participants and keep participants “in the zone” [15]. Providing a separate mouse and keyboard to each participant has been shown to allow users to complete more work in parallel than if they were restricted to a single mouse and keyboard [17]. Multiple input devices also provide the benefit of allowing reticent users to contribute to the task [18, 19]. As a result of our desire to keep participants in the “cognitive zone” [20], given the cognitively demanding nature of sensemaking tasks, we chose to implement multiple input devices for our set-up.

Pirolli and Card illustrate the sensemaking process (Fig. 2) as the cognitive work of “making sense” of documents over the course of an investigation in order to produce a cohesive and coherent story from information interwoven across document sources [21]. This process can be broken down into two broad categories: foraging and sensemaking. The foraging loop involves extracting and filtering relevant information. The sensemaking loop represents the mental portion of sensemaking where a schema, hypothesis, and presentation are iteratively developed. The analyst is not restricted to a single entry point to this loop, and instead can enter at the top or bottom before looping through the various steps [21].

The sensemaking process has been studied and observed on large, high-resolution displays as well as multiple-monitor set-ups for individual users [5, 7, 22]. Through an ethnographic study of collaborative sensemaking of healthcare information, Paul and Reddy identified factors that support for collaborative sensemaking should address: prioritizing relevant information, the trajectories of the sensemaking activity, and activity awareness [23]. We believe that the large display used in our study provides users with the opportunity for this awareness and prioritization. Collaborative sensemaking has also been studied in terms of web searches [24, 25], as well as remote collaborative sensemaking for intelligence analysis [26].
Furthermore, collaborative sensemaking has been observed in co-located tabletop settings [27-29], although, to the best of our knowledge, co-located collaborative sensemaking applied to intelligence analysis has not been investigated on large, high-resolution vertical displays.
Fig. 2. The sensemaking loop, adapted from Pirolli and Card [21]
User performance on simple tasks, such as pattern matching, has been shown to improve when using a large, high-resolution vertical display rather than a standard single-monitor display [30]. In addition to this quantitative improvement, users were observed using more physical navigation (e.g. glancing, head/body turning) and less virtual navigation (e.g. manually switching windows or tasks, minimizing/maximizing documents) when using large, high-resolution displays such as the one shown in Fig. 1. Andrews et al. extended the benefits of using large, high-resolution displays to cognitively demanding tasks (i.e., sensemaking) [5]. We chose these displays to explore collaborative sensemaking on large vertical displays, especially the user roles that develop throughout the sensemaking process and how that process is tackled by teams of two.
3 Study Design

We have conducted an exploratory study examining the collaborative sensemaking process on a large, high-resolution display. Teams of two were asked to assume the role of intelligence analysts tasked with analyzing a collection of text documents to uncover a hidden plot against the United States. The teams were provided with one of two tools, Jigsaw or a multi-document text editor, with which they were asked to conduct their analysis. While each team was told that they were expected to work collaboratively, the nature of that collaboration was left entirely up to the participants.

3.1 Participants

We recruited eight pairs of participants (J1-J4 used Jigsaw, T1-T4 used the text editor). All pairs knew one another and had experience working together prior to the study. Six of the eight pairs were students and the other two pairs consisted of research associates
and faculty. There were four all-male groups, one all-female group, and three mixed-gender groups. Each participant was compensated $15 for participation. As a form of motivation, the solutions generated by the pairs of participants were scored and the participants received an additional financial award for the four highest scores. The rubric for evaluating the participants’ verbal and written solutions was based on the strategy for scoring Visual Analytics Science and Technology (VAST) challenges [22]. The participants earned positive points for the people, events, and locations related to the solution and negative points for those that were irrelevant or incorrect. They also received points based on the accuracy of their overall prediction of an attack.

3.2 Apparatus

Each pair of users sat in front of a large display consisting of a 4x2 grid of 30” LCD 2560x1600 pixel monitors totaling 10,240x3,200 pixels or 32 megapixels (Fig. 1). The display was slightly curved around the users, letting them view the majority, if not all, of the display in their peripheral vision. A single machine running Fedora 8 drove the display. A multi-cursor window manager based on modified versions of IceWM and x2x was used to support two independent mice and keyboards [31]. Thus, each user was able to type and use the mouse independently and simultaneously in the shared workspace. This multi-input technology allowed two windows to be “active” at the same time, allowing participants to conduct separate investigations if they chose. A whiteboard, markers, paper, and pens were also available for use. These external artifacts were provided as a result of a pilot study in which participants explicitly requested to use the whiteboard or write on sheets of paper. Each participant was provided with a rolling chair and a free-standing, rolling table holding the keyboard and mouse so that they could move around if they chose to do so. The desks and chairs were initially positioned side-by-side in the central area of the screen space.

3.3 Analytic Environment

During this exploratory study, four of the pairs (J1-J4) examined the documents within Jigsaw, a recent visual analytics tool, while the other four (T1-T4) used a basic text editor, AbiWord [32], as a contrasting tool. We chose to investigate these two tools due to the different analytical approaches the tools inherently foster. Jigsaw supports a function-based approach to analysis, allowing the tool to highlight connections between documents and entities. The text editor instead forces the participants to read each document first, and then draw connections themselves without any analytical aid. We do not intend for these two tools to be representative of all visual analytics tools. Instead, we sought to explore co-located collaborative sensemaking in two different environments. The text editor allows the user to highlight individual document sections and edit existing documents or create text notes. Teams using the text editor were also provided with a file browser in which they could search for keywords across the document collection. Jigsaw [7, 33] is a system that has been designed to support analysts; it visualizes document collections in multiple views based on the entities (people, organizations, locations, etc.) within those documents. It also allows textual search queries of the documents and entities. The views are linked by default so that exploring an entity in one visualization will simultaneously expand it in another.
This feature is controlled by the user and can be turned on or off within each view. We were not able to change Jigsaw’s source code to
allow windows to be linked separately for each participant; therefore, all Jigsaw views were connected unless the linking feature was disabled by the participant teams. Jigsaw can sort documents based on entity frequency, type, and relations. This information can be displayed in many different ways, including interactive graphs, lists, word clouds, and timelines. Jigsaw also comes equipped with a recently added Tablet view where users can record notes, label connections made between entities, identify aliases, and create timelines. As a result of the complexity of the visualizations available in Jigsaw, pairs using this visual analytics tool were given a thirty-minute tutorial prior to the start of the scenario, while pairs using the text editor only required a five-minute tutorial.

3.4 Task and Procedure

After a tutorial on Jigsaw or the text editor with a sample set of documents, each pair was given two hours to analyze a set of 50 text documents and use the information gathered to predict a future event. This scenario comes from an exercise developed to train intelligence analysts and consists of a number of synthetic intelligence reports concerning various incidents around the United States, some of which can be connected to gain insight into a potential terrorist attack. The same scenario was also used in a previous study evaluating individual analysts with Jigsaw [33]. Following the completion of the scenario, each participant filled out a report sheet to quantitatively assess their individual understanding of the analysis scenario, and then both participants verbally reported their final solution together to the observers. Finally, individual semi-structured interviews were conducted in which each participant commented on how they solved the scenario, how this involved collaboration, and their sense of territoriality.

3.5 Data Collection

During each scenario, an observer was always present taking notes. Video and audio of every scenario, debriefing, and interview were recorded. The video was coded using PeCoTo [34]. We also collected screenshots at fifteen-second intervals and logged mouse actions and active windows. The screenshots played two roles in our analysis. Their primary role was to allow us to “play back” the process of the analysis so that we could observe window movements and the use of the space. Furthermore, we applied the previously described point system in order to evaluate the accuracy of each team’s debriefing, providing a way to quantitatively measure their performance. The scores can be seen in Table 1. There was no significant difference in overall performance between the Jigsaw and Text Editor conditions when evaluated with a t-test, although statistical significance is difficult to show with small sample sizes.

Table 1. Overall team scores, grouped by tool used
Jigsaw:       J1: 11    J2: -1    J3: -2    J4: -7
Text Editor:  T1: 13    T2: -1    T3: 10    T4: 14
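As a concrete illustration, the rubric described in Section 3.1 reduces to an additive count over reported entities plus a bonus for the accuracy of the predicted attack. The sketch below is only a hypothetical reconstruction: the point weights, the placeholder entity names and the prediction bonus are our assumptions, not the actual VAST-derived rubric used in the study.

```python
def score_solution(reported, relevant, prediction_bonus):
    """Hypothetical additive rubric: +1 for each relevant person/event/location
    reported, -1 for each irrelevant or incorrect one, plus a bonus reflecting
    the accuracy of the team's overall prediction of the attack."""
    correct = len(reported & relevant)
    incorrect = len(reported - relevant)
    return correct - incorrect + prediction_bonus

# Invented placeholder entities; the real document set is not reproduced here.
relevant = {"person A", "location X", "event Y", "organization Z"}
reported = {"person A", "location X", "person B"}

print(score_solution(reported, relevant, prediction_bonus=5))  # 2 - 1 + 5 = 6
```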
4 Analysis

4.1 User Activities

Each group exhibited a variety of activities depending on their progress towards a satisfactory solution. After analyzing the video, interviews, and solution reports, we identified five major activities used by the participants, which together formed a strategy for analyzing the data. These were not usually explicitly identified by the participants, but rather tasks that the participants naturally took on in order to uncover the underlying terrorist plot. The five activities are extract, cluster, record, connect, and review, and they are described in greater detail below. Although each group exhibited each activity (one exception being cluster, which we will discuss later), the groups used different methods to implement an activity, often based on the interface condition (Jigsaw or text editor) of the group (Table 2).

Extract. The groups had no starting point or lead to begin with: just fifty text documents and the knowledge that there was a terrorist threat to the nation. Therefore, they needed to familiarize themselves with the information presented within the documents and then extract that which seemed important. In Jigsaw, the visualizations allowed participants to begin this process by looking at frequently occurring entities and the other entities and documents to which they connected. With the text editor, these features were not available; the participants were therefore forced to open and read each document. They then all used color-coded highlighting to distinguish entities and/or important phrases. The coloring scheme was decided upon by the participants, whereas Jigsaw maintains a set color scheme for entities. In the text editor groups, the subjects opened documents in consistent locations to read them, but soon moved the opened documents into meaningful clusters (see the next activity). The Extract activity required little display space to complete in either study condition. This activity was done together in some groups, with both participants simultaneously reading the same document, and in parallel in others, with each participant reading half of the documents, often split by document number.

Table 2. Five sensemaking activities and their methods for the corresponding tool
Activity   Tool          Method
Extract    Jigsaw        Look over frequently occurring entities and related documents
           Text Editor   Read all of the documents, together or separately, and highlight
Cluster    Jigsaw        (did not occur with this tool, as Jigsaw automatically color codes and groups the entities by type)
           Text Editor   Group document windows by related content
Record     Jigsaw        Tablet (3 groups), whiteboard & paper (1 group)
           Text Editor   Whiteboard, paper
Connect    Jigsaw        Loop between list, document view, and Tablet (or paper/whiteboard), together or separately
           Text Editor   Search function; reread paper, whiteboard, and documents
Review     Jigsaw        Reread, search for unviewed documents (2 groups)
           Text Editor   Reread, possibly close windows after reviewing
Cluster. With the text editor, all of the groups found a need to cluster and organize the documents. The groups clustered by arranging the document windows by content in the space (the clusters resembled piles), using whitespace between clusters as boundaries. The clusters eventually filled the display space, allowing the participants to view all documents at once in a meaningful configuration. Even when only one partner organized the documents into clusters, the other partner could easily find documents relevant to a certain topic due to their agreed-upon clustering scheme (e.g. chronological or geographical order, as shown in Fig. 3). Most text editor groups used the multi-mouse functionality to organize the display space simultaneously. Three of the four groups eventually re-clustered their documents after some analysis. The cluster activity as defined above (spatially arranging document windows) was not present in any of the Jigsaw groups, because Jigsaw organizes the entities and documents through its various functionalities. Many Jigsaw groups, however, clustered relevant entities within their Tablet views, giving spatial meaning to the information recorded.
Fig. 3. Geographical clustering of documents on the large display screen, done by group T4 (T4-B, the forager, arranged the space while T4-A, the sensemaker, instructed document placement)
Record. Recording important information proved to be a useful strategy for all groups. Through interviews the participants revealed that this not only served as a memory aid, but also as a way to see how events, dates, people, and organizations were related. In the text editor scenarios, all groups found a need to use an external space to record important information, regardless of how much of the display was filled by clusters: two of the groups used the whiteboard and three used scrap paper (one used both). This allowed them to preserve the cluster set-up and keep the documents persistent. Three of the Jigsaw groups used the Tablet view to take notes and one group used paper and the whiteboard. Thus all participants devoted a separate space to keeping track of pertinent information. Groups also shared important information verbally to alert their partner to a potential lead, allowing the partner to create a mental record.

Connect. In order to make connections and look for an overall plot, the Jigsaw participants would often loop through the list view, document view, and the Tablet, connecting the information they discovered. Two groups worked on this separately and two did this together. With the text editor, participants searched for entities and reread their notes. In comparison to their discourse during the other activities, the groups were more talkative when making connections. Text editor group T1 cleared a screen to use as a workspace for their current hypotheses. They opened relevant documents in their workspace and closed irrelevant documents or documents from which they had extracted all information.
In all text editor cases, the meaning conveyed by clustered documents on the display was helpful in drawing connections.

Review. This appeared to be a very important element in the groups’ analyses. Often when one or both partners reread a document for the second, third, or even fourth time, it took on a new meaning once they understood the greater context of the scenario. This element of review could also help the participants as they worked to Connect. Two of the Jigsaw groups chose to search for unviewed documents to ensure that they had encountered all potentially important information. Two of the text editor groups began closing windows after they had reread them. Sometimes this was because the document was considered irrelevant. For example, group T3 moved unrelated documents to what they called the “trash window”. They later reread all of the trash window documents and closed those which still seemed irrelevant. The Review activity also included discussing current and alternative hypotheses.

While the activities listed in the table can be loosely defined in this sequential order, the order is certainly not fixed, nor were the activities visited only once within each scenario. Rather, there was often rapid but natural movement between these activities and their methods depending on the current needs of the analysis. In particular, the middle three activities were present many times throughout the study. Extract was only necessary during the first part of each scenario and review was usually only seen after a significant portion of the first activity had been completed.

4.2 Comparison between Sensemaking Loop and Activities

The processes we observed closely reflect the Pirolli and Card [21] sensemaking model (Fig. 2), which was developed for individual analysts. We have found that it may also generally be applied to collaborative pairs, although the loop is utilized differently because of the roles that developed. Extract and cluster relate to steps two through seven. The Evidence File and Schema steps were combined by the pairs due to the available display space: they were able to sort evidence into a meaningful schema by placing documents in different areas of the display. Record is very similar to schematizing and connect is a part of developing hypotheses. Review does not directly map to one stage of the sensemaking loop, but rather is the equivalent of moving back down the loop, analyzing previous work, and returning to the shoebox and evidence file. Note that External Data Sources is not mentioned here because the participants were only presented with fifty documents, so we assume that prior analysis has already moved through this step. The cumulative Presentation directly links to the debriefing following the scenario.
Fig. 4. Screenshot of one of the scenarios, group J2, using Jigsaw, illustrating one way in which the users partitioned the display to conduct individual investigations
While the activities described above and the sensemaking loop hold parallel ideas, we do want to distinguish the two concepts. The overall strategy we propose has been condensed to five activities as a result of the collaboration and the space. Additionally, we have given the idea of review new emphasis. This is a very important element in the sensemaking process, but it is not explicitly identified in the sensemaking loop. All of the activities, excluding Cluster, were present in both scenarios. This is notable considering the vast differences between the scenarios based on tool type. Since the activities we observed correspond to the Pirolli and Card sensemaking model [21], with the primary difference in user behavior being the tool-specific methods adopted to fulfill those activities, we propose that these activities are very likely to be universal.

4.3 Collaboration Levels

The amount of time spent working closely together appears to have impacted the scores. We applied the video coding code set from Isenberg et al. [29] to determine how much time was spent closely coupled (collaborating together) versus loosely coupled (working individually). Closely coupled is defined by Isenberg et al. as active discussion, viewing the same document, or working on the same specific problem [29]. Loosely coupled is defined as working on the same general problem, working on different problems, or being disengaged from the task. Upon graphing this data (Fig. 5), two clusters appear, separating the high-scoring groups from the low-scoring ones. The high-scoring cluster worked closely together over 89% of the time spent on the scenario. The low-scoring cluster worked closely together only between 42% and 67% of the time. All but one group collaborated closely at least during the final half hour of the scenario in order to synthesize their hypotheses. The correlation coefficient between the amount of time spent collaborating closely and score is 0.96, suggesting a strong relationship between these variables. This reinforces the result from [29] that strongly links collaboration levels with performance.
Fig. 5. Jigsaw (dark blue) and Text Editor (light green) scores versus collaboration levels
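For readers who wish to reproduce this kind of analysis, the reported figure is simply a Pearson coefficient computed between each team's proportion of closely coupled time and its score. The sketch below is illustrative only: the scores come from Table 1, but the closely-coupled fractions are invented placeholders (the paper reports only that high scorers were above 89% and low scorers between 42% and 67%), so the printed value will not match the reported 0.96 exactly.

```python
from statistics import mean

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Scores from Table 1 (J1-J4, T1-T4); coupling fractions are placeholders.
scores = [11, -1, -2, -7, 13, -1, 10, 14]
closely_coupled = [0.90, 0.55, 0.50, 0.42, 0.93, 0.60, 0.89, 0.95]

print(round(pearson(closely_coupled, scores), 2))
```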
4.4 User Roles

All groups divided the responsibilities of the collaborative sensemaking task. The roles could be observed during the study through actions and conversation, but they were also evident during the interviews following the study. Five of the eight groups established clearly defined collaborative roles (measured through video coding). The remaining three groups appeared not to develop such roles because they went through the steps of the analysis independently, but in parallel; various team-related roles and responsibilities were therefore less likely to develop. For the five groups who established clearly defined roles, the two broad roles we identified through this analysis are sensemaker and forager. These high-level roles were primarily established after a considerable amount of the investigation had been completed, normally after the half-way point of the study session. The sensemaker tended to be the dominant partner, often dictating what the forager did. Common activities for the sensemaker included standing, writing on the whiteboard, using a hand to point to information (instead of using a cursor), and rarely using a mouse, instead requesting the forager to perform various activities. The forager’s role consisted of questioning the current hypotheses, finding information, and maintaining a better awareness of where the information was located. For example, the sensemaker would request actions such as “can you open [a particular document]?” and the forager would perform the action. These two roles closely match the two primary sub-loops (Fig. 2) in the Pirolli and Card model [21]. The first loop, foraging, involves sorting through data to distinguish what is relevant from the rest of the information. The second loop, sensemaking, involves utilizing the information pulled aside during the foraging process to schematize and form a hypothesis during the analysis. Thus, the sensemaker was more concerned with synthesizing the information, while the forager was more involved in gathering, verifying, and organizing it. While the sensemaker and forager each spent the majority of their time at their respective ends of the loop, they did not isolate themselves from the rest of the sensemaking process. To illustrate the distribution of responsibilities prompted by the roles adopted, we describe in detail two of the pairs in which the participants formed distinct roles. These are the two groups in which the roles are most clearly defined, and they are therefore the most instructive to describe. In group T1, the team with the second-highest score, both participants spent the first hour foraging (i.e., extracting, clustering) for information while taking a few breaks to engage in sensemaking activities (i.e., connecting). Participant T1-A (who sat on the left) at times led the actions of T1-B (who sat on the right) by initiating activities or finalizing decisions. At the 68-minute mark, participant T1-B moved to the whiteboard (never to return to the computer input devices) and established a clear, dominant role as sensemaker while T1-A continued to forage for information. Specifically, T1-A organized the documents, searched, and provided
dates, locations, relevant events, etc., while T1-B drew a picture connecting the relevant events, working to form a hypothesis, and requested information from T1-A. T1-B began focusing on Record and Connect, but they both engaged in the Review activity together. The Review activity was interspersed throughout the scenario as pieces of information inspired participants to revisit a document. During the interviews, T1-B revealed that he wanted to build a chart or timeline to organize their thoughts better. Although interviewed separately, they seemed to have similar views on their roles. T1-B stated, “I basically just tried to stand up there and construct everything while he finds evidence,” while T1-A said, “I was just trying to feed him the data, that was my skill, find it, and he can put it in a flow chart.” The other pair is group T4, the group with the highest score, where T4-A was the sensemaker and T4-B the forager. Again, the sensemaker is the participant (T4-A) who built a picture on the whiteboard, meaning he drove Record and Connect. In fact, T4-A barely touched his mouse after the first fifteen minutes of the scenario: he had only 104 mouse clicks while T4-B had 1374. They worked through Extract and Cluster together, but T4-A verbally dictated the clustering while T4-B controlled it with the mouse. While T4-A worked on the whiteboard, T4-B fed him details as needed. As T4-A stated, “We ended up splitting the tasks into organization and story-building… I would say I built most of the story.” Both participants worked on the Review activity, but during this T4-B questioned T4-A’s hypotheses, which forced him to justify and support his thoughts. This lopsided mouse usage is not a new method of interaction [35]; however, it is interesting that T4-A abandoned his mouse in favor of instructing his partner.
5 Design Implications

Viewing all documents simultaneously appeared to be an effective strategy, given the added space provided by the large display. All 50 documents comfortably fit into user-defined clusters. No Jigsaw groups chose this approach, instead relying on the specialized views available. Visual analytics tools designed for large displays should take this into consideration by allowing users to open many documents and flexibly rearrange the clusters as needed. This may not be feasible once the document collection becomes large enough, in which case a tool such as Jigsaw would be valuable in narrowing down the document collection. We recommend that developers combine these two analysis approaches so that tools perform well across document collection sizes. Because the highest-scoring groups had clearly defined user roles while the lowest-scoring groups did not, we recommend that co-located collaborative visual analytics tools support the division of responsibilities. One way to achieve this would be to implement specialized views for foragers and sensemakers. Some sensemakers stood and used a physical whiteboard to record their thoughts. All text editor groups used the whiteboard or paper for this purpose. One Jigsaw group used the whiteboard while the rest used Jigsaw’s Tablet view. From this we can see a clear need for tools that integrate evidence marshaling and sensemaking
into the analytic process. The Tablet view in Jigsaw and other integrated sensemaking environments, such as the Sandbox in the nSpace suite [36], are one approach. Another approach, suggested by the studies conducted by Robinson [18] and Andrews et al. [5] as well as our observations of the text editor groups, would be to integrate sensemaking tools directly into the document space. As we observed in this study, the users of the text editor were already arranging documents into structures based on their content. A logical continuation of this would be to integrate sensemaking tools and representations into this space directly, so that the sensemaking is done directly with the documents, allowing the user to maintain the context of the original source material. We have also considered some frustrations expressed by the users while developing design implications. One issue involved the presence of the taskbar on only one of the eight monitors, an issue recognized in the past (see, for example, GroupBar [37]). It became difficult and inconvenient for the users to locate windows in the taskbar, especially with over fifty windows open simultaneously. For future visual analytics tools, we recommend implementing a feature that allows easier location of documents. This could be done through a better search feature, for example one that flashes matching document windows to make locating them easier.
6 Conclusion

We have conducted a study which explores an arrangement for co-located collaborative sensemaking and applied it to intelligence analysis, a combination that, to the best of our knowledge, has not previously been investigated for this specific set-up. We extracted five common activities which the participants used in their overall strategy during collaborative sensemaking. While the activities were common to all groups, the execution of the activities varied based on the tool (Jigsaw or text editor). These activities reflected many of the steps in the Pirolli and Card sensemaking loop [21]. The participants also moved through the loop by adopting the roles of sensemaker and forager, so that the two major areas of sensemaking could be performed synchronously. The groups that adopted these roles tended to score higher. Taking all of these findings into account, we have developed design implications for systems that use multiple input devices collaboratively on a large, vertical display. The application of co-located collaboration to other visual analytics tools should be further investigated in order to develop a more accurate set of guidelines for designing co-located collaborative systems on large displays. We are also interested in studying the impacts of spatially arranged data on co-located collaborative analysis.

Acknowledgments. This research was supported by National Science Foundation grants NSF-IIS-0851774 and NSF-CCF-0937133.
References
1. Thomas, J., Cook, K.: Illuminating the Path: The Research and Development Agenda for Visual Analytics (2005)
2. Heer, J.: Design considerations for collaborative visual analytics. Information Visualization 7, 49–62 (2008)
3. Heuer, R.J., Pherson, R.H.: Structured Analytic Techniques for Intelligence Analysis. CQ Press, Washington, DC (2010)
4. Chin, G.: Exploring the analytical processes of intelligence analysts, p. 11 (2009)
5. Andrews, C., Endert, A., North, C.: Space to think: large high-resolution displays for sensemaking. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems. ACM, Atlanta (2010)
6. Waltz, E.: The Knowledge-Based Intelligence Organization. In: Knowledge Management in the Intelligence Enterprise. Artech House, Boston (2003)
7. Stasko, J.: Jigsaw: supporting investigative analysis through interactive visualization. Information Visualization 7, 118–132 (2008)
8. Gutwin, C., Greenberg, S.: Design for individuals, design for groups: tradeoffs between power and workspace awareness. In: Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work. ACM, Seattle (1998)
9. Stefik, M., Foster, G., Bobrow, D.G., Kahn, K., Lanning, S., Suchman, L.: Beyond the chalkboard: computer support for collaboration and problem solving in meetings. Communications of the ACM 30, 32–47 (1987)
10. Stewart, J.E.: Single display groupware. In: CHI 1997 Extended Abstracts on Human Factors in Computing Systems: Looking to the Future, pp. 71–72. ACM, Atlanta (1997)
11. Elrod, S., Bruce, R., Gold, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pederson, E., Pier, K., Tang, J., Welch, B.: Liveboard: a large interactive display supporting group meetings, presentations, and remote collaboration. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 599–607. ACM, Monterey (1992)
12. Pedersen, E.R., McCall, K., Moran, T.P., Halasz, F.G.: Tivoli: an electronic whiteboard for informal workgroup meetings. In: Proceedings of the INTERACT 1993 and CHI 1993 Conference on Human Factors in Computing Systems, pp. 391–398. ACM, Amsterdam (1993)
13. Rekimoto, J.: A multiple device approach for supporting whiteboard-based interactions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 344–351. ACM Press/Addison-Wesley Publishing Co., Los Angeles (1998)
14. Wallace, J., Scott, S., Stutz, T., Enns, T., Inkpen, K.: Investigating teamwork and taskwork in single- and multi-display groupware systems. Personal and Ubiquitous Computing 13, 569–581 (2009)
15. Stewart, J., Raybourn, E.M., Bederson, B., Druin, A.: When two hands are better than one: enhancing collaboration using single display groupware. In: CHI 1998 Conference Summary on Human Factors in Computing Systems, pp. 287–288. ACM, Los Angeles (1998)
16. Stewart, J., Bederson, B.B., Druin, A.: Single display groupware: a model for co-present collaboration. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: the CHI is the Limit, pp. 286–293. ACM, Pittsburgh (1999)
17. Birnholtz, J.P., Grossman, T., Mak, C., Balakrishnan, R.: An exploratory study of input configuration and group process in a negotiation task using a large display. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, San Jose (2007)
18. Robinson, A.: Collaborative Synthesis of Visual Analytic Results. IEEE Visual Analytics Science and Technology, 67–74 (2008)
19. Rogers, Y., Lim, Y.-k., Hazlewood, W.R., Marshall, P.: Equal Opportunities: Do Shareable Interfaces Promote More Group Participation Than Single User Displays? Human–Computer Interaction 24, 79–116 (2009)
20. Green, T.M., Ribarsky, W., Fisher, B.: Building and applying a human cognition model for visual analytics. Information Visualization 8, 1–13 (2009)
21. Pirolli, P., Card, S.: The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis. In: International Conference on Intelligence Analysis (2005)
22. Plaisant, C., Grinstein, G., Scholtz, J., Whiting, M., O’Connell, T., Laskowski, S., Chien, L., Tat, A., Wright, W., Gorg, C., Liu, Z., Parekh, N., Singhal, K., Stasko, J.: Evaluating Visual Analytics at the 2007 VAST Symposium Contest. In: Computer Graphics and Applications, vol. 28, pp. 12–21. IEEE, Los Alamitos (2008)
23. Paul, S.A., Reddy, M.C.: Understanding together: sensemaking in collaborative information seeking. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 321–330. ACM, Savannah (2010)
24. Paul, S.A., Morris, M.R.: CoSense: enhancing sensemaking for collaborative web search. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 1771–1780. ACM, Boston (2009)
25. Morris, M.R., Lombardo, J., Wigdor, D.: WeSearch: supporting collaborative search and sensemaking on a tabletop display. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 401–410. ACM, Savannah (2010)
26. Pioch, N.J., Everett, J.O.: POLESTAR: collaborative knowledge management and sensemaking tools for intelligence analysts. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 513–521. ACM, Arlington (2006)
27. Tobiasz, M., Isenberg, P., Carpendale, S.: Lark: Coordinating Co-located Collaboration with Information Visualization. IEEE Transactions on Visualization and Computer Graphics 15, 1065–1072 (2009)
28. Isenberg, P., Fisher, D.: Collaborative Brushing and Linking for Co-located Visual Analytics of Document Collections. Computer Graphics Forum 28, 1031–1038 (2009)
29. Isenberg, P., Fisher, D., Morris, M.R., Inkpen, K., Czerwinski, M.: An exploratory study of co-located collaborative visual analytics around a tabletop display. In: 2010 IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 179–186 (2010)
30. Ball, R., North, C., Bowman, D.A.: Move to improve: promoting physical navigation to increase user performance with large displays. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 191–200. ACM, San Jose (2007)
31. Wallace, G., Li, K.: Virtually shared displays and user input devices. In: 2007 Proceedings of the USENIX Annual Technical Conference, pp. 1–6. USENIX Association, Santa Clara (2007)
32. AbiWord, http://www.abisource.com/
33. Kang, Y.-a., Gorg, C., Stasko, J.: Evaluating visual analytics systems for investigative analysis: Deriving design principles from a case study. In: IEEE Visual Analytics Science and Technology, Atlantic City, NJ, pp. 139–146 (2009)
34. PeCoTo, http://www.lri.fr/~isenberg/wiki/pmwiki.php?n=MyUniversity.PeCoTo
35. Pickens, J.: Algorithmic mediation for collaborative exploratory search, p. 315 (2008)
36. Wright, W., Schroh, D., Proulx, P., Skaburskis, A., Cort, B.: The Sandbox for analysis: concepts and methods. In: CHI 2006: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 801–810. ACM, New York (2006)
37. Patrick, G.S., Baudisch, P., Robertson, G., Czerwinski, M., Meyers, B., Robbins, D., Andrews, D.: GroupBar: The TaskBar Evolved. In: OZCHI, pp. 34–43 (2003)
Exploring How Tangible Tools Enable Collaboration in a Multi-touch Tabletop Game

Tess Speelpenning1, Alissa N. Antle2, Tanja Doering3, and Elise van den Hoven1
Abstract. Digital tabletop surfaces afford multiple user interaction and collaboration. Hybrid tabletops that include both tangible and multi-touch elements are increasingly being deployed in public settings (e.g. Microsoft Surface, reacTable). Designers need to understand how the different characteristics of tangible and multi-touch interface elements affect collaborative activity on tabletops. In this paper, we report on a mixed methods exploratory study of a collaborative tabletop game about sustainable development. We explore the effects of tangible and multi-touch tools on collaborative activity. Forty-five participants, in trios, played the game using both versions of the tools. Our analysis includes quantitative performance measures, qualitative themes and behavioral measures. Findings suggest that both tangible and multi-touch tools enabled effective tool use and that collaborative activity was more influenced by group dynamics than tool modality. However, we observed that the physicality of the tangible tools facilitated individual ownership and announcement of tool use, which in turn supported group and tool awareness.

Keywords: Tangible interaction, collaboration, CSCL, tabletop gaming, multi-touch, Futura, interaction design.
improve spatial cognition [5]. These differences are also described as embodied facilitation [6]. Technologies such as SMART boards, plasma screens, and projected wall displays all have been studied in order to better understand how to design them to support collaborative interaction [7]. Several different tabletop systems have also been developed to support multi-user interaction that facilitates collaboration (e.g. Microsoft Surface, ReacTable). For designers it is important to understand how the characteristics of these different interface styles affect user experience. Some comparative studies have explored how multi-touch tabletops enable social interaction (e.g. [8, 9, 10]). Several studies suggest that multi-touch tabletops stimulate more collaboration compared to traditional user interfaces (e.g. [7, 9, 11, 12]). In addition, tangible artifacts that are part of a user interface may invite people to interact more, because tangible objects lower the threshold for participation [13]. Thus, multi-touch interactive tabletops with additional tangible objects may provide new opportunities for socializing, collaborative learning and working. Nevertheless, as Hornecker and Buur [13] stated, the research field lacks an analysis and understanding of the aspects that support the social aspects of tangible interaction. A design framework that provides concepts or guidelines that help designers understand how to support collaboration related to multi-touch-tangible tabletops does not yet exist. In this paper, we report on findings from an exploratory comparison of multi-touch and tangible controllers for a visualization tool that were developed to augment a collaborative, sustainability learning game, called “Futura: the Sustainable Futures Game” (original game described in [14]). We abbreviate the game title as “Futura” in the remainder of this paper. Our study addressed two research questions: ‘How does tool use differ between multi-touch and tangible interaction?’ and ‘How does the user interface (UI) style of the tools affect collaboration?’ Our goal was to understand how the characteristics of the tangible and the digital controllers for the visualization tools affected activity and collaboration in the context of a tabletop game. The learning goal of Futura is to increase players’ awareness of the complexity of sustainability issues. Previous work with Futura indicated that players had trouble linking individual actions to cumulative, global effects [13]. In addition, many of the groups played the game with a parallel play or individual style at the beginning of game sessions, and shifted to a more collaborative play style only after they played the game several times [14]. Therefore, the goal for the visualization tools was to support players to understand cumulative effects and to enable an earlier shift from individual to collaborative play. We implemented visualization tools that addressed these needs and that could be controlled with either tangible or multi-touch objects. This enabled us to compare the tangible and multi-touch controllers of the visualization tools to determine how UI characteristics changed or enabled interaction and collaboration. For brevity we call the tangible and multi-touch controllers that are part of the visualization tools, the “tools”. Compared to prior work, this exploratory study focused on a comparison of two versions of visualization tools, which were novel elements in the Futura game.
Prior research based on the Futura game has included two observational studies focusing on understanding how people learn [14] and a survey-based case study that explores how Futura can be used to engage the public with issues of societal importance [15]. This paper presents a comparative study focusing on how the UI style of these tools affects collaboration. The study was conducted with a redesigned Futura application.
2 Related Work

Tabletop-based games have been the focus of several case studies on interaction with tangible and multi-touch UI elements. Examples of studies of these kinds of hybrid tabletop gaming can be found in [16], [17] and [18]. Few studies explore how the different affordances of tangible or multi-touch interface elements affect interaction and collaboration. There are no existing studies that compare the effects of tangible and multi-touch interaction on collaborative game play activity. In this section, we summarize research that compares users’ performance and preference using mouse, multi-touch and tangible user interfaces for a variety of application areas. In a user study in which participants had to implement warehouse layouts with either multi-touch or tangible shelves, Lucchi et al. [19] found that completion times for the tangible condition were shorter. The authors assumed that this was caused by users’ familiarity with physical artifacts in the everyday world. In another user study, where users had to perform several matching tasks [20], the authors also found that tangible artifacts were easier and more accurate to manipulate. However, they emphasized that such effects likely depend on the task, the tangible objects and the quality of the technological implementation. A user study with children solving jigsaw puzzles using three different UI styles (physical, TUI and GUI) showed no significant differences in children’s enjoyment. Observation data indicated that children’s engagement was higher for both the physical and tangible conditions, compared to the graphical condition [21]. A follow-up study using video analysis of hand-actions indicated that there were significant differences between UI styles in the frequency, duration and type of actions used to manipulate puzzle pieces [22], and that tangible and physical UIs facilitated many fast actions and a more sophisticated puzzle-solving strategy that evolved over time. Wang [23] performed a comparative study of a tangible and a multi-touch interface in which he investigated the effect of interface style on users’ strategies and performance in solving a jigsaw puzzle. Both multi-touch and tangible interaction enabled complementary actions (i.e. using physical actions on objects to reduce cognitive load), but interaction was more precise and efficient with the TUI version. The author suggests that these performance benefits may be a result of the tangible puzzle’s 3D manipulation space and tactile feedback. In a study by Rogers and colleagues [24], the authors compared a mouse, multi-touch and physical-digital condition for a collaborative design task. The mouse condition led to frustration because it resulted in unequal participation; however, it resulted in the most utterances. The physical-digital condition led to more equitable participation in terms of verbal contribution. The multi-touch condition had the most
equitable participation in terms of physical actions. Based on their findings, we focus on physical as well as verbal interactions between the users.
3 Background Theory

In order to investigate how the characteristics of tangible and multi-touch tools affect collaboration, we first provide a working definition of collaboration. Roschelle and Teasley [25] (page 70) defined collaboration as "a coordinated, synchronous activity that is the result of a continued attempt to construct and maintain a shared conception of a problem". This activity results from individuals who share and negotiate meanings to solve a problem. In this model, collaboration is measured in terms of communication. Brehmer [26] describes the relationship between collaboration and communication as follows: if the necessity for coordination and cooperation increases, the need for communication also increases.

The theory of Computer Supported Collaborative Learning (CSCL) provides an important perspective on collaborative learning which applies to groups playing a learning game like Futura. A review of the CSCL literature suggests three elements that are required to support collaborative learning: shared objects of negotiation [27], referential anchors [28] and metacognitive processes [29]. In our study, we focus on shared objects of negotiation, because we are interested in characteristics of objects (here, tools) that may affect or improve collaboration. We also draw on empirical findings from the tangible interaction literature because designers need to understand how the different characteristics of tangible and multi-touch elements affect collaborative activities on tabletops. We next summarize the main theoretical concepts from the CSCL and tangible interaction literature that we use to inform our analysis: objects of negotiation, access points, physical constraints and awareness. In our thematic analysis of observational data (below), we also search for emergent concepts.

3.1 Objects of Negotiation

Objects of negotiation are objects that externally represent information that is important in some way in the collaborative task [27]. An individual or group can modify this representation during the learning process. If an individual changes a shared representation, he may feel an obligation to announce, discuss or negotiate his action with the rest of the group in advance [27]. In Futura, players must collaborate, and therefore communicate about various external representations, in order to win the game and, through winning, attain the game's learning goal. The tools we developed enabled players to change the main external representation in the game, as described below in Section 4.3.

3.2 Access Points

Access points are physical or spatial interface characteristics that enable a user to join a group's ongoing interactions with a system. They provide perceptual and manipulative access and enable fluidity of sharing [30]. These characteristics may enable a user to interact, participate and join a group's activity. The amount and type
of input devices that are used in shareable interfaces are important factors that shape interaction and affect the type of social interaction. Manipulative access and fluidity of sharing are two elements that are amplified by tangibility. The fluidity of sharing describes how easily one user hands over control to someone else [30]. One of the benefits of tangible tools that results from their physical and spatial qualities is that they may better enable access compared to similar multi-touch tools [30].

3.3 Physical Constraints

Hornecker [13] describes embodied facilitation as physical and perceptual affordances and spatial properties that shape and determine people's interactions with a physical artifact. Some of the affordances of physical artifacts enable specific actions that are not possible, or less obvious, when using digital artifacts (e.g. a multi-touch tool). For example, physical or tangible tools can be manipulated in 3D space, whereas the use of multi-touch tools is constrained to 2D space. Because of this, a wider range of actions may be performed with tangible artifacts, and such actions are more visible to other users [23].

3.4 Awareness

Awareness is an understanding of the activities of others that provides a context for one's own activity [31]. Ideally, individual contributions are relevant to group activities in that they help the group collaborate. Hornecker found that multi-touch input resulted in more positive indicators of workspace awareness compared to mouse input in a collaborative task, including an increase in helping behaviors and more handovers of objects [32]. She suggested "being able to see another's physical actions can enhance awareness, which in turn can support fluid interaction and coordination" [32] (page 167). Fleck et al. suggest that "maintaining joint attention and awareness" is critical to coordinated, collaborative activity [33]. They suggest that a physical artifact may better enable collaboration because of what we can do with it. For example, a physical artifact draws attention when a user picks it up from a tabletop and moves it in 3D space in front of their body.
4 The Futura Game and Visualization Tools

4.1 Overview

Futura (Figure 1) is a multiplayer, tabletop game that challenges players to make decisions that help preserve the environment while at the same time meeting the needs of a growing population. There are three roles to be played in the game: providing food, energy or shelter. Each role can access specific resources. Players can place resources on the world map to support the growing population, but they have to make sure that the resources do not damage the environment. The goal of the game is to balance meeting the needs of the population against damage to the environment in a simulated world. The game spans many simulated years; a session of the first version of the game lasts about three minutes. For more details about the original game, see [14, 15]. For a video overview of the Futura project, see http://www.antle.iat.sfu.ca/Futura.
Fig. 1. The Futura game (in the multi-touch condition)
4.2 Design Goals: The New Version of Futura

Players can only win the game when they collaborate, since all three roles must work together. Coordination and communication are necessary. Based on the results of a previous study, we wanted to better enable players' understanding of the cumulative impact of actions on the world state, and to support collaboration based on this information [14]. We augmented the original version of Futura (described in [14]) with two visualization tools.

4.3 Visualization Tools

We developed two visualization layers that provide information about the cumulative impact of player actions on the environment and on the population. Each of the two layers is activated using magnifying glass tools. There are two tangible magnifying glasses (TUI), and two magnifying glasses implemented as digital objects activated through multi-touch (MT). We refer to these as the TUI and MT conditions. In both conditions, the two tools provide players with more information about the cumulative impact of each resource that has been placed on the world map up to that point in the game (on the environment and population, respectively). One of the magnifying tools is labeled with a tree and represents the impact on the environment (Figure 2). The other is labeled with people and represents the ability of the current resources to meet the needs of the population. The impact visualizations are generated in real time based on the current state of the game world.
Fig. 2. Tangible (left) and multi-touch (right) magnifying glass tools
The impact visualizations take the form of "heat maps" in which colour indicates the strength of effect. We used two slightly different colour mappings for the two maps so they could be differentiated (red/dark pink is hot or negative; green/blue is cool or positive). For example, in the environmental layer, the colour red indicates areas of high damage caused by resources located in those areas. In the population layer, dark pink indicates areas where resources are not efficiently supporting the population.

4.4 Tool Usage

When a player uses one of the magnifying glass tools to activate a visualization layer, the game pauses for everyone, and the appropriate impact visualization appears as an overlay on the game world (Figure 3). In the TUI condition, the tangible magnifying glasses are placed on the edge of the table. Any player can place a tool on the table's active surface to activate the corresponding impact visualization. Removing the tool removes the visualization. Only one of the visualizations can be displayed at a time.
In the MT condition, the two magnifying glasses are static buttons placed in each lower corner of the tabletop. Touching a multi-touch tool button activates the corresponding visualization. Touching the resume button removes the visualization. All other game features are identical between the two conditions. The tools may provide opportunities for collaborative communication and coordination. For example, players may use the displayed impact visualization to trigger discussion of the contribution of each resource type to the current game state and then modify their game strategy.

4.5 System Implementation

The Futura game runs on a multi-touch version of EventTable, a custom-made digital tabletop system that facilitates both multi-touch and tangible interaction, the latter using the reacTIVision tracking library [34]. Fiducial markers are used to track the tangible tools. The platform was initially described in [14]. Some modifications were made to improve finger tracking; for example, three cameras were added to the existing camera configuration. This is described in detail in [23].
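To make the interaction rules of Sections 4.4 and 4.5 concrete, the following is a minimal, hypothetical sketch of the activation logic shared by both conditions; the class, its method names and the assumed game interface (pause, resume, show_overlay, hide_overlay) are ours for illustration and do not describe the actual EventTable implementation.

```python
class VisualizationController:
    """Toy sketch of the magnifying-glass behavior described above:
    activating a tool pauses the game for everyone and overlays one
    impact visualization; deactivating it resumes play. Only one
    layer can be displayed at a time."""

    LAYERS = {"environment", "population"}

    def __init__(self, game):
        self.game = game          # assumed to expose pause/resume/overlay calls
        self.active_layer = None

    def activate(self, layer):
        # TUI: a tangible magnifier is placed on the active surface.
        # MT:  the corresponding corner button is touched.
        if layer in self.LAYERS and self.active_layer is None:
            self.active_layer = layer
            self.game.pause()
            self.game.show_overlay(layer)   # real-time heat map of cumulative impact

    def deactivate(self):
        # TUI: the tangible magnifier is lifted off the table.
        # MT:  the resume button is touched.
        if self.active_layer is not None:
            self.game.hide_overlay(self.active_layer)
            self.active_layer = None
            self.game.resume()
```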
5 Study Methodology

In order to understand the differences between tangible and multi-touch tool use, and the effects of tool interface style on collaboration, we designed an exploratory comparative study between a tangible interface (TUI) condition and a multi-touch (MT) condition of the visualization tools for the game Futura. We used a within-subjects design, because the group dynamics and personality of users may have a major influence on the results [35]. Another advantage of a within-subjects design is that users are able to compare the two UIs. The order of the conditions was counterbalanced (AAABBB or BBBAAA).

5.1 Participants

We collected data from 45 participants who played the game in groups of three players, resulting in 15 different groups. Participants (26 male, 19 female) in our study were predominantly graduate and undergraduate students from different programs at Simon Fraser University (Canada). Ages ranged from 18 to 36 years old (M = 23, SD = 3.69). Most participants reported that they were in the third year of their study (14 out of 45). Most participants reported that they played computer games sometimes (16 out of 45) or often (13 out of 45), but had never (22 out of 45) or rarely (17 out of 45) interacted with a digital tabletop. Participants were recruited through a user study pool at the university. The groups were randomly assigned and thus participants knew each other to different degrees: some were friends (13 out of 45), some were classmates (16 out of 45), and others did not know each other at all (16 out of 45). Participants were rewarded with a $10 gift card for participating in our study.

5.2 Procedure

To minimize potential learning effects, we carefully explained the mechanics of the game using an in-game tutorial before participants started playing. To stimulate the use of the magnifying glass tools, we demonstrated them and encouraged their use at the end of the tutorial. Participants played each of the two conditions (TUI and MT) three times. In previous studies, we noticed that many groups played the game about three times before they were able to win (without a tutorial) [14]. They were directed to try to win the game. Each game took about five minutes (longer than the original three-minute version of the game because players used the tools to pause the game). The total duration of the study was about 50 minutes for each group. The participants were asked to fill in the game questionnaire after playing in both conditions. The study setup is shown in Figure 4.

5.3 Data Collection and Analysis

The mixed-methods study design involved collecting data about demographics, game performance, tool use behaviors, and player opinions, as well as observations of interaction and play patterns. Quantitative measures included game performance, game scores, and tool use. Tool use data was analyzed for both frequency of
occurrence and temporal patterns of usage. In the questionnaire, we asked about learning outcomes, collaboration and tool use. For example, we asked the participants whether they used the magnifying glasses (TUI and MT) to play the game and whether they collaborated with others. We also asked participants if they thought the physicality of the tools made a difference. We used both open and closed questions, the latter rated on a 7-point Likert scale. We analyzed numeric data with Wilcoxon signed-rank tests to compare the two conditions, since the data involved repeated measures and included non-normal count data and ordinal Likert ratings.
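As an illustration of this analysis step (not the authors' actual script), a paired Wilcoxon signed-rank test on per-participant Likert ratings could be run as follows with SciPy; the ratings below are invented placeholders.

```python
from scipy.stats import wilcoxon

# Hypothetical 7-point Likert ratings, one per participant and condition,
# for the statement "The magnifying glasses helped me to collaborate
# with other players" (paired, repeated-measures data).
tui_ratings = [5, 6, 4, 5, 7, 3, 6, 5, 4, 6]
mt_ratings  = [4, 5, 4, 6, 6, 3, 5, 5, 3, 6]

# Non-parametric test for paired samples, suitable for ordinal ratings
# and non-normal count data.
stat, p = wilcoxon(tui_ratings, mt_ratings)
print(f"W = {stat:.1f}, p = {p:.3f}")
```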
Fig. 4. Playing Futura in groups of three (left). The setup of the user study, showing the Futura game tabletop, three players, microphones, video camera and an additional screen (right).
Since we were interested in how the tangible tools influenced collaboration, we also collected qualitative observational notes and video. Real-time observation of the table surface on an additional screen helped the observers to see what was going on in the game without being intrusive (Figure 4, right). We used a structured note taking approach to record how they interacted and collaborated with each other, when players started interacting, how they used the tools, and how this changed over time. Afterwards three researchers looked for patterns in collaboration. We did this by analyzing our observational notes and video through the lenses of theoretical concepts (objects of negotiation, access points, physical constraints and awareness) as well as searching for other repeated patterns or themes. Based on previous work comparing physical-digital and multi-touch tabletops, in which verbal and physical acts were both deemed important [24], we considered verbal interactions including: utterances, discussion, information exchange, negotiation, and suggestions, and physical interactions including: gesturing and performing actions on game objects.
6 Results

Our results address two research questions: 'How does tool use differ between multi-touch and tangible interaction?' and 'How does the UI style of the tools affect collaboration?' Results provide insights about how players use the tools, how they collaborate with and through the tools, and with each other. We report our general tool use findings first, and then follow with our four theoretical themes: objects of negotiation, access points, physical constraints and awareness. We introduce new themes under each of these, as appropriate.
6.1 General Tool Use

We observed that all 15 groups used the environment and population magnifying glasses in both the TUI and MT conditions. Groups used the tools during most of their games (72 out of 90 games). In total, we counted 204 magnifier usages in 90 games (90 = 2 conditions x 3 games x 15 groups). Of these, 108 usages occurred with the TUI tools and 96 with the MT tools. The conditions did not differ significantly in frequency of usage (average number of usages per game: TUI 2.4, MT 2.1). However, the frequency of usage varied from group to group. The maximum number of magnifying glass usages (summed over all six games in both conditions for each group) was 20 and the minimum was three.

Basic usage patterns were similar for the TUI and MT groups. For example, we found that users would normally not use the magnifying glasses early in the first game (in either condition), since they were still focused on understanding their own game roles and possibilities in the game. Sometimes players seemed to forget about the tools, but started to use them later: "Maybe we should try these" (man, age 26, group 2, game 2). Furthermore, if people figured out a strategy to win the game, which normally took at least a few games, they used the tools less, since they did not need the additional information. We also looked at how long people used the tools and whether there were temporal patterns of usage (e.g. using a tool at a specific moment in the game or using both magnifying glasses directly one after the other). We found that early magnifying glass usages (within the first 45 seconds) occurred more often in the second session, regardless of which condition came second (first session: MT 2, TUI 3; second session: MT 8, TUI 10). The pattern of using both magnifying glasses shortly after each other occurred more often when players played their first three games with the TUI tools. We suggest that this could indicate that the TUI tools were more suitable for shared explorations. This suggestion needs further research.

6.2 Object of Negotiation

The tools affected the game play of everyone in the game when they were used. Our analysis of qualitative observational data indicated that people used the tools in different ways independent of condition. For example, some players only quickly glanced at the impact visualizations without having a conversation, and others explored the impact visualizations in depth but did not talk. Others used them to discuss either game functionality or game strategy, sometimes including gestures like pointing to specific points on the visualization and thus focusing all players' attention on this spot. We observed these different behaviors equally between players in both conditions, and we did not find any significant differences in how they used the tools to negotiate with each other or play the game. The characteristics and quality of collaboration we observed depended primarily on the group dynamics and personality of individuals. As can be seen in Figure 5, some groups communicated and interacted frequently while other groups played the game largely individually. The differences in group dynamics can be explained by the different relationships in groups: some players knew each other well, others were strangers; some groups consisted of quiet and shy people and other groups had one or two main "talkers". The finding that group dynamics is an influencing factor is in line with findings from other studies [12, 24, 35].
Analysis of the questionnaire data indicated that although users tended to like the second tool they used more, many preferred the TUI tools regardless of whether they used them in the first or second session (26 vs. 17). This contradiction between use and preference may be explained by the learning effect of the game: when players understood the game better, they were more likely to win and thus liked everything about the game more (including the tools). We did not see differences in collaboration responses in the questionnaire data. For example, the Likert rating for the statement "The magnifying glasses helped me to collaborate with other players" was similar in both conditions (Z = -0.938, p = .348, n.s.). One person said he would like to have tools for everyone individually.

In both conditions and in both session orders, players used the environment tool more than the population tool (60% vs. 40%). We suggest that players may have understood the environment impact visualization better than the population one. In the environment impact visualization, the colour red represented damage, which was easily understandable for all users: "See, here where the marine resource is, it's less polluted" (woman, 22, group 12). However, the population impact visualization showed how well the population was supported. Dark pink meant that the population was not well supported. This was not always clear: "we need more population" (man, 22, group 15). Other players related the population impact visualization to density.
Fig. 5. Group dynamics: collaborating group (top) and individual play (bottom), in both conditions (MT left, TUI right)
6.3 Access Points

We would expect a difference between the conditions related to access points, because the physicality of tangible tools enables mobility and handing over or passing around the tools, which could have influenced collaboration. However, our results showed no differences in how players accessed the game in either condition. This may be because all three players started the game together. The effect of access points may be more important when an additional player joins an existing game.
6.4 Physical Constraints

In the TUI condition, we observed that initiating and resuming the impact visualizations were done by the same person significantly more often than in the MT condition: 77.4% vs. 56% (Wilcoxon test: Z = -5.83, p < .001). In the MT condition, a different person more often resumed the game by touching the resume button in the middle of the screen. However, this does not suggest more collaboration, because it often happened without the resuming player checking with the other players. In the MT group, players often reached over and canceled the impact visualization that someone else had activated, without clear announcement or communication (Figure 6). This may have occurred less in the TUI group because of the physical and spatial constraints on the actions that can be taken with a physical object [13].

This finding can be explained by looking at meaning rather than behaviors alone. Each type of action that is enabled by an artifact's affordances can change the meaning of the artifact. Physicality opens up meaning making. We suggest that the finding that many more players resumed the game themselves in the TUI condition may be explained by the concept of ownership. The physicality of the tool enables the action of picking it up and holding it, which may in turn cause a feeling of ownership. Psychological ownership is defined as the state in which individuals feel an object is theirs [37]. Thaler [36] uses the term "endowment effect" to describe the idea that goods that belong to one's endowment are valued higher than identical goods that do not. Endowment contributes to a feeling of ownership. When a player picks up and holds one of the TUI magnifying tools, they, and other players, may all sense that the object temporarily belongs to that player. Other players may then hesitate to take action on an object which was placed by, and is temporarily owned by, another player. Further investigation is needed to find out whether it is this sense of ownership that inhibits other players from initiating and resuming the visualizations when physical tools are used.
Fig. 6. Initiating and resuming tools is more often done by different players in the MT condition (top), and more often done by the same player in the TUI condition (bottom). For both conditions, from left to right: 1) Player activates impact visualization; 2) Impact visualization is activated; 3) Player resumes the game.
6.5 Awareness

Our analysis of observational data indicated that players more often announced that they were going to use one of the tools in the TUI condition compared to the MT condition. Announcements can be verbal, gestural or take the form of actions on objects. For example, announcements include visibly holding up a physical tool or waiting for others to finish their tasks before using a tool. Using the tools affected the game play of everyone, and so an individual may have felt an obligation to obtain agreement from the other players before using a tool. In our analysis, we found more instances of gestures, eye contact, and verbal announcements in the TUI condition. This may be because tangible interaction in 3D space makes individual actions more legible, which leads to better peripheral awareness. People need to make body movements in order to use both the multi-touch and the TUI magnifying tools. However, the 3D interaction space and the physicality of the TUI tools make their use more visible when they are lifted off the table in preparation for placement or to resume play [23]. This visibility may have made people more aware of each other and the tools (Figure 7). When asked why they liked the TUI tools more, some users (6 out of 26) said that they liked being aware of using a physical artifact.
Fig. 7. No announcement of tool usage in the MT condition (left); a clear gesture announcing tool usage in the TUI condition (right)
Awareness contributes to group coordination and collaborative context [31]. Coordination of actions is an important part of collaborating [32]. Announcing the use of the tools may contribute to the coordination of collaboration [33]. In a collaborative learning game, individual contributions are relevant to group activities. Awareness is critical when designing a collaborative system [31], and announcing the use of the TUI tools can be seen as increasing awareness about the game. Gestures and utterances make it clear that people are going to use the tools. The findings that, in the TUI condition, the player who activated a visualization more often also resumed the game, and that players more frequently announced tool use, support the suggestion that the TUI design better enables tool awareness and that the TUI tools are effective in enabling awareness. Enhancing awareness supports fluid interaction and coordination during collaboration.
6.6 Limitations

The tools we added to the game are not mandatory to win the game. Although all groups used the tools at least once, it may be possible that players did not think they were useful and decided not to use them frequently. In some cases, players read information into the impact visualization that it did not represent. For example, some players interpreted the game world and impact visualization as location specific: "maybe we can put highly polluted things far away" (woman, 27, group 7). While the impact map shows the impact of each resource at the location where the resource has been placed, the spatial locations of resources on the map do not mediate their impact in any way. For example, placing a highly polluting resource anywhere on the map would affect the game state in the same way. This possibly confused players. Furthermore, since we did not test the effectiveness of the impact visualizations, it is possible they were confusing instead of helpful. However, any negative effects would have occurred in both conditions.

The tangible object recognition system (using fiducial markers) did not work as precisely as the touch recognition of the system. This means that activation and deactivation of the visualizations using the multi-touch buttons was easier and faster than using the TUI objects. This may explain why 17 of the 45 participants liked the multi-touch tools better. All of these participants gave reasons along these lines: the MT tools worked better, faster or were easier to use. There was also some latency between activating a tool and the appearance of the impact visualization (about 0.5 s). Although the latency was present in both conditions, it was slightly longer in the TUI condition. Neither of these limitations appears to directly impact our results about ownership or group and tool awareness.

Generalization of our results should take into account that we performed this study in a lab setting. For example, this limits our ability to make inferences about tool use, collaboration and public engagement in a public setting. In addition, the statistical conclusion validity is limited because of the small sample size of 15 groups.
7 Conclusions

In this paper, we present the results of an exploratory study that compared the differences between tangible and multi-touch tool use, and the impact of tool use on collaboration in a digital tabletop game. The analysis of tool use and its impact on collaboration was viewed through a concept from the theory of CSCL, objects of negotiation, and through concepts from previous empirical research in tangible interaction: access points, physicality and awareness. Research that combines analysis based on theories of collaborative learning with previous findings from tangible interaction studies, applied to a tabletop game, is one of the unique contributions of our work.

This paper presents new and valuable insights about tool use, objects of negotiation, access points, physical constraints and awareness in collaborative tabletop settings. We provide insights about how people participate and interact with both tangible and multi-touch tools in the context of a tabletop game. Our game-specific findings include that the tools were used in different ways regardless of UI style. Some players used them individually, through quiet pondering, and others used the
tools intensively and collaboratively to help them discuss game strategy. We also found that players in the MT group often "took over" tool usage that a different player had initiated, without consultation. We suggest that the 3D interaction space and the physicality of the TUI tools may have contributed to a feeling of psychological ownership which inhibited this from happening as much in the TUI condition. This is supported by the finding that tangible tool use was correlated with more player announcements (both verbal and behavioral). We suggest that better awareness of group activity, and therefore better collaboration, can occur when the same player both activates the visualization and resumes the game, and when this is accompanied by some form of announcement. In terms of player preferences, we found that, despite implementation limitations, players still preferred the tangible tools.

Overall, our results provide limited support for the benefits of tangibility. Physical affordances can change the meaning of an artifact as well as the actions taken with it. We introduced the concept of ownership, which we used to explain player behaviors in the TUI condition. The physicality of the tools and the 3D interaction space may enhance ownership, enable announcements of use, and facilitate awareness. Together these processes and forms of interaction support coordinated collaborative activity. We presented the details of our analysis to help designers make decisions about tangible and multi-touch elements of digital tabletops deployed to facilitate collaborative activity.

Acknowledgements. The authors would like to thank Sijie Wang and Nahid Karimaghalou for their technical support, and Anna Macaranas, Allen Bevans, Katie Seaborn and Josh Tanenbaum for their ideas, thoughts and practical support. Thanks to Wijnand IJsselsteijn for his coaching. In addition, we would like to thank all the participants of our user study for their participation. This research was funded by the NSERC RTI and the GRAND NCE grants from Canada.
References
1. Fitzmaurice, G., Ishii, H., Buxton, W.: Bricks: Laying the Foundations for Graspable User Interfaces. In: CHI 1995, Denver, CO, USA, pp. 442–449. ACM Press, New York (1995)
2. Ullmer, B., Ishii, H.: Emerging Frameworks for Tangible User Interfaces. In: Human-Computer Interaction in the New Millennium, pp. 579–601. Addison-Wesley, Reading (2001)
3. Kirk, D., Sellen, A., Taylor, S., Villar, N., Izadi, S.: Putting the Physical into the Digital: Issues in Designing Hybrid Interactive Surfaces. In: BCS HCI 2009, Cambridge, UK, pp. 35–44 (2009)
4. Jordà, S., Geiger, G., Alonso, M., Kaltenbrunner, M.: The reacTable: Exploring the Synergy between Live Music Performance and Tabletop Tangible Interfaces. In: TEI 2007, Baton Rouge, LA, USA, pp. 139–146. ACM Press, New York (2007)
5. Maher, M., Kim, M.: Do Tangible User Interfaces Impact Spatial Cognition in Collaborative Design? In: Luo, Y. (ed.) CDVE 2005. LNCS, vol. 3675, pp. 30–41. Springer, Heidelberg (2005)
6. Hornecker, E.: A Design Theme for Tangible Interaction: Embodied Facilitation. In: ECSCW 2005, pp. 23–43. Springer, New York (2005)
7. Rogers, Y., Rodden, T.: Configuring Spaces and Surfaces to Support Collaborative Interactions. In: Public and Situated Displays, pp. 45–79. Kluwer Publishers, Dordrecht (2003)
8. Waldner, M., Hauber, J., Zauner, J., Haller, M., Billinghurst, M.: Tangible Tiles: Design and Evaluation of a Tangible User Interface in a Collaborative Tabletop Setup. In: OZCHI 2006, Sydney, Australia, pp. 151–158. ACM Press, New York (2006)
9. Scott, S., Carpendale, S.: Guest Editors' Introduction: Interacting with Digital Tabletops. IEEE Computer Graphics and Applications 7(4), 24–27 (2006)
10. Rekimoto, J., Ullmer, B., Oba, H.: DataTiles: A Modular Platform for Mixed Physical and Graphical Interactions. In: CHI 2001, Seattle, WA, USA, pp. 269–276. ACM Press, New York (2001)
11. Price, S., Falcao, T., Sheridan, J., Roussos, G.: The Effect of Representation Location on Interaction in a Tangible Learning Environment. In: TEI 2009, pp. 85–92. ACM Press, New York (2009)
12. Piper, A., O'Brien, E., Morris, M., Winograd, T.: SIDES: A Cooperative Tabletop Computer Game for Social Skills Development. In: CSCW 2006, Banff, Canada, pp. 1–10. ACM Press, New York (2006)
13. Hornecker, E., Buur, J.: Getting a Grip on Tangible Interaction: A Framework on Physical Space and Social Interactions. In: CHI 2006, Montréal, Canada, pp. 437–446. ACM Press, New York (2006)
14. Antle, A.N., Bevans, A., Tanenbaum, J., Seaborn, K., Wang, S.: Futura: Design for Collaborative Learning and Game Play on a Multi-touch Digital Tabletop. In: TEI 2011, Funchal, Portugal, pp. 93–100. ACM Press, New York (2011)
15. Antle, A.N., Tanenbaum, J., Tanenbaum, K., Bevans, A., Wang, S.: Balancing Act: Enabling Public Engagement with Sustainability Issues through a Multi-Touch Tabletop Collaborative Game. In: Campos, P., et al. (eds.) INTERACT 2011, Part II. LNCS, vol. 6947, pp. 194–211. Springer, Heidelberg (2011)
16. Magerkurth, C., Memisoglu, M., Engelke, T., Streitz, N.: Towards the Next Generation of Tabletop Gaming Experiences. In: Proc. of Graphics Interface, Waterloo, Canada, pp. 73–80 (2004)
17. Dang, C.T., Straub, M., André, E.: Hand Distinction for Multi-Touch Tabletop Interaction. In: ITS 2009, Banff, Canada, pp. 101–108 (2009)
18. Leitner, J., Haller, M., Yun, K., Woo, W., Sugimoto, M., Inami, M., Cheok, A., Been-Lirn, H.: Physical Interfaces for Tabletop Games. Computers in Entertainment 7(4), 1–21 (2009)
19. Lucchi, A., Jermann, P., Zufferey, G., Dillenbourg, P.: An Empirical Evaluation of Touch and Tangible Interfaces for Tabletop Displays. In: TEI 2010, Cambridge, MA, USA, pp. 177–184. ACM Press, New York (2010)
20. Tuddenham, P., Kirk, D., Izadi, S.: Graspables Revisited: Multi-Touch vs. Tangible Input for Tabletop Displays in Acquisition and Manipulation Tasks. In: CHI 2010, Atlanta, Georgia, USA, pp. 2223–2232. ACM Press, New York (2010)
21. Xie, L., Antle, A.N., Motamedi, N.: Are Tangibles More Fun? Comparing Children's Enjoyment and Engagement Using Physical, Graphical and Tangible User Interfaces. In: TEI 2008, Bonn, Germany, pp. 191–198. ACM Press, New York (2008)
22. Antle, A.N.: Exploring How Children Use their Hands to Think: An Embodied Interactional Analysis. Behaviour and Information Technology, http://www.antle.iat.sfu.ca/Physicality/ThinkingWithHands (accepted)
23. Wang, S.: Comparing Tangible and Multi-touch Interfaces for a Spatial Problem Solving Task. Masters Thesis, Simon Fraser University, Surrey, BC, Canada (2010), https://theses.lib.sfu.ca/thesis/etd6352
24. Rogers, Y., Lim, Y., Hazlewood, W., Marshall, P.: Equal Opportunities: Do Shareable Interfaces Promote More Group Participation Than Single User Displays? Human-Computer Interaction 24(1/2), 79–116 (2009)
25. Roschelle, J., Teasley, S.: The Construction of Shared Knowledge in Collaborative Problem Solving. In: CSCL 1995, Berlin, Germany, pp. 69–97 (1995)
26. Brehmer, B.: Distributed Decision Making: Some Notes on the Literature. In: Rasmussen, J., Brehmer, B., Leplat, J. (eds.) Distributed Decision Making: Cognitive Models for Cooperative Work. John Wiley & Sons, New York (1991)
27. Suthers, D.: Representational Guidance for Collaborative Learning. In: Artificial Intelligence in Education, pp. 3–10 (2003)
28. Clark, H., Brennan, S.: Grounding in Communication. In: Resnick, L.B., Levine, J.M., Teasley, S.D. (eds.) Perspectives on Socially Shared Cognition, pp. 127–149. American Psychological Association, Washington, DC, USA (1991)
29. Duffy, T., Dueber, B., Hawley, C.: Critical Thinking in a Distributed Environment: A Pedagogical Base for the Design of Conferencing Systems. In: Electronic Collaborators: Learner-Centered Technologies for Literacy, Apprenticeship, and Discourse, pp. 51–78. Lawrence Erlbaum Associates, Mahwah (1998)
30. Hornecker, E., Marshall, P., Rogers, Y.: Entry and Access - How Shareability Comes About. In: DPPI 2007, Helsinki, Finland, pp. 328–342 (2007)
31. Dourish, P., Bellotti, V.: Awareness and Coordination in Shared Workspaces. In: CSCW 1992, Toronto, Canada, pp. 107–114. ACM Press, New York (1992)
32. Hornecker, E., Marshall, P., Sheep Dalton, N., Rogers, Y.: Collaboration and Interference: Awareness with Mice or Touch Input. In: CSCW 2008, San Diego, CA, USA, pp. 167–176. ACM Press, New York (2008)
33. Fleck, R., Rogers, Y., Yuill, N., Marshall, P., Carr, A., Rick, J.: Actions Speak Loudly with Words: Unpacking Collaboration Around the Table. In: ITS 2009, Banff, Canada, pp. 189–196 (2009)
34. Kaltenbrunner, M.: ReacTIVision and TUIO: A Tangible Tabletop Toolkit. In: ITS 2009, Banff, Canada, pp. 9–16 (2009)
35. Meerbeek, B., Bingley, P., Rijnen, W., van den Hoven, E.: Pipet: A Design Concept Supporting Photo Sharing. In: NordiCHI 2010, Reykjavik, Iceland (2010)
36. Thaler, R.: Toward a Positive Theory of Consumer Choice. Journal of Economic Behavior & Organization 1(1), 39–60 (1980)
37. Beggan, J.: On the Social Nature of Nonsocial Perception: The Mere Ownership Effect. Journal of Personality and Social Psychology 62(2), 229–237 (1992)
Hidden Details of Negotiation: The Mechanics of Reality-Based Collaboration in Information Seeking
Mathias Heilig, Stephan Huber, Jens Gerken, Mischa Demarmels, Katrin Allmendinger, and Harald Reiterer
Abstract. Social activities such as collaborative work and group negotiation can be an essential part of information seeking processes. However, they are not sufficiently supported by today's information systems, which focus on individual users working with PCs. Reality-based UIs, with their increased emphasis on social, tangible, and surface computing, have the potential to tackle this problem. By blending characteristics of real-world interaction and social qualities with the advantages of virtual computer systems, they inherently change the possibilities for collaboration, but until now this phenomenon has not been explored sufficiently. Therefore, this paper presents an experimental user study that aims at clarifying the impact that such reality-based UIs and their characteristics have on collaborative information seeking processes. Two different UIs have been developed for the purpose of this study. One is based on an interactive multi-touch tabletop in combination with on-screen tangibles, therefore qualifying as a reality-based UI, while the other interface uses three synchronized PCs, each controlled by keyboard and mouse. A comparative user study with 75 participants in groups of three was carried out to observe fundamental information seeking tasks in co-located collaboration. The study shows essential differences in emerging group behavior, especially in terms of role perception and seeking strategies, depending on the two different UIs.

Keywords: Collaboration, Tabletop, Tangible User Interface, Information Seeking, User Study.
empirical work expresses the importance of collaborative activities during information seeking processes. Morris and Teevan [14] give some figurative examples: students working together to complete assignments, friends seeking information about joint entertainment opportunities, family members jointly planning a vacation, or colleagues conducting research for their projects. Furthermore, Kuhlthau [10] defines information seeking as a constructive process, where social and collaborative activities are essential to advance the knowledge work process. Working collaboratively evidently enhances the quality of information seeking activities in many different respects. One example is increased coverage of the relevant information space as well as a reduction of unnecessary and redundant work. Another is higher confidence in the quality of findings, through the constructive development of strategies and answers in a group, which is often composed of people with different expertise.

1.2 Reality-Based User Interfaces

In today's digital information seeking systems, collaborative search is not sufficiently supported. One obvious reason is the limitation of desktop or terminal PCs, which are controlled by mouse and keyboard and therefore do not offer appropriate mechanisms for collaborative work. To overcome this gap, several researchers (e.g. [2], [13]) have proposed the use of multi-touch tabletops for co-located, collaborative information seeking activities. These researchers assume that the horizontal form factor of a tabletop interface democratizes the interaction between multiple users through the possibility of simultaneous touch operations. Furthermore, these settings promise a more natural interaction between users in a way that enhances the perception of the others' interaction, gestures and posture during work and discussion. Additionally, the concept of tangible user interfaces (TUIs) has also been proposed as a tool to support collaborative activities [6]. Through the possibility of parallel manipulation and their physical affordance, they are able to further enhance co-located, collaborative activities with digital information systems.

Explanatory models for these effects are often derived from cognitive science and psychology. For instance, embodiment theory [3], which indicates that our cognitive development is crucially influenced by our physical and social interaction with objects and living beings in our environment, is gaining more and more attention in human-computer interaction. In addition, the field of HCI has started to build up its own explanatory model: the paradigm of reality-based interaction [8] summarizes the findings from cognitive science, the technical evolution with regard to multimodal interaction, and surface, tangible and social computing. The aim of this paradigm is to guide the interaction design of digital systems by putting the emphasis more on the interaction with the real, non-digital world, thus designing it more "reality-based" and more natural. Different input techniques enable multimodal interaction to take advantage of the physical capabilities of the users. To improve the understanding of digital systems, UIs are based on the rules of the physical world. Also, the everyday knowledge of the users is regarded as an instrument to design simple and effective computer systems. Reality-based UIs respect the social skills of the users to enable, for example, collaborative work.
Therefore, we think that reality-based UIs have the potential to enhance collaborative, co-located information seeking.
1.3 Research Focus and Goals

In this paper we present an experimental user study with 75 participants that aims at clarifying the impact of reality-based UIs and their characteristics on collaborative information seeking processes in comparison to desktop-based PC UIs. Our focus lies on tightly-coupled collaboration during the exploration and filtering of search results. These tasks are typical situations in information seeking where it is beneficial for people to work together [10]. Reality-based UIs provide adequate features to support this aspect, as revealed by several studies (e.g. [5], [12]). The focus of this work is to detect hidden details of how the interaction, the communication as well as the strategies change depending on the UI type. Special attention in the study was paid to the emerging roles and behavior patterns that occur during collaborative work. Therefore, our study is guided by three research questions to disclose the mechanics of co-located, collaborative information seeking with reality-based UIs: (Q1) How do reality-based UIs influence the interaction strategies in comparison to PC-based UIs? (Q2) What impact do the two UI types (reality-based and PC-based) have on the communication (verbal and non-verbal) during the group work? (Q3) Are there differences in the occurrence of roles people adopt during the group work depending on the UI?

The paper is composed of six sections. After the introduction (1), important related work is discussed (2). In this section, research is presented that empirically explored the impacts of reality-based UIs in collaborative conditions. The insights of these projects will be summarized and the need for our study will be emphasized. Then, the concepts behind the two UIs (3) that were used in our comparative user study will be described. Thereafter, the design of the study and the data analysis will be explained (4), followed by the presentation of the results (5). A special focus will be placed on the explanation of the different roles and strategies participants adopted depending on the interface condition. Finally, the paper closes with a discussion and a conclusion (6).
2 Related Work

In recent years, numerous systems have been developed that explicitly or implicitly adopt reality-based concepts. Many of these approaches have been designed to enhance co-located collaborative work. The following section introduces selected approaches that focus not only on the design of such UIs, but also on evaluations showing their potential practical impact.

2.1 Interactive Surfaces

The interactive surface WeSearch [13] has been designed for collaborative web search to leverage the benefits of tabletop displays for face-to-face collaboration. The system was also an integral part of a user study, which showed that tabletop displays are effective platforms to facilitate collaborative web search. Furthermore, the study revealed that tabletop displays enhance the awareness of group members' actions and artifacts such as search criteria, and allow natural transitions between tightly- and loosely-coupled work styles.
Another research project [11] evaluated how different configurations of input (single-mouse, single-touch, multi-mouse and multi-touch) influence the balance of inter-user participation around a tabletop interface during planning tasks. The project showed that tabletop UIs could be designed to enable a more balanced participation. The paper further reported that with touch interaction fingers rather than voices do the talking: "interactive participation is more equal with touch input and multiple entry points than with mice or single input, but verbal participation is not".

Isenberg et al. [7] investigated co-located collaboration on a multi-touch tabletop for complex visual analytics. They intensively analyzed the closeness of teams' collaboration and the influence of the group work on task performance. The study showed that teams which worked tightly together were more successful in completing the task and required less support. Furthermore, the study presented eight types of collaboration styles that identify how people work together while solving problems.

2.2 TUIs and Hybrid Interactive Surfaces

TUIs are also proposed as a tool to enhance co-located collaboration [6]. They benefit from possibilities such as parallel, physical manipulation and further inherit implicit characteristics such as a better awareness of the others' actions through their visibility and their physical affordance. This promises to facilitate involvement and active participation in group work without explicit verbal communication. The use of hybrid interactive surfaces (tangible interaction in combination with tabletop displays [18]) is an approach to combine the advantages of interactive surfaces and TUIs for co-located collaborative work. However, until now there are only few empirical studies that have explored the impact of hybrid interactive surfaces. The user study in [4] provided first results showing that hybrid interactive surfaces exhibit considerable advantages, with respect to parallel interaction of multiple users, in comparison to classic search interfaces. Furthermore, Jetter et al. [9] introduced Facet-Streams, a hybrid interactive surface for co-located, collaborative product search. The system uses techniques of information visualization with tangible and multi-touch interaction to materialize collaborative search on an interactive surface. Two user studies demonstrated the potential of this hybrid UI concerning visual and physical affordance as well as simplicity in interaction. With regard to collaborative work, the authors observed an increased awareness and better mutual support among collaborators and seamless transitions between tightly-coupled collaboration and loosely-coupled parallel work.

2.3 Insights and Open Research Areas

The discussed research projects reveal that multi-touch tabletop displays offer promising possibilities for co-located, collaborative work, such as equal access to information, smooth transitions between individual and collaborative activities, as well as more balanced participation. In addition, hybrid interactive surfaces show additional qualities for collaboration, such as parallel, physical manipulation or the increased awareness and better mutual support among collaborators. However, until now the influence of such UIs on collaborative work, especially on information seeking, has not been explored in detail, and efforts to identify the
mechanics of collaboration in these tasks and how reality-based UIs might support them are missing. Therefore, the aim of our experimental user study is to provide detailed insights on whether and how the interaction, the communication as well as the strategies of users during collaborative exploration and seeking activities change depending on the interface type. Since the introduced studies already revealed that reality-based UIs might offer benefits beyond efficiency and result quality [5], these parameters are not the main focus of our study. Instead, our findings are manifested in the different behavior patterns and roles that occur during group work in information seeking.
3 Experimental User Interfaces

In order to be able to study the influence of reality-based UIs on collaborative search, we designed two UI configurations (see fig. 1). Each UI represents one specific interface type (reality-based versus PC-based). The experimental systems differ only in the interaction mechanics (independent variable), whereas search as a shared and co-located experience should remain stable to ensure a fair and valid comparison. The following sections introduce the shared characteristics of both UIs.
Fig. 1. (a) The Search Token as reality-based UI; (b) Three Synchronized PCs as alternative UI
3.1 Interaction and Visualization Principles

Data Collection and Visualization. We designed a visualization that arranges about 200 movie objects in a grid-structured canvas (see fig. 2a). In the default view the movie objects are displayed as poster representations. Semantic zooming was used to display three different levels of detail for each object (see fig. 2b). Each UI type provides the possibility to enter filter keywords, which trigger the semantic zoom of the matching information objects. All objects can be freely arranged on the canvas by dragging operations (either with touch or by mouse). Users are thus able to create personal clusters of intermediate search results for discussion in the group.

Dynamic Query and Sensitivity. One important concept of the two UIs is based on dynamic filter mechanisms that bring interesting information objects to the users' attention using keyword-based dynamic queries [1]. To support multiple users during collaborative work in a co-located environment it is crucial that all actions of other
group members are accessible and can be comprehended by everyone involved in the information seeking process. To address this issue, a filter method inspired by the concept of sensitivity [15] is used for our UIs, in contrast to common filter strategies that instantly hide all non-matching objects upon filtering. To express that a specific information object matches a user-defined filter criterion, the visual representation of this object is enlarged, emphasizing its importance to all collaborators. All information objects that do not match the filter query are decreased in size and transparency, allowing users to easily distinguish visually between matching and non-matching information.
Fig. 2. (a) The visualization with about 200 media objects. (b) Matching media objects increase in their size and offer three semantic zoom levels representing different levels of detail.
The combination of multiple filter criteria is a fundamental concept to enable all collaborators to personally get involved in the search and exploration process. In the presented interfaces, users are able to combine multiple filter criteria using Boolean operations. Following the concept of sensitivity described above, information objects that match more than one filter criterion are represented even larger than those that match only a single filter criterion. The default operation used to combine the different filter criteria is a Boolean AND. Additionally, the users can interactively alter the weight of all filter criteria. While this allows much more powerful search operations, it can also enhance the collaborative process, in the sense that a collaborator is able to scale the weight of a specific criterion up or down to better communicate the corresponding aspects to the other collaborators. The mathematical model behind the weighting of the filter criteria is based on the concept of weighted Boolean [17]. In addition, a color highlighting mechanism visually links matching information objects with the corresponding filter criterion (see fig. 3). Each criterion has a distinct color, which is also used to highlight a matching keyword in the detail information of an object. The colors in the result view can be used to associate the important information objects with the user who is manipulating the corresponding filter criterion, as proposed with the concept of collaborative brushing and linking [7].

Resize Algorithm. The resize algorithm (see fig. 3) applied in the two UIs is based on a simple mechanism [4]: each filter consists of a keyword and a weight. The keyword can either match or not match a specific information object (depending on whether it is found in the object's metadata). The weight of each filter can be between 0 and 2 and corresponds to
the resize factor this filter will add to the size of a matching information object. Therefore, a weight between 0 and 1 will shrink all matching information objects, which corresponds to a (weighted) Boolean NOT operator. A weight between 1 and 2 will increase the size of all matching information objects. For example, a filter that matches a specific information object and has a weight of 0.1 would shrink this object to a tenth of its size, whereas a weight of 1.6 would increase its size by 60%.
Fig. 3. Schematic diagram of the resize algorithm for media objects: The linear function (blue) is altered by a logarithmic correction (green) to enhance the objects’ zooming behavior
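The following Python sketch illustrates the resize computation of fig. 3 and the following paragraph: each matching filter contributes a weight between 0 and 2, the weights of all matching filters are multiplied, and a logarithmic correction damps the growth of large objects. The exact correction function is not given in the paper, so the formula used here is an assumption.

```python
import math

def resize_factor(filters, metadata_text):
    """Compute the display-size factor of one information object.

    `filters` is a list of (keyword, weight) pairs with 0 <= weight <= 2;
    a weight below 1 shrinks matching objects (a weighted Boolean NOT),
    a weight above 1 enlarges them. Non-matching filters are ignored.
    """
    factor = 1.0
    for keyword, weight in filters:
        if keyword.lower() in metadata_text.lower():  # the filter matches this object
            factor *= weight                          # e.g. 2 * 1.5 * 0.8 = 2.4

    # Logarithmic correction (assumed form, cf. fig. 3): growth is damped for
    # large factors so that a fully detailed object does not cover the screen.
    if factor > 1.0:
        factor = 1.0 + math.log(factor, 2)
    return factor

# Example from the text: three matching filters with weights 2, 1.5 and 0.8
# give a raw factor of 2 * 1.5 * 0.8 = 2.4, which the correction then damps.
print(resize_factor([("crime", 2.0), ("murder", 1.5), ("romance", 0.8)],
                    "crime murder romance drama"))
```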
The final size of an information object is calculated from the weights of all matching filters. For example, if three different filters match an information object, one with a weight of 2, one with a weight of 1.5 and one with a weight of 0.8, the size of this object will be 2.4 times (2 * 1.5 * 0.8 = 2.4) its default size. This simple algorithm is modified through the application of a logarithmic function (see fig. 3, green line) to address two problems: (1) information objects that match a filter should get bigger even with low weights, so that more detail information can be shown early; (2) once all detail information of an object is visible, the growth of this object should be damped so that it does not cover too much screen space.

3.2 Reality-Based User Interface: The Search Token

The foundation of the Search Token, the first UI condition of the experiment, is a physical object that can be placed on a multi-touch tabletop display (see fig. 1a). The Search Token as a hybrid interactive surface enhances the visibility of interaction with the system, since its physical appearance provides a higher visual and tangible affordance than a UI that is solely based on digital sliders, text fields, buttons, etc. Similar to the Parameter Bars [16], a Search Token can be dynamically configured with different search parameters, thereby acting as a filter on the information space. When a Search Token is placed on the tabletop, it is augmented by a visualization. A Search Token consists of four main parts: the transparent Plexiglas cylinder (the physical object), the textbox for the filter keywords, the circular indicator for the weight of the entered search criterion, and the virtual on-screen keyboard (see fig. 4a). This keyboard can be temporarily hidden to save screen space. The visualization is
virtually connected to the physical token like a digital shadow, following its movement on the screen. Moving and turning a token thus enables all participants around the tabletop to access the token's visualization. When a search criterion is entered, rotating the Search Token allows users to define the criterion's weight. The circular indicator around the physical cylinder interactively shows the adjusted weight, and the Plexiglas cylinder glows in the highlighting color (see fig. 4b). To combine several search criteria, multiple Search Tokens (three in the user study) can be used on the surface of the tabletop display. With regard to reality-based UIs [8], the Search Token qualifies as such because it comprehensively incorporates their main characteristics: the physicality of people and objects, the social context, and the environment.
Fig. 4. (a) Search Tokens as hybrid surfaces on a multi-touch tabletop display enable users to simultaneously enter search criteria via on-screen keyboards; (b) by turning a Search Token, the weight of a search criterion can be adjusted
3.3 Synchronized PC User Interface

In contrast to the Search Token, we intentionally designed the second UI condition against the principles of reality-based interaction. However, since we wanted to focus specifically on the mechanics of reality-based interaction, certain aspects had to remain constant. This includes the co-located setting as well as the possibility for parallel interaction. Therefore, three PCs were arranged in a triangle on a table of similar size to the multi-touch tabletop (see fig. 1b). Through a "real-time" synchronization of the visualization and filters on all clients, the participants share one logical view of the UI (see fig. 5a). Via the text boxes in the lower area of the screen the participants are able to simultaneously define search criteria (see fig. 5b). The sliders next to the search boxes allow them to define the weight of the search criteria. This setting enables each collaborator to participate equally in the interaction through the mouse and keyboard of the respective PC. With this setting the participants were able to see the faces, gestures, and posture of the group members, to communicate with each other, and to interact simultaneously, comparable to the reality-based UI. The differences between the UIs are essentially the following: (1) the form factor, (2) the physically merged interaction space, and (3) the interaction (mouse/keyboard versus Search Token and touch interaction). This allows us to isolate the mechanics of interaction (IV) for our experimental user study.
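The paper does not describe how the "real-time" synchronization between the three PCs was implemented; the following Python sketch merely illustrates one conceivable realization in which each client broadcasts its local filter changes as JSON messages so that all clients converge on the same logical view. All names, addresses, and the wire format are assumptions.

```python
import json
import socket

# Assumed shared state: one (keyword, weight) filter per search box.
filter_state = {"box_1": {"keyword": "", "weight": 1.0},
                "box_2": {"keyword": "", "weight": 1.0},
                "box_3": {"keyword": "", "weight": 1.0}}

def broadcast_filter_change(sock, peers, box_id, keyword, weight):
    """Send a local filter change to the other clients (UDP, JSON payload)."""
    message = json.dumps({"box": box_id, "keyword": keyword, "weight": weight})
    for host, port in peers:
        sock.sendto(message.encode("utf-8"), (host, port))

def apply_remote_change(raw_message):
    """Merge a change received from another client into the local view."""
    change = json.loads(raw_message.decode("utf-8"))
    filter_state[change["box"]] = {"keyword": change["keyword"],
                                   "weight": change["weight"]}
    # ...then re-run the resize algorithm and redraw the shared visualization.

# Hypothetical usage on one of the three clients:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peers = [("192.168.0.12", 9999), ("192.168.0.13", 9999)]  # the two other PCs (assumed)
broadcast_filter_change(sock, peers, "box_1", "murder", 1.6)
```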
Fig. 5. (a) Participants share a synchronized logical view from different PCs and are able to simultaneously enter search criteria into the text boxes; (b) Via the slider widgets that are assigned to the text boxes, it is possible to adjust the weight of search criteria
4 Experimental User Study

Our experimental user study was intended to explore in detail how collaboration and people's behavior in the context of information seeking would be affected by using either reality-based UIs or PC-based UIs. Figure 1 shows the setup of our experimental interfaces, as discussed in the previous section.

4.1 Participants and Design

We used a between-subjects design (IV: UI, reality-based vs. PC) with 75 participants, who were randomly assigned to 25 groups of three (triads; 12 tabletop and 13 PC groups). Participants were students or university faculty (39 females and 36 males) from a variety of non-technical institutes. The average age was 26 (SD = 7.4 years). Triads are a typical setting for small groups working together on an information-seeking assignment, and our participants stated that they were familiar with such situations. Information seeking was also a frequent task for them. Furthermore, nearly all of our participants had prior experience with touch displays (e.g. smart phones). We decided to apply a between-subjects design as we identified several aspects that can have a significant and uncontrollable influence on the results of a within-subjects design: First, the novelty of a tabletop UI with tangibles might evoke a strong "wow" effect and lead to a bias when putting the reality-based UI configuration in contrast to the PC-based UI. Second, group dynamics evolve over time as people get to know each other. Third, in within-subjects designs some participants tend to transfer strategies from the first UI to the following one. Even a counter-balanced within-subjects design might not have been able to rule out such interaction effects. We also explicitly decided to divide participants into groups in which they did not know each other. While this may not reflect a real-world situation, it gave us a better level of control over interpersonal relationships and their possible effects on group dynamics.
4.3 Procedure and Tasks

After a short introduction, each group was given a five-minute instruction to the respective UI followed by a five-minute free exploration phase to get to know the system and each other, before working collaboratively on a total of four tasks, two of them being training tasks. These training tasks required the participants to search for a movie object within the collection that matches various attributes (e.g. genre: "romance" or keyword: "murder"). Subsequently, the two "real" tasks were designed to simulate a realistic negotiation situation in which compromises need to be made. They required participants to agree on a movie object within a limited amount of time. Every participant received one criterion in the first task and two criteria in the second task (e.g. keyword: "explosion", genre: "crime") representing his or her fictive interests. In order to simulate a realistic collaborative situation, the tasks were designed in a way that made it impossible for the group to satisfy all criteria. Thus, all participants had to negotiate whose personal criteria to minimize (e.g. by reducing the weight for one or several criteria) or whose to give up completely. We analyzed the quality of these compromises by means of the results' distance to the (non-existing) ideal compromise. However, the compromises' quality did not differ significantly between the two conditions. A time limit of 5 minutes per task was used to control the session duration and increase participants' motivation to come to a decision. We did not interrupt users before a final decision was made, since the time limit was not intended as a sharp criterion for the completion of a task. The mean duration of these tasks was 4:46 minutes (mean = 286 s, SD = 74 s). After completing the tasks, each participant filled out a personal questionnaire about their subjective assessment of the group work and the UI. Overall, the whole session per group took about 45 minutes, and participants were compensated with 15 EUR.

4.5 Data Collection and Analysis

We used a variety of data collection techniques including questionnaires (pre- and post-test), interaction logging, as well as video and audio recording. Two video streams captured both the detailed interaction on the display and the overall group dynamics from a bird's-eye perspective. In the case of the PC interface, screen capture recordings were used to show the detailed interaction on all displays. Each participant wore colored badges and bracelets, allowing us to easily attribute their interactions. Video and screen recordings were analyzed in detail. Based on several overall screenings, a complex coding scheme was developed, which focused on three aspects: (1) the individual interaction of participants with the system (e.g. typing in a keyword, moving an object), (2) the visual focus and attention of participants, either to the system or to one another (e.g. turning around, looking up), and (3) the kind of verbal communication between participants. The coding scheme was then applied to the last and most complex task with two given criteria for each participant. As coding the data at this depth is impossible in a single pass, we applied multi-pass coding, with each pass focusing on one of the three aspects. To ensure inter-coder reliability, the material of 4 groups (two of each UI configuration) was coded a second time by a researcher not involved in the study (statistically analyzed with Cohen's Kappa, κ = 0.67).
Additionally, we looked for interesting patterns and captured such scenes during the coding sessions.
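For reference, a small Python sketch of how Cohen's κ for the double-coded material might be computed from the two coders' labels; it follows the standard definition and is not the authors' analysis script, and the example labels are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' categorical labels of the same segments."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Example: two coders labelling the same ten video segments (invented data).
coder_1 = ["filter", "filter", "object", "none", "filter",
           "object", "none", "filter", "object", "none"]
coder_2 = ["filter", "object", "object", "none", "filter",
           "object", "none", "filter", "filter", "none"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.7 for this toy example
```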
5 Study Results

In this section we describe how the major interaction strategies, communication behavior, and roles the participants adopted to solve the tasks differ between the two UI conditions. In the PC condition, each participant was in theory able to work with an individual PC. However, we did not force participants to spread across the individual PCs. Interestingly, two groups gave up the advantage of working simultaneously and shared one PC. Upon closer inspection, it became obvious that in these groups one participant took on a dominant role and mainly solved the tasks while the other group members showed very cautious behavior. To allow a detailed and reasonable comparison between the interface conditions, we excluded these groups, leaving us with 23 triads (69 participants; 12 Search Token and 11 synchronized PC groups). T-tests were used to analyze the data for statistical differences.

5.1 Interaction Strategy

Simultaneous Interaction. Regarding our first research question (Q1, interaction strategies) we were interested in how often people interact in parallel. This aspect is reported as one major advantage of reality-based UIs (e.g. [6]). To compare the two conditions, we use percentages of task time as a normalized value. The results show that the reality-based Search Token condition featured more simultaneous interaction than the PC-based condition (see fig. 6). In 15.3% (SD = 4.97%) of the time all three participants interacted in parallel. In comparison, the PC-based condition showed this behavior in 3.45% (SD = 1.22%) of the time, with the difference between the conditions being statistically significant (t(23) = 1.78, p = 0.04; analyzed on group level). Parallel interaction of two participants occurred in the Search Token condition in 47.3% (SD = 39.11%) of the time and therefore happened significantly more often (t(23) = 2.16, p = 0.004) than in the synchronized PC condition (11.9%, SD = 8.01%).
Fig. 6. The diagram shows the time (in percent of the total task time) during which three participants and two participants worked simultaneously with the system
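The group-level comparisons reported above correspond to an independent-samples t-test on per-group percentages of task time; a sketch with illustrative values follows — the real per-group data are not published in the paper.

```python
from scipy import stats

# Hypothetical per-group percentages of task time during which all three
# participants interacted in parallel (illustrative values only).
search_token_groups = [14.2, 18.9, 11.3, 21.0, 15.5, 9.8,
                       16.4, 13.1, 17.7, 12.6, 19.2, 14.0]          # 12 groups
pc_groups = [3.1, 4.2, 2.5, 3.9, 2.8, 4.6, 3.4, 2.1, 4.0, 3.6, 3.8]  # 11 groups

# Independent-samples t-test on the group-level values, as a simple stand-in
# for the comparison reported in the text.
t_statistic, p_value = stats.ttest_ind(search_token_groups, pc_groups)
print(t_statistic, p_value)
```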
Interface-Element Sharing. We identified an interesting behavior of some participants in the Search Token condition. Without being asked, they took over a Search Token that was previously in use by another group member (see fig. 7).
Such behavior never occurred in the PC-based condition (no participant used a text box or slider that was already in use by another group member). This suggests that the threshold for intervening in the interaction of others is lower with the Search Token than with the PC-based UI. We observed that such behavior fosters collaboration through closer and mutual interaction: once executed, the other participants imitated this behavior and also used Search Tokens of other participants on the tabletop display. However, in one case such a "token takeover" led to the other participants backing out and interacting less often with the system.
Fig. 7. Two exemplary scenes in which a participant takes over a Search Token from another group member
5.2 Communication

Verbal Communication. The second research question (Q2) addressed the impact on communication. We classified communication into process-dependent (strategic meta-contributions to advance the task, e.g. "Let's sort these movie objects to the right!" or "I take the upper search box"), task-dependent (contributions to solve the task with regard to content, e.g. "Do you think 'Gladiator' is a biography?" or "Is 'American History X' a cruel movie?"), no communication, and undefined communication. However, the analysis showed no significant differences between the groups in terms of the different types of communication. For example, the triads in the Search Token condition featured 'no communication' in 14.63% (SD = 3.97%) of the time. In comparison to the triads in the PC-based condition (16.32%, SD = 4.47%) we detected no significant difference (t(23) = 2.08, p = 0.84). A deeper analysis of the process-dependent, task-dependent, and undefined communication variables also revealed no significant differences.

Non-verbal Communication. An important aspect of non-verbal communication is the visual focus of the participants during the group work, as an indicator of attention. In both UIs the visual focus lay to a great extent on the display(s) of the system. The proportion of time during which all three participants collectively had their visual focus on the system was 92.68% (SD = 21.34%) in the Search Token condition. The groups in the synchronized PC condition shared their visual focus on the system's displays in 80.66% (SD = 24.58%) of the time (t(23) = 2.08, p = 0.054). By further analyzing the video material for this phenomenon we found that with the Search Token UI, gestures and posture of the other group members were perceived without needing
to look up from the display. In contrast, the participants in the PC condition had a lot of short interruptions in order to see and perceive the non-verbal expressions of the other group members. Furthermore, we noticed that several participants in the Search Token condition unconsciously used non-verbal activities to communicate involvement and active participation. For example, they expressively held a Search Token (see fig. 8) and thus showed the other group members that they were taking part in the group work.
Fig. 8. Three examples of participants holding a Search Token in their hand without using it for interaction, but to communicate involvement
5.3 Roles of Collaboration

Profiling. While sections 5.1 and 5.2 elaborated on the differences between the UI conditions on a group level, we were also interested to see if participants adopt different roles depending on the UI condition (Q3). For the analysis, we generated a quantitative profile for each participant based on the coded video material. This profile was composed of the same three dimensions as analyzed on the group level: (1) system interaction, (2) visual attention, and (3) verbal communication. The system interaction was subdivided into the behaviors no interaction, filter action, and object manipulation, which allowed us to understand what type of interaction a person prefers. Visual attention was decomposed into no attention, attention to the system, and attention to other team members. Verbal communication was separated into the behaviors no communication, process-dependent communication, task-dependent communication, and undefined communication. The percentage of time a participant showed each of these behaviors during the task session was plotted on an axis of a spider-gram. To recognize similar profiles, we printed out the profiles of the 69 remaining participants and asked two different experimenters to sort them independently into clusters of visually similar behavior patterns, without knowing which plot belonged to which of the two interface configurations. We then took the intersection of these clusters and analyzed the behavior of the participants in the video material. Participants whose behavior did not fit the other participants of a cluster were excluded (this happened in only two cases). We decided that a cluster had to contain a minimum of 6 participants to be regarded as a role (that corresponds to a probability of about 25% that a role occurred in one of the 23
groups). This way we extracted 5 roles (see fig. 9) covering 39 of our 69 participants (56.5%). In the following, we present the key characteristics of each role. Interestingly, most roles can be used to easily distinguish between the two interface conditions. This allows us to point out in detail how the interfaces affected people's role behaviors.
Fig. 9. The spider-grams show the profiles of the five different roles that participants adopted during the group work
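A possible way to derive the per-participant profiles behind fig. 9 is to sum the coded durations per behavior and normalize them to percentages of task time; the data structure and example values below are assumptions, not the authors' analysis script.

```python
def behavior_profile(coded_segments, task_duration_s):
    """Percentage of task time spent in each coded behavior of one participant.

    `coded_segments` is an assumed list of (behavior_label, duration_in_seconds)
    tuples produced by the video coding; the resulting percentages feed the
    axes of one spider-gram.
    """
    behaviors = [
        "no interaction", "filter action", "object manipulation",      # system interaction
        "no attention", "attention to system", "attention to others",  # visual attention
        "no communication", "process-dependent", "task-dependent",     # verbal communication
        "undefined",
    ]
    totals = dict.fromkeys(behaviors, 0.0)
    for label, duration in coded_segments:
        totals[label] += duration
    return {b: 100.0 * t / task_duration_s for b, t in totals.items()}

# Example: one participant coded over a 300-second task (invented values).
profile = behavior_profile(
    [("filter action", 45), ("attention to system", 260),
     ("task-dependent", 80), ("no interaction", 255)],
    task_duration_s=300)
```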
Overall, with regard to solving the tasks, we could distinguish between participants taking on a lead role and participants taking a rather cautious or passive role. Interestingly, participants enacted these roles differently depending on the interface configuration. We will first discuss the active/lead participants (roles 1 and 2) and then continue with the more passive/cautious roles (roles 3, 4, and 5).
Leading and Active Participants

Role 1: The Determined Pusher. The behavior of this role was adopted by at least one person in 6 (out of 12) groups in the Search Token condition but in only one group (out of 11) in the PC-based condition. The determined pusher is a very active participant who tries to engage the other team members to work together and solve the task. Further, the participant is very attentive and frequently contributes verbally, both task-dependent (e.g. "Let's inspect the movie Gladiator") and strategic (process-dependent, e.g. "I propose to delete all criteria!"). This role features a lot of filter actions to communicate own ideas on how to solve the task. However, the participant also involves the other group members through discussion, gestures, and, in the case of the Search Token condition, through the sharing of Search Tokens (see section 5.1, Interface-Element Sharing).

Role 2: The Inquiring Sorter. The counterpart to the determined pusher is the inquiring sorter. In 7 groups of the PC condition at least one person adopted this role, while this happened only once in the Search Token condition. Similar to the determined pusher, these participants try to encourage the other team members to actively take part in the group work through intensive and motivating feedback on verbal contributions and actions of other group members. However, and contrary to the determined pusher, their interaction with the system did not focus on filter activities. Instead, the inquiring sorter interacted with the virtual media objects in the visualization (e.g. sorting the objects that match the search criteria) to highlight particular correlations in the collection. As discussed in the "interface-element sharing" incident, these participants also did not take over the search boxes of other participants.

Discussion. We conclude that the Search Token condition allowed active lead users to take on a more dominant role within the groups. They took the chance to influence or even control the strategy, the interaction (by controlling the physical tokens and filter keywords directly), and also the overall group participation (by handing over tokens or using tokens to communicate). In contrast, such participants in the PC condition seemed to be limited in their influence on the system interaction and mainly focused on highlighting search results and sorting in order to take on the lead role. We assume that one important reason for this phenomenon is that the reality-based interface is shared in its physical entirety. Thereby, conflicting activities (e.g. two persons reaching for the same token) can easily be resolved by accepted and well-established social protocols. The shared, but virtual, PC condition makes resolving such conflicts much more difficult. For example, it can easily happen that two persons try to interact with the same search box at the same time. The physical awareness of the others' actions is missing, which leads to conflicting interactions. Therefore, we think that dominant persons avoided these conflicts and thereby had fewer possibilities to influence the group activities. However, one has to be aware that dominant persons in a reality-based condition can also have a potentially larger negative impact on the group.
Cautious and Passive Participants

Role 3: The Cautious Attendee. This role emerged mainly in groups in the PC condition (4 groups compared to 2 groups in the Search Token condition). Participants that adopted this role can be characterized as persons easily conceding to the strategy decisions of other group members. While they take part in task-dependent communication, they often abandon their own ideas and mainly say something to support the decisions and interactions of other group members (e.g. "That's right.", "Yes, these are the two movies"). From an interaction point of view, they only engage with the system during the initial phase when every group member enters their keywords, but stay passive during the refinement and consolidation phase.

Role 4: The All-Accepting Follower. The counterpart to the cautious attendee adopted the role of an "all-accepting follower". This role, which emerged in 4 groups of the Search Token UI and only once with the synchronized PC UI, is similarly characterized by incessant acceptance of and agreement with the strategies of the other group members. While these participants seem even more cautious in verbal communication (e.g. "Yes, that's my opinion, too.", "This one is also a movie with a murder, right?"), they did use the Search Tokens to interact with the system. This happened most of the time in parallel with another team member, following the lead of this person.

Role 5: The Interested Observer. Interestingly, while sharing some of the characteristics of the cautious attendee and the all-accepting follower, we could identify an additional role within the Search Token condition (6 groups). Most of the time, these participants simply observed the system interactions of other group members in a very interested manner (especially during the refinement phase with its sorting and arranging of objects). Interaction occurred mostly in the early phases through pre-decided filter actions and with a Search Token they had placed on the tabletop themselves. In later phases, while they actively participated in the group work through task-dependent verbal contributions (e.g. "What movie did we have earlier?", "That is an action movie!", "No, this movie doesn't match our criteria!"), they left the execution of proposed strategies to the other group members.

Discussion. We conclude that the cautious/passive participants in the reality-based condition capitalized on a broader variety of means of expression to take part in the group activities. On the one hand, as discussed in section 5.2 on non-verbal communication, they were able to show their involvement in the group activities through postures, gestures, and just holding a token. On the other hand, they were much more active regarding filtering the information space by using the tokens, even in later phases. In contrast, such participants in the PC condition were only active at the beginning of a task and then seemed to use the PC monitor as a privacy shield, allowing them to stay passive without having to fear any consequences. However, cautious participants in the reality-based condition were also more in danger of being pushed out of the group. As discussed in section 5.1, dominant participants sometimes took over the Search Token of others. In a few of these cases the now "tokenless" and cautious participants withdrew from the collaboration completely.
6 Conclusion

We presented an extensive experimental user study that provides a rich understanding of the influence different interface types (reality-based versus PC-based) can have in a collaborative information-seeking situation. With respect to our research question Q1 (interaction strategies) we conclude that participants working with the reality-based UI developed a wider variety of information-seeking strategies, such as interface-element sharing or simultaneous interaction, compared to the participants in the PC condition. We assume that this is caused by the natural and "materialized" interaction and its consequential qualities (e.g. physical awareness and manipulation). Concerning our second research question (Q2: communication) we identified mixed findings: analyzing the groups' verbal communication, we did not determine noticeable differences between the two interface conditions. These results fit very well with the findings of Marshall et al. [11], who showed that verbal participation in group work is not constrained by the type of input. However, regarding non-verbal communication, we observed that in contrast to the PC condition the participants in the reality-based condition seamlessly perceived gestures of their group members and used the physical artifacts to communicate and produce meaning. We further detected five different roles (Q3: roles of collaboration), which allow us to easily distinguish between the two interface conditions. Participants, although having similar personalities (e.g. dominant/active persons or cautious/passive persons), often adopted different roles depending on the interface condition they were using. One determining factor for this phenomenon was the emerging social environment triggered by the reality-based UI. Another one was the apparent, multi-faceted possibility to physically express and communicate ideas through tangible interface elements (e.g. interface-element sharing). During the user study some interesting alternative set-ups of our experiment with respect to the PC condition came up that would be beneficial to deepen the insights of our findings and would disclose further research questions: (1) one shared PC with a single large display that can be controlled by multiple keyboards and mice; (2) individual tablet PCs for the participants around the table that share a synchronized view and can be controlled by touch input. Altogether, this paper demonstrates that the application of reality-based interfaces tremendously alters the behavior of collaborators in small groups across multiple dimensions.
References

1. Ahlberg, C., Williamson, C., Shneiderman, B.: Dynamic queries for information exploration. In: CHI 1992, pp. 619–626. ACM, New York (1992)
2. Amershi, S., Morris, M.R.: CoSearch: A System for Co-located Collaborative Web Search. In: CHI 2008, pp. 1647–1656 (2008)
3. Gibbs, R.W.: Embodiment and Cognitive Science. Cambridge University Press, Cambridge (2006)
4. Heilig, M., Demarmels, M., Allmendinger, K., Gerken, J., Reiterer, H.: Fördern realitätsbasierte UIs kollaborative Rechercheaktivitäten? In: Mensch und Computer 2010, pp. 311–320 (September 2010)
5. Hinrichs, U., Carpendale, S., Scott, S.D.: Evaluating the effects of fluid interface components on tabletop collaboration. In: AVI 2006, pp. 27–34 (2006)
6. Hornecker, E.: Understanding the Benefits of Graspable Interfaces for Cooperative Use. In: Coop 2002, pp. 71–87. IOS Press, Amsterdam (2002)
7. Isenberg, P., Fisher, D., Morris, M.R., Inkpen, K., Czerwinski, M.: An Exploratory Study of Co-located Collaborative Visual Analytics around a Tabletop Display. In: VAST, pp. 179–186. IEEE Computer Society, Los Alamitos (2010)
8. Jacob, R.J.K., Girouard, A., Hirshfield, L.M., et al.: Reality-Based Interaction. In: CHI 2008, p. 201. ACM Press, New York (2008)
9. Jetter, H.-C., Gerken, J., Zöllner, M., Reiterer, H., Milic-Frayling, N.: Materializing the Query with Facet-Streams – A Hybrid Surface for Collaborative Search on Tabletops. In: CHI 2011, Vancouver, BC, Canada, May 7-12 (2011)
10. Kuhlthau, C.C.: Seeking meaning: a process approach to library and information service. Libraries Unlimited (2004)
11. Marshall, P., Hornecker, E., Morris, R., Dalton, N.S., Rogers, Y.: When the fingers do the talking: A study of group participation with varying constraints to a tabletop interface. In: 2008 Workshop on Horizontal Interactive Human Computer Systems, pp. 33–40. IEEE, Los Alamitos (2008)
12. Morris, M.R., Paepcke, A., Winograd, T.: TeamSearch: Comparing Techniques for Co-Present Collaborative Search of Digital Media. In: First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TABLETOP 2006), pp. 97–104. IEEE, Los Alamitos (2006)
13. Morris, M.R., Lombardo, J., Wigdor, D.: WeSearch. In: CSCW 2010, p. 401. ACM Press, New York (2010)
14. Morris, M.R., Teevan, J.: Collaborative Web Search: Who, What, Where, When, and Why. Synthesis Lectures on Information Concepts, Retrieval, and Services 1(1), 1–99 (2009)
15. Tweedie, L., Spence, B., Williams, D., Bhogal, R.: The Attribute Explorer. In: CHI 1994, pp. 435–436. ACM, New York (1994)
16. Ullmer, B., Ishii, H., Jacob, R.J.K.: Tangible Query Interfaces: Physically Constrained Tokens for Manipulating Database Queries. In: INTERACT 2003, pp. 279–286. IOS Press, Amsterdam (2003)
17. Waller, W.G., Kraft, D.H.: A Mathematical Model of a Weighted Boolean Retrieval System. Information Processing & Management 15(5), 235–245 (1979)
18. Weiss, M., Wagner, J., Jansen, Y., Jennings, R., Khoshabeh, R., Hollan, J., Borchers, J.: SLAP widgets. In: CHI 2009, p. 481 (2009)
A Tactile Compass for Eyes-Free Pedestrian Navigation

Martin Pielot1, Benjamin Poppinga1, Wilko Heuten1, and Susanne Boll2
1 OFFIS Institute for Information Technology, Escherweg 2, 26121 Oldenburg, Germany
{martin.pielot,benjamin,poppinga,wilko.heuten}@offis.de
2 University of Oldenburg, Escherweg 2, 26121 Oldenburg, Germany
[email protected]
Abstract. This paper reports on the first systematic investigation of how to guide people to a destination using the haptic feedback of a mobile phone, and on its experimental evaluation. The aim was to find a navigation aid that works hands-free, reduces the users' distraction, and can be realised with widely available handheld devices. To explore the design space we developed and tested different prototypes. Drawing on the results of these tests we present the concept of a tactile compass, which encodes the direction of a location "as the crow flies" in rhythmic patterns and its distance in the pause between two patterns. This paper also reports on the first experimental comparison of such tactile displays with visual navigation systems. The tactile compass was used to continuously display the location of a destination from the user's perspective (e.g. ahead, close). In a field experiment including the tactile compass and an interactive map, three conditions were investigated: tactile only, visual only, and combined. The results provide evidence that cueing spatial locations in vibration patterns can form an effective and efficient navigation aid. Between the conditions, no significant differences in the navigation performance were found. The tactile compass used alone significantly reduced the amount of distracting interaction, and together with the map it improved the participants' confidence in the navigation system.

Keywords: Mobile Accessibility, Multi-Modal Interface, Novel User Interfaces and Interaction Techniques.
Fig. 1. A "tactile compass" encodes in which direction and how far a place is located from the perspective of the user through vibration patterns generated with a common mobile phone. Unlike widespread turn-by-turn wayfinding systems, this tactile compass allows a more traditional sense of navigation, empowering people to revive their inherent navigation skills.
According to a study by Madden and Rainie [11], one in six adults report having physically bumped into another person because they were distracted using their phone. The "iPod Zombie Trance", i.e. listening to loud audio content via headphones, may result in a dangerous loss of situational awareness. Australian authorities believe that this loss is responsible for the still increasing number of pedestrian fatalities. Beyond safety considerations, having to repeatedly look at a tiny display to stay oriented is a stark contrast to the experience of a good hike. Auditory displays may be overheard when there is too much noise, or they may interrupt other actions, such as enjoying the hike or chatting with a co-traveller. To support navigation and orientation in a less distracting and obtrusive way, the sense of touch has been studied as a communication medium. It has been shown that vibro-tactile stimuli around the torso can easily be interpreted as pointing directions [27]. This kind of information presentation technique has successfully been used to "drag" people towards waypoints [28] while reducing the distraction at the same time [3, 14]. With respect to map-based navigation it has been shown that cueing the general direction of the destination can improve the navigation performance and reduce the distraction [15, 22]. Supporting map-based navigation is desirable, since turn-by-turn instructions may disengage users from the environment [9], lead to a worse understanding of the environment's spatial layout [1], and are not necessarily more effective than maps [7]. However, the majority of the above approaches require special hardware (e.g. [28, 3, 14]) and are thus not widely available. Other approaches require proactive interaction [13, 19] to obtain the spatial information, which may be undesired
by the user at times. In this paper we therefore propose and investigate the concept of a tactile compass, which cues spatial information through the sense of touch and requires neither special hardware nor explicit interaction. With the help of focus groups and qualitative studies we developed a set of vibration patterns that indicates the general direction and distance of a geospatial location. Similar to a traditional compass needle pointing north, it can be used as a tactile compass to point towards any geographical reference point, such as the travel destination (see Fig. 1). This paper reports on a systematic investigation of how to convey geospatial locations with a handheld device and on our resulting tactile compass. Further, it reports on the first experimental evaluation of such a tactile user interface, comparing it with a map-based navigation system. We provide evidence that the tactile cues can reduce the distraction by decreasing the amount of interaction with the device.
2 Related Work

Surveying the previous work in the field of pedestrian navigation, Tscheligi and Sefelin [25] argue that landmark-based wayfinding and the consideration of the context of use are two major prerequisites for the success of pedestrian navigation systems. To address the context of use, non-visual modalities for providing spatial cues have been studied for many years. Holland et al. [6] proposed AudioGPS, where the general direction of a destination is encoded in 3D spatial audio delivered by headphones. Strachan et al.'s GPS tune [23] took a similar approach: directions are encoded in the panning of an audio track. Both groups ran user trials which showed that such interfaces are effective. If users do not want to impair their auditory perception, e.g. to avoid the loss of situation awareness, the sense of touch offers a viable alternative. Using tactile displays to deliver spatial information was proposed more than a decade ago [24]. A popular example of such tactile displays comes in the form of waist belts, such as Tsukada et al.'s [26] ActiveBelt. Eight vibro-tactile actuators are sewn into the fabric of a waist belt. When it is worn, tactile stimuli can be produced all around the wearer's torso. As reported by van Erp et al. [27], those stimuli can intuitively be interpreted as horizontal directions. For example, a vibration near the navel is perceived as pointing forward. This type of spatial information presentation has successfully been used to facilitate waypoint navigation [28] or to cue the location of several objects [10, 4, 16]. It could be shown that even the complex visualisation of the locations of team mates in a computer game can be effectively processed despite demanding foreground tasks and leads to significantly increased situation awareness [16]. It could also be shown that cueing the general direction of a travel destination can improve map-based navigation in virtual environments [22] and in the real world [15]. Recently, researchers have started to investigate using the pager motors available in common mobile phones to convey spatial information with devices that are more widespread than tactile waist belts. An emerging interaction technique that can be realised with simple audio or tactile feedback is "scanning" the environment via pointing gestures. Recent examples are SoundCrumbs [12] with auditory feedback and Sweep-Shake [18] with tactile feedback. The user points the device in the supposed direction of e.g. a landmark and receives
auditory or tactile feedback when pointing in the right direction. By pointing in different directions users can scan for the landmark. Several studies have shown that people can reach given places with this interaction technique [13, 19, 29]. Both tactile waist belts and scanning with tactile feedback have been shown to successfully support navigational tasks, but both have their drawbacks. Tactile waist belts are very intuitive but hardly available. Even if a person owned such a display, it would be unlikely to be carried along all the time. The scanning technique, on the other hand, requires nothing but a mobile phone with a built-in magnetometer. However, the need to actively point the device to probe for the spatial direction may be tedious over time. What is still missing is a method to convey spatial directions (similar to the tactile belts described above) with a single tactile actuator, so that the tactile display (e.g. as part of the mobile phone) requires no active interaction. The question is whether such a display can be realised in a way that is easy enough to be used effectively without extensive training.
3 Design of the Tactile Compass

In this section we describe our design that explores how to convey geospatial locations with a single vibration motor and without requiring explicit interaction. To underpin our design decisions we first elaborate the requirements and constraints given by the pedestrian navigation task and the device limitations. We then describe how we investigated the design space to find a set of possible solutions and illustrate the prototypes we built. We then present the results of a qualitative outdoor evaluation testing these prototypes. Drawing on our findings we illustrate the final design of our tactile compass.

3.1 Requirements and Constraints

Use of a Tactile Display. As mentioned in the introduction, interacting with mobile devices may seriously distract the user. According to a study by Madden and Rainie [11], one in six cell-owning adults report that they have physically bumped into another person because they were distracted by talking or texting on their phone. At the same time, studies showed that tactile displays used to convey navigational information can reduce this kind of distraction significantly [3, 14]. Thus, the location of the destination to be displayed by the tactile compass should be conveyed by the sense of touch.

Support Navigation by Survey Knowledge. In modern, commercial wayfinding systems routes are typically organised as a set of waypoints. The geometric location of the waypoints in conjunction with an underlying street network can be used to generate turning instructions, such as "now turn left". However, this form of navigation guidance has been shown to disengage users from the environment [9] and make people feel "bossed around" [14]. In addition, it does not take into account that pedestrians are often not as confined to the street network as cars are, e.g. when they are on a hike or freely exploring a city centre. With the design of the tactile compass we therefore aim at supporting navigation in its traditional sense. Instead of giving travellers the fastest or shortest route
to a destination within a confined street network, the tactile compass should support travellers by indicating the direction of a geographical location "as the crow flies". Thus, it supports classic survey-based knowledge, where the traveller uses a mental map formed of a gestalt-like network of the relative locations of landmarks [8]. As spatial information always describes the relation between two objects, the location shall be described in relation to the user. This egocentric cueing is easiest to interpret and does not require considering any further reference points.

Encode Spatial Information in the Tactile Stimulus Only. The pointing interaction proposed in previous work [13, 19, 29] would already provide a viable solution to the above requirements. However, pointing means that the user has to interact explicitly with the mobile device to find the location of the destination. As reported in [19], users may desire constant tactile feedback but are not willing to constantly hold the device in their hand. Thus, the tactile compass should work hands-free. If no explicit interaction, such as pointing, is to be required, the spatial information has to be encoded in the tactile stimulus only. While visual and acoustic feedback can provide a huge variety of information, tactile information presentation is limited in multiple ways. On the one hand, humans can only perceive vibro-tactile stimuli if the actuator can stimulate the skin. On the other hand, the tactile feedback can only be varied in limited ways, depending on the actuator technology used. Most mobile phones only allow turning the electric motor on and off, which generates vibration through an off-centred weight attached to the motor axis. Thus, only vibrations with different lengths or rhythms can be created. Other parameters (see [2]), such as the frequency, the amplitude, or the waveform of the vibration, cannot be altered.

3.2 Investigation of the Design Space

In order to investigate how to cue the location of a place in relation to the user's location we conducted a focus group with five colleagues from our research group. We brainstormed potential solutions, identified common concepts, and derived five prototypes. The most prominent aspect we identified was that every method we could think of represented either direction or distance information. An example is describing locations by their distance and their azimuth with respect to a person using the clock face, such as "2 o'clock, 150 metres". All other approaches would require external means of geospatial referencing, such as landmarks or GPS coordinates. The other prominent aspect was whether the presentation was binary or had multiple levels. For example, being near the destination versus not being near the destination would be a binary distance representation. The distance to a destination in metres would be an n-ary presentation. While some of these combinations are reasonably suited to the problem, others are obviously insufficient to guide a pedestrian efficiently to a destination. Indicating in a binary way whether the user is at the destination or not would not be much help in reaching the waypoint in the first place.

3.3 Implementation of Prototype Methods

In summary, we assumed that conveying absolute distances, distance changes over time, and general directions were the most promising ways of guiding a person to a
place or destination. We used these elements of our design space to construct five different methods to guide people to a given geo location. Technically, each method allows specifying a destination by a pair of latitude/longitude coordinates. Further, each method is fed with the GPS signal of the mobile phone, so it knows the user's geo-location, heading, and walking speed. For each method, we had to design a mapping from the information to be presented to a set of rhythm patterns. Further, we already optimised the methods where it seemed suitable, although that may have diluted the "pure" aspects.

Approaching/Departing. This method focuses on conveying the relative distance to the location as it changes over time. It indicates whether or not the user is walking towards the presented location. Users can reach the destination by following the "walking towards the location" signal. Walking towards the waypoint is encoded in a single long (240 ms) pulse. Not walking towards the location is encoded by two short pulses of 120 ms with a pause of 120 ms between them. We defined "approaching" as the location lying within a cone of 2x60° in front of the user.

Hot'n'Cold. This method draws on the children's game hot 'n' cold. It indicates the absolute distance to the location. By moving in the direction where the signal gets "hotter" the user will eventually reach the destination. Therefore the method continuously generates a single tactile pulse of 120 ms. The distance between the user and the location is encoded in the pause between the pulses. The closer the user gets to the waypoint, the shorter the pause becomes. The pause durations range from 5000 ms (1000 metres or beyond) to 300 ms (destination reached).

Left, Right, Ahead. This method encodes whether the waypoint is left, right, or ahead of the user. The waypoint being left of the user is encoded in two pulses of 120 ms. The waypoint being right of the user is encoded in three pulses of 120 ms. To avoid having the user run zigzag we introduced a small frontal corridor, which indicates whether the location is in a cone of 30° in front of the user. This case is encoded by a single 120 ms pulse.

Continuous Direction. This method encodes the exact direction of the location in the full 360° circle. It therefore creates a rhythm pattern that is altered continuously depending on the direction to present. If users are able to interpret the rhythm patterns, they can just "read" the direction of the location and head the respective way. The basic principle of the method is to encode the direction in the relative lengths of two vibration pulses. If the waypoint is dead ahead, both pulses have a length of 80 ms. If the location is right of the user, the length of the second pulse is increased. The further right the location gets, the longer the second pulse becomes. Locations to the left of the user are displayed by increasing the length of the first pulse.

During our repeated early tries, we found that it was necessary to communicate when a waypoint has been reached and when the GPS signal becomes too weak. Thus, each method implemented two basic Tactons: a short recurring pulse followed by a long pause indicates insufficient GPS signal quality. A sequence of three pulses (short, long, short) was used to announce that the destination had been reached: the first and the last pulse had a duration of 80 ms, while the middle pulse had a duration of 500 ms. The pause between those pulses was 120 ms.
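All of the methods above rely on the same geometric primitives: the great-circle bearing from the user's GPS position to the destination, that bearing relative to the walking heading, and a distance-to-pause mapping. A Python sketch follows; the bearing formula is the standard one, the cone and pause values are those given in the text, and the linear interpolation of the pause is an assumption.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from (lat1, lon1) to (lat2, lon2), in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def relative_bearing(bearing, heading):
    """Direction of the destination relative to the walking direction, -180..180 degrees."""
    return (bearing - heading + 180.0) % 360.0 - 180.0

def is_approaching(rel_bearing):
    """Approaching/Departing method: destination within the 2x60 degree frontal cone."""
    return abs(rel_bearing) <= 60.0

def left_right_ahead(rel_bearing):
    """Left, Right, Ahead method with the 30 degree frontal corridor from the text."""
    if abs(rel_bearing) <= 15.0:        # 30 degree cone centred on the heading
        return "ahead"                  # one 120 ms pulse
    return "left" if rel_bearing < 0 else "right"   # two vs. three 120 ms pulses

def hot_n_cold_pause_ms(distance_m):
    """Hot'n'Cold method: pause shrinks from 5000 ms (>= 1000 m) to 300 ms (arrived).

    The linear interpolation between the two endpoints is an assumption.
    """
    d = min(max(distance_m, 0.0), 1000.0)
    return 300.0 + (5000.0 - 300.0) * d / 1000.0
```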
3.4 Qualitative Evaluation of Designs

We conducted a pilot study to figure out which of the design approaches is most suited as a tactile compass for navigation. Therefore, the four methods were prototypically implemented on a Nokia N95. In addition, the prototype allowed us to define and store geo locations. One could then select a geo location and choose a method to display its location.

Method. The pilot study was conducted in a residential area near the OFFIS Institute for Information Technology in Oldenburg, Germany. The geo coordinates of four places were designated as destinations and stored in the prototype. Seven participants, who were partially familiar with the covered area, took part in the study. Each participant had to reach all destinations in the same order. A different method was used for each destination. The order of methods was randomised amongst the participants to avoid sequence effects. Beyond the tactile feedback the participants had no other cues about the location of the destination. Qualitative data was collected through thinking aloud as well as video recording. All participants signed an informed consent prior to the study. At the end of the study the participants were asked to rank the design approaches and discuss their impressions of each of them.

Results and Discussion. On all but four occasions the participants were able to reach the given destination. In general, all methods were found to be reasonably effective. The four breakdowns were instances of a participant standing on one side of a building while the destination was located on the other side. Many participants did not like the distance-based methods since they required them to actively seek the destination. Direction-based methods were preferred over distance-based methods. Five informants named the Continuous Direction method as the most preferred one. Although it was found to be the most complex method, the informants appreciated its rich feedback. In particular, informants appreciated that they could observe how the direction of the destination slowly changes while they are moving. In some cases obstacles such as larger buildings or blind alleys with the destination point behind them led to confusion. In order to avoid the obstacles, the participants had to change their direction and accept the indication that their direction was wrong until they had passed the obstacle. This caused irritation because the participants were not happy about disregarding the instructions of the system. It was stated that more waypoints on the route could help to resolve this issue. However, there are a few limitations to the results. The methods we tested are only instances of the design space we laid out above. The design space is not necessarily complete, nor might the instances have been the best examples of its entities. Nevertheless, the results suggest that participants prefer direction-based methods with rich information and are willing to accept complexity. Thus, one challenge would be how to encode as much information as possible in the tactile cues before they become too complex to be understood.

3.5 The Tactile Compass' Design

Drawing on the results of the above qualitative evaluation of the design methods, we built a method that combined the "best of all worlds". We based it on the
Continuous Direction method, as this method was preferred by the users. As seen in Figure 2, we advanced the method by indicating locations behind the user with three short pulses. In addition, we added an absolute distance cue similar to the one we used in the Hot 'n' Cold method. The method therefore alters the pause between the patterns: the closer the user gets to the destination, the shorter the pause becomes (see Figure 3). To improve the ability to discriminate the pulses, we increased the length of the short pulses and the pauses between the single pulses of a pattern to 120 ms.
Fig. 2. The pointing direction is encoded in the relative length of two pulses. In this illustration the geographic reference point is somewhat to the right-hand side, so the first pulse remains short while the length of the second pulse is slightly increased.
Fig. 3. The distance is encoded in the pause between a set of pulses. The shorter the pause becomes the closer the user is to the presented location.
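Putting Figs. 2 and 3 together, a sketch of how the final tactile compass pattern could be generated: the relative pulse lengths encode the direction, three short pulses indicate locations behind the user, and the pause between patterns encodes the distance. The pulse-length scaling, the 90° threshold for "behind", and the pause range are assumptions where the text gives no exact numbers.

```python
def compass_pattern(rel_bearing_deg, distance_m):
    """Return one vibration pattern as a list of (state, duration_ms) tuples.

    rel_bearing_deg: direction of the destination relative to the user's
    heading (-180..180, negative = left); distance_m: remaining distance.
    """
    SHORT = 120.0  # ms, minimum pulse length of the final design

    if abs(rel_bearing_deg) > 90.0:
        # Location behind the user: three short pulses (section 3.5); the 90
        # degree threshold is an assumption.
        pulses = [SHORT, SHORT, SHORT]
    else:
        # Direction encoded in the relative length of two pulses (fig. 2):
        # the further right, the longer the second pulse; the further left,
        # the longer the first one. The scaling factor is an assumption.
        stretch = SHORT + abs(rel_bearing_deg) / 90.0 * 2.0 * SHORT
        pulses = [SHORT, stretch] if rel_bearing_deg >= 0 else [stretch, SHORT]

    # Distance encoded in the pause after the pattern (fig. 3); assumed to
    # shrink linearly, as in the Hot'n'Cold prototype (5000 ms -> 300 ms).
    d = min(max(distance_m, 0.0), 1000.0)
    pause = 300.0 + (5000.0 - 300.0) * d / 1000.0

    pattern = []
    for p in pulses:
        pattern += [("vibrate", p), ("off", SHORT)]
    pattern[-1] = ("off", pause)   # replace the last gap by the distance pause
    return pattern
```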
4 Field Study

The tactile compass was evaluated as a navigation aid in a field study. The study took place in a city forest. Fourteen participants had to reach a given destination, either by a
map, by the tactile compass, or by both navigation aids. The goals were to investigate (1) whether the tactile compass can effectively guide pedestrians to a given location, (2) how well it can keep up with a map, and (3) whether the tactile cues have a positive effect on the participants' distraction.

4.1 Material

For the baseline we had to choose between a turn-by-turn navigation system and a map. We decided on the map for two reasons: first, maps are still used often and heavily relied on, in particular in an exploration scenario. Second, Ishikawa et al. [7, 20] suggest that maps are still superior to navigation systems in terms of navigation performance. To provide the map we used a custom-built application that displays the user's position and walking direction on a map layer using OpenStreetMap data. The evaluation took place in a city forest. The area offers lots of winding paths and combines dense forest with a couple of open meadows that could be used as shortcuts. Additionally, the forest contains lots of landmarks that we used to measure how attentive the participants could remain with respect to the environment. For the evaluation we defined three places (see Figure 4). The application could be configured to show one of those places on the map and display its location through the tactile compass. No route was displayed on the map and no turning instructions were given by the application. Thus, the participants had to find their own way to the given destination. The places were arranged in a way that there was never a direct path or line of sight between the start and the place to reach. The route always included detours (unless the participants walked cross-country). Thus, the task of reaching the destination was never trivial for the participants.
Fig. 4. Left: the city forest that we used as the evaluation environment; the dots mark the three places we used as destinations in the navigation tasks. Right: a participant during the field study while learning the tactile compass' vibration patterns.
4.2 Participants

Fourteen participants (four female) took part in the study. Their age ranged from 14 to 53 with an average of 28.25 (SD 11.51). According to the Santa Barbara
Sense-of-Direction Scale (SBSOD) [5] they reported an average sense of direction (3.07, SD 1.14, with possible scores ranging from 1 to 5). All of the participants had little or no knowledge about the spatial layout of the evaluation environment. Since the evaluation was video-recorded, the participants signed an informed consent. No payment was provided for the participation.

4.3 Design

The independent variable was the type of navigation aid. To isolate effects caused by the map and by the tactile cues, we used three levels: {map, tactile, and map & tactile}. Map denoted the condition where the participants only used the map displayed on the mobile device's screen. In the tactile condition the device only presented the direction and distance of the destination via a tactile compass as described in Section 3.5. The map & tactile condition combined the use of the visual map and the tactile compass. We used a within-subjects design with all participants using all three navigation aids in random order. The following measures were taken in order to assess the navigation performance and distraction in each condition:

Navigation Performance. Similar to what has been reported in previous studies [21, 20, 15] on evaluations of navigation systems, we measured navigation performance in terms of completion time and occurrences of disorientation events. The time the participants needed to reach the destination from the starting point of each condition was taken as the completion time. Disorientation events were defined as situations where the participants stopped for more than 10 seconds, or for 5 seconds when they expressed their disorientation verbally. In addition, the participants were asked to rate on a five-point Likert scale how confident they were in their navigation decisions. We did not measure navigation errors, since there was no "correct" route, and thus the concept of a navigation error made no sense.

Distraction. For measuring distraction we combined three measures. First, the participants were asked to count the number of benches they saw during the study. From the number of the detected and the missed benches we then computed a detection rate. Second, the participants rated how much they felt distracted by the device on a five-point Likert scale. Third, we measured how long the participants visibly interacted with the device. Visible interaction was defined as any situation where a participant interacts with the mobile device in a way that was visible to the experimenter. Even a mere glance at the display was considered as interaction as long as it was clearly perceivable by the experimenter. We divided the cumulated time of visible interaction by the completion time to get a time ratio for each condition.

4.4 Procedure

Prior to the study, informed consents were sent out to the potential participants. Only those participants who signed the consent forms were invited to the study. Initially, the experimenters re-explained the task (navigate to the given location using any path they want). The participants were introduced to the three conditions and allowed to train with them.
Prior to the first session, the participants were informed that they had to count all the benches they could find and report the number once they had reached the destination. The places shown in Figure 4 had to be reached in the given order (1, 2, and 3). During the navigation task one experimenter followed the participant at some distance and recorded their actions with a video camera. Once the participants had reached a destination, the completion time was noted. The participants were asked to report the number of detected benches and to rate their subjective level of confidence in the navigation decisions and their subjective level of distraction for the current navigation aid. Afterwards, the condition was switched and the next place was selected as destination, until all three places had been reached. The whole procedure took about 45 minutes for each participant.
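To make the operationalisation of these measures concrete, the following Python sketch illustrates how the disorientation events, the visible-interaction ratio, and the bench detection rate defined in Section 4.3 could be computed from annotated event logs. The log format, function names, and example numbers are hypothetical; in the study the measures were derived manually from the video recordings and the questionnaires.

# Hypothetical sketch: deriving the performance and distraction measures
# of Section 4.3 from annotated event logs of a single trial.
# The log format and all names are illustrative, not the authors' tooling.

def disorientation_events(stops, verbal_cues):
    """Count stops longer than 10 s, or longer than 5 s if the participant
    verbally expressed disorientation during that stop.
    `stops` is a list of (start, end) times in seconds,
    `verbal_cues` a list of time stamps of verbal disorientation remarks."""
    events = 0
    for start, end in stops:
        duration = end - start
        verbal = any(start <= t <= end for t in verbal_cues)
        if duration > 10 or (verbal and duration > 5):
            events += 1
    return events

def interaction_time_ratio(interaction_spans, completion_time):
    """Share of the trial spent visibly interacting with the device."""
    visible = sum(end - start for start, end in interaction_spans)
    return visible / completion_time

def bench_detection_rate(reported, present):
    """Detection rate from reported vs. actually present benches."""
    return min(reported, present) / present

# Example trial (fabricated numbers, for illustration only)
stops = [(120.0, 126.0), (300.0, 318.0)]            # (start, end) in seconds
verbal_cues = [123.5]                               # disorientation remark at 123.5 s
interaction_spans = [(10.0, 25.0), (200.0, 230.0)]  # visible device use
completion_time = 464.3                             # seconds

print(disorientation_events(stops, verbal_cues))                             # 2
print(round(interaction_time_ratio(interaction_spans, completion_time), 3))  # 0.097
print(bench_detection_rate(reported=6, present=12))                          # 0.5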
5 Results
All participants reached the destinations in all trials within a reasonable amount of time. No notable performance breakdown was observed in any of the conditions. In the following we report our quantitative findings as well as participant comments and our observations.

5.1 Quantitative Results
The quantitative results were extracted from the questionnaires and the video recordings. Table 1 shows the mean results for every dependent variable grouped by condition. Statistical significance was analysed using ANOVA and Tukey post-hoc tests for the ratio variables, and the Friedman test and Bonferroni-corrected Wilcoxon signed-rank tests for the questionnaire's Likert-scale results.

Table 1. Quantitative average results by condition. The subjective measures are the results of five-point Likert scales, where 5 means highest. Boldface numbers indicate that a significant effect of the condition was found for that measure.
Measure / Condition              map      tactile   map & t.
Navigation Performance
  Completion time (s)            361.6    464.3     398.6
  Disorientation ev. (#)         0.21     0.50      0.29
  Confidence *                   4        4         5
Distraction
  Interaction time (%) ***       35.75    6.41      27.82
  Benches discovered (%)         46.36    50.41     41.26
  Subj. Distraction              3        3         3
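The paper does not state which statistics software was used. As a minimal sketch, assuming a long-format table with one row per participant and condition (all file and column names are assumptions), the reported tests could be run along the following lines with pandas, SciPy and statsmodels.

# Hypothetical re-analysis sketch for the within-subjects design:
# repeated-measures ANOVA plus Tukey HSD post-hoc tests for the ratio
# variables, and Friedman plus Bonferroni-corrected Wilcoxon signed-rank
# tests for the Likert-scale ratings. Not the authors' original tooling.
import pandas as pd
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Assumed layout: columns participant, condition, completion_time,
# interaction_ratio, confidence, distraction; one row per participant/condition.
df = pd.read_csv("field_study.csv")

# Ratio variables: repeated-measures ANOVA ...
anova = AnovaRM(df, depvar="interaction_ratio",
                subject="participant", within=["condition"]).fit()
print(anova.summary())

# ... followed by Tukey HSD comparisons between the three conditions
# (a simplification: Tukey HSD here treats the conditions as independent groups).
print(pairwise_tukeyhsd(df["interaction_ratio"], df["condition"]))

# Likert ratings: Friedman test across the three conditions ...
wide = df.pivot(index="participant", columns="condition", values="confidence")
print(friedmanchisquare(wide["map"], wide["tactile"], wide["map & tactile"]))

# ... and pairwise Wilcoxon signed-rank tests with Bonferroni correction.
pairs = [("map", "tactile"), ("map", "map & tactile"), ("tactile", "map & tactile")]
for a, b in pairs:
    stat, p = wilcoxon(wide[a], wide[b])
    print(a, "vs", b, "corrected p =", min(p * len(pairs), 1.0))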
Navigation Performance. There was a significant effect of the navigation aid on the subjective confidence of the participants (χ²(2) = 8.45, p < .05). It was significantly higher when both navigation aids were used in combination (both p < .01). The
difference between map and tactile was not statistically significant. Otherwise, no significant effects on the navigation performance were found. Neither the completion time (F2 = 1.08, p = .35) nor the number of disorientation events (F2 = .58, p = .56) differed significantly between the conditions.

Distraction. There was a significant effect on the interaction time (F2 = 15.26, p < .001). A Tukey HSD post-hoc test showed that the participants interacted significantly less with the device in the tactile condition compared to the map condition (p < .001) and the map & tactile condition (p < .001). The difference between map and map & tactile was not significant. With respect to the number of benches found, no significant effect could be observed (F2 = .48, p = .62). Similarly, the difference in the subjective distraction was not statistically significant (χ²(2) = .894, p = .64).

5.2 Comments and Observations
Visible Interaction. Although it did not bring any value to the participants, we still found instances of visible interaction in the tactile condition. Examples were participants unlocking the screen saver (although there was nothing to see), playing with the phone's slider, or simply looking at the display.

Cross-Country Walking. Most participants stuck to the given paths. This is surprising, as the city forest contains many open areas and people are allowed to enter them. Only a few participants walked cross-country. This always involved the tactile compass and mostly happened in the tactile condition. These participants also "scored" the fastest overall completion times.

Training Effect. Since all participants used the tactile compass twice, in the tactile condition and in the map & tactile condition, we could observe a learning effect in our quantitative results. The results show that in terms of navigation performance participants often performed better with the tactile compass when they used it for the second time. However, the differences were not statistically significant.

Overview vs. Direction Cueing. Four participants stated that they had missed having an overview of the environment in the tactile condition. They said that the map additionally gave them an impression of further waypoints on the route, which improved their confidence. Three participants said that they preferred the combination of both navigation aids, since they cancelled out each other's weaknesses and provided the richest set of information.
6 Discussion
All navigation aids (the tactile compass, the map, and the combination of both) allowed the participants to effectively reach the given destinations. The tactile compass significantly reduced the time participants interacted with the handheld device. Combining both navigation aids significantly increased the participants' subjective confidence in the system. The results show that the approach of conveying a general direction by rather unintuitive vibration patterns can form an effective navigation aid. This is in line
with previous findings. Several studies showed that conveying general directions is sufficient to guide a traveller to a destination [13, 18, 19]. This paper extends the previous work by reporting the first experiment comparing this navigation technique both in combination with and against a map. Although the participants were not familiar with the tactile compass, and although it conveys less overview than a map, we did not find a significant disadvantage in the objective navigation performance. We carefully conclude that the tactile compass is not only effective but also reasonably efficient.

Navigation Performance. Previous studies show that cueing the destination's direction by an egocentric tactile cue can improve the efficiency of navigating with a map [15, 22]. In these studies the participants were wearing a tactile waist belt pointing at the destination while navigating by a map. In both study environments, a virtual world [22] and a village [15], the navigation performance significantly improved. This study could not confirm these findings, except for the improved confidence of the participants. The two relevant differences between these studies and the one presented here are the environments (virtual environment and village vs. forest) and the tactile displays used (tactile waist belt vs. tactile rhythm patterns). The forest environment might have penalised bad navigation choices less than the environments of the previous work, as it allows walking cross-country and its path network is very dense. Participants making a bad route choice could often correct it only shortly afterwards. We also assume that the tactile compass is less intuitive than the very powerful waist belts. This may have led to performance penalties which did not occur in the above studies. However, the tactile compass also showed that it may be a highly effective navigation tool when used with some training and/or in unconventional ways. Especially in the beginning, some participants had difficulties interpreting the tactile compass, which led to a high variance in the navigation performance measures. Both the highest (1162 s) and the lowest (244 s) completion times were measured in the tactile-only condition. The highest completion times occurred when the participants started with the tactile compass and still had difficulties interpreting it despite the training session. This also correlated with the bulk of the disorientation events. Five of the seven disorientation events with the tactile compass occurred in only two sessions, which also resulted in the two highest completion times. On the other hand, the fastest completion times were also measured in the tactile condition, when the participants had fewer difficulties learning the tactile compass's vibration patterns. These users also showed a tendency to take shortcuts by walking cross-country.

Distraction. Previous work indicates that tactile displays used to convey navigation information can improve the users' attention. It was found that people are able to spot more entities they are tasked to search for [3] and pay more attention to their immediate surroundings, such as obstacles and other people [14]. However, in the study presented here the participants neither felt less distracted nor spotted more benches. Again, the related work [3, 14] studied tactile vests and waist belts, which are presumably more intuitive to interpret than the rhythm patterns of the tactile compass.
We assume that the participants had to devote too much attention to the tactile cues, so the increase in cognitive workload cancelled out the advantages of the eyes-free usage. More training might reduce this disadvantage. Still, there was a positive tendency, as the participants found the most benches in the tactile condition.
The visible distraction by the device was significantly affected by the experimental manipulation. In the tactile condition the participants showed significantly less visible interaction with the mobile device. At first glance, this result might seem obvious, as there was no map to look at in the tactile condition. However, visible interaction was also possible in the tactile condition. Examples are participants visibly listening to the tactile patterns or holding the device by the ear to "hear" the patterns. Further, the interaction with the map could in principle have been far less frequent, in which case the differences would have been insignificant. In fact, the frequent visible interaction with the map (about 28-36% in the conditions where the map was available) indicates how dangerously distracting a map can be. These findings are in line with previous studies [15].

Limitations. As with every experiment, one limitation of the findings is that they are subject to Hume's problem of induction. A single experiment cannot prove that the findings will be reproduced in different settings. This means we do not know how the tactile compass would perform in crowded Tokyo by night, a deep Finnish forest in the winter, or the endless plains of the US Midwest. Nevertheless, the study has shown that cueing general directions in tactile patterns – although not necessarily intuitive – is an effective navigation aid. It showed that pedestrians can find a path to a destination in a complex, winding path network with the tactile compass only. Since most participants only used existing paths and there was mostly no line of sight between them and the destination, there is some chance that the results could be replicated in other environments with road networks as well.
7 Conclusions
In this paper we presented the design and evaluation of a tactile compass for everyday mobile devices that encodes the direction and distance of a geographic location in vibration patterns. The contribution of the paper is twofold: (1) we provide evidence that people can effectively navigate by vibration patterns cueing the geospatial location of a destination "as the crow flies". For this form of information presentation we established the term tactile compass. (2) In addition, we could show that such tactile information cueing has a measurable positive effect on the user's attention, which may be threatened by today's display-centric user interfaces. With the tactile compass, it becomes possible to create a "tactile sense of direction" with the phone resting in the pocket. It provides new ways to overcome the commanding nature of turn-by-turn instructions and the dangerous "head down" interaction caused by visual maps used on the move. This new "sense" allows users to freely explore the environment while retaining a sense of their own movement in relation to a point of reference. Beyond navigation systems, a tactile compass might boost any location-based service that communicates spatial information, such as a friend finder (Google Latitude) or a POI search app (AroundMe). It is associated with a more traditional understanding of navigation: not merely going from A to B as fast as possible, but rather navigation in the sense of Kurt Tucholsky (1890-1935), who stated that "Umwege erweitern die Ortskenntnis" ("Detours expand the knowledge of a place").
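The concrete vibration patterns are defined in Section 3.5 and are not repeated here. As a hedged sketch of the geometry behind such a tactile compass, the following Python snippet computes the "as the crow flies" bearing and distance to the destination and maps the bearing, relative to the user's heading, onto a small number of direction sectors that could each be assigned a distinct vibration pattern; the sector granularity and the example coordinates are illustrative assumptions, not the implemented pattern set.

# Illustrative sketch of the geometry behind a tactile compass: compute
# the great-circle bearing and distance from the user's position to the
# destination, express the bearing relative to the user's heading, and map
# it onto a small set of direction sectors (each of which could trigger a
# different vibration pattern). Sector count and coordinates are assumptions.
from math import radians, degrees, sin, cos, atan2, sqrt

EARTH_RADIUS_M = 6371000.0

def bearing_and_distance(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing (degrees) and haversine distance (metres)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    # Initial bearing towards the destination
    y = sin(dlon) * cos(phi2)
    x = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)
    bearing = (degrees(atan2(y, x)) + 360.0) % 360.0
    # Haversine distance
    dphi = phi2 - phi1
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlon / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * atan2(sqrt(a), sqrt(1 - a))
    return bearing, distance

def direction_sector(bearing, heading, sectors=6):
    """Map the destination bearing, relative to the user's heading, onto one
    of `sectors` equally sized sectors (0 = straight ahead)."""
    relative = (bearing - heading) % 360.0
    width = 360.0 / sectors
    return int(((relative + width / 2) % 360.0) // width)

# Example: user walking north (heading 0 deg), destination to the north-east
bearing, distance = bearing_and_distance(53.14, 8.18, 53.15, 8.20)
print(round(bearing), "deg,", round(distance), "m, sector",
      direction_sector(bearing, heading=0.0))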
Future work needs to investigate whether this "tactile sense of direction" can be made more intuitive. Intuitiveness may be achieved by revising the patterns and the implicit interaction until the simplest yet sufficiently powerful variant is found. The simpler the pattern, the more beneficial the tactile compass will be in terms of distraction and cognitive workload. Further, the question of how far this concept scales has to be addressed. The presented study has shown that the tactile compass works at smaller scales of roughly a few hundred metres of travel distance. We do not know how well these findings scale to longer distances (e.g. several miles) and different environments (e.g. suburbia or a city centre). To support navigation at a larger scale, one approach could be to introduce intermediate landmarks to avoid walking into dead ends. However, to avoid curbing the user again, as turn-by-turn instructions do, such intermediate landmarks should be placed at reasonable intervals, so that users "hop" along a set of interesting places until reaching the destination.
Acknowledgements. The authors are grateful to the European Commission, which co-funds the IP HaptiMap (FP7-ICT-224675). We would like to thank our colleagues for sharing their ideas with us.
References
[1] Aslan, I., Schwalm, M., Baus, J., Krüger, A., Schwartz, T.: Acquisition of spatial knowledge in location aware mobile pedestrian navigation systems. In: MobileHCI 2006: Proceedings of the 8th Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 105–108. ACM, New York (2006)
[2] Brewster, S., Brown, L.M.: Tactons: structured tactile messages for non-visual information display. In: Proc. of the Australasian Conference on User Interface (2004)
[3] Elliott, L.R., van Erp, J., Redden, E.S., Duistermaat, M.: Field-based validation of a tactile navigation device. IEEE Transactions on Haptics 99 (2010) (preprints)
[4] Ferscha, A., Emsenhuber, B., Riener, A., Holzmann, C., Hechinger, M., Hochreiter, D., Franz, M., Zeidler, A., dos Santos Rocha, M., Klein, C.: Vibro-tactile space-awareness. In: Ubicomp, Adjunct Proceedings (2008)
[5] Hegarty, M., Richardson, A.E., Montello, D.R., Lovelace, K., Subbiah, I.: Development of a self-report measure of environmental spatial ability. Intelligence 30, 425–447 (2002)
[6] Holland, S., Morse, D.R., Gedenryd, H.: AudioGPS: Spatial audio navigation with a minimal attention interface. Personal Ubiquitous Comput. 6(4), 253–259 (2002)
[7] Ishikawa, T., Fujiwara, H., Imai, O., Okabe, A.: Wayfinding with a GPS-based mobile navigation system: A comparison with maps and direct experience. Journal of Environmental Psychology 28(1), 74–82 (2008)
[8] Lawton, C.A.: Gender differences in way-finding strategies: Relationship to spatial ability and spatial anxiety. Sex Roles 30(11-12), 765–779 (1994)
[9] Leshed, G., Velden, T., Rieger, O., Kot, B., Sengers, P.: In-car GPS navigation: engagement with and disengagement from the environment. In: CHI 2008: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 1675–1684. ACM, New York (2008)
[10] Lindeman, R.W., Sibert, J.L., Mendez-Mendez, E., Patil, S., Phifer, D.: Effectiveness of directional vibrotactile cuing on a building-clearing task. In: CHI 2005: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 271–280. ACM, New York (2005)
[11] Madden, M., Rainie, L.: Adults and cell phone distractions. Technical report, Pew Research Center (2010)
[12] Magnusson, C., Breidegard, B., Rassmus-Gruhn, K.: Soundcrumbs – Hansel and Gretel in the 21st century. In: HAID 2009: 4th International Workshop on Haptic and Audio Interaction Design (2009)
[13] Magnusson, C., Rassmus-Gröhn, K., Szymczak, D.: The influence of angle size in navigation applications using pointing gestures. In: Nordahl, R., Serafin, S., Fontana, F., Brewster, S. (eds.) HAID 2010. LNCS, vol. 6306, pp. 107–116. Springer, Heidelberg (2010)
[14] Pielot, M., Boll, S.: Tactile Wayfinder: comparison of tactile waypoint navigation with commercial pedestrian navigation systems. In: The Eighth International Conference on Pervasive Computing, Helsinki, Finland (2010)
[15] Pielot, M., Henze, N., Boll, S.: Supporting paper map-based navigation with tactile cues. In: MobileHCI 2009 (2009)
[16] Pielot, M., Krull, O., Boll, S.: Where is my team? Supporting situation awareness with tactile displays. In: CHI 2010: Proceeding of the Twenty-Eighth Annual SIGCHI Conference on Human Factors in Computing Systems. ACM, New York (2010)
[17] Platzer, E.: Spatial Cognition Research: The Human Navigation Process and its Comparability in Complex Real and Virtual Environments. PhD thesis, Universität der Bundeswehr München (2005)
[18] Robinson, S., Eslambolchilar, P., Jones, M.: Sweep-shake: finding digital resources in physical environments. In: MobileHCI 2009: Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 1–10. ACM, New York (2009)
[19] Robinson, S., Jones, M., Eslambolchilar, P., Murray-Smith, R., Lindborg, M.: "I Did It My Way": Moving away from the tyranny of turn-by-turn pedestrian navigation. In: MobileHCI 2010: Proceedings of the 12th International Conference on Human-Computer Interaction with Mobile Devices and Services. ACM, New York (2010)
[20] Rukzio, E., Müller, M., Hardy, R.: Design, implementation and evaluation of a novel public display for pedestrian navigation: the rotating compass. In: CHI 2009: Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 113–122. ACM, New York (2009)
[21] Seager, W., Fraser, D.S.: Comparing physical, automatic and manual map rotation for pedestrian navigation. In: CHI 2007: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 767–776. ACM, New York (2007)
[22] Smets, N.J.J.M., te Brake, G.M., Neerincx, M.A., Lindenberg, J.: Effects of mobile map orientation and tactile feedback on navigation speed and situation awareness. In: MobileHCI 2008: Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services, pp. 73–80. ACM, New York (2008)
[23] Strachan, S., Eslambolchilar, P., Murray-Smith, R., Hughes, S., O'Modhrain, S.: GpsTunes: controlling navigation via audio feedback. In: MobileHCI 2005: Proceedings of the 7th International Conference on Human Computer Interaction with Mobile Devices & Services, pp. 275–278. ACM, New York (2005)
[24] Tan, H.Z., Pentland, A.: Tactual displays for wearable computing. In: ISWC 1997: Proceedings of the 1st IEEE International Symposium on Wearable Computers, p. 84. IEEE Computer Society, Washington, DC, USA (1997)
[25] Tscheligi, M., Sefelin, R.: Mobile navigation support for pedestrians: can it work and does it pay off? Interactions 13, 31–33 (2006)
[26] Tsukada, K., Yasumura, M.: ActiveBelt: Belt-type wearable tactile display for directional navigation. In: Davies, N., Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205, pp. 384–399. Springer, Heidelberg (2004)
[27] van Erp, J.B.F.: Presenting directions with a vibrotactile torso display. Ergonomics 48, 302–313 (2005)
[28] van Erp, J.B.F., van Veen, H.A.H.C., Jansen, C., Dobbins, T.: Waypoint navigation with a vibrotactile waist belt. ACM Transactions on Applied Perception 2(2), 106–117 (2005)
[29] Williamson, J., Robinson, S., Stewart, C., Murray-Smith, R., Jones, M., Brewster, S.: Social gravity: A virtual elastic tether for casual, privacy-preserving pedestrian rendezvous. In: CHI 2010: Proceeding of the Twenty-Eighth Annual SIGCHI Conference on Human Factors in Computing Systems. ACM, New York (2010)
Are We There Yet? A Probing Study to Inform Design for the Rear Seat of Family Cars David Wilfinger, Alexander Meschtscherjakov, Martin Murer, Sebastian Osswald, and Manfred Tscheligi Christian Doppler Laboratory "Contextual Interfaces", ICT&S Center, University of Salzburg, Sigmund Haffner Gasse 18, 5020 Salzburg, Austria {david.wilfinger,alexander.meschtscherjakov,martin.murer, sebastian.osswald,manfred.tscheligi}@sbg.ac.at
Abstract. When researching interactive systems in the car, the design space can be divided into the following areas: driver, front seat passenger and rear seat. The latter has so far not been sufficiently addressed in HCI research, which results in an absence of implications for interaction designs in that space. This work presents a cultural probing study investigating the activities and the technology usage in the rear seat as social and physical space. The study was conducted with 20 families over a period of four weeks and unveiled aspects relevant for HCI research: aspects of diversion, educational motivation, togetherness, food as activity, physical space, perception of safety, and mobile computing. In relation to these areas, implications for the design and integration of interactive technology in the rear seat area are deduced. We show that cultural probing in the car is a promising and fruitful approach to get insights on passenger behavior and requirements for interactive systems. To improve the rear seat area and to show the potential of probing results to inform design, a design proposal for an interactive rear seat game called RiddleRide is introduced. Keywords: rear seat, design space, cultural probing, car, design.
holistic image of the car as usage context of technology, either novel methods to investigate this special area need to be contrived, or already existing methods need to be adapted. This work introduces cultural probing [7] as an adapted method to research the rear seat in terms of physical and social space.

1.1 Motivation
Although modern cars are manufactured in a highly standardized way, the industry provides its customers with a multitude of different personalization possibilities. Most of those are related either to the whole car (e.g. motorization, color) or to the drivers and their needs (e.g. navigation systems, steering wheel remote controls). Despite novel approaches like rear seat entertainment systems, the rear seat area offers the fewest personalization possibilities when buying a vehicle. Therefore it is necessary to investigate the adaptations users make in their rear seat areas and the ideas they have for redesigning this space. The usefulness of researching self-made adaptations and artifacts has already been acknowledged within the HCI community, where do-it-yourself (DIY) HCI has gained recent attention [5]. Additionally, passengers in the rear seat do not have to conduct driving-relevant tasks. This often leads to boredom and an increased wish for entertainment. Interactive systems that are already common in other usage contexts (e.g. homes, airplanes), such as video and computer games, have increasingly found their way into modern cars. Since these contexts differ from each other, user requirements for entertainment in cars are a significant area for research. Children, who are often passengers in the rear seat area, are an important user group in the car. According to the statistics provided by the NHTS (National Household Travel Survey), children in the US between the ages of 5 and 15 travel on average 39.29 minutes per day in a privately owned vehicle [14]. Their wellbeing has a major impact on the atmosphere in the vehicle. This can range from a positive effect on the driver when the kids are happy, to a real safety threat when drivers cannot focus on their driving tasks anymore. Therefore, the front seat passengers are highly relevant sources of information when designing for the rear seat area, since they are directly affected by what happens behind them. Automotive HCI research has so far focused mainly on the driver and the cockpit whilst neglecting the rear seat. Nevertheless, the relevance of this space has already been identified by researchers and designers [11]. One of the few studies targeting the rear seat has been presented by [13], introducing a system that supports the communication between passengers in the vehicle.

1.2 Research Goals
The above-mentioned reasons led us to the conviction that the investigation of the rear seat area is highly relevant for the HCI community. As a first step to approaching the rear seat as interaction space, we therefore aimed at investigating the usage of the rear seat by families, with mainly kids traveling in the rear seats. We acknowledge that the usage of the rear seat area is more diverse with passengers other than children, but we view this study as a start for further investigations in that space. In the presented study, the following research goal is investigated:
− How is the rear seat area used by families and which implications for HCI exist when introducing technology in this area of the car?

In more detail, the following sub-goals are identified:
− Which weaknesses of the rear seat area exist and how are they fixed by families?
− Which activities are conducted in family cars in the rear seat area?
− Which artifacts are used by families for their rear seat activities?
− What are the participating families' requirements for a "perfect" rear seat area including interactive technology?
− How is technology used in the rear seat area of family cars and what are the implications for the design of future interactive technology?

1.3 Method
To investigate the research goals, we chose the cultural probing method, originally developed by [7]. Probing studies are built around packages that include open-ended, provocative and oblique tasks that support an engagement of participants early in the design process. The main idea is to involve participants and encourage them to take photos, write diaries, send postcards and conduct other kinds of activities that facilitate a creative approach to the study topic. The underlying idea is to get unexpected ideas from a group the researchers are not familiar with. Meanwhile, different variations of the cultural probing method have been presented (e.g., organizational probes [17] and perspective probes [2]). Probing studies have additionally been conducted in different contexts researching technology usage, such as the home [3]. Recent research within the HCI community has utilized cultural probes in order to study life-logging practices [15], or to investigate emotions experienced by families living geographically distant [10].

Why Probing? Probing studies and their results are often criticized due to the nature of the probing results. Probes give insights into users in a very subjective way, and their interpretation is of a very subjective nature too. Bill Gaver (as cited in [4]), however, states that the uncertainty of probes is an asset. Probing additionally enables researchers and designers to understand everyday life situations and allows them to get a sense of context and how participants live in it [11]. Another advantage of applying cultural probing in family cars is the absence of a researcher. Common car usage is likely to be influenced by the presence of a researcher, as the car is a very private setting with a limited amount of space. Another strength of the probing approach is the possibility to gather information over a longer period of time, leading to insights different from those of short-term qualitative methods like interviews. Probing as a method enables the researcher to gain insights into those complex and emotionally laden processes that have evolved within the rear seat area and in the interaction between front and back row. This gives probing an advantage compared to other in-situ methods like contextual inquiry or ethnographic methods. When working towards design innovations for an interaction space, probing has the potential to deliver insights more focused on design than other context-driven methods like the contextual inquiry. Apart from that, probing can support collaborative aspects and
trigger participants to conduct probing activities through playful elements. The playful aspects of probing especially have led to useful insights when conducting probing activities together with children in private contexts [3]. Within the car context, quantitative measures have dominated studies conducted in the recent past; cultural probes have so far not been applied. When approaching an interaction space like the rear seat, it is vital to understand the context and the experiences that evolve within the space. Due to their situational nature, these experiences are ideally captured in temporal and physical proximity. Probing as a method gives designers a tool to capture those experiences, reflect on them and use them as a source of inspiration for future interaction designs. Unlike in participatory design, probes are not artifacts that directly lead to a new design solution. The user does not become a designer and does not make design decisions that directly affect the interactive system. In our effort to design future in-car interfaces for that space, probing enables us to understand it generally before further detailed research questions can be developed addressing requirements for certain interactive systems. We argue that the use of design methodology like probes is well suited to addressing our research goals. Due to space limitations an extended discussion of the usefulness of probing as a method cannot be conducted in this paper but is available in the related literature (see for example [4], [7], [9]).
2 Study Setup
Since children are a main target group for the usage of the rear seat area, we invited 20 families (including single parents) for the study, explicitly asking for families whose kids mainly sit in the rear seat. While most probing studies aim at being completed by an individual user, we understand the car interior as a shared space; thus we applied a collaborative probing approach in which the family conducted the probing activities together. At least one member of each family was invited to an initial one-hour workshop where the method and the goal of the study were presented. Additionally, the workshop provided an opportunity to discuss wishes regarding the rear seat area. During the workshop, the probing package was handed over and explained. The participants kept the package for about four weeks, conducting the activities as described in the next section. Every week they received a reminder text message; after four weeks they sent the package back. After returning the packages, the participants were invited again to discuss the probes and the probing approach itself in a final workshop. Since the car is a safety-critical context, all participants were explicitly advised to follow traffic regulations and not to conduct any probing activities if this would reduce traffic safety. No probing activity required the driver to get actively involved while driving. During the initial workshop, we highlighted the creative nature of cultural probing in order to avoid pressure to perform (e.g. "take the materials as inspiration, not as a checklist"). At six initial workshops, probing packages were handed out to the 20 participating families. Sixteen packages were returned after more than the four weeks; one family returned the package earlier. Three families either delivered no probes or returned the probing package too late to be invited to a final workshop. The 17 families that were included in the interpretation had in total 28
children: 9 families with one child, 8 with 2 children, 1 with 3 and 1 family with 4 children. The mean age of the children was 6.7 years (SD: 4.32; n = 28); the youngest child was 1, the oldest 16 years old.

2.1 Probing Package
Cultural probing includes the use of a probing package which is intended to inspire the participants and give them tools to conduct the probes [7]. The probing package proposed in this work was developed to fit the automotive context. Activities that take more time and are distracting were designed to be conducted during breaks, for example at the spot where the car is parked. Probes that were actually taken in the car or within the context of a journey were quick to take and offered a scalable effort that allowed the participants to decide themselves how many resources they would like to invest. The probing package included a roadbook, utilizing the metaphor of the diagrammatic book used in rally sports. The roadbook was a 21 x 14.8 cm (DIN A5) booklet containing a double page for each probing activity (see details below). It additionally contained free pages for painting and other unstructured input and a page with the contact data of the study organizers. Apart from the roadbook, the probing package included materials needed to conduct the probing activities: pens, markers, glue, sticky notes, a disposable camera, postcards, stickers and a large sheet of paper for a collage (see figure 1).
Fig. 1. Probing Package
All the items that were supposed to be taken into the car fit into a large cup like those used in drive-in restaurants, so that participants could easily store it in the car's cup holder. All other materials were stored in a poster carrier.

2.2 Probing Activities
The selection of probing tasks was developed by a group of researchers with experience in the car domain and with probing. They discussed and gathered the most relevant aspects of the rear seat. In a second step, the researchers looked
through the related literature and chose probing tasks that had been described as working well and that had an appropriately low workload for families conducting probing activities in the car. The variety of different probing activities was expected to be beneficial for the study goals - even if one probing approach was not used, the others could deliver valuable information. Each probing activity was linked to a different aspect of the rear seat and was therefore intended to deliver information on how HCI can address this aspect in future interaction designs. In order to motivate participants to take probes, each probing activity referred to a different situation, indicated by the headlines in this section.

When the car is parked
When using the car, different events and properties shape the perception of the rear seat area. An incident, for example when kids hurt their hands because of automatic windows, will influence the parents' perception of this technology and potentially lead to a design intervention by the parents. It is useful to identify those modifications since they might solve problems or improve the rear seat, which makes them a valuable source of information on end user needs and wishes. In order to express these thoughts and stories, participants were equipped with a disposable camera and were explicitly encouraged also to use their own photo and video devices (e.g. a smartphone) if they felt more comfortable doing so. We additionally gave the participants colored stickers that were supposed to be stuck next to the interesting item before taking the picture, similar to a commenting tool. Some were pre-printed (super, bad, practical, impractical), some had space for comments to be written on.

When we are at home
A main goal of the probing study was to identify aspects of personalization in the rear seat area. Due to its properties, the back row of the car only allows a limited amount of personalization (e.g. electricity, limited storage...). We argue that, in everyday life, participants use spaces that allow a higher degree of modification and personalization than the car. A highly personalized space is, for example, the home. It is thus hypothesized that an insight into technologies that improve other spaces will allow the deduction of wishes for the rear seat area. Thus participants were asked which technology from their everyday life (e.g. the household) they would like to include in the rear seat area of their car. The task was to take this technology, integrate it in the rear seat as well as possible and document this with the single-use camera. For technology that was too heavy, too big or mounted permanently (e.g. a refrigerator), the participants were asked to simply take a picture of it in its usual position.

Before we leave
In preparation for longer journeys, certain decisions have to be made and planning has to be conducted, especially when travelling with children. Within the probing approach, the participants were given the opportunity to create a "perfect" rear seat for the trip. A 42 x 59.4 cm (DIN A2) sheet of card stock was included in the probing package for that purpose. Participants were asked to create a
collage showing the ideal rear seat area using magazines and other materials they had at home. This was done because personal magazines were expected to better reflect the participating household.

When we are on the road
Of special interest for researching the rear seat area were events that occurred during a trip or while being on the road. Therefore it was highly important to capture those events and the experiences made in the situation. For that purpose the probing package included self-made postcards with pre-printed questions and space for creative input. The address field was already filled out and the postage prepaid. Postcards were used because they have a decent size and grammage to write on in a car. Furthermore, postcards have the right connotation when it comes to sharing a short message from a trip. Each probing package contained three types of postcards, two copies of each. The cards included questions for the destination on the back; on the front side each card had a different topic encouraging participants to describe experiences or sketch them ("What do you like to do on the rear seat? What did you dislike on the rear seat today? What did you enjoy the most on the rear seat today?").

When we get bored
Boredom is one of the major problems when children are present in the rear seat area. The goal of this probe was to identify solutions for fighting boredom that are already used by the participants. This approach is based on the assumption that parents have high-quality strategies for fighting boredom, which are under constant evaluation and iteration in real-life conditions. In the context of this study it was therefore necessary to capture those approaches in order to identify implications that support the design of technology that fights boredom. For that purpose, a double page in the roadbook was dedicated to capturing ideas regarding boredom and its avoidance, asking participants what they do when sitting in the car gets boring.

When something exciting happened
Probes can suffer from the downside that the appropriate probing material to record a certain situation is not available or has been left somewhere. Utilizing tools that are already possessed and carried by the participants can reduce this disadvantage. Within the study participants were given a phone number. They were asked to call that number whenever something interesting or exciting happened in the car and to leave a message on the answering machine. This approach allowed participants to save a thought or an experience and share it with the researchers, since mobile phones are available most of the time. Additionally, the participants were also encouraged to send text or multimedia messages.
3 Results
Theoretical contributions on challenges of utilizing cultural probes within the design process have been presented by [9], among others. It has been pointed out that cultural probes are not analytical tools and their fragmentary nature requires particular
attention. Addressing this issue, we followed an inductive and evolving process of data analysis, analogous to [12]. Taking into account the analysis of cultural probes in HCI presented by [4], we tried to include both the hermeneutic nature of the original probes and the data collection approach of more recent studies. All probes were collected and digitized, text passages on the collages were transcribed, and pictures and drawings were described. The following data analysis included individual and collective sessions of analyzing the returned probes by five researchers with different backgrounds (communication science, computer science, interaction design). One researcher was responsible for collecting the probes and for digitizing them. In a second step, researchers with different backgrounds analyzed the probes, annotating each probe with the topic it belonged to and the findings on this topic. All these comments were collected, grouped into the dimensions reported in this paper, and interpreted. We additionally organized two half-day workshops with the participating researchers where the probes were discussed. In those workshops we focused on deducing design implications from the probes and on discussing ambivalent results. Each probe was analyzed regarding its explicit content and its implicit content (e.g. what was sketched, and what motivated the sketch, respectively). This phase took into account other probes of the particular family and attributes of this family (e.g. type of car, age of kids). The probes were tagged with the results of this analysis, which were again collaboratively discussed to aggregate different perspectives. Based on those tags, a categorization structuring the results was made in order to identify areas of interest.

3.1 Findings
The following sections present a selection of insights gathered from the returned probes. Due to the high diversity of results and space limitations, we are not able to include all findings but intend to give an overview of the nature of the results.

Aspects of diversion
Activities conducted in the rear seat area played a central role in the results of the study. The main goal of those activities found in the probes was to entertain the children. This was often done to fight boredom, which would otherwise cause trouble like kids fighting or complaining. Findings on boredom were frequent, since the probing material explicitly addressed boredom with the motivation that it would be one of the main application areas of future rear seat technology. In order to fight boredom, participants used entertainment technology that they brought from home. Portable single-user devices such as CD/DVD players and Nintendo Game Boys were mentioned. TVs were popular within the probes addressing wishes for future technology in the rear seat. The effect on other passengers when using technology in the car was always present in the probes. Devices like Game Boys and CD players were aimed at one user at a time and therefore did not interfere with the interests of other persons in the car. To sum up, music and video in general were reported to be a powerful tool against boredom, either through the car's HiFi system or through other devices such as musical instruments.
Having music devices that cannot be used without others listening (because they do not support headphones, for example) can cause annoyance for parents and be a reason for a fight between the children on the rear seat. One child stated that it is sometimes annoying for her mum when she listens to audio books and songs through her mother's CD player in the car. Conflicts in that regard also existed due to differences in the children's ages. Audio books and songs played through the car's HiFi system were annoying for kids of other ages because the content was not tailored for them. Popular games played in the rear seat area involved counting a certain type of car (e.g. a cab) or cars with different characteristics (e.g. color, type of license plate). Items in the landscape were also used and included in games (e.g. "I spy with my little eye"). While those games would only work in a car, participants also played games that function outside of the car, like thinking of a person and guessing who it is. Telling jokes was also mentioned as a popular amusement, as was telling stories.

Educational motivation
Another identified type of activity was learning-related: some parents used the available time for training or educating their kids. One activity aiming at this was quizzes; other kids did small math problems to be entertained but also to be kept mentally active. Others listened to CDs with some kind of educational content through their personal player or through the car's HiFi system. Books were also important to take along, for entertainment but also for learning and studying purposes.

Togetherness
Activities for the whole family were often triggered by the parents. Some probes expressed that the time spent together in the car was not seen as "lost" but as family time where activities could be conducted together, creating a co-experience [1]: "We want to talk to the kids and sing with them. We don't want them to watch TV alone or listen to music with a headphone." In the car the whole family had the opportunity to talk, play and do things that may not be possible at home due to the busy schedule of each family member. For that purpose one family suggested a rotatable front seat, making it possible to play games together, or simply look into each other's faces while laughing and chatting. Social and family life also influenced the choice of entertainment technology used in the rear seat. One family stated that they bought a portable DVD player for longer journeys because they did not want a built-in device. That was for two reasons: the parents wanted to use the device for themselves outside of the car, and they did not want the kids to use the player when all three were sitting on the rear seat. Having a portable device therefore allowed the parents to use it more flexibly and to better control its usage without having to argue with their kids about using the device. They just left it at home. Rules and activities in the car also depended on the family member who was driving and who owned the car: "Sometimes we go with grandma and grandpa, but then we cannot take anything with us and not make anything dirty, because grandma will get angry with us."
Food as activity
A topic that occurred in the probes as a popular activity in the rear seat was eating and drinking. Of the 14 collages that were returned, food or drinks were mentioned 11 times. Food items were used to satisfy hunger (e.g., bread, sausages, fruits) or just to snack (e.g., popcorn, ice cream, chocolate, chewing gum). Drink items included both cold (e.g., iced tea, water, juices) and hot drinks (e.g. coffee, to help fight sleepiness of the driver). Probes also reported on the mess produced by food in the rear seat. Not only were food and drinks themselves mentioned, but probes also included appliances to cool (e.g., refrigerator), warm (e.g., microwave oven), make (e.g., coffee machine, mixer), or store food (e.g., functional cup holder). Food in the car also had the connotation of being entertaining, often mentioned in combination with other activities against boredom.

Physical space
Storage is a central topic for adults travelling with kids in the rear seat of their car. Although cars usually offer a high number of compartments, the existing solutions do not appear to be suitable for families with young children. Cleanliness in the rear seat was important for parents; aesthetic aspects were only mentioned by one family. For the participants, functionality ruled once kids inhabited the rear seat space. The main downside of current rear seats is missing accessibility. Fortunately, all participants used child seats, but apart from being beneficial for safety purposes, child seats also severely limit the movement of the children. For this reason, most compartments and other storage locations in the rear seat area are not accessible to children. Most study participants solved that problem by installing seat organizers that hang down from the back of both front row seats. Their pockets are high enough and provide plenty of space for toys, books, pencils and something to drink. Accessibility is a major concern for adults, because if kids cannot reach the things they want, nobody is available to hand them over. Space in general is very limited in the rear seat. Families with only one kid used the other side of the seat for storage. If two child seats were necessary, only the narrow space between the two seats was available. Additionally, the floor is a useless space while driving. Children in their seats were neither able to put their feet on the floor nor could they pick up items which were accidentally dropped. One parent stated: "If something falls on the floor, the children will scream until it is picked up again." Parents disliked the car floors because they were hard to clean, especially from liquid substances like ice cream, and kids complained that they were not allowed to have ice cream in the car. The collected collages showed the importance of comfort for participants. All returned collages included items to increase comfort. Among others, items like blankets, mobile and integrated neck supports, and massage chairs were mentioned.

Perception of Safety
One major issue for the rear seat area was the safety of children. In the returned pictures safety belts were often depicted. Child seats are a standard installation in the cars of the study participants and the most common adaptation of the cars. Another object used for that purpose was the baby monitor: one family stated that they wanted
the baby monitor because their kids always fall asleep shortly before reaching the destination. Obviously they wanted to let their children sleep in the car but still be aware of what was going on in the car. On one collage the baby monitor was enhanced by a connection to a mobile phone.

Mobile Computing
Laptops with internet access were a stated wish for technology to be available in the rear seat, as were game consoles and television sets. Computers integrated in the car were seen as desirable (no USB dongle required to surf the web) but not affordable. Mobile phones were standard equipment in the rear seat, mainly for older kids. The internet was requested for sending emails and getting information about the route, mainly to find hotels or events going on in the surroundings. For all electronic devices participants requested power outlets within reach of the rear seat area; the back of the front row seats was a popular spot to install them.

3.2 Summary
The presented results have shown the importance of six areas derived from the probing study. (1) Fighting boredom through entertainment is one of the most important issues for both parents and children. Strategies used range from modern technology (e.g. portable video, audio and gaming devices), to traditional toys, to social games without any technology usage. (2) Traveling in the car can be considered a family activity that is often used consciously for various family-related activities (e.g. discussions) and learning. (3) Especially on long trips, comfort and wellbeing (e.g. eating, drinking, sleeping) are of high importance when driving with kids. Although this seems obvious, it has to be considered when introducing technology to the car. (4) The desire for space, storage and cleanliness must not be underestimated. (5) Regarding safety and control, a balance between comfort and protection is to be found. Adults want to be aware of what is going on in the rear seat. (6) Computers and mobile phones are mainly used for communication and to stay connected to peers. Therefore, power supplies and a persistent radio connection are required. Some of our results might not seem relevant to HCI at first. Nevertheless, we believe that in a space like the rear seat area, many findings will not show their value until they are seen through the lens of a specific design problem. Food and drinks, for example, call for technology that withstands spills and stains. Seen from a wider perspective, all available or unavailable artifacts have implications for the primary driving task and influence safety (for example, distraction through noise). Identifying those implications was a major goal of our study. Overall, the probing material inspired participants to creatively approach the rear seat areas in their cars. Feedback given by the participants showed that the probing study changed the way they saw their rear seat area. One participant left a handwritten note for the researchers saying: "(...) it was a lot of fun. Additionally, my husband finally put something for the kids into the car (e.g. reading light, drawer...)" The richness of the received probing data made it very valuable for this study. The method was explicitly carried out in situ: all the material was produced within the participants' natural surroundings, in the car and during trips (like the postcards), in close temporal connection with an ongoing journey.
4 Lessons Learned for the Design of Interactive Technology
From the gained understanding of the rear seat area, the following implications for the design of rear seat technology were collected. This collection took place after the accumulation of the results, again with all researchers participating in the process. In order to link the findings to the results, the relevant result sections are mentioned in (emphasized) font.

Gaming
Playing games is already popular in the rear seat (aspects of diversion), and future interactive technology will bring games into rear seat devices. The games currently played in the rear seat area give highly relevant input for the design of future rear seat games. All of them were multiplayer games, meaning that multiple kids on the rear seat played the game or that additionally one or both adults in the front row would join the game. Apart from a card game, none of the games mentioned in the probes involved any material; they required only imagination. All games were significantly influenced by the context they were played in. The games did not require physical movement apart from turning one's head and upper body (physical space). Since the infant carriers in use restrict movement, the games relied on fantasy, observation and memory as their main tools. Most games were open-ended, with the possibility to adapt the length of the gameplay according to the available time. The games that were played either had short round-by-round interactions between the players or could be interrupted without ruining the gameplay. Most games also used the fact that the players were moving and that the physical context was dynamic (aspects of diversion). Those games would not be enjoyable at home in a room where, for example, the number of items with a single color is stable. Integrating aspects of education into the games is a promising approach, especially when it is aimed at topics related to the current trip (educational motivation). We strongly argue for including those aspects in future rear seat gaming technology, since the context of play differs substantially from other contexts like the home.

Multiple users vs. single use
Technology in the rear seat has to take both single and multiple user situations into account (togetherness). Therefore, several aspects of interactive systems have to be considered. Sound has to be used consciously in cars, bearing the different interests and tastes of passengers in mind. Interestingly, some participants saw it as an annoyance that they had to listen to others' music, whilst some parents enjoyed listening to music together, giving them the possibility to sing the songs as a family: "We dock our MP3 player so that mum and dad can also listen to our music - Michael Jackson is awesome!" Technology therefore has to be flexible when being used as either single or multiple user technology. Additionally, all kids need to have the same access to the interactive technology to avoid fights, as depicted in the painting presented in figure 2, which includes three of each item: iPods, headphones, reading lights, fans, portable gaming devices. As one parent stated after the kids got into a fight over something in the rear seat: "I think that technology in the rear seat area is only useful when all kids have the same items."
Fig. 2. Proposed rear seat layout with three identical seats
Functionality
When referring to devices they already had in the car (e.g. the navigation system), participants expressed the wish to include child-relevant functionality (aspects of diversion, educational motivation). Some children on the rear seat enjoyed playing with the portable navigation system when the driver did not need it, and would have enjoyed functionality tailored to them (e.g. a kids' map, more pictures). Another function kids wanted was “a computer, which always tells us where we are, so we can mark where we have been and how we liked it there.” One major improvement area identified for in-car technology in general was that kids want to participate in using the technology used by the adults. Integrating functionality for children will give them the possibility to participate. Children also showed that they would like to take part in controlling the car. One kid even requested “(...) a horn and a speedometer, because sometimes dad goes too fast, I think, and then I can let him know. And I often see much better, if another car is doing something wrong - then I can honk.” Rear seat technology should not only aim at entertaining children independently from the surrounding car and context of use. As kids see themselves as part of a joint travel activity, they want to contribute to and support their parents' tasks (togetherness). Therefore, future interactive technology should allow the kids to take part in their parents' activity (e.g. driving, navigating) in the fashion of a co-driver.

Interaction spaces
Following the analysis of the probes, the rear seat can be divided into two spaces (physical space). The larger space is the dead space (1), which is out of reach and sight of the children. Most of the floor and the lower part of the doors are dead space and so far only used for storage. Technology can be integrated in the dead space, but no physical interaction can take place there. The active space (2) is the area on the backside of the front-row seats, between the child seats, the seat itself and below the window.
This area is currently used to store items for the children, giving them the possibility to interact with them. Integrating technology into this space may interfere with its current usage but allows all kinds of interaction (mobile computing). The rear seat area also has a strong connotation as personal space: “The best thing about the seat area is that mum and dad are not there. (...) Therefore it is like a children's room, which is never entered by the adults.” Most kids have their assigned place, where their child seat is mounted and where other personal items are arranged according to their wishes and needs. This bond to a particular place needs to be taken into consideration when introducing technology into the seat area.

Comfort and durability
The high importance of food and drink in the rear seat, in combination with a moving environment, requires rear seat technology to withstand the dirt created by food (food as activity). As comfort was considered highly important, integrated interactive systems must not lower the comfort of riding in the car.

Parental supervision
Parents expressed the strong wish to be aware of what is going on in the rear seat while driving (e.g. with an additional mirror) but also while the car is parked (e.g. using a baby monitor). During driving, interactive technology has to keep parents in control of when and how the technology is used (perception of safety). Adults have to be able to limit the usage of the devices or set them to certain modes, such as switching from single-user mode to an application the whole family can enjoy. The portability of certain parts of the technology can additionally support the parents in restricting the usage of a device: such a part can be left at home, and children should understand that the device cannot be used when the part is not available. Additionally, technology should support the parents in observing the car while it is parked. Although it can be a safety threat, leaving the child in the car can also be beneficial as long as the parent knows what is going on, preferably through audio and video. Usage scenarios mentioned by participants were, for example, when the car is parked at home and the kids are sleeping, or when children want to finish a book chapter and come into the house later.

4.1 Summary
The findings give insights into which aspects of the rear seat have to be taken into account when designing technology for that space. Although not all returned probes led to novel insights, the richness of the probing data has the potential to inspire designers and experts with different backgrounds (e.g. psychologists). Even if not all findings can be considered innovative, our work contributes to the design of rear seat technology by summing up and structuring the aspects relevant for interaction in that space. To make the impact of the findings clearer, the following section presents an example design idea derived from the results.
5 Design Concept: Riddle Ride
In order to illustrate the usefulness of the probing approach and its results, we developed the concept for a novel rear seat entertainment system called RiddleRide (RR). The design decisions in the proposed system are related to design inspirations (which are emphasized) from the probing study. The focus of the design concept is not on technical feasibility, but on making a first step in an innovation-centered design process for a new interactive rear seat system.

RiddleRide is a multiplayer quiz gaming application. Its main purpose is to entertain and educate passengers (education), both children and adults, during a ride. RR is round-based, automatically detecting the number of players and posing questions via speech through the built-in sound system. Each question is read aloud, letting all passengers participate and allowing kids who cannot read yet to join the game (family activity). No additional visual displays need to be installed in spots that are already taken by other installations such as seat hangers (interaction spaces). Since RR is round-based, it gives all passengers time to answer their questions and makes it easy to stop the game at any time and to start it again (interruption of games). After each player gives an answer, RR announces the correct response and the current scores. Since all players get the same question, RR aims at creating a co-experience [1] and makes playing the game a family activity. Answers are displayed on mobile devices that are already in the family's possession, such as a Nintendo DS, iPhone, PlayStation Portable, Android phone, tablet or any other device that supports a wireless connection. This reduces investment costs, allows the children to use interaction devices and modalities they are already familiar with, and reduces potential fights over equipment, since everyone can use their own device (multi vs. single user). Additionally, these mobile devices are within reach, easy to store away (limited space), and withstand dirt and liquid to the extent necessary for children (durability). The mobile devices can also be used with headphones, reducing the distraction for the other passengers when playing RR as a single player.

RiddleRide is intended to entertain both children and grown-ups by adapting the answers given to the questions. To make the game interesting and challenging for different age and skill groups, RR automatically changes the level of difficulty for each player based on their performance, keeping the tasks within the “zone of proximal development” [18]. While quiz games usually control difficulty through the question, RR uses the answers for that purpose, since the question is the same for all players (family activity). Tools to increase or decrease the difficulty range from the number of answer options, to the number of potentially correct answers, to the graphic representation of the answers (see Figure 3). An important aspect is the educational value of the game; therefore, all quiz questions have an educational purpose. RR is context aware, meaning that the questions asked are related to the surroundings in which the vehicle is currently moving and to the destination of the trip (context-sensitive gaming). RR is compatible with all kinds of trip durations, since it can be stopped at any time, delivering a score after each question. Because RR uses audio output for the questions, it reduces the influence of context factors such as bright sunlight.
The volume of RR adapts automatically to the ambient noise in the car cabin (context sensitivity). Through the connection to car telematics systems, RiddleRide will be able to detect situations where distraction has to be minimized (e.g. dangerous road conditions). In those situations the game will either be disabled or only be playable with headphones.
Fig. 3. Question from RR, asking for the monastery that appears on the left side of the car on a hill. The three players get different answers presented, varying in number, visual style and similarity to adjust the difficulty.
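To make the answer-based difficulty adaptation more concrete, the following minimal Python sketch illustrates one way such a mechanism could work: a question shared by all players, a per-player difficulty level adjusted from recent performance, and answer sets whose size grows with difficulty. The class and function names, the history window and the thresholds are our own illustrative assumptions; the RiddleRide concept as described does not prescribe an implementation.

    # Minimal sketch of RiddleRide-style answer adaptation (illustrative assumptions,
    # not the authors' implementation). All players hear the same question; only the
    # answer set each player sees is varied.
    from dataclasses import dataclass, field
    import random

    @dataclass
    class PlayerState:
        name: str
        difficulty: int = 2                          # 1 = easiest ... 5 = hardest
        history: list = field(default_factory=list)  # True/False per answered question

        def record(self, correct: bool) -> None:
            """Adjust difficulty from the last few answers, keeping the player challenged but not lost."""
            self.history.append(correct)
            recent = self.history[-3:]
            if len(recent) == 3 and all(recent):
                self.difficulty = min(5, self.difficulty + 1)   # streak of hits: harder answer sets
            elif len(recent) >= 2 and not any(recent[-2:]):
                self.difficulty = max(1, self.difficulty - 1)   # two misses in a row: easier answer sets

    def answer_set(player: PlayerState, correct: str, distractors: list) -> list:
        """Build the per-player answer options: higher difficulty yields more options."""
        n_options = 2 + player.difficulty
        options = [correct] + distractors[: n_options - 1]
        random.shuffle(options)
        return options

    # Usage: the same question for everyone, different answer sets per player.
    child = PlayerState("Anna", difficulty=1)
    adult = PlayerState("Dad", difficulty=4)
    question = "Which monastery can you see on the hill to the left?"   # placeholder question
    distractors = ["Monastery B", "Monastery C", "Monastery D", "Monastery E", "Monastery F"]
    for player in (child, adult):
        print(player.name, answer_set(player, "Monastery A", distractors))

In this sketch only the number of options varies with difficulty; the concept additionally varies the number of potentially correct answers and the visual similarity of the options, which could be added along the same lines.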
When adults are present and not driving, they also get the possibility to administer the game. They can, among other things, choose topics, change game settings or simply disable the game when the children are not allowed to play (parental supervision). The driver is also included in RR. Since distraction from the driving task has to be reduced to a minimum, the driver acts as a joker who can support the other passengers when they need help. The joker role allows the driver to join the game, since it is neither stressful nor time critical. Additionally, the driver can control when to answer, and can decline to answer if the driving workload is too high for additional tasks. Including the driver makes RiddleRide a real family activity. This design example shows how the results of the probing study are reflected in concrete design decisions. As a next step we will create prototypes of RiddleRide to study the game in the real context of use. We are aware of potential safety threats caused by technology in the vehicle, which is why RiddleRide will undergo a rigorous evaluation process during its development.
6 Conclusions
As its main contribution, this work informs the user experience centered development of future interactive technology in the rear seat. Even if not all findings can be considered innovative, our work contributes to the design of rear seat technology by summing up and structuring the aspects that are relevant for interaction in that space.
We were able to gather information about the constitution of user experience in the rear seat. With this information, designers in the automotive context can make conscious design decisions, aware of the effect their decisions will have on the UX in the car. The conducted study provides information on functionality, installation and interaction modalities of future interactive rear seat systems. Results from the probes show the importance of focusing on experiences rather than on what is technically possible in the rear seat. Being unaware of these experience-laden processes when designing interactive systems will lead to negative experiences, rejection of the system and, in the worst case, to severe safety threats for all passengers.

Acknowledgments. The financial support by the Federal Ministry of Economy, Family and Youth, the National Foundation for Research, Technology and Development and AUDIO MOBIL Elektronik GmbH is gratefully acknowledged (Christian Doppler Laboratory for "Contextual Interfaces").
References
1. Battarbee, K.: Co-Experience. Ph.D. thesis, University of Art and Design Helsinki (2004)
2. Berkovich, M.: Perspective probe: many parts add up to a whole perspective. In: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2009, pp. 2945–2954. ACM, New York (2009)
3. Bernhaupt, R., Wilfinger, D., Weiss, A., Tscheligi, M.: An ethnographic study on recommendations in the living room: Implications for the design of iTV recommender systems. In: Tscheligi, M., Obrist, M., Lugmayr, A. (eds.) EuroITV 2008. LNCS, vol. 5066, pp. 92–101. Springer, Heidelberg (2008)
4. Boehner, K., Vertesi, J., Sengers, P., Dourish, P.: How HCI interprets the probes. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2007, pp. 1077–1086. ACM, New York (2007)
5. Buechley, L., Rosner, D.K., Paulos, E., Williams, A.: DIY for CHI: methods, communities, and values of reuse and customization. In: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2009, pp. 4823–4826. ACM, New York (2009)
6. Forlizzi, J., Barley, W.C., Seder, T.: Where should I turn: moving from individual to collaborative navigation strategies to inform the interaction design of future navigation systems. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI 2010, pp. 1261–1270. ACM, New York (2010)
7. Gaver, B., Dunne, T., Pacenti, E.: Design: Cultural probes. Interactions 6(1), 21–29 (1999)
8. Geiser, G.: Man Machine Interaction in Vehicles. ATZ 87, 74–77 (1985)
9. Graham, C., Rouncefield, M., Gibbs, M., Vetere, F., Cheverst, K.: How probes work. In: Proceedings of the 19th Australasian Conference on Computer-Human Interaction: Entertaining User Interfaces, OZCHI 2007, pp. 29–37. ACM, New York (2007)
10. Kim, H., Monk, A.: Emotions experienced by families living at a distance. In: Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2010, pp. 2923–2926. ACM, New York (2010)
11. Koskinen, I.: Hacking a car: Re-embodying the design classroom. In: Nordic Design Research Conference, Nordes 2009 (2009), http://nordes.org
12. Leonardi, C., Mennecozzi, C., Not, E., Pianesi, F., Zancanaro, M., Gennai, F., Cristoforetti, A.: Knocking on elders’ door: investigating the functional and emotional geography of their domestic space. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009, pp. 1703–1712. ACM, New York (2009)
13. Mahr, A., Pentcheva, M., Müller, C.: Towards system-mediated car passenger communication. In: Proceedings of the 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 79–80. ACM, New York (2009)
14. NHTS: National Household Travel Survey, http://nhts.ornl.gov/
15. Petrelli, D., van den Hoven, E., Whittaker, S.: Making history: intentional capture of future memories. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009. ACM, New York (2009)
16. Schmidt, A., Dey, A.K., Kun, A.L., Spiessl, W.: Automotive user interfaces: human computer interaction in the car. In: Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2010, pp. 3177–3180. ACM, New York (2010)
17. Vyas, D., Eliens, A., van de Watering, M.R., van der Veer, G.C.: Organizational probes: exploring playful interactions in work environment. In: Proceedings of the 15th European Conference on Cognitive Ergonomics: The Ergonomics of Cool Interaction, ECCE 2008, pp. 35:1–35:4. ACM, New York (2008)
18. Vygotsky, L.: Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge (1978)
Don’t Look at Me, I’m Talking to You: Investigating Input and Output Modalities for In-Vehicle Systems
Lars Holm Christiansen, Nikolaj Yde Frederiksen, Brit Susan Jensen, Alex Ranch, Mikael B. Skov, and Nissanthen Thiruravichandran
HCI Lab, Department of Computer Science, Aalborg University
Selma Lagerlöfs Vej 300, 9220 Aalborg East, Denmark
{lhc_dk,alex_ranch}@hotmail.com, [email protected], {britjensen,nissanthen}@gmail.com, [email protected]
Abstract. With a growing number of in-vehicle systems integrated in contemporary cars, the risk of driver distraction and lack of attention on the primary task of driving is increasing. One major research area concerns eyes-off-the-road and mind-off-the-road, which manifest themselves in different ways for input and output techniques. In this paper, we investigate in-vehicle system input and output techniques to compare their effects on driving behavior and attention. We compare four techniques – touch and gesture (input) and visual and audio (output) – in a driving simulator. Our results showed that the separation of input and output is non-trivial. Gesture input resulted in significantly fewer eye glances compared to touch input, but also resulted in poorer primary driving task performance. Further, using audio as output resulted in significantly fewer eye glances, but on the other hand also in longer task completion times and inferior primary driving task performance compared to visual output.
Keywords: In-vehicle systems, touch interaction, gesture interaction, eye glances, driving performance.
1 Introduction
Technological progress and price reductions have pushed a growing use of touch-based interfaces to control various kinds of in-vehicle systems. The flexibility in its application capabilities, low price, and utilization of a more natural way of interaction make the touch screen an obvious choice for in-vehicle systems, with its increasing presence in new cars and aftermarket GPS units. But the inherent characteristics of touch screens imply high requirements for the visual attention of the driver due to the lack of immediate tactile feedback and the dynamics of the screen layout [3]. This leads to withdrawal of attention from the driver [25]. Brown [5] classifies two types of withdrawal of attention, namely general and selective. General withdrawal of attention refers to insufficient visual perception during the driving situation. This is also known as eyes-off-the-road [11]. Interacting with in-vehicle systems often requires selective withdrawal of attention as drivers read displays or push buttons. Selective withdrawal of attention is a more subtle distraction type as it deals with mental processing, e.g. memory processes or decision selection. It is also known as mind-off-the-road [12] and takes place, e.g., while talking to other passengers or while talking on the phone. Interacting with in-vehicle systems can lead to mind-off-the-road while interacting with, e.g., a speech system or listening to audio instructions [2]. Thus, we can distinguish between input and output techniques that require high or low visual attention and may lead to eyes-off-the-road or mind-off-the-road. Inspired by previous research on touch-screen interaction technologies [3, 20], we compare different input and output techniques for in-vehicle systems to investigate and measure their effects on the driving activity and driver attention. The paper is structured as follows: first we present previous research on in-vehicle systems, then we introduce the interaction techniques. We then describe the experiment and present the results; finally, we discuss and conclude.
2 Related Work Driver attention and distraction are fundamental concepts when doing research or development within vehicle safety or in-vehicle systems design [2]. Attention can be defined as the ability to concentrate and selectively focus or shift focus between selected stimuli [8, 18]. Within cars and other vehicles, driver attention is primarily focused on monitoring the environment and executing maneuvers also called the primary driving task [2, 6, 13, 16]. Disruption of attention is defined as distraction and Green describes distraction as anything that grabs and retains the attention of the driver, shifting focus away from the primary driving task [2, 4, 13]. Within in-vehicle attention and distraction research, we observe a significant focus on the dynamics between the primary driving task and secondary driving tasks, e.g. operating various in-vehicle systems. This is significant since research identifies the use of in-vehicle systems as a cause of traffic accidents [4, 13]. Green [13] stresses that most drivers will go to great lengths to complete a given secondary task and rarely abandon a task upon initiation. With a critical primary task, this seemingly irrational behavior and distribution of attention between the primary and secondary task can endanger the safety of the driver and the surroundings. Lansdown et al. [16] acknowledge this unsettling tendency concerning in-vehicle systems in a study focusing on driver distraction imposed by in-vehicle secondary systems.
A tendency within in-vehicle interaction research involves attempts to identify an interaction technique that surpasses the capabilities of the traditional tactile interface. In a comparative study, Geiger et al. [9] set out to evaluate the use of dynamic hand movements (gestures) to operate a secondary in-car system and to compare it to a traditional haptic (tactile) interface. The parameters used for comparison were errors related to driving performance, tactile/gesture recognition performance, and the amount of time drivers did not have their hands on the steering wheel. The experiment showed that use of the tactile interface resulted in high task completion times and that the system lacked recognition performance when compared to the gesture interface. The gesture interface allowed users to perform the primary task appropriately, while the users also found the gesture interface more pleasant and less distracting. Alpern & Minardo [1] support these findings in a study where they evaluated gestures through an iterative development of an interface for performing secondary tasks. In the final iteration of their experiment, they noted that users made fewer errors compared to a traditional tactile radio interface. Findings from both studies indicate that gestures could be a viable alternative for secondary in-car systems. Bach et al. [3] investigated how perceptual and task-specific resources are allocated while operating audio systems in a vehicle. Three system configurations – a conventional tactile car stereo, a touch interface, and an interface that recognizes gestures as input – were evaluated in two complementary experiments. Bach et al. identified an overall preference for the gesture-based configuration, as it enabled the drivers to reserve their visual attention for controlling the vehicle. The conventional car stereo, on the other hand, lacked an intuitive interface; consequently the system required high perceptual and task-specific resources to be operated, affecting the primary task performance of the subjects. The touch interface introduced a reduction in overall task completion time and interaction errors when compared to both the conventional tactile and gesture interfaces. While the potential of gestures as an input method for in-vehicle systems seems promising, little attention has been given to the possible influence of output methods. In order to address this, it is necessary to distinguish between input and output to clarify how combinations of different output and input methods might affect the interaction and primary task performance. The need to separate output from input in relation to in-vehicle systems is acknowledged by Bach et al. [3] as a limitation of their study, and the need for further research on this topic is recognized. Their initial research focus was on system input as opposed to output, which meant the output mechanisms differed for each of their configurations. The variation in output could have affected the findings – the results do not show which kind of output mechanism is suitable for in-vehicle systems. This suggests an additional study on output methods in order to investigate how they influence primary and secondary task performance in the vehicle domain. The aim of our study is to compare different configurations of in-vehicle systems with an equal emphasis on both input and output mechanisms.
We aim to confine system variables regarding input and intend to accomplish this through a study of visual and auditory output in combination with either touch or gesture input. The rationale behind this combination is the duality in the interaction possibilities of touch screens, which support both touch and gesture interaction and the polarity in the two different sensory channels of output.
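The combination described above – touch or gesture input crossed with visual or audio output – yields a 2x2 design space of four system configurations. The following small Python sketch simply makes that space explicit; the enum and variable names are our own shorthand, not terminology from the paper.

    # Hypothetical enumeration of the 2x2 study design: two input methods
    # crossed with two output methods give four in-vehicle configurations.
    from enum import Enum
    from itertools import product

    class InputMethod(Enum):
        TOUCH = "touch"
        GESTURE = "gesture"

    class OutputMethod(Enum):
        VISUAL = "visual"
        AUDIO = "audio"

    configurations = list(product(InputMethod, OutputMethod))
    for inp, out in configurations:
        print(f"{inp.value} input / {out.value} output")
    # touch/visual, touch/audio, gesture/visual, gesture/audio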
3 In-Vehicle System: Input and Output
We distinguish between input and output in this experiment while using a touch screen. We integrate two different kinds of input and two different kinds of output, enabling four different in-vehicle configurations: touch input with visual output, touch input with audio output, gesture input with visual output, and gesture input with audio output. These configurations will hereafter be referred to as /