Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5615
Constantine Stephanidis (Ed.)
Universal Access in Human-Computer Interaction Intelligent and Ubiquitous Interaction Environments 5th International Conference, UAHCI 2009 Held as Part of HCI International 2009 San Diego, CA, USA, July 19-24, 2009 Proceedings, Part II
Volume Editor Constantine Stephanidis Foundation for Research and Technology - Hellas Institute of Computer Science N. Plastira 100, Vassilika Vouton 70013, Heraklion, Crete, Greece and University of Crete Department of Computer Science Crete, Greece E-mail:
[email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): H.5, I.3, I.2.10, I.4, I.5
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-02709-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02709-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12704804 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design.
A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas.
This volume, edited by Constantine Stephanidis, contains papers in the thematic area of Universal Access in Human–Computer Interaction, addressing the following major topics:
• Universal Access in the Home Environment
• Ambient Intelligence and Ambient Assisted Living
• Mobile and Ubiquitous Interaction
• Alternative Interaction Techniques and Devices
• Intelligence, Adaptation and Personalization
The remaining volumes of the HCI International 2009 proceedings are:
• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers Program Chair: Ben-Tzion Karsh Arne Aarås, Norway Pascale Carayon, USA Barbara G.F. Cohen, USA Wolfgang Friesdorf, Germany John Gosbee, USA Martin Helander, Singapore Ed Israelski, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Kari Lindström, Finland
Holger Luczak, Germany Aura C. Matias, Philippines Kyung (Ken) Park, Korea Michelle M. Robertson, USA Michelle L. Rogers, USA Steven L. Sauter, USA Dominique L. Scapin, France Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK Teresa Zayas-Cabán, USA
Human Interface and the Management of Information Program Chair: Michael J. Smith Gunilla Bradley, Sweden Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Klaus-Peter Fähnrich, Germany Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA Youngho Rhee, Korea
Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P.R. China Bernhard Zimolong, Germany
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Ron Laughery, USA Wen-Chin Li, Taiwan James T. Luxhøj, USA
Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK
Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa
Matthew J.W. Thomas, Australia Mark Young, UK
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Noelle Carbonell, France Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Arthur I. Karshmer, USA Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA
Patrick M. Langdon, UK Seongil Lee, Korea Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Toshiki Yamaoka, Japan Panayiotis Zaphiris, UK
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA
Gordon M. Mair, UK Miguel A. Otaduy, Switzerland David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Dieter Schmalstieg, Austria Dylan Schmorrow, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: Nuray Aykin Michael L. Best, USA Ram Bishu, USA Alan Chan, Hong Kong Andy M. Dearden, UK
Susan M. Dray, USA Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA
Sung H. Han, Korea Veikko Ikonen, Finland Esin Kiris, USA Masaaki Kurosu, Japan Apala Lahiri Chavan, USA James R. Lewis, USA Ann Light, UK James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA
Elizabeth D. Mynatt, USA Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Eunice Ratna Sari, Indonesia Supriya Singh, Australia Christian Sturm, Spain Adi Tedjasaputra, Singapore Kentaro Toyama, India Alvin W. Yeo, Malaysia Chen Zhao, P.R. China Wei Zhou, P.R. China
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Amy Bruckman, USA Peter Day, UK Fiorella De Cindio, Italy Michael Gurstein, Canada Tom Horan, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Jonathan Lazar, USA Stefanie Lindstaedt, Austria
Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan Anthony F. Norcio, USA Jennifer Preece, USA Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Sergei Stafeev, Russia Charalambos Vrasidas, Cyprus Cheng-Yen Wang, Taiwan
Augmented Cognition Program Chair: Dylan D. Schmorrow Andy Bellenkes, USA Andrew Belyavin, UK Joseph Cohn, USA Martha E. Crosby, USA Tjerk de Greef, The Netherlands Blair Dickson, UK Traci Downs, USA Julie Drexler, USA Ivy Estabrooke, USA Cali Fidopiastis, USA Chris Forsythe, USA Wai Tat Fu, USA Henry Girolamo, USA
Marc Grootjen, The Netherlands Taro Kanno, Japan Wilhelm E. Kincses, Germany David Kobus, USA Santosh Mathan, USA Rob Matthews, Australia Dennis McBride, USA Robert McCann, USA Jeff Morrison, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Glenn Osga, USA
Dennis Proffitt, USA Leah Reeves, USA Mike Russo, USA Kay Stanney, USA Roy Stripling, USA Mike Swetnam, USA Rob Taylor, UK
Maria L.Thomas, USA Peter-Paul van Maanen, The Netherlands Karl van Orden, USA Roman Vilimek, Germany Glenn Wilson, USA Thorsten Zander, Germany
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Thomas J. Armstrong, USA Norm Badler, USA Kathryn Cormican, Ireland Afzal Godil, USA Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Sung H. Han, Korea Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Tianzi Jiang, P.R. China
Kang Li, USA Zhizhong Li, P.R. China Timo J. Määttä, Finland Woojin Park, USA Matthew Parkinson, USA Jim Potvin, Canada Rajesh Subramanian, USA Xuguang Wang, France John F. Wiechel, USA Jingzhou (James) Yang, USA Xiu-gan Yuan, P.R. China
Human Centered Design Program Chair: Masaaki Kurosu Gerhard Fischer, USA Tom Gross, Germany Naotake Hirasawa, Japan Yasuhiro Horibe, Japan Minna Isomursu, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan
Kun-Pyo Lee, Korea Loïc Martínez-Normand, Spain Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference could not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.
Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair Professor Constantine Stephanidis University of Crete and ICS-FORTH Heraklion, Crete, Greece Email:
[email protected]
Table of Contents
Part I: Universal Access in the Home Environment Key Properties in the Development of Smart Spaces . . . . . . . . . . . . . . . . . . Sergey Balandin and Heikki Waris
3
Design a Multi-Touch Table and Apply to Interior Furniture Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chien-Hsu Chen, Ken-Hao Nien, and Fong-Gong Wu
13
Implementation of a User Interface Model for Systems Control in Buildings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Szu-Cheng Chien and Ardeshir Mahdavi
20
A Web-Based 3D System for Home Design . . . 29
Anthony Chong, Ji-Hyun Lee, and Jieun Park
Attitudinal and Intentional Acceptance of Domestic Robots by Younger and Older Adults . . . 39
Neta Ezer, Arthur D. Fisk, and Wendy A. Rogers
Natural Language Interface for Smart Homes . . . 49
María Fernández, Juan Bautista Montalvá, Maria Fernanda Cabrera-Umpierrez, and María Teresa Arredondo
Development of Real-Time Face Detection Architecture for Household Robot Applications . . . 57
Dongil Han, Hyunjong Cho, Jaekwang Song, Hyeon-Joon Moon, and Seong Joon Yoo
Appropriate Dynamic Lighting as a Possible Basis for a Smart Ambient Lighting . . . 67
Lajos Izsó
A New Approach for Accessible Interaction within Smart Homes through Virtual Reality . . . 75
Viveca Jimenez-Mixco, Rafael de las Heras, Juan-Luis Villalar, and María Teresa Arredondo
A Design of Air-Condition Remote Control for Visually Impaired People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cherng-Yee Leung, Yan-Ting Yao, and Su-Chen Chuang
82
Verb Processing in Spoken Commands for Household Security and Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ioanna Malagardi and Christina Alexandris
92
Thermal Protection of Residential Buildings in the Period of Energy Crisis and Its Influence on Comfort of Living . . . 100
Przemyslaw Nowakowski
Design for All Approach with the Aim to Support Autonomous Living for Elderly People in Ordinary Residences – An Implementation Strategy . . . 108
Claes Tjäder
Speech Input from Older Users in Smart Environments: Challenges and Perspectives . . . 117
Ravichander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals
Sympathetic Devices: Communication Technologies for Inclusion Across Housing Options . . . 127
Claudia Winegarden and Brian Jones
Part II: Ambient Intelligence and Ambient Assisted Living
Design Framework for Ambient Assisted Living Platforms . . . 139
Patricia Abril-Jiménez, Cecilia Vera-Muñoz, Maria Fernanda Cabrera-Umpierrez, María Teresa Arredondo, and Juan-Carlos Naranjo
Ambient Intelligence in Working Environments . . . Christian Bühler
143
Towards a Framework for the Development of Adaptive Multimodal User Interfaces for Ambient Assisted Living Environments . . . . . . . . . . . . Marco Blumendorf and Sahin Albayrak
150
Workflow Mining Application to Ambient Intelligence Behavior Modeling . . . Carlos Fernández, Juan-Pablo Lázaro, and Jose Miguel Benedí
160
Middleware for Ambient Intelligence Environments: Reviewing Requirements and Communication Technologies . . . . . . . . . . . . . . . . . . . . . Yannis Georgalis, Dimitris Grammenos, and Constantine Stephanidis A Hybrid Approach for Recognizing ADLs and Care Activities Using Inertial Sensors and RFID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Albert Hein and Thomas Kirste Towards Universal Access to Home Monitoring for Assisted Living Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rezwan Islam, Sheikh I. Ahamed, Chowdhury S. Hasan, and Mohammad Tanviruzzaman
168
178
189
An Approach to and Evaluations of Assisted Living Systems Using Ambient Intelligence for Emergency Monitoring and Prevention . . . . . . . . Thomas Kleinberger, Andreas Jedlitschka, Holger Storf, Silke Steinbach-Nordmann, and Stephan Prueckner
199
Anamorphosis Projection by Ubiquitous Display in Intelligent Space . . . . Jeong-Eom Lee, Satoshi Miyashita, Kousuke Azuma, Joo-Ho Lee, and Gwi-Tae Park
209
AAL in the Wild – Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Edith Maier and Guido Kempter
218
A Modelling Framework for Ambient Assisted Living Validation . . . Juan-Carlos Naranjo, Carlos Fernández, Pilar Sala, Michael Hellenschmidt, and Franco Mercalli
228
Methods for User Experience Design of AAL Services . . . Pilar Sala, Juan-Pablo Lázaro, J. Artur Serrano, Katrin Müller, and Juan-Carlos Naranjo
238
Self Care System to Assess Cardiovascular Diseases at Home . . . Elena Villalba, Ignacio Peinado, and María Teresa Arredondo
248
Ambient Intelligence and Knowledge Processing in Distributed Autonomous AAL-Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ralph Welge, Helmut Faasch, and Eckhard C. Bollow
258
Configuration and Dynamic Adaptation of AAL Environments to Personal Requirements and Medical Conditions . . . . . . . . . . . . . . . . . . . . . . Reiner Wichert
267
Part III: Mobile and Ubiquitous Interaction
Designing Universally Accessible Networking Services for a Mobile Personal Assistant . . . 279
Ioannis Basdekis, Panagiotis Karampelas, Voula Doulgeraki, and Constantine Stephanidis
Activity Recognition for Everyday Life on Mobile Phones . . . 289
Gerald Bieber, Jörg Voskamp, and Bodo Urban
Kinetic User Interface: Interaction through Motion for Pervasive Computing Systems . . . Pascal Bruegger and Béat Hirsbrunner
297
On Efficiency of Adaptation Algorithms for Mobile Interfaces Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vlado Glavinic, Sandi Ljubic, and Mihael Kukec
307
Accessible User Interfaces in a Mobile Logistics System . . . . . . . . . . . . . . . Harald K. Jansson, Robert Bjærum, Riitta Hellman, and Sverre Morka
317
Multimodal Interaction for Mobile Learning . . . . . . . . . . . . . . . . . . . . . . . . . Irina Kondratova
327
Acceptance of Mobile Entertainment by Chinese Rural People . . . . . . . . . Jun Liu, Ying Liu, Hui Li, Dingjun Li, and Pei-Luen Patrick Rau
335
Universal Mobile Information Retrieval . . . David Machado, Tiago Barbosa, Sebastião Pais, Bruno Martins, and Gaël Dias
345
ActionSpaces: Device Independent Places of Thought, Memory and Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rudolf Melcher, Martin Hitz, and Gerhard Leitner Face Recognition Technology for Ubiquitous Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kanghun Jeong, Seongrok Hong, Ilyang Joo, Jaehoon Lee, and Hyeon-Joon Moon
355
365
Location-Triggered Code Execution – Dismissing Displays and Keypads for Mobile Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wolfgang Narzt and Heinrich Schmitzberger
374
Mobile Interaction: Automatically Adapting Audio Output to Users and Contexts on Communication and Media Control Scenarios . . . Tiago Reis, Luís Carriço, and Carlos Duarte
384
Interactive Photo Viewing on Ubiquitous Displays . . . . . . . . . . . . . . . . . . . . Han-Sol Ryu, Yeo-Jin Yoon, Seon-Min Rhee, and Soo-Mi Choi
394
Mobile Audio Navigation Interfaces for the Blind . . . Jaime Sánchez
402
A Mobile Communication System Designed for the Hearing-Impaired . . . Ji-Won Song and Sung-Ho Yang
412
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wang-Chin Tsai and Chang-Franw Lee Ubiquitous Accessibility: Building Access Features Directly into the Network to Allow Anyone, Anywhere Access to Ubiquitous Computing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gregg C. Vanderheiden
422
432
Using Distributed Processing to Create More Powerful, Flexible and User Matched Accessibility Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gregg C. Vanderheiden
438
Spearcon Performance and Preference for Auditory Menus on a Mobile Phone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bruce N. Walker and Anya Kogan
445
Design and Evaluation of Innovative Chord Input for Mobile Phones . . . Fong-Gong Wu, Chia-Wei Chang, and Chien-Hsu Chen
455
Part IV: Alternative Interaction Techniques and Devices The Potential of the BCI for Accessible and Smart e-Learning . . . . . . . . . Ray Adams, Richard Comley, and Mahbobeh Ghoreyshi Visualizing Thermal Traces to Reveal Histories of Human-Object Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomohiro Amemiya Interacting with the Environment through Non-invasive Brain-Computer Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Febo Cincotti, Lucia Rita Quitadamo, Fabio Aloise, Luigi Bianchi, Fabio Babiloni, and Donatella Mattia Movement and Recovery Analysis of a Mouse-Replacement Interface for Users with Severe Disabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Caitlin Connor, Emily Yu, John Magee, Esra Cansizoglu, Samuel Epstein, and Margrit Betke Sonification System of Maps for Blind – Alternative View . . . . . . . . . . . . . Gintautas Daunys and Vidas Lauruska
467
477
483
493
503
Scanning-Based Human-Computer Interaction Using Intentional Muscle Contractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Torsten Felzer, Rainer Nordmann, and Stephan Rinderknecht
509
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction . . . Albert Hein, André Hoffmeyer, and Thomas Kirste
519
A Proposal of New Interface Based on Natural Phenomena and So on (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ichiro Hirata, Toshiki Yamaoka, Akio Fujiwara, Sachie Yamamoto, Daijirou Yamaguchi, Mayuko Yoshida, and Rie Tutui Timing and Accuracy of Individuals with and without Motor Control Disabilities Completing a Touch Screen Task . . . . . . . . . . . . . . . . . . . . . . . . Curt B. Irwin and Mary E. Sesto
528
535
Gaze and Gesture Activity in Communication . . . . . . . . . . . . . . . . . . . . . . . Kristiina Jokinen
537
Augmenting Sticky Notes as an I/O Interface . . . . . . . . . . . . . . . . . . . . . . . . Pranav Mistry and Pattie Maes
547
Sonification of Spatial Information: Audio-Tactile Exploration Strategies by Normal and Blind Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marta Olivetti Belardinelli, Stefano Federici, Franco Delogu, and Massimiliano Palmiero What You Feel Is What You Get: Mapping GUIs on Planar Tactile Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maria Schiewe, Wiebke K¨ ohlmann, Oliver Nadig, and Gerhard Weber
557
564
Multitouch Haptic Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Schmidt and Gerhard Weber
574
Free-form Sketching with Ball B-Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rongqing Song, Zhongke Wu, Mingquan Zhou, and Xuefeng Ao
583
BC(eye): Combining Eye-Gaze Input with Brain-Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Roman Vilimek and Thorsten O. Zander
593
Colorimetric and Photometric Compensation for Optical See-Through Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christian Weiland, Anne-Kathrin Braun, and Wolfgang Heiden
603
A Proposal of New Interface Based on Natural Phenomena and so on (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Toshiki Yamaoka, Ichiro Hirata, Akio Fujiwara, Sachie Yamamoto, Daijirou Yamaguchi, Mayuko Yoshida, and Rie Tutui
613
Part V: Intelligence, Adaptation and Personalisation Managing Intelligent Services for People with Disabilities and Elderly People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julio Abascal, Borja Bonail, Luis Gardeazabal, Alberto Lafuente, and Zigor Salvador A Parameter-Based Model for Generating Culturally Adaptive Nonverbal Behaviors in Embodied Conversational Agents . . . . . . . . . . . . . Afia Akhter Lipi, Yukiko Nakano, and Matthias Rehm
623
631
Intelligence on the Web and e-Inclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laura Burzagli and Francesco Gabbanini
641
Accelerated Algorithm for Silhouette Fur Generation Based on GPU . . . Gang Yang and Xin-yuan Huang
650
An Ortho-Rectification Method for Space-Borne SAR Image with Imaging Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xufei Gao, Xinyu Chen, and Ping Guo
658
Robust Active Appearance Model Based Upon Multi-linear Analysis against Illumination Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyeong-Sic Jo, Hyeon-Joon Moon, and Yong-Guk Kim
667
Modeling and Simulation of Human Interaction Based on Mutual Beliefs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Taro Kanno, Atsushi Watanabe, and Kazuo Furuta
674
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jan-Paul. Leuteritz, Harald Widlroither, Alexandros Mourouzis, Maria Panou, Margherita Antona, and Asterios Leonidis User Individual Differences in Intelligent Interaction: Do They Matter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jelena Naki´c and Andrina Grani´c Intelligent Interface for Elderly Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Changhoon Park User Interface Adaptation of Web-Based Services on the Semantic Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nikolaos Partarakis, Constantina Doulgeraki, Asterios Leonidis, Margherita Antona, and Constantine Stephanidis
684
694 704
711
Measuring Psychophysiological Signals in Every-Day Situations . . . . . . . . Walter Ritter
720
Why Here and Now . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Antonio Rizzo, Elisa Rubegni, and Maurizio Caporali
729
A Framework for Service Convergence via Device Cooperation . . . . . . . . . Seungchul Shin, Do-Yoon Kim, and Sung-young Yoon
738
Enhancements to Online Help: Adaptivity and Embodied Conversational Agents . . . Jérôme Simonin and Noëlle Carbonell
748
Adaptive User Interfaces: Benefit or Impediment for Lower-Literacy Users? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ivar Solheim
758
Adaptative User Interfaces to Promote Independent Ageing . . . Cecilia Vera-Muñoz, Mercedes Fernández-Rodríguez, Patricia Abril-Jiménez, María Fernanda Cabrera-Umpiérrez, María Teresa Arredondo, and Sergio Guillén
766
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
771
Key Properties in the Development of Smart Spaces
Sergey Balandin and Heikki Waris
Nokia Research Centre, Itamerenkatu 11-13, 00180 Helsinki, Finland
[email protected],
[email protected]
Abstract. This paper is targeted at improving and expanding the understanding of the Smart Spaces concept by the R&D community. Through the identification of key properties based on an analysis of evolving trends in the mobile industry, developers are provided with recommendations that improve the adoption of Smart Spaces. It is especially important to understand how Smart Spaces can change the whole services ecosystem and the role that mobile devices will play. The paper discusses some core technologies being developed in the industry that might play a dominant role in future Smart Spaces. Special attention is given to the latest trend towards a networked inter-device architecture for mobile devices and the new possibilities it opens. With that, the discussion expands into general properties of Smart Spaces, and the paper summarizes functional and non-functional properties. By understanding the properties and their implications for the development and adoption of Smart Spaces, developers are better equipped to ensure that the needs of the various stakeholders are taken into account. For this purpose, the paper proposes a set of questions that can be used to estimate how well the planned Smart Space fares when compared against each of the properties. Keywords: Smart Spaces, Future Mobile Devices, Properties, Taxonomy.
1 Introduction
Nowadays people are surrounded by dozens of devices that serve different purposes and, importantly, most of these devices already have sufficient processing power, memory and communication capabilities, plus advanced internal control and management systems. This fact gives us an opportunity to revise the basic principle of how services are organized and delivered to the users. Actually, a similar trend can already be observed in the Internet, where services increasingly offer the user the possibility to upgrade related software packages or even replace them with corresponding distributed network services. Another similar trend can be seen in the success of global image repositories such as Picasa and the recently announced Google repository. Instead of placing all service components on the same physical device, services are implemented in a distributed manner with the involvement of multiple devices. The main research question addressed by this paper is what is the role of mobile devices in this global trend and what are the technical and especially non-technical
properties that the developers of Smart Spaces should consider. This paper is targeted at initiating a discussion on how to facilitate a broad adoption of Smart Spaces. The paper is organized as follows. The next section gives a general definition and overview of Smart Spaces. We then review the core technologies that we believe will have a key impact on Smart Spaces in the near future. The subsequent chapter discusses the main points to be considered in the development of Smart Spaces; it contains a discussion of issues that we consider critical for the success of Smart Spaces as commercial products. The paper is concluded with a summary of the main findings, which we hope will also influence future work in the field, and the list of references.
2 Definition of Smart Spaces
In the book by Diane Cook and Sajal Das the following formal definition of Smart Spaces is given: “Smart Space is able to acquire and apply knowledge about its environment and to adapt to its inhabitants in order to improve their experience in that environment” [1, 2]. This definition assumes continuous interaction of the user with the surrounding environment, aimed at the continuous adaptation of services to the current needs of the user. This interaction is enabled by sensing functionality that gathers information about the space and the user; adaptation functionality for reacting to the detected changes; and effecting functionality for changing the surrounding space to benefit the user (a minimal sketch of this cycle is given after Fig. 1). Based on this definition, the main focus of Smart Spaces is on the user. The general view of the Smart Spaces hierarchy is depicted in Figure 1.
Fig. 1. Hierarchical layers of Smart Spaces with user in the center
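The sensing, adaptation and effecting functionality referred to in the definition above can be summarized in a short sketch. The following Python fragment is purely illustrative: the class, sensor and actuator names are invented for this example and do not correspond to any concrete Smart Space platform discussed in this paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SmartSpace:
    """Minimal illustration of the sense-adapt-effect cycle of a Smart Space."""
    sensors: Dict[str, Callable[[], float]]                 # name -> read function
    rules: List[Callable[[Dict[str, float]], List[str]]]    # context -> actions
    actuators: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def sense(self) -> Dict[str, float]:
        # Acquire knowledge about the environment (sensing functionality).
        return {name: read() for name, read in self.sensors.items()}

    def adapt(self, context: Dict[str, float]) -> List[str]:
        # Decide how to react to the detected changes (adaptation functionality).
        actions: List[str] = []
        for rule in self.rules:
            actions.extend(rule(context))
        return actions

    def effect(self, actions: List[str]) -> None:
        # Change the surrounding space to benefit the user (effecting functionality).
        for action in actions:
            if action in self.actuators:
                self.actuators[action]()


# Hypothetical usage: dim the lights when the measured illuminance is high.
space = SmartSpace(
    sensors={"illuminance_lux": lambda: 750.0},
    rules=[lambda ctx: ["dim_lights"] if ctx["illuminance_lux"] > 500 else []],
    actuators={"dim_lights": lambda: print("Dimming lights")},
)
space.effect(space.adapt(space.sense()))
```

In a real deployment each of these three roles would typically be distributed across different devices in the space rather than living in a single object, as discussed in the following sections.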
Obvious key concepts for any Smart Space are mobility, distribution and context awareness. These are addressed by recent advances in wireless networking technologies as well as processing and storage capabilities, which have moved mobile and consumer electronics devices beyond their traditional areas of application and allow their use for a broader scope of services. The significant computing power and high-speed data connections of modern mobile devices allow them to become information processing and communication hubs that perform rather complex
computations locally and distribute the results. This lets multiple devices interact with each other and form ad-hoc, dynamic, distributed computation platforms. Together, they form a space where, via a number of wireless technologies, the users can access a huge variety of services. Similarly, existing and future services form spaces that cater for a variety of needs ranging from browsing to interactive video conversations. These services surround the user all the time and have access to large amounts of data. Over time they can learn the users’ needs and personal preferences, making it possible to build even more advanced services that proactively predict those needs and propose valuable services in the given environment before the users realize it themselves. These layers, each of which can utilize a number of technologies, form a smart environment (Smart Space). A further important aspect is that Smart Spaces improve the interaction between users and their physical environments, allowing more efficient consumption of available resources such as energy.
3 Overview of the Related Core Technologies
Sensors play a key role in the development of Smart Spaces as the main sources of the context describing the physical world. Multiple sensors allow the continuous observation of the characteristics of the space, which can be collected and processed by a number of devices, which in turn allows the required actions to be taken. As a result, we can automate many services that currently require overprovision of resources or human intervention. The success of the Smart Spaces concept thus depends on whether a standard solution for information representation and for communication between the sensors and processing devices is adopted.
Another source of massive amounts of information is the World Wide Web, which is especially important when there is a need for interpretation of the obtained information, access to generic data and so on. In this respect the main enabler for Smart Spaces is the Semantic Web [3] and its underlying technologies, such as the Resource Description Framework (RDF) [4], which provide information representation, including structure and semantics, in a machine-readable form. The Semantic Web is an enabler for creating a true web of information and opens the door to sophisticated Smart Space services where most of the informational interactions happen in an automatic fashion. It completely changes the nature of applications from the current monolithic entities to highly distributed, mobile and agent-like ones.
Devices need to act as information processing and storage units, and the resulting services need to be delivered to the consumers. We believe that the mobile device, due to being available to users and possessing significant internal processing power and data storage, should be a central component of personal Smart Spaces. For interaction between the mobile device and the smart objects surrounding it, the most efficient approach seems to be the expansion of intra-device connectivity solutions. Unfortunately, there is today no optimized interface in the mobile industry similar to ISA [5], USB [6], PCI [7] or PCI Express [8] available in the PC world. This has strong historical reasons, especially the need to optimize device performance as much as possible. As a consequence, a large number of sometimes incompatible interface alternatives exist for connecting purpose-specific components, and strongly monolithic mobile device architectures include extension busses such as I2C [9] and
SPI [10], which provide a bandwidth of at most a few Mbit/s. The current situation contradicts the target of easy expansion outside of the device. Out of the listed PC world solutions, including also FireWire [11], SATA [12] and eSATA, the most important standards are PCI Express and USB. Neither fits the mobile industry well, as both were designed with different requirements in mind. However, USB is being used as an external connection interface for mobile devices and other peripherals. Especially the USB 3.0 standard might carve out a niche in the mobile industry if it is aligned with the currently developed core mobile device technologies. PCI Express was designed and optimized as a solution with backwards compatibility to PCI. Since the PCI interface is not used in the mobile industry, it is unlikely that it will become a key technology for future Smart Space devices, although the technology and industry convergence is still an open story.
Another very interesting angle concerns the technologies developed in the space industry. SpaceWire is a standard for high-speed links and networks for use onboard spacecraft. The standard is freely downloadable from ESA [13], and after a thorough study and modeling we have found that this technology has good potential for intra- and inter-device communications. However, a number of restrictions make it suboptimal for mobile devices. Among the most critical limiting factors: the PHY uses Data-Strobe (DS) encoding, which does not scale well in terms of the bandwidth of a single link; the standard has minimal support for Network layer functionality and no definition of the Transport layer; it has no Quality of Service (QoS) support; and, finally, uncertainty about its future made us drop it from the list of candidates.
The development of a new standard for the mobile industry was started by the MIPI Alliance [14]. The new standard targets a PHY with 4 pins for a bi-directional 1 Gbit/s link with ultra-low power consumption. As a result, the targeted solution has a Bit Error Rate (BER) of 10⁻¹⁴ (i.e., 1 error every 30 hours at a link speed of 1 Gbit/s) for chip-to-chip connections, making it impossible to ignore transmission errors as is done with PC busses. The corresponding protocol stack solution, UniPro, provides mechanisms for detecting errors and recovering from them, as well as many other capabilities such as node discovery, QoS and network management. UniPro provides many opportunities for the efficient handling of intra- and inter-device connectivity. To enable the integration of the mobile device and its surrounding equipment, the potential of a wireless extension of MIPI UniPro has been identified and is being researched. The development of this extension will support the device federation concept, making all surrounding devices in the Smart Space into logical sub-modules of the mobile device's internal network. Such a low-level device interconnect will significantly speed up communication and reduce power consumption, but any potential drawbacks of the approach are still to be discovered and investigated.
A further enabler technology is the Network on Terminal Architecture (NoTA) [15], a modular, service-based, interconnect-centric architecture for embedded devices whose basic underlying paradigm is similar to what is used for Web Services. NoTA is based on the Device Interconnect Protocol (DIP), which can be implemented for various physical interfaces ranging from MIPI high-speed serial interfaces to wireless transports, e.g.
Bluetooth. The NoTA core, DIP and related system services are open to all interested parties. A number of services and solutions on top of NoTA have already been proposed, and several related publications are available; a good general overview and further references can be found in [16].
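NoTA's concrete interfaces are not reproduced here; the toy Python sketch below only illustrates, with invented names, the general idea of a service-based, interconnect-centric architecture in which service nodes register with an interconnect and application nodes discover and invoke them regardless of their physical location.

```python
from typing import Any, Callable, Dict


class Interconnect:
    """Toy registry standing in for a device interconnect; all names are invented."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., Any]] = {}

    def register(self, service_id: str, handler: Callable[..., Any]) -> None:
        # A service node announces itself to the interconnect.
        self._services[service_id] = handler

    def discover(self, service_id: str) -> bool:
        # An application node checks whether a service is reachable.
        return service_id in self._services

    def invoke(self, service_id: str, *args: Any, **kwargs: Any) -> Any:
        # Requests are routed to the service wherever it physically resides.
        return self._services[service_id](*args, **kwargs)


# Hypothetical usage: a display service offered by a nearby device.
bus = Interconnect()
bus.register("display.show_text", lambda text: print(f"[display] {text}"))
if bus.discover("display.show_text"):
    bus.invoke("display.show_text", "Hello from the Smart Space")
```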
Another very interesting technology is the Multi-device, Multipart, and Multivendor (M3) framework. M3 can be built on top of NoTA or other communication platforms. It extends the principles of rapid product innovation of Internet mash-up services to cover services in physical devices. We recommend two papers on M3: one describes an example of workload balancing for the RDF store below the semantic information brokers (SIB) [17], and the other provides a high-level definition of the related Smart Spaces architecture [18].
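To make the semantic-web underpinnings of such frameworks more concrete, the following sketch shows how a single sensor reading could be represented as RDF triples and queried with SPARQL. It uses the open-source rdflib Python library and an invented example vocabulary; it does not reproduce the M3 or SIB interfaces, which are defined in the papers cited above.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SPACE = Namespace("http://example.org/smartspace#")  # hypothetical vocabulary

g = Graph()
g.bind("space", SPACE)

# Publish a temperature reading as RDF triples (the shared "space" of information).
g.add((SPACE.livingRoomSensor, RDF.type, SPACE.TemperatureSensor))
g.add((SPACE.livingRoomSensor, SPACE.locatedIn, SPACE.livingRoom))
g.add((SPACE.livingRoomSensor, SPACE.hasReading, Literal(23.5, datatype=XSD.double)))

# Any device in the space can query the same graph with SPARQL.
query = """
    PREFIX space: <http://example.org/smartspace#>
    SELECT ?sensor ?value WHERE {
        ?sensor a space:TemperatureSensor ;
                space:hasReading ?value .
    }
"""
for sensor, value in g.query(query):
    print(f"{sensor} reports {value} degrees C")
```

In an M3-style setting the triples would live in a shared store behind a semantic information broker rather than in a local graph object, but the machine-readable representation and the query-based access pattern are the same.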
4 Main Points to Consider in the Development of Smart Spaces
This chapter describes properties that we currently see as relevant. As described in the previous chapters, the technical enablers are finally available and adopted by consumers. Therefore, the main emphasis is on non-technical properties that primarily address challenges related to usability and commercial deployment rather than specific technical problems, although they can often be addressed by technical solutions. We feel that this is beneficial for the R&D community because, despite their importance to the broad acceptance of research results, these properties typically become relevant only later during the development process.
4.1 Technical Properties
The product properties of Smart Space systems can be split into two categories. Functional properties depend on the functionality that the Smart Space should offer to its users, and are outside the scope of this paper because our assumption is that Smart Spaces can be used to provide arbitrary functionality and therefore the desired composition of these properties varies case by case. The R&D community is already adequately addressing non-functional properties such as resource awareness and security, which may be difficult or expensive to incorporate into the solutions once the products have been deployed. There are further technical properties that are not related to any particular Smart Space as a single product, but rather to the efficiency of the process that creates them as a group or category of products. The following elaborates on these in more detail and presents questions that can be used to estimate how well a work-in-progress Smart Space can be productized.
Interoperability of the devices and services in the Smart Space is critical, since Smart Spaces are unlikely to comply with a single particular architecture. For a specific Smart Space, it is possible to create a successful stand-alone system that fulfills its business objectives. However, in the absence of adequate interoperability mechanisms the Smart Space will not be able to achieve the economies-of-scale and ecosystem benefits that come from cost-efficient mass production and the ability to maximize the value add of investments through specialization and reuse. Questions:
1. Is the Smart Space composed of components that interoperate using a clear set of common interfaces (low integration effort of the planned system)?
2. Is the Smart Space providing an interoperability mechanism for non-predefined components (low integration effort for extensions or enhancements)?
3. Is the Smart Space constructed from components that can be re-used in other Smart Spaces (lower risk for invested effort, support for evolution)?
4. Is the Smart Space allowing all components to be implemented using any technical solutions (low adaptation effort by developers and businesses)?
5. Is the Smart Space composition implementable using easily available and well-known development methodology and tools (efficient development effort)?
All “yes”: high interoperability. All “no”: low interoperability; more standard solutions should be adopted to lower the development costs.
Smart Spaces are inherently versatile as unique combinations of devices and services serving some purposes in a particular context. The nature of Smart Spaces as systems deployed in a physical space also makes it more expensive to upgrade them in a managed fashion as time goes by. It is important to be able to easily extend the functionality of the Smart Space as it emerges over time. Questions:
1. Is the Smart Space providing access to Internet functionality as de-facto standards?
2. Is the Smart Space based on popular device platforms and Internet solutions?
3. Is the Smart Space supporting the addition and modification of components?
4. Is the Smart Space applicable to components with a wide performance range?
5. Is the Smart Space supporting use of functionality from a different Smart Space?
All “yes”: high extensibility. All “no”: low extensibility; later enhancements should be supported, or access to complementary or additional functionality provided.
The complexity of developing and operating the Smart Space determines how easily many other properties can be improved. Logical complexity increases the risk involved in starting to develop it as a product, whereas implementation complexity reduces the efficiency of installation, maintenance and upgrading. Questions:
1. Is the Smart Space logically coherent and simple for an average developer?
2. Is the Smart Space installable and maintainable cost-efficiently by a non-expert?
3. Is the Smart Space following a logical classification, supporting marketing efforts?
4. Is the Smart Space adhering to a governance model to manage features and IPR?
5. Is the Smart Space available in verified configurations, for distribution channels?
All "yes": simple to develop and operate; All “no”: the development and operation should be made easier. 4.2 Non-technical Properties For the adoption of Smart Spaces it is crucial to go beyond the technology enabler development, demonstrators and small trials. Smart spaces must address real and everyday consumer needs in a way that generates demand for the technical solutions. In particular, their accessibility needs to be targeted to suit the intended users of the various Smart Spaces, and the Smart Space must promise enough commercial added value compared to the costs involved. We are presenting a set of further questions that can be used subjectively to estimate how a Smart Space addresses some key properties. If some property is addressed particularly weakly, the researcher or developer may want to determine whether that is intentional or whether to focus available resources to improve that. The first property to focus on is the generality or specificity of intended users, because the brief first impression must convince the intended user of the value that the Smart Space can provide, and being attractive to more potential users will increase the chances that more will become users. Questions:
Key Properties in the Development of Smart Spaces
1. 2. 3. 4. 5.
9
Can the expected user be from any age group (flexibility and reception of novelty)? Can the expected user be of any occupation or life situation (habits, social needs)? Can the Smart Space be used with any level of attention (effort/means to interact)? Can the Smart Space be used with any level of technical skill? Can the Smart Space be used regardless of the level of mental or physical abilities?
All "yes": intended for a very generic user and thus a potentially large user base, adoption more determined by the rest of the properties. All “no”: requires a very specific user type, other circumstances (e.g. location) need to make it likely that such users would be available in sufficient numbers to make the deployment successful. The next challenge is to make the users aware of the existence of the Smart Space, which may be something very purpose specific in a particular physical space, composed of arbitrary physical elements that are not obvious indicators to any user that there would be a Smart Space in the area. Questions: 1. 2. 3. 4. 5.
Is the Smart Space associated with a concrete, visible object (position/coverage)? Is the Smart Space associated with a recognizable or familiar object or person? Is the Smart Space prominently labeled or indicated (sensory perception)? Is the Smart Space in a physically and information-wise uncluttered area? Is the Smart Space in a context occurring frequently with other similar spaces (possibility to extrapolate or intrapolate, or to memorize for re-use)? All "yes": easy to observe; All “no”: hard to observe or attract attention, existence and availability should be bootstrapped to the environment, or communicated via some other means such as advertisements or training, until a sufficient level of awareness has been established among the intended user base. Users can be aware of the availability of the Smart Space, but were not involved in its preparation and do not know that it would offer potentially attractive services or value. There is no general means to make all users understand the value of all potential Smart Spaces a priori, but functional familiarity with their representations may be possible. Questions: 1. Is the Smart Space serving a similar purpose as an associated object (extrapolate)? 2. Is the Smart Space used in similar ways by different users (examples)? 3. Is the Smart Space performing in a similar range as the associated object? 4. Is the Smart Space starting from a common user need in the context (motivation)? 5. Is the Smart Space involved in the daily habits of its users (likelihood of learning)? All "yes": easy to comprehend; All “no”: hard to comprehend, contents and value proposition should be communicated via some other means such as instructions, or simplified so that the functionality is comprehensible in expected usage situations. When the users start to interact with a newly encountered unique Smart Space available in a particular location, it may be their only occasion to use the system. It is important to serve the intended users by adapting to the interaction types that suit them best in the given circumstances. Questions: 1. Is the Smart Space usable with any modality (ability to serve users at their terms)? 2. Is the Smart Space usable by interacting with a concrete object (interaction learning effort)? 3. Is the Smart Space usable as an extension of existing object functionality (low cognitive learning effort)?
4. Is the Smart Space usable with different methods leading to a function (ability to serve different user logics and approaches)?
5. Is the Smart Space usable with a similar effort regardless of the level of expertise (ability to serve users of various capabilities)?
All “yes”: intuitive interaction. All “no”: interaction requires meticulous effort and should be made easier through more alternatives suiting different users, or better integration with objects existing in the space or in the user’s possession.
User interactions with a unique Smart Space can never fully satisfy the needs of all intended users: better adaptation to the individual user’s imported configurations and preferences can compensate for the limitations. Questions:
1. Does the Smart Space provide parameters to configure most of its functionality?
2. Does the Smart Space identify the parameters unambiguously for portability?
3. Is the Smart Space linked with user-accessible example configurations (ability to learn how to adapt the system)?
4. Is the Smart Space capable of exporting and importing configurations (ability to automatically apply selected configurations)?
5. Is the Smart Space capable of applying partially fitting configurations (portability of settings across similar but different systems)?
All “yes”: user-specific preferences can be intuitively applied. All “no”: tailoring requires meticulous effort and should be made easier by adopting preference descriptions commonly used by comparable users and services.
The final condition for a successful Smart Space is commercial viability. There needs to be a balance between the investments on deployment and operating costs and the expected income for all stakeholders. Attempting to estimate these may feel useless, but may also help in adjusting the ambition levels of the development effort. Questions:
1. Is the Smart Space fully sponsored by any of multiple committed business parties?
2. Is the Smart Space making the contributions of stakeholders visible to their potential clients (support for an advertisement-funded business model)?
3. Is the Smart Space operation clearly profitable after potential costs?
4. Is the Smart Space composed of elements that are fully reusable in other spaces (ability to recoup investments in case of lifetime expiration or failure)?
5. Is the Smart Space managing the rights of all stakeholders investing in it (reduce business risk over the lifetime of the system)?
All “yes”: low business risk. All “no”: business risk is high and the ambitions of the R&D effort should be considered accordingly.
The attractiveness must also be made known to developers and users: what it can provide them, how well it does that, and how it can be used subsequently. Questions:
1. Is the Smart Space suggesting potentially useful non-requested functionality (value beyond expectations)?
2. Is the Smart Space capable of recognizing a user’s interest in similar spaces (ability to speed up adoption and distribution through the network effect)?
3. Is the Smart Space detecting and repeating successful usage patterns (automation)?
4. Is the Smart Space detecting and correcting unsuccessful usage patterns?
5. Is the Smart Space conveying an image of continuity backed up by credible sponsors (trust that it is worth the personal resources invested in using it)?
All "yes": further use of the deployed technologies and solutions is encouraged. All "no": further use beyond the immediate reason for which the user started to interact is discouraged, and any economies-of-scale benefits are difficult to obtain.

Finally, for Smart Spaces to become successful as a broad category of systems available in physical locations, it is important to support a healthy ecosystem of multiple actors with versatile development capabilities and business interests. Questions:

1. Is the Smart Space possible to deploy in multiple combinations (ability to incorporate elements from multiple vendors)?
2. Is the Smart Space possible to deploy at multiple levels of quality (ability to adapt and apply the space in multiple environments)?
3. Is the Smart Space exempt from regulatory or other non-user-imposed constraints (reliability of available functionality)?
4. Is the Smart Space capable of self-configuration to accommodate enhancements?
5. Is the Smart Space offering light-weight licensing for enhancements?

All "yes": the freedom to build additional or complementary business using the Smart Space is unconstrained. All "no": the Smart Space constrains external innovation, and any ecosystem benefits are difficult to obtain.
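As a simple illustration of how these five-question checklists might be operationalized in software (a hypothetical sketch in Java, not part of the framework presented in this paper), each property can be scored by counting the affirmative answers, and low-scoring categories can be flagged for reconsideration:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for scoring the five-question checklists described above.
public class SmartSpaceChecklist {

    // One constant per key property discussed in the text.
    public enum Property {
        OBSERVABILITY, COMPREHENSIBILITY, INTERACTION, PERSONALIZATION,
        COMMERCIAL_VIABILITY, ATTRACTIVENESS, ECOSYSTEM_SUPPORT
    }

    private final Map<Property, boolean[]> answers = new EnumMap<>(Property.class);

    // Record the five yes/no answers for one property.
    public void record(Property property, boolean... yesNo) {
        if (yesNo.length != 5) {
            throw new IllegalArgumentException("Each checklist has exactly five questions");
        }
        answers.put(property, yesNo.clone());
    }

    // Score = number of "yes" answers (0 to 5).
    public int score(Property property) {
        int yes = 0;
        for (boolean answer : answers.getOrDefault(property, new boolean[5])) {
            if (answer) {
                yes++;
            }
        }
        return yes;
    }

    // A category with (almost) all "no" answers should prompt the developer to
    // reconsider the assumptions of the R&D effort.
    public boolean needsAttention(Property property) {
        return score(property) <= 1;
    }
}
```

Such scores could also serve as a first step towards the taxonomy-based classification of Smart Space concepts mentioned in the conclusions below.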
5 Discussion and Conclusions

The main purpose of this paper is to initiate a discussion on how Smart Spaces could be broadly adopted by users in their everyday lives, by paying attention to pragmatic product issues. The paper gives an overview of existing technologies that, in our opinion, will play a key role in future Smart Spaces. An important observation is that both efficient communication and service development frameworks have to be proposed and widely accepted in order to guarantee the broad success of Smart Spaces. It is clear that the Smart Space concept is an opportunity for the consumer electronics and services industries to get even closer to the users, proactively assist them, and as a result optimize the consumption of critical resources. It is natural for mobile devices to become the personalized access point and interface to the surrounding Smart Spaces, owing to their availability to the users and their significant processing and storage capabilities. For example, the management functionality should inform the Smart Space about the user's preferences and determine how to obtain the user's favorite service from the modules available in the given space. By having access to a large amount of personal information (e.g. calendar, email, etc.) and being carried by the user, the device can learn about individual preferences and thus find or build up new services and offer them to the user at the most convenient time.

We have noted that the R&D community is well capable of addressing functional and non-functional product properties. However, for solutions intended to be deployed commercially as Smart Space products, there are additional properties that we encourage researchers and developers to take into account at an early phase in order to increase the probability that their results will end up in the market. We have presented properties related to the efficiency of product creation; to the usability of arbitrary Smart Spaces in the physical space; and to their deployment as commercial products. Within each of these categories we have proposed a set of key properties and presented a list of five simple questions that allow the developers to subjectively estimate how easy it would be to make the leap from a technical Smart
Space solution to a sustainable product desired by users and valuable to businesses. We do not expect that the questions would be answered in the affirmative in all or even most of the categories for any prospective Smart Space. However, if any of the categories scores poorly, this should prompt the developer to reconsider the assumptions of the R&D effort. Finally, the questions can be translated into a taxonomy and used for classifying Smart Space concepts and implementations.
References

1. Cook, D., Das, S.K.: Smart Environments: Technology, Protocols and Applications. John Wiley & Sons, Chichester (2004)
2. Das, S.K.: Designing Smart Environments: Challenges, Solutions and Future Directions. In: Proceedings of the ruSMART Conference, St. Petersburg, Russia (2008)
3. Oliver, I.: Towards the Dynamic Semantic Web. In: Proceedings of the ruSMART Conference, St. Petersburg, Russia (2008)
4. Official web site of the Resource Description Framework (RDF) / W3C Semantic Web Activity (2009), http://www.w3.org/RDF/
5. IEEE Personal Computer Bus Standard P996, Draft D2.02. IEEE Inc. (July 13, 1990)
6. Universal Serial Bus (USB) 2.0 Specification, http://www.usb.org/developers/docs/
7. Peripheral Component Interconnect (PCI) Standard, http://members.datafast.net.au/dft0802/specs.htm, http://www.pcisig.com/specifications/ordering_information
8. PCI Express - computer expansion card interface, http://www.pcisig.com/members/downloads/specifications/pciexpress/PCI_Express_Base_Rev_2.0_20Dec06_cb2.pdf
9. I2C-BUS (2000), http://www.nxp.com/acrobat_download/literature/9398/39340011.pdf
10. SPI - Serial Peripheral Interface, http://www.mct.net/faq/spi.html
11. FireWire® 800, http://www.lacie.com/download/more/WhitePaper_FireWire_800.pdf
12. Serial ATA-IO: Enabling the future, http://www.serialata.org
13. Web site of the European Space Agency (ESA), official web page of the SpaceWire standard working group, http://spacewire.esa.int/content/Home/HomeIntro.php
14. Official web site of the Mobile Industry Processor Interface (MIPI) Alliance (2009), http://www.mipi.org/
15. Official web site of Network on Terminal Architecture, NoTA World Open Architecture Initiative (2009), http://www.notaworld.org/
16. Lappetelainen, A., Tuopola, J.-M., Palin, A., Eriksson, T.: Networked systems, services and information, the ultimate digital convergence. In: 1st International NoTA Conference, Helsinki, Finland (2008)
17. Boldyrev, S., Balandin, S.: Illustration of the Intelligent Workload Balancing Principle in Distributed Data Storage Systems. In: Proceedings of the workshop program of the 10th International Conference on Ubiquitous Computing (September 2008)
18. Oliver, I., Honkola, J.: Personal Semantic Web Through a Space Based Computing Environment. In: Middleware for Semantic Web 2008 at ICSC 2008, Santa Clara, CA, USA (2008)
Design a Multi-Touch Table and Apply to Interior Furniture Allocation Chien-Hsu Chen, Ken-Hao Nien, and Fong-Gong Wu Department of Industrial Design, National Cheng-Kung University No.1, University Road, Tainan 701, Taiwan
[email protected],
[email protected],
[email protected]
Abstract. This study is based on the integration of FTIR multi-touch technology with Industrial Design to produce a multi-touch table. A multi-touch system interface is also developed in this study. Furniture allocation is used as the application content to give users practical operating experience with the multi-touch interface. The process includes FTIR-related structural testing, hardware technology and specifications, and the exterior design. The system includes an image recognition component and a multi-touch application, and is developed in FLASH. This study not only exploits the easy-to-use characteristics of multi-touch technology but also integrates PV3D to link the 3D scene with the user interface. This provides a real-time 3D simulation image, so that the user can view the result of the furniture allocation while operating the user interface. Observations and interviews with the users were conducted to evaluate the advantages and related problems of the multi-touch technology for future study and development. Keywords: Multi-Touch, Interior Design.
1 Introduction

With the development and popularization of computers, massive amounts of digital information have led us into the digital age. Computing is now closely linked with our lives: digital documents, music, maps, mail, and so on have changed the way we write, store data, and send information. Massive digital information has not only changed our lives but also made them more convenient. At the same time, the keyboard and mouse remain the most common way of interacting with digital information, whether the task is simple or not. Recently, however, some scholars have proposed different views on this mode of interaction. Ishii and Ullmer [1] proposed a new approach to human-machine interaction: Tangible Bits. They argued that digital information should be graspable and manipulable by users without having to use the keyboard or mouse. For this purpose, they described three key concepts of Tangible Bits: interactive surfaces; the coupling of bits with graspable physical objects; and ambient media for background awareness. Their goal is to integrate digital information into the physical environment
in order to achieve ease of use through natural operation and to reduce the learning required in the digital world. Enabling information to be manipulated directly and intuitively is also called a Tangible User Interface. Besides Tangible User Interfaces, Mitsubishi Electric Research Laboratories (MERL) has presented a study on a new type of interface [2]: a touch screen that users can operate directly. Unlike a single-touch screen, it has multi-touch and multi-user characteristics, which enrich the operation interface. Before MERL, Rekimoto [3] presented related multi-touch research in the SmartSkin project. At NYU, Han [4] presented a multi-touch technology based on frustrated total internal reflection (FTIR). Although these multi-touch technologies differ, they share the same purpose: to let users directly manipulate virtual information on the screen with their fingers, and to make it possible for multiple users to work together simultaneously. Computer interface operation is thus no longer limited to the traditional mouse and keyboard, and new operating modes continue to be proposed. Both TUIs and multi-touch aim at more natural interaction behavior, and much of this research applies these interactive interfaces to tabletops or large-scale displays, because the user does not need to look at a computer screen or operate a keyboard and mouse. This has changed how interfaces are operated and increased their degrees of freedom. Recent research has also combined digital information with the physical environment to provide intuitive interfaces through different combinations. For instance, Park et al. [5] presented a series of future household devices for the "smart home". Without complex technology and user interfaces, these devices can provide information according to the user's demands and lead to a better life experience. In some cases, digital information is combined with the surrounding environment, for instance furniture, walls, windows, or tabletops. Sukeda et al. [6] also presented the concept of information-accessing furniture: they embedded technical equipment in tables, walls, and mirrors to provide different information in different situations, hoping to help users obtain information from daily life more easily. This related research suggests that the display of information and the interaction between humans and digital information will no longer rely only on the screen, keyboard, or mouse, and the development of new interaction interfaces will make these concepts come true. Although multi-touch is still at an early stage of development, it is becoming popular and spreading quickly because of its simple and intuitive operation. The goal of this study is therefore to integrate multi-touch technology and design from the point of view of Industrial Design. To this end, a multi-touch table is built, and an interface for interior furniture allocation is designed to demonstrate its application and provide a different kind of user experience.
2 Design and Implementation

First, we had to choose the multi-touch technology and understand the related technical equipment. After that, we could integrate it with the design process.
Considering the low technical complexity, low cost, and ease of design implementation, we finally chose the FTIR technology. At the beginning of the design, we tested the related materials and equipment to determine the required sizes and technical parameters, including the projector position and size, the capturing area of the camera, and the electric circuit of the IR LEDs. After several tests, we recorded the relative positions of the projector, camera, and mirror, and the image area, in order to design a foundation for them (Fig. 1).
Fig. 1. The design of the foundation
Next, we extended the foundation design into the table design; several idea sketches were drawn and an outward appearance concept was chosen (Fig. 2). We then co-manufactured the table with a furniture company.
Fig. 2. The outward appearance of the table design
The electric circuits of the IR LEDs were built on the two sides (right and left) of the projection screen. Each side has five groups of infrared LEDs, giving ten groups in total. Each group consists of five infrared LEDs and one 130-ohm resistor, and the distance between two adjacent infrared LEDs is 1.8 cm. The five infrared LEDs in each group are wired in series, and the ten groups are connected in parallel. During the manufacturing process, some assembly and equipment testing were carried out for final checking (Fig. 3). In the final stage, color and pattern planning were done to complete the final work (Fig. 4).
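As a rough consistency check of the LED circuit described above (assuming, hypothetically, a 12 V supply and a forward voltage of about 1.5 V per infrared LED, since neither value is stated here): the five series-connected LEDs in a group drop roughly 7.5 V, leaving about 4.5 V across the 130-ohm resistor, which sets the group current to approximately 4.5 V / 130 Ω ≈ 35 mA, a typical operating current for infrared LEDs; the ten parallel groups would then draw roughly 350 mA in total.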
Fig. 3. Assembling and equipment testing
Fig. 4. Final accomplishment
3 Software System

After the hardware design was completed, we created a system in FLASH by developing a multi-touch image recognition component, called Blob Tracking, and applying it to interior furniture allocation. Blob Tracking traces the positions of the white points produced when the user touches the screen and sends the point data to the FLASH application. We can then use these data to define different manipulation meanings. For example, a single point can represent dragging a target, and two points can rotate or scale the target. This not only achieves the characteristics of multi-touch but also reduces
the difficulty of computer programming for industrial designers, so that they can concentrate on the interface design application.
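To illustrate the geometry behind interpreting two tracked points as rotate and scale gestures (a language-neutral sketch, written here in Java rather than the FLASH/ActionScript used in the actual system; the point data and frame handling are assumed), the scale factor can be taken as the ratio of the distances between the two points in consecutive frames, and the rotation as the change in the angle of the line connecting them:

```java
// Hypothetical sketch of two-finger gesture interpretation: the scale factor is the
// ratio of the distances between the two touch points in consecutive frames, and the
// rotation is the change in angle of the line connecting them.
public final class TwoTouchGesture {

    public static final class Point {
        final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }

    // p1/p2: the two touch points in the previous frame; q1/q2: the same points now.
    public static double scaleFactor(Point p1, Point p2, Point q1, Point q2) {
        double before = Math.hypot(p2.x - p1.x, p2.y - p1.y);
        double after = Math.hypot(q2.x - q1.x, q2.y - q1.y);
        return before == 0 ? 1.0 : after / before;
    }

    public static double rotationRadians(Point p1, Point p2, Point q1, Point q2) {
        double angleBefore = Math.atan2(p2.y - p1.y, p2.x - p1.x);
        double angleAfter = Math.atan2(q2.y - q1.y, q2.x - q1.x);
        return angleAfter - angleBefore;
    }
}
```

Applying the resulting scale and rotation to the touched furniture item each frame yields the direct rotate/scale manipulation described above; a single tracked point is simply treated as a drag.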
4 Interior Furniture Allocation

For the interface application, we designed an interior furniture allocation task. Using the characteristics of multi-touch, users can drag and rotate furniture to complete the allocation from a bird's-eye view. In a conventional single-point GUI environment, objects usually have to be manipulated through menus, whereas multi-touch can manipulate objects directly; it is more intuitive and simple, and with different gesture definitions it supports richer manipulations. In a multi-touch environment the complexity of manipulation therefore decreases, providing users with easy-to-use interaction. In the past, on the other hand, the allocation task was typically completed in a single plan view and only then rendered in 3D. This kind of process not only takes much time but also relies on one view only, and it is not easy to imagine the 3D space from a single view. We therefore designed a simultaneous 3D view that users can watch while they are allocating the furniture.

4.1 Interface

This study combines FLASH with PV3D to design the interface. PV3D is a FLASH 3D library that can simulate 3D scenes within a FLASH application. We therefore used FLASH and PV3D to design the interior furniture allocation interface and simulate the 3D rendered image. Users can simply drag or rotate furniture to place it in the desired interior spatial position. While allocating the furniture, the user can view a real-time 3D image of the current allocation, or switch to a first-person viewpoint to inspect the result and modify the allocation (Fig. 5). The furniture in the interface is based on IKEA products, with the living room category chosen as the system subject. The 3D models were constructed in 3DS Max; to reduce the system load, they were built with a low polygon count, and all textures were adjusted in Photoshop to improve the visual quality of the 3D models.
Fig. 5. The interface, showing the user interface and the 3D scene
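To show how a manipulation in the 2D plan view might be kept in sync with the 3D scene (a simplified sketch under assumed conventions, not the authors' PV3D/ActionScript implementation), the plan position of a furniture item can be mapped onto the 3D ground plane while its plan rotation becomes a rotation around the vertical axis:

```java
// Hypothetical mapping from a 2D plan (bird's-eye) manipulation to a 3D scene transform.
// Assumed conventions: plan coordinates are in centimetres, the 3D ground plane is XZ,
// and Y is the vertical axis.
public final class PlanTo3D {

    public static final class Transform3D {
        public final double x, y, z;       // position in the 3D scene
        public final double rotationY;     // rotation around the vertical axis, in degrees
        public Transform3D(double x, double y, double z, double rotationY) {
            this.x = x; this.y = y; this.z = z; this.rotationY = rotationY;
        }
    }

    // planX/planY: position of the furniture item in the 2D plan view;
    // planRotationDeg: its rotation in the plan; elevation: height above the floor.
    public static Transform3D fromPlan(double planX, double planY,
                                       double planRotationDeg, double elevation) {
        // The plan's Y axis becomes the scene's Z axis; rotation in the plan becomes
        // rotation around the vertical (Y) axis of the 3D scene.
        return new Transform3D(planX, elevation, planY, planRotationDeg);
    }
}
```

Invoking such a mapping whenever a drag or rotate gesture updates an item keeps the real-time 3D preview consistent with the bird's-eye view.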
4.2 Playing Experience

We invited 20 users to operate the interface. During the operation, the users had to finish the allocation task without any constraints (Fig. 6). The results of the interviews and observations are as follows. The users found multi-touch intuitive, convenient, and unconstrained, and manipulation easier and faster than with a mouse. With a mouse, objects usually have to be manipulated through right-click menus or other icons, whereas multi-touch manipulates objects directly, which is easier and faster; users could therefore manipulate the furniture quickly and allocate their own interior space. The 3D view gave them a more realistic feeling of the space, including relative positions, relative heights, and overall realism, and the whole scene can be viewed from different camera angles, so most users felt that the simultaneous 3D screen helped them considerably. Furthermore, some users pointed out that because multi-touch allows multiple users, it could serve as a platform for discussing the furniture allocation with their family, or between designer and client.
Fig. 6. Playing experience
In addition, some users mentioned that although multi-touch is very intuitive and simple to operate, the gestures must be reasonable and not too complicated; if too many complex gestures had to be remembered, the effect would be negative. Another issue worth noting is accuracy: it is not easy to manipulate small objects on the interface because fingers have their own limitations. Accuracy is therefore a shortcoming to be improved in the future.
5 Conclusion

After this experience of integrating technology with design, we found that there are still details to be improved, such as the assembly steps in the manufacturing process, and we came to understand the difficulties and open questions that arise from such an integration. Nevertheless, this study shows that multi-touch technology can be integrated into the traditional furniture design process and demonstrates the implementation possibilities for future furniture design. In addition, FLASH-based multi-touch image-recognition software was developed as an SDK for multi-touch interface design in
the FLASH environment, so that designers can use it to develop different multi-touch applications. We also developed an interior furniture allocation system to demonstrate the design work of the multi-touch table. According to the results of this study, most users have a positive opinion of multi-touch and think it will be a new trend in the future. However, most users reported one main problem when operating the multi-touch table: insensitivity. Because of this insensitivity, users have to apply more force, which imperceptibly increases the effort of manipulation. After this study we will therefore look for materials and methods to improve this problem, and we plan to start developing a larger multi-touch screen and other form factors at the same time, looking forward to possible developments in the future.
References

1. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: Pemberton, S. (ed.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1997, pp. 234–241. ACM Press, New York (1997)
2. Shen, C.: Multi-User Interface and Interactions on Direct-Touch Horizontal Surfaces: Collaborative Tabletop Research at MERL. In: Proceedings of the First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (2006)
3. Rekimoto, J.: SmartSkin: an infrastructure for freehand manipulation on interactive surfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing Our World, Changing Ourselves, CHI 2002, pp. 113–120. ACM Press, New York (2002)
4. Han, J.Y.: Low-cost multi-touch sensing through frustrated total internal reflection. In: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, UIST 2005, pp. 115–118. ACM Press, New York (2005)
5. Park, S., Won, S., Lee, J., Kim, S.: Smart home – digitally engineered domestic life. Personal Ubiquitous Comput. 7(3-4), 189–196 (2003)
6. Sukeda, H., Horry, Y., Maruyama, Y., Hoshino, T.: Information-Accessing Furniture to Make Our Everyday Lives More Comfortable. IEEE Transactions on Consumer Electronics 52(1) (February 2006)
Implementation of a User Interface Model for Systems Control in Buildings Szu-Cheng Chien and Ardeshir Mahdavi Department of Building Physics and Building Ecology, Vienna University of Technology Karlsplatz 13 (259.3), A-1040, Vienna, Austria
[email protected],
[email protected]
Abstract. Occupant control actions in a building (i.e. user interactions with environmental systems for heating, cooling, ventilation, lighting, etc.) can significantly affect both the indoor climate and the environmental performance of buildings. Nonetheless, relatively few systematic (long-term and high-resolution) efforts have been made to observe and analyze the means and patterns of such user-system interactions with building systems. Specifically, the necessary requirements for the design and testing of hardware and software systems for user-system interfaces have not been formulated in a rigorous and reliable manner. This paper describes the prototyping of a new generation of user interface models for building systems in sentient buildings. The outcome of these efforts, when realized as a web-based user interface, would allow the occupants to achieve desirable indoor climate conditions with higher levels of connectivity between occupants and sentient environments. Keywords: sentient buildings, user interface, environmental controls.
1 Introduction

An increasing number of sophisticated devices and systems are being incorporated in so-called high-tech buildings. Particularly in large and technologically sophisticated buildings, the occupants, confronted with complex and diversified manipulation possibilities for environmental controls, are forced to deal with these devices via a wide range of distinct and uncooperative interfaces. Such situations can lead to frustration among the occupants as they attempt to achieve comfortable (visual/thermal, emotional, and psychological) conditions. Occupant control actions in buildings (i.e., user interactions with environmental systems for heating, cooling, ventilation, lighting, etc.) can significantly affect both the indoor climate and the environmental performance of buildings. Nonetheless, relatively few systematic (long-term and high-resolution) efforts have been made to observe and analyze the means and patterns of such user-system interactions with building systems. Specifically, the necessary requirements for the design and testing of hardware and software systems for user-system interfaces have not been formulated in a rigorous and reliable manner. In this paper we therefore focus on an effort to further articulate the implementation of an adequate user interface system that can facilitate effective communication and
interaction between occupants and environmental systems in sentient buildings [4]. An initial result of this effort is a prototypical interface design named BECO ("Built Environment Communicator"). BECO serves as a user interface model for building systems within an experimental project concerning "self-actualizing sentient buildings" [1]. We first discuss related work and the results of previous research concerning the comparative evaluation of market products (interfaces) for user-based control of building systems. Secondly, in order to better understand the genesis of the result, we describe the testbed infrastructure and the system architecture. We then elaborate on the implementation of the proposed interface model in terms of implemented services, layout design, and navigation.
2 Background

2.1 Related Work

As to the role of user interfaces in the context of intelligent built environments, there are a number of precedents. For example, the ubiquitous communicator, the user interface of the PAPI intelligent house in Japan, was developed as a communication device that enables the occupants to communicate with people, physical objects, and places [7]. Another example of this type of user interface is from Samsung: the Homevita system gives occupants an overview of their home network and allows them to manage daily household tasks such as controlling lights, air conditioners, and even washing machines [8]. More recent work on the integration of user interfaces into intelligent environments includes the Swiss House project at Harvard University [3] and the Interactive Space project by SONY [6]. In contrast to the above approaches, we concentrate on the exploration and translation of systematic user interface requirements and functionalities (for office environments) into prototypical designs, whereby users' control behavior is considered. These requirements are then implemented in terms of a prototypical user interface model mainly supporting user interactions with building systems for indoor climate control.

2.2 Previous Research

In previous research efforts [1, 2], the requirements and functionalities of user interfaces for building systems have been explored. We compared twelve commercial user-interface products for building control systems. These products were classified as follows: A type ("physical" devices), B type (control panels), and C type (web-based interfaces). Thereby, we considered three dimensions, namely control options, information types, and hardware. The results were arranged in terms of: 1) comparison matrices of the selected products based on three dimensions, namely control options, provision of information, and hardware, and 2) product comparisons/evaluations by the authors based on seven criteria (functional coverage, environmental information feedback, intuitiveness, mobility, network, input, and output). Subsequently, we conducted an experiment in which forty participants examined and evaluated a subset of the above user interfaces for buildings' control systems, mainly in view of three evaluative categories (first impressions, user interface layout design, and ease of learning). Comparison results of the selected user interface products for intelligent
environments warrant certain conclusions regarding their features and limitations and inform efforts to develop new interface designs.

Control Options and Functional Coverage: in sentient environments, one key point is how the occupants interact with the multitude of environmental control devices and how they deal with the associated information loads (technical instructions, interdependence of environmental systems and their aggregate effects on indoor conditions) in an effective and convenient manner. The result of the above-mentioned study implies that limited functional coverage and intuitiveness of use often correlate. This suggests that an overall high functional coverage may impose a large cognitive load on (new) users.

Provision of Information: if it is true that more informed occupants make better control decisions, then user interfaces for sentient buildings should provide appropriate and well-structured information to the users regarding outdoor and indoor environmental conditions as well as the state of relevant control devices. Most B and C type products in our study provide the users with some information, such as the state of the devices. However, they do not sufficiently inform the occupants regarding indoor and outdoor environmental conditions. This implies that the occupants are expected to modulate the environment under conditions of insufficient information.

Mobility and Re-configurability: the hardware dimension addresses two issues, namely, 1) mobility: user interfaces with spatially fixed locations versus mobile interfaces; and 2) re-configurability: the possibility to technologically upgrade a user interface without replacing the hardware may decrease the cost of rapid obsolescence of technology protocols. C-type terminals such as PDAs and laptops, which are connected to controllers via the Internet, facilitate mobility. In contrast, type A and B products are typically wall-mounted and thus less mobile. As far as re-configurability is concerned, the user interface software may be easily upgraded in type B and C products, whereas conventional A-type products are software-wise rather difficult to upgrade.

Input and Output: certain type-B and type-C products provide the users with richer manipulation possibilities that, if transparent to the user, could support them in performing a control task. There are other products (particularly type A), however, that are rather restricted in presenting to the users clearly and comprehensively the potentially available manipulation and control space. Nonetheless, as our results suggest, type-A products are evaluated more positively than the more modern, high-tech (type B and C) products, especially in view of first impressions and ease of learning. Here we see a challenge: modern (high-tech) interface products that offer high functional coverage must also pay attention to the cognitive requirements of users, so that the formulation and execution of control commands are not overly complicated.
3 Built Environment Communicator

The observations analyzed in the previous section informed the resulting interface, named BECO ("Built Environment Communicator"; see Fig. 1), which serves as a user interface model for building systems of the research project on "self-actualizing sentient buildings" [1].
Fig. 1. A screenshot of BECO in a web browser
In this section, the testbed infrastructure and system architecture are first described. The features of this user interface model are then introduced in terms of implemented services, layout design, and navigation.

3.1 Testbed Infrastructure

A testbed infrastructure was set up to simulate office-based sentient environments where a set of services are deployed and seamlessly integrated. The testbed was installed for the "self-actualizing sentient buildings" research project [1] as a 1:1 mock-up of two office rooms located in our Building Automation Laboratory at the Vienna University of Technology, Department of Building Physics and Building Ecology. This testbed infrastructure involves a system controller associated with a variety of network protocols (based on the Internet, LAN, and the LON network), devices, and services. In order to create a realistic office environment, this existing light-weight testbed is equipped with systems for heating, lighting, ventilation, shading, and de-/humidification. These devices include: 1) HVAC system; 2) radiator; 3) electrical windows; 4) electrical shading; 5) ambient lighting system (2 luminaires and 1 task spot for each room); 6) de-/humidification system (see Fig. 2).

3.2 System Architecture

Our interface development is based on Silverlight 2, a major tool for building rich interactive user experiences that incorporate user interface and media [10]. Visual Studio 2008 (based on C#, as a .NET language) was used as the development tool for coding this Silverlight-based user interface framework, and Adobe Illustrator for layout and graphic design. Specifically, in order to make the interface more graphical and interactive, XAML (Extensible Application Markup Language) is used as a user interface markup language to create (dynamic) user interface elements and animations. Also, Microsoft SQL Server, a relational database management system produced by Microsoft, serves as the database server of this interface application. ASP.NET AJAX was used to improve performance in the browser space
Fig. 2. Schematic representation of the equipped devices in a test room (Lab 1)
by making the communications between the web-based interface and the database server asynchronous. In addition, a specific socket-based communication protocol is used to connect to the model-based service via a socket port.

3.3 Implemented Services

All identified system services are implemented and aggregated in terms of a web-based interface that provides a central portal for the occupants to access all control services. Thereby, we consider four aspects, mainly control options, provision of information, settings, and hardware.

Control options: three control groups considered essential for the occupants of an office [1] are implemented in order to accommodate the occupants' preferences for controlling their environment. These control groups include "Home" (based on control via perceptual values/parameters), "Devices" (involving control via devices), and "Scenes" (encompassing control via scenes). All deployed control groups have been integrated in BECO, providing a "one-for-all" and consistent interface to unify the control solutions. The realization of the above-mentioned control groups may be further customized via user-based definitions of spatial (micro-zoning) and/or temporal (schedule) extensions. An example of a spatial extension is a user-customized assignment of a control device state to a certain location (e.g., Lab1 or Lab2); such spatial extensions are deployed in all three control groups, namely "Home", "Devices", and "Scenes". An example of a temporal extension is a user-defined time-based variation of (schedule for) the position of a certain device/scene; such temporal extensions are employed in the "Devices" and "Scenes" control groups.
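The following sketch is purely illustrative of how the control groups and their spatial and temporal extensions could be represented as data; the actual BECO implementation is Silverlight/C#-based, so Java is used here only as a neutral notation, and all names are hypothetical:

```java
import java.time.LocalTime;

// Hypothetical data model for a control command and its optional extensions.
public class ControlCommand {

    public enum Group { HOME, DEVICES, SCENES }   // the three control groups

    private final Group group;
    private final String target;         // e.g. a perceptual parameter, device, or scene name
    private final String desiredState;   // e.g. "cooler by 2", "blinds down", "reading scene"
    private String zone;                 // spatial extension, e.g. "Lab1" or "Lab2" (optional)
    private LocalTime scheduledAt;       // temporal extension (optional)

    public ControlCommand(Group group, String target, String desiredState) {
        this.group = group;
        this.target = target;
        this.desiredState = desiredState;
    }

    // Spatial extension: restrict the command to a particular micro-zone.
    public ControlCommand inZone(String zone) {
        this.zone = zone;
        return this;
    }

    // Temporal extension: apply the command according to a schedule.
    public ControlCommand at(LocalTime time) {
        this.scheduledAt = time;
        return this;
    }

    @Override
    public String toString() {
        return group + ": set " + target + " to " + desiredState
                + (zone != null ? " in " + zone : "")
                + (scheduledAt != null ? " at " + scheduledAt : "");
    }
}
```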
Provision of information: information groups implement a schematic information service for the office-based environment, which continuously updates information from the building information model. Primary information groups include general information, the information booth, and information extensions. General information, at the bottom of the layout, provides the occupants with user information, time, and date. The occupants can inquire about context information (i.e., indoor/outdoor conditions) and control task information (regarding device states) via the information booth. Also, room surveillance (linked to an IP camera) and location information may be obtained separately by the occupants. Among these information groups, the information booth, room image, and location information are divided into sections and placed into panels that allow the occupants to open one or two, or close all, at a time.

Settings: "Settings" includes general settings and scene settings. General settings pertain, for example, to the startup page (based on "Home" and "Devices"), measurement units (metric and English systems), and suggestion notification marking. Scene settings include manipulation steps such as setting control states (regarding the control devices in the control options) and assigning a name/icon. Also, the occupants may attach a timeline/date setting to a scene setting as an optional extension.

Hardware: occupants may use mobile interfaces (e.g., a laptop and/or tablet PC) to call up this web-based interface model (BECO) and achieve the desired indoor climate via the Internet regardless of spatial limits. Also, the software is easy to upgrade, providing the occupants and building management with a high re-configurability and flexibility potential.

3.4 Layout Design

In order to achieve a clear visual hierarchy and semantic structure, this section discusses certain strategies to organize the various groups and objects in this interface model.

Layout framework: users typically prefer an interface that is easy to use, learn, and navigate, independent of its functional coverage (see Section 2). Keeping the user interface simple and clear makes it easier for the users to adapt to. Furthermore, changes in the appearance of the layout should clearly relate to the users' intentions and operations. Thus, the first step in the design is to achieve a visually consistent and easily recognizable framework. Firstly, a closure grouping strategy is deployed to form a focal point for short-term user-system interactions (see Fig. 3). Then, related attributes are gathered together and separated from other, distinct attributes. For example, most information groups are consistently placed on the right side of the layout to keep them unambiguously separate from the control groups with respect to navigation memory.

Center stage: the primary job of a transient-posture user interface, with its short-term usage patterns, is to accomplish an indoor climate control task. To establish a visual hierarchy and guide the occupants' focus immediately to the main control zone where the most important tasks take place, an obvious and large area is anchored in the center of the interface layout, whereas the auxiliary contents are clustered around this "center stage" [9] in small panels/pieces (see Fig. 4).
Fig. 3. (a) Interface layout; (b) Closure grouping; (c) Layout zoning in terms of attributes; (d) Visual hierarchy: center stage and auxiliary content
Use of color: to undertake a wide range of assigned tasks, this user interface is designed and organized into many subsections within the layout. In addition to using the above-mentioned layout framework to integrate them visually, making each subsection distinct and capturing the users' attention immediately is also an important issue. In our layout, five series of high-contrast colors are assigned together with the layout framework to identify and "echo" separate attributes in the user interface layout.

3.5 Navigation

As to the navigation experience, instead of offering many "jumps" to satisfy a wide range of flexibility and functional coverage, the key issue is to provide a straightforward manipulation memory that helps the occupants get around safely within a quasi "one-page" depth. The strong layout framework discussed in Section 3.4, shown consistently on each sequence page, makes learning and retaining the required manipulation sequence easy and relieves, by a wide margin, the occupants' cognitive burden of handling varying page content. Moreover, certain cognitively friendly user patterns are used to support the occupants while offering richness in manipulation options.

Card stack: a number of control options are required for this interface, whereas the occupants may need only one group at a time. Therefore, the control options are grouped into three separate "cards" [9] with titled tabs (i.e., "Home", "Devices", and "Settings") to allow the occupants to access them one at a time.
Accordion: to avoid overwhelming the occupants, each information group on the right-hand side of the layout (context, surveillance image, and location information) is embedded in an accordion-like panel and may be opened and closed separately from the others, simply when needed. However, the occupants may also trigger all three groups simultaneously and keep them in view all the time. In this way, the occupants can experience a neat layout while still being offered richness in manipulation options.

Target guiding: guiding the occupants through too many jumps may distract their attention and let them get lost easily in navigation. Two patterns (control "in place" and sequence guiding) are used to guide the occupants in effectively accomplishing the control task, whereby the perceived complexity of the interface is decreased.

Continuous scrolling: going through long lists of items may also impose a cognitive burden on the occupants. In order to present a long set of items effectively in the "Devices" control group and the context information panel, a continuous scrolling pattern is used to support the occupants' rapid selection and review of the items. The occupants may click an arrow to invoke the scrolling; in response to the click, the list of items on the display is scrolled through horizontally or vertically, so the occupants can visually jump to the desired items.

Terms/icons: labels (e.g., iconic buttons, tags, and text items) are used to communicate knowledge visually/verbally and to enhance navigation. For example, in order to convey the cognitive message regarding the main control tasks to the occupants, the "Home" and "Devices" control groups are presented in terms of large language-neutral icons. Also, by assigning short and easy-to-understand titles, certain text items (together with mapped icons) are made convenient for the occupants to use.

To better portray the navigation of the interface, an illustrative scenario with manipulation steps is described, demonstrating how an occupant adjusts the indoor climate conditions. In this example scenario, a company manager is working and finds the room air too warm. She therefore calls up "control via perceptual values" in the "Home" control group and chooses the "Temperature" option (see Fig. 4). A control box is triggered in the main control zone of the interface screen. She presses the "cool" button twice. In this way, she has control over the temperature of the room, while the model-based system [5] translates her input with its own simulation-based approach
Fig. 4. The occupant adjusts the indoor climate conditions by control via perceptual values
to trigger an appropriate control action involving the related devices. Subsequently, the system changes the state of the HVAC system, the position of the blinds, and the window of her office room. Meanwhile, the animated icon in the control box becomes cooler by two levels, as feedback on the temperature transition. Once the control task is finished, she clicks somewhere else to close the control box, and the screen reverts to the default view of the "Home" control group.
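In the implemented system, the model-based service [5] derives such actions through a simulation-based approach; purely as an illustration of the kind of mapping involved (a hypothetical, simplified rule of thumb, not the authors' implementation), a request to make the room cooler by a number of levels could be translated into candidate device adjustments as follows:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, rule-of-thumb translation of a perceptual "cooler by N levels" request
// into candidate device actions; the real system selects actions via simulation instead.
public class PerceptualTemperatureControl {

    public static List<String> coolBy(int levels, double indoorC, double outdoorC) {
        List<String> actions = new ArrayList<>();
        // Assume each "level" corresponds to roughly one degree of desired change.
        double targetDrop = levels;
        if (outdoorC < indoorC) {
            actions.add("open window");          // free cooling when it is cooler outside
            targetDrop -= Math.min(targetDrop, 1.0);
        }
        actions.add("lower blinds");             // reduce solar gains
        if (targetDrop > 0) {
            actions.add("lower HVAC setpoint by " + targetDrop + " K");
        }
        return actions;
    }
}
```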
4 Conclusion

The present paper demonstrated the translation of systematic user interface requirements and functionalities (for office environments) into prototypical designs, whereby users' control behavior is considered. The proposed user interface model mainly supports user interactions with building systems for indoor climate control. With easily recognizable icons and well-structured navigation possibilities, a wide range of control options is provided to the occupants. The implemented interface prototype provides a testable basis for future developments in user interface technologies for sentient buildings.

Acknowledgements. The research presented in this paper is supported, in part, by a grant from FWF (Fonds zur Förderung der wissenschaftlichen Forschung), project Nr. L219-N07. We also thank the Ministry of Education of Taiwan for its support of this work.
References

1. Chien, S.C., Mahdavi, A.: User Interfaces for Building Systems Control: from Requirements to Prototype. In: 7th European Conference on Product and Process Modelling, pp. 369–374. CRC Press, Sophia Antipolis (2008)
2. Chien, S.C., Mahdavi, A.: User Interfaces for Occupant Interactions with Environmental Systems in Buildings. In: 24th International Conference on Passive and Low Energy Architecture, pp. 780–787. RPS Press, Singapore (2007)
3. Huang, J.: Inhabitable Interfaces. In: Messaris, P., Humphrey, L. (eds.) Digital Media: Transformations in Human Communication, pp. 275–286. Peter Lang, New York (2006)
4. Mahdavi, A.: Anatomy of a cogitative building. In: 7th European Conference on Product and Process Modelling, pp. 13–21. CRC Press, Sophia Antipolis (2008)
5. Mahdavi, A., Spasojevic, B.: Energy-efficient Lighting Systems Control via Sensing and Simulations. In: 6th European Conference on Product and Process Modelling, pp. 431–436. Taylor & Francis, London (2006)
6. Rekimoto, J.: Organic interaction technologies: from stone to skin. Commun. ACM 51(6), 38–44 (2008)
7. Sakamura, K.: Houses of the Future: TRON House & PAPI. In: Chiu, M.L. (ed.) Insight: The Smart Environments, pp. 203–222. Archidata Press, Taipei (2005)
8. Samsung Homevita, http://support-cn.samsung.com/homevita/
9. Tidwell, J.: Designing Interfaces: Patterns for Effective Interaction Design. O'Reilly, Sebastopol (2005)
10. Wenz, C.: Essential Silverlight 2 Up-to-Date. O'Reilly, Sebastopol (2008)
A Web-Based 3D System for Home Design Anthony Chong, Ji-Hyun Lee, and Jieun Park Graduate School of Culture Technology, KAIST, Daejeon, Republic of Korea
[email protected],
[email protected],
[email protected]
Abstract. Buying a home is a big investment and is one of the most important decisions made in one's life. After purchasing an apartment, home owners are interested in giving their home a unique design identity. They often seek expert interior designers to assist in designing their homes and bringing out the uniqueness in them. In the current interior design industry, designers have to meet the owners frequently to discuss the designs and alter the housing layout according to the owners' preferences. This process is often repeated many times before a finalized housing design layout is accepted by the owners. In this paper, we propose a rule-based housing design system to generate many alternative housing design layouts based on the designer's initial housing design layout. Designers are therefore able to produce alternative housing designs for the owners and to explore alternatives generated by the rule-based design system that they have not encountered before. Keywords: Housing design, rule-based system, web-based system.
1 Introduction

Buying a home is a big investment and is one of the most important decisions made in one's life. Selecting a house or an apartment involves many considerations for a buyer: location, cost, family size, neighbors, and even the house's feng-shui (Chinese geomancy), especially for Asian buyers, who believe that certain home layouts will affect their wealth, health, career, and family harmony. The Korean housing market, which accounts for 43.4% of the construction industry [1], faces a highly competitive environment due to diverse customer needs, the growth of the housing supply ratio (expected to reach 11.6%), and changes in housing policies: the Korean Ministry of Construction and Transportation has set up a plan to construct 500,000 housing units every year from 2003 to 2012, so that by 2012 five million housing units will have been constructed in Korea [2]. However, in most countries (including Korea), apartments and houses are often built from standardized plans that traditionally reflect common house types, and these apartments often come with a standard design. After purchasing an apartment, home owners are interested in giving their home a unique design identity, and they often seek expert interior designers to assist in designing their homes and bringing out the uniqueness in them.
In a customer-oriented environment, however, the design of housing requires intensive communication between the customer and the designer, as well as complicated processes in the design and construction phases. Designers and owners often need to meet very regularly to determine and confirm the house designs on paper. These initial design concepts are then further defined, criticized, and rejected, or revised, refined, and developed until a housing design is accepted by the clients. Changes to the design are very often made on the spot by owners for many reasons, for example budget, furniture materials, lighting, and so on. Rough plan proposals are reviewed and revised until a finished plan which meets all requirements is presented and approved. In housing design, any decision an interior designer takes is likely to have implications that cut across multiple aspects. Removing a room's balcony, for example, may result in a bigger, more comfortable room with more space, but at the same time the room will feel warmer during summer and colder during winter, along with poorer noise insulation. The interconnectedness of such seemingly isolated issues is what makes interior design a highly complex activity. The paper is organized as follows: Section 2 is the literature review, describing housing design concept development and listing some of the commercial home design software products available in the market. Section 3 introduces the basic design rules. Section 4 describes the application of the rule-based home design system to a design scenario. Section 5 concludes this paper.
2 Literature Review

2.1 Housing Design Concept Development

The final design of a housing layout accepted by clients often involves intensive discussion between the customers and the designer. This is the information acquisition process, in which time should be spent gathering as many facts as possible about the inhabitants of a home and getting an idea of who they are and what they like. It is important for a designer to conceptualize the future housing layouts his or her clients would prefer. There are four basic elements in planning a house design that a designer has to consider, understand, and gather before the graphic stage of planning any housing design drawing on paper [3]. The designer has to form (1) a client profile to find out the number, ages, sex, activities, and relationships of the people who will make up the household, in order to meet the special needs and interests of each individual and of the group. Normally this planning also involves the degree of privacy and interaction. The designer also has to know and understand the clients' lifestyle. For example, if the clients have a hobby of keeping fish, the designer has to consider the fish tank during the design stage. The designer has to know whether the clients like to invite friends to their home every weekend or prefer to spend the weekend quietly working on a hobby in the house; in this case, the living room's size has to be kept in mind during the design stage. The designer also needs to consider (2) the functional goals during the design of the housing layout. The lifestyle of the family determines the home functions for which it is
being planned. The designer needs to consider any special needs of the clients, who might need a home office with separate access or special facilities for an elderly or disabled person. When designing the layout of the house, the designer needs to look at (3) the equipment needs, i.e., what kinds of equipment the clients will have in the house, for example electrical appliances (television, sound systems), during the planning of the housing design. The designer also needs to look into (4) the space requirements, which are largely based on a careful study of the activities, behavior patterns, development needs, and desires of the clients. The clients might love to read and have a big collection of books that generally keeps growing; the designer will have to take this information into consideration when allocating space during the design process. Given the complexity of housing design, it is necessary to allow creative ideas and inspiration to develop and evolve freely. At this stage of the design process, creativity begins to synthesize all the previously gained data with professional housing design knowledge and experience into a concept which will determine the outcome of the housing layout design. The designer uses bubble diagrams from adjacency studies and schematic drawings to generate a floor plan via a series of sketches which begin to allocate more concrete layout shapes to activity spaces. A two-dimensional floor plan drawing may seem plain and is sometimes hard to read, but it is important in determining the kind of life possible in any given space. The designer must first digest, analyze, and evaluate the information gathered in a systematic way. The design process provides a number of ways by which to organize and translate information into solutions; interior zoning and adjacency studies are often used as design principles. In the zoning principle, regardless of the size of the house, space divides itself into zones that group similar kinds of activities according to the degree of privacy or sociable interaction. The designer normally starts the design according to three kinds of activities: social (living, dining, and balcony), private (bedroom), and work (kitchen) activities. In the adjacency principle, the relationships between various spaces and activities can be outlined roughly using a bubble diagram, grouped and organized according to the zoning principles. The bubbles represent interior spaces and their importance and relationship to each other, and the connecting lines between bubbles indicate access and flow. It is easier to do this with the abstract tool of a bubble than to begin defining spaces with walls and the technical relationships between rooms, access, and flow. Errors can be seen by turning what was a thought into concrete relationships on paper. This process is critical, precedes formal space planning, and helps to clarify how spaces relate and flow before floor plans are drawn. Schematic drawings are made to help visualize concepts. They are refinements of the bubble diagrams used in the analysis, with greater detail, more accurate proportions, and measurements, suggesting how the space might look and feel. A bubble diagram and its corresponding schematic diagram are shown in Fig. 1.
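As a simple illustration of how such an adjacency (bubble) diagram could be represented in software (a hypothetical sketch, not part of the cited design methodology), each bubble becomes a node carrying its zone and relative size, and each connecting line becomes an undirected edge indicating access and flow:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical representation of a bubble (adjacency) diagram as an undirected graph.
public class BubbleDiagram {

    public enum Zone { SOCIAL, PRIVATE, WORK }    // zoning principle: social, private, work

    public static final class Space {
        final String name;
        final Zone zone;
        final double relativeSize;                // larger bubble = larger space
        public Space(String name, Zone zone, double relativeSize) {
            this.name = name; this.zone = zone; this.relativeSize = relativeSize;
        }
    }

    private final Map<String, Space> spaces = new HashMap<>();
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    public void addSpace(Space space) {
        spaces.put(space.name, space);
        adjacency.putIfAbsent(space.name, new HashSet<>());
    }

    public Space get(String name) {
        return spaces.get(name);
    }

    // A connecting line between two bubbles: access and flow in both directions.
    public void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public Set<String> accessibleFrom(String space) {
        return adjacency.getOrDefault(space, Set.of());
    }
}
```

For the example of Fig. 1, the overlapping kitchen and living room bubbles would simply be modeled as a connection between the two spaces.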
Fig. 1. A bubble diagram and its 2D schematic representation. In the bubble diagram, larger bubbles represent larger spaces; overlapping bubbles are common spaces that are accessible from another space (kitchen and living room).

2.2 Home Design Software Products

Designers often use 3D modeling software (for example, 3D Studio Max and Maya) to assist in their work. However, in recent years, commercial home design software
products can be found in the market, for example: 1) Punch! Professional Home Design Suite [4]; 2) Better Homes and Gardens Home Designer Suite [5]; 3) Instant Architect [6]; 4) Total 3D Home & Landscape Design Suite [7]; 5) My Virtual Home [8]; 6) 3D Home Architect & Landscape Design Deluxe Suite [9]; 7) Instant Home Design [10]; 8) Your Custom Home [11]; 9) Design Workshop Classic [12]; and 10) Quickie Architect [13]. All these packages are easy-to-use applications offering various degrees of features, such as 3D household object libraries, with license fees involved. These commercial home design software products are not web-based, and their functionalities are limited. We therefore propose to modify an open-source home design application to allow the introduction of design rules that generate new housing layouts automatically. The next section describes the basic rules for developing the rule-based housing design system.
3 Design Rules

In housing design, the designer often requires intensive communication with the customers to finalize the housing design layout with them. During this period of communication, the designer often draws multiple design layouts to show to the clients, which is very time consuming for both parties. In this work, we aim to develop common housing design rules that allow the computer to help the designer generate housing layouts automatically. The four elements described in the previous section are often used by designers when collecting information from their clients, and the designer uses this information to manually design the first housing design layout for the clients. Taking this first housing design layout, the basic rules are introduced into the home design software to generate multiple new housing design layouts based on the initial one. The 3D coordinates of all objects in the housing design layout are defined; this is to ensure that objects face the correct direction as determined by the initial housing design layout. The 15 basic rules are as follows.
1. All objects (furniture) must be within the dimensions of the walls of each individual room. The designer would have considered the furniture style for each individual room during the initial design.
2. The toilet cannot be relocated, owing to the drainage piping system of the apartment building.
3. Windows cannot be moved or resized, because window locations in apartments are fixed.
4. The living room area dimensions cannot be changed, as the designer would have considered the lifestyle of the clients.
5. All doors must have a clearance of more than 1 meter facing inwards and outwards. This is to ensure that doors are not blocked by other objects in the design.
6. Every enclosed area (4 walls) will generate a door. This is to prevent an enclosed area without a door.
7. Door dimensions cannot be changed.
8. All four edges of a door must be in collision with the wall and the floor. This is to ensure that no stand-alone doors appear in the new housing layout design. The collision detection algorithm is used.
9. At least one edge of each wall must be adjacent to another wall. This is to ensure no stand-alone walls. The collision detection algorithm is used.
10. No penetration of furniture with any other object, enforced by the collision detection algorithm.
11. No less than a 2-meter gap between two parallel walls. This is to ensure a proper housing layout.
12. The x-axis of the L-shaped couch always points in the direction of the LCD TV.
13. The backside of every bookshelf must collide with a wall. The collision detection algorithm is used.
14. The headboard of every bed must collide with a wall. The collision detection algorithm is used.
15. The base (-ve x-axis) of every object always points in the direction of the floor. This is to prevent objects from appearing upside down.
These are the common rules that the home design software receives; with these rules, multiple new housing design layouts can be generated automatically by the computer. Depending on the owners' requirements and needs, these rules can be changed and updated in the home design software. In the next section, we introduce a free open-source home design application (Sweet Home 3D) to demonstrate our work.
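To make the rule-based generation step concrete, the sketch below shows how a few of the rules above (furniture containment, door clearance, headboard against a wall) might be encoded as simple predicate checks over a candidate layout. The classes and method names are hypothetical placeholders of our own, not the Sweet Home 3D API; an actual implementation would hook into the application's own object model.

```java
// Minimal sketch of rule checks over a candidate layout (hypothetical types,
// not the Sweet Home 3D API). A generated layout is kept only if every rule holds.
import java.awt.geom.Rectangle2D;
import java.util.List;

final class RuleChecker {

  // Rule 1: every piece of furniture must lie inside the bounds of its room.
  static boolean furnitureInsideRoom(Rectangle2D room, List<Rectangle2D> furniture) {
    return furniture.stream().allMatch(room::contains);
  }

  // Rule 5: a door needs more than 1 m of free space on both sides.
  static boolean doorClearanceOk(Rectangle2D doorSwingIn, Rectangle2D doorSwingOut,
                                 List<Rectangle2D> obstacles) {
    return obstacles.stream().noneMatch(o ->
        o.intersects(doorSwingIn) || o.intersects(doorSwingOut));
  }

  // Rule 14: a bed's headboard edge must touch (collide with) a wall.
  static boolean headboardAgainstWall(Rectangle2D headboardEdge, List<Rectangle2D> walls) {
    return walls.stream().anyMatch(w -> w.intersects(headboardEdge));
  }

  // A candidate layout is accepted only if all checked rules are satisfied.
  static boolean acceptLayout(Rectangle2D room, List<Rectangle2D> furniture,
                              Rectangle2D doorIn, Rectangle2D doorOut,
                              Rectangle2D headboard, List<Rectangle2D> walls) {
    return furnitureInsideRoom(room, furniture)
        && doorClearanceOk(doorIn, doorOut, furniture)
        && headboardAgainstWall(headboard, walls);
  }
}
```

Under this reading of the paper, such checks would be run after each rule-driven modification, and only candidate layouts that satisfy all 15 rules would be presented to the designer.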
4 Implementation

4.1 Software Environment

In our work, Sweet Home 3D (http://sweethome3d.sourceforge.net/), a free open-source interior design application with a 3D preview, was used for designing and modeling an apartment. Sweet Home 3D enables designers to lay out a house easily and to make changes with the home owner's concepts in mind. In this section, we use this
software with the basic rules set in the previous section to generate many alternative housing design layouts based on the designer's initial layout. Designers are therefore able to produce alternative designs for the owners through the framework and to explore alternatives generated by the design framework that they have not encountered before. This rule-based housing design system offers designers a promising application for the housing domain, because it saves the time and effort of drawing multiple housing layouts to show to customers. Sweet Home 3D is written in Java and offers the ability to extend its functionality to include rule-based design for generating multiple housing layouts. Sweet Home 3D also has an applet version that supports web-based home design.

4.2 A Design Scenario

We describe a scenario in this section to demonstrate our rule-based housing design concept. A woman and her adult son purchase a new apartment. The mother (62) is a retired teacher with over 30 years of experience in knitting who now gives private knitting lessons in the comfort of her apartment. Her son (30) is a freelance 3D computer animator who works from home, remotely accepting various computer animation and modeling jobs from his agency located in another city. The designer will design the housing layout and furnish the following specific rooms for the clients:

1. A knitting room for the mother
2. A dining/living room
3. One bedroom for the mother
4. One bedroom/home office for the son
5. A kitchen
6. Two toilets
‐ The bedroom/home office: is to be a quiet, orderly work environment for the son, who can work from home. He will not have on-site clients. The home office needs to have a computer server (to save his data and work), a computer with a large monitor, and professional books. The son has requested a desk, a bookshelf, an ergonomic office chair; appropriate lighting; floor, wall and window treatments; and furnishings, finishes, and accessories (FF&A) for his office.
‐ The knitting room: is to be a well-lit, inviting, creative environment for teaching knitting to students of all ages. The space should be big enough to accommodate up to 5 students. She has requested storage for knitting equipment (needles, yarn).
‐ The living room: will serve as a combined relaxing social and informal entertaining area. There is natural light from windows. Access to the home office and the knitting room will be through the living room. The client would like an informal designated eating and serving area to seat a maximum of 8 people.
‐ Bedrooms: standard furnished common bedrooms.
Client preferences: the clients have asked the designer to begin with the rooms as if they were white boxes. Architectural details are to be appropriately drawn to scale, and all floor, wall, window, and ceiling treatments are to be decided by the designer. Based on the requirements of the owners, the designer created the initial housing layout shown in Fig. 2. This initial layout is based on the information gathered from the owners, and all of their requirements are met: two bedrooms, two toilets, one knitting room, a kitchen, and one living room with enough space to entertain 8 people. The mother's bedroom (upper left-hand corner) is next to the knitting room with a single partition. The son's room is at the lower right-hand corner.
Fig. 2. The initial housing design layout created by the designer, based on the information gathered from the owners
4.3 Implementation of the Design Scenario

We executed the rule-based Sweet Home 3D to generate new alternative housing designs, shown in Figs. 3, 4, 5 and 6, based on the initial housing design. Each figure shows a different housing layout in comparison with the designer's initial design. In Fig. 3, a new partition wall has been created in the son's room (lower right-hand corner) to create a mini-office environment. One of its edges is adjacent to the side wall, obeying the basic rule, and no object collides with the new wall. The partition between the mother's room (upper left-hand corner) and the knitting area has been extended to provide an enclosure for the mother's room, and an extra door has been created for the mother's room. The two toilets and the windows remain at their original locations. As shown in Fig. 4, the couch and bookshelf in the living room have been shifted, but not out of the dimensions of the living room. The locations of the mother's room (now at the lower left-hand corner) and the kitchen (now at the upper left-hand corner) are switched. A corridor has been created between the knitting room and the mother's room, obeying the rule of >2 m gaps between parallel walls. The doors of the toilet and the knitting room have been relocated.
Fig. 3. Computed layout based on the design rules, in comparison with Fig. 2 (the designer's design)
Fig. 4. Computed layout based on the design rules, in comparison with Fig. 2 (the designer's design)
As shown in Fig. 5, the couch and bookshelf in the living room have been shifted. The kitchen is now at a new location (upper left-hand corner), the knitting room is now at the lower left-hand corner, and the mother's room has been shifted to the center. A new door for the mother's room has been created; the door obeys the rule of a clearance of at least 1 meter. The locations of the bookshelf in the living room and of the bed itself do not allow a door in the other walls to have a 1-meter clearance. A new wall partition has been created in the son's room to ensure better privacy. The doors of the toilet and the knitting room have been shifted. Figure 6 shows that the couch and bookshelf in the living room have been shifted. The son's room is now at a new location (upper center). The area dimensions of the living room remain the same, obeying the living room dimension rule.
Fig. 5. Computed layout based on the design rules, in comparison with Fig. 2 (the designer's design)
Fig. 6. Computed layout based on the design rules, in comparison with Fig. 2 (the designer's design)
5 Conclusion and Future Work

We proposed a rule-based system in which Sweet Home 3D accepts the rules listed in Section 3 to generate new alternative housing designs from the initial housing design. The designer holds a first meeting with the owners and collects design layout preference information from them. The designer then uses Sweet Home 3D to create the initial design and introduces the rules to the modified Sweet Home 3D software to generate multiple housing design layouts for the clients at the second meeting. This significantly reduces the time and effort needed for both parties to reach an agreement on the housing design layout. As future work, the system could be improved by using a Case-Based Design approach in which the past experience of home designers is stored and updated. This allows designers to treat each problem as a new case, while the computer searches for a related solution in the database. The system will revise former solutions and adapt them to the new situation, and designers can re-use the stored past experience to assist them in their housing design.
References

1. Kim, Y.S., Oh, Y.K., Kim, J.J.: A planning model for apartment development projects reflecting client requirements. Journal of Construction Engineering and Management, Korea Institute of Construction Engineering and Management 5(3), 88–96 (2004)
2. Kang, P.M.: Directions of Korean Housing Policy. In: 3rd ASIAN Forum Conference for the Field of Architecture and Building Construction, Tokyo, Japan, January 27-29 (2004), http://www.asian-forum.net/conference_2004/session_pdf/3-4%20R%20Korea%20G%20Kang.pdf
3. Nissen, L.A., Faulkner, R., Faulkner, S.: Inside Today's Home, 6th edn. Harcourt Brace (1994)
4. Punch! Professional Home Design Suite, http://www.punchsoftware.com/index.htm
5. Better Homes and Gardens Home Designer Suite, http://www.homedesignersoftware.com/products/
6. Instant Architect, http://www.imsidesign.com/
7. Total 3D Home & Landscape Design Suite, http://www.individualsoftware.com/products/home_garden_design/total3d_home_landscape/
8. My Virtual Home, http://mvh.com.au
9. 3D Home Architect & Landscape Design Deluxe Suite, http://www.punchsoftware.com/index.htm
10. Instant Home Design, http://www.topics-ent.com/
11. Your Custom Home, http://www.valusoft.com
12. Design Workshop Classic, http://www.artifice.com/dw_classic.html
13. Quickie Architect, http://www.upperspace.com/
Attitudinal and Intentional Acceptance of Domestic Robots by Younger and Older Adults

Neta Ezer¹, Arthur D. Fisk², and Wendy A. Rogers²

¹ Georgia Institute of Technology, Department of Industrial Design, 247 4th St., Atlanta, GA 30332-0170, USA
² Georgia Institute of Technology, School of Psychology, 654 Cherry St., Atlanta, GA 30332-0170, USA
{neta.ezer,af7,wr43}@gatech.edu
Abstract. A study was conducted to examine the expectations that younger and older individuals have about domestic robots and how these expectations relate to robot acceptance. In a questionnaire participants were asked to imagine a robot in their home and to indicate how much items representing technology, social partner, and teammate acceptance matched their robot. There were additional questions about how useful and easy to use they thought their robot would be. The dependent variables were attitudinal and intentional acceptance. The analysis of the responses of 117 older adults (aged 65-86) and 60 younger adults (aged 18-25) indicated that individuals thought of robots foremost as performance-directed machines, less so as social devices, and least as unproductive entities. The robustness of the Technology Acceptance Model to robot acceptance was supported. Technology experience accounted for the variance in robot acceptance due to age. Keywords: Domestic Robots, Older Adults, Technology Acceptance.
1 Introduction

As robots are entering the domestic environment, a question to ask is: Will people be accepting of robots in their homes? This is an important and interesting question because robots have the potential to assist their human owners in many ways, but at the same time may be perceived as altering the social environment of the home. Robots would be considered disruptive technologies, as they would not simply be new versions of existing technologies. Disruptive technologies are often not accepted as readily as incremental innovations [9], [10]. This question about robot acceptance is particularly relevant to older adults. Robots are currently being designed to help older adults live in their homes longer, by helping them to perform activities such as medication management and to provide emergency monitoring [4]. There is a need to understand, first, what older adults' perceptions are of a robot in their home and second, what variables can predict whether older adults would be accepting of such a robot.
1.1 Robot Acceptance

In the Technology Acceptance Model (TAM), acceptance is defined as a combination of attitudes, intentions, and behaviors towards a technology [6]. In the model, perceived usefulness and perceived ease of use of a technology are incorporated into consumers' attitudes about the technology. These attitudes predict intentions to buy or use the technology and actual behaviors involving acquiring and using the technology [7]. The relationship between perceived usefulness and perceived ease of use and technology acceptance has been demonstrated for numerous information technologies [11], [17]. The acceptance of a robot for the home, however, may involve alternative predictors. Other expectations about a robot, for example its social abilities, may be more predictive of attitudinal and intentional acceptance of that robot. Robots may also carry out tasks in which they behave as teammates with their human owners [5]. Thus, variables that are generally predictive of acceptance of humans as social partners (e.g., friendliness) and as teammates (e.g., motivation) may be more predictive of acceptance of robots than those described in the TAM.

1.2 Older Adults and Robots

Several research projects are currently underway to design robots for the older adult population [1]. In the future, robots may help older individuals learn new skills, manage finances, and remember to take their medication, among other things. A robot may be especially effective for these types of activities because it can be a socially engaging and intelligently dynamic device [3], [12], [16]. Although there are many potential benefits of assistive robots in the home for older adults, older individuals might not be as accepting as younger adults of such a device in the home. Older adults may be especially concerned about how difficult a new device will be to learn [8]. On the other hand, older adults appear willing to accept technology if it allows them to live independently in their home [15]. Consequently, if older adults perceive a robot in their home as helpful rather than intrusive, they may be just as accepting of it as younger adults. Despite the growing interest in developing robots for older adults, few studies have investigated this age group's acceptance of robots. The studies that have been conducted have generally measured responses of older adults to specific robots with limited functionality [2], [13], [14]. For example, older adults expressed excitement about a nurse robot that helped them navigate through a building [13]. These studies provide evidence that older adults may accept certain robots in certain situations. They do not, however, reveal more general attitudes and perceptions older individuals have about robots, which could be used to predict acceptance for a wider variety of robot types in the context of the home. There is a need to understand the relationship between older adults' expectations of domestic robots and their acceptance of them.

1.3 Overview of Study

An exploratory survey study was used to understand younger and older adults' prototypical characteristics of domestic robots and the relationship between these characteristics and robot acceptance. Acceptance was limited to attitudinal and intentional acceptance because most robots designed for domestic use are still in the research and
early development phase. It was predicted that perceived robot characteristics related to social partner and teammate acceptance would add significant predictive power to acceptance over that explained by perceived usefulness, and perceived ease of use alone. Additionally, there were two possible predicted patterns of age-related differences in robot acceptance. If older adults thought of robots as beneficial to them, they were predicted to be as accepting as younger adults of a robot in their home; if they did not see the benefit, they were expected to be less accepting than younger adults of a robot in their home.
2 Method 2.1 Sample Questionnaires were sent to 2500 younger adults (18-28 yrs) and 2500 older adults (65-86 yrs) in the Atlanta Metropolitan area and surrounding counties using an agetargeted list with a 65% hit rate. Forty-three packets were returned as undeliverable. Of the total questionnaire packets sent, 177 included completed questionnaires from individuals in the targeted age groups (110 packets contained only sweepstakes entry forms and 23 respondents were not of the correct age). The effective response rate was 5.6%. The response sample was composed of 60 younger adults (M = 22.7 yrs, SD = 3.2) and 117 older adults (M = 72.2 yrs, SD = 5.7). The younger and older adult samples were 21.7% and 53% male, respectively. Participants indicated living independently either in a house, apartment, or condominium. There were no older adults who indicated living in a nursing home or assisted living facility. 2.2 Questionnaire A separate page was included with the questionnaire instructing participants to imagine that someone gave them a robot for their home and to draw and describe this robot. This page was to be filled out before participants began the questionnaire. The questionnaire contained four sections: 1) Views about Robots, 2) Robot Tasks, 3) Technology/Robot Experience, and 4) Demographics. The Robot Tasks section of the questionnaire will not be discussed in this paper. Views about Robots. The first part of the section contained 48 Likert-type items of possible robot characteristics. The items were developed through an extensive literature review of variables predictive of technology/machine, social partner, and teammate acceptance. The instructions were for participants to indicate how much each item matched the characteristics of the robot they had imagined in their home from 1 = “not at all” to 5 = “to a great extent”. The second part of the section included four statements about perceived usefulness (performance, productivity, effectiveness and usefulness) and four statements about perceived ease of use (easy to learn to use, easy to become skilled at, easy to get technology to do what user wants, and overall ease of use) The instructions were for participants to indicate how much they agreed with each of the eight statements about
the robot they imagined in their home. A Likert scale was used from 1 = "strongly disagree" to 5 = "strongly agree". The last part of the section contained items about the attitudinal and intentional acceptance of the robot that participants had imagined in their home. There were three 5-point scales for attitudes (Bad-Good, Unfavorable-Favorable, and Negative-Positive) and three 5-point scales for intentions (No Intention-Strong Intention, Unlikely-Likely, and Not Buy It-Buy It). Participants were instructed to circle the number on each scale representing their attitudes about the robot and their intentions to buy the robot if it were available for purchase. Technology and Robot Experience. The technology and robot experience parts of the questionnaire consisted of 20 technology items and six robot items, respectively. Participants were asked to indicate on a Likert-type scale how often they had used each technology in the past year from 1 = "not at all" to 5 = "to a great extent (several times a week)". The robot items were categories of existing robots: manufacturing, lawn mowing, mopping, vacuum cleaning, guarding, and entertaining. Participants were asked to indicate how much experience they had with each on a Likert-type scale from 1 = "no experience with this robot" to 5 = "I have and use this robot".

2.3 Procedure

The questionnaires and supporting materials were mailed to residents in the Atlanta area. Recipients were given four weeks to complete and return the questionnaire. A reminder postcard was mailed two weeks after the initial mailing. Recipients could mail back a sweepstakes entry form to win one of fifty $50 checks.
3 Results

3.1 Technology and Robot Experience

Participants were each given a technology experience score from the mean of their responses to the frequency of using 18 technologies in the past year. Home medical device and non-digital camera were excluded due to a lack of significant correlations with the other items. A score of 1.0 on the technology experience scale would indicate no experience and a score of 5.0 would indicate daily experience with the items that were presented. An ANOVA with age (younger, older) as the grouping variable showed younger adults (M = 4.05, SD = .44) as having significantly more experience with technology than older adults (M = 3.38, SD = .66), F(1, 175) = 48.9, p < .01, ηp² = .22. Similarly, each participant was given a robot experience score derived from the mean of their responses to familiarity with six types of robots. On this robot experience scale, a score of 1.0 would indicate no experience and a score of 5.0 would indicate extensive experience with the robots. Participants indicated minimal experience with robots (M = 1.92, SD = .74). An ANOVA with age (younger, older) as the grouping variable showed younger adults (M = 2.20, SD = .73) reporting slightly, but significantly, more robot experience than older adults (M = 1.77, SD = .71), F(1,175) = 14.3, p < .01, ηp² = .08.
3.2 Robot Characteristics

The robot characteristic variables were submitted to a principal axis factor analysis (kappa = 4). Three factors were retained after examination of the scree plot. The factor analysis was rerun with seven items removed (complex, dependent, independent, interesting, pointless, simple, and static) due to these items not meeting the criterion of a factor loading greater than .4. The resulting pattern matrix is presented in Table 1. The three factors were labeled "performance-oriented traits", "socially-oriented traits", and "non-productive traits". Each participant received a mean score on each of the factors. Items with negative loadings were reverse scored. Six outliers were removed from analysis. Younger adults had performance-oriented, socially-oriented, and non-productive mean scores of 4.08 (SD = .54), 3.08 (SD = .98), and 1.41 (SD = .44), respectively. Older adults had mean scores of 3.84 (SD = .80), 2.81 (SD = .96), and 1.41 (SD = .44) on these factors, respectively. Paired t-tests, with Bonferroni correction of .0167, were conducted to assess differences in the means of the robot characteristics factors. Participants ascribed significantly more performance-oriented traits (M = 3.92, SD = .73) to their imagined robots than socially-oriented traits (M = 2.89, SD = .98), t(173) = 14.65, p < .01; significantly more performance-oriented traits than non-performance traits (M = 1.41, SD = .43), t(174) = 32.3, p < .01; and significantly more socially-oriented traits than non-performance traits, t(173) = 16.71, p < .01. Age-related differences in performance-oriented traits were examined separately from the other two factor scores, as a covariance matrix indicated a difference in variances between younger and older adult scores on this scale. The ANCOVA indicated that age did not have a significant effect on scores, F(1, 167) = .13, p = .71. There was no significant effect of robot experience on scores, F(1,167) = .28, p = .60. Technology experience had a significant relationship with these scores, F(1,167) = 7.24, p = .01, ηp² = .04, with more experience related to higher scores. A MANCOVA was performed on the other two factors, socially-oriented traits and non-productive traits, with age group (younger, older) as the grouping variable and robot experience and technology experience as covariates. Box's M test was non-significant, Box's M = .63, p = .891. Again, age did not have a significant effect on scores, Pillai's Trace statistic F(2,166) = 2.48, p = .087. Robot experience, F(2,166) = .746, p = .476, and technology experience, F(2,166) = 2.57, p = .079, did not have significant relationships with trait scores.

3.3 Technology Acceptance Model Variables

Participants were assigned mean scores for ease of use and usefulness. The scores of seven participants were not included due to Mahalanobis distances exceeding the criterion, which was set at χ²(2, N = 180) = 9.21 at p < .01. Younger adults had a mean usefulness score of 4.41 (SD = .75) and a mean ease of use score of 4.05 (SD = .78); older adults' mean scores were 4.07 (SD = .86) and 3.87 (SD = .94), respectively. A paired t-test, with Bonferroni correction of .025, indicated usefulness scores (M = 4.18, SD = .84) as being significantly greater than ease of use scores (M = 3.83, SD = .85), t(166) = 4.00, p < .01. The scores of the two variables were significantly correlated, r(167) = .54, p < .01.
Table 1. Factor Weights and Communalities Based on a Principal Axis Analysis with Promax Rotation for 41 Items of Robot Characteristics

Item            Original item category   Performance-oriented   Socially-oriented   Non-productive
Efficient       Tech/machine +           0.81
Reliable        Tech/machine +           0.78
Precise         Tech/machine +           0.75
Helpful         Teammate +               0.75
Coordinated     Tech/machine +           0.71
Useful          Tech/machine +           0.69
Safe            Tech/machine +           0.64
Quiet           Social -                 0.63
Calm            Teammate +               0.62
Sturdy          Tech/machine +           0.62
Agreeable       Teammate +               0.58
Confident       Teammate +               0.54
Trustworthy     Teammate +               0.53
Serious         Social -                 0.48
Dynamic         Social +                 0.45
Unfeeling       Social                                          -0.85
Compassionate   Social +                                         0.71
Unimaginative   Teammate                                        -0.71
Unsocial        Social                                          -0.70
Expressive      Social +                                         0.69
Friendly        Social +                                         0.63
Dull            Social +                                        -0.63
Playful         Social +                                         0.60
Creative        Teammate +                                       0.60
Lifelike        Social +                                         0.57
Artificial      Social                                          -0.54
Boring          Social                                          -0.49
Motivated       Teammate +               0.43                    0.46
Talkative       Social +                                         0.45
Unpredictable   Tech/machine                                                         0.67
Wasteful        Tech/machine                                                         0.66
Chaotic         Teammate                                                             0.66
Risky           Tech/machine                                                         0.61
Demanding       Teammate                                                             0.58
Clumsy          Tech/machine                                                         0.58
Selfish         Teammate                                                             0.54
Nervous         Teammate                                                             0.52
Lazy            Teammate                                                             0.51
Breakable       Tech/machine                                                         0.47
Careless        Tech/machine                                                         0.46
Hostile         Teammate                                                             0.45

Note: Factor weights < .4 suppressed; a plus sign denotes a positive trait and a minus sign a negative trait in the original category.
A MANCOVA with age group (younger, older) as the grouping variable, ease of use and usefulness as dependent variables, and technology and robot experience as covariates was conducted. The analysis indicated a non-significant effect of age on scores, Pillai's Trace statistic F(2,159) = .14, p = .87. Technology experience had a significant relationship with scores, F(2,159) = 6.57, p = .002, ηp² = .08. Univariate tests indicated significant relationships between technology experience and ease of
use, F(1,160) = 6.05, p = .02, ηp² = .04, and technology experience and usefulness, F(1,160) = 12.68, p < .001, ηp² = .11, with more technology experience related to higher scores for both. Robot experience was not significantly related to scores, F(2,159) = .38, p = .68.
3.4 Attitudinal and Intentional Robot Acceptance

Each participant was given mean attitudinal and intentional acceptance scores. Younger adults had a mean attitudinal score of 4.13 (SD = .94) and a mean intentional score of 3.57 (SD = 1.18); older adults' mean scores were 3.19 (SD = 1.20) and 3.07 (SD = 1.37), respectively. A paired t-test indicated that the mean score of attitudinal acceptance (M = 3.99, SD = 1.11) was significantly greater than the mean score of intentional acceptance (M = 3.25, SD = 1.32), t(177) = 8.85, p < .01. A MANCOVA was performed with age group (younger, older) as the grouping variable, acceptance (attitudinal, intentional) as dependent variables, and technology and robot experience as covariates. The analysis indicated that age did not have a significant effect on robot acceptance, Pillai's Trace statistic F(2, 170) = .32, p = .72. Technology experience was found to have a significant relationship with robot acceptance, F(2, 170) = 3.74, p = .03, ηp² = .04. Univariate tests indicated technology experience having a significant relationship with attitudinal acceptance scores, F(1, 171) = 4.12, p = .04, R² = .04, and intentional acceptance scores, F(1, 171) = 7.09, p = .01, R² = .09, with more technology experience related to greater acceptance. Robot experience was not significantly related to acceptance, F(2, 170) = 3.74, p = .53.

Table 2. Regression of Technology Acceptance Model (TAM) Scores and Robot Characteristic Scores on Attitudinal Acceptance Scores
Model                         Variable                 β      t      p      R²     F      p
TAM                           Usefulness              .37    4.83   <.01    .32    38.2   <.01
                              Ease of use             .27    3.52   <.01
Robot Characteristics         Performance-oriented    .19    2.05    .04    .14     9.2   <.01
                              Socially-oriented       .13    1.60    .11
                              Non-productive         -.17   -2.00    .05
TAM + Robot Characteristics   Usefulness              .35    4.25   <.01    .33    15.8   <.01
                              Ease of use             .25    3.06   <.01
                              Performance-oriented   -.03   -0.32    .75
                              Socially-oriented       .08    1.15    .25
                              Non-productive         -.08   -1.03    .30

(β, t and p are the coefficient statistics; R², F and the final p are the model summary, shown on the first row of each model.)
3.5 Predictors of Attitudinal and Intentional Robot Acceptance A hierarchical multiple regression analysis was performed to investigate predictors of attitudinal and intentional robot acceptance. Model summaries are presented in Table 2 and Table 3. The analysis indicated usefulness and ease of use as significantly
predicting attitudinal acceptance scores. The addition of the robot characteristic variables did not significantly increase the amount of variance explained in attitudinal acceptance over that explained by the TAM-related variables, R²-change = .01, F-change (3, 158) = .88, p = .45. Attitudinal acceptance scores explained a significant amount of variance in intentional acceptance scores, R² = .34, F(1,162) = 82.17, p < .01. The addition of the TAM-related variables significantly increased the amount of variance explained in intentional acceptance scores, R²-change = .07, F-change (3, 158) = 9.69, p < .01. The addition of the robot characteristic variables into the model did not explain significantly more variance in scores over that explained by attitudinal acceptance scores, usefulness, and ease of use, R²-change = .02, F-change (3, 157) = 1.64, p = .18.

Table 3. Regression of Attitudinal Acceptance Scores, Technology Acceptance Model (TAM) Scores, and Robot Characteristic Scores on Intentional Acceptance Scores
Model                                      Variable                 β      t      p      R²     F      p
Attitudinal Acceptance                     Attitud. acceptance     .58    9.07   <.01    .34    82.2   <.01
TAM                                        Usefulness              .34    4.40   <.01    .30    35.4   <.01
                                           Ease of use             .28    3.66   <.01
Attitudinal Acceptance + TAM               Attitud. acceptance     .40    5.37   <.01    .41    37.0   <.01
                                           Usefulness              .19    2.50    .01
                                           Ease of use             .18    2.35    .02
Robot Characteristics                      Performance-oriented    .24    2.78    .01    .18    12.1   <.01
                                           Socially-oriented       .19    2.38    .01
                                           Non-productive         -.10   -1.21    .23
Attitudinal Acceptance + TAM               Attitud. acceptance     .38    5.16   <.01    .43    19.4   <.01
+ Robot Characteristics                    Usefulness              .17    2.04    .04
                                           Ease of use             .14    1.82    .07
                                           Performance-oriented    .07    0.80    .42
                                           Socially-oriented       .11    1.65    .10
                                           Non-productive          .01    0.13    .90

(β, t and p are the coefficient statistics; R², F and the final p are the model summary, shown on the first row of each model.)
4 Discussion

The results suggest that participants imagined robots mostly as helpful, purposeful devices, less as socially intelligent devices, and least of all as uncontrollable or wasteful devices. Age did not have a significant effect on the characteristics that participants ascribed to their robot when technology and robot experience were accounted for. The results suggest that younger and older adults with comparable technology experience will have similar expectations of robots as performance-oriented machines. Of course, self-selection bias may have played a part in this result, with only individuals having positive views of robots returning the questionnaire.
A regression analysis revealed that usefulness and ease of use were predictive of participants’ attitudinal acceptance of a robot in their home; usefulness, ease of use and attitudinal acceptance were predictive of intentional acceptance. The results provided evidence for the robustness of the Technology Acceptance Model. Overall it appears that younger and older adults would be willing to accept a robot in their home, as long as it benefits them and is not too difficult to use. Acknowledgements. Research presented in this article was supported in part by Grant PO1 AG17211 from the National Institutes of Health (National Institute on Aging) under the auspices of the Center for Research and Education on Aging and Technology Enhancement (CREATE).
References

1. AARP: Robots Roll in an Aging Society, http://www.aarpinternational.org
2. Bickmore, T.W., Caruso, L., Clough-Gorr, K., Heeren, T.: 'It's just like you talk to a friend': relational agents for older adults. Interacting with Computers 17, 711–735 (2005)
3. Breazeal, C.: Toward Social Robots. Robotics and Autonomous Systems 42, 167–175 (2003)
4. Dario, P.: MOVAID: A Personal Robot in Everyday Life of Disabled and Elderly People. Technology and Disability 10, 77–93 (1999)
5. Dautenhahn, K., Woods, S., Kaouri, C., Walters, M., Koay, K.L., Werry, I.: What is a Robot Companion - Friend, Assistant or Butler? In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1488–1493. IEEE Press, New York (2005)
6. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use and User Acceptance of Information Technology. MIS Quarterly 13, 319–339 (1989)
7. Davis, F.D.: User Acceptance of Information Technology: System Characteristics, User Perceptions and Behavioral Impacts. International Journal of Man-Machine Studies 38, 475–487 (1993)
8. Demiris, G., Rantz, M.J., Aud, M.A., Marek, K.D., Tyrer, H.W., Skubic, M., Hussam, A.A.: Older Adults' Attitudes Towards and Perceptions of 'Smart Home' Technologies: A Pilot Study. Informatics for Health and Social Care 29, 87–94 (2004)
9. Dewar, R.D., Dutton, J.E.: The Adoption of Radical and Incremental Innovations: An Empirical Analysis. Management Science 32, 1422–1433 (1986)
10. Green, S.G., Gavin, M.B., Aiman-Smith, L.: Assessing a Multidimensional Measure of Radical Technological Innovation. IEEE Transactions on Engineering Management 42, 203–214 (1995)
11. Karahanna, E., Straub, D.W., Chervany, N.L.: Information Technology Adoption Across Time: A Cross-Sectional Comparison of Pre-Adoption and Post-Adoption Beliefs. MIS Quarterly 23, 183–213 (1999)
12. Matsumoto, N., Yamazaki, T., Tokosumi, A., Ueda, H.: An Intelligent Artifact as a Cohabitant: An Analysis of a Home Robot's Conversation Log. In: Second International Conference on Innovative Computing, Information, and Control (2007)
13. Montemerlo, M., Pineau, J., Roy, N., Thrun, S., Verma, V.: Experiences with a Mobile Robotic Guide for the Elderly. In: 18th AAAI National Conference on Artificial Intelligence, pp. 587–592. AAAI Press, Edmonton (2002)
14. Rantz, M.J., Marek, K.D., Aud, M.A., Tyrer, H.W., Skubic, M., Demiris, G., Hussam, A.A.: A Technology and Nursing Collaboration to Help Older Adults Age in Place. Nursing Outlook 53, 40–45 (2005)
15. Sharit, J., Czaja, S.J., Perdomo, D., Lee, C.C.: A Cost-Benefit Analysis Methodology for Assessing Product Adoption by Older User Populations. Applied Ergonomics 35, 81–92 (2004)
16. Ueda, H., Minoh, M., Chikama, M., Satake, J., Kobayashi, A., Miyawaki, K., Kidode, M.: Human-Robot Interaction in the Home Ubiquitous Network Environment. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 990–997. Springer, Heidelberg (2007)
17. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 45, 186–204 (2000)
Natural Language Interface for Smart Homes

M. Fernández, J.B. Montalvá, M.F. Cabrera-Umpierrez, and M.T. Arredondo

Life Supporting Technologies, ETSIT, Universidad Politécnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
Tel.: (+34) 91 549 57 00 ext. 3407; Fax: (+34) 91 336 68 28
{mfernandez,jmontalva,chiqui,mtarredondo}@lst.tfo.upm.es
Abstract. The development of new ICT technologies, like Ambient Intelligence (AmI) and smart home technologies, has been proven to be a key factor in allowing people with disabilities to gain independence in their home, vehicle or working environments. The biggest problem that users with disabilities face when using these technologies is the difficulty of using their interfaces. In this paper we present the methodology and implementation used for the development of an interface for smart homes, based on natural language, which provides an easier way, especially for people with physical disabilities and the elderly, to perform the usual tasks at home without the need for prior learning or complex processes.

Keywords: Smart Home, natural language, ambient intelligence.
1 Introduction

According to the European Union Statistics Office [1], around 17% of the European population is above 60 years old, and it is foreseen that this rate will increase up to 20-30% during the next years due to improvements in healthcare and in users' quality of life. At the European level, Spain ranks 5th with respect to the proportion of elderly people, and by the middle of the twenty-first century it will be one of the oldest countries in the world. In this sense, a close relationship between ageing and disability exists: according to the Instituto Nacional de Estadística, 31.2% of Spanish citizens over 64 present some type of disability [2]. The same source reveals that people with disabilities constitute 8.5% of the national population, a proportion that is maintained worldwide. Such demographic changes imply an increase in the dependent population, which faces important problems for its active participation in society as full members. Most published reports about social support and dependency confirm the objective benefit, both emotional and physical, for the dependent person of continuing to live at home rather than moving to residences or sheltered housing. Currently in Spain, almost eight out of ten older people stay at home, and in 76% of the situations the relatives are in charge of providing support for the daily activities
[2]. However, associations of disabled people complain both about the insufficiency of appropriate assistive services and about the lack of support for the family caregiver. The development of new ICT technologies, like Ambient Intelligence (AmI), can help to create adequate environments so that the demanded and necessary Independent Living can be achieved. The use of home automation technologies and assistive devices has been proven to be a key factor in allowing people with disabilities to gain independence in their home, vehicle or working environments [3]. The biggest problem that users with disabilities face when using these technologies is the difficulty of using their interfaces. This paper presents a new modality of interaction with smart homes, based on natural language, which provides an easier way, especially for people with physical disabilities and the elderly, to perform the usual tasks at home without the need to learn difficult processes beforehand [4].
2 Methodology

Human language is a very complex means of communication, and many issues must be considered when a language-based interaction is developed. For the implementation of the interface, we developed a specific methodology based on the combination of the user-centered approach provided by Userfit [5] and the usability evaluation of spoken language dialogue systems [6]. The methodology consists of 4 steps that are further described below.

2.1 Step 1: Vocabulary Framework

This first step is used to analyze the context where the interface is going to be built, the elements of interaction, the actions and the users. It is based on three phases (a data-structure sketch follows this list).

• Context analysis: What and Where. The context analysis is meant to answer the questions What and Where. The What question identifies the different interaction elements: the blinds, the lights, and in general the home appliances that will be connected to the interface. The Where question differentiates elements that are the same or have the same function; obviously, two devices cannot be in the same place at the same time. For example: the light of the kitchen or the light of the bedroom.
• Task analysis: Which and When. The task analysis is used to identify the different actions that the user is able to perform with the interaction devices, for example programming the temperature of a room or turning off the lights.
• User profile: Who. One important question is who uses the interface. By identifying the person that has permission to use the interface, security access parameters can be applied, as well as preferences depending on the user's profile. This enables personalization of the interface according to the preferences and needs of the user.
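As an illustration only (the class and field names below are ours, not part of the system described in this paper), the What/Where/Which-When/Who information gathered in Step 1 could be captured in a small device registry that records what each element is, where it is, which actions it supports, and who may use it:

```java
// Hypothetical sketch of the vocabulary framework information from Step 1.
import java.util.List;
import java.util.Set;

record Device(String element,             // What: "light", "blind", "oven", ...
              String location,            // Where: "kitchen", "bedroom", ...
              Set<String> actions,        // Which/When: "turn on", "turn off", "program", ...
              Set<String> allowedUsers) { // Who: users permitted to control this device

  boolean supports(String action) {
    return actions.contains(action);
  }

  boolean permits(String user) {
    return allowedUsers.contains(user);
  }
}

class Vocabulary {
  // The registry from which grammars and semantic checks can be derived.
  private final List<Device> devices;

  Vocabulary(List<Device> devices) { this.devices = devices; }

  // Resolve "the light of the kitchen": element + location identify a device uniquely.
  Device find(String element, String location) {
    return devices.stream()
        .filter(d -> d.element().equals(element) && d.location().equals(location))
        .findFirst()
        .orElse(null); // ambiguity or absence is handled by the dialogue module
  }
}
```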
2.2 Step 2: Syntactic Construction

The next step determines which syntactic structures should be taken into account. For one request there could be different syntactic structures, the only difference among them being the order of the words. If the user wants "to turn on the light of the bedroom", the same user could express the same request in different ways, and in order to be coherent with natural language, a large number of the most common syntactic structures have been developed. Some of these structures could be: turn on the light of the bedroom, from the bedroom turn on the light, please could you turn on the light.

2.3 Step 3: Semantic Interpretation

In the previous step there are some aspects that have not been taken into consideration. The order of the words is not the only important issue in the interface; it is more important to find a request that makes sense both in the connection between its words and in relation to the actual situation of the smart home. The system can accept a sentence like "switch on the window from the kitchen", and the semantic interpretation of the interface must understand the request. The same applies to the coherence between the request and the actual state of the home. The interface can understand an action that cannot be performed at this moment; for instance, the system has understood "switch off the TV" but the TV is already switched off.

2.4 Step 4: Answer Construction

In order to use natural language to provide the system's answers, a module that automatically generates the answers to each user's request has been developed, taking into account the order of the words in the answers. Apart from the definition of the collection of words, the possible structures with their different combinations were identified in order to make the provided answers more natural.
3 Implementation

The developed interface consists of a software application implemented in the C programming language with Loquendo libraries. The voice recognition has two phases: the syntactic analysis and the semantic analysis. The action diagram is shown in Figure 1. When a request is executed, feedback is provided to the user. The syntactic analysis first defines the operation field of the request: whether it is a query or an action, and its type. Second, it outlines the words that describe the request and provides the information needed to perform the demand (action, element, localization, parameter, value, etc.). The request structure recognized by the system has to match one of the defined grammars in use. Once the recognition is done, the system performs a semantic analysis to verify the correctness of the request. Then, the consistency of the result is established. In case of any inconsistency between the request's components, the system provides an interaction module that exchanges questions (made by the system) and responses (made by the user) until the request becomes consistent and can be executed.
Fig. 1. Action diagram
Once the analysis is done and the request is consistent, two different behaviors are distinguished:

• If the request is to implement an action, the main function is the amendment of the relevant databases, which triggers either an immediate active task (turn off a light, increase the volume of the television) or a delayed one (programming of the oven).
• If the request corresponds to a query (state, parameter or programming), the interface only looks for the value in the relevant database and provides the user with the answer.

3.1 Example of an Action

If we consider an action on a device that produces a change in its state, the request corresponds to the basic function that any smart home interaction system should have. For its analysis and treatment, so-called tree structures are used, consisting of a root that corresponds to the system application and successive levels of leaves that determine the components of the activity. The tree structure that defines the fulfilment of the request has two different versions. In the first version the structure is centered on the element (see Fig. 2): after the element is determined, the first level of leaves, which corresponds to the possible actions that can be carried out, is analyzed. The second level determines the localization of the element, to uniquely identify the device on which to perform the action. In the second version, the tree structure is centered on the action (see Fig. 3), where the first level of leaves contains the possible elements on which a task can take place. After that, a second level determines the location, to define the device uniquely. It is in those parts of the tree where it is not possible to determine the component of the next level that a man-machine dialogue is established to resolve any ambiguity. Whether through the first or the second structure, once the request is completely defined, feedback is provided to the user indicating the result of the request, either indicating the action taken (current status of a device or modification of this state) or giving specific details (indicating the values of the parameters needed to program a device).
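A minimal sketch of this flow is given below; the class and method names are ours, purely illustrative, and not taken from the C/Loquendo implementation described above. It shows the idea of checking a recognized request against the current home state and producing a follow-up question when the request is inconsistent or ambiguous:

```java
// Illustrative sketch (hypothetical names): semantic check of a recognized request
// against the current state of the home, with a clarification dialogue on failure.
import java.util.Map;

record Request(String action, String element, String location) {}

class SemanticInterpreter {
  // Current state of each device, e.g. "light@kitchen" -> "on".
  private final Map<String, String> homeState;

  SemanticInterpreter(Map<String, String> homeState) { this.homeState = homeState; }

  /** Returns null if the request is consistent, otherwise a question for the user. */
  String check(Request r) {
    if (r.location() == null) {
      return "Which room do you mean?";                 // ambiguity -> dialogue
    }
    String key = r.element() + "@" + r.location();
    String state = homeState.get(key);
    if (state == null) {
      return "There is no " + r.element() + " in the " + r.location() + ".";
    }
    if (r.action().equals("switch off") && state.equals("off")) {
      return "The " + r.element() + " in the " + r.location() + " is already off.";
    }
    return null;                                        // consistent: execute the request
  }
}
```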
Fig. 2. Tree diagram centered in the element
Fig. 3. Tree diagram centered in the action
The final objective is to simulate a conversation as natural as possible, therefore giving variable feedback responses that randomly vary their structure. The three different answers given at the conclusion of an action are composed of the initial part of the sentence (I have already…, I just…, I have…) followed by the general structure: Connector (I have already, I just, I have) + Action + Element + Room (in those cases where it is necessary to indicate it, because more than one device of the same type exists). For example, for a specific request like "turning on the light in the kitchen", the possible system answers can be: I have already turned on the light of the
kitchen, I have turned on the light of the kitchen, I have just turned on the light of the kitchen.

3.2 Example of a Query

If we consider a system query about the state of a specific device, two possible queries are defined, conditioned by the type of request that the system has recognized. The first type of query is centered on the element of a device (see Fig. 4), for example: How is the window of the kitchen? The syntactic structure that the system expects has the following components: a connector that differentiates this request from others (How is, Which is the state of, etc.), the device (indicating the element, and the location in cases of ambiguity) and the state the user wants to know (How is the light of the kitchen? Is the light of the kitchen turned on?). The interface's feedback is the state of the device.
Fig. 4. Query centered in the element
A second type of query is centered on the action (see Fig. 5), in which the user wants to know which devices are in a specific state, for example: What device is switched on in the bedroom? In this case, the elements that define this request are similar to those above: a connector (is it, what is, ...), a state, and a location if the user wants to limit the query to a specific room of the house. In this situation, the feedback offered is the list of devices that share this state. The tree structure is again used to define the request.
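Following the answer-construction scheme of Sect. 3.1 (Connector + Action + Element + Room), a feedback sentence could be assembled as in the sketch below; again the names are illustrative and not taken from the actual C implementation:

```java
// Illustrative sketch: randomized feedback construction, Connector + Action + Element + Room.
import java.util.List;
import java.util.Random;

class FeedbackGenerator {
  private static final List<String> CONNECTORS =
      List.of("I have already", "I have just", "I have");
  private final Random random = new Random();

  /** e.g. buildActionFeedback("turned on", "light", "kitchen")
   *  -> "I have just turned on the light of the kitchen." */
  String buildActionFeedback(String action, String element, String room) {
    String connector = CONNECTORS.get(random.nextInt(CONNECTORS.size()));
    String where = (room == null) ? "" : " of the " + room; // room only when needed
    return connector + " " + action + " the " + element + where + ".";
  }

  /** State query answer, e.g. "The light of the kitchen is on." */
  String buildStateFeedback(String element, String room, String state) {
    return "The " + element + " of the " + room + " is " + state + ".";
  }
}
```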
Fig. 5. Query centered in the action
4 Results

The validation of the system is currently being carried out with real users through a series of questionnaires and interviews, to obtain as much information as possible, not only related to the acceptance of the system but also to new recognition possibilities that had not been considered previously. The evaluation consists of the following steps:

• The user selects the type of voice.
• The user performs different tasks according to the evaluation scenario, based on the execution of different use cases.
• The user fills in a questionnaire based on qualitative and quantitative questions.
• After the interview, the experts fill in the evaluation questionnaire for each use case, providing information about the user's interaction with the system.

The main problem encountered during the evaluation has been the difficulty the users experience in distinguishing between a failed request for which the system made a correct recognition but the quality of the recognition was not good enough, and a failed request for which the system was not able to make a proper recognition due to a non-existent grammar. The expected results after a massive evaluation will relate to different aspects of usability, such as: how many interactions were required to complete an ambiguous request, how much time the user spent on a task, whether there is any element or action that has not been taken into account, or whether the user has had a problem with the natural language interaction.
5 Future Work

Further work will be needed to migrate the Spanish version of the system to other languages. The syntactic construction and semantic interpretation should be redone to adapt them to the new language structures; it is important to take into account the vocabulary of each new language as well as its grammatical structures. A module for biometric identification of the users is under development. It will enable voice identification of the users without the need for a specific password. This will enable the implementation of the user profile, with adaptation of the interface to security issues and to users' preferences and needs.
6 Conclusions

The evolution of information technologies can facilitate the enhancement of the quality of life of dependent people, both elderly and disabled. The difficulties these people may encounter with the use of these technologies lie in the way they interact with them. The main advantage of the methodology used is the fast development of an easy and useful interface based on natural language, focused on the responses and reactions of the users during the evaluation. One of the most important points to emphasize in the design is the independence of operation from the user, providing simplicity and reducing as much as possible the learning and training required, in order to obtain wider acceptance. The different expected results cover various aspects of usability, such as how many interactions were required to complete an ambiguous request, how long the user took to complete a task, whether any element or action has not been taken into account, or whether the user had a problem with the natural-language interaction. One of the big limitations of this system is that the final users are not familiar with the use of new technologies. Additionally, reluctance to change may be an obstacle to the daily use of such systems.
References

1. Eurostat regional yearbook 2008 (2008), http://epp.eurostat.ec.europa.eu/
2. Encuesta de Discapacidad, Autonomía personal y situaciones de Dependencia. Instituto Nacional de Estadística (2008), http://www.ine.es
3. Montalvá, J.B., Rodríguez, A., Conde, E., Gaeta, E., Arredondo, M.T.: Middleware Architecture for Users Interfaces in Ambient Intelligent supporting Independent Living. In: I Workshop on Technologies for Healthcare & Healthy Lifestyle, Valencia, Spain (2006)
4. Landucci, L., Baraldi, S., Torpei, N.: Natural Interaction: A Remedy for the Technology Paradox (2008), http://ercim-news.ercim.org/content/view/310/497/
5. Saz, J.T.: El diseño centrado en el usuario para la creación de productos y servicios de información digital. Depto. CC. de la Documentación, Univ. de Zaragoza (2004)
6. Dybkjær, L., Bernsen, N.O., Minker, W.: Evaluation and usability of multimodal spoken language dialogue systems. Speech Communication 43, 33–54 (2004)
Development of Real-Time Face Detection Architecture for Household Robot Applications

Dongil Han, Hyunjong Cho, Jaekwang Song, Hyeon-Joon Moon, and Seong Joon Yoo

Department of Computer Engineering, Sejong University, 98 Gunja, Kwangjin, Seoul, Korea
{dihan,hmoon,sjyoo}@sejong.ac.kr,
[email protected],
[email protected]
Abstract. This paper describes a real-time face detection hardware architecture for household robot applications. The proposed architecture is robust against illumination changes and operates at no less than 60 frames per second. It uses the Modified Census Transform to obtain face characteristics that are robust against illumination changes, and the AdaBoost algorithm is adopted to learn and generate the characteristics of the face data, which are finally used to detect faces. The paper describes the hardware structure, composed of a Memory Interface, Image Scaler, MCT Generator, Candidate Detector, Confidence Mapper, Position Resizer, Data Grouper, and Overlay Processor, which was then verified on a Xilinx Virtex5 LX330 FPGA. Verification using images from a camera showed that a maximum of 16 faces can be detected at a speed of up to 30 frames per second.

Keywords: multiple face detection, MCT (Modified Census Transform), real-time FPGA implementation, hardware design.
1 Introduction

With the development of biometric technologies that use information from the human body, conventional security methods using keys and keypads are being replaced by identity certification methods using the face, iris, fingerprint, retina, etc. Among them, the face-based method is currently used together with other biometric methods in cases where enhanced security is required, such as in companies, banks, and governmental agencies, due to its low cost and the convenience resulting from contactless operation, despite the drawback that relatively large variability is caused by plastic surgery, changes in facial expression, etc. [1]. The method is gradually expanding its scope of application, for example as users are authenticated by face on personal laptops. A reliable face recognition rate necessarily requires face detection technology that can accurately extract faces from images. Face detection algorithms have so far been developed mainly for PC-based environments. These technologies could not efficiently detect a face, however, when applied in an embedded system, due to the relatively limited resources and performance of such systems. The need is increasing even more, however, for high-performance real-time
58
D. Han et al.
face detection technologies for embedded systems as human face information becomes useful in more fields for mobile devices such as cellular phones and digital cameras and as the household robot market, covering cleaning, entertainment, toys, etc. It is known that the face detection performance is most affected by illumination changes, face rotation, and facial expression changes. Illumination changes, among them, can result not from the user's intention but from time, lighting, and the changes in the indoor and outdoor environment though face rotation and expression changes can be controlled in their influence if the user is cooperative with the face detecting system. It is still a challenging task, therefore, to clearly detect a face regardless of illumination changes. This paper extracted only the structural information of an object and transformed the images by MCT(Modified Census Transform), which reduces the influence of illumination changes, to reduce the cost required for illumination compensation, and then presented a hardware structure that instantly detects a face using thus transformed images against illumination changes in an embedded system. It also verified the performance by implementing the structure with Virtex5 LX330 FPGA and testing it in the daily life environment with various illumination changes. This paper consists of five chapters. Chapter 2 describes MCT used in this paper and AdaBoost that generates learned data using transformed images. Chapter 3 describes the details of the whole hardware structure proposed in this paper. Chapter 4 analyzed the results of operating the structure in the PC environment and of testing it in the daily life environment after implementing it by FPGA. Chapter 5 finally discusses the conclusion and the future research plan.
2 MCT and AdaBoost Learning Algorithm

This chapter describes the MCT, which extracts the features required for recognizing a face in an image while minimizing the cost of illumination compensation, and the AdaBoost learning algorithm, which effectively processes the obtained face features. It also discusses the performance and speed of the single-layer structure that is used instead of a cascade structure to minimize the memory needed to store learned data in an embedded system.

2.1 MCT

Ideal operation of a face detection or recognition system requires extracting and using only the structural characteristics of a face from images. MCT [2] expresses the structural information within a 3x3 window, which is moved across the image, as a binary pattern {0, 1}; the window is small enough that the illumination inside it can be assumed to be nearly constant, even though it actually varies, and the pattern captures information on edges, contours, intersections, etc. The resulting binary patterns are converted into decimal numbers, which become the pixel values of the MCT-transformed image. Figure 1 shows MCT-transformed images at different illumination levels.
Fig. 1. Gray level and MCT images at different illumination levels (a: test image 1, b: test image 2, c: MCT result of test image 1, d: MCT result of test image 2)
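To make the transform concrete, the following minimal sketch (Python with NumPy; not part of the original hardware implementation) follows the commonly used Fröba–Ernst formulation of the MCT cited as [2]: each pixel of a 3x3 neighbourhood is compared with the neighbourhood mean, and the nine comparison bits are packed into a single index between 0 and 511. The bit ordering is an assumption; the paper's MCT Generator (Section 3.3) computes the same kind of per-pixel value in hardware.

```python
import numpy as np

def mct(image):
    """Modified Census Transform of a gray-level image (sketch).
    Each output pixel is a 9-bit index describing which pixels of its 3x3
    neighbourhood are brighter than the neighbourhood mean."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint16)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = image[y - 1:y + 2, x - 1:x + 2].astype(np.float32)
            bits = (win.ravel() > win.mean()).astype(np.uint16)
            out[y, x] = int((bits << np.arange(9)).sum())  # pack 9 bits, row-major
    return out
```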
2.2 AdaBoost Learning Algorithm

Viola and Jones [3] extracted features that effectively identify a face using the AdaBoost learning algorithm proposed by Freund and Schapire [4], and organized them into a cascade structure of 38 stages. Fröba and Ernst also introduced a face detector composed of a 4-stage cascade using MCT-transformed images and the AdaBoost learning algorithm [2]. Such a cascade structure removes image regions that are clearly not part of a face in the early stages, using a few features with high discrimination capability, and then uses more features with relatively weak discrimination capability to concentrate on the regions that are difficult to classify. The 4-stage cascade proposed by Fröba and Ernst greatly increased speed by passing all facial regions and rejecting at least 99% of the non-facial regions in the first stage.
Fig. 2. The four classifiers of the cascade detector [2] (the white dots mark positions of elementary classifiers in the analysis window)
The cascade structure requires storing, in the form of lookup tables, the location information of the required features and the face confidence values corresponding to each feature value. In an embedded system, where resources are limited compared with a PC environment, the memory is usually insufficient to store all of this information. A hardware implementation, however, has the advantage of being faster than a software implementation because operations can be processed in parallel. This paper therefore builds the face detector as a single-layer structure using only the last (fourth) stage of the cascade proposed by Fröba and Ernst. The test results show that this single-layer structure can still detect faces effectively in real time, processing 30 frames per second. The details of the detection performance are discussed in Chapter 4.
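To contrast the two structures, the sketch below (Python; illustrative only, with placeholder stage scoring functions and thresholds, none of which are published in the paper) shows how a cascade rejects most windows early, while the single-layer detector used here corresponds to evaluating only the criterion of the last stage.

```python
def cascade_is_face(stage_scores, stage_thresholds, window):
    """Generic cascade evaluation sketch: each stage accumulates non-face
    confidence for the window and rejects it early if its threshold is
    exceeded.  The paper's single-layer detector corresponds to calling
    this with only the final stage."""
    for score, threshold in zip(stage_scores, stage_thresholds):
        if score(window) > threshold:
            return False      # rejected early; cheap for most non-face windows
    return True               # survived all stages: face candidate
```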
3 Proposed Hardware Structure

Figure 3 shows the overall hardware structure proposed in this paper. The Memory Interface block controls the image data streams and the memories used to store images. The Image Scaler down-scales images step by step, the MCT Generator transforms images by MCT, and the Candidate Detector delivers to the Confidence Mapper the MCT values of the 400 pixels of the 20x20 window currently being processed. The Confidence Mapper determines, from confidence values learned offline and stored in a ROM table, the confidence that each pixel does not belong to a facial area and accumulates the 400 values. The accumulated confidence is compared with a preset threshold to find candidate facial areas. The Position Resizer calculates the location in the original 320x240 image of each candidate facial area identified by the Candidate Detector and Confidence Mapper. The Data Grouper removes non-facial background areas from the candidate areas identified by the Confidence Mapper and Position Resizer. Finally, the Overlay Processor marks each location identified as a facial area with a rectangle in the original image and outputs the image on the display.
Fig. 3. Proposed Face Detection Hardware Structure
3.1 Memory Interface

The Memory Interface receives original images from a camera and stores them in two input memory blocks. It also distributes the received images to the Image Scaler and MCT Generator blocks, and receives down-scaled images from the Image Scaler and stores them in the scale-down memory block. In addition, this block generates the synchronization information and image size information required for processing the images and transmits them to the other blocks.

3.2 Image Scaler

The Image Scaler down-scales the images input from the camera in 13 scaling steps. The purpose of this step-by-step down-scaling is to detect faces of various sizes: because the search window is fixed at 20x20 pixels, the original image is repeatedly down-scaled and searched at each step, which makes it possible to detect a face regardless of the size it occupies in the image.
The Image Scaler uses bilinear interpolation to down-scale an image. Table 1 shows the sizes of the down-scaled images and the time required for down-scaling at each step.

Table 1. Size changes and processing time with 54 MHz operating speed

Scaling Step     | Image Size | Processing Time | Accumulated Time
0 (original)     | 320 x 240  | 1.464 ms        | 1.464 ms
1                | 284 x 213  | 1.152 ms        | 2.616 ms
2                | 252 x 189  | 0.910 ms        | 3.526 ms
3                | 224 x 168  | 0.718 ms        | 4.244 ms
4                | 199 x 149  | 0.561 ms        | 4.805 ms
5                | 176 x 132  | 0.440 ms        | 5.245 ms
6                | 156 x 117  | 0.344 ms        | 5.589 ms
7                | 138 x 104  | 0.268 ms        | 5.857 ms
8                | 122 x 92   | 0.210 ms        | 6.067 ms
9                | 108 x 81   | 0.166 ms        | 6.233 ms
10               | 96 x 72    | 0.129 ms        | 6.363 ms
11               | 85 x 64    | 0.101 ms        | 6.464 ms
12               | 75 x 56    | 0.078 ms        | 6.542 ms
13               | 66 x 49    | 0.060 ms        | 6.602 ms
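As a software illustration of the Image Scaler (not the hardware implementation), the image sizes in Table 1 are consistent with a fixed scale factor of 8/9 per step with truncation; this factor is inferred from the table, not stated explicitly in the paper. The sketch below reproduces the 13-step pyramid using OpenCV's bilinear resize as a stand-in for the hardware block.

```python
import cv2  # OpenCV's bilinear resize stands in for the hardware scaler

def build_pyramid(image, steps=13, factor=8.0 / 9.0):
    """Generate the down-scaled images searched by the fixed 20x20 window.
    With factor = 8/9 and integer truncation, the sizes match Table 1
    (320x240 -> 284x213 -> 252x189 -> ... -> 66x49)."""
    pyramid = [image]
    h, w = image.shape[:2]
    for _ in range(steps):
        w, h = int(w * factor), int(h * factor)
        pyramid.append(cv2.resize(pyramid[-1], (w, h),
                                  interpolation=cv2.INTER_LINEAR))
    return pyramid
```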
3.3 MCT Generator

This block consists internally of a window interface block and an MCT calculator block. The window interface block transmits the nine pixels of a 3x3 window to the MCT calculator simultaneously while moving the window across the image. The MCT calculator then performs the MCT operation on each extracted 3x3 window.

3.4 Candidate Detector and Confidence Mapper

For the MCT-transformed images, a 20x20 search window is used to detect candidate facial areas. The search window moves pixel by pixel, starting from the top-left corner and scanning in the horizontal direction. Every position in the window has a unique non-face confidence value for each possible MCT value, and the window is classified as non-face if the sum over the 400 pixels exceeds a threshold. This process is split into two modules, the Candidate Detector and the Confidence Mapper. The Candidate Detector collects the sequentially input MCT image and provides the 20x20 window images, using a total of 19 line memories and a separate delay logic. As shown in Table 2, each pixel of the 20x20 window has a unique confidence value for each MCT value at each location, based on the data learned by AdaBoost. The Confidence Mapper sums the 400 non-face confidence values and outputs the coordinates and the confidence when the total is within the threshold.
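The following minimal sketch (Python; not the hardware implementation) illustrates the scoring performed per 20x20 window, together with the Q8.8 fixed-point conversion used for the confidence ROMs (Table 2 below). The lookup-table contents and the threshold are learned offline and are not published, so they appear here as placeholders; the table is sized 400x512 for simplicity, whereas the hardware uses 400 ROMs of 511 x 16 bits.

```python
import numpy as np

def to_q8_8(value):
    # Q8.8 fixed point as in Table 2: 8-bit integer part, 8-bit fraction,
    # obtained by multiplying by 256 and truncating.
    return int(value * 256) & 0xFFFF

assert to_q8_8(1.302622) == 0x014D  # 00000001 01001101, first row of Table 2

def is_face_candidate(mct_win, confidence_lut, threshold):
    """mct_win:        20x20 array of MCT indices (0..511)
    confidence_lut: 400x512 numpy array of learned non-face confidence values,
                    one row per window position (placeholder for the ROMs)
    threshold:      learned decision threshold (placeholder)
    Returns True when the accumulated non-face confidence stays within the
    threshold, i.e. the window is kept as a face candidate."""
    total = sum(confidence_lut[pos, v]
                for pos, v in enumerate(mct_win.ravel()))  # 400 additions
    return total < threshold
```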
Table 2. Confidence rule table using fixed-point transform: example at window coordinate (1, 1), Q8.8 format (16 bits = integer byte + fraction byte)

MCT Value | Real-number value | Integer part | Fraction part
1         | 1.302622          | 00000001     | 01001101
2         | 1.772520          | 00000001     | 11000101
3         | 0.487639          | 00000000     | 01111100
4         | 0.937634          | 00000000     | 11110000
5         | 0.155170          | 00000000     | 00100111
6         | 0.316182          | 00000000     | 01010000
7         | 0.308015          | 00000000     | 01001110
8         | 1.772520          | 00000001     | 11000101
9         | 1.154538          | 00000001     | 00100111
10        | 0.358852          | 00000000     | 01011011
11        | 0.000000          | 00000000     | 00000000
...
511       | 0.153845          | 00000000     | 00100111
Table 2 shows example non-face confidence values for each MCT value at location (1, 1) in the window. For example, the non-face confidence value is 0.487639 when the MCT value at location (1, 1) is 3; it is 0 when the MCT value is 11, which means that the location is part of a face with high probability. The 400 confidence values are summed over the window to select candidate areas. For the hardware implementation, as shown in Table 2, the real-valued confidence values are converted into Q8.8 fixed-point numbers (16 bits) to produce a 16-bit confidence LUT. The Confidence Mapper was thus implemented with a total of 400 confidence ROMs (511 x 16 bits each).

3.5 Position Resizer

This block calculates, for any candidate facial area found in a down-scaled image, the corresponding location in the original image. To avoid complex real-number calculations and meet real-time requirements, the corresponding coordinates in the original image are computed in advance for all coordinates, converted to integers, and stored in a ROM table, so that the original position can be obtained by a simple LUT read operation.

3.6 Data Grouper

Before the final face detection areas are determined, duplicate areas corresponding to the same face must be grouped: the candidate areas produced by the Candidate Detector and Confidence Mapper are typically detected four to five times around each final detection area.
Fig. 4. Images before (a) and after (b) grouping
As shown in Figure 4, duplicate areas can be detected not only at the same image size but also at different down-scaled sizes. Two areas are considered overlapping when an existing area overlaps at least one quarter of a newly detected area; of the two, the area with the smaller non-face confidence value - i.e., the more face-like area - is kept. To lower the false detection rate and raise the detection reliability, a final detection area is accepted only when the overlap count is at least three (a small software sketch of this grouping step is given after Section 3.7).

3.7 Overlay Processor

This block displays the final face detection result by overlaying the detection areas provided by the Data Grouper on the image. It makes it possible to view the face detection results immediately, without the help of a processor.
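The sketch below (Python) illustrates the grouping rule of Section 3.6 in software: overlap is measured as at least 1/4 of the newly detected area, the candidate with the smaller non-face confidence is kept, and a group is accepted only if it contains at least three overlapping candidates. Details such as the scan order and the exact hardware data path are assumptions, not taken from the paper.

```python
def group_detections(detections):
    """Each detection is (x, y, w, h, confidence) in original-image
    coordinates; a *lower* confidence means "more face-like"."""
    groups = []  # each entry: [best_detection_so_far, overlap_count]
    for det in detections:
        x, y, w, h, conf = det
        for g in groups:
            gx, gy, gw, gh, gconf = g[0]
            ix = max(0, min(x + w, gx + gw) - max(x, gx))
            iy = max(0, min(y + h, gy + gh) - max(y, gy))
            if ix * iy >= 0.25 * w * h:       # overlaps >= 1/4 of the new area
                if conf < gconf:              # keep the more face-like candidate
                    g[0] = det
                g[1] += 1
                break
        else:
            groups.append([det, 1])
    return [g[0] for g in groups if g[1] >= 3]  # require overlap count >= 3
```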
4 Design Verification and Performance Analysis

The verification environment was built around a Virtex5 LX330 FPGA system; 320x240 QVGA camera images converted to gray level were used for verification. The proposed architecture operates at a clock speed of 13.5 MHz and uses 53.01 KB of memory (31.6 KB RAM + 21.4 KB ROM) and 74,632 LUTs (35%).
Fig. 5. Verification environment
The face detection performance was verified using the frontal face test set [9] provided by MIT and CMU, and the Yale test set [11], in which each face appears with different expressions (angry, surprised, etc.) under various lighting conditions. The three MIT+CMU tests A, B, and C include a total of 130 frontal images with 506 faces, plus 50 rotated images with 223 faces in the rotated test set. As shown in Table 3, the overall detection rate was 80.48%.

Table 3. Face detection rate (MIT+CMU and Yale test sets)

Class of test set  | Detection rate     | False positives
Test Set A (CMU)   | 70.41% (119/169)   | 1
Test Set B (MIT)   | 64.94% (100/154)   | 0
Test Set C (CMU)   | 85.25% (156/183)   | 0
Sum of A, B, C     | 74.11% (375/506)   | 0
Yale test set      | 100% (165/165)     | 0
Average            | 80.48% (540/671)   | -
Fig. 6. Results from the Yale test set and MIT + CMU test set
Fig. 7. Rotated face case: correctly detected faces and missed faces
Fig. 8. Output results at various intensity levels
Figure 6 shows successful detections on the test images. For rotated faces, as shown in Figure 7, the system could detect not only frontal faces but also faces rotated within 12° left and right and within 15° up and down. It is noteworthy that the detector showed reliable performance not only under normal illumination but also for a face partly blocked by illumination or for a bright face without any difference in brightness.
5 Conclusion and Future Research Plan

Various face detection algorithms have already been reported. This paper applied MCT to obtain a detection algorithm that is robust against the deterioration of the detection rate caused by illumination, and implemented a high-performance real-time detection engine operating at 30 to 60 fps. This result is meaningful because previously reported face detection hardware engines have not achieved real-time processing, owing to low processing speed and limited performance. Accurate acquisition of face locations with a high-performance face detector can contribute to improvements in face recognition technology.

Despite its good detection rate, the detection engine introduced in this paper still has tasks to solve before it can be fully applied. In images from an indoor robot, unlike those from security systems, the face may not be frontal, because the user cooperates less with the system. Complementary research is therefore under way to establish a model that can detect faces rotated in the lateral or vertical direction, and this is expected to contribute, together with the addition of a recognition function, to the development of a dedicated visual processing chip for robots.

Acknowledgments. This work is supported by ETRI and the Seoul Research & Business Development Program (CR070048). The hardware verification tools are supported by the IC Design Education Center.
References 1. Samir, N., Michael, T., Raj, N.: Biometrics, pp. 63–75. Wiley Computer Publishing, Chichester (2002) 2. Bernhard, F., Andreas, E.: Face detection with the Modified Census Transform. In: 6th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 91–96 (2004)
3. Paul, V., Michael, J.J.: Robust real-time face detection. International Journal of Computer Vision, 137–154 (2004) 4. Yoav, F., Robert, E., Schapire: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 119–139 (1997) 5. Ming-Hsuan, Y., Dan, R., Narendra, A.: A snow-based face detector. In: 12th Advances in Neural Information Processing Systems, pp. 855–861 (2000) 6. Najwa, A., Srivaths, R., Anand, R.M., Niraj, K.J.: Hybrid architectures for efficient and secure face authentication in embedded systems. In: 7th IEEE Transaction on VLSI Systems, vol. 15(3), pp. 296–308 (2007) 7. Su-Hyun, L., Yong-Jin, J.: A design and implementation of Face Detection hardware. In: 44th IEEK Transaction on System Design, vol. 44(4), pp. 43–54 (2007) 8. Duy, N., David, H., Parham, A., Ali, S.: Real time Face detection and Lip feature extraction using Field-Programmable Gate Arrays. IEEE Transactions on Systems, Man and Cybernetics—Part B: Cybernetics 36(4), 902–912 (2006) 9. Rowley, H., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE patt. Anal. Mach. Intell., 22–38 (1998) 10. CMU/VASC Image Database, http://vasc.ri.cmu.edu/idb/html/face/index.html/ 11. Georghiades, A.: Yale Face Database, Center for Computational Vision and Control at Yale University, http://cvc.yale.edu.proje/yalefaces/yales.html
Appropriate Dynamic Lighting as a Possible Basis for a Smart Ambient Lighting Lajos Izsó Budapest University of Technology, Department of Ergonomics and Psychology Egry J. u. 1, blg E. 311, 1111 Budapest, Hungary
[email protected]
Abstract. The objectives of this empirical study were to contribute to the development of an intelligent, adaptive home lighting system for the elderly. The basic idea was that carefully chosen dynamic lighting with seemingly ever-increasing ("up") or ever-decreasing ("down") illuminance can be used to increase (or decrease) the users' activation level as they wish, a change that will be reflected in objective psychophysiological parameters, in objective performance, and also in subjective feelings. The paper examines the effects of two different forms of dynamic lighting - having the same average illuminance over time - on the performance of the number verification task (NVT) by older adults. As a group, the older adults showed no difference between the two forms of dynamic lighting. However, when the individual's sensation seeking needs were taken into account, it was shown that the kind of dynamism influences both the subjective preferences and the objective visual performance. These findings emphasize the importance and sensitivity of individual characteristics of the elderly, which have to be taken into consideration in the design of adaptive lighting systems. Keywords: AAL (ambient assisted living), ambient lighting assistance, dynamic lighting, sensation seeking needs.
1 Introduction: The Concept of the ALADIN Project

This paper summarizes the main results of our dynamic lighting experiments carried out in the frame of the ALADIN project (Ambient Lighting Assistance for an Ageing Population), whose basic concept is shown in Figure 1. According to the Description of Work [1], ALADIN aims at developing an intelligent assistive system based on ambient lighting to support mental alertness and memory performance, as well as relaxation in specific situations. As described there, the system is also expected to assist with regulating circadian rhythms. The ALADIN prototype aims at enabling users to make adaptations tailored to their specific needs and wishes by developing an intelligent control system that is capable of capturing and analysing the individual and situational differences in the psycho-physiological effects of lighting and, based on this, of providing the users with adaptive lighting inputs that fit their actual needs and wishes.
Fig. 1. The basic concept of the ALADIN project
2 The Concept of Dynamic Lighting

Dynamic lighting is, by our definition, lighting whose output parameters vary over time in a way that can be perceived by people. The varying parameters can be illuminance, spectral characteristics, or both. As natural lighting is almost always dynamic, there are probably biological evolutionary mechanisms in humans that make them prefer varying lighting to fixed lighting. The basic idea behind using dynamic lighting is that dynamic lighting with carefully chosen characteristics could induce better performance and lower strain at the same time. A correctly timed change in illuminance or colour temperature periodically stimulates arousal that would otherwise have decreased under fixed lighting due to processes such as habituation, fatigue, monotony, or saturation. Some lighting system manufacturers, such as Philips [2], already provide dynamic lighting systems.

In previous studies carried out at the Budapest University of Technology and Economics - Izsó and Majoros [3], Majoros [4], Izsó [5] - we conducted laboratory experiments on the effect of dynamic lighting on the visual performance and related feelings of young people. These results showed that, compared to fixed lighting:
- dynamic lighting of the same average illuminance leads to a better quality of visual performance, i.e., more accurate work, although there is no difference in the quantity of work;
- dynamic lighting is judged as significantly more stimulating, more pleasant and less tiring;
- dynamic lighting produces higher levels of arousal in physiological terms.
This paper reports a series of new laboratory experiments aimed at identifying the effects on performance and subjective feelings induced by two selected dynamic lighting conditions in older adults. The dynamic changes concerned illuminance only; changes in spectral characteristics were not studied.
3 Method

3.1 Setting, Participants and Sessions

The experiments were carried out in the Indoor Ambient Environment Laboratory of the Budapest University of Technology and Economics. Twelve participants were recruited (8 females and 4 males); their ages ranged from 66 to 84, with an average of 71. The sessions always took place between 13:00 and 17:00. The procedure and timing of each experimental session were as follows:

• Applying the electrodes and giving instructions (8-10 min).
• Performing the computerized version of the number verification task (NVT) for 2 x 16-minute periods (covering 4 cycles of dynamic lighting), with a 4-minute break in between, under one form of dynamic lighting (36 min). During the break the illuminance was 200 lx.
• Completing two subjective rating scales and giving verbal opinions about the lighting (10-12 min).

The NVT was used as an objective measure of visual performance. Several authors - e.g., Rea [6], Rea et al. [7] - have described the development and use of the NVT as a tool for measuring visual performance as a function of different lighting conditions. We used the NVT hit ratio (the ratio of the number of successful hits to the total number of hits) as a measure of performance quality and the NVT absolute hit number (the total number of hits) as a measure of performance quantity. It is known from speed/accuracy trade-off studies that many factors influence whether a person sacrifices accuracy for speed, or the contrary, while performing a task: some people optimize their performance for speed, others for accuracy. It is therefore important to use both speed (quantity) and accuracy (quality) measures simultaneously, which is why both the NVT hit ratio and the NVT hit number were used. These measures were automatically calculated for three different window sizes; the data corresponding to the 60 s window were used in this study. Since the NVT task was very simple and the participants had practiced it beforehand, the possibility of a learning effect could be excluded. It was enough to avoid carry-over effects by conducting the "down" and "up" sessions on different days.

The following psychophysiological measures were recorded during the sessions: the time interval between successive R waves of the electrocardiogram and its variance, the skin conductance level, the skin conductance response, the electromyogram and the respiratory amplitude.

The two subjective rating scales used were the "relaxation" scale and the "pleasantness" scale. These are five-point scales, but in our case half-points were also allowed, so in effect they were 10-point scales. The anchor labels of the two scales were:
1: "perfectly relaxing, calming lighting"
2: "rather relaxing, calming lighting"
3: "neutral lighting between relaxing and activating"
4: "rather activating, stimulating lighting"
5: "perfectly activating, stimulating lighting"

and

1: "perfectly pleasant lighting"
2: "rather pleasant lighting"
3: "neutral lighting between unpleasant and pleasant"
4: "rather unpleasant lighting"
5: "perfectly unpleasant lighting"

In addition to these two scales, the Sensation Seeking Scale (SSS) - Zuckerman [8], [9], [10], [11] - was applied as a standard personality test describing the sensation/stimulation seeking disposition of the participants. We used the second Hungarian version of the SSS (form IV), which, based on more extensive factor analyses, contains four subscales:

1. Thrill and Adventure Seeking: the desire to try risky sports or activities involving elements of speed, movement, and defiance of gravity.
2. Experience Seeking: the desire to seek experience through the mind and the senses; through music, art, travel, and an unconventional style of life with unconventional friends.
3. Disinhibition: the desire for, or actual enjoyment of, uninhibited and socially extraverted activities, e.g., parties, social drinking, and a variety of sexual partners.
4. Boredom Susceptibility: a strong aversion to monotony or a lack of change, and a preference for the unpredictable; restlessness in confining, dull conditions.

The SSS score is simply the sum of the scores on the four subscales.

3.2 Lighting Manipulations

The programmable lighting system consisted of six Zumtobel/Luxmate RCE 2 x 58 W mellow light luminaires arranged in a regular array. The CCT was 3000 K and the CRI was 85. The illuminance-time functions of the suddenly decreasing ("down") and suddenly increasing ("up") dynamic lighting are shown in Figure 2. In both cases the illuminance on the desk varied between 300 lx and 900 lx, with an average of 600 lx. In the "down" condition, the duration of the sudden decrease was 6 s, the length of both stable periods was 60 s, and the length of the slow increase was 354 s. In the "up" condition, the duration of the sudden increase was 6 s, the length of both stable periods was 60 s, and the length of the slow decrease was 354 s. Thus, for both conditions, the total cycle time was eight minutes (480 s). As the whole session contained four complete cycles, the participants performed the NVT task for 32 minutes (1920 s) with a four-minute break in the middle. In this way the test persons received the same light exposure during the two sessions, but with different dynamics. The time parameters were chosen so that the short 6 s down and up changes were clearly perceived by every participant, whereas the 354 s up and down changes were not.
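As an illustration of the "down" illuminance profile, the sketch below (Python with NumPy; not part of the original study) builds one 480 s cycle from the stated segment durations and the 300-900 lx range. The ordering of the segments within the cycle is an assumption consistent with the description and with Fig. 2; with this ordering the time-averaged illuminance is exactly the reported 600 lx.

```python
import numpy as np

def down_cycle(t, lo=300.0, hi=900.0):
    """Illuminance (lx) at time t (s) within one 480 s 'down' cycle:
    60 s stable at hi, 6 s sudden decrease to lo, 60 s stable at lo,
    then a 354 s slow increase back to hi (segment order is assumed)."""
    t = t % 480.0
    if t < 60.0:
        return hi                                   # stable period
    if t < 66.0:
        return hi + (lo - hi) * (t - 60.0) / 6.0    # sudden, clearly perceived drop
    if t < 126.0:
        return lo                                   # stable period
    return lo + (hi - lo) * (t - 126.0) / 354.0     # slow, imperceptible rise

# The "up" condition mirrors this profile (6 s sudden rise, 354 s slow decrease).
profile = np.array([down_cycle(t) for t in np.arange(0.0, 480.0, 1.0)])
assert abs(profile.mean() - 600.0) < 1.0            # matches the 600 lx average
```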
3.3 Hypotheses

Based on our earlier studies - Izsó and Majoros [3], Majoros [4], Izsó [5] - it was taken for granted that either form of dynamic lighting would lead to better-quality NVT performance than equivalent fixed lighting. Our present hypothesis was simply that the form of dynamic lighting influences both the subjective preferences and the objective NVT performance. More specifically, it was assumed that the sudden brightening ("up" dynamic lighting) would be subjectively preferred to the sudden darkening ("down" dynamic lighting) and would therefore induce better NVT performance. This hypothesis was tested using subjective judgments and the overall NVT performance measures.
Fig. 2. The illuminance – time functions of the suddenly decreasing (“down”, first graph) and the suddenly increasing (“up”, second graph) dynamic lighting conditions
4 Results

Contrary to our hypothesis, neither the mean NVT hit number nor the mean NVT hit ratio showed any statistically significant difference between the "down" and "up" forms of dynamic lighting. The only statistically significant effect found was that the standard deviation of the NVT hit number was higher during the "up" lighting condition than during the "down" lighting condition (Wilcoxon test, p = 0.012). This indicates that the hit number was somewhat more uneven during the "up" lighting condition.

Fig. 3. The individual pleasant-unpleasant scores during the "up" lighting condition as a function of individual SSS scores (r = -0.720, p = 0.008)

Fig. 4. The individual mean NVT hit ratio during the "up" lighting condition as a function of individual SSS scores (r = 0.617, p = 0.043)
These results at first seem disappointing, but if the individual differences in sensation seeking, as measured by the SSS score, are taken into account, the overall picture becomes more complex but also makes more sense. For the "up" form of dynamic lighting, the individual SSS scores correlated with both the pleasant-unpleasant scores (Figure 3) and the mean NVT hit ratio (Figure 4). These correlations show that participants with higher SSS scores found the "up" lighting more pleasant and performed better under it, in terms of NVT hit ratio, than those with lower SSS scores.
5 Discussion and Implications for Practice

Neither the mean NVT hit number nor the mean NVT hit ratio showed a statistically significant effect of the form of dynamic lighting. However, when the SSS scores are also considered, it is apparent that the young participants of our earlier studies had much higher SSS scores (13-30) than the older adults studied here (6-20). As shown in Figure 3, all older adults with SSS scores of 12 or higher rated the "up" lighting condition as pleasant. Assuming that the same or a similar relationship held for the young people in our earlier studies, who all had SSS scores of at least 13, it becomes clear (Figures 3 and 4) why they all strongly preferred the "up" lighting to the "down" lighting. Taking our elderly participants' SSS scores into account thus makes the overall picture more understandable. The fact that, during the "up" lighting condition, people with higher SSS scores found the lighting more pleasant and, consequently, also produced a better NVT hit ratio can be explained by their increased need for sensation.

The objective of our empirical study was to contribute to the development of an intelligent, adaptive ambient home lighting system for the elderly. The basic idea was that appropriate dynamic lighting with seemingly ever-increasing ("up") or ever-decreasing ("down") illuminance can be used to increase (or decrease) the users' activation level as they wish, a change that will be reflected in objective psychophysiological parameters, in objective performance, and also in subjective feelings. We consider our finding that the impact of dynamic lighting is moderated by the individual's sensation seeking needs to be important for the design of future adaptive lighting systems, especially for the elderly. Our experience has shown that while practically all the young people involved in our earlier experiments had relatively high SSS scores and therefore all preferred the "up" dynamism, among the elderly the SSS scores are lower overall and distributed such that part of the elderly also enjoy the "up" dynamism, but another part definitely does not.
References

1. Description of Work, Annex I of the project contract of the ALADIN project (approved by EC on 24/11/2006)
2. Philips, http://www.dynamiclighting.philips.com/start_int.html
3. Izsó, L., Majoros, A.: Dynamic Lighting as a Tool for Finding Better Compromise between Human Performance and Strain. Applied Psychology in Hungary, 83–95 (2001–2002)
4. Majoros, A.: Effects of Dynamic Lighting. LUX Europa, Reykjavik, Iceland (2001)
5. Izsó, L.: Developing Evaluation Methodologies for Human-computer Interaction. Delft University Press, Delft (2001) 6. Rea, M.S.: Visual performance with realistic methods of changing contrast. Journal of the Illuminating Engineering Society, 164–177 (1981) 7. Rea, M.S., Ouellette, M.J., Kennedy, M.E.: Lighting and parameters affecting posture, performance and subjective ratings. Journal of the Illuminating Engineering Society, 231–238 (1985) 8. Zuckerman, M.: Dimensions of sensation seeking. Journal of Consulting and Clinical Psychology 36, 45–52 (1971) 9. Zuckerman, M.: Sensation Seeking: Beyond the Optimal Level of Arousal. Erlbaum, Hillsdale (1979) 10. Zuckerman, M.: Sensation seeking: A comparative approach to a human trait. Behaviour and Brain Science 7, 413–471 (1984) 11. Zuckerman, M.: Sensation seeking, mania, and monoamines. Neuropsychobiology 13, 121–128 (1985)
A New Approach for Accessible Interaction within Smart Homes through Virtual Reality V. Jimenez-Mixco, R. de las Heras, J.L. Villalar, and M.T. Arredondo Life Supporting Technologies-Technical University of Madrid ETSI Telecomunicacion-Ciudad Universitaria 28040-Madrid, Spain {vjimenez,rheras,jlvillal,mta}@lst.tfo.upm.es
Abstract. This paper proposes an innovative Virtual-Reality-based interaction strategy integrated with a real domotics platform as a testbed to evaluate the user experience of people with disabilities and their assistants. A living lab has been arranged to analyse and identify the applications with the best acceptance among users, making use of a multimodal approach to adapt the interaction mechanisms to their needs, skills and preferences. A preliminary testing phase validated the system in terms of performance, reliability and usability. A complete evaluation trial is about to be configured for assessing the final system with a wide range of target users. Keywords: virtual reality, domotics, accessibility, living lab, smart homes.
1 Introduction

An increasing number of people currently experience difficulties in performing daily activities, mainly because their close environments are not properly adapted to their needs, skills or preferences, especially in the case of people with disabilities [1]. Pushed by social pressure, public investment and technological advances, plenty of accessible sites (including homes, workplaces and public spaces) are arising, in particular from an architectural point of view. Furthermore, progress in domotics provides improved adapted designs for home appliances, even though there are still important usability concerns about incorporating the latest technologies into everyday life [2].

Emerging Virtual Reality (VR) technologies can offer significant opportunities to support user interaction in technological environments, helping to reduce the existing usability gaps [3]. The idea of integrating VR into domotic spaces as a channel for bringing environment control closer to users opens a very testable research area with promising perspectives for supporting independent living [4].

This paper presents the strategy followed at the Technical University of Madrid to establish a living lab for assessing the user experience of people with disabilities in smart homes, founded on two main technologies: VR and domotics. Further expectations of the research work include the validation of VR-based human-machine interaction in a variety of smart home applications with different profiles of users and interaction devices.
2 Materials and Methods

The smart home infrastructure is located in the Laboratory for Domotic Usability Evaluations at the Telecommunication School of the Technical University of Madrid (UPM). The domotic installation is composed of several home appliances (lights, window blinds, doors, taps, heating) and environment sensors (presence, smoke, gas, flood, temperature), connected through an EIB gateway. An open architecture platform based on an OSGi middleware allows software-based environment control, while a set of web-based interfaces enables secure and personalized local/remote access to the domotic services [5].

A virtual environment has been developed to replicate the real appearance of the laboratory, using Multigen Creator(1) for 3D design and EON Studio(2) for interactivity. Users may interact with the system through a convenient combination of different displays and devices:

• 6 m² Stewart retro-projection screen(3) for stereoscopic glasses
• Trivisio AR-vision-3D(4) stereoscopic binocular display
• 5DT HMD 800 head-mounted display(5)
• 5DT Data Glove 5 for detecting finger motion(5)
• Intersense InterTrax 2 tracking system with 3 DOF(6)
• Stereo sound system
• Different models of tactile screens, mice and keyboards
The implementation of the VR-based solution went through the following phases: 1) graphical design of the virtual elements for a realistic 3D representation; 2) compilation of the individual components into the simulated lab; 3) integration and configuration of multimodal interaction devices (visual, tactile, acoustic and haptic); 4) incorporation of animation and interactivity into the virtual scene; 5) development of a Web-Service-based bi-directional communication interface between the virtual and real environments; and 6) deployment of the VR solution in most web browsers (taking advantage of EON Reality's Web plug-in). The Design for All principles [6] have been considered throughout the design and implementation process, taking into account concepts such as usability, adaptability, multimodality and standards compliance. The resulting architecture of the system is represented in Fig. 1.

Fig. 1. System architecture representation

(1) http://www.presagis.com  (2) http://www.eonreality.com  (3) http://www.stewartfilmscreen.com  (4) http://www.trivisio.com  (5) http://www.5dt.com  (6) http://www.isense.com
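The paper states only that a Web-Service-based interface keeps the virtual scene and the OSGi domotic platform synchronized. As an illustration of such a bridge, the following REST-style sketch (Python with the requests library) uses a hypothetical gateway URL, device identifiers and JSON payloads, none of which are part of the original system.

```python
import requests  # illustrative only; endpoint names and payloads are hypothetical

BASE = "http://smarthome.example.org/domotics"   # placeholder gateway URL

def get_status(device_id):
    """Poll the real device state so the virtual replica can mirror it."""
    return requests.get(f"{BASE}/devices/{device_id}").json()

def send_command(device_id, action):
    """Forward an interaction on a 3D element (e.g. 'open', 'off') to the
    corresponding real appliance through the domotic platform."""
    return requests.post(f"{BASE}/devices/{device_id}/{action}")

# Example: a click on the virtual ceiling light toggles the real one.
# state = get_status("light_ceiling")
# send_command("light_ceiling", "off" if state.get("on") else "on")
```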
3 Design of the Virtual Environment

One of the key factors that support accessibility is the provision of multimodal user interfaces. Multimodality, understood as the possibility of using more than one interaction mode in parallel for communication between user and system, has been implemented in four modes. Visual and acoustic modalities are used for system output, while input to the system is achieved through tactile and haptic methods. A more detailed description is given below:

− Visual: The 3D representation of the virtual lab and its components gives the main feedback to the user: virtual doors have been provided with open/close motion, lights change their colour to differentiate between on/off status (conveniently updating the light conditions in the whole virtual environment), water pours from the tap when it is open, etc. Every interaction option is indicated with a help tool, both textual and graphical, to facilitate the user's navigation through the scene. Moreover, the user can also view the real living lab at any time through a web camera. Several PC screens, VR displays and HMDs allow personalized visualization scenarios.

− Acoustic: In addition to the visual help tools, acoustic messages guide the user in navigating and interacting with the scene: for instance, when the user points at an interactive device such as the door, a voice message informs about the different interaction possibilities (e.g., open/close). Virtual elements are also complemented with acoustic cues to increase the user's feeling of immersion, such as the sound of window blinds rising or water pouring from the tap. This modality is implemented through a stereo sound system, either integrated in the environment or through personal headphones.

− Tactile and haptic: These two modalities allow the user to navigate in the virtual scene and control its elements. Navigation provides the user with a feeling of presence in the virtual environment, as s/he can move all over the scene (forward, backward and around), collide with virtual objects, look in any direction, zoom and even change the point of view. The different interactive devices in the scene (e.g., lights, doors, webcam) can be controlled as well, changing their status through a combination of these modalities. Complete tactile interaction is possible by pressing keyboard buttons, moving and clicking the mouse or touching the tactile screen. Haptic communication is implemented by detecting commands from hand-based gesture recognition (using the VR glove) as well as by aligning the user's perspective with the real-time orientation of the head or arm (through the 3-DOF motion tracker).
Furthermore, the system has been designed according to various guidelines and recommendations in order to make it accessible and user-adapted [7][8]:

− Users are enabled to perform tasks effectively using only one input device, such as mouse, keyboard, tactile screen or data glove.
− The virtual environment provides object descriptions that are meaningful to users (when interactivity is enabled). Visual objects that are primarily decorative and contain little or no information need not be described.
− Users can easily activate a graphical menu to personalize different interaction features (e.g., acoustic signals).
− Users are provided with both acoustic and visual feedback when they interact with the elements in the scene (including alerts).
− Users are allowed to change the point of view and to zoom in the scene so as to find the most comfortable perspective.
− Every interaction has been implemented with at least two modalities: e.g., users can turn the light on/off either tactually (tactile screen, keyboard, mouse) or haptically (data glove), whereas feedback is obtained both graphically (screen, VR display) and acoustically (stereo system, headphones). Although voice recognition is not yet supported, it is being considered as a significant improvement for the system.
− Immersiveness can be adapted according to user preferences, from selecting specific visualization devices (e.g., HMD instead of PC screen) to simulating the user's own hand in the virtual scene. The inclusion of a virtual avatar is under consideration for further research.
− Users can combine the different interaction devices and modalities as they wish, to achieve the most usable solution for them. At present, most VR interfacing devices are wired, presenting tough usability concerns. Emerging wireless gadgets should provide a relevant step forward in this sense.
4 Results

The proposed solution has resulted in a running living lab for testing VR applications in the smart home domain, especially devoted to people with disabilities. This approach lets users move around and interact with home appliances in a virtual environment, allowing them to check and change online the status of real devices directly through the 3D virtual elements (Fig. 2). To keep consistency between the real and virtual environments, status and commands for each device are shared by means of continuous feedback through an Internet connection and a typical web browser.

Fig. 2. Views of the UPM living lab: real picture (left) vs. virtual representation (right)

Because of the collection of displays and interactive devices included, users may experiment with different interaction modalities and degrees of immersion. Visual and acoustic guidance and help tools have also been provided, to facilitate navigation within the interface and make it more intuitive. In addition, by adjusting a number of configuration elements on the user interface, the same interactive application may be validated for different settings and user profiles.

If the virtual environment is disconnected from the real lab, the application can be used by elderly or cognitively disabled people to learn how to manage domotic installations in a non-threatening environment, while those with physical impairments may exploit the system to find the most convenient combination of modalities and interaction devices. By keeping both labs interconnected, confident users can go one step further and use the application directly as a ubiquitous remote control of the smart home, enabling them to check, both indoors and outdoors, the status of any home alarm, or to change the heating temperature in advance. Moreover, carers, relatives or informal assistants are able to monitor, in a non-intrusive way, the real environment of any person requiring external supervision.

A preliminary evaluation phase has been carried out to validate the system in terms of performance, reliability and usability. 25 volunteers assessed combinations of the different displays and interaction mechanisms, both in simulated and real running modes. The results have been satisfying in terms of system usability, supporting the interest in VR technologies applied to smart home interaction. A complete validation procedure is currently being arranged, considering several user profiles including people with disabilities, professional assistants and informal carers.
5 Discussion and Conclusion

VR technologies are in general adaptable to a wide range of individual requirements. In particular, the multimodal approach inherent to VR and the low-effort interaction techniques followed can make VR-based interfaces especially valuable for users with disabilities or special needs. Together, these facts may enhance the variety of accessible solutions for addressing the specific impairments and preferences of each person, especially in terms of interaction limitations. This involves not only physical but also cognitive disabilities:

− People with hearing impairments are perfectly able to use this system, since acoustic output is not essential for correct operation: all acoustic signals have equivalent visual feedback.
− Conversely, acoustic feedback reinforces the ability of a user with low vision to navigate and interact with the environment.
− Someone with limited mobility can benefit from both haptic and tactile modalities to control domotics in an error-tolerant environment, without requiring excessive effort or accuracy.
− The tactile modality combined with acoustic and visual guidance may be useful for children with attention disorders [9].
− Elderly people and people with certain phobias can experiment with new technologies in virtual environments so as to learn, get used to them and overcome their fears [10]: the system offers the option of interacting with just the virtual world, without a real connection to the domotic platform, consequently avoiding any risk related to device operation.

In summary, this paper has presented an innovative approach to accessible user interaction with a smart home control platform through VR. The current research work aims at answering a number of open issues, such as:

− The adequacy of VR for enhancing the user experience of people with disabilities.
− The potential of VR as a widespread, usable human-computer interaction method.
− The convenience of VR for the daily handling of domotic environments.

From the preliminary results of this work, we can conclude that VR shows promising possibilities for providing disabled people with better-adapted access to domotics-related applications, mainly thanks to its capability of integrating different interaction devices. In this sense, the addition of other modalities, like natural-language voice recognition or augmented reality, or of new interfacing devices coming from the emerging generation of intuitive wireless gadgets for entertainment or telecommunication, might be the starting point for definitely spreading VR technologies while fostering their key role in improving accessibility for people with special needs.

Acknowledgments. Part of this work has been accomplished under the TRAIN-ON project, a research initiative partly funded by the Office of Universities and Research of the Community of Madrid and the Technical University of Madrid.
References 1. World Health Organization, World Report on Disability and Rehabilitation (2006), http://www.who.int/disabilities/publications/ dar_world_report_concept_note.pdf 2. Cheverst, K., Clarke, K., Dewsbury, G., Hemmings, T., Hughes, J., Rouncefield, M.: Design With Care: Technology, Disability and the Home. In: Harper, R. (ed.) Inside the Smart Home. Springer, London (2003) 3. Stanney, K.M., Mourant, R.R., Kennedy, R.S.: Human Factors Issues in Virtual Environments: A Review of the Literature. Presence: Teleoperators and Virtual Environments 7, 327–351 (1998) 4. Marsh, H.: Virtual Reality, Human Evolution, and the World of Disability. In: Proceedings of the First Annual International Conference: Virtual Reality and Persons with Disabilities, Center On Disabilities, California State University-Northridge, 79 (1993)
5. Conde, E., Rodriguez-Ascaso, A., Montalva, J.B., Arredondo, M.T.: Laboratory for Domotic Usability Evaluations. In: Proceedings of the I International Congress on Digital Homes, Robotics and Telecare for All (DRT4all), Madrid (2005) 6. Carbonell, N., Stephanidis, C. (eds.): UI4ALL 2002. LNCS, vol. 2615. Springer, Heidelberg (2003) 7. Tiresias web site. Guidelines for the design of accessible information and communication technology systems, http://www.tiresias.org/research/guidelines/index.htm (last access, March 2009) 8. ETSI EG 202 487: Human Factors (HF); User experience guidelines; Telecare services, eHealth (2008) 9. Rizzo, A.A., Klimchuk, D., Mitura, R., Bowerly, T., Shahabi, C., Buckwalter, J.G.: Diagnosing Attention Disorders in a Virtual Classroom. IEEE Computer 37(5), 87–89 (2004) 10. Parsons, T.D., Rizzo, A.A.: Affective outcomes of virtual reality exposure therapy for anxiety and specific phobias: a meta-analysis. Journal of Behavior Therapy and Experimental Psychiatry 39, 250–261 (2008)
A Design of Air-Condition Remote Control for Visually Impaired People Cherng-Yee Leung1, Yan-Ting Yao1, and Su-Chen Chuang2 1
Department of Industrial Design, Tatung University, 40, Sec. 3, Jhongshan N. Rd., Taipei 104, Taiwan, Rep. of China, 2 Department of Special Education, National Taichung University, 140, Min-Shen Rd., Taichung 403, Taiwan, Rep. of China {leung,yty}@ttu.edu.tw,
[email protected]
Abstract. Air conditioners are operated by remote controls that present information mainly through visual cues, which makes them difficult for individuals with visual impairment to use. This research aims to design an air-condition remote control for them. Based on an investigation, a literature review, and discussion among 4 experts, a set of design principles emphasizing consistency, discrimination, efficiency, labeling, and feedback was established. Six main functions, each having 2-4 alternatives, were modeled in ABS accordingly. Twenty visually impaired people volunteered for two experiments: a within-function experiment and a between-function experiment. Using the Friedman test and the LSD method, the 6 functions were classified into three groups - (power switch, temperature setting), (wind speed selection, wind direction selection), and (sleeping mode, time setting) - arranged from top to bottom on the remote control. Through discussion, it was decided to place the braille labels to the left of the function buttons. The implications are discussed in the conclusion. Keywords: Air Condition, Non-barrier environment, Remote Control, User Center Design, Visually Impaired.
1 Introduction

Air conditioning is one of the necessities of modern life. An air conditioner is operated with a remote control, especially in domestic use. The air-condition remote control has reached a mature and stable stage, and most control methods are similar. As shown in Figure 1, the infrared emitter is located at the top of the remote control, followed by the LCD display area showing the current operating status; at the bottom are the press buttons, next to which is text labeling the function of each button. On the air conditioner itself there are lights and icons showing the current operating status, and wind-guiding blades showing the wind direction (Fig. 2). Most remote controls on the market use infrared to transfer information wirelessly (Fig. 3). Users first have to hold the remote control and point it at the air conditioner while pressing the desired buttons. The air conditioner then reacts with sounds and flashing lights to indicate that it has received the signal. The detailed operational information is shown on the LCD display at the top of the remote control. Users then use the remote control to adjust the air conditioner to reach the desired condition.

Fig. 1. An example of a remote control [20] (annotations: infrared emitter, LCD display area, operational buttons with text labels)

Fig. 2. An example of an air conditioner [20] (annotations: icons and display lights, wind-guiding blades showing the wind direction)

Fig. 3. An illustration of using the remote control

However, the information shown on the LCD display area, the text labels next to the press buttons, and the lights and wind-guiding blades on the air conditioner itself are designed for sighted people. That is, almost all the functions on the remote control have been designed to be operated mainly through visual cues. Visually impaired people lack the ability to receive information through the visual sense, which happens to be the primary channel for communicating with the air conditioner; therefore, an individual with visual impairment finds it difficult to use. Taking the non-barrier environment [7, 8, 19] and the user-centered design concept [6, 15, 17, 18] into consideration, and based on the requirements and opinions of visually impaired people in operating air conditioners [13], this research aims to design an air-condition remote control for them.
2 Literature Review

The visual sense is the primary channel for receiving information [21]. Visually impaired people are weak in the visual sense and therefore have to rely on other senses to interact with the outside world [20]. When solving the problems confronting the visually impaired, it is important not to forget those affecting everyday things [3]. Complicated operation processes, and functions and instructions that cannot be discriminated, are the top two problems for the visually impaired in operating home electronic appliances [2]. The air conditioner is one of these hard-to-operate appliances [2]. The purpose of a non-barrier environment is to design and build an accessible environment for disabled people [7, 8, 19]. The idea that all designs have to consider their users is the concept of user-centered design, a broad term describing design processes in which end-users influence how a design takes shape [6, 15, 17, 18].

Burgess [1] described control design principles that include enough space for the hand or fingers to manipulate controls, proper location and activation force, clear indication of operation status, and easily readable labels for instruction and discrimination; he also mentioned that labeling positions must be consistent and that controls should be grouped. Wu [22] added that braille, tactile icons, and other tactile stimuli are necessary whenever the visually impaired are involved. Edman [4] stated that the depth or height of tactile area symbols has to be 0.5–1 mm, and that of tactile line symbols at least 1 mm; he also stated that the area of a tactile symbol should be at least 5.0 x 5.0 mm², with 3 mm between symbols. You and Chen [23] found that the minimum discriminable width of a tactile geometric shape is around 4.28 mm. Frascara and Takach [5] suggested that the height of relief should be more than 1 mm. With voice feedback, braille labels, and buttons of various sizes, the visually impaired performed better in operating a fountain machine [14]. Regarding TV remote controls for elderly users, Hung [10] proposed design guidelines on the layout, size, number, and shape of press buttons. Huang [9] identified suitable color combinations for text against background. Leung and Chen [11] further examined the text patterns used in remote controls for regular users. However, little of the literature concerns air-condition remote controls [12], especially for the visually impaired. Therefore, this study puts effort into this relatively neglected field.
3 Research Design
Three consecutive stages were conducted in this research: 1) the design principle stage, 2) the model building stage, and 3) the experiment stage, which consisted of two experiments: a within-function experiment and a between-function experiment.
3.1 Design Principle Stage
Based on the 41 requirements and opinions of visually impaired people in operating air conditioners [13], from which the 6 main functions they need had been identified, 2 product designers and 2 special educators were invited to discuss the design principles for an air conditioner remote control. Referring to the literature, the characteristics of the visually impaired, and the features of an air conditioner, two issues were addressed: 1) press buttons must be tactually distinguishable in terms of their functions and operation status; 2) feedback must be clear and audible. A set of design principles (Table 1) was set up. These design principles emphasize consistency, discrimination, efficiency, labeling, and feedback.

Table 1. Design Principles

Shape consistency: All the press buttons must be consistent in shape to avoid confusion.
Common icons: Common icons are put on the top of buttons to be tactually distinguishable and to reduce the burden of learning.
Separation: There must be some space between buttons to separate them and make them tactually distinguishable.
Grouping: Buttons performing similar functions have to be grouped together to increase efficiency and readability.
Extruding: Buttons must be extruded from the level of the remote control; relief icons are used on the top of the buttons, and braille is used, so that they are tactually distinguishable.
Single layer: Every function has only one layer, to simplify operation steps.
Label: All the functions are labeled with braille to be readable.
Tactile feedback: All buttons are pressed in an up-and-down direction; activated buttons stay sunken, and non-activated buttons are kept at normal height; this kind of tactile feedback is tactually clear and can show the operation status.
Voice feedback: A "bi-bi" sound is employed to indicate that a button has been pressed successfully, giving clear auditory feedback.
3.2 Model Building Stage
Based on the design principles mentioned above, all the buttons were designed in a round shape and are pressed in an up-and-down direction. Their height is 5 mm above the level of the remote control, and 2 mm after being pressed. Common icons were used, for example, I/O represents power on and off, Δ increasing, and ∇ decreasing. Three design alternatives were worked out for the power switch button, 3 for temperature setting, 3 for time setting, 3 for wind speed selection, 2 for sleeping mode, and 4 for wind direction selection. Models were made of ABS, each corresponding to one of the design alternatives (Table 2). Their specifications are shown in Table 3.
Table 2. Models for each function (images of the ABS model alternatives for the Power Switch, Temperature Setting, Time Setting, Wind Speed Selection, Sleeping Mode, and Wind Direction Selection)
Table 3. The specification (in mm) for the models (single-line and double-line patterns)
3.3 Experiment Stage
Two experiments were conducted: a within-function experiment and a between-function experiment. Twenty visually impaired people (9 males and 11 females), all able to read braille tactually, volunteered for both experiments. They were between 23 and 59 years old (mean 41.4, standard deviation 8.10). In the within-function experiment, each subject was asked to rank the alternatives within each function based on three criteria: status readability, feedback feeling, and preference (Fig. 4).
Fig. 4. Process of the within-function experiment
Fig. 5. Process of the between-function experiment
The collected data were analyzed by the Friedman test. If a significant difference was shown, the least significant difference (LSD) method [16] was employed to perform the post hoc test. Then one or more alternatives within each function were selected to be included in the final model. There were two parts in the between-function experiment: 1) function button layout, and 2) braille labeling position choice. In the former part, each subject was asked to arrange the 6 functions on a remote control from top to bottom based on their preference. The collected data were also analyzed by the Friedman test, and by LSD if a significant result was shown. Then the functions could be grouped. In the latter part, the subjects had to choose either the left side or the right side of the function button to attach the braille labels (Fig. 5). The collected data were analyzed by proportion comparison and discussion.
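As a concrete illustration of this analysis step, the following minimal sketch (not the authors' code) shows how a Friedman test over subjects' rankings of three alternatives could be computed; the ranking data and the use of SciPy are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of the within-function analysis:
# a Friedman test over the ranks that 20 subjects assign to three alternatives.
# The ranking data below are invented for illustration only.
import numpy as np
from scipy.stats import friedmanchisquare

# One row per subject, one column per alternative (A, B, C); rank 1 = best.
ranks = np.array([
    [2, 3, 1], [3, 2, 1], [2, 3, 1], [1, 3, 2], [2, 3, 1],
    [3, 2, 1], [2, 3, 1], [2, 1, 3], [3, 2, 1], [1, 2, 3],
    [2, 3, 1], [3, 2, 1], [2, 3, 1], [3, 2, 1], [1, 3, 2],
    [2, 3, 1], [3, 2, 1], [2, 3, 1], [3, 1, 2], [2, 3, 1],
])

stat, p = friedmanchisquare(ranks[:, 0], ranks[:, 1], ranks[:, 2])
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")

# Only when the Friedman test is significant would a post hoc comparison
# (the LSD method used in the paper) be applied to the rank sums per alternative.
if p < 0.05:
    print("rank sums (lower = better):", ranks.sum(axis=0))
```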
4 Results and Analysis
The results are presented in the following sections.

4.1 Within Function Experiment
Table 4 lists the results of the Friedman test for the within-function experiment. After cross-comparing the results among the three criteria, the alternatives selected within each function are also listed in Table 4.

Table 4. Cross-comparison of alternatives within each function. For every function, the Friedman test statistic (χ² / p) and the LSD grouping are given for the three criteria: status readability (R), feedback feeling (F), and preference (P); N/A = not available. In the LSD groupings, a slash separates groups of alternatives.

Power: R 9.094 / 0.0106, F 6.098 / 0.0474, P 10.300 / 0.0058; LSD grouping C / B A (for R, F, and P); selected alternative: C
Temperature: R 3.100 / 0.2122, F 0.400 / 0.8187, P 6.699 / 0.0351; LSD grouping N/A, N/A, A B / B C; selected alternatives: A, B
Wind Speed: R 9.094 / 0.0106, F 0.400 / 0.8187, P 0.700 / 0.7047; LSD grouping C / B A, N/A, N/A; selected alternative: C
Timer: Friedman test 15.648 / 0.0004; LSD grouping C A / B; selected alternatives: C, A
Sleep: R 3.430 / 0.1800, F 3.430 / 0.1800, P 7.354 / 0.0253; LSD grouping N/A, N/A, B A; selected alternative: B
Wind Direction: R 3.947 / 0.1390, F 3.189 / 0.2030, P 12.603 / 0.0054; LSD grouping N/A, N/A, B D / D A / C; selected alternatives: B, D
In the LSD groupings, the alternatives within each function are listed from the best to the worst; there was no significant difference between alternatives within a group, but there was a significant difference between groups.

4.2 Between Function Experiment
The Friedman test for the function button layout gave χ² = 26.476 (p < 0.0001). The LSD grouping showed that the 6 models, each corresponding to one function, were classified into three groups: (power switch, temperature setting), (wind speed selection, wind direction selection), and (sleeping mode, time setting), arranged from top to bottom on the remote control. Ten subjects preferred the braille labels placed on the left side of the press button, 8 preferred the right side, and 2 had no preference. Since the reading habit in braille is from left to right and from top to bottom, after discussing with all the subjects, they agreed that placing the braille label to the left of the press button is more convenient.
5 Discussion and Conclusion
Air conditioners on the market are operated by remote controls that present information mainly with visual cues. Individuals with visual impairment have difficulty using them [13]. Therefore, they can hardly operate an air conditioner by themselves. Two product designers and 2 special educators worked out a set of design principles (Table 1). Each design alternative was modeled in ABS. After a two-stage experiment, the final model shown in Fig. 6 was obtained.
Fig. 6. The final model of the air conditioner remote control for visually impaired people
This model is meant to be operated tactually. Each function area on the remote control is separated by a 1-mm-high line and labeled with braille on its left side in order to make the function choice unambiguous. The icons on top of the press buttons can be read with the fingers. Activated and inactivated buttons can be distinguished by their heights. The reasons why the subjects preferred these alternatives are shown in Table 5 and may be a good reference for designing an air conditioner remote control for the visually impaired.
Table 5. The reasons why the alternatives were chosen in the final model

Power: single button; simple icon
Temperature: unfamiliarity with regular numbers; few buttons; traditional icons
Wind Direction: the icon had a clear direction and was easily tactually distinguished
Wind Speed: single button; simple icon
Sleep: familiarity with braille; each button has a unique meaning
Time setting: control and display are consistently matched
All the subjects involved in this study stated that they wished this model could be commercialized as soon as possible. They expected that one day they could comfortably operate an air conditioner by themselves and enjoy it. A real prototype will be made in the near future to test its usability. Hopefully, a real working tactile remote control can be realized soon. Acknowledgments. Thanks are given to the Technology Development Association for the Disabled in Taiwan and to all the people involved in this research.
References 1. Burgess, J.H.: Human Factors in Industrial Design: the Designer Companion. TPR, Blue Ridge Summit (1989) 2. Chiou, W.K.: A Study on Tactile Interfaces As an Aid to Home Electronic Appliance for the Visually Impaired. Research report (NSC 89-2213-E-182-005), National Science Council, Taiwan, R.O.C. (2000) 3. Dixon, J.M.: Low-Tech Devices: Do We Have What We Need? The Braille Monitor. In: Proceedings of the Third U.S./Canada Conference Technology for the Blind, National Federation of the Blind (1997) 4. Edman, P.K.: Tactile Graphics. American Foundation for the Blind, New York (1992) 5. Frascara, J., Takach, B.S.: The Design of Tactile Map Symbols for Visually Impaired People. Information Design Journal 7(1), 67–75 (1993) 6. Garrett, J.J.: The Elements of User Experience: User-Centered Design for the Web. New Rider, Berkeley (2002) 7. Huang, C.J.: Accessibility of E-Government Web Sites. Encyclopedia of Digital Government. Idea Group Reference, London (2007) 8. Huang, C.M.: Usability of E-Government Web-Sites for People with Disabilities. In: Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS 2003), Big Island, Hawaii, January 6-9, 2003, IEEE, Los Alamitos (2003) 9. Huang, Y.H.: A Study on the Legibility of Air-Condition Remote Control for the Elderly. Thesis for Master of Science, Department of Industrial Design, Tatung University, Taiwan (2008) 10. Hung, T.C.: The Research of Applied to Solid User Interface for the Elderly: TV Based Remote Controller. Thesis for Master of Science, Department of Industrial Design, National Cheng Kung University, Taiwan (2002)
11. Leung, C.Y., Chen, L.Y.: A Study on the Pattern and Color Contrast of Chinese Characters on the TV Remote Control. Tatung Journal 27, 129–133 (1997) 12. Leung, C.Y., Huang, W.N.: A Study on the Pattern and Color Contrast of Chinese Characters for Air Condition Remote Control. In: The 4th Academic Research Conference, The Chinese Design Association, Tatung University, Taipei, Taiwan, R.O.C., pp. 367–370 (1999) 13. Leung, C.Y., Yao, Y.T., Chuang, S.C.: A Study on Requirements and Opinions in Operating Air-condition for Visually Impaired People. In: International Conference on Special Education and Art Therapy, Taichung University, Taichung, Taiwan, R.O.C. (2008) 14. Lin, S.C.: The Research of Drinking Machine Interface for Visually Impaired Users. Thesis for Master of Science, Department of Industrial Design, National Cheng Kung University, Taiwan (2000) 15. Moggridge, B.: Designing Interactions. MIT Press, Cambridge (2007) 16. Montgomery, D.C.: Design and Analysis of Experiments, 7th edn. John Wiley & Sons, New York (2009) 17. Preece, J., Rogers, Y., Sharp, H.: Interaction Design: Beyond Human-Computer Interaction, 2nd edn. Wiley & Sons, New York (2007) 18. Saffer, D.: Designing for Interaction. New Rider, Berkeley (2007) 19. Steinfeld, E., Danford, G. (eds.): Enabling Environments: Measuring the Impact of Environment on Disability and Rehabilitation. Kluwer Academic/Plenum Publishers, New York (1999) 20. Tatung Tau-yuan 1st Plant Website (2008), http://www.tatung.com/B5/air/index.htm 21. Wan, M.M.: Assistive Technology Devices and Services for Persons with Visual Impairment, 2nd edn. Wu-Nan Publishers, Taipei (2007) 22. Wu, C.Y.: A Study on Designing the Optimal Parameters of Hierarchical Menu System on 3C Products for Visually Impaired People. Thesis for Master of Science, Department of Industrial Design, Tatung University, Taiwan (2006) 23. You, M.L., Chen, W.Z.: A Study on Applying Tactile Symbols to Assist the Visually Impaired in Recognizing the Operational Function on Products. Journal of the Chinese Institute of Industrial Engineers 15(1), 9–18 (1998)
Verb Processing in Spoken Commands for Household Security and Appliances Ioanna Malagardi1 and Christina Alexandris2 1 Educational & Language Technology Laboratory Department of Informatics & Telecommunications National and Kapodistrian University of Athens, HELLAS
[email protected] 2 National and Kapodistrian University of Athens, HELLAS
[email protected]
Abstract. The present paper concerns the handling of verbs in the Speech Recognition Module of an HCI system for the remote control of household security and the operation of household appliances. The basic language used is Modern Greek, but the system's design includes the basis of a multilingual extension for the use of the system by native speakers of other languages. Human–computer communication should preferably be accomplished in natural language, and some methods of Artificial Intelligence can contribute to solving the natural language processing problems involved. The target of a multilingual extension of the system has imposed the restrictions that commands are kept simple and that referring expressions such as deictic noun phrases and pronouns, as well as anaphoric expressions, are avoided. The interaction with the system is strictly based on dialogs with restricted options in order to increase the feasibility of the speech interface. Keywords: speech recognition, natural language processing, motion verbs, interlinguas.
1 Introduction
The present paper proposes the architecture of a system for the handling of verbs in the Speech Recognition Module of an HCI system for the remote control of household security and the operation of household appliances. The basic language used is Modern Greek, but the system's design includes the basis of a multilingual extension for the use of the system by native speakers of other languages, to fulfil the needs of the multinational workforce in Greece today. The target of a multilingual extension of the system has imposed the following two restrictions: (1) Commands are kept simple, and referring expressions such as deictic noun phrases (i.e., "this window"), deictic pronouns (i.e., "this, that") [6] and pronouns related to anaphoric expressions ("it", "they") are avoided. (2) The interaction with the system is strictly based on dialogs with restricted options. Thus, dialog management does not involve processing conversations with the system [4].
Commands are restricted to simple orders, in the form of imperatives, expressing three types of actions. The first action is the movement of an object (change of position), the second is the opening or closing of an object (change of state), and the third is putting one object on another object (change of relation). Even for a simple graphical representation on the computer's screen, the physical attributes of the objects and the principles of geometry and physics have to be considered. Here, the actions concerned only involve actions related to change of state.
2 Understanding and Managing Verbs
Understanding the imperatives requires understanding the meaning of actions such as "open", "close", "put" and the meaning of prepositional words such as "on". The system integrates the meanings of the constituents and produces a meaning of the sentence as a whole, taking pragmatic factors into consideration where appropriate. Having done so, the system constructs a plan for the execution of the task in the environment. Only then can the system perform the action in the given environment. Among the many issues involved in the comprehension of imperatives in a physical domain and the execution of the underlying tasks, it is of crucial importance to represent the meanings of verbs and prepositions in order to characterize the underlying actions. Thus, we describe a simple representation in which movements denoted by action verbs can be expressed in a manner that can be implemented in terms of a computer program.
Suppose an agent is asked to perform the following commands in a suitable environment: "Open the door", "Open the bottle", "Close the box", and "Put the book on the desk". Each of these sentences specifies an underlying task requested of an agent. In order to perform the task, the performing agent has to "understand" the command. Understanding the imperatives requires understanding the meaning of actions such as "open", "close", "put" and the meaning of prepositional words such as "on". The agent must integrate the meanings of the constituents and produce a meaning of the sentence as a whole, taking pragmatic factors into consideration where appropriate. Having done so, the agent must construct a plan for the execution of the task in the given environment. Only then can the agent perform the action. All of the above steps need to be followed, regardless of whether the agent is human or program-controlled, such as an animated agent in a computer graphics environment or a robotic agent [8].
2.1 The Complexity Factor in Expressing Motion
Motion can be indicated by a verb either directly or indirectly. The simplest way to specify the motion of an object is by using a verb that specifies motion in a straightforward manner. An example verb is "move", as used in the sentence "Move the chair from the wall to the table". It simply directs the system to execute a motion with the chair as the affected object. Indirect specification of motion can be achieved in two ways: in terms of geometric goals, or in terms of a force. Indirectly specifying motion in terms of a goal involving a physical relationship among objects is quite common among verbs. Consider the sentence "Put the bottle on the table". The instruction requires that a physical object be moved (i.e., the bottle) with the goal of establishing a physical relationship (the relationship of "on") between it and another physical object (i.e., the table). The
performance of such an instruction demonstrates that the goal of establishing a physical relationship drives the motion of the first object. For verbs such as "put" that specify motion in terms of a geometric goal, the properties of the objects that participate in the underlying action are of crucial importance. Apart from these verbs, there is another way to specify motion indirectly: by specifying a force rather than the actual motion itself. In these cases too, we have to focus primarily on the physical characteristics of the actions that underlie motion verbs. In order to do so, we need to obtain physically realizable representations for the meanings of such verbs.
One source of the multiplicity of meaning of a command is the multiplicity of the senses of a word as recorded in a dictionary. Another source is the possibility of an object being placed on a surface in different ways. For instance, when the user submits a command, the agent, in order to satisfy the constraints of the verb, may ask for new information and knowledge about objects and verbs, which may be used in the future. In this case, a machine-readable dictionary would be used, providing the definitions of the verbs [8]. For example, suppose the user enters the command "open the door". The agent isolates the words of the command and recognizes the verb "open" and the noun phrase "the door". The verb "open" appears in the lexicon with a number of different definitions. For example, in the LDOCE [12] we find, among others, the following senses of "open": a) to cause to become open; b) to make a passage by removing the things that are blocking it. The agent finds in the knowledge base that there are two alternative ways of interpreting the verb "open", using either a "push" or a "pull" basic motion. It then selects the first one and asks the user if this is the right one. When the user enters a "Yes" answer, this is recorded in the knowledge base and the process terminates. When the user enters a "No" answer, the process continues, trying sequentially all the available alternatives until a "Yes" answer is given by the user [9]. The above-presented scenario, involving a knowledge base and a sequence of questions and answers performed between the user and the agent (system), may provide a more rigorous and sublanguage-independent approach with respect to motion verbs and their arguments (objects); however, it entails difficulties in its implementation in spoken and multilingual applications.
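The question-and-answer scenario sketched above can be made concrete in a few lines of code. The knowledge base, the candidate basic motions and the function name below are illustrative assumptions, not the agent described in [8], [9]:

```python
# Illustrative sketch of the verb-disambiguation dialog described above.
# The knowledge base and all names are hypothetical, not the cited system.
knowledge_base = {
    # verb -> candidate basic motions, tried in sequence
    "open": ["push", "pull"],
    "close": ["push", "pull"],
}

def interpret(verb, obj):
    """Propose basic motions for the verb until the user confirms one."""
    for motion in knowledge_base.get(verb, []):
        answer = input(f'Perform "{verb} the {obj}" with a "{motion}" motion? (yes/no) ')
        if answer.strip().lower().startswith("y"):
            knowledge_base[verb] = [motion]   # remember the confirmed reading
            return motion
    return None  # no reading accepted; further knowledge would be needed

# Example call: interpret("open", "door") first proposes "push", then "pull".
```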
3 Input Management
In the present system, physically realizable representations for the meanings of motion verbs concern actions to be performed with respect to household security and appliances. The requested actions comprise user queries or system output concerning (1) the performance of an action [7], for example "Lock all the windows" / "All windows are locked", or (2) a check [7], for example "Is the central heating on?" / "The central heating is turned off". The set of lexical entries in state and action types may be paired with phrases or expressions initiating sentences that constitute queries with respect to (a) actions ("Action") that the user asks to be performed or (b) objects that the user wishes to be checked ("Check").
Table 1. Relation of Use Case, Function and Code

Use Case 1: House-Security function, code HOUSE
Use Case 2: Appliances-Control function, code APPLIANCES
Input management for the Speech Recognition Module is based on the use of keyword lists. Keyword lists are linked to user input control in the form of keyword groups. Keyword recognition includes a number of yes–no question sequences of a Directed Dialog [15], [16]. The use of directed dialogs and yes–no questions aims at the highest possible recognition rate for a very broad and varied user group. Additionally, the use of selected keywords allows the efficient handling of ambiguous "multitasking" verbs, which typically occur in Greek [3]. "Multitasking" verbs are related to multiple semantic meanings and are used in a variety of expressions existing in Modern Greek, and possibly in other languages as well. For example, in Greek, at least in the sublanguage related to the communication context of commercial activities, the semantically related verbs "buy", "get" and "purchase" may be used in similar expressions as the (primitive) verbs "is" and "have", as well as the verbs "give" and "receive", to convey the same semantic meaning from the speaker [14], [3].
Keywords constituting the actual elements recognized by the system may be divided into three main categories: (a) elements consisting of keywords that are mapped to closed and relatively small lists, (b) elements that are mapped to open databases, such as names (addresses and locations may be added), and (c) elements consisting of numbers that may include information such as quantity, address or date. The type of input recognized by the Speech Recognition Module focuses on (at least) two types of keywords within the speaker's utterance, namely the type of action requested by the speaker and the type of object or activity related to the requested action. "Time" is an optional parameter added to this basic form. This approach may be formally described in its basic form as a template related to the type of content of the utterance: [(OBJECT) + (ACTION-TYPE)].

Table 2. Keyword categories recognized by the Speech Recognition Module (basic form)

Category Object: [(OBJECT) + (ACTION-TYPE)], OBJECT = (HOUSE-FEATURES), (APPLIANCE-TYPE)
Category Action: [(OBJECT) + (ACTION-TYPE)], ACTION-TYPE = OPEN, CLOSE, START, STOP, CHECK
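The template can be pictured as a small record with one slot per keyword category. The class and field names below are illustrative, not part of the described module:

```python
# One possible representation of the basic utterance template
# [(OBJECT) + (ACTION-TYPE)] with the optional Time parameter (names illustrative).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    action_type: str            # OPEN, CLOSE, START, STOP or CHECK
    obj: str                    # a HOUSE-FEATURES or APPLIANCE-TYPE keyword
    time: Optional[str] = None  # optional DAY-OF-WEEK / RELATIVE-TIME / CLOCK / DATE

# e.g. Command(action_type="OPEN", obj="television") for "Turn on the TV"
```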
The actual main components of the speaker's response, constituting keyword categories, may be described in the present application as a sublanguage-specific set of closed-list categories (a), such as (HOUSE-FEATURES), (APPLIANCE-TYPE) and (ACTION-TYPE), and open categories (b) for names (and possibly addresses and place names (PLACE) in a future extension of the system), as well as keyword categories related to temporal information and quantitative expressions (Time).
Keywords grouped under ACTION-TYPE involve expressions related to activities such as activating the alarm or checking whether the power supply is turned off. Specifically, keywords constituting ACTION-TYPE are expressions related to requested actions to be performed with respect to household features and appliances, for example "Turn on TV", "Open the door" or "Turn off the oven". Additionally, keywords constituting ACTION-TYPE include expressions related to checking the operation of household features and appliances, for example "Is the gas switched off?" and "Is the alarm on?". The logical relations of the two types of keywords to be recognized by the Speech Recognition Module and the actual words related to each closed list are described by Table 2 and Table 3, respectively.

Table 3. Words related to the closed lists and open lists of keyword categories recognized by the Speech Recognition Module

ACTION-TYPE (function types OPEN, CLOSE, START, STOP, CHECK):
OPEN = activate, activated, open, opened, running, switch-on, switched-on, turn-on, turned-on
CLOSE = close, closed, de-activate, de-activated, shut, shut-down, switch-off, switched-off, turn-off, turned-off, turnoff, stop
START = run, start, started, begin
STOP = stop, stopped, pause, paused
CHECK = inform, check, see, look

Objects (OBJECT):
HOUSE-FEATURES = doors, windows, alarm, garage-door, door-lock, camera
APPLIANCE-TYPE = central-heating, gas, electricity, power-supply, lights, oven, refrigerator, washing-machine, television

Time:
DAY-OF-WEEK = Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, Weekend
RELATIVE-TIME = today, tomorrow, yesterday
CLOCK = twelve o'clock, half past two, ten fifteen
DATE = February the eighteenth, March the third
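A minimal keyword-spotting sketch over (part of) these closed lists is given below; it only illustrates the mapping idea and is not the actual recognition module:

```python
# Keyword spotting over part of the closed lists of Table 3 (illustration only).
ACTION_KEYWORDS = {
    "OPEN":  {"activate", "open", "switch-on", "turn-on"},
    "CLOSE": {"close", "de-activate", "shut", "switch-off", "turn-off"},
    "START": {"run", "start", "begin"},
    "STOP":  {"stop", "pause"},
    "CHECK": {"inform", "check", "see", "look"},
}
OBJECT_KEYWORDS = {
    "HOUSE-FEATURES": {"doors", "windows", "alarm", "garage-door", "door-lock", "camera"},
    "APPLIANCE-TYPE": {"central-heating", "gas", "lights", "oven", "television"},
}

def spot_keywords(utterance):
    """Return the (ACTION-TYPE, OBJECT) keywords found in the utterance."""
    tokens = utterance.lower().replace("?", "").split()
    action = obj = None
    for t in tokens:
        for act, words in ACTION_KEYWORDS.items():
            if t in words:
                action = act
        for words in OBJECT_KEYWORDS.values():
            if t in words:
                obj = t
    return action, obj

print(spot_keywords("turn-on the lights"))  # ('OPEN', 'lights')
print(spot_keywords("check the alarm"))     # ('CHECK', 'alarm')
```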
Lexical entries composed of more than one word that have to be processed by the system as a single expression are presented with a dash "-" between the components. The limited set of lexical entries is chosen according to the criteria of simplicity and directness, in order to avoid as much as possible (1) ambiguities with respect to the speech recognition component and (2) complications in the user's/hearer's understanding of the system output constituting natural or synthetic speech [1], [2]. For example, expressions such as "activate" (a device, a program), although in general practice regarded as highly appropriate and correct by professionals and the computer literate, may appear rather unusual or even incomprehensible to a considerable percentage of users, such as the elderly or non-native speakers.
Therefore, the present system also allows simpler expressions such as "open" to be recognized and mapped to the same command or information as a more "appropriate" expression such as "activate". This basic set of lexical entries and the respective dialogs allow additional development according to the needs of the Use Cases, with corresponding utterances and lexical additions to the keyword groups.
4 User-Friendly System Output
System output with respect to the information on the objects is related to the limited set of lexical entries in state and action types described above. The above-described pairing of lexical entries with phrase or expression types also foresees and allows the default handling of less-than-perfect speaker utterances, since spoken language is characterized by fragmented syntactical structures. The chances of ungrammatical pairings of lexical entries with phrase or expression types are accounted for; however, they are predicted to be very limited with native speakers.
The effort must be made for the utterances produced by the system to be (1) clear and unambiguous towards the user, but at the same time (2) friendly and natural-sounding, and, in addition, to contain expressions that, from a semantic aspect, (3) constrain the range of the user's possible responses to a minimum, thus restricting as far as possible the probability of ambiguities and misinterpretations regarding user input. Thus, the system must be compatible with the criteria of successful operation at the Utterance Level (informativeness, intelligibility, metacommunication handling, i.e., repetition, confirmation of user input/pauses), the Functional Level (ease of use/functional limits, initiative and interaction control, processing speed/smoothness) and the Satisfaction Level (perceived task success, comparability to a human partner, trustworthiness) [13]. The Speech-Act-oriented approach in the steps of the dialog structure for spoken technical texts is targeted to meet the requirements of "Precision", "Directness" and "User-friendliness", summarizing the criteria of informativeness, intelligibility and metacommunication handling on the Utterance Level (Question-Answer Level) [13], the Functional Level (initiative and interaction control) and the Satisfaction Level (perceived task success, comparability to a human partner and trustworthiness) [13].
5 Multilingual Extension of the System
Although keyword-group user input may vary according to the language or even according to the user, this type of input cannot deviate considerably from being restricted and hence manageable for multilingual applications, allowing minimal interference of language-specific factors. In an attempt to meet the needs of the diverse community of foreign residents, the present system allows the use of Interlinguas (ILTs) as semantic templates for a possible multilingual extension of the present dialog system.
The Interlinguas are designed to function within a very restricted sublanguage, with a rigid and controlled dialog structure based on Directed Dialogs, most of which involve yes–no questions or questions directed towards keyword answers. The structure of the proposed Interlinguas is based on a strategy for filtering user input for the efficient handling of both ambiguous and "multi-use" expressions used for expressing multiple types of information [3]. Traditional ILTs [5], [10], [11] are constructed around a verb-predicate signaling the basic semantic content of the utterance, the so-called "frame". Thus, for example, the utterance "I am booked for Friday" is signaled by the frame "booked". In the present application, the role of the "frame" in the Interlingua structure is weakened and the core of the semantic content is shifted to the lower level of the lexical entries (Table 4). The "frame" level will not signal the meaning of the sentence: this task will be performed by the lexical entries. The proposed Basic Interlinguas [3] may be characterized as Interlinguas with an accepting or rejecting input function [3] rather than traditional Interlinguas with the function of summarizing the semantic content of a spoken utterance.

Table 4. Basic Interlingua

"Frame" types: ACTION, CHECK
Keyword categories (for each frame): WHAT (OBJECT), WHERE (PLACE), WHEN (TIME)
Object types: OBJECT (HOUSE), OBJECT (APPLIANCE)
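One possible encoding of this weakened frame, with the semantic content carried by the keyword slots, is sketched below (class and method names are illustrative assumptions):

```python
# Possible encoding of the Basic Interlingua of Table 4 (illustrative names only):
# the frame merely accepts or rejects the input as ACTION or CHECK, while the
# semantic content is carried by the WHAT / WHERE / WHEN keyword slots.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BasicInterlingua:
    frame: str                   # "ACTION" or "CHECK"
    what: Optional[str] = None   # OBJECT: a HOUSE or APPLIANCE keyword
    where: Optional[str] = None  # PLACE (foreseen in a future extension)
    when: Optional[str] = None   # TIME

    def accepts(self):
        """Accept the input only if the frame type is valid and WHAT is filled."""
        return self.frame in ("ACTION", "CHECK") and self.what is not None

# e.g. BasicInterlingua(frame="CHECK", what="central-heating").accepts() -> True
```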
6 Conclusions and Further Research
The processing of spoken commands involving movement by an HCI system intended for the broad public, with an envisioned extension to multilingual applications, entails a well-structured approach in the design phase. For the remote control of household security and the operation of household appliances, the above-presented approach facilitates speech recognition, thus contributing to the quality, usability and safety of the system. The next step is the integration of the above-proposed strategy into the Speech Recognition Module, its evaluation, and its subsequent adaptation to multilingual applications.
References 1. Alexandris, C.: Word Category and Prosodic Emphasis in Dialog Modules of Speech Technology Applications. In: Botinis, A. (ed.) Proceedings of the 2nd ISCA Workshop on Experimental Linguistics, ExLing 2008, Athens, Greece, August 2008, pp. 5–8 (2008) 2. Alexandris, C.: Show and Tell: Using Semantically Processable Prosodic Markers for Spatial Expressions in an HCI System for Consumer Complaints. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 13–22. Springer, Heidelberg (2007)
3. Alexandris, C.: The CitizenShield Dialog System in Multilingual Applications. In: Proceedings of the National Conference in Knowledge Management and Governing Systems, Hellenic Society of Systemic Studies – HSSS 2007, Piraeus, Greece, May 12-14 (2007) 4. Bos, J., Ota, T.: A spoken language interface with a mobile robot. Artificial Life and Robotics 11, 42–77 (2007) 5. Dorr, B., Hovy, E., Levin, L.: Machine Translation: Interlingual Methods. In: Brown, K. (ed.) Encyclopedia of Language and Linguistics, 2nd edn., ms. 939 (2004) 6. Foster, M.E., Gurman-Bard, E., Guhe, M., Hill, R.L., Oberlander, J., Knoll, A.: The Roles of Haptic-Ostensive Referring Expressions in Cooperative, Task-based Human-Robot Dialogue. In: Proceedings of HRI 2008, Amsterdam, The Netherlands, March 12-15, 2005 (2008) 7. Heeman, R., Byron, D., Allen, J.F.: Identifying Discourse Markers in Spoken Dialog. In: Proceedings of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, Stanford (March 1998) 8. Kontos, J., Malagardi, I., Trikkalidis, D.: Natural Language Interface to an Agent. In: EURISCON 1998 Third European Robotics, Intelligent Systems & Control Conference Athens. Published in Conference Proceedings "Advances in Intelligent Systems: Concepts, Tools and Applications", pp. 211–218. Kluwer, Dordrecht (1998) 9. Kontos, J., Malagardi, I., Bouligaraki, M.: A Virtual Robotic Agent that learns Natural Language Commands. In: 5th European Systems Science Congress. Heraklion Crete. Res. Systemica., vol. 2 (October 2002) (special issue), http://www.afscet.asso.fr/resSystemica/accueil.html 10. Levin, L., Gates, D., Lavie, A., Pianesi, F., Wallace, D., Watanabe, T., Woszczyna, M.: Evaluation of a Practical Interlingua for Task-Oriented Dialogue. In: Proceedings of ANLP/NAACL-2000 Workshop on Applied Interlinguas, Seattle, WA (April 2000) 11. Levin, L., Gates, D., Wallace, D., Peterson, K., Lavie, A., Pianesi, F., Pianta, E., Cattoni, R., Mana, N.: Balancing Expressiveness and Simplicity in an Interlingua for Task based Dialogue. In: Proceedings of ACL 2002 Workshop on Speech-to-speech Translation: Algorithms and Systems, Philadelphia, PA (July 2002) 12. Longman Dictionary of Contemporary English, The up-to-date learning dictionary. Editor-in-Chief Paul Procter. Longman Group Ltd., UK (1978) 13. Moeller, S.: Quality of Telephone-Based Spoken Dialogue Systems. Springer, New York (2005) 14. Nottas, M., Alexandris, C., Tsopanoglou, A., Bakamidis, S.: A Hybrid Approach to Dialog Input in the CitizenShield Dialog System for Consumer Complaints. In: Proceedings of HCI 2007, Beijing China (2007) 15. Williams, J.D., Witt, S.M.: A Comparison of Dialog Strategies for Call Routing. International Journal of Speech Technology 7(1), 9–24 (2004) 16. Williams, J.D., Poupart, P., Young, S.: Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management. In: Proceedings of the 6th SigDial Workshop on Discourse and Dialogue, Lisbon (September 2005)
Thermal Protection of Residential Buildings in the Period of Energy Crisis and Its Influence on Comfort of Living Przemyslaw Nowakowski Wroclaw University of Technology, Department of Architecture 54-210 Wroclaw, Str Prusa 53/55, Poland
[email protected]
Abstract. Over the past few years, energy prices have soared all over the world. Apart from the growing costs of vehicle fuels, the prices of energy used in flats have risen. They take a substantial share of domestic budgets. This poses a problem mainly for residents in countries where the building industry has been using less advanced technologies. The solutions applied there refer both to the conditions of residential resources management and to the selection of building and decor materials. A response to the need for reducing energy consumption in residential buildings comprises successively corrected and tightened legal rules and regulations concerning the thermal protection of buildings. Also, the users themselves undertake independent initiatives aimed at the reduction of exploitation costs. However, improving the thermal insulation of buildings is a costly venture, and the profits from its implementation become noticeable only after many years. The paper discusses technical tendencies in the thermal protection of newly erected residential buildings (passive buildings, among other things) as well as of older ones subject to so-called thermal modernisation. What is more, the paper concerns the influence the buildings have on living comfort and the possibilities of counteracting negative consequences as far as the influence of thermal insulation technologies on people, the natural environment and the technical condition of buildings is concerned. Keywords: residential buildings, low energy consumption, living comfort.
1 Introduction: Energy Safety and Policy for Renewable Energy Resources
One of the important challenges for a contemporary country is to ensure energy safety. This safety implies maintaining such economic conditions as allow the demand for fuel and energy to be met while simultaneously complying with environment protection requirements. Recently, the basic factor deciding the safety of energy delivery has been the reliability of fuel delivery systems. However, technological progress nowadays allows the application of solutions that are to a large degree independent of network systems. Among them are technologies that use renewable energy resources, which are of particularly great interest in Europe. A more general application of these technologies allows us to
achieve two fundamental goals. The first is to increase energy safety in Europe by reducing dependence on imported fossil fuels such as petroleum, natural gas and coal. The second is to reduce greenhouse gas emissions, carbon dioxide in particular, resulting from the combustion of fossil fuels. In order to minimise the risk of lowering the energy safety of a country, e.g., because of interference for political reasons, technical breakdowns or strikes, fuel deliveries are diversified in terms of their kinds and directions. The economic aspect of safety amounts mainly to ensuring energy carrier prices accepted by consumers. In the ecological aspect, however, safety is connected with maintaining a possibly intact condition of the natural environment, which requires governments to meet appropriate standards and ecological commitments. The usage of renewable energy resources and of new "clean" technologies of energy production is strongly promoted nowadays.
Climatic conditions do not allow the human environment to be entirely cut off from energy resources. Even buildings constructed according to the best technologies need to be warmed up or cooled down and lit. Everyday appliances also require a power supply. The acquisition and combustion of traditional energy resources such as coal, crude oil and gas negatively affect the environment. What is more, these resources are running low and will soon be exhausted. Therefore, applying renewable natural energy resources on a larger scale is becoming a necessity. It will become one of the major priorities of power industry development in the following years. The rational use of energy from natural renewable origins, which means water and wind energy, solar energy, geothermal energy and biomass, is one of the important components of sustainable development and brings measurable ecological, energetic and social effects. The increase in the contribution of renewable energy resources to the fuel and energy balance of the world, observed since the 1980s, contributes to improving the efficiency of energy use and to saving natural resources, improving the condition of the natural environment (mainly by reducing carbon dioxide emissions to the atmosphere and reducing the amount of waste produced). It has been estimated that since that time the use of solar energy around the world has doubled and the use of water energy has risen four times, and it is still on the increase. Thus, supporting renewable energy resources is becoming a more and more serious challenge to nearly all countries around the world, and to Europe in particular, which is reflected in many EU programmes [3].
2 Heating Costs in Buildings
It has been estimated that in the European Union countries the use of energy in buildings amounts to about 40% of the total energy use. In residential buildings, energy is mainly used for heating and, alternatively, cooling rooms (about 60%), heating up water (about 25%), and preparing meals, lighting rooms and supplying power to household appliances (the remaining 15%). In public utility buildings, the share of energy used for heating and cooling rooms is similar to the level in residential buildings; however, more energy is used for lighting systems and electrical appliances. The amount of energy used at home depends on many factors, such as the house design, the selection of materials used for its construction, the residents' activity and the
manners of realising their living needs (the needs connected to heating up rooms, preparing warm water for use, lighting, etc.). Elements that have a considerable influence on heat loss include: the heated surface of a building, the house shape, the insulation of thermal partitions, the materials used for the building's construction, the selection of thermal insulation materials or of insulation and construction materials, the room layout and arrangement (the orientation of rooms according to the geographic directions), the number and size of windows, constructions favourable to the emergence of thermal bridges, and the ventilation system.
Heating costs depend on energy and fuel prices on the world market. In recent years, a particular price increase has been recorded for crude oil and gas, since their prices vary according to the current political situation around the world. Also, a general trend of increasing energy prices has been noticed, which is a consequence of the continuous development of the global economy, which despite increased efficiency of energy use generates growing energy demands, particularly in dynamically developing countries such as China and India. The growing energy prices are also a result of the progressing exhaustion of natural resources. Exploitation costs constitute a large part of household expenditures. They result mainly from considerable energy use. Growing energy prices force people to minimise heat loss. It has been estimated that in older buildings, which do not meet the requirements imposed upon modern buildings, the heating cost per square metre may be about two and a half times higher. In order to maintain the monthly heating costs of such a building at a rational level, some property owners refrain from heating rooms and thus decide to use the building in thermal discomfort [4]. Thermal modernisation may be a measure for improving the microclimatic conditions in buildings of an old type and for reducing maintenance costs. It involves the introduction of changes which eliminate considerable heat loss in buildings. Major thermal modernisation works include: the exchange or repair of windows and external doors, the warming of walls and roofs, warming the floor on the ground, improving ventilation systems, introducing appliances that use renewable energy resources, modernisation and exchange of the heating system in a building, etc.
3 Energy-Saving Living Buildings
Nowadays, the term "energy-saving building" is being used more and more frequently, since energy efficiency has become an important feature of a building, and in the next years it is going to become a requirement generally imposed on builders and users. In Germany, regulations that determine the energy-consuming potential of a building (the energy used for heating it) have been in force since 1995. The potential is to be at the level of 50–100 kWh per square metre of a house within a year. What is more, it is expected that this value will decrease to the level of 30–70 kWh/m²a. In Switzerland, houses using energy below the level of 55 kWh/m²a are regarded as energy-saving [4], [6]. In Poland, newly erected buildings are characterised by a considerable energy-consuming potential reaching 120 kWh/m²a. Therefore, the use of energy at the level of 90 kWh/m²a may be deemed a borderline value below which we can talk about an energy-saving building. However, it should be taken into account that the borderline will soon be lowered to 70 kWh/m²a.
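The figures above can be read as simple classification thresholds. The helper below merely restates them; the labels, the function name and the default limits are illustrative and not normative (the 15 kWh/m²a passive-house threshold is discussed further below):

```python
# Illustrative classification by annual heating energy demand (kWh per m2 per year),
# using the borderline values quoted in the text; not a normative definition.
def classify_heating_demand(kwh_per_m2_year, energy_saving_limit=90.0, passive_limit=15.0):
    if kwh_per_m2_year <= passive_limit:
        return "passive house standard"
    if kwh_per_m2_year <= energy_saving_limit:
        return "energy-saving building"
    return "conventional building"

print(classify_heating_demand(120))  # typical new Polish building -> conventional
print(classify_heating_demand(70))   # below the borderline -> energy-saving
print(classify_heating_demand(14))   # -> passive house standard
```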
Growing exploitation costs, more and more strict usage requirements, and also care for the natural environment force contemporary investors to seek new concepts and technologies of erecting buildings. It has become necessary to abandon traditional techniques and implement more energy-saving solutions. Introducing systemic energy-saving solutions leads to the transformation of traditional buildings into low-energy objects or so-called passive houses. The concept of a passive house emerged in Europe, in Germany, in the 1980s. In Europe, about 10 000 houses of this type have been built so far. The concept has also been implemented in objects of general use, particularly in kindergartens, schools and offices [9].
The essence of passive building is maximising energy gains and minimising heat loss. In order to meet these conditions, all external divisions and partitions must have a low coefficient of heat penetration. What is more, the external building layer has to be air-tight and provide good protection against heat loss. Similarly, the window joinery has to cause less heat loss than the standard one used so far. The supply-and-exhaust (blowing-in and blowing-out) ventilation system, in turn, decreases the heat loss connected with building ventilation by 75–90%. What is particular about passive buildings is the fact that the demand for heat is satisfied by thermal gains from solar radiation and by the heat generated by equipment and the people staying in the building. Only in periods of particularly low temperatures is the air blown into the rooms heated up.
The passive house standard can be regarded as a synonym of a highly energy-saving building, since it requires a very small amount of energy for heating. The basic condition required of a house aspiring to meet this standard is its advanced energy-saving quality: the building cannot require more than 15 kWh/m² per year for heating, and no more than 60% of the air can escape from the building in an hour, which is evidence of its air-tightness. These requirements are very high compared with casual, traditional, even insulated buildings. In a passive house, thermal comfort is ensured by passive energy resources that previously went unnoticed. These can be only its residents, electrical appliances, solar energy and the heat recovered from ventilation. Frequently, the building does not need an active autonomous heating system [6], [8].
The basic manner of reaching the standards of a passive building is very good thermal insulation. A thick layer of insulation, however, lessens the usable surface of a house while the external size of the building stays unchanged. Good thermal insulation allows a considerable decrease in the energy needed for heating a house. However, the heat loss and the need to additionally heat up a house in the winter cannot be eliminated totally. In fact, each house is equipped with many waste heat sources (such as light bulbs, household appliances and multimedia), but normally the heat they generate is not sufficient. Therefore, passive buildings can utilise solar energy for heating. The southern walls then have to have big windows, which let in more sunlight in the winter. Three-pane windows contribute to a considerable decrease in heat loss. An optional element of a passive house is solar collectors installed on the roof, which use solar energy for heating the entire building and heating up water for daily use.
The air for ventilation is first led through a ground heat exchanger, where it is slightly pre-heated. Then it is heated in a recuperator, which takes part of the heat from the used air leaving the building. This makes it possible to
significantly minimise the loss of energy for ventilation. Additionally, the same exchanger and recuperator may help to cool down the air taken from outside in the summer (Fig. 1). A well-designed building must comply with many very important conditions, which frequently exclude each other. The process of designing energy-saving buildings is complex and requires a lot of knowledge and experience. The functional arrangement and the technological solutions ought to comply with the assumed low use of energy. Planning an energy-saving house, a passive one in particular, should take into consideration the following requirements: a compact shape of the building (best if it is connected with other buildings); the main facade of the building oriented towards the south (the windows directed towards the south should be as large as possible, while the northern ones should be as small as possible); a lack of construction elements that would shade the passive house (and hinder the use of passive solar gains as a means of heating the house); the use of Venetian blinds, roller blinds, awnings or protruding arcades (in the summer they protect against overheating the interior); the best possible thermal insulation of the construction elements of the house (usually U ≤ 0.15 W/m²K) and air-tightness of the building; the use of windows with the lowest possible permeability; the elimination of thermal bridges; the grouping of wiring and plumbing (e.g., situating the bathroom next to, above or below the kitchen); minimising the length of wires and pipes; and insulating and sealing ventilation and heating ducts.
Fig. 1. Principle of a passive house operation [7]
Modern installations constitute a complicated technical system whose efficient operation is possible only by using computer-controlled appliances. These appliances should control not only the heating and ventilation systems, but also the lighting, telecommunication and protection systems. Thus, in the future an energy-saving building will at the same time be an "intelligent" building. Additionally, an appropriate greenery design on the building plot may prove very helpful for maintaining appropriate temperatures in a passive building during the entire year (Fig. 2). Planting deciduous trees in front of the southern elevation guarantees
Fig. 2. Insulation of a house by deciduous trees in the summer and winter time [1]
shade in the summer season, when protection against high temperatures is very important. In the winter, when the plants lose their leaves, the sun rays come through the unshaded windows, which is a significant source of heat in the general energy balance of the building. On the northern side of the building, in turn, coniferous greenery is better, since it protects the building against the cold winds that cause undesirable chilling. The concept of passive buildings is at present the major proposal for erecting energy-saving buildings, and it has a chance to turn out successful and popular in practical application. A mere 10–20% increase in investment outlays may result in energy savings even at the level of 80–85%. What is more, immeasurable ecological advantages – a decreased use of the dominant energy resources (coal, gas and fuel oil) – are an additional gain. Such houses, however, are still a minority in the building industry. Investors give up the additional outlays, which are recouped only after 8–10 years, accepting at the same time increased energy use and exploitation costs [8].
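The payback figures quoted above can be illustrated with a simple calculation; all absolute cost figures in the sketch are hypothetical and only chosen to lie within the quoted ranges:

```python
# Simple payback estimate; the absolute amounts are hypothetical and only
# illustrate the 10-20 % extra outlay and 80-85 % savings quoted above.
extra_investment = 18_000.0       # additional outlay for the passive standard (assumed, EUR)
annual_heating_cost = 2_500.0     # conventional annual heating bill (assumed, EUR)
savings_share = 0.80              # 80-85 % energy saving

annual_savings = annual_heating_cost * savings_share
payback_years = extra_investment / annual_savings
print(f"simple payback: {payback_years:.1f} years")   # about 9 years
```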
4 Comfort in Rooms and Temperature Ranges
In an energy-saving house, a buffer arrangement of rooms is very important. It means that the same temperature does not have to be maintained in all the rooms. The temperature should be adjusted to the room's function. Maintaining an optimal level of temperature in living rooms allows thermal energy to be saved. Reducing the temperature in several living rooms will additionally decrease the use of thermal energy. Reducing the temperature by 1°C may save up to 6% of thermal energy. In the living room and in the children's rooms the temperature should be 20–21°C, since residents spend most of their time in those places. In the bedroom, even a temperature of 16–18°C is sufficient. The lowered temperature makes sleep better, and it is not stuffy. The bathroom should be well heated – the temperature there should remain at the level of 22–24°C. In the kitchen, the temperature may remain at the level of 18°C, as additional heat is generated during the cooking process. In other storage accommodation units (pantries, compartments, laundry
rooms), a temperature of merely 12–15°C is enough, and in the garage a maximum of 4–8°C. At the same time, when determining the temperatures in rooms, a rule ought to be observed that the temperature difference between neighbouring rooms should not exceed 8°C. Thanks to this, relatively thin partition walls can be used without additional thermal insulation. This principle is applied in most house designs, in which the arrangement of rooms is designed in such a way that it forms a buffer zone. The garage, for instance, has to adjoin the household rooms (the laundry and pantry rooms), which, in turn, are to adjoin the kitchen and then the living rooms [2].
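The temperature guidance above amounts to a small set of rules that can be checked automatically; the room names, setpoints and adjacencies in the sketch below are illustrative assumptions:

```python
# Check of a room-temperature plan against the guidance above: recommended ranges
# per room type and at most an 8 degC difference between neighbouring rooms.
# Room names, setpoints and adjacencies are illustrative assumptions.
RECOMMENDED = {   # degrees Celsius, taken from the text
    "living room": (20, 21), "children's room": (20, 21), "bedroom": (16, 18),
    "bathroom": (22, 24), "kitchen": (18, 18), "pantry": (12, 15), "garage": (4, 8),
}
MAX_NEIGHBOUR_DIFF = 8.0

def check_plan(setpoints, adjacencies):
    problems = []
    for room, temp in setpoints.items():
        low, high = RECOMMENDED.get(room, (temp, temp))
        if not low <= temp <= high:
            problems.append(f"{room}: {temp} degC outside the recommended {low}-{high} degC")
    for a, b in adjacencies:
        if abs(setpoints[a] - setpoints[b]) > MAX_NEIGHBOUR_DIFF:
            problems.append(f"{a}/{b}: difference exceeds {MAX_NEIGHBOUR_DIFF} degC")
    return problems

plan = {"living room": 21, "kitchen": 18, "pantry": 13, "garage": 6, "bathroom": 23}
neighbours = [("living room", "kitchen"), ("kitchen", "pantry"), ("pantry", "garage")]
print(check_plan(plan, neighbours))   # -> [] (the buffer arrangement keeps the plan valid)
```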
5 Progress in the Energy-Saving Building Industry
Contemporary technology enables full control of the heating and humidity processes in buildings. This is possible in a building that is maximally tight, very well insulated, and equipped with automatically controlled central heating appliances and mechanical ventilation with heat recovery. Windows in such a building are used for lighting the house interior with daylight and are mostly not designed to be opened. Automatic appliances provided with sensors of external and internal temperature control the ventilation and heating systems. This basic system can easily be further developed and improved by connecting it with a system of solar collectors, and also with a system of roller blinds, curtains and shutters regulated automatically depending on the outside temperature and insolation conditions.
Complying with the severe requirements of the energy-saving building industry, and of the passive one in particular, requires applying specific building materials and installation systems. Following the normative recommendations then causes considerable limitations in the selection of materials for thermal insulation and of manners of ventilation, limiting the latter to mechanical supply-and-exhaust ventilation. As materials for thermal insulation, foamed polystyrene and, more seldom, mineral wool are recommended and generally propagated. They are, in fact, characterized by a high coefficient of thermal insulation; however, their influence on the chemical microclimate of the interior and on the natural environment is harmful. The production of these materials is energy-consuming, and their use is connected with the emission of substances harmful to the surrounding area (e.g., styrene from polystyrene or tiny fibres from glass wool). Additionally, as used materials they are not biodegradable and require special storage. A favourable alternative to the mentioned insulation materials may be mats made of cork, ground cellulose, wood fibres, sheep wool, etc. In spite of the fact that these materials have slightly worse insulation qualities and are more expensive, they are at least neutral to people's health and they are biodegradable; therefore, they are ecological.
Long-term usage of mechanical supply-and-exhaust ventilation and the lack of appropriate supervision (which may particularly occur in detached houses) may cause significant contamination of the ducts and of specific components such as filters, ventilators, etc. A contaminated mechanical ventilation system may spray dust, saprophytes and pathogenic bacteria [1], [5]. The mechanical ventilation has to be driven by electric energy and has to be subject to control and cleaning. This, in turn, increases the real exploitation costs of a house, which are not always appropriately calculated at the design stage.
6 Summary

The current energy crisis and the growing costs of house maintenance result from multiple international conflicts, the monopolization of traditional resource supplies, and the growing demand for them. Interest in natural renewable resources remains passive, and the potential of local resources is used too little. It is average energy consumers who suffer the consequences of the ineffective energy policies of particular countries, in the form of continually growing house running costs. Apart from the necessary diversification of energy resources, directions and forms, modern building systems whose application brings measurable reductions in energy use should be promoted. The aim should therefore be to convert the present building model into an energy-saving and ecological building industry, adjusted to the principles of sustainable development, which means one that does not threaten people or the natural environment. This refers first of all to materials whose production does not require considerable resource and energy outlays, materials that during the construction and use of a house will not harm living organisms, and materials which, after a building is pulled down, may be reused. It also involves installation systems and appliances for removing rubbish and waste in a way that enables their cleaning, utilization and recycling. It requires the introduction of new solutions, technologies and materials, as well as a change in designers’ attitudes and in the habits of house users. This is a process that will take many years, and the changes will occur gradually.
References
1. Alternative energy resources, http://www.drewnozamiastbenzyny.pl
2. Comfort in rooms, temperatures, http://www.termodom.pl
3. Energetic safety, http://www.termodom.pl
4. Energy saving building, http://www.termodom.pl
5. Fritsch, M.: Handbuch gesundes Bauen und Wohnen. Deutscher Taschenbuch Verlag, Munich (1999)
6. Müller, G., Schluck, J.: Passivhäuser. Bewährte Konzepte und Konstruktionen. Kohlhammer (2007)
7. Passive house, http://en.wikipedia.org/wiki/Passive_house
8. The passive house, http://www.dompasywny.pl
9. What is a passive house? http://www.termodom.pl
Design for All Approach with the Aim to Support Autonomous Living for Elderly People in Ordinary Residences – An Implementation Strategy
Claes Tjäder
The Swedish Institute of Assistive Technology (SIAT), Box 510, SE-162 15 Vällingby, Sweden
[email protected]
Abstract. Most elderly people want to remain in their ordinary home. Products and services are available which make it possible to support autonomous living with a high quality of life. This paper deals with finding ways to implement DfA supportive technology in co-operation with housing enterprises. Methods used at a series of workshops to single out measures from different perspectives are described.

Keywords: Elderly, autonomous living, implementation strategy, DfA technology support, real estate and/or housing enterprises.
1 Introduction

Already today, and even more so tomorrow, the ordinary residence will be a place where home care and health care support are given. Supportive Design for All (DfA) technology can be one ingredient in a coherent strategy to realise the desire of most elderly people to remain in an ordinary residence. This paper deals with the challenge of how to introduce technology support into the flats and jointly used facilities of blocks of flats.
2 Demographic Development

With a growing proportion of the population over the age of retirement, and a predicted rise in the proportion of people with disabilities, the cost of social and institutional care will grow. EU citizens nowadays live 8 years longer than they did 30 years ago. Improved health and living conditions result from measures taken at the European, national and regional levels. A woman born in the EU today has an average life expectancy of 81.4 years, a man 75.3 years, and these figures will continue to rise. Being able to remain longer in the ordinary residence is important from several perspectives. The most important one is obvious: most elderly people want to remain in their familiar environment as long as possible because daily life functions there. Another aspect is the lack of good alternatives to the existing residences. The number of elderly people is increasing every year, and in a few years the number of elderly
people will exceed the level of today, essentially as a result of the great numbers of children born during 1940–1959. This paper deals with a possible contribution to enhancing quality of life, reducing social and economic costs, and developing a strategy for supporting an ageing population in Europe. Technology can be one ingredient in such a strategy, with the aim of enhancing quality of life.
3 Technology Is There

In a national three-year project (Technology Support at Home), SIAT has been working together with user organisations, municipalities and housing enterprises on developing and testing supportive technology for elderly people with cognitive impairments. These target groups place high demands on usability. As more people grow old, more people will face cognitive difficulties at first and cognitive impairments later on. When testing and developing products, services and methods together with users with cognitive difficulties and impairments, only those with an accessible design will meet the user requirements. On the other hand, products, services and methods that meet the requirements of users with cognitive difficulties will most likely also be easy to use for other user groups. This could be a way forward for developing supportive technology initially designed for fragile groups into mainstream products and services. The results of the practical three-year project showed that, of 60 IT-based technical support devices and installations, four out of five had a positive effect. The devices concerned smart home technology and cognitive support, and they built on routines the users already had. The evaluations, carried out by occupational therapists, assessed increases in autonomy and ability.
4 Who Is the Customer?

Some products are now being launched on the market. Reminder systems, for example, integrated into ordinary residences and using the same IT infrastructure as systems for energy efficiency and for individual metering of water or energy consumption in blocks of flats, are of course beneficial for real estate owners. Families with children could also enhance their quality of life with a DfA product such as the “go out button”, which reminds tenants, when leaving the residence, if they have forgotten to switch off an electrical appliance or the water, or have not locked the door properly. This kind of system can now be seen in some new ordinary blocks of flats. At the same time, it must be said that the introduction takes time; this is the case in many countries. Findings from Swedish research on the elderly indicate that early introduction of technical support is vital for the user. It has also been demonstrated that the introduction of technical support should be seen in conjunction with other measures such as support from relatives, house adaptations or home care from the municipality. There are no systems available which have integrated all the technologies and designed services around them, even though the technology has been available for many years.
Could technical support be part of the public undertaking towards the elderly, or is it a responsibility of the elderly themselves? From a company perspective it is paramount to clarify that kind of question. In my country we can already see some installations of supportive technology in line with the DfA concept, made by real estate companies as part of a comfort package included in the monthly rent.
5 Implementation Strategy

In the following, a process will be described. The aim of the exercise is to come forward with some kind of “recommendation” or “good advice” for housing enterprises and/or real estate owners. The ambition is to achieve some concrete results during 2009.

5.1 Aims of an Implementation Strategy

The following aims have been formulated:
− to make it possible for as many elderly people as possible to live an autonomous life with high quality in an ordinary residence they have chosen themselves;
− to establish co-operation between several actors – public and private – targeting a “win-win” situation;
− to introduce technical support in a context together with other measures aimed at increasing physical and cognitive accessibility, and with social measures such as home care support;
− to raise awareness of products and services available on the market and to clarify the sharing of the burden between society and the elderly themselves.

5.2 Supportive Technology for All – A Process Initiation

Within SIAT, and in co-operation with external actors, a process has been initiated to concretize the implementation strategy, with the aim of facilitating the introduction of products and services in blocks of flats. The target group for the process is housing enterprises and real estate owners that have already demonstrated an interest; in our case this means municipality-owned and co-operatively owned blocks of flats. Basically, the process is built on the insight that the gap between the social sector and the real estate industry needs to be narrowed. Recently SIAT was involved in a project aiming to define the requirements of a future broadband network, not only for new blocks of flats but also for existing blocks of flats when a major maintenance investment is needed, such as the replacement of water and sewage systems. The initiator of that project was the federation of municipality-owned enterprises, which wanted to take future social and health care services into account when planning future IT infrastructure in ordinary residences. Recently, the result of this specific project has been used both in the planning process for a new residential area and in a process where a big municipality-owned company in the Stockholm area is using the result when planning major maintenance investments in
residences for the elderly. This is an example that could help create more business cases in the future.

5.3 Driving Forces and Method Chosen – Where to Start?

The needs lie with the end users, which is why the products and services must take this as their starting point. Just as crucial is the fact that the companies developing products and services must have in mind how the product or service is to be marketed and who is to pay. A strong interest in making the product or service commercially viable is extremely important. Up to now there has been a huge difference between consumer products and services on the one hand and assistive products on the other when it comes to marketing and driving forces. As the market conditions vary, companies acting on the procured market and companies on the ordinary consumer market use different strategies. Within the framework of workshops, which also involved representatives from the private sector, the strategy for implementation was outlined. It was stated at the beginning that it would be useful to limit the scope to some real estate owners. As a result of that consideration, the focus in the workshops was on municipality-owned companies and co-operatively owned (tenant-owned) blocks of flats. The municipality-owned companies have a double responsibility. They have to make sure that the municipality-owned blocks of flats constitute a good and competitive alternative for present and new tenants. Municipalities also have a responsibility, via the social care services, for the elderly and their housing conditions and for persons with disabilities. As a result of the different roles the municipalities hold, SIAT drew the conclusion that it would be natural to address the municipality-owned enterprises when starting the work on a possible implementation strategy. Some co-operatively owned (tenant-owned) blocks of flats have also demonstrated a genuine interest in technology support of a DfA character. Some have also clearly expressed an interest in taking measures to support tenant owners so that they can remain in their ordinary residence as long as possible with a high quality of life. Investigating and getting a better picture of what these co-operatively owned blocks of flats have already done was a natural activity. It was seen as part of the strategy that co-operation with these organisations could be very fruitful. Especially when it comes to disseminating a future strategy, these co-operatively owned blocks of flats, with their own experiences, could be very useful partners, as we want to reach out with a message on the usage of DfA supportive technology.

5.4 What Was the Purpose of the Workshop Series?

The workshop series included hands-on activities. It was a combination of presentations and individual and group activities, such as:
1. Listing proposals for, and prioritizing, the products and services that were seen to be important to housing enterprises when they decide on major maintenance investments or initiate new production.
2. Creating an understanding of what is needed to make it possible for good products and services to come into use.
3. Initiating the development of strategies which lead real estate owners and/or housing enterprises to invest in technology support that makes it easier for elderly people to remain in their ordinary residences with a high quality of life.
The result from the opening workshop is shown in Figure 1.
Fig. 1. Results from the opening Workshop
At the end of a subsequent workshop we discussed possible criteria for selecting certain products and services. The main purpose of the selected products and services was to improve housing and living conditions for elderly people and people with impairments. That does not exclude other groups of users, such as relatives or care personnel, from benefiting from the products and services chosen. During the first two workshops the main task was to select products and services; some examples are given in Figure 1. At the end of workshop 2 we discussed the criteria for selecting certain products and services. It was also questioned whether the criteria should be somewhat different; perhaps a better way would be to scrutinize how the products and services should be marketed in order to be successful with the real estate owners.
It was mentioned that the term assistive technology is stigmatizing to some people; it would be better to use the term technology support, as that expression is more neutral. The idea was to focus on how to make the presentation. Another conclusion was to concentrate on functionalities and to describe the usefulness for most tenants from different perspectives, rather than on the special needs of elderly people and people with impairments. One example was the usefulness of an easy interface for the booking system in a shared laundry room. Easily recognizable and understandable cognitive symbols are another example of “products” that are easy to explain from a usefulness perspective. The aim of workshops 1 and 2 was to choose and create methods and descriptions in order to increase real estate owners’ interest in:
• The supply of good products to facilitate ordinary daily activities at home
• The value of technology support for the tenants
• The value for real estate owners of the selected products and services
• The installation of the “recommended” products and services
5.5 Selection Criteria

The next phase included a method for describing efforts and effects. The products and services were first positioned individually by every workshop participant in the diagrams shown in Figures 2 and 3, and later discussed collectively during the workshop. A product placed in the top left is presumably less interesting than a product placed in the right-hand corner along the x-axis, see Figure 2. In Figure 3 the products were positioned in relation to the overall goal for most elderly people,
Fig. 2. Importance for remaining at home
Fig. 3. Effort for remaining at home, versus time and cost
Fig. 4. Complexity of the approach followed
namely the extent to which the product or service can help them remain at home in their ordinary residence. In my presentation at the HCI Conference 2009 I will share examples from the workshops of products and services positioned in these figures; as examples, I have inserted some products in the figures. The activities needed to go all the way from the producer to the end user are built on an interchange of contacts between several organizations. If you are not aware of which groups need to be satisfied, you might end up with good products remaining pilot projects. Meeting the economic conditions and understanding the complexity is crucial. In Figure 4 an effort has been made to illustrate this complexity when several actors are taken into account.

5.6 Obstacles and the Way Forward

At one workshop we tried to identify obstacles which make it difficult to introduce the supportive technology. The solution, we believe, lies in finding the argument which neutralises the problems and, if possible, demonstrates new opportunities instead. During this exercise we used the “six thinking hats” method, a model developed by Edward de Bono. He claims that in a changeable world you have to focus on what is going to be instead of what is, and that thinking is something that can be developed. The “six thinking hats” method is used worldwide, from preschools to global companies.
Fig. 5. The “hat model”
The purpose of the “hat model” is to gain a better understanding of the structure of thinking. As an example, by starting with an analysis of possible arguments against a certain product (the black hat), we elaborated some problematic areas. This was followed by exercises where we used, for example, the “green hat” (corresponding to creativity and new ideas), where the group came up with proposals, new arguments and solutions. In the links between the actors there is a need for information and selling activities, as described in Figure 5. In practice this means that it is not always easy to
release different products from a variety of suppliers. One supplier or distributor with a well-known brand and well-functioning channels can reach the market much more easily than many small suppliers with weak brands. It is of vital importance to take this into account if you want to work effectively and in a result-oriented manner.

5.7 How to Speed Up the Process?

The three-year project – Technology Support at Home – mentioned at the beginning of the paper is now being followed up by several measures. The technology is there, but there is no comprehensive “package” to deliver. A key issue is that real estate owners and housing enterprises have to be involved as users of the products and services already available today. Real estate owners lack good examples of what could be done with products and services to make it possible for the elderly to remain in their ordinary residences. Activities in my country and elsewhere can contribute to speeding up the implementation of supportive DfA technology. Our ambition in Sweden is to present guidelines and requirements concerning physical accessibility and technical support for actors involved in planning and procuring housing for the elderly. The process described above aims to complement these guidelines with some sort of “recommendations” on which of the products and services available on the market fulfil the DfA requirements. At the conference I will be able to elaborate on some results from the process described above.
Speech Input from Older Users in Smart Environments: Challenges and Perspectives
Ravichander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals
1 Centre for Speech Technology Research, School of Informatics, University of Edinburgh
2 Institute for Creative Technologies, University of Southern California
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. Although older people are an important user group for smart environments, there has been relatively little work on adapting natural language interfaces to their requirements. In this paper, we focus on a particularly thorny problem: processing speech input from older users. Our experiments on the MATCH corpus show clearly that we need age-specific adaptation in order to recognize older users’ speech reliably. Language models need to cover typical interaction patterns of older people, and acoustic models need to accommodate older voices. Further research is needed into intelligent adaptation techniques that will allow existing large, robust systems to be adapted with relatively small amounts of in-domain, age appropriate data. In addition, older users need to be supported with adequate strategies for handling speech recognition errors.
1 Introduction

Older people are an important user group for many types of smart environments, ranging from sophisticated home automation systems to state-of-the-art environmental control systems. Speech can form an important interface for smart home environments because it is hands-free and enables potentially richer interactions. Spoken interaction is of particular benefit for people with mobility restrictions, such as those caused by rheumatism and arthritis, which affect one in three adults over the age of 65 in the UK1. Speech input and output is also very useful for visually impaired people: 10% of the population aged 65-74 in the UK is visually impaired2.
1 National Statistics: Morbidity: Arthritis is more common in women. http://www.statistics.gov.uk/cci/nugget.asp?id=1331. Last visited 27/01/09.
2 Rosemary Tate, Liam Smeeth, Jennifer Evans, Astrid Fletcher, Chris Owen, Alicja Rudnicka: The prevalence of visual impairment in the UK. A review of the literature. Royal National Institute for the Blind. Last retrieved 15/02/2009. www.rnib.org.uk/xpedio/groups/public/documents/publicwebsite/public_prevalencereport.doc
Although there has been an increasing amount of research in smart home environments [1], there has been limited use of speech-based interactions. This is largely due to the challenges posed by spoken language systems in domestic environments. If the users are not forced to wear microphones, or to interact via some kind of handset, then room-based microphones distant from the user must be used. This dramatically
increases the problem of Automatic Speech Recognition (ASR), since the users’ speech must be separated from the many other acoustic sources in a home setting. Microphone arrays, which enable software-directed beamforming, are an attractive approach to this problem [2], but the technology is still relatively immature and computationally demanding. However, good results have been achieved for large-scale automatic speech recognition tasks in less demanding environments, such as business meetings [3]. Furthermore, accurate speech recognition and natural-sounding speech synthesis do not comprise a useful interaction modality on their own. These speech technologies must be combined with speech understanding and dialogue management if a usable spoken language modality is to be provided. The INSPIRE system [4] is one of relatively few examples of a smart home system with well-developed spoken interaction.
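To make the beamforming idea concrete, the sketch below shows a minimal delay-and-sum beamformer. It is only an illustration of the general principle and not part of the systems discussed here; the far-field assumption, the speed-of-sound constant, and the integer-sample delays (fractional delays are ignored) are simplifications for the sketch.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air (assumed)

def delay_and_sum(signals, mic_positions, direction, fs):
    """Minimal delay-and-sum beamformer.

    signals:       array of shape (n_mics, n_samples), one row per microphone
    mic_positions: array of shape (n_mics, 3), microphone coordinates in metres
    direction:     vector pointing from the array towards the speaker
    fs:            sampling frequency in Hz
    """
    signals = np.asarray(signals, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    # Time by which the plane wave reaches each microphone earlier than the origin.
    advances = mic_positions @ direction / SPEED_OF_SOUND
    delays = np.round(advances * fs).astype(int)  # integer-sample approximation
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for i in range(n_mics):
        # Delay the channels that received the wavefront early so that the
        # target direction adds up coherently (edge wrap-around is ignored here).
        out += np.roll(signals[i], delays[i])
    return out / n_mics

Steering the beam towards a different speaker then simply amounts to recomputing the delays for a new direction estimate.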
2 Older Speakers, Older Voices: A Challenge for ASR The effects of ageing are notoriously difficult to study because chronological age is a relatively poor predictor of anatomical, physiological, and cognitive changes [6, 7]. This variability is not just due to genes, but also to individuals’ lifestyle [8]. As a consequence, older users are notoriously difficult to design for, because individual older people will have very different needs and abilities. With ageing, several degenerative changes occur in the respiratory system, larynx and the oral cavity which form the human speech production mechanism [9]. Significant changes affecting speech include loss of elasticity in the respiratory system leading to decreased lung pressure, calcification of the laryngeal tissues leading to the instability of the vocal fold vibrations, loss of tongue strength, tooth loss, and changes in the dimensions of the oral cavity [10]. Ageing affects many acoustic parameters of
the speech waveform such as fundamental frequency, jitter, shimmer and harmonic-to-noise ratios [11]. Ageing voices are also characterized by increased breathiness and slower speaking rates. All these changes in the acoustics of older voices have their impact on ASR systems: Word Error Rates (WERs) for older voices are significantly higher than for younger voices [12-14]. At first blush, older users’ language should not differ much from that of younger users. Even though cognitive abilities such as fluid intelligence generally decline with age [15], acquired knowledge, such as vocabulary, tends to be well-preserved [16] – to the extent that older users may use a richer vocabulary than younger ones. However, older users are more prone to word finding difficulties [17] and may produce more disfluencies under stressful conditions [18]. Word finding difficulties can lead to unexpected pauses, phrasing, and disfluencies. Patterns of word use also change during the life span. Older people use fewer words that denote negative emotions and fewer self-referential words [19].
3 The MATCH Corpus

3.1 Design and Structure of the Corpus

The MATCH corpus was recorded during a cognitive psychology experiment that investigated the accommodation of cognitive ageing in spoken dialogue interfaces [20]. 24 younger users (aged 18-29 years, mean 22) and 26 older users (aged 52-84 years, mean 66) booked health care appointments using nine different simulated spoken dialogue interfaces. Each person used each system only once in order to constrain the duration of the experiment. All dialogues were strictly system-initiative. Users could only select health care professionals and time slots proposed by the system; they were not able to suggest any aspect of the appointment themselves. This very restrictive design was chosen for two reasons: (1) it allowed us to control the dialogue structure for the purposes of the underlying cognitive psychology experiment; (2) user utterances were more likely to be restricted to the options presented in a given system message, which should make them easier to recognize. All users participated in an extensive battery of cognitive tests before the experiment and completed detailed questionnaires rating system usability. A total of 447 dialogues were recorded using an EDIROL R01 digital recorder and a sampling frequency of 44.1 kHz.3 The dialogues contain 3.5 hours of speech. All dialogues were transcribed orthographically by a trained annotator using the tool Transcriber [21] and the AMI transcription guidelines [22], which were used for creating the AMI meetings corpus [23]. The AMI guidelines were chosen because they have been explicitly designed to provide a solid basis for speech recognition research and to facilitate a wide range of further possible annotations. The corpus has been annotated semi-automatically with dialogue acts and information state update information [24]. For our recognition experiments, the users’ speech was divided into speech spurts, contiguous sequences delimited by pauses. Older users produced a total of 1680 speech spurts4 while younger users produced 1369 spurts.
3 Recordings for three dialogues were lost.
4 Speech spurts are contiguous sequences of user speech delimited by pauses.
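As an illustration of this segmentation step, the sketch below groups time-stamped words into pause-delimited spurts. It is a toy reconstruction, not the actual MATCH preprocessing: the 0.5-second pause threshold and the (start, end, word) input format are assumptions made for the example.

def segment_into_spurts(words, pause_threshold=0.5):
    """Group (start_time, end_time, word) tuples into speech spurts,
    starting a new spurt whenever the silence between consecutive words
    exceeds pause_threshold (seconds)."""
    spurts, current, prev_end = [], [], None
    for start, end, word in words:
        if prev_end is not None and start - prev_end > pause_threshold:
            spurts.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        spurts.append(current)
    return spurts

# Example with an assumed 0.8 s pause between "tuesday" and "thank":
words = [(0.0, 0.3, "yes"), (0.35, 0.8, "tuesday"), (1.6, 1.9, "thank"), (1.9, 2.2, "you")]
print(segment_into_spurts(words))   # [['yes', 'tuesday'], ['thank', 'you']]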
3.2 Advantages and Limitations of the Corpus Although appointment scheduling is not a central task of smart environments, it is a key functionality in many related applications such as electronic diaries and automatic scheduling of health care appointments. Since the MATCH corpus was created for a cognitive psychology experiment, dialogue structure, appointment scenarios and system vocabulary were tightly controlled. As a result, the vocabulary is much less diverse and the language is more formulaic than that of corpora which were recorded for speech research, such as DARPA Communicator [25]. It is also relatively small compared to other speech research corpora. Despite these disadvantages, the MATCH corpus is one of very few corpora that contain a large proportion of older speakers. Unlike the Dragon corpus [26] or the OYEZ corpus [27], it contains highly detailed dialogue act and information state annotations. The MATCH corpus has already been used successfully for training simulated users [28]. Simulated users typically interact with the dialogue system in order to learn dialogue policies. We found that the behavior of older users could not be modeled adequately using data from younger users – age appropriate data was needed. 3.3 Differences between Older and Younger Users In our analyses of the MATCH data, we found substantial differences in both speech and language between older and younger users. While younger users mainly produced utterances that were directly relevant to the appointment scheduling task, older users often attempted social interaction with the system. They thanked it for providing information, or provided information that was not specified in the task definition and could not be processed by the dialogue system. In particular, older users frequently attempted to take the initiative and suggest convenient appointment slots, even though the dialogues were strictly system-initiative. Overall, older people produce significantly more individual words (tokens) and significantly more distinct word forms (types) than younger people. Taken together the 26 older users used 373 distinct types, whereas the 24 younger users only had a vocabulary of 92 distinct types between them. Older users were more likely than younger users to use expressions other than “yes” to express agreement, such as “fine”. Older people also tend to use expressions that are more appropriate in human-human interactions, such as forms of “goodbye” or “thank you”. More detailed results can be found in [24]. These results lead us to expect that language models trained on material from younger users only will underperform when confronted with data from older users. In particular, we expect to see a high proportion of out-of-vocabulary words.
4 Experiments In our experiments, we examined the effect of age-specific language models and acoustic models on speech recognition performance. All experiments were set up using the Hidden Markov Model Toolkit (HTK).5
5 HTK version 3.4. http://htk.eng.cam.ac.uk
4.1 Experiment 1: Impact of Language Modeling

Design. The aim of this experiment was to assess the effect of the differences in interaction style between younger and older users described above on the language modeling component of the speech recognizer and, consequently, on ASR performance. From the transcripts of the MATCH corpus, the following bigram language models were constructed: 1) from all the utterances of the older speakers (LM-Older); 2) from all the utterances of the young speakers (LM-Young); 3) for each test speaker, from the entire corpus excluding the test speaker (LM-All-1); 4) for each older test speaker, from the corpus of all the older speakers excluding the test speaker (LM-Older-1); and 5) for each young test speaker, from the corpus of all the young speakers excluding the test speaker (LM-Young-1). For each older test speaker, three ASR experiments were performed, keeping the acoustic model fixed and using different language models for the speaker, namely LM-All-1, LM-Older-1 and LM-Young. Similarly, ASR experiments were repeated for each of the young speakers using the language models LM-All-1, LM-Young-1 and LM-Older. Since the amount of data in the MATCH corpus is not sufficient to build acoustic models from scratch, we used speech from other corpora for this purpose. Acoustic models were trained on 73 hours of meetings data recorded by the International Computer Science Institute (ICSI), 13 hours of meeting corpora from the National Institute of Standards and Technology (NIST) and 10 hours of corpora from the Interactive Systems Lab (ISL) [29]. These models were then adapted using the maximum a posteriori approach [30] with 13 hours of speech from 32 UK speakers from the Augmented Multi-party Interaction (AMI) data. For training the models, the waveforms were parameterized into perceptual linear prediction cepstral feature vectors. Energy along with first- and second-order derivatives were appended, giving a 39-dimensional feature vector. The acoustic models were trained as cross-word context-dependent triphone Hidden Markov Models (HMMs).

Results. Goodness of fit of the language model on a test set was measured using perplexity [31]. The lower the perplexity, the better the language model fits the test set. We also assessed the number of out-of-vocabulary (OOV) words, i.e. the number of words in the test set not present in the vocabulary of the language model. We found that language models trained on younger users were a bad fit to the language of older users, whereas data from the older users allowed us to model the language patterns of younger users reasonably well. In particular, models trained on younger users only did not contain many of the words older people used. These findings are consistent with the results of our experiments with simulated users discussed above. Detailed results are shown in Table 1.

Table 1. Perplexity and % OOV Words

Test Set   Language Model   Perplexity   OOV (%)
Younger    LM-Older         5.44         1.38
Older      LM-Young         19.18        15.57
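As a reference for how these two quantities are computed, the following is a minimal sketch of bigram perplexity and OOV rate on a tokenized test set. It is illustrative only: the add-one smoothing and the sentence-boundary handling are assumptions made for the sketch, not the discounting actually used in the HTK-based setup.

import math
from collections import Counter

def train_bigram(sentences):
    """Collect unigram and bigram counts from tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def perplexity_and_oov(sentences, unigrams, bigrams):
    """Bigram perplexity with add-one smoothing, plus the OOV rate (%).
    OOV test tokens are mapped to <unk> before scoring."""
    vocab = set(unigrams)
    vocab_size = len(vocab) + 1  # +1 for <unk>
    log_prob, n_predicted, n_oov = 0.0, 0, 0
    for sent in sentences:
        n_oov += sum(1 for w in sent if w not in vocab)
        n_predicted += len(sent) + 1  # each word plus the sentence-end token
        tokens = ["<s>"] + [w if w in vocab else "<unk>" for w in sent] + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
    ppl = math.exp(-log_prob / n_predicted)
    oov_rate = 100.0 * n_oov / sum(len(s) for s in sentences)
    return ppl, oov_rate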
Fig. 1 shows ASR Word Error Rates (WER) using different language models as explained above, averaged over all the young speakers and older speakers respectively. As we would expect from the results presented in Table 1, we find that WERs for older speakers are particularly high when using the language models of the younger speakers. This is due to the mismatch between the older and younger users’ interaction styles. Clearly, we need age-appropriate data to build adequate language models for older speakers.
Fig. 1. ASR Word Error Rates for young and older speakers’ test sets using different language models
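For completeness, word error rate is the edit distance between the recognizer output and the reference transcript, normalised by the reference length. A minimal sketch of the standard dynamic-programming computation (illustrative only, not the scoring tool used in these experiments) is given below.

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the usual Levenshtein dynamic programme over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edit operations to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i           # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j           # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution and one deletion in a five-word reference -> WER 0.4
print(word_error_rate("book an appointment for tuesday", "book appointment for thursday"))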
4.2 Experiment 2: Impact of Acoustic Models

Design. In this set of experiments, we examined the impact of differences in the acoustics of older and young speakers on speech recognition performance. In order to isolate the effect of the acoustic models, we only used the language model LM-All, which contains all utterances in the MATCH corpus, for this set of experiments. The acoustic models described in the previous experiment (models adapted with AMI data) were used as the baseline models. For each of the old speakers, two acoustic models were created by maximum a posteriori adaptation of the baseline models using either the speech from the rest of the old speakers excluding the test speaker (AMI + MATCH older-1) or speech from the young speakers (AMI + MATCH younger). Acoustic models were similarly created for each young speaker with the speech data from all the older speakers (AMI + MATCH older) and the speech data from the rest of the young speakers (AMI + MATCH young-1).

Results. Fig. 2 shows average WERs for both young and older speakers. The WERs for older speakers are higher than those for younger speakers by 10.99% absolute using the baseline acoustic models. Adapting the models with speech from a new domain (i.e. appointment scheduling) is expected to reduce the WERs for the test data in the new domain. While adapting the baseline models with older speakers from the
Fig. 2. ASR Word Error Rates for young and older speakers’ test sets using different acoustic models
MATCH corpus (AMI + MATCH older) brings down the WERs for young speakers, the results are even better with adaptation using speech from other younger speakers in the same corpus (AMI + MATCH young-1). The results for older speakers in Fig. 2 are quite interesting. Contrary to the belief that speech from a new domain should help in creating better models for the new domain, adapting the baseline models with speech from the younger speakers of the MATCH corpus (AMI + MATCH young) deteriorates the performance for the older speakers in the same corpus. Hence, there is a clear mismatch in the acoustics of older and young speakers, resulting in a higher WER for older speakers. The reasons for this result require further investigation.
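For reference, the maximum a posteriori adaptation used above updates each Gaussian mean as a count-weighted interpolation between the prior (baseline) mean and the statistics of the adaptation data. In the common relevance-factor form (the exact relevance factor used here is not stated in the paper and is an assumption for this sketch):

\mu_m^{\mathrm{MAP}} = \frac{\tau\,\mu_m^{\mathrm{prior}} + \sum_t \gamma_m(t)\,x_t}{\tau + \sum_t \gamma_m(t)}

where \gamma_m(t) is the posterior occupancy of mixture component m at frame t, x_t is the adaptation feature vector, and \tau is the relevance factor controlling how strongly the adaptation data pulls the mean away from the prior. Components that are rarely observed in the adaptation data therefore stay close to the baseline model.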
5 Conclusion In our ASR experiments, we discovered that older users’ speech resulted in higher error rates compared with the speech of younger users. This was caused by both acoustic and linguistic factors. We have performed experiments with a variety of acoustic and language models, estimated from both in-domain and out-of-domain data, derived from both younger and older users. These results have highlighted the fact that ASR systems need to take into account both acoustic and linguistic aspects of the speech of older users. Our results indicate that the speech recognition component of a spoken dialogue system used in a smart home environment must be adapted to both the domain of usage and to the acoustic and linguistic characteristics of the users. In particular, we have shown that in-domain speech data matched to younger users does not appropriately adapt the system to the language of older users in the same domain. Even though the MATCH corpus was tightly controlled and covered a comparatively narrow domain, the findings of Möller et al. [32] suggest that we can expect to see similar results for other domains, such as controlling household items or televisions. In order to accommodate the vocabulary and speaking patterns used by older people as well as the sound of older voices, designers and programmers need to ensure
that adequate data is collected. In particular, the tasks must be clearly specified, and all relevant domains must be covered. This data set need not be large - it is possible to use existing data and small amounts of matched data to adapt generic ASR systems to task domain and user age. “Factored” adaptation algorithms are particularly promising. They can combine adaptation data that partially matches the task in question either in terms of age or in terms of domain. Last but not least, since older people’s speech poses special challenges for ASR, systems need to provide adequate support for handling recognition errors, both within the voice modality, and in combination with other modalities. Acknowledgements. This research was funded by SHEFC grant HR04016 MATCH (Mobilising Advanced Technologies for Care at Home). We would like to thank Robert Logie and Johanna Moore for collaborating on experiment and corpus design and analysis, Neil Mayo and Joe Eddy for coding the Wizard-of-Oz (WoZ) interface, Vasilis Karaiskos for administering the experiment, Melissa Kronenthal for transcribing all 447 dialogues, and Matt Watson for scheduling participants, administering the cognitive test battery, and data entry. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk/). The ECDF is partially supported by the eDIKT initiative (http://www.edikt.org.uk).
References 1. Helal, S., Mann, W., El-Zabadani, H., King, J., Kaddoura, Y., Jansen, E.: The Gator Tech Smart House: a programmable pervasive space. Computer 38, 50–60 (2005) 2. Vovos, A., Kladis, B., Fakotakis, N.: Speech operated smart-home control system for users with special needs. In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, pp. 193–196 (2005) 3. Renals, S., Hain, T., Bourlard, H.: Recognition and interpretation of meetings: The AMI and AMIDA projects. In: Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (2007) 4. Moeller, S., Krebber, J., Raake, A., Smeele, P., Rajman, M., Melichar, M., Pallotta, V., Tsakou, G., Kladis, B., Vovos, A., Hoonhout, J., Schuchardt, D., Fakotakis, N., Ganchev, T., Potamitis, I.: INSPIRE: Evaluation of a Smart-Home System for Infotainment Management and Device Control. In: Proc. LREC, pp. 1603–1606 (2004) 5. Hawley, M.S., Enderby, P., Green, P.D., Cunningham, S.P., Brownsell, S., Carmichael, J., Parker, M., Hatzis, A., O‘Neill, P., Palmer, R.: A speech-controlled environmental control system for people with severe dysarthria. Medical Engineering & Physics 29, 586–593 (2007) 6. Arking, R.: Biology of Aging. Oxford University Press, New York (2005) 7. Rabbitt, P., Anderson, M.: The lacunae of loss? Aging and the differentiation of cognitive abilities. In: Lifespan Cognition: Mechanisms of Change. Oxford University Press, New York (2006) 8. Deary, I.J., Whiteman, M.C., Starr, J.M., Whalley, L.J., Fox, H.C.: The impact of childhood intelligence on later life: Following up the Scottish Mental Surveys of 1932 and 1947. Journal of Personality and Social Psychology 86, 130–147 (2004) 9. Linville, S.E.: Vocal Aging. Singular Thomson Learning, San Diego (2001)
10. Ramig, L.O., Gray, S., Baker, K., Corbin-Lewis, K., Buder, E., Luschei, E., Coon, H., Smith, M.: The Aging Voice: A Review, Treatment Data and Familial and Genetic Perspectives. Clinical Linguistics and Phonetics 53, 252–265 (2001) 11. Xue, S.A., Hao, G.J.: Changes in the human vocal tract due to aging and the acoustic correlates of speech production: a pilot study. Journal of Speech, Language, and Hearing Research 46, 689–701 (2003) 12. Vipperla, R., Renals, S., Frankel, J.: Longitudinal study of ASR performance on ageing Voices. In: Proc.1 Interspeech 2008, pp. 2550–2553 (2008) 13. Baba, A., Yoshizawa, S., Yamada, M., Lee, A., Shikano, K.: Acoustic models of the elderly for large-vocabulary continuous speech recognition. Electronics and Communications in Japan, Part 2 (Electronics) 87, 49–57 (2004) 14. Wilpon, J.G., Jacobsen, C.N.: Study of speech recognition for children and the elderly. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, vol. 1, pp. 349–352 (1996) 15. Baeckman, L., Small, B.J., Wahlin, A.: Aging and Memory: Cognitive and Biological Perspectives. In: Handbook of the Psychology of Aging, pp. 349–377. Academic Press, San Diego (2001) 16. Verhaeghen, P.: Aging and vocabulary scores: a meta-analysis. Psychology of Aging 18, 332–339 (2003) 17. Shafto, M.A., Burke, D.M., Stamatakis, E.A., Tam, P.P., Tyler, L.K.: On the tip-of-thetongue: neural correlates of increased word-finding failures in normal aging. J. Cogn. Neuro-sci. 19, 2060–2070 (2007) 18. Caruso, A.J., McClowry, M.T., Max, L.: Age-related effects on speech fluency. Seminars in Speech and Language 18, 171–179 (1997) 19. Pennebaker, J.W., Stone, L.D.: Words of wisdom: Language use over the life span. Journal of Personality and Social Psychology 85, 291–301 (2003) 20. Wolters, M., Georgila, K., Logie, R., MacPherson, S., Moore, J., Watson, M.: Reducing Working Memory Load in Spoken Dialogues: Do We Have to Limit the Number of Options? In: Interacting with Computers (accepted, 2009) 21. Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: Development and use of a tool for assisting speech corpora production. Speech Communication 33 (2000) 22. Moore, J., Kronenthal, M., Ashby, S.: Guidelines for AMI Speech Transcriptions. AMI Deliverable (2005) 23. Carletta, J.: Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus. Language Resources and Evaluation 41, 181–190 (2007) 24. Georgila, K., Wolters, M., Karaiskos, V., Kronenthal, M., Logie, R., Mayo, N., Moore, J., Watson, M.: A Fully Annotated Corpus for Studying the Effect of Cognitive Ageing on Users’ Interactions with Spoken Dialogue Systems. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (2008) 25. Walker, M.A., Passonneau, R.J., Boland, J.E.: Quantitative and qualitative evaluation of DARPA Communicator spoken dialogue systems. In: Proceedings of the 39th Meeting of the Association for Computational Linguistics, pp. 515–522 (2001) 26. Anderson, S., Liberman, N., Bernstein, E., Foster, S., Cate, E., Levin, B., Hudson, R.: Recognition of Elderly Speech and Voice-Driven Document Retrieval. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Phonenix, Arizona (1999) 27. Vipperla, R., Renals, S., Frankel, J.: Longitudinal study of ASR performance on ageing voices. In: Proc. Interspeech, pp. 2550–2553 (2008)
28. Georgila, K., Wolters, M., Moore, J.: Simulating the Behaviour of Older versus Younger Users. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, Human Language Technologies (ACL/HLT), pp. 49–52 (2008)
29. Hain, T., Burget, L., Dines, J., Garau, G., Karafiat, M., Lincoln, M., McCowan, I., Moore, D., Wan, V., Ordelman, R., Renals, S.: The 2005 AMI System for the Transcription of Speech in Meetings. In: Proceedings of the Rich Transcription 2005 Spring Meeting Recognition Evaluation (2005)
30. Gauvain, J.-L., Lee, C.-H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2, 291–298 (1994)
31. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. Prentice-Hall, Englewood Cliffs (2008)
32. Möller, S., Gödde, F., Wolters, M.: A Corpus Analysis of Spoken Smart-Home Interactions with Older Users. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (2008)
Sympathetic Devices: Communication Technologies for Inclusion Across Housing Options
Claudia Winegarden1 and Brian Jones2
1 Industrial Design, College of Architecture, Georgia Institute of Technology, 247 4th Street, Atlanta, Georgia 30324, USA
2 Interactive Media Technology Center, College of Computing, Georgia Institute of Technology, 85 Fifth Street, NW, Atlanta, Georgia 30332-0130, USA
[email protected],
[email protected]
Abstract. Encouraging wellness at home is a necessary step in alleviating the burden on the healthcare system, but it is also a vehicle for promoting independence and quality of life among older adults. Even though much healthcare research is focused on autism, asthma and diabetes, to mention a few, depression caused by isolation is a serious condition related to healthy aging and outcomes. Addressing communication patterns across housing options might bring us closer to understanding and preventing social isolation and loneliness among older people. This paper discusses a research-based iterative process of applied within-subjects survey and action research studies for designing communication technology devices for older adults. The relevance of this project is to understand the role of design and technology in adoption, home care and independent healthy aging.

Keywords: design, older adults, communication technologies, isolation.
1 Introduction

The number of aging adults above age 60 in the United States is expected to double by the year 2030 to nearly 45% of the adult population [1][2]. With this increase comes the challenge of a higher demand on the healthcare system. Researchers in academia, healthcare, government, and industry are searching for ways to reduce this impact by looking to the home as one solution. Encouraging wellness at home and promoting independence and quality of life among older adults are both necessary steps in alleviating the burden on the healthcare system. Even though much healthcare research is focused on autism, asthma, diabetes, obesity and cancer, depression is a serious condition related to healthy aging and outcomes. Data suggest that social isolation and loneliness might be a major cause of depression. Addressing communication patterns, both the social activities and channels used between two or more individuals (interpersonal communication) and the effectiveness of communicating with oneself (intrapersonal communication), might prevent isolation and loneliness among older people. Moreover, this understanding of communication needs and the design of communication technology devices might create new opportunities for affecting social inclusion.
The purpose of the project described herein is to approach the design of communication technologies for older adults with a better understanding of the user needs across housing options. This paper discusses the research-based iterative process of applied within subjects survey and action research studies for designing communication technology devices. We hypothesize that providing more acceptable or “sympathetic” aesthetic and experience design, along with ease-of-use and understanding of the usefulness of technology devices will result in greater adoption by older adults. The significance of the project is to link research to design and design to research by not only identifying the causes of isolation, but also by executing solutions based on grounded findings and evaluation of the efficiency of the design and technology intervention. Importantly, the relevance of this project is to understand the role of design and technology for adoption, home care and independent healthy aging.
2 Background Depression is a growing health concern among the aging population affecting nearly 7 million aging adults in this country. Depression has been identified by the Centers for Disease Control as one of four areas that should be addressed to improve older Americans’ health and quality of life. Several studies have linked depression to loneliness (or emotional isolation as a subjective unwelcome feeling of lack or loss of companionship) and social isolation (defined as the objective absence of contacts and interactions with a social network) [3][4][5]. Data suggest that social isolation and loneliness is the major cause of depression and one of the major factors influencing premature death among elders [6][7]. Studies have shown that aging adults experience social isolation and loneliness as a result of death or loss of their companion [8], decline of their social networks due to: loss of mobility or disability [9], relocation to support disability or expected future care needs, or relocation or death of family or friends. Even in congregate living facilities, which afford social exposure, loneliness may still be experienced among older adults [10]. Successful communication patterns not only can be defined with the social activities and channels used between two or more individuals (interpersonal communication), but also with the effectiveness of communicating with oneself (intrapersonal communication).While studies of computing communication technologies within the aging population suggest that family communication and community support have a significant impact on reducing isolation and encouraging healthy aging, few studies have examined the role of intrapersonal and interpersonal communication and how technologies might be a portal to avoid social isolation and loneliness. Yet, many technology-enabled services aimed at health and wellness in the home have been slow to expand in the market due to a poor understanding of user needs and the manner in which services are delivered [11]. As such, this project aimed at identifying the communication needs of older adults living independently across a range of housing options and to design/develop inclusive communication devices to help older adults maintain personal and social connections.
2.1 Participant Population This project focused on older adults who are living independently, with or without other residents, like spouses. A total of 26 participants were part of the sample population. In determining the settings for the study, three housing options were chosen that included a population diverse in income and social organization: Continuing Care Retirement Communities (CCRC), Naturally Occurring Retirement Communities (NORC) and Aging in Place (AP). CCRC are generally suburban private neighborhood designed facilities that target a faith-based population of older adults, age 70 and older. The CCRC provides a continuum of care including independent living options. For this project, we targeted a CCRC located in the suburbs of Atlanta, where independent living older adults are living in both duplex-style and apartment-style homes. As part of the services, the community also facilitates formation of clubs and organizes events both on and off campus for residents to participate in. Due to the funds needed to purchase a residence and cover the monthly fee, residents are considered to comprise the high-end of the study’s income spectrum. In contrast to a community designed to deliver retirement living and services, NORC are neighborhoods, where many of the residents are older adults, who have either lived in their homes over several decades, or resulted from a significant natural migration, due to ease of access to other older adults or various amenities such as shopping, faith-based organizations, entertainment, etc. They are organized communities with centers providing services to support the needs and interests of a concentrated area of the older adult residents. For this project, we identified two NORC in the Atlanta area that addressed different demographic groups: one supporting a faithbased neighborhood area and the other supporting an inner city neighborhood area. Both communities were representative of older adults with different income levels, from average to below the poverty line. AP was another housing option included in our study that consists of older adults distributed around the city and suburbs generally aging in their homes over several decades. These individuals were recruited through a list available through a center located at our institution. 2.2 Survey and Action Research Studies The project was based on a volunteer sampling design. A total of 26 participants comprised the sample population across the aforementioned housing options. To address the topic of isolation across housing options, we proposed a number of iterative within subjects research study. For example, a participant from one phase of the study may also participate in another. The study was divided into different phases: survey research phase (1), action research product design phase (2). In phase one of the study, participants were each given a cell phone linked to a system that called them three times a day and asked them questions related to isolation. This project used Jitterbugs as a designed cell phone device targeted for older adults. Jitterbugs are easy-to-use cell phones with simple interactions and suitable tactile button interfaces. Participants were trained on the use of these cell phones prior to the data collection in order to guarantee the successful performance of the phone survey
study. During the survey, participants were asked about the number and type of activities they performed daily as well as their emotional state. The survey took on average 20 minutes per day and ran for five consecutive days. After the survey was completed, participants had a 30-minute face-to-face follow-up interview at their homes to assess their attitudes toward technology and its use. The findings of this phase were used to inform the design criteria for the communication technology that would come to be known as Sympathetic Devices. The design and development of these devices was carried out within the Industrial Design Graduate Studio course at the Georgia Institute of Technology; by the end of the course, a variety of devices had been designed. For phase two of the study, participants engaged in a focus group that lasted approximately two hours. Participants were presented with prototypes of the designed devices in order to gather their feedback and select a concept to be developed and evaluated in subsequent studies. Before approaching these phases, however, a careful literature review was conducted to understand current communication technologies and how their design and functions can influence older adults and their adoption of the technology.

2.3 Communication Technologies

Communication technologies such as video conferencing, email, photo sharing, social networking, and medical information sites would all benefit older adults, but many older adults do not have the knowledge or patience to learn how to use a computer, let alone set up and learn these applications. To bring such applications to older adults, technology must be designed in a manner that they can relate to, understand the benefits of, manipulate, and enjoy. Several previous projects have focused on the need for more tangible means of delivering communication, such as ECHOES [8], Fridgets [12], and The Jive [13]. In the ECHOES project, a TeleTable concept targeted populations age 65 and up with the aim of understanding and improving companionship and of preventing loneliness and depression. The researchers developed multiple touch screens embedded in a kitchen table, together with a small, portable box called a Pitara, full of interesting objects and digital media readers to identify them. The product would allow older adults to interact virtually with others, for example by playing games, organizing photos, and performing communication-related tasks such as writing a digital letter with a stylus and sharing it with friends and family. The focus of the research was primarily on providing an outlet for grief at the loss of a spouse and on resituating the individual socially to avoid the onset of loneliness and depression. Many of the interaction aspects of the project were tested through usability studies, indicating that touch screen interaction can be useful for organization and gestures. The concept was clever, but as designed it would prove expensive for older adults, and while the researchers performed user studies on the basic interaction concepts, it is not evident that they considered whether older adults would accept transforming or replacing their kitchen table. The Fridgets project focused on designing a technology that would increase independence and connectedness among older adults. The project featured refrigerator magnets that each had a different function, such as weather notifications, reminders, sports information, and cooking information. One central 10-inch LCD screen served
as the primary means of delivering visual content, while touching the other magnetic add-on modules on the refrigerator would change the information on the screen to match the context of the module tapped. The concept device could also receive email and photos from friends or family, and touch screens were used to interact with the information. The Jive project focused on giving older adults the possibility of creating their own social networking sites by tapping into friends' and family members' feeds. Each person's information was associated with a sensor-tagged photo block that, when stuck to the screen, brought up the social networking page for that friend or relative. The project study did indicate that older adults can feel more connected with the device; however, there was no evidence of older adults building a social networking presence.
3 Design

Phase one of the studies was crucial to the success of the project. Its findings were used to inform the design scope and criteria of the communication technology—sympathetic devices. More importantly, it was a phase intended to link research to design and design to research. Survey, interview, and focus group studies were carried out to incorporate the user in the design process. During the survey studies, students were given the opportunity to become involved in the project; they carried out a two-month design project linked to the framework and population of this study. A total of sixteen students were involved in the class, which was divided into eight teams; each team conceptualized, designed, and modeled a sympathetic device for the subsequent phase. The major value of conducting this project in the class was that students were able to work closely with older adults to advance their design solutions. The most interesting aspect was how the students and the aging community were unified in the process. The face-to-face interviews were of particular importance to the project: older adults stayed in touch with the students after completing their research protocol. There was a natural and mutual interest in aiding the design process (what can be referred to as communication by design), and it provided truly integrative evidence on how to design communication technologies for inclusion.

3.1 Formative Conversations

Through the interviews, participants were able to inform the design process in an insightful manner. Several open-ended questions were developed to assess each participant's technology use, attitude toward trying new things, and communication frequency and needs. Their answers, though varied, shaped our outlook on how communication technologies are relevant for everyday activities. Below are a few short excerpts from the conversations between the older adults and the designers. Although few in number, they serve as examples of how formative these rich exchanges can be for design. One of the major observations was that older adults are willing to try new things and are not afraid of technology. For example, one participant stated: "I haven't been exposed to new things for awhile. I had to get a new printer, so I learned about that, and I am still learning about my computer. I'm
not illiterate, but I'm not a real expert user either. A lot of my friends here throw up their hands and say they don't want to look at a computer. Right now, if you're not a little computer literate, then you're out of it. And I don't want to be out of it." However, if technologies are to be used, they should be designed in a manner that supports current lifestyles. One participant put it strikingly: "I write them (letters), but then I send them on the computer." Older adults are active users, but being active does not necessarily mean that activities should be fast. They should instead be supported in a way that accommodates an individual pace and celebrates the joy, even the rituals, of accomplishing tasks: "I like to play games on the computer… I am task oriented, but I don't like to sit down, and I don't like to be idle, I need to be doing something, I don't, could be just nervous energy"; and: "A cell phone means fast, be brief, get it over with and I like conversations… and that is not rigid… I was 51 years old before I learnt to use a pressure cooker and I was in my mid thirties before I learnt to drive. So gadgets and I... we are not like that." More important is to realize that connections among people, regardless of age, should be strengthened. The problem lies in better identifying the different communication styles: "Contemporaries have all time to talk. Younger people are involved in study and their careers and their jobs etc… and they do not have as much time or are able to call as often as the contemporaries." Even though older adults may feel technologically challenged, attention should be placed on the accessibility and inclusiveness of designs for them. It is not so much a question of what should be designed for older adults as of how.

3.2 Design Criteria

From the data collected in the survey and interviews, design criteria emerged for conceptualizing the communication technology devices for older adults. These criteria constitute an approach to the design of sympathetic devices based on understanding and responding to the real needs of older adults. It became apparent that the students were able to inform the design process by following a set of open guidelines in their creative work. The first guideline was to make use of current, available technologies in a creative manner. Many approaches produce solutions that are detached from the reality of users, becoming products available only at the laboratory level or to just a few people. The goal, therefore, was to arrive at cost-effective solutions that make use of simple off-the-shelf technologies, so that the concepts could potentially be advanced for fast deployment in the market. Second, inclusive design approaches were central to concept ideation. Designing devices for older adults should be addressed in a universal manner: older adults may experience limited vision, dexterity problems, hearing loss, mobility problems, and so on, so communication technology devices should be designed in a multisensory manner to address the different needs of the population. Third, simplicity of the interface was fundamental, with tangible embedded technologies the preferred viable technology. They offer the possibility of envisioning concepts that move beyond the typical desktop interface into new physical contexts. As such, concepts could be outlined using
graspable interfaces, sensor-based interactivity, and ambient technologies, to mention a few, so that the products and environments older adults interact with on a daily basis can be computationally facilitated in a less intrusive manner. Lastly, flexibility of use was a major design criterion. Older adults have diverse needs, not only across the population and their housing options but also within their own lifestyles and health. A well-designed device should accommodate sudden and temporary changes without compromising functional, accessible, and simple use; that is, it should allow different uses of the product(s) across users and spaces.

3.3 Sympathetic Conceptualizations

A series of concepts and rough prototypes were developed using creative technologies and interfaces aimed at promoting intra- and interpersonal communication and socialization. At a high level, devices were conceptualized around simple everyday activities, using communication at the interpersonal and intrapersonal level to reduce isolation in creative ways. Concepts were articulated around remembering (cognitive aids), eating (addressing the issue of eating alone), learning (lifelong e-learning tools), and moving (incentivizing activity). A total of eight communication technology devices were conceptualized. The first concept, "Thinking of You," is a communication device consisting of five square (5x5 inch) stained wooden blocks, each with a translucent white overlay and a stainless steel hook. Each block can be attached to a magnetic bar mounted on the wall and is associated with a person in the user's social network by attaching something memorable about that person to the hook. The user may touch a block to send a simple gesture to the person associated with it; when a gesture is received, the translucent overlay glows. The second concept, "TagIt," functions similarly to the previous one but is designed as a stained wood tri-fold frame with metal décor on the two outer panels and a touch screen LCD display embedded in the center. The panels provide a surface on which to attach small magnetic photos surrounded by clear frames. A photo frame glows when a message arrives, and touching the photo brings up the associated person's message or media on the LCD screen. The user may touch a control on the LCD to send an audio/video response recorded by the device. The remote recipient may receive messages through email, instant message, or other means that fit their preference and communication style. The "Altruist" concept focuses on informing the user of the presence of connected friends in one of the common areas of a community. The model consists of two glass objects that could be used as a keychain or carried in a pocket; at home, the objects would be placed in a glass dish. Each friend would have one of these objects and could join the other's circle of friends by fitting the two glass pieces together for a short time. After joining, the glass dish base station plays music, specific to the location, when one of the circle of friends enters a common area such as the gym or spa, the hope being that users will be more likely to get out and socialize if they know their friends might be at a particular location. The "Forgetfulness" concept utilizes a bracelet with embedded color-coded plastic tags and is intended to replace the string-on-the-finger reminder. A tag may be removed and placed on some object, and a voice message is then recorded on the bracelet.
Colored lights on the bracelet indicate that a tag is missing. Pressing a button plays back the messages, and returning the tag clears the associated message. The "ShareLab" concept is designed to provide a simplified method for older adults to share their crafts with others, whether for sale or for pride. The device is designed to look like a scrapbook, but with an LCD screen and a physical slider to move from capturing a photo, to editing that photo, to sharing it. The sharing application included a method of pushing images to monitors around a community or emailing them to a friend, and the concept also allows receipt of messages in response to shared images. "C-Connect" is a concept to link older adults to cultural institutions, events, and databases. Influenced by the design of museum audio guides, the device looks like an extended PDA with a small LCD screen and physical controls to select options and play back media. The device can learn preferences and search for information on lectures and similar cultural events, suggesting the most relevant to the user; it can also download and play back podcasts or other media. The "Mockingbird" concept is aimed at allowing older adults to easily capture music, build playlists, and share the music with others. The system consists of a small clip (similar to a refrigerator clip) that, when pinched closed, records a short sample of music playing nearby. When the clip is returned to one of two types of stations, it uploads the samples, and online software matches them to song IDs and adds them to the user's playlist. In the docked position, the docking station plays the playlist from an online personalized radio site; the clip can also be attached to a common-area music system to share the individual's playlist. The "DinnerCloth" concept provides a means of indicating the presence of another individual sitting down to dinner, thus providing a sense of connection to friends and family during mealtime. The design takes the form of a placemat with interleaved loops. Two small fiber optic strands with wooden beads at either end are split, one given to the older adult and one to the other family member or friend. Each may then weave their strand in a pattern that associates them with the other individual. When someone sits down with a shared strand, the matching one glows on the placemat and vice versa, giving a sense of presence at mealtime, when a sense of loneliness might otherwise set in.

3.4 Older Adults in Design

The aforementioned concepts were prototyped and used in a focus group with older adults in a usability lab setting. The group comprised older adults representing the NORC communities and AP. The participants varied in ability (including vision and hearing impairment, dexterity, and mobility), age, and number of home residents (from living alone, to companion care, to living with family). The focus group was structured around different activities. First, the older adults were presented with the prototypes and provided first-impression feedback on each, guessing what each model might do. Each concept was then presented by the students to the group to help them understand its benefits. The group was asked to talk through their thoughts on each concept and, after all had been presented, to grade each from most to least likeable, considering individually perceived value and ease of use, aesthetic preferences, and likelihood of adopting the communication technology—sympathetic devices.
It became apparent that the aesthetics of each concept were weighed more heavily than its function by most of the group. When
asked whether they would benefit from a concept, at least half of the participants focused on its appearance and had to be asked several times to obtain a valid answer. The best-received concept was the "Forgetfulness" bracelet. One possible explanation is the simplicity and usability of the product, which addressed a need shared across the group. It was also the only concept presented through a video scenario, which likely helped participants recall its function. It is apparent from the focus group that aesthetics and inclusive design indeed play a significant role in designing a good technological system for older adults. The next best-received concepts were "TagIt" and "Thinking of You," owing to their perceived usefulness and realistic approach. We expect to redesign these two concepts into a unified approach, develop functioning prototypes, and evaluate and implement them in the homes and lifestyles of older adults.
4 Conclusion

Communication technologies have the potential to enrich many aspects of our lives, especially for older adults, but if they are not well designed, they are unlikely to be successfully adopted. This project and the documentation of the design of the communication technology (sympathetic devices for inclusion) serve as an example of designing for older adults. The major aspect of this effort is the evidence that research has started to move from investigator-initiated, technology-centered work toward a needs-driven, user-centered approach; research is crossing its practice boundaries by including users, industry, and the classroom in a unified process. In terms of design, this project serves as evidence that aesthetic and experience design, along with ease of use and an understanding of the usefulness of technology devices, have an impact on older adults' acceptance of technologies. A number of criteria affect the success of designing such technologies for older adults, ranging from functionality and value to design and cost. We have described four criteria for designing sympathetic devices: realistic design, inclusive design, simple design, and flexible design. To design for older adults is thus to design in an inclusive manner, where the physicality of the interface, and therefore traditional approaches to product design, remains a valid path toward successful, implementable, and adoptable solutions.

Acknowledgments. We would like to thank the Health Systems Institute and the GVU Center at the Georgia Institute of Technology for their funding commitment as part of their Seed Grant Initiatives to promote innovative research approaches, and Jitterbug for providing the cell phone devices for the data collection phase. We are also grateful to the Industrial Design Program at the Georgia Institute of Technology and the Fall 2008 graduate studio students for their involvement in conceptualizing sympathetic devices. Lastly, special thanks to Jenna Schmidt and Jae Wook Yoo for their work as graduate assistants on the project.
References
1. American Institute of Aging, http://www.aoa.gov/prof/Statistics/profile/profiles.aspx
2. Alwan, M., Wiley, D., Nobel, J.: State of Technology in Aging Services. Interim Report Submitted to Blue Shield of California Foundation (2008)
3. Townsend, P.: The Family Life of Older People. Routledge and Kegan Paul, London (1957)
4. Weiss, R.S.: Issues in the Study of Loneliness. In: Peplau, L., Perlman, D. (eds.), pp. 71–80. Wiley, New York (1982)
5. Cattan, M., White, M., Bond, J., Learmouth, A.: Preventing Social Isolation and Loneliness Among Older People: A Systematic Review of Health Promotion Interventions. Ageing & Society 25, pp. 41–67. Cambridge University Press, Cambridge (2005)
6. Alpass, F.M., Neville, S.: Loneliness, Health and Depression in Older Males. Aging & Mental Health 7(3), 212–216 (2003)
7. The Press Association, http://ukpress.google.com/article/ALeqM5im4H_ga3jHqqJDkomRgonLn75k4w
8. Donaldson, J., Evnin, J., Saxena, S.: ECHOES: Encouraging Companionship, Home Organization, and Entertainment in Seniors. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, Portland, pp. 2084–2088 (2005)
9. Barrow, G.M.: Social Bonds: Family and Friends, Aging, the Individual, and Society. West Publishing Company, St. Paul (1986)
10. Adams, K.B., Sanders, S., Auth, E.A.: Loneliness and Depression in Independent Living Retirement Communities: Risk and Resilience Factors. Aging & Mental Health 8(6), 475–485 (2004)
11. Coughlin, J., Pope, J.: Innovations in Health: Wellness and Aging in Place. Gerontechnology, pp. 47–52 (2008)
12. Bauer, J., Streefkerk, K., Varick, R.R.: Fridgets: Digital Refrigerator Magnets. In: CHI 2005 Student Design Competition, Portland, pp. 2060–2064 (2005)
13. Jive: Social Networking for your Gran, http://jive.benarent.co.uk/
Design Framework for Ambient Assisted Living Platforms

Patricia Abril-Jiménez1, Cecilia Vera-Muñoz1, Maria Fernanda Cabrera-Umpierrez1, María Teresa Arredondo1, and Juan Carlos Naranjo2

1 Life Supporting Technologies, Technical University of Madrid (UPM), Ciudad Universitaria s/n, 28040 Madrid, Spain
{pabril,cvera,chiqui,mta}@lst.tfo.upm.es
2 ITACA Institute, Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain
[email protected]
Abstract. New technological advances offer the possibility of providing a great number of personalized services that cover the needs of diverse categories of users. The application of the Ambient Intelligence (AmI) paradigm and the Ambient Assisted Living (AAL) concept makes it possible to create new applications that can significantly improve the quality of life of elderly and dependent people. This paper presents an interaction framework that provides a new generation of user interfaces for AAL services in the context of an AmI-based platform. The solution aims to create a technological context in which elderly and dependent citizens can increase their independence.

Keywords: Ambient Intelligence, Ambient Assisted Living, wireless sensor networks, adaptive interfaces.
1 Introduction

One of the main applications of the Ambient Intelligence (AmI) concept is the development of services aimed at improving citizens' quality of life by providing better control of the environment through electronic devices. AmI implies the development of seamless computing environments and the use of advanced networking technologies. It also embraces interfaces that are aware of the particular characteristics of human presence and personality, take care of users' needs, are capable of responding intelligently to spoken or gestured indications of desire, and can even engage in intelligent dialogues. In AmI applications, technology is transparent and invisible to users, who can effortlessly access a great number of functionalities while immersed in their natural and familiar environments (e.g., at home), with the guarantee that all security and privacy requirements are preserved. The AmI paradigm differs from traditional models in two important aspects. First, user interaction is proactive, meaning that actions result from a user's presence or behaviour. Second, this new concept is not linked to any specific device or set of
devices, but is instead associated with technologies located in the environment. This is possible through the use of wireless sensor and actuator networks that control and monitor the environment in which they are embedded. In addition, a middleware platform is needed to link the network components, together with a semantic data processing system for analyzing user needs and the environmental context. This paper presents the design of a user interface framework that is part of an AmI platform developed within the scope of the AmIVital project, partially funded by the EU and the Spanish Ministry of Industry, Tourism and Trade [1]. The platform aims to provide Ambient Assisted Living (AAL) services to dependent people and chronic disease patients. It is based on an architecture that is easily expandable to offer additional services and can also be integrated with other existing healthcare platforms.
2 The Interaction Framework

The AmIVital platform has defined a set of technological services in the AAL domain [2]. These services cover several aspects of the daily living activities and health status of chronic patients and dependent people (e.g. virtual calendar, health record, domotic control, personal alarms). Technological services can be combined in different groups or configurations according to the user's preferences; each of these combinations is called a functional service. The project has defined a set of basic predefined functional services according to the identified user needs (e.g. health and telecare, social interaction, information and learning). However, the global architecture of the platform has been designed to support new functional services, defined as any possible combination of existing technological services. Functional services are provided to the users by means of an adaptive interaction framework based on a service-oriented architecture (SOA) [3][4]. This alternative was selected because it respects the main principles of AmI platforms: it is autonomous, reusable, ubiquitous, loosely coupled, and adaptable. The services of the defined architecture are supported by a middleware that connects end users and service providers. This middleware makes it possible to adapt the user interfaces based on the user's preferences and on context data provided by a network of sensors. The interaction framework is composed of several components: the sensor network, the user profile, the knowledge management module, and the context module (Fig. 1). The first component corresponds to the physical layer of the SOA platform, that is, the set of sensors distributed in the environment to capture the relevant data. More specifically, the selected sensors are non-invasive, in keeping with the AmI paradigm, and fall into two categories: wearable and environmental sensors. Wearable sensors provide information on the user's physical and physiological status, while environmental sensors detect changes in the user's environment and actuate accordingly. Sensors are connected in networks that are further integrated into the general communication platform using a combination of wireless communication technologies: Wi-Fi, ZigBee, and Bluetooth. The sensor network module requires a detailed specification of the scenarios in which the sensors are embedded.
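To make this layering concrete, the following sketch models the physical layer as wearable and environmental sensors behind a common interface that exposes a context snapshot to the upper modules. It is only an illustration of the idea described above: the class names, fields, and example readings are our own assumptions and do not reflect the actual AmIVital implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class SensorKind(Enum):
    WEARABLE = "wearable"            # physical/physiological status of the user
    ENVIRONMENTAL = "environmental"  # changes in the user's surroundings


class Transport(Enum):
    WIFI = "Wi-Fi"
    ZIGBEE = "ZigBee"
    BLUETOOTH = "Bluetooth"


@dataclass
class SensorReading:
    sensor_id: str
    kind: SensorKind
    transport: Transport
    values: Dict[str, float]  # e.g. {"heart_rate": 72.0} or {"room_temp": 21.5}


class SensorNetwork:
    """Physical layer: collects readings and exposes them as context data."""

    def __init__(self):
        self._latest: Dict[str, SensorReading] = {}

    def publish(self, reading: SensorReading) -> None:
        # A real platform would receive these over the wireless links listed above.
        self._latest[reading.sensor_id] = reading

    def context_snapshot(self) -> Dict[str, float]:
        # Flatten the most recent readings into a single context dictionary
        # for the upper (knowledge management / context) modules.
        snapshot: Dict[str, float] = {}
        for reading in self._latest.values():
            snapshot.update(reading.values)
        return snapshot
```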
The user profile contains all the information related to the user's preferences, capabilities, and health status. It is one of the main inputs to the interaction adaptation process. It is first filled in when the user subscribes to any of the services offered by the platform and is updated periodically based on the user's actions.
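A compact way to picture this component is as a small record created at subscription time and revised as usage data arrive. The sketch below continues the illustrative Python style used above; the field names are assumptions for the example, not the project's actual profile schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    user_id: str
    group: str                                                   # e.g. "elderly person", "medical staff"
    preferences: Dict[str, str] = field(default_factory=dict)    # e.g. {"font_size": "large"}
    capabilities: Dict[str, str] = field(default_factory=dict)   # e.g. {"vision": "low"}
    health_status: Dict[str, str] = field(default_factory=dict)
    subscribed_services: List[str] = field(default_factory=list)

    def update_from_usage(self, observed: Dict[str, str]) -> None:
        # Periodic revision driven by the user's actions, as described above.
        self.preferences.update(observed)
```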
Fig. 1. Interaction framework architecture: functional and technological services on top of a knowledge management module (core) with its inference motor and user data, a context module, and a physical layer of sensors, actuators and device networks.
The knowledge management module is responsible for multimodal dynamic interaction adaptation. It incorporates an inference motor that creates a set of rules based on parameters obtained from the user profile (e.g. the user's preferences and service subscriptions). These rules are later used to adapt the user interfaces and define the type of interaction, the type of device, and the specific technological services to be used. Finally, the context manager module manages the actual presentation of the information provided by the different services to the users. It performs the user interface adaptation using the data provided by the knowledge management module (the defined adaptation rules) and the sensor network (context information). Furthermore, the context manager can extract data from the user's behaviour and service usage and use it to update the adaptation rules, improving the overall interaction mechanisms. The module is also responsible for adjusting the information to be presented to the specific device the user is utilizing. The proposed framework follows several steps for interaction adaptation. First, user data are collected, including the user's target group (e.g. medical staff, elderly people) and the individual, non-transferable user preferences regarding technological services. Second, the user profile is created and associated with the content to be provided, based on the data type and user category. Then the actual content delivery and presentation are defined by specifying the different possibilities the system has for providing relevant information to the user. Finally, the system collects feedback from the user's actions and updates the current profile accordingly.
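The adaptation loop can be summarized schematically as follows. The sketch below is a minimal, self-contained illustration of rule-based interface adaptation in the spirit of this description; the specific rule conditions, thresholds, device names, and profile keys are invented for the example and are not rules defined by the AmIVital project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative aliases: a profile and a context snapshot, mirroring the
# structures sketched earlier (all keys are assumptions for the example).
Profile = Dict[str, str]
Context = Dict[str, float]


@dataclass
class AdaptationRule:
    """A rule derived by the inference motor from user profile parameters."""
    condition: Callable[[Profile, Context], bool]
    settings: Dict[str, str]  # interface settings applied when the rule fires


def infer_rules(profile: Profile) -> List[AdaptationRule]:
    """Inference motor: turn profile parameters into adaptation rules."""
    rules: List[AdaptationRule] = []
    if profile.get("vision") == "low":
        rules.append(AdaptationRule(
            condition=lambda p, c: True,
            settings={"output": "speech+screen", "font_size": "x-large"}))
    if profile.get("telecare") == "subscribed":
        # Hypothetical rule: switch to an alarm view if a monitored value drifts.
        rules.append(AdaptationRule(
            condition=lambda p, c: c.get("heart_rate", 0.0) > 120.0,
            settings={"view": "alarm", "device": "nearest_screen"}))
    return rules


def adapt_interface(profile: Profile, context: Context) -> Dict[str, str]:
    """Context manager: combine rules and sensor context into a UI configuration."""
    config: Dict[str, str] = {"output": "screen", "font_size": "normal"}
    for rule in infer_rules(profile):
        if rule.condition(profile, context):
            config.update(rule.settings)
    return config


# Example: a low-vision telecare subscriber with an elevated heart rate.
print(adapt_interface({"vision": "low", "telecare": "subscribed"},
                      {"heart_rate": 130.0}))
```

In a deployed platform the rules would, of course, be generated and updated dynamically from the profile and from observed behaviour rather than hard-coded as in this sketch.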
As a result, the developed framework provides a complete set of AAL functional services that can be fully adapted to different user requirements in terms of interaction, environmental context information, and devices.
3 Conclusions

Ambient Intelligence technologies are nowadays playing a key role in several aspects of society, offering new solutions and services that cover the needs of an increasingly aging population. The proposed interaction framework is a novel system that facilitates the adoption of these technologies by providing a natural, pleasant, and easy way of interacting with the environment. The platform has a clear impact on the specific areas of health care and social assistance, providing seamless and natural access to different AAL services for dependent citizens and offering an innovative, integrated solution. Elderly users, chronic patients, and dependent people are the main beneficiaries of this kind of technological solution, receiving new services that can significantly improve their quality of life and reduce their level of dependency.

Acknowledgments. This work has been partially supported by the Spanish Ministry of Industry, Tourism and Trade, project AMIVITAL-CENIT, and the European Union VI FP, Contract No. 045088.
References
1. AmIVital project. EU and Spanish Ministry of Industry, Tourism and Trade funded, http://amivital.ugr.es
2. Cruz-Martín, E., del Árbol-Pérez, L.P., Fernández González, L.C.: The Teleassistance Platform: An Innovative Technological Solution to Face the Ageing Population Problem. Telefónica Research and Development, Spain (2007)
3. Newcomer, E., Lomow, G.: Understanding SOA with Web Services. Addison Wesley, Reading (2005)
4. Bell, M.: Introduction to Service-Oriented Modeling. In: Service-Oriented Modeling: Service Analysis, Design, and Architecture. Wiley & Sons, Chichester (2008)
Ambient Intelligence in Working Environments

Christian Bühler

TU Dortmund University, Rehabilitation Technology, Research Institute Technology and Disability (FTB), Emil Figge-Straße 50, 44221 Dortmund
[email protected]
Abstract. The concept of ambient intelligence (AmI) has recently been applied to living scenarios and denoted ambient assisted living (AAL). It has received much attention in connection with the demographic shift and the positive options for care and support it offers elderly people at home and on the move. However, there exists an equally important field of application related to work. In the labour context, high mobility and flexibility of people are expected, and the demand to work up to higher ages adds to the situation. People in the workforce develop growing expertise and changing abilities over time; they need tailored support systems at work that maintain efficiency and effectiveness and include elements of prevention or adjustment to changing abilities. Indeed, environments in industry and at work already provide a high degree of networking and computing infrastructure, much more than the private sector, and can serve as the basis for an advanced AmI infrastructure. The idea is discussed within the framework of creating accessible workplaces for people with disabilities. Here, a reactive strategy has so far been followed, based on the individual case: only when a concrete person with a disability joins the workforce, and only at that very moment, is a workplace adaptation considered. This reactive strategy is now outdated, because today the complete infrastructure needs to be considered to make a workplace accessible. Following an AmI strategy, ambient assisted working (AAW) provides a flexible approach to workplace adaptation for all, including people with disabilities and older people in the workforce. In order to use AAW, the process has to start much earlier and in a more inclusive way: the system needs to be designed without knowing the exact demands of a future worker. The flexible networking character of AmI provides the required flexibility.

Keywords: Ambient Intelligence, Work, Ambient Assisted Working, Universal Design, Accessibility, Information Technologies, Higher Age.
1 Introduction

Over the past 10 years, the greying of society has attracted the attention of policy makers and society at large. It is considered very positive to live longer in good health and to enjoy retirement. This approach is based on the traditional assumption that there is an overall lack of job offers and a sufficiently large younger workforce in society. Early pension programmes by companies and governments have been installed in order to combat
unemployment, and people have often already left the labour market for retirement between the ages of 50 and 65. In the area of technology, much emphasis has been placed on supporting people at home through modern technology such as smart homes, living support, and support for health care and nursing. However, there exists an equally important field of application related to work. Meanwhile, the ratio of people in gainful employment to all other people is viewed critically, and consequently the intention is to let people stay in employment longer. In the labour context, high mobility and flexibility of people are expected and requested. The demand for lifelong learning and the wish and need to work up to higher ages complement the situation: people in the workforce develop growing expertise and changing abilities over time. They need tailored support systems at work that maintain efficiency and effectiveness and provide elements of prevention or adjustment to changing abilities. This leads to the question of whether modern technology can be used to support older people, or people with functional restrictions, at work. In the past, changes of workflow and workplace adaptation have been motivated mostly by increases in productivity and quality; safety at work and ergonomic considerations have also been factors. Individual workplace adaptation has been carried out within the framework of vocational rehabilitation of people with disabilities. Such adaptation has been done case by case, considering the concrete workplace and the special requirements of the worker, and has very often concerned access to buildings and workshops, special furniture, devices, and equipment. Accessibility and compensation of disability have been the main elements [7]. Although high technology has played an important role here, the general e-accessibility of company infrastructures has not been much within the scope of such measures [3]. The challenge of supporting older people at work connects closely to all these approaches. The first part of this paper describes the traditional strategy of creating accessible workplaces and the limitations of that concept in a modern infrastructure, referring to examples of current legislation that support the old approach. The second part deals with options for AmI at work and presents examples of current options.
2 Approach of Traditional Workplace Adaptation

In many highly developed countries, ergonomics and safety have become important aspects of workplace design, alongside the overall need to create workplaces for efficient and effective work processes. It is considered essential to support workers in a way that enables them to perform their tasks throughout a whole working day, and for many years, with high efficiency and quality output. This implies preventive health measures and the avoidance of repetitive strain injury (RSI) and other work-related health problems. In this perspective, workers are regarded as human potential, and the investment in their education, training, and experience needs to be secured over time. This automatically brings with it the need to consider human factors at work and to protect workers. Of course, these issues are also strongly demanded and supported by union policy and by the work-related health and rehabilitation systems in many countries, as well as by the solidarity systems in health and
pension systems. In this context, the adaptation of a workplace to the individual needs of a worker becomes a complementary option to general ergonomic, safety, and accessibility considerations. Particularly in the case of highly qualified workers with changing functional capacities due to RSI, disease, or accidents, which often occur in the labour context or are even caused by work, schemes have been developed to provide individual workplace adaptations. Reactive options include, among others, the reorganisation of the work process and the tasks to be performed, the adaptation of the workplace and the work environment, or a complete change of the work itself. An example of a support system in this respect is given by the German Social Code, Book IX [11]. This legislation constitutes a framework of support for a worker with a disability and the respective employer. Besides clarifying rights and obligations, it forms the baseline for advice and support services; in particular, it is possible to receive financial support for individual workplace adaptations from a special fund. The adaptation is undertaken as a consequence of concrete changes in the worker's abilities and considers the individual case of personal requirements in the given work situation. The adaptation can thus be tailored and optimised individually, which is very positive, and examples of workplace adaptations are collected and provided as information for other cases [9]. On the other hand, a reactive strategy has to be followed: only when a concrete person is included in the workforce, either a newly employed person with a disability or a current worker developing problems due to a disability, and only at that very moment, is a workplace adaptation considered. This can have negative consequences, because the process starts only after the case has arisen. If no concrete case exists at the time of general infrastructure planning, that planning might be done without any consideration of accessibility. In the worst case, a complete infrastructure may be designed inaccessibly, at least until a later adaptation has been planned and implemented. This can be very costly, and it may be a hindrance to hiring a new person who needs accessible infrastructures. Today, this purely reactive strategy is outdated: careful planning of infrastructures can instead provide very important complements to the reactive strategy.
3 Changing Infrastructures

3.1 AmI and AAL

In recent years, industrial societies have largely developed into information and knowledge societies. The modern infrastructures of private, public, and industrial properties have taken up a great deal of electronic communication, networking, and smart sensors and actuators. Wired telephony, cable networks (LAN and WAN), wireless connections (infrared, Bluetooth, Wi-Fi), automatic door openers, motion sensors for illumination and safety, window openers, air conditioning, access control systems, GPS-based services, acoustic sensors, video monitoring, pattern recognition, biometrics, and more are among the applications built into these infrastructures. Often the devices are operated locally and are not yet linked to other appliances in an integrated way, although the connection would easily be possible. With this new perspective and the progress of technology, the option of connecting the distributed smart devices has created the vision of ambient intelligence [2].
In such an AmI infrastructure, intelligent agents interconnected through fixed and mobile networks support nearby persons in a responsive way: they ease interpersonal communication, the access and delivery of information, and the control of the environment. People will interact through individualised mobile devices as well as infrastructure-based I/O, and they will be able to make use of the computational power of the surrounding networks (ubiquitous computing, nomadic computing, cloud computing). Already today, mobile phones can be regarded as such personalised multimodal interaction devices. They offer wireless connectivity (GSM, UMTS, GPS, Wi-Fi, Bluetooth) and access to computational power in the network, a variety of input options (reduced keyboard with T9 prediction, full keyboards, softkeys, pointing devices, touchscreen, voice) and outputs (display, tactile vibration, visual signalling, synthetic voice, sound), and built-in computational power and flexibility. These devices are perceived as everyday appliances, in this case as telephones rather than as computers, which will be typical of AmI devices. The concept of ambient intelligence has recently been adopted and transferred to living scenarios, denoted ambient assisted living (AAL). It has received much attention in connection with the demographic shift and the positive options for care and support of elderly people at home and on the move. The focus there is on private use rather than working environments.

3.2 Work Environments

Indeed, environments in industry and at work already provide a high degree of networking and computing infrastructure, usually much higher than in private surroundings. Systems for access control, time keeping, process control, automation technology, tracking of goods, work planning and monitoring, and computer-assisted work in trade, maintenance, and so on, combined with company intranets, can provide the basis for an advanced AmI infrastructure. This connects to the potential of creating a very flexible and supportive work environment for all, taking up the ideas of Design for All and accessibility and applying them to the work environment with an advanced, modern information and communication infrastructure.
4 AmI at Work

Throughout their working lives and careers, both people and companies benefit from tailored support systems at work that maintain efficiency and effectiveness and provide elements of prevention or adjustment to changing abilities. Following an AmI strategy, ambient assisted working (AAW) provides a flexible approach to workplace adaptation for all, including people with disabilities and older people in the workforce. In order to make use of AAW, the planning process has to start much earlier and in a more inclusive way than with traditional workplace adaptation: the overall system needs to be designed without knowing the exact demands of a future worker. Basically, three levels need to be considered: 1. the network(s) as computing resource and connection between all agents, 2. the agents (machines, devices, appliances in the infrastructure), and 3. the personal mobile interface. In particular, the flexible networking character of AmI
provides the required flexibility at level 1. Level 2 can integrate a large variety of agents, such as CNC machines, conveyor belts, and robots, but also appliances such as building automation, guiding systems [4, 6, 8], workflow control systems, or work support devices (e.g. handling support [1], lifting aids [10], a body weight support system [5]). These are less generic and depend very much on the nature of the work, and new agents can be added to the system whenever needed. Level 3 deals with the individual requirements matched to the working tasks of an individual person. The assumption of AmI is that all the systems are connected and that smart sensors and actuators can exchange information in order to create a flexible and intelligent environment. The interaction with the worker can take several forms: direct sensing and interaction with agents and/or operation through the personal mobile interface.

4.1 Case Examples

In the following, two virtual cases illustrate the use of AmI in a work environment for very different people. In both cases it is intentionally left open which particular ability or disability the workers may have or which specific tasks they perform. The first example refers to a specialist working mainly in an office-like environment; the second refers to maintenance tasks in a workshop environment. Both examples build upon existing options, but the full benefit is achieved through AmI-like system integration. The examples make it obvious that, in this kind of business environment, AmI is within closer reach than in private environments, and looking more deeply into the organisation of work environments one will detect many more already existing options for AAW.

4.1.1 Case 1: Flexible Office Environment

Mrs. Adapt is an employee of the multinational company Access International. She works part time from her home office in alternating telework. Keeping in contact with colleagues and customers worldwide requires much coordination, communication, and use of new media. Due to a car accident during a business trip, Mrs. Adapt acquired a disability. Access International has put effort into retaining her qualified work performance, because she is an experienced expert in her business field. Her work environment has been adapted in a way that allows her to perform all necessary tasks despite her disability. Fortunately, Access International was well prepared for such requirements thanks to its modern infrastructure. All subsidiaries and premises of the company are connected via a modern intranet, and secure gateways allow employees access to data from all over the world. When reaching a subsidiary, Mrs. Adapt gains access to the premises through an RFID- and biometric-controlled electronic access control system. On arrival, a workplace for the day is automatically assigned according to her daily work plan. The workplace is accessible, and she is dynamically guided along an accessible path to find it. Her individualised software environment is loaded onto the respective workplace by the system. She only needs to carry along her personalised mobile I/O device, which is configured according to her individual operating requirements. This device communicates wirelessly in the respective working environment and allows her to operate and use all of the company's devices and machines that belong to her work or her nearby environment.
Due to the accident, Mrs. Adapt sometimes experiences temporary memory loss. The system is able to detect this condition automatically through her interaction pattern. In this case she is supported by the memory function of the intranet, which provides information on her daily tasks and logs the progress of ongoing work. In case of serious memory loss, she can reach (or be reached by) her personal coach via the communication system, who helps her resume work. Regarding her health status, Mrs. Adapt is protected via a health telemonitoring link. In case of potential irregularities or contingencies it provides warning, alarm, and quick emergency support. This option provides the safety needed and enables Mrs. Adapt to perform her work without anxiety. Once in a while, Mrs. Adapt is confronted with problems that she cannot easily solve ad hoc. In this case, she can forward the issue to peers through the intranet (sometimes also the Internet). At the same time, past solutions and the relevant legal background are available in a database. Mrs. Adapt makes her experiences and adapted information available in a wiki shared with other people with the same disability; in this way, adaptations and solutions, once made, can support many other people.

4.1.2 Case 2: Maintenance of Machines

Following a basic instruction course on the construction, function, and frequent service tasks of the machines, technicians are supported in their daily work by a PDA-based maintenance support system. The PDA (personal digital assistant) transmits information on the next service task, the material and tools needed, and the location of the machine that needs the service, and it guides the service staff dynamically to the machine. Upon arrival, a connection between the machine and the PDA is established (locally, or through the overall network). The PDA receives information about the service condition and the tasks to be performed. All necessary safety measures and environment settings are taken automatically and approved by the service person, so that no one will critically interfere during service. The PDA then calls the service routine, which is presented to the service person step by step in an appropriate format (sequence of pictures/drawings, sequence of tasks, voice output and hands-free interaction by voice, etc.). The PDA and the machine exchange information such that the next step is only presented after successful completion of the previous one. In case of problems, the service centre can log in to the process and provide assistance; video in both directions (recorded or live stream) can be used to address specific problems with the support of a remote expert. Logs of the work process are fed into the instruction programme for service personnel or may lead to changes in service routines. Such a support system can be used for complex, highly technical tasks, but also for simple service tasks performed by less technical staff. Even people with learning disabilities could be supported by such interactive guidance, for example in servicing printers and copy machines.
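The step-gating behaviour described in Case 2, where the next service step is released only after the machine confirms the previous one, can be sketched as a simple exchange between the PDA and the machine. The Python fragment below is a hypothetical illustration; the step names, the confirmation method, and the printer routine are invented for the example and do not describe any actual product.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ServiceStep:
    instruction: str   # shown as picture sequence, text, or voice output
    check: str         # condition the machine must confirm before continuing


class Machine:
    """Stand-in for the serviced machine; confirms completed steps."""

    def confirm(self, check: str) -> bool:
        # In a real system this would query machine sensors over the local
        # link or the plant network; here we simply assume success.
        print(f"machine: verified '{check}'")
        return True


def run_service_routine(machine: Machine, steps: Iterable[ServiceStep]) -> List[str]:
    """Present steps one at a time; advance only after confirmation."""
    log: List[str] = []
    for step in steps:
        print(f"PDA: {step.instruction}")   # multimodal output in practice
        if not machine.confirm(step.check):
            print("PDA: contacting service centre for remote assistance")
            break
        log.append(step.instruction)        # could feed back into training material
    return log


# Illustrative routine for a hypothetical printer service task.
routine = [
    ServiceStep("Switch the machine to maintenance mode", "maintenance_mode_on"),
    ServiceStep("Replace the toner cartridge", "toner_cartridge_present"),
    ServiceStep("Close the front panel and run a test page", "test_page_ok"),
]
run_service_routine(Machine(), routine)
```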
5 Summary Although today AmI is very much discussed for public and private environments, it seems more closely ahead in industry and business environments. The high degree of
Ambient Intelligence in Working Environments
149
automation, facility management, work-related control, and safety in the highly networked computational environments present in industry creates a very good backbone for AmI applications. The high flexibility of such solutions offers options to tailor them to very different workers and tasks, and accessibility issues and workplace adaptation can be considered from a different angle than in traditional approaches. Already at the planning stage, without having a concrete case of a worker with special needs in mind, the basis for AAW can and needs to be implemented in the AmI infrastructure. With this understanding, AmI in a work environment can support very different user needs and can help to create working environments for all.

Acknowledgement. The work forming the basis of this paper has been supported by grants from the European Commission, the Bundesministerium für Arbeit und Soziales (BMAS, Germany), the Deutscher Bundestag, and the Ministerium für Arbeit und Soziales des Landes NRW (MAGS NRW).
References
1. Dalmec lifting support manipulator, http://www.dalmec.com/disegni_manipolatori/micro/fotocolonna.jpg (downloaded 12.11.2008)
2. Ducatel, K., Bogdanowicz, M., Scapolo, F., Leijten, J., Burgelman, J.C.: Scenarios for Ambient Intelligence in 2010. Technical report, Information Society Technologies Programme of the European Union Commission, IST (2001)
3. Forschungsinstitut Technologie und Behinderung (publisher): Digital informiert - im Job integriert. Dokumentation AbI-Kongress 2008, Evangelische Stiftung Volmarstein (2008), ISBN 978-3-930774-14-2
4. HaptiMap, http://www.presse-service.de/data.cfm/static/710154.html (downloaded 15.10.2008)
5. Honda Body Weight Support System, http://world.honda.com/news/2008/c081107Walking-Assist-Device/ (downloaded 13.2.2009)
6. Loadstone GPS: SmartPhone Navigation, http://www.loadstone-gps.com/ (downloaded 14.11.2008)
7. Mueller, J.: Assistive Technology and Universal Design in the Workplace. Assistive Technology 10(1), 37–43 (1998)
8. NAV4BLIND, http://www.nav4blind.de/ (downloaded 12.06.2008)
9. REHADAT, Information System for Vocational Rehabilitation, http://rehadat.de (downloaded 12.2008)
10. Robotic Suit HAL, http://web-japan.org/trends/07_sci-tech/images/scia080822.jpg (downloaded 3.11.2008)
11. Sozialgesetzbuch (SGB) Neuntes Buch – Rehabilitation und Teilhabe behinderter Menschen (860-9), 19.6, BGBl. I S. 1046/7; 22.12, BGBl. I S. 2959 (2001)
Towards a Framework for the Development of Adaptive Multimodal User Interfaces for Ambient Assisted Living Environments

Marco Blumendorf and Sahin Albayrak

DAI-Labor, TU-Berlin, Ernst-Reuter-Platz 7, D-10587 Berlin, Germany
[email protected]
Abstract. In this paper we analyse the requirements and challenges that ambient assisted living and smart environments pose for interactive systems. We present a framework for the provisioning of user interfaces for such environments. The framework incorporates model-based user interface development technologies to create a runtime system that manages interaction resources and context information in order to adapt interaction. This approach allows the creation of adaptive, multimodal, interactive ambient assisted living applications.

Keywords: smart environments, multimodal interaction, model-based user interface development, ambient assisted living, multi-access service platform.
1 Introduction

Recent developments show that the information society is evolving in a direction in which computing technology moves from single workstations towards distributed systems of networked interactive devices, appliances, and sensors. Such systems provide access to a broad range of services from a variety of application domains and will be installed in different physical locations including homes, offices, cars, and public spaces. They facilitate the combination of ambient intelligence with the idea of creating applications that are tailored to users, interaction technologies, and modalities, as well as to the environment and different usage situations. This brings together two trends that, at first glance, seem to pull in contrary directions. On the one hand, the idea of ubiquitous computing aims at hiding technology and complexity in the environment to provide a more intelligent surrounding. On the other hand, the general availability of interactive technology at all times raises the demand for the continuous availability of one's personal services at one's fingertips. Based on these trends, the idea of smart home environments supporting Ambient Assisted Living (AAL) concepts is currently under heavy development. AAL environments target the assistance of the user in everyday life by embedding ambient intelligence into home environments. Aiming at increased technological support for everyday activities, smart home environments address any kind of user. However, some user groups are likely to benefit more than others. A main focus is thus currently put on the utilization of technology for elderly and disabled users, who can gain support with everyday things they
can no longer manage by themselves. In any case, there are numerous issues that need to be solved independently of the targeted user group.
In the following we elaborate on the technological challenges from a human-computer interaction perspective. We review the current state of the art and identify open issues. Afterwards we introduce a framework for the creation of multimodal user interfaces for smart home environments and identify its main components. A reference implementation in the form of the Multi-Access Service Platform, successfully deployed in an ambient assisted living testbed as part of the Service Centric Home project (www.sercho.de), is presented. A summary and an outlook on future work conclude the paper.
2 Challenges for Interaction in AAL Environments
In smart environments the internetworking of resources, including interaction devices (possibly combining multiple interaction resources), sensors and controllable appliances, forms a complex system offering new means for services and interaction. Real-time and real-life issues like continuous availability, extensibility, resource efficiency, safety, security or privacy [15, 2] are major challenges for such systems. However, interactive applications for such environments face additional challenges. In contrast to PC-based systems, personal devices and applications, interactive systems and applications embedded in the environment need to address multiple users and various user groups with different skills and preferences. Such systems are used in scenarios much less predictable than the usual “user in front of a PC” usage schema [15, 1]. Special needs of different target groups, like supportive or rehabilitative usage, non-disruptiveness, invisibility, declining capabilities of users, low acceptance of technical problems and the involvement in active everyday life [1, 19, 2], have to be considered carefully. While personalization puts a strong focus on the user as the main actor for any kind of system, context-of-use adaptivity goes one step further and comprises adaptation to user, platform and environment. Adaptive systems are required to monitor themselves and their environment and to provide appropriate reactions to monitored changes before they lead to a disruption of operation [15, 19, 10]. The required context information must be well defined and properly modeled [12]. The complexity of such adaptive systems is massively increased by the distributed nature of AAL environments. The availability of multiple resources (interaction, sensors and appliances) raises the need to utilize different resources for interaction, making the adaptation to different capabilities or even different modalities an essential issue. The interactive systems must be capable of dealing in real time with the distribution of input and output in the environment to provide humans with continuous, flexible, and coherent communication [10]. Distributing interaction sequentially across multiple interaction resources can provide a richer interaction by addressing the fact that the user moves around in the environment during interaction. Using multiple interaction resources simultaneously takes into account that a combination of resources can be more appropriate for a given task than a single device. The combination of multiple different interaction resources also leads to the usage of multiple modalities and interaction paradigms. Interaction shifts from an explicit paradigm, in which the user’s attention is on computing, to an implicit paradigm, in which
interfaces themselves proactively direct human attention when required [10]. Multimodal interaction can provide greater robustness of the system [16], and natural communication between user and system via voice and gestures can enhance usability [19, 14], especially if keyboard and mouse are not available or suitable to use. However, multimodal interaction is still not entirely understood; interfaces must be carefully designed, respecting each modality’s peculiarities, expressiveness, and interaction capabilities [12], and further investigation of how users cope with multimodal distributed interaction is required [10]. Distributed multimodal interaction requires the ability to consider an unpredictable number of devices and device federations, ranging from mobile phones or headsets to multitouch wall displays, and needs to address the lack of a dominating interaction paradigm in AAL environments. Looking at all these challenges, a main factor is the overall interaction experience, which has to be excellent so that users embrace the vision of being surrounded by computers (which is usually not the case with today’s GUIs) [14]. It is required to establish an appropriate balance between automation and human control [10]. While it is sometimes appreciated if the system learns human behavior patterns, human intervention directing and modifying the behavior of the environment should always be possible [10, 14]. Currently there is a strong lack of development tools, methodologies and runtime environments for such applications [10], as well as a lack of integration of human-computer interaction and software engineering [14]. In our work we focus on the issues ambient assisted living environments pose for interactive systems at runtime. While development tools and methodologies can address the design issues of the described challenges, there is an urgent need to manage and maintain interaction at runtime. The consideration of context information, adaptation, the handling of distribution and the processing of multimodal interaction require extensive efforts at runtime. After introducing the state of the art in the next section, we thus focus on deriving an architecture addressing the identified issues at runtime.
3 Related Work
Recent work in the area of ambient assisted living, ambient intelligence and smart home environments strives for the development of a common architecture and framework for AAL environments [11, 12, 2]. A main aspect of such frameworks is the integration of home control, sensor information and interaction means. The development and runtime management of multimodal interaction can be greatly supported by model-based approaches. Utilizing formal user interface models takes the design process to a computer-processable level that makes design decisions understandable for automatic systems. While earlier work in this field addresses the derivation of multiple consistent user interfaces from a single user interface model at design time, the focus has recently been extended to the utilization of user interface models at runtime [4, 6, 8, 18]. This provides the possibility to exploit the information contained in the model for user interface adaptation and the handling of multimodal interaction within changing contexts-of-use. Coninx et al. [6] present a runtime system that targets the creation of context-aware user interfaces. Similarly, the DynaMo-AID Runtime Architecture [5] processes context information and utilizes user interface models to
build adaptive user interfaces. Tycoon [13] and SmartKom [17], as well as the conceptual structure of a multimodal dialogue system [7] and the Framework for Adaptive Multimodal Environments (FAME) [9], present architectures for multimodal interaction processing at runtime. Runtime issues like the coordination and integration of several communication channels that operate in parallel (fusion), the partition of information for the generation of efficient multimodal presentations (fission), the synchronization of input, output and feedback, as well as the adaptation to the context of use, can be supported by the utilization of user interface models. In the following we present a framework utilizing model-based technology for the creation of interactive AAL systems. It addresses the presented challenges and aims at managing multimodal user interfaces for AAL environments.
4 A Model-Based Framework for AAL Services
From a user interface perspective, a framework addressing the identified challenges has to support adaptation, distribution, migration and multimodality of user interfaces. Reflecting the clash of user interfaces for AAL environments with the ubiquitous computing paradigm, we propose to call such user interfaces Ubiquitous User Interfaces (UUIs). Recent work in this area has shown that model-based approaches can help address the various issues. In the following we identify the core features of a framework combining UI models with a runtime system supporting UUIs. Figure 1 shows the general architecture. Based on a model of the user interface, the architecture aggregates context information, fusion, fission and adaptation components as well as a connection to backend services on the one hand and communication channels to the interaction resources on the other. A Dialog Manager handles the dialog based on the set of models, representing the interaction and user interface design on different levels of abstraction. For each interaction step it calculates the current dialog state based on the model, recent interaction and relevant context information. Context information is provided through a comprehensive context model, reflecting information perceived by observers (sensors) as well as predefined information like user preferences. It also maintains the available interaction resources, based on which the distribution component distributes the current dialog state. Once the relation of interaction elements and interaction resources has been calculated, the information is passed to the Interaction Manager. It then takes care of the creation and management of the physical user interface from the internal state of the interactive system and of the delivery of the different UI parts via communication channels. Communication channels connect interaction resources directly to the runtime system and make them accessible for interaction. They abstract from the underlying communication technology (e.g. HTTP) and deliver the final user interface and continuous updates to the user, and user input back to the system. Each UI part is thereby built up into a full user interface (e.g. a web page for a browser) representing the given part of the interaction. User input received through a channel as the result of an interaction is matched by the interaction manager and passed to the fusion component. The fusion component aims at interpreting the user input based on the expected input and relates input from different resources and modalities to find a suitable interpretation. The found interpretation is then handed to the dialog manager as a basis to calculate the next dialog state
based on the models and the different levels of abstraction. Additionally, a call to the application logic can be issued via the service model to trigger any system action. Backend services also have the possibility to access context information and to directly alter the UI presentation, e.g. to proactively contact the user or to present crucial notifications. An API allows external access to directly control the user interface via the adaptation component. Once the new UI state has been calculated, it is again distributed and the perceivable presentations are updated. The distribution component is also responsible for triggering the migration of (parts of) the UI to alter the distribution even within a dialog step.
Fig. 1. A runtime architecture, reflecting the identified requirements
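To make the interplay of these components more tangible, the following sketch outlines one iteration of such a runtime cycle in Java. It is a minimal sketch under our own simplifying assumptions; the interface and class names (DialogManager, DistributionComponent, InteractionManager, FusionComponent) are illustrative and do not correspond to the actual MASP API.

```java
import java.util.List;
import java.util.Map;

// Illustrative types only; names do not correspond to the real MASP API.
class InteractionElement {}
class InteractionResource {}
class ContextModel {}
class RawInput {}
class Interpretation {}

interface DialogManager {
    // Calculates the current dialog state from the models, the last input and the context.
    List<InteractionElement> computeDialogState(Interpretation lastInput, ContextModel ctx);
}
interface DistributionComponent {
    // Assigns the active interaction elements to the available interaction resources (fission).
    Map<InteractionResource, List<InteractionElement>> distribute(
            List<InteractionElement> state, ContextModel ctx);
}
interface InteractionManager {
    void deliver(Map<InteractionResource, List<InteractionElement>> assignment);
    RawInput awaitInput();  // input arriving through a communication channel
}
interface FusionComponent {
    // Relates raw input to the expected interaction elements (fusion).
    Interpretation interpret(RawInput input, List<InteractionElement> expected);
}

class RuntimeCycle {
    void run(DialogManager dm, DistributionComponent dist,
             InteractionManager im, FusionComponent fusion, ContextModel ctx) {
        Interpretation last = null;
        while (true) {
            List<InteractionElement> state = dm.computeDialogState(last, ctx);
            im.deliver(dist.distribute(state, ctx));   // deliver UI parts via channels
            RawInput input = im.awaitInput();          // wait for user input
            last = fusion.interpret(input, state);     // interpret against expected input
        }
    }
}
```

In a real system the loop would of course be event-driven and handle multiple concurrent channels rather than blocking on a single input.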
The described approach allows the connection of multiple interaction resources to a server-side system managing the interaction and UI distribution. The architecture separates input and output IRs in order to be able to (dynamically) incorporate additional IRs at any time (top of Figure 1). While output IRs are provided with a presentation UI, input IRs can be configured, e.g. to limit the possible input. This approach allows, for example, the limitation of the vocabulary of a speech recognizer, or the incorporation of a gesture device and the restriction of the known gestures at runtime; a small sketch of this idea is given below. Input processing as well as response generation happen in a multi-step process, enriching received input and created output on the different levels of the architecture and the UI model. Based on this reference architecture, we developed the Multi-Access Service Platform, a model-based runtime system that allows the derivation and adaptation of user interfaces from design models at runtime. In the following we illustrate selected aspects of the platform.
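As a small illustration of the input-configuration idea just described, the sketch below derives a restricted vocabulary for a speech recognizer from the currently active command elements. The types and method names are assumptions made for this example and are not part of the MASP.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical command element exposing the utterances it accepts.
class CommandElement {
    private final Set<String> utterances;
    CommandElement(Set<String> utterances) { this.utterances = utterances; }
    Set<String> acceptedUtterances() { return utterances; }
}

// Hypothetical input channel to a speech recognizer.
interface SpeechChannel {
    void configureVocabulary(Set<String> words);  // limits what the recognizer listens for
}

class SpeechInputConfigurator {
    // Restricts the recognizer to the utterances of the currently active elements.
    void configure(SpeechChannel channel, List<CommandElement> activeElements) {
        Set<String> vocabulary = new LinkedHashSet<>();
        for (CommandElement e : activeElements) {
            vocabulary.addAll(e.acceptedUtterances());
        }
        channel.configureVocabulary(vocabulary);
    }
}
```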
5 The Multi-Access Service Platform
The described architecture has been implemented as the Multi-Access Service Platform (MASP) in the Service Centric Home project. The MASP aims at the creation of ubiquitous user interfaces for smart home environments and provides a model-based runtime system to create distributed multimodal interaction. The core of the MASP is a set of user interface models representing the interaction on different levels of abstraction. The well-accepted Cameleon Reference Framework [4] proposes four levels of abstraction (task & domain, abstract, concrete and final UI), which have been incorporated into the framework above. Figure 2 shows an overview of the net of models we are currently utilizing in the MASP. It follows the same abstraction levels, but also introduces additional models for configuration purposes. The models are connected by mappings and executed at runtime as described in [3]. This allows the models to dynamically evolve over time and to describe the application, its current state and the required interaction as a whole, instead of providing only a static snapshot.
Fig. 2. The MASP user interface models maintained by the dialog manager, and the mappings between the different parts
While the service model connects a given backend system at runtime, the task and domain models describe the basic concepts underlying the user interface. Based on the defined concepts, the interaction models define the actual communication with the user. They define abstract interaction elements (Abstract Interaction Model), aiming at a modality- and device-independent description of the interaction, and concrete elements (Concrete Input & Concrete Output Model), targeting specific modalities and devices. The interaction model thus plays a central role in providing support for multimodal interaction. A context model provides interaction-relevant context information. This model holds information about the available interaction resources and allows their incorporation into the interaction process at runtime. Additionally, it provides information about user and environment and comprises context providers that continuously deliver context information at runtime, so that the model always reflects the current interaction context. Based on these models, a fusion and a distribution model define fusion and distribution rules to support multimodal and distributed user interfaces. A layouting model produces layout constraints for different usage
scenarios. Finally, a mapping model interconnects the different models and ensures synchronization and information exchange between them. By linking the task model to the service and interaction models, the execution of the task model triggers the activation of service calls and interaction elements. While service calls activate backend functions, active interaction elements are displayed on the screen and allow user interaction. They also incorporate domain model elements in their presentation and allow their manipulation through user input, as defined by the mappings. The context model influences the presentation of those interaction elements that are related to context information. Thus, the execution of the task model triggers a chain reaction, leading to the derivation of an interaction state from the defined user interface model. This state contains a set of concrete interaction elements (as well as the related tasks and abstract interaction elements) that are the basis for the distribution calculation. Based on the distribution rules, the elements are assigned to interaction resources and layout rules are applied for each resource. The delivery of the elements and the final rendering are done by the interaction manager and the communication channels. The described models are created by the application developer based on a set of metamodels that outline the modeling possibilities. Each metamodel can also define adaptation possibilities (construction elements) that can be applied at runtime. These can be used by the designer to create an adaptation model for the application, influencing the final presentation of the user interface based on context information.
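The chain reaction described above can be pictured as a simple mapping registry: activating a task fires the service calls mapped to it and returns the interaction elements that become part of the next interaction state. This is a minimal sketch under assumed names; it is not the actual MASP mapping model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the chain reaction: activating a task fires its mapped service calls
// and activates its mapped interaction elements. All names are illustrative.
class MappingModel {
    private final Map<String, List<Runnable>> serviceCalls = new HashMap<>();
    private final Map<String, List<String>> interactionElements = new HashMap<>();

    void mapServiceCall(String taskId, Runnable call) {
        serviceCalls.computeIfAbsent(taskId, k -> new ArrayList<>()).add(call);
    }

    void mapInteractionElement(String taskId, String elementId) {
        interactionElements.computeIfAbsent(taskId, k -> new ArrayList<>()).add(elementId);
    }

    // Called when the task model activates a task; returns the interaction elements
    // that become part of the next interaction state.
    List<String> taskActivated(String taskId) {
        for (Runnable call : serviceCalls.getOrDefault(taskId, List.of())) {
            call.run();  // trigger the mapped backend function
        }
        return interactionElements.getOrDefault(taskId, List.of());
    }
}
```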
Fig. 3. Screenshot of our implementation of the meta-UI
Based on these models, the MASP provides a comprehensive set of features that can be directly controlled via the Meta-UI shown in Figure 3. Additionally, the features can be incorporated into applications to support application-specific behavior. This includes control over the used modalities, the migration of the user interface by directly choosing interaction resources, the distribution of parts of the user interface, and the (de-)activation of predefined adaptation means. While our underlying models aim at describing the anticipated interaction, the components of the architecture provide the features at runtime. Multimodality is realized by the distribution component,
segmenting the user interface across multiple resources supporting multiple modalities, and the fusion component, matching and interpreting input from different modalities. Both components are strongly related to the underlying user interface model, which allows the configuration of the components. The distribution component takes into account the information about all available interaction resources stored in the context model to calculate the optimal usage of the available interaction resources based on the current status of the interaction maintained by the dialog manager. Using the active input elements from the concrete input model and the active output elements from the concrete presentation model, it calculates a distribution, also taking into account the relations between the elements expressed on the other abstraction levels. Interaction elements related to the same task, for example, are unlikely to be separated. A main goal of the distribution is to support as many different input capabilities as possible and to provide guidance to the user on how to use them. Additionally, the recently used interaction resources and modalities are considered to ensure consistency during the interaction (a simplified sketch of such a heuristic is given below). Once the distribution has been calculated, the assigned user interface elements are delivered via the communication channels, which also perform resource-specific adaptations of the final user interface and handle the communication. Utilizing these channels ensures full control over each used interaction resource by providing the means to manipulate the user interface: in terms of output by adding and removing presentation elements at any time, and in terms of input by altering the input configuration. Input configurations help restrict the possible user interaction in a single modality and thus allow incorporating context and influencing the recognition engines. In contrast to other multimodal approaches, we did not put the main focus on the semantic analysis of arbitrary user input but on the provisioning of a multimodal user interface guiding and restricting the possible interaction. As input configurations are derived from the abstract and concrete interaction models, these models are also used to interpret the received input and derive its meaning. While the abstract interaction model combines input and output and focuses on the modality-independent definition of the interaction goals in terms of commands, choices, free input and output, the related concrete interaction model separates input and output interfaces and aims at the definition of the concrete interaction necessary to complete the abstract interaction goals. After monomodal input has been preprocessed and filtered by the channel and the interaction manager, the fusion engine combines multiple inputs based on information from the concrete interaction model and its configuration in the fusion model. Afterwards this input is evaluated in terms of the goals of the abstract interaction model and then finally brought down to the task and domain level, where it either alters a domain object or completes a task (which could also lead to a service invocation). Each interaction can thus be interpreted on multiple levels of abstraction before finally proceeding to the next interaction step of the dialog. Combining multiple complementary devices enhances the interaction and supports multiple modalities and interaction styles.
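The following sketch illustrates, under simplified assumptions, the kind of distribution heuristic described above: elements belonging to the same task are kept on one interaction resource, and a recently used resource of a matching modality is preferred. The data model is invented for this example and is much simpler than the real distribution and layout models.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Invented, simplified data model for the sketch.
class ConcreteElement {
    final String id;
    final String taskId;    // task the element belongs to
    final String modality;  // e.g. "gui" or "voice"
    ConcreteElement(String id, String taskId, String modality) {
        this.id = id; this.taskId = taskId; this.modality = modality;
    }
}

class Resource {
    final String name;
    final String modality;
    final boolean recentlyUsed;
    Resource(String name, String modality, boolean recentlyUsed) {
        this.name = name; this.modality = modality; this.recentlyUsed = recentlyUsed;
    }
}

class Distributor {
    // Elements of the same task stay on the same resource; recently used resources
    // of a matching modality are preferred, for consistency during the interaction.
    Map<String, String> distribute(List<ConcreteElement> elements, List<Resource> resources) {
        Map<String, String> taskToResource = new HashMap<>();
        Map<String, String> assignment = new HashMap<>();  // element id -> resource name
        for (ConcreteElement e : elements) {
            String resource = taskToResource.computeIfAbsent(e.taskId, t ->
                    resources.stream()
                            .filter(r -> r.modality.equals(e.modality))
                            .sorted((a, b) -> Boolean.compare(b.recentlyUsed, a.recentlyUsed))
                            .map(r -> r.name)
                            .findFirst()
                            .orElse("unassigned"));
            assignment.put(e.id, resource);
        }
        return assignment;
    }
}
```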
A common example of such a device combination is the utilization of a mobile device as a remote control for a large display, distributing an application across devices and modalities. Finally, this whole user interface derivation and input interpretation process can be influenced by an adaptation component. The component is configured by an adaptation model holding adaptation rules and has access to all models. Similar to the evolution model of [18], the adaptation model defines an adaptation as a graph transformation, denoting the node(s) to apply the transformation to (left-hand side) and a
description of the alteration of these nodes (right-hand side). The alteration of the nodes is defined in the form of construction elements that are provided by each metamodel to predefine possible adaptations. Adaptations are triggered by context situations, and thus by states reached by the context model. Due to the availability of design information in the user interface models at runtime, the defined adaptations can address a broad spectrum of model alterations. However, in contrast to other approaches, the possible adaptations for each model are determined by its metamodel; thus only well-defined adaptations can be performed and the integrity of the models is ensured at any time.
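A minimal sketch of such an adaptation rule is given below: a left-hand side selects the model nodes to transform and a right-hand side applies a construction element to them. The node structure and the concrete rule (enlarging button fonts for a low-vision context) are assumptions chosen for illustration only.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Invented node structure for the sketch.
class ModelNode {
    String widgetType;  // e.g. "button"
    int fontSize;
}

// An adaptation rule in the spirit of a graph transformation: the left-hand side
// selects the nodes to transform, the right-hand side applies a construction element.
class AdaptationRule {
    final Predicate<ModelNode> leftHandSide;
    final Consumer<ModelNode> rightHandSide;

    AdaptationRule(Predicate<ModelNode> lhs, Consumer<ModelNode> rhs) {
        this.leftHandSide = lhs;
        this.rightHandSide = rhs;
    }

    void apply(List<ModelNode> model) {
        model.stream().filter(leftHandSide).forEach(rightHandSide);
    }
}

class AdaptationExample {
    public static void main(String[] args) {
        // Hypothetical rule: when the context model reports a low-vision user,
        // enlarge the font of all button elements.
        AdaptationRule enlargeButtons = new AdaptationRule(
                n -> "button".equals(n.widgetType),
                n -> n.fontSize = Math.max(n.fontSize, 24));
        enlargeButtons.apply(List.of());  // would be applied to the concrete model at runtime
    }
}
```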
6 Summary and Outlook
In this work we presented a framework for the creation of smart home user interfaces supporting context-sensitive, multimodal interaction. The framework aims at providing application-spanning functionality to ease the development of future smart home applications. User interface adaptation, migration, distribution and multimodal interaction have been the focus of the development of the framework. Based on the developed framework we created the Multi-Access Service Platform (MASP, masp.dai-labor.de), a reference implementation using executable user interface models to derive distributed multimodal user interfaces. The MASP has been successfully used in the Service Centric Home project and different multimodal applications have been built with it. These include a cooking assistant, an energy assistant and a meta user interface that allows controlling the application-spanning features (adaptation, migration, distribution and multimodality). We are currently creating a health monitor example application, providing ubiquitous access to the user's health data, based on a portable sensor platform and MASP-supported interaction. In our future work we aim at broadening our approach to other surroundings like offices, public spaces, cars, or other mobile situations. This also leads to the concept of personal interactive spaces, virtually surrounding a user continuously. Additionally, the improvement of the input processing capabilities and the multimodal interaction is a major focus, as well as the improvement of existing and the development of additional adaptation means. Based on our Ambient Assisted Living Testbed we also want to conduct further user studies and put a stronger focus on elderly and disabled users as main target groups.
References
1. Abascal, J., Fernández de Castro, I., Lafuente, A.L., Cia, J.M.: Adaptive interfaces for supportive ambient intelligence environments. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 30–37. Springer, Heidelberg (2008)
2. Becker, M.: Software architecture trends and promising technology for ambient assisted living systems. In: Assisted Living Systems - Models, Architectures and Engineering Approaches. Dagstuhl Seminar Proceedings, vol. 07462 (2008)
3. Blumendorf, M., Lehmann, G., Feuerstack, S., Albayrak, S.: Executable models for human-computer interaction. In: Interactive Systems - Design, Springer, Heidelberg (2008)
4. Calvary, G., Coutaz, J., Thevenin, D., Limbourg, Q., Souchon, N., Bouillon, L., Florins, M., Vanderdonckt, J.: Plasticity of user interfaces: A revised reference framework. In: Proceedings of the First International Workshop on Task Models and Diagrams for User Interface Design, pp. 127–134. INFOREC Publishing House (2002)
5. Clerckx, T., Vandervelpen, C., Luyten, K., Coninx, K.: A task-driven user interface architecture for ambient intelligent environments. In: Proceedings of the 11th international conference on Intelligent user interfaces, pp. 309–311. ACM Press, New York (2006)
6. Coninx, K., Luyten, K., Vandervelpen, C., Van den Bergh, J., Creemers, B.: Dygimes: Dynamically generating interfaces for mobile computing devices and embedded systems. In: Chittaro, L. (ed.) Mobile HCI 2003. LNCS, vol. 2795, pp. 256–270. Springer, Heidelberg (2003)
7. Delgado, R., Araki, M.: Spoken, Multilingual and Multimodal Dialogue Systems. John Wiley & Sons, Ltd., Chichester (2006)
8. Demeure, A., Calvary, G., Coutaz, J., Vanderdonckt, J.: The comets inspector: Towards run time plasticity control based on a semantic network. In: Proceedings of the 5th annual conference on Task models and diagrams (2006)
9. Duarte, C., Carriço, L.: A conceptual framework for developing adaptive multimodal applications. In: Proceedings of the 11th international conference on Intelligent user interfaces, pp. 132–139. ACM Press, New York (2006)
10. Emiliani, P.-L., Stephanidis, C.: Universal access to ambient intelligence environments: opportunities and challenges for people with disabilities. IBM Syst. J. 44(3), 605–619 (2005)
11. Kirste, T., Rapp, S.: Architecture for multimodal interactive assistant systems. In: Statustagung der Leitprojekte “Mensch-Technik-Interaktion” (2001)
12. Emiliani, P.-L., Burzagli, L., Gabbanini, F.: Ambient intelligence and multimodality. In: Stephanidis, C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 33–42. Springer, Heidelberg (2007)
13. Martin, J.-C.: Tycoon: Theoretical framework and software tools for multimodal interfaces. In: Intelligence and Multimodality in Multimedia interfaces. AAAI Press, Menlo Park (1998)
14. Mühlhäuser, M.: Multimodal interaction for ambient assisted living (aal). In: Assisted Living Systems - Models, Architectures and Engineering Approaches. Dagstuhl Seminar Proceedings, vol. 07462 (2007)
15. Nehmer, J., Becker, M., Karshmer, A., Lamm, R.: Living assistance systems: an ambient intelligence approach. In: Proceedings of the 28th international conference on Software engineering, pp. 43–50. ACM, New York (2006)
16. Reeves, L.-M., Lai, J., Larson, J.-A., Oviatt, S., Balaji, T.-S., Buisine, S., Collings, P., Cohen, P., Kraal, B., Martin, J.-C., McTear, M., Raman, T.-V., Stanney, K.-M., Su, H., Wang, Q.-Y.: Guidelines for multimodal user interface design. Commun. ACM 47(1), 57–59 (2004)
17. Reithinger, N., Alexandersson, J., Becker, T., Blocher, A., Engel, R., Löckelt, M., Müller, J., Pfleger, N., Poller, P., Streit, M., Tschernomas, V.: Smartkom: adaptive and flexible multimodal access to multiple applications. In: Proceedings of the 5th international conference on Multimodal interfaces, pp. 101–108. ACM Press, New York (2003)
18. Sottet, J.-S., Ganneau, V., Calvary, G., Coutaz, J., Demeure, A., Favre, J.-M., Demumieux, R.: Model-driven adaptation for plastic user interfaces. In: Baranauskas, C., Palanque, P., Abascal, J., Barbosa, S.D.J. (eds.) INTERACT 2007. LNCS, vol. 4662, pp. 397–410. Springer, Heidelberg (2007)
19. Weber, W., Rabaey, J.-M., Aarts, E. (eds.): Ambient Intelligence. Springer, Heidelberg (2005)
Workflow Mining Application to Ambient Intelligence Behavior Modeling
Carlos Fernández 1, Juan Pablo Lázaro 1, and Jose Miguel Benedí 2
1 Research Group of Technologies for Health and Wellbeing (TSB), ITACA Institute, Polytechnic University of Valencia, Spain
{cfllatas,jplazaro}@itaca.upv.es
2 Department of Information Systems and Computation (DSIC), Polytechnic University of Valencia, Spain
[email protected]
Abstract. Manual human behavior modeling requires too many human resources over too long a period of time. In addition, the final result probably does not reflect the current status of the person, due to the influence of time. The use of Workflow Mining techniques to infer human behavior models from past executions of actions can be a solution to this problem. In this paper, a Human Behavior modeling methodology based on Workflow Mining techniques is proposed.
1 Introduction
A workflow is a formal representation of a certain process that allows it to be managed by a computer-based system. Humans benefit from this formal representation thanks to the use of authoring tools. Using these tools, a non-expert in computer science can create complex processes and enable them to be managed automatically by a system, without the need to write a single line of source code. The wider purpose of an Ambient Intelligence system is to provide a certain user with a number of services in a proactive way according to his/her current context, to hide the great complexity of the technology, and to provide these services using a communication paradigm that is as easy as possible to understand and use. This is not a trivial problem to solve from the technological point of view, because it requires an important concept to be well implemented in the ambient intelligence system: a good model of the user behaviour and user context. The implementation of an individualized model of the behaviour of a single user is a very complex task. Implementing it requires the participation of experts in the field of knowledge of the type of behaviour to be modelled. Then, after observing the behaviour patterns of the user over a sufficient period of time (months or even years), the experts in elderly care are able to define the behaviour model of that user and convert it into a formal workflow for automatic processing. This methodology has two important disadvantages: firstly, it requires too many human resources over too long a time, and secondly, the final result probably does not reflect the current status of the person due to the influence of time.
Using Pattern Recognition techniques enables us to infer the workflows from prior examples. This methodology is commonly known as Workflow Mining. The use of Workflow Mining in the human behaviour modelling field does not only require inference algorithms; it also requires these algorithms to represent the workflows in a simple, structured way, yet with the great expressivity needed in real life. An adequately high level of expressivity is considered essential because the use of complex systems for representing workflows makes the interpretation of the results of the inference algorithms difficult. Furthermore, if the expressivity is not sufficient, the real problem might not be well represented by the system. This is the reason why the use of finite state models, and in particular the Timed Parallel Automaton (TPA) [10], is probably the best option to define Workflows that have high expressiveness requirements and need to be simple enough to be interpreted by software systems. Pure workflow mining techniques found in the literature are based on the use of mediators for the creation of general-purpose workflow models. These techniques can be applied to infer not only fundamental behaviour models of the user, but also individual models. Therefore, a strategy is proposed that is based on obtaining individualized models that overtrain on the user's parameters and learn the specific behaviour of that person. By using this trained model it is possible to know, at any given time, whether the behavioural pattern of that user is normal or not. The system could detect changes in individualized behaviour patterns by means of the analysis of the historical behaviour of the user. In this paper, this methodology is presented as an alternative way to discover human behaviour patterns. This technological solution is used to model individualized user behaviour using automatic learning methods, like Workflow Mining techniques, that allow inferring individual behaviour patterns from the actions performed by the user. The methodology is currently being applied in the PERSONA [20] European project, where this technology is used to detect abnormal behavioural patterns in elderly people living at home. The behavioural information is acquired from a massive distribution of sensors and detectors, and the system is implemented on a computer with limited resources in terms of memory and processing capacity. This fact makes the simplicity of the workflows inferred by the proposed technology very important.
1.1 Ambient Intelligence and Behavior Modeling
The Ambient Intelligence (AmI) model is focused on the creation of physical spaces where the technology is designed to serve the people. The current concept of AmI is the result of the evolution of the ubiquitous computing of Mark Weiser [19] and the vision of ISTAG (Information Society Technology Advisory Group). In the AmI concept, the services are intended to continuously empower the user in terms of time and space. Those services must be as invisible as possible to the users' senses and psychology and be as unintrusive as possible. Therefore, the interfaces between the user and the services must be natural and cannot be an obstacle to the interaction between the user and the environment. Within the AmI model there is an essential concept known as Context. The context is the projection of the user in the system. This projection supposes the mapping of user data into
repositories. In this way, the higher the quality of the data, the higher the quality of the services the system can offer. This user information is gathered through sensor systems. The information produced by those sensors is usually processed by intelligent algorithms based on artificial intelligence and pattern recognition techniques. Those algorithms work as ‘software sensors’ and provide a higher and richer level of information, while the ‘hardware sensors’ provide the basic raw information about the user. For example, hardware sensors can offer information like heart rate or number of steps, while ‘software sensors’ can offer information like mood or activity level; a toy illustration of this idea is sketched at the end of this subsection. This high-level information is crucial to know the status of the user. Such data, however, only offer a view of the static status of the user. Human behavior, by contrast, is inherently dynamic: it is the collection of behaviors exhibited by human beings and influenced by the environment. As the environment is continuously changing and the user gains experience, human behavior is continuously evolving. For that reason, static data are not enough to offer a holistic view of the evolution of human behavior. Therefore, we need to study the historical information about the actions of the user and detect anomalous behavioral patterns in order to improve the diagnosis of the status of the user. For example, some symptoms of dementia are based on anomalous behavioral patterns [18], like excessive flirtatiousness, social withdrawal, or agitation. In addition, the wide variability among humans makes the creation of a general model that explains the behavioral patterns of different kinds of people very difficult. Behavioral patterns are different in each human being, and the same aspects can provoke different reactions depending on the person. For example, a social withdrawal model is very different for a very shy person than for a very self-confident person. Consequently, the definition of models that allow us to know the behavioral status of the patient and his evolution has to be done individually.
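As announced above, a toy illustration of the ‘software sensor’ idea: a derived sensor that classifies the user's activity level from raw step counts delivered by a ‘hardware’ sensor. The interface and the thresholds are assumptions made for this sketch.

```java
// "Hardware" sensor delivering raw data (interface assumed for this sketch).
interface StepCounter {
    int stepsInLastHour();
}

// "Software" sensor deriving a higher-level value from the raw data.
class ActivityLevelSensor {
    private final StepCounter steps;

    ActivityLevelSensor(StepCounter steps) { this.steps = steps; }

    String activityLevel() {
        int s = steps.stepsInLastHour();
        if (s > 3000) return "high";      // thresholds are invented for illustration
        if (s > 500)  return "moderate";
        return "low";
    }
}
```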
1.2 Workflow Technology
Human behavior has a wide variability, with interdependence among the processes involved. This requires models with a high capacity of representation to allow the processes involved to be described. The use of natural language allows professionals to represent the processes with the needed expressivity. Nevertheless, the use of non-formal languages to represent human behavior adds undesirable ambiguity to the specification of these kinds of processes. The use of formal languages for representing human behavior, on the other hand, allows us to take advantage of the large number of formal frameworks available in the literature in order to automate, represent and learn these kinds of processes. The processes involved in human behavior at one specific moment are usually based on the previous process and affect the future actions performed by the human being. In the literature there is a research field that fits this syntactical point of view, known as Workflow Technology [1]. Workflow Technology is intended to provide a framework to represent, automate and mine processes in order to define them in a standardized way. The main object of study of this framework is the Workflow. According to the Workflow Management Coalition definition, a Workflow is the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules [2]. In spite of the help that the
use of Workflows provides for representing human behavior, the design of these kinds of systems still consumes too many resources. On the one hand, the large number of variables of an individual to be studied by experts may make the study very long. On the other hand, the behavior of the individual is continuously changing. So, if the study of the individual's behavior requires too much time, the conclusions of the study will probably arrive too late, the actual status of the individual will no longer be in accordance with the detected status, and the system becomes useless. Therefore, it is crucial to create algorithms and tools that provide up-to-date views of the behavior of the individual in order to allow the experts to detect anomalies as soon as possible. The daily actions of the user are gathered by the context of the Ambient Intelligence system. Those actions can be used to mine the current status of the individual and to provide a graphical view of the current behavior of the user, which can be a great help in knowing that status. In the literature, there is a research framework based on the use of action logs to infer Workflows which explain the flow followed by the processes. This research framework is known as Workflow Mining or Process Mining [3]. Most of the works in the literature are based on the use of transactional logs as samples for workflow inference [6, 8, 9]. These models use logs from general workflow management systems as input samples to infer models that explain the whole system. There are two approaches to Workflow Mining methodologies, based on the kind of data gathered. One of these methodologies is the Event-based approach. The Event-based Workflow Mining approach [6] learns workflows using the available information in transactional logs as input samples. The algorithms that are based on event-based data take into account the starting information of processes (i.e., the action name and the starting time) but do not consider the results of the actions. Nevertheless, the behavior modeling problem needs this kind of data to specify the process flow. When an action is performed by the user, it is probably related to the results of the previous actions, and the result of current actions affects the future actions that he or she will perform. For example, the alarm procedure must be triggered when the detection action finds a fire or a gas leak. In this case, the event-based approach does not take into account the result of the detection action and, thus, it is not possible to infer this kind of process. The second approach is Activity-based Workflow Mining [10]. This model takes into account the results of the actions, making possible the inference of behavior modeling processes, which can be based on the results of previous actions. This approach allows inferring flows like the fire example, because it can represent that the alarm sequence is triggered when the fire detection action returns true. In Pattern Recognition theory [12], it is usual to select a representation framework bounded by theoretical properties that facilitate the inference. Most syntactical pattern recognition works are based on the regular languages framework [13, 14, 15]. Commercial Workflow representation languages are too complex to be covered by this framework, making the inference more difficult. In addition, the field of human behavior is very demanding and requires very expressive representation languages to explain the complexity of real processes, including parallel patterns and time.
In the literature there is a very expressive formal framework able to describe workflows, called the TPA [10]. This model has theoretically proven its expressivity by covering the control-flow Workflow Patterns [5], which are the standard way to measure expressivity in workflow theory. In addition, this model has been used to formally describe Life-Style Activity Protocols [16] that represent the human behavior of the user in a
specific context. The most important feature of this model is its complexity: the model is equivalent to the regular language framework, which means that the powerful inference tools available for that framework can be used. In the literature there is an Activity-based Workflow Mining algorithm called PALIA [11], able to infer TPAs from information about executed actions. This algorithm can be applied to infer the flow, with the labeled transitions representing the results of the actions. In this paper, we use the Activity-based approach to infer individualized Workflow models of human behavior in order to facilitate the detection of anomalous behavior of people in their own context.
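The following is not the PALIA algorithm itself, but a naive counting sketch that conveys the activity-based idea: transitions between actions are labelled with the result of the origin action, and node and arc frequencies of the kind shown later in Figure 2 are accumulated from the log traces. The data layout is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransitionCounter {
    record Step(String action, String result) {}  // minimal stand-in for a log entry

    final Map<String, Integer> nodeCounts = new HashMap<>();
    // key: "origin --result--> destination", value: number of occurrences
    final Map<String, Integer> arcCounts = new HashMap<>();

    void addTrace(List<Step> trace) {
        for (int i = 0; i < trace.size(); i++) {
            nodeCounts.merge(trace.get(i).action(), 1, Integer::sum);
            if (i + 1 < trace.size()) {
                String arc = trace.get(i).action() + " --" + trace.get(i).result()
                        + "--> " + trace.get(i + 1).action();
                arcCounts.merge(arc, 1, Integer::sum);
            }
        }
    }
}
```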
2 Results
In this paper, a new methodology for using Workflow Mining systems to allow experts to infer Workflows that explain individualized human behavior is presented. Figure 1 presents a general schema of the methodology. The data gathered by the sensors are stored in the AmI context repository. These data are the actions performed by the user and their results, ordered in time. These data are used by the Workflow Mining algorithm to infer Workflow models readable by the expert, who is then able to detect changes in the behavior of the user. When the user under analysis is introduced into the AmI environment, the system starts to store all the information available in the AmI context in order to create a projection of the user in the system. Using this information, it is possible to create a Workflow Mining corpus by selecting the most relevant data associated with the behavior of the user. In this way, the higher the quality of the user data, the clearer the workflow we obtain. The historical data needed are the date and time of the start and end events of each action performed, and the result of the action (e.g. LookAgenda, TakeHeartRate, etc.). Using these data, a workflow that represents the behaviour of a user in a certain environment is inferred by means of workflow mining techniques. The first Workflow inferred represents the basic behaviour that will be compared with
Fig. 1. Workflow Mining process Methodology
future ones. The comparison with newly inferred workflows defines the current status of the user, indicating whether the user behaviour is compliant with the previous model or, on the contrary, whether the user is following an abnormal (different) behaviour. The abnormal behaviour of the user may have two reasons. The first is that the user is having a problem that interferes with his/her normal flow of life; the second is that the user has evolved and therefore changed the normal behavioural pattern. The latter will cause a new iteration starting again from the first phase. The expert user is the mediator in charge of deciding whether the changes in the behaviour of the user are due to a normal evolution in the user's life or, on the contrary, due to a problem. In this way, experts are able to detect dementia, depression and other problems in early stages. In order to test this methodology, a prototype experiment was carried out using Workflow Mining technology. Due to the lack of available activity-based corpora in the human behaviour modelling research field, a laboratory experiment was performed. In the experiment, a Workflow Simulator [17] was used to create an activity-based corpus that could be used for the test; a sketch of what a single entry of such a corpus might look like is given below. In addition, a modified version of the PALIA [11] algorithm, specifically designed to count the number of accessed arcs and states, was implemented. Using the Workflow simulator, a patient life trace was simulated. On the one hand, 90 days of the normal life of a specific patient were expressed in a log. The simulated patient was an old widow who lives alone in an AmI environment. Using those 90 samples, the PALIA algorithm was used to infer a Workflow which represents the usual actions performed by the user. On the other hand, another simulation was created by modifying the standard way of life of the patient, adding some dementia-indicating behaviour patterns, such as social withdrawal and memory errors.
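As referenced above, a single entry of such an activity-based corpus could be represented roughly as follows; the field layout is an assumption made for illustration, not the format actually used in the experiment.

```java
import java.time.LocalDateTime;

// One corpus entry: unlike a purely event-based record, it keeps the result of the
// action, so the following transition can depend on it. Field layout is assumed.
class ActivityRecord {
    final String action;         // e.g. "LookAgenda", "TakeHeartRate"
    final LocalDateTime start;   // start event of the action
    final LocalDateTime end;     // end event of the action
    final String result;         // e.g. "done", "cancelled", "true"

    ActivityRecord(String action, LocalDateTime start, LocalDateTime end, String result) {
        this.action = action;
        this.start = start;
        this.end = end;
        this.result = result;
    }
}
```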
Fig. 2. Example of individual Workflow Mining use for dementia detection
A Workflow was also inferred from this second corpus using the PALIA algorithm. Both inferred Workflows are presented in Figure 2. The workflow on the left is the first workflow. In this Workflow, each state represents an action; the triangle represents the start action and the double circle represents the final action. The remaining circles represent intermediate actions. The arrows between the circles represent the transitions between two actions depending on the results of the actions, which are represented by the labels. The number in a node is the number of executions that reached that specific node. The number on an arrow is the number of executions that performed the destination action after the origin action when the result of the origin action was the one indicated on the label of the arrow. The Workflow on the right is the second Workflow and uses the same notation as the first one. As can be seen in the figure, the simulated workflows show interesting differences (a minimal sketch of how such differences could be flagged automatically is given after the list):
• In the first Workflow, the actions performed before lunch always depend on the AmI agenda and the user appears strict about the actions performed. In this Workflow, the user goes out to play with friends and cooks with the family most of the time. In the second workflow, by contrast, the user stays at home watching TV and avoids dates with friends and family most of the time.
• In the first workflow, the user always has lunch at a restaurant when he plays with friends. In the second workflow, the user has lunch at home most of the time when he is out playing with friends.
• On two occasions in the second Workflow, the user has to go out for lunch after cooking for the family. This can be a symptom of cooking errors that did not occur in the first Workflow.
• In the first Workflow, the user sometimes does not have dinner at home, probably because he goes out with friends. In the second Workflow, however, the user always has dinner at home.
As can be seen in the Workflows, it is very easy to find these symptoms, which clearly show that the user is progressively reducing his social life and probably has some memory problems (forgetting dates and cooking recipes). As expected, those symptoms point to possible dementia problems.
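The sketch below illustrates one simple way such differences could be flagged automatically: the relative frequencies of result-labelled arcs in the baseline and the current workflow are compared, and arcs whose frequency deviates by more than a threshold are reported to the expert. The threshold and the map-based representation are assumptions made for this illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WorkflowComparator {
    // Returns the arcs whose relative frequency changed by more than the threshold.
    static Set<String> deviations(Map<String, Integer> baseline,
                                  Map<String, Integer> current, double threshold) {
        double baseTotal = Math.max(1, baseline.values().stream().mapToInt(Integer::intValue).sum());
        double currTotal = Math.max(1, current.values().stream().mapToInt(Integer::intValue).sum());
        Set<String> arcs = new HashSet<>(baseline.keySet());
        arcs.addAll(current.keySet());
        Set<String> flagged = new HashSet<>();
        for (String arc : arcs) {
            double before = baseline.getOrDefault(arc, 0) / baseTotal;
            double after = current.getOrDefault(arc, 0) / currTotal;
            if (Math.abs(after - before) > threshold) {
                flagged.add(arc);  // e.g. "CookForFamily --done--> GoOutForLunch"
            }
        }
        return flagged;
    }
}
```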
3 Conclusions and Future Work
The use of Workflow technology makes a wide spectrum of tools available to represent, execute and mine human behavior models. Nevertheless, the representation of human behavior is a hard task. Due to the wide variability of human beings, the use of generalized human behavior models is not enough to classify the behavior of users. The use of Workflow Mining techniques to model human behavior helps experts to detect anomalous behavior of users in an individualized way, producing a picture of the user's behavior during a specific period of time. The capability to take a snapshot of the behavior of the user allows comparing the evolution of the human being with snapshots taken in previous steps. In the laboratory experiment, we can see that it is possible to use Workflow Mining techniques to help experts detect anomalous behavior of the user in order to identify some problems in early stages.
Once this algorithm has been successfully tested under laboratory conditions, the next step is to use it in a real environment and test it on real problems. The algorithm is currently being installed in the PERSONA project platform and it is planned to be used in the second phase of the project (mid 2009) to provide high-level knowledge to the Ambient Intelligence context.
References
1. Fischer, L. (ed.): Workflow Handbook 2001, Workflow Management Coalition. Future Strategies, Lighthouse Point, Florida (2001)
2. WfMC: Workflow Management Coalition Terminology Glossary. WFMC-TC-1011, Document Status Issue 3.0 (1999)
3. van der Aalst, W.M.P., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters, A.J.M.M.: Workflow mining: A survey of issues and approaches. Data and Knowledge Engineering 47, 237–267 (2003)
4. van der Aalst, W., et al.: Workflow Patterns. Distributed and Parallel Databases, p. 70 (2003)
5. van der Aalst, W., et al.: Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16, 1128–1142 (2004)
6. de Medeiros, A.K.A., Weijters, A.J.M.M.T., van der Aalst, W.M.P.: Genetic process mining: A basic approach and its challenges. In: Bussler, C.J., Haller, A. (eds.) BPM 2005. LNCS, vol. 3812, pp. 203–215. Springer, Heidelberg (2006)
7. Fernandez, C., Benedi, J.M.: Activity-Based Workflow Mining: A Theoretical Framework. In: Workshop on Technologies for Healthcare and Healthy Lifestyle (2006) ISBN: 978-84611-1311-8
8. Fernandez, C., Benedí, J.M.: Timed parallel automaton learning in workflow mining problems. In: Ciencia y Tecnología en la Frontera (2008) ISSN: 1665-9775
9. Schalkoff, R.J.: Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons, Inc., Chichester (1991)
10. Oncina, J., Garcia, P., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 448–458 (1993)
11. Oncina, J., García, P.: Inferring regular languages in polynomial update time. In: Pattern Recognition and Image Analysis 1991. World Scientific Publishing, Singapore (1991)
12. Coste, F.: State merging inference of infinite state classifiers. Rapport technique n INRIA/RR-3695, IRISA (September 1999)
13. Dominguez, D., Fernandez, C., Meneu, T., Mocholi, J.B., Serafin, R.: Medical guidelines for the patient: Introducing the life assistance protocols. In: Computer-based Medical Guidelines and Protocols: A Primer and Current Trends, vol. 139, p. 282. IOS Press, Amsterdam (2008)
14. Fernandez, C., Sanchez, C., Traver, V., Benedí, J.M.: TPAEngine: Un motor de workflows basado en TPA. In: Ciencia y Tecnología en la Frontera (2008) ISSN: 1665-9775
15. Santacruz, K., Swagerty, D.: Early Diagnosis of Dementia. American Family Physician 63(4) (February 2001)
16. Weiser, M.: Some Computer Science Issues in Ubiquitous Computing. CACM 36(7), 74–84 (1993)
17. PERSONA Consortium: PERSONA project: PERceptive Spaces promOting iNdependent Aging, http://www.aal-persona.org/
Middleware for Ambient Intelligence Environments: Reviewing Requirements and Communication Technologies
Yannis Georgalis 1, Dimitris Grammenos 1, and Constantine Stephanidis 1,2
1 Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), GR-70013 Heraklion, Crete, Greece
2 Computer Science Department, University of Crete, Greece
{jgeorgal,gramenos,cs}@ics.forth.gr
Abstract. Ambient Intelligence is an emerging research field that aims to make many of the everyday activities of people easier and more efficient. This new paradigm gives rise to opportunities for novel, more efficient interactions with computing systems. At a technical level, the vision of Ambient Intelligence is realized by the seamless confluence of diverse computing platforms. In this context, a software framework (middleware) is essential to enable heterogeneous computing systems to interoperate. In this paper we first consider the basic requirements of a middleware that can effectively support the construction of Ambient Intelligence environments. Subsequently, we present a brief survey of existing, general-purpose middleware systems and evaluate them in terms of their suitability for serving as the low-level communication platform of an Ambient Intelligence middleware. Finally, we argue that an object-oriented middleware such as the Common Object Request Broker Architecture (CORBA) is best suited as a basis for a middleware for Ambient Intelligence environments.
1 Introduction
1.1 Ambient Intelligence
The term Ambient Intelligence (AmI) describes environments that enclose a plethora of diverse computing systems, embedded in, and indistinguishable from, the environment in which they operate [1]. An AmI infrastructure aims to support users in carrying out their everyday life activities by offering them an easy and natural way of interacting with the digital services that are provided by the hidden, interconnected computing systems. In this respect, AmI environments provide the means to sense and interpret the actions of their users in order to offer a personalized, context-sensitive and efficient interaction platform that serves their needs.
1.2 Distributed Services
In an environment where interactions are realized by the confluence of different interconnected computing systems, the organization of the overall system architecture into a well-defined set of distributed software entities is crucial. The alternative centralized
approach where all software entities run on a monolithic computing platform, despite being the easiest to implement, is neither scalable nor flexible. On the contrary, a distributed approach allows for (a) flexible, dynamic extension of the overall system with novel functionality, (b) enhanced system scalability, by sharing computation demands among different computers, (c) enhanced robustness, by isolating potential failures of individual software entities, and (d) unambiguous and straightforward modularization of the system’s architecture. Therefore, we consider an AmI infrastructure as a collection of interconnected distributed services; i.e., a collection of software entities that run on different machines, and are able to communicate with each other in order to provide to the infrastructure all the required functionality for sensing, drawing conclusions, and responding to the needs of its users.
2 Basic Requirements for an Ambient Intelligence Middleware
2.1 Role of the Middleware
Despite being an overloaded term [2], middleware is a commonly used word in the context of distributed computing systems. In general, middleware is a set of programming libraries and programs (services) that constitute an indivisible platform which offers a comprehensible abstraction over the complexities and potential heterogeneity of the target problem domain. Different communication middleware platforms support different programming and communication models. Three of the most popular paradigms are object-based middleware, event-based middleware and message-oriented middleware (MOM). In object-based middleware platforms, applications are structured into distributed objects that interact via location-transparent method invocation. Those platforms typically utilize the “request/response” communication style. On the other hand, event- and message-based systems mainly employ single-shot message exchange. Event-based middleware is particularly suited to the construction of non-centralized distributed applications that ultimately monitor and react to changes in their environment. In an inherently distributed environment such as AmI, the communication middleware should abstract over the intricacies of the underlying communication technologies, machine architectures and operating systems. Moreover, it should hide the distribution of the different parts that comprise the system and enable programs written in different programming languages to communicate seamlessly. Higher-level core functionality of an AmI infrastructure, e.g. context awareness, authentication, etc., basically comprises a set of appropriate services that depend on the middleware.
2.2 AmI Middleware Basic Requirements
Enabling programs written in different programming languages to interoperate seamlessly is a key design goal in most middleware platforms. This is especially true in an AmI environment, where the basic system is built using diverse technologies from diverse research fields that traditionally utilize different programming languages.
Synchronous communication is definitely essential for interacting with different AmI services. Using synchronous calls, a service or an application can give commands to other services and query their internal state. However, the synchronous paradigm alone is inadequate for modeling all the interactions that can happen in an AmI environment. For this reason, asynchronous, event-based communication support is also needed in order to enable AmI services to notify interested parties about changes in their internal state, or to communicate the occurrence of an external (expected or unexpected) event. Event-based communication is also traditionally regarded as a mechanism for constructing loosely coupled software components that need only know the format of the exchanged events, without requiring any knowledge of the internal structure, implementation, or semantics of the entity that produced the event. Consequently, we consider both synchronous and asynchronous communication styles essential for an AmI middleware platform.
Arguably, one of the most important and extensively researched properties of distributed systems is fault tolerance. Fault tolerance, in this context, refers to the property that enables an AmI infrastructure to continue to function properly even in the event of failures. Clearly, the failure of an AmI service that serves a specific function within an infrastructure should not, in any case, affect the other services that do not depend on or use the failing service. On the other hand, services and applications that depend on a failing service should handle its failure gracefully by continuing to function with (potentially) reduced functionality. Ensuring that a failing service will not affect independent services is definitely the job of the middleware. However, the graceful handling of the failure of a dependent service cannot entirely be handled at the middleware level; the fallback techniques ultimately depend on the semantics of the high-level task at hand. Therefore, we regard fault tolerance, in the context of an AmI communication middleware, as the set of functionality that (a) isolates failures, (b) eliminates single points of failure within the core middleware infrastructure, (c) is able to restart failing services before the clients that use those services are affected, and (d) provides mechanisms for notifying higher-level entities about the irreparable failure of a specific service.
Another important requirement of an AmI middleware is security. Of course, to be effective, security should be considered throughout all the layers of an AmI infrastructure. In this context, however, we view security as the ability of the middleware to prevent malicious code from eavesdropping on and exploiting the data exchanged through the network channels that enable services to communicate with each other.
Orthogonally to the aforementioned requirements, a key property of the middleware that we consider absolutely essential is that it should be easy for developers to program with. Developing AmI components should feel natural to programmers in all supported languages, who should be able to use distributed services as if they were local objects or functions of the source language. Going one step further, the middleware should limit, or even eliminate entirely, the “boilerplate” (i.e. extraneous) code needed to construct and use distributed services.
An overview of the aforementioned requirements for an AmI middleware is presented in Table 1.
Table 1. Basic requirements of an AmI middleware
Heterogeneity: Support for multiple languages and computing platforms
Communication: Synchronous and asynchronous (event-based) communication
Resilience: Replication; isolation and graceful handling of failures
Security: Secure communication among distributed services
Ease of use: Natural and intuitive usage for each target language
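To make the communication requirement in Table 1 concrete, the following minimal Python sketch (our own illustration, not part of any middleware discussed in this paper) contrasts the two styles: a blocking request/response interaction for commands and state queries, and an event-based publish/subscribe channel through which a service notifies loosely coupled listeners without knowing who they are.

```python
class EventBus:
    """Minimal publish/subscribe channel: producers and consumers stay decoupled."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        for callback in self._subscribers.get(topic, []):
            callback(event)


class LightService:
    """A toy 'AmI service' exposing synchronous queries and commands."""
    def __init__(self, bus):
        self._on = False
        self._bus = bus

    def switch(self, on):                 # synchronous command (request/response)
        self._on = on
        self._bus.publish("light/changed", {"on": on})   # asynchronous notification

    def is_on(self):                      # synchronous query of internal state
        return self._on


bus = EventBus()
bus.subscribe("light/changed", lambda e: print("context service notified:", e))
light = LightService(bus)
light.switch(True)                        # request/response plus a pushed event
print("queried state:", light.is_on())
```

In a real middleware both paths would cross process and machine boundaries, but the division of labour is the same: commands and queries travel over the synchronous path, while state changes and external events are pushed over the asynchronous one.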
3 Communication Technologies
Implementing a middleware that is able to satisfy all the aforementioned requirements requires substantial development effort. Fortunately, many existing communication technologies can be re-used towards this goal. In the following subsections, we present the primary communication technologies that we have evaluated with respect to their appropriateness as the basis for a communication platform for an AmI middleware.
3.1 Common Object Request Broker Architecture (CORBA)
The Common Object Request Broker Architecture (CORBA) [3] is a standard defined by the Object Management Group (OMG) [4] that provides a stable model for distributed object-oriented systems. CORBA enables software components written in multiple programming languages and running on multiple operating systems to work seamlessly together. The abstractions provided by CORBA Object Request Brokers (ORBs) allow for the creation and usage of distributed objects that look like typical local objects of the target programming language. The CORBA standard also defines a plethora of standard services that can be used to make the development of distributed systems easier and more robust. There are many advantages in using CORBA as a base for constructing a middleware for AmI environments. First of all, CORBA separates the definition of interfaces from their actual implementation using an interface definition language (IDL). The standard specifies a “mapping” from IDL definitions to specific programming constructs of the target implementation language. This mapping process enables the type-safe invocation of the methods offered by distributed services, simplifies their implementation, and provides a comprehensive formal reference of the Application Programming Interface (API) that a specific service supports. CORBA is primarily designed for blocking, synchronous request/response communication. However, using the standard Notification Service [5], CORBA-conformant applications can use publish/subscribe channels to effectively emulate asynchronous communication. Additionally, support for one-way method invocations allows callers to continue execution without waiting for any response from the server. Fault tolerance in CORBA is not specifically addressed in earlier revisions of the standard. However, it allows for a high degree of fault tolerance by: (a) having clear invocation failure models (at-most-once and best-effort delivery); (b) allowing clients to obtain persistent references to services through the standard Implementation Repository service [5]; (c) enabling object references to include multiple endpoints. Additionally, CORBA allows client code to register functions to intercept exchanged
messages, enabling the creation of more advanced fault tolerance methods, depending on the target problem domain. Concerning the “ease of use” requirement, CORBA is arguably difficult to use. Nonetheless, having a versatile architecture, it allows for the construction of higher-level communication platforms on top of it. Although CORBA has many obscure, esoteric and obsolete features, a higher-level platform for AmI environments can use a well-defined subset of it and abstract away its “idiosyncrasies”. Furthermore, there are many high-quality open-source CORBA implementations for many different programming languages (e.g. [6], [7], [8], [9]). All these implementations have liberal licenses allowing client applications to be distributed on their authors’ own terms. Table 2 summarizes the key features of CORBA against the five requirements set in section 2.2.
Table 2. Summary of CORBA features
Heterogeneity: Supports multiple programming languages and computing platforms
Communication: Synchronous request/response and asynchronous communication through the standard Notification Service
Resilience: Comprehensive invocation failure semantics, mechanisms for supporting transparent replication
Security: Ability to use encrypted communication channels
Ease of use: Natural and intuitive usage for each target language
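As a rough illustration of what the “natural and intuitive usage” row means in practice, the sketch below shows how a hypothetical AmI service could be invoked from Python through a CORBA ORB such as omniORB [9]. The interface AmI::Light, its operations, and the object address are placeholders of our own, and the snippet assumes that the IDL has already been compiled into Python stubs; it is not code from the paper.

```python
# Hypothetical IDL, compiled beforehand with an IDL compiler (e.g. omniidl):
#   module AmI { interface Light { void switch_on(); boolean is_on(); }; };
import sys
from omniORB import CORBA      # omniORBpy; any ORB with a Python mapping would do
import AmI                     # stub module generated from the IDL above (placeholder)

orb = CORBA.ORB_init(sys.argv, CORBA.ORB_ID)

# Obtain a reference to the remote object (the address is a made-up placeholder).
obj = orb.string_to_object("corbaloc::ami-host:2809/LightService")
light = obj._narrow(AmI.Light)          # type-safe narrowing to the IDL interface

# From here on, the distributed object is used like a local one.
light.switch_on()
print("light on:", light.is_on())
```

The division of labour matches the description above: the IDL is the formal API reference, while the generated proxy hides marshaling and transport details from the client code.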
3.2 Internet Communications Engine (Ice)
The Internet Communications Engine (Ice) [10] is a true object-based, domain-independent middleware platform designed and implemented by ZeroC [11]. Ice derives its main architecture from CORBA, but tries to improve on it by (a) eliminating its unnecessary complexity, (b) providing better built-in security, (c) providing more efficient protocols for reduced network bandwidth and CPU overhead, and (d) providing extra functionality that is either underspecified or absent from the CORBA standard and its implementations. Slice, Ice’s equivalent of CORBA’s IDL, extends the latter by adding functionality for supporting dictionary types and by providing support for exception inheritance. Additionally, Slice allows the programmer to add directives describing the state of Ice objects so that they can be subsequently stored and loaded automatically. Ice offers many standard services, including a service for propagating software updates around the distributed infrastructure (IcePatch), a very efficient Notification service (IceStorm), and a transparent proxy server that can be used for firewall traversal and enhanced security (Glacier). All in all, Ice succeeds in delivering a well-designed middleware that, without trying to reinvent the wheel, offers a robust and easy-to-use platform for distributed computing (it is worth noting that ZeroC comprises former CORBA implementers and a member of OMG’s Architecture Board, Michi Henning). That said, Ice’s disadvantages compared to CORBA implementations stem from purely practical reasons. On the one hand, Ice’s GPL [12] license requires all the applications and services that use its libraries and generated code to be
distributed under the GPL – a requirement that we considered too restrictive. On the other hand, the extra features offered by Ice were not deemed essential for the implementation of an AmI middleware. Nonetheless, ZeroC, in addition to the GPL license, offers proprietary licensing schemes for a fee. Consequently, should a proprietary licensing scheme be a viable option, Ice is an ideal communication platform on which to base an AmI middleware. Table 3 summarizes the key features of Ice against the five requirements set in section 2.2.
Table 3. Summary of Ice’s features
Heterogeneity: Supports multiple programming languages and computing platforms
Communication: Synchronous request/response and asynchronous communication through the IceStorm service
Resilience: Comprehensive invocation failure semantics, mechanisms for supporting transparent replication
Security: Ability to use encrypted communication channels, Glacier service
Ease of use: Natural and intuitive usage for each target language
3.3 Web Services
Web services [13], being the new Internet standard for service provision, are widely used in modern distributed systems. This technology uses a simple XML-based protocol to allow applications to exchange data across the Web. Services themselves are defined in terms of the well-defined XML documents – modeling messages – that are accepted and generated. Instead of providing a specification that offers high-level, standardized language-specific constructs for mapping service interfaces and data types, web-service-aware code only needs to be able to generate and process the exchanged XML documents. The Simple Object Access Protocol (SOAP), which essentially constitutes the core of the Web services architecture, only defines the format of the exchanged messages, the marshaling rules for the data that appear in the messages, and a set of conventions for achieving Remote Procedure Call (RPC)-like functionality. Putting aside any performance considerations, especially in the context of contemporary high-speed networks, we found web services to be insufficient as a platform on which to base an AmI middleware. The potential advantages offered by a Web-services-based approach, such as universal firewall traversal, loose coupling of services and dynamic service composition, are outweighed by the disadvantages stemming from the absence of high-level programming idioms and communication guarantees in the specification. In an object-based middleware, the implementation of a service is realized (in object-oriented languages) as the implementation of a class. Similarly, a remote call to a service is realized as a method invocation on a local object that acts as a proxy to the remote service. To the programmer, a service implementation or invocation is identical to the implementation and invocation of a local object of the target programming language. Additionally, remote invocations (at least in CORBA) have well-defined failure semantics: at-most-once for blocking, synchronous calls and best-effort delivery for one-way non-blocking calls. On the other hand, in Web services, the
programmer has to implement the dispatching of the received messages to the appropriate functions of the target language in order to implement a service, and has to explicitly construct and send a SOAP message in order to invoke a remote function. Moreover, the SOAP specification omits the definition of invocation failure semantics. The lack of natural programming abstractions is mitigated by the provision of additional libraries and tools. However, such tools are not standard and are available for only a few programming languages. While universal firewall traversal is very important for geographically distributed services, it is not essential in the context of an AmI environment, where the majority of the deployed services are restricted within a Local Area Network (LAN). In any case, firewall traversal is also possible in CORBA by assigning a fixed port to the Object Request Broker (ORB) of those services that should be visible from systems outside the basic infrastructure LAN and subsequently opening this port in the firewall, either manually or through Universal Plug and Play (UPnP) messages. Likewise, the dynamic composition and invocation of services, where the caller need not know a priori which functions a specific service supports, is also possible in CORBA through its standard Dynamic Invocation Interface (DII). Table 4 summarizes the key features of Web Services against the five requirements set in section 2.2.
Table 4. Summary of Web Services features
Heterogeneity: Support for multiple programming languages and computing platforms
Communication: Synchronous request/response. Asynchronous communication can be achieved but is not standard
Resilience: Depends on the underlying communication protocol and does not provide any mechanisms for supporting the implementation of resilience
Security: Ability to use encrypted communication channels
Ease of use: Explicit message construction and dispatching
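The “explicit message construction and dispatching” row can be illustrated with a short sketch that invokes a remote function by hand-building a SOAP envelope and posting it over HTTP with the Python standard library. The endpoint, namespace and operation names are placeholders of our own; the point is only that, without extra tooling, the programmer works at the level of XML documents rather than method calls.

```python
import urllib.request

# Hand-built SOAP 1.1 envelope for a hypothetical switchOn operation.
envelope = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:ami="http://example.org/ami">
  <soap:Body>
    <ami:switchOn>
      <ami:deviceId>light-1</ami:deviceId>
    </ami:switchOn>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    "http://ami-host:8080/LightService",          # placeholder endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "http://example.org/ami/switchOn"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))        # raw XML reply, parsed by the caller
```

Compared with the proxy-object style of the preceding sections, both the request and the reply remain the programmer's responsibility to assemble and interpret.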
3.4 Thrift
Thrift [14], used extensively at Facebook [15], is a communication platform that emphasizes simplicity and efficiency in the delivery and invocation of distributed services. Naturally, Thrift enables the creation and usage of distributed services in many different programming languages. Using an Interface Definition Language (IDL) much like CORBA’s IDL, Thrift effectively separates the description of a service from its actual implementation, while providing a natural object-oriented mapping for using and implementing services in the supported languages. One particularly useful feature that Thrift supports is fine-grained service versioning. Thrift is able to detect and gracefully handle differences in the version of every field of complex data structures and every parameter in a service function. While Thrift is very well suited to the particular problem domain for which it was developed, it does not provide all the mechanics required for an AmI middleware. Most importantly, it lacks the ability to use service references as first-class values and does not implement core infrastructure services such as Naming and Notification. The
absence of a Notification service makes it impossible for services to notify clients asynchronously about the occurrence of an event (Thrift’s async keyword for qualifying a service function is equivalent to the oneway qualifier in CORBA, which essentially makes the call non-blocking for the client code, i.e. fire-and-forget). Overall, while Thrift surpasses CORBA in terms of simplicity, efficiency and interface versioning, it lacks CORBA’s large feature set, flexibility, maturity and robustness. Table 5 summarizes the key features of Thrift against the five requirements set in section 2.2.
Table 5. Summary of Thrift’s features
Heterogeneity: Supports multiple programming languages
Communication: Synchronous request/response. Asynchronous communication can be achieved but is not standard
Resilience: Depends on the underlying communication protocol and does not provide any mechanisms for supporting the implementation of resilience
Security: Ability to use encrypted communication channels
Ease of use: Natural and intuitive usage for each target language
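For comparison with the CORBA sketch in section 3.1, a Thrift client follows a similar generated-stub pattern. The service name, method and host below are placeholders of our own, and the snippet assumes the Thrift IDL has already been compiled to Python stubs with the thrift compiler; it is a sketch, not code from the paper.

```python
# Hypothetical Thrift IDL, compiled beforehand with `thrift --gen py`:
#   service LightService { void switchOn(1: string deviceId), bool isOn(1: string deviceId) }
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from ami import LightService              # generated stub package (placeholder name)

transport = TTransport.TBufferedTransport(TSocket.TSocket("ami-host", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = LightService.Client(protocol)

transport.open()
client.switchOn("light-1")                # synchronous request/response call
print("light on:", client.isOn("light-1"))
transport.close()
```

Note that the returned value is a plain boolean; unlike in CORBA or Ice, a service reference cannot itself be passed around as a parameter or return value.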
3.5 Etch
Etch [16], which was originally derived from work on the Cisco Unified Application Environment [17], is a cross-platform, language- and transport-independent framework for building and consuming network services. Etch implements a Network Service Description Language (NSDL) which separates the description of a service from its actual implementation in the target language. The processing of an NSDL service description allows client code to implement and invoke a distributed service as if it were a local object of the target language. However, like Thrift, Etch is not a pure object-based middleware, as it cannot use a service object as the return value or parameter of a method. Nevertheless, it offers support for two-way communication between a service and its clients and simplifies security management by enabling connection authorization directives to be specified in NSDL. By supporting two-way communication, Etch is able to effectively support both synchronous request/response and asynchronous communication. As far as standardized services are concerned, Etch currently provides a Naming Service for discovering deployed services and a Router Service for fault tolerance and load balancing. The features that are planned for Etch provide all the functionality that we consider essential for the implementation of an AmI middleware. However, in its present release (version 1.0), Etch is incomplete. Although it offers most of the aforementioned functionality, it does so while supporting only Java and .NET (C#) for implementing and consuming services. When its specification is fully implemented, offering the planned functionality for more programming languages, Etch will constitute an effective and efficient platform for an AmI communication middleware. Table 6 summarizes the key features of Etch against the five requirements set in section 2.2.
Table 6. Summary of Etch’s features
Heterogeneity: Currently (version 1.0) supports only two languages, with more to come in subsequent versions
Communication: Synchronous request/response; explicit support for asynchronous type-safe communication
Resilience: Provides a Router service that can be used for service replication
Security: Ability to use encrypted communication channels; also provides support for high-level authentication functions
Ease of use: Natural and intuitive usage for each target language
4 Related Efforts
The Amigo project [18] uses the OSGi framework [19] for implementing services in Java, and the .NET Web Services framework and tools for implementing services in .NET. Hydra [20] uses a Web Services-based approach with custom peer-to-peer (P2P) network technologies for creating and consuming services. The CHIL project [21] uses Smartspace Dataflow [22] and ChilFlow [23] for the integration of autonomic perceptual components and follows an agent-based approach for the implementation of high-level services using JADE [24]. Communication in JADE relies on Java Remote Method Invocation (RMI) for Java-based agents and on CORBA for agents running on different platforms. These efforts are still under development and support only a narrow range of platforms, as they target mainly Java and .NET-based AmI infrastructures.
5 Summary and Conclusions
In this paper, we presented the basic requirements for an AmI communication middleware. Against these requirements, we evaluated a set of general-purpose communication technologies. Among these communication technologies, we found the Common Object Request Broker Architecture (CORBA) and the Internet Communications Engine (Ice) to be the most effective in providing the low-level building blocks for implementing a middleware for AmI environments. Both CORBA and Ice provide a robust specification with a very broad range of features that essentially make them independent of the target problem domain. They are sufficiently low-level so that specialized, high-level interaction patterns can be realized, and sufficiently high-level so that the need for tedious communication management and marshaling operations is alleviated. Despite the fact that a versatile object-based middleware constitutes an effective platform for an AmI communication middleware, it is apparent that request/response communication is not always suitable. Most importantly, it is neither efficient nor effective for streaming large amounts of continuous data, e.g. video or audio. Hence, one final issue to note is that an AmI middleware should also utilize a separate MOM communication platform (e.g., ChilFlow) for streaming data, while maintaining an object-based core for creating and controlling data streams.
Acknowledgements. This work has been supported by the FORTH-ICS internal RTD programme “AmI: Ambient Intelligence Environments”.
References
1. IST Advisory Group 2003. Ambient Intelligence: From Vision to Reality, ftp://ftp.cordis.lu/pub/ist/docs/istag-ist2003_consolidated_report.pdf
2. Network Working Group. Request for Comments 2768, http://www.ietf.org/rfc/rfc2768.txt
3. Object Management Group. The Common Object Request Broker: Architecture and Specification. Object Management Group, Framingham, Mass. (1998)
4. The Object Management Group (OMG), http://www.omg.org
5. Object Management Group. CORBAservices: Common Object Services Specification. Object Management Group, Framingham, Mass. (1997)
6. The ACE ORB (TAO), http://www.cs.wustl.edu/~schmidt/TAO.html
7. JacORB, http://www.jacorb.org
8. IIOP.NET, http://iiop-net.sourceforge.net
9. omniORB, http://omniorb.sourceforge.net
10. Henning, M.: A new approach to object-oriented middleware. IEEE Internet Computing 8, 66–75 (2004)
11. ZeroC, http://www.zeroc.com
12. GNU General Public License, http://www.gnu.org/copyleft/gpl.html
13. Simple Object Access Protocol (SOAP), http://www.w3.org/TR/soap
14. Slee, M., Agarwal, A., Kwiatkowski, M.: Thrift: Scalable Cross-Language Services Implementation
15. facebook, http://www.facebook.com
16. Etch, http://cwiki.apache.org/ETCH
17. Cisco Unified Application Environment, http://www.cisco.com/web/developer/cuae
18. The Amigo project, http://www.hitech-projects.com/euprojects/amigo
19. OSGi Alliance. OSGi Service Platform Core Specification Release 4, http://www.osgi.org
20. The Hydra project, http://www.hydramiddleware.eu
21. The CHIL project, http://chil.server.de
22. The NIST Smart Space Project, http://www.nist.gov/smartspace
23. The ChilFlow System, http://www.ipd.uka.de/CHIL/projects/chilflow.php
24. Bellifemine, F., Poggi, A., Rimassa, G.: JADE – A FIPA-compliant agent framework
A Hybrid Approach for Recognizing ADLs and Care Activities Using Inertial Sensors and RFID
Albert Hein and Thomas Kirste
University of Rostock, Institute of Computer Science, Rostock, Germany
{albert.hein,thomas.kirste}@uni-rostock.de
Abstract. In this paper we present a feasibility study regarding the recognition of high level daily living and care activities. We examine a hybrid discriminative and model-based generative approach based on RFID and inertial sensor data. We show that the presented sensor configuration is able to deliver sensor readings and object sightings at a sufficient rate without forcing user compliance. We further evaluated the advantage of a model-based approach over a static classifier, compared the individual contribution of each sensor type, and reached accuracy rates of 97% and 85%.
1 Introduction
The elderly are currently the fastest growing demographic group in the USA [1], a development similar to that in other regions, for example in Europe, due to increased life expectancy and a declining birth rate. The aging human is thus increasingly becoming a focus not only of demographic research. Graceful aging is a common demand of the current and future older population. The main motivation behind ambient assisted living is that 95% of people wish to stay at home as long as possible. But home care for elderly or sick people is a serious burden for family members: 30% of admissions to nursing homes occur not because of a deterioration in the senior’s condition but because of so-called ”caregiver burnout”. An alternative to stationary treatment is professional ambulant elderly care at home. Especially this kind of service needs accurate documentation of care activities to allow correct accounting towards the health insurances. The usual documentation process is to this day still done manually; it takes up to 40% of the working time, is error-prone and mostly inaccurate because it does not happen in situ but afterwards. Sensor-based activity recognition is currently widely seen as an elemental technology for providing mobile assistance in AAL scenarios [2,3,4,5,6]. An ambient sensor infrastructure as in the iDorm [7] or the PlaceLab [8] is mostly unobtrusive, but (at least currently) too expensive and complex for home care settings, where each apartment of every care patient would have to be equipped with environmental sensors. Wearable sensors as an alternative are already showing good results in human activity recognition, either through direct motion sampling [9,5] or through capturing object interactions [2]. In reality there are often ambiguities between different activities sharing the same motions or gestures, like carrying a glass of water or carrying a pillbox, and also ambiguities
between different activities using the same objects, for example actually using an object versus simply carrying it around. These two problems are not clearly distinguishable using only one of these methods. This paper therefore combines the two approaches, direct motion measurement with inertial sensors and detection of object interaction with RFID, for high-level activity recognition in order to conduct a feasibility study. Our system uses hierarchical sensor fusion on different levels of abstraction to simultaneously integrate many channels of heterogeneous sensor data. For inferring high-level activities we use a layered hybrid discriminative and model-based generative approach. This will enable us to integrate prior knowledge into the decision process in the future, reducing the amount of training while keeping the probabilistic model simple. The approach was evaluated on two experimental settings, one Activity of Daily Living (ADL) breakfast scenario and one home care scenario, where the combined approach reached accuracy rates of 97% and 85%, respectively. Our main objectives were to investigate whether such a combined sensor configuration is technically viable in terms of delivering reliable data without explicit compliance of the test subject, whether a static classifier or a hybrid model is able to infer usable estimates on a continuous time trace, and how much each sensor type actually contributes to the final classification results.
2 Problem Domain
There are several often-cited publications reporting good results in low-level activity recognition, e.g. [9,5], while high-level activities are commonly seen as problematic and this kind of research is still at its beginnings [4]. These high-level activities are mostly difficult to discriminate because of several general challenges:
– Interleaved or interrupted activities which are not executed sequentially
– Ambiguities between different activities sharing the same motions or gestures
– Ambiguities between different activities sharing common object handlings
– Variations in the activity performance between single or multiple subjects and distortions by uninvolved persons
– Different levels of complexity between elementary or compound activities
– Different levels of granularity between coarse motion and fine-grained gestures
– Lack of representative training data containing examples even of special cases
– Highest possible unobtrusiveness by not requiring the user to explicitly handle or interact with the system
For this task, different approaches are usually adopted, ranging from simple model-free pattern recognition methods to complex model-based probabilistic approaches which are also able to capture causal and temporal dependencies (for a more general overview see [10]). Depending on the chosen approach, the specific application and the sensor data, methods for including prior knowledge and common sense (cf. [11]) are furthermore needed for handling varying activity sequences without requiring an exponentially growing amount of training data. Based on the different fields of application, three complex application domains can be pointed out, which can, but do not need to, build on each other:
1. Activity Profiling. The activity profiling task usually tries to answer the question of how often and how long which activity is performed over a given period. Activity profiling is thus an acquisition of activity data in a lexical manner which allows simple qualitative and quantitative analyses of activities on an absolute scale. Any causal or temporal dependencies remain out of consideration. This application domain is already well explored for base-level activities, usually using statistical classifiers (e.g. [9,12]).
2. Behaviour Assessment. Applications in this field ask whether there are long-term variations in the daily routine or any temporary abnormal behaviour. This can be seen as a syntactical analysis of changes in specific behaviour patterns and allows reasoning about mental and physical decay or hazardous situations. All those measurements are done on a relative scale, where temporal but no causal dependencies have to be considered during the recognition process. This task is fairly well explored for simple quantitative measurements of low-level activities. On a qualitative level, for activities of higher abstraction, still only a few mainly connectionist [7] or model-based approaches can be found.
3. Proactive Assistance. For proactive assistance systems it is important to infer not only the plain context but also the current intention of the user. In analogy to the other two application domains, this implies a semantic analysis of the activity trajectory which allows autonomous assistance and the prediction of future events. Such a system must consider both causal and temporal dependencies of activities. Today, this field basically comprises model-based intention recognition approaches built upon environmental infrastructural sensors and location data [13].
In this work we are mainly concerned with the first application domain of activity profiling. In addition to prior work, we focus on recognizing high-level activities by additionally including temporal dependencies.
3 Activity Recognition System Overview
Due to the very heterogeneous nature of the sensor data, the underlying activity recognition system must be able to cope simultaneously with its very different characteristics to avoid any unintentional loss of potentially important information during the recognition process. Therefore it must be able to handle nominal and numerical data with different time bases (discrete and event-driven) from a varying number of sensors. Because of the good experience with statistical classifiers for activity recognition and their good performance, but also the need for temporal modelling and a comprehensible decision-making process, we have chosen a hybrid and layered approach (Fig. 1). While each processing layer makes use of sensor data on different levels of abstraction, the sensor fusion process is separated hierarchically into several steps. The individual raw sensor data channels are synchronized and then processed in the feature extraction module. This module calculates 562 different features from half-overlapping windows of 1.28 s consisting of frequency-domain, statistical, curve,
[Fig. 1 depicts the layered inference pipeline: the ACC1..n, GYRO1..n and RFID1..n sensor channels feed the Feature Extraction layer, followed by the Classification layer and the Probabilistic Model layer.]
Fig. 1. Inference Process and Sensor Data Fusion
physical, correlation and step detection features (for a detailed explanation see [12]) as well as handled-object and object-class information. In this step, for many features several sensors and sensor axes are already combined into more abstract representations (e.g. physical features or step detection, the broad white arrow in Fig. 1), while in others single sensor channels are still contained separately (the thin arrows). All generated features are processed by an embedded Weka C4.5 decision tree [14], which has often shown superior accuracy over comparable algorithms in the context of human activity recognition in the past [9,6,12] and is also capable of handling both nominal and numerical data. In this step all sensor channels are merged and mapped onto the different class probabilities for each data window (broad grey arrow). The RFID features are mirrored and passed through into the probabilistic model (dotted line). For the model layer we have chosen a Hidden Markov Model (HMM) with one hidden state for each activity class, which is also a common choice for sensor/RFID-based activity recognition [2,5,11]. HMMs are robust to sensor noise, able to reflect complex temporal properties, are human-readable and adjustable, and have well-known algorithms for inference, which also allow prediction. We use the HMM in conjunction with a particle filter. An example HMM for the simplest experimental setting is shown in Fig. 2.
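As an aside, the windowed part of this feature extraction stage can be pictured with a minimal sketch (our own illustration, assuming a single 50 Hz acceleration channel; the actual 562 features of [12] cover far more than the three shown here):

```python
import numpy as np

def window_features(signal, rate_hz=50, window_s=1.28):
    """Cut a 1-D signal into half-overlapping windows and compute a few toy features."""
    length = int(rate_hz * window_s)          # 64 samples at 50 Hz
    step = length // 2                        # half-overlapping windows
    features = []
    for start in range(0, len(signal) - length + 1, step):
        window = signal[start:start + length]
        spectrum = np.abs(np.fft.rfft(window))
        features.append({
            "mean": float(np.mean(window)),            # statistical feature
            "std": float(np.std(window)),              # statistical feature
            "energy": float(np.sum(spectrum ** 2)),    # frequency-domain feature
        })
    return features

acc_x = np.random.randn(500)                  # stand-in for one accelerometer axis
print(len(window_features(acc_x)), "windows of features")
```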
Fig. 2. Simplified HMM state diagram for a model automatically generated by the recognition system (Setting 1, high level context). The four states represent breakfast (0), make coffee (1), talk (2) and clear table (3). Only transition probabilities greater than 0.001 are shown as arrows.
The emission probabilities P(O_t | S_t) for the observations O_t and the state S_t at each timestep t are calculated as the product of the probability of the sensor sightings given the current state, P(acc_t | S_t) (a maximum likelihood estimate calculated from the Weka class probabilities at the current timestep), and the probabilities of the RFID object feature sightings (objects and object classes) for the current state, P(rfid_i | S_t) (Eq. 1).

P(O_t \mid S_t) = P(\mathrm{acc}_t \mid S_t) \prod_{i=0}^{N-1} P(\mathrm{rfid}_i \mid S_t) \qquad (1)
A priori and transition probabilities of the model are entirely calculated from the ground truth of the experimental data. This way, no additional parameter learning with the expectation-maximization algorithm is needed, which can be very time-consuming. To avoid overfitting, biases for the model accuracy, the RFID reliability and the Weka class probabilities (trust in sensor data) were set, which also allows the inclusion of expert domain knowledge. While the first two were adjusted manually, the latter was determined by model averaging over the Weka model, weighted with its cross-validated classification accuracy, and a set of rudimentary models always predicting one class, weighted with their particular accuracy [15].
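A compact way to read Eq. (1) together with the transition model is the following sketch (our own illustration with made-up numbers for the four states of Fig. 2; the paper decodes the model with a particle filter, whereas this toy version applies one exact forward-filtering step):

```python
import numpy as np

states = ["breakfast", "make coffee", "talk", "clear table"]

# Made-up model parameters (the paper estimates them from ground-truth data).
transition = np.array([[0.997, 0.001, 0.001, 0.001],
                       [0.005, 0.971, 0.019, 0.005],
                       [0.002, 0.002, 0.993, 0.003],
                       [0.001, 0.001, 0.001, 0.997]])
p_acc_given_state = np.array([0.60, 0.10, 0.20, 0.10])     # C4.5 class probabilities
p_rfid_given_state = np.array([[0.70, 0.05, 0.15, 0.10],   # one row per sighted object
                               [0.40, 0.30, 0.10, 0.20]])

# Eq. (1): emission = P(acc_t | S_t) * prod_i P(rfid_i | S_t)
emission = p_acc_given_state * np.prod(p_rfid_given_state, axis=0)

# One forward-filtering step: predict with the transition model, then correct.
belief = np.full(len(states), 0.25)                         # uniform prior
belief = emission * (transition.T @ belief)
belief /= belief.sum()
print(dict(zip(states, np.round(belief, 3))))
```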
4 Sensors
We used two types of wireless sensors for our recording, which usually are treated separately in the literature: Inertial Motion Sensors (IMU = Inertial Measurement Unit) and RFID. There have been two approaches in the recent past combining accelerometers for the direct measurement of human body motion with a wearable RFID reader for identifying handled objects – the work of Stikic et al. [16], who attached the Intel iBracelet and a custom-made sensor board to the dominant wrist, and Wang et al. [11], also using the iBracelet with a comparable mobile sensor platform and a similar setup. For our initial tests we recorded raw data with a probably higher number of sensors than actually required for the final application, so that single sensor channels or subsets of them can be evaluated on the original sensor data afterwards. For the inertial motion recordings we used three SparkFun IMU 6-DOF v3 sensor boards. These are equipped with a 3-axis Freescale MMA7260Q accelerometer, 3-axis InvenSense IDG300 gyroscopes and a 2-axis Honeywell HMC1043 magnetometer. The LPC2138 ARM7 microcontroller is also capable of preprocessing the raw data onboard. We used the IMUs to sample relative motion and rotation at a rate of 50 Hz with a range of 6 g to fully capture normal human motion as described in [17]. The raw data was instantaneously transmitted via a class 1 Bluetooth link with a maximum operating distance of 30 to 100 m. Because of the compact size (51x41x23 mm), the boards could be attached unobtrusively at the dominant wrist, for recording gestures and object motion without the need of attaching a sensor directly to the handled objects, as well as at the chest/upper back and at the hip. These sensor positions have been shown to operate well in the literature and in our own prior work. RFID is a popular technology for contactless identification of objects. A basic system consists of a reader module with an antenna and several active or passive tags in the
form of small boxes, stickers or even implants. Especially passive RFID stickers are a cheap, battery-free solution for reliable object detection. RFID practically does not return any false positive object sightings. Because currently no wearable RFID modules are available commercially, we used a Texas Instruments S4100 multi-function reader with a custom-made wrist antenna (see Fig. 3). Depending on object geometry and material, it has a reading range between 10 and 30 cm. For tagging the objects we used ISO 15693 standard HF stickers in the two sizes 43x43 mm and 18x36 mm. The reader module was plugged into an external class 1 serial/Bluetooth adaptor. All data streams were wirelessly transmitted to a laptop computer where they were immediately formatted, synchronized and saved to disk.
Fig. 3. RFID Wrist Antenna and Inertial Sensor Board
5 Experimental Setup
The experiments were conducted to evaluate the feasibility of our combined approach for recognizing sufficiently realistic daily living (ADL) and health care activities. The chosen repertory consists only of compound activities at a high level of abstraction, trying to consider all of the challenges specified in Section 2, like ambiguities and interleaved activities. The general experimental setup is related to the household ADLs of Stikic et al., also with comparable sensor equipment. Attempts like the PlaceLab [8] or iDorm [7] represent essentially opposing approaches, since no environmental sensor infrastructure was used in our experiment. As we were interested in a general proof of concept, we did not conduct tests outside the lab with authentic subjects inside a nursing home at this early stage. It is best practice to carry out initial experiments in a controllable environment under optimal observability. The test runs have additionally been accompanied by video and audio surveillance to facilitate later manual annotation. To avoid biasing, the test subjects were not involved in planning and setting up the experiment or analyzing the data afterwards in any way. Each subject was instructed to behave as naturally as possible, to try to ignore the attached sensors and especially not to pay attention to the RFID labels on the objects. This strongly distinguished these experiments from others found in prior publications, where the subjects were explicitly instructed to wait until the RFID reader had scanned the current object [2], which significantly increases the number of object sightings but is in conflict with our requirement
not to assume specific user interaction. Therefore, here also the tag placement was done by a person not involved any further with the experiment, which, as a matter of fact, resulted in a high number of tag placements due to the lack of knowledge of the most relevant objects. We utilized two main experimental settings as follows:
5.1 Setting 1: Breakfast
The breakfast ADL setting follows the example of Patterson et al. [2], who used RFID gloves for detecting routine morning activities. In our experiment, one sensor-equipped subject and four other participants were involved. No particular actions or action sequences were initially scheduled to be performed during the test. The test utilized real equipment and real food and was conducted distributed over two rooms – an office and a kitchen. 118 RFID tags were placed on 55 different objects at several different positions. The course was observed by a fisheye camera, which was able to oversee the whole setting. Activities of two levels of granularity were recorded: the coarse high-level abstract context with ”breakfast”, ”make coffee”, ”talk” and ”clear table”, and more detailed actions consisting of 27 activity classes like ”make bread”, ”drink”, ”hand over”, ”carry”, ”stir” or ”collect dishes”, which come closer to the work of [2]. Altogether 1 h and 8 min (590 MB) of raw data have been sampled.
5.2 Setting 2: Home Care
The home care setting is part of ”MArika”, a subproject of the current state research project ”Mobile Assistance” [18]. For this setting we roughly rebuilt the floor plan of an apartment consisting of a bedroom, a bathroom, a living room and a kitchenette in our SmartLab. The test runs were performed by professional care personnel (a geriatric nurse; a student helped out as a patient). We put 43 tags on 33 objects, again partially at different positions to increase the probability of detection. The scenery was observed by a fisheye and a ceiling-mounted dome camera. This time a general preselection of care activities was given, as this is common for a care plan. The test agenda and the scenario have been developed in close cooperation with a nursing service, which also provided authentic equipment for the tests. We have sampled two runs of an authentic sequence of morning care activities taken from a real person. The activities were directly taken from the service accounting catalogue of the health insurances: ”general service” (greeting, fetching newspaper, ...), ”big morning toilet” (including washing the whole body and brushing teeth), ”micturition and defecation”, ”administration of medications”, ”injections”, ”bandaging”, ”preparation of food” and ”documentation”. We collected 14 min (317 MB) and 12 min (289 MB) of raw data.
6 Results
Our first objective was to determine the distribution of RFID object sightings with the given sensor configuration and without explicit user compliance. In the first setting, 316 tags were read during the experiment at an average rate of m = 12.9 s and a standard
Table 1. Comparison of the overall accuracies for the different experimental settings

                          RFID     IMU      both
Breakfast coarse   C4.5   50.2%    87.3%    86.9%
                   HMM    67.4%    97.7%    97.8%
Breakfast fine     C4.5   62.5%    65.5%    67.4%
                   HMM    65.8%    73.3%    76.7%
Care               C4.5   51.4%    52.2%    57.1%
                   HMM    84.9%    80.3%    85.1%
deviation of σ = 33.1 s. In the second setting, 60 object sightings were detected at an average rate of m = 21.5 s and a standard deviation of σ = 43.7 s. While there were several bursts of frequent sensor readings, there were also gaps of up to several minutes. The three IMUs delivered an overall good performance without any data loss. The static classification accuracy was evaluated using a 10-fold stratified cross-validation. The output class probabilities were used as part of the observations for the HMM as described in Sec. 3. The decoding of the HMM was done by sequential Monte Carlo filtering using a particle filter with 100,000 particles. The overall accuracies for the different experimental settings are itemized in Table 1, while a continuous time trace is shown in Fig. 4 for the breakfast example. The decision tree already showed surprisingly good classification results on inertial sensor data, especially for the coarse high-level breakfast activities, while, as expected, it was completely unusable on RFID data only (the apparently high recognition rate is misleading; it only predicted the activity class with the highest prior probability). The combination of both sensor types did not bring any significant advantages. In all cases the static classifier produced many temporal glitches. In general, the results of the HMM were temporally much smoother, but also prone to a short delay on transitions between activity states, which, in the given application domain of activity profiling, is not a drawback at all. Although the RFID object sightings were very irregular, the model did relatively well on inferring the high-level activities. Using only IMU data, the HMMs behaved primarily as a temporal smoother for the static classification probabilities, which resulted in a nearly perfect recognition rate (breakfast coarse) or at least a significant increase (care setting). Due to its ”lethargic” behaviour, the model had problems estimating the short-term activities in the second breakfast setting. The combination of both sensor types inside the HMM generally resulted in the highest recognition accuracy, but in detail the combined result is not significantly higher than the best single-sensor-type performance. General problems came up while disambiguating activities with both shared motions and shared objects. In the care setting, e.g., ”micturition and defecation” was mistaken for ”morning toilet”. No additional representative object sightings were detected during the test runs, although characteristic tags were present (”toilet chair”, ”toilet brush”, etc.). By contrast, the coarse abstract activities in the first breakfast setting implied very distinct motion patterns, which led to a nearly perfect classification performance.
[Fig. 4 consists of six panels, one column each for RFID only, IMU only and IMU + RFID, and one row each for the C4.5 decision tree and the HMM; every panel plots the activity index over time in seconds, comparing ground truth against the estimate.]
Fig. 4. Accuracy of Inference for Setting 1, high level context. The four activities represent breakfast (0), make coffee (1), talk (2) and clear table (3). Results of RFID only, IMU only and the combination of both are compared against each other for the C4.5 decision tree and the HMM.
7 Conclusions and Future Work
The experiments were conducted as a feasibility study. Regarding our first main objective, it could be shown that the presented combined sensor configuration can deliver sensor readings and object sightings at a sufficient rate without requiring explicit user compliance or interaction. Probably many of the sporadic gaps in the RFID data and missed objects can be avoided by not only instrumenting the dominant wrist, as many items were utilized by the other hand, too. For future experiments a second antenna/reader module has already been built. Apart from that, different subsets of sensors will be evaluated in order to increase the wearing comfort and the unobtrusiveness. Our second finding is that the hybrid discriminative and model-based approach is basically able to infer high-level daily living or care activities. In addition, the model layer in general significantly outperforms simple static classification using the C4.5 decision tree. Our approach was able to handle the main general challenges regarding the classification of abstract high-level activities. A solution for ambiguous classes sharing motion patterns and objects could be breaking down compound activities into smaller atomic actions, which can then be used as building blocks in a multilevel model. The third objective of this work was to find out how much each sensor type can contribute to the recognition process. This question cannot clearly be answered, although it seems that in the case of reliable IMU-based recognition the additional knowledge of
object handling does not bring noticeable advantages. As this is primarily a problem of weighting and biasing with respect to sensor fusion, further investigation is needed. As the presented classification results are based on very little experimental training data, they have to be treated carefully with respect to generalization. Future experiments will have to follow under real-world conditions to allow a reliable evaluation of a comprehensive set of care activities and multiple test subjects. This is expected to provide more realistic results allowing an outlook on everyday use. As generally very little training data is available for a high number of complex activities, additional modelling and the inclusion of expert domain knowledge is inevitable. So a more direct way to involve RFID events, model the average length of activities or integrate uncertainty is desirable. Models other than HMMs will probably allow more control. An automatic base model generation from simple task models, as known from software engineering, is currently under development.
Acknowledgements. Marika [18] is funded by the state of Mecklenburg-Vorpommern, Germany, within the scope of the LFS-MA project. The care experiments were supported by InformatikForum Rostock e.V.
References
1. Roush, R.E.: Smart home technology for aging in place longer and better. Technical report, Roy M. and Phyllis Gough Huffington Center on Aging (June 2004)
2. Patterson, D.J., Fox, D., Kautz, H., Philipose, M.: Fine-grained activity recognition by aggregating abstract object usage. ISWC, 44–51 (2005)
3. Smith, J.R., Fishkin, K.P., Jiang, B., Mamishev, A., Philipose, M., Rea, A.D., Roy, S., Sundara-Rajan, K.: RFID-based techniques for human-activity detection. Commun. ACM 48(9), 39–44 (2005)
4. Huynh, T., Blanke, U., Schiele, B.: Scalable recognition of daily activities with wearable sensors. In: Hightower, J., Schiele, B., Strang, T. (eds.) LoCA 2007. LNCS, vol. 4718, pp. 50–67. Springer, Heidelberg (2007)
5. Lester, J., Choudhury, T., Kern, N., Borriello, G., Hannaford, B.: A hybrid discriminative/generative approach for modeling human activities. In: Kaelbling, L.P., Saffiotti, A. (eds.) IJCAI, pp. 766–772. Professional Book Center (2005)
6. Parkka, J., Ermes, M., Korpipaa, P., Mantyjarvi, J., Peltola, J., Korhonen, I.: Activity classification using realistic data from wearable sensors. IEEE Transactions on Information Technology in Biomedicine 10(1), 119–128 (2006)
7. Rivera-Illingworth, F., Callaghan, V., Hagras, H.: Detection of normal and novel behaviour in ubiquitous domestic environments. The Computer Journal (2007) bxm078
8. Hightower, J., LaMarca, A., Smith, I.E.: Practical lessons from Place Lab. IEEE Pervasive Computing 5(3), 32–39 (2006)
9. Bao, L., Intille, S.S.: Activity recognition from user-annotated acceleration data. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 1–17. Springer, Heidelberg (2004)
10. Hein, A., Kirste, T.: Activity recognition for ambient assisted living: Potential and challenges. In: Ambient Assisted Living, pp. 263–268. VDE Verlag (January 2008)
11. Wang, S., Pentney, W., Popescu, A.M., Choudhury, T., Philipose, M.: Common sense based joint training of human activity recognizers. In: Veloso, M.M. (ed.) IJCAI, pp. 2237–2242 (2007)
12. Hein, A.: Echtzeitfähige Merkmalsgewinnung von Beschleunigungswerten und Klassifikation von zyklischen Bewegungen. Master's thesis, University of Rostock (November 2007)
13. Giersich, M., Forbrig, P., Fuchs, G., Kirste, T., Reichart, D., Schumann, H.: Towards an Integrated Approach for Task Modeling and Human Behavior Recognition. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 1109–1118. Springer, Heidelberg (2007)
14. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann Publishers Inc., San Francisco (2005)
15. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning. Springer, Heidelberg (2001)
16. Stikic, M., Huynh, T., Van Laerhoven, K., Schiele, B.: ADL recognition based on the combination of RFID and accelerometer sensing. In: 2nd International Conference on Pervasive Computing Technologies for Healthcare 2008 (2008)
17. Bouten, C., Koekkoek, K., Verduin, M., Kodde, R., Janssen, J.: A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity. IEEE Transactions on Biomedical Engineering 44(3), 136–147 (1997)
18. Landesforschungsschwerpunkt (April 2009), http://marika.lfs-ma.de/
Towards Universal Access to Home Monitoring for Assisted Living Environment
Rezwan Islam1, Sheikh I. Ahamed2, Chowdhury S. Hasan2, and Mohammad Tanviruzzaman2
1 Marshfield Clinic, 3501 Cranberry Blvd, Weston, WI 54476, USA
[email protected]
2 MSCS Dept., Marquette University, 1313 W Wisconsin Ave., Milwaukee, WI 53233, USA
{iq,chasan,mtanviru}@mscs.mu.edu
Abstract. The improvement of the conditions of daily life at home and work, promoted by socio-economic progress, better-quality private living environments and the immense development of healthcare and biomedical technologies, has extended the average life span beyond 70. According to recent surveys, this “population aging” phenomenon will contribute to a record number of 1 billion people over 60 years of age on earth by the year 2020. For a variety of reasons, such as convenience or a need for security and privacy, these elderly people generally prefer to receive healthcare at home. It is time to break through the physical boundaries of hospitals and bring healthcare facilities into homes. Wireless and Internet-based healthcare devices can play a vital role in this regard, provided that reliable, individualized systems with user-friendly interfaces are developed that enable elderly people to feel comfortable making use of novel technology. This paper presents a remote home monitoring application called Living Assistant that can be utilized to continuously monitor and control a wide range of electronic appliances and ambient parameters. It has been designed primarily to function as a healthcare aide for elderly patients suffering from restricted mobility or chronic diseases. The user interfaces presented in this paper are simple, generic and universally applicable. With little customization the application can be used to accommodate other user groups as well.
Keywords: Universal access, Home monitoring, Smart home, Assisted living, Living Assistant, Elderly people, Technology-enhanced learning, TinyOS.
1 Introduction During the last decade, the size of the elderly population has shown noteworthy growth, especially in developed countries. Carrying out daily tasks at home becomes difficult or impossible for elderly persons with restricted mobility. Besides, moving around indoors and outdoors requires third-party assistance [30, 31]. Yet, these elderly people clearly prefer independent living to institutionalization [28, 29]. At the same time, they exhibit an ever-increasing tendency towards leading an isolated life away from their offspring. In this context, conceiving technologies for increasing their
autonomy, so as to enable them to self-manage their life, is of utmost importance. Furthermore, a safe, convenient, sound, and healthy living environment is the prerequisite for a good home for elderly people with special needs. Alongside the overwhelming spread of computer and internet technology, the market for mobile handheld devices (cell phones, PDAs, smart phones, etc.) is growing significantly. In addition, recent advances in sensor networks and wireless mobile technologies, such as Bluetooth, WiFi, Zigbee [1], etc., have reduced the complexity of developing applications constrained by the mobility of users. A wide variety of sensors are becoming inexpensive and readily available. With the objective of improving the quality of life of modern people, current research deals with the development of digital home monitoring systems using wireless sensors. Despite the breakthroughs in the technological aspects, the relatively slow adoption of such systems indicates that there are certain factors restricting their acceptance and use [28, 29]. For example, most available home-care systems monitor the health of individuals suffering from chronic diseases such as heart disease, lung disorders, diabetes, etc. Most of them are costly and include health monitoring equipment which is difficult to use. Less attention has been paid to monitoring and maintaining the personal wellness of elderly people to enable them to live a normal life. In this paper, we present the design and implementation of an affordable remote home monitoring solution that allows users to monitor and control elements of their home from a remote device. Our proposed solution, Living Assistant (LA), allows individuals to monitor and control electronic appliances in their homes while they are away, using remote devices such as cell phones, PDAs, and laptop computers. We have conducted the necessary research to explore the field of home monitoring and developed a concept that would serve as an efficient universal home monitoring system. We have built a prototype of the LA that allows remote control of electric appliances via a webpage, an electronic switch, and Zigbee-enabled Tmote-brand sensors [23, 27]. The website and server prototype enable the users to control the system remotely. The major objective of our application is to provide the means for improving the quality of life of elderly people at home by developing generic technologies for managing their domestic ambient environment and home automation systems, with the aim of increasing their autonomy and safety. In order to accomplish this goal, LA aims at providing universal access [18] to a set of electronic devices and a couple of ambient parameters. Such omnipresent access to information technology plays an extremely important part in the context of technology-enhanced learning (TEL) [22]. Our system proposes an enhanced interface technology which is customizable and adaptable to any context. The user interfaces presented in this paper are simple, generic, and universally applicable; therefore, it should be easy for educators to adopt the system and apply it in their own contexts. These interfaces can be used to build customized and sophisticated healthcare tools as well. The inclusion of a number of biosensors would facilitate that, and the existing prototype is easily extensible to accomplish that purpose.
Such a system would be useful for healthcare professionals in real-time monitoring of an elderly patient, e.g., fall detection, sleep monitoring, and pulse monitoring. The rest of the paper is organized as follows. Section 2 presents several scenarios where our application provides the perfect solution. Characteristics and functionalities
are discussed in Section 3 and Section 4 respectively. We discuss our application in detail in Section 5. Usability of our proposed system in TEL is presented in Section 6. Section 7 focuses on the related works and finally we conclude in Section 8.
2 Motivation Scenario 1: Mr. Jones goes out for a walk in the evening as usual to do some light exercise. When he reaches the nearby park, he remembers that he forgot to switch off the electric oven at his home. He gets worried, as there is nobody at home to turn it off immediately and prevent a mishap. In this situation, he uses the Living Assistant application from his PDA, as he is a subscribed user. After logging on to the server, Mr. Jones instantly gets access to the home monitoring system, which displays the status of the electrical devices. Using the system, he switches off the oven and is assured that it is turned off properly. Above the age of 65, many people like Mr. Jones suffer from such memory loss and frequently forget to handle the electronic devices of their daily life properly. They find the LA a handy tool. Scenario 2: Mr. Hughes, another elderly customer, is on his way home after meeting some of his relatives. It is about half an hour's journey. When he starts driving, he asks his Living Assistant running on his PDA for the scheduled report, which shows that the temperature at his home is running low and reminds him to turn the heating system on. Mr. Hughes turns on the heating system and sets it to a specific temperature. By the time he reaches home, he finds it at the desired temperature. Thus, our system assists him in his daily life activities and also helps bring down the electricity cost (utility bills) by reducing power consumption. Scenario 3: Mr. Hughes cannot visit his elderly parents regularly, as they live quite far from his house. However, using the LA, he can extend the level of attention paid to his parents, as it allows him to monitor their living conditions frequently. Every night he makes sure that the living rooms are at a proper temperature and the lights are switched off. Sometimes his father falls asleep while reading books and leaves the lights on. Mr. Hughes can easily switch off the light remotely using LA. Situations like these call for applications that can assist elderly people in monitoring and controlling electronic devices and certain ambient parameters as well. The Living Assistant application provides its users with such enhanced facilities.
3 Characteristics of Living Assistant The Living Assistant application aims at improving the lifestyle of modern people by allowing them to interact with their home ambience more conveniently. Proper handling of a number of challenging issues is essential to accomplish this purpose. These are: Safety of the User: As the application deals with remote handling of electrical appliances, ensuring the safety of the environment is of utmost importance. The process of remotely regulating a device should be safe enough for the home environment such
that it does not cause electrical shocks, short-circuits, fires, etc. Besides, immediate action should be taken regarding any malfunctioning behavior of a device to safeguard against major mishaps. Accuracy and Precision: The actions taken by a user from a remote place should be carried out accurately and with the highest attainable precision. Lack of accuracy may let the devices operate in a way not intended by the user and may endanger the home environment. Likewise, the operating conditions of the devices should be controlled precisely to get a task done smoothly. Device Diversity: A modern home environment comprises a wide variety of electrical devices. These devices may differ in their operating conditions such as driving power, voltage, current, etc. The application should be adaptable to such diversity. In particular, the controlling hardware (switch) should be capable of driving appliances that operate over a wide range of electrical power, voltage, current, etc. Responsiveness: Actions taken by a remote user should be responded to within a reasonable period of time. A slow response time degrades the user's perception of the system. Portability: The hardware unit, consisting of a switch and a set of sensors, acts as the data collection unit of the application. In order to widen the usage of our system, it is necessary that the hardware unit is lightweight and occupies little space, so that it fits in different locations throughout a home or office environment. Such portability will facilitate multi-purpose use of the same hardware unit distributed with the application package. Customizability: Different users may choose different combinations of electrical devices to put under continuous monitoring and regulation. The preferences of a specific user may also vary depending on ambient parameters such as the temperature and humidity of the surroundings of her residence. Users should have options to set alert levels for the devices to be monitored. In order to provide all these facilities, the application should be customizable. Energy Efficiency: The need for continuous monitoring keeps the Living Assistant application (hardware and software units) running all the time, so it should consume little electrical power. It is better if the controlling switch and the sensors are powered from the server PC, eliminating the need for an external power source. Security, Authenticity and Integrity of Data: In order to avoid false data, all tiny sensors must be authenticated before their data can be treated as reliable. The security and reliability of the data are highly dependent on the authentication mechanism. There should be integrity among all these data, and by checking this integrity the system will be able to detect any anomalous situation, such as a faulty or malfunctioning device or erroneous information from any source. Universal and Ceaseless Access: The entire system must provide uninterrupted connectivity between the remote PC or handheld device of the users and the home monitoring system.
User-friendly Multimodal Interface and Minimal Interaction: The entire user interface must be simple, self-explanatory, and easy to learn and use. The system, with its focus on e-learning and training, should provide multimodal interfaces which enable the inclusion of a number of human senses. The data should be displayed in a way that requires minimal interaction on the user's part.
4 Functionalities Home Monitoring: Continuously monitoring the status of electrical devices and certain environmental parameters of the home is the basic task of LA. A number of mote sensors are placed in the rooms to collect data regarding temperature, humidity, light, sound, smoke, etc. There is a special-purpose switch which works with a device selection unit to monitor and control the on/off state or operating level of any electrical device. Data from the sensors and the switch are sent to a server connected to the internet. Data Display through Multimodal Interface: An authorized user can log on to the website at any time, from any place, using a PC, laptop, or PDA. The server module of LA graphically displays the status of home appliances/devices, doors, and windows and the values of selected ambient parameters. The system interacts with the user through an interface with iconic representations of appliances/devices, doors, windows, rooms, etc. The multimodal interface is customizable based on the user's preferences. It requires minimal typed input from the user and is easy to learn and use. Ubiquitous Access: One of the major goals of the LA is to ensure universal access to the home monitoring system. It is accessible from any phone, PDA, or computer connected to the internet. There is no constraint on the software or hardware platform of the client. Scheduled Reports and Emergency Alarms: LA can schedule reports on the home status at specified intervals. Besides, fixed thresholds can be set up to check for an emergency situation (according to the gathered data) and generate an alarm. The authorized users can also be notified immediately and can take measures before it is too late.
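To make the threshold-based alarm check described above concrete, the following minimal sketch shows one way such a check could be written. It is illustrative only; the parameter names and threshold values are hypothetical and are not taken from the Living Assistant implementation.

```java
// Illustrative sketch only: a minimal threshold-based alert check of the kind
// described above. Parameter names and threshold values are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class ThresholdAlertSketch {

    // Per-parameter alert thresholds configured by the user (hypothetical values).
    private final Map<String, Double> lowerLimits = new HashMap<>();
    private final Map<String, Double> upperLimits = new HashMap<>();

    public ThresholdAlertSketch() {
        lowerLimits.put("temperature", 15.0);  // degrees Celsius
        upperLimits.put("temperature", 30.0);
        upperLimits.put("smoke", 0.1);         // arbitrary smoke-sensor units
    }

    /** Returns an alert message if the reading violates a configured threshold, else null. */
    public String check(String parameter, double reading) {
        Double low = lowerLimits.get(parameter);
        Double high = upperLimits.get(parameter);
        if (low != null && reading < low) {
            return "ALERT: " + parameter + " below " + low + " (reading " + reading + ")";
        }
        if (high != null && reading > high) {
            return "ALERT: " + parameter + " above " + high + " (reading " + reading + ")";
        }
        return null;  // within the normal range, no alarm
    }

    public static void main(String[] args) {
        ThresholdAlertSketch alerts = new ThresholdAlertSketch();
        System.out.println(alerts.check("temperature", 12.5)); // low-temperature alert
        System.out.println(alerts.check("smoke", 0.02));       // prints null: no alarm
    }
}
```

In a deployed system, a non-null result would be pushed to the authorized users, e.g., as a notification on their phone or PDA.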
5 Living Assistant: Design and Development The major steps of developing our entire application start with creating an interface on the web that mobile devices can access at anytime from any place and to have a TMote sensor-based monitoring system that can control the flow of electricity to home electric appliances. Finally we enable the web interface to send signals to the TMote monitoring system, effectively allowing the user to monitor and control electric appliances at home while they are away. The entire system comprises a hardware unit and a software unit. Major part of the hardware unit is an analog control switch which controls the electronic device connected to it. There is a TMote which works as
wireless data transceiver to transfer data to and from the server. The overall hardware interface resembles the picture in Fig. 1. As shown in this figure, any other electronic device can be attached to the switch, and a specific device can easily be chosen using a device selection unit.
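As an illustration of how the home computer might address a particular appliance through the switch and device selection unit, the sketch below shows one possible command encoding. The field layout (device id, on/off flag, operating level) is a hypothetical example, not the actual Living Assistant wire format.

```java
// Illustrative sketch only: a hypothetical encoding of a device-control command
// sent from the home computer to the TMote-attached switch.
public class DeviceCommandSketch {

    public final int deviceId;   // which outlet the device selection unit should route to
    public final boolean on;     // requested on/off state
    public final int level;      // requested operating level, 0-255 (e.g., a heating setting)

    public DeviceCommandSketch(int deviceId, boolean on, int level) {
        this.deviceId = deviceId;
        this.on = on;
        this.level = level;
    }

    /** Packs the command into three bytes for transmission over the TMote link. */
    public byte[] encode() {
        return new byte[] { (byte) deviceId, (byte) (on ? 1 : 0), (byte) level };
    }

    /** Rebuilds a command from a received three-byte payload. */
    public static DeviceCommandSketch decode(byte[] payload) {
        return new DeviceCommandSketch(payload[0] & 0xFF, payload[1] != 0, payload[2] & 0xFF);
    }

    public static void main(String[] args) {
        // Example: switch device 3 (say, the oven outlet) off.
        byte[] wire = new DeviceCommandSketch(3, false, 0).encode();
        DeviceCommandSketch received = DeviceCommandSketch.decode(wire);
        System.out.println("device=" + received.deviceId + " on=" + received.on);
    }
}
```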
Fig. 1. Hardware Unit
Fig. 2. Software Unit
(Block labels in the two figures include: user input device, i.e., laptop, smart phone, or PDA; web page; web server with a database of users; Internet; home computer running the Living Assistant software; Tmote wireless transceiver; analog switch; lamp; power outlet, 12V AC.)
Fig. 2 presents the architecture of the software unit of LA. The major components are TMote sensors, a home computer, and a web server. The status of a device connected to the analog switch is stored in the home computer. A user accesses this data by logging on to the website through the internet.

Prototype Description: The prototype version of LA is built using Visual Studio 2008, nesC, TinyOS [28], and JDK 1.6. Using HTML with JavaScript for our web architecture ensures that our system functions on most devices and still allows active interaction with the client monitoring and control software. We use switching relays and a single plug interface for our electrical hardware to promote easy and inexpensive prototype development and construction. The web page, accessed by the user through a PDA, smart phone, or computer with an internet connection, is programmed using HTML and JavaScript. The HTML acts as a wrapper to display the content to the user, while the JavaScript is used to pull information from the user (input) as well as give output to the user. The user input is sent to an Apache web server which hosts a database of customers as well as command information, which is relayed to a home computer running the Living Assistant software. The Living Assistant system sends information to a TMote sensor to carry out a user-specified task (a simplified, hypothetical sketch of this command path is given below). The diagram below shows the user interface of the system.

Evaluation: The user experience and opinion of the Living Assistant application have been examined by means of a cognitive walkthrough with people from various age groups. The survey included 24 people from three different age groups and a questionnaire about the features of the application. The questionnaire contained questions about the usability of the prototype and the overall importance of certain concepts related to LA. Fig. 4 exhibits the results of the survey. The category being considered in Fig. 4(a) primarily covers data confidentiality and privacy issues along with the user friendliness and responsiveness of the application.
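Returning to the command path described in the prototype description above (web page, Apache server, home computer, TMote), the following sketch illustrates the home-computer side of that path. The URL, the one-command-per-line response format, and the command syntax are hypothetical placeholders rather than the actual prototype interfaces, and the serial write to the TMote is represented only by a logging stub.

```java
// Illustrative sketch only: the home computer polls the web server for pending
// user commands and forwards them towards the TMote. All endpoint and format
// details here are hypothetical.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CommandRelaySketch {

    // Hypothetical endpoint that returns pending commands, one per line,
    // e.g. "3 OFF 0" meaning: device 3, switch off, level 0.
    private static final String PENDING_COMMANDS_URL = "http://example.org/la/pending-commands";

    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL(PENDING_COMMANDS_URL).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length != 3) {
                    continue; // skip malformed entries
                }
                int deviceId = Integer.parseInt(parts[0]);
                boolean on = parts[1].equalsIgnoreCase("ON");
                int level = Integer.parseInt(parts[2]);
                forwardToMote(deviceId, on, level);
            }
        }
    }

    // Placeholder for the serial/radio write to the TMote; here the command is only logged.
    private static void forwardToMote(int deviceId, boolean on, int level) {
        System.out.printf("forwarding to TMote: device=%d on=%b level=%d%n", deviceId, on, level);
    }
}
```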
Fig. 3. Software Interface of the Living Assistant Application
Fig. 4. Survey data collected on the Living Assistant application (bar charts on a 0.0-5.0 scale, broken down by the age groups 18-35, 35-45, and 45-55; chart (a) covers security, user friendliness, responsiveness, privacy, and connectivity; chart (b) covers data input, navigation, data representation, look and feel, and overall rating)
From the graph, it is evident that a user-friendly interface is the most important factor and that users seem to be less worried about security, especially in the higher age groups. The usability category, as shown in Fig. 4(b), reveals that the prototype requires enhancement in navigation, data representation, and visual style.
6 Usability of LA in Technology-Enhanced Learning In this section we discuss how our proposed application facilitates technology-enhanced learning. Technology-enhanced learning aims at supporting a learner-centered mode by offering several means of engaging the participants (both learners and educators) more actively [22]. In the context of learner-centered orientation and technology support, experiential learning is encouraged, as this style is more strongly self-initiated and self-organized. The experiential learning style can be substantially supported by the use of our proposed application, as it enables the provision of personalized learning objects for a versatile collection of electronic devices. Our application provides various advanced options for computer-mediated communication as well. Features of LA are inclusive in the sense that there are no constraints regarding potential teachers, learners, and organizations. They can be used in and adapted to any educational context. LA interfaces are adaptable and extensible by their very nature. They describe generic interactions that have been abstracted from application-specific
or organization-specific implementations. LA interfaces are easy to use, provided the learning process is sufficiently transparent and empathic towards learners familiar with the basic underlying pedagogical concepts. User involvement is inherent in technology-enhanced learning [22]. During the entire development process of the application, we ensured user involvement by gathering users' perceptions and feedback in relation to our proposed system.
7 Related Works Telemedicine and remote monitoring of patients at home are gaining urgency and importance [3, 4]. In [2], the in-house movements of elderly people are monitored by placing infrared sensors in each room of their homes. While such a method may not be as obviously intrusive as using cameras, it still intrudes on the privacy of the person. Ahamed et al. discuss the challenges of developing Wellness Assistant (WA) [10], software which can be used by people with obesity, diabetes, or high blood pressure, conditions which need constant monitoring. In [13], they provide the details of another application, 'Healthcare Aide'. Similar software called Wellness Monitor (WM), which facilitates continuous follow-up of cancer patients, is presented in [11]. In [3], new possibilities for home care and monitoring using wireless micro sensors are described. Regular patient monitoring using personal area networks of wireless intelligent sensors is reported in [4]. The development of a care support system to monitor the overall health of residents who need constant care has been reported in [5]. The Home Heartbeat [6] is a commercial product developed by Eaton [6] with assistance from MAYA Design [7]. Home Heartbeat uses wireless sensors to determine if windows or doors are open, which devices/appliances are on, if there is water in the basement, and so forth. Gaddam et al. proposed the development of a wireless-sensor-based home monitoring system, especially for elderly people [8]. All these systems have the purpose of monitoring a patient remotely or taking care of elderly people. They do not facilitate universal access, as they target a particular user group, and most of them are not even affordable for ordinary people. The relationship between services, spaces, and users in the context of a smart home is analyzed in [9], where the authors also propose a framework as well as a corresponding algorithm to model their interaction. [14] presents an omnipresent customizable service which can be used by different types of users from different fields, such as education, healthcare, marketing, or business, at any time and at any place. In [15], Mileo et al. describe an intelligent home environment in which modern wireless sensor network technologies allow constant monitoring of a patient in a context-aware setting. Some recent work in relation to ambient assisted living is reported in [16], which refers to electronic environments that are sensitive and responsive to the presence of people and provide assistive propositions for maintaining an independent lifestyle. Similar work presented in [17] aims at producing technological and media support to help elderly people stay in their homes longer; that paper addresses challenges of technological and media innovations concerning the quality of life of elderly people. These works propose advanced interface mechanisms and other techniques to assist lifestyle, but the approaches are too specific to be used in technology-enhanced learning. In [19, 20, 21], the issue of technology-enhanced learning is addressed and several advanced techniques are proposed for enhancing collaborative learning.
8 Conclusion This paper has presented a remote home monitoring application called Living Assistant, intended to facilitate the monitoring and controlling of a set of electronic devices at any time and from any place. Although the application is basically designed to function as a healthcare aide for elderly patients suffering from restricted mobility or other chronic diseases, the software and hardware units are developed to be open, flexible, and customizable for different user groups and adaptable to allow for a variety of electrical devices and many different types of sensors to monitor countless conditions regarding the home ambience. We have built an analog control switch and a software module for interfacing with the system, which enables users to access it universally. A couple of significant research challenges, such as privacy, reliability, and a multimodal user interface capable of covering all human senses, are the focus of our future work. The features of the system proposed in this paper adequately match the ultimate goal of technology-enhanced learning, which is to bring clear benefits and increased value for the end users. Acknowledgements. We thank Salman Gill, Shawn Kasel, Ryan Ozechowski, Kendall Smith and members of Ubicomp Lab, Marquette University, particularly Md. Sazzad Hossain and Ian Obermiller, for their valuable contribution in implementing the prototype application. We also thank the users who participated in our study.
References 1. ZigBee Alliance – Our Mission (2007), http://www.zigbee.org/en/about/faq.asp#4 2. Tapus, A., Mataric, M.J., Scassellati, B.: Socially Assistive Robotics. IEEE Robotics & Automation Magazine, 35–42 (2007) 3. Jovanov, E., Raskovic, D., Price, J., Chapman, J., Moore, A., Krishnamurthy, A.: Patient Monitoring Using Personal Area Networks of Wireless Intelligent Sensors. In: Biomedical Sciences Instrumentation, pp. 373–378 (2001) 4. Dittmar, A., Axisa, F., Delhomme, G., Gehin, C.: New concepts and technologies in home care and ambulatory monitoring. In: Stud. Health Technol. Inform., pp. 9–35 (2004) 5. Maki, H., Yonezawa, Y., Ogawa, H., Hahn, A.W., Caldwell, W.M.: A welfare facility resident care support system. In: Biomedical Sciences Instrumentation, pp. 480–483 (2004) 6. MAYA Design, http://www.maya.com 7. Eaton Corporation, http://www.eaton.com 8. Gaddam, A., Mukhopadhyay, S.C., Gupta, G.S.: Development of a Bed Sensor for an Integrated Digital Home Monitoring System. In: IEEE Workshop on Medical Measurements and Applications, pp. 33–38 (2008) 9. Wu, C.L., Fu, L.C.: A Human-System Interaction Framework and Algorithm for UbiComp-Based Smart Home. In: HSI 2008, Poland (2008) 10. Ahamed, S.I., Haque, M.M., Stamm, K., Khan, A.J.: Wellness Assistant: A Virtual Wellness Assistant using Pervasive Computing. In: SAC 2007, Seoul, Korea (2007) 11. Islam, R., Ahamed, S.I., Talukder, N., Obermiller, I.: Usability of mobile computing technologies to assist cancer patients. In: Holzinger, A. (ed.) USAB 2007. LNCS, vol. 4799, pp. 227–240. Springer, Heidelberg (2007)
12. Ahmed, S., Sharmin, M., Ahamed, S.I.: GETS (Generic, Efficient, Transparent, and Secured) Self-healing Service for Pervasive Computing Applications. International Journal of Network Security 4(3), 271–281 (2007) 13. Ahamed, S.I., Sharmin, M., Ahmed, S., Haque, M.M., Khan, A.J.: Design and Implementation of A Virtual Assistant for Healthcare Professionals Using Pervasive Computing Technologies. Journal Springer e&i 123(4), 112–120 (2006) 14. Ahmed, S., Sharmin, M., Ahamed, S.I.: Ubi-App: A Ubiquitous Application using Ubicomp Assistant (UA) Service of MARKS for Universal Access from Handheld Devices. In: UAIS, pp. 273–283 (2008) 15. Mileo, A., Merico, D., Bisiani, R.: Wireless sensor networks supporting context-aware reasoning in assisted living. In: Proceedings of the 1st international conference on Pervasive Technologies Related to Assistive Environments, Greece (2008) 16. Ruyter, B.D., Pelgrim, E.: Ambient assisted-living research in carelab. In: ACM Interactions (Special issue on Designing for seniors: innovations for graying times) (2007) 17. Fuchsberger, M.G.: Ambient Assisted Living: Elderly People’s Needs and How to Face Them. In: Proceeding of the 1st ACM international workshop on Semantic ambient media experiences, British Columbia, Canada, pp. 21–24 (2008) 18. Holzinger, A.: Universal access to technology-enhanced learning. Springer Universal Access in the Information Society International Journal, 195–197 (2008) 19. Kleinberger, T., Holzinger, A., Müller, P.: Adaptive Multimedia Presentations enabling Universal Access in Technology Enhanced Situational Learning. Springer Universal Access in Information Society International Journal, 223–245 (2008) 20. Ebner, M., Kickmeier-Rust, M., Holzinger, A.: Utilizing Wiki-Systems in higher education classes: a chance for universal access? Springer Universal Access in Information Society International Journal, 199–207 (2008) 21. Holzinger, A., Kickmeier-Rust, M., Albert, D.: Dynamic Media in Computer Science Education; Content Complexity and Learning Performance: Is Less More? Educational Technology & Society, 279–290 (2008) 22. Motschnig-Pitrik, R., Derntl, M.: Three scenarios on enhancing learning by providing universal access. In: UAIS, pp. 247–258 (2008) 23. "mote." Computer Desktop Encyclopedia. Computer Language Company Inc. (2007) 24. Ubiquitous Computing, http://www.ubiq.com/ubicomp/ 25. AT&T Remote Monitor. AT&T (2007), https://www.attrm.com/attrm07_hm_kps.htm 26. Sentilla Pervasive Computing Solutions, http://www.sentilla.com/ 27. TMote datasheet, http://www.sentilla.com/pdf/eol/tmote-sky-datasheet.pdf 28. Velentzas, R., Marsh, A., Min, G.: Wireless Connected Home with Integrated Secure Healthcare Services for Elderly People. In: PETRA 2008, Greece (2008) 29. Vergados, D., Alevizos, A., Caragiozidis, M.: Intelligent Services for Assisting Independent Living of Elderly People at Home. In: PETRA 2008, Greece (2008) 30. Rialle, V., Lamy, J.B., Noury, N., Bajolle, L.: Telemonitoring of patients at home: a software approach. Elsevier Journal on Computer methods and programs in biomedicine, 257– 268 (2003) 31. Lee, R., Chen, H., Lin, C., Chang, K., Chen, J.: Home Telecare System using Cable Television Plants – An Experimental Field Trial. IEEE Transactions on Information Technology in Biomedicine, 37–43 (2000)
An Approach to and Evaluations of Assisted Living Systems Using Ambient Intelligence for Emergency Monitoring and Prevention
Thomas Kleinberger1, Andreas Jedlitschka1, Holger Storf1, Silke Steinbach-Nordmann1, and Stephan Prueckner2
1 Fraunhofer Institute Experimental Software Engineering, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
[email protected], [email protected], [email protected], [email protected]
2 Department of Anaesthesiology and Emergency Medicine, Westpfalz-Klinikum GmbH, Hellmut-Hartert-Strasse 1, 67655 Kaiserslautern, Germany
[email protected]
Abstract. Ambient Assisted Living (AAL) is currently one of the important research and development areas, where software engineering aspects play a significant role. The goal of AAL solutions is to apply ambient intelligence technologies to enable people with specific needs to continue to live in their preferred environments. This paper presents an approach and several evaluations for emergency monitoring applications. Experiments in a laboratory setting were performed to evaluate the accuracy of recognizing Activities of Daily Living (ADL). The results show that it is possible to detect ADLs with an accuracy of 92% on average. Hence, we conclude that it is possible to support elderly people in staying longer in their homes by autonomously detecting emergencies on the basis of ADL recognition. Keywords: Ambient Assisted Living, Emergency Monitoring, Experiments.
1 Introduction In most industrialized countries, demographical, structural, and societal trends tend towards an increase in the number of elderly people and single households, which has dramatic effects on public and private health care, emergency medical services, and the individuals themselves. The increasing average age of the total population, and the resulting rise in chronic diseases, will result in a dramatic increase in emergency situations and missions within the coming years. The practical problems arising in preclinical emergency medical treatment and the central role of context factors, such as social isolation, reduced mental capabilities, and the resulting need for help, were examined in an epidemiological study conducted in the framework of the EC-funded emergency monitoring and prevention project EMERGE [1]. Results show that
approx. 44% of Emergency Medical Services’ (EMS) system resources are dedicated to patients older than 70 years of age. In the long term, the consequence will be higher costs for such services or a decrease in service quality, or both. Assisted living solutions for elderly people using ambient intelligence technology can help to cope with this trend, by providing proactive and situation-aware assistance to sustain the autonomy of the elderly and by helping to limit cost increase, while concurrently providing advantages for the affected people by enhancing their quality of life. The goal is to enable elderly people to live longer in their preferred environments, to enhance the quality of their lives, and to reduce the costs for society and public health systems. Today’s commercially available products for emergency monitoring already use a broad range of modern technology, such as Personal Emergency Response Systems (PERS) (e.g., necklaces with emergency buttons, fall sensors in mobile phones with wireless notification of emergency services, vital data sensor tapes, etc.). However, they are mostly closed, stand-alone systems with a limited ability to describe the actual situation, often just too difficult for the elderly people to handle, and useless in emergencies. During our study in EMERGE, we observed that only 3% of the affected people in emergency situations over 65 years have a PERS at hand and, more importantly, the PERS was used only in 1.3% of the cases to report the incident. Reasons for this may include forgetting to wear the PERS or being unable to use it at all after an incident has happened. Still, assisted living systems promise a huge potential for disabled and elderly people. In this paper, we present an approach as well as evaluation results in a near real-life environment (Ambient Assisted Living Apartment) for emergency monitoring applications and corresponding user interaction components resulting from two research projects: EMERGE and BelAmI [2]. The impact of the developments in both projects has been measured from several perspectives: • the end users' point of view (elderly persons and their relatives), • the professional point of view (physicians/medical caregivers), • the research point of view (software and system engineers). An overview of currently available assistive technology in elderly care and currently available products can be found in Miskelly [3], in a recent study on European Remote Patient Monitoring Markets by Frost & Sullivan [4], and in a study for the European Commission on ICT for independent living for elderly people [5]. Dishman [6] provides an introduction to indoor electronic home care systems for elderly people and their potential for aging societies. While the traditional geriatric approach usually consists of manually acquiring ADLs [7] via care personnel or requesting it from the individuals through questionnaires and use it to assess appropriate assistance/care programs for the respective persons [8] [9] [10], new approaches aim at automatic emergency detection/notification based on Automatic Behavioral Monitoring Systems (ABMS) [11]. ABMS monitor the environment and not the individuals themselves. They combine technical features of Smart Homes (with automation technology) and telemedicine (especially physiological process telemetry) [12] [13]. 
However, the most prominent state-of-the-art products available nowadays are mobile phones for elderly people that combine emergency call functions and user location tracking (e.g., Vitaphone [14]), call systems (e.g., Hausnotruf of Deutsches Rotes Kreuz [15]), or
general activity monitoring devices (e.g., IST Vivago [16] or Sophia [17]). What is common to these currently commercially available solutions is that they fit into the affordable cost budget of an individual (typically with monthly fees below 50 € for equipment and services). Another commonality is that none of them makes use of behavior-based monitoring of ADLs. The paper is structured as follows: The approach taken in EMERGE for recognizing ADLs and specific situations is briefly described in section 2. Afterwards, three evaluations for detecting ADLs are described and the results are explained in section 3. Section 4 then provides an interpretation of our initial evaluation results and threats to validity. Finally, we draw conclusions in section 5.
2 Description of the EMERGE System The main goal of EMERGE is to support elderly people with emergency monitoring and prevention by using ambient, unobtrusive sensors and reasoning about arising emergency situations. To reach this goal, we aim to: • Track the daily routines of elderly persons to explore behavior and activity patterns, and to detect deviations as early indicators for emergency situations. • Anticipate emergency situations and try to prevent them by providing early assistance to the elderly person. • Proactively assist in emergency cases by providing helpful instructions, communication means, and information about the current state of the patient. The approach is to reason on unobtrusive and non-invasive sensor data. The solution is a combination of sensors, software components, IT platforms, and expert systems for situation recognition and decisions about the appropriate assistance [18]. An integrated approach for the assessment of the overall functional health status of a person needs to be taken. In the Human Capability Model (HCM), a functional description of several vital, mental, and psycho-social parameters over time is given [19]. Dependent on the reduced array of information that is available from a sensor environment, the process of fusing sensor data into a meaningful set of information will therefore often not result in a complete medical diagnosis, but will rather describe a situation as being critical or abnormal [20]. It was a key issue for the HCM to define a “normality model” for all these parameters and situations for each specific user. The model also defines end user communication and interaction facilities to allow the assessment of the situations severity and the adequate response. It focuses on two main areas for describing the functional health status: ADLs and vital parameters. ADLs are one part of the functional health status and can be understood as single activities or a series of acts. Besides others, they are influenced by habits, by the time of the year, the day of the week, by culture, and by age. Activities like sleeping, going to the toilet, preparing meals, and general mobility have shown to be sensitive to indicating changes in the behavior [21] with relation to potentially arising emergencies (cf. fig. 1). ADLs follow certain individual stereotypes and the recognition of deviations or inconsistencies should allow drawing conclusions about a person’s health status. Basic vital data concerning circulation, respiration, and consciousness, which can be further broken down into single parameters, such as pulse, heart rate, respiratory
rate, oxygen saturation, blood pressure, etc. are the focus of telemedical applications. Up to now, this has usually been related to a more or less obtrusive use of sensors, power supply and wiring on the body. We limit vital data monitoring to the use of a wrist device, which provides continuous pulse measurement and discontinuous measurement of arterial blood pressure and body weight by stand-alone devices.

Fig. 1. Activities and situations (figure labels include short-term emergency situations such as fall, motionlessness, critical vital parameters, and critical deviation ascending/descending; long-term Activities of Daily Living such as personal hygiene, toilet usage, preparing meals, sleep, and general mobility; medical knowledge (HCM); and vital parameters)
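As a purely illustrative sketch of the behavior-deviation idea described in this section (this is not the EMERGE normality model or the HCM), one simple approach is to compare today's count of a detected ADL against the person's own recent baseline and to flag large deviations:

```java
// Illustrative sketch only, not the EMERGE normality model: flag a behavior
// deviation when today's count of an ADL (e.g., toilet usage) lies far from the
// person's own baseline established over the preceding days.
import java.util.List;

public class DeviationSketch {

    /** Returns true if todayCount deviates from the baseline by more than k spread units. */
    public static boolean isDeviation(List<Integer> baselineCounts, int todayCount, double k) {
        double mean = baselineCounts.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        double variance = baselineCounts.stream()
                .mapToDouble(c -> (c - mean) * (c - mean))
                .average().orElse(0.0);
        double stdDev = Math.sqrt(variance);
        // Guard against a near-zero spread in very regular baselines (hypothetical floor of 1).
        double spread = Math.max(stdDev, 1.0);
        return Math.abs(todayCount - mean) > k * spread;
    }

    public static void main(String[] args) {
        // Hypothetical two-week baseline of daily toilet-usage detections.
        List<Integer> baseline = List.of(6, 7, 5, 6, 7, 6, 5, 6, 7, 6, 6, 5, 7, 6);
        System.out.println(isDeviation(baseline, 1, 2.0));  // true: far below the usual range
        System.out.println(isDeviation(baseline, 6, 2.0));  // false: within the usual range
    }
}
```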
3 Evaluation of the ADL Detection For planning the evaluations, we used the well-established GQM approach (Goal-Question-Metric) from the software engineering domain [22] and applied it to Assisted Living solutions. In order to consider the stakeholder interests adequately, all stakeholders were already integrated into the requirements analysis and development process for the prototypes [1]. They participated in the requirements elicitation, the validation of the underlying theoretical models for the application, as well as in the tests and evaluations. The evaluation program aims at: 1. proving the validity of our application model, 2. enabling the developers to decide on how to optimize the solution, and 3. enabling end users/stakeholders to draw conclusions about the benefits (as well as the limitations) of adopting the solution in real life. 3.1 Hypotheses and Variables The evaluation goal according to the GQM goal template was to analyze the Activity Recognition Component (ARC) for the purpose of evaluation with respect to its accuracy from the viewpoint of the researcher in the context of a controlled experiment in the assisted living apartment at the Fraunhofer IESE. For this purpose, we defined three hypotheses for the ADLs toilet usage, personal hygiene, and preparation of meals: • H1: The accuracy of the ARC regarding the ADL toilet usage is higher than 80 % of all the performed scenarios. • H2: The accuracy of the ARC regarding the ADL personal hygiene is higher than 80 % of all the performed scenarios. • H3: The accuracy of the ARC regarding the ADL preparation of meals is higher than 80 % of all the performed scenarios.
To fix the expectations for the hypotheses, we held discussions with experts from medicine and geriatrics about what level of accuracy would be necessary to still be able to draw conclusions about long-term behavior deviations. The ARC analyzes the continuous sensor data stream on the fly for characteristic patterns of ADLs. The three mentioned ADLs are the core activities for an ongoing long-term behavior monitoring. The accuracy of detecting these ADLs is therefore a very important indicator for proving the correct functionality of the ARC and a good quality parameter for the subsequent long-term behavior deviation detection. The independent variable in our experiment is the ARC under evaluation, i.e., the rules implementing the activity model. The dependent variable is the accuracy of the ARC regarding the detection of the situation prescribed in a given scenario. The accuracy will be calculated by using the following formula:

accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)

i.e., the share of scenarios that the ARC classifies correctly among all performed scenarios.
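As a small, self-contained illustration of this measure (not part of the EMERGE tooling), the accuracy can be computed directly from the four cells of a 2x2 confusion matrix:

```java
// Illustrative helper: computes the accuracy measure defined above from the
// four cells of a 2x2 confusion matrix.
public class AccuracySketch {

    /** accuracy = (TP + TN) / (TP + TN + FP + FN) */
    public static double accuracy(int truePos, int trueNeg, int falsePos, int falseNeg) {
        int total = truePos + trueNeg + falsePos + falseNeg;
        return (double) (truePos + trueNeg) / total;
    }

    public static void main(String[] args) {
        // Example counts only; the study's own figures appear later in Table 2.
        System.out.println(accuracy(50, 48, 2, 0)); // 0.98
    }
}
```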
3.2 Samples, Materials, and Environment In a pre-test with an elderly person, we did not recognize any significant differences compared to a younger person in performing the evaluation task from a technological point of view. However, there might be time differences, which are not in the focus of this evaluation. Hence, for the recognition of ADLs, it does not matter whether the acting persons correspond to the addressed group. For playing out the defined scenarios, unbiased test persons were chosen on a voluntary basis. Their age was in the midtwenties. To define realistic scenarios, we followed a commonsense approach that was verified via a pilot study and an expert workshop. In the pilot study, a test person representing a part of the end user group (elderly woman, 73 years old, living independently in her apartment) performed the scenarios. Her activities were recorded and documented in a log. The results of the pilot study were compared with the initial scenarios. In a follow-up expert workshop, medical experts and social scientists defined alternative variations of the scenarios. To avoid an inherent bias, the scenarios were defined by experts who did not know the activity model. Scenario sets included positive and negative scenarios (e.g., toilet usage / no toilet usage). Some of the negative scenarios were intentionally very similar to positive scenarios. To assure statistical significance, we aimed to obtain a set of more than 50 scenarios. For that purpose, the positive and negative scenarios were multiplied without affecting the numerical proportion. Table 1 illustrates the final number of scenarios. The set of scenarios was used to create scenario cards containing short scripts. These scripts describe the respective ADL alternative via step-by-step sub-activities. The scenario cards were applied during the experiment to assist the test persons performing the ADL. In the assisted living apartment, a wide range of sensors are used for detecting situations of interest. Figures 2 and 3 illustrate the individual sensor setup for the evaluation of the ADL detection.
Table 1. Overview of the scenario setting

ADL                     Number of scenarios   Number of pos. scenarios   Number of neg. scenarios   Number of test persons
Toilet Usage            100                   50                         50                         6
Personal Hygiene        54                    33                         21                         5
Preparation of Meals    66                    45                         21                         4

Fig. 2. Setup for evaluation of ADL preparation of meals
Fig. 3. Setup for evaluation of ADL personal hygiene
3.4 Experimental Design and Procedure A controlled experimental design was chosen. The scenarios were to be performed by the test person(s) and the results, given by the ARC (experimental), were to be compared to the real facts as written on the scenario card (control). The test persons received an introduction to the environment, i.e., they were told where they could find certain things that they would need to perform a specific task. All relevant sensors located in the areas of the evaluation were active together with their software components. The software system with services for collecting the sensor information and the ARC itself with the timeline GUI were started. To avoid faulty sensor information that might impact the recognition result, no person other than the test person was allowed to stay inside the apartment during the test phase. The test person randomly picked one scenario card with the description of the scenario from a bowl. Then the test person performed the selected scenario by executing the steps written on the scenario card. Afterwards the scenario ID and the current time were noted and the card was removed from the scenario pool. The sensor data was collected during the execution of the scenarios and stored by a recorder integrated into the software. In addition, we filmed the actors with a video camera while they performed the scenarios. This enabled us to assess the quality of the collected data, especially in “problematic” cases. The scenario ID, the result delivered by the ARC (ADL detected/not detected), and the time (start-time, end-time, duration) were documented in an Excel sheet. 3.5 Analysis After performing and documenting the scenarios as described, the Excel sheet was extended with the type of the respective (real) scenario and the result (cf. figure 4).
Fig. 4. Extract of result-sheet
The results of all scenarios were transformed into confusion matrices. The accuracy was calculated using the formula in Section 3.1. Subsequently an χ2-test was used for each ADL for verifying the independence of the classifier (p-level: 0.05).

Table 2. Evaluation results (confusion matrix) of ADLs

ARC result \ Scenario     Toilet Usage       No Toilet Usage          n = 100, χ2 test: p = 0.00
Toilet Usage              50                 2
No Toilet Usage           0                  48

ARC result \ Scenario     Personal Hygiene   No Personal Hygiene      n = 54, χ2 test: p = 0.00
Personal Hygiene          25                 1
No Personal Hygiene       8                  20

ARC result \ Scenario     Preparation of Meals   No Preparation of Meals   n = 66, χ2 test: p = 0.00
Preparation of Meal       43                     2
No Preparation of Meal    2                      19
The results for the ADL toilet usage are shown in Table 2. We got 50 true positives (i.e., the ARC detected toilet usage and the scenario was toilet usage, too), 48 true negatives (i.e., the ARC detected no toilet usage and the scenario was no toilet usage, too), and two false positives (i.e., the ARC detected toilet usage but the scenario was not toilet usage). The formula for accuracy yields 98/100 = 98%. Interpreting the result of the χ2 test and the achieved accuracy, we accepted hypothesis H1. The results for the ADL personal hygiene are shown in Table 2. We got 25 true positives, 20 true negatives, one false positive, and eight false negatives. The formula for accuracy yields 45/54 = 83.3%. Interpreting the result of the χ2 test and the achieved accuracy, we accepted hypothesis H2. The results for the ADL preparation of meals are shown in Table 2. We got 43 true positives, 19 true negatives, two false positives, and two false negatives. The formula for accuracy yields 62/66 = 94%. Interpreting the result of the χ2 test and the achieved accuracy, we accepted hypothesis H3.
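For reference, the χ2 statistic underlying the p-values in Table 2 can be reproduced from a 2x2 confusion matrix as in the standalone sketch below; this is only an illustration, not the analysis tooling used in the study. With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05.

```java
// Standalone illustration: Pearson's chi-square test of independence for a 2x2
// confusion matrix, as used to check that the ARC output is not independent of
// the true scenario type.
public class ChiSquareSketch {

    /** Returns the chi-square statistic for the 2x2 table {{a, b}, {c, d}}. */
    public static double chiSquare(double a, double b, double c, double d) {
        double n = a + b + c + d;
        double[][] observed = { { a, b }, { c, d } };
        double[] rowSums = { a + b, c + d };
        double[] colSums = { a + c, b + d };
        double chi = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double expected = rowSums[i] * colSums[j] / n;
                double diff = observed[i][j] - expected;
                chi += diff * diff / expected;
            }
        }
        return chi;
    }

    public static void main(String[] args) {
        // Toilet-usage confusion matrix from Table 2: rows = ARC result, columns = scenario.
        double chi = chiSquare(50, 2, 0, 48);
        // With 1 degree of freedom, values above 3.84 correspond to p < 0.05.
        System.out.printf("chi-square = %.1f (critical value at p = 0.05: 3.84)%n", chi);
    }
}
```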
4 Interpretation The analysis confirmed our hypotheses. The achieved results were much better than formulated in our hypotheses. Hence, it was shown that the environment of the assisted living apartment together with the ARC enables accurate detection of ADLs.
However, some problems were discovered, which are shown in Table 3. The problematic tasks, i.e., false positive and false negative scenarios, are listed. The individual cause of failure could be established in retrospect by analyzing the video data and logged sensor events. The causes are presented in the last column of the table. Especially for personal hygiene, the accuracy could have been improved by increasing the common understanding of the definition of the ADLs. The people who defined the scenarios included washing their hands as part of the ADL personal hygiene, whereas the people who developed the activity model did not consider washing hands as part of this ADL.

Table 3. Problematic tasks

#   Scenario Type             Content                Cause of Failure
2   No Toilet Usage           Clean the toilet       Sensor sequence similar to positive scenario
3   Personal Hygiene          Shave                  Malfunction of electric power sensor
3   Personal Hygiene          Wash hands             Not sufficient for Personal Hygiene from modeling point of view
1   Personal Hygiene          Brush teeth            Malfunction of sensor
1   Personal Hygiene          Have a shower          Wrong performance by test person ("too fast")
1   No Personal Hygiene       Take cleaning cloth    Wrong performance by test person ("searching")
2   No Preparation of Meals   Tidy up the kitchen    Sensor sequence similar to positive scenario
2   Preparation of Meals      Prepare frozen pizza   Performing without using dishes or cutlery
An Approach to and Evaluations of Assisted Living Systems
207
individually tailored to the participants’ behavior and the apartment reflects a real habitat of an elderly person. Conclusion validity: Reliability of measures and reliability of data processing: The measurement was conducted by the ARC. It worked identically for all participants and scenarios. Quality of data: The quality of the measured data can be considered as high, because the observations were recorded on video and checked against the correct performance of the scenarios. Statistical instruments were used to calculate the significance of the hypothesis.
5 Conclusions We presented initial evaluation results from two research projects, EMERGE and BelAmI. In detail, we used an approach to recognize the ADL personal hygiene, toilet usage, and preparation of meals by analyzing characteristic patterns in a continuous sensor data stream. The approach was evaluated empirically by means of experiments in a near real-life environment of an assisted living apartment. The results from the experiments were quite positive. The interpretation of our evaluation results proved that it is possible to measure ADLs accurately enough for the purpose of detecting behavior deviations. It can be concluded that we found a setup of a technological solution and controlled experiments in a near real-life environment that allows measuring ADLs unobtrusively with totally ambient sensors and nearly no restrictions on the normal behavior. To reach this objective, it turned out to be very useful to include all stakeholders very early in the requirements analysis and development process for the prototypes and especially in the setup of the experiments. This will now enable us to a) progress towards measuring long-term behavior deviations by putting the HCM into operation and perform experiments on long-term behavior deviation recognition and b) start real-life field trials and evaluations from the professional point of view and the end users’ point of view. Acknowledgements. This work was carried out in the projects EMERGE [1] and BelAmI [2].
References 1. EMERGE: Emergency Monitoring and Prevention, Specific Targeted Research Project (STREP) of the EC, Project No. 045056, http://www.emerge-project.eu 2. BelAmI: Bilateral German-Hungarian Collaboration on Ambient Intelligence Systems, funded by the German Federal Ministry of Education and Research (BMBF), the Fraunhofer-Gesellschaft, and the Ministry of Science, Education, Research, and Culture of Rhineland-Palatinate (MWWFK), http://www.belami-project.org 3. Miskelly, F.G.: Assistive technology in elderly care. Age and Ageing 30(6), 455–458 (2001), http://ageing.oxfordjournals.org/cgi/reprint/30/6/455 4. Frost & Sullivan: European Remote Patient Monitoring Markets, B519–B556 (2005)
5. ICT for independent living for elderly. A study on ICT enabled independent living for elderly for the European Commission created by VDI/VDE Innovation + Technik GmbH, Berlin, Germany, http://www.ict-ile.eu/db/products 6. Dishman, E.: Inventing Wellness Systems for Aging in Place. IEEE Computer 37(5), 34–41 (2004) 7. Katz, S., Ford, A.B., Moskowitz, R.W., Jackson, B.A., Jaffe, M.W.: Studies of Illness in the Aged: The Index of ADL: A Standardized Measure of Biological and Psychosocial Function. Journal of the American Medical Association 185(12), 914–919 (1963) 8. Lawton, M.P.: Scales to Measure Competence in Everyday Activities. Psychoparmacological Bulletin 24, 609–710 (1988) 9. Kane, R.A., Kane, R.L.: Assessing the Elderly – A Practical Guide to Measurement, pp. 25–67. Lexington Books, Lexington (1981) 10. Fillenbaum, G.: Multidimensional Functional Assessment – Overview. In: Mezey, M.D. (ed.) The Encyclopedia of Elder Care, pp. 438–440. Springer, New York (2001) 11. Glascock, A.P., Kutzik, D.M.: The Impact of Behavioral Monitoring Technology on the Provision of Health Care in the Home. Journal of Universal Computer Science 12(1), 59– 79 (2006) 12. Kutzik, D.M., Glascock, A.P.: Monitoring Household Behavior to enhance Safety and Well-Being. In: Burdick, D., Kwon, S. (eds.) Gerontechnology – Research and Practice in Technology and Aging, pp. 132–144. Springer, New York (2004) 13. Glascock, A.P., Kutzik, D.M.: Automated Behavioral Monitoring. In: Keijer, U., Sandstrom, G. (eds.) Smart Homes and User Values, Eastern Mediterranean University, Faculty of Architecture, Gazimagusa, Mersin, Turkey, pp. 83–106 (2007) 14. Vitaphone, http://www.vitaphone.de 15. DRK Hausnotruf, http://www.drk-hausnotruf.net 16. IST Vivago, http://www.istsec.fi 17. Sophia, http://www.sophia-nrw.de 18. Storf, H., Becker, M.: A Multi-Agent-based Activity Recognition Approach for Ambient Assisted Living. In: 3rd Workshop Artificial Intelligence Techniques for Ambient Intelligence, European Conference on Artificial Intelligence, Patras, Greece (2008) 19. Nehmer, J., Karshmer, A., Becker, M., Lamm, R.: Living Assistance Systems: An Ambient Approach. In: Osterweil, L.J. (ed.) Proc. 28th Int. Conf Software Engineering, Shanghai, pp. 43–50. ACM Press, New York (2006) 20. Garsden, H., Basilakis, J., Celler, B.G., Huynh, K.: A home health monitoring system including intelligent reporting and alerts. In: IEEE Conf. Engineering in Medicine and Biology Society 2004, vol. (5), pp. 3151–3154 (2004) 21. Celler, B.G., et al.: An instrumentation system for the remote monitoring of changes in functional health status of the elderly at home. In: Proc. of the 16th Ann. Int. Conf. of IEEE Eng. Med. and Biol. Soc., vol. 2, pp. 908–909 (1994) 22. Basili, V.R., Caldiera, G., Rombach, H.D.: Goal Question Metric Paradigm. In: Marciniak, J.J. (ed.) Encyclopedia of Software Engineering, vol. 1, pp. 528–532. John Wiley & Sons, Chichester (2001)
Anamorphosis Projection by Ubiquitous Display in Intelligent Space
Jeong-Eom Lee1, Satoshi Miyashita2, Kousuke Azuma2, Joo-Ho Lee2, and Gwi-Tae Park1
1 School of Electrical Engineering, Korea University, Seoul, Korea
[email protected]
2 Department of Information and Communication Science, Ritsumeikan University, Kusatsu, Japan
[email protected]
Abstract. This paper describes a projection-based information display system that allows a user to see three-dimensional (3D) objects with the naked eye. The proposed system is based on Intelligent Space and a projector-mounted mobile robot called Ubiquitous Display. Humans can perceive 3D structure from 2D retinal images by using diverse cues. By exploiting psychological cues, the proposed system lets a user perceive a 3D object by viewing an oblique anamorphosis projected onto the surface the user is facing. In this way, the user receives information more realistically and experiences augmented reality, in which real and virtual objects coexist, without optical devices such as glasses or a head-mounted display. Moreover, by using Ubiquitous Display in Intelligent Space, a truly human-centered active information display becomes feasible. Keywords: Anamorphosis Projection, Active Information Display, Ubiquitous Display, Intelligent Space.
1 Introduction
The main goal of our research is to create a new active information display system that achieves a human-centered method of information transfer. Usually, people must approach information sources located in the living environment, such as bulletin boards, artificial signs, and local maps. These are the most common ways of conveying information, but they have several problems. First, a person must move to a certain place to obtain information. Second, there is no guarantee of acquiring the necessary information. Third, renewing the information takes a long time. These problems are caused by "passive information display", in which a user approaches the information. Conversely, if information approaches the user, the above-mentioned problems can be solved. The most familiar method that avoids these problems is a mobile-device-based information display, in which a user carries a cellular phone, a portable digital assistant (PDA), or a head-mounted display (HMD) [1]. However, this method also has
several problems: a user must always carry a mobile device; the small screen of a cellular phone or PDA cannot display enough information at once; and an HMD may stress users because of its weight, dizziness, blocking of the user's sight, and so on. Another method is to use a steerable projector [2, 3]. Such a system can display information at various locations by means of a pan-tilt mechanism or a mirror. Since the projector can display varied content, renewing information is no longer a time-consuming job. However, even with a pan-tilt actuator the display area is limited, and very precise calibration is needed. To cope with these problems, we proposed a new active information display method based on Intelligent Space and a projector-mounted mobile robot [4]. The proposed system provides the user with relevant information by projecting it onto the surface the user is facing, so the user does not need to move to seek information. As the system provides information according to the user's requests, the user can obtain the necessary information, and renewing information is very easy. Moreover, the proposed system lets a user perceive a three-dimensional object by viewing an image projected onto the surface the user is facing. In this way, the user receives information more realistically and experiences augmented reality, in which real and virtual objects coexist, without optical devices such as glasses or a head-mounted display. The human visual system interprets depth in two-dimensional retinal images using both physiological and psychological cues. There are a variety of depth cues, but they can generally be categorized into three classes: oculomotor, monocular, and binocular [5]. As the binocular disparity between the two eyes is a very important cue for the perception of 3D objects, most 3D displays are based on it; the main consideration is then how a system can provide a different image for each eye [6]. In this paper, however, our approach is based on psychological cues, all of which are monocular. The proposed system alters the perspective representation of a 3D object according to the user's viewpoint. By projecting a computer-generated image constructed with oblique anamorphosis techniques, a user can see a 3D object superimposed on the real world with the naked eye. Section 2 of this paper provides more details of the proposed system. Section 3 describes a 3D object display method based on oblique anamorphosis techniques. Experimental results are described in Section 4. Finally, we summarize our results and discuss the performance of the proposed system in Section 5.
2 Human Centered Active Information Display
The proposed active information display system is based on Intelligent Space and Ubiquitous Display. Ubiquitous Display is a projector-mounted mobile robot with a pan-tilt mechanism, referred to in this paper simply as UD-1. As shown in Fig. 1, Intelligent Space recognizes the location and facing angle of a human with its sensors. The information the human wants is also recognized by Intelligent Space. Based on the gathered data, Intelligent Space decides which information to present.
Fig. 1. Human, Ubiquitous Display and Intelligent Space
The location of the projector-mounted robot and the angles of the mounted projector are then decided based on where to display the information. With this combined system, an active information display is achieved and all of the above problems are overcome.
2.1 Intelligent Space
This section introduces Intelligent Space, the system to which the proposed system is most closely related. Intelligent Spaces are rooms or areas equipped with sensors, which enable the space to perceive and understand what is happening in them. In Intelligent Space, a human is observed by distributed sensors connected via a network, called Distributed Intelligent Network Devices (DINDs) [7]. By installing DINDs throughout the space, the space can easily recognize objects inside it and can exchange and share information by mutual communication through the network. The main purpose of Intelligent Space is the realization of human-centered systems. As stated above, Intelligent Space recognizes the location and facing angle of a human, as well as the information the human wants. In previous work [8], we set up the Intelligent Space shown in Fig. 2 and proposed a method for locating a human by using multiple cameras in Intelligent Space. We also developed a 3D head pose estimation algorithm that generates a 3D face shape from 2D face shapes and estimates head pose by fitting a 2D shape to a 3D shape based on Active Appearance Models (AAMs). Furthermore, we developed a speech recognition module using an open-source real-time large-vocabulary recognition engine (Julius) and a directional microphone with a pan-tilt actuator. In collaboration with other DINDs in Intelligent Space, the microphone can be turned toward a human face using the position and head pose information of the person.
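As an illustration of the geometric step involved in turning such position and head-pose estimates into actuator commands, the following minimal Python sketch computes pan and tilt angles that point a pan-tilt device toward a tracked face. The coordinate convention, function names and numerical values are illustrative assumptions, not part of the system described above.

```python
import math

def pan_tilt_towards(device_xyz, face_xyz):
    """Compute pan/tilt angles (radians) that point a pan-tilt device
    located at device_xyz toward a target face at face_xyz.
    Assumes a right-handed frame with z pointing up; pan is measured
    in the x-y plane, tilt is the elevation above that plane."""
    dx = face_xyz[0] - device_xyz[0]
    dy = face_xyz[1] - device_xyz[1]
    dz = face_xyz[2] - device_xyz[2]
    pan = math.atan2(dy, dx)                   # rotation about the vertical axis
    tilt = math.atan2(dz, math.hypot(dx, dy))  # elevation toward the face
    return pan, tilt

# Example: microphone DIND at (0, 0, 1.2) m, tracked face at (2.0, 1.0, 1.6) m
pan, tilt = pan_tilt_towards((0.0, 0.0, 1.2), (2.0, 1.0, 1.6))
print(math.degrees(pan), math.degrees(tilt))
```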
Fig. 2. A set-up of Intelligent Space
2.2 Ubiquitous Display (UD-1)
As shown in Fig. 3, UD-1 is composed of five parts: a projector, a pan-tilt mechanism, a power supply, a mobile robot, and a computer. Compared with a fixed pan-tilt projector, UD-1 has the following features.
• Ubiquitous Display Feature. The limitation of the information display area disappears, and information can be displayed in various places.
• Seamless Display Feature. Information can be displayed continuously over a wide area.
• Adaptable Display Feature. Since Intelligent Space monitors the human, UD-1, and other obstacles, the system can cope with varying situations.
• Interactive Display Feature. Since Intelligent Space works as a smart interface, information can be provided according to the state of the user.
Fig. 3. A Ubiquitous Display (UD-1)
3 3D Object Display Based on Oblique Anamorphosis
Fig. 4 shows the procedure for 3D object display based on anamorphosis. Through this process, a user can see a 3D object superimposed on the real world without optical devices such as glasses or a head-mounted display.
Fig. 4. 3D object display using anamorphosis
Intelligent Space visualizes information, whether in the form of text or graphics, and UD-1 transfers that information by projecting the images visualized by Intelligent Space. Therefore, Intelligent Space has to take into account the geometrical relations between the projector, the projection surface and the user in order to display undistorted images; the "2D fixed shape image" step in Fig. 4 serves this purpose.
3.1 Anamorphosis
An anamorphosis is a distorted projection or perspective, especially an image distorted in such a way that it appears in its true shape when viewed from a certain point [9, 10]. It has been studied by artists and architects since the early fifteenth century. There are two main types of anamorphosis: oblique (perspective) and catoptric (mirror). Oblique anamorphosis is the type used in this paper to create depth effects. Like the transform equations described in [9], an anamorphosis is based on precise mathematical and physical rules for constructing two-dimensional representations of three-dimensional objects. However, the anamorphic images used in this paper are made according to the simple rules depicted in Fig. 5.
Fig. 5. Simple rules for drawing anamorphic images
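To make the idea of these construction rules concrete, here is a minimal Python sketch of the basic oblique-anamorphosis operation: each point of a virtual 3D object is projected from the viewer's eye onto the floor plane, so that the flattened drawing appears as the original object when seen from that eye point. The eye position, object points and the choice of the floor plane z = 0 are illustrative assumptions, not the exact rules of Fig. 5.

```python
import numpy as np

def anamorphose_to_floor(eye, points):
    """Project 3D points onto the floor plane z = 0 along rays from the eye.
    Seen from `eye`, the projected (anamorphic) points line up with the
    original 3D points, producing the illusion of a standing object.
    Assumes the object points lie below eye level."""
    eye = np.asarray(eye, dtype=float)
    out = []
    for p in np.asarray(points, dtype=float):
        t = eye[2] / (eye[2] - p[2])       # ray parameter where z becomes 0
        out.append(eye + t * (p - eye))    # intersection with the floor
    return np.array(out)[:, :2]            # (x, y) coordinates on the floor

# Example: a viewer's eye 1.6 m high, looking at a 0.5 m vertical edge standing on the floor
eye = (0.0, 0.0, 1.6)
edge = [(2.0, 0.0, 0.0), (2.0, 0.0, 0.5)]
print(anamorphose_to_floor(eye, edge))
```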
or floor and projector. Let P is input image of the projector of UD-1 and let R is a projected image on a surface then their relation is as follows.
(H
R ≅ H PR P
PR
(1)
: Homography matrix )
By utilizing ( H PR ) , it is transformable between an input image and a projected −1
image. To cope with movement of UD-1 as well as rotation of the projector, we extended a virtual surface method presented in [11]. Usually 6 DOFs are considered enough to express a posture of an object in environment. However in case of UD-1, to project an image on a surface, there are 11 DOFs. Followings are the 11 DOFs.
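To make the homography relation of Eq. (1) concrete, the following Python/NumPy sketch estimates H_{PR} from four point correspondences using the standard direct linear transform and then uses its inverse to map a surface point back to projector coordinates. The correspondences are invented values for illustration; this is not the calibration procedure of the actual system.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (4+ point pairs, DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pt):
    """Apply a homography to a 2D point using homogeneous coordinates."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]

# Illustrative correspondences: projector image corners -> observed corners on the surface
proj = [(0, 0), (640, 0), (640, 480), (0, 480)]
surf = [(12, 40), (600, 18), (655, 470), (5, 500)]
H_pr = homography_from_points(proj, surf)   # R ≅ H_PR * P
H_rp = np.linalg.inv(H_pr)                  # (H_PR)^(-1): surface -> projector
print(apply_h(H_rp, surf[2]))               # ≈ (640, 480)
```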
To cope with movement of UD-1 as well as rotation of the projector, we extended the virtual surface method presented in [11]. Usually 6 DOFs are considered enough to express the posture of an object in the environment; in the case of UD-1, however, projecting an image onto a surface involves 11 DOFs:
• Posture of UD-1: (x_{robot}, y_{robot}, φ_{robot})
• Pan-tilt mechanism: (θ_{pan}, θ_{tilt})
• Image: (x, y, z, φ_{roll}, φ_{pitch}, φ_{yaw})
These parameters are optimized by the evaluation function J in (2):

J = ∏_i W_i R_i    (2)

where W_i = {w_0, …, w_n} are weights and R_i = {r_0, …, r_n} are rules. Examples of rules are: preventing deformation of the image, setting the projection distance, and preventing occlusion of the projection. More details are described in [12].
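As a rough illustration of how such a product of weighted rules could be evaluated and searched, the following Python sketch scores candidate projection parameters with a few toy rules and keeps the best result of a naive random search. The rule functions, parameter ranges and search strategy are assumptions made for this example and are not the optimization described in [12].

```python
import random

def evaluate_J(params, rules, weights):
    """Product of weighted rule scores, J = prod_i w_i * r_i(params).
    Each rule returns a score in (0, 1]; higher J means a better projection pose."""
    J = 1.0
    for rule, w in zip(rules, weights):
        J *= w * rule(params)
    return J

# Illustrative rules over a parameter dict (pan/tilt, projection distance, occlusion flag)
rules = [
    lambda p: 1.0 / (1.0 + abs(p["tilt"])),                # prefer small tilt -> less deformation
    lambda p: 1.0 if 1.0 < p["distance"] < 3.0 else 0.1,   # keep a sensible projection distance
    lambda p: 0.1 if p["occluded"] else 1.0,               # avoid occlusion of the projection
]
weights = [1.0, 1.0, 2.0]

best, best_J = None, -1.0
for _ in range(1000):                                       # naive random search over parameters
    cand = {"tilt": random.uniform(-0.8, 0.8),
            "distance": random.uniform(0.5, 4.0),
            "occluded": random.random() < 0.2}
    J = evaluate_J(cand, rules, weights)
    if J > best_J:
        best, best_J = cand, J
print(best, best_J)
```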
4 Experiments
To verify the usefulness of the proposed system, experiments were performed under two conditions: 1) projecting onto the floor, and 2) projecting onto the wall. To display a real object, we first take pictures of it as shown in Fig. 6. UD-1 then projects the anamorphic image of the picture selected according to the user's viewpoint. The result is depicted in Fig. 7, and Fig. 8 shows the results of projecting 3D computer graphics. Table 1 summarizes a survey of four subjects. As shown in Table 1, for projection on the floor it is difficult for subjects to perceive the 3D object at close range, whereas for projection on the wall the opposite result is obtained.
Fig. 6. Capturing a finite number of images
Fig. 7. Results of experiment (anamorphic images of picture)
Fig. 8. Results of experiment (anamorphic images of 3D computer graphics)

Table 1. Results of a survey (mean values)

Place    2 m     4 m     6 m     8 m
Floor    2.75    3.50    4.00    5.00
Wall     4.50    3.50    4.00    2.75

Levels of perception: 1 (can't see) to 5 (can see).
5 Conclusion
In this paper, we described anamorphosis projection by Ubiquitous Display in Intelligent Space. The proposed system lets a user perceive a 3D object by viewing an oblique anamorphosis projected onto the surface the user is facing. Experimental results showed that the proposed system works reasonably well. Moreover, because the system applies anamorphic images, it can induce the user to move to a place from which the 3D object can be perceived. Further studies are therefore required to achieve an interactive system.
Acknowledgments. This work was supported by the second stage of the Brain Korea 21 Project in 2008.
References
1. Nakamoto, H., Shibata, F., Kimura, A., Tamura, H.: A Variety of Mobile Mixed Reality Systems with Common Architectural Framework (5) Functional Validation of the Framework Based on Application Prototyping. Technical Report of IEICE, Multimedia and Virtual Environment, vol. 106(234), pp. 103–108 (2006)
2. Ashdown, M., Sato, Y.: Steerable Projector Calibration. In: Proc. IEEE Workshop on Projector-Camera Systems (2005)
3. Pinhanez, C., Kjeldsen, R., Levas, A., Pingali, G., Podlaseck, M., Sukaviriya, N.: Applications of Steerable Projector-Camera Systems. In: Proc. of ICCV Workshop on Projector-Camera Systems (2003)
4. Lee, J.H.: Human Centered Ubiquitous Display in Intelligent Space. In: The 33rd Annual Conference of the IEEE Industrial Electronics Society (2007)
5. Goldstein, E.B.: Sensation and Perception, 7th edn. Wadsworth Publishing (2006)
6. Dodgson, N.A.: Autostereoscopic 3D Displays. Computer 38(8) (2005)
7. Lee, J.H., Hashimoto, H.: Intelligent Space - Its Concept and Contents. Advanced Robotics Journal 16(4), 265–280 (2002)
8. Lee, S.O., Sakurai, R., Nishizawa, T., Lee, J.H., Park, G.T.: A Spatial History Storing System Based on Intelligent Space. In: The 34th Annual Conference of the IEEE Industrial Electronics Society (2008)
9. Hunt, J.L., Nickel, B.G., Gigault, C.: Anamorphic Images. American Journal of Physics 68(3), 232–237 (2000)
10. Art of Anamorphosis, http://www.anamorphosis.com
11. Mitsugami, I., Ukita, N., Kidode, M.: Multi-Planar Projection by Fixed-Center Pan-Tilt Projectors. In: Proc. of IEEE International Workshop on Projector-Camera Systems (2005)
12. Miyashita, S., Lee, J.H.: Ubiquitous Display for Human Centered Interface - Fixed Shape Projection and Parameter Optimization. In: Proceedings of the 17th World Congress of the International Federation of Automatic Control (2008)
AAL in the Wild – Lessons Learned Edith Maier and Guido Kempter University of Applied Sciences, Hochschulstrasse 1, 6850 Dornbirn, Austria {edith.maier,guido.kempter}@fhv.at
Abstract. In the EU-funded ALADIN project, the prototype of an ambient assistive lighting system was subjected to a three-month test in the private households of older people. Despite intensive usability testing in the development phase, field trials pose special challenges, including ethical issues such as obtaining informed consent and the need for guidelines for interviewing old people. In addition, real-life settings give rise to particular distortion effects which have to be taken into account in the analysis of the results. Although the findings indicate an overall slight increase in people's mental and physical fitness, they also suggest how the prototype can be improved in several respects. Above all, it has been shown that packaging the technology with social support measures is essential to achieve higher user acceptance. The article also discusses lessons learned concerning the organization of user testing in real-life settings. Keywords: AAL (ambient assisted living), ambient intelligence, lighting assistance, adaptive algorithm, field trial.
1 Introduction
Given a rapidly ageing population, the elderly have become a favorite target group for ICT developers. However, translating research findings into marketable solutions has turned out to be a big challenge for many who have developed smart devices, applications and systems meant to benefit older people. Market exploitation has been surprisingly slow despite the enormous economic opportunities created by demographic change. This may be due to reasons such as low market awareness and visibility, lack of sustainable business models, lack of standards, and regulatory frameworks and public policies that hinder the uptake of age-related ICT-based products and services. Since we believe that lack of user acceptance is a major barrier to the market success of AAL solutions, we involved end-users throughout the duration of ALADIN, an EU-funded project aimed at developing an ambient adaptive lighting system. First of all, we carried out a detailed user requirements analysis with almost two hundred elderly people using in-depth interviews. We then conducted a thorough iterative testing process before the assistive lighting system was deployed in the households of elderly test persons. Nevertheless, the field trials, in which we subjected the prototype to a three-month test in real-life settings, proved to be a veritable endurance test, the experience of which we want to share with our colleagues. The paper briefly describes the project aims as well as the prototype and its components. It then goes on to describe the different stages of usability testing in the
course of its development. We discuss the special challenges we had to cope with during the field trials, including the legal and ethical issues raised by usability testing "in the wild". When analyzing the results, various distortion effects related to real-life conditions, such as time effects or habituation, had to be taken into account. We briefly describe the results from the field tests and their implications for the redesign of the prototype. We then discuss the lessons learned, which primarily concern the organization of user testing in real-life settings and the "packaging" of assistive technologies to increase user acceptance.
1.1 The ALADIN Prototype and its Components
The overall aim of the ALADIN project is to contribute to the growing body of knowledge about the effects of light on human health and wellbeing. So-called 'ambient lighting' with varying color temperature and brightness has been in use for some time. However, the user normally has no possibility to interact with the predefined control strategy (mostly defined by the time of day), and the lighting solutions do not take individual differences into account [1]. Moreover, whereas currently most cognitive assessment is done in a clinical setting [2], we use sensor-based monitoring combined with adaptive algorithms to assess people's level of functioning in a continuous way [3]. The fully developed ALADIN prototype consists of an Apple Mac mini with the installed software booting up automatically, a flat-screen television set with service and control interfaces, a USB-driven infrared system connected to a normal remote control, and a sensor glove with an integrated Varioport device and Bluetooth connection. A Luxmate bus box controls illumination panels on all four walls of the selected room as well as a ceiling lamp above the table where the test person spends most of the time. The lighting devices consist of luminaires of two colors that can be mixed by changing their intensity. The computer and the flat screen are meant to replace the existing TV set and be integrated into the home entertainment system. Each testing prototype is a user-specific and flexible composition of the ALADIN components. In the field tests, the prototype was installed in the households of older adults for three months and subsequently removed without leaving any traces. As can be seen in Figure 1, the system comprises the following applications:
• Television (Fernsehen)
• Automatic lighting (Automatisches Licht)
• Manual lighting (Manuelles Licht)
• Exercises (Übungen)
• History (Rückschau)
• Advice & support (Wohlfühltipps)
Each of these applications works independently and has to be started deliberately by the user. "Automatic lighting" adapts the lighting to achieve an activating or relaxing effect. Because of the great diversity of situations, events and individuals, the direction of change or adaptation is not known beforehand. We have opted for genetic and simulated annealing algorithms to implement adaptive control (a minimal sketch of such an adaptive loop is given at the end of this subsection). With "Manual lighting" a user can turn on and off different predefined lighting situations (e.g. reading
light) as well as manually modify lighting situations. The application "Exercises" offers the user a variety of activating and relaxing exercises. "Advice & support" allows the user to browse through recommendations aimed at healthy behavior. Finally, with "History" the user can view the results of the exercises he or she executed during the last five days. The "Advice" and "History" applications derive their input from the results of the exercises.
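As a rough illustration of the adaptive control mentioned for "Automatic lighting", the following Python sketch runs a simulated-annealing loop over two lighting parameters (color temperature and brightness) toward a relaxation score. The feedback function, parameter ranges and cooling schedule are invented for this example and do not reproduce ALADIN's actual algorithms.

```python
import math
import random

def measured_relaxation(color_temp_k, brightness):
    """Stand-in for the physiological feedback (e.g. a pulse-derived relaxation score).
    In the real system this would come from the sensor glove; here it is a dummy
    function that peaks at warm, moderately dim light."""
    return (math.exp(-((color_temp_k - 2700) / 800) ** 2)
            * math.exp(-((brightness - 0.4) / 0.3) ** 2))

def anneal_lighting(steps=500, t_start=1.0, t_end=0.01):
    """Simulated annealing over (color temperature in kelvin, brightness in [0, 1])."""
    state = (4000.0, 0.8)                                  # initial lighting setting
    score = measured_relaxation(*state)
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)  # cooling schedule
        cand = (min(6500.0, max(2200.0, state[0] + random.gauss(0, 200))),
                min(1.0, max(0.0, state[1] + random.gauss(0, 0.05))))
        cand_score = measured_relaxation(*cand)
        # accept better settings always, worse ones with a temperature-dependent probability
        if cand_score > score or random.random() < math.exp((cand_score - score) / temp):
            state, score = cand, cand_score
    return state, score

print(anneal_lighting())
```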
2 Usability Testing of AAL Applications
2.1 Usability Testing in the Development Phase
In line with our participatory and iterative development approach, we carried out both heuristic inspections by experts and end-user tests in the development phase. Given our specific target group, i.e. older people with various impairments such as diminished vision, the user interface of ALADIN had to conform not only to general usability guidelines but also to accessibility guidelines, more specifically WCAG 2.0, which was published as a W3C Recommendation on 11 December 2008. Since our user requirements analysis showed that a TV set could be found in virtually all households, it was decided to implement the adaptive lighting system on a computer with added TV functionality. Interactive TV poses special challenges: watching TV is normally characterized by a "lean back" attitude, whereas working on a computer is associated with a "lean forward" attitude. The human-computer interface has been conceived with an active user in mind, whereas TV systems are traditionally aimed at passive consumers. A TV screen is normally placed at a distance of a couple of meters from the viewer and has a lower resolution than a computer monitor. When designing the screen layout we cannot use the full screen, because TV sets vary in terms of the area used for presentation. The so-called "safe area" can be reckoned to be about 10% smaller than the full screen, or about 576 x 460 pixels. The first usability test was conducted early in the project with an HTML dummy and 12 senior citizens between 65 and 84 years old, to examine the screen design of the prototype controlled by a remote control. We assumed that our target users would be familiar with handling remote controls. Bernhaupt et al. [4], however, have observed that although people may still want the remote control as an input device, many perceive it as too complex and difficult to use. They would prefer a universal remote control with only one button, integrated with a display that informs users about what they have to do next. The test persons had no particular suggestions for navigation beyond the well-accepted rules, such as always showing the user's own position within the software structure, offering a return to the starting point and giving immediate feedback after every user action. The majority of test persons preferred white text on a blue background and were not interested in history information extending further back than seven days.
Mock-up lab testing. The first test of the ALADIN prototype with TV and remote control was carried out with ten clients in a nursing home (65–94 years). Users liked the overall
graphic design (font, colours, contrast) and found the GUI easy to navigate, provided that the number of navigation elements was kept to a minimum. Several general observations could be derived from these early user tests:
− The majority of users prefer number keys to cursor keys.
− Users expect immediate confirmation or feedback from the system.
− The number of menu items to choose from should not exceed five or six.
− (Semi-)technical terms (e.g. biofeedback, sensor belt) have to be replaced with simple everyday terms.
− Users prefer an information structure with a very flat hierarchy.
As far as sensor technology was concerned, the end-user tests led us to discard the chest belt originally envisaged for capturing biosignals. Test subjects found it too difficult to attach to their bodies, because this required a degree of motoric dexterity and flexibility that even quite a few of the young assistants did not possess. As a result, the sensor expert in the consortium developed a biosensor glove in the run-up to the field trials. The change in sensor technology brought about a change of methodology: instead of measuring skin conductance response (SCR), peripheral pulse was measured by means of photoplethysmography (PPG) in the field trials.
2.2 Special Challenges Related to Field Trials
As in user testing in general, planning and organizing the field trials comprised preparing a test design, defining selection criteria for test persons and recruiting suitable test persons. When implementing a prototype in private households, however, particular ethical and legal issues arise. In our case, the test design had to take into account the constraints imposed by the high costs of the prototype and the limited time resources available to the partners responsible for the field tests. We also had to take out insurance to cover any damage that might be incurred as a result of the installation of the lighting prototype.
Ethical Issues. An important concern is achieving a balance between the demand for a better quality of life and the studies that aim to achieve it on the one hand, and the rights of the research participants on the other. In ALADIN, obtaining informed consent and the protection of personal data proved to be the most important issues and played a role at all stages of the research process, i.e. from the requirements analysis by means of interviews to user testing and dealing with research outcomes. Policies regarding informed consent in usability testing are developed by organizations on the basis of generally agreed principles concerning the treatment of human participants. Those principles, plus additional ones, are enumerated below. Seven of these principles are derived from the related discussion in Dumas and Redish [5]; one more principle has been added in regard to waivers.
1. Minimal risk. Usability testing should not expose participants to more than minimal risk. Though it is unlikely that a usability test will expose participants to physical harm, psychological or sociological risks do arise. If it is not possible to abide by the principle of minimal risk, then the usability professional should endeavour to eliminate the risk or consider not doing the test. Dumas and Redish [5], citing the Federal Register, state that minimal risk means that the probability and magnitude of harm or discomfort anticipated in the test are
not greater, in and of themselves, than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests.
2. Information. Informed consent implies that information is supplied to participants. Based on the suggestions of Dumas and Redish, we supplied the following information: the procedures we would follow; the purpose of the test; any risks to the participant; the opportunity to ask questions; and the opportunity to withdraw at any time.
3. Comprehension. The professional needs to ensure that each participant understands what is involved in the test. This must be done in a manner that is clear and unambiguous, and it must completely cover the information on the consent form. The procedure for obtaining consent should not be rushed, nor made to seem unimportant. The procedure is about the participant making an informed choice to proceed with the test, and therefore they need to be given the opportunity to ask questions. Clearly, one possible outcome of applying this principle is that the person involved may choose not to participate.
4. Voluntariness. Professional behavior influences participant involvement. Participants should not be rushed, nor should facilitators fidget while the participant reads the form. Coercion and undue influence should be absent when the person is asked to give their consent to participate in the test. Excessive pressure can arise in a number of subtle ways that one needs to be cautious of.
5. Participants' rights. Countries vary as to their recognition of human rights. Even where there is general agreement, definitions of those rights and interpretations of how they apply vary. One right participants should have is the right to be informed as to what their rights are. Karat [6] reviewed the codes of ethics of 30 national computer societies and found that they shared several major topic areas; the first on the list addressed the need to respect the rights of people involved with the technology. According to Dumas and Redish [5], the rights most relevant to usability testing include the right to leave the test without penalty, the right to have a break at any time, the right to privacy (such as not having their names used in reporting the results of the test), the right to be informed as to the purpose of the test, and the right to know before the test what they will be doing.
7. Confidentiality. Confidentiality is different from the participant's right to privacy; it refers to how data about the participants will be stored. The ACS (2000) code stipulates that it is obligatory for members to preserve the confidentiality of others' information. The ACM (2000) code has specific clauses on constraining access to certain types of data, and on organizational leadership to ensure that confidentiality obligations are adhered to within organizations. In remote testing this can be extended to electronic data logging over the Internet.
8. Waivers. Permission needs to be obtained from participants to use materials such as questionnaires and audio and video recordings (and their transcripts). In many countries participants have the right to refuse to give waivers. Participants should be given the option of having the data used only for the purposes of the test, or of also having it used in a wider context. If the latter, then the consent form should state in what further ways the data will be used, so that an informed decision can be taken by the participant. Such permission should state the purposes for which the material will be used.
Informed consent is both a process and a formal record of the process. That formal record is typically a form, but may also be another type of recording, such as video. In our case we used a form that was signed by each participant.
Interviewing old people. We had to acknowledge that we were interviewing old people for this research and that taking part in an interview was a demanding experience for both interviewer and participant. Past research has suggested that such interviewing is both possible and highly rewarding. However, we had to take into account that failing eyesight, hearing, mobility, cognitive impairment, etc. may affect interviews. Interviewers have to be prepared to take their time, repeat themselves, ask for clarifications, listen to repetitions, and help participants orient to the questions with reference points as reminders (e.g. anchoring events to dates). Interviewers should keep the interview very focused and watch out for participant fatigue. In our project they were told that it would be better to return at a later date than to conduct a lengthy and tiring interview. Issues of power relationships also need to be taken into account. Older people might see researchers as having high status compared to themselves and may easily defer to a researcher's opinions and ideas. Inequalities in power throughout the interview could stem from gender (male interviewer, female participant), age (younger interviewer, older participant), employment status (employed interviewer, retired older person) and so on. We needed to think carefully about how such issues might have affected each interview and discussed this in our reflexive analysis. In situations where power inequalities were evident, it was helpful to keep reassuring the participants that they were the experts and we were there to learn from their experiences. This helped put them more in control of the interview, just as accepting a cup of tea allowed them to relocate the interview as a social situation with which they were familiar.
2.3 Possible Distortion Effects in Field Trials
In the analysis of the objective effects of the whole system, the dependent variables, i.e. the outcomes in terms of well-being and mental fitness, are clearly defined by the survey instruments used and the data collected. Yet the definition of the independent input variables is much more complicated, as we are not dealing with a laboratory test under controlled conditions but with a real-life field test in which almost anything can happen and distort the assumed correlation between the application of the ALADIN components and the output. Besides a general Hawthorne effect, i.e. the fact that observing and surveying the test persons already has an effect on them (Adair 1984), we can identify five sources of possible distortion and interference with the objective ALADIN effects:
Time effects: Three months of testing with alternating algorithms are sensitive to time effects, because we cannot exclude that the outcome measured in one period is a consequence of a factor present in a former period. Only the sequence of adaptive lighting algorithms (genetic and annealing algorithm) could be rotated to control for sequence effects. This does not affect the evaluation of the system as a whole, but it influences the weighting of the different components and periods.
Learning effects: Habituation and learning are in some way correlated with time effects and mainly affect measurements of mental fitness.
Concerning the performance in the
activation exercises, it is well known that regular training leads to improvements, so we should expect a notable increase in performance even without further supporting factors.
Social support effects: Coaching and technical assistance are a vital part of a field test, because we cannot expect the test persons to use the system without any help right from the start. We also know from the requirements analysis that any technology which replaces social contact would be rejected. Social support therefore has to be recognized as one of the main "disturbing" factors. Some control is possible through the personal diaries and the annotations of the assistants.
Particular events: Only in an isolated laboratory and under constant control can we exclude the incidental disturbances that often happen in real life. In the analysis of the data we also have to consider meteorological phenomena as well as illness or social problems, which we can only partly reconstruct from the collected data.
Way of use: An effect that interferes rather than distorts is closely correlated with the real-life scenario. The test persons were free to use ALADIN as they wished, as long as they fulfilled some minimum criteria. This means that the date, frequency and duration of use of the different ALADIN components vary from test person to test person. To compensate for this, either complex modeling of the independent variables or a restriction to single-case analysis is required. We chose the latter option.
3 Results and Conclusions
3.1 Summary of Results from Field Trials
The prototype was tested in twelve single households for a period of three months. At regular intervals, various mental and physical performance tests as well as quality-of-life and sleep-quality questionnaires were administered. We also organized two focus group meetings with our test persons to discuss any issues that might not be covered by the questionnaires or log data. The results indicate an overall increase in both mental and physical fitness. On the whole, test persons enjoyed the exercises and found the system easy to handle. However, they would like a bigger choice both with regard to the exercises and to the music tunes that accompany the biofeedback sessions. Quality of life did not seem to improve significantly, which may be due to the fact that most test persons started out from very high levels, leaving little room for any notable increase. The people who might benefit most, e.g. the fragile, home-bound older elderly, were rather poorly represented in the test population. This is largely due to the selection criteria, which stipulated that test persons must not suffer from any serious health conditions. The real-life field testing of ALADIN provided us with a vast amount of data concerning the factors of well-being and mental fitness in an ageing population and the potential effects of ambient lighting assistance. Even if the sample was small and therefore statistically not significant, we gathered very valuable information for advancing the use of lighting for better ageing and improved well-being. The conclusions which can be derived from the testing in real-life settings can roughly be divided into:
1. Ideas about how to improve the ALADIN prototype and its different components
2. Ideas about how to improve the process of testing
3. Ideas about how to "package" AAL technology to enhance user acceptance.
3.2 Possible Improvements of the Prototype
A marketable version of ALADIN is certainly still some years away, but the field test helped us gather many proposals for improvement. Given the open and modular architecture, the prototype and its different components can easily be integrated into general building management systems. Different target groups might be addressed by different functions, modules or components. For example, we have learned that mental fitness training is an activity which appeals to the younger elderly. The automatic lighting adaptation, on the other hand, helps stabilize the circadian rhythm, which is particularly welcome when someone is constrained to such a degree that outdoor activity, and thus exposure to sunlight, is no longer possible. To reach the fragile elderly, we would have to devise strategies to reach the relevant intermediaries such as health and care professionals, care organizations, building companies and housing associations. When it comes to the light installation, we will have to develop a much cheaper, more stylish and less imposing solution. Besides, older people are very concerned about energy consumption, which is why in a future redesign we may consider the use of LEDs. Some minor difficulties with usability and compatibility with home entertainment equipment could be avoided if ALADIN were connectable to any TV set as a plug-and-play device. In this case the remote control could be even simpler. Finally, our results confirmed a basic insight of market research: products and their features have to give a clearly visible benefit to the customer. In the case of ALADIN it was the automatic light adaptation which lacked this benefit, since the adaptation occurred imperceptibly and at a subliminal level. Even if it may be beneficial in the long run, people missed an obvious immediate effect. This may be compensated by adding features with visible effects, such as a light timer to scare off burglars when the resident is absent, automatic switch-off as an energy-saving measure, automatic switch-on by means of a movement sensor for more comfort, or automatic navigation help for a safe nocturnal walk to the toilet.
3.3 Lessons Learned Regarding Usability Testing of AAL Applications
As is usually the case with complex endeavors that consume a lot of time and human resources, one would like to have another go in order to get everything right from the start. This is why it is so important to derive lessons learned from one's experience. Even if you assume that you have prepared the field trials very well, cleared all the ethical issues, organized the logistics and written up detailed instructions, this does not ensure that they will run smoothly. In our case, matters were complicated by having three locations for conducting the field tests. On the whole, people prefer calling on a human expert rather than consulting the manual or FAQs on the Web. A need for central coordination throughout the field trials therefore emerged as essential for efficient implementation. Another important lesson learned is that for measuring factors such as wellbeing, sleep quality and attitude to life, the field trials should last for a whole year. This
would also help neutralize the novelty effect at the beginning. In the focus group discussions, it also became clear that the test persons who used ALADIN in the winter months were overall more positive than the ones who used it in spring. Whilst we were aware of this, the high costs of the lighting system made this impossible; having three systems run in parallel was the most we could afford. The need for very precise instructions applies not only to the test persons but also to the assistants and the people involved in installing the system. Although detailed manuals for the organizations and the coaches involved in the field trials had been prepared, discussed and distributed to all the relevant parties before the field tests, this still proved insufficient. In particular, we underestimated the need for technical assistance in installing the system. It might therefore be advisable to appoint a technical expert who handles the installation in all locations and can be contacted when a problem arises.
3.4 Packaging Technology to Enhance User Acceptance
The findings, especially from the qualitative data, strongly suggest that technology can only complement but never replace interaction with humans, and ideally should facilitate human contact. It is therefore necessary to package assistive technology in order to achieve (higher) user acceptance. With ALADIN, we envisage the following measures:
Social support. Although the great majority of the elderly want to live independently at home for as long as possible, they nevertheless want to be embedded in a social network. In a follow-up project we would like to investigate how this type of support can best be delivered. One of our German partners, for instance, is planning to use SOPHIA, an information and communication platform developed by a housing foundation (Joseph-Stiftung) together with the university and clinic in Bamberg (see www.sophia-tv.de). Via a videophone this 'virtual nursing home' connects people to a large variety of health, care and social services in the region.
Advice on ageing-friendly housing. In the course of ALADIN we have acquired a great deal of knowledge about older people's needs and preferences with regard to housing. Due to mobility constraints, many older people spend a large proportion of their time indoors, which makes optimising lighting essential for their wellbeing. Lack of daylight may cause seasonal depression and sleep disorders due to irregular circadian rhythms. This can be compensated by longer exposure to light during the winter and by higher illumination levels in general, since with ageing people's vision tends to deteriorate. When installing lighting systems in the homes of the elderly, quantity, spectrum, timing, duration and spatial distribution are important characteristics to be considered. In addition, special age-related impairments such as impaired vision have to be taken into account. In the empirical research for ALADIN, the risk of falling emerged as one of the most common worries among the elderly. The use of lighting for navigational purposes would therefore clearly respond to older people's needs. A future lighting solution will also have to address safety and security concerns, e.g. as protection against burglary or theft. Combining assistive lighting with a counseling service on ageing-friendly housing is an avenue we intend to pursue.
Fig. 1. ALADIN system as deployed in the field tests
Acknowledgments. We would like to thank the European Commission for funding this project as well as all our partners and test persons who have contributed.
References
1. Boyce, P.: Education: the key to the future of lighting practice. Lighting Research and Technology 38(4), 283–291 (2006)
2. Pollack, M.: Intelligent Technology for an Aging Population. AI Magazine 26(2), 9–24 (2005)
3. Maier, E., Kempter, G.: Increasing psycho-physiological wellbeing by means of an adaptive lighting system. In: Cunningham, P., Cunningham, M. (eds.) Expanding the Knowledge Economy - Issues, Applications, Case Studies, pp. 529–536. IOS Press, Amsterdam (2007)
4. Bernhaupt, R., Obrist, M., Weiss, A., Beck, E., Tscheligi, M.: Trends in the Living Room and Beyond: Results from Ethnographic Studies Using Creative and Playful Probing. In: Cesar, P., Chorianopoulos, K., Jensen, J.F. (eds.) EuroITV 2007. LNCS, vol. 4471, pp. 146–155. Springer, Heidelberg (2007)
5. Dumas, J.S., Redish, J.C.: Handbook of Usability Testing. John Wiley & Sons Inc., USA (1994)
6. Karat, J.: Evolving the scope of user-centered design. Commun. ACM 40(7) (1997)
A Modelling Framework for Ambient Assisted Living Validation
Juan-Carlos Naranjo¹, Carlos Fernandez¹, Pilar Sala¹, Michael Hellenschmidt², and Franco Mercalli³
¹ ITACA Institute, Valencia, Spain
[email protected], [email protected], [email protected]
² Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany
[email protected]
³ Centro di Cultura Scientifica Alessandro Volta, Como, Italy
[email protected]
Abstract. This paper describes a modeling framework that facilitates and streamlines the process of creating, designing, constructing and deploying technological solutions in the context of AAL, ensuring that they are accessible and usable for senior citizens. The framework supports the design of the human-interaction aspects of an AAL solution at all stages of a user-centered design methodology, putting into practice the guidelines for the verification and validation of the accessibility and usability facets. Two environments are defined. The authoring environment allows the definition of the user, environment and service models. The simulation environment is composed of software and hardware components that together constitute a physical ensemble allowing the ICT designer to implement actual virtual-reality scenarios of AAL. It is used to verify interaction designs and to validate the accessibility of AAL products by immersing users in 3D virtual spaces. Keywords: Modeling framework, AAL services, workflow technology, Ontology, Services choreography.
1 Introduction
Ambient Assisted Living (AAL) is the concept that embraces all the technological challenges in the context of Ambient Intelligence aimed at providing easy-to-use, accessible, affordable, sustainable and efficient solutions that improve the level of independence, promote social relationships, support immersion in the environment and improve the psychological and physical state of the person [1]. The user interaction in AAL services is perhaps one of the most complex interaction designs a usability engineer can deal with. AAL refers to electronic environments that are sensitive and responsive to the presence of people and provide assistive support for maintaining an independent lifestyle, so the challenge for the designer is to experiment with new and innovative modalities of interaction that must, of course, be accessible and usable. Some of the interaction modalities the
designer can use in an AAL environment are speech recognition systems, gestural systems, context-sensitive systems, visual interaction, hearing interaction, tactile interaction, etc. This paper describes a Modelling Framework that helps the user-interaction designer to create immersive environments and user behaviour models, and to define new templates for new interaction modes and devices, using a formal specification language and an immersive simulation platform to test the accessibility and usability aspects of AAL services. This modelling framework is used in the context of a user-centred design methodology and gives support to the different phases of such a methodology.
1.1 User Centered Design in the Modeling Framework
To produce innovative and successful products it is necessary to involve end-users, in our case senior citizens, throughout the whole development process. To do so we propose a methodology based on User Centred Design (UCD) that puts the development of AAL solutions in a better position to ensure accessibility and acceptance of those services. UCD is an approach that supports the entire development process with user-centred activities, in order to create applications which are easy to use and of added value to the intended users. This approach is particularly useful when a new product or service is to be introduced, as is the case with AAL solutions, because user-centred design draws together the practical, emotional and social aspects of people's experience, bringing about the innovation that delivers real user benefit. Despite the benefits, the use of UCD in the ICT industry is still limited because of cost-benefit concerns. When systems become more complex, UCD proves not only inefficient but also ineffective with respect to improving accessibility, as supporting tools are not available. Designing and evaluating with a manual approach is a time-consuming endeavour, imposes a high workload on the designers, and may lead to completion of the design search with premature, suboptimal solutions. The modeling framework overcomes these limitations by means of the definition of the AAL Solution User-Centered Design Methodology and the development of computer-aided supporting tools to be used by interaction designers and usability engineers at all development stages. The proposed methodology is based on DIN EN ISO 13407 [2], where a generalized human-centered design process for interactive systems is described. The phases, and the support that the Modeling Framework provides, are:
• Concept phase. The main purpose of this phase is to elaborate the AAL Solution Requirements, including both functional and non-functional requirements. The usability engineer uses the simulation environment to present to seniors a wide range of real situations and to elicit their reactions in order to identify senior needs and goals. The interaction designer uses the authoring environment to model senior profiles (virtual seniors), interaction modes and interaction devices in order to visualize concept solutions. She/he also makes use of the simulation environment to explore concepts with seniors.
• Design phase. The main purpose of this phase is to define the AAL Solution specifications that will be used by the development team to implement the solution in the next phase. The conceptual design of the AAL solution addressing all aspects is created and reflected in low-fidelity prototypes, which are evaluated by senior citizens. The interaction designer uses the authoring environment to design the solution, creating virtual prototypes of the AAL service including virtual AAL-enabled spaces. The usability engineer uses the simulation environment to immerse seniors in the AAL solution by means of the virtual prototype, to let them experience the new interaction devices and interaction modes, and to elicit their reactions in order to identify acceptance and accessibility issues early.
• Implementation phase. The main purpose of this phase is to build the AAL Solution prototypes. The key design activity is transforming the validated conceptual design of the AAL solution into a concrete and fully detailed design, including high-fidelity prototypes that are actually coded and fully functional. The usability engineer uses the simulation environment to test the developed components against their accessibility features, making use of the library of virtual seniors and providing improvements or corrective actions to the interaction designer. The interaction designer uses the authoring environment to improve the design solution based on the input received from the usability engineer.
• Test phase. The main purpose of this phase is to validate the final implementation of the AAL Solution prototypes to ensure that it satisfies business, market and user requirements. The usability engineer uses the simulation environment to test the final implementation, first detecting usability issues with the automated tool and then with real seniors.
1.2 The Modeling Framework Environments
The modeling framework addresses the problem of designing user interfaces and interaction modes which are accessible to and accepted by elderly people living in Ambient Assisted Living environments. The modeling framework is a tool that can be used across all stages of the design cycle. Following the workflows of a user-centered design methodology, the framework sets up two work environments:
• Authoring environment
• Simulation environment
The core of the authoring environment is the authoring tool. This tool allows the creation of three different models:
• AAL Service Interaction model (any present or future AAL service)
• Environment model (smart house, airport, station, work-place, ...)
• User Behavior model (mobility impaired, visually impaired, ...)
These models are designed using a library of templates that can be shared and enriched among the interaction engineers. The specification of the templates is made by means of ontologies. The models created in the authoring environment are used in the simulation environment to:
Fig. 1. Modelling Framework environments
• Validate the concepts of the interaction
• Produce the detailed design of the interaction
• Support the building of the interaction
• Validate the interaction
2 The Modeling Framework Tools
The authoring and simulation environments provide the interaction designer with a set of tools to fully understand the usability and accessibility facets of the AAL solution. These tools are detailed below.
The environment model builder is the tool for creating not only the 3D scene of the environment but also the individual behavior of the elements deployed in it. This editing tool imports the 3D scene from any graphical design tool (so-called 3D designer tools, e.g. 3D Studio Max) and assigns the desired behavior to the elements that are relevant for the interaction with the AAL service.
The user model characterization tool simulates the user side of the interaction. With this tool the designer can simulate how and when the interaction will take place. In this way the designer can simulate several user behaviors and store them in the model library. The idea behind this is to create a collection of different user behaviors with which to test the service without requiring the continuous presence of the real user in the framework.
The AAL service compositor is the designer's tool for creating the workflow of the AAL service. It receives the environment model and the user model as input and lets the designer establish the links between the different elements of the scene. The objective is to build the functionality and behavior of the desired real service.
The model library is the repository of all the models developed with the tools of the authoring environment. It works as a structured and well-organized set of models
where the designer can share, search and contribute to the enrichment of the framework in a collaborative way.
The simulation control panel is the core of the simulation session. It is the configuration tool of the simulation: it allows tuning the simulation parameters, launching and stopping the simulation and, more importantly, defining the final design of the simulation. This point deserves more detail: one of the key features of the framework is that the designer can play with different configurations regarding the actors, the environment and the services. A simulation can be created with the user model or the real user, with the environment model or with a real part of it, and with a complete service model or with a partially implemented real service.
Fig. 2. Models and Simulation control panel
This strategy allows swapping between a model of a device and the real device, if it is available, or between a part of the service model and an already developed functionality of the service, maintaining the flexibility of the environment and supporting the user-centered design process.
The 3D simulation browser is the render engine for the 3D scenes. The user is immersed in it and can then interact with the environment and with the representation of the service that runs in it. It is based on the InstantReality framework [3], developed by the Fraunhofer Institute for Computer Graphics Research (IGD), which is a high-performance Mixed Reality (MR) system. The interaction between the InstantReality player and the user takes place by means of 3D interaction devices. The 3D interaction devices are the physical interfaces that the user uses to “touch” the objects in InstantReality; examples of devices being used are gloves, pointers, the Wiimote, etc.
The lab verifier is the next step in the immersion of the user in the simulation environment. Instead of using the 3D browser, the user is invited to use the service in a real environment, using real interaction devices, normally in the last phase of the refinement. Here the lab is integrated with the service model to evaluate the final usability aspects of the solution. The lab verifier is the last step in the simulation, but the same infrastructure is used to validate individual interaction devices. In this case the real devices are integrated with the 3D environment, with the service model and with other device models as well.
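As an illustration of this swapping strategy (a sketch only, not the framework's actual implementation; the device and class names are assumptions), a common interface lets the simulation configuration substitute a real device for its model when one is attached:

```python
from abc import ABC, abstractmethod
import random

class BloodPressureSource(ABC):
    """Common interface: the simulation does not care whether data is real or modeled."""
    @abstractmethod
    def read(self) -> tuple[int, int]:
        ...

class SimulatedCuff(BloodPressureSource):
    def read(self) -> tuple[int, int]:
        # Values drawn from a simple model of the virtual senior
        return random.randint(110, 160), random.randint(70, 95)

class RealCuff(BloodPressureSource):
    def __init__(self, port: str):
        self.port = port        # e.g., a serial/Bluetooth port of the physical device
    def read(self) -> tuple[int, int]:
        raise NotImplementedError("would read from the physical cuff on self.port")

def configure_cuff(real_device_available: bool, port: str = "COM5") -> BloodPressureSource:
    # The control panel swaps the model for the real device when one is available.
    return RealCuff(port) if real_device_available else SimulatedCuff()

cuff = configure_cuff(real_device_available=False)
print("systolic/diastolic:", cuff.read())
```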
3 Results
Different technologies have been used to create the framework. In any model there are two views that should be defined: the static view and the dynamic view. The first describes the structure and the inner relationships of the model; the second describes how the different parts of the model interact with each other over time. In addition, the framework requirements imply that the models must be fully executable.
3.1 The Static Model: Ontologies
There is no unique way to model a domain. As with any abstract description of a certain field of knowledge, several approaches are possible, as in the Entity-Relationship model, where several solutions can solve the same problem. Managing such an amount of information requires a robust and efficient solution. Ontologies fit this need because they were conceived to define concepts and their relationships with other concepts in a hierarchical way, in order to share the knowledge about a certain domain, reuse it, and make the analysis of that knowledge feasible. Together with task ontologies, domain ontologies complete the specification of application ontologies. They also provide other features, such as reasoning, which allows classifying, inferring or deducing new entries as well as answering queries. In order to avoid creating an ontology from scratch and reinventing the wheel, ontology languages such as OWL include statements that allow importing and reusing existing ontologies (in OWL, owl:imports); with this kind of construct, the level of detail of an ontology can grow easily. This is the approach adopted in this framework. Although there are other languages and solutions (KIF, frame-based approaches, RDF, etc.) to create and maintain an ontology, the decision was to choose OWL as the language to develop the ontologies managed by the project. This decision was taken because many tools support the development of, access to, and reasoning over the models and instances held by the ontologies, and because OWL has been a W3C Recommendation since 2004 [4].
Regarding the environmental domain, the most frequent environments targeted by AAL services are “enclosed” spaces, such as a residence, a hospital or a house. In most cases these services are deployed in a house, at the user's home. Other AAL services are intended for more “open” spaces like a train station or a metro station. Each type of environment has its own objects; some of them are proactive, whereas others react when a stimulus arrives, i.e., they respond to the environment. In order to model these kinds of environments, the concepts related to the spaces themselves have to be identified, i.e., rooms, walls, etc. Furthermore, the different objects placed inside these spaces have to be specified, including furniture, appliances, sensors, and actuators. Objects have a set of properties, such as their type and their dimensions; the dimensions can be used to compute the free space between objects and to reason about accessibility issues. In addition to the types of objects that can be found in the environment, other features of the environment must also be taken into account when modeling it, such as the amount of light, timing (intervals, time of day), and season of the year.
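Purely as an illustration of this approach (the project's actual ontologies are not reproduced here; every URI, class and property name below is invented for the example), a small environment ontology fragment with an owl:imports statement could be assembled, for instance with Python's rdflib:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, XSD

# All URIs, class and property names below are invented for this sketch.
ENV = Namespace("http://example.org/aal/environment#")

g = Graph()
g.bind("env", ENV)

# The environment ontology imports a (hypothetical) core AAL ontology instead of
# redefining shared concepts from scratch (owl:imports).
ontology = URIRef("http://example.org/aal/environment")
g.add((ontology, RDF.type, OWL.Ontology))
g.add((ontology, OWL.imports, URIRef("http://example.org/aal/core")))

# Spaces and objects of an "enclosed" environment.
for cls in (ENV.Room, ENV.FurnitureItem, ENV.Sensor, ENV.Actuator):
    g.add((cls, RDF.type, OWL.Class))
g.add((ENV.MotionSensor, RDF.type, OWL.Class))
g.add((ENV.MotionSensor, RDFS.subClassOf, ENV.Sensor))

# Dimensions support reasoning about free space and accessibility.
g.add((ENV.widthCm, RDF.type, OWL.DatatypeProperty))
g.add((ENV.widthCm, RDFS.domain, ENV.FurnitureItem))
g.add((ENV.widthCm, RDFS.range, XSD.integer))

# A concrete instance: a kitchen table 90 cm wide.
g.add((ENV.kitchenTable, RDF.type, ENV.FurnitureItem))
g.add((ENV.kitchenTable, ENV.widthCm, Literal(90, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```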
The user model refers to the model of those who will be using AAL products and services. In this case, the user is mainly the person who uses, maintains or is affected by the use of the system under consideration, that is, elderly people. When modeling elderly people some considerations have to be taken into account: there are normative models of changes on various dimensions, and a large body of knowledge about how age-related changes typically appear, on average, in an aging population. These changes can roughly be divided into psychological, social, and physiological/biological changes. Psychological aging involves several factors, such as cognitive functioning, psychomotor performance, and personality. Social changes related to aging can be viewed on both the individual and the interpersonal level [5][6]. At the individual level, the areas subject to change include personal roles and attitudes. With increasing age, social roles usually change or are in transition, e.g. from parent to grandparent, from employee to retiree, and from married person to widower. As personal roles change, the relationship with the surroundings changes too, and accordingly, attitudes and values may change. Age-related biological and physical changes include bodily changes, e.g. changes in the blood circulation system, sensory systems, immune system, body mass, and muscles. Many of these changes affect the individual's capacity to function, and some of them predispose the individual to illnesses [7].
When modeling the service we have to focus on the interaction of the user with that service. The Ambient Intelligence vision in which AAL services are framed promotes the use of implicit human-computer interaction as defined by Schmidt, 2005 [8]. Implicit human-computer interaction (iHCI) takes the user's context into account when creating new user interfaces for ambient intelligence. The basic idea of implicit input is that the system can perceive the user's interaction with the physical environment and also the overall situation in which an action takes place. When modeling iHCI the focus moves from the user dialog to the capture of the user's context, for example, the location of the user, the movement of the user and the activities he or she is carrying out.
3.2 The Dynamic Model: Workflow Technology
In order to describe the interaction in terms of execution, the framework combines the use of those ontologies with the use of workflows. In the project, workflow technology is used to deal with the problem of dynamically executing a choreography of processes. Workflows are formal specifications of process execution that can be executed dynamically using a software component called a workflow engine [9]. Workflows can be defined to specify the behavior of the user and the environment in the interaction simulation framework. There are many workflow engines and formal workflow languages available that could be used, such as jBPM, Windows Workflow Foundation, Staffware, COSA, FLOWer, etc. In the classic literature there are many solutions for modeling the behavior of an entity, whatever it is: an actor, a system, a service, etc. Some solutions are based on rules that model behavior using statements composed of two parts: the precondition (“IF”) and the action (“THEN”). One such kind of rule is the Event-Condition-Action (ECA) rule.
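For illustration, an ECA rule can be sketched as follows (a minimal example; the event name, condition and action are invented and do not come from the project):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EcaRule:
    event: str                              # ON:   which event triggers the rule
    condition: Callable[[dict], bool]       # IF:   predicate over the current context
    action: Callable[[dict], None]          # THEN: what to do when the condition holds

    def fire(self, event: str, context: dict) -> None:
        if event == self.event and self.condition(context):
            self.action(context)

# Hypothetical rule: if the user enters the kitchen at night, switch the light on.
night_light = EcaRule(
    event="user_entered_kitchen",
    condition=lambda ctx: ctx.get("hour", 12) >= 22 or ctx.get("hour", 12) < 6,
    action=lambda ctx: print("turning kitchen light on"),
)

night_light.fire("user_entered_kitchen", {"hour": 23})   # action fires
night_light.fire("user_entered_kitchen", {"hour": 11})   # condition not met, nothing happens
```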
ECA rules were conceived to act in response to an event raised by the system, but they have limitations when it comes to covering all the cases needed to model an entire behavior. Little by little, the structure of such rules changed in order to model the flow
according to the real behavior of the entity being modeled. At this point other types of solutions appeared, based on modeling the behavior with workflows, which can be defined with or without rules and can themselves use rules internally or not. Currently a workflow is defined as the formal specification of a process. It defines actions, performed by humans or by computerized systems, and the set of allowed transitions among them that define the possible paths a given process can follow. The languages used to specify workflows can be based either on theoretical models (like Petri nets or parallel automata) or on representation-oriented models (like the Business Process Modeling Notation, the UML 2.0 Activity Diagrams or the XML Process Definition Language), and they are executed in different workflow interpretation systems, which use an executable specification of the workflow. Regarding executable languages, the Business Process Execution Language (BPEL) [10] is an XML-based language. To execute the workflows, a specialized workflow engine is needed. After some consideration, jBPM [11], a workflow management system that supports BPEL, was chosen as the most adequate.
3.3 The Choreographer
The 3D browser and the workflow engine work together by means of the process choreographer. Choreography is defined by the OMG as "the specification of interactions between autonomous processes" [12]. Choreography is done at a global level, and all (or most) of the participants in the process are equally involved; the choreography has an objective point of view of the process. What matters for the choreography is the protocol: it describes interactions from a global perspective, i.e., the set of messages exchanged between the participants during the whole process, without the aid of any central controller. Choreography is collaborative; each participant describes the part it plays in the interaction [13]. This is the approach followed by the framework. It was intentionally chosen because we want to preserve the independence of the different processes. The role of the
Fig. 3. Choreographer
choreographer is to route the events that must be spread among the different workflow instances of the models, the events that are transformed into signals sent to the 3D browser, and the events that the user triggers through the 3D scene, which go to the workflows of the service model and the environment models.
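A minimal sketch of this kind of event routing is shown below (illustrative only; the actual choreographer works with BPEL/jBPM workflow instances and the InstantReality browser, and the event and participant names here are assumptions):

```python
from collections import defaultdict
from typing import Callable

class Choreographer:
    """Routes events between workflow instances and the 3D browser without a central
    controller owning the business logic: each participant subscribes only to the
    events it cares about."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            handler(payload)

choreo = Choreographer()
# Hypothetical participants: the service workflow, the environment workflow and the 3D browser.
choreo.subscribe("user_pressed_help", lambda e: print("service workflow: start help dialogue", e))
choreo.subscribe("user_pressed_help", lambda e: print("3D browser: highlight help panel", e))
choreo.subscribe("door_opened", lambda e: print("environment workflow: update room state", e))

# An event triggered by the user in the 3D scene is spread to all interested participants.
choreo.publish("user_pressed_help", {"room": "living room"})
```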
4 Conclusions and Future Work
This paper describes a modeling framework for validating the usability and accessibility of AAL services within a user-centered design methodology. The framework has been designed to model the interactions of a service, the environment where it will be delivered and the user who will use it. The framework provides the designer with tools to create, share and enrich the models, and gives the flexibility to control the simulation session and swap between the models and reality. With these tools at hand, ICT AAL system designers will be able to create fully featured AAL environments with sensors and interfaces. In accordance with user-centered methodologies, during user-designer interview sessions and small focus group sessions, selected senior users will be immersed in virtual experiences in order to elicit the UI that best maximizes accessibility to the given services. This virtuous circle can be repeated all along the design-construction-testing process, providing substantial support and assistance to ICT designers in fine-tuning their UI solutions. Not all of this work is within the scope of the project. There is still a long way to go to cover all the designers' requirements, especially in the modeling of user characteristics; the research that will lead to a complete model of a virtual user is still far from complete. Another area of improvement and research is a more immersive virtual reality experience: the 3D platforms and the interaction devices used with them can be improved with better and less intrusive systems.
Acknowledgments. This work has been partially funded by the European Union in the context of the VAALID project. Our gratitude goes to the VAALID consortium, composed of the following partners: Siemens S.A., Fh-IGD, UniPR, VOLTA, UPM, UID, SPIRIT and ITACA.
References 1. Steg, H., Strese, H., Hull, J., Schmidt, S.: Europe is facing a demographic challenge Ambient Assisted Living offers solutions (2005) 2. International Organization for Standardization. DIN EN ISO 13407 3. http://www.instantreality.org 4. W3C World Wide Web Consortium, http://www.w3.org/TR/owl-guide/ 5. Cumming, E., Henry, W.: Growing old: the process of disengagement. Basic Books, New York (1961) 6. Atchley, R.: The social forces in later life: An introduction to social gerontology. Wadsworth Publishing Company, Belmont (1971)
7. Gogging, N., Stelmach, G.: Age-related deficits in cognitive-motor skills. In: Lovelace (ed.) Aging and cognition. Mental processes, self-awareness and interventions. Elsevier Science Publishers, Amsterdam (1990) 8. Schmidt, A.: Interactive Context-Aware Systems Interacting with Ambient Intelligence. In: Ambient Intelligence, pp. 159–178. IOS Press, Amsterdam (2005) 9. Workflow Management Coalition, http://www.wfmc.org 10. OASIS Web Services Business Process Execution Language (WSBPEL), http://www.oasis-open.org/committees/wsbpel/ 11. http://www.jbpm.org 12. The Object Management Group (OMG), http://www.omg.org 13. Peltz, C.: Web Services Orchestration and Choreography. Hewlett-Packard Company. IEEE Computer Society Press (2003)
Methods for User Experience Design of AAL Services Pilar Sala1, Juan-Pablo Lázaro1, J. Artur Serrano2, Katrin Müller3, and Juan-Carlos Naranjo1 1
Research Group of Technologies for Health and Wellbeing (TSB), ITACA Institute, Polytechnic University of Valencia, Spain {msalaso,jplazaro,jcnaranjo}@itaca.upv.es 2 NST - Norwegian Centre for Integrated Care and Telemedicine, University Hospital of North Norway
[email protected] 3 Motorola Gmbh
[email protected]
Abstract. This paper presents the approach followed to design the Ambient Assisted Living services considered for implementation and validation during the PERSONA project. A methodology based on Goal-Oriented Design has been followed in iterative cycles to incorporate insights from different stakeholders into the selected services, enriching and refining them through the development of mock-ups and interview assessment.
1 Introduction
PERSONA is an EU 6th Framework Programme (FP6) co-funded project which aims at advancing the paradigm of Ambient Intelligence through the harmonization of AAL technologies and concepts for the development of sustainable and affordable solutions for the social inclusion and independent living of Senior Citizens, integrated in a common semantic framework [1]. Its main objective is to develop a scalable open-standard technological platform to build a broad range of AAL services, to demonstrate and test the concept in real-life implementations, assessing their social impact, and to establish the initial business strategy for future deployment of the proposed technologies and services. To meet its objectives, the project is faced with the following challenges:
• To find solutions and develop AAL Services for social inclusion, for support in daily life activities, for early risk detection, for personal protection from health and environmental risks, and for support in mobility.
• To develop a technological platform that allows seamless and natural access to the services indicated above.
• To create psychologically pleasant and easy-to-use integrated solutions.
• To demonstrate that the solutions found are affordable and sustainable for all the actors and stakeholders involved: elderly citizens, welfare systems, and service providers in the AAL market.
The PERSONA technological platform will exploit and incorporate a broad range of relevant technologies which are developed and integrated in the project: an AAL system reference architecture, micro- and nano-electronics, embedded systems, human-machine interfaces, biosensors, energy generation and control technologies, and intelligent software tools for decision support. An important measure of success for the project will come from the outcome of the evaluation and validation in trials in Spain, Italy, and Denmark. In order to ensure success, work has been structured in three lines of activities: AL1 focused on business strategy, AL2 on user experience and AL3 on technology development. The goal of AL2 is to ensure the involvement of end-users and stakeholders in the process of defining, developing and validating AAL Services in such a way that these services provide a total end-user experience from start to end, having them embedded in people's daily context of life in, around, and out of home, supporting people's exploration of their own boundaries in relation to their social needs and their wish for autonomy, security and mobility. The User Experience Design approach has been defined as an iterative process combining trends research with user experience methodologies, with the aim of enabling continuous end-user insights and feedback along the project lifecycle.
2 Methodology
The User Experience Design methodology followed has been based on the Goal-Oriented Design methodology proposed by Alan Cooper in his book “About Face” [2]. In the case of PERSONA, work in UX design has been structured in three sub-phases during the first eighteen months of the project:
• Research & Modelling
• Specification
• Assessment
2.1 Research and Modelling Phase
The Research & Modelling phase objective is to gain general knowledge about the domain, to scope out the kinds of services that are most valuable when it comes to social inclusion, independence, security and mobility, and to analyse the current state and future expectations in these areas. This will be used to create the “context of use” of the defined AAL services by addressing the following aspects:
• Target group specification: a predefined list of attributes of the targeted users. The goal is to study the target group and identify different typologies depending on the context of use we are focusing on, as well as the particularities of the different pilot site countries. It is important to make clear which target group we want to address within each context, because it will determine to a great extent the interaction, design and development of the services to be provided.
• User profiles: a short summary of the different types of personalities that we have identified within our target groups, focusing specifically on the context of everyday life of each personality group. The use of user profiles enables us to look at the needs and values of individual people and to define services that better meet these needs. Differing profiles should be presented separately; user profiles are relevant instantiations of a target group for a specific “context of use”.
The next step is to illustrate each AAL Service by means of the definition of scenarios (the “A day in the life of…” tool). Scenarios are a means to consider the contextual information of a person using a solution. In our case, these solutions are each of the AAL Services that will be provided within a “context of use”. A scenario helps in identifying why the person is using a particular service, in what environment this person is using it and with whom the person is interacting. Scenarios can present the situation on various levels. In this early stage the focus has to be on the storyline and the interaction flow between the user and the system, providing as a result the user experience model.
Fig. 1. Research and modelling phase process
2.2 Enrichment Phase
The Enrichment phase aims at improving the service offer by working out the implications of the user experience model resulting in high-level specifications of service directions. The enrichment cycle delivers answers to the following main questions:
• WHAT services are appropriate to develop?
• WHY are they so important to develop?
• WHO is benefiting from the service?
• WHERE in the context of users' daily life will they access the service?
• HOW will they interact with the service?
The enrichment process consists of three steps:
1. Improve value creation
• Improve the storyline based on end-user feedback
• Specify the drivers and core values behind the service, considering the business aspects
• Review the stakeholders involved in the service by specifying the business role and information flow
• Review system capability considering the feedback of technical experts
2. Subdivide scenarios into functionalities
• Specify the drivers of single functionalities based on user needs
• List the tasks (use cases) needed to enable each functionality
3. Define interaction paradigms for service access and use
• Determine common use cases
• Define key touch points between actor and system
• Specify related assumptions, pre- and post-conditions
Templates for the specification have been defined taking into account the requirements of the subsequent refinement cycle. The next step consists of a technical, business and user experience evaluation. Technical analysis is done in order to determine the completeness of the scenarios already described and their level of feasibility according to the technologies to be developed. Business analysis is done in order to determine the viability of the scenarios and the complexity of developing a business model around each of them. Private and public care sectors have to be considered and the roles need to be defined. User experience analysis is done to determine whether user needs are appropriately addressed and whether the proposed services are desirable from the user perspective as well as from that of the public and private welfare systems. Based on the user experience analysis, scenarios for building mock-ups are selected.
2.3 Refinement Phase
The Refinement phase objective is to reach a solid service specification by refining the services through iterative stakeholder input loops [3]. With the use of mock-ups and interviews, data is collected from the stakeholders and then evaluated. The complete evaluation process is divided into a reporting phase, a conclusion phase and a scoring and results phase. The reporting phase involves capturing input from the representatives of each of the stakeholders. This information is useful for further service refinement and future reference. The conclusion phase involves processing the raw information collected in the reporting phase. Here the essential findings and design implications are gathered. The output from this phase includes the overall summary and conclusions. The scoring and results phase is necessary for consolidating data from the reporting phase and runs in parallel with the other two phases. Here graphs and
charts will be produced after processing the collected data. The output is used to rank the AAL services and is included in the summary of the conclusion phase. For end-users, questionnaires are used to assess, in a standard and formal way, subjective judgments, attitudes, opinions or feelings about the services presented during the interview or focus group. The results allow us to compare services and rank them from the point of view of user interest. Technical and business relevance is also assessed by means of questionnaires that experts use as a guide to analyze the main characteristics of the services and assign a value to them in order to rank the services. At the end of the process we will have three different rankings of services that allow us to prioritize for development the most promising services in terms of user needs, technical challenge and feasibility, and business opportunity and viability. The aim of the interviews is to gather qualitative insights from stakeholder perspectives. These qualitative insights will:
• Support the process of prioritizing the most attractive services, by providing guidance and ensuring comparability (as much as feasible);
• Support the further development of the services, by validating the concepts and identifying improvement points and strengths.
Because one major purpose of the testing at this stage is to compare the proposed services, it is necessary to provide comparable data. This is best achieved if a common approach is taken to the information gathering, that is, all services/scenarios should be tested using the same methods. We have selected the interview as the most appropriate method because:
• It provides rich qualitative data
• It can be carried out by non-specialists, with some training
• It is not too time consuming
Finally, the structure of the interview is important (order of subjects and timing). The general structure of the interview goes from general to very specific issues. To accomplish this, we need to take care of the order of the questions, so that progress goes from general to specific questions. Throughout the interview, the focus of the questions changes from the interviewee to the service/scenario presented. It is important to keep the questions flexible with respect to time, so that it is possible to cover all topics at a basic level (even if somebody gets ill or tired during the interview). A more extended overview of the structure of the interviews is given in the table below.
Table 1. Overview of interview structure (end-user interview)
Section – Subject – Time
Introduction – Setting the scene – 10 min.
Storyboard – Walking through and questions – 20 min.
Additional mock-ups – Walking through and questions – 20 min.
Conclusion – Questions and scoring – 10 min.
Closure – Ending the interview – 5 min.
Data pre-processing – Review the notes – 10 min.
During the interviews the end-users are presented with several mock-ups that have been produced to let them experience the possibilities of the future PERSONA Services. The types of mock-ups that have been used are:
• Mock-up as a PPT file, possibly including films and animations. It presents the overall description of the Scenario/Service and the benefits intended for the end-user. Some examples of the slides are shown in Figure 2.
• Mock-up in tangible format (real mock-up: Wizard of Oz). Some concepts were implemented this way to show specific details of the Scenario/Service in the form of interactive Flash videos or tangible objects.
Fig. 2. Example of Storyboard slides
3 Results
3.1 Research and Modelling Phase
According to the analysis made in advance of the services with high potential impact on the independent living of senior citizens, four categories of AAL Services have been identified:
• Social integration: services in this category aim at alleviating loneliness and isolation by empowering social contact and the sharing of vital experiences.
• Daily activities: services in this category aim at improving independence at home by supporting the realization of daily activities.
• Safety & protection: services in this category aim at creating a safe environment by detecting the occurrence of risk situations and taking care of them.
• Mobility: services in this category aim at supporting life outside the home by providing contextualized information and guidance.
During the first iteration, each PERSONA AAL Service category was considered as a “context of use” and the following aspects were defined and discussed in detail to compose a matrix related to the four spaces where the services could be provided (body, house, neighbourhood and village):
• User needs: to define the main goal of senior citizens in each of the spaces and for all the categories of services.
• Issues that need to be taken into account to cover user needs.
• AAL Services: to formalize how to address the issues to support the end-users in achieving their goals.
Table 2. Example of matrix for Social integration needs, across the PERSONA spaces Home, Neighbourhood and Village
ADL need – Home: stay connected with the world, avoid and reduce isolation. Neighbourhood: enlarge activity radius, variety and frequency of activities; improve respect and acceptance of the elderly. Village: very similar to Neighbourhood.
Issues – Home: living alone; cannot leave the home; no interests and losing motivation to take part. Neighbourhood: old friends are dying; moving to a new environment (e.g. nearer to own children, a residence, sheltered and senior homes) and losing old contacts and the familiar environment; fast changes in the neighbourhood (traffic, known places, events, volume, size, neighbours' interests).
AAL services – Home: virtual meetings and communities; virtual learning and exercise sessions; state-dependent input/output interfaces, communication and information devices. Neighbourhood: new friend finder; managing neighbourhood communities and mixed volunteer and commercial networks; state-dependent resource management; synchronising preferences with event, transport and care businesses.
A preliminary list of scenarios was generated for each of the contexts based on the AAL Services identified in the DoW. Then, during work sessions, these services were analyzed together with the contexts of use defined for each pilot site country, trying to find out whether the needs of the users were covered by these services or new services needed to be defined, as well as whether there were significant differences between the pilot sites that needed to be addressed in the definition of the AAL services. As a result, 16 promising use scenarios were produced as the first description of user requirements, combining our hypotheses based on our experience in the field with real situations of elderly people from the pilot sites, which were discussed and analysed in expert workshops.
3.2 Enrichment Phase
The second iteration took place during the enrichment process of the scenarios formerly defined. It started with a prioritization of the current 16 scenarios according to their technical relevance and end-user interest. Eight scenarios were selected for
Fig. 3. Example of information flow between stakeholders defined for the AAL Service Peer-to-Peer communication
improvement; technical requirements were included in the definition, and functionalities that cover user needs were described together with detailed interaction flows between systems, devices, end-users and other stakeholders. The result has been used as the basis to develop mock-ups to be validated by end-user collectives in order to provide user experience feedback to the AAL service specification.
3.3 Refinement Phase
In the Refinement phase, end-users have been invited to test service concepts and definitions through mock-up evaluation, providing comprehensive user experience feedback to the AAL service specification [4]. Then the User Requirements (UR) extraction process was carried out, transcribing the requirements of real users derived from the previously performed interviews, from the Use Cases described in the scenario enrichment process, and from the technological point of view provided by the partners in charge of developing the services. Additional external sources have been other EU projects and interviews with external experts, and three workshops were conducted to discuss the work done and gain more knowledge from a broad set of stakeholders (health and care professionals, legal experts, politicians, housing companies, etc.). The last step was to perform a comprehensive prioritization of the UR from the business, quality-of-care and technological-innovation points of view, to deliver the final Service Specification and User Requirements to the development team. The mock-up evaluation was performed in the three pilot sites, namely Denmark, Italy and Spain. The recruiting of the volunteers was different in the three countries. In Denmark they used the phone book and dialed numbers asking whether anyone was in the specified age group. In Italy they contacted a social club asking for volunteers; the people interviewed were volunteer members of an association that provides assistance to elderly people. In Spain, volunteers were recruited by the personnel in charge of a centre for elderly people and by trainers in a project dedicated to training elderly people in the use of the Internet.
The interviews were carried out in conference or meeting-like rooms in all three countries. TV screens, projectors and laptops were used to present the scenarios, and in Denmark also the interview guide, a paper form to be filled in by each observer at the interviews. In Spain, printed copies of the scenario presentations were also used. The methods for gathering data varied. In Denmark, collective or focus group interviews were carried out. In Spain, individual interviews were conducted. In Italy, the first set of interviews was carried out collectively and the second individually. The method for registering data also varied. In Denmark, each of the observers was assigned one person to focus on and registered that person's opinions in the interview guide; this resulted in very different ways and formats of registering the answers. In Italy the interviewers took notes. In Spain the interviews were taped. The presentations of the scenarios worked efficiently. The overall impression is that respondents were talkative and easy to communicate with. The report from the Italian pilot site argues that the questionnaires are a bit too long, with the same information being asked for more than once. Finally, a total of 100 interviews were performed (DK 31, IT 29, ES 40) and the results were reported back to the development team with the following documentation:
• One Respondent Report per interview performed. The goal of this report is to capture the interview per respondent. It can help the further development of the service and can serve as a reference document in the future. It contains information within the following categories:
o Final impression / opinion of the concept
o Current situation / vision of future needs
o First impression of the concept
o The impact of this concept
o Willingness to use / pay for this concept
• One Scoring Sheet with data from all the interviews. This is a supportive Excel template that consolidates the concept scoring sheet data from all the interviews and automatically generates graphs.
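Outside the Excel template, the same consolidation could be scripted; the following is a purely illustrative sketch (the column names, the 1–5 scale direction and the "favorable = 1–2" threshold are assumptions, not taken from the project's actual template):

```python
from collections import defaultdict

def favorable_percentage(rows, favorable=frozenset({1, 2})):
    """Percentage of answers rated as favorable per service, over all respondents."""
    counts = defaultdict(lambda: [0, 0])           # service -> [favorable, total]
    for row in rows:
        service, score = row["service"], int(row["score"])
        counts[service][1] += 1
        if score in favorable:
            counts[service][0] += 1
    return {s: 100.0 * fav / total for s, (fav, total) in counts.items()}

# Example with in-memory data; a real run would read the per-interview exports.
rows = [
    {"service": "Peer-to-peer communication", "score": "1"},
    {"service": "Peer-to-peer communication", "score": "3"},
    {"service": "Risk detection", "score": "2"},
    {"service": "Risk detection", "score": "2"},
]
ranking = sorted(favorable_percentage(rows).items(), key=lambda kv: kv[1], reverse=True)
for service, pct in ranking:
    print(f"{service}: {pct:.0f}% favorable")
```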
4 Conclusions
The methodology defined to perform the mock-up assessment presented some problems when put into practice. Different constraints, such as time, resources or cultural differences, led to alternative implementations of the evaluation process in each of the pilot sites. Using different methods to gather information produced a certain level of heterogeneity in the results, so a careful analysis has been needed to compare scenarios. The scenarios and mock-ups presented were well understood and we received very valuable feedback for scenario prioritization and improvement. The overall evaluation and prioritization has been done taking into account the average percentage of favorable answers in the scoring sheets.
The main factors that influenced the valuation of the scenarios were related to the perceived usefulness of the service presented and to privacy and security issues. The scenario perceived as most intrusive received the lowest rating, while at the top of the list were the services that presented the most perceived benefits for maintaining an independent life. The feedback gathered and the conclusions extracted throughout this mock-up assessment process are being included as user requirements in the service definition and functional specifications.
References
1. PERSONA Project Description of Work
2. Cooper, A.: About Face 2.0: The Essentials of Interaction Design
3. IR2.3.1 AAL Services Refinement Methodology – PERSONA Project
4. D2.2.2 Report on User Assessment Outcomes – PERSONA Project
Self Care System to Assess Cardiovascular Diseases at Home Elena Villalba, Ignacio Peinado, and María Teresa Arredondo Technical University of Madrid, ETSIT, Adva. Ciudad Universitaria sn. 28904 Madrid, Spain {evmora,ipeinado,mta}@lst.tfo.upm.es
Abstract. CUORE is a Heart Failure (HF) Disease Assessment System that uses Information Technologies (IT) and portable monitoring devices to assess and manage HF progression. The system evaluates the cardiac condition by integrating patient data from different sources such as a blood pressure cuff or questionnaires. Rather than just evaluating cardiovascular status, the system also aims to motivate patients to take an active role in their health management and to improve their cardiac condition through an active lifestyle. This paper presents the CUORE validation with patients and professionals.
Keywords: personal health systems, mobile applications, goal-oriented design, cardiovascular diseases.
1 Introduction
The proportion of elderly people (aged 65 or over) in the European Union is predicted to rise from 16.4% in 2004 to 29.9% in 2050 [1]. This will increase the number of elderly people suffering from chronic diseases such as cardiovascular disease (e.g. heart failure). Moreover, cardiovascular diseases (CVD) are the leading cause of death in the Western world [2]. Solutions that allow patients to self-manage their chronic condition, such as CUORE, may play a role of major importance in helping society face this coming social context. CUORE provides cardiovascular patients with a usable and fluent channel of communication with health professionals in order to support them in managing comorbidities related to their condition. The proposed solution monitors vital signs at home, such as blood pressure (BP), oxygen saturation, weight and heart rate (HR). Furthermore, it aims to motivate patients and to improve adherence to the technology and to medical protocols. The professionals interact through a ubiquitous connection based on a secure web portal.
2 Methods
CUORE interacts with older people who suffer from a cardiovascular disease; thus, usability is crucial. The methodology focuses on patients [3] and follows Goal-Directed Design [4]. The global process is divided into three phases. The first phase is the conceptualization phase, in which the patients' and functional requirements are elucidated and the
concept is created. Once the concept is defined, the implementation phase starts. The implementation phase is an iterative process, in which increasingly complex implementation prototypes are tested and refined. Once the final version of the system is developed, the validation phase starts. The initial input to the conceptualization phase is the social and medical need for solutions to assist people living with a chronic disease, especially when they have special interaction needs (i.e. the elderly). Initially, similar solutions were studied and the literature about interaction, heart failure and living with a chronic disease was reviewed to understand the context and needs [5]. The modeling focused on users, by means of personas, and on scenarios [4]. Personas and scenarios are used to define the functional and non-functional requirements of the system. Once the requirements are defined, interaction design starts. The interaction design comprises the definition of the look and feel and of all the key and secondary paths. In NUADU, one of the most important inputs when starting to design the CUORE application was our own expertise in the field of cardiovascular applications, and more specifically within the MyHeart project [5]. During the first half of 2005, a conceptual validation of the Heart Failure Management system within the MyHeart project took place in Madrid [6]. The results of this conceptual validation were the starting point of the CUORE system. We confronted stakeholders in order to understand their vision and constraints, through interviews and observations with users to capture their needs, goals and behaviors. Three target groups were involved: heart failure patients, medical specialists, and hospital business managers. In total, 26 people were interviewed: 10 end users, 6 business managers and 10 cardiologists. These interviews were designed by Philips Design Eindhoven in the scope of the MyHeart project [5]. After reviewing all the interviews, the following conclusions were extracted. These conclusions were extrapolated to generic cardiovascular applications, and served as a starting point for designing CUORE:
• Although most patients did not have any experience using electronic devices, the overall attitude towards the system was very positive.
• Most patients addressed the need for mobility; therefore the chosen device should be a mobile phone or a PDA.
• All design has to focus on users when the target group is not professional; acceptability and usability problems appear especially with older people and patients. PDAs are suitable for older people when they are carefully designed. Nevertheless, when technical errors occur, users no longer trust the system, so PDAs are suitable for older people only if they are stable and robust enough.
• The main problem detected in the use of the PDA lay in how to make it a dedicated device.
• Assessment by health professionals increased the patients' feeling of security and confidence.
• Most patients were not very comfortable with the idea of intelligent garments, especially in terms of aesthetics and comfort. Some of them claimed that a fixed routine might be a burden to their lifestyles, and addressed the need for more modularity of the system and more freedom.
• The medication regime of older patients and patients with co-morbidities can be extremely complex, including up to 10 or 15 pills a day. Therefore, medication management is one of the most important issues for these patients.
• Some interviewed experts addressed the importance of education for this kind of patient. Education should cover aspects such as symptom recognition, better understanding of medication and the importance of a healthy lifestyle, among others.
The previous results, together with an exhaustive literature review and the opinion of subject matter experts, were the basis for defining the target population, building personas and scenarios, and defining the system's functionalities.
3 System Description
3.1 System Design
CUORE has three main actors that interact with the system. For each actor a persona (i.e. an archetype that contains the main characteristics of the target group) is created. The patient persona is a 72-year-old man named Carlos Gómez. He is retired and claims a sense of independence. He is conscious of his heart condition. His cardiologist is Dr. Casas, 52 years old, who wants reassurance about his patients and the possibility of early diagnosis of a decompensation. Marta Besteiro is the nurse in charge of the assessment outside the hospital. She is concerned about the trends of all her patients and asks for tools to give them better support and a better understanding of their own situation and health status. We used scenarios [4] to elucidate the requirements and functionalities. The main user requirements are ease of use and high usability. The functionalities for the patient are listed below:
• Medical data gathering from all sensors around the user.
• Treatment consultation for drug dosage and frequency/periodicity.
• User guide for weight, blood pressure and ECG/HR measurement, including help functionalities.
• Questionnaires to report health status to the professionals.
• Medical agenda with reminders, next visits to the hospital, etc.
• Message box for managing all messages received from professionals (e.g. read/unread or priorities).
• Well-being practice and "understanding your heart and its care" functionality.
The professionals access all data through a web portal with login and password. The portal's main functionalities are:
• All patients overview with main status and evolution, highlighting worsening.
• Patient edition.
• Patient record visualization.
• Medical data consultation.
• Treatment consultation and edition (only cardiologist).
• Professionals community – information interchange among professionals, with annotations, “post-it” notes, etc.
• Messages to patients, with filtering.
• Tools for visualizing vital signals (e.g. ECG) or trends.
Besides, the system offers the following common functionalities:
• Data synchronization among home stations and servers.
• Data management for extracting information about the status of all patients. This is considered not only in terms of health status, but also regarding patient behavior and the technical status of the sensors and devices.
The interaction design culminates with workflows for all functionalities [4][6][7].
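As a minimal illustration of the data synchronization between home stations and servers mentioned above (the real CUORE implementation is not reproduced here; the queueing scheme, names and upload endpoint are assumptions), a home station could buffer measurements locally and push them when the server is reachable:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Measurement:
    patient_id: str
    kind: str          # e.g. "weight", "blood_pressure"
    value: str
    taken_at: str      # ISO-8601 timestamp

class HomeStation:
    """Buffers measurements locally and pushes them when the server is reachable."""
    def __init__(self, send):
        self._pending: list[Measurement] = []
        self._send = send                      # callable that uploads one JSON payload

    def record(self, m: Measurement) -> None:
        self._pending.append(m)                # always store locally first

    def synchronize(self) -> int:
        sent = 0
        while self._pending:
            payload = json.dumps(asdict(self._pending[0]))
            if not self._send(payload):        # keep the measurement if the upload fails
                break
            self._pending.pop(0)
            sent += 1
        return sent

# Example with a fake uploader; a real one would POST to the secure server.
station = HomeStation(send=lambda payload: (print("upload:", payload) or True))
station.record(Measurement("p01", "weight", "78.4 kg", "2009-01-15T08:30:00"))
print("synced", station.synchronize(), "measurements")
```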
3.2 System Implementation
CUORE is divided into three main areas: the user interaction system, the common platform which contains all services, and the professional interaction. Fig. 1 sketches the CUORE system.
Fig. 1. The CUORE system consists of the User Interaction and the Professional Interaction. On the left side, the User Interaction comprises the necessary sensors and a personal assistant device (i.e. a PDA) used by the patients to interact with the global CUORE system. The right side represents the Professional Interaction through a web portal that ensures ubiquity.
The user platform is a modular architecture running on Microsoft's .NET Framework [9][10]. The sensors and electronics that monitor patients in their daily routines are: 1) a blood pressure cuff, 2) a weight scale, 3) an ECG/HR monitor, and 4) an oxygen saturation monitor. All devices except the weight scale are integrated in a monitoring device called MSV, manufactured by TSB Solutions in Valencia, Spain. All measured data are sent to the user interaction device via a Bluetooth serial port. The user interaction device, the PDA, guides, educates, motivates and gives all necessary feedback to the patient. Fig. 2 shows examples of the application's interfaces. The common platform ensures connectivity and security. This platform enables communication and data synchronization among all modules in the global distributed system. Professionals interact in a ubiquitous way through a secure web portal to follow all patients' trends and to manage their treatments. The professional portal is implemented in HTML and CSS following Web 2.0 principles [11]. Fig. 3 shows two examples.
252
E. Villalba, I. Peinado, and M.T. Arredondo
Fig. 2. Examples of the patient’s PDA. First screen is the Home one. Secondly a example of the medication area is shown. Afterwards, the measurements’ page and finally some messages are shown.
Fig. 3. Two views of the professional portal. On the left side, the patient summary is shown. An example of the creation of a new treatment is shown on the right side of the figure.
5 Validation
The validation first included a heuristic analysis made by experts and “think aloud” sessions with potential users and actual users of similar applications. During the “think aloud” sessions, users are asked to speak their thoughts aloud while performing a task or navigating through the application [12]. These sessions were used to gather qualitative results regarding the patients' expectations, needs and contextual conditions. In November and December 2008, the final version of CUORE was validated in collaboration with the Hospital Clínico San Carlos de Madrid. The validation comprised two phases: first, the system was validated with patients; then, the system was validated with health professionals. In both cases, the validation methodology was a guided personal interview, conducted to gather qualitative insights and constraints as well as quantitative results. All interviews had a similar structure, comprising five sections: introduction, storyboard, tangible, conclusion, and closure. During the introduction, the interviewer explained the purpose of the interview, stressing its non-profit nature, and some general questions about the user's position towards new technologies and health care were posed. Within the storyboard section,
the interviewer presented the whole system and asked the interviewees open questions about their insights and doubts. Once the basic system was grasped, the tangible section started, in which the interviewee was asked to perform a set of predefined tasks with as little intervention from the interviewers as possible. The tangible phase was different for patients and professionals. Patients were asked to use the PDA to perform a set of measurements and to freely navigate through the application. Professionals were shown different parts of the system, depending on their profile and expected role. For instance, cardiologists and hospital staff were only asked to use the web portal, while social workers were asked to use the PDA. During the conclusion section, the interviewer asked users for their overall impression, and patients were asked to fill in two scoring sheets with closed questions rated from 1 to 5, which provide quantitative results. The first questionnaire comprises 10 questions regarding the user's attitude towards the system and its acceptance. The second is a user experience questionnaire that aims to study the users' acceptance of the overall concept and of each part. One of the partial parameters of the user experience evaluation is the comparison of the patients' insights before and after using the system, to measure whether the system satisfies their expectations. Thus, some of the patients took the system home for a couple of days, where they could perform all their measurements, receive messages and educational tips from the system, and check their medication schedule. In the closure section, the interviewer thanked the participant and took some last notes about the experience during the whole interview. The time for each interview varied from 60 minutes for the professionals to about 120 minutes for the end users. During the patients' validation, 10 actual and potential patients were interviewed. Seven of the patients were recruited at the Hospital Clínico San Carlos after visiting their heart failure specialist; three of them were recruited outside the hospital. The selection criteria stated that the interviewees should have been diagnosed with heart failure or should have suffered a cardiovascular accident, i.e. stroke or myocardial infarction, in the last two years. No distinction was made in terms of gender or age, with ages ranging from 38 to 74 years old. At the end of the first encounter, patients were asked to fill in two scoring sheets. The first questionnaire comprises 10 questions and focuses on the overall impression of the system, the willingness to use the system and the patients' expectations and concerns. The next list shows the 10 questions:
1. I like this concept.
2. I will use this concept.
3. This concept will reduce my quality of life.
4. This concept will motivate me to a healthier lifestyle.
5. This concept will make me feel neglected.
6. This concept will be a burden to my lifestyle.
7. This concept will help me stay in control of my health.
8. I will never trust this concept to look after my health.
9. This concept will invade my privacy.
10. This concept will offer me a pleasant experience.
Fig. 4. Results of the scoring sheet given to the 10 patients interviewed to validate the concept and system idea
Fig. 4 shows the results of the initial questionnaire. The results are grouped into three clusters: positive answers (1-2) are displayed on the right side of the graph (in green), neutral answers (3) in the middle, and negative answers (4-5) on the left side (in red). In general, the first impression of the system was largely positive. Most of the patients liked the concept and admitted they would use it, under the condition that it would not be too expensive. The majority of negative answers came from one of the older patients, who recognized herself as reluctant to any kind of technology. Regarding question 3, some patients – mostly younger than 55 – considered it would be a problem for them to integrate the system into their daily lives, as some of them were still working. Regarding question 4, some patients – especially the younger ones – valued the system more as a tool for self-management than as a tool for remote assessment. Moreover, most of the patients stated that an alarm should be included for medication management. Regarding question 10, most patients valued the usability and ease of use of the system, but some of them stated they do not want to be constantly reminded of their sickness. The second questionnaire was a user experience questionnaire. This questionnaire comprises open questions regarding the patients' willingness to use the system, their impressions after using the system and their expectations. It also comprised scoring questionnaires to evaluate the patients' general impressions about the system and its parts. In order to gather quantitative data, patients are asked to evaluate a set of adjectives describing their experience when using the system (e.g. interesting vs. not interesting, scary vs. not scary, pretty vs. ugly, fun vs. boring, simple vs. complicated and easy vs. difficult) on a range from 1 (totally agree) to 5 (totally disagree). The open questions addressed the whole system and all individual devices, and aim to gather information on the strengths and weaknesses of the system and the patient's willingness to use the system on a daily basis.
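The grouping of 1-5 ratings into the three clusters can be expressed compactly; the following sketch is illustrative only (the paper does not describe how the analysis was scripted, and the sample ratings are invented):

```python
from collections import Counter

def cluster_answers(scores: list[int]) -> dict[str, float]:
    """Group 1-5 ratings into the three clusters used in the analysis and return percentages."""
    def bucket(score: int) -> str:
        if score <= 2:
            return "positive"
        if score == 3:
            return "neutral"
        return "negative"
    counts = Counter(bucket(s) for s in scores)
    total = len(scores)
    return {name: 100.0 * counts[name] / total for name in ("positive", "neutral", "negative")}

# Hypothetical ratings from 10 patients for one of the questions
print(cluster_answers([1, 2, 1, 3, 2, 1, 4, 2, 1, 2]))
```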
The answers to the open questions varied depending on the patient's age. Most elderly patients (>65) identified the assessment by health professionals as CUORE's main strength. Moreover, most patients stated that being remotely monitored increased their feeling of security and comfort. Younger patients, on the other hand, valued the system as a tool for self-managing their condition, enhancing their motivation through self-assessment of vital signs such as weight and blood pressure. CUORE's solution for medication management was highly appreciated by both groups, as most patients (7 out of 10) considered medication management one of their main problems. Nevertheless, while most elderly patients considered that the medication management area would be useful for their informal caregivers and professional helpers, younger patients valued the use of the system for automatic reminders and education on the medicines. Education on symptoms and medication was highly valued by both groups, while younger patients had reservations about education on lifestyle, as they considered it might be intrusive and annoying. Education should be delivered by prompting messages and should not be compulsory. Prompted advice should give access to additional material in standard formats such as paper or video. The general impression of the system was positive. The only hesitations came from elderly patients who described themselves as reluctant to technology. Nevertheless, most of the patients who had admitted to being scared of new technologies later recognized that, after the introductory explanations, it had been easy for them to interact with the system. The MSV-404 was considered scary and difficult to use by most of the patients. Most patients already have measuring devices at home, such as weight scales or blood pressure cuffs, and considered it preferable to continue using devices they already know. Moreover, some of them stated the need to add new sensors to the system (e.g. glucometers) for patients with co-morbidities. Four patients took the system home and used it for one or two days, depending on the availability of the MSV. After this time, the interviewers went to the patient's home, where the patient was asked to fill in the same questionnaire they had filled in during the first encounter. The next figures show these four patients' impressions about the PDA before and after taking the system home and using it in a real environment. The results show that three of the patients did not significantly change their minds after using the system in a real environment. The only patient who changed his mind experienced connection problems with the PDA at the end of the first day, which made his impressions much worse after taking the system home. It is worth noting that health care systems have to be very carefully designed and implemented in order to guarantee the patient's adherence and confidence. The interviews with professionals aim to explore business opportunities, to identify all actors involved in the health care process of chronic patients and to identify the barriers and challenges that arise when designing a holistic approach to treatments and health care. The selected professionals include cardiologists, electrophysiologists, general practitioners, nurses, pharmacists and social workers. The format of the interviews is similar to the format that was used with patients.
During the storyboard phase, all professionals were shown the whole system, including the patient side. The tangible phase was different for each professional, depending on their profile and
expected role. Thus, hospital professionals were asked to interact with the web portal, while pharmacists and social workers were asked to interact with the PDA and the web portal. After that, the professionals were asked some open questions and were then asked to fill in a scoring-sheet questionnaire in order to gather quantitative data on their impressions and insights about the system. The scoring-sheet questionnaire comprises the following 10 questions:
1. This concept is a good solution for this health condition.
2. This concept will improve the quality of health.
3. This concept will reduce the effectiveness of care.
4. This concept will damage my relationship with the patient/client.
5. This concept will improve communication in the professional team.
6. This concept will increase my workload.
7. This concept will complicate my way of working.
8. This concept will provide me with reliable information.
9. I think health professionals would not easily adopt this concept.
10. I will recommend/prescribe this concept.
The results showed that the response towards the system was mostly positive. Most professionals stressed the importance of having quick access to all the information about the evolution of the patients' vital signs between visits and about their treatment. The web portal was highly appreciated, but most hospital professionals stated the need to have all information regarding the patient's treatment and health record displayed on a single screen. The medication section was also highly appreciated. Nevertheless, professionals of all profiles stated their concern regarding the difficulty of introducing a system like this into the current health care system. 9 out of 10 professionals stated that they would recommend or prescribe this system to their patients or clients, as they considered it useful for enhancing the patient's motivation and adherence to the medication regimen. Nevertheless, cardiologists had doubts about the reliability of the gathered data.
6 Conclusions The final results were really promising. Patients and professionals who took part in the process showed high interest in the system. The process taught us how to continue our research in finding the best way to implement solutions for personal health systems. A detailed analysis of how to enhance the individual experience of incorporating this system into the daily routine on a long-term basis requires further study. A study is under way on behavioral components towards e-health, aiming to create a communication framework and to increase patients' interest in such systems. A future framework will consider the analysis of different variables to ensure motivation during long-term use. Likewise, we must evaluate the long-term impact on the quality of life of heart patients and on their health status. In the coming years we will face a social change in which these systems may play an important role in chronic patients' daily lives, supporting a quality of life that helps to prevent and treat chronic diseases. Besides, from the economic point of view, we also need new ways of coping with the growing number of people demanding care.
Acknowledgments. This work has succeeded thanks to the close collaboration with Hospital San Carlos of Madrid, Spain and the ITACA Institute (Valencia, Spain). CUORE is an integrated system of the NUADU project (ITEA 05003).
References
1. Eurostat: Population projections 2004-2050 (2005)
2. Mackay, J., Mensah, G. (eds.): World Health Organization. The Atlas of Heart Disease and Stroke (2004)
3. ISTAG Report on Experience & Application Research. Involving Users in the Development of Ambient Intelligence. European Communities (2004) ISBN 92-894-8136-3
4. Cooper, A.: About Face 3.0: The Essentials of Interaction Design. Wiley Publishing, Inc., Chichester (2007)
5. Villalba, E., Peinado, I., Arredondo, M.T.: User interaction design for a wearable and IT based heart failure system. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 1230–1239. Springer, Heidelberg (2007)
6. Villalba, E., Ottaviano, M., Salvi, D., Peinado, I., Arredondo, M.T.: Iterative user interaction design for wearable and mobile solutions to assess cardiovascular chronic diseases. In: Advances in Human-Computer Interaction, pp. 335–354. Ioannis Pavlidis, ISBN 978-953-7619-19-0
7. Moggridge, B.: Designing Interactions. The MIT Press, Cambridge (2007)
8. Zwick, C., Schmitz, B., Kuehl, K.: Designing for Small Screens. AVA Publishing SA (2005) ISBN 2-940373-07-8
9. Rubin, E., Yates, R.: Microsoft .NET Compact Framework. Sams Publishing (2003)
10. Schmidt, M.: Microsoft Visual C# .NET 2003. Sams Publishing, USA (2004)
11. Haine, P.: HTML Mastery: Semantics, Standards, and Styling. Friends of ED (2007)
12. Dumas, J.S., Redish, J.C.: A Practical Guide to Usability Testing. Intellect Books (1999) ISBN 1-84150-020-8
Ambient Intelligence and Knowledge Processing in Distributed Autonomous AAL-Components Ralph Welge, Helmut Faasch, and Eckhard C. Bollow Institut für verteilte autonome Systeme und Technologien (VauST), Leuphana University of Lueneburg, Volgershall 1, 21339 Lueneburg, Germany {faasch,welge,bollow}@leuphana.de
Abstract. With advances in the integration, size and performance of computers, we observe a rapid spread of computational intelligence into all areas of our daily life. We show how ad hoc networks can be built with our middleware so that emergent intelligence arises in the behavior of the complete network. Our approach demonstrates the application of AAL components (components for ambient assisted living (AAL)). Questions of sustainable development also arise here: increasing consumption of resources and energy in the production phase, and shortened periods of use. Ambient computing and ambient intelligence show a high potential to change society's treatment of resources and energy. The interaction with "intelligent" things will change our conception of production and consumption. Keywords: Autonomous Systems, AAL (Ambient Assisted Living), Knowledge Representation, Services for Human-Computer Interfaces, Ad-hoc Network, Semantic Method Invocation.
1 Introduction Ambient Intelligent Networks are built using different types of so-called Smart Nodes. The Smart Nodes are implemented in different categories. First, there are the Mobile Smart Nodes (MSN), a kind of PDA (Personal Digital Assistant) carried by human beings. In addition, there are TESNs (Thin Embedded Smart Nodes) and FESNs (Full Embedded Smart Nodes). The Smart Nodes act as intelligent clients which communicate with each other in ad hoc networks, offering different types of services to one another. By operating independently and together at the same time, emergence effects arise: together, the Smart Nodes supply intelligent behavior, and human beings are assisted and supported by the network. Application fields exist for AAL (ambient assisted living), for energy management and, basically, for everything [1], [2], [3]. Embedding Smart Nodes into everyday objects leads to an Internet of Things [4]. The support of human acting takes place without overruling the human will.
2 Prerequisites and Methods In the following, we describe the prerequisites and methods for establishing the intelligent ambient network.
2.1 Ambient Intelligence Platform To implement an Ambient Intelligence system, we need appropriate hardware and software platforms to establish a modular base infrastructure for a highly adaptable network. Our approach provides a transparent interconnection between users and ubiquitous knowledge [5]. Based on ambient networks consisting of so-called Smart Nodes, hybrid objects as well as remote services are offered through classical Internet services and channeled into the ambient network. Figure 1 gives an overview of the communication layers. The standard used for communication between Smart Nodes is IEEE 802.15.4. This standard defines a Personal Area Network (PAN) and has been developed with a special focus on low energy consumption. While IEEE 802.15.4 provides the physical and data link layers, we use the standard IP protocol as the network layer. This allows an easy integration of Ambient Intelligence networks into existing IT networks. In addition, it enables the Ambient Network to incorporate services from the IT backend, which may not be provided completely by Smart Nodes alone. Moreover, it allows a classic IT infrastructure to participate in the ambient network. The Smart Nodes form a special network – a MANET (Mobile Ad hoc Network) – using the Ad hoc On-Demand Distance Vector (AODV, RFC 3561) routing protocol [6]. A proprietary convergence layer is responsible for the interface between the IEEE 802.15.4 data link layer and the IP network layer. This includes address mapping and mechanisms for joining and leaving the network. The upper levels of communication are implemented using XML-RPC (Extensible Markup Language – Remote Procedure Call), which is used for calling services and for the exchange of data. At the top level is the Semantic Decision Layer, which uses OWL (Web Ontology Language) to represent human preferences, abilities of individual nodes, sensor attribute-value pairs, etc. All of these are used to draw conclusions when matching human preferences with the ambient intelligence services [7].
Fig. 1. Network Stack
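To make the role of the upper communication layers (Figure 1) more concrete, the following Java sketch assembles a minimal XML-RPC request body of the kind that could be exchanged between Smart Nodes when a service is called. The method name, parameter and the lamp example are invented for illustration; the actual service vocabulary of the middleware is not specified here.

// Hypothetical illustration: a minimal XML-RPC request body of the kind used
// between Smart Nodes for service invocation (method and parameter names invented).
public class XmlRpcRequestSketch {
    static String buildRequest(String methodName, int value) {
        return "<?xml version=\"1.0\"?>\n"
             + "<methodCall>\n"
             + "  <methodName>" + methodName + "</methodName>\n"
             + "  <params>\n"
             + "    <param><value><int>" + value + "</int></value></param>\n"
             + "  </params>\n"
             + "</methodCall>";
    }

    public static void main(String[] args) {
        // e.g. ask a node attached to a lamp to set its dim level to 80%
        System.out.println(buildRequest("lamp.setDimLevel", 80));
    }
}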
2.2 Network Structure and Routing The IP overlay network is implemented in a way that reflects the topology of the underlying IEEE 802.15.4 network. Due to the limited range of a single node, a routing protocol has to be applied to transfer data over multiple hops from source to destination. Because of the dynamic nature of the Ambient Intelligence network (most of the nodes may enter or leave the network at any time and some nodes are mobile), static routes are not feasible. We use the Ad hoc On-Demand Distance Vector (AODV, RFC 3561) protocol for routing. AODV is a reactive distance vector routing protocol developed for mobile, dynamic ad hoc networks. Routes are discovered on demand by broadcasting RouteQueries through the network using the expanding ring search algorithm. The destination node, or an intermediate node which currently knows a route to the destination, replies with a RouteReply describing the discovered route. As in other distance vector protocols, a network node basically stores only information about the destination, the number of hops to the destination and the next hop. No complete route information is stored, which conserves memory in contrast to link-state protocols. In an Ambient Intelligence network it is not necessary for all nodes to provide routing capabilities. In our design only some of the nodes, the Full Embedded Smart Nodes (FESN), provide routing capabilities. FESNs are stationary nodes forming an environmental network. In the deployment phase the network can be planned in such a way that, irrespective of the current mobile users' positions, every part of the Ambient Intelligence network is always reachable. The FESNs form the backbone of the Ambient Intelligence network and communicate using AODV. Another kind of node, embedded into sensors and actuators – the so-called Thin Embedded Smart Nodes (TESN) – has no routing capabilities. Instead, TESNs associate with FESNs using IEEE 802.15.4 and communicate only through their FESN. The same is true for Mobile Smart Nodes (MSN). This allows the TESNs to be technically simpler and thus less expensive than FESNs. From an IEEE 802.15.4 point of view, the FESNs are the PAN coordinators for the TESNs and MSNs.
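A minimal sketch of the per-destination routing state described above is given below. It keeps only destination, next hop and hop count, as in the description; real AODV additionally maintains destination sequence numbers and route lifetimes, which are omitted here, and the node identifiers are invented.

import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the per-destination state an AODV node keeps:
// only destination, next hop and hop count are stored, never complete routes.
// Real AODV additionally tracks destination sequence numbers and route lifetimes.
public class AodvRouteTableSketch {
    record Route(String destination, String nextHop, int hopCount) {}

    private final Map<String, Route> routes = new HashMap<>();

    // Called when a RouteReply (or an overheard message) reveals a new or shorter route.
    void update(String destination, String nextHop, int hopCount) {
        Route known = routes.get(destination);
        if (known == null || hopCount < known.hopCount()) {
            routes.put(destination, new Route(destination, nextHop, hopCount));
        }
    }

    // Returns the next hop for a destination, or null if a route discovery is needed.
    String nextHopFor(String destination) {
        Route r = routes.get(destination);
        return r == null ? null : r.nextHop();
    }

    public static void main(String[] args) {
        AodvRouteTableSketch table = new AodvRouteTableSketch();
        table.update("FESN-7", "FESN-2", 3);
        table.update("FESN-7", "FESN-4", 2);   // shorter route replaces the old entry
        System.out.println(table.nextHopFor("FESN-7")); // prints FESN-4
    }
}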
Fig. 2. Living Lab
A fourth kind of Smart Node is the Convergence Node. This is a special kind of FESN which, in addition to AODV capabilities, has a second interface to connect to classic IT networks, i.e. it serves as a bridge between IP-based Ethernet networks and the Ambient Intelligence network (see Figure 2). 2.3 OWL-Discovery Middleware Each room has a coordinator, the FESN (Full Embedded Smart Node), which collects the self-descriptions of the energy loads (lamps, heaters, etc.) sent and stored by the TESNs connected to the energy consumers. The self-descriptions are modeled in OWL and are transferred via HTTP. The FESN receives and keeps these descriptions.
Fig. 3. Network Structure
Additionally, the FESN holds abstract room information, for example the room's location and volume, as well as a context description identifying the room and the FESN's IP address. Figure 3 illustrates one possible scenario. Figure 4 shows a typical sequence of actions: When a person enters the building, the person's MSN (Mobile Smart Node) connects to a reachable FESN (1). After the MSN has received an IP address from the FESN, the MSN transfers a "Context Request" to the FESN. The context specifies the current interest of the person, e.g. the person's office to be controlled. The FESN answers with a "Context Response" containing the IP address of the FESN associated with the context (2). These requests and responses are encoded in OWL and are transmitted via UDP. Next, the MSN sends its complete preference profile, modeled in OWL, to the FESN associated with the desired context (3). This FESN then reasons over the user's preferences together with the accumulated self-descriptions of all TESNs located in the room (4). After completion, the FESN controls the electrical consumers (e.g. light and heater) by sending an XML-RPC call via HTTP to the TESNs according to the result of the inference process (5, 6).
Fig. 4. OWL-Discovery Sequence
2.4 OWL-Modelling Modelling the Ambient Intelligence system using OWL, and preliminary tests with a prototype implementation, have shown that expressing simple properties as well as semantic preferences in a formal language like OWL offers clear advantages over plain classical building automation systems when dealing with human-centric systems. While traditional building automation systems can control some lights and manage the air-conditioning for an entire multi-storey building, their possibilities are quickly exhausted when dealing with individual humans in the sense of an Ambient Intelligence system. The basic difference is that traditional building automation systems may include control loops with feedback values from sensors, but the individual human is not part of this loop. Such systems do not factor actual human characteristics and actual human behaviour into their design. In our approach the users carry their own preferences within their MSNs. These preferences are more complex than "actuator x=off, actuator y=on", which would be adequate within the domain of a building automation system. Instead, preferences are of the form "I want a temperature of 20°C when working". To respond appropriately the system must have an "idea" about the context "working": Where does the user work? When does the user work? And finally: How do I get the temperature to 20°C? As a final consequence, the system maps the user's desire to simple instructions a classical automation system might carry out, by specifying a set point for a specific controller. Formalizing the properties, facts, and rules which enable the system to come to a decision resulting in appropriate actions is possible with OWL. The expressiveness of OWL surpasses the simple example above, but things easily get more complicated. In an AAEM system there might be energy constraints: What to do if energy is scarce
and the user wants a high temperature or some other energy-intensive service? The system might override the preferred temperature and deliver some degrees less. For other services, cutting the service level might not be possible; for example, it is not possible to supply "50% of a laser printer". This means that detailed information about energy sinks and their utility for the user is needed – and therefore must be formalized. A final example introduces a scenario where a learning capability of the system is required: If a user leaves his current "work context", heated to 20°C, the system has to decide whether to keep the temperature at 20°C or to turn the heating down for the sake of energy savings. The system might learn that, if the user leaves the room and enters the canteen in a certain timeframe, it is not reasonable to turn off the heating, while on the other hand, if he leaves his current context on Wednesdays to reappear in colleague X's office, this means that he most probably will not return for the next three hours, so that in the meantime a lowering of the temperature is advisable. Rules like this, which can either be programmed into the system or learned during usage, have to be encoded in a formalized, machine-readable way using a framework like OWL. These examples show that, for human-centric Ambient Intelligence systems, much more structured information about devices and the environment is needed than in a traditional building automation system.
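As a plain-Java illustration of the kind of preference handling discussed above, the sketch below couples a context with a desired temperature set point and shows how an assumed energy constraint could lower the delivered value. In the actual system such preferences are encoded in OWL; the classes, the two-degree reduction and the fallback value are illustrative assumptions.

import java.util.List;

// Illustration only: a plain-Java stand-in for preferences that the system encodes in OWL.
// A preference couples a context (e.g. "working") with a desired set point, and the
// decision step may override it under an assumed energy constraint.
public class PreferenceMatchingSketch {
    record Preference(String context, double targetTemperatureCelsius) {}

    static double decideSetPoint(List<Preference> prefs, String currentContext, boolean energyScarce) {
        double fallback = 17.0; // assumed energy-saving default when no preference matches
        for (Preference p : prefs) {
            if (p.context().equals(currentContext)) {
                // Under scarcity the system may deliver "some degrees less", as discussed above.
                return energyScarce ? p.targetTemperatureCelsius() - 2.0 : p.targetTemperatureCelsius();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Preference> prefs = List.of(new Preference("working", 20.0));
        System.out.println(decideSetPoint(prefs, "working", false)); // 20.0
        System.out.println(decideSetPoint(prefs, "working", true));  // 18.0
    }
}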
3 Semantic Ambient Network Using SMI – Semantic Method Invocation We address the fragmented market of low-power, low-cost, low-data-rate embedded networking devices. The architecture is designed for the seamless integration of mobile users into building environments consisting of wireless sensors and actuators as well as things of daily use. The current embedded implementation is based on NXP's LPC21xx (ARM7) microprocessor architecture and TI's IEEE 802.15.4-compliant CC2420 transceiver for wireless communications [8]. Sun's Java platform is supported for IT developments. 3.1 Introducing Semantics into Ambient Networks In addition to typical ad hoc network tasks there are further challenges: while discovering all available network nodes, a user typically requests room functions from embedded devices, even if he has never encountered them before. He is not aware of the nodes or of the location of services; only his context is of interest. Concepts of services using hard-coded IDs and textual descriptions fail here. In our project we adopt the Semantic Web ideas propagated by the World Wide Web Consortium (W3C) to address the problem of personal mobile devices understanding local environments. The Semantic Web project focuses on information retrieval using infrastructure networks processed by web agents rather than information retrieval using Personal Area Networks processed by mobile nodes, but there is nothing preventing the transfer of these ideas. W3C's Semantic Web idea. Today, web contents are formatted for human readers rather than for machines in the form of software agents. Common techniques for information
processing are based on keywords. These techniques do a reasonably good job for human web users but are not a feasible solution for users exploring the local environment, e.g. a room. Search results are large in quantity as a result of low precision, which depends highly on the vocabulary used – the main problem of today's WWW. The method of presenting result lists of web pages can be characterized as keyword location finding rather than information retrieval. With the Semantic Web idea, the W3C introduces the next-generation web. The meaning of a web site's content plays a larger role than content management solutions – the main challenge of the current web generation. The next-generation web is characterized by knowledge retrieval and processing based on formal languages describing resources, called objects. Knowledge should be organized as concepts to be retrieved and processed unambiguously by software agents. Keyword-based search algorithms identifying only words should be replaced by semantic interpretation of formal descriptions. Meta-data, meaning "data describing data", play a key role. Metadata describes the affiliation of data contents to formal characteristics, introducing semantic aspects. The Semantic Web community introduces the term ontology. The term, originating from philosophy, has been defined by T.R. Gruber: "An ontology is an explicit and formal specification of a conceptualization". It provides for a shared understanding of a domain; for example, an ontology prevents two applications from using one term with different meanings in the same semantic context. The results will be precise navigation through the Internet and search engines with high-precision information retrieval. The evolution of Internet technologies dealing with knowledge management is a continuous process in terms of layers of a growing protocol suite. The following have been standardized so far:
• XML is the language for developing structured data contents with a user-defined vocabulary. XML does not define a way to express semantics. XML is suitable for data exchange at the document level. Using XML Schema, the structure of XML documents can be restricted.
• Resource Description Framework (RDF) may be considered as a resource description data model. RDF is defined in terms of an XML-based syntax. It enables the expression of statements describing application-specific objects (resources) and the relations between them [9].
• The description language RDF Schema (RDFS) offers language components for the hierarchical organization of objects. It introduces classes and subclasses, properties and sub-properties, ranges, domains, restrictions and, last but not least, relationships. It can be used as a simple language for writing ontologies representing knowledge.
• OWL (Web Ontology Language) is used to interpret retrieved information. OWL extends RDFS, offering a richer vocabulary adding disjointness of classes, cardinality and other useful features for knowledge representation. Furthermore, it restricts RDFS so as to be decidable, thus enabling suitable support for reasoning. It is the current state of W3C's Web Ontology Language.
Logic is an essential prerequisite for the definition of declarative knowledge, depending on the respective application. Logic is represented by a formal language in terms of sentences expressing declarative knowledge.
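The layering above ultimately rests on RDF's statement model, in which knowledge is reduced to subject-predicate-object triples. The short sketch below shows that shape for a hypothetical lamp self-description; the vocabulary is invented and is not the ontology used in the project.

// The RDF data model underlying RDFS and OWL reduces knowledge to
// subject-predicate-object statements. The vocabulary below is invented
// for illustration and is not the project's actual ontology.
public class RdfTripleSketch {
    record Triple(String subject, String predicate, String object) {}

    public static void main(String[] args) {
        Triple[] selfDescription = {
            new Triple("urn:tesn:lamp-3", "rdf:type",         "ex:DimmableLamp"),
            new Triple("urn:tesn:lamp-3", "ex:locatedInRoom", "ex:Office-112"),
            new Triple("urn:tesn:lamp-3", "ex:maxPowerWatts", "60")
        };
        for (Triple t : selfDescription) {
            System.out.println(t.subject() + " " + t.predicate() + " " + t.object() + " .");
        }
    }
}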
3.2 SMI – Semantic Method Invocation: A Semantic Ambient Network Approach Using standard tools, RDF Schema documents can be created describing the semantics of objects and methods. At run time, each networking node embedded into the environment represents an object containing an RDF Schema self-description. Using the Discovery Service of the L3-NET middleware (Low Power + Low Cost + Low Data Rate Network), a mobile device can retrieve the RDF Schema documents from the SMI servers of available embedded devices and thus understand the syntax and semantics of the exported software methods [10]. This enables the mobile user to find methods suitable for his needs and to learn how to operate those methods to fulfil them. This process is called "matching". After successfully discovering suitable methods from the objects embedded into the environment, the methods of the remote SMI servers have to be bound to a stub processed by the mobile device. Any function of any embedded SMI server can be bound to the stub, forming one temporary mobile client class which is instantiated immediately. Through this mechanism, embedded node software methods, including device driver services, can be accessed using easy-to-use method calls on Java objects processed by standard IT systems, while at the same time hiding all details of the distributed system, such as message-passing-based communication and remote execution, from the user. This allows developers without an embedded background to develop applications for distributed systems. Development of a corresponding Remote Procedure Call mechanism for non-Java mobile devices is in progress. We developed a suitable ontology engine for embedded devices for this task. It allows the user to control the whole room using his mobile device in an implicitly driven manner. This means that, after announcing his identity and the preferences stored in terms of an RDF Schema document on the mobile device, the device automatically finds appropriate methods, freeing the user from having to care about this. Of course, the user is able to specialize or generalize this proposal to fully meet his needs. In contrast to common ontological systems, our SMI approach has some special requirements and constraints. As we are using embedded processors – NXP's ARM7-based LPC2148 – to support "small, deeply embedded things", it has severe resource restrictions concerning both memory and processing power. Furthermore, we have a highly dynamic system, which is also uncommon for traditional systems. And last but not least, we need a reliable system which has to control the extension of its knowledge base and ensure its correctness. A current challenge is to enable distributed ontologies. OWL, which was tested first, does not have suitable mechanisms for this and thus had to be carefully extended in order to allow ontological communication in a consistent manner [11]. Usually, ontological systems tend to interpret contradicting information as an extension of the existing ontology. In our system we have to make sure the ontology is only extended in a wanted and correct manner, and we therefore had to develop a system that treats contradicting information as wrong.
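The binding of discovered remote methods to a temporary client stub can be illustrated with Java's standard dynamic proxy mechanism. In the sketch below, a local interface call is intercepted and translated into what would be a remote invocation; the RoomLight interface, the node address and the marshalling step are placeholders, since the actual SMI stub generation from RDF Schema descriptions is not reproduced here.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Illustration of the stub idea with Java's standard dynamic proxies:
// a local interface call is intercepted and translated into a remote invocation.
// The RoomLight interface and the "send to Smart Node" step are placeholders; the
// actual SMI binding is generated from the discovered RDF Schema descriptions.
public class SmiStubSketch {
    interface RoomLight {          // hypothetical local view of a discovered remote method
        void setBrightness(int percent);
    }

    static RoomLight bind(String nodeAddress) {
        InvocationHandler handler = (proxy, method, args) -> {
            // In the real middleware this would marshal the call (e.g. via XML-RPC)
            // and send it to the embedded SMI server at nodeAddress.
            System.out.println("remote call to " + nodeAddress + ": "
                    + method.getName() + "(" + args[0] + ")");
            return null;
        };
        return (RoomLight) Proxy.newProxyInstance(
                RoomLight.class.getClassLoader(), new Class<?>[]{RoomLight.class}, handler);
    }

    public static void main(String[] args) {
        RoomLight light = bind("fesn-office-112");
        light.setBrightness(70); // looks like a plain Java call, would execute remotely
    }
}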
4 Conclusions The methods and procedures of the developed ambient intelligence platform offer a great field of options and opportunities for human-centred man-machine
interfaces – especially in the fields of AAL and energy management – by establishing ad hoc networks with both fixed and mobile nodes. There is no need for a centralized knowledge base with all information and parameters stored in one big memory. Instead, the system knowledge is spread over all Smart Nodes. These Smart Nodes collect and store data in order to offer the different services. By working together, they exhibit emergent, decentralized intelligent behaviour for individual services. The application of energy management gives a good example of such services: energy consumption data and the status of the surroundings, i.e. rooms and buildings, are collected at their origins. Furthermore, the different energy consumers can be controlled directly and in a decentralized manner. This concept differs completely from traditional automated control systems for buildings with big control centres. The intelligent networks take human wishes into account, make suggestions and give information. These networks are learning systems with knowledge-based methods; the learning process depends on user behaviour and acts and reacts on user patterns. In this way, we move towards the Internet of Things from all directions. We simulate in software and we implement hardware using an embedded middleware to bring Things to life. The middleware enables us to connect both sides by standard IP networking mechanisms. It is easy to design and implement a movable node in a software simulation, but this is a completely different task in hardware. It is quite easy to have a hardware node that is designed to work as a router, but establishing a dynamic ad hoc routing protocol for highly movable Things is still challenging.
References
1. Kunze, C., Holtmann, C., Schmidt, A., Stork, W.: Kontextsensitive Technologien und Intelligente Sensorik für Ambient-Assisted-Living-Anwendungen. In: AAL Kongress 2008. VDE Verlag, Berlin/Offenbach (2008) ISBN 978-3-8007-3076-6
2. Kamenik, J., Nee, O., Pielot, M., Martens, B., Brucke, M.: IDEAAL – an integrated development environment for AAL. OFFIS e.V., Oldenburg, Germany. VDE Verlag, Berlin/Offenbach, ISBN 978-3-8007-3076-6
3. Welge, R., Faasch, H., Bollow, E.C.: Ambient Assisted Living – Human Centric Assistance System. In: aps+pc Workshop Proceedings (2008) ISBN 978-3-935786-49-2
4. ITU: ITU Internet Reports 2005: The Internet of Things. ITU, Geneva (November 2005)
5. Locatelli, M.P., Vizari, G.: Awareness in Collaborative Ubiquitous Environments: The Multilayered Multi-Agent Situated System Approach
6. Perkins, C., et al.: Ad hoc On-Demand Distance Vector (AODV) Routing. IETF RFC 3561
7. Poole, D., Mackworth, A., Goebel, R.: Computational Intelligence – A Logical Approach. Oxford Press, Oxford (2006)
8. Coexistence Assessment of Industrial Wireless Protocols in the Nuclear Facility Environment. U.S. Nuclear Regulatory Commission (2007)
9. RDF/XML Syntax Specification (Revised), http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
10. Welge, R.: Sensor Networking with L3-NET – Characteristics. A SELF-X Middleware based on standard TCP/IP protocols. In: Embedded World 2006, Nürnberg. WEKA Zeitschriftenverlag (2006) ISBN 3-7723-0143-6
11. OWL Web Ontology Language XML Presentation Syntax, http://www.w3.org/TR/owl-xmlsyntax/
Configuration and Dynamic Adaptation of AAL Environments to Personal Requirements and Medical Conditions Reiner Wichert Fraunhofer Alliance Ambient Assisted Living Fraunhoferstrasse 5, 64283 Darmstadt, Germany
[email protected]
Abstract. AAL concepts have been shaping scientific and market-oriented research landscapes for many years now [1]. Population development demands have made residing and receiving care in one’s own home a better alternative than institutionalized inpatient care. This reality has been reflected in open calls for proposals, as well as in numerous European and domestic projects, and has resulted in a considerable number of applications and product concepts with AAL ties. Unfortunately, it is already foreseeable that these project results will not be implemented in a comprehensive fashion, as individual applications and products can only be combined into a comprehensive solution with a great deal of effort and potential cost. Through stereotypical projects and prototypes, as well as concrete usage scenarios, this paper will extrapolate the added value resulting from integrating individual products into coherent comprehensive solutions within the framework of the complete supply and value chain. Business and technological obstacles will be identified and pathways shown by which AAL concepts and visions can lead to a better reality for all of those concerned, from healthcare recipients to those bearing the costs. Keywords: Ambient Assisted Living, User Interfaces, Elderly People, End User Configuration, AAL Platform.
1 Increase in the Elderly Living Alone In 2005, there were 82.4 million people living in Germany. According to prognoses of the German Statistical Office, this number will decrease to between 68.7 and 79.5 million residents by the year 2050. At the same time, the number of 80-plus-year-old residents will rise from 4 million (2005) to approximately 10 to 11.7 million (2050) [2]. With increasing age, the proportion of those living alone also increases. In 2000, 44 percent of private households occupied by 65- to 70-year-old primary residents were single occupancy dwellings. In light of increasing divorce rates and the growing number of single, as well as single parent households, this trend is expected to continue [3]. Age in and of itself is not necessarily an indicator for being in need of care, but with increasing age, a higher percentage of the population grows dependent on assistance, support and care giving.
96% of those 70 and older have at least one internal, neurological or orthopedic disease process requiring treatment, whereas 30% have five or more [4]. Despite the paradigm "out-patient before in-patient", data from care-giving statistics reveal a trend toward professional care. Especially in the area of in-patient care, the number of those in need of care has risen over the last several years: in 2005, 676,000 people were in nursing home care; in 1999, this number was just 573,000. Current developments, such as the conceptualization and implementation of alternative forms of habitation, take the discussion of an "out-patient conversion" of care giving into consideration. The primarily undesired increase in the number of residents in in-patient facilities highlights the current care-giving dilemma, namely, that it is very difficult for those in need of outside care and support to continue to live in their accustomed environments. The realization of technologically supported, AAL-based concepts can contribute to closing these gaps in care. 1.1 The Desire for Independent Living in the Golden Years The majority of the elderly want to remain in their accustomed environment, even as the need for outside support and care giving increases. Institutionalized forms of habitation, on the other hand, are experiencing a decreasing level of acceptance, according to a representative survey by the Schader-Stiftung [5]. Independence and self-determination have high social value, also among the elderly population. Within the framework of the Fraunhofer IAO project "Pflege 2020" (Care giving 2020), 500 people aged 55-75 were questioned as to their desires and needs for future care giving. Key topics of the representative survey included desired services, forms of habitation, and technological applications. A few guiding universal themes could be identified as fundamental earmarks of a high quality of life according to those surveyed; they can be characterized by the keywords "security", "participation", "individuality" and "daily structure".
Fig. 1. Application areas for AAL and “Personal Health” concepts in the housing industry
The effects of the demographic transformation pose enormous challenges for the housing industry, as well. One such challenge is that of keeping older residents with limited mobility and diminishing health in existing housing and of avoiding vacancies. Suboptimal housing structures from the 1950s/60s often make a senior-appropriate transformation difficult. Technological support for senior living does not have to be limited to the adaptation of the personal living space, however. It is more pertinent to develop comprehensive concepts that link individual living spaces with a residential quarters-oriented infrastructure. Only the linking of ambient technologies with individualized, health-oriented service concepts can meet the needs and desires of elderly residents and become the foundation for new business models for the housing industry. 1.2 State of the Art in AAL and Personal Health Currently, further development of technological solutions is occurring in a number of predominantly European research ventures. At the beginning of 2007, a total of 16 individual AAL projects were launched within the 6th Framework Program by the European Commission with such diverse thematic focal points as, e.g., social integration, support for daily life, security and mobility (EU-IST PERSONA) [6], semantic service-oriented infrastructure (AMIGO) [7], special support for the blind (HAH Hearing at Home), secure »white goods« (EASY LINE+), Entertainment and Health (OLDES), mobile (cellular) support within the home and elsewhere (ENABLE), support for »Daily Life« and Health Care and Communication (NETCARITY), »Health« vital function monitoring, activities, position (CAALYX, EMERGE), automation between »white goods«, entertainment with variable user interface (IN-HOME), scalable, adaptive, customizable add-ons for personal assistance sensor systems (SHARE-IT), monitoring of daily activities and incorporation of biofeedback (SENSACTION-AAL), and many more. 2008 saw the launch of additional projects from the 7th Framework Program exploring these very topics. Through the continuation of pilot projects, such as "SmarterWohnenNRW" (Smarter Living in North Rhine-Westphalia) or the application project "Sophia", with complete Internet integration, as well as that of communication and telemedicine, better conditions for the future implementation of AAL solutions in rental housing are being prepared. For many of these projects, the term "personal health" – like the term AAL – has been playing an increasingly important role. Similar to how the "personal computer" established itself as an extension and complement to professional computer technology, "personal health" denotes the accessibility of devices previously used only by medical personnel – but also that of the respective information and service options – now available to the private user. "Personal health" also characterizes the direction of a paradigm transformation from traditional healthcare to person-centered, individualized prevention, diagnostics, therapy and care giving. This transformational process is supported by developments in the area of telemonitoring, as well as further developments in personalized medicine, which enable the person-oriented integration of digital patient data (images, vital function monitoring, demographic and anamnestic data, lab findings) through the intensive application of information technology and telematics (eHealth), while incorporating the latest developments in biotechnology, genomics, and pharmacology [8], [9].
The technology applied in "personal health care" encompasses, in particular, wearable medical devices or systems conceptualized for diagnostic and therapy-accompanying application in the home environment. Such a telemonitoring system typically consists of medical sensors and a base station, either worn by or located in the immediate vicinity of the user. This base station captures the data delivered by the sensors, prepares the data if necessary and makes it available via a wired or wireless transmission system to the stationary (AAL) infrastructure, or, if required, to a doctor, hospital or telemedical service provider, where further evaluative steps or data storage can occur. The sensor devices placed on or in the body communicate with the base station via a wireless network of limited range (Body Area Network / Personal Area Network). The base station can be either a stationary personal computer system with a fixed network connection or a mobile device (smartphone, PDA, etc.) with wireless transmission technology (GSM, UMTS, WLAN). In order to develop markets for "personal health" systems and applications – initially based on conditions in the US – an international alliance dubbed "Continua Health Alliance" (www.continuaalliance.org) was founded in 2006 and currently encompasses approximately 170 companies. A prerequisite to the realization of "personal health" is the availability of reasonably priced, stand-alone, user-manageable system components, as well as their cross-manufacturer interoperability in "open" systems. To enable this availability, Continua guidelines are being compiled as "recipes" for the development of interoperable products, supported by a comprehensive system of international norms and industry standards. In addition to fitness and wellness, the intended application areas also encompass chronic disease management outside of clinical environments, as well as support for independent senior living, with the goal of living as long as possible in one's home environment.
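A rough sketch of the base-station role described above is given below: readings arriving from body-worn sensors over the BAN/PAN are prepared and forwarded over some uplink to the AAL infrastructure or a telemedical service provider. The sensor record, the serialization format and the transport interface are invented placeholders and do not reflect Continua device profiles.

import java.util.List;

// Sketch of the base-station role described above: collect readings from body-worn
// sensors over the BAN/PAN, prepare them, and forward them to the AAL infrastructure
// or a telemedical service provider. Interfaces are invented placeholders.
public class BaseStationSketch {
    record Reading(String sensorId, String type, double value, long timestamp) {}

    interface Transport {               // wired or wireless uplink (e.g. GSM/UMTS/WLAN)
        void send(String payload);
    }

    static void forward(List<Reading> readings, Transport uplink) {
        for (Reading r : readings) {
            // "Prepare if necessary": here simply serialize; a real station might filter or aggregate.
            uplink.send(r.sensorId() + ";" + r.type() + ";" + r.value() + ";" + r.timestamp());
        }
    }

    public static void main(String[] args) {
        Transport console = payload -> System.out.println("uplink -> " + payload);
        forward(List.of(new Reading("bp-cuff-1", "systolic_mmHg", 128.0, System.currentTimeMillis())), console);
    }
}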
2 Individualized Housing Development and Adaptation While the need and market potential for universal AAL applications is clear, there is currently a lack of marketable products that rise significantly above purely isolated applications. Viable, innovative services remain on our wish list, a distant, albeit desirable, dream. Some of the isolated applications currently available include home emergency systems, which are constructed as pure alert signallers, sensors for light control, or device-specific user interfaces. These applications could only be linked together with a great deal of effort, with any subsequent alterations requiring the involvement and expertise of system specialists, thereby increasing solution costs considerably. In addition, sensors and other hardware components, as well as individual functionalities, tend to require multiple installations with multiple associated costs, as the systems are only offered as complete packages, data exchange formats and protocols are not compatible, and components from one application cannot be used by another application. Likewise, it is impossible to generate higher-value functionalities by combining layers of individual functions. In other words, the targeted AAL systems are not realizable.
By contrast, future AAL solutions for the care and support of the elderly must be based on a flexible and expandable platform and must be modular and expandable so as to remain adaptable to the individual's changing needs, lifestyle and health. 2.1 Realizing the Vision through an Integrated Concept This scenario presupposes that it will actually be possible to dynamically adapt existing housing to the requirements of age-appropriate living. It is not enough that new components and devices can independently integrate themselves into the current infrastructure. Tools are also required that allow service providers to optimize the available resources (services, sensors, device functions) for these infrastructures. With the targeted configuration option, any combination of existing functionalities can be reused in new applications, ultimately leading to the targeted universality, as well as to the associated reduction in costs. The Fraunhofer Alliance AAL, the Fraunhofer frontline theme "Assisted Personal Health", as well as the Fraunhofer innovation cluster "Personal Health", each with expertise in their specific technological fields and in cooperation with external partners, are all contributors to realizing this vision. It is essential to include the complete chain of players in the healthcare field and to get medical professionals, health insurance companies, health associations and organizations, social and health service providers, healthcare lobbyists, housing industry specialists, psychologists, as well as the respective technology developers, all together at one table, in order to develop new forms of cooperation between all participants. It is equally essential to link healthcare assistance in the sense of AAL with personalized information processing, information transfer and information management according to the "personal health" paradigm, and to further develop these linked components into one comprehensive, universal system. This integrated process chain approach appears to contradict the demands for quicker realization and marketability raised by the respective industries. The participating institutes counter this supposed contradiction by noting that while exemplary existing prototypes close to production are first being further developed into marketable products, standard interfaces are simultaneously being prepared for a later integration of existing platforms. 2.2 Adaptation for Future Needs and Health Issues The goal is to equip existing housing with ambient technology such that it can be adapted to future needs and health issues as easily as possible. The focus is on people with chronic illnesses, who can be provided care via telemonitoring. Links to flexible services ensure comprehensive care. Telemedical care can take place, for instance, through a medical service centre, and emergency care through a nearby urgent care centre. In addition to the recording of vital functions, the detection of accidental falls is a priority. Participation in societal life and the connection of individual residences to the outside world are to be supported by telecommunication. Reminder functions simplify the structuring of the day's activities and simultaneously improve the quality of care and the self-management abilities of the chronically ill.
An accompanying evaluation would prove useful for ascertaining which technological components should be linked together to optimize the adaptation of the residences to the individual needs of the residents. In a further step, resident requirements, results and experiences in existing housing should be compiled, and fundamental infrastructure prerequisites for future accommodations should be formulated for the housing industry. In summary, the following objectives in particular are essential: (1) support for independent, autonomous living for the resident in later years and with potential health-related limitations, (2) enabling a life led with a high level of security and social living quality, (3) enhancement of the self-management of chronic diseases and an increase in compliance through supportive ambient functions, (4) expansion of the service portfolio for health-related and social service providers, as well as for the housing industry, (5) development of an intuitively operable configuration tool for device adaptation and data access, and (6) development of adaptable technical installations for increased flexibility in view of changing living requirements for existing and new housing.
3 Solution: Provision of Flexible and Expandable Platforms The overriding technological objective is the provision of a flexible and expandable platform for the care and support of the elderly in their home environment. New housing complexes need to be fundamentally constructed such that each residence can be individually adapted to the respective residents. Existing housing is to be retrofitted such that it can, if at all possible, be dynamically adapted to the requirements of an aging population. It would appear reasonable to follow a two-step approach: in phase one, AAL technologies are to be integrated into existing housing; in phase two, the fundamental infrastructure requirements for the future housing industry are to be developed in new housing construction on the basis of an expandable platform. Further, the integration of sensors worn on the body and of medical devices is to be enabled and evaluated, incorporating a central basis of information. Additional components and devices must be fundamentally capable of autonomous integration into these infrastructures [10], [11]. There are validated project results from EU-IST projects, such as PERSONA, AMIGO or SOPRANO, with a special focus on dynamic distributed infrastructures for the self-organization of devices, sensors and services. These results should be taken into consideration [12]. The PERSONA infrastructure, for example, with its four communication buses, aims at providing mechanisms that facilitate the independent development of components that are nonetheless able to collaborate in a self-organized way as they come together and build up an ensemble. To this end, the buses act as brokers that resolve dependencies at runtime using the registration parameters of their members and semantic match-making algorithms [13]. The open nature of such systems must allow the dynamic pluggability of components distributed over several physical nodes. The platform consists of a middleware solution for open distributed systems dealing with seamless connectivity and adequate support for interoperability, which makes use of ontological technologies and defines appropriate protocols along with an upper ontology for sharing context [14].
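A much simplified sketch of such a broker is given below: members register with parameters describing the events they are interested in, and the bus resolves dependencies at runtime by matching published events against those registrations. PERSONA's buses use semantic, ontology-based match-making; plain string matching merely stands in for it here, and the event names are invented.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified sketch of a bus acting as a broker: members register with parameters
// describing what they care about, and the bus resolves dependencies at runtime by
// matching published events against those registrations. PERSONA uses semantic
// (ontology-based) match-making; plain string matching stands in for it here.
public class ContextBusSketch {
    record Registration(String eventTypeOfInterest, Consumer<String> member) {}

    private final List<Registration> registrations = new ArrayList<>();

    void register(String eventType, Consumer<String> member) {
        registrations.add(new Registration(eventType, member));
    }

    void publish(String eventType, String payload) {
        for (Registration r : registrations) {
            if (r.eventTypeOfInterest().equals(eventType)) {
                r.member().accept(payload);
            }
        }
    }

    public static void main(String[] args) {
        ContextBusSketch bus = new ContextBusSketch();
        bus.register("fall-detected", p -> System.out.println("care service notified: " + p));
        bus.publish("fall-detected", "resident=apartment-4, confidence=high");
    }
}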
Fig. 2. Decentralized software infrastructure (PERSONA)
3.1 Configuration Tools – Constraints and Functions On the basis of these infrastructures, tools for service providers can be designed which enable the optimization and configuration of the available resources. The targeted configurations should enable higher-value functions resulting from a cooperation of resources, thereby generating an added value that has been unattainable up to this point. It is essential that any needs or relevant situations be automatically recognized, analyzed and associated with the call for corresponding functions. Unfortunately, it is almost never the case that a reaction can be deduced directly from a single event, since (1) situations are not always directly measurable and, therefore, conclusions regarding the situation cannot be based on single events. Rather, it is imperative to draw conclusions based on several events or facts (event aggregation). The situation "resident has fallen", for instance, can be recognized with a greater degree of certainty if, in addition to an alert sent by the acceleration sensor located in the resident's cane indicating that the cane has fallen to the ground, a camera-based analysis of the posture of a human form ("is in the prone position") is also reported and taken into consideration. It should likewise be taken into account that (2) required functions are not always provided by individual devices and components found in a given environment, but that the desired effect can perhaps only be achieved through the combination of several available functions (composition of services). For instance, a service provider receives an alert about a fall and an immediate automated spoken address is generated through the resident's environment, asking whether he is all right. In this scenario, the combination of several functions could prevent false alarms, etc. Unfortunately, the associations (links) of situations with the respective functions can change, meaning that something that was wanted a particular way up until now could suddenly need to be interpreted and handled differently (e.g., if a resident's health situation were to suddenly change).
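The fall example can be read as a small event-aggregation rule, sketched below: an alarm is raised only when the cane's acceleration alert and the camera-based posture report occur within a short time window. The event names, the window length and the notification step are illustrative assumptions, not part of any particular platform.

// Sketch of the event-aggregation idea from the fall example: a single event
// (the cane's acceleration sensor) is not trusted on its own; the alarm is raised
// only when a second, independent source (camera-based posture analysis) agrees.
// Event names and the time window are illustrative assumptions.
public class FallAggregationSketch {
    private long caneDroppedAt = -1;
    private long proneDetectedAt = -1;
    private static final long WINDOW_MS = 30_000; // assumed correlation window

    void onCaneDropped(long t)  { caneDroppedAt = t;   evaluate(); }
    void onPronePosture(long t) { proneDetectedAt = t; evaluate(); }

    private void evaluate() {
        if (caneDroppedAt >= 0 && proneDetectedAt >= 0
                && Math.abs(caneDroppedAt - proneDetectedAt) <= WINDOW_MS) {
            System.out.println("situation 'resident has fallen' recognized -> notify service provider");
        }
    }

    public static void main(String[] args) {
        FallAggregationSketch detector = new FallAggregationSketch();
        detector.onCaneDropped(1_000);
        detector.onPronePosture(6_000); // both events within the window -> alarm
    }
}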
Configuration tools can also serve to make adaptations to the individual preferences, capabilities and limitations of the resident, or to a specific health situation (e.g., with the question of whether the neighbour, a relative, nursing home personnel, or a combination of the above should be notified of a certain event). These tools become an essential complement to open systems, which can continue to evolve over longer periods of time. The software infrastructure of PERSONA, for instance, is already in a position to integrate new components ad hoc or to execute aggregated events and services by means of a script. 3.2 Intuitive Interaction Concepts In the (not so distant) future, novel interaction forms will essentially shape everyday life as we know it. Interaction concepts for the control of objects in AAL environments will no longer be centrally realized, as is common, for instance, with the PC. Instead, they will be implemented through networks of (computer) nodes that will interpret user commands and distribute them by way of existing communication infrastructures to the end devices that can best realize the task at hand. Multimodal interaction concepts, such as speech and gesture recognition or computer vision, require computationally intensive algorithms, which can only be executed by stationary computers. Should additional intelligent deductions from existing information be required, the temporarily increased computational effort can still be handled quickly enough through distributed (computer) nodes. Applications of such interaction concepts include speech interfaces, 3-dimensional interactive video interfaces or emotive interfaces for robots [15]. The potential applications of novel interaction forms can be illustrated by the home environment: in contrast to current concepts with central controls, where functionalities are laboriously programmed and the user must also remember which functions are activated by which keys, interaction in the AAL environment is decoupled from the hardware. The user no longer uses commands to control devices. Rather, he provides goals that are then interpreted and automatically realized. For instance, if the user acoustically provides the goal "brighter", first, the room in which the user currently finds himself will be ascertained. Then, the system will check which options are available for increasing the brightness in this room: Are there blinds which can be opened? What kind of lamps are available? With all actions, the status of the environment is ascertained as well, as it makes no sense, for instance, to open the blinds at night. The preferences and other goals of the user are also taken into account. So, for watching television, the system could select indirect lighting, but for a work situation or for reading, direct lighting could be chosen. It is apparent that intelligent environments also require a configuration of the rules, as each user has a preference for his own personal settings and would like to make any modifications himself. In contrast to the approach presented in section 3.1, the users possess less technical know-how in the handling of rules in complex control systems. Conventional menu-based approaches, such as those found on mobile telephones, fail due to the sheer number of modification options. Novel interaction forms are, therefore, necessary for the configuration of the environment by the residents for acceptance purposes (end-user configuration) and have extensive research potential.
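The "brighter" example can be summarised as a small goal-resolution step, sketched below: the goal is mapped onto whatever the current room offers, filtered by the state of the environment (no blinds at night) and by the user's activity-dependent lighting preference. The room properties and the preference rule are invented placeholders.

import java.util.ArrayList;
import java.util.List;

// Sketch of the goal-based interaction from the "brighter" example: the goal is mapped
// to whatever the current room offers, filtered by the environment state (no blinds at
// night) and by user preferences. Room contents and the day/night flag are placeholders.
public class GoalResolutionSketch {
    static List<String> resolveBrighter(boolean roomHasBlinds, boolean roomHasDimmableLamp,
                                        boolean isNight, String activity) {
        List<String> actions = new ArrayList<>();
        if (roomHasBlinds && !isNight) {
            actions.add("open blinds");
        }
        if (roomHasDimmableLamp) {
            // preference: indirect light for television, direct light for reading or working
            actions.add("reading".equals(activity) ? "switch on direct lighting"
                                                   : "switch on indirect lighting");
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(resolveBrighter(true, true, true, "reading"));
        // prints [switch on direct lighting] - the blinds stay closed at night
    }
}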
4 Conclusion The AAL vision is that one day sensors and systems will give seniors a helping hand in their own home by measuring, monitoring and raising alarms if necessary. To reach this goal, a lot of scientific and market-oriented research has been done in the past years. Unfortunately, people normally have many problems that cannot be solved with a single product. Thus, it is already foreseeable that the needed project results will not work together, since individual applications and products can only be combined into a comprehensive solution with a great deal of effort. Future AAL applications, however, must be both flexible and expandable, specifically incorporating "personal health" components, in order to be dynamically adaptable to individual demands and the respective medical conditions. By contrast, with the current closed-system concepts, sensors or functionalities must potentially be installed and paid for multiple times, as the functionality of one application cannot be used by another application. The next step should now be to bring the industry together at one table in order to work out common standards. Only in that way will the products become efficient and also cheaper. Thus, if we do not change our strategies in AAL from individual products to coherent, comprehensive solutions within the framework of the complete supply and value chain, there is a huge risk that, in the end, a lot of money will have been spent on AAL system solutions and AAL will have been only a huge bubble.
References
1. Emiliani, P.L., Stephanidis, C.: Universal access to ambient intelligence environments: Opportunities and challenges for people with disabilities. IBM Systems Journal 44(3), 605–619 (2005)
2. Federal Statistical Office: Population of Germany till 2050. 11th coordinated population forecast. Wiesbaden, p. 43 (2006)
3. Cirkel, M., et al.: Produkte und Dienstleistung für mehr Lebensqualität im Alter – Expertise. Gelsenkirchen, p. 8 (2004)
4. Robert Koch Institut: Themenheft 10: Gesundheit im Alter, Gesundheitsberichterstattung des Bundes. Berlin (2005a)
5. Heinze, R.G., et al.: Neue Wohnung auch im Alter – Folgerungen aus dem demographischen Wandel für Wohnungspolitik und Wohnungswirtschaft. Schader-Stiftung, Darmstadt (1997)
6. Avatangelou, E., Dommarco, R.F., Klein, M., Müller, S., Nielsen, C.F., Soriano, S., Pilar, M., Schmidt, A., Tazari, M.-R., Wichert, R.: Conjoint PERSONA – SOPRANO Workshop. In: Sala Soriano, M.P., Schmidt, A., Tazari, M.-R., Wichert, R. (eds.) Constructing Ambient Intelligence: AmI 2007 Workshops, pp. 448–464. Springer, Heidelberg (2008)
7. Georgantas, N., Ben Mokhtar, S., Bromberg, Y., Issarny, V., Kalaoja, J., Kantarovitch, J., Gerodolle, A., Mevissen, R.: The Amigo Service Architecture for the Open Networked Home Environment. In: 5th Working IEEE/IFIP Conf. on Software Architecture (WICSA 2005), pp. 295–296 (2005)
8. Blobel, B., Norgall, T.: Standard based Information and Communication – The Personal Health Paradigma. HL7-Mitteilungen, Heft 21/2006, pp. 33–40 (2006)
9. Norgall, T., Blobel, B., Pharow, P.: Personal Health – The Future Care Paradigm. In: Medical and Care Compunetics 3. Series Studies in Health Technology and Informatics, vol. 121, pp. 299–306. IOS Press, Amsterdam (2006)
10. Aarts, E., Encarnação, J.L.: Into Ambient Intelligence. In: Aarts, E., Encarnação, J. (eds.) True Visions: Tales on the Realization of Ambient Intelligence, ch. 1. Springer, Heidelberg (2005)
11. Wichert, R., Tazari, M.-R., Hellenschmidt, M.: Architektonische Requirements for Ambient Intelligence. IT – Information Technology, 13–20 (January 2008)
12. Hellenschmidt, M., Wichert, R.: Rule-Based Modelling of Intelligent Environment Behaviour. In: Künstliche Intelligenz: KI, vol. 2, pp. 24–29 (2007)
13. Furfari, F., Tazari, M.R.: Realizing ambient assisted living spaces with the PERSONA platform. ERCIM News (74), 47–48 (2008)
14. Fides-Valero, Á., Freddi, M., Furfari, F., Tazari, M.-R.: The PERSONA Framework for Supporting Context-Awareness in Open Distributed Systems. In: Aarts, E., Crowley, J.L., de Ruyter, B., Gerhäuser, H., Pflaum, A., Schmidt, J., Wichert, R. (eds.) AmI 2008. LNCS, vol. 5355, pp. 91–108. Springer, Heidelberg (2008)
15. Adam, S., Mukasa, K.S., Breiner, K., Trapp, M.: An Apartment-based Metaphor for Intuitive Interaction with Ambient Assisted Living Applications. In: Proceedings of HCI 2008, Liverpool, May 1-9 (2008)
Designing Universally Accessible Networking Services for a Mobile Personal Assistant
Ioannis Basdekis1, Panagiotis Karampelas2, Voula Doulgeraki1, and Constantine Stephanidis1,3
1 Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Greece
2 Hellenic American University, Athens, Greece
3 Computer Science Department, University of Crete, Greece
{johnbas,vdoulger,cs}@ics.forth.gr,
[email protected]
Abstract. At present, a tendency towards smaller computer sizes and, at the same time, increasingly inaccessible web content can be noted. Despite the worldwide recognized importance of Web accessibility, the lack of accessibility of web services has an increasingly negative impact on all users. In order to address this issue, W3C has released a recommendation on Mobile Web Best Practices, supplementary to the Web Content Accessibility Guidelines. This paper presents the design and prototype development of universally accessible networking services that fully comply with those standards. Validation and expert accessibility evaluation of the XHTML Basic prototypes show 100% compliance. The design process followed is presented in detail, outlining general as well as specific issues and related solutions that may be of interest to other designers. The results will be further verified through user tests on the implemented services. Keywords: Web accessibility, mobile accessibility, user interface design, device independence, prototyping.
1 Introduction Since its creation, the mission of the World Wide Web Consortium (W3C) has been to lead the Web to its full potential. The first goal that specifies this mission1 is Web for Everyone (previously Universal Access), while the second is Web on Everything (previously Interoperability). Ten years ago web users had limited access to software, let alone Web services (eServices), which were designed specifically for desktop computers, as there was no alternative way of accessing the Internet. In parallel, assistive technology solutions were scarce, expensive to purchase, limited to specific age or disability categories, and in most cases incompatible with other hardware and software applications [1]. At present, a tendency towards smaller computer sizes and, at the same time, increasingly inaccessible web content can be noted. Users have more freedom to choose
1 W3C goals: http://www.w3.org/Consortium/mission
their preferred hardware-software combination for communication and work through a Web browser (i.e., desktop browser, speech browser, speech synthesizer, Braille display, mobile browser, car browser, etc.). Therefore, there is increased demand for web material (i.e., content, digital services) that is interchangeable and accessible at any time and place. For example, a substantial growth can be observed in mobile Web usage and in the demand for mobile Web services (mServices). Recent studies indicate that 27% of European and 28% of US mobile subscribers who currently do not use mobile data services intend to start using them in the next two years [2]. Following this trend, new and existing eServices are being (re)designed in order to be accessed through mobile devices as well as traditional PCs, and to serve the demand for 24/7 web access. However, as studies indicate, web material which is designed basically around visual concepts is largely inaccessible to people with disability [3, 4], raising as a consequence barriers to all mobile device users as well [5]. Therefore, and despite the worldwide recognized importance of eAccessibility, the lack of accessibility of eServices has an increasingly negative impact on all users, and especially on those for whom Web access may be one of the main paths to address communication needs and support independent living. In addition to problems occurring because of inaccessible content, handheld mobile devices (such as PDAs, smart-phones, mobile phones, Blackberries, Notebook PCs, ultra-mobile PCs, and others) can present usability problems as well. The use of a pointing device, touch screen or tiny buttons for input, and of a small screen for output, is unsuitable for many users, and these options are especially unhelpful to those who are blind or unable to use a stylus. Additionally, the browsers installed on mobile devices may vary in the way they interpret web pages without fully complying with W3C markup standards (e.g., XHTML Basic, cHTML, CSS and others). Due to platform and hardware differences between mobile devices (e.g., sound generation), available assistive technology products are targeted mainly at some well-known device types or major operating systems rather than providing a global solution that works everywhere. Furthermore, mobile operating systems provide minimal or no built-in accessibility support. Inevitably, the emerging mobile environment introduces hard constraints on interaction design, as the technical characteristics which need to be addressed are much more complicated than the accessibility barriers of desktop solutions. As a consequence of the above, the development of fully accessible and interoperable eServices introduces new challenges to the accessibility provisions that have to be adopted from the early design stages [6]. As in the case of eServices, the accessibility limitations of mobile Web services (mServices) can also be addressed with the use of assistive technology products. To this effect, the design process of mServices is even more demanding, since the considerations mentioned previously have to be addressed; nevertheless, mobile accessibility is still feasible. This paper presents the design and prototype development of fully accessible web services, available through mobile devices as well as traditional desktop PCs equipped with assistive technology.
The aim of the work presented is to identify the main challenges and propose experience-based practical design guidelines that web developers may follow in order to comply with W3C de facto standards for mobile accessibility.
2 Related Work As with existing standards and guidelines for web accessibility and usability, many design guidelines for mServices have existed since the late 1990s [7, 19]. Nevertheless, mobile web content providers are still not paying specific attention to accessibility, and they are unaware of the benefits of providing accessible solutions. Moreover, currently specialized implementation platforms do not help Web developers integrate accessibility into Web services. Accessibility of mServices is not supported in existing development suites. In order to address this issue, the W3C's Mobile Web Initiative (MWI) released in July 2008 the Mobile Web Best Practices (MWBP) version 1.02, supplementary to the Web Content Accessibility Guidelines (WCAG) versions 1.03 and 2.04. This document sets out an additional series of recommendations designed to improve the user experience of the Web on mobile devices, without exceptions. Since the delivery of accessible and interoperable eServices should also address legal issues and satisfy the constraints raised by user requirements and by devices' technical specifications, the whole design process spans an exponential design solution space, which makes compliance with W3C standards such as WCAG and MWBP essential (Figure 1).
Fig. 1. Rely on Web standards and guidelines for delivering Web content to mobile devices
Functionality targeted to desktop access is often transferred into the design process of mobile services without considering any special adaptation. On the other hand, providing "text-only" versions of existing websites is a technique largely discredited by people with disability. As a result, it makes little sense to develop separate mobile sites for disabled users. After all, the content and services delivered through the web are the same, no matter how many different versions may occur as a result of possible adaptations, customisations or device-specific variants. MWBP, which became a W3C Recommendation in July 2008, presents practical solutions that help deliver a full web experience on mobile devices rather than offering a separate-but-equal treatment. The philosophy of those practices seems to contradict other service-oriented standards for mobile usage under development, such as the global standard of the International Air Transport Association (IATA) for global
2 W3C-MWI, Mobile Web Best Practices 1.0: http://www.w3.org/TR/mobile-bp/
3 W3C-WAI, Web Content Accessibility Guidelines 1.0: http://www.w3.org/TR/WCAG10/
4 W3C-WAI, Web Content Accessibility Guidelines 2.0: http://www.w3.org/TR/WCAG20/
mobile phone check-in using two-dimensional (2D) bar codes5. For example, it is difficult to imagine displaying a 2D bar code image on a passenger's Braille mobile phone. Schilit et al. [8] discuss various techniques that can be followed to fit desktop content onto a small display. Accordingly, the following strategies, ordered by the resources needed, can be followed to ensure that an existing eService can be used on a PDA or other browser-equipped mobile device: 1. Keep the same eService (as the desktop design) and perhaps make use of scaling techniques or specific web browsing systems that reduce the size of the working area. The latest fit-to-screen features that are being incorporated in some web browsers allow automatic web page size adjustments (e.g., Mobile Opera6, Internet Explorer Mobile7, Handweb8, Palmscape, and others9). Although such a solution can be handy for experienced users, those with visual disabilities will suffer from reduced readability and face scrolling problems, not to mention that on-the-fly scaling cannot optimally reorganize designs targeted at bigger displays. 2. Apply automated re-authoring techniques that involve removing all presentation information (i.e., Cascading Style Sheets, images) and producing raw HTML, or even utilizing alternative presentation information (i.e., Cascading Style Sheets for handheld) while keeping the same markup; a minimal sketch of this idea is given after this list. Such an automatic process, which is similar to proxy transcoding, may produce user-friendly versions for the mobile experience in a cost-effective way. Examples of such tools and services are Power Browser [9], Mobile Google10, AvantGo11, and Skweezer.net. These solutions cannot work effectively, though, for eServices with markup that is broken beyond repair (i.e., the web page contains invalid HTML), since the result will look different in different browsers and in most cases tends to render well only for basic HTML markup. In addition, the size of the markup resources is not reduced, so use on a mobile device may result in awkward behavior (e.g., scrolling) and increased costs due to mobile transfer fees. As traditional web services are usually developed with desktop computers in mind, their conventional web pages will not be adequately displayed on mobile devices. 3. Perform adaptations of content and/or interface elements appropriate for enhancing the mobile experience. This process can include transcoding markup to be compatible with device formats, altering or rearranging the structure and the navigation, and introducing a new content structure. This method can be further classified according to the resulting transformed pages provided to the users, e.g., single column, fisheye visualization [10; 11], and overview-detail [12]. Examples of systems delivering such an experience are Opera SSR, Fishnet [13], the Document
5 IATA Resolution 792: Bar Coded Boarding Pass (BCBP), version 2: http://www.iata.org/NR/rdonlyres/2BD57802-6D96-4D9A-8501-5349C807C854/0/BarCodedBoardingPassStandardIATAPSCResolution792.pdf
6 Opera Software: http://www.opera.com/mobile/
7 Microsoft: http://www.microsoft.com/windowsmobile/en-us/downloads/microsoft/internetexplorer-mobile.mspx
8 Smartcode Software: http://www.palmblvd.com/software/pc/HandWeb-1999-02-19-palm-pc.html
9 Wikipedia has a more comprehensive listing: http://en.wikipedia.org/wiki/Microbrowser
10 http://www.google.com/mobile/
11 http://www.avantgo.com/frontdoor/index.html
Segmentation and Presentation System (DSPS) [14] and the Stanford Power Browser [15]. However, these solutions cannot be easily generalized. 4. Design and create new mServices from the beginning and constantly evaluate the outcomes against design standards. This process is complex to address for both web designers and developers, as it requires substantial effort, planning, deep knowledge of recent standards and well-trained personnel. Although it is possible to reuse some of the principles and practical solutions delivered in the desktop version, the design and implementation of these solutions imply the creation of new mobile web templates, which is a time-consuming procedure. The result of such a process provides, in theory, the best experience for mobile users. Nevertheless, maintaining a specific mobile site which does not "look like its big brother" is inconsistent with Device Independence principles. When dealing with new web services, the optimal solution is obviously to provide universal accessibility at an early stage during the design phase (e.g., by means of evaluation and redesign of early mock-ups and design prototypes against accessibility standards), because accessibility is more expensive if introduced later in the design phase [16].
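To make the re-authoring strategy (2) above more concrete, the following sketch strips presentational markup from a desktop page while preserving textual alternatives. It is a hypothetical Python illustration: the tag and attribute lists are assumptions chosen for the example, not the transcoding rules of any of the tools cited above.

```python
# Illustrative "automated re-authoring" filter: drop scripts/styles, unwrap
# purely presentational tags, replace images by their alt text, and remove
# presentational attributes, leaving a linearised HTML core for small screens.
from html.parser import HTMLParser

DROP_TAGS = {"script", "style", "link", "iframe", "object"}   # removed entirely (assumed set)
UNWRAP_TAGS = {"font", "center", "big", "small"}               # keep only their content
PRESENTATIONAL_ATTRS = {"style", "bgcolor", "align", "width", "height", "border"}

class ReAuthoringFilter(HTMLParser):
    """Produce a simplified HTML stream suitable for basic mobile browsers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0   # >0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag in UNWRAP_TAGS:
            return
        if tag == "img":
            # Replace images by their alternative text, as screen readers do.
            alt = dict(attrs).get("alt", "")
            if alt:
                self.out.append(f"[{alt}]")
            return
        kept = [(k, v) for k, v in attrs if k not in PRESENTATIONAL_ATTRS]
        attr_text = "".join(f' {k}="{v}"' for k, v in kept if v is not None)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag in UNWRAP_TAGS or tag == "img":
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(data)

page = '<div style="width:900px"><img src="logo.png" alt="Portal logo"><p>News</p></div>'
f = ReAuthoringFilter()
f.feed(page)
print("".join(f.out))   # -> <div>[Portal logo]<p>News</p></div>
```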
3 Design Process for Embedding Accessibility in Mobile Services It is argued that web accessibility can be achieved only if accessibility standards are applied from day one of the design. In the case of mobile Web services, the designer should comply with even stricter constraints than for desktop solutions, since the screen size of the mobile device or the interaction style may be totally different from the desktop environment. To this purpose, design and usability guidelines for mobile design can contribute significantly towards ensuring that the final outcome addresses functional limitations such as visual disabilities, hearing impairments, motor disabilities, speech disabilities and some types of cognitive disabilities. From a usability point of view, applicable principles can be derived from guidelines for improving mobile web usability [13]. For example, usability experiments demonstrate that the most effective navigation hierarchy for use with mobile devices is one with only four to eight items on each level [17]. The provision of a universally accessible web service, with mechanisms12 consistent among all devices in use [20], implies producing the intersection13 of all relevant standards and guidelines, designing according to this larger set of rules, performing tests, and finally re-evaluating and revisiting the designs. In this recurrent process, user feedback is also critical, because it whittles away the design space and so eliminates possible alternatives. Once the design space has been documented, the resulting designs need to be encapsulated into reusable and extensible design components. The above process has been followed in the context of the Greek nationally funded project "Universally Accessible eServices for Disabled People". The aim of the project is to promote the equal participation of people with disability in e-government services through the implementation of an accessible portal.
12 WCAG, Guideline 13: Provide clear navigation mechanisms.
13 Set theory: the intersection of the sets A and B is the set whose members are members of both A and B.
Fig. 2. Design templates for mServices: the main (navigation) page (left) and the first page for email services (right)
Fig. 3. Home page of amea.net (main options translated in English) displayed on a HTC-TYTN II (left) and a Fujitsu Siemens Pocket Loox N500 screen capture (right)
The portal will offer personalized and informative accessible Web services, available through mobile devices as well as traditional desktop PCs equipped with assistive technology. To this purpose, and in addition to adhering to the aforementioned accessibility standards and generic design principles, the iterative design process involving experts in the field of accessibility as well as end users yielded specific design guidelines. With the stabilization of these guidelines, detailed design mock-ups for all the services were elaborated (Figure 2). Based on the design mock-ups, markup templates (XHTML Basic 1.1, CSS 1.0) have been implemented to serve as a compass for the implementation team. These templates have been exhaustively tested against
aforementioned guidelines and full compliance has been achieved (Figure 3). Refinement based on the actual usage of the mServices is expected in the future and to this purpose user tests have been scheduled.
4 Design Experience The practical experience acquired during the design process outlined in Section 3, in the context of the project "Universally Accessible eServices for Disabled People", resulted in the consolidation of the following set of guidelines:
1. Use of standards
• Comply with WCAG 1.0 level AAA (including the subjective checkpoint 14.1 whenever possible), with the use of valid XHTML. Tools that may be useful are the Bobby software of the Center for Applied Technology14, the W3C's Markup Validation Service15, the Colour Contrast Analyser16, and the WAVE Toolbar17.
• Comply 100% with MWBP 1.0, consult the relationship documents18 and make use of valid XHTML Basic 1.1. Available validation tools include the W3C's mobileOK Checker19 and the TAW mobileOK Basic Checker20.
• Perform manual checks (e.g., rendering without style sheets, testing the accuracy of alternative text descriptions, etc.).
2. General
• Use only server-side actions.
• Do not use javascript at all.
• Avoid scrolling, unless the user chooses to enlarge fonts beyond a threshold. To this purpose, split the task into a number of sub-tasks.
• Provide single-task dialogues (e.g., write a topic, then save it).
• Group available options in a single screen.
• Correlate each service with a specific color. Reuse a faint version as the content's background color.
• Use lightweight icons (GIF, size less than 500K), consistent with the desktop version, for the main option categories.
3. Navigation
• Stick to George Miller's golden rule (7±2).
• Use the card sort metaphor [18].
• Always provide screen orientation (Hide/Unhide path).
• After reading/announcing the page title, provide a high-priority/visibility "Return" (back) action.
14 Bobby: no longer supported.
15 Markup Validation Service: http://validator.w3.org/
16 Colour Contrast Analyser: https://addons.mozilla.org/en-US/firefox/addon/7313
17 WAVE Toolbar: https://addons.mozilla.org/en-US/firefox/addon/6720
18 W3C, Relationship between Mobile Web Best Practices (MWBP) and Web Content Accessibility Guidelines (WCAG): http://www.w3.org/TR/mwbp-wcag/
19 W3C mobileOK Checker: http://validator.w3.org/mobile/
20 TAW mobileOK Basic Checker: http://validadores.tawdis.net/mobileok/en/
• Use icons defined in stylesheets to avoid double announcements of alternative descriptions.
• Avoid relying on color alone, but use the color coding in a consistent manner to help users correlate colors with services (learning disabilities). Comply with the "color opponent process".
• Use graphic icons only for orientation.
4. Data Form Completion
• Provide error messages at the beginning of the (refreshed) form, with links to the errors.
• Provide one-click login for unregistered users.
• Auto-fill default information.
• Provide simple search as well as advanced search options such as history.
Table 1 provides a summary of the service-specific guidelines that emerged.
Table 1. Examples of additional service-specific guidelines for the design and implementation of mServices
E-mail: Place the most important task first. Provide just one free-text area on each screen.
News: Display the picture list after the content of the article, with alternative descriptions. Use article pagination to increase readability if necessary.
Message board: Flatten the message-responses hierarchy for simplicity. Place attachments and responses at the end of the message.
Chat: Provide access to the list of participants first. Refresh the content on user demand.
Contacts: Use contact filtering based on letters. Use an index where the letters are visible only when there are contacts. Use multiple pages (cards) with the contact details.
Blogs: Focus on the current topic. All replies/comments displayed should be associated with the current topic. Use an archiving mechanism for past topics.
User-defined shortcuts: Place that option high in the menu. Allow the user to define shortcuts up to a task level.
Site map: Use a list of the main tasks of the eServices with explanatory descriptions.
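As a complement to the manual checks in guideline 1, some of the rules above lend themselves to simple automated tests. The sketch below is a hypothetical Python illustration (names, thresholds and the selection of rules are assumptions, not the project's actual test harness) that inspects an XHTML Basic template for missing alt text, client-side scripting and inline event handlers.

```python
# Minimal static checks on an XHTML Basic template, sketched with the
# standard library. Valid XHTML is well-formed XML, so ElementTree suffices.
import xml.etree.ElementTree as ET

MAX_PAGE_BYTES = 10 * 1024          # assumed size budget for a basic mobile page
EVENT_ATTRS = {"onclick", "onload", "onmouseover", "onfocus", "onchange"}

def check_template(xhtml: str) -> list[str]:
    problems = []
    if len(xhtml.encode("utf-8")) > MAX_PAGE_BYTES:
        problems.append("page exceeds the assumed mobile size budget")
    root = ET.fromstring(xhtml)
    for el in root.iter():
        tag = el.tag.split("}")[-1]      # drop the XHTML namespace prefix
        if tag == "img" and not el.get("alt"):
            problems.append("img without alt text")
        if tag == "script":
            problems.append("client-side script found (server-side only rule)")
        for attr in el.attrib:
            if attr.split("}")[-1] in EVENT_ATTRS:
                problems.append(f"inline event handler: {attr}")
    return problems

template = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Mail</title></head>
<body><img src="mail.gif" alt="E-mail service" /><p onclick="x()">Inbox</p></body></html>"""
for issue in check_template(template):
    print("FAIL:", issue)            # -> FAIL: inline event handler: onclick
```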
5 Discussion/Future Work This paper proposes the adoption of specific guidelines in the context of designing and developing networking mServices mainly targeted to people with disability. By following strict accessibility standards from the beginning of the design process, it is possible to deliver mServices that fully comply with even harder restrictions than for eServices, without compromising functionality. The presented design guidelines emerged as one of the results of an iterative design process involving web accessibility experts as well as users with disability. A conclusion stemming from this
experience is that the provision of universally accessible web services in a mobile context requires more intensive efforts than traditional web accessibility. This is mainly due to the fact that practical guidelines have to be derived from both MWBP and WCAG in the context of the specific services being developed. Overall, it is claimed that this experience contributes towards improving the production of cost-effective, high-quality, accessible and interoperable Web material by designers with no previous knowledge of accessibility guidelines. Initial tests prove that it is possible to develop mServices that fully comply with W3C's accessibility guidelines; however, more user tests and heuristic evaluations are required to further validate this process. In the context of the project "Universally Accessible eServices for Disabled People", user-based tests will follow, targeted at the refinement of the mServices. User tests are necessary for the fine-tuning of the final outcome, based on a specific PDA device equipped with a mobile screen reader. To this purpose, the HTC-TYTN II and Mobile Speak Pocket have been selected among the candidates. Acknowledgments. This research has been conducted within the Greek nationally funded project "Universally Accessible eServices for Disabled People". The authors would like to thank the Panhellenic Association of the Blind (www.pst.gr), acting as the Project contractor, for their support. The project is funded by the Greek Government under the 3rd Community Support Framework, and the accessible and interoperable web services will be available at http://www.ameanet.gr
References 1. Blair, M.E.: U.S. education policy and assistive technology: Administrative implementation. Invited paper for the Korea Institute of Special Education (KISE) in preparation for the KISE International Symposium (2006) 2. Nielsen Group, Survey of over 50,000 consumers reveals mobile operators’ issues and opportunities (2008), http://www.tellabs.com/news/2009/index.cfm/nr/53.cfm 3. Cabinet Office: eAccessibility of public sector services in the European Union (2005), http://www.cabinetoffice.gov.uk/e-government/eaccessibility 4. Nomensa: United Nations global audit of web accessibility (2006), http://www.un.org/esa/socdev/enable/documents/ fnomensarep.pdf 5. W3C-WAI, Shared Web Experiences: Barriers Common to Mobile Device Users and People with Disabilities, http://www.w3.org/WAI/mobile/experiences 6. Basdekis, I., Alexandraki, C., Mourouzis, A., Stephanidis, C.: Incorporating Accessibility in Web-Based Work Environments: Two Alternative Approaches and Issues Involved. In: Proceedings of the 11th International Conference on Human-Computer Interaction (HCI International 2005), Las Vegas, Nevada, USA, July 22-27 (2005) 7. Jones, M., Marsden, G., Mohd-Nasir, N., Boone, K., Buchanan, G.: Improving Web interaction on small displays. Computer Networks: The International Journal of Computer and Telecommunications Networking 31(11-16), 1129–1137 (1999) 8. Schilit, B.N., Trevor, J., Hilbert, D.M., Koh, T.K.: Web interaction using very small Internet devices. Comput. 35(10), 37–45 (2002)
9. Buyukkokten, O., Molina, H.G., Paepcke, A., Winograd, T.: Power Browser: Efficient Web Browsing for PDAs. In: Proc. Conf. Human Factors in Computing Systems (CHI 2000), pp. 430–437. ACM Press, New York (2000) 10. George, F.: Generalized Fisheye Views. Human Factors in computing systems. In: CHI 1986 conference proceedings, pp. 16–23. ACM, New York (1986) 11. Gutwin, C., Fedak, C.: Interacting with big interfaces on small screens: a comparison of fisheye, zoom, and panning techniques. In: Proceedings of Graphics Interface 2004, London, Ontario, Canada, May 17-19, 2004, pp. 145–152 (2004) 12. Xiao, X., Luo, Q., Hong, D., Fu, H., Xie, X., Ma, W.-Y.: Browsing on small displays by transforming Web pages into hierarchically structured subpages. TWEB 3(1), 4 (2009) 13. Buchanan, G., Farrant, S., Jones, M., Thimbleby, H., Marsden, G., Pazzani, M.: Improving mobile internet usability. In: Proceedings of the 10th international conference on World Wide Web, Hong Kong, May 01-05, 2001, pp. 673–680 (2001) 14. Hoi, K.K., Lee, D.L., Xu, J.: Document Visualization on Small Displays, pp. 262–278 (2003) 15. Buyukkokten, O., Molina, H.G., Paepcke, A., Winograd, T.: Power browser: Efficient web browsing for pdas. In: Proceedings of the Conference on Human Factors in Computing Systems CHI 2000 (2000) 16. Clark, J.: Building accessible websites. New Riders (2003) 17. Geven, A., Sefelin, R., Tscheligi, M.: Depth and breadth away from the desktop: the optimal information hierarchy for mobile use. In: Mobile HCI 2006, pp. 157–164 (2006) 18. Card sorting: a definitive guide by Donna Spencer and Todd Warfel on 2004/04/07, http://www.boxesandarrows.com/view/ card_sorting_a_definitive_guide 19. Karampelas, P., Akoumianakis, D., Stephanidis, C.: User interface design for PDAs: Lessons and experience with the WARD-IN-HAND prototype. In: Proceedings of the 7th ERCIM Workshop, User Interfaces for All, Paris (Chantilly), France, October 24-25, pp. 474–485 20. Karampelas, P., Basdekis, I., Stephanidis, C.: Web user interface design strategy: Designing for device independence. In: Proceedings of 13th International Conference on HumanComputer Interaction (HCI International 2009), San Diego, California USA, July 19-24 (2009)
Activity Recognition for Everyday Life on Mobile Phones Gerald Bieber, Jörg Voskamp, and Bodo Urban Fraunhofer-Institut fuer Graphische Datenverarbeitung, Rostock, Germany {gerald.bieber,joerg.voskamp,bodo.urban}@igd-r.fraunhofer.de
Abstract. Mobile applications for activity monitoring are regarded as a high-potential field for efficient improvement of health care solutions. The measurement of physical activity under everyday conditions should be as easy as using an automatic weighing machine. Up to now, physical activity monitoring has required special sensor devices and has not been suitable for everyday usage. Movement pattern recognition based on acceleration data enables the usage of standard mobile phones for the measurement of physical activity. Now, just by carrying a standard phone in a pocket, the device provides information about the type, intensity and duration of the performed activity. Within the project DiaTrace, we developed the method and algorithms to detect activities like walking, jumping, running, cycling or car driving. Based on activity measurement, this application also calculates the calories consumed over the day, shares activity progress with friends or family and can deliver details about the different kinds of transportation used during a business trip. The DiaTrace application can easily be used today on standard phones which are already equipped with the required sensors. Keywords: Physical Activity Monitoring, Sensor Location, Mobile Assistance, Acceleration Sensor, Pattern Recognition, feature extraction, DiaTrace.
1 Motivation Mobile applications for activity monitoring are regarded as a high-potential field for efficient improvement of health care solutions. The measurement of physical activity under everyday conditions should be as easy as using an automatic weighing machine. The determination of physical activities in everyday life suffers from the lack of suitable sensors and algorithms. Distributed multi-sensor systems provide a high recognition accuracy, but they are very unhandy and inconvenient and cannot be used in daily life. Single-sensor systems achieve a sufficient recognition rate only in laboratory scenarios [8] and with a fixed sensor location and orientation, which is in general at the hip, wrist or upper arm. The requirements that these recognition systems place on wearing position or hardware do not support the real-life scenario. The concern of everyday usage is not to have an additional sensing device but to integrate this functionality into a standard device such as a mobile phone, which should be easy to handle and accurately detect everyday activities.
2 Related Work In 2001, Richard W. DeVaul of MIT started the scientific research on physical activity recognition by acceleration sensors. These research projects were carried out within the framework of the project MIThril (the name Mithril comes from a book by Tolkien), with the aim of expediting context-aware wearable computing for daily life. Important further work is derived from this research group, e.g., by S. Intille, Pentland et al., who expanded the work towards comprehensive Context Awareness / Ambient Intelligence. The project MIThril was not continued any further after 2003 [5]. Furthermore, the Finnish research institute VTT advanced the research within the scope of the nationwide project Palantir (the name Palantir likewise comes from a book by Tolkien) together with the Finnish partners Nokia, Suunto, Clothing+ and Tekes. The project [8] was discontinued in 2006. VTT itself currently continues the research in the project Ramose, which is related to motion tracking. The research activities of Intel Research in cooperation with the University of Washington are focused on an activity logging system, which is called iMote. This platform provides support for the vision of Ambient Intelligence. The approaches for detection and classification of basic physical activities (e.g. walking, running etc.) can also be used for research on the quality of the execution of movements. Hereby the progress of, e.g., Multiple Sclerosis can be examined, as the Sylvia Lawry Centre for MS Research in Munich, Germany, is doing. At present, the following groups are especially active in the field of algorithm development: University of Technology Darmstadt, Georgia Tech (Group Abowd), Lancaster University (Group Gellersen), ETHZ (Group Mattern), Univ. Linz (Group Ferscha), University of Kagawa (Group Tarumi), VTT Finland and likewise the University of Rostock (Group Kirste), Germany. The current work shows that physical activity recognition with just one high-performance acceleration sensor is possible in laboratory environments. The challenge of research is the development of a suitable preprocessing method and the identification of relevant features for activity recognition in everyday life.
3 Mobile Phone as Sensor Device In this paper we describe a novel concept of using a mobile phone, without any additional devices, for physical activity recognition. This enables permanent, unobtrusive activity monitoring for everyday usage. The latest generation of mobile phones uses acceleration sensors for orientation detection when taking pictures of landscape or portrait subjects. The acceleration sensor, also often called a g- or tilt sensor, is also becoming popular as a new input interface for games. Hereby a steering wheel or squash racquet is simulated by moving the mobile phone. Some manufacturers use the sensor for new interaction styles such as "shake control" by Sony Ericsson to control the sound player. The quality criteria for acceleration sensors can be summarized as measurement range, sampling rate, sampling stability, quantization and noise. The acceleration sensor of mobile phones was designed for purposes other than activity recognition, and so the sensor performance is quite low (e.g. 20 Hz sampling, 3 bit/g quantization). Usually, the sensor requirements for activity recognition are much higher than what the
acceleration sensors of mobile phones are able to provide. Sampling rates of over 100 Hz are usually used; a lower rate is regarded as impractical because of sensor noise. The hardware constraints of mobile phones lead to the need for better preprocessing and suitable feature extraction. 3.1 Wearing Position Current motion detection systems require a predefined location of the acceleration sensor. Various phones (e.g. Sony Ericsson 560) already provide very simple pedometer functionality. Hereby it is mandatory to fix the phone at the belt. This works for training or sport sessions, but in general it is not very suitable for users to have a predefined wearing position for their phone. A specific wearing position does not match the wearing behavior of users in everyday life. In [4], a survey of about 1549 participants from 11 cities on 4 continents provides characteristics of how mobile phones are carried whilst users are out and about in public spaces; typical phone locations are described as follows:
• Trouser pockets
• Shoulder bags
• Belt enhancements
• Backpacks
• Upper body pockets
• Purses
To offer common pedometer functionality for everybody, easy and uncomplicated to use, it is very important to detect physical activity for each of these different wearing locations. Acceleration sensing for physical activity places basic requirements on the sensor signal. In our laboratory we could estimate the acceleration forces when simulating sport equipment as follows:
− Bowling (hand): ~4g
− Jogging (hip): ~5g
− Basketball (hand): ~6g
− Jumping (hip): ~7-9g
− Playing, romping (hip): ~11g
− Tennis, golf (hand): >16g
− Boxing without partner (hand): >16g
The acceleration of the body for the first steps of a run is about 0.4 g, a rollercoaster has up to 4g, a human survives a permanent acceleration of 10g and a tennis ball has up to 1000g during the start phase [10]. 3.2 Sampling Rate Muscles of the human body are controlled by information which is transferred by nerves. The response time of humans depends on the kind of the signal (acoustic signals cause a longer response time than optical). In addition, the temperature of the muscles, psychological and physical constitution as well as external parameters such
as drugs, alcohol, nicotine or medicines influence the response time. The average optical response time of a human is approx. 220 ms [1]. The trill in piano music is indicated in the literature [7] as at most 10 cycles per second, and for stringed instruments as 13 cycles per second. A reflex, however, is a direct reaction without processing in the brain which occurs within approx. t = 0.06 seconds, corresponding to 1/t ≈ 16 Hertz. Because of the Shannon sampling theorem, double that sampling rate is necessary, so the sampling rate should be a minimum of 32 cycles per second. In line with this view of the sampling rate, researchers in similarly oriented projects [3] use similar frequencies. This sampling rate is relevant for body movements. Artificial movements, e.g. engine vibration while driving a car, provide additional frequency bands which are not covered. However, it is to be assumed that the selected sampling rate of 32 Hz is sufficient. 3.3 Relevant Activity Types A mobile device which is carried by the user for the entire day might be influenced by the user's physical activity. The everyday usage of the mobile device requires the consideration of the relevant user activities (activity types). The everyday behavior of young people and children consists [6] of only a few activity types. The most frequently performed activities are lying (ca. 9 hours), sitting (ca. 9 hours), standing (ca. 5 hours) and being active (ca. 1 hour) [2]. For the determination of the energy consumption, some activity types, such as sitting, standing or lying, can be summarized as "resting". Locomotion is typically performed by walking, jogging, cycling or car driving, and each mode should be represented by a separate activity type. Fuzzy activities such as cleaning, gardening or household work are classified as being active. For everyday usage, this leads to the following activity list:
• Device not present
• Resting (sleeping, sitting)
• Walking
• Running / jogging
• Bicycle riding
• Car driving
• Being active (gardening, cleaning, etc.)
This list is not exhaustive and can be extended, but it already allows an estimation of the daily calorie consumption by using the individual metabolic equivalent of each activity. DiaTrace supports the detection of each of the given activity types plus jumping. 3.4 Mobile Phone Requirements Mobile devices with a Java J2ME development environment, such as the Sony Ericsson W910i or W760i, provide a sensor API (JSR-256) for easy access to the acceleration sensor. The integrated sensors of these devices provide a sampling rate of 20 Hz, which is lower than the required 32 Hz. In addition, the samples are not equidistant in time. The following figure illustrates the sampling distribution and shows the strong irregularity.
Fig. 1. Varying sampling rate of acceleration values
During normal phone usage (e.g. calling), the sampling rate varies even more, with gaps of up to about a tenth of a second. The device provides the acceleration data with an exact time-stamp. These strong constraints lead to the concept of a basic reconstruction of the input data. 3.5 Preprocessing and Data Conditioning The very strong variability of the sampling rate of mobile phones requires preprocessing and data conditioning of the acceleration values. Every acceleration value is delivered with an exact time-stamp. This enables data conditioning within the preprocessing module. We designed a preprocessing module which eliminates the effects of the low sampling rate and the varying scanning. DiaTrace reconstructs the true course of the acceleration by interpolating the scanned acceleration values of each axis. This preprocessing compensates for the varying sampling rate as well as the rough quantization and leads to a new input signal for the pattern recognition. Based on relevant features extracted from this signal, DiaTrace makes a long-term assessment of daily activities possible.
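A minimal sketch of this kind of data conditioning, under assumed data formats, is shown below: acceleration samples arriving with irregular timestamps and coarse quantization are reconstructed on a uniform 32 Hz grid by per-axis linear interpolation. This is an illustration of the general idea, not the DiaTrace implementation.

```python
# Resample irregularly timestamped 3-axis acceleration samples onto a uniform grid.
import numpy as np

TARGET_RATE_HZ = 32.0

def resample_uniform(timestamps_ms, samples_xyz, rate_hz=TARGET_RATE_HZ):
    """timestamps_ms: shape (n,); samples_xyz: shape (n, 3) raw sensor values."""
    t = (np.asarray(timestamps_ms, dtype=float) - timestamps_ms[0]) / 1000.0
    xyz = np.asarray(samples_xyz, dtype=float)
    t_uniform = np.arange(0.0, t[-1], 1.0 / rate_hz)
    # Interpolate each axis independently onto the uniform time grid.
    resampled = np.column_stack([np.interp(t_uniform, t, xyz[:, k]) for k in range(3)])
    return t_uniform, resampled

# Irregular ~20 Hz input (varying gaps), as delivered by the phone's sensor API.
ts = [0, 55, 98, 160, 212, 255, 330, 370]
acc = [[0.1, 9.8, 0.0], [0.2, 9.7, 0.1], [0.0, 9.9, 0.0], [0.3, 9.6, 0.2],
       [0.1, 9.8, 0.1], [0.2, 9.7, 0.0], [0.0, 9.9, 0.1], [0.1, 9.8, 0.0]]
t32, acc32 = resample_uniform(ts, acc)
print(len(t32), "uniform samples at 32 Hz from", len(ts), "raw samples")
```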
4 Sample Application DiaTrace is a mobile application which provides assistive functionalities. DiaTrace measures everyday activities and reminds the user to be more active if necessary; otherwise it congratulates the user. In a cooperative scenario, like long-term support, comfortable activity monitoring throughout the day enables a new kind of social connectedness, because group members can see what users are doing during the day.
Fig. 2. Phone with integrated sensor showing actual activity
Fig. 3. Activity recognition by a mobile phone over an entire day
Fig. 4. Activity top ten of the buddies
Fig. 5. Electronic medals
The physical activity of a person can be shared with friends. The mobile device with integrated acceleration sensor is able to send the activity level automatically to other buddies, and so DiaTrace can be connected to a community platform. The mobile phone ranks the activity level and displays a top-ten list with the current activity type. Another motivation instrument to encourage more physical activity is the achievement of electronic medals. In addition, the activities can be transferred to a personal web space, where they are analyzed by intensity and the daily energy consumption is calculated. The medical relevance of DiaTrace for overweight children is currently being evaluated in a medical study. Here, eating is additionally monitored through the functionality of taking photos of the food with the mobile phone. The application showed that physical activity monitoring with a standard mobile phone is possible. The evaluation showed that the recognition rate for the type of physical activity is higher than 95% when the phone is worn in the front trouser pocket. The correctness is lower at other wearing locations, and some activity types (e.g. cycling)
are falsely detected (e.g. as car driving) when the phone is carried in a jacket or bag. The good recognition is made possible by the preprocessing of the data and a suitable feature selection.
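To give a rough idea of what windowed feature extraction and classification on the resampled signal can look like, the sketch below computes a few simple features and assigns the window to the nearest activity centroid. The feature set, the centroid values and the classifier are illustrative assumptions for the example; they are not the actual DiaTrace features or model.

```python
# Toy feature extraction (mean, variation, dominant frequency of the magnitude)
# and nearest-centroid classification of a single analysis window.
import numpy as np

def window_features(acc_xyz, rate_hz=32.0):
    """acc_xyz: (n, 3) window of acceleration in g; returns a small feature vector."""
    magnitude = np.linalg.norm(acc_xyz, axis=1)
    spectrum = np.abs(np.fft.rfft(magnitude - magnitude.mean()))
    freqs = np.fft.rfftfreq(len(magnitude), d=1.0 / rate_hz)
    dominant = freqs[int(np.argmax(spectrum))] if len(spectrum) else 0.0
    return np.array([magnitude.mean(), magnitude.std(),
                     acc_xyz.std(axis=0).mean(), dominant])

def classify(features, centroids):
    """centroids: dict activity -> feature vector (learned offline)."""
    return min(centroids, key=lambda a: np.linalg.norm(features - centroids[a]))

# Hypothetical centroids standing in for a trained model.
centroids = {
    "resting": np.array([1.0, 0.02, 0.02, 0.0]),
    "walking": np.array([1.1, 0.35, 0.30, 2.0]),
    "running": np.array([1.3, 0.80, 0.70, 3.0]),
}
rng = np.random.default_rng(0)
window = 1.0 + 0.4 * rng.standard_normal((64, 3))   # noisy 2-second window (64 samples at 32 Hz)
print(classify(window_features(window), centroids))
```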
5 Conclusions In this paper, we present the DiaTrace project, which allows the identification of physical activity in everyday life on a standard mobile phone. A three-dimensional acceleration sensor, which is already integrated in standard phones, can be used to determine physical activity through domain-specific feature extraction. By using data mining techniques and preprocessing of the acceleration data, suitable features can be identified which enable a high-quality and robust classification of physical activity. The proof-of-concept prototype achieves a recognition rate of over 95% for the activity types resting, walking, running, cycling and car driving, just by wearing the device in the front trouser pocket. The activity level can be shared with friends or buddies and might be helpful for appraising sporting activity. The application can be used for monitoring the daily calorie consumption by including the metabolic equivalent of each activity type. This technique makes it possible to support medical applications. We envision the setup of a physical activity database for a homogeneous appraisal of the results of activity recognition. Furthermore, we are working on combining physical activity monitoring with emotion sensing devices like EREC [9], which would allow for an even better personalized, sensitive assistance.
References 1. Biermann, H., Weißmantel, H.: Benutzerfreundliches und Seniorengerechtes Design (SENSI Regelkatalog), VDE-Fortschrittsberichte, Reihe 1 Konstruktionstechnik Nr.247 (2003), ISBN -318-324701-1 2. Bös, K., Worth, A., Opper, E., Oberger, J., Romahn, N., Wagner, M., Woll, A.: MotorikModul: Motorische Leistungsfähigkeit und körperlich-sportliche Aktivität von Kindern und Jugendlichen in Deutschland (i.V.). Forschungsendbericht zum Motorik-Modul, KIGGS (2007) 3. Bouten, C.V.C., Koekkoek, K.T.M., Verduin, M., Kodde, R., Janssen, J.D.: A Triaxial Accelerometer and Portable Data Processing Unit for the Assessment of Daily Physical Activity. IEEE Transactions On Biomedical Engineering 44(3), 136–147 (1997) 4. Chipchase, J., Yanqing, C., Ichikawa, F.: Where’s The Phone? Selected Data, survey, NOKIA (2007) 5. DeVaul, R., Sung, M., Gips, J., Pentland, A.: MIThril: Applications and Architecture. In: Proc. 7th IEEE International Symposium on Wearable Computers, White Planes, NY, USA, October 21-23 (2003), http://www.media.mit.edu/wearables/mithril/ 6. Gesundheitssurvey, Mensink: Körperliche Aktivität, Gesundheitswesen 61, Sonderheft 2, Robert Koch-Institut, p. 126, Berlin (1999) 7. Lange, H.: Allgemeine Musiklehre und Musikalische Ornamentik. Franz Steiner Verlag (2001) ISBN-13: 978-3515056786
8. Pärkkä, J., Ermes, M., Korpipaä, P., Mäntyjärvi, J., Peltola, J., Korhonen, I.: Activity Classification Using Realistic Data From Wearable Sensors. IEEE Transaction on Information Technology in Biomedicine 10(1) (2006) 9. Peter, C., Ebert, E., Beikirch, H.: A Wearable Multi-sensor System for Mobile Acquisition of Emotion-Related Physiological Data. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 691–698. Springer, Heidelberg (2005) 10. Wikipedia (2009), http://de.wikipedia.org/wiki/Beschleunigung (last access: February 23, 2009)
Kinetic User Interface: Interaction through Motion for Pervasive Computing Systems Pascal Bruegger and Béat Hirsbrunner Pervasive and Intelligence Research Group Department of Informatics - University of Fribourg - Switzerland {pascal.bruegger,beat.hirsbrunner}@unifr.ch
Abstract. We present in this paper a semantic model for the conception of pervasive computing systems based on object or user motions. We describe a system made of moving entities, observers and views. More specifically, we focus on the tracking of implicit interaction between entities and their environment. We integrate the user's motion as the primary input modality, as well as the contexts in which the interaction takes place. We combine user activities with contexts to create situations. We illustrate this new concept of motion-awareness with examples of applications built on this model. Keywords: Pervasive computing, Ubiquitous computing, Motion-awareness, Kinetic User Interface, HCI.
1 Introduction In this paper, we explore a new human-computer interaction (HCI) paradigm for pervasive computing systems where location-awareness and motion tracking are considered the first input modality. We call it the Kinetic User Interface (KUI) [1]. Nowadays many projects such as EasyLiving [2] or GUIDE [3] have developed Ubicomp1 technologies like mobile devices or applications and have enhanced the human experience, for instance by providing contextualised services mainly according to the user's location. However, most current context-aware systems are limited to external parameters and do not take into account user-centric dimensions. In our model, we consider the user's activity as a way to reflect their goals and intentions. The paper formalizes KUI as a system composed of entities and observers. Kinetic objects (entities), possibly living things, interacting naturally with their environment are observed by agents (observers) which analyse their activities and contexts. The model focuses on implicit interaction and unobtrusive interfaces of motion-aware computing systems. The challenge consists in modelling the physical "world" in which entities live and interact into a conceptual system representing this world in a simple and flexible manner. We propose a generic model and a programming framework that can be used to develop motion-aware and situation-aware applications. In section 2, we define our model of a system made of entities, observers and views. In section 3, we present the concept of motion-awareness. In section 4, we
1 Ubiquitous Computing, equivalent to what we have called in this paper pervasive computing.
describe the implementation of this model and in section 5, we present the application domain with the description of three KUI enabled projects based on our model.
2 KUI Model: A Systemic Approach The approach we have chosen for our semantic model is based on the General System Theory (GST) and the research of three authors [4],[5],[6] who propose interesting visions of systems. For Alain Bouvier ([6], p.18), a system (a complex organised unit) is a set of elements in dynamic interaction, organised to reach a certain goal and differentiated within its environment. It has an identity and represents a "finalised whole". General System Theory (GST), defined by von Bertalanffy [4], describes systems in sciences such as biology, chemistry, physics and psychology. GST gives the framework and the concepts to model specific systems studied in the sciences. There exist different types of systems, such as inert or dead, living or evolutionary, open (exchanging matter with its environment) or closed. For instance, we can see the world as a whole extremely complex living but closed system. For physicists, this perception of the world is not correct and this vision is reductive: the world (our planet) is one component of the solar system and is part of the equilibrium of this system. Boulding in [5] writes: an "individual" - atom, molecule, animal, man, crystal - (entity) interacts with its environment in almost all disciplines. Each of these individuals exhibits "behaviour", action or change, and this behaviour is considered to be related in some way to the environment of the individual, that is, to other individuals with which it comes into contact or into some relationship. The important points in Boulding's definition are that: • The entity's actions (activities, behaviour) are related to its environment; • Entities come into relationships. In KUI, systems are open and dynamic (living). Their complexity evolves over time with respect to their components. Components can join and leave systems, increasing or reducing their size. We have included two concepts which are not present in the chosen authors' definitions: • The observer: who or what is observing the system; • The view: the observer's point of view. We define a system as a set of observable, interacting and interdependent objects, physical or virtual, forming an integrated whole. The system includes different types of objects: entities, observers, and views. 2.1 Entities Entities are the observable elements of the system. They can be physical or virtual, living things (humans, animals), moving objects (cars, planes), or places (rooms, buildings). An entity is made of contexts and performs activities (Fig. 1).
Fig. 1. Basic structure of an entity
Contexts. As defined by A. Dey et al. [8], a context is any information that can be used to characterise the situation of an entity. In our model, the contexts are used to define the attributes of an entity. Contexts do not include the activity. The activity is influenced by the environment and therefore by the contexts in which it is performed. We will see later that contexts provide relevant information to the observer in the situation analysis. We use the following contexts in our model: identity, location, role, status, structure, and relations. Identity and location. The identity is the name of the entity and must be unique in order for the entity to be differentiated within the system [6]. The location is the address where the entity can be observed. The location is made of an address and a time. The dynamic behaviour of an entity makes the address possibly dynamic, and time must be taken into consideration. Role. We have defined two roles for an entity in our model: 1) actor and 2) place. They indicate to the observer what it should focus on. For instance, motions and activities are the focus when the entity is an actor. Roles are dynamic, and the entity, according to the point of view of the observer, is sometimes an actor and sometimes a place. For example, a cruise boat can be observed as an actor when we consider its kinetic properties (cruising around the globe), and it is considered a place when we focus on its structure (passengers, cabins, decks). Status. It provides the entity's kinetic information to the observer. An entity has two possible statuses: mobile (motion capabilities) or static (fixed). Structure. The structure of an entity can be simple or complex. A simple entity does not contain other entities; it is called an atom (e.g. a walking person). In contrast, an entity with a complex structure is said to be composed (e.g. a house). A composed entity contains other entities and must have at least one atom. The structure of an entity is dynamic and evolves over time. For example, a house is made of rooms and contains inhabitants; each time a person changes room, the structure is modified. Relations. They determine the type of interaction the entity has with its environment. Relations provide information about the state of entities and contribute to evaluating the situation. We consider two types of relations between entities: spatio-temporal relations and interactional relations. A spatio-temporal relation defines the "physical" connection between entities. When an actor is near a place or another actor at the same time, a temporary relation exists. Our model of spatio-temporal relations is inspired by the spatial relationships used in GIS [10]. We differentiate 4 types of spatio-temporal relations:
1. Proximity (next to)
2. Containment (inside)
3. Contiguity (juxtaposed)
4. Coincidence (overlap)
Relations 1 and 2 are created between an actor and another actor or a place, while relations 3 and 4 concern places. We call interactional any relation between actors needed to carry out complex activities. These relations are parameters used to determine the feasibility of an activity. We have identified three types of interactional relations: 1. collaborative, 2. causal, and 3. conditional. Collaborative relations are set to carry out activities that cannot be achieved by one actor. For instance, the activity of fixing a vertical pillar on the street requires the intervention of a crane driver who lifts up the pillar and keeps it stable while a worker bolts the pillar onto the plinth. We see in this case that the completion of the main activity is possible only if at least two specialised activities are combined. A causal relation is present when activity A causes activity B. For instance, if a car (entity A) moves, it implies that the driver (entity B) moves as well. Causal relations are useful to check the "validity" of a detected activity. Conditional relations are created when activities have to be carried out in a given order. Like causal relations, they allow checking the validity of an activity: activity B can be done only if activity A has been done before. Activities in Places. Activities are controlled within a place. Places have rules that determine the authorised activities, the forbidden ones and the negotiated ones. We introduce the concept of white and black activity lists. White-listed activities are the authorised activities that can be carried out with no reaction from the observer; they are accepted as such. Black-listed activities are, on the contrary, forbidden and provoke an immediate reaction from the observer. We also take into consideration what we call the "grey" list: if an activity is not explicitly declared in the white or black lists, then it is "negotiable", which gives the freedom to evaluate it and to infer the situation. Activity lists (black and white) allow the observer to react quickly when something is going on in a place: the observer simply checks whether the ongoing activity is explicitly present in one of the lists. 2.2 Observers and Views In the previous section, we have detailed the system through its entities. The second part of the system consists of the observation of these entities. Observers are the agents which observe the moving entities populating a system. They collect and analyse information (activities and contexts) about actors and places and possibly react to dangerous or inappropriate situations in which actors could be. Observers are specific and analyse one or a small number of situations. Our vision is to have more observers but less programming complexity per observer.
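A compact sketch, in assumed Python form, of the vocabulary introduced so far is given below: entities carry contexts (identity, role, status), places hold white and black activity lists, and an observer checks an actor's current activity against the rules of the place containing it. Class and field names are illustrative, not the KUI framework's API.

```python
# Entities with minimal contexts, plus an observer that only reports problems.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    identity: str
    role: str = "actor"             # "actor" or "place"
    status: str = "mobile"          # "mobile" or "static"
    activity: Optional[str] = None  # current detected activity (for actors)
    white: set = field(default_factory=set)    # authorised activities (for places)
    black: set = field(default_factory=set)    # forbidden activities (for places)
    contained: list = field(default_factory=list)  # structure: contained entities

class Observer:
    """Reports only problematic situations; it never addresses the actor directly."""

    def assess(self, place: Entity, actor: Entity) -> str:
        if actor.activity in place.white:
            return "ok"
        if actor.activity in place.black:
            return f"ALERT: {actor.identity} is {actor.activity} in {place.identity}"
        return "negotiable"         # the "grey" list: left to higher-level reasoning

kitchen = Entity("kitchen", role="place", status="static",
                 white={"cooking", "sitting"}, black={"running"})
grandma = Entity("grandma", activity="running")
kitchen.contained.append(grandma)              # containment relation (inside)
print(Observer().assess(kitchen, grandma))     # -> ALERT: grandma is running in kitchen
```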
To illustrate this concept, we take the example of UN2 observers placed at the border between two countries during a cease-fire. Their role is to watch, to observe the movements of the troops of both countries (the situation), and to react or notify when they detect violations of the rules. These rules are established in advance and must be respected by the actors in the field; for instance, soldiers must not cross the no-man's land. A UN observer analyses the actors' activities and contexts (location, time) and reports any detected incident or violation to the higher level, his hierarchical superior. Non-intrusive behaviour of observers. Weiser [9] has introduced the concept of calm technology. In his concept, the user is more and more surrounded by computing devices and sensors. It becomes necessary to limit the direct interaction with the computing systems in order to avoid unneeded cognitive load and let the user concentrate on their main activity. Our concept of observer is inspired by Weiser's idea. There is no interference with actors and places: the observer reports only problematic situations to the higher level and lets the application decide what to do. Views. The entities are observed from certain points of view. Observers can select different points of view to analyse the same situation. Each point of view represents a focus on the situation. Many observers can use similar views for different situation analyses. A view is a multi-dimensional filter placed between an observer and the entities. It allows or constrains the observer to focus on a certain part of the system. The focus goes from the root structure down to one atom. We have two dimensions in our model of a view: range and level. The range is a parameter that influences the scope of the observation (e.g. the ocean or only a cruise boat), and the level is the parameter which gives the granularity of the observation (e.g. decks, or decks and cabins, or passengers). For instance, a photographer uses different types of lenses according to the level of observation. The wide-angle lens gives a large view of the landscape (the range) but loses details like bees on flowers. Now if the focus is a bee on a flower, then a macro lens is needed: the level changes. Actually, the photographer cannot have at the same time the level of a bee and a wide landscape. This range/level limitation is solved in our model.
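The view abstraction can be pictured with the small standalone sketch below: the range selects which sub-tree of the system is observed, and the level bounds how deep the observer descends into composed entities. Names and structures are assumptions made for the example, not the framework's interfaces.

```python
# A view as a range/level filter over a tree of composed entities.
from dataclasses import dataclass, field

@dataclass
class Node:
    identity: str
    contained: list = field(default_factory=list)   # child Nodes (structure)

def observe(root: Node, level: int, _depth: int = 0) -> list:
    """Identities visible when focusing on `root` (the range) down to `level` levels."""
    seen = [root.identity]
    if _depth < level:
        for child in root.contained:
            seen.extend(observe(child, level, _depth + 1))
    return seen

boat = Node("cruise boat", [Node("deck 3", [Node("passenger 42")])])
print(observe(boat, level=1))   # ['cruise boat', 'deck 3']
print(observe(boat, level=2))   # ['cruise boat', 'deck 3', 'passenger 42']
```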
3 A Motion-Based Model for Situation Awareness

In this section, we define how the kinetic information of the different entities is processed and how a situation is derived from a simple motion. Context-aware systems often consider the user's external parameters such as location, time, social information and activity to characterise a situation. In our model, we bring a new point of view to situation characterisation by separating the activity from the contexts. Indeed, we consider that users' activities should be interpreted in their contexts in order to fully understand their situation. As Figure 2 shows, the motion-aware model is divided into two levels. At the entity level, we have the activities and contexts, including motion detection. Situations are analysed at the observer level and are high-level semantic information.
Fig. 2. Activity and contexts are components of a situation
Our situation-aware model is inspired by the Activity Theory presented by B. Nardi and K. Kuutti in [11],[12] and by Y. Li and J. Landay in [13], as well as by the Situation Theory of J. Barwise et al. [14] and the work of S. Loke [15].

Situations. In [13], Y. Li and J. Landay propose a new interaction paradigm for Ubicomp based on activity (activity-based ubiquitous computing). In their model, the relation between activity and situation is defined as follows: an activity evolves every time it is carried out in a particular situation, and a situation is a set of actions or tasks performed under certain circumstances. Circumstances are what we call contexts. According to Loke, the notion of context is linked to the notion of situation [15]. He proposes the aggregation of contexts (perhaps of different varieties) in order to determine the situation of entities. In that sense the situation is thought of as being at a higher level than context. Loke distinguishes between activity and situation and considers an activity as a type of contextual information used to characterise a situation. Our model of situation combines the two visions (contexts and activities, Fig. 3b) and we define it as follows: a situation is any activity performed in contexts.

Context-Awareness. The user is often unaware of the surrounding computing systems and does not feel the interaction with them. As mentioned by A. Dix [16], while the main challenge of pervasive computing is where computers are, the challenge of context-aware computing is what it means to interact with computers. Context-aware applications use contextual information to automatically do the right thing at the right time for the user [15]. Dey et al. [8] define context as "Any information that can be used to characterize the situation of entities (i.e. whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves […]". They consider that the most important contexts are location, identity, activity and time. This definition brings a new fundamental dimension into our model: the activity.

Activity. In [15], activity typically refers to actions or operations (Fig. 3b) undertaken by human beings, such as "cooking", "running", "reading". For Yang Li and James Landay [13], an activity like "running" is considered an action focused on attaining an immediate goal. They consider, like Kuutti in [12], that an activity is the long-term transformation process of an object (e.g. a user's body) oriented toward a motive (e.g. keeping fit). The notions of "long term" and "immediate" allow the separation of activities from actions. This raises some questions not answered in this paper: what do we consider as long term and as immediate? When does an action become an activity and vice versa? In our model, we consider that an activity is made of detected motions aggregated into operations and actions, and that it is an input for observers.
4 KUI Development Framework: uMove v2

uMove v2 is the second Java™-based implementation of the KUI concepts [1],[17]. It offers programmers a standard platform for developing KUI-enabled applications. The framework is separated into three layers in order to have a clear separation of concerns (Fig. 3a).
Fig. 3. a) uMove architecture, b) motion-aware model
The sensor layer contains all the widgets representing the logical abstraction of the sensors connected to the system. The entity layer contains the logical representation of the physical users or objects being observed. The activity manager aggregates the motion events into activities and makes them available to the observer. The contexts manager receives the sensor information, updates the entities and sends the information to the observation layer, which analyses the current situation of each entity. Observers send events to the application according to the detected situations. This model allows the programmer to concentrate on the specific needs of the application without worrying about the communication between sensors (widgets), users or objects (entities), or about their management (creation, removal, modification). The activity classes must be specifically developed and can be combined to enable complex motion pattern recognition. Observer and view classes, like activity classes, are developed for the analysis of specific situations.
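To make this division of responsibilities concrete, the following is a rough sketch in plain Java of how an application could wire entities and observers together. The class and method names are invented for explanation and do not reproduce the actual uMove v2 API:

import java.util.ArrayList;
import java.util.List;

interface MotionObserver {
    // Called by the observation layer when an activity is detected in a context.
    void situationDetected(String entityId, String activity, String context);
}

class ObservedEntity {
    final String id;
    private final List<MotionObserver> observers = new ArrayList<MotionObserver>();
    ObservedEntity(String id) { this.id = id; }
    void addObserver(MotionObserver o) { observers.add(o); }

    // Entity level: the activity/contexts managers would call this when the
    // sensor layer (widgets) reports new motion events.
    void update(String activity, String context) {
        for (MotionObserver o : observers) o.situationDetected(id, activity, context);
    }
}

class GlidingAlertObserver implements MotionObserver {
    public void situationDetected(String entityId, String activity, String context) {
        if ("flying".equals(activity) && "no-fly zone".equals(context)) {
            System.out.println("Warn pilot of entity " + entityId); // application reaction
        }
    }
}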
5 Application Domain

Our KUI model focuses on applications which integrate motion, in a broad sense, as the main input modality. In the ubiGlide project [17], uMove allows the application to track hang-glider or paraglider motions. Based on their contexts, the application makes inferences and informs the pilot (Fig. 4) about potentially dangerous situations such as flying over a no-fly zone or a national park, or near a storm. In the ubiShop project [18], the application is in charge of controlling the inventory of the fridge in a family house or a shared flat and
of requesting one or more house/flat inhabitants to get missing items (milk, juice, eggs) according to their current location and activity. For instance, the father quietly returning on foot from work and passing near a grocery shop will be informed by the system that items are needed. However, the system does not react if it detects that the father is running or walking fast; it looks for somebody else. In the Smart Heating System (SHS) project [19], the application must adapt the room temperature according to the profile of the users in the room and their current activity. The application regulates the thermostatic valves of the radiators, keeping a comfortable temperature in the room and avoiding the waste of energy.
Fig. 4. ubiGlide - the pilot's graphical user interface of the FlyNav application
These projects validate uMove in three classes of applications. ubiGlide proposes a model of application for outdoor activity tracking such as trekking, mountain biking or sailing. These applications can prevent accidents by observing the user's behaviour in specific environments and by informing him/her about potential dangers. ubiShop is a validation scenario for applications that need to track motions and locations of entities in urban environments and distribute tasks in an opportunistic manner, as for courier or taxi services. Finally, SHS validates uMove in indoor environments and can be a model for applications providing service information on mobile devices or controlling the environment within a building according to the user's location and activities. Table 1 shows the components of our model used (or not) in each of the three projects. Situation analysis is not yet implemented in these projects.

Table 1. Overview of the three projects developed with uMove

ubiGlide [17]
  Entities: Flying objects, zones, mobile zones
  Activities: Flying
  Contexts: Flying objects, zones, mobile zones, location, speed
  Sensors: GPS
  Architecture: Distributed
  Environment: Outdoor

ubiShop [18]
  Entities: People, zones, shops
  Activities: Running, walking, standing
  Contexts: Time, location, speed
  Sensors: GPS, RFID
  Architecture: Centralised, web based
  Environment: Indoor, outdoor

SHS [19]
  Entities: People, rooms
  Activities: Quiet, active, sleeping
  Contexts: Location, time
  Sensors: RFID, accelerometer
  Architecture: Centralised
  Environment: Indoor
6 Conclusion

This paper has presented a new human-computer interaction paradigm in which location-awareness and motion tracking are considered as the primary input modality. We call it the Kinetic User Interface (KUI). We have presented the semantic model of KUI based on a systemic approach. The uMove programming framework implementing KUI has been described, and three projects using the KUI concepts and uMove have been presented. We believe that the KUI concept and its implementation offer a good tool for developers to rapidly prototype applications that integrate the motions of users and mobile objects as the main implicit input modality. Based on this new semantic model, uMove v2 has been finalised, and as future work the three projects presented in Section 5 will be upgraded to the new version. This will include the concepts of observers and views. Two other important challenges are planned for the near future: activity and situation modelling and implementation. In the Smart Heating System, only three types of activities are taken into consideration, and we will propose standard interaction patterns that can be used by developers in specific applications, as well as extended types of activity recognition. We will provide guidelines to help programmers properly define their system, including the entities, observers and views, before using uMove to implement the motion-aware application, activity tracking and situation analysis. User studies must also be conducted in order to verify the concept of unobtrusive interfaces, in particular in user activities that require a high level of attention such as flying, driving or manipulating dangerous equipment.

Acknowledgments. We would like to thank Denis Lalanne and Daniel Ostojic for their feedback and advice on this paper. This work is supported by the Swiss National Fund for Scientific Research, Grant no. 116355.
References 1. Pallotta, V., Bruegger, P., Hirsbrunner, B.: Kinetic user interfaces: Physical embodied interaction with mobile pervasive computing systems. In: Kouadri-Mostefaoui, S., Maamar, Z., Giaglis, G. (eds.) Advances in Ubiquitous Computing: Future Paradigms and Directions, ch. 7. IGI Publishing (2008) 2. Brumitt, B., Meyers, B., Krumm, J., Kern, A., Shafer, S.: EasyLiving: Technologies for Intelligent Environments. In: Thomas, P., Gellersen, H.-W. (eds.) HUC 2000. LNCS, vol. 1927, pp. 12–29. Springer, Heidelberg (2000) 3. Cheverst, K., Davies, N., Mitchell, K., Friday, E.A.: Developing a Context-aware Electronic Tourist Guide: Some Issues and Experience. In: Proceedings of CHI 2000, Netherlands (2000)
4. von Bertalanffy, L.: General System Theory. Foundations, Development, applications. George Braziller (1969) 5. Boulding, K.: General systems theory. The Skeleton of Science 2(3), 197–208 (1956) 6. Bouvier, A.: Management et projet. Hachette, Paris (1994) 7. Vallgårda. A: A framework of place as a tool for designing location-based applications. Excercept of Master Thesis (2006) 8. Dey, A., Abowd, E.D., Salber, G.D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human Computer Interaction Journal 16, 97–166 (2001) 9. Weiser, M., Brown, J.S.: The coming age of calm technology, http://www.cs.ucsb.edu/ebelding/courses/284/w04/papers/ calm.pdf 10. Calkins, H.W.: Entity-relationship modelling of spatial data for geographic information systems, http://www.geo.unizh.ch/oai/spatialdb/ergis.pdf 11. Nardi, B.A.: Context and Consciousness, vol. 1. MIT Press, Cambridge (1995) 12. Kuutti, K.: Activity Theory as a Potential Framework for Human-Computer Interaction Research. MIT Press, Cambridge (1996) 13. Li, Y., Landay, J.A.: Activity-based prototyping of ubicomp applications for long-lived, everyday human activities. In: CHI 2008: Proceeding of SIGCHI conference on Human factors in computing systems, New York, NY, USA, pp. 1303–1312 (2008) 14. Barwise, J., Gawron, J.M., Plotkin, G., Tutiya, S.: Situation Theory and its Applications. In: Center for the study of language and information - Stanford, vol. 2 (1991) 15. Loke, S.W.: Representing and reasoning with situations for context-aware pervasive computing: a logic programming perspective. The Knowledge Engineering Review, 213–233 (2004) 16. Dix, A., Finlay, J., Abowd, G.D., Beale, R.: Human-Computer Interaction. In: Pearson, 3rd edn. Prentice Hall, Englewood Cliffs (2004) 17. Bruegger, P., Pallotta, V., Hirsbrunner, B.: UbiGlide: a motion-aware personal flight assistant. In: Adjunct Proceedings of the 9th International Conference on Ubiquitous Computing, UBICOMP, Innsbruck, Austria, pp. 155–158 (2007) 18. della Bruna, D.: Ubiweb & Ubishop. Master project, Supervisors V. Pallotta, P. Bruegger, university of Fribourg – CH (2007) 19. Pallotta, V., Bruegger, P., Hirsbrunner, B.: Smart Heating Systems: optimizing heating systems by kinetic-awareness. In: 3rd IEEE International Conference on Digital Information Management (ICDIM 2008), pp. 887–892 (2008) ISBN: 978-1-4244-2917-2
On Efficiency of Adaptation Algorithms for Mobile Interfaces Navigation

Vlado Glavinic 1, Sandi Ljubic 2, and Mihael Kukec 3

1 Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
[email protected]
2 Faculty of Engineering, University of Rijeka, Vukovarska 58, HR-51000 Rijeka, Croatia
[email protected]
3 College of Applied Sciences, Jurja Krizanica 33, HR-42000 Varazdin, Croatia
[email protected]
Abstract. Many ubiquitous computing systems and applications, including mobile learning ones, can make use of personalization procedures in order to support and improve universal usability. In our previous work, we have created a GUI menu model for mobile device applications, where personalization capabilities are primarily derived from the use of adaptable and adaptive techniques. In this paper we analyze from a theoretical point of view the efficiency of the two adaptation approaches and related algorithms. A task simulation framework has been developed for comparison of static and automatically adapted menus in the mobile application environment. Algorithm functionality is evaluated according to adaptivity effects provided in various menu configurations and within several classes of randomly generated navigation tasks. Simulation results thus obtained support the usage of adaptivity, which provides a valuable improvement in navigation efficiency within menu-based mobile interfaces. Keywords: personalization, adaptation, algorithmics, m-devices, m-learning.
1 Introduction

Mobile learning (m-Learning), the intersection of online learning and mobile computing, promises access to applications supporting learning anywhere and anytime, implementing the concepts of universal access [10]. Personal mobile devices and wearable gadgets are presently becoming increasingly accessible and pervasive, while their improved capabilities make them ideal clients for the implementation of a variety of mobile applications [8], among which m-Learning represents one of the most important and attractive ones. However, the acceptance of new m-Learning systems is highly dependent on usability challenges, the most important of them being technology variety, gaps in user knowledge and user diversity [9]. The potentiality for including the widest possible
parts of the population in the interactive mobile learning process implies particular emphasis on user interface design and the quality of interaction [5]. These HCI issues become even more significant within present-day mobile device applications (MDAs), which show a firm tendency towards increased complexity, sophisticated interfaces and enriched graphics. Hence, the development process for such MDAs must involve personalization procedures that are essential for tailoring them to individual users' needs and interaction skills. As the general framework of mobile interaction is heavily based on two interaction styles – menu-based and direct manipulation – we have focused our interest on the personalization of MDAs through a transformable and moveable menu component with adaptable and adaptive features, introduced in [6] – see Fig. 1.
Fig. 1. Adaptation algorithm usage in the general menu personalization process
In this paper we analyze from a theoretical point of view the respective menu navigation efficiency, in the case where automatic interaction personalization is provided by the usage of two different adaptation algorithms.
2 Transformable Menu Component

In general, menus represent a core control structure of complex software systems, and therefore provide an interesting object for personalization research [2], especially in the mobile devices' environment. Here the focus is primarily on the speed of interaction between the user and the menu-based MDA, since this is considered to be one of the main factors in producing a truly usable system. Minimization of the interaction burden for the user, which is an important aspect of speed [1], can be accomplished both by avoiding a multi-screen menu hierarchy and by reducing the number of keystrokes. For that reason, our menu component has the usual well-known form, with size and shape adequately reduced according to mobile device display limitations (Fig. 2). Because of user diversity and the high probability that different users will use different navigation patterns, even when working on very similar tasks, adaptation algorithms can generate various menu configurations [6]. A particular configuration thus personalized can be retrieved (at MDA startup) from and stored (at MDA shutdown) to the Record Management System (RMS) of the local device, or to a remote server by
Fig. 2. Menu component running on different device emulators
the respective Servlet application. RMS is both an implementation of and an API for persistent storage on Java ME devices. It provides the associated applications (of the Java MIDlets class) with the ability to access a non-volatile place for storing object states [7]. Since the RMS implementation is platform-dependent, it makes good sense to guarantee redundancy by additionally storing menu configurations to the remote server and subsequently retrieving them from the server. A personalized menu must furthermore provide easy access to all of the existing menu functions, including the adaptable ones. For that reason, our menu component is thoroughly modeled using state diagrams (cf. [11]) where all available state transitions are initiated through exactly one keystroke on the mobile device input, thus providing a platform for optimal interaction efficiency (Fig. 3).
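Returning to the RMS-based persistence mentioned above, a minimal sketch of storing and retrieving a serialized menu configuration could look as follows. The record store name and the use of record id 1 are our own illustrative assumptions, not the component's actual code:

import javax.microedition.rms.RecordStore;
import javax.microedition.rms.RecordStoreException;

// Minimal sketch: persisting a serialized menu configuration in Java ME RMS.
public class MenuConfigStore {

    public void save(byte[] serializedConfig) throws RecordStoreException {
        RecordStore rs = RecordStore.openRecordStore("MenuConfig", true);
        try {
            if (rs.getNumRecords() == 0) {
                rs.addRecord(serializedConfig, 0, serializedConfig.length);   // first record gets id 1
            } else {
                rs.setRecord(1, serializedConfig, 0, serializedConfig.length);
            }
        } finally {
            rs.closeRecordStore();
        }
    }

    public byte[] load() throws RecordStoreException {
        RecordStore rs = RecordStore.openRecordStore("MenuConfig", true);
        try {
            return rs.getNumRecords() > 0 ? rs.getRecord(1) : null;
        } finally {
            rs.closeRecordStore();
        }
    }
}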
Fig. 3. Menu state diagram. The black node represents the spot where automatic adaptation is performed.
Regarding the implemented adaptable (i.e. user-controlled) options (see Fig. 1), the user is provided with the ability (i) to easily control the visibility mode, (ii) to adjust menu orientation and respective docking position, (iii) to toggle the menu appearance
both between character-oriented and iconic-based styles and (iv) to manually customize both the menu header and item positions within its hierarchical scheme. On the other hand, the use of adaptive techniques means that MDA user interface changes will be partially controlled by the system itself, providing usability enhancement through increased interaction speed while getting m-Learning tasks done.
3 Adaptation Algorithms

First of all, it should be noted that automatic adaptation is based on algorithms that both monitor a user's prior navigation behavior and rearrange menu item positions within a particular popup/pulldown menu frame. In the following, two adaptation approaches are compared with the original (static) menu configuration. While a frequency-based (FB) algorithm simply changes item positions according to their selection frequencies, a frequency-and-recency-based (FRB) algorithm refines the same idea by additionally promoting the most recently selected item [3]. The difference in the related adaptation effects is visualized in Fig. 4.
Fig. 4. The difference between frequency-based and frequency-and-recency-based adaptation: while the former promotes the most frequently used (TMFU) item only, the latter additionally promotes the most recently used (TMRU) one
As shown in the figure above, after a certain period of time and a related menu navigation pattern, the initial item positions are rearranged according to a sorted item frequency list. Denoting this as a current state of the adapted menu configuration, two
outcomes are possible upon selection of Item X-5. If FB adaptation is used, a new repositioning is expected, based on the updated item frequency list. However, if automatic adaptation is ensured by the FRB algorithm, the currently selected item (Item X-5) will be moved to the TMRU position, updating the frequency list and reordering its items. Using the recency criterion, every menu item has a "fair chance" to quickly appear and be retained in the promoted part of the item set, regardless of its current frequency value. The core of the automatic adaptation algorithm can be specified through the following pseudocode, with the innermost conditional block referring to the case when the recency condition is active (it can be omitted if the FB approach is used):

if (keyPressed = FIRE_BUTTON) then
  Update_Frequency_List(selected_item, item_freq++);
  if NOT position(selected_item, TMFU) then
    Update_ItemPositions(itemSet, freqList, noRecency);
    if NOT position(selected_item, TMRU) then
      Move(selected_item, TMRU_position);
      Update_ItemPositions(itemSet, freqList, recency);
    end if
  end if
  Application_Response(selected_header, selected_item);
end if
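A compact Java sketch of this behaviour is given below. It is our own illustration rather than the authors' actual implementation; a boolean flag selects between the FB and FRB variants:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of FB/FRB menu adaptation: items are reordered by selection
// frequency; with the recency option, the item selected last is additionally
// promoted to the second position (TMRU), right below the TMFU item.
class AdaptiveMenuFrame {
    private final List<String> items = new ArrayList<String>();
    private final Map<String, Integer> frequency = new HashMap<String, Integer>();
    private final boolean useRecency;   // false = FB, true = FRB

    AdaptiveMenuFrame(List<String> initialItems, boolean useRecency) {
        items.addAll(initialItems);
        for (String item : initialItems) frequency.put(item, 0);
        this.useRecency = useRecency;
    }

    void select(String item) {
        frequency.put(item, frequency.get(item) + 1);
        // Reorder by descending selection frequency (the FB behaviour).
        Collections.sort(items, new Comparator<String>() {
            public int compare(String a, String b) {
                return frequency.get(b) - frequency.get(a);
            }
        });
        // FRB refinement: keep the TMFU item on top and promote the selected
        // item to the TMRU slot just below it, regardless of its frequency.
        if (useRecency && items.indexOf(item) > 1) {
            items.remove(item);
            items.add(1, item);
        }
    }

    List<String> currentOrder() { return items; }
}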
The abovementioned algorithm's usage is inspired by the work carried out in [2], where a similar approach was applied to adaptive split menus. In that particular research, the idea of using frequency and recency characteristics emerged from experiences gained working with the Microsoft Office 2000 suite with dynamic menus, which adapt to an individual user's behavior. Whilst the related work is set in the desktop application environment (with the mouse as the exclusive input device), we are dealing with an MDA setting and the corresponding mobile device navigation keypad. We believe that the efficiency enhancement in mobile interface navigation provided by automatic adaptation exceeds the debatable benefits reached in desktop menu navigation.
4 Task Simulation Framework

As there is still a lack of evaluation studies capable of distinguishing adaptivity from general usability [4], we have developed a task simulation framework able to compare static and adaptive menu configurations and their respective navigation options. Since the time required for the completion of menu navigation tasks directly depends both on the time to locate a target menu header in the root menu bar and on the time to select an item from a single popup/pulldown menu frame, it is quite straightforward to specify the navigation performance level by determining the exact number of keystrokes needed for task fulfillment, within the input set of four navigation keys and a fire button (Fig. 5).
Fig. 5. If the general navigation keypad is used, it is a simple task to calculate the "keypad distance" from the starting position to destination. If Header 3 is considered as the current menu position, selecting Item 5-4 would require 7 keypad strokes: 2 RIGHT arrows, 4 DOWN arrows, and the FIRE button.
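Under this keystroke-counting model, and assuming (as in the figure) that horizontal moves switch headers, vertical moves walk down the opened frame and FIRE confirms the selection, the keypad distance can be sketched as:

// Illustrative keystroke-count model, matching the example in Fig. 5:
// moving from Header 3 to Item 5-4 costs 2 RIGHT + 4 DOWN + FIRE = 7.
// Wrap-around navigation and other shortcuts are deliberately ignored.
final class KeypadDistance {
    static int keystrokes(int currentHeader, int targetHeader, int targetItemPosition) {
        int horizontal = Math.abs(targetHeader - currentHeader); // LEFT/RIGHT presses
        int vertical = targetItemPosition;                       // DOWN presses within the frame
        return horizontal + vertical + 1;                        // + FIRE button
    }

    public static void main(String[] args) {
        System.out.println(keystrokes(3, 5, 4));  // prints 7, as in Fig. 5
    }
}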
Various static and adapted menu configurations can be compared, based on the aforesaid calculation method and several simulation parameters, which are introduced and thoroughly explained in Table 1.

Table 1. Parameters (and structures) used in the task simulation framework

Menu configuration
  Headers (user-defined): number of first-order menu options (number of menu headers)
  Items_MIN (user-defined): minimal number of items within each menu frame
  Items_MAX (user-defined): maximal number of items within each menu frame
  Config (random): randomly generated menu configuration with #Headers, each of them containing between #Items_MIN and #Items_MAX items

Task configuration
  Picks (user-defined): number of randomly chosen menu selections
  Repetition (user-defined): repetitive selections percentage (within the set of #Picks selections)
  Pools (user-defined): number of task subsets (for repetitive selections distribution)
  Task (random): randomly generated navigation and selection task, with a given number of randomly chosen menu selections and a defined repetitive selections distribution
Simulations can be performed with different menu configurations, which are randomly generated according to a given number of menu headers (Headers) and an allowed number of items within each menu frame (Items_MIN and Items_MAX being the limits). In this way we are able to analyze many different configurations, which can afterwards be classified based on menu size. The basis of the simulation process is a randomly generated navigation and selection task, which consists of an explicit number of random selections (Picks), some of
which are repetitive in accordance with a defined percentage (Repetition) and distribution (Pools). It is highly unlikely that the user will make most of her/his repetitive selections at once; therefore, these selections are evenly dispersed throughout the whole task, thus generating the desired distribution (Fig. 6).
Fig. 6. Distribution of repetitive selections within a randomly generated task. Parameter values for task configuration: Picks = 50, Repetition = 30%, Pools = 5.
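One plausible reading of the Picks/Repetition/Pools parameters (our interpretation, sketched for illustration only) is to spread the repeated selections evenly over a number of consecutive task segments:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of random task generation: 'picks' selections are drawn at random,
// and a 'repetition' share of them repeats an earlier selection; the repeats
// are distributed evenly over 'pools' consecutive segments of the task.
final class TaskGenerator {
    private final Random rnd = new Random();

    List<Integer> generate(int menuItems, int picks, double repetition, int pools) {
        List<Integer> task = new ArrayList<Integer>();
        int repeatsPerPool = (int) Math.round(picks * repetition) / pools;
        int poolSize = picks / pools;
        for (int p = 0; p < pools; p++) {
            for (int i = 0; i < poolSize; i++) {
                if (i < repeatsPerPool && !task.isEmpty()) {
                    // repeat an item already selected earlier in this task
                    task.add(task.get(rnd.nextInt(task.size())));
                } else {
                    task.add(rnd.nextInt(menuItems));   // fresh random selection
                }
            }
        }
        return task;   // item indices; header choices could be modelled analogously
    }
}

With Picks = 50, Repetition = 30% and Pools = 5 (the values from Fig. 6), each of the five segments would contain three repeated selections.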
Obviously, if items change their initial positions within a particular menu frame (according to the algorithm's instructions), a variation in the keypad distance for selecting a particular item in the original and the modified menu configuration will result. The overall difference between the static and the adaptive configuration in the total count of keystrokes (for completing the given task) represents the interaction speed enhancement provided by (automatic) adaptivity. In the task simulation framework, this criterion is used to evaluate the usefulness of menu adaptation and to quantify the efficiency of the adaptation algorithms used.
5 Simulation Results

The measure of interaction speed improvement derived from automatic adaptation is given by (X-Y), where X stands for the number of keystrokes required for completion of the generated task using the original menu configuration, while Y stands for the number of keystrokes using the adaptive one. Fig. 7 shows a sample result of an FRB adaptation simulation session. The menu configurations used within the simulation process are categorized according to structure complexity, so we basically distinguish small, medium and large scale menus. It is easy to see that wading through large scale menus requires increased user attention, because the related headers can extend over several display screens. Because of the random character of the task generation process, we used exactly 100 different instances of the generated task for every particular set of simulation parameters. Consequently, 100 simulation sessions were performed for each distinct menu configuration and specified task class, and the final simulation results are presented as the mean value of the data thus collected. Altogether, 4500 simulation runs have been carried out for 9 menu configurations and 5 classes of randomly generated tasks. The mean values show an observable level of navigation efficiency enhancement for both the FB and FRB approaches. The obtained simulation results are structured and presented in Table 2.
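As a small sketch of how this measure might be aggregated (illustrative only; the actual framework code is not published in the paper), the mean improvement over a set of task instances can be computed as:

import java.util.List;

// Sketch: mean keystroke improvement (X - Y) over a set of generated tasks,
// where X counts keystrokes on the static menu and Y on the adaptive one.
final class ImprovementMeasure {
    interface MenuSimulator { int keystrokesFor(List<Integer> task); }

    static double meanImprovement(List<List<Integer>> tasks,
                                  MenuSimulator staticMenu, MenuSimulator adaptiveMenu) {
        long total = 0;
        for (List<Integer> task : tasks) {
            total += staticMenu.keystrokesFor(task) - adaptiveMenu.keystrokesFor(task);
        }
        return (double) total / tasks.size();   // 100 task instances in the paper
    }
}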
Fig. 7. Sample result derived from the task simulation framework. In this particular case, FRB adaptation decreases the overall keypad distance by 81 keystrokes, thus decreasing the input interaction burden by approximately 10%.
Task classes with a small designated number of Picks represent the user's interaction with the menu-based MDA in a short interactive cycle. Conversely, tasks which include a very large number of selections (e.g. Class #5 with 10000 Picks) correspond to longer usage of the application with menu component navigation options. Related to that, we can see that adaptivity effects grow considerably with task duration, so users improve their navigation efficiency with the duration of adaptive menu usage.

Table 2. Simulation results. For every task class, we set the Repetition parameter value to 15%. Task classes: Class #1 = 100 Picks, 5 Pools; Class #2 = 200 Picks, 5 Pools; Class #3 = 500 Picks, 10 Pools; Class #4 = 1000 Picks, 30 Pools; Class #5 = 10000 Picks, 300 Pools. Each cell gives the FB / FRB result.

Scale         Headers  Items [min-max]  Class #1   Class #2   Class #3    Class #4    Class #5
small menu    3        2-5              5 / 9      14 / 19    42 / 70     40 / 105    102 / 652
small menu    4        2-5              5 / 8      17 / 23    43 / 67     71 / 152    98 / 584
small menu    5        2-5              4 / 6      17 / 19    32 / 41     44 / 79     118 / 770
medium menu   6        3-8              14 / 18    25 / 31    50 / 71     133 / 255   330 / 1957
medium menu   7        3-8              13 / 16    37 / 43    99 / 132    116 / 217   257 / 1348
medium menu   8        3-8              16 / 19    52 / 59    100 / 123   142 / 241   503 / 2240
large menu    9        4-11             30 / 34    57 / 64    128 / 160   187 / 318   785 / 3027
large menu    10       4-11             29 / 32    65 / 71    150 / 178   185 / 298   692 / 2767
large menu    10       8-12             34 / 38    85 / 94    212 / 247   255 / 417   1081 / 4263
According to the simulation results, the benefit of implementing adaptivity is, on the other hand, questionable in small scale menu configurations, especially within infrequently used applications. In such menus all popup/pulldown frames can be
expanded on a single display screen, resulting in no need to navigate to hidden item subsets; hence rearranging items according to the user's prior navigation patterns has no manifest significance. Regarding the recency criterion, in most cases FRB adaptation resulted in a greater enhancement than the FB approach, regardless of the menu configuration scale. However, this difference in adaptivity effects becomes more prominent within tasks formed by a larger number of menu selections (e.g. within task class #5, FRB adaptation outperforms the FB approach several times over). Hence, promotion of the most recently used items within a particular menu frame is preferable in MDAs requiring frequent usage of a navigation-and-selection interaction style (as is the case in e.g. m-learning applications). Generally speaking, the simulation outcomes support the concept and confirm the usefulness of implementing adaptive techniques for menu navigation in the mobile application environment. Nevertheless, it should be noted that the abovementioned conclusions emerge from theoretically based results. It is quite hard to model real application tasks by using random generators, because actual navigation patterns contain, to some extent, more predictable sequences of menu selections, which is not the case in our task simulation framework. For this reason we can expect even better results in real application adaptation scenarios. Users' possible mistakes in navigation, and their impressions and levels of satisfaction while working with adaptive interfaces, are excluded from this analysis, as the groundwork for these indicators (e.g. usability testing) is not yet implemented.
6 Conclusion and Future Work

M-learning systems, one of our main research interests, will certainly become an additional advantage in the wide-ranging process of lifelong learning. When developing related m-learning MDAs, there arises a strong aspiration to fully exploit the advances in mobile device technology and to make these applications very powerful, graphically rich and usable. Hence, following the concept of universal usability, our efforts are focused on the quality of mobile user interaction. We make use of personalization procedures in order to enable users to work with MDA interfaces that are adjusted according to their preferred individual interaction patterns, thus making the users faster and more satisfied in performing assigned (m-learning) tasks. In our previous work, we introduced a transformable menu model for MDAs, with personalization capabilities derived from the use of both adaptable and adaptive techniques. The model is implemented as a Java ME API extension, and can easily be reused in all similar applications (not necessarily m-learning ones). In this paper we have dealt with system-driven personalization of the presented menu model and with the efficiency of adaptation algorithms. Various static and adaptive menu configurations are compared within a task simulation framework, and the results thus obtained confirm that the usage of adaptivity makes a difference, providing a valuable improvement in navigation efficiency within menu-based mobile interfaces. Directions for future work include further improving our understanding of the mutual influence between user diversity and automatic interaction adaptation. We would like to identify the conditions under which the benefit of adaptation is more
valuable than the eventual loss of control due to unexpected changes of the menu configuration. The results derived from the described task simulation framework will be substantiated with new research outcomes based on running adequate usability tests. Moreover, for every presented and completed user task, appropriate time measurements will be carried out, within both static and adaptive menu-based applications. With the results thus collected, we expect to get a better insight into the correlation between theoretical and empirical adaptation effects.

Acknowledgments. This paper describes the results of research being carried out within the project 036-0361994-1995 Universal Middleware Platform for e-Learning Systems, as well as within the program 036-1994 Intelligent Support to Omnipresence of e-Learning Systems, both funded by the Ministry of Science, Education and Sports of the Republic of Croatia.
References 1. Anderson, D.J.: Speed is the Essence of Usability (Editorial). UIdesign.net: The Webzine for Interaction Designers (1999), http://www.uidesign.net/1999/imho/sep_imho2.html 2. Findlater, L., McGrenere, J.: A Comparison of Static, Adaptive, and Adaptable Menus. In: Proc. ACM SIGCHI Conf. Human Factors in Computing Systems (CHI 2004), pp. 89–96. ACM, New York (2004) 3. Gajos, K.Z., Czerwinski, M., Tan, D.S., Weld, D.S.: Exploring the Design Space for Adaptive Graphical User Interfaces. In: Proc. 8th Int’l. Working Conf. Advanced Visual Interfaces (AVI 2006), pp. 201–208. ACM, New York (2006) 4. Glavinić, V., Granić, A.: HCI Research for E-Learning: Adaptability and Adaptivity to Support Better User Interaction. In: Holzinger, A. (ed.) USAB 2008. LNCS, vol. 5298, pp. 359–376. Springer, Heidelberg (2008) 5. Glavinic, V., Ljubic, S., Kukec, M.: A Holistic Approach to Enhance Universal Usability in m-Learning. In: Mauri, J.L., Narcis, C., Chen, K.C., Popescu, M. (eds.) Proc. 2nd Int’l. Conf. Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM 2008), pp. 305–310. IEEE Computer Society, Los Alamitos (2008) 6. Glavinic, V., Ljubic, S., Kukec, M.: Transformable Menu Component for Mobile Device Applications: Working with both Adaptive and Adaptable User Interfaces. International Journal of Interactive Mobile Technologies (iJIM) 2(3), 22–27 (2008) 7. Mahmoud, Q.: MIDP Database Programming Using RMS: a Persistent Storage for MIDlets. Sun Developer Network (SDN) - Technical Articles and Tips, http://developers.sun.com/mobility/midp/articles/persist/ 8. Roduner, C.: The Mobile Phone as a Universal Interaction Device – Are There Limits? In: Rukzio, E., Paolucci, M., Finin, T., Wisner, P., Payne, T. (eds.) Proc. of the MobileHCI Workshop on Mobile Interaction with the Real World (MIRW 2006), pp. 30–34 (2006) 9. Shneiderman, B.: Universal Usability: Pushing Human-Computer Interaction Research to Empower Every Citizen. Comm. ACM 43, 85–91 (2000) 10. Stephanidis, C.: Editorial. International Journal - Universal Access in the Information Society 1, 1–3 (2001) 11. Thimbleby, H.: Press On: Principles of Interaction Programming. The MIT Press, Cambridge (2007)
Accessible User Interfaces in a Mobile Logistics System

Harald K. Jansson 1, Robert Bjærum 2, Riitta Hellman 3, and Sverre Morka 2

1 Norkart Geoservice AS, Løkketangen 20a, 1300 Sandvika, Norway
[email protected]
2 Tellu AS, Hagaløkkveien 13, 1383 Asker, Norway
{robert.bjarum,sverre.morka}@tellu.no
3 Karde AS, P.O. Box 69 Tåsen, 0801 Oslo, Norway
[email protected]
Abstract. In this paper, we focus on ICTs for young people attending occupational rehabilitation and training. An important goal is to develop ICTs that decrease the need for reading and writing dramatically. The UNIMOD-prototype demonstrates how mobile phones can be used as the main and only ICT-device by truck drivers who deliver mats from the laundry to a large number of companies and public places. The mobile phone can be used in the truck for navigation according to the traffic situation and geography, and for handling the customer and delivery information. The test sessions show that mobile phones offer an excellent point of departure for the development of simple and intuitive services that support users with cognitive declines.

Keywords: Accessibility, Cognitive disabilities, GIS, Mobile solutions.
1 Introduction

The influx of people registered as unfit for work is steady, if not increasing, in Europe. In particular, incapacity for work amongst young employees increases continuously. There are also large numbers of so-called drop-outs, i.e. young people who drop out of school for various reasons. One of the reasons for not completing basic or occupational education is learning disabilities, such as dyslexia. According to an OECD study [6], approximately 30% of adults have difficulties in reading and writing, to such an extent that it is difficult for them to handle daily activities at study or work. Other cognitive declines [14], such as concentration problems, are also rather common among young people. In many European countries, different occupational training and rehabilitation policies and programmes have been established to combat unemployment due to occupational disability. In Norway, there is a large number of enterprises that are dedicated to and specialized in occupational rehabilitation [1]. Such enterprises are organized as shareholder companies where the main shareholder usually is the local municipality. The services provided for occupationally disabled persons include assessment of the potential work and educational capacity of the individual and qualification of the
individual through individually adapted job training and guidance. The enterprises qualify occupationally disabled persons in real work environments. In this paper we focus on ICTs for young people attending occupational rehabilitation and training. An important goal is to develop ICTs which dramatically decrease the need for reading and writing. In connection with the R&D-work of the UNIMOD-project, the authors have collaborated with the rehabilitation enterprise ÅstvedtGruppen and their logistics team [2] on specifying and testing the prototype.

1.1 The UNIMOD-Project

The main objective of the UNIMOD-project [12] is to develop new knowledge of multimodal, personalized user interfaces, and thus to improve the accessibility and use of electronic services. The UNIMOD-prototypes are based on real cases, and show how to increase the accessibility of the user interface on the mobile phone. The project presented in this paper addresses users with different kinds of cognitive declines in such areas as memory, problem-solving, orientation, attention/concentration, reading/writing, learning and verbal/visual/mathematical comprehension [4]. These areas are crucial to support in order to achieve an inclusive HCI-design for ICTs [1], such as mobile phones [14]. The so-called Åstvedt-prototype of the UNIMOD-project demonstrates how mobile phones can be used as the main and only ICT-device by truck drivers who deliver mats from the laundry to a large number of companies and public places in the Bergen area in Norway. The mobile phone will be used in the truck for navigation according to the traffic situation and geography, and for handling the customer and delivery information. The Åstvedt-prototype results from software development collaboration between three UNIMOD-partners: Norkart Geoservice [9], which delivers GIS, Tellu [11], which develops applications for mobile phones, and ÅstvedtGruppen [2], a large rehabilitation enterprise (cf. Chapter 1). Researchers from Karde [5] and the Norwegian Computing Center [8] have performed user requirements analyses and usability tests.

1.2 The UNIMOD-Prototype in a Nutshell

Based on field studies, i.e. empirical observations of (a) the collaboration between the truck driver and the co-driver, (b) available documents (driving instructions, delivery information etc.), (c) the variety of different mats and other "deliverables", (d) concrete traffic situations and actual navigation, (e) the architecture of public places and buildings, and (f) customer behaviour and preferences, the UNIMOD-team has developed a two-dimensional model for the mobile user interaction. The first dimension of the model handles geographic and navigational information in a user-centered way. The solution suggests a route, but the truck driver makes the concrete navigation decisions depending on the work situation. This dimension also handles the delivery information (i.e. delivering clean mats and picking up dirty ones). It is possible to change the order of delivery stops, and to switch between different representations of the route. The second dimension manages the presentation. The presentation is based on a multimodal user interface and a "minimal information model". Interactive forms of information input follow design guidelines developed in earlier projects [4].
Multimodality enables alternative presentations of the very same information (e.g. points of delivery on a map instead of a list of addresses). The minimal information model shows just the necessary information, until the user asks for more. The users require that the user interface introduces the lowest possible cognitive load, and that it must be possible to operate the application multi-modally, depending on personal preferences. In all modalities, the progress of the working day is shown to the user. This is considered an important motivating factor for both the truck driver and the co-driver. In the remainder of this paper, the R&D-work concerning the prototype will be presented. We approach the presentation by discussing the opportunities and limitations of GIS on mobile phones. Second, we will make some remarks about mobile phone technology and the challenges it poses to developers. Finally, we will present the HCI of the prototype.
2 Challenges and Opportunities for Mobile GIS

In general, it is fair to say that navigating with the use of a map is not a trivial task. Even a simple map is saturated with topographical data, road data, buildings and landmarks. When constructing a good map, it is always important to have a good idea of what information it should convey, what it will be used for and in what context it will be used, and it is certainly important to have a notion about the end user. The mobile digital map differs from its analogue counterpart in some important ways. A traditional printed map is static in its nature; it has a set scale and gives no option of filtering any information in the map. It is mobile in the sense that you can bring it with you, but it offers no context-sensitive help or guidance (GPS, nearby points of interest, turn-by-turn navigation etc.). However, it often provides detailed, high-resolution data and a well-defined cartography. Therefore, it is common that printed maps have a specific theme, and if you need other information you go out and buy a new, separate map. A digital map, on the other hand, has the ability to present dynamic data. Also, depending on the hardware and software platform, it can assist the user in simple tasks like knowing his or her position, calculating the route and distance to a destination, gathering context information (speed, bearing etc.), and communicating with similar devices or other users. This makes the digital map a very flexible tool, which can be tailored to match the user's abilities and the use context.

2.1 Technical Constraints Pertaining to Mobile Maps

While mobile digital maps excel at providing dynamic, context-based information, they are less good at displaying complex cartography. This has a lot to do with the relatively small screen sizes associated with such devices, but also with the resolution of the screens. A typical mobile phone has a resolution of 240x320 pixels on a 2.8 inch display, which gives little room for detail. Moreover, mobile screens often render colours in disparate ways, which constricts the map to a small colour space. Mobile devices have numerous technical constraints in addition to the screen limitations which do not affect the map directly. They often have cumbersome interfaces,
lack a decent "qwerty keyboard", rely on battery power, and have unreliable and slow data connections (at least in comparison to an ordinary PC). Such limitations play an important role when mobile systems are designed for professional use at work.

2.2 Mobile Cartography

Mobile cartography is a concept described by Reichenbacher as follows [10]: "Mobile cartography deals with theories and technologies of dynamic cartographic visualization of spatial data and its interactive use on portable devices anywhere and anytime under special consideration of the actual context and user characteristics." While the prototype application did not use a client-side vector map rendering engine, we could make small server-side adjustments to enhance readability on small screens. One of the things we encountered through user feedback during the project was the lack of road names at certain spots on the map. A given map which may work well in a desktop solution will lack essential information because of the limitations connected with the small screen. The map engine uses pre-rendered map tiles from a WMS source (Web Map Service), so in this case the rules pertaining to road names had to be changed. We changed text to occur more frequently in vector and hybrid maps, resulting in a more informative map for our mobile users (Fig. 1).
Fig. 1. Text changes in maps for mobile users: more frequent information
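For illustration, a map tile of the kind described above is typically requested from a WMS source with a standard GetMap call; the host, layer name and coordinate reference system below are placeholders, not the actual service used in the project:

// Sketch of a standard WMS 1.1.1 GetMap request for one 250x250 map tile.
// Host, layer and SRS are illustrative placeholders.
final class WmsTileRequest {
    static String getMapUrl(double minX, double minY, double maxX, double maxY) {
        return "http://example.org/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
             + "&LAYERS=roads&STYLES="
             + "&SRS=EPSG:32633"
             + "&BBOX=" + minX + "," + minY + "," + maxX + "," + maxY
             + "&WIDTH=250&HEIGHT=250&FORMAT=image/png";
    }
}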
2.3 Aerial Photos, Hybrid Maps and Waypoint-Based Navigation

It is a well-known fact that landmark navigation is a simple and intuitive method of getting from one place to another, given some knowledge about the route. It is, for example, common to use landmarks when explaining a route to another person. On a vector-based map, these objects are often hard to spot, as the features that characterize a building or structure are often lacking from the dataset. In these cases the use of orthophotos, which are aerial photographs that have been geometrically corrected in such a way that they can be used as maps, may be useful. Orthophotos can be used as an addition to the vector map: these pictures provide extra details pertaining to the user's surroundings and allow for landmark-based navigation. Hybrid maps, which are orthophotos with vector data layered on top of them, are also a good alternative. Fig. 2 illustrates the three types of maps. Our user tests showed that a number of users preferred orthophotos instead of vector maps.
Fig. 2. Vector map, orthophoto and hybrid map of the same point of interest. Different end users may prefer different presentations.
2.4 Delivery List and Its Map Representation

The delivery list (Fig. 3) is a central artifact connected to the work process of the truck drivers. It tells them where to drive and what to deliver. One of the early objectives of the UNIMOD-prototype project was to facilitate multiple ways of viewing the delivery list. We wanted a simple model that could be viewed as a geographically ordered list or as waypoints along a route on a map, depending on the user's cognitive abilities and preferences. One alternative was to create a simplified route view, which has a loose connection to the actual geographical locations, much like a modern subway map. The map representation of the list has the same interface mechanisms for going back and forth between waypoints, or selecting delivery spots, as the normal list. Either view contains the same amount of information, so the user should not need to change display to access another kind of data. The map view was intended for users with limited or no knowledge of the route, and the list view was intended for users who were familiar with the locations and only needed to know the name of the next stop.

2.5 Challenges of Mobile Computing

The prototype client for the UNIMOD-project was developed for mobile phones. Pervasive computing comes with a number of additional challenges, and these are most notable on cell phones. According to Forman et al. [3], the main challenges in mobile computing can be divided into three fields: wireless communication, mobility and portability. The concrete problems of wireless communication are related to disconnection, low bandwidth, high bandwidth variability, heterogeneous networks and security risks. For the UNIMOD-prototype, the wirelessness is a real challenge. The problem with wireless communication is that there is no guarantee that the device is connected to the network. As the mobile client is developed for use in a truck delivering goods, there might be environmental factors that block both GPS and GPRS communication, such as tall buildings. Hence the application cannot rely on communication at all times. For instance, the mobile client can use GPS positions to provide the relevant user interface and activate the current views. However, it is necessary to allow the user to override application navigation to avoid a deadlock when the GPS signal is lost.
Fig. 3. A paper-based delivery list. Exceptions are written on it etc. At the end of the day, it may not look like this at all, with comments and coffee stains on it. Perhaps a bit of the sheet is torn off, because the co-driver needed something to write a phone number on…
Another reason why mobile computing relies on a wireless connection is that it is necessary to reduce the amount of computation and processing executed on the mobile device in order to preserve battery capacity. Wireless communication allows the client to delegate computational work to a central server, by sending some parameters and receiving the outcome of the computation from the server. In fact, we experienced some problems with the prototype as a consequence of congestion when much data was sent within a short time interval. The map service used by the client sends a grid of nine images of 250x250 pixels. This would not be a problem for stationary devices, as TCP has mechanisms that deal with congestion. For the UNIMOD-prototype, however, this caused the application to crash. We solved this by using a proxy on the application server, which ensured that just one image is sent at a time. For the end-users in real working situations, such mechanisms are vital. The next challenge of mobile computing is portability. This challenge includes aspects such as the lack of standardisation on mobile devices with respect to screen sizes, input interfaces, communication features (such as Bluetooth, GPS, Wi-Fi etc.) and other hardware constraints. Other concrete constraints are battery capacity, small and differing user interfaces and relatively small storage capacity. Fortunately, the latter is not a great issue anymore, as new devices ship with at least 40 megabytes of RAM, with the possibility to increase this by inserting memory cards. However, the constrained application memory (heap) is still an issue, and we experienced problems with it during the prototype development.
There are three main concerns regarding portability on mobile devices: it is important to reduce the amount of heap to an absolute minimum, to reduce the processing needed in order to preserve the battery, and to make generic interfaces with respect to both screen and user input. One of the factors that make it difficult to achieve processing and memory heap economy is that the API used on the relevant devices is Java. Java is an intuitive programming language, and it is easy to implement on all systems. Unfortunately, it is not efficient with respect to memory management and economic processing. The reason for this is that Java is an interpreted language, meaning that it is a high-level language that requires a lot of redundant processing and large structures to fit in the memory. Solutions to reduce processing are distributed processing by an external server for heavy computations, the use of native functionality to avoid processing overhead whenever possible, and keeping data structures at an absolute minimum. The features in the prototype that are the largest threat to heap consumption are the images used by the navigation module and the paths between destinations. To solve this we had to implement mechanisms that ensured that only visible images were kept in memory, with persistent storage of all the other images to reduce communication latency and potential charges for bandwidth usage. As for paths, we have reduced the number of points needed to draw the path. With regard to external processing, this is implemented for the paths between destinations. All paths are calculated on the server. The client requests the path with the start and destination coordinates as parameters, and the server responds with the shortest path to the destination.

2.6 The Development Framework

The UNIMOD-partner Tellu has developed a framework that solves most of the issues connected with wireless communication. This framework is called ActorFrame [7]. It is an open framework that connects devices to a message bus. The framework can operate on most of the protocols and connections used in mobile computing. ActorFrame divides the application into a number of Actors. An actor is a module that serves a responsibility (called "role" in ActorFrame) in the application. One actor may consist of several inner actors. Seen from the outside, the application consists of one actor, usually with multiple inner actors. ActorFrame assures that there is a connection between all peers comprising the application at all times, using a message bus. The prototype consists of three actors on the server side and two actors on the client side. These actors are in addition to standard actors that are part of the framework, such as resource manager, name server etc. The actors used by the server are:
MapServer handles map tile requests and transformation and visual representation of navigation. UnimodServer handles handshake with the client, and routes the other messages to their respective actors. UnimodFileedge parses the delivery lists and prepares initial data, such as default spider paths and initial images, persistent delivery objects etc.
The client consists of the following actors, in addition to framework actors:
MapClient handles map requests, path requests and mapping to the client view. This actor handles all interaction with the map view, including the module assuring persistent storing of map images. UnimodClient handles the more application specific logic, such as sorting delivery lists, handling progress, user interaction etc.
ActorFrame maintains a connection between the client and server at all times using a router that allows the message bus to communicate over GPRS. ActorFrame provides a persistent connection between the server and the client, and so it is adequate for continuous communication. This is convenient in terms of dynamic route updates from base, or reporting back to base about schedule changes etc. ActorFrame will also allow communication between clients. This may be convenient in case of obstacles in the road, or when a truck lacks delivery objects. The client can simply broadcast a message about this, or send directly to another client, and this car might make the delivery instead.
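To illustrate the actor style of decomposition described above (not ActorFrame's actual API, which is documented in [7]), the following sketch shows how a path request from the client could be routed to a server-side actor and answered over a message bus; all class and method names are invented for the example.

// Generic actor/message sketch; the real ActorFrame interfaces differ.
import java.util.ArrayList;
import java.util.List;

class Message {
    final String type;      // e.g. "PATH_REQUEST" or "PATH_RESPONSE"
    final Object payload;
    Message(String type, Object payload) { this.type = type; this.payload = payload; }
}

interface Actor {
    void receive(Message m, MessageBus bus);
}

class MessageBus {
    private final List<Actor> actors = new ArrayList<Actor>();
    void register(Actor a) { actors.add(a); }
    void dispatch(Message m) {
        for (Actor a : actors) a.receive(m, this);   // every registered actor sees the message
    }
}

/** Server-side actor answering shortest-path requests, loosely modelled on the MapServer role. */
class PathActor implements Actor {
    public void receive(Message m, MessageBus bus) {
        if ("PATH_REQUEST".equals(m.type)) {
            double[] coords = (double[]) m.payload;        // start and destination coordinates
            Object path = computeShortestPath(coords);     // the heavy work stays on the server
            bus.dispatch(new Message("PATH_RESPONSE", path));
        }
    }
    private Object computeShortestPath(double[] coords) {
        return "path from (" + coords[0] + "," + coords[1] + ")";   // placeholder result
    }
}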
3 The Prototype Using a hierarchical ordering of screens, and having the intended user in mind (i.e. users with potential cognitive declines), the prototype was kept as simple as possible. The end user should only need to see information pertaining to the task at hand, while also maintaining an overview of the delivery process and the route. The UNIMOD-prototype has six levels of screens. (Fig. 4). The first two levels are used as preliminary steps to the actual delivery. Level one lets the user choose which truck (carrier) should be used. Level two has a list of deliverables pertaining to the chosen truck. When the carrier has finished loading the truck he is presented with one of the level three screens, depending on a user setting. This level holds all supernodes in the route, and the carrier can choose between list mode or map mode. When the carrier has arrived at a supernode, he can expand it to see which subnodes (actual delivery spots) it contains. In this particular case the deliverables were doormats, and for each supernode (building) there could be multiple subnodes (entrances). While at the level four screen, the carrier can choose to recursively check the subnode with its products as delivered. This is to avoid unnecessary interface navigation, but the user can – if necessary – dig one step further, to level five. If the user is not sure about what kind of goods to deliver, or what number of goods to deliver, or if there is a mismatch for some reason, the user is given the option to check off single deliveries at the level five screen. When finished with all deliveries, the user is presented with a message box, indicating a job well done. At all times during the delivery, a status bar is shown in the upper portion of the screen. This is to give the carrier a quick overview of the delivery route progress, even while at low levels in the interface.
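The route hierarchy behind the level three to level five screens can be pictured as a small tree of delivery nodes. The following sketch is only a plausible reconstruction of such a structure (the class and method names are not taken from the prototype); it shows how checking off a supernode can recursively mark all of its subnodes as delivered, and how a progress value for the status bar could be derived from it.

// Hypothetical model of the route hierarchy: a supernode (building) with
// subnodes (entrances), each subnode holding its deliverable items.
import java.util.ArrayList;
import java.util.List;

class DeliveryNode {
    final String name;
    final List<DeliveryNode> children = new ArrayList<DeliveryNode>();
    private boolean delivered;

    DeliveryNode(String name) { this.name = name; }

    /** Checking off a supernode recursively checks off all its subnodes. */
    void markDelivered() {
        delivered = true;
        for (DeliveryNode child : children) child.markDelivered();
    }

    /** Fraction of leaf nodes delivered, e.g. for the progress bar at the top of the screen. */
    double progress() {
        if (children.isEmpty()) return delivered ? 1.0 : 0.0;
        double sum = 0.0;
        for (DeliveryNode child : children) sum += child.progress();
        return sum / children.size();
    }
}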
Fig. 4. The UNIMOD-prototype and the work flow. The bar that indicates the progress is shown at the top of the screens. The blue colour indicates the degree of completion.
4 Conclusion In this paper we have presented our R&D-work to increase the usability and accessibility of applications on mobile phones. Walkthroughs were applied as the main test methodology, allowing the designers and developers to communicate on the prototype. Feedback from the expert users concerned more intuitive navigation, the need to increase the visual clarity of map symbols and the possibility to use multimodal input when registering exceptions on the delivery route. The overall impression from the test sessions is that mobile phones offer an excellent point of departure for the development of simple and understandable services which support users with cognitive declines, such as people with dyslexia.
It is, however, necessary to continue addressing the accessibility requirements connected to physically small screens and the interaction designs that apply to mobile devices. It is also necessary to address the constraints of multimodality. The flexibility afforded by multimodality raises considerable challenges for the users who interact with their systems, services and devices. This concern is connected to the overload that may be generated by the introduction of several modalities, such as combinations of visual and audio information, and the opportunity to choose between them. Finally, there is the question of suitable use contexts for the mobile phone. The UNIMOD-prototype clearly shows the potential of mobile phones in professional use contexts. Acknowledgments. The research work has been partially financed by the Norwegian Research Council. Personnel from the Åstvedt Group have made the empirical work possible. Special thanks go to the truck drivers and co-drivers for driving the researchers around Bergen and commenting on the prototype.
References 1. Association of Vocational Rehabilitation Enterprises, http://www.attforingsbedriftene.no/uk/home.aspx 2. Åstvedt Logistikk, A.S., http://www.astvedt.no/?aid=9045819 3. Forman, G.H., Zahorjan, J.: The Challenges of Mobile Computing. IEEE Computer, 38–47 (April 1994) 4. Hellman, R.: Universal Design and Mobile Devices. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4554, pp. 147–156. Springer, Heidelberg (2007) 5. Karde AS, http://www.karde.no/karde-web/Karde_engelsk.html 6. Learning a Living. First Results of the Adult Literacy and Life Skills Survey. Organisation for Economic Co-operation and Development (2005), http://www.oecd.org/dataoecd/44/7/34867438.pdf 7. Melby, G., Husa, k.E.: ActorFrame Developers Guide, Technical Report, Ericsson (2005) 8. Norwegian Computing Center, http://www.nr.no/ 9. Norkart Geoservice AS, http://www.norkart.no/wip4/detail.epl?cat=1077 10. Reichenbacher, T.: The world in your pocket – Towards a mobile cartography. In: Proceedings of the ICC 2001, Beijing, China, pp. 2514–2521 (2001) 11. Tellu AS, http://www.tellu.no/tellu_webpage.html 12. Universal Design in Multi-modal Interfaces, http://www.unimod.no 13. WebAim: Cognitive Disabilities - Design Considerations, http://www.webaim.org/articles/cognitive/design.php 14. WebAim: Cognitive Disabilities - Introduction, http://www.webaim.org/articles/cognitive/
Multimodal Interaction for Mobile Learning Irina Kondratova National Research Council Canada Institute for Information Technology 46 Dineen Drive, Fredericton, NB, Canada E3B 9W4 {Irina.Kondratova}@nrc-cnrc.gc.ca
Abstract. This paper discusses issues associated with improving the usability of user interactions with mobile devices in mobile learning applications. The focus is on using speech recognition and multimodal interaction in order to improve the usability of data entry and information management for mobile learners. To assist users in managing mobile devices, user interface designers are starting to combine traditional keyboard or pen input with "hands free" speech input, adding other modes of interaction such as speech-based interfaces that are capable of interpreting voice commands. Several research studies on multimodal mobile technology design and evaluation were carried out within our state-of-the-art laboratories. The results demonstrate the feasibility of incorporating speech and multimodal interaction in designing applications for mobile devices. However, there are some important contextual constraints that limit applications with speech-only interfaces in mobile learning, including social and environmental factors, as well as technology limitations. These factors are discussed in detail. Keywords: Mobile usability, multimodal interaction, speech recognition, mobile evaluation.
1 Introduction Many researchers see great value in mobile learning because of portability, low cost and communication capabilities of mobile devices [21]. Mobile devices are becoming an increasingly popular choice in university and school classrooms, and are increasingly being adopted by the “lifelong learners”. Several features of mobile technologies make it attractive in learning environments, among them: relatively low cost of mobile devices [25] and good fit within informatics and social layers of classroom communications [20]. Evaluations of mobile technologies within the classroom environment are largely positive [1, 24]. However, widespread use of mobile technology in learning applications is impeded by numerous usability issues with mobile devices. The gravity of mobile usability problems is highlighted by recent surveys of mobile Internet users [22]. They show that usability is by far the biggest source of frustration among the users of mobile technologies. In particular, for learning applications, research shows that the most important constraining factors for widespread mobile learning adoption, along with battery life, are the screen size and user interface of most portable devices [17]. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 327–334, 2009. © Her Majesty the Queen in Right of Canada, 2009.
This paper explores possible improvements in the usability of mobile devices that are facilitated by the utilization of natural user interfaces to enhance interaction with mobile devices. In section two of the paper the author provides background information on speech-based interaction with mobile devices and on the technologies involved. This section also addresses the concept of multimodality and multimodal applications for interaction with mobile devices. The follow-up section discusses several laboratory studies conducted to evaluate the efficacy and feasibility of multimodal interactions with mobile devices and their potential applications to mobile learning. The author concludes with observations on the potential for incorporating speech and multimodal technologies in the mobile learning domain and some limitations of these technologies.
2 Alternative Interaction Modalities 2.1 Speech as an Interaction Modality In order to assist users in managing mobile devices, user interface designers are starting to combine the traditional keyboard or pen input with "hands free" speech input [28], adding other modes of interaction such as speech-based interfaces that are capable of interpreting voice commands [23]. As a result, speech processing is becoming one of the key technologies for expanding the use of handheld devices by mobile users [18]. In an e-learning technology foresight, technology-based education guru Tony Bates predicted that: "A new computer interface based on speech recognition will have a major impact on the design of e-learning courses" [15]. Currently, automated speech recognition (ASR) technology is being used in desktop e-learning applications for automated content-based video indexing for interactive e-learning [29], audio-clip retrieval based on student questions [30], and, together with speech synthesis, to improve the accessibility of e-learning materials for visually impaired learners [3, 4]. Another novel application of mobile technology for experiential learning is being developed for functionally illiterate adults [14]. This application employs speech recognition and text-to-speech to assist adult literacy learners in improving the pronunciation of the words they learn. 2.2 Multimodal Interaction Speech technology seems to be ideally suited for enhancing the usability of mobile learning applications designed for the mobile phone. In this domain speech is a natural way of interaction, especially where the small screen size of a mobile device limits the potential for a meaningful visual display of information [2]. However, speech technology is limited to only one form of input and output - the human voice. In contrast, voice input combined with traditional keyboard-based or pen-based input permits multimodal interaction, where the user has more than one means of accessing data on his or her device [16]. This type of user interface is called a multimodal interface [5]. Multimodal interfaces allow speedier and more efficient communication with mobile devices, and accommodate different input modalities based on user preferences and the usage context. A field trip learning environment offers the most comprehensive scenario for using speech and multimodal interaction with a mobile device. For
example, in a field trip scenario for a group of engineering students, a student can request information about the field structure (bridge, building, road, etc.) from the course repository using "hands free" voice input on a "smart phone" (a hybrid phone-enabled PDA). The requested information would then be delivered as text, a picture, a CAD drawing, or video, if needed, directly to the PDA screen. The student would be able to enter field notes into forms using a portable keyboard or a pen, if appropriate, or via voice input during field data gathering. In addition to this, free-form verbal field notes could be attached to the collected data as an audio file and later analyzed in class [6].
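As a purely illustrative sketch of the kind of multimodal note taking described in this scenario (the types and fields below are invented for the example and are not part of any of the systems discussed), a field note might record which input modality produced it and optionally carry the attached audio clip for later analysis in class.

// Illustrative only: a field note that can originate from keyboard, stylus
// or speech input, optionally with a free-form audio attachment.
import java.util.Date;

enum InputModality { KEYBOARD, STYLUS, SPEECH }

class FieldNote {
    final String structureId;       // e.g. the bridge or building being inspected
    final InputModality modality;   // how the note was entered
    final String text;              // typed, handwriting-recognised or speech-recognised text
    final byte[] audioClip;         // optional raw recording, null if none
    final Date createdAt = new Date();

    FieldNote(String structureId, InputModality modality, String text, byte[] audioClip) {
        this.structureId = structureId;
        this.modality = modality;
        this.text = text;
        this.audioClip = audioClip;
    }

    boolean hasAudioAttachment() { return audioClip != null; }
}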
3 Evaluations of Mobile Speech and Multimodal Technologies This section compares several applications of mobile multimodal technologies (speech-based and keyboard/stylus). In particular, the focus is on user evaluations of these technologies conducted to study the feasibility and efficacy of speech-based and multimodal interactions in different contexts. This comparison forms the basis for the author's estimate of the potential of using speech as an interaction technique in various learning contexts. 3.1 Speech vs Stylus Interaction A comparison of the efficacy of speech-based and stylus-based interaction with a mobile device was conducted as part of our research in the area of mobile field data collection, which focuses on multimodal (including voice) field data collection for industrial applications. We investigated the use of technologies that allow a field-based concrete testing technician to enter quality control information into a concrete quality control database using various interaction modes, such as speech and stylus, on a handheld device. A prototype mobile multimodal field data entry (MFDE) application has been developed to run on a Pocket PC that is equipped with a multimodal browser and embedded speech recognition capabilities. The prototype application was developed for the wireless Pocket PC utilizing the multimodal NetFront 3.1 Web browser and a fat wireless client with an embedded IBM ViaVoice speech recognition engine. An embedded relational database (IBM DB2 Everyplace) was used for local data storage on the mobile device. The built-in microphone of the Pocket PC was utilized for speech data input [7]. The user evaluation was conducted as a lab-based mobile evaluation of the prototype technology we developed. A detailed description of the study design is given in [8]. Our mobile application was designed to allow concrete technicians to record quality control data while in the field (or, more specifically, on a construction site). The application supported two different modalities of data input – speech-based data entry and stylus-based data entry. The purpose of the evaluation was (a) to determine and compare the effectiveness and usability of the two different input options and (b) to determine which of the two options is preferred by users in relation to the application's intended context of use. In order to appropriately reflect the anticipated context of use within our study design, we had to consider the key elements of a construction site that would potentially
influence a test technician’s ability to use one or both of the input techniques. We determined these to be: (a) the typical extent of mobility of a technician while using the application; (b) the auditory environmental distractions surrounding a technician – that is, the noise levels inherent on a typical construction site; and (c) the visual or physical environmental distractions surrounding a technician – that is, the need for a technician to be cognizant of his or her physical safety when on-site. A total of eighteen participants participated in the study. The results of the evaluation confirmed, as it was anticipated, that stylus-based input was significantly more accurate than speech under the conditions of use that included construction noise in the range of 60-90 dB (A) [11]. We observed, however, that the stylus-based interaction was, on average, slower than speech-based input and that speech-based input significantly enhanced the participants’ ability to be aware of their physical surroundings. In addition, majority of participants expressed preference for using speech as interaction technique with mobile device. As a result, this research study demonstrated significant preference for using speech as an interaction modality, with some limitations imposed by the lower speech recognition accuracy levels due to environmental noise. These findings led us to investigation of several technology factors that can potentially influence the accuracy of speech recognition, such as the type of the microphone and the type of speech recognition engine used. 3.2 Speech-Based Interactions – Technology Evaluations The choice of microphone technology and speech recognition engine plays an important role in improving quality of speech recognition [19]. Our study described in detail in [12] was designed to evaluate and compare three commercially available microphones – the bone conduction microphone, and the two types of condenser microphones for their effect on accuracy of speech recognition within mobile speech input application. We developed a data input application based on a tablet PC running Windows XP and utilized IBM’s ViaVoice embedded speaker-independent speech recognition engine, the same speech recognition engine that was utilized in our previous study [8]. Twenty four people participated in the laboratory-based study. The participants were mobile while entering information requested. The results of the study helped us to prove that the choice of microphone had significant effect on accuracy of mobile speech recognition; in particular, we found that both condenser microphones (QSHI3 and DSP-500 microphones) performed significantly better than bone conduction microphone (Invisio). In addition, we found that there was no significant effect of a background noise (within our evaluation scenario we incorporated street noise of 70 dB (A) level) on the accuracy of speech recognition, indicating that all microphones under evaluation had sufficient noise-cancelling capabilities. Considering the importance of choosing the best speech recognition engine on the accuracy of results obtained, a complementary laboratory study was conducted to evaluate a number of state-of-the art speech recognition engines as to their effect on the accuracy of speech recognition [13]. This study was based on pre-recorded user speech entries, collected in our previously mentioned study [12]. All speech recognition engines were evaluated in speaker independent mode (e.g. walk-up-and-use). 
Based on the results of this study, we also proved the importance of proper pairing of
microphone systems and speech recognition engines to achieve the best possible accuracy of speech recognition for mobile data entry. 3.3 Feasibility of Using Speech Interaction in Learning Contexts Our previous research demonstrates that it is technically possible to implement speech-based and multimodal interaction with a mobile device and to achieve a significant level of user acceptance and satisfaction with the technology. However, if we were to consider the implementation of speech-based interfaces within the mobile learning domain, we have to look at other important considerations, such as the appropriateness of speech as an interaction modality within certain contexts of use and the social acceptance of speech-based interactions. In a classroom environment, where a number of learners could potentially utilize mobile technology to participate in the learning and collaboration process, the appropriateness of speech-based interaction is questionable, since the simultaneous use of speech by multiple users will introduce a high level of environmental noise that could significantly reduce the accuracy of speech recognition for each individual device. Thus, based on contextual considerations, this application of speech-based interfaces is not appropriate. At the same time, utilization of speech interaction by a single mobile learner is very much appropriate and could significantly improve the experience of his or her "learning on the go". Research has shown that mobile speech-based interaction can be successfully designed for users on the go, such as city tourist guides or in-car speech-based interfaces [10]. Most frequently these types of applications utilize a constrained vocabulary of user commands and a constrained grammar of possible user entries. This functionality enables menu navigation, information retrieval and some basic data entry capabilities. The same principles apply to the utilization of speech-based and multimodal interfaces for student field trips, where students are mobile and take notes "on the go" [9]. Another interesting and rapidly developing research area is the application of speech-based and multimodal interfaces within various training scenarios, including industrial and military training. Within these scenarios, when training is conducted in the field or in a simulated field environment, voice commands could enable efficient "hands free, eyes free" information retrieval, menu navigation and basic data entry. Another application of speech-based interfaces is within the domain of gaming, including "serious gaming" in education and training domains [27]. A major challenge for speech-based interfaces within the "serious gaming" domain is to improve the accuracy of speech recognition under environmentally challenging conditions (a high level of noise, people possibly being under stress, thus affecting the way they speak and reducing the accuracy of command recognition, etc.). Within this usage domain, we see an opportunity to successfully deploy multimodal interaction so that multiple channels of input would assist in improving the accuracy and usability of the system [26].
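Recognition accuracy figures of the kind discussed in the studies above are commonly derived from a word-level edit distance between the recognised text and a reference transcript. The following sketch shows that standard computation; it is illustrative only and is not taken from the cited studies, which do not publish their scoring code.

// Standard word-level scoring: accuracy = 1 - WER, where WER is the Levenshtein
// distance over words divided by the number of words in the reference transcript.
public class WordAccuracy {
    public static double accuracy(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        if (ref.length == 0) return hyp.length == 0 ? 1.0 : 0.0;   // degenerate case
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = ref[i - 1].equalsIgnoreCase(hyp[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,    // deletion
                                            d[i][j - 1] + 1),   // insertion
                                   d[i - 1][j - 1] + sub);      // substitution
            }
        }
        double wer = (double) d[ref.length][hyp.length] / ref.length;
        return 1.0 - wer;                                        // may be negative for very poor output
    }
}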
4 Conclusions Our research on speech-based and multimodal user interaction with mobile devices has proven that it is technically feasible to implement speech-based (or multimodal)
interaction with a mobile device and to achieve a significant level of user acceptance and satisfaction with this technology. We also identified some challenges associated with the use of speech-based and multimodal interaction within the learning and training domains. Our future research efforts will be focused on exploring ways to better incorporate multimodal (including speech-based) interfaces within "serious gaming" scenarios, where this technology has the potential to significantly improve the usability of user interactions with technology, especially in cases where "hands free, eyes free" interaction is a must, such as military and industrial training applications. Acknowledgements. The author would like to acknowledge the support for this research program provided by the National Research Council Canada.
References 1. Crawford, V., Vahey, P.: Palm Education Pioneers Program: Evaluation Report. SRI International, Menlo Park (2002) 2. de Freitas, S., Levene, M.: Evaluating the Development of Wearable Devices, Personal Data Assistants and the Use of Other Mobile Devices in Further and Higher Education Institutions. JISC Technology and Standards Watch Report: Wearable Technology (2002) 3. Guenaga, M.L., Burger, D., Oliver, J.: Accessibility for e-Learning Environments. In: Miesenberger, K., Klaus, J., Zagler, W.L., Burger, D. (eds.) ICCHP 2004. LNCS, vol. 3118, pp. 157–163. Springer, Heidelberg (2004) 4. Jahankhani, H., Lynch, J.A., Stephenson, J.: The Current Legislation Covering E-learning Provisions for the Visually Impaired in the EU. In: Shafazand, H., Tjoa, A.M. (eds.) EurAsia-ICT 2002. LNCS, vol. 2510, pp. 552–559. Springer, Heidelberg (2002) 5. Jokinen, K., Raike, A.: Multimodality – Technology, Visions and Demands for the Future. In: Proceedings of the 1st Nordic Symposium on Multimodal Interfaces, Copenhagen (2000) 6. Kondratova, I., Goldfarb, I.: M-learning: Overcoming the Usability Challenges of Mobile Devices. In: Proceedings International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL 2006), p. 223. IEEE Computer Society Press, Los Alamitos (2006) 7. Kondratova, I.: Speech-Enabled Handheld Computing for Fieldwork. In: Proceedings of the International Conference on Computing in Civil Engineering 2005, Cancun, Mexico (2005) 8. Kondratova, I., Lumsden, J., Langton, N.: Multimodal Field Data Entry: Performance and Usability Issues. In: Proceedings of the Joint International Conference on Computing and Decision Making in Civil and Building Engineering, Montréal, Québec, Canada, June 1416 (2006) 9. Kravcik, M., Kaibel, A., Specht, M., Terrenghi, L.: Mobile Collector for Field Trips. Educational Technology & Society 7(2), 25–33 (2004) 10. Larsen, L.B., Jensen, K.L., Larsen, S., Rasmussen, M.H.: Affordance in Mobile Speechbased User Interaction. In: Proceedings of the 9th international Conference on Human Computer interaction with Mobile Devices and Services, MobileHCI 2007, pp. 285–288. ACM, New York (2007)
11. Lumsden, J., Kondratova, I., Langton, N.: Bringing A Construction Site Into The Lab: A Context-Relevant Lab-Based Evaluation Of A Multimodal Mobile Application. In: Proceedings of the 1st International Workshop on Multimodal and Pervasive Services (MAPS 2006), Lyon, France (2006) 12. Lumsden, J., Kondratova, I., Durling, S.: Investigating Microphone Efficacy for Facilitation of Mobile Speech-Based Data Entry. In: Proceedings of the British HCI Conference, Lancaster, UK, September 3-7 (2007) 13. Lumsden, J., Durling, S., Kondratova, I.: A Comparison of Microphone and Speech Recognition Engine Efficacy for Mobile Data Entry. In: The International Workshop on MObile and NEtworking Technologies for social applications (MONET 2008), part of the LNCS OnTheMove (OTM) Federated Conferences and Workshops, Monterrey, Mexico, November 9-14 (2008) 14. Lumsden, J., Leung, R., Fritz, J.: Designing a Mobile Transcriber Application for Adult Literacy Education: A Case Study. In: Proceedings of the International Association for Development of the Information Society (IADIS) International Conference Mobile Learning 2005, Qawra, Malta, June 28 – 30 (2005) 15. Neal, L.: Predictions for 2002: e-learning Visionaries Share Their Thoughts. eLearn Magazine 2002(1), 2 (2002) 16. Oviatt, S., Cohen, P.: Multimodal Interfaces that Process What Comes Naturally. Communications of the ACM 43(3) (March 2000) 17. Pham, B., Wong, O.: Handheld Devices for Applications Using Dynamic Multimedia Data, Computer Graphics and Interactive Techniques in Australasia and South East Asia. In: Proceedings of the 2nd international conference on Computer graphics and interactive techniques in Australasia and South East Asia. ACM Press, New York (2004) 18. Picardi, A.C.: IDC Viewpoint. Five Segments Will Lead Software Out of the Complexity Crisis, Doc #VWP000148 (December 2002) 19. Quek, F., MCNeill, D., Bryll, R., Dunkan, S., Ma, X.-F., Kirbas, C., MCCullough, K.E., Ansari, R.: Multimodal Human Discourse: Gesture and Speech. ACM Transactions on Computer-Human Interaction 9(3), 171–193 (2002) 20. Roschelle, J., Pea, R.: A Walk on the WILD Side: How Wireless Handhelds May Change Computer-supported Collaborative Learning. International Journal of Cognition and Technology 1(1), 145–168 (2002) 21. Roschelle, J.: Keynote paper: Unlocking the learning value of wireless mobile devices. J. of Computer Assisted Learning 19, 260–272 (2003) 22. Sadeh, N.: M-Commerce: Technology, Services, and Business Model. John Wiley & Sons, Inc., Chichester (2002) 23. Sawhney, N., Schmandt, C.: Nomadic Radio: Speech and Audio Interaction for Contextual Messaging in Nomadic Environments. ACM Transactions on Computer-Human Interaction 7(3), 353–383 (2000) 24. Smordal, O., Gregory, J.: Personal Digital Assistants in Medical Education and Practice. Journal of Computer Assisted Learning 19(3), 320–329 (2003) 25. Soloway, E., Norris, C., Blumenfeld, P., Fishman, B.J.K., Marx, R.: Devices are Ready-atHand. Communications of the ACM 44(6), 15–20 (2001) 26. Tse, E., Greenberg, S., Shen, C.: Exploring Interaction with Multi User Speech and Whole Handed Gestures on a Digital Table. In: Proceedings of ACM UIST 2006, Montreux, Switzerland, October 15–18 (2006) 27. Wang, X., Yun, R.: Design and Implement of Game Speech Interaction Based on Speech Synthesis Technique. In: Pan, Z., Zhang, X., El Rhalibi, A., Woo, W., Li, Y. (eds.) Edutainment 2008. LNCS, vol. 5093, pp. 371–380. Springer, Heidelberg (2008)
28. Wilson, L.: Look Ma Bell, No Hands! – VoiceXML, X+V, and the Mobile Device. XML Journal, August 3 (2004) 29. Zhang, D., Nunamaker, J.F.: A Natural Language Approach to Content-Based Video Indexing and Retrieval for Interactive E-Learning. IEEE Transactions on Multimedia 6(3), 450–458 (2004) 30. Zhuang, Y., Liu, X.: Multimedia Knowledge Exploitation for E-Learning: Some Enabling Techniques. In: Fong, J., Cheung, C.T., Leong, H.V., Li, Q. (eds.) ICWL 2002. LNCS, vol. 2436, pp. 411–422. Springer, Heidelberg (2002)
Acceptance of Mobile Entertainment by Chinese Rural People Jun Liu1, Ying Liu2, Hui Li1, Dingjun Li1, and Pei-Luen Patrick Rau1 1
Institute of Human Factors & Ergonomics, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China 2 Nokia Research Center, No. 5, Donghuan Zhonglu, Beijing Economic & Technological Development Area, Beijing 100176, China
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. This study explores and analyzes contributing factors of mobile entertainment acceptance by Chinese rural people. First, 27 factors were drawn from literatures. Then a new factor “cost” was found through interview. After that, a survey was built based on the 28 factors. From the data collected in Chinese rural area, seven factors were extracted through explorative factor analysis: social influence, technology and service quality, entertainment utility, simpleness and certainty, self-efficacy, perceived novelty, and cost. Finally, a comprehensive model was provided involving the seven factors as well as their importance rank. This research provides a comprehensive approach in technology acceptance theory. It can also help practitioners to better understand the rural user group and improve their products accordingly. Keywords: Technology acceptance; mobile entertainment; rural people.
1 Introduction Mobile technologies and applications are being rapidly and widely developed for entertainment. However, entertainment-related services are far from fully accepted by mobile phone users, especially in emerging markets. Therefore, studying emerging-market users' perception and acceptance of mobile entertainment is greatly needed for business, technology and social practice. The objectives of this research include: 1) to build a comprehensive model for the acceptance of mobile phone entertainment by Chinese rural people, considering users, technologies and the environment; and 2) to generate design and ecosystem suggestions for improving the acceptability of mobile entertainment services. The research is creative and significant from two aspects: first, by its comprehensive modeling paradigm, and second, by its special focus on the mobile entertainment issue. Mobile entertainment acceptance is a sub-question of technology acceptance. Since 1975, much has been done in investigating technology acceptance [1]. Several models have been developed to describe contributing variables of technology acceptance. However, there is no comprehensive model considering users,
technologies and the environment. Some of the proposed models seem to be comprehensive, but if the measuring items are examined, they are still only focusing on technology itself, or the users [2]. So a comprehensive model which can analyze and predict better is vital to build. On the other hand, although there are lots of researches in the topic of technology acceptance, few have been focused on the special issue of mobile entertainment acceptance. Since mobile entertainment is different from other technologies for its mobility, legerity, emotionality, and personalization, the contributing variables of mobile entertainment acceptance should act differently from those of technology acceptance. Therefore, there is a great demand researching factors influencing user’s acceptance of this particular mobile service, and models describing how those influences happen. To achieve those objectives, a four-phase study was conducted. Phase I was background research. As mobile entertainment acceptance is a sub-issue of technology acceptance, researches on the latter topic are reviewed. From literatures, factors contributing to technology acceptance were concluded as the basis of following study. Phase II was user study by in-depth phone interviews. A new factor that is not included in former literatures was found during the interview analysis. In Phase III, survey and modeling, all the factors were validated and modeled together. Phase IV was to discuss design and ecosystem suggestions based on the model.
2 Background Research 2.1 Mobile Entertainment Definition According to Mobile Entertainment Forum 2003, the term “mobile entertainment” refers to “entertainment products that run on wirelessly networked, portable, personal devices, which includes downloadable mobile phone games, images and ring tones, as well as MP3 players and radio receivers built into mobile handsets.” The term excludes mobile communication like person-to person SMS and voicemail, as well as mobile commerce applications like auctions or ticket purchasing [3]. In this research, we adopt this definition, and encapsulate the device scope to mobile phone. 2.2 Technology Acceptance Theories Information technology acceptance is relatively well studied with many models and researches. The most widely used four models are the theory of reasoned action (TRA), the technology acceptance model (TAM), the theory of planned behavior (TPB) and the innovation diffusion theory (IDT). The TRA, developed by Fishbein and Ajzen[1], shows that a person's specific behavior is determined by the behavioral intention. In turn, behavior intention is determined by the person's attitude and subjective norm. The TAM, developed by Davis [4], is adapted from TRA, and is specially focus on the behavior of information system acceptance. In this model, “perceived usefulness” and “perceived ease of use” are primarily relevant to the acceptance behaviors. The TPB is a theory of planned behavior developed by Ajzen who is also the developer of TRA [5]. It is an extension of TRA by adding the variable of “perceived behavior control”. The IDT, developed by Rogers [6], explains the behavior of innovation adoption. There are strong affiliations
among the four models. First of all, TPB extends TRA with the additional factor of “perceived behavior control”. Then, TAM is based on TRA, and is focus on information technology acceptance. For the two theories of TAM and IDT, the core constructions are very similar: “perceived usefulness” is very like “relative advantage” and “perceived ease of use” has similar meaning with “complexity” [2][7]. Based on the theories and related researches, totally 27 factors contributing to technology acceptance are concluded. The related measure items for each factor are collected as well. For example, to measure “perceived ease of use”, there are items like “learning to operate information technology would be easy for me” [8]. The 27 factors are listed as below. 1. Perceived usefulness [7][9] 2. Perceived ease of use [4][7][8][9] 3. Perceived complexity [6][10] 4. Perceived enjoyment/fun [11][12] 5. Output quality [9] 6. Relative advantage [7][6] 7. Compatibility [13][7][6][14] 8. Perceived behavioral control [5][12][14] 9. Subjective norm [5][14] 10. Peer influence [14] 11. Word-of-mouth [15][16] 12. Job relevance [17][18][9] 13. Voluntariness [7][12][9] 14. Innovativeness [19] 15. Self-efficacy [20][14][12] 16. Computer anxiety [12] 17. Computer playfulness [12] 18. Technology facilitating conditions [14] 19. Organizational support [10] 20. Visibility [6] 21. Trialability [6] 22. Being-younger [21][22] 23. Perceived modernness [23] 24. Perceived Risk [24] 25. Communication facilitating [21] 26. Perceived novelty 27. Image [7][9]
3 Research Questions The research was aimed to build a comprehensive model for the acceptance of mobile entertainment by Chinese rural people. Two research questions are: 1) what variables can affect the acceptance of mobile entertainment by Chinese rural people? And 2) what are the relationships between these variables and rural people’s mobile entertainment acceptance intention?
The previous 27 variables extracted from literatures were generated in the context of information technology. In this study, they are supposed to affect the acceptance of mobile entertainment as well. Besides, new variables are also supposed to be found in this specific technology area. All these variables could have different impact in users’ acceptance, and we are asking which are crucial, and which are not so related. Therefore, the model not only structures all the variables into a few dimensions, but also describing the weights and importance of them, so that the model can be more predictive and more practical.
4 Methodology The study has two steps. Firstly, a qualitative user study using in-depth phone interviews was adopted to explore the factors contributing to the acceptance of mobile entertainment by Chinese rural people. The 27 technology acceptance factors extracted from the literature are also supposed to have effects in this study, so the aim of the interview was to check whether any factors were missing besides the 27 theory-based factors. Secondly, quantitative survey data were collected to model the relationships between these variables and rural people's mobile entertainment acceptance intention. 4.1 In-Depth Phone Interview Interview Questions. To explore factors, open-ended exploratory questions were formulated. The questions were mainly about participants' experiences using mobile services including entertainment services, their entertainment life, and the behaviors of surrounding people. A question example is "what things happened will promote you to use the mobile phone services, or why do you want to use it?" Participants. Three Chinese rural people, two male and one female, with ages ranging from 28 to 50, took part in the interview. They all had mobile phones and had mobile entertainment experience. They were recruited from two provinces of China (Shandong and Shanxi), where the economic and life patterns are different. Procedure. The whole interview was conducted by mobile phone with a loudhailer, and recorded on a PC. The interview time for each participant was 30~45 minutes. During the interview, the dialect of each district was used. Data analysis. Following the Long Table Approach [25], the transcripts from the phone interviews were printed out, followed by a series of cutting and categorizing of the transcripts. At the end, citations reflecting the same factor were pasted together, and the factor was named and written beside them. 4.2 Survey Questionnaire Construction. Twenty-nine items were designed based on the factors found from the interview. All the items were attitude statements concerning mobile entertainment acceptance. For example, from the factor of "perceived usefulness", an item was designed like "I often need to use mobile entertainment in my daily work and life." Three of the items were inverted to help exclude invalid samples during the analysis phase. The 7-point Likert scales were used with different levels of
agreement to the statements, from "1=totally disagree" to "7=totally agree". The definition of "mobile entertainment" was given at the top of every page to help the participants remember it. For further analysis, questions about personal information, mobile phone and related technology experience were included in the questionnaire. Participants. The sample size was designed according to Gorsuch [26], who recommended that the subject-to-item ratio should be larger than 5. Therefore, at least 145 participants should fill in the questionnaire. We finally got 150 valid questionnaires in the research. All the participants were from the rural areas of Dezhou City in Shandong province of China. Their ages ranged from 16 to 56, with an average of 33.39. Most (51.3%) participants have an education level of junior high school. All the participants had mobile phones, with 87.3% having used more than one mobile before. The prices of most (51.4%) participants' mobiles were in the range of 500-1000 RMB. Procedure. As most rural people are not familiar with web-based questionnaires, paper-based questionnaires were given to them face to face. In order to keep the validity of each questionnaire, the entire process of answering the questions was assisted by the survey conductor. After filling in the survey, each participant got a 20 RMB reward. Data Analysis. Among the 29 items of the questionnaire, 3 were reversed items aiming to identify invalid samples. They were excluded from further analysis. Therefore, 26 items were utilized in this phase of data analysis. The data analysis was conducted in three steps. Firstly, the internal consistency of the questionnaire was tested by calculating Cronbach's alpha [27]. Secondly, exploratory factor analysis was used to find the structural characteristics among the items. We used the Kaiser-Meyer-Olkin (KMO) test to analyze whether the items had enough common information. Then the "principal component analysis" method for factor extraction was adopted, and the rotation method "Varimax with Kaiser Normalization" was used to further interpret the extracted factors. After the factor extraction, some of the original items were eliminated or grouped, and each factor was named by its included items. Finally, a visualized model was built to describe the results comprehensively.
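For reference, the two statistics used in this first analysis step have standard textbook definitions (the formulas below are the general definitions, not reproduced from [27] or from the study's own computations):

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{Y_i}^{2}}{\sigma_{X}^{2}}\right), \qquad \mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}} \]

where k is the number of items, \sigma_{Y_i}^{2} is the variance of item i, \sigma_{X}^{2} is the variance of the total score, and r_{ij} and p_{ij} are the simple and partial correlations between pairs of items, respectively.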
5 Results 5.1 Interview Results Eighteen factors were explored from the interviews. Seventeen of them matched factors concluded from the literature. They are: 1) perceived usefulness; 2) perceived ease of use; 3) perceived complexity; 4) perceived enjoyment/fun; 5) output quality; 6) relative advantage; 7) compatibility; 8) perceived behavioral control; 9) job relevance; 10) voluntariness; 11) innovativeness; 12) technology facilitating conditions; 13) organizational support; 14) visibility; 15) perceived risk; 16) communication facilitating; and 17) perceived novelty. One new factor, which could not be matched to any from the literature, was named "cost". "Cost" means the charges of a particular service, which could influence users' acceptance. Citation examples are: "Is listening to music for free?" and "I hope the function/service totally for free." Although the other 10 factors obtained from the literature were not explored in the phone interview, we cannot delete them at this step, given the limited
sample size. The aim of the interview, to check whether any factors were missing besides the 27 theory-based acceptance factors, was thus achieved. Therefore, all 28 factors were tested and analyzed quantitatively in the next phase, the survey. 5.2 Survey Results Firstly, the Cronbach's alpha of the 26 items is 0.812. For each item, this index cannot increase significantly if the item is deleted. In the literature, an alpha (α) value of 0.70 or above is considered to indicate strong internal consistency [28]. For exploratory research, an alpha value of 0.60 or above is also considered significant [29]. This indicates that all 26 items in our study have high internal consistency, and therefore they were all included for further analysis. Secondly, two iterations of exploratory factor analysis were conducted. After the first run, according to criteria from the literature [29][30], we eliminated a single-item factor (item 21) and an item (item 19) with factor loadings significantly less than 0.45. In the second run, seven factors were extracted from the remaining 24 items, with 61.79% of the variance explained. For both iterations, the exploratory factor analysis method is appropriate because the KMO was 0.783 for the initial 26 items and 0.793 after eliminating the two items, both more than 0.7. The extracted factors were named by the common meanings of the items included (Table 1). The first factor, social influence, shows that others' suggestions and social norms can influence users' acceptance of mobile entertainment. People also like to accept products that promote their social image or enhance social communication. This factor is related to the theoretical factors of being-younger, peer influence, subjective norm, word-of-mouth, and organizational support. The second factor, technology and service quality, is about the quality and convenience of mobile entertainment technology and service. It includes the former theoretical factors of trialability, technology facilitating conditions, output quality, innovativeness, and visibility. The third factor, entertainment utility, means the emotional and entertainment utility of mobile entertainment. It is related to the theoretical factors of perceived enjoyment/fun, voluntariness, perceived usefulness, and perceived modernness. The fourth factor is the users' perceived simpleness and certainty of the interaction with the product. People tend to accept products when the interaction is simple and the consequences are certain. It is formed by the two theoretical factors of perceived ease of use and perceived risk. The fifth factor, self-efficacy, is the self-perception of being able to use the service or product. It is related to the theoretical factors of self-efficacy, perceived complexity, and perceived behavioral control. The sixth factor, perceived novelty, indicates that people like to accept novel products and services. The last factor is the cost of familiar or unfamiliar mobile entertainment services. The reliability and validity of the questionnaire's construction are confirmed. Internal consistency methods are adopted to establish reliability in this study. After eliminating two items during factor analysis, the Cronbach's alpha of this measuring instrument is 0.814, which indicates strong reliability according to the discussion above. The seven factors account for 61.79% of the total variance and factor loadings range from 0.44 to 0.81, so the construct validity of the instrument is acceptable.
Table 1. Factor naming Factor Name Social influence
Technology and service quality
Entertainment utility
Simpleness and certainty
Self-efficacy
Perceived novelty
Cost
Questions involved 12. Mobile entertainment can make me feel younger. 14. If my friends think I should use mobile entertainment, I will try it. 20. I feel that people around (family member, friends, etc.) think I should use some mobile entertainment. 18. If my friends tell me some mobile entertainment service, I will try it. 17. I think mobile entertainment provide me more chances to communicate with others (family member, friends, etc.). 29. I hope I can try a ring tone before download it. 27. If some problem happens during the mobile entertainment process, I hope to get help and instruction easily. 22. If the mobile phone can take clear and good-quality photos and videos, I would like to use the function. 28. I like novel mobile games rather than those people are familiar with. 15. Many people around me use mobile entertainment. 5. I find mobile entertainment enjoyable. 7. I like to take mobile as an entertainment tool, and play with it voluntarily. 1. In daily work and life, I often need to use mobile entertainment. 8. Mobile entertainment can keep me up with the times. 2. I will give up a mobile entertainment if it’s too hard to use. (negative loading) 25. I concern a lot of mobile monthly tariff. 9. I think there are risks in some mobile entertainment process. (negative loading) 10. I can use mobile entertainment without any help. 16. I don’t feel trouble in using mobile entertainment. 6. I’m capable enough to use mobile entertainment. 11. Mobile entertainment is novel to me. 26. I decide whether to use a mobile entertainment by it’s price. (negative loading) 24. A low price is very important for me to buy a mobile. 23. I think it’s more convenient to take photos/videos using a mobile than using a particular camera. (negative loading)
Finally, according to the results of factor analysis, the mobile entertainment acceptance (MEA) model for Chinese rural people was built (Figure 1). The model has two main aspects: First, seven factors that influence Chinese rural people’s mobile entertainment acceptance, represented by seven ellipses in the visualized model (Figure 1). Second, the importance order of each factor, which is ranked by the factor eigenvalue: social influence (3.170), technology and service quality (2.875), entertainment utility (2.524), simpleness and certainty (1.713), self-efficacy (1.644), perceived novelty (1.510), and cost (1.394). Factor eigenvalue is the measurement of explained variance. For each factor, the higher the eigenvalue is, the more variance it can explain, therefore in the model the more important the factor is. In our visualized model (Figure 1), the area of each ellipse represents the importance grade, for example, the first ranked factor “social influence” has the largest area.
Fig. 1. Mobile entertainment acceptance (MEA) model for Chinese rural people
6 Discussion and Conclusion In this research, the MEA model considers users, technologies and the environment comprehensively. It not only involves most of the theoretical factors that have been proved to influence users' technology acceptance, but also structures them into seven main factors and gives the importance weight of each one. As a result, a whole picture is provided for technology acceptance theory. Additionally, a new factor, "cost", was found to affect users' acceptance, at least for rural people's mobile entertainment acceptance. It suggests a novel point of view and makes the theory more comprehensive. Based on the comprehensive model, we can easily draw a series of practical suggestions for mobile entertainment service and ecosystem design. From the empirical index, practitioners can also get references for which factors are more important and which are less so. For example, the model shows that Chinese rural people are most influenced by social factors when considering whether to accept a new technology, so designers and marketers can give social strategies first priority. However, the model still needs to be refined to become more predictive. First, all seven factors are compared and weighted by their factor eigenvalues. This method simply puts them on the same level and ignores the internal relationships among factors. Therefore, further analysis, such as path analysis or regression, is needed to explore their real relationships. Second, the sample is from a single rural area of China. However, several different types of rural society exist all around the world, varying in economics, culture, education, weather, and so on. All of these may influence people's technology acceptance patterns. A more systematic sample will be explored to verify the model. In conclusion, through both qualitative and quantitative research, a factor-based model is built to investigate the mobile entertainment acceptance of Chinese rural people.
Seven factors are involved as well as their importance index. This research provides a comprehensive approach in technology acceptance theory. It can also help practitioners to better understand the rural user group and improve their products accordingly.
References 1. Fishbein, M., Ajzen, I.: Belief, attitude, intention and behavior: An introduction to theory and research. Addison-Wesley, Reading (1975) 2. Wang, L.: Variables contributing to old adults acceptance of IT in China, Korea and USA. Unpublished PhD proposal (2007) 3. Wiener, S.N.: Terminology of Mobile Entertainment: An Introduction. In: Mobile Entertainment Forum (2003) 4. Davis, F.D.: A technology acceptance model for empirically testing new end-user information systems: Theory and results. Unpublished Doctoral dissertation, MIT Sloan School of Management, Cambridge, MA (1986) 5. Ajzen, I.: The Theory of Planned Behavior. Organizational Behavior and Human Decision Processes 50(2), 179–211 (1991) 6. Rogers, E.M.: Diffusion of innovations, 4th edn. Etats-Unis Free Press, New York (1995) 7. Moore, G.C., Benbasat, I.: Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation. Information Systems Research 2(3) (1991) 8. Davis, F.D.: Perceived usefulness, perceived ease of use and user acceptance of information technology. MIS Quarterly 13(3), 319–340 (1989) 9. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 46(2), 186–204 (2000) 10. Igbaria, M., Parasuraman, S., Baroudi, J.: A motivational model of microcomputer usage. Journal of Management Information Systems 13(1), 127–143 (1996) 11. Hsu, C.-L., Lu, H.-P.: Consumer behavior in online game communities: A motivational factor perspective. Computers in Human Behavior 23, 1642–1659 (2007) 12. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 46(2), 186–204 (2000) 13. Igbaria, M., Schiffman, S.J., Wieckowshi, T.S.: The respective roles of perceived usefulness and perceived fun in the acceptance of microcomputer technology. Behaviour and Information Technology 13(6), 349–361 (1994) 14. Taylor, S., Todd, P.A.: Understanding Information Technology Usage: A Test of Competing Models. Information Systems Research 6(2) (1995) 15. Lee, S.M.: South Korea: From the land of morning calm to ICT hotbed. Academy of Management Executive 17(2) (2003) 16. Webster, C.: Influences upon consumer expectations of services. Journal of Services Marketing 5(1), 5–17 (1991) 17. Black, J.B., Kay, D.S., Soloway, E.M.: Goal and plan knowledge representations: From stories to text editors and programs. In: Carroll, J.M. (ed.) Interfacing Thought, pp. 36–60. The MIT Press, Cambridge (1987) 18. Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: Extrinsic and Intrinsic Motivation to Use Computers in the Workplace. Journal of Applied Social Psychology 22(14), 1111–1132 (1992)
19. Park, C., Jun, J.-K.: A cross-cultural comparison of Internet buying behavior Effects of Internet usage, perceived risks, and innovativeness. International Marketing Review 20(5), 534–553 (2003) 20. Compeau, D.R., Higgins, C.A.: Computer self-efficacy: development of a measure and initial test. MIS Quarterly 19, 189–211 (1995) 21. Boulton-Lewis, G.M., Buys, L., Lovie-Kitchin, J., Barnett, K., David, L.N.: Ageing, Learning, and Computer Technology in Australia. Educational Gerontology 33(3), 253– 270 (2007) 22. Stark-Wroblewski, K., Edelbaum, J.K., Ryan, J.J.: Senior Citizens Who Use E-mail. Educational Gerontology 33(4), 293–307 (2007) 23. White, J., Weatherall, A.: A grounded theory analysis of old adults and information technology. Educational Gerontology 26(4), 371–386 (2000) 24. Dowling, G.R., Staelin, R.: A Model of Perceived Risk and Intended Risk-Handling Activity. The Journal of Consumer Research 21(1), 119–134 (1994) 25. Krueger, R., Casey, M.: Focus Groups: A Practical Guide for Applied Research, 3rd edn. Sage Publications, Inc, Thousand Oaks (2000) 26. Gorsuch, R.L.: Factor analysis, 2nd edn. Lawrence Erlbaum, Hillsdale (1983) 27. Cronbach, L.J.: Coefficient alpha and the internal structure of tests. Psychometrika 16(3), 297–334 (1951) 28. Nunnally, J.C.: Psychometric Theory. McGraw-Hill, New York (1978) 29. Hair, J.F., Anderson Jr., R.E., Tatham, R.L., Black, W.C.: Multivariate Data Analysis. Prentice-Hall International, New Jersey (1995) 30. Stiggelbout, A.M., Molewijk, A.C., Otten, W., Timmermans, D.R.M., van Bockel, J.H., Kievit, J.: Ideals of patient autonomy in clinical decision making: a study on the development of a scale to assess patients’ and physicians’ views. Journal of Medical Ethics 30(3), 268–274 (2004)
Universal Mobile Information Retrieval David Machado, Tiago Barbosa, Sebastião Pais, Bruno Martins, and Gaël Dias Centre of Human Language Technology and Bioinformatics, University of Beira Interior 6201-001, Covilhã, Portugal {david,tiago,sebastiao,brunom,ddg}@hultig.di.ubi.pt
Abstract. The shift in human computer interaction from desktop computing to mobile interaction highly influences the needs for new designed interfaces. In this paper, we address the issue of searching for information on mobile devices, an area also known as Mobile Information Retrieval. In particular, we propose to summarize as much as possible the information retrieved by any search engine to allow universal access to information. Keywords: Mobile Information Retrieval, Clustering of Web Page Results, Automatic Summarization.
1 Introduction and Related Work The shift in human computer interaction from desktop computing to mobile interaction highly influences the needs for new designed interfaces. In this paper, we address the issue of searching for information on mobile devices, an area also known as Mobile Information Retrieval. Within this scope, two issues must be specifically tackled: web search and web browsing. On the one hand, small size screens of handheld devices are a clear limitation to displaying long lists of relevant documents which induce repetitive scrolling. On the other hand, as most web pages are designed to be viewed on desktop displays, web browsing can interfere with users’ comprehension as repetitive zooming and scrolling are necessary. To overcome the limitations presented by current search engines to handle information on mobile devices, we propose a global solution to web search and web browsing based on clustering of web page results and web page summarization. Most of the projects on mobile search deal with organizing the information to fit into small screens without benefiting from new trends in Information Retrieval presented in [1] and [2]. Indeed, projects such as Yahoo Mobile1, Google Mobile2 or Live Search Mobile3 present information in a classic way by listing web page results as it is shown in Figure 1. In order to show as many results as possible on the screens of PDAs or smart phones, layout structures are usually redesigned to keep to their basics. In fact, commercial projects have mainly privileged services over location on 1
http://mobile.yahoo.com/yahoo http://www.google.com/mobile/ 3 http://www.livesearchmobile.com/ 2
C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 345–354, 2009. © Springer-Verlag Berlin Heidelberg 2009
346
D. Machado et al.
(a)
(b)
(c)
(d)
Fig. 1. (a) Google Mobile. (b) Yahoo Mobile. (c) Live Search Mobile. (d) Searchme Mobile.
mobile devices such as news, weather forecast or maps rather than providing new ways of searching for information, maybe to the exception of local search facilities. Other projects have proposed different directions. In particular, Searchme Mobile4 is certainly one of the first mobile search engine to categorize web page results as shown in Figure 1d. By doing so, it is clear that web search is made easier to the user. Indeed, the more the information is condensed into chunks of valuable information the more it is accessible to any user (paired or impaired) in any location (car, home, street, etc.). However, the solution implemented by Searchme is based on a set of predefined categories for each query term. As a consequence, the categorization can only be performed for well-known queries. In the case the category is not known, no search results are displayed. This solution is clearly unsatisfactory as one may want to query any term in any language over the all web. Within the VipAccess project5, we propose to cluster web page results “on the fly” independently of the language thus allowing to searching for any query in any language over the entire web and providing a user-friendly interface for mobile devices (Figure 2b). For that purpose, we propose to cluster web page results based on a new clustering algorithm CBL (Clustering by Labels) especially designed for web page results. Comparatively to Searchme, we propose a more sophisticated way to cluster web page results, which does not depend on pre-defined categories, and as such can be applied “on the fly” as web page results are retrieved from any domain, any language or any search engine. In terms of visualization of web page results, clustering may drastically improve users’ satisfaction rates as only few selection items are presented to the user. However, an extra-step in the search process is introduced which may interfere with the users’ habits to scroll lists of web page results. In this paper, we also propose different visualizations which try to make the most of both techniques i.e. lists of web page results and lists of clusters of web pages results. 4 5
http://m.searchme.com/ This project is funded by the Portuguese Fundação para a Ciência e a Tecnologia with the reference PTDC/PLP/72142/2006.
Universal Mobile Information Retrieval
(a)
(b)
347
(c)
Fig. 2. (a) VipAccess Mobile interface. (b) VipAccess Mobile with clusters. (c) VipAccess Mobile for summarization.
In terms of summarization of web contents, accessing summaries of information instead of full information may be a great asset for users of mobile devices. Indeed, most web pages are designed to be viewed on desktop displays. As a consequence, users may find it hard to evaluate the importance of a document as they have to come through all of it by repetitive zooming and scrolling. Some solutions have been proposed by content providers to overcome these drawbacks. They usually require an alternate trimmed-down version of documents prepared beforehand or the definition of specific formatting styles. However, this situation is undesirable as it involves an increased effort in creating and maintaining alternate versions of a web site. Within the VipAccess project, we propose to automatically identify the text content of any web page and summarize it in an efficient way so that web browsing is limited to its minimum. For that purpose, we propose a new architecture for summarizing Semantic Textual Units [3] based on efficient algorithms for semantic treatment such as the SENTA multiword extractor [4] which allows real-time processing and languageindependent analysis of web pages thus proposing quality content extraction and visualization (Figure 2c).
2 Clustering Web Page Results The categorization of web page results is obtained by the implementation of a new clustering algorithm called CBL (Clustering by Label) which is specifically designed to cluster web page results and is inspired from the label-derived approach. In terms of clustering algorithms, two different approaches have been proposed: label-derived clustering [1][5][6][7] and document-derived clustering [2][8][9]. The first approach defines potential labels and agglomerates documents which share common labels and the second groups similar documents based on text similarities and extracts potential
348
D. Machado et al.
labels at the end of the process. CBL is a label-derived clustering algorithm and as such, the first step of the clustering process aims at identifying potential labels. 2.1 Label Identification Most methodologies identify potential labels based on the extraction of frequent itemsets. A frequent itemset is a set of items that appear together in more than a minimum fraction of the whole document set. For that purpose, different language-independent and language-dependent approaches have been proposed. In the first case, [5] implement a suffix tree-like structure and [6] use association rules. In the second case, [1] propose to extract common gapped sentences from linguistically enriched web snippets and [7] extract frequent word sequences based on suffix-arrays which are weighted using the well-know tf.idf score. As one may want to search over the entire web in any language and any domain, it is important that the clustering algorithm only depends on language-independent features. Within this scope, the identification of relevant labels based on frequent itemsets mainly takes frequency of occurrence as a clue for extraction. However, this methodology suffers from the poor quality of web snippets which mainly contain illformed sentences with many repetitions. To overcome this drawback, we propose to weight strings based on three different word distributions and consequently extract potential labels. Internal Value of a String. If a string6 appears alone in a chunk of text separated on both sides by any given delimiter (such as a HTML tag or a comma), this string is likely to be meaningful. This characteristic is weighted in Equation (1) where w is any string, A(w) is the number of occurrences where w appears alone in a chunk and F(w) is the total number of occurrences of w. .
(1)
External Value of a String 1. The bigger the number of strings that co-occur with any string w both on its left and right contexts, the less meaningful this string is likely to be. This characteristic is weighted in Equation (2) where w is any string and WIL(w) (resp. WIR(w)) is the number of strings which appear on the immediate left (resp. right) context of string w. .
(2)
External Value of a String 2. The bigger the number of different strings that cooccur with any string w both on its left and right contexts compared to the number of co-occurring strings on both contexts, the less meaningful this string is likely to be. This characteristic is weighted in Equation (3) where w is any string, WDL(w) (resp. WDR(w)) is the number of different strings which appear on the immediate left (resp. right) context of string w and FH(w) is equal to max[F(w)], for all w. 6
In our context, a string is any sequence of characters separated by spaces or other common linguistic delimiters such as dots, commas, etc.
Universal Mobile Information Retrieval
349
.
(3)
Based on these three characteristics, we propose to weight all strings from the web snippets as in Equation (4) such that the smaller the W(w) value, the more meaningful the string w.
1
,
0.5
,
0.5
(4)
In Table 1, we present the 30 most relevant results of our weighting score W(.) for the query term “programming” searched over Google search engine7, Yahoo search engine8 and MSN search engine9 accessed via respective web services. Table 1. The first 30 strings ordered by W(.) for the query “programming” String (1-5) Articles Wikibooks Computers Compilers Subject
String (6-10) Perl Java Php Training Forums
String (11-15) tutorials c wiki security database
String (16-20) Cgi Category Knuth Home Advanced
String (21-25) documentation News Net Unix Internet
String (26-30) tips science object-oriented site downloads
2.2 Clustering by Labels Once all important words have been identified, these are going to play a crucial role in the process of clustering following the label-derived approach. Within this scope, many algorithms have been proposed based on frequent item sets [1][5][6][7]. In this paper, we propose a new algorithm called Clustering by Label (CBL) which objective is to group similar documents around meaningful word anchors i.e. labels. The algorithm is based on three steps: pole creation, unification and absorption, and labeling. Pole Creation. We first need to initialize the algorithm so that we can start from potential meaningful labels. For that purpose, all words with less than a given threshold α10, which cover more than two urls, are proposed as initial cluster centers i.e. poles. For each start pole, a list of urls is built. An url is added to the list if it contains the pole word before a β position of the sorted relevant word list of each url. In particular, this allows to controlling the number of urls which are added to each pole since low β will produce smaller clusters and on the opposite, high values will join more results.
7
http://www.google.com http://www.yahoo.com 9 http://www.msn.com 10 Most meaningful strings. 8
350
D. Machado et al.
Union and Absorption. The next stage aims at iteratively unifying clusters which contain similar urls. For that purpose, we define two types of agglomerations: Union, when two clusters contain a significant number of common urls and share similar size in terms of cluster number; Absorption, when they share many common urls but are dissimilar in size. As a consequence, we define two proportions: P1, the number of common urls between two clusters divided by the number of urls of the smaller cluster and P2, the number of urls in the smaller cluster divided by the number of urls in the bigger cluster. The following algorithm is then iterated. For each cluster, P1 is calculated over all other clusters. Then for each pair of clusters, if P1 is higher than a constant γ, then we evaluate P2 between both clusters. If P2 is higher than a constant δ, then the pair of clusters is added to the Union list otherwise it is integrated in the absorption list. Once all clusters have been covered, both union lists and absorption are treated. The union list is first processed as follows. For each cluster pair in the union list11, each two clusters are joined into the original cluster with the highest W(.) score for its label. At each step of this process, clusters indexes are substituted and unified clusters are removed in the union list to keep a list of updated clusters. Then the absorption list is processed. Iteratively select the pair of clusters which contains any cluster which cannot be absorbed by any other one in the absorption list. Once encountered, this cluster absorbs the cluster which forms the pair with it, cluster indexes are updated and useless clusters removed. Both lists have been updated and the initial process iterates, thus enabling flat clustering (first step of the algorithm) or hierarchical clustering (all steps of the algorithm). Moreover, the CBL algorithm allows soft clustering as urls may be contained in different clusters. Finally, clusters are labeled. Labeling. By union and absorption, each cluster may contain different candidate labels. However, it may be the case that urls in the cluster contain more meaningful words (i.e. multiword units) than the highly scored single words. As a consequence, multiword units are extracted from the web snippets agglomerated in the clusters by applying a methodology proposed by [18] implemented with suffix-arrays for real-time processing12. Then, each multiword unit is compared to the potential labels and if it contains one of the single words it is evaluated by frequency if it must replace the single word label. Finally, the best scoring labels, with a given threshold, are chosen as final labels. 2.3 Visualization In terms of visualization of web page results, clustering may drastically improve users’ satisfaction rates as only few selection items are presented to the user on the small screens of mobile devices (Figure 2b). However, an extra-step in the search process is introduced which may interfere with the users’ cognitive process to search for information. Indeed, the user is used to find web page results after the first selection. In order to avoid the gap between the classic view (lists of web pages) and the cluster view (list of clusters), we propose to display the most relevant web page result of each encountered cluster in the form of a list as shown in Figure 3a. As such, the user is proposed the best possible coverage of its query with the minimum number of 11 12
Both the union and the absorption lists are ordered by W(.) score of the label. This method has proved to be particularly suited for web snippets processing.
Universal Mobile Information Retrieval
351
web page results thus reducing scrolling and maintaining the cognitive process for information search. If the user wants to keep the classic view, this option is available but always with the indication of the name of the belonging cluster so that the user can navigate to any given cluster and visualize only its members (Figure 3b). In order to take into account that the users of mobile devices may use their device in different contexts, such as car, classroom or street, we also propose a full-screen visualization (Figure 3c). In this case, the best first web page result of the most relevant cluster is presented to the user. The next result is obviously the best first web page result of the second most relevant cluster, and so on and so forth.
(a)
(b)
(c)
Fig. 3. (a) Clustering visualization. (b) List visualization. (c) Full-screen visualization.
The visualization issue of web page results has never been addressed as far as we know, although it is at the core of the success or failure of new techniques in Information Retrieval. Indeed, most search engines which propose interfaces with clustering of web page results13 are not as popular as classic search engines although they provide a better understanding of the retrieved information. A reason for that may be the lack of newly designed interfaces for the sake of information search.
3 Web Page Summarization After clustering web page results, scrolling and zooming must also be kept to its minimum for web browsing. For this purpose, we propose a new architecture to summarize Semantic Textual Units [3] which embeds an efficient algorithm for multiword extraction [4]. 3.1 Semantic Textual Units and Multiword Units One main problem to tackle is to define what to consider as a relevant text in a web page. Indeed, web pages often do not contain a coherent narrative structure. So, the 13
For example, http://www.clusty.com or http://www.searchme.com
352
D. Machado et al.
first step of any system is to identify rules for determining which text should be considered for summarization and which should be discarded. For this purpose, [3] propose to identify Semantic Textual Unit (STU). STUs are page fragments marked with HTML markups which specifically identify pieces of text following the W3 consortium specifications. It is clear that the STU methodology is not as reliable as any language model for content detection [10] but on the opposite it allows fast processing of web pages. Once each STU has been identified in the web page it is processed with the SENTA software [4] to identify and mark relevant phrases in it. SENTA is statistical parameter-free software which can be applied to any language without tuning and as a consequence is totally portable. Moreover, its efficient implementation shows time complexity Θ(N log N) where N is the number of words to process which allows the extraction of relevant phrases in real-time. 3.2 Extractive Text Summarization Extractive text summarization aims at finding the most significant sentences in a given text. So, a significance score must be assigned to each sentence in a STU. The sentences with higher significance naturally become the summary candidates and a compression rate defines the number of sentences to extract. For this purpose, we implement the TextRank algorithm [11] combined with an adaptation of the wellknown inverse document frequency, the inverse STU frequency (isf) to weight word relevance. The basic idea is that highly ranked words with high isf are more likely to represent relevant words in the text and as a consequence provide good clues to extract relevant sentences for the summary. Within our purpose, each STU is first represented as an unweighted oriented graph being each word connected to its successor following sequential order in the text. Following the TextRank algorithm, the score S(.,.) of any word wi in any stu is defined as in Equation (5) where In(wi) is the set of words that point to wi, Out(wj) is the set of words that the word wj points to and d is the damping factor set to 0.85. ,
,
∑
1
|
|
.
(5)
Then, each word is weighted as in Equation (6) based on its graph-based ranking and its relevance in the text based on its inverse STU frequency where N is the number of STUs in the text and stuf(w) is the number of STUs the word w appears in. .
,
,
.
(6)
Finally, the sentence significance weight is defined as in [12], thus giving more weight to longer sentences, as shown in Equation (7) where |S| stands for the number of words in sentence S, wi is a word in S and max(|S|) is the length of the longest sentence in the STU. ,
| |
∑
.
, | |
| |
.
(7)
Universal Mobile Information Retrieval
353
In order to present as much information of the web page as possible so that its understanding is eased, the best scoring sentences of each STUs are retrieved and presented to the user as in Figure 2c14. As such, the user gets the most of the web page in a small text excerpt easy to read and scroll.
4 Conclusions and Future Work In this paper, we proposed a global solution to web search and web browsing for handheld devices based on web page results clustering, web page summarization and new ideas for visualization. In order to enable full information access to any users (paired or impaired), we also propose a speech-to-speech interface which is used as the exchange mode which may allow to achieving greater user satisfaction [13] in situations where the hands are not free [14], whenever reading is difficult [15], or in situations of mobility [16]. Moreover, we propose a location search based on Global Positioning System (GPS) which automatically expands the original query with the closest city name to the user’s location. In particular, a test of the interface has been conducted in the context of visually impaired people which received positive feedback although coherent and exhaustive evaluation is still needed in the way [17] explain.
References 1. Ferragina, P., Gulli, A.: A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering. Journal of Software: Practice and Experience 38(2), 189–225 (2008) 2. Campos, R., Dias, G., Nunes, C., Nonchev, B.: Clustering of Web Page Search Results: A Full Text Based Approach. International Journal of Computer and Information Science 9(4) (2008) 3. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices. In: 10th International World Wide Web Conference (2000) 4. Gil, A., Dias, G.: Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora. In: Workshop on Multiword Expressions of the International Conference of the Association for Computational Linguistics (2003) 5. Zamir, O., Etzioni, O.: Web Document Clustering: A Feasibility Demonstration. In: 19th Annual International SIGIR Conference (1998) 6. Fung, P., Wang, K., Ester, M.: Large Hierarchical Document Clustering using Frequent Itemsets. In: SIAM International Conference on Data Mining (2003) 7. Osinski, S., Stefanowski, J., Weiss, D.: Lingo: Search results clustering algorithm based on Singular Value Decomposition. In: Intelligent Information Systems Conference (2004) 8. Jiang, Z., Joshi, A., Krishnapuram, R., Yi, Y.: Retriever Improving Web Search Engine Results using Clustering. Journal of Managing Business with Electronic Commerce (2002) 14
Compression rate is defined by the user in the menu options.
354
D. Machado et al.
9. Dias, G., Pais, S., Cunha, F., Costa, H., Machado, D., Barbosa, T., Martins, B.: Hierarchical Soft Clustering and Automatic Text Summarization for Accessing the Web on Mobile Devices for Visually Impaired People. In: 22nd International FLAIRS Conference (2009) 10. Dolan, W.B., Quirk, C., Brockett, C.: Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In: Interantional Conference on Computational Linguistics (2004) 11. Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Conference on Empirical Methods in Natural Language Processing (2004) 12. Vechtomova, O., Karamuftuoglu, M.: Comparison of Two Interactive Search Refinement Techniques. In: Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting (2004) 13. Lee, K.W., Lai, J.: Speech versus Touch: A Comparative Study of the Use of Speech and DTMF Keypad for Navigation. International Journal Human-Computer Interaction 19, 343–360 (2005) 14. Parush, A.: Speech-based Interaction in a Multitask Condition: Impact of Prompt Modality. Human Factors 47, 591–597 (2005) 15. Fang, X., Xu, S., Brzezinski, J., Chan, S.S.: A Study of the Feasibility and Effectiveness of Dual-modal Information Presentations. International Journal Human-Computer Interaction 20, 3–17 (2006) 16. Oviatt, S.L., Lunsford, R.: Multimodal Interfaces for Cell Phones and Mobile Technology. International Journal of Speech Technology 8, 127–132 (2005) 17. Fallman, D., Waterworth, J.A.: Dealing with User Experience and Affective Evaluation in HCI Design: A Repertory Grid Approach. In: Conference on Human Factors in Computing Systems (2005) 18. Frantzi, K.T., Ananiadou, S.: Retrieving Collocations by Co-occurrences and Word Order Constraint. In: 16th International Conference on Computational Linguistics (1996)
ActionSpaces: Device Independent Places of Thought, Memory and Evolution Rudolf Melcher, Martin Hitz, and Gerhard Leitner University of Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria {rudolf.melcher,martin.hitz,gerd.leitner}@uni-klu.ac.at
Abstract. We propose an inherently three-dimensional interaction paradigm which allows individuals to manage their personal digital artifact collections (PAC) regardless of the specific devices and means they are using. The core of our solution is to provide unified access to all user artifacts normally spread across several repositories and devices. Not till then individuals may foster and evolve persistent multi-hierarchical artifact structures (PAS) fitting their cognitive needs. PAS subsets can be arranged and meaningfully related to virtual habitats or even mapped to physical contexts and environments they are frequenting to solve their tasks. Keywords: 3DUI, interaction paradigm, semantic desktop metaphor, ubiquitous computing, distributed computing, distributed cognition, mixed realities, concept maps, virtual file system, post-WIMP, post-desktop, digital artifacts, information space.
1 Introduction In our work we emphasize the need for a paradigm change as a direct consequence of complex personal device infrastructures. We therefore try to determine the minimum characteristics facilitating device-independent interaction in the face of virtual artifacts (i.e. digital entities like files, folders, repositories, emails, contacts, appointments, web pages, database-views, services views, rendered objects, etc.) Virtual artifact management is a difficult cognitive task on its own at least with regard to artifact classification. But now we have to discuss additionally where multi-device usage will finally lead to and how its potentials may be leveraged. In our opinion techniques subsumed as cloud computing are not sufficient. We recognize a demand for user-centric, deviceindependent, and non-hierarchical structures carrying the artifacts of individuals persistently while being ubiquitously accessible. Our approach is to determine an adequate set of concepts and methods for so-called workspace-level integration [1]. The driving vision and assumption is that in the course of time ubiquitous augmented realities (UARs) related to user’s cognitive panoramas will succeed. In terms of system architecture we propose two new layers reorganizing the artifact management between hardware and operating systems on the one hand and applications on the other hand. For all that, the feature sets and capabilities specific to distinct appliances shall remain unlimited. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 355–364, 2009. © Springer-Verlag Berlin Heidelberg 2009
356
R. Melcher, M. Hitz, and G. Leitner
Fig. 1. Four layer approach to facilitate device-independent interaction
First, we outline the middleware layer called SubFrame which masks the specific file handling mechanisms and unifies the artifact references across an individual’s heterogeneous device infrastructure. Persistent user-specific structures for virtual artifacts (PAS) are accessed, manipulated, and stored on this basis. Second, subsets of PAS are embedded and presented in 3D work spaces called ActionSpaces. We define ActionSpaces as session-persistent geometrical areas populated by virtual artifacts according to an individual’s concept map or mental map. An ActionSpace may be rendered at arbitrary points on Milgram’s reality-virtuality continuum [2]. Accordingly, it is an extension and virtualization of workplaces and habitats normally experienced in the physical world with a view to tool and artifact arrangements. ActionSpaces are more flexible than virtual desktops. Current desktop implementations are bound to screen characteristics and represent only a small fraction of possible manifestations along Milgram’s continuum. In this sense, ActionSpaces manifold virtual desktop scenarios on common PC-systems as well. There seems to be a firm conviction in the HCI community that 3DUI technology is only of use in complex and expensive visualization scenarios and probably will never penetrate common workplaces, with the exception of gaming. In opposition to that we are convinced, that inherently three-dimensional work-spaces represent an important step towards ubiquitous augmented realities (UARs) [3]. We found two intriguing arguments for this. On the one hand 3DUIs are the most comprehending virtual hull to arrange and render all kinds of virtual artifacts such as text, image, video, sound, animated 3D graphics, etc. and to express relations between them. On the other hand virtual three-dimensional spaces can be mapped to physical locations without the slightest effort, provided appropriate paradigms are applied. This, in turn, is an important foundation for pervasive computing scenarios and location based services. With the two layers proposed we follow a top-down approach in order to develop a device-independent thought-pattern for interaction.
2 Related Work The research assignment to specify a device-independent and user-centric interaction paradigm is related to many distinct research areas partially summarized here. This
ActionSpaces: Device Independent Places of Thought, Memory and Evolution
357
multi-disciplinary interdependence, the need to develop a basic understanding of the heterogeneous issues related and the difficulty to specify formal models makes it difficult to prove the assumptions, potentials and consequences of our approach in this early stage. The potentials and consequences have to be discussed separately and investigated issue by issue in view of every distinct area after a complete prototypical framework is available. For now we want to provoke an interdisciplinary discussion about its need. Cognitive Models and Semantic Desktops As Winograd and Flores stated “we recognize that in designing tools we are designing ways of beeing!"[4] There is a lot of empirical evidence for the defects of the desktop metaphor discussed closely in [1]. Regarding possible workspace-level implementations for non-hierarchical artifact structures we can build on the semantic desktop paradigm defined by Sauermann et al.: “A Semantic Desktop is a device in which an individual stores all her digital information like documents, multimedia and messages. These are interpreted as Semantic Web resources, each is identified by a Uniform Resource Identifier (URI) and all data is accessible and queryable as RDF graph. Resources from the web can be stored and authored content can be shared with others. Ontologies allow the user to express personal mental models and form the semantic glue interconnecting information and systems. Applications respect this and store, read and communicate via ontologies and Semantic Web protocols. The Semantic Desktop is an enlarged supplement to the user’s memory.” [5] However, by some means or other these semantic desktop solutions remain application-, platform-, or device-dependent. These dependences are predetermined breaking points for the whole paradigm. The user-centrism aimed is not consistently portable and the visualization of ontologies remains problematic. 3DUI, MR, UARs Kirsh argues that „how we manage the spatial arrangement of items around us is not an afterthought: it is an integral part of the way we think, plan, and behave.”[6] Gregory Newby argues that in exosomatic memory systems, the information spaces (geometry) of systems will be consistent with the cognitive spaces of their human users. [7] Following this discussion we conclude that 3D user interfaces (3DUIs) summarized by Bowman et al. [8] and AR interfaces discussed by Haller et al. [9] can fit these requirements. Issues related to the tracking and rendering performance on common user platforms were striking problems in the last decades. Now, the major hindrances are caused by application, platform and device dependences. As Sandor and Klinker conclude “Increasingly, an overarching approach towards building what we call ubiquitous augmented reality (UAR) user interfaces that include all of the just mentioned concepts will be required” [3]. At the end of the day “user interfaces for ubiquitous augmented reality incorporate a wide variety of concepts such as multi-modal, multi-user, multi-device aspects and new input/output devices.” [10]
3 SubFrame Our ultimate goal is to establish true device-independence while supporting consistent artifact management across the whole range of user devices and appliances. We
358
R. Melcher, M. Hitz, and G. Leitner
therefore propose a middleware layer called SubFrame which we introduced in [11]. We assume the existence of such a layer for the discussion of ActionSpaces. For sake of completeness, we will briefly address it here. The SubFrame holds and provides a complete set of references to the user’s artifacts. It is, however, neither responsible for nor involved in the rendering of artifacts. The terms “personal information cloud” and “cloud computing” immediately coming to mind are very popular but fuzzy. While clearly related to our work we need to distinguish from them as they are disputed with regard to service orientation and we discuss user-centered artifact management here. Throughout this paper we use the term personal artifact collection (PAC) being more precise regarding artifact sets and personal artifact structures (PAS) being more precise regarding their interrelationships. The whole concept is based on the following set of key assumptions: Table 1. Key assumptions of device-independence 0 1 2 3 4 5 6
Devices are networked (temporarily) PACs don’t belong to devices. PACs don’t belong to applications or services. PACs are private and non-substitutable. PACs are partially shared. PAS are unique in their characteristics. PAS are ever-changing and evolving.
In [11] we suggest employing a personal proxy server (PPS) for the purpose of sub framing. The PPS holds a central instance of the user’s PAS. Every single user appliance has to request the artifacts through the PPS. Even – and that is very important – local artifacts have to be requested through the PPS (loopback) to enable classification and temporal tracing. Despite evident redundancies this is maintainable, as artifacts are handled by reference and not by containment. In a second step this approach may also be used to implement transparent backup mechanisms. The consequence of workspace-level integration yields the ground rule not to foster proprietary solutions. For prototyping we use Squid proxy servers and we investigate how the RDF format may be used in conjunction to express and annotate non-hierarchical personal artifact structures. At the end of the day this layer implements the five functional requirements for personal information management defined by Ravasio et al. [12], which are: unified handling, multiple classification, bi-directional links, tracing of documents’ temporal evolution, and transparent physical location of a particular piece of information. Finally, PAC and PAS together represent an instance of the personal information cloud, a kind of exo-somatic memory [7] at our disposal.
4 ActionSpaces Now let us assume individuals have unobstructed access to every single digital artifact in every collection they own or share with others. We have to identify the
ActionSpaces: Device Independent Places of Thought, Memory and Evolution
359
fundamental building blocks needed to make them (re-)presentable and manageable on every device, every interface, in every virtual environment we can think of. On typical WIMP (windows, icons, menus, and pointer) platforms application windows are used to render one or even more documents simultaneously. File manager applications of all kinds and proprietary file dialogs are playing an outstanding role on these platforms. They provide hierarchical artifact access and classification means. But this powerful technique cannot be implemented consistently and efficiently across the heterogeneous device infrastructure and Milgram’s reality-virtuality continuum [2]. Even on the historical target platforms for desktop computing we face a lot of problems in the view of cognitive load, efficiency and consistency [1]. Thus we have to find an alternative better suitable for our needs and consistently implementable. We propose to rely on the principle of abstraction since it is common sense that a monolithic “universal interface” is neither feasible nor eligible in terms of specification, implementation, and usage. We quest for a user-centric thought pattern, where all possible interactive features and interface implementations fit in. This thought pattern will allow for smooth transitions with minimal cognitive loads across the user’s device infrastructure. In the year 2000 Dachselt suggested a metaphorical approach called action spaces to structure three-dimensional user interfaces. He argues, that “new applications will be built in the near future, where the focus does not lie on navigation through more or less realistic worlds, but rather on 3D objects as documents in an interactive threedimensional user interface.”[13] He defines action spaces as task-oriented scenes of actions or more precisely as virtual 3D spaces with interface controls serving an associated task. Noteworthy he points out that action spaces do not have to be rooms in a geometric sense and “there is a need for the integration of these spaces in a more general visual application framework, in a geometric and metaphorical structure”. [13] Based on Dachselt’s work we reformulate and generalize ActionSpaces as sessionpersistent mixed-reality areas (geometry and location) populated by virtual artifacts (PAC) which are rendered according to the individual’s mental models (PAS). According to this definition ActionSpaces are a kind of generalized display and interaction areas. Traditional screen spaces, now part of a more generic scheme, represent only a small fraction of possible instances. 4.1 Device Independent Places Rendered or not, an ActionSpace has a geometry or cubic expansion respectively. The geometric bounding box may be based on the underlying hardware (e.g. screen size), it may be specified explicitly (e.g. fish-tank VR) or its dimension may be implicitly based on a real-world subspace (e.g. desk, wall, room). For now we define the bounding box dimensions of ActionSpaces as: ·
·
0,
0,
0.
(1)
The point of origin and the orientation may be bound to an absolute or relative location, i.e. a point in physical/virtual space or to a physical/virtual object (e.g. by using GPS coordinates or other tracking techniques). There are several possible bindings listed in Table 2. In accordance with Feiner et al. [14] we have to carefully
360
R. Melcher, M. Hitz, and G. Leitner
Y Environment Space
Object Space
Screen Space
Object Space
User Space Object Space
Object Space
X Z Fig. 2. Types of reference-spaces to distinguish carefully. Cubes represent the local coordinate systems involved.
distinguish environment spaces, object spaces, screen spaces, and user spaces as depicted in Figure 1. Following Bowman et al. [15], we should establish comprehensible mappings between the different spaces in terms of user interaction and interplay between real and virtual. Therefore we discuss the characteristics of ActionSpaces and possible mappings across the heterogeneous device infrastructure. Table 2. Binding examples between ActionSpaces and (physical) reference space Space Binding environment object screen user dynamic unbound
Physical Link Parameters used in ActionSpaces GPS, Fiduciary Marker, … position and orientation screen, prop, … object dimensions, orientation cell phone, monitor, … screen dimensions individual perspective, distance, 3DMouse, … position, orientation and scale --determined programmatically
ActionSpaces themselves may be rendered explicitly as a composition of virtual objects (i.e. all kinds of semi-opaque virtual environments) or not (i.e. as transparent augmented realities). The examples in Table 3 give a rough impression of interface paradigms possibly consulted to render ActionSpaces. The attribution of entries is ambiguous because the paradigms available today were not designed with the suggested classification in mind. In any mapping-case ActionSpaces serve as reference coordinate systems for the positioning and location of artifacts which always have to be rendered visually as they
ActionSpaces: Device Independent Places of Thought, Memory and Evolution
361
Table 3. Rendering examples and paradigms for ActionSpaces Space Binding environment object
screen user dynamic unbound
(Semi-)opaque Virtual Room (Cube VR)
Transparent augmented workbench (AR), Handheld AR World in Miniature (WiM), Augmented engine manual (AR), Control Panel (Virtual Prop) Marker-based AR, Tangible Computing Desktop, Today Screen TV-Inserts, RT Video-Overlays Head-Up-Display (HUD) AR HUD (head-tracked VR) (head-tracked AR) Fish-tank VR, Dome VR Pervasive Annotation Layers, Animation, Film RT Video
would be not accessible otherwise. Together, an ActionSpace and the artifacts populating it are specified in an XML based file format. Hence, ActionSpaces are artifacts themselves and an ActionSpace may contain other ActionSpaces. This is an important aspect of our concept allowing complex (i.e. non-hierarchical) relationships between artifacts and spaces and advanced information and artifact exchange between individuals. The two later show great promise for future collaboration scenarios, but they cannot be discussed here for now. By design an ActionSpace may be accessed on demand everywhere and at any time (universal accessibility). We fulfill this requirement by requesting it with its URI from a personal proxy server (PPS). For instance, on a desktop machine the process may appear similar to the request of a free-form HTML file in a full-screen browser. The appropriate type of space binding according to Table 3 is subject to the user’s location, device and platform (interface paradigm). Even time may be used as a parameter to support pervasive scenarios. The possibilities are manifold. Until the potentials and consequences can be studied in detail, we suggest an obvious design rationale, provided that the actual platform supports it: An ActionSpace shall be mapped to the environment-space when its position has to be fixed persistently. It shall be mapped to an object-space when it has to be moved frequently or transported. Screen-space mappings are used for currently prevalent location-independent applications and for remote scenarios, e.g. to access virtual artifacts located on my home-office shelf. User-space mappings capturing the user’s attentiveness are used to present artifacts available for interaction, activity guidance, and communication, e.g. note taking. We now will briefly look into ActionSpaces and discuss the abstract characteristics and features supporting activities like thinking, memorizing and evolving. 4.2 Thought, Memory, and Evolution Thinking involves the activity of classification. [16] Because spatial classification works well for individuals we need to provide proper means for this task. Depending on the device or platform on-hand artifacts should be rendered directly in full-detail, or represented intermediately as 3D objects, 2D icons, and text entries. Each artifact has a well-defined position, orientation and scale relating to the
362
R. Melcher, M. Hitz, and G. Leitner
ActionSpace it is member of. o To support the act of classification all three parameeters are separately adjustable. Wherever W possible, direct interaction shall be used to faccilitate this parameterization. Automatic grouping, ordering and alignment functiions shall be provided for morre complex activities to be conveniently performed. T The actual values shall be sessiion-persistent, i.e. they are saved together with the artiffact references in the XML file specifying the ActionSpace. As already mentioned, ActionSpaces A may contain other ActionSpaces in additionn to artifacts since ActionSpacees are artifacts themselves. The possibilities cannot be exposed in detail here, but reelated to cognitive activities this feature yields expressiions like to “change a topic”, “sttep through a process”, “go into detail”, “get an overvieew”, and many more. That way y we have plenty of means to browse multi-hierarchhical structures in relation to phy ysical and virtual contexts now. By design an artifact is always created inside an ActionSpace or imported frrom another one. Interaction teechniques akin to drag-and-drop (pointer) and linguiistic interaction (speech recogniition, command line interface) are used for these taskss so that position, orientation, an nd scale are implicitly defined at first. Complex manipuulation of “opened” artifacts is – aside from some advanced platforms and paradigm ms – still the prevalent domain of applications. The application layer is not within the scope of this work.
Fig. 3. At least three ActionSp paces are active with two of them bound to screen space (LAP and AAP) and one of them bound to t the physical environment space as actual workspace
As depicted in Figure 3 there t are at least three ActionSpaces active, even if not vvisible, at any time. Two of them are special instances and are always bound to the screen space or the user sp pace respectively. Their position and size depends on uuser preferences. The first one represents r the device-bound local artifact pool (LAP) and makes its content accessiblle. It may for example contain several image files recenntly shot by the user and not yeet integrated (i.e. not classified) in the PAS. LAPs are the counterpart to the prevalen nt file managers. The second one represents the ambiient artifact pool (AAP), i.e. thee environmental resources accessible in a given context. It makes their artifacts availab ble (e.g. thumb drives content) and allows sending artifaacts to them (e.g. printer queue). In advanced mixed reality scenarios the “real device” and its artifacts may be directly y addressed at their physical locations, by dragging virttual text documents and droppin ng them onto the printer with gestures. In most paradiggms an iconographic representattion is still essential to make AAPs handling convenientt.
ActionSpaces: Device Independent Places of Thought, Memory and Evolution
363
Structural persistence is guaranteed across the device infrastructure. Geometric arrangements related to gestalt principles are kept persistent or they are emulated to support individual recognition. For instance, if an ActionSpace bound to an environment space is accessed remotely, the geometric proportions and distances remain valid. But they will probably be scaled to establish comprehensible viewpoints and perspectives. Even if the artifacts contained in such an ActionSpace are listed in pure textual form, the original positions will be preserved. Hence, the content arrangements in an ActionSpace may be temporarily reorganized using non-persistent automatic modes or they may be manually and persistently rearranged by the user. The personal proxy server (PPS) allows the implementation of transparent backup strategies and encrypted artifact pools. It will be of great importance for individuals that with our approach every single artifact is accessible, and in terms of devicedependence nothing gets lost accidentally. In fact we work towards lifelong solutions for artifact management and thus lifelong exosomatic memories. For instance, the damage of one’s laptop yields no loss of artifacts since all ActionSpaces and PAS arrangements are still available from the PPS. Transparent backups and versioning make up for a potential loss of local copies. If an artifact is not findable via navigating the PAS it may be still found with common search techniques applied on the PPS. That way and not surprising search is still complementing the structural access.
5 Conclusions Every tool/device/appliance has its own strength and weaknesses. For that reason users want to interact with more than one. The availability of devices in different contexts (e.g. tasks, times and locations) is another reason. In the quest for device independence we are not well advised to seek for the intersection (i.e. greatest common divisor) of feature sets. With the widely discussed problems of current interaction paradigms and the steadily growing diversity of electronic appliances in mind we try to determine a minimal set of interaction means unifying artifact handling in a device-independent manner. That way the cognitive load for users can be minimized and the potentials of heterogeneous infrastructures may be leveraged. Both layers proposed – the SubFrame and the ActionSpaces – challenge the hierarchical file systems across the user’s device infrastructure and the dated desktop metaphor on PC devices. With both layers consequently realized, we will see new possibilities in artifact handling and sharing which will meet the cognitive conditions and needs of individuals. The concrete representation of artifacts, interrelations, and ActionSpaces depends on the individual’s needs and flavors. Therefore it could not be specified and, in view of design style guides, it never will. Following the argumentation of Ravasio and Tscherter, every single ActionSpace “should be a place where users, and only users, are able to engrave personal preferences and tastes.” [1] Acknowledgments. The authors would like to thank Prof. R. Mittermeir at the University of Klagenfurt for his constant input and challenging discussions and our student Bonifaz Kaufmann for the brave development of prototypical solutions.
364
R. Melcher, M. Hitz, and G. Leitner
References 1. Kaptelinin, V., Czerwinski, M.: Beyond the Desktop Metaphor. MIT Press, Cambridge (2007) 2. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: A class of displays on the reality-virtuality continuum. In: SPIE, Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292 (1994) 3. Sandor, C., Klinker, G.: A rapid prototyping software infrastructure for user interfaces in ubiquitous augmented reality. Personal Ubiquitous Comput. 9(3), 169–185 (2005) 4. Winograd, T., Flores, F. (eds.): Understanding computers and cognition. Ablex Publishing Corp., Norwood (1985) 5. Sauermann, L., Bernardi, A., Dengel, A.: Overview and outlook on the semantic desktop. In: Proceedings of the 1st Workshop on The Semantic Desktop at the ISWC 2005 Conference (2005) 6. Kirsh, D.: The intelligent use of space. Artificial Intelligence 73(1-2), 31–68 (1995) 7. Newby, G.B.: Cognitive space and information space. J. Am. Soc. Inf. Sci. Technol. 52(12), 1026–1048 (2001) 8. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Pearson Education, Inc., Boston (2005) 9. Haller, M., Mark Billinghurst, B.T. (eds.): Emerging Technologies of Augmented Reality: Interfaces and Design. Idea Group Publishing, USA (2007) 10. Hilliges, O., Sandor, C., Klinker, G.: Interactive prototyping for ubiquitous augmented reality user interfaces. In: IUI 2006: Proceedings of the 11th international conference on Intelligent user interfaces, pp. 285–287. ACM, New York (2006) 11. Melcher, R.: Device-independent handling of personal artifact collections. Submitted to Interact 2009 (2009) 12. Ravasio, P., Vukelja, L., Rivera, G., Norrie, M.C.: Project infospace: From information managing to information representation. In: HumanComputerInteraction Interact 2003, 8092, Swiss Federal Institute of Technology, Zurich, Switzerland, pp. 864–867. IOS Press, Amsterdam (2003) 13. Dachselt, R.: Action spaces - a metaphorical concept to support navigation and interaction in 3d interfaces. In: Proceedings of ’Usability Centred Design and Evaluation of Virtual 3D Environments, April 13-14, 2000, Shaker Verlag, Aachen (2000) 14. Feiner, S., MacIntyre, B., Haupt, M., Solomon, E.: Windows on the world: 2d windows for 3d augmented reality. In: UIST 1993: Proceedings of the 6th annual ACM symposium on User interface software and technology, pp. 145–155. ACM, New York (1993) 15. Bowman, D.A., North, C., Chen, J., Polys, N.F., Pyla, P.S., Yilmaz, U.: Information-rich virtual environments: theory, tools, and research agenda. In: VRST 2003: Proceedings of the ACM symposium on Virtual reality software and technology, pp. 81–90. ACM Press, New York (2003) 16. Bowker, G.C., Star, S.L.: Sorting Things Out: Classification and Its Consequences (Inside Technology). The MIT Press, Cambridge (1999)
Face Recognition Technology for Ubiquitous Computing Environment Kanghun Jeong, Seongrok Hong, Ilyang Joo, Jaehoon Lee, and Hyeonjoon Moon School of Computer Engineering, Sejong University, Seoul, Korea
[email protected]
Abstract. In this paper, we explore face detection and face recognition algorithms for ubiquitous computing environment. We develop algorithms for application programming interface (API) suitable for embedded system. The basic requirements include appropriate data format and collection of feature data to achieve efficiency of algorithm. Our experiment presents a face detection and face recognition algorithm for handheld devices. The essential part for proposed system includes; integer representation from floating point calculation, optimization of memory management scheme and efficient face detection performance on complex background scene. Keywords: ubiquitous computing environment, face recognition, face detection, application programming interface, algorithm optimization.
1 Introduction In recent years, face detection and recognition technology has been developed rapidly. The need for biometric security system had increased for ubiquitous computing environment. Face recognition technology maintains high security level while providing convenience for both human and computer. Therefore, it is necessary to optimize the face recognition system by maintaining recognition accuracy while decreasing computational complexity. However, most of the hand-held devices are hard to satisfy such factors because of limited configuration of memory and CPU power. The major factors of the face recognition system are feature size and the processing efficiency. Generally, processing time is governed by geometric progression which is the dimensionality of feature data in the case of mobile-based devices. It is essential to optimize data structure while maintaining a reasonable recognition performance. In this paper, we explore various algorithms for face detection and recognition algorithms. Experiments include normalization of a face database to increase recognition performance through various pre-processing methods. We propose a novel algorithm for automatic face detection and post-processing method to further improve the face recognition performance. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 365–373, 2009. © Springer-Verlag Berlin Heidelberg 2009
366
K. Jeong et al.
2 Sejong Face Database (SJDB) Generally, face recognition system requires face image data that used for learning and features are normalized and registered for face recognition process. These face image data are collected for some time intervals in various conditions [1]. We have collected Sejong face database for this experiment and images are called ‘session’ images. There are several session images in the Sejong face database. A set of images of each person are used for learning and other sets are used for face recognition system. A session contains three images for each person (Figure 1).
Fig. 1. Sejong face image database (SJDB)
The session images of each person are collected at a rate of one or two images a day in the same physical setup (pose and lighting). Session images of N different people are collected without distinction of sex. In this way, we collected 100 different 'sessions' from about 70 males and 30 females. Each session contains three frontal images which were collected under similar illumination conditions (170~220 lux) [2][3].
3 Face Detection and Face Recognition For any face recognition system, noise in the data is a primary factor of degradation. Therefore, a process for removing noise is essential to improve recognition accuracy [3][4]. This process is called pre-processing and it begins with a geometric transformation (rotation, translation, and rescaling). The correction of the geometric transformation uses the midpoints of the eyes and mouth. We remove the unnecessary data of the outer face region using an oval mask. These processes can be controlled more accurately as shown in Figure 2. The pre-processed image is adjusted using several feature points (eyes, nose, mouth) to have equal dimensions regardless of the person's face size [5] (Figure 3). The face images for this experiment have a horizontal-to-vertical ratio of 3:4. As a result, face images are resized to 1200*1600, 120*160, and 30*40 pixels.
Fig. 2. Pre-processing with the mask and three points information
Fig. 3. Sejong face database (SJDB) and pre-processed face data
The pre-processing time is shown in Table 1; the measurements were performed on an Intel Pentium 4 computer with a 3.0 GHz CPU and 1 GByte of memory. In this paper the pre-processing was performed in a fully automatic setup with the Adaboost algorithm [6][7].

Table 1. Pre-processing time for one face image

Image size              1200 * 1600    120 * 160    30 * 40
Processing time (ms)        1639.75       449.02     414.24
As Table 1 shows, when the image size is reduced, the processing time does not decrease at the same rate. The pre-processing times for the 120 * 160 and 30 * 40 images differ only slightly, whereas the number of image pixels available for face recognition differs greatly. As more pixels are available for recognition, recognition performance improves. With reference to Table 1, the 120 * 160 image size therefore offers the best trade-off between processing time and performance.
A face image is converted into a pre-processed image for face recognition by normalizing pixel values as well as applying the geometric transformation. In this experiment, we applied a series of algorithms including histogram equalization and filtering. Noise was removed from the image and the feature data was emphasized. In addition, the intensity information of the image was redistributed through histogram equalization. The structure of our face recognition system is shown in Figure 4.
Fig. 4. Structure of proposed face recognition system
We have used principal component analysis (PCA) [8] and linear discriminant analysis (LDA) [9] for feature extraction. For the feature data, the change in recognition rate for different cutoff ratios (the percentage of feature vectors used) is shown in Table 2. We set the cutoff ratio to 80%, which we considered to give the most appropriate performance, and designed our face recognition system accordingly. We measured the distance between feature vectors using the L2 norm (Euclidean distance).

Table 2. Face recognition results for different image sizes and cutoff ratios

Cutoff     1200 * 1600    120 * 160    30 * 40
100 %            96 %         98 %       97 %
80 %             97 %         98 %       96 %
70 %             94 %         95 %       91 %
50 %             91 %         87 %       82 %
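The cutoff ratio simply truncates the ranked feature vector before matching. The following C sketch illustrates this idea under our own assumptions (a generic feature dimension, an 80% cutoff and the L2 norm described above); it is not the actual API code.

#include <math.h>
#include <stddef.h>

/* Keep only the leading fraction of the (variance-ranked) feature
 * coefficients; e.g. cutoff = 0.8 keeps 80% of the feature vector. */
static size_t cutoff_dim(size_t full_dim, double cutoff)
{
    return (size_t)(full_dim * cutoff);
}

/* L2 (Euclidean) distance between two truncated feature vectors. */
static double l2_distance(const double *probe, const double *gallery, size_t dim)
{
    double sum = 0.0;
    for (size_t i = 0; i < dim; i++) {
        double d = probe[i] - gallery[i];
        sum += d * d;
    }
    return sqrt(sum);
}

/* Return the index of the gallery entry closest to the probe. */
static size_t match(const double *probe, const double *gallery,
                    size_t n_gallery, size_t full_dim, double cutoff)
{
    size_t dim = cutoff_dim(full_dim, cutoff);
    size_t best = 0;
    double best_dist = l2_distance(probe, gallery, dim);
    for (size_t i = 1; i < n_gallery; i++) {
        double dist = l2_distance(probe, gallery + i * full_dim, dim);
        if (dist < best_dist) { best_dist = dist; best = i; }
    }
    return best;
}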
The automatic face detector is based on the Adaboost algorithm, which was trained using the FERET [10], XM2VTS [11], CMU PIE [12] and Sejong face databases. The training data sizes are face (30x30), eyes (10x6) and mouth (20x9) pixels (Figure 5). A total of 3,513 positive and 10,777 negative image samples were collected and used (Figure 6). As a result, we succeeded in detecting faces in near-frontal face images rotated from 0 to 15 degrees. Figure 7 shows the structure of the Adaboost classifier used in this experiment [7].
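A boosted classifier of the kind trained here combines many weighted weak classifiers into one decision. The following C sketch shows only that combination step, with a hypothetical struct layout and feature representation; the actual cascade structure and training follow [6][7].

#include <stddef.h>

/* A weak classifier: compares one image feature against a threshold
 * and votes +1 (face) or -1 (non-face); alpha is its boosting weight. */
typedef struct {
    size_t feature_index;   /* which feature of the candidate window */
    double threshold;
    int    polarity;        /* +1 or -1 */
    double alpha;           /* weight learned by Adaboost */
} WeakClassifier;

/* Strong classifier: weighted vote of the weak classifiers.
 * With votes in {-1, +1}, accepting when the weighted sum is >= 0 is
 * equivalent to the usual "at least half of the total weight" rule. */
static int adaboost_classify(const double *features,
                             const WeakClassifier *weak, size_t n_weak)
{
    double score = 0.0;
    for (size_t t = 0; t < n_weak; t++) {
        double f = features[weak[t].feature_index];
        int vote = (weak[t].polarity * (f - weak[t].threshold) >= 0.0) ? 1 : -1;
        score += weak[t].alpha * vote;
    }
    return score >= 0.0 ? 1 : 0;   /* 1 = face, 0 = non-face */
}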
Fig. 5. Negative training data
Fig. 6. Positive training data
Fig. 7. Adaboost classifier
When training the face detector with the Adaboost algorithm, negative (non-face) data is just as important as positive (face) data. After appropriate training, the face detector can locate the feature points (the center of each eye and the center of the mouth) in frontal face images over a range of rotation angles. After pre-processing of the face image, face recognition runs at about 7 fps (frames per second), which is reasonable for real-time processing; the overall rate drops to 1.1 fps when face detection is included. The face detection and face recognition processes achieve rates of 87% and 95%, respectively.
4 Face Recognition Application Programming Interface (API) We have implemented four major face recognition functions as an application programming interface (API) in order to perform general face recognition experiments. The API is composed of face detection, face recognition, face similarity, and face evaluation modules, designed in C/C++, as shown in Table 3.

Table 3. API for Face Recognition Function Modules

Function            Description
Face Detection      Face region and feature point detection for an input image
Face Recognition    After pre-processing, face recognition on the detected face image
Face Similarity     Face similarity calculation between a probe image and a gallery image
Face Evaluation     Evaluation of the face detection and face recognition algorithms
Generally, face recognition algorithms are expressed in terms of normalized real-valued numbers, as are other graphical algorithms such as image processing and 3D rendering. There is a noticeable performance gap between floating-point and integer-based calculation, and this gap becomes much larger on embedded systems without a floating point unit (FPU). Since the face recognition API uses floating-point calculations, the performance degradation can be significant. In order to reduce this gap, an integer type was used together with a fixed-point method, a numerical technique that stores the integer part and the fractional part of a number separately. The fixed-point method allows the API to run faster than the existing floating-point implementation; our fixed-point algorithm was designed to outperform the floating-point version. Runtime was measured with a simple C program, and the results are shown in Figure 8. The performance gain of the integer conversion can be verified through profiling: the fixed-point conversion was applied to each module, face image data of 120x160 pixels was used for the evaluation, and each module was compared during the process. The overall arithmetic speed depends considerably on the integer transformation.
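As a rough illustration of the fixed-point idea described above, the following C sketch uses a Q16.16 format (16 integer bits, 16 fractional bits); the format and helper names are our own assumptions, not the actual API implementation.

#include <stdint.h>
#include <stdio.h>

/* Q16.16 fixed point: the value x is stored as x * 65536 in an int32. */
typedef int32_t fix16_t;
#define FIX_ONE (1 << 16)

static fix16_t fix_from_double(double x) { return (fix16_t)(x * FIX_ONE); }
static double  fix_to_double(fix16_t x)  { return (double)x / FIX_ONE; }

/* Addition and subtraction are plain integer operations. */
static fix16_t fix_add(fix16_t a, fix16_t b) { return a + b; }

/* Multiplication needs a wider intermediate and a shift back. */
static fix16_t fix_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * (int64_t)b) >> 16);
}

int main(void)
{
    fix16_t a = fix_from_double(1.25);
    fix16_t b = fix_from_double(0.5);
    printf("1.25 * 0.5 = %f\n", fix_to_double(fix_mul(a, b)));  /* 0.625 */
    printf("1.25 + 0.5 = %f\n", fix_to_double(fix_add(a, b)));  /* 1.75  */
    return 0;
}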
Fig. 8. Function profiling result (performance increase by the fixed-point use)

Table 4. Profiling results for 7,000,000 repetitions (Intel Core 2 Quad Q9300)

Operation        Integer number   Floating number   Double number
Addition                   62.1              89.1           255.1
Subtraction                78.2              87.6           277.6
Multiplication            113.3             447.2           997.1
Division                  311.2             490.5           817.7

Table 5. Profiling results for 1,000,000 repetitions (Intel Pentium 233)

Operation        Integer number   Floating number   Double number
Addition                 39.246            38.661          40.044
Subtraction              38.242            38.506          40.400
Multiplication           39.156           528.966         528.861
Division                197.004           579.077         574.697
Although the integer transformation introduces some additional arithmetic, Table 4 and Table 5 show that integer (fixed-point) calculation is consistently faster than floating-point calculation, particularly for multiplication and division, so a performance improvement can be expected when the method is applied on an embedded system.
5 Conclusion Face recognition is essential in multi-biometric systems because it is the only biometric that provides non-contact features and user-friendly information that can be processed by both humans and computers. This is a very important property in case of communication problems caused by network failures, which prohibit database access in border control applications. Recently, face detection and recognition technology can be built into embedded systems for closed-circuit television (CCTV) related applications, but such systems must respond to numerous conditions, including varying pose and lighting, which are challenging for face recognition. Generally, a face recognition system contains two major functions: face detection and face recognition. We have packaged several face detection and recognition algorithms into an application programming interface (API) for various biometric applications, and we presented a performance evaluation of the integer-transformed calculation, which is optimized for embedded systems suitable for the ubiquitous computing environment. We trained the face detection and face recognition modules of the API using the Sejong face database. The proposed face detection and face recognition modules use the integer transformation, so they can be computed faster, and are optimized for embedded systems. Our experimental results show that we can minimize the run time by decreasing computational complexity while maintaining reasonable accuracy, which makes the approach applicable to the ubiquitous computing environment. Acknowledgement. This work was supported by the Seoul R&BD Program (10581).
References 1. Papatheodorou, T., Rueckert, D.: Evaluation of 3D Face Recognition Using Registration and PCA. In: Kanade, T., Jain, A., Ratha, N.K. (eds.) AVBPA 2005. LNCS, vol. 3546, pp. 997–1009. Springer, Heidelberg (2005) 2. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3, 71–86 (1991)
3. Moon, H., Phillips, P.: Computational and Performance Aspects of PCA-Based Face Recognition Algorithms. Perception 30, 303–321 (2001) 4. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression Database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003) 5. Phillips, P., Moon, H., Rauss, P.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104 (2000) 6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition, pp. 511–518 (2001) 7. Viola, P., Jones, M.: Robust Real-time Object Detection. In: Second International Workshop on Statistical and Computational Theories of Vision, July 13 (2001) 8. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3, 71–86 (1991) 9. Yambor, W.S.: Analysis of PCA-based and Fisher Discriminant-based Image Recognition Algorithms. Technical report, CSU (June 2000) 10. FERET Database. NIST (2001), http://www.itl.nist.gov/iad/humanid/feret/ 11. XM2VTS Database, http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/ 12. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression Database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003)
Location-Triggered Code Execution – Dismissing Displays and Keypads for Mobile Interaction W. Narzt and H. Schmitzberger Johannes Kepler University Linz, Altenbergerstr. 69, A-4040 Linz, Austria {wolfgang.narzt,heinrich.schmitzberger}@jku.at
Abstract. Spatially controlled electronic actions (e.g. opening gates, buying tickets, starting or stopping engines, etc.) require human attentiveness by conventional interaction metaphors via display and/or keystroke at the place of event. However, attentiveness for pressing a button or glimpsing at a display may occasionally be unavailable when the involved person must not be distracted from performing a task or is handicapped through wearable limitations (e.g. gloves, protective clothing) or disability. To automatically trigger those actions just at spatial proximity of a person, i.e. dismissing displays and keypads for launching the execution of electronic code in order to ease human computer interaction by innovative mobile computing paradigms is the main research focus of this paper. Keywords: Location-Triggered Code Execution, Natural Interaction Paradigms.
1 Introduction Currently available mobile location-based communication services enable their users to consume geographically bound information containing static text, images, sound or videos. Having arrived at previously prepared spots, people are provided with information about the next gas station, hotels or sights of interest. More sophisticated variants of mobile location-based services also include dynamic links to locally available content providers. They additionally reveal the current gas price, vacancy status and reroute users to the online ticket service for tourists. Recently recognizable trends even consider individual user profiles as contextual constraints for supplying personalized information and as a technique to counteract spam and to selectively address content to specific user groups. However, the potentials of mobile interaction are far from being fully exploited, considering limiting factors preventing users from attending to the information screen of their mobile device e.g. while driving in a car. Active interaction may also be hindered when people are handicapped or requested to wear gloves, safety glasses or protective suits in order to perform a working task. What is the use of perfectly filtered, personalized dynamic information when the addressee is not able to perceive or react to it? We expect mobile services to support the users in their tasks by automatically triggering (personally authorized) electronic actions just at spatial proximity of approaching users, without the need of glimpsing at displays, typing, clicking or pressing buttons.
As a consequence, mobile location-based services are not solely regarded as information providers but also as action performers. The context location, in combination with personalized access privileges and further quantifiable sensory input, triggers the opening of gates, the automatic stopping of engines in danger zones or the validation of tickets at entrance areas. Hence, people are able to continue their natural behavior without being distracted from their focused task and simultaneously execute an (assumedly incidental but necessary) action. The users' mobile devices enabling location-triggered code execution remain in their pockets.
2 Related Work In this paper, we are primarily concerned with intuitive human computer interaction in mobile computing scenarios derived from the spatial context of the respective user. In this regard, the notion "context" has been discussed in numerous publications and is widely thought of as the key to a novel interaction paradigm. In [1] Dourish analyzed the role context plays in computer systems and claimed that future computing scenarios will move away from the traditional single user desktop applications employing mouse, keyboard and computer screen. In [2] the usage of context in mobile interactive systems has been analyzed. Dix et al. determined the relevance of space and location as a fundamental basis for mobile interaction. [3] studies the user's needs for location-aware mobile services. The results of the conducted user evaluation highlight the need for comprehensive services as well as for seamless service chains serving throughout a user's mobile activity. To achieve broad user acceptance, mobile computing research is confronted with the issue of seamless transitions between the real and the digital world [4] without distracting the user's attention [5]. Modern solution approaches use Near Field Communication (NFC) for contactlessly initiated actions, following the same objectives of dismissing the conventional display and key-controlled interaction paradigm in order to claim a minimum of attention for performing an action at a place of event (e.g. SkiData - contactless access control in skiing areas through RFID). However, the disadvantage of this solution lies in the fact that every location which is supposed to trigger an electronic action requires mandatory structural measures for engaging the NFC principle. Beyond that, some attention is still required, as users are supposed to know the position of the NFC system and bring the RFID tag or reader (depending on which of the components carries the reading unit) close to the system for proper detection. Because of the structural measures required for implementing NFC, this technology is only marginally applicable and causes financial and environmental costs. In earlier research on context with respect to HCI the notion of implicit interaction appears [6]. Implicit interaction denotes that the application will adapt to implicitly introduced contextual information learned from perceiving and interpreting the current situation a user is in (i.e. the user's location, surrounding environment, etc.). In [7] this definition has been refined and split up, separating between implicit input and implicit output. Implicit input is regarded as an action performed by a human to achieve a specific goal while being secondarily captured and interpreted by a system. Implicit output is described as not directly related to an explicit input but seamlessly integrated with
the environment and the task of the user. The approach presented in this paper proposes a means of combining implicit input and output to achieve a minimum of user distraction. Up to now, HCI research already offers numerous contributions on implicit interaction from an application point of view. In [8] the usage of accelerometer data attached to digital artifacts has been exploited to implicitly grant access to a room. The Range Whiteboard [9] explores implicit interaction on public displays supporting co-located, ad-hoc meetings. Similar to this work, interactive public ambient displays are explored in [10]. Here, the contextual focus lies on body orientation and position cues to determine interaction modes. In [11] the personalization of GIS area maps is realized by monitoring users' implicit interaction with maps. All these works on implicit interaction strongly focus on the primary role of the user in the interaction process and the modalities of interacting with ubiquitous computers or digital artifacts. Spontaneous interaction triggered upon physical proximity was further studied in numerous works [12] [13] [14]. These approaches share the aspect that radio sensors are used to determine mutual proximity between smart artifacts and humans. The simplest form of smart artifacts are Smart-Its [15], small computing devices that can be attached unobtrusively to arbitrary physical objects in order to empower these with processing, context-awareness and communication. Smart-Its are designed for ad hoc data exchange among themselves in spatial proximity. In [16] Gellersen et al. underlined the importance of awareness of the environment and of the situation for inferring behavior of mobile entities. Many researchers have focused on identifying smartness in mobile systems. For reasons of usability, this paper focuses on implementing smart environments rather than smart tools or appliances. Key issues of such smart environments have already been discussed recently. The ReMMoC system [17], a web service based middleware solution, deals with the problem of heterogeneous discovery and interaction protocols encountered in mobile applications. In [18] interaction with physical objects is supported by a web service backend system providing mobile phone users with access to associated services. Common to most solutions for mobile interaction is the usage of spatial context. Zambonelli et al. presented the spatial computing stack [19], a framework to overcome the inadequacy of traditional paradigms and to accentuate the primary role of space in modern distributed computing. Generally, it represents virtual environments and their physical counterpart as a common information space for creating awareness among the participants.
3 Architecture Similar to mobile telecommunication services we propose a distributed provider model as the basis for realizing a common information space enabling worldwide unbounded mobile location-based communication services. This proven model allows users to join the provider of their choice and guarantees scalability of the service as each provider only handles a limited number of clients [4]. Every provider stores a set of geographically linked information in appropriate fast traversable geo-data structures (e.g. r-trees) containing hierarchically combinable content modules (which we
call gadgets) for text, pictures, videos, sound, etc. The name gadget already refers to a possible activity within a module and is the key for a generic approach of integrating arbitrary electronic actions to be triggered automatically on arriving users. The main focus of designing an architecture for location-triggered code execution is high extensibility to third-party systems, for the number and variety of non-recurring electronic possibilities is unforeseeable and simultaneously enriches the potentials of such a service. Fig. 1 illustrates the common principles of a flexible component architecture which enables fast connections to third party systems:
Fig. 1. Extensible Component Architecture for Location-Triggered Code Execution
The basic technical approach is a client-server model where clients repetitively transmit their own (commonly GPS-based) position to a server (1), which evaluates the geo-data considering visibility radii and access constraints (2) and transmits the corresponding results back to the clients (3). Generally, when the transmitted information contains conventional gadgets such as text and pictures, it is immediately displayed on the output device of the client (4). The basic idea for executing code is to use the gadget metaphor and store executable code inside instead of text or binary picture data (smart gadgets). Whether this piece of code is executed on the client or the server is primarily irrelevant for the paradigm of automatically triggering actions at certain locations.
However, where to execute the code is crucial for system compatibility and extensibility. With the client as the executing platform, every new third-party connection raises portability issues, as there is possibly more than one system implementation covering multiple mobile platforms. The server, on the other hand, would need an elaborate plug-in mechanism for covering new third-party connections and would then still be faced with the problem of integrating code from varying third-party operating systems. In order to solve this conflict, we propose a web-service-based mechanism which is both effective and simple to extend: smart gadgets do not actually contain executable code but a simple URL to a remote web-service, which is the actual component that executes the code. Every third-party vendor provides a web-service and decides about the URL and its parameters on her own. When a client receives information containing a smart gadget, its URL is resolved in some kind of HTTP-request (5) which is handled internally (6) and finally triggers the desired electronic action (7). A response back to the client (8)(9) can additionally be displayed as a visual confirmation of the third-party system indicating whether the action could be executed successfully or not (10). This architecture comprises several advantages:
• In order to execute location-based code, clients just have to handle standardized HTTP-requests. A majority of currently utilized mobile platforms support these mechanisms.
• The system does not run into compatibility or portability problems, for the executable code is exclusively run on the platform of the corresponding web-service.
• Commonly, location-triggered actions are provided by third-party vendors (e.g. opening of garage doors, gates, starting or stopping of machines, etc.). Using a web-based approach, external systems can easily be linked without compiling the core system or adding plug-ins to it.
• Most important for third-party vendors: their internal data representations, servers and control units are hidden from the publicly accessible location-based service, guaranteeing a maximum of data security for the vendors.
Summarizing all those architectural thoughts, location-triggered code execution is easily achievable by using conventional (GPS- and wireless-enabled) devices and services and adding web-services to them via the smart gadget mechanism. Simple standardized HTTP-requests enable arbitrary integration of third-party systems without structural measures such as those that are, e.g., mandatory for NFC systems.
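To make the smart-gadget mechanism more concrete, the following C sketch shows a gadget that carries only a web-service URL and a helper that turns it into an HTTP GET request string (steps 5-7 above). The struct layout, parameter names, host and URL are illustrative assumptions; the actual service composes its requests differently and also handles authentication and the response path.

#include <stdio.h>

/* A "smart gadget": instead of executable code it carries the URL of the
 * third-party web service that actually performs the action. */
typedef struct {
    const char *label;        /* e.g. "Open gate"                 */
    const char *service_url;  /* third-party web-service endpoint */
} SmartGadget;

/* Build the HTTP request the client issues when the gadget becomes visible
 * at the user's position (the user id lets the vendor authorize the call). */
static int build_request(const SmartGadget *g, const char *user_id,
                         double lat, double lon, char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
                    "GET %s?user=%s&lat=%.6f&lon=%.6f HTTP/1.1\r\n"
                    "Host: example-vendor.net\r\n\r\n",   /* hypothetical host */
                    g->service_url, user_id, lat, lon);
}

int main(void)
{
    SmartGadget gate = { "Open gate", "/actions/open-gate" };
    char req[512];
    build_request(&gate, "user42", 48.336000, 14.320000, req, sizeof req);
    puts(req);   /* in the real system this request is sent to the vendor's server */
    return 0;
}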
4 Implementation In the course of a research project, the Johannes Kepler University of Linz, Siemens Corporate Technology in Munich and the Ars Electronica Futurelab in Linz have developed a novel location-based information and communication service for mobile devices facing the challenges of natural interaction triggered by geographical closeness without display and keypad. It enables users to arbitrarily post and consume information in real locations for asynchronous one-to-many or one-to-any communication having time-driven and contextual perceptibility; and it provides a framework interface for extending the functional range of the service, especially for adding new smart elements by third party vendors.
Fig. 2. Location-Based Client for Cell Phones and PDAs
Fig. 3. Up-to-date Lecture Information at the campus of the University of Linz
The server component of our proposed architecture (as sketched in Fig. 1) has been implemented as a multithreaded C++ application capable of providing different kinds of gadgets regarding the user’s current position. Localization is selectively accomplished via GPS, WLAN triangulation, RFID- or Bluetooth-based positioning. In order to guarantee multi-platform compatibility, the client component uses a slim J2ME system core supporting various mobile platforms including cell phones, PDAs and conventional notebooks. Data transmission is implemented for the most
common wireless communication techniques (GPRS, UMTS, WLAN, Bluetooth). Fig. 2 shows snapshots of the client software for mobile phones (left) and PDAs (right). The web-based third-party component can be deployed to external server systems and uses REST technology (Representational State Transfer, a client-server communication architecture) to offer access to and control of its internal set of functions. In order to prevent abuse of the service, the component includes identification and authorization mechanisms, only granting access to registered users. The applicability of this framework architecture is currently being demonstrated in the course of a first reference implementation at the campus of the University of Linz, available for students, academics and maintenance staff. At the moment, the campus-wide deployment of third-party components is restricted to solely embedding dynamic content from several university-related information systems (e.g. event management, study support system, lecture room occupation plan). However, the mechanism already follows the principle of location-triggered code execution as proposed in chapter 3. For instance, students are able to obtain up-to-date lecture information in spatial proximity to the respective auditorium halls (see Fig. 3).
5 Fields of Application The project described in the previous chapter has already attracted potential customers from industry and the consumer market, who have expressed their interest in adopting our service in their business fields. Due to the manifold fields of interested industry segments, we could identify four different types of relevant location-triggered actions (a code sketch of these trigger types follows the list):
1. Actions that should be executed when users approach a geographical point.
2. Actions whose execution is due to entering an area (e.g. a room).
3. The opposite is also valid for location-triggered actions (e.g. leaving a room).
4. For certain places, users are supposed to reside for a predefined period of time before the actions are executed.
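The following C sketch shows one way such triggers could be evaluated from successive position fixes; the flat-earth distance, the struct layout and the threshold values are illustrative assumptions, not the project's actual implementation. Approaching a point (type 1) is treated as entering a small-radius area.

#include <math.h>
#include <stdbool.h>
#include <time.h>

typedef struct { double x, y; } Position;   /* projected coordinates in metres */

typedef struct {
    Position center;        /* trigger point or area center                  */
    double   radius;        /* area radius in metres                         */
    double   dwell_seconds; /* required residence time for type 4 triggers   */
    bool     was_inside;    /* state from the previous position fix          */
    time_t   entered_at;    /* when the user entered the area                */
} Trigger;

typedef enum { NONE, ENTER, LEAVE, DWELL } Event;

static double dist(Position a, Position b)
{
    return hypot(a.x - b.x, a.y - b.y);
}

/* Evaluate one position fix and report which action type fires.
 * Note: DWELL keeps firing once the time is reached; real code would latch it. */
static Event evaluate(Trigger *t, Position p, time_t now)
{
    bool inside = dist(p, t->center) <= t->radius;
    Event e = NONE;
    if (inside && !t->was_inside) { e = ENTER; t->entered_at = now; }   /* types 1, 2 */
    else if (!inside && t->was_inside) { e = LEAVE; }                   /* type 3     */
    else if (inside && t->dwell_seconds > 0 &&
             difftime(now, t->entered_at) >= t->dwell_seconds) {
        e = DWELL;                                                      /* type 4     */
    }
    t->was_inside = inside;
    return e;
}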
All those examples can additionally be enriched by considering the current heading of a user, i.e., from which direction is the user approaching a point or entering or leaving an area? The following use-cases demonstrate examples of (already implemented and planned) location-triggered actions validating the functional scope and the extensibility of the system: To start with, a common area of application for triggering actions at points of arrival is derived from logistics requirements: Carriers arriving at their designated destination automatically engage the process of loading or unloading cargo, controlling e.g. local conveyor belts, and affecting storage management software for altering working procedures. Another use-case for location-triggered actions on entering an area has already resulted in a real business scenario: A producer of golf carts is equipping his vehicles with a location-based information system displaying overviews on the players' current position at the golf course revealing distances to holes and obstacles.
Fig. 4. Alerting or stopping Engines when driving on Fairways or Greens
When players try to drive on forbidden fairways and greens (marked in light green in Fig. 4), the system automatically alerts the operator or even automatically stops the engines. Leaving an area may be interesting for power consumption issues. A household that is aware of two persons living in it and that recognizes the dwellers' positions leaving a selected region around the property triggers electronically controllable units (e.g. lights, central heating, door lock, etc.) to be switched off or locked in order to decrease power consumption. In contrast to existing smart power consumption solutions, location-triggered code execution does not need any additional sensory gadgets for context recognition or an electronic backbone to keep them working [20]. The personal mobile device, which is powered on anyway, is the only gadget required to be turned on for enabling these power savings. Concerning the application field of industrial security mechanisms, where people must leave contaminated zones after strict time slots, the fourth type of triggered action can be applied: the maintenance staff, restrained by protective clothing, needs to focus on their primary repair tasks and is unable to monitor security displays. The alarm automatically signals upcoming hazardous situations to each worker individually and to the supervising operators.
6 Conclusion and Future Work Location-triggered code execution enables a variety of innovative interaction mechanisms, neither distracting the users' attention from their currently performed tasks nor requiring structural measures for implementing it. For the initiation of electronically controlled actions, users can abstain from conventional interaction techniques using displays and keystrokes. Their physical presence alone is the trigger for real events. Users are simply requested to carry the enabling infrastructure, i.e., a mobile, wireless communication device equipped with some kind of tracking technology, in their pockets. Currently available cellular phones and PDAs already fulfill these technical requirements and are suited as client devices for instantiating location-triggered code execution. Generic extension to external systems via web-services is the key for implementing a manifold of application scenarios by third-party vendors. Without interfering with the core system, new electronic functions can be adopted by using simple, standardized methods, widening the palette of applications without bounds.
Our prototype at the University of Linz already shows the applicability of the proposed concept, focusing on eliminating keystrokes for mobile computing interaction. However, some interaction modalities still depend on the use of various forms of display metaphors (visual, acoustic, haptic). Future work will comprise further studies on coupling location context with triggering actions in order to meet the paradigm of display- and keyless mobile interaction.
References 1. Dourish, P.: What we talk about when we talk about context. Personal Ubiquitous Computing 8(1), 19–30 (2004) 2. Dix, A., Rodden, T., Davies, N., Trevor, J., Friday, A., Palfreyman, K.: Exploiting space and location as a design framework for interactive mobile systems. ACM Trans. Comput.Hum. Interact. 7(3), 285–321 (2000) 3. Kaasinen, E.: User needs for location-aware mobile services. Personal Ubiquitous Comput. 7(1), 70–79 (2003) 4. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Haring, R., Lindinger, C.: Addressing concepts for mobile location-based information services. In: Proceedings of the 12th International Conference on Human Computer Interaction HCI 2007 (2007) 5. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: CHI 1997: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 234–241. ACM, New York (1997) 6. Schmidt, A.: Implicit human computer interaction through context. Personal Technologies (2000) 7. Schmidt, A., Kranz, M., Holleis, P.: Interacting with the ubiquitous computer: towards embedding interaction. In: sOc-EUSAI 2005: Proceedings of the 2005 joint conference on Smart objects and ambient intelligence, pp. 147–152. ACM, New York (2005) 8. Antifakos, S., Schiele, B., Holmquist, L.E.: Grouping mechanisms for smart objects based on implicit interaction and context proximity. Interactive Posters at UbiComp 2003 (2003) 9. Ju, W., Lee, B.A., Klemmer, S.R.: Range: Exploring implicit interaction through electronic whiteboard design. Technical report, Stanford University, HCI Group (2006) 10. Vogel, D., Balakrishnan, R.: Interactive public ambient displays: transitioning from implicit to explicit, public to personal, interaction with multiple users. In: UIST 2004: Proceedings of the 17th annual ACM symposium on User interface software and technology, pp. 137–146. ACM Press, New York (2004) 11. Weakliam, J., Bertolotto, M., Wilson, D.: Implicit interaction profiling for recommending spatial content. In: GIS 2005: Proceedings of the 13th annual ACM international workshop on Geographic information systems, pp. 285–294. ACM, New York (2005) 12. Ferscha, A., Mayrhofer, R., Oberhauser, R., dos Santos Rocha, M., Franz, M., Hechinger, M.: Digital aura. In: Advances in Pervasive Computing. A Collection of Contributions Presented at the 2nd International Conference on Pervasive Computing (Pervasive 2004), Austrian Computer Society (OCG), Vienna, Austria, April 2004, vol. 176, pp. 405–410 (2004) 13. Kortuem, G., Schneider, J., Preuitt, D., Thompson, T.G.C., Fickas, S., Segall, Z.: When peer-to-peer comes face-to-face: Collaborative peer-to-peer computing in mobile ad hoc networks. In: Proceedings of the First International Conference on Peer-to-Peer Computing, P2P 2001 (2001)
14. Brunette, W., Hartung, C., Nordstrom, B., Borriello, G.: Proximity interactions between wireless sensors and their application. In: WSNA 2003: Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, pp. 30–37. ACM Press, New York (2003) 15. Gellersen, H., Kortuem, G., Schmidt, A., Beigl, M.: Physical prototyping with smart-its. IEEE Pervasive Computing 3(3), 74–82 (2004) 16. Gellersen, H.W., Schmidt, A., Beigl, M.: Multi-sensor context-awareness in mobile devices and smart artifacts. Mob. Netw. Appl. 7(5), 341–351 (2002) 17. Grace, P., Blair, G.S., Samuel, S.: A reflective framework for discovery and interaction in heterogeneous mobile environments. SIGMOBILE Mob. Comput. Commun. Rev. 9(1), 2–14 (2005) 18. Broll, G., Siorpaes, S., Rukzio, E., Paolucci, M., Hamard, J., Wagner, M., Schmidt, A.: Supporting mobile service usage through physical mobile interaction. In: Proceedings of the Fifth IEEE international Conference on Pervasive Computing and Communications, PERCOM (2007) 19. Zambonelli, F., Mamei, M.: Spatial computing: an emerging paradigm for autonomic computing and communication. In: 1st International Workshop on Autonomic Communication, Berlin (October 2004) 20. Ferscha, A., Emsenhuber, B., Gusenbauer, S., Wally, B.: PowerSaver: Pocket-Worn Activity Tracker for Energy Management. In: Adjunct Proceedings of the 9th International Conference on Ubiquitous Computing UbiComp 2007, pp. 321–324 (2007)
Mobile Interaction: Automatically Adapting Audio Output to Users and Contexts on Communication and Media Control Scenarios Tiago Reis, Luís Carriço, and Carlos Duarte LaSIGE, Faculdade de Ciências, Universidade de Lisboa
[email protected], {lmc,cad}@di.fc.ul.pt
Abstract. This paper presents two prototypes designed in order to enable the automatic adjustment of audio output on mobile devices. One is directed to communication scenarios and the other to media control scenarios. The user centered methodology employed in the design of these prototypes involved 26 users and is also presented here. Once the prototypes were implemented, a usability study was conducted. This study involved 6 users who included our prototypes in their day-to-day lives during a two-week period. The results of the studies are presented and discussed in this paper, providing guidelines for the development of audio output adjustment algorithms and for the future manufacturing of mobile devices. Keywords: Media Control, Communication, Automatic Volume Adjustments, Context Awareness, Hand-held Devices, User Centered Design, Contextual Evaluation.
1 Introduction Nowadays, mobile devices are strongly integrated in people's lives. The ubiquitous nature of these devices enables humans to use them in an enormous variety of contexts, which are defined by various sets of dynamic characteristics (contextual variables) that heavily affect user interaction. However, the differences amongst users (e.g. preferences, capabilities) and the frequent context mutations, which occur during device utilization (e.g. moving from a silent to a noisy environment), usually result in users' adaptation to both the contexts and the interfaces available, and not the other way around. Most mobile user interfaces are unable to adapt effectively and automatically to the mutations of their utilization contexts, introducing difficulties in user interaction and, many times, inhibiting it [1, 2, 3]. Accordingly, it is necessary to explore new approaches to user interface design and development, aiming at usability and accessibility improvements in mobile applications. To achieve this, applications must be constantly aware of their utilization contexts and respective mutations, naturally providing users with adequate interaction modalities, combining and configuring them according to the contexts in which the devices are used. The work presented in this paper addresses the contextual adaptation of audio output on mobile devices, focusing on communication and media control scenarios.
Although there are solutions available for similar purposes, especially regarding communication scenarios, they consider only a few, insufficient, dimensions of context: noise levels are used in order to adapt different aspects of audio output (e.g., ringtone volume, earphone volume) [4, 5, 6]. The personal dimension of context, which represents a non-trivial issue in user interface adaptation, is not addressed. People's preferences and capabilities are not considered. Nevertheless, nowadays, the available technology enables the transparent gathering of the information needed for this purpose [7, 8], and its correct employment may significantly increase usability and accessibility, consequently improving user experience by assuring the interaction's adequateness to both utilization contexts and user needs. This motivated the user centered design, development, and contextual evaluation of two context-aware prototypes. These prototypes gather noise levels from the surrounding environment, adapting audio output according to scenarios, user needs and environmental noise. They present a proof of concept that can be achieved through the utilization of the technology available on most mobile devices, and can be considered and improved in the manufacturing of new devices in order to overcome the limitations found by the studies conducted and presented in this paper. This paper starts by presenting and discussing the related work developed in this area. Following that, it presents a strongly user centered design process that enabled the definition of a set of requirements, guidelines and design decisions that were considered during the prototypes' implementation. Afterwards, it presents the prototypes created, and their underlying volume adjustment algorithms. It details and discusses the contextual evaluation of these prototypes, emphasizing the limitations and advantages of the algorithms created for audio output adaptations, as well as their users' acceptance. Finally, the paper concludes by providing future work directions within this domain.
2 Related Work From simple headphones and headsets with physical noise canceling [9] to headphones and headsets that include very complex noise elimination algorithms [10, 11], different solutions have been proposed and created in order to reduce the impact of noise on audio interaction. Firstly, it is important to emphasize that there are significant differences between noise canceling and noise-based automatic volume adjustments. While the first is of utmost importance in many scenarios (e.g. communicating inside a plane or close to a helicopter), it demands an aggressive noise elimination that is extremely complex and, consequently, extremely expensive, sometimes making the headphones themselves more expensive than regular mobile devices [9, 10, 11]. Accordingly, the latter presents a reliable and more affordable solution for contexts in which the environmental noise varies significantly, but not tremendously. Moreover, noise canceling solutions can become dangerous in several contexts. For instance, while a user is walking on the street, if he/she is completely inhibited from hearing the cars passing by, he/she can be injured. The work presented in this paper focuses on the context-based adaptation of volume in communication and media control scenarios. Regarding communication scenarios,
Chris Mitchell developed an application that is capable of monitoring noise levels in the surrounding environment, consequently adjusting the ringtone volume of a mobile phone [4]. Regarding media control scenarios, there are several car stereos that adjust the volume according to the estimated noise generated by the car. Regarding both scenarios, Apple recently published a new patent with the US Patent Office [12]. They claim to show a new technology feature that may be included in their future products. Their concept is similar to the ones proposed in this paper: automatic volume adaptation in communication and media control scenarios. However, they envision the inclusion of a sound sensor (an extra microphone) in order to capture noise, while we used only the hardware available on every mobile device including communication and media control capabilities. Moreover, this paper focuses on the user centered design and contextual evaluation of such concepts. Finally, all the existing solutions consider noise as the only contextual variable affecting audio output preferences, which is, as we show in the following section, an incomplete assumption, especially in media control scenarios.
3 Early Design This section is dedicated to the early design stages of the two prototypes created. The design processes followed a strongly user centered methodology, involving 26 non-impaired users: 16 male, 10 female, ages between 14 and 45 years old, familiar with mobile devices, especially mobile phones and mobile MP3 players. The users involved answered a questionnaire regarding important aspects related to the contextual adaptation of volume in both communication and media control scenarios. It was important to understand which context variables have an impact on users' voluntary volume modifications, how the automatic volume modifications should be performed, and how useful the users believed the proposed concepts to be. The resultant information was carefully analyzed, culminating in a set of requirements, design decisions and guidelines, which were considered during the implementation of the prototypes. 3.1 Questionnaires Firstly, concerning users' decisions regarding volume modification in the considered scenarios, it was important to understand which contextual variables have an impact on these decisions. Furthermore, it was necessary to quantify the impact of each variable in the different scenarios, defining a level of importance for each one of them. Accordingly, for the different scenarios, users were asked to rate the impact of a set of contextual variables on a scale from 0 (null impact) to 100 (great impact). The results (Fig. 1) indicate noise, with an impact of 100, as the only contextual variable affecting voluntary volume modifications in communication scenarios. Conversely, in media control scenarios, the variables influencing voluntary volume modifications are significantly more numerous, and their impact considerably different. Noise remains the primary variable; however, its impact is reduced to 82. The task the user is engaged in reveals an impact of 70 and interruptions generated by third parties an impact of 58. The remaining factors only apply to media control scenarios and are all related to specific characteristics of the media file being played (the song itself, album, artist,
Fig. 1. User ratings: the impact of different contextual variables on voluntary volume modifications in both, communication and media control scenarios
and genre of music), presenting impacts between 30 and 45. Most users involved added that their emotional condition also influences voluntary volume modifications significantly, and can even influence the impact of the other context variables. Secondly, it was fundamental to understand users’ preferences regarding the automatic volume adaptation on the different scenarios considered. Two alternatives were implemented on a very simple application: gradual and direct volume adaptation. Fig. 2 a) depicts the differences between these alternatives. The gradual volume adaptation increases volume gradually (in approximately 0.8 seconds) from its current value (50 on the example provided) to the value suggested by the adaptation algorithm (80 on the example provided). On the other hand, the direct volume adaptation performs the same action in a quarter of that time (approximately 0.2 seconds).
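As an illustration of the difference between the two alternatives, the following C sketch ramps the volume linearly from its current value to the target over a configurable duration (about 0.8 s for the gradual variant and 0.2 s for the direct one, as above); the step interval and the set_volume placeholder are assumptions made for the example, not the prototypes' actual code.

#include <stdio.h>

/* Placeholder for the platform call that actually sets the output volume. */
static void set_volume(int level) { printf("volume -> %d\n", level); }

/* Ramp linearly from `current` to `target` over `duration_ms`,
 * updating every `step_ms` milliseconds (the timing loop is omitted). */
static void ramp_volume(int current, int target, int duration_ms, int step_ms)
{
    int steps = duration_ms / step_ms;
    if (steps < 1) steps = 1;
    for (int i = 1; i <= steps; i++) {
        set_volume(current + (target - current) * i / steps);
        /* sleep for step_ms here on a real device */
    }
}

int main(void)
{
    ramp_volume(50, 80, 800, 100);  /* gradual adaptation (~0.8 s) */
    ramp_volume(50, 80, 200, 100);  /* direct adaptation  (~0.2 s) */
    return 0;
}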
Fig. 2. Example of gradual and direct volume adaptations (a) and user preferences (percentage) regarding the volume adaptation alternatives on both scenarios (b)
A small laboratory experiment was conducted with all the users involved, in order for them to experiment and be aware of the differences amongst these alternatives. The results of such experiment (Fig. 2 b)) show substantial differences regarding the appropriateness of the alternatives implemented. 96% of the users involved prefer the gradual volume adaptation on both communication and media control scenarios. Finally, it was in the best interest of our team to understand how useful users believed the automatic volume adjustments to be. The results (Fig. 3) indicate a strong user acceptance for the concept proposed on both communication and media control scenarios. On communication scenarios 50% of the population involved rated the
concept very useful, 30% rated it useful, 20% rated it slightly useful, and none of the persons involved considered the concept useless. On media control scenarios 58% of the population involved rated the concept very useful, 38% rated it useful, 4% rated it slightly useful, and none of the persons involved considered the concept useless.
Fig. 3. Users’ opinions about the usefulness of the concept proposed on communication and media control scenarios (percentage)
When asked about the discrepancies in the answers for each scenario, users emphasized the fact that they usually avoid noise in communication scenarios by moving to a silent place after answering a call and realizing that the noise is affecting the communication. Moreover, the users who rated the concepts useful and very useful were the ones who used mobile devices including communication and media control capabilities more often, and in contexts with significant environmental noise mutations (e.g. street, subway), while the remaining users used these devices mostly at work and at home.
4 The Prototypes This section explains the noise monitoring process of the prototypes created, details the two automatic volume adjustment algorithms created, and the logging mechanisms implemented in order to ease the contextual evaluation of the prototypes. The mentioned prototypes are available for devices running Windows Mobile and were written in C#, using Microsoft’s .Net Framework. 4.1 Noise Monitoring Both prototypes created use the noise monitor available in [4]. This monitor gathers sample values from the device’s microphone, consequently calculating loudness values in order to adjust the ring tone volume on a mobile phone (on a 0 to 5 scale). The measure used to calculate loudness is root-mean-square (RMS). 4.2 Algorithms for Automatic Volume Adjustments The environmental noise was considered the primary context variable influencing users’ decisions regarding volume modifications on both communication and media control scenarios. However, as described on section 3, there are other context variables that have a significant impact on these decisions. Moreover, users’ hearing capabilities
must also be considered. Accordingly, despite behaving differently, the two algorithms created take all these dimensions into account and share some principles. For both algorithms, the noise spectrum varies from 0 to 127.5 [4] and the volume spectrum varies from 0 to 100. Moreover, both algorithms can be configured by defining 4 parameters that are accessible to the users (Fig. 4):
• Minimum: defines the minimum volume that can be set by the algorithm. This boundary is defined in order to avoid adaptations that are inconvenient for the users (e.g. setting the volume too low due to the absence of noise).
• Maximum: defines the maximum volume that can be set by the algorithm. This boundary is defined for the same reason as the above (e.g. setting the volume too high due to excessive noise in the environment).
• Sensibility: defines the coefficient dividing the noise spectrum into noise levels. For instance, if the sensibility is set to 3 and the noise spectrum's range is 127.5, the spectrum is divided into 43 noise levels.
• Volume Step: defines the increase or decrease of volume whenever the noise in the surrounding environment goes up or down one level, respectively.
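A minimal C sketch of how these four parameters can drive the adaptation, combined with the RMS loudness measure described in Section 4.1; the variable names, the base_volume reference value and the linear mapping are our reading of the description above, not the prototypes' actual C# code.

#include <math.h>
#include <stddef.h>

typedef struct {
    double minimum;      /* lowest volume the algorithm may set    */
    double maximum;      /* highest volume the algorithm may set   */
    double sensibility;  /* width of one noise level (e.g. 3.0)    */
    double volume_step;  /* volume change per noise level           */
    double base_volume;  /* volume associated with noise level 0;
                            a registered noise/volume preference
                            would shift this value                  */
} VolumeConfig;

/* RMS loudness of a block of microphone samples (the noise estimate). */
static double rms_loudness(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += samples[i] * samples[i];
    return sqrt(sum / (double)n);
}

/* Map a noise value (0..127.5) to a volume (0..100), clamped to the
 * user-configurable minimum and maximum. */
static double adapt_volume(const VolumeConfig *c, double noise)
{
    double level = floor(noise / c->sensibility);       /* noise level index */
    double volume = c->base_volume + level * c->volume_step;
    if (volume < c->minimum) volume = c->minimum;
    if (volume > c->maximum) volume = c->maximum;
    return volume;
}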
Fig. 4. Configuration screen presenting default values of the algorithms
The only contextual variable that is directly monitored by our prototypes is the environmental noise. Nevertheless, the remaining context variables are also considered (e.g. preferences, hearing capabilities, third party interruptions, etc.). These are indirectly expressed by the users whenever they perform a voluntary volume modification. For instance, if for any reason a user is not satisfied with the volume automatically set by the algorithm, his/her natural behavior would be to manually set the volume according to his/her preferences. When this happens, the algorithm registers a user preference, which is composed of a noise/volume pair, modifying the adaptation table accordingly and registering it in an XML file for later use. The developed algorithms behave differently in such situations and will be further explained: Automatic Volume Adjustments on Communication Scenarios. This functionality is achieved through the utilization of a non-continuous preference based algorithm. The non-continuity derives from the constraints imposed by the scenario considered (the user is talking on the phone) and the device used to create the prototype, which includes only one microphone. Accordingly, the microphone used to communicate is the same one monitoring the environmental noise. The automatic volume adjustment is direct and is performed based on the noise levels gathered immediately before the user answers the phone.
The preference base emerges from the last voluntary volume modification performed during a call. At the end of the call, a preference entry is registered and the adaptation table is modified according to that preference. This only happens at the end of the call because the noise monitoring is stopped in the meantime, while the user might move between contexts with different values of environmental noise. Automatic Volume Adjustments on Media Control Scenarios. This functionality is achieved through the utilization of a continuous preference based algorithm. This algorithm was specifically designed for media control scenarios in which the users are wearing headphones. Accordingly, the sound produced by the media being played does not influence the noise monitoring process, enabling a continuous utilization of the device's microphone in order to monitor noise. Therefore, the automatic volume adjustments are applied continuously and gradually, while the user is controlling the media. The preference base emerges from users' voluntary volume modifications. When these take place, the algorithm assumes that the volume set by the user represents his/her preferences for the noise captured at that moment, overriding his/her previous preference. Accordingly, the noise/volume table defined by the algorithm's sensibility and volume step is instantaneously modified in order for the noise registered to match the volume set, continuing the adaptation according to the modified table. 4.3 Logging Mechanisms The logging mechanisms were implemented in order to ease the evaluation process of the prototypes. These mechanisms enabled medium-term studies to be conducted, removing the need for direct monitoring. Both prototypes created include these mechanisms and are able to register the users' and the algorithms' behaviors in an XML file. Every user action is registered and associated with a contextual stamp, which includes time and noise information.
5 Contextual Evaluation The prototypes created were evaluated through a strongly user centered procedure. The users selected to participate in this procedure were very familiar with mobile phones and media players. Moreover, there was a strong concern from our team on selecting users who used these devices in a broad variety of contexts (e.g. home, street, bus, subway, gym, etc.). These concerns emerged from the necessity of having a basis for comparison of our solutions with the existing technology, in several real contexts. Six users were involved: 3 male, 3 female, with ages between 18 and 35 years old. They used our prototypes in their day-to-day lives during a two-week period. Accordingly, the logs gathered during the process represent the utilization of the prototypes in real contexts, under real, constantly mutating contextual constraints. At the end of the process, the users returned the utilization logs, which were carefully analyzed. Moreover, these users provided their feedback and opinions about the automatic volume adjustments, emphasizing situations where these were, and were not, satisfactory.
5.1 Contextual Evaluation in Communication Scenarios
In situations where the environmental noise decreased significantly during a call, all the users were slightly uncomfortable with the algorithm's behavior, reporting that the volume became too loud and that they had to turn it down manually. Such situations occurred mostly at the beginning of the study, when the users still followed their natural behavior of moving to a quieter place after answering a call. As they continued to use the prototype, however, they all changed this behavior, moving to a quieter place only when they did not feel comfortable discussing the topic of the call in front of other people. In situations where the environmental noise did not change significantly during a call and the topic of that call was not private, all the users were very satisfied with the algorithm. In situations where the environmental noise increased significantly during a call, users were very uncomfortable with the algorithm's behavior. Such situations arose mostly on public transport (e.g., bus, subway). Users reported that the volume was too low and that they had to modify it manually. Finally, in situations where the environmental noise was extremely loud, users reported that the maximum volume was not enough to maintain the conversation, suggesting the creation of alerts advising users not to answer calls in such situations.
5.2 Contextual Evaluation in Media Control Scenarios
Users reported several situations in which they moved between contexts with significantly different environmental noise levels without needing to modify the volume configuration manually. Such situations include going from home to the workplace, walking on the street, riding public transport, and entering and leaving different buildings. The log analysis corroborates these reports, showing no manual volume adjustments during long periods of time (up to 2 hours) characterized by very different environmental noise levels. The only situations in which users felt uncomfortable with the algorithm's behavior were those in which they were interrupted by third parties and engaged in conversation. They explained that in such situations the volume would start to increase (due to the rising environmental noise generated by the conversation), culminating in a manual volume modification or in the users removing their headphones. These situations are also corroborated by the logs, where in some cases of increasing environmental noise the users manually set the volume to mute and then back to its previous value.
5.3 Discussion
The algorithm designed for communication scenarios was the one that raised more questions. This was due to the non-continuity of the noise monitoring, imposed by the hardware available on the device used and by the constraints of the scenario for which the prototype was developed (the only microphone available was being used to talk). However, the limitations of the algorithm, except those regarding the privacy of the topics discussed during a call, could be overcome by including another microphone on the mobile device. This microphone should be used to
capture noise continuously, enabling continuous volume adjustments during calls. Despite the limitations of the prototype, users considered it better than their personal mobile phones, explaining that in the worst-case scenarios their behavior was very similar to the one they had when using their own phones: manually modifying the volume configuration. The continuous noise monitoring of the algorithm designed for media control scenarios achieved excellent user acceptance in most contexts. However, third-party interruptions generated noise, leading to an automatic volume increase and resulting in uncomfortable situations for the users. This problem can be solved by separating speech from environmental noise as in [10] (not only the speech of the user holding the device but also that of the third parties in conversation with this user). Nonetheless, such a solution would clearly increase the complexity of the algorithms and the amount of hardware used. Overall, the studies revealed that noise monitoring should be performed continuously in order to enable accurate and continuous volume adjustments. There was strong user acceptance of both prototypes, and although they were able to modify all the algorithms' parameters, the users involved in the evaluation personalized only the maximum and minimum volumes, being very satisfied with the default sensibility and volume step of the algorithms. The utilization logs corroborate these observations.
6 Conclusion and Future Work
In this paper we described the user-centered design of two context-aware prototypes for communication and media control scenarios. These prototypes were built on top of a regular mobile device offering both communication and media control capabilities. They are capable of adjusting the volume according to different aspects of the context in which they are used, monitoring noise directly through the microphone and considering user preferences and capabilities, which are expressed indirectly through the users' voluntary volume modifications. Issues regarding the amount of contextual information directly monitored by the algorithms responsible for the volume modifications remain open and will be studied in our future work. The contextual evaluation of the prototypes revealed strong user acceptance of the proposed concept, especially in media control scenarios. However, the studies also point to issues that could not be overcome using only the hardware available on most current mobile devices. Accordingly, the study also provides important information to be considered in the design of future mobile devices.
Acknowledgments. This work was supported by LaSIGE and FCT through the Multiannual Funding Programme and individual scholarship SFRH/BD/44433/2008.
References 1. Barnard, L., Yi, J.S., Jacko, J., Sears, A.: Capturing the effects of context on human performance in mobile computing systems. Personal and Ubiquitous Computing 11(2), 81–96 (2007)
2. Reis, T., Sá, M., Carriço, L.: Multimodal Interaction: Real Context Studies on Mobile Digital Artefacts. In: HAID 2008. LNCS, vol. 5270, pp. 60–69. Springer, Heidelberg (2008) 3. Schmidt, A., Aidoo, K.A., Takaluoma, A., Tuomela, U., van Laerhoven, K., Van de Velde, W.: Advanced interaction in context. In: Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, pp. 89–101. Springer, Heidelberg (1999) 4. Mitchell, C.: Mobile Apps: Adjust Your Ring Volume For Ambient Noise. MSDN Magazine (2008), http://msdn.microsoft.com/en-us/magazine/cc163341.aspx 5. Kumar, Larsen, Infotelimcithed, T.: Smart Volume Tuner for Cellular Phones. IEEE Wireless Communications (2004) 6. US Patent 7023984 - Automatic volume adjustment of voice transmitted over a communication device, http://www.patentstorm.us/patents/7023984/description.html 7. Baldauf, M., Dustdar, S., Rosenberg, F.: A survey on context-aware systems. International Journal of Ad Hoc and Ubiquitous Computing 2(4), 263–277 (2007) 8. Chen, G., Kotz, D.: A survey of context-aware mobile computing. Technical Report TR2000–381, Dartmouth College, Department of Computer Science (2000) 9. Review on noise canceling headphones, http://www.seatguru.com/articles/noise-canceling_review.php 10. The Jawbone Headset, http://eu.jawbone.com/epages/Jawbone.sf 11. The Boom Headset, http://www.theboom.com/ 12. Patent for automatic noise-based volume adjustments, http://appft1.uspto.gov/netacgi/nph-Parser? Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20090022329.PGNR.&O S=DN/20090022329RS=DN/20090022329
Interactive Photo Viewing on Ubiquitous Displays*
Han-Sol Ryu1, Yeo-Jin Yoon1, Seon-Min Rhee2, and Soo-Mi Choi1
1 Department of Computer Engineering, Sejong University, Seoul, Korea
[email protected]
2 Integrated Media Systems Center, University of Southern California, USA
Abstract. This paper presents a method of showing photos interactively, based on a user's movements, using multiple displays. Each display can identify the user and measure how far away he is using an RFID reader and ultrasonic sensors. When he comes within a certain distance of the display, it shows a photo from his photo album and provides quasi-3D navigation using the TIP (tour into the picture) method. In addition, he can manipulate photos directly using a touch screen or remotely using an air mouse. Moreover, a group of photos can be represented as a 3D cube and transferred to a PDA for continuous viewing on other displays.
Keywords: photo viewing, distance-based interaction, multiple displays.
1 Introduction
As digital cameras are becoming more popular, the demand for digital picture frames or digital photo frames that can store and display many photos is increasing. People want interactive photos and augmented photos to enrich their viewing experience, according to Darbari's survey [1]. If useful facilities for user interaction are added to the digital photo frames, the displays can do more than simply show pictures; they can interact with users in various ways and can function as ubiquitous displays around the home, placed on convenient surfaces or attached to the wall. Several systems using a digital frame-type display have been studied in projects relating to ubiquitous health care. In the AwareHome project at Georgia Institute of Technology, indirect interactions with remote family members are facilitated using the Digital Family Portraits display [2]. This display looks like a picture, but can provide family members who live at a distance with information about their elderly relative's everyday life, including their health, environment, relationships and activities. This information is summarized every week and gives the user a feeling that they are talking to distant family members through the frame. The CareNet display [3] developed by the Intel Research Center in Seattle enables users to access information directly by operating the menus of a touch screen, and it also allows images to be edited. These existing picture-like displays are not sufficient to give users enriched picture viewing experiences, because they only show pictures and permit only explicit
* This work was supported by the Seoul R&BD program (10581).
interactions, for instance through the menus of a touch screen [2-4]. In addition, these studies did not consider continuous viewing across multiple displays. In this paper, we present a smart photo frame that allows a user to tour into a photo based on his movements. The photo interface changes automatically according to the distance between him and the display. Not only can he navigate the photo in 3D space, he can also change its background. Moreover, our system allows him to move photos to another display at home using a PDA. The rest of this paper is organized as follows: Section 2 describes the hardware configuration of our system; Section 3 explains interaction zones and user activities; Section 4 presents methods of generating a 3D space from a 2D photo; Section 5 describes photo arrangement in 3D space; experimental results are given in Section 6; and lastly we draw some conclusions in Section 7.
2 A Smart Photo Frame
The proposed photo frame consists of a touch panel, an LCD (15~32 inches), an RFID reader, ultrasonic sensors (2~4 sensors) and LEDs attached to the rear of the display, as shown in Fig. 1.
Fig. 1. Components of a smart photo frame; (a) a user wearing an RFID tag; (b) a small display (15-inch); (c) a large display (32-inch)
To identify a user, we use a 900 MHz RFID reader (Infinity 210UHF: SIRIT) and a tag (Class 1, Gen 2 type). The user wears a small tag that looks like a necklace or bracelet, as shown in Fig. 1(a). When he comes within the reaction zone of the RFID reader, the reader recognizes the tag and identifies him from its ID. A tag can be recognized reliably within 4.0 m of the display with an external antenna. The RFID reader is able to identify several users simultaneously; in this case, the user whose tag responds most strongly to the reader is recognized as the primary user and receives the highest priority. Piezo-type ultrasonic sensors and an ultrasonic sensor board with a transmission and reception module are used to measure the distance between a user and the display. The system also measures the direction of the user's movement with ultrasonic sensors attached to the sides of the photo frame. For a small display, we attach two ultrasonic sensors on the left and right sides of the frame, as shown in Fig. 1(b). However, we attach
more sensors on the bottom of the frame for a large display, as shown in Fig. 1(c). The range of these sensors can be adjusted up to 7 m. We also use the touch screen panel to operate icons on the display; some detailed information can be obtained through touch interaction. In the event of an emergency, LEDs attached around the display attract the user's attention by flashing.
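As a rough illustration of the sensing described in this section, the sketch below selects the primary user as the tag with the strongest response and derives a distance from an ultrasonic round-trip time. It is not the prototype's code; the function names, the reading format, and the speed-of-sound constant are assumptions.

# Illustrative sketch: primary-user selection and ultrasonic distance estimation.
SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed)

def primary_user(tag_readings):
    # tag_readings: list of (user_id, signal_strength); the strongest tag wins.
    if not tag_readings:
        return None
    return max(tag_readings, key=lambda r: r[1])[0]

def distance_from_echo(round_trip_seconds):
    # The ultrasonic pulse travels to the user and back, so halve the path.
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

print(primary_user([("alice", -48), ("bob", -61)]))  # -> 'alice'
print(round(distance_from_echo(0.0105), 2))          # ~1.8 m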
3 Interaction Zones and User Activities
The interaction zones corresponding to the proximity of a user to the display are shown in Fig. 2. We divide the space in front of the display into three zones, following Vogel [5]. Fig. 3 depicts an activity diagram of our system in UML (Unified Modeling Language), which shows the flow of actions in the zones.
Fig. 2. Interaction zones corresponding to the user’s proximity
Fig. 3. An activity diagram showing the flow of actions
When the user is outside all the zones, the photo frame functions as a simple picture frame, showing black-and-white pictures that demand little attention. In the display zone, the ultrasonic sensors react to his presence and pictures are shown in color. Nevertheless, at this stage the photo frame does not demand excessive attention, although the display shows a high-level menu with which he can interact using a mouse-like
pointing device if required. In the implicit interaction zone, both the ultrasonic sensors and the RFID reader recognize him, and the image on the display becomes three-dimensional; he can tour into the photo through movement-based interaction. In the touch interaction zone, he can operate menus directly using the touch screen and access detailed information related to the photos. If an urgent message is sent from the server system, it is displayed on the screen and the LEDs light up regardless of his proximity.
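The zone logic of this section could be expressed roughly as follows. The distance thresholds are assumptions (the text gives only sensor ranges, not exact zone boundaries), and the zone names mirror Fig. 2; this is an illustrative sketch, not the system's actual decision code.

# Illustrative mapping from measured distance and tag identification to an interaction zone.
def classify_zone(distance_m, tag_identified):
    if distance_m is None or distance_m > 6.0:
        return "ambient"    # outside all zones: black-and-white pictures (threshold assumed)
    if tag_identified and distance_m < 0.8:
        return "touch"      # touch interaction zone: menus, detailed information (assumed)
    if tag_identified and distance_m < 4.0:
        return "implicit"   # implicit interaction zone: 3D tour driven by movement (assumed)
    return "display"        # display zone: color photos, high-level menu

for d, tag in [(7.5, False), (5.0, False), (2.0, True), (0.5, True)]:
    print(d, classify_zone(d, tag))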
4 Interactive Photo Viewing Based on the User's Movements
When a user moves into the implicit interaction zone and the selected photo has 3D information such as a vanishing line, a 3D space is simulated using the TIP (tour into the picture) method [6,7]. This requires three input images: an original image, a background image from which the foreground objects have been removed, and a mask image in which the foreground objects are colored white and the background is colored black.
Fig. 4. The 3D reconstruction of the background and the foreground objects using a vanishing line
Fig. 4 shows the 3D reconstruction of a photo using a vanishing line. To render the 3D background, which consists of a ground plane (2', 4', 5', 6') and a back plane (1', 3', 5', 6'), we assume that the camera is positioned at the origin O, the viewing direction is +z, and the vertical direction is +y in 3D space. In Fig. 4, points 5 and 6 are the intersections of the vanishing line with the image boundary. Foreground objects are selected by means of the pre-processed mask image. These objects appear to be placed on a billboard that moves independently of the background, giving the user a simple illusion of moving in 3D space. To obtain the 3D coordinates of the foreground objects, we compute the intersection points of the billboard and the ground plane. In Fig. 4, the foreground object is represented as a quadrilateral with four points (p'1, p'2, p'3, p'4) standing orthogonally on the ground plane. The coordinates of p'2 and p'3 are computed from the camera position and the points (p2, p3) on the image plane. After 3D reconstruction of the photo, the position and orientation of the virtual camera can be changed by the user's movements, as shown in Fig. 5. When the user approaches the display, the camera moves toward the back plane in 3D space, like zooming in; when he moves to the right or left, the camera rotates in the same direction, like panning. In addition, when he moves toward the display, detailed information about the photo, such as date, names and places, appears gradually.
Fig. 5. Camera movements according to the user’s movements; (a) backward and forward movements; (b) left and right rotations
If he moves back, it disappears. This level-of-detail technique for the detailed information naturally attracts his interest to the display without requiring excessive attention.
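The movement-to-camera mapping described above might be sketched as follows. The gain constants and the linear fade of the detailed information are illustrative assumptions, not values from the system.

# Illustrative sketch: user movement drives the virtual camera (zoom/pan) and info fade.
def update_camera(camera, prev_dist, new_dist, lateral_shift_m):
    ZOOM_GAIN = 1.5   # metres of camera travel per metre the user approaches (assumed)
    PAN_GAIN = 20.0   # degrees of pan per metre of lateral movement (assumed)
    camera["z"] += ZOOM_GAIN * (prev_dist - new_dist)   # user closer -> camera toward back plane
    camera["yaw_deg"] += PAN_GAIN * lateral_shift_m     # user moves sideways -> camera pans
    # Detailed information (date, names, places) fades in as the user gets closer.
    camera["info_alpha"] = max(0.0, min(1.0, (3.0 - new_dist) / 2.0))
    return camera

cam = {"z": 0.0, "yaw_deg": 0.0, "info_alpha": 0.0}
print(update_camera(cam, prev_dist=3.0, new_dist=2.0, lateral_shift_m=0.3))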
5 A Photo Arrangement in 3D Space
When a user is in the touch interaction zone, he can interact explicitly with the display using touch screen menus. He can categorize photos and select one to be shown on the display. Unlike other photo frames, our system displays photos in 3D space. Thus, we can apply 3D effects, such as rotation, translation and flipping, when photos appear or disappear. As a rule of thumb for easy understanding, no more than seven entities are shown at the same time. Here, we use a cube metaphor. As shown in Fig. 6, a group of photos is represented as a 3D cube, which can be moved freely in 3D space. Each cube contains from one to six photos. When it is touched, the contained photos are automatically arranged on the screen. Using the menus on the left side, the selected photo can be zoomed in or out, and some cubes can be transferred to a PDA so that the user can continue viewing the photos while moving.
Fig. 6. An automatic photo arrangement in 3D space and touch interaction
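A minimal sketch of the cube metaphor, assuming a simple grid layout when a cube is opened, is given below; the class and method names are illustrative, and the system's actual arrangement rules are not specified at this level of detail.

# Illustrative data structure for the cube metaphor: a cube groups one to six photos.
class PhotoCube:
    MAX_PHOTOS = 6

    def __init__(self, name, photos):
        if not 1 <= len(photos) <= self.MAX_PHOTOS:
            raise ValueError("a cube holds between one and six photos")
        self.name = name
        self.photos = list(photos)

    def arrange(self, screen_w, screen_h):
        # Return simple grid positions for the contained photos when the cube is opened.
        cols = min(3, len(self.photos))
        rows = -(-len(self.photos) // cols)          # ceiling division
        cell_w, cell_h = screen_w // cols, screen_h // rows
        return [(p, (i % cols) * cell_w, (i // cols) * cell_h)
                for i, p in enumerate(self.photos)]

cube = PhotoCube("holiday", ["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
print(cube.arrange(1024, 768))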
6 Experimental Results
Several experiments were conducted to investigate the effects of the proposed digital frame in the different interaction zones. Fig. 7 shows how the screen changes when the user approaches the display. As shown in Fig. 7(b), the detailed information gradually appears on the right side of the screen depending on the extent and direction of his
movement. In our study, we assumed that only one person navigates at a time in the implicit interaction zone. After the primary user has been recognized by the RFID reader, the ultrasonic sensors attached to the frame react to the user's movements until it is clear that the user has left the implicit interaction zone. Fig. 8 shows the result of 3D navigation of a photo using our sensor-based interaction.
Fig. 7. Backward and forward movements of the user
Fig. 8. 3D Navigation by sensor-based interaction
After reconstructing a 3D space from a photo, we can change its background image. Thus, virtual photos can be created, as shown in Fig. 9. This can make people feel that they are traveling through places they have not visited before, adding a playful effect.
Fig. 9. Virtual tour; (a) Sejong University in Korea (original photo); (b) Asakusa Temple in Japan (virtual photo); (c) Opera House in Australia (virtual photo)
In the touch interaction zone, several photos can be grouped into a 3D cube using a touch menu. If the 3D cube is double-touched, all photos in the cube are automatically arranged according to their number and size, together with 3D effects (see Fig. 10). In addition, users can flip through the photos using a traditional album-like interface.
Fig. 10. The automatic arrangement of photos in 3D space
In order to view photos while away from the display, or to move them to another display, several 3D cubes can be transferred to a PDA, as shown in Fig. 11(a). The circled area shows the transferred 3D cube. To reduce the transmission time and suit the low-resolution screen, we decreased the image resolution to 320×240 pixels. To see the photos again at a large size, the user can approach the previous display or another large display at home, as shown in Fig. 11(b). Information about his current state, such as the names of cubes and photos, is transferred from the PDA to the large display through the server. Thus, he can resume his interaction with the display.
Fig. 11. User interaction within the touch interaction zone
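The handover between the large display and the PDA could carry state of roughly the following shape. The use of JSON and the field names are assumptions; the text states only that cube and photo names and similar information are transferred through the server and that images are downscaled to 320×240.

# Illustrative sketch of the handover state sent through the server (field names assumed).
import json

def handover_state(user_id, cubes, current_cube, current_photo):
    return json.dumps({
        "user": user_id,
        "cubes": [{"name": c["name"], "photos": c["photos"]} for c in cubes],
        "current_cube": current_cube,
        "current_photo": current_photo,
        "pda_resolution": [320, 240],   # images downscaled for the PDA
    })

state = handover_state("alice",
                       [{"name": "holiday", "photos": ["a.jpg", "b.jpg"]}],
                       "holiday", "b.jpg")
print(state)  # sent via the server so the next display can resume the session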
7 Conclusions
We developed a smart photo frame that improves the quality of interaction not only through explicit feedback but also through implicit feedback from the user. Based on the concept of different interaction zones, the proposed system reacts to the user's movements without requiring excessive attention. Moreover, 3D navigation of a photo in the implicit interaction zone made the photo more memorable and fun. The system
also provided a partial migration service with which the user can change devices and continue the interaction.
References 1. Darbari, A., Agrawal, P.: Enliven Photographs: Bringing Photographs to Life. In: The 4th international symposium on ubiquitous VR (2006) 2. Mynatt, E., Rowan, J., Craighill, S.: Digital Family Portraits: Supporting Peace of Mind for Extended Family Members. In: Proc. of CHI 2001, pp. 333–340 (2001) 3. Consolvo, S., Roessler, P., Shelton, B.E.: The CareNet Display: Lessons Learned from an In Home Evaluation of an Ambient Display. In: Davies, N., Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205, pp. 1–17. Springer, Heidelberg (2004) 4. Molyneaux, D., Kortuem, G.: Ubiquitous displays in dynamic environments: Issues and opportunities. In: Proc. of Ubicomp 2004 (2004) 5. Vogel, D., Balakrishnan, R.: Interactive Public Ambient Displays: Transitioning from Implicit to Explicit, Public to Personal, Interaction with Multiple Users. In: Proc. of UIST 2004, pp. 137–146 (2004) 6. Horry, Y., Anjyo, K., Arai, K.: Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image. In: Proc. of ACM SIGGRAPH, pp. 225–232 (1997) 7. Kang, H.W., Pyo, S.H., Anjyo, K., Shin, S.Y.: Tour Into the Picture Using a Vanishing Line and it’s Extension to Panoramic Images. In: Proc. of Eurographics, pp. 132–141 (2001)
Mobile Audio Navigation Interfaces for the Blind
Jaime Sánchez
Department of Computer Science, University of Chile
Blanco Encalada 2120, Santiago, Chile
[email protected]
Abstract. In this paper we present a set of mobile, audio-based applications to assist with the navigation of blind users through real environments. These applications are used with handheld PocketPC devices and were developed for different contexts such as the neighborhood, bus transportation, the Metro network and the school. The interfaces were developed with the use of verbalized and sound-based environments. The usability of the hardware and the software was evaluated, obtaining a high degree of acceptance of the sound and user control, as well as a high level of satisfaction and motivation expressed by the blind users.
Keywords: blind navigation, orientation and mobility, mobile audio interfaces.
1 Introduction
The problems faced by blind users in mobile contexts are diverse and nondeterministic. This makes it difficult for such users to make decisions on what routes to follow, resulting in movement with very little autonomy. Furthermore, blind users orient themselves in space by using straight angles, which does not allow them to develop a full representation of the real environment. One way to address this problem is to navigate through the use of a clock system [10]. A clock system in combination with mobile technology can be a valuable aid to the mobility and orientation of blind users. Having a mental map of the space in which we travel is essential for the efficient development of orientation and mobility techniques. As is well known, the majority of the information needed for the mental representation of space is obtained through the visual channel [5, 12]. For blind people, key environmental information is received through the spatial relations constructed by the remaining senses. Despite this limitation, the cognitive mapping skills of blind people are flexible enough to adapt to this sensory loss. Even the congenitally blind are able to manage spatial concepts and are competent navigators [3]. Some generic problems blind people have when moving about have to do with localization and their perception of the environment, as well as choosing and maintaining a correct orientation and detecting possible obstacles [14]. Jacobson & Kitchin [4] point out that the most important problem for the blind has to do with their incapacity for independent navigation and interaction with the rest of the world. Also, exploration can lead to disorientation, which is accompanied by the fear, stress and panic associated with the feeling of being lost. There is also a risk associated with obstacles
that cannot be detected by the body or with mobile aids such as the cane [13]. Ran et al. [8] propose that the main difficulty for the blind in the context of orientation and mobility is knowing where they are at any given time and which way they are going, and that reorienting themselves is especially complicated if they get lost.
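As a rough illustration of the clock system mentioned above, the sketch below converts a bearing relative to the user's heading into an hour on a clock face. This is an illustrative example, not the implementation of the aGPS software described in Section 3.

# Illustrative sketch: express a target direction as an hour on a clock face.
def clock_direction(bearing_to_target_deg, user_heading_deg):
    # Direction of the target relative to where the user is facing, mapped to 1-12.
    relative = (bearing_to_target_deg - user_heading_deg) % 360
    hour = round(relative / 30) % 12
    return 12 if hour == 0 else hour

print(clock_direction(bearing_to_target_deg=95, user_heading_deg=5))    # -> 3 (o'clock)
print(clock_direction(bearing_to_target_deg=350, user_heading_deg=10))  # -> 11 (o'clock)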
2 Related Work
There are several ways to help blind users achieve autonomous movement with the aid of mobile technology. One is to support them with in situ assistive technologies that provide additional contextual information while they are moving; this is known as location technology. Such technology uses a variety of means such as RFID, IrDA, Bluetooth or WiFi, with which several solutions have been designed and developed to assist the movement of blind users [2,6,7,8,15]. Some studies propose different modes of interaction for blind users of mobile devices, which implies implementing input modes based on tactile or voice commands and outputs provided through verbal and/or iconic sounds [10,1]. Loomis et al. [6] presented a study on the use of a GPS device that can guide a blind user in an outdoor environment. The synthesized voice-based system helped users identify the shortest route between two points. GPS does not work in indoor spaces, and in such contexts it is necessary to resort to other methods. Gill [2] presents a solution using the infrared technology standard (IrDA) as sensors to determine the user's indoor location. The Drishti system, developed by Ran et al. [8], uses a combination of GPS for outdoor navigation and ultrasonic sensors for indoor navigation. One problem concerning GPS is the error associated with the measurements taken, which worsens under cloudy weather or when there are very tall buildings in the area. For the indoor environment, the blind user must carry two ultrasonic sensors that receive signals transmitted from different points in the rooms. With these signals it is possible to detect the location of the users by processing and analyzing the data. A grid of RFID (Radio Frequency Identification) tags provides information on the location and proximity of a user in a given environment [15]. Combined with Bluetooth technology, the reading apparatus sends data to the user's handheld device or cellular phone, which analyzes the information and indicates the user's position. Finally, Na [7] proposes BIGS (Blind Interactive Guide System), a grid system consisting of a group of RFID tags placed on the floor of a room and an RFID reader carried by the user. In addition, this system is capable of constantly monitoring the movements of, and the route taken by, the user, thanks to communication via WiFi.
3 Methodology
3.1 Mobile Audio Interfaces
The four software programs presented in this paper are oriented towards developing navigation (orientation & mobility) abilities and strategies in blind users through the use of a mobile PocketPC device: 1. aGPS is used to navigate a neighborhood, a space that blind users travel daily but deficiently and not to its full extent. Also,
throughout their lives they may need to visit and navigate various unknown neighborhoods; 2. AudioTransantiago is a mobile application that provides assistance for using public transportation, particularly the bus; 3. mBN is mobile software that helps users move through and use the Metro network; and 4. MOSS is a system that provides the assistance necessary for moving about without problems in an indoor environment such as a school or a specific building.
aGPS. In the aGPS software the input interface consists of 3 buttons on the PocketPC. The first button is used to enter the starting point of a path, which can be the first position entered when starting the software or a location entered after having changed direction. The second button is used to ask the system for information: when the user presses it, the Text-to-Speech (TTS) engine replies with the distance to and direction of the destination (using the clock system to express the direction). The third button is used to change the destination point: when the user presses it, he/she navigates a circular list [11] of different destinations read by the TTS. The output interface is made up mostly of the TTS. As previously mentioned, the TTS responds to the user's requests when he/she presses a certain button. The only output provided to the user consists of the distance to and direction of the destination point, and the names of the destination points. The user is not provided with the routes to be taken; he/she must decide the paths to follow in order to arrive at the destination. There is also a visual interface that provides information regarding the destination point and the distance to and direction of that point at any given time. This interface is used to help the facilitators support blind users as they learn to navigate.
AudioTransantiago. AudioTransantiago stores contextual information on each stop of the Transantiago routes, from which the user chooses in order to plan his/her trips in advance. In addition, the software navigates the stops of a previously planned trip in order to strengthen orientation and facilitate the blind user's journey. AudioTransantiago uses an audio-based interface consisting of a TTS engine and non-verbal sounds that help to identify the navigational flow within the application menus and through which it conveys information to the user. This is complemented by a minimalist graphic interface that includes only the name of the current selection and the option that has been selected, with a strong color contrast (yellow letters over a blue background) that is useful for legally blind users with partial vision, who can only distinguish shapes when displayed in highly contrasting colors. Navigation through the software application's different functions is performed through circular menus [11] and the lower buttons on the PocketPC. The advantage of these menus is that they facilitate searches within lists that have a large number of elements. The software application's two operational functions (planning a trip and making a trip) can be accessed through this structure, as well as their respective submenus, which are explained in the next section.
mBN. The mBN, or mobile Blind Navigation, software is a navigational system for use in a Metro network. The mBN software contents are presented in a hierarchy of menus
displayed on the screen and also as audio cues. A menu has a heading and a set of items; the number of elements in each set has to meet the cognitive usability load restriction of 7 ± 2 chunks of information. Menus can be defined as circular [11] or normal according to the way in which they are explored. When using the mBN software, users execute commands through the touch screen of a PocketPC. The interface was designed and developed "with" and "for" users with visual impairments. Interaction is achieved with the corners of the PocketPC screen by joining adjacent corners; the software registers, analyzes and interprets the movements and jumps of the pointer, and with this information it knows whether a command was executed. A blind user interacts with the touch screen without needing the pointer pen (stylus), using touch to map the relief of the four corners needed to construct and execute a command. The information managed by mBN is represented internally as strings transmitted to the user via spoken audio texts and high-contrast colored text on the screen. A TTS engine translates the written information into audio speech messages. These messages are complemented by earcons for a higher degree of attention and motivation when interacting with the software.
MOSS. This interface is mainly audio-based, and information is provided through sound in two different ways. On one hand, iconic sounds (sound effects) associated with the different actions that the user performs (walking, navigating the menu, etc.) were used, which also provide contextual information (passing through a door, walking down a hall, bumping into a wall, etc.). On the other hand, a TTS engine was used to provide information verbally (for the description of an element, the current position, etc.). One of the main actions that the user can perform is SoundTracing (ST). ST follows the metaphor that the individual emits a ray that detects all the objects in a certain direction, even if there are solid objects in the way. To generate an ST, the user makes a gesture on the touch screen of the PocketPC device that represents a line in the direction in which he/she wants the ray to go.
3.2 Evaluation
For each of the mobile audio interfaces a usability evaluation was conducted in order to determine the level of acceptance of the applications and their potential for use. This was done to determine whether the users were able to interact with the PocketPC device by using the sound-based interfaces. It was expected that users would be able to recognize both input and output means of interaction (buttons, screen and audio).
Sample. The participants in the usability test of the aGPS software were 4 users (two boys and two girls) aged between 11 and 13 years old, all of whom attended the Santa Lucia School for the Blind in Santiago, Chile. They had a variety of ophthalmic diagnoses and degrees of vision. The sample for the evaluation of the AudioTransantiago prototype consisted of 6 legally blind participants between the ages of 27 and 50, all residents of Santiago, Chile. They had a variety of ophthalmic diagnoses; 3 of them had partial vision and all were men.
The sample for the usability evaluation of mBN consisted of 5 people, aged 19 to 28 years old, from the Santa Lucia School for the Blind in Santiago, Chile. Four of them had partial vision and one was blind. It is important to mention that none of these users had any previous experience interacting with PDA devices. The sample to evaluate MOSS consisted of five children aged 8 to 11 years old, three girls and two boys. Two of them attended a segregated school (fifth grade), while the rest attended integrated schools (3rd, 4th and 6th grade) and were held to the same standards as their sighted classmates. Of all the participating users, only one had partial vision (non-functional), and none presented any additional deficit other than their visual disability. All of the users were legally blind (totally blind or partial vision). Two special education teachers specializing in visual disabilities and one usability evaluation engineer participated in all the usability evaluation sessions.
Instruments. For the usability evaluation of aGPS, an End-User Usability Questionnaire was used that consisted of two parts: (1) a set of 24 closed questions with an appreciation scale from 1 to 5, 12 questions regarding the software and 12 on the hardware, and (2) a set of 7 open questions extracted from Sánchez's Software Usability Questionnaire [9]. The questionnaires were read and explained by facilitators and answered by the users. The usability evaluation of AudioTransantiago was performed by means of the Software Usability Questionnaire [9] adapted for adult users in the context of this study. This questionnaire includes 18 closed questions on specific aspects of the software's interface, together with 5 more general, open-ended questions regarding trust in the system, the way the system is used, and the perceived usefulness of these devices as a way to help users travel on a bus system. The results obtained can be grouped into four categories: (1) Satisfaction, (2) Control and Use, (3) Sound, and (4) Image. To evaluate the usability of the mBN software, automatic data recording was used. This consisted of data structured in XML format that is stored internally by the software during the user's interaction, registering every key used, the Metro stations taken, and the time taken to perform each action. To support the data collection process for usability testing, complementary software (AnalisisSesion) was created, which inspects the data recorded during mBN sessions. The end-user usability evaluation of MOSS focused on user acceptance, with questions on whether the user liked the software, what he/she would change or add to it, what use it had for him/her, and other similar questions. These questions were based on Sánchez's Software Usability Questionnaire [9]. Each statement on the software was evaluated with a score from 1 (strongly disagree) to 10 (strongly agree).
Procedure. Each usability evaluation was completed in two 60-minute sessions. In each session, the users interacted with the software for 25-30 minutes in order to evaluate the effectiveness of their interaction with the buttons and the PocketPC screen, control and use, and the clarity of the audio support. Each session involved the following steps: (1) Introduction to the software. The functions of the software application and its use through the PocketPC buttons were
explained to the participants. (2) Interaction with the software. The users tried out the software's functions and the use of its buttons. At this point they also planned a trip as their final task; this trip was arbitrarily defined so as to be used in a later cognitive impact evaluation. (3) Documentation. Sessions were documented in detail through digital photographs. (4) Evaluation. The Software Usability Questionnaire was administered. Based on the comments and recommendations the participants provided, the software was modified and redesigned in order to improve its usability.
3.3 Results
Figure 1 shows the average scores obtained for the software and the hardware used. All scores are over 4.2 points, on a scale from 1 to 5, which means that the users' evaluation of the usability of both the software and the hardware was quite satisfactory. The same holds for the four dimensions of the software that were analyzed, in that all average scores are 4 or above, indicating a high degree of user acceptance regarding each of the dimensions analyzed.
Fig. 1. Usability results of the software aGPS
The dispersion of the data is similar for the software and hardware variables and for the satisfaction and control & use variables. For software and hardware the standard deviation is between 0.348 (software) and 0.357 (hardware), with kurtosis of 2.980 (software) and 3.210 (hardware) and skew of 1.673 (software) and 1.725 (hardware), which means that the evaluations of the software and hardware received very similar opinions, with a slight deviation towards higher scores. For control & use, the standard deviation is slightly lower than that for satisfaction (SD = 0.479 and 0.5 points, respectively). The image dimension is distinct, in that the degree of dispersion is far greater than in the other dimensions (SD = 0.816). A kurtosis of -1.289 for control & use and of -3.901 for satisfaction was obtained, with skews of -0.855 for control & use and -0.37 for satisfaction. Users were able to construct a correct map of the software; their mental models easily grasped the application. The usability data showed that the proposed interface was easy to use and easily understood by blind users. The usability questions for AudioTransantiago were evaluated on a scale ranging from 1 to 10 points, 10 being the highest. On average, the values obtained for all the items were quite satisfactory, with an average of over 9 points for each item. The totally blind users assigned a score of 10 points to all the questions, while the users with partial vision assigned slightly lower scores (average of 9.02 points) (Fig. 2). As
can be seen in Table 1, users assigned high scores to all 4 dimensions. The scores are higher than 9.2 points for all dimensions. The most highly evaluated dimension is image, although this dimension was only analyzed by the three users who were not totally blind. This dimension also has the lowest degree of dispersion among the answers, with a standard deviation of 0.577. The degree of dispersion increases slightly for control & use (SD = 0.698), satisfaction (SD = 0.816) and sound (SD = 1.123). The control & use and sound dimensions obtained kurtosis values of -0.053 and -1.646 points, respectively; satisfaction obtained a kurtosis of 2.774. The skews for the dimensions were the following: -1.732 (image); -1.276 (control & use); -1.783 (satisfaction); -1.006 (sound). This means that the highest degree of agreement is reached in the satisfaction dimension, and a comparatively lower degree of agreement is reached for control & use and sound.
Fig. 2. Usability results of AudioTransantiago software
In the case of the mBN software, the usability evaluation sessions provided information that validated the event and sound feedback, the logic of the interface, the design, and the programming strategy. It also supported improvements to the design and coding in the following milestones. Information was gathered on the time a user needed to invoke functions through the proposed input interface by dragging the pointer from one corner to another. The average time taken by the users for the different tasks assigned was 0.693 seconds, with a standard deviation of 172.48 points. The distribution of the users' times shows a kurtosis of 2.358 and a skew of -1.225. With this information, a 2.5-second limit was established for entering a command; after this time, the action is timed out (Table 1).

Table 1. Time spent per action (seconds)
Minimum:   0.325
Average:   0.69625
Maximum:   1.35
Timed out: 2.5
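The corner-to-corner command input of mBN, together with the 2.5-second limit derived from Table 1, could be decoded along the following lines. The corner names and the command mapping are assumptions, since the gesture-to-command table is not specified in this paper.

# Illustrative sketch: decode a corner-to-corner drag into a command, with the 2.5 s timeout.
TIMEOUT_S = 2.5
ADJACENT = {("top_left", "top_right"): "next_item",
            ("top_right", "top_left"): "previous_item",
            ("top_left", "bottom_left"): "select",
            ("bottom_left", "top_left"): "back"}   # mapping assumed for illustration

def decode_command(start_corner, end_corner, elapsed_s):
    if elapsed_s > TIMEOUT_S:
        return None                      # timed out: the gesture is discarded
    return ADJACENT.get((start_corner, end_corner))

print(decode_command("top_left", "top_right", 0.7))  # -> 'next_item'
print(decode_command("top_left", "top_right", 3.0))  # -> None (timed out)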
The device's screen can be used to support the audio interface for users with partial vision and for the teachers involved in the testing. The same restrictions as those observed for the mBN software were obtained, along with functions that should be implemented in the logic for menus, requirements, organization, and the debugging of
contents presented in the software, such as including a menu with the price of a ticket over time, and including relevant information about the stations' surroundings. The proposal to present information on the stations' surroundings is related to the orientation and mobility cues that blind people use when navigating urban environments. These cues are: street numbers, cardinal points regarding traffic direction, cardinal points regarding street curbs, street intersections, and other urban landmarks (sidewalks, stairs, rails and traffic lights). Figure 3 displays the users' satisfaction with the MOSS software. This dimension obtained 9.5 points out of a total of 10, followed by control & use with 9.17 points and interface with 8.20 points. The standard deviation for the first two dimensions was 0.16, reaching 0.60 for interface. A kurtosis of 0 and a skew of 0 for all three dimensions show that their distributions are symmetrical. On average a score of 8.88 points was obtained, a highly relevant result that supports the usability and acceptance of the software. Some of the most highly evaluated statements were: "I like the software's sounds" (9.8 points), "I learned with this software" (9.6 points), and "I like the software" (9.4 points). The lowest scores were obtained for the statements "The software adapts to my rhythm" (8.0 points) and "I felt in control of the software's situations" (8.2 points), which reveals the existence of a certain learning and appropriation curve. In general, however, a high degree of appreciation was obtained from the end users.
Fig. 3. Results of the end user evaluation of the MOSS software
4 Conclusions
Four prototype software applications were evaluated that seek to support the navigation of blind users in real environments such as a neighborhood, public bus transportation, the Metro network, and the school or another enclosed building. The interfaces of all the prototypes evaluated are adequate for use by blind users. During the interaction it was possible to observe that users easily learned and recognized the audio cues and functions used in the software, as well as their meaning. Through the evaluation of all the software applications we could determine that the use of a PocketPC was appropriate for the end goals of this study, in that the participants learned to use the device without any major difficulties, demonstrating a high level of skill in using the buttons on the PocketPC. Users were highly receptive to the 4 software applications and were motivated by their use of the system.
Also, the use of both the synthesized voice and the non-verbal sounds in the audio system was highly accepted by the users. In this case, the natural sound of the TTS and the clarity of the sounds in general were highlighted. In particular, the clock system that the software used to convey information regarding directions was easily assimilated by users with visual disabilities. The use of all the software applications allowed for relevant navigation by the users because it provided specific information to guide them during their travel. Because the handheld apparatus was a new device for them, there were some difficulties at the very beginning, but users slowly adjusted to using the device. They discovered solutions such as using it from their pockets with earphones in order to avoid losing the auditory references in their surroundings, and choosing a safe and comfortable place in which to handle it.
5 Discussion
The interfaces of the software applications developed were evaluated using a sample of participants with different ages and degrees of blindness, verifying in all cases that the users were able to interact with all the mobile software applications independently. At the same time, the evaluations demonstrated that the handheld device, the interfaces designed, and the model of interaction were all appropriate for users with visual disabilities. Although the samples used for this evaluation were limited, the different contexts of use and the users' various backgrounds allowed us to detect several usability problems (real and potential), as well as to measure the level of understanding of the software's objective, its embedded representation, and the ways to navigate and interact with it. During the interaction it was possible to observe that the users quickly learned and recognized the audio cues used in the software and their meanings. They were able to understand the model of interaction and the metaphor used. As for the use of the device, none of the users had a hard time finding and identifying the buttons, the joystick or the screen. Besides these significant usability results, the evaluation became a useful opportunity to detect problems and opportunities to improve the design, as well as to correct the software's programming and modeling errors.
Acknowledgements. This report was funded by the Chilean National Fund of Science and Technology, Fondecyt #1060797, and Project CIE-05, Program Center Education PBCT-Conicyt.
References [1] Dowling, J., Maeder, A., Boldes, W.: A PDA based artificial human vision simulator. In: Proceedings of the WDIC 2005, APRS Workshop on Digital Image Computing. Griffith University 2005, pp. 109–114 (2005) [2] Gill, J.: An Orientation and navigation System for Blind Pedestrians (2005), http://isgwww.cs.uni-magdeburg.de/projects/mobic/mobiruk.html (last Accessed, January 2009)
[3] Jacobson, R.: Navigation for the visually handicapped: Going beyond tactile cartography. Swansea Geographer 31, 53–59 (1994) [4] Jacobson, R., Kitchin, R.: GIS and people with visual impairments or blindness: Exploring the potential for education, orientation, and navigation. Transactions in Geographic Information Systems 2(4), 315–332 (1997) [5] Lahav, O., Mioduser, D.: Construction of cognitive maps of unknown spaces using a multi-sensory virtual environment for people who are blind. Computers in Human Behavior 24(3), 1139–1155 (2008) [6] Loomis, J., Marston, J., Golledge, R., Klatzky, R.: Personal Guidance System for People with Visual Impairment: A Comparison of Spatial Displays for Route Guidance. Journal of Visual Impairment & Blindness 99, 219–232 (2005) [7] Na, J.: The Blind Interactive Guide System Using RFID-Based Indoor Positioning System. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I., et al. (eds.) ICCHP 2006. LNCS, vol. 4061, pp. 1298–1305. Springer, Heidelberg (2006) [8] Ran, L., Helal, A., Moore, S.: Drishti: An Integrated Indoor/Outdoor Blind Navigation System and Service. In: Proceedings of the 2nd IEEE Pervasive Computing Conference, Orlando, Florida, March 2004, pp. 23–30 (2004) [9] Sánchez, J.: End-user and facilitator questionnaire for software usability, Usability Evaluation Test, University of Chile (2003) [10] Sánchez, J., Aguayo, F.: Mobile Messenger for the Blind. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 369–385. Springer, Heidelberg (2007) [11] Sánchez, J., Maureira, E.: Subway Mobility Assistance Tools for Blind Users. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 386–404. Springer, Heidelberg (2007) [12] Sánchez, J., Zúñiga, M.: Evaluating the Interaction of Blind Learners with Audio-Based Virtual Environments. CyberPsychology & Behavior 9(6), 717 (2006) [13] Sasaki, H., Tateishi, T., Kuroda, T., Manabe, Y., Chihara, K.: Wearable computer for the blind – aiming to a pedestrian’s intelligent transport system. In: Proceedings of the 3rd International Conference on Disability, Virtual Reality and Associated Technologies, ICDVRAT 2000, pp. 235–241 (2000) [14] Virtanen, A., Koskinen, S.: NOPPA: Navigation and Guidance System for the Blind (2004), http://virtual.vtt.fi/noppa/noppa%20eng_long.pdf (last Accessed, January 2009) [15] Willis, S., Helal, S.: A Passive RFID Information Grid for Location and Proximity Sensing for the Blind User. University of Florida Technical Report number TR04-009 (2005), http://nslab.ee.ntu.edu.tw/iSpace/seminar/papers/2005_percom/passive_RFID_information_grid.pdf (last Accessed, January 2009)
A Mobile Communication System Designed for the Hearing-Impaired
Ji-Won Song and Sung-Ho Yang
College of Design, Inje University, 607 Obang-Dong, Kimhae, Kyongnam, Korea
{dejsong,deyangsh}@inje.ac.kr
Abstract. This is a case study of the design of a communication system and its interfaces aimed at addressing the communication needs of the hearing-impaired. The design work is based on an in-depth investigation of the problems pertaining to mobile phone usage and general conversation difficulties of Korean deaf people. It was determined from this investigation that the technology-related issues of the hearing-impaired are not limited to usability or accessibility, but arise from hindered executive actions and differing executive behaviors for achieving communication goals at varying levels of ability. Therefore the design study has developed a new approach to the unique communication needs of the hearing-impaired, as well as their behavioral patterns, and presents possible overall improvements in face-to-face and distance communication through mobile technology.
Keywords: Hearing-impaired, Communication system, Behavior pattern.
1 Introduction
Despite the increased usability and accessibility of information technology, there are still substantial challenges with regard to technology design for people with disabilities. A broad range of activities occurs after the initial perception that a person must undergo to interact fully with a device and to fulfill the person's interaction goal [1]. Although many technologies provide supplementary "accessible" means through which the disabled can perceive and react to interface information, this by no means renders them "usable", in that the interaction is often unsatisfactory [4]. To provide satisfactory technology usability, for users disabled or not, designers need to focus on users' practical goals and behaviors. In order to design user-friendly technology for the disabled, we need to expand the design perspective, in consideration of their overall goals and behaviors, beyond partial accessibility. This is a case study of the design of a communication system and its interfaces aimed at addressing the communication needs of the hearing-impaired. The design work is based on an in-depth investigation of the problems pertaining to mobile phone usage and the general conversation difficulties of Korean deaf people. The design study has developed a new approach to the unique communication needs of the hearing-impaired, as well as their behavioral patterns, and presents possible overall
improvements in face-to-face and distance communication through mobile technology. This paper presents the knowledge gained from the investigation and explains the methods used to overcome the overall problems.
2 Gulfs in Disabilities' Technology Interaction
According to Donald Norman (1990), in the use of a product there are gaps, called the gulfs of execution and evaluation, between user goals and an artifact (Figure 1) [2]. The gulf of execution lies between our desired achievement (our goals and intentions) and the available means of physical execution. The gulf of evaluation reflects the amount of effort required by the user to interpret the physical state of the system and to determine how well expectations and intentions have been met. When an artifact is used, the gap is bridged through the user's specific actions. To bridge the gulf of execution, users perform the following actions:
1. The user forms an intention to achieve a goal.
2. The user specifies a sequence of actions required to achieve the goal.
3. The user physically executes the intended actions to achieve the goal.
The evaluation stage can also be broken down into three parts:
1. The user perceives the state of the world after performing some actions.
2. The user interprets those perceptions according to the expectations resulting from the actions.
3. The user examines the current state of the world with respect to both his/her own intermediate expectations and his/her overall goal [1].
USER’S GOALS
PHYSICAL SYSTEM
GULF OF EVALUATION
Fig. 1. Interaction Gulfs (Norman (1986)) [3]
In the HCI area, the gulf framework between user goals and an artifact has helped in discussing usability problems, such as the ease with which users can find possible actions and execute them accordingly, and whether they can determine the interaction state without misunderstanding and respond to it. For instance, Norman suggested visibility, a sound conceptual model, good mappings, and feedback as typically good design principles to reduce the distance of the gulfs [2]. However, for the disabled, who have limited sensory or other abilities, solutions to the gulf problem are not limited to a good conceptual model and feedback. From our investigation of mobile phone usage by the hearing-impaired, it was determined that the technology-related issues experienced by these people arise from hindered
executive actions and differing executive behaviors for achieving communication goals at varying levels of ability. On the basis of Norman's executive stage action framework, to achieve a goal the disabled need to specify and execute a planned sequence of actions. However, the limited sensory function of the hearing-impaired interrupts the sequence of communication actions, rendering them unable to achieve the goal through the same means as those with unimpaired hearing. This interruption extends beyond interface perception problems to fundamental communication activities. From the user research, we found that the hearing-impaired give up on communication goals when executive actions are unavailable, and that they substitute actions unrelated to hearing to supplement the broken actions and ultimately achieve their intentions, if only primitively. These differing behavioral patterns in supplementary actions have also led to serious usability problems in their technology usage.
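To make the interrupted action sequence concrete, the following minimal sketch (an editor's illustration under assumed names, not code or a model from the study) represents a planned sequence of executive actions and shows how a single hearing-dependent step blocks the rest of the sequence, which is why alternative actions are substituted.

```python
# Illustrative only: Norman-style execution steps for a communication goal,
# with hearing-dependent steps marked. Names and categories are assumptions.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    requires_hearing: bool

def executable_actions(plan, user_can_hear):
    """Return the subset of a planned action sequence the user can execute.

    A single blocked step interrupts the whole sequence, which is why users
    substitute alternative (visual or text-based) actions instead.
    """
    executable = []
    for step in plan:
        if step.requires_hearing and not user_can_hear:
            break  # the sequence is interrupted at the first hindered step
        executable.append(step)
    return executable

voice_call_plan = [
    Action("form intention to contact a friend", requires_hearing=False),
    Action("dial the number", requires_hearing=False),
    Action("listen for the answer and speak", requires_hearing=True),
]

print(len(executable_actions(voice_call_plan, user_can_hear=False)))  # -> 2
```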
3 User Research on Communication for the Hearing-Impaired With the aim of designing a mobile communication system for the hearing-impaired, we investigated hearing-impaired mobile phone usage and general conversations. Three sessions of focus-group interviews (FGIs) with eight hearing-impaired people and three sign language interpreters were conducted in two Korean cities. The first interview was at the Kimhae sign-language interpretation center, with interpreters whose job is to assist hearing-impaired people in their everyday communication. It was a pilot interview to gather basic knowledge on how the hearing-impaired communicate and to get some advice on interview methods for deaf people. The second and third interviews were conducted with five deaf citizens and three deaf people from the Korea Association of the Deaf, at the Kimhae branch and Chanwon branch, respectively. The participants were aged from their twenties to forties; six people lived with their families and two lived alone. The FGIs were designed to discuss issues relating to hearing-impaired communication, how participants use mobile phones, and what particular and practical needs they have for both mobile phone usage and general conversation. Because writing is insufficient for complex interview communication, due to the differing grammatical structure of sign language and written Korean, the second and third FGIs were interpreted by sign-language interpreters between a moderator and participants. The interpreted interviews were voice-recorded with participants' consent. From the interviews, we found hearing-impaired people to have the same objective in general conversation and mobile phone usage, namely communication, as those with unimpaired hearing, but with the obvious hindrance of their disability standing in the way of accomplishing goals. Current technological design trends do not do enough to help bridge the gap. The main means of communication between the hearing-impaired is sign language at close range. They use sign language via video calling technology in distance communication, in addition to the Short Message Service (SMS). In a conversation with a person of unimpaired hearing who is unacquainted with sign language, they use writing or lip-reading at close range, and SMS for distance communication. When a complex conversation is necessary with a person of unimpaired hearing (e.g., at a hospital or government office), either face-to-face or by distance communication, deaf people often need sign-language interpretation
because SMS and writing are often awkward due to the differing grammatical structure of sign language and written Korean. However, the current interpretation service for such cases is available only in a face-to-face conversation involving a deaf person, an interpreter, and a conversational partner. On account of this, the hearing-impaired tend to give up most distance communication, and even close-range communication with those of unimpaired hearing (Figure 2).
Fig. 2. Available executive actions for close-range and distance communication
In addition, limited hearing sensitivity has altered other communication actions using mobile phones for SMS and video calls. Because the hearing-impaired accomplish communication through these alternative means, to supplement their hindered communication channels, their SMS and video-call behaviors have changed, and these differing behaviors lead to serious usability problems. Whereas SMS usage is secondary to voice calls for people with unimpaired hearing, SMS is the primary means of distance communication in Korea's deaf society. Differing from the short information exchanges of non-deaf people's usage, even for a simple conversation the hearing-impaired exchange at least four to ten times as many messages. Because the SMS interface orders messages by the time they were received, the sent and received messages forming a continuous conversation are mixed with messages from other conversations, interrupting the smooth flow of hearing-impaired communication. Because of these difficulties, sign language conversations between deaf people via video call have recently enjoyed an increase in popularity. However, the video-calling interface is designed to show only the speaker's face, and is generally too small to accommodate the chest and the two hands making signs, or postures in which signs are made using only one hand while the other hand holds the phone. In being alerted to mobile phone signals, the hearing-impaired have shown particular behaviors: because the vibrating signals supplementing auditory signals are not effective if a device is not in contact with the user's body, the hearing-impaired will hold a mobile
phone at all times so as to not miss incoming signals. Some participants said they hold the device even when they sleep, or put it under their pillow. Additional details of the problems involved with each communication method and mobile phone usage revealed in the FGI are presented in Table 1.
Table 1. Hearing-impaired people's communication methods and associated problems
Writing
• Not fluent due to the differing grammatical structure of sign language and written Korean
• Writing ability varies depending on educational background
• Writing devices required (pen and paper)
Communication via interpretation
• Available only for face-to-face conversation
• Due to limited interpreter numbers, face-to-face interpretation is not readily available
• Immediate interpretation is required in emergency situations
• The hearing-impaired often find themselves in unfavorable situations without an interpretation service
SMS
• Awkward in writing due to grammar problems
• Only for simple conversation
• At least four to ten times as many messages required
• Mixed messages from other conversations can hinder fluent communication
Video call
• Screen too small for sign language
• One-handed sign language while the other hand holds the phone
• Hard to hold during long conversations (arm aches)
• Invisible in a dark place or at night time
• Service charge is expensive
Others
• Calling signals often missed even if set to vibration signal
• Video phone (local line) and mobile phone video calls are not compatible
• Door bell is not perceived
4 System Design 4.1 Design Process From the user research results, we found that many different problems pertaining to communication and mobile phone usage by the hearing-impaired are primarily caused by the fact that current technology is not designed to support their unique communication needs and behavioral patterns. From this discovery, our design goals developed
two major focuses: firstly, providing an appropriate set of communication means to cover all distance and close-range communication, so as to fulfill the communication needs upon which the hearing-impaired have all but given up; secondly, providing improved interaction and interfaces for each communication method to better suit the unique communication behavioral patterns of the hearing-impaired. To achieve this design goal, our design approach was as follows. First, we recommended a communication framework to offer sufficient available executive actions to cover the majority of hearing-impaired communication objectives with other deaf people, as well as with people of unimpaired hearing, in close-range or distance communication. Second, on the basis of this framework, we designed a communication system and its interfaces. Three personas and six design scenarios, reflecting communicational situations selected from the user research, were employed as the major tools in designing the system interaction and form factors fitting deaf people's unique behavioral patterns. Design scenarios describing user behavior and experience are essential in discussing and analyzing how the technology is (or could be) reshaping users' activities [5].
Fig. 3. User test with deaf people (Left) using video prototype (Right)
Third, a user evaluation was conducted, in which six deaf people and two interpreters reviewed the system design via video and model prototypes, and expressed their opinions and identified possible problems (Figure 3). In this way, it was verified that the design concept of the communication system would be effective in addressing the overall communication needs of the hearing-impaired. Currently, the project is undergoing a procedure of design refinement for further development. 4.2 Communication Framework and Interaction Design The communication means of the system are developed from a communication framework as previously mentioned. A deaf person’s communication environments were divided into close range and long distance, and they are distinguished by whether the partner is hearing-impaired or not. In each distinguished communication situation, appropriate communication methods were developed, as indicated in Figure 4. The framework suggests new communication methods such as remote interpretation service or interpretation calls, as well as sign-language video call, SMS, and
digital memo, which are those currently used by Korea's deaf citizens. A remote sign language interpretation service, connecting the device to the interpreter's video phone, helps face-to-face communication between people of impaired and unimpaired hearing. Similarly, for distance communication in the same situation, which Korea's hearing-impaired have basically given up on, the system is designed to provide an interpretation call service by establishing a connection between the hearing-impaired person, an interpreter, and the non-hearing-impaired person through video calling. (A simple sketch of this mapping is given after Figure 4.)
Fig. 4. Proposed framework to develop communication methods of the system
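Purely as an editor's illustration (not part of the proposed system's software), the framework in Figure 4 can be read as a lookup from communication situation to available methods. The method assignments below follow the paper's description; the data structure and function names are assumptions.

```python
# Illustrative sketch of the Figure 4 framework: communication methods keyed
# by partner type and communication range.

FRAMEWORK = {
    ("hearing-impaired partner", "close range"): ["sign language", "digital memo"],
    ("hearing-impaired partner", "distance"):    ["sign-language video call", "threaded SMS"],
    ("hearing partner", "close range"):          ["digital memo", "remote interpretation service"],
    ("hearing partner", "distance"):             ["threaded SMS", "interpretation call"],
}

def available_methods(partner, comm_range):
    """Look up the communication methods the system offers in a given situation."""
    return FRAMEWORK.get((partner, comm_range), [])

if __name__ == "__main__":
    # e.g., a complex distance conversation with a hearing partner uses the
    # interpretation call service rather than SMS alone
    print(available_methods("hearing partner", "distance"))
```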
The mobile device is designed to provide various means of communication, developed from the framework, via sign-language video calls, threaded SMS, digital memos, remote interpretation services, and interpretation calls (Figure 5). The interfaces and form factors of the devices are intended to suit the behavioral patterns and needs of the hearing-impaired in sign-language conversations using two hands, SMS conversations with several exchanges, and digital memos to another person. The device has two screens. The screens are slightly bigger than those of normal mobile phones (3.3 inches) to show each person's sign language, including the chest and the hands making signs. It is designed to stand independently so that a user can make signs using two hands. When the device is not in a situation in which it can stand independently, like in a car, it can be worn on the arm to reduce the annoyance of
having to hold the phone. Threaded SMS is applied to provide fluent conversation with a particular person without the disturbance of other incoming messages. Digital memo is added to help users with short and instant face-to-face communication, for which the interface is designed to clearly and easily present a memo by rotating the text according to the open angle of the upper screen. (A small sketch of message threading follows Figure 5.)
Fig. 5. Form factors and interfaces of the mobile communication device
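As a concrete illustration of the threaded SMS behavior mentioned above, the sketch below (an editor's illustration, not the device's actual software) groups an inbox by contact so that one conversation is not interleaved with messages from other conversations.

```python
# Illustrative only: group (timestamp, contact, text) messages into per-contact
# threads, each sorted by time, so a long back-and-forth reads continuously.
from collections import defaultdict

def thread_messages(messages):
    """Return a dict mapping contact -> list of (timestamp, text) in time order."""
    threads = defaultdict(list)
    for timestamp, contact, text in sorted(messages):
        threads[contact].append((timestamp, text))
    return dict(threads)

inbox = [
    (3, "Mina", "See you at 6?"),
    (1, "Mina", "Are we still meeting today?"),
    (2, "Office", "Your parcel has arrived."),
]

for contact, msgs in thread_messages(inbox).items():
    print(contact, [text for _, text in msgs])
```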
4.3 The Additional Devices The final proposed system adds a vibrator and a cradle to the mobile communication device that provides the major communication aids explained above (Figure 6). The vibrator and cradle work together to help mitigate additional daily problems of the hearing-impaired. The vibrator is designed to be worn on the wrist and notified wirelessly by the device, so that incoming call and message signals are not missed. It can also be attached to the mobile device so that the device can be used without being held during long sign-language conversations. The cradle, in which the device can be docked, supports the needs of the hearing-impaired by allowing widescreen, hands-free viewing at home or in the office, as well as charging the device's battery and providing a secure and stable location for device storage. The cradle can also act as a house door bell, addressing another unmet daily need of Korea's hearing-impaired.
Fig. 6. The proposed communication system modules and functions
5 Conclusion Computing system design is part of an ongoing cycle in which new technologies create opportunities for humans. Technology design that considers the practical needs of the disabled can provide new opportunities for these people and improve their quality of life. From this design case study, we discovered that limited sensitivity and ability can alter the overall executive actions of the disabled in their interaction with a device. For conspicuous improvement in technology usability for the disabled, serious efforts are vital to provide appropriate executive actions and to better suit disabled citizens' unique behavioral patterns, on the basis of a deepening understanding of the daily problems they face. In this design study, a concrete design approach to improving the overall communication methods and abilities of the hearing-impaired can help them to achieve their communication goals in face-to-face and distance communication. The interaction design, directly addressing each communication method commonly used by the hearing-impaired, is grounded in their unique behavioral patterns and promises to improve their overall quality of communication activities and life.
References 1. Carey, K., Gracia, R., Power, C., Petrie, H., Carmien, S.: Determining accessibility needs through user goals. In: Proceedings of the 12th International Conference on Human-Computer Interaction, 4th International Conference on Universal Access in Human-Computer Interaction, pp. 28–35. Lawrence Erlbaum Associates, Mahwah (2007) 2. Norman, D.: The Design of Everyday Things. MIT Press, Cambridge (1990) 3. Norman, D.: Cognitive Engineering. In: Norman, D., Draper, S. (eds.) User Centered System Design: New Perspectives on Human-Computer Interaction, pp. 31–61. Lawrence Erlbaum Associates, Inc., New Jersey (1986) 4. Pullin, G., Newell, A.: Focussing on Extra-Ordinary Users. In: Proceedings of the 12th International Conference on Human-Computer Interaction, 4th International Conference on Universal Access in Human-Computer Interaction, Part 1, pp. 253–262. Lawrence Erlbaum Associates, Mahwah (2007) 5. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. In: The Morgan Kaufmann Series in Interactive Technologies. Morgan Kaufmann, San Francisco (2001)
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly Wang-Chin Tsai and Chang-Franw Lee Graduate School of Design, National Yunlin University of Science and Technology 123, University Road Section 3, Touliu, Yunlin, 64002, Taiwan, R.O.C. {g9330802,leecf}@yuntech.edu.tw
Abstract. Small touch screens are widely used in applications such as bank ATMs, point-of-sale terminals, ticket vending machines, facsimiles, and home automation in daily life. They are intuition-oriented and easy to operate. Many elements affect touch performance on small screens, and one essential element is icon feedback. However, in pursuing attractive icon feedback appearance and interesting interaction experiences, many interface designers ignore real user needs. It is critical for them to trade off the icon feedback type against the needs of different users in touch interaction. This is especially important when user capability is very limited. This paper describes a pilot study identifying factors that determine icon feedback usability on small touch screens for four older adult Cognitrone groups, since current research has aimed mostly at general icon guidelines and recommendations and has failed to consider and define the specific needs of small touch screen interfaces for the elderly. We present a concept focused on human necessity and use a cognitive assessment tool, the Cognitrone test, to measure older adults' attention and concentration capability and to learn more about how to evaluate and design suitable small-screen icon feedback types. Forty-five older participants took part. Each was asked to complete a battery of Cognitrone tests and was assigned to one of four groups. Each was also requested to perform a set of 'continuous touch' usability tasks on a small touch screen and to comment on open-ended questions. Results are discussed with respect to the perceptual and cognitive factors that influence older adults in the use of icon feedback on small touch screens. The results showed significant associations between icon feedback performance and factors of attention and concentration. This interrelation was much stronger for Group 2 and Group 4, especially for Type B, Type C, and Type G. Moreover, consistent with previous research, older participants were less sensitive and required longer time to adapt to high-detail icon feedback. These results are discussed in terms of icon feedback design strategies for interface designers. Keywords: small touch screen, icon feedback, older adults, cognitrone style.
1 Introduction Lately, "touch" has become one of the buzzwords. In fact, for over a decade, touch screen technology and devices have been in widespread use, from public systems such as self-order and information kiosks to personal handheld devices like PDAs (Personal Digital Assistants) and gaming devices. Generally speaking, interaction on touch-sensitive screens is one of the most "direct" forms of HCI (Human-Computer Interaction), with information and control displayed on one surface. The zero displacement between input and output, control and feedback, hand movement and eye gaze makes the touch screen an intuition-oriented tool for users, particularly for novices [6]. Nonetheless, as this touch technology gains sophistication and its teething problems are worked out, small touch screen technology meets two limitations. First, the screen might be obscured by the user's finger, hand, or arm. Second, it is difficult for users to point at targets smaller than their finger width. Recently, some studies on thumb use recommended 9.2 mm as the most appropriate width for on-screen icons [2]. Below 9.2 mm, users' performance tends to degrade when they attempt to correctly select an icon on the screen with their thumb. Though the problem can be solved by applying other aids, such as a stylus or a cursor, the easy-to-operate characteristic of thumb-based screen touch is then lost. Moreover, a practical designer may consider icons of 9.2 mm too large and space occupying. Therefore, techniques like Offset Cursor and Shift were introduced to improve selection accuracy and to help users refine their initial selection position. Originally designed for fingertip operation, Offset Cursor overcame digit occlusion by offsetting the cursor from the selection point, while Shift achieved it by displaying an inset of the selection region. However, both novel designs adapt poorly to the changing needs of elder users as their abilities decline because of aging. Known as the most frequently applied approach for human-computer interface design, Nielsen's outline of the user-centered paradigm (1993) was intended for homogeneous groups when testing users on design decisions. Yet current interface development tools and methods neither meet the needs of diverse user groups nor address the dynamic nature of diversity. As a result, there is an urgent need to make these shortcomings of the current approach explicit, as well as to search for new processes and practices. By its literal definition, touch screen operation is different from normal screen operation. Besides visual search, "touch" actions are involved during the interaction as well. That is to say, the main objective of interface designers is to create a highly user-friendly interface while confirming appropriate design concepts. With the view that older adults' attention ability could strongly affect their perception of icon feedback, this study aimed to investigate how icon feedback types affect diverse elderly users when they operate a small touch screen. By generalizing the older generation's perceptions and performance with varied icon feedback on small touch screens, analyzing their preferences among different icon feedback types, and finally comparing performance and differentiating advantages, the findings of this study serve as a guide to icon feedback design for more user-friendly small touch screens.
2 Literature Review Research on the use of alternative feedback modalities has focused primarily on the use of single feedback, while comparatively few studies have examined and compared different visual icon feedback combinations on small touch screens. Leonard et al. [4] pointed out in 2006 that additional research is needed to examine specific combined icon feedback and its usability for the elderly with varied physical and psychological conditions. Although passive touch screens are intuition-oriented and easy to learn, there are several limitations regarding precision that users have to overcome in the interaction. First of all, for touch screens, finger-pointing selection of rather small objects and specification of smaller targets may be difficult and critical for effective selection. Second, for interaction on small touch screens, complications may occur due to occlusion, imprecision in selection, poor calibration, or parallax errors caused by the offset between the display surface and the overlaid touch surface. Third, touch screen interaction, unlike the use of a mouse, involves no analogue pointer. Unlike mouse users, who can move the mouse pointer over screen elements, get feedback from the selected elements such as highlighting, and confirm their selection by clicking mouse buttons, touch screen users point at screen elements directly and immediately initiate an action that might not be cancelable afterwards. Fourth, touch screen interaction is characterized by the user's habits and characteristics. In other words, it is a procedure requiring crucial cognitive skills such as concentration, coordinated reactions, and good judgment, together with decision-making capabilities, to avoid mistaken manipulation. Finally, owing to the physical and intellectual declines of human aging, older adults face more difficulties with small touch screen operation, which is intended mainly for younger users. For instance, when operating complicated interfaces, it may be hard for the elderly to press minute buttons and detect icon feedback because of their varied attention capabilities and formed habits [3]. Hence, indications of whether an action is possible or not have to come along with the icon feedback. Likewise, static activation takes more of elder users' attention for clarification and checking that an action has been received. To sum up, despite the fact that relatively few attempts have been made, discussions and studies of icon feedback types are of essential importance. Meanwhile, small touch screen assessments require direct application of icon feedback, especially in terms of the estimated numbers of potentially excluded and potentially challenged people among various target population groups.
3 Methodology 3.1 Participants Forty-five volunteers ranging in age from fifty-five to seventy-three years participated in this study. Among them, twenty-eight were female and seventeen were
male. The mean age was 67.6 years. Compensation for participants in this study included free comprehensive Cognitrone tests and a souvenir. Participants were randomly selected in Taichung City. 3.2 Cognitrone Test The Cognitrone (COG) Test is a general performance test for the measurement and analysis of attention and concentration, and it rests on two basic concepts. First, its stimulus materials are composed of stick figures and require participants' judgment to decide whether pictures are identical. Second, the Cognitrone test is used for measuring executive functions such as decision- or judgment-making in a person's receptive response to minute changes. Moreover, the COG test is applied in predicting concentration levels and attention spans, which are essential to underground work skills. With a set time limit, participants are asked to accomplish tasks that are not intellectually demanding with the greatest possible speed and accuracy.
Fig. 1. The Cognitrone test introduction
During the test, participants have to compare figures. Altogether five pictures are presented on the screen, four pictures in one line with the fifth picture below them. Participants have to determine whether the fifth picture is an exact match by pressing one of two differently colored buttons on the response panel: green buttons for exact matches and red buttons for inexact matches. Compiled measurements describing the subject's speed, accuracy, and consistency are then processed and calculated by a scoring program. With regard to the time limit, the COG test usually allows unlimited completion time; however, the ideal time for completion is suggested to be between five and ten minutes, and any time longer than ten minutes is considered reflective of a concentration deficit. Furthermore, the reliability of the COG test is considerably high (above r = .95), and a number of validity tests have been carried out. Meanwhile, although different versions of the COG test are available, the S11 version is the most suitable for Taiwanese participants because it was developed with relevant and applicable nationwide norms. In this session, participants were divided into four COG groups, each with a different analysis result and a different focus on a major trait of attention, as illustrated in Table 1 below. (An illustrative grouping sketch, using assumed cut-offs, follows the table.)
Table 1. Arrangement of the participant groups: mean age (SD), gender, and group characteristic
Group 1 (n=11): age 62.0 (3.4); M=2, F=9; Accurate-fast
Group 2 (n=15): age 67.3 (5.4); M=10, F=5; Accurate-slow
Group 3 (n=12): age 67.2 (5.1); M=5, F=7; Inaccurate-fast
Group 4 (n=7): age 66.8 (5.5); M=0, F=7; Inaccurate-slow
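The paper does not report the exact cut-offs used to split participants into these four groups. The following sketch is only the editor's illustration of one plausible rule, splitting at the sample medians of accuracy and speed; it is not the authors' actual procedure.

```python
# Illustrative grouping rule for the four Cognitrone groups of Table 1.
from statistics import median

def cognitrone_groups(scores):
    """scores: list of (participant_id, accuracy, completion_time_seconds).
    Returns a dict mapping id -> one of the four group labels."""
    acc_cut = median(a for _, a, _ in scores)
    time_cut = median(t for _, _, t in scores)
    groups = {}
    for pid, accuracy, time_taken in scores:
        accurate = accuracy >= acc_cut
        fast = time_taken <= time_cut
        groups[pid] = {
            (True, True): "Accurate-fast",      # Group 1
            (True, False): "Accurate-slow",     # Group 2
            (False, True): "Inaccurate-fast",   # Group 3
            (False, False): "Inaccurate-slow",  # Group 4
        }[(accurate, fast)]
    return groups

print(cognitrone_groups([("P1", 0.95, 310), ("P2", 0.80, 520), ("P3", 0.91, 600)]))
```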
3.3 Materials and Experimental Design The interface platform used in this study was an ASUS MyPal A730W compatible PDA. Participants were seated approximately thirty centimeters from the screen display. The simulated screen resolution was set at 1024 x 768 pixels with 24-bit color. To accomplish the continuous-touch tasks, participants were requested to perform a series of random inputs of ten-digit telephone numbers.
Table 2. Icon feedback types (example screen shots and the experiment scene are shown on the interface platform)
Type A (Movement): the position of the icon gradually moves after the icon is touched
Type B (Color): the color of the icon changes after the icon is touched
Type C (Magnify): the shape of the icon changes after the icon is touched
Type D (Movement + Magnify): the combined feedback is applied after the icon is touched
Type E (Movement + Color): the combined feedback is applied after the icon is touched
Type F (Color + Magnify): the combined feedback is applied after the icon is touched
Type G (Movement + Color + Magnify): the combined feedback is applied after the icon is touched
In the meantime, by using the
Flash X programming language, a group of icon feedback presentations was developed in this study. Also, based on the findings of related research, an average icon size of 6 mm was adopted in this experiment. Finally, this study employed a 7 x 4 factorial design with seven feedback modality conditions among the four groups of participants illustrated in Table 1. In addition, two measurements of efficiency were used to assess participants’ performance. One was the total time for completion, which was measured in seconds and the other was the frequency of errors of missing or wrong. Both measurements focused on interrelated components of the continuous touch task, which were influenced mostly by the user’s response to the icon feedback on small touch screen. 3.4 Procedures Before the experiment, the participants were briefed about the rules and the purpose of the experiment and were requested to fill in their personal information such as their age, gender, and education. In the test session, in order to get accustomed to the interface, the participants were asked to make a simple trial before the start of each task. As the task began, the participants were asked to touch the icon from the program instruction, which adopted progressive interaction in the experiment interface. After that, to complete, the participants had to touch every icon on the touch screen and complete the ten-digit telephone number trials as shown in Table 2. During the trials, the participants experienced all of the seven icon feedback types, which are Type A, Type B, Type C, Type D, Type E, Type F, and Type G. Also, respectively they perceived various usabilities from each icon feedback. At last, after the screen touch tasks being completed, the participants were encouraged to comment on some open-ended questions if there were any aspects for adjustment, improvement or further explanation or if there were any favored features or disliked features. All materials were presented to the participants in Chinese and for the purpose of this paper, all items and questions were translated into English. 3.5 Data Analysis For the analysis of the data, this study applied Analyses of Variance (ANOVA) to examine significant differences of the task performance of feedback conditions within each cognitrone group. When each cognitrone group operated the icon feedback of Type A, Type B, Type C, Type D, Type E, Type F and Type G, the one-way ANOVA analyzed the data. In addition, the significant differences were analyzed by utilizing the Scheffe Method as the post hoc test for multiple comparisons. Significance was accepted at the level of p<.05, while the degrees of freedom and corresponding probability, or the F-value, were also shown in the statistical test. In all, the statistical analysis was conducted by utilizing the Windows SPSS Statistics 17 Program.
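The analysis described above was carried out in SPSS. As a rough illustration of the one-way ANOVA step only (not the authors' actual script, and with made-up completion times), the same kind of test can be expressed in Python with SciPy:

```python
# Hedged sketch: one-way ANOVA across feedback types for one Cognitrone group.
# The data below are invented; the real study used seven conditions (Type A..G)
# and applied Scheffe post hoc comparisons in SPSS, which are omitted here.
from scipy import stats

type_a = [9.1, 10.2, 8.7, 9.8, 10.5]   # completion times (seconds), Type A
type_b = [14.3, 15.1, 13.8, 14.9, 16.0]  # Type B
type_c = [13.7, 14.2, 15.5, 13.1, 14.8]  # Type C

f_value, p_value = stats.f_oneway(type_a, type_b, type_c)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")  # significant if p < .05
```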
4 Results and Discussions The results indicated that differences in overall completion time and the frequency of errors were highly significant among the four subject groups (F(3, 311) = 109.7,
p<.01). Hence, following the ANOVA analysis of the four groups, Table 3 presents the ANOVA results for total completion time and error frequency under the icon feedback conditions. In general, the results of the ANOVA analysis and Figures 2, 3, 4, and 5 indicated that feedback condition had a significant effect for Group 2 and Group 4, as shown by the test statistics and post hoc results described in Table 3 below. For Group 1 and Group 3, no significant effect of feedback condition was observed at the p<.05 level. To begin with, for Group 1, the effect of icon feedback was significant neither for completion time nor for error frequency, and the older adults in Group 1 made few errors during touch screen interaction. Consistent with their group characteristic, they touched icons after a quick review and made correct decisions. For this reason, Group 1 participants found it relatively effortless to experience and detect the small-size icon feedback. Nonetheless, when compared with the theory of capability demand, the demand levels of these icon feedbacks are multidimensional and set by the attributes of small touch screen interface features; therefore, even though Group 1 participants performed well, other potential factors, such as creative interaction design methods, still need further clarification. Secondly, for Group 2, the results indicated that both total completion time and error frequency differed significantly among the feedback types. The icon feedback of Type A, Type D, Type E, and Type G was more appreciated, as these types provide two-dimensional effects with spatial and semantic cues for the participants. Moreover, with the simpler feedback applied in Type B and Type C, the participants responded with slower task completion, an indication that individuals with Group 2 characteristics do not benefit from a change of color or shape alone. In other words, the findings suggested that, for small touch screens, attention to location changes is highly important, since Group 2 participants preferred spatial and semantic cues to organize information; designs of icon feedback should take their specific requirements into consideration. Although not much significance was observed across the icon feedback types for Group 3, the findings were still useful. An ideal icon feedback design should not only pay more attention to older adults with declining attention capability but also give the elderly more chances to improve their performance. Group 3 participants, for example, perceived icon feedback in a rather short time and made more mistakes. For them, icon feedback design should shift its focus to the concept of shape change, because structural components were regarded as a perceptual mode of information transfer. Thus, the conceptualization of improved icon feedback with an alerting function may be an insightful solution to this issue. Finally, with regard to Group 4, the findings showed that participants preferred the external structure presented by the material in the Type D and Type G interactions, which is consistent with the findings of previous studies in the domain of user capabilities [5]. That is, for some older adults, more interaction steps and thinking time are needed for small screen interaction, especially for the diverse characteristics of the older
Table 3. Results of operational time and frequency of errors by icon feedback type (one-way ANOVA per Cognitrone group; df, Mean Square, F, Sig., post hoc tests)
Group 1 (Accurate-fast)
time: Between Groups df=6, MS=12.4; Within Groups df=70, MS=7.7; F=1.61, Sig.=0.156; post hoc: —
error: Between Groups df=6, MS=0.53; Within Groups df=70, MS=0.55; F=0.97, Sig.=0.452; post hoc: —
Group 2 (Accurate-slow)
time: Between Groups df=6, MS=68.1; Within Groups df=98, MS=10.9; F=6.24, Sig.=0.00; post hoc: A
error: Between Groups df=6, MS=12.6; Within Groups df=98, MS=1.0; F=12.6, Sig.=0.00; post hoc: A, D, E, F, G
Group 3 (Inaccurate-fast)
time: Between Groups df=6, MS=3.4; Within Groups df=77, MS=8.8; F=0.39, Sig.=0.88; post hoc: —
error: Between Groups df=6, MS=3.9; Within Groups df=77, MS=1.4; F=2.78, Sig.=0.01; post hoc: —
Group 4 (Inaccurate-slow)
time: Between Groups df=6, MS=26.8; Within Groups df=42, MS=10.1; F=2.64, Sig.=0.02; post hoc: G
error: Between Groups df=6, MS=6.22; Within Groups df=42, MS=1.05; F=5.90, Sig.=0.00; post hoc: G
population. Altogether, the results of this study and previous work suggest that the elderly tend to require more interaction steps than other age groups. Moreover, this study also showed that Type D and Type G helped Group 4 participants by giving them an available information source: an overview of the feedback content via clear movement and magnification. Finally, for the open-ended questions about problems and further improvements for the seven icon feedback types, problems commonly reported by the participants were "I felt a little fatigued when I tried to focus on some icon for a period of time" and "Occasionally, I touched wrong places near my targets." They also gave comments such as "It was easy to find or understand the icon while it was being touched (especially for Type G)." These discussions can be developed into an understanding of how icon feedback features of small touch screens are perceived by older adults with different attention capability styles. Furthermore, by comparison with related concepts in Microsoft touch screen techniques such as Offset Cursor, Shift, and Wedge [7], some sophisticated adjustments to different people and situations could also be applied. Even though this may seem a tiny issue, it can still play an important role in interaction on small touch screens.
Fig. 2. Group 1 performance means
Fig. 3. Group 2 performance means
Fig. 4. Group 3 performance means
Fig. 5. Group 4 performance means
5 Conclusion This study examined the usability of icon feedback types on small touch screen devices. In order to find out how these feedback types suit different Cognitrone groups, four older adult groups were examined. The groups revealed the preferences for each icon feedback type and provided an overall picture of users' needs when using small touch screens. The findings can be used to develop guidelines for the design of icon feedback that suits the preferences of diverse older adults, and more specific design considerations should be taken into account when developing new icon feedback. Further research could also address appropriate software techniques for touch screen interaction characterized by other factors, such as icon feedback characteristics of speed, intensity, portion, locomotion, and precision. We hope to provide a capability-diverse framework as a useful starting point for analytical evaluation, by focusing on ways of relating small touch screen issues to the range of user capabilities. Acknowledgements. This study received partial financial support from the National Science Council of the Republic of China Government, under Grant No. NSC 97-2221-E-224-024.
References 1. Dickinson, A., Gregor, P.: Computer use has no demonstrated impact on the well-being of older adults. International Journal of Human-Computer Studies 64(8), 744–753 (2006) 2. Jin, Z.X., Kiff, T.P.L.: Touch screen user interfaces for older adults: Button size and spacing. In: The 4th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2007, Held as Part of HCI International 2007, Beijing, China (2007)
3. Lee, C.F., Tsai, W.C.: Mapping of user interfaces on electronic appliances. Applied Ergonomics 38(5), 667–674 (2007) 4. Leonard, V.K., Jacko, J.A., Pizzimenti, J.J.: An investigation of handheld device use by older adults with age-related macular degeneration. Behaviour & Information Technology 25(4), 313–332 (2006) 5. Persad, U., Langdon, P., Clarkson, J.: Characterising user capabilities to support inclusive design evaluation. Universal Access in the Information Society 6(2), 119–135 (2007) 6. Shneiderman, B.: Touch screens now offer compelling uses. IEEE Software 8(2), 93–94 (1991) 7. Vogel, D., Baudisch, P.: Shift: a technique for operating pen-based interfaces using touch. In: The 2007 Conference on Human Factors in Computing Systems, San Jose, California, USA (2007)
Ubiquitous Accessibility: Building Access Features Directly into the Network to Allow Anyone, Anywhere Access to Ubiquitous Computing Environments Gregg C. Vanderheiden Trace R&D Center University of Wisconsin-Madison Madison, Wi USA 53706
[email protected]
Abstract. Traditionally access to computers and electronic devices has relied extensively on the strategy of adapting the devices that the person with a disability needs to access or using a special version of the product. This was especially true for people with more severe or multiple disabilities. As we move to an environment where computers and information services are incorporated into our environments, and where people must be able to access the technologies they encounter throughout their day, we need to move to a different model that might be called “ubiquitous accessibility”. Ubiquitous accessibility would involve building access features for all people directly into the ICT systems in the environment so that access could be invoked directly by the user when they needed it. This approach would need to involve a combination of access features that were built in and features that could be invoked on demand from the network. Keywords: Ubiquitous accessibility, universal design, access for all, ubiquitous computing.
1 Introduction Although computing at one time was centralized with remote access terminals, the trend in the last few decades has been to move to ever more personal computing devices. For those individuals who could not access these personal devices through their standard interfaces, adaptations were installed. Since each person had their own computer, each could adapt it with physical or software adaptations such as key guards, screen readers, screen enlargers, etc. With the advent of the portable and laptop computer, individuals were able to take their computer with them and to access and use it in different environments. While this worked for some personal applications, there were already problems when individuals were required to run special software that was only available on university or company work stations. For example, students having to use special programs at a university would find that the licenses would not allow them to install the software on their personal computers. Further, the work stations in the laboratories were not set up for them to use. Even when those work stations had the proper software, they were not configured and personalized. More often than
not, they did not have the special software or physical adaptations required. Also, they might have to use several different computers in different parts of campus, in different departments, running different software. As we move further into the future, however, we may be moving away from personal workstations, even as we move to more "cloud-like" computing. With the ever-dropping cost of interface technologies and the continually increasing flexibility and compactness of displays, we may soon find that we carry computers or work stations around less and rely more and more on the interfaces we find built into all of the environments in which we find ourselves. And as more daily living devices incorporate computer-like interfaces or are network devices themselves, adapting each one, one at a time, becomes impractical. Some new strategy for access is needed to allow people to invoke the access features they need on any device they encounter. We need to have 'ubiquitous accessibility' as we move to ubiquitous computing.
2 Cloud Computing The move to Web applications and "cloud computing" at first looks to be providing assistance with this problem. Instead of software being tied to particular computers or workstations, software can be tied to people or authorizations. It can then be run on any workstation, allowing individuals to have access as they would through their own personal work station. But the focus of our efforts needs to go beyond "workstations" and look at computing and computer interfaces as something that will be ubiquitous, always around us wherever we go. Although it seems far-fetched to think that there will be hardly a room or surface that is not electronically enabled (and that can be used as an interface), it was not that long ago that people carried lanterns or candles with them wherever they went if they expected to have light. If someone had told them that some day they would not have to carry light with them, that they would be able to assume that there would be light in every room they went into and most places outdoors as well, and that light would just be built into every room and place they went, they would have thought it very unrealistic. Yet today none of us carries light with us in our daily lives; in fact, we assume that there will always be light wherever we need to go, with a few exceptions. Display technologies are already proliferating, and information technology is rapidly merging with home entertainment, home control, telecommunication, transportation, and daily living appliances. Although not all of these are networked, fewer and fewer lack computer control and computer-operated interfaces. As these functions continue to blend and merge, we will generally find ourselves less willing to carry all these devices as we move around, and will begin to shift to a much more personally liberating mode of invoking any of these functions from devices around us. We may carry a small personal device and rely on the displays and systems in our environment for everything else. All of this is going to cause major disruptions in the way companies think of and market products, and there will be some awkward periods while existing companies try to hang on to the successful models of the past, while others try to gain new
footholds in the models of the future. In the end, however, the technology advancements will cause a shift. We can see an example of this today in Microsoft and Google. The model of an installed operating system and installed applications has worked very well for Microsoft for many years. They play very well in that arena and would be happy to continue in the model as long as possible. Other vendors, like Google, however, are pushing toward cloud computing and virtual applications. They have successfully introduced the concept of software as a service that can exist in the network and be called up on any computer a person encounters. Security and network issues have slowed adoption, but these issues are being worked out. And now we are seeing a shift in Microsoft’s approach and future plans.
3 AT as a Service Assistive technology companies have long used a model of purchased hardware and software that are installed on a particular computer. Some assistive technologies, in fact, restrict the number of computers that a piece of software can be run on. However, other companies and initiatives, such as SAToGo [6] and Raising the Floor [8] and its partners [5], have introduced virtual technologies that can be invoked on computers without requiring any installation. This not only allows more mobility for individuals without requiring them to carry their specialized work stations wherever they go, but also for the first time provides access to individuals who have fewer resources and who, in fact, do not have computers of their own. These latter individuals are able to use whichever computers they can find in their environments or communities, and invoke the needed access features from the Internet. 3.1 Centralized, Robust Access Features If access features exist in the "ether" and can be invoked on demand, then a very rich ecosystem is enabled, which has a much greater capacity to meet the individual needs of users. This would be true even if the needs change over time, location, or task. Instead of accessibility being thought of as a single package, access could be viewed as a set of features or capabilities. If an individual is invoking their access features on something with a small display, they can invoke a different set of access features than if they are currently using a very large display. Similarly, if they are stable and seated in a work station-like environment, they may use one type of interface. If they are seated in a comfortable chair with less support, they may invoke a different set of access features. In the morning they may be stronger and use one mechanism for input, while in the afternoon they have to use a different set of access features. The different tasks that they are engaged in may similarly require them to use different feature sets. With a centralized feature-based, rather than package-based, accessibility model, many more individuals with different types, combinations, and degrees of disability could be accommodated. In fact, many individuals not perceived to have disabilities may be using many of these features due to situation-induced functional limitations [7]. For example, a person may be reading a book when they have to prepare a meal. They may switch to auditory presentation and controls that can be easily carried out
with gross gestures. In a noisy environment they may go to an all-visual presentation mode. In an environment with a large display, they may use one mode of interaction, but use something different in an environment with a smaller display. As individuals age, they may naturally tweak their interfaces to keep them within range of their abilities in a quite natural way, rather than stretching and straining to use a single, standard interface until they are no longer able to use it, and then having to accept the stigma of being “disabled” and needing “special interfaces.”
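As a purely illustrative sketch of such a feature-based, network-resident model (not an existing system or API; all names are hypothetical), a user's access preferences might be stored as per-feature rules and filtered by the current context before being applied to whatever device is at hand:

```python
# Hypothetical feature-based accessibility profile, filtered by context.
USER_PROFILE = {
    "captions": {"always": True},
    "screen_reader": {"display_max_inches": 4},   # only on small displays
    "high_contrast": {"ambient": ["bright"]},
    "gross_gesture_input": {"hands_busy": True},  # e.g., while preparing a meal
}

def active_features(profile, context):
    """Return the features to invoke on whatever device the user encounters."""
    active = []
    for feature, rule in profile.items():
        if rule.get("always"):
            active.append(feature)
        elif "display_max_inches" in rule and context["display_inches"] <= rule["display_max_inches"]:
            active.append(feature)
        elif context.get("ambient") in rule.get("ambient", []):
            active.append(feature)
        elif rule.get("hands_busy") and context.get("hands_busy"):
            active.append(feature)
    return active

print(active_features(USER_PROFILE, {"display_inches": 32, "ambient": "bright", "hands_busy": False}))
```

The point of the sketch is the design choice: access is a set of independently invocable features matched to the moment, not a single installed package.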
4 Services on Demand This approach also allows the introduction of both more powerful computing services and human intervention. Network-based services allow central servers to be part of the system [1] [3] [4] as well as allowing a combination of computers and human services in a “try harder” approach [9] (see Figure 1).
Fig. 1. A person can use a variety of devices to access a broad range of services that include services best provided locally, by network servers, or by remote humans
This approach allows for a much broader array of services than would otherwise be possible at any point in time. Services that may someday be available using personal devices could be provided by network-based systems that have more capability. And services that may someday be provided in an automated way could be provided via human intervention today. One example of this is CapTel, a network-based telephone-captioning service available in the US [2]. Someday robust, speaker-independent speech recognition may be available on portable devices. But today it is not even available on network-based devices. A free service to people who are deaf or hard of
hearing in the US, however, provides captioning for telephone calls by linking in a special relay operator who listens in on the call and re-voices one side of the call into a computer very carefully. The computer then performs speech-to-text conversion, and the resulting text is corrected and sent on to the second caller, who can both hear the person on the far end of the call and see captions of what they are saying.
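The "try harder" idea referenced above [9] can be pictured as a simple fallback chain: attempt the cheapest local service first, then a network service, then a human-assisted service such as a captioning relay. The sketch below is the editor's illustration only; the service functions are placeholders, not real APIs.

```python
# Illustrative fallback chain: local -> network -> human-assisted captioning.
from typing import Callable, Optional

def caption_with_fallback(audio_chunk: bytes,
                          local: Callable[[bytes], Optional[str]],
                          network: Callable[[bytes], Optional[str]],
                          human_relay: Callable[[bytes], str]) -> str:
    """Return captions from the first service able to produce them."""
    for service in (local, network):
        result = service(audio_chunk)
        if result is not None:
            return result
    return human_relay(audio_chunk)  # humans as the final, most capable fallback

# usage with stub services: local and network recognition "fail", relay succeeds
captions = caption_with_fallback(
    b"...",
    local=lambda audio: None,
    network=lambda audio: None,
    human_relay=lambda audio: "[captions produced via re-voicing relay]",
)
print(captions)
```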
5 Limitations and Changes with This Approach There are limitations to the concept of “purely” ubiquitous accessibility. People who need special physical interfaces will not be able to invoke them from the network. These individuals may have to carry their switches or interfaces around with them. However, what they carry with them may simply be transducers. And they may use different transducers for occasions when seated stably vs. when seated in a comfortable lounge chair, etc. These simple transducers can then be connected to their control software, which can indeed be invoked from the “ether” (i.e., from the Internet). This approach would allow them to have much less expensive and more flexible access to a much wider range of devices and systems in their environment. This could include information and communication devices as well as transportation and daily living devices that have computer controlled interfaces. Moving to this model will cause a similar major sea change in assistive technology and be accompanied by the same concerns and problems around any such paradigm shift. Although there is discussion of the availability of free public assistive interfaces built into the network, so that everyone can have basic access, it is likely that there will also be a rich (but quite different than today) market for commercial assistive technology, that also would reside in the network and be invoked by the users who have paid for it (purchase or rental). It is also possible that individuals needing special interfaces will in fact carry about with them more interface than individuals who do not need special interfaces. While many people may find that they can very easily use whatever interfaces they encounter in the environment, people who need special interfaces may find that it is easier for them to bring a larger part of their interface with them. For example, this might include not only transducers, but also special displays. They would then use these controls and displays instead of the controls and displays in the environment. This has some advantages, but also creates some challenges around device security, locus of control, product identity (marketing), etc. As we move toward and plan for these new environments, however, we should keep all of these variations in mind and not simply assume that individuals with disabilities would be able to use the interfaces and systems they encounter in the environment simply with invocable modifications.
6 Summary

As we move toward more virtual applications and services that can play on whatever displays we find in our environments, we may find that we no longer carry our computing devices, or even our interfaces, around with us, but rather rely on the ubiquitous computing services and interfaces we will find integrated into almost any environment
we find ourselves in. In some cases we will have to use the interfaces in the environments we are in for logistical or security reasons. As we move to more ubiquitous computing and interfaces, we need to move away from the "patch the system in front of us" model and begin to think of ubiquitous accessibility. This model has many advantages, but it is also quite different from what we have today; it will require not only different models and support mechanisms, but also a transition path from where we are today to where we will be in the future. It is also quite possible that, while completely virtual or ubiquitous accessibility may meet the needs of some, others will need to bring at least part of their interface with them. We will need to plan for these situations as well.

Although it is not clear exactly what form accessibility will take in the future, or the path that we need to take between what we have today and this future, it is clear that we do not have good answers today, and that we need to begin thinking and exploring soon, or people who need access features will be left behind again in the next big paradigm shift. The good news is that it appears that where we will end up will not only allow more people to have more access for less money, but will also provide the potential for much more variability and incremental access across all dimensions, providing a better fit for more people and a viable economic model for individuals with more severe and multiple disabilities, who today constitute too small a market to be effectively served.

Acknowledgement. The contents of this paper were developed under a grant from the U.S. Department of Education, NIDRR grant number H133E080022. However, those contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government.
References

1. Bigham, J.P., Prince, C.M.: WebAnywhere: A Screen Reader On-the-Go. In: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility, pp. 225–226. ACM Press, New York, NY (2007)
2. CapTel – Captioned Telephony, http://www.ultratec.com/captel/
3. Fairweather, P.G., Hanson, V.L., Detweiler, S.R., Schwerdtfeger, R.: From Assistive Technology to a Web Accessibility Service. In: Proceedings of the Fifth International ACM Conference on Assistive Technologies, ASSETS 2002, pp. 4–8. ACM Press, New York, NY (2002)
4. Hanson, V.L., Richards, J.T.: Achieving a Usable World Wide Web. Behaviour and Information Technology 24(3), 231–246 (2005)
5. Raising the Floor: Solutions & Tools, http://raisingthefloor.net/tools
6. SAToGo, http://serotek.com/softwaresolutions
7. Sears, A., Young, M.: Physical Disabilities and Computing Technologies: An Analysis of Impairments. In: Jacko, Sears (eds.) The Human-Computer Interaction Handbook. Lawrence Erlbaum Associates, New Jersey (2003)
8. Vanderheiden, G.: Presentation (invited) to Joint ITU and G3ict Forum on The Convention on the Rights of Persons with Disabilities: Challenges and Opportunities for ICT Standards, Geneva, Switzerland, April 21 (2008)
9. Zimmermann, G., Vanderheiden, G.: Modality Translation Services on Demand - Making the World More Accessible for All. In: RESNA 2001 Annual Conference Proceedings, pp. 100–102 (2001)
Using Distributed Processing to Create More Powerful, Flexible and User Matched Accessibility Services

Gregg C. Vanderheiden
Trace R&D Center, University of Wisconsin-Madison, Madison, WI 53706, USA
[email protected]
Abstract. Accessibility today is characterized by individual devices, which have been custom-built for people with disabilities (talking alarm clocks, braille watches, special communication devices, etc.), or mainstream devices such as computers, which have been adapted with hardware or software to be usable by an individual with disabilities. This model results in more isolated access packages whose capabilities are limited by the particular devices on which they are run. By moving to more distributed, network-based accessibility solutions, we open up the potential for a much wider range of accessibility solutions, which can not only evolve over time but also vary by environment, task, etc. A rainbow of on-demand services and capabilities can be made available to users. This approach also opens up the potential for individuals who cannot afford assistive technologies to tap into a pool of free public assistive services that they can use on any device which they encounter. Keywords: Web accessibility, services on demand, virtual AT, ubiquitous accessibility.
1 Introduction

The current model for accessibility tends to limit the number of different types of AT and results in duplication of effort rather than building on the efforts of others and extending the range of solutions available. It also adds to the already high costs of working in the AT field and increases the price to consumers with disabilities. Particularly hard hit are low-incidence disabilities, for which solutions are both limited and expensive.

The current distribution and support model tends to work against variation. It is harder to sell products (or even train sales forces) if there is too much variation or too many products. Support organizations also prefer fewer types of products, due to training and communication considerations. The model for variation, therefore, is to have different companies all providing different versions of the same basic product. This provides the product variation consumers need, but results in much duplicated effort. Since each company must develop essentially the same product over again (with its own variation added), the costs to consumers are much higher than if companies could build on each other's efforts.
Another problem arises when there are many different products, each with its own different interface, that a person needs to access. For example, an individual may use one talking interface with their computer and software, a different one for their clock, and different ones yet for the ATM, the Metro fare machine, and the other devices and systems they encounter in their environment. Thus they may have to use several different screen-voicing programs and not have access to their macros, vocabulary, etc.

Current products also do not build on consumer input and effort as much as they might. Some products have a rich consumer contribution network, such as scripts developed for some screen readers. However, if the scripts are non-standard and cannot be used across different AT, then the benefit is lost for other technology users and duplication of effort is again needed for different groups.

Finally, current stand-alone technologies may not be able to provide all of the processing functionality that may be needed at a price point that can be met by many users. Although processing power is growing rapidly, some access techniques may require more power, or access to larger data sets, than is available in small handheld units.

The use of network-based or network-enhanced solutions that allow distributed processing and collaborative development of solutions and application data can help to address these problems and provide more powerful, flexible and user-matched accessibility services.
2 Raising the Floor, a Network-Based Approach to Assistive Technology

An alternate approach to providing access for people with disabilities, as well as those facing barriers due to literacy, is now being explored by the Raising the Floor Initiative, an international consortium of people and programs seeking to build accessibility directly into the Internet. The premise of this approach is that creating accessibility as a rich set of open-source access features and services that can be mixed and matched, and invoked on demand, can yield solutions that are much more powerful, lower-cost, and synergistic, and that can address the needs of people with a much wider range of disabilities. It is further believed that this can increase the level of effectiveness of all assistive technologies (AT), including both commercial and free public assistive technologies (that is, it will essentially "raise the floor" for all assistive technologies).

Basically, what is envisioned is a rich set of open-source modules that can be used to create a wide variety of access solutions to match the needs of individuals with different types, degrees, and combinations of disability. This set of open-source modules would be used to construct free access features, which would be available in the network and invocable on demand by those who need them. The same modules would also be used by commercial assistive technology developers in their commercial products.

2.1 Feature-Based

The envisioned system would be feature-based rather than device-based. That is, individuals could choose the particular features that best match their needs and abilities.
Different variations of techniques would be available. For example, highlighting text as it is read is useful to individuals with low vision, cognitive, language, or learning disabilities, and even to individuals with physical disabilities who are unable to hold their head still enough to track easily when reading; however, the type of highlighting and the function it serves differ among these groups, and different highlighting techniques may be more effective for one individual than another. Common packages of techniques can be assembled to make it easy for individuals getting started, and many people may stay with these common packages. The system, however, would not constrain them to these features, and other free or commercial features could be used instead. These features could be packaged in different ways. Figure 1 shows some of the different ways these features could be implemented. Types 2 through 4 are all examples of network-based solutions.
Fig. 1. 4 Different approaches to implementing the access features
Type 1 is basically a super-browser-based solution [1] [2] [3]. Type 2 approaches are 'services-on-demand' features, such as a color-shifting service or a page-voicing service [4]. Type 3 approaches are the 'proxy-based transcoder' solutions [5] [6] [7] [8]. Type 4 would be virtual assistive technologies and network-based assistive features such as WebAnywhere [9]. Type 4 will be the primary approach used in Raising the Floor, since it allows access from any computer and allows access by anyone, including those who have no computer or device of their own but access the Internet from whatever computer they can get to use.
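As a purely illustrative sketch of the Type 3 idea (not code from any of the cited systems), the fragment below shows a toy proxy that fetches a requested page and rewrites it before returning it to the browser. The URL-in-path convention, the single font-size rewrite, and the port number are assumptions made for illustration; a real proxy-based transcoder would apply far richer transformations and would need proper URL handling and security controls.

# Toy sketch of a proxy-based transcoder (Type 3); illustrative only.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class TranscodingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed convention: the target page is passed in the request path,
        # e.g. http://localhost:8080/http://example.com
        target = self.path.lstrip("/")
        html = urlopen(target).read().decode("utf-8", errors="replace")
        # One illustrative "access feature": force a larger base font size.
        html = html.replace("<head>", "<head><style>body{font-size:200%}</style>", 1)
        body = html.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), TranscodingProxy).serve_forever()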
2.2 Power on Demand

Because the features are network-based, the computing power available for any particular feature can scale with the user's need. If users encounter something that needs heavier computing power, it can be made easily available to them without their having to have a high-powered system idling the rest of the time, as would be necessary in a personal system. The power can be available when they need it and used by others when it is not, providing more power at less cost. For example, relatively little power may be needed when reading through text pages on well laid-out and semantically marked-up Web pages. However, if the individual encounters a complex page or a graphic diagram, more powerful algorithms and computing resources can be drawn on to convert the content into a form that can be more easily navigated and understood.

2.3 Pooled Learning and Crowd Sourcing

Working from a network-based AT with common engine components provides the opportunity for learning done by one user to carry over to another. Words that are mispronounced or phrases that are hard to understand can be resolved by one user, and the results made available to others. This can occur either through users who solve a problem themselves making their solution available to others, or through peers or volunteers who answer a question for one person and have the answer available for others who follow. For example, the pronunciation of technical or trade names may not exist within screen reader vocabularies. However, once the correct pronunciation is entered by or for one user, it can be available to all through the use of a shared common dictionary. In fact, regional dictionaries can exist to allow pronunciation bases to grow naturally and in a form appropriate to the individuals' environment. Complicated Web pages can be very difficult to decipher if not properly marked up. Again, individuals having trouble with a page might ask for assistance, or more sophisticated users may figure out the pages for themselves. Once the page is augmented with semantic mark-up to make it easy to understand and navigate, the same mark-up can be made available to any others who encounter the page [10] [11] [12] [13].

2.4 Internationalization

All of the modules will be written in a form that supports easy localization. The goal is to have a rich set of access modules that can be adapted to many different languages and cultures, including many that do not now have a rich set of access solutions. This is a key focus of the Raising the Floor Initiative, and one that is only possible through the open-source modular approach, which allows not only 'localization' of the menu and control text but also any other modifications or extensions that are needed to address both language and cultural support.

2.5 Free Public Access

One of the most important objectives of the Raising the Floor effort is enabling the creation of robust, yet free, public accessibility features that are built directly into the
Internet and invocable on demand. The cost of current assistive technologies that are powerful enough to access and process the complex new Web technologies is far beyond the financial reach of most people internationally who need such features. This includes a large percentage of people with disabilities, as well as a very large population of people who cannot access the Web directly due to literacy issues. These are individuals who not only cannot afford assistive technologies, but often cannot afford a computer either. Their access comes through systems owned by others that they are able to use. The ability to invoke the access features they need anytime, anywhere, on any computer that they can get permission to use is central to their ability to access the Internet and its resources.

2.6 Commercial Assistive Technologies

But free access features will not be able to meet the full needs of people with disabilities or literacy problems. The RtF model is therefore designed to support commercial assistive technology as well as the basic free public access features. Commercial AT vendors can use both the open-source modules and the distribution mechanism that is being established in their own commercial products and distribution. In addition to potentially reducing the effort required of AT developers to create and distribute new products, this also allows users to have a richer set of access options that they can invoke on demand. With a single unified distribution mechanism, users can invoke a combination of free public access features and commercial assistive technologies. It also allows them to invoke assistive technologies from different companies with a single request: they just 'sign on' and have their personal set of access features and assistive technologies (from different vendors) available for them on the computer in front of them.

The model will also support a concept of "micro AT." Since the system is modular and feature-based, it will be possible for companies to build and market individual features. In this fashion, companies or individuals can build new features or extensions that they either contribute to the public domain or market commercially as single accessibility features or "micro AT." This approach has been very successful in other fields, for example plug-ins for common software products and app stores like the one Apple has implemented for the iPhone.

2.7 Profile Based

Current personalization approaches emphasize independently configuring a large number of potentially confusing interface settings, including preferences for sound, display, messaging, and single-key commands (speed dial, mute, etc.). In contrast, the profile-based approach envisioned here offers the potential for users to easily select a range of information displays and services tailored to their needs and capabilities. Once one or more profiles have been configured, the user can set multiple parameters to appropriate values simply by selecting a profile.

If combined with available contextual information, such as day of week, time, and GPS location, a profile-based approach could offer the possibility of appropriately adjusting or even changing information modalities. For example, a handset sound volume
can be automatically turned up while traveling or switched to vibrate when at a doctor's office. If a user interface architecture is designed that supports this tailoring, it could also promote the modular sharing of UI services across applications and handset platforms. Profiles for mobile devices could be specified using the same schema being developed for computers and based on the AccessForAll framework. Such profiles can be stored online and delivered by an identity provider to any device a user is operating. [14]
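To make this concrete, here is a small illustrative sketch of how a stored preference profile might be combined with contextual information to adjust device settings. The schema, field names, and rules are invented for illustration; they are not the IMS AccessForAll specification or any shipped profile format.

# Illustrative only: a toy profile-plus-context adjustment in the spirit of the
# profile-based approach described above. All names here are invented.
from dataclasses import dataclass

@dataclass
class Context:
    traveling: bool          # e.g. inferred from GPS and time of day
    quiet_location: bool     # e.g. a doctor's office on the user's calendar

STORED_PROFILE = {
    "display": {"font_scale": 1.5, "high_contrast": True},
    "audio": {"volume": 0.6, "screen_reader": True, "vibrate": False},
}

def settings_for(profile, ctx):
    s = {k: dict(v) for k, v in profile.items()}    # start from the stored profile
    if ctx.traveling:
        s["audio"]["volume"] = 1.0                  # turn the handset volume up
    if ctx.quiet_location:
        s["audio"]["volume"] = 0.0                  # ...or switch to vibrate instead
        s["audio"]["vibrate"] = True
    return s

print(settings_for(STORED_PROFILE, Context(traveling=False, quiet_location=True)))

In a deployed system the stored profile would live with an identity provider, as described above, and would be fetched by whatever device the user signs on to.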
3 Benefits

The proposed collaborative, open-source and network-based approach to creating access solutions has potential benefits for many stakeholders.

People with disabilities, or who cannot read – would be able to go up to any computer and invoke the access features they need without having to install anything on the computer.

An older person – who had finally mastered use of the Internet on a computer at her living center would be able to use the computer at her children's houses when she visits and have all of her features and setup available to her.

A worker with a disability – who needs special commercial assistive technology software would be able to purchase it once and use it on a computer anywhere – at work, at a satellite office, at home, or on the road.

Individuals who cannot afford any technology – would be able to use any technology they can get access to and invoke the features they need for free.

AT companies – can use the modules in their commercial products, reducing the time to build and maintain core existing features and allowing them more resources to innovate.

Mainstream IT companies – would have access to the modules directly and could contribute to their maintenance, ensuring better compatibility with their products, and contributing special modules to facilitate interfacing with new products they introduce.

Researchers – can use the modules to create working systems that they can extend through their research. This both facilitates their research and provides a more successful path for transferring their results to either commercial or free public availability.

Consumers – can contribute scripts, definitions, support, and/or code as their skills and time allow. Consumers who solve problems with websites or particular types of content can more easily share the solutions with others.

All stakeholders can have access to more powerful services that can be invoked on demand, creating new opportunities for addressing access issues.

Acknowledgement. The contents of this paper were developed under a grant from the U.S. Department of Education, NIDRR grant number H133E080022. However, those contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government.
References

1. Hanson, V.L., Richards, J.T.: Achieving a usable World Wide Web. Behaviour and Information Technology 24(3), 231–246 (2005)
2. Borodin, Y., Mahmud, J., Ramakrishnan, I.V., Stent, A.: The HearSay non-visual web browser. In: Proceedings of the 2007 International Cross-Disciplinary Conference on Web Accessibility (W4A), vol. 225, pp. 128–129 (2007)
3. Mahmud, J.U., Borodin, Y., Ramakrishnan, I.V.: CSurf: A context-driven non-visual web browser. In: Proceedings of the 16th International Conference on World Wide Web, pp. 31–40. ACM Press, New York (2007)
4. Erra, U., Iaccarino, G., Malandrino, D., Scarano, V.: Personalizable edge services for Web accessibility. Universal Access in the Information Society 6(3), 285–306 (2007)
5. Huang, A.W., Sundaresan, N.: Aurora: A conceptual model for web-content adaptation to support the universal usability of web-based services. In: Proceedings on the 2000 Conference on Universal Usability, pp. 124–131. ACM Press, New York (2000)
6. Fairweather, P.G., Richards, J.T., Hanson, V.L.: Distributed accessibility control points to help deliver a directly accessible Web. Universal Access in the Information Society 2(1), 70–75 (2002)
7. Hanson, V.L., Richards, J.T.: Achieving a usable World Wide Web. Behaviour and Information Technology 24(3), 231–246 (2005)
8. Parmanto, B., Ferrydiansyah, R., Zeng, X., Saptono, A., Sugiantara, I.W.: Accessibility transformation gateway. In: Proceedings of the 38th Annual Hawaii International Conference on System Sciences, vol. 7, p. 183.1. IEEE Computer Society, Washington (2005)
9. Bigham, J.P., Prince, C.M.: WebAnywhere: A screen reader on-the-go. In: Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility, pp. 225–226. ACM Press, New York (2007)
10. Seeman, L.: The semantic web, web accessibility, and device independence. In: Proceedings of the 2004 International Cross-Disciplinary Workshop on Web Accessibility (W4A), vol. 63, pp. 67–73 (2004)
11. Bigham, J.P.: Accessmonkey: enabling and sharing end user accessibility improvements. ACM SIGACCESS Newsletter: Accessibility and Computing 89, 3–6 (2007)
12. Bigham, J.P., Ladner, R.E.: Accessmonkey: a collaborative scripting framework for web users and developers. In: Proceedings of the 2007 International Cross-Disciplinary Conference on Web Accessibility (W4A), vol. 225, pp. 25–34. ACM Press, New York (2007)
13. Chen, C.L., Raman, T.V.: AxsJAX: a talking translation bot using Google IM: bringing web-2.0 applications to life. In: Proceedings of the 2008 International Cross-Disciplinary Conference on Web Accessibility (W4A), vol. 317, pp. 54–56. ACM, New York (2008)
14. Cooper, M., Treviranus, J., Heath, A.: Meeting the diversity of needs and preferences – a look at the IMS AccessForAll specifications' role in meeting the accessibility agenda efficiently. In: Proceedings of the 2005 Accessible Design in the Digital World Conference (2005), https://www.bcs.org/upload/pdf/ewic_ad05_workshop3.pdf (accessed 2-15-09)
Spearcon Performance and Preference for Auditory Menus on a Mobile Phone

Bruce N. Walker and Anya Kogan
Sonification Lab, School of Psychology, Georgia Institute of Technology, Atlanta, GA 30332
[email protected],
[email protected]
Abstract. This study investigates the use of spearcons as an auditory cue, looking simultaneously at both performance and subjective preference for spearcons and text-to-speech (TTS). The study replicated on a mobile phone a previous PC-based study run by Palladino and Walker [1]. Performance results were very similar to those found in the previous study, supporting the generalizability of spearcon performance from PCs to mobile phones. TTS and spearcons both provided comparable performance improvements, suggesting that spearcons do not negatively affect the design of visual and non-visual menus and may, within the right context, lead to enhanced designs. Participants gave positive scores to both TTS and spearcons when no visual cues were provided, and higher rankings were given to all audio cues when spearcons were included, in both the visual and non-visual conditions. Keywords: sonification, spearcons, auditory interfaces, auditory menus.
1 Introduction

Many types of auditory displays, and in particular auditory menus, have been studied either as enhancements to visual displays or as the primary means for interacting with a system or device. Such auditory displays can improve a variety of products, from those with small screens to those being used in limited- or no-vision contexts. This may include the use of mobile phones while driving, or while walking outside where glare is prevalent. Users with vision impairments can also benefit [2], as many recent GUI designs rely strictly on visual interaction. However, there remain unanswered questions regarding the best ways to design auditory menus. While most auditory menus are based on simply speaking the menu items to the user (often via text-to-speech, or TTS), this basic approach is now regarded as somewhat simplistic. Many auditory menu enhancement approaches have been considered in order to maximize the functionality of this new type of interaction.

Several solutions have been explored most recently as part of auditory menu design. Four approaches that have had considerable attention include: regular speech with no enhancements; adding auditory icons to a speech-based menu [3]; adding earcons [4]; and, as is demonstrated in this study, adding spearcons [5,6].
All of these design approaches have their advantages and disadvantages, many of which are still being studied in order to determine the most appropriate usage for each.

1.1 Auditory Menus

Increasing the usability and accessibility of menus on small electronic devices is essential due to their decreasing sizes and increasing proliferation. Advanced auditory menus are being studied as an enhancement to the visual-only menus currently on most of these devices, especially when the user is unable to look at the device (e.g., it is in a pocket) or unable to see it (e.g., due to a vision impairment). It remains to be determined how to design an optimal auditory menu, but various enhancements have been proposed to improve the basic (and often unsatisfactory) text-to-speech (TTS) menus often deployed. The study presented here focuses on the use of spearcons within auditory menus, but we also explain other approaches, for historical context.

Using sound to enhance menus on electronic devices and desktop computers has the potential to significantly widen the user base. However, most audio menus today are limited to a simple system consisting of a text-to-speech (TTS) conversion of words and phrases. Users can listen to the text provided, use a combination of arrow keys to navigate the menu, and use a select button to choose menu items. Auditory enhancements have sometimes been prepended to the TTS. The goal of these enhancements is to increase the efficiency of menu navigation by allowing users to listen to just the cue, if the TTS phrase is not needed for menu navigation. In fact, after becoming familiar with such a system, some users can even turn off the TTS completely and use just the extra audio cues for maximum speed and efficiency.

The transient nature of sound creates several usability challenges for its use in menu design. First, speech comprehension speed varies widely among individuals. One study found that blind listeners can understand speech at speeds up to 2.8 times faster than standard TTS [7]. These differences can be a challenge in creating universal audio cues. Another challenge is location awareness. Users must be able to quickly grasp their location within a menu hierarchy in order to choose the correct path to their desired item [8]. Unlike a visual menu, which users can scan quickly to determine their current location, audio menus can take considerable time to present items, and thus can tax the user's working memory. Further, learning novel auditory cues can be a strain on the user's time and can lead to poor acceptance; therefore, choosing cues with short learning curves is essential.

Because sound is currently rarely utilized for navigation of menus, there is little information on users' general acceptance of audio cues. It is important to begin to assess user opinions and preferences, since usability depends not only on performance (e.g., time to target), but also on subjective impressions. This study opens this topic for research, especially related to the use of auditory menus enhanced with spearcons.

1.2 Auditory Icons

Although this experiment focuses on the use of spearcons, it is important to provide context for their research and development in light of previously developed audio cues. Auditory icons [3] and earcons [4] have been the most popular predecessors to spearcons and will be discussed here. Both have had their advantages and disadvantages, which have been partially addressed with the use of spearcons.
An auditory icon is a direct or metaphorical representation of a word or concept [3], often utilizing the sound that the associated word would be known for. For example, a "dog" could be represented with a "woof" sound, and a "cow" with a "moo" sound. For words with clear sound associations, learning can be quick and easy. However, when dealing with electronic menus, clear associations can be difficult or even impossible to create. For instance, what would "delete file" sound like? For this reason, these icons are of limited utility in menu design for modern electronic devices and systems.

1.3 Earcons

Earcons [4] are brief musical motifs or melodies that are used to represent a menu item. Earcons do not require the same natural associations as auditory icons do, and thus can be applied to menus containing any type of information. They can be produced using any systematic set of musical elements that vary according to frequency, timbre or tempo in order to indicate hierarchy. Guidelines for their design have been suggested by Hereford and Winn [9].

Earcons can present problems because of their rigidity when menu items need to be added or removed. For example, if an item is inserted into a menu (e.g., adding a new name in a Contact List), the new item would get an earcon assigned to it. However, it is difficult to determine whether it would make sense to keep the earcon assigned to that point in the menu and move all the other menu items down to be reassigned to the existing earcons, or else insert a new earcon for that new menu item. Unfortunately, earcon hierarchies are often fixed and based on a musical scale, so inserting a new earcon is generally not possible. Clearly this makes the menu somewhat inflexible as well. In any case, learning arbitrary earcon-to-menu-item assignments can also be frustrating [6] and difficult for users, even if the mappings do not change. As Walker et al. [5] have stated, the arbitrary nature of earcons is considered both their strength and their weakness. At the same time, Palladino and Walker [6] showed that listeners learn to associate menu items with spearcons faster than with other types of sounds, such as earcons.

1.4 Spearcons

A spearcon [5], the auditory menu enhancement cue explored in this study, is created by speeding up a spoken phrase without modifying the perceived pitch of the sound. The speech is compressed to the point that it is no longer comprehensible as a particular word, but is instead a representation of that word or phrase, similar to how we think of an icon as a particular image that represents an idea. Walker et al. [5] compared the spearcon to a fingerprint, because each unique word or phrase creates a unique sound when compressed that distinguishes it from other spearcons. A short learning session leads to easy association of the spearcon with its related word. Once a spearcon is created, it is prepended to the original spoken word (created either by a TTS generator or a recorded voice) to make a complete, enhanced menu item. A 250 ms pause is typically inserted between the spearcon and the spoken word or phrase. Spearcons are convenient in part due to their brief duration and easy production. More on the production of spearcons can be found in Palladino and Walker's 2008 spearcon study [1].
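As an illustration of how such a cue might be produced, the sketch below time-compresses a TTS clip without altering its pitch and prepends the result to the original clip with a 250 ms gap. It is only a plausible reconstruction: the exact speed-up curve used by Palladino and Walker is not given here, so the logarithmic rate formula, the library choice (librosa/soundfile) and the file names are assumptions.

# Hypothetical spearcon-stimulus generator; a sketch, not the authors' tool chain.
import numpy as np
import librosa
import soundfile as sf

def make_spearcon_stimulus(tts_wav, out_wav, gap_s=0.25):
    y, sr = librosa.load(tts_wav, sr=None)                  # original TTS phrase
    duration = len(y) / sr
    # Assumed logarithmic mapping: longer phrases are compressed more.
    rate = 1.0 + 1.5 * np.log1p(duration)
    spearcon = librosa.effects.time_stretch(y, rate=rate)   # speed up, pitch preserved
    gap = np.zeros(int(gap_s * sr), dtype=y.dtype)          # 250 ms post-cue interval
    sf.write(out_wav, np.concatenate([spearcon, gap, y]), sr)

make_spearcon_stimulus("allegra_seidner_tts.wav", "allegra_seidner_stimulus.wav")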
There are many advantages to the use of spearcons in auditory menu design. Although spearcons do not have the natural hierarchical associations that earcons provide, Walker et al. [5] found that they result in significantly more efficient navigation than hierarchical earcons [1]. It would also be possible to create additional hierarchical information for the user by augmenting the spearcon with additional audio information if needed. However, the utility of spearcons in real mobile applications remains to be studied. Desktop applications and mobile phone simulators can provide great insights, but the use of spearcons to enhance menus on a mobile phone may lead to different results. Thus, the present study investigates TTS menus with or without spearcons, and also extends this research paradigm to include an assessment of subjective opinions of these various menu designs.
2 Method

2.1 Participants

A total of 89 undergraduates (55 women and 34 men, mean age = 20) with normal or corrected-to-normal hearing and vision participated for extra credit in psychology courses. English was the native language for 76 of the participants. There were between 15 and 20 participants in each condition.

2.2 Design

This experiment used a between-subjects design with two independent variables. The first was sonification type (TTS Only, Spearcon Cue + TTS, or No Audio), and the second was visual cues (visual menus either on or off). Since it would not be feasible to have both no audio and no visuals, that condition was not used for this study, leaving five valid experimental conditions. Two dependent variables were used. The first was the time taken to select the target menu items for each trial. The second was a set of subjective preference scores given to each of the auditory cues used (TTS and spearcons) individually.

2.3 Materials

Participants used a Nokia N95 mobile phone with a simulated contact list running in Java on the Symbian S60 platform. They used the arrow keys to navigate to desired menu items. They listened to the audio cues through Sony MDR-7506 Dynamic Stereo Headphones. The names used in the contact list (e.g., "Allegra Seidner") were taken from the study by Palladino and Walker [1]; they were produced with a random name generator and translated into sound using the AT&T Labs, Inc. TTS Demo program. Spearcons were produced by speeding up the TTS phrases to be very short sounds. The speed-up is logarithmic, so long phrases see a greater compression. The pitch of the sounds is maintained, and the spearcons are still clearly "related to" the original source TTS sounds. More details on spearcon generation can be found in Palladino and Walker's publication [1].
Each Spearcon + TTS stimulus was created by using Audacity software to prepend the spearcon cue to the TTS, with a 250 ms post-cue interval between them. The target name was visibly listed at the top of the phone screen for both the visuals-on and visuals-off conditions. In the visuals-on conditions, a scrollable list of 50 names was presented, nine of which were visible at a time. A photo of the screen presented can be seen in Figure 1. All 50 names were displayed in alphabetical order by first name. The list scrolled upward or downward according to the key presses. In the visuals-off conditions, the list portion of the screen was left blank (i.e., below the target name), though the underlying list was still active and navigable. For all conditions, the list of names did not wrap at the top or bottom of the list, to allow for a representative time-to-target measurement. As each name was placed in focus, both audio and visual cues were presented simultaneously.

Fig. 1. Screen presented in the condition with the visuals on. The target name, shown underlined at the top of the list, was randomized for each trial.

2.4 Procedure

Participants were assigned to one of five conditions: (1) TTS prepended with a spearcon and no visual cues; (2) a single TTS cue with no visuals; (3) a single TTS cue with visual cues; (4) only visual cues with no sound; and (5) TTS prepended with a spearcon and visual cues. Trials were grouped into blocks of 25, for a total of 10 blocks. The blocks were counterbalanced so that one half of the names was used as targets in some blocks and the other half was used in the other blocks. Each block was otherwise identical to all others for a given participant.

Participants were first read aloud a set of instructions that explained the structure of the menus presented and the required task, which was to find the requested target names as quickly and accurately as possible. They were told that they would be timed during the study. Once the participants were given a phone, they could begin by pressing a "continue" key. The timer started once the first down key was pressed. Participants used the up and down arrow keys to reach the target name within the list. Once a name was selected, the end time was recorded and the participant saw the next trial screen with a new target name. Every 25 trials, the participants saw a screen indicating the end of a block and the start of a new one. Each of the nine subsequent blocks proceeded in the exact same way. After the tenth block, participants filled out a questionnaire that included demographics (i.e., age, gender, native language) and Likert agreement statements to assess their preferences for the TTS and spearcon audio cues individually. They were only asked to provide their opinions on the cues that were present in their given condition. The scales probed helpfulness, distraction level, preference over silence, fun and annoyance level. A free-response box was also provided for extra comments.
3 Results

3.1 Time to Target

An alpha level of .05 was used for all statistical analyses. Trials with incorrect item selection were disqualified (0.79% of trials in all: 64 in the Visuals Off/Spearcons+TTS condition, 21 in Visuals Off/TTS, 32 in Visuals On/No Sound, 23 in Visuals On/TTS, and 38 in Visuals On/Spearcons+TTS); a total of 22,072 trial records remained for data analysis. Figure 2 presents the results, specifically the mean time to target for each condition in each block of the experiment.

A planned Tukey honestly significant difference (HSD) test was performed on the data to check for significant differences among the experimental conditions. As expected, overall performance in all conditions that included visual cues was significantly faster than in those with only auditory cues. A Tukey HSD analysis of Block 10 data for each condition found no significant difference between any of the three visuals-on conditions (p > 0.05). By Block 10, the difference in means between the Visuals On/TTS (M = 6546, SD = 3064) and Visuals On/Spearcons + TTS (M = 7061, SD = 3408) conditions was very small. It is also clear from Figure 2 that, even though the differences between the auditory-only conditions and the conditions with visual cues in Block 10 are significant, there is much less of a difference between the auditory-only and visual conditions than existed in the first block of the experiment. Figure 3 illustrates the mean time to target for the five conditions in the first and tenth blocks.
Fig. 2. Mean time to target in milliseconds for all conditions over all blocks (legend: Visuals Off/Spearcons + TTS, Visuals Off/TTS, Visuals On/Spearcons + TTS, Visuals On/No Sound, Visuals On/TTS; x-axis: Block Number)
Fig. 3. Mean time to target in milliseconds for all conditions in Blocks 1 & 10. Error bars are 95% confidence intervals.
Collapsing across audio cue types, conditions with the visuals on were significantly faster than visuals off, in both Block 1, F(1, 2197) = 661.269, p < 0.001, and Block 10, F(1, 2197) = 348.079, p < 0.001. Considering the different audio cue types (TTS vs. spearcons+TTS), the spearcon cues led to slower times in Block 1, F(1, 2197) = 9.539, p = 0.002, but the effect diminished quickly over the first few blocks, and no significant difference was found among the sound conditions for Block 10 (p > .05).

3.2 Subjective Ratings

The participants gave scores on five dimensions (i.e., helpfulness, distraction level, preference over silence, fun, and annoyance level) by providing agreement or disagreement responses on a Likert scale. The scores were also aggregated into an overall preference score for each participant. The means across all participants for each condition and audio cue are summarized in Figure 4. Overall, there was no significant difference in preference between spearcons and TTS, F(1, 18) = 3.319, p = 0.071. However, a t-test comparing the visuals-on and visuals-off conditions demonstrated that both audio cues were rated significantly better when no visuals were provided, t(106) = 6.706, p < 0.001. The TTS sounds were given significantly higher rankings when they were accompanied by spearcons than when they were not, in both the visuals-on condition, t(33) = -2.234, p = 0.032, and the visuals-off condition, t(33) = -3.181, p = 0.004. That is, simply adding spearcons seemed to lead to higher ratings of the TTS, with no performance difference after a few blocks of practice.
Fig. 4. Mean aggregate subjective preference scores, 5 being the highest possible score. TTS is given higher scores in the presence of spearcons.
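For readers who want to run this style of analysis themselves, the sketch below shows how the Block 10 Tukey HSD comparison and the visuals-on versus visuals-off ratings t-test could be carried out with standard Python tooling. The data files and column names are assumed for illustration; the original data are not distributed with the paper.

# Illustrative re-analysis sketch; data files and column names are assumed.
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

trials = pd.read_csv("trials.csv")          # one row per trial: condition, block, time_ms
block10 = trials[trials["block"] == 10]
print(pairwise_tukeyhsd(block10["time_ms"], block10["condition"], alpha=0.05))

ratings = pd.read_csv("ratings.csv")        # one row per participant: visuals, preference
on = ratings.loc[ratings["visuals"] == "on", "preference"]
off = ratings.loc[ratings["visuals"] == "off", "preference"]
print(stats.ttest_ind(off, on))             # higher ratings expected with visuals off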
4 Discussion

The performance results confirm many of the findings in the study by Palladino & Walker [1], allowing us to generalize the utility of spearcons as part of auditory menus from the desktop to the mobile phone. Conditions with visual cues led to faster responses, as compared to conditions with only auditory cues. This is understandable, given that the visual list allows for fast look-ahead. With the visuals on, the type of audio cues did not matter. That is, adding spearcons did not negatively impact performance, even though the spearcons add approximately half a second to each audio cue. In fact, even the silent (visuals only) condition was no different from the TTS and spearcons conditions, when the visual list was presented. It is likely the case that with the visuals on participants are moving through the list about as fast as possible by relying largely on the visual interface. Practice does not have much of an impact, supporting the interpretation that this is a highly practiced task. Adding the audio at least does not slow down performance when the visuals are on.

When the visuals are off, overall performance was slower than when visuals were on (see the top lines in Figure 2). However, with a little practice, performance in the audio-only conditions improved, and closed in on the conditions with visuals on (see the narrowing of the gap between the top lines and the bottom three lines, in Figure 2, from Block 1 across to Block 10). This bodes well for the use of auditory menus, even for users with little or no experience with audio-only interfaces.

Within the pair of audio-only conditions, it is interesting to note that TTS-alone initially led to faster performance than spearcons+TTS, but this difference went away by Block 10. In Block 1, it is likely the case that because the spearcons were prepended to the TTS for each item, participants took the time to listen to both cues
before making a selection, rather than focusing strictly on the spearcon to take advantage of its cuing capability. From the open-ended comments from participants, it appeared that they would hold down the arrow key to scroll quickly to the necessary item, then listen to the entire auditory cue and make the selection as needed. This showed that they made very little use of the auditory cue and relied mainly on their recollection of the alphabetical list organization. This would explain why a previous study by Palladino & Walker [10] showed a significant difference between the spearcons and TTS conditions while testing shallow two-dimensional menus. In that study, participants needed to listen to each menu item before proceeding to the next, since they could not predict what was coming. It was not beneficial for them to hold down the arrow key each time as they did in the present study with a deeper menu structure, as that would lead them to miss the necessary cue.

However, as they became more familiar with both the list and the audio cues, participants here relied on the spearcons more. We know this because the overall performance times were comparable in the spearcons+TTS and TTS-only conditions. That is, if they listened to, say, 1000 ms of audio for each menu item, then in the spearcons case this means about 250 ms of spearcon, 250 ms of silence, and 500 ms of TTS. Without the spearcon, this means 1000 ms of TTS. Thus, with practice, listeners came to make item selection decisions without listening to very much of the TTS phrases. Indeed, spearcons contribute a lot to performance of the navigation task.

The preference questionnaire demonstrated the positive reception of auditory cues in the absence of visual cues, as both spearcons and TTS were rated positively in the no-visual condition. This shows that, in a setting where users must rely on sounds to complete a task, they are inclined to feel good toward the sounds given, regardless of format. However, when they can rely on the visual sense to guide them, they prefer not to hear any audio and may even be annoyed by the sound. Given that there were no performance differences in the three visuals-on conditions (silent, TTS only, and spearcons+TTS), it is instructive to consider the subjective ratings as well as the performance measures. Taken together, then, it is clear that users must be provided with the option to turn off audio when visuals are available, and turn it on only when it is perceived as desired and/or necessary.

One additional caveat is that the audio quality needs to be optimized. Several participants commented on having trouble deciphering the audio cues for both spearcons and TTS. It is important not to discount the interaction modality as a whole simply because of a less-than-optimal implementation. While we are confident that the sounds here were generally acceptable and intelligible, the TTS could certainly be produced with higher-quality algorithms. This would also improve the quality of the spearcons, since they are derived from the TTS sound files.

The general receptiveness of listeners to audio cues to aid navigation in a no-visual context supports further research into auditory menu design and deployment. In particular, it would be interesting to test how spearcons are perceived in a two-dimensional menu study, where they have shown improved performance over TTS alone. That is, what happens when both the preference and performance cues support spearcon use?
Most interestingly, although preference ratings for TTS were consistently higher than those for spearcons, the TTS ratings were even higher in the presence of spearcons. That is, adding spearcons to TTS seemed to enhance the ratings of the TTS. It is possible that listeners considered the spearcons+TTS menus to be more sophisticated or perhaps more interesting, and this was rated as preferable. This has great implications for
designing with spearcons. While not harming overall user performance, spearcons can provide another layer to the user experience of audio menu navigation, one that encourages positive receptiveness to a new system.
5 Future Work

Future studies are focusing on the use of spearcons in audio-dependent contexts, where the participants cannot devote their full attention to the visual cue. In particular, we will be looking at task performance while a participant is simultaneously working on a visually and cognitively distracting task. This will be tested both in a desk setting and in a mobile one, where the user is walking on a designated route. We will be looking for effects on performance as well as subjective preference feedback from those involved. And, of course, we are extending these studies to participants with vision impairments, as they will be the primary users of (non-visual) advanced auditory menus, enhanced with whatever cues make the interfaces more effective and more pleasing to use.
References

1. Palladino, D., Walker, B.N.: Efficiency of spearcon-enhanced navigation of one-dimensional electronic menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2008), Paris, France (2008)
2. Nees, M.A., Walker, B.N.: Auditory Interfaces and Sonification. In: Stephanidis, C. (ed.) The Universal Access Handbook, pp. TBD. Lawrence Erlbaum Associates, New York (in press)
3. Gaver, W.W.: Auditory Icons: Using Sound in Computer Interfaces. Human-Computer Interaction 2, 167–177 (1986)
4. Blattner, M.M., Sumikawa, D.A., Greenberg, R.M.: Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction 4, 11–44 (1989)
5. Walker, B.N., Nance, A., Lindsay, J.: Spearcons: Speech-based Earcons Improve Navigation Performance in Auditory Menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2006), London, England, pp. 63–68 (2006)
6. Palladino, D., Walker, B.N.: Learning rates for auditory menus enhanced with spearcons versus earcons. In: Proceedings of the International Conference on Auditory Display (ICAD 2007), Montreal, Canada, pp. 274–279 (2007)
7. Asakawa, C., Takagi, H., Ino, S., Ifukube, T.: Maximum Listening Speeds for the Blind. In: Proceedings of the International Conference on Auditory Display (ICAD 2003), Boston, MA (2003)
8. Leplatre, G., Brewster, S.: Designing Non-Speech Sounds to Support Navigation in Mobile Phone Menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2000), Atlanta, GA, pp. 190–199 (2000)
9. Hereford, J., Winn, W.: Non-Speech Sound in Human-Computer Interaction: A Review and Design Guidelines. Journal of Educational Computing Research 11, 211–233 (1994)
10. Palladino, D., Walker, B.N.: Navigation efficiency of two-dimensional auditory menus using spearcon enhancements. In: Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society (HFES 2008), New York, NY, September 22-26 (2008)
Design and Evaluation of Innovative Chord Input for Mobile Phones

Fong-Gong Wu, Chia-Wei Chang, and Chien-Hsu Chen
Department of Industrial Design, National Cheng Kung University, 1 University Rd., Tainan 70101, Taiwan
[email protected],
[email protected],
[email protected]
Abstract. Text messaging is one of the most popular functions of mobile phones, apart from talking on the phone. This study focuses on how chord input can be applied to mobile phones and on how phones are operated with chord input. We propose two new mobile phone designs, the Tri-joint key and the Four-corner key, which combine chord input with natural finger positioning. Fourteen male and six female participants took part in this research. After 9 days of practice with content consisting of numerals, English characters and English phrases, the participants' performance improved, both in the speed of completing tasks and in accuracy. There was no significant difference between these two new styles of phone and the ordinary type on the user satisfaction measure, which suggests that users could accept new kinds of input devices. Keywords: mobile phones, keyboard, chord input, input device, innovation.
1 Introduction

Mobile phones have been part of our lives for decades; however, the design of their buttons has barely changed. For English text, the 26 letters and the numerals have to be fitted onto 12 buttons for input purposes, so a single button is used for the input of three or four letters by pressing it multiple times. This input method is inefficient and places a burden on muscles and bones. Furthermore, different languages use different input methods on the same buttons. People want and need a bigger display screen; if the buttons occupy a large area of the mobile phone, the size of the screen will be limited, yet minimizing the buttons causes inconvenience for input. The need to increase the functionality of an electronic product while decreasing its size creates a pain point in the field of ergonomics [1].

For electronic products, the user interface can be divided into two types: the GUI (Graphic User Interface) and the SUI (Solid User Interface). The SUI emphasizes the importance of the controls, signs, buttons and knobs, and whether their size, position, and visual, auditory and tactile properties satisfy ergonomic requirements. Despite the increased usage of mobile electronic products, we are still using the traditional input
keyboard. Many keyboards are defined this way, and their size cannot be decreased, which makes them unsuitable for mobile products. Using both hands on a keyboard has also been shown to cause injury to the fingers, wrist and forearm [2]. Baumann and Thomas also pointed out that most electronic products must serve multiple functions, so the idea of one button for one function is no longer viable [3]. Otherwise, a large number of buttons would occupy a large area of the interface, raising the production cost, increasing the burden on the user, and increasing the rate of errors.

The buttons of a mobile phone are an important factor influencing control of the phone; if the phone rests in the palm of the user's hand, posture is improved and stability increases. Fitts' Law [4] is also an important principle to refer to when designing buttons; it has been used as a standard for measuring accuracy and speed [5], and many scholars have used it to investigate these effects [6][7][8][9][10][11]. Their results show that the distance between keys influences the time it takes to key in text, a valuable principle when designing or evaluating keyboards, especially when efficiency is the criterion under investigation (a standard formulation of the law is given after the list of study purposes below). A chord keyboard integrates and reduces the number of buttons on the keyboard, thereby decreasing the movement of the user's hand. This lessens the burden, improves working posture, and ultimately reduces the harm accumulated through operating the keyboard [2].

This study investigates the popular input devices of modern mobile phones and proposes possible innovative designs for text and number input. The design focus is on integrating chord input into the mobile phone buttons and combining it with natural finger positioning. The main purposes of the study are as follows:

1. Through literature review and discussions with professionals, propose new input methods for chord-input mobile phones.
2. Study current user behavior with mobile phones and use it as the basis for the design principles of the new mobile phone.
3. Produce prototypes of the new mobile phone chord-input design.
4. Examine the learning curve of users with the new chord input and compare input efficiency.
5. Propose suggestions for mobile phone chord-input design.
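For reference, and as standard background rather than material taken from this paper, the Fitts' law relationship mentioned above is commonly written in the Shannon form, where MT is the movement time, D the distance to the target key, W the key width, and a and b empirically fitted constants:

MT = a + b \log_2(D/W + 1)

The logarithmic term is the index of difficulty, so keys that are smaller or farther from the finger's resting position take longer to reach, which is why key spacing and size matter so much in the designs evaluated below.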
2 Method and Evaluation

This study consists mainly of three stages: observing current user behavior, designing a new conceptual product, and verifying it in a comparison experiment. The process is supported by statistical analysis, user questionnaires, and a complete evaluation of the results. To control the experiment, users are required to familiarize themselves with the experimental equipment within a certain amount of time, until they reach the same level of understanding of the equipment. To understand the users' psychological responses, a continuous scale is used for the subjective rating tasks. Following the RPE (Rating of Perceived Exertion) scale proposed by Borg [12], satisfaction ratings can be used to evaluate the burden on the hand: as users become more burdened while operating the device, their reported discomfort increases accordingly.
2.1 Observation Before the formal experiment, observations are made of the market and of the natural holding posture of the hand; these serve as references for the new input design. The participants are 17 design-related students at undergraduate level or higher, 11 male and 6 female, with no hand impairments or serious injuries. Most of the participants are right-handed and have a long history of using mobile phones. The task is to input a paragraph consisting of 50 English words and 213 characters, in which every letter occurs more than three times. After the input task, the participant fills in a subjective measurement covering performance estimation, comfort level and psychological satisfaction. According to the observation results, 13 of the 17 users use both hands for support and their thumbs for input, three users use one hand for both support and thumb input, and one user uses one hand for support and the thumb of the other hand for input. The results show that most users prefer to hold the phone with both hands and input with their thumbs. Furthermore, when both hands are used for support, the four fingers behind the mobile phone can be categorized into three types: four fingers curled but not crossed, four crossed fingers, and the forefinger locking the upper section of the mobile phone to form a "C" shape. Looking at the operating behavior on available mobile phones, most users input with the thumb because it has higher mobility, while the other four fingers support the phone. This creates an imbalance between the fingers and overloads the thumb, causing muscle ache and soreness. Chord input, by contrast, can increase efficiency and distribute the load among the fingers. 2.2 Design Development The first phase of the experiment observes existing mobile phone operation behaviors and designs eight prototypes for different holding postures (Table 1). Table 1. Eight different prototype models
The prototypes are designed for a right-handed user and divide the buttons into groups for input by the five fingers. The muscular load is distributed across all the fingers according to their capabilities, and the load on the ring finger and little finger is reduced. This follows a previous study on the most efficient natural gesture and finger positioning [13]. The holding posture and operation of the eight prototypes are evaluated by 10 participants, 8 male and 2 female, all with an undergraduate level of education and more than three years of experience in using a mobile phone. Prototype 1 receives the highest score; a finer model is made of this prototype, and the placement of the buttons in the final design is based on the button positions of this prototype. 2.3 Evaluating Chord Input of Mobile Phones Following the design principles of existing buttons, the numeral buttons are arranged as grouped buttons on the left-hand side of the mobile phone; the grouped buttons contain sub-buttons, which are operated by the four fingers other than the thumb. The function button is placed on the right-hand side of the mobile phone and is operated by the thumb; it has four sub-buttons that are used during input. Pressing different combinations of buttons produces different numerals. Based on this design, two prototypes are built: Tri-joint and Four-corner (Table 2). Table 2. Two designs of chord input for a mobile phone
For each type (Tri-Joint and Four-Corner), the table presents a three-view drawing and the button placement.
Chord input is not applied to the input of numerals. In the English character input mode, the function button operated by the thumb is used together with the number buttons to select the desired character through chord input. The letters retain the placement used on current mobile phones (Table 3).
Table 3. Corresponding chart of chord input

1: A B C D   2: a b c   3: d e f   4: g h i   5: j k l   6: m n o   7: p q r s   8: t u v   9: w x y z   0: (none)
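To make the chord principle concrete, the sketch below (Python) encodes one possible reading of Table 3, in which the letters grouped on each number key are disambiguated by which of the function button's four sub-buttons (labelled A-D here) is pressed by the thumb at the same time; this mapping and the sub-button labels are an assumed interpretation for illustration, not the authors' exact specification.

# Hypothetical chord table: (number key, function sub-button) -> character.
# Letter grouping follows the keypad layout of Table 3; the A-D sub-button
# assignment is an assumption made for this illustration only.
LETTER_GROUPS = {
    2: "abc", 3: "def", 4: "ghi", 5: "jkl",
    6: "mno", 7: "pqrs", 8: "tuv", 9: "wxyz",
}
SUB_BUTTONS = "ABCD"

CHORD_TABLE = {
    (key, SUB_BUTTONS[i]): ch
    for key, letters in LETTER_GROUPS.items()
    for i, ch in enumerate(letters)
}

def chord_to_char(number_key, sub_button):
    # Character selected by pressing a number key together with one of the
    # thumb-operated function sub-buttons; None if the chord is undefined.
    return CHORD_TABLE.get((number_key, sub_button))

print(chord_to_char(7, "D"))  # 's' -- the fourth letter on key 7
print(chord_to_char(2, "A"))  # 'a'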
2.4 Design Mockup Mockups of the Tri-joint and Four-corner prototypes are produced (Table 4); all dimensions follow existing mobile phones, with length × width × height of 100 mm × 45 mm × 16 mm. Table 4. Information about the two prototypes
For each type (Tri-joint and Four-corner), the table shows the prototype view and the operating posture.
2.5 Participants Twenty participants aged between 20 and 30 years (mean age 25.05 years) are chosen, 14 male and 6 female, all with at least an undergraduate education. All participants have more than three years of experience in using mobile phones and sending English text messages, have no hand injuries at the time of the experiment, and are willing to take part in the ten-day course of the study. 2.6 Learning Sessions Before the formal experiment, the participants practice inputting characters and numbers, including numerals, English words and short sentences. The numerals occur at the same rate as the English characters, and input time and error rate are recorded. The learning period lasts nine days: all nine days include numeral input, and English
characters are added from the fourth day, with English short sentences added on the seventh day. 2.7 Final Tasks After all participants have completed the learning stage, the formal experiment begins with two tasks: numeral input and English character input. The items are displayed on a computer screen as slides, each slide containing 5 items and each item containing 5 words; every task comprises 3 slides. For numerals, each item consists of five digits, each slide holds 5 items, and the digits on a slide are displayed in random order; each task thus comprises 15 items, or 75 digits in total. The English character task includes 15 five-letter English words displayed on three slides, with every letter of the alphabet appearing more than twice. Each mobile phone is tested first on numerals and then on English characters. In the formal experiment, the participant also completes a usability questionnaire rating the efficiency, physical comfort and psychological comfort of the mobile phones; numerals and English characters are rated separately.
3 Results The experiment measures the time taken and the number of errors, which are transformed into input speed (characters per minute, CPM) and accuracy (%) for further evaluation and analysis. 3.1 Results for Numerals in the Final Tasks The ordinary mobile phone has the fastest input rate, 82.07 characters per minute, followed by Tri-joint at 66.73 and Four-corner at 51.43 characters per minute. The ordinary mobile phone also leads the two new phones in accuracy, at 99.33%, compared with 98.07% for Tri-joint and 97.33% for Four-corner. Relative to the ordinary mobile phone, Tri-joint reaches 81.31% of its input speed and Four-corner 62.67%; in accuracy, Tri-joint reaches 98.73% of the ordinary phone's level and Four-corner 97.99% (Table 5). Table 5. Statistics for numerals in the formal experiment
                Group                 Mean    SD      Achievement rate
Speed (CPM)     Tri-joint             66.73   16.00   81.31%
                Four-corner           51.43    9.26   62.67%
                normal mobile phone   82.07   19.46
Accuracy (%)    Tri-joint             98.07   20.65   98.73%
                Four-corner           97.33   23.78   97.99%
                normal mobile phone   99.33   12.69
In the homogeneity tests carried out before the ANOVA, the P values for numeral input speed and accuracy are 0.080 and 0.059 respectively, both larger than 0.05. The ANOVA is therefore performed and shows significant differences between the three mobile phones in both speed [F(2, 57) = 19.539, P < 0.001*] and accuracy [F(2, 57) = 5.322, P = 0.008* < 0.05]. Duncan multiple comparison tests show that for input speed the three phones cannot be grouped together, whereas for accuracy Tri-joint and Four-corner fall into the same group. 3.2 Results for English Characters in the Final Tasks The current mobile phone has the fastest average input rate, 23.37 characters per minute, followed by Tri-joint at 18.51 and Four-corner at 16.78 characters per minute (Table 6). Table 6. Statistics for English characters in the formal experiment
                Group                 Mean    SD     Achievement rate
Speed (CPM)     Tri-joint             18.51   3.06   79.2%
                Four-corner           16.78   3.10   71.8%
                normal mobile phone   23.37   9.46
Accuracy (%)    Tri-joint             95.80   2.42   97.69%
                Four-corner           94.80   2.93   96.67%
                normal mobile phone   98.07   1.65
The current mobile phone again leads the other two in accuracy, at 98.07%, with Tri-joint at 95.8% and Four-corner at 94.8%. Relative to the current mobile phone, Tri-joint reaches 79.2% of its input speed and Four-corner 71.8%; in accuracy, Tri-joint reaches 97.69% of the current phone's level and Four-corner 96.67% (Table 6). In the homogeneity tests before the ANOVA, the P values for English character input speed and accuracy are 0.032 and 0.157 respectively, so only the P value for accuracy is larger than 0.05. The ANOVA is therefore performed for accuracy and shows significant differences between the three mobile phones [F(2, 57) = 9.808, P < 0.001*]. Duncan multiple comparison tests show that, in terms of accuracy, Tri-joint and Four-corner fall into the same group. 3.3 Participant Satisfaction The subjective measurement is divided into efficiency rating, physical comfort and usability satisfaction, with seven, eight and five questions respectively. For each question, the three mobile phones are compared and each phone is scored on a continuous scale. The scale carries no numbers, only a 10 cm line from start point to end point; the marked length is measured afterwards and converted to a score from 0 to 10, recorded to one decimal place.
For the efficiency and comfort ratings, the data for the three mobile phones do not pass the homogeneity tests for either numerals or English characters, so no ANOVA is carried out. The usability satisfaction ratings do pass the homogeneity tests, with P values of 0.1 for numerals and 0.133 for English characters, both larger than 0.05. The ANOVA therefore proceeds and shows no significant differences between the three mobile phones for either numerals [F(2, 12) = 0.704, P = 0.514 > 0.05] or English characters [F(2, 12) = 1.753, P = 0.215 > 0.05].
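For readers who wish to reproduce this style of analysis, the sketch below (Python) shows how a homogeneity-of-variance check followed by a one-way ANOVA could be run on per-participant scores; the arrays are invented placeholder data, not the study's raw measurements, and Duncan's multiple-range test is omitted because SciPy does not provide it.

import numpy as np
from scipy import stats

# Placeholder per-participant input speeds (CPM) for the three keypads.
tri_joint    = np.array([65.2, 70.1, 61.8, 68.4, 67.9])
four_corner  = np.array([50.3, 55.0, 49.1, 52.7, 48.9])
normal_phone = np.array([80.5, 85.2, 78.9, 84.0, 81.7])

# Levene's test for homogeneity of variances (p > 0.05 -> proceed with ANOVA).
lev_stat, lev_p = stats.levene(tri_joint, four_corner, normal_phone)

# One-way ANOVA across the three keypad designs.
f_stat, anova_p = stats.f_oneway(tri_joint, four_corner, normal_phone)

print(f"Levene p = {lev_p:.3f}, ANOVA F = {f_stat:.2f}, p = {anova_p:.4f}")
# A post-hoc grouping step (the paper uses Duncan's multiple-range test)
# would require a dedicated statistics package.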
4 Discussion The learning stage is divided into three sections. Participants make many mistakes when they first try the new input devices, for both numerals and English letters. The participants with the best initial scores are still the best at the end; those with lower efficiency improve but do not overtake them. It can be concluded that the selection of participants served this research well. Statistically, Tri-joint is better than Four-corner in the final practice in speed, accuracy and overall progress; however, in terms of the rate of improvement, Four-corner is better than Tri-joint in some respects, indicating that although Four-corner does not perform as well, it has more room for progress. The results for numerals and English characters show significant differences, meaning that learning has a marked effect on the users; the exception is English short sentences, which can be attributed to a lack of practice. The formal experiment shows significant differences in favour of the current mobile phone in both efficiency and accuracy over the two chord-input phones, for numerals as well as English characters, and the Duncan tests confirm that the ordinary mobile phone has a higher input speed than either Tri-joint or Four-corner. The subjective measurements point out that the Tri-joint phone uses the same letter placement as current mobile phones, so participants can locate the corresponding buttons by touch during input; this helps English character input on the Tri-joint phone. The subjective measurement, divided into numerals and English characters and covering efficiency, comfort rating and usability satisfaction, yields no significant findings in the ANOVA. This suggests that the participants do not rate the three mobile phones significantly differently on efficiency, comfort or satisfaction for either numerals or letters; in other words, they can adapt their input to the different devices. Participants also comment that the original input system is so deeply ingrained that Tri-joint is easier to memorize than Four-corner, although some feel that Four-corner's top/bottom, left/right placement would be the better device in the long run and might surpass the three-row layout used in current mobile phones. The efficiency and comfort ratings fall short of current mobile phones because the prototypes differ from familiar phones, causing operational difficulties, and the limited training with chord input also imposes a psychological burden on users of this new input method.
Acknowledgments. The authors would like to thank the National Science Council of the Republic of China for financially supporting this research under Contract No. NSC 95-2221-E-006-097.
References 1. Hare, C.B.: Redefining user input on handheld devices. 3G Mobile Communication Technologies, 388–393 (2002) 2. Rose, M.J.: Keyboard operating posture and actuation force: Implications for muscle over-use. Appl. Ergon. 22(3), 198–203 (1991) 3. Baumann, K., Thomas, B.: User interface design for electronic appliances, London (2001) 4. Fitts, P.: The information capacity of the human motor system in controlling the amplitude of movement. J. Exper. Psychol. 47, 381–391 (1954) 5. Jagacinski, R.J., Repperger, D.W., Ward, S.L., Moran, M.S.: A test of Fitts’ law with moving targets. Human Factors 22, 225–233 (1980) 6. Card, S.K., Moran, T.P., Newell, A.: The psychology of human-computer interaction. Erlbaum, Mahwah (1973) 7. Drury, C.G., Hoffmann, E.R.: A model for movement time on data-entry keyboards. Ergon. 35(2), 129–147 (1992) 8. Hoffmann, E.R.: Effective target tolerance in an inverted Fitts task. Ergonomics 38(4), 828–836 (1995) 9. Danion, F., Duarte, M., Grosjean, M.: Fitts’ law in human standing: the effect of scaling. Neurosci. Letters 277, 131–133 (1999) 10. Sörensen, K.: Multi-objective optimization of mobile phone keymaps for typing messages using a word list. European Journal of Operational Research (2005) 11. Wu, F.G., Luo, S.: Performance Study on Touch-pens Size in Three Screen Tasks. Appl. Ergon. 37(2), 149–158 (2006) 12. Borg, G.: An introduction to Borg´s RPE-Scale. Movement Publications, New York (1985) 13. Eilam, Z.: Human engineering the one-handed keyboard. Appl. Ergon. 20, 225–229 (1989)
The Potential of the BCI for Accessible and Smart e-Learning Ray Adams, Richard Comley, and Mahbobeh Ghoreyshi School of Engineering & Information Sciences, Middlesex University The Burroughs, Hendon, London NW4 4BT, UK {ray.adams,r.comley,mg469}@mdx.ac.uk
Abstract. The brain computer interface (BCI) should be the accessibility solution "par excellence" for interactive and e-learning systems. There is a substantial tradition of research on the human electroencephalogram (EEG) and on BCI systems that are based, inter alia, on EEG measurement. Even so, we have not yet seen a viable BCI for e-learning. For many users, such as those with major psychomotor or cognitive impairments, a BCI-based interface is their first choice for good quality interaction. However, there are many more for whom the BCI would be an attractive option given an acceptable learning overhead, including people with less severe disabilities and safety-critical conditions where cognitive overload or limited responses are likely. Recent progress has been modest, as there are many technical and accessibility problems to overcome. We present these issues and report a survey of fifty papers to capture the state of the art in BCI and the implications for e-learning. Keywords: brain-computer-interface, e-learning, accessibility, disability, artifacts.
1 Introduction It has been argued that the potential of e-learning has never been fully realized. There are, perhaps, many reasons why this may be so, such as (a) a lack of flexibility or ability to detect and reflect the differing requirements of individual users and (b) problems with accessibility such that some learners may be excluded (e.g. those with disabilities). Recent work has focused on the construction and deployment of simple user models (based on a validated theory of human cognition) to improve the flexibility and accessibility of e-learning systems [1]. Applications of the concept of the brain computer interface (BCI), if shown to be valid, could offer partial solutions to these problems, giving e-learning systems the ability to provide flexible, accessible and adaptive learning solutions. One very significant benefit of the BCI approach is its potential to elicit information on the 'state of mind' of an individual learner (e.g. alert, attentive, drowsy, etc.) and hence to tailor the learning activities to the changing requirements of that individual. Also, the use of BCI would allow users with, for example, limited psychomotor performance or cognitive disabilities to participate more fully in education and training.
Most BCI systems rely on non-invasive measurements of the human EEG (electroencephalogram). Technologies such as psychophysiological measurements in general and electroencephalograms (EEG) in particular are not new. Ever since Hans Berger [3] showed that the electrical activity of the brain could be monitored by electrodes placed on the scalp, attempts have been made to link these signals to the underlying activity of the brain. Berger went on to discover that EEG activity was abnormal in epilepsy which, combined with the work of Walter [15] who showed that slowly varying voltages arose near brain tumours, led to the widespread use of the technique for routine clinical diagnosis. Debate has continued over the years as to whether this gross, averaged and distorted signal conveys any meaningful information or is merely an interesting phenomenon [e.g. 12]. However, recent advances have shown that Brain Machine Interfaces (BMIs) are becoming a practical proposition [8]. Future technologies built on them promise to revolutionize the emerging Information Society through the development of effective and acceptable brain–computer interfaces, virtual and augmented realities, and augmented cognition. This paper begins with a critical review of psychological and pragmatic issues that must be understood before these technologies can deliver their full potential. Current work has shown that the concepts of usability and accessibility have rarely been applied explicitly to BCI and augmented cognition research. This is changing, and while this suggests an increased awareness of these concepts and the related large research literatures, the task remains to sharpen these concepts and to articulate their obvious relevance to BCI work [2]. The concept of the brain computer interface (BCI) presents some startling possibilities for enhanced communication and accessibility: BCIs have the potential to help individuals with severe communication and control problems due to disability or extreme circumstances, as well as to give additional input/output channels to anybody who requires or desires non-traditional human-to-system communication tools. The notion of BCI may be simple, but the underlying science is complex. Hence, an effective application of BCI necessitates an adequate appreciation of the underlying science. For this reason, this paper sets out to consider the artifacts, the psychology and the rehabilitation engineering underlying BCI, with particular reference to its use for education and e-learning. There are at least four serious problems that must be faced if the potential of BCI is to be realized. First, the viability of the BCI to deliver a valid reflection of the activity of the human brain must be assured. A normal EEG record contains various artifacts such as the influence of gross motor movements, eye movements, external electromagnetic influences, etc. Second, the BCI must provide enhanced accessibility and the ability to identify certain parameters associated with the individual learner necessary to support the adaptive customization of education and e-learning systems. Third, different populations of users will have different requirements, so the system must be adaptable (i.e. it can be altered before running). Fourth, the requirements of modern users in the Information Society are much more nuanced and demanding; the system must not collapse when faced with unfamiliar user requirements. It should therefore also be adaptive, capable of adapting whilst running.
The overall purpose of this paper is, therefore, to review the current state of the art, with a particular focus on the above.
We propose that an effective BCI system must satisfy the following three axioms: 1. It is possible to take sensitive and reliable measurements of aspects of human brain activity on a non-invasive basis; 2. Aspects of human brain activity can be controlled systematically and dependably by the individual; 3. These measurements of human brain activity can be readily used to control or communicate with interactive systems or to communicate with other people [17]. In addition, we suggest that there are at least three generic requirements that apply to any communication and control system: • Functionality [13], i.e. does it support important, useful and desirable tasks; • Usability [9], i.e. is the system too difficult to use; • Accessibility [7] i.e. are there any barriers that prevent or disadvantage users when using the system?
2 The Human Head and Electrical Signals The human head has three main layers, namely the scalp, the skull and the brain, with many thin layers between them. In passing through the skull, signals are attenuated by approximately one hundred times [11]. The resultant signal that reaches the surface of the scalp is in the order of a few tens of microvolts and represents the average of the activity of a large number of individual neurons firing in the underlying area. Different levels of activity can be picked up depending on the position of the electrode on the surface of the scalp; this, in part, reflects the activities associated with different regions of the brain. The human brain is basically divided into three parts. The cerebrum initiates behavior such as movement, conscious sensation, complex analysis, expression and emotion. The cerebellum is responsible for the co-ordination and control of voluntary movement and for the balance of muscles and body. Finally, the brainstem controls involuntary functions such as heart regulation and hormone secretion [14]. The human body can generate a range of different signals, the primary sources being the brain and the muscles. The electroencephalogram (EEG) and magnetoencephalogram (MEG) are generated by the brain, while the electromyogram (EMG) originates from the nerve impulses to the muscles. In addition, the electrooculogram (EOG), generated by eye movements, and the electrocardiogram (ECG), generated by the heart, are of particular importance: both can lead to a serious disruption of EEG measurement and hence can seriously compromise a BCI. The signals from these other sources may be hundreds of millivolts, i.e. several orders of magnitude greater than the EEG signal. The majority of current BCI applications are concerned with the classification of EEG signals from the brain and their translation into control signals. These control signals give the human participant the power to control the environment and communicate with the outside world by thought alone, without the intervention of physical movement. For example, this could allow control over a computer screen cursor, or an electric wheelchair, just by thinking and imagining left- or right-related movements and receiving feedback from any consequent movements (of cursor or wheelchair, etc.). Different EEG patterns can be obtained and identified, depending on the type of
motor or imagined motor responses [10]. EEGs can be recorded (i.e. BCI data acquisition) by sets of electrodes placed in standard positions on the scalp surface (i.e. a non-invasive system). Other, more invasive systems may rely on implanted electrodes, but that is not the focus of our work. The simple act of collecting the EEG signals for BCI presents enormous practical problems. In a clinical environment, the person undergoing an EEG recording would do so within a very controlled setting, with dimmed lighting and with movements kept to a minimum. For a practical BCI system, neither of these conditions can be assumed and indeed, depending on the type of system, the system itself may require activity, e.g. looking at flashing images or moving the eyes to certain positions on a screen. As previously mentioned, the signals picked up by the scalp electrodes from EMG will swamp those resulting from normal EEG activity. Any BCI system that is to be of practical use must be able to cope with EMG and other spurious pickup in an efficient manner; hence much effort is expended on techniques designed to reject and/or remove artifacts (i.e. any signals that are not the direct result of EEG activity). In a clinical environment, artifact rejection is generally accomplished quite simply by ignoring sections of the signal that appear to be contaminated. This is a valid approach since only small sections of the EEG recording would normally be affected in this way. However, for BCI this is generally not a satisfactory solution: it is likely that large segments of the EEG record will be 'contaminated', and simply ignoring them would render any attempt at real-time classification useless. A BCI system thus needs a reliable method to separate noise and artifacts from the incoming EEG signal (pre-processing), a means of enhancing and/or isolating the features of interest, e.g. specific frequency bands (signal conditioning), a method to identify the presence of specific features (feature recognition), a decision-making process (classification) and finally a suitable output channel to send an appropriate control signal to the application interface (e.g. a wheelchair or computer). The components of the BCI system are connected together as a sequential chain, such that reliable detection of EEG signals forms the start of the chain; if this step fails, the whole system will fail. It is therefore not surprising that much recent effort has been expended in the search for suitable pre-processing and signal conditioning techniques. The detection of 'signals buried in noise' has been a major preoccupation in signal processing for many years. Numerous techniques are available to choose from, but the most successful for EEG artifact rejection are currently based on some form of decomposition and/or cancellation. One particularly powerful method is based on Independent Component Analysis (ICA), which has been shown to be effective in detecting and removing artifacts in EEG arising from a variety of sources (e.g. ECG, EOG, line noise, etc.) [6]. The majority of signal conditioning methods involve some form of filtering, either spatial, temporal or a combination of both. The Common Spatial Patterns (CSP) algorithm has proved successful in the design of spatial filters suitable, for example, for the discrimination of rhythmic brain activity (e.g. beta and theta activity, associated with 'creative thinking').
It has not proved particularly effective when it comes to non-periodic, or temporal, activity (e.g. imagined motor movements). In these cases, Independent Residual Analysis (IRA) has been applied with good results. For many applications (e.g. e-learning in our case) both methods are important, and the most successful implementations to date incorporate both techniques (e.g. [18]).
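A minimal end-to-end sketch of the sequential chain described above (band-pass pre-filtering, a CSP-style spatial filter, log-variance features and a linear classifier) is given below in Python. It assumes epoched two-class EEG stored in a NumPy array and uses synthetic data; it is meant only to illustrate the structure of such a pipeline, not the specific methods of any system surveyed here.

import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bandpass(epochs, fs, lo=8.0, hi=30.0, order=4):
    # Pre-processing: band-pass filter epochs shaped (n_trials, n_channels, n_samples).
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, epochs, axis=-1)

def csp_filters(epochs, labels, n_components=4):
    # Signal conditioning: Common Spatial Patterns via a generalized eigenproblem.
    covs = []
    for cls in (0, 1):
        x = epochs[labels == cls]
        c = np.mean([t @ t.T / np.trace(t @ t.T) for t in x], axis=0)
        covs.append(c)
    vals, vecs = eigh(covs[0], covs[0] + covs[1])   # C0 w = lambda (C0 + C1) w
    idx = np.argsort(vals)
    pick = np.concatenate([idx[: n_components // 2], idx[-n_components // 2:]])
    return vecs[:, pick].T                          # (n_components, n_channels)

def log_var_features(epochs, w):
    # Feature extraction: log-variance of the spatially filtered signals.
    filtered = np.einsum("ck,tks->tcs", w, epochs)
    var = filtered.var(axis=-1)
    return np.log(var / var.sum(axis=1, keepdims=True))

# Usage with synthetic placeholder data (40 trials, 16 channels, 2 s at 250 Hz).
fs = 250
rng = np.random.default_rng(0)
epochs = rng.standard_normal((40, 16, 2 * fs))
labels = np.repeat([0, 1], 20)                      # e.g. imagined left vs. right

x = bandpass(epochs, fs)
w = csp_filters(x, labels)
feats = log_var_features(x, w)
clf = LinearDiscriminantAnalysis().fit(feats, labels)   # classification stage
print("training accuracy:", clf.score(feats, labels))

In a real system the classifier output would then drive the final link in the chain, the control signal sent to the application interface; an ICA-based artifact-removal step would typically precede the band-pass filter.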
BCI, however, is not just about finding solutions to technical difficulties; the types of people who will use the systems are of equal or even greater importance. In order to operate a BCI system, a user must be able to produce brain activity that can be detected and classified with a high degree of reliability and reproducibility. There is a general consensus, based on experimental results, that a user must learn to operate a BCI system. This will require the development of suitable training mechanisms, and has been likened to the experience of a child learning to walk. We expect our focus on e-learning based on BCI to make a significant contribution in this particular aspect of research. There is already evidence that subjects exhibit a high degree of variability in their ability to master the necessary control of their thought processes. This is not necessarily a surprising finding and could be considered akin to learning to master a musical instrument. However, potential BCI users can be broadly classified into one of three main groups, as defined below, each with a common set of demands or requirements.
3 BCI User Groups 3.1 Severely Disabled People Severely disabled people, and those who are totally paralyzed or have little or no control over their motor functions, such as patients with spinal cord injury or locked-in syndrome, often have involuntary eye-blinks and eye-movements as well as facial or behavioral mimicry, producing EEG contamination. (Locked-in syndrome patients are usually aware and awake, but cannot move or communicate due to an almost complete paralysis of nearly all voluntary muscles in the body. However, they often display strange mimicry behavior, such as crying, yawning, stretching, etc., actions that are not considered to be within their normal repertoire of behaviors.) These people are usually isolated from the outside world. Here, EEG-based communication (BCI) has the potential to be a form of assistive technology that provides new channels for them to communicate with and control their environments. But can current BCI systems fulfill all the requirements of these groups? Can BCI systems support writing, arithmetic, spelling, and imaginary mental tasks? The answer is that this can usually be achieved where these tasks are based on simple binary responses (on/off, yes/no, left/right, up/down, etc.). A good example is the "Dasher" system [16]. This system allows an individual to type letters, papers, etc., simply by navigating a moving cursor up or down on the screen. Thus, the interface design is mediated by simple binary responses ("up" versus "down") but still allows the individual to work their way through the letters of the alphabet to create whole words and sentences. The system uses dependencies between letters to present the user with the most likely options first. With practice, performance speed can show significant improvement. Of course, for some individuals, it may be the only way yet available for them to communicate by writing. (It should be added that Dasher is more than just a BCI system and supports a range of response modes.) Such systems are not yet able to support complex mental tasks without the mediation of binary responses by the individual.
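Dasher itself uses a continuous, language-model-driven zooming display. As a much simpler illustration of how a single binary (e.g. up/down) response channel can still reach any letter, the sketch below (Python) repeatedly halves the alphabet; this is not Dasher's algorithm, but it conveys the same idea of typing whole words through a sequence of binary choices.

import string

def select_letter(answer_is_first_half):
    # Narrow the alphabet to a single letter using a sequence of binary answers.
    # `answer_is_first_half` is an iterable of booleans (e.g. decoded "up"/"down"
    # responses from a BCI); about log2(26) ~ 5 answers suffice per letter.
    candidates = list(string.ascii_lowercase)
    answers = iter(answer_is_first_half)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        candidates = candidates[:mid] if next(answers) else candidates[mid:]
    return candidates[0]

# Example: a sequence of up/down answers selecting one of the 26 letters.
print(select_letter([False, True, False, True, True]))

A real speller would, like Dasher, order the candidate letters by a language model so that frequent continuations need fewer binary decisions.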
3.2 Older Adults Older adults (over 60) often present multiple minor motor and cognitive impairments or have slower control over their motor functions. EEG changes often occur in older adults through increasing slow delta waves and are associated with slower EEG rhythms. They may also have slower control over muscle activities of the hands, fingers, etc., and decision-making may also be slower. For this group, the motivation to use BCI is completely different from that of the first group. The second group may need a system that is based on slower reaction times. Such a system could provide long-term or medium-term control over their living environments, but could not yet be relied upon for emergency responses in safety-critical situations. The design of a BCI for older adults should reflect their non-typical EEG profiles and slower response times. 3.3 Able-Bodied People Able-bodied people, by definition, have normal or near-normal control over their psychomotor functions and have typical brain activity rhythms. Thus they should be able to use BCI systems well. However, even in this group problems may arise. At first, the BCI may be enjoyed as a new experience, as a game or as a stimulating new way to learn; for example, imagine the ability to navigate through a virtual reality environment by deploying imaginary movements. However, this group is likely to require the system to be both fast and accurate. We suspect that current EEG-based BCIs are too slow to maintain their interest: imagined control of game direction (left/right imagery based on the mu rhythm of the EEG) is very slow. This group can, however, use other signals such as EMG detection (finger flexion), EOG detection (eye tracking) or some combination of them. This population of users is also likely to fidget, causing problems for the detection system. Finally, BCIs may require significant time in the "make-up room", i.e. a substantial amount of time to adapt to a participant in order to obtain a good-enough signal strength, and this group of users may lack the necessary patience. If so, a new generation of BCIs may be needed that are faster and much more robust to such problems.
4 The Current State of the Art One of the primary attractions of the BCI is its potential to create much better access to interactive systems, particularly for people with significant disabilities such as the locked-in syndrome. Other potential benefits include the ability to monitor an individual for health problems, their emotional status and cognitive overload. Additionally, there is the appeal of being able to control a system, a computer game or an ambient environment through the power of thought alone, provided that the level of difficulty is acceptably low. There has also been research to explore the value of BCI for user authentication [4]. There are viable alternatives to the BCI, including eyetracking, simple or binary switches, etc. In fact, at least one laboratory has abandoned the use of BCI [5]. Many applications of BCI technology, however, still face problems of signal processing and measurement artifacts such as eye-blinks and facial muscle movements (see below).
To explore the current state of the art, we surveyed a sample of fifty research papers from the ACM Digital Library, constrained only by the two search terms "Brain Computer Interface" and "BCI". A set of papers meeting these search criteria was downloaded to a folder and fifty papers were selected at random from that set. This sample of fifty papers is available on request from the authors. The fifty papers were summarized in terms of (a) a paper ID, (b) definition of intended users, (c) artifacts and problems identified and (d) salient features of the paper. The resulting data were subject to quantitative and qualitative evaluation. As found in previous research [2], there was little or no overlap between the BCI and accessibility research literatures, as indicated by a significant lack of cross-referencing. However, there was some (subjective) indication that the literatures of BCI and HCI are slightly converging. First, the fifty papers were divided into (a) those that defined their intended users and (b) those that did not. Only 28 (56%) of the 50 papers defined or described their possible users; thus 22 papers did not define the users. Fifty-six percent is a surprisingly low level, though it can be understood, to some extent, by a focus on the technicalities of BCI systems. When considering EEG-based BCI systems, this sample of papers identified a number of potential artifacts and problems: eye-blinks (n=3), EEG noise (n=3), EMG (n=3), signal attenuation associated with aging, illness and other factors (n=2), low signal-to-noise ratio (n=2), face muscle movement (n=1), eye-movements (n=1), body movements (n=1), EOG (n=1), involuntary movements, particularly with specific clinical conditions (n=1), ECG (n=1) and small sample sizes (n=1). This list reflects the language of the individual papers, so there is some nuanced overlap; nevertheless, it would seem to be a relatively complete list of artifacts. This approach assumes a pure EEG system, but (see below) one researcher's artifact is another researcher's measure. The papers that defined their users were then examined for the groups of users addressed. Users with severe neuromuscular disorders were the most cited group (n=9), followed by people with physical disabilities (n=7), people with cognitive disabilities (n=4), locked-in syndrome as an important sub-group (there are an estimated 500,000 such individuals in the world today; n=4), game-players as a distinct group (n=3), people with brain injuries (n=3), electric wheelchair users (n=2), severely disabled people (n=1), people with spinal cord injuries and, finally, musicians (n=1). Again this list reflects the language used by the authors themselves, but there is clearly some diversity in it. Currently, there is a debate about the merits, relative or absolute, of different BCI systems for different user populations; no doubt this debate will continue, with studies of different alternative systems. Turning to the exact types of measures used, our results were surprising. We had anticipated that a significant majority of our sample would have reported EEG-only based systems. This did not turn out to be so. There were a small number of cases where embedded electrodes were used (i.e. electrodes actually inserted into the human brain; ECoG), but these were infrequent (n=3). Surprisingly, less than half our sample reported the pure use of scalp electrodes for EEG (n=20).
Of those, a subset focused on averaged evoked potentials (n=5). An almost identical number of papers reported EEG plus other measures, including EMG, EOG, heart rate, keys, pedals, buttons and GSR (n=19). Finally, a small group used alternatives to EEG, including EMG,
EOG and GSR (n=7). This may indicate a concern about the value of the EEG as a reliable and informative measure. One response to such a concern would be to combine different measures; another strategy would be to seek to increase the signal-to-noise ratio and to filter out as many artifacts as possible. One important issue is the comparison of different measures or combinations of measures; however, the present sample of papers does not provide sufficient comparisons to allow us to do so, and further work is clearly needed on this issue. Another vital question is the choice of the number of electrodes, which varies widely. There is the 10-20 electrode placement system issued by the International Federation of Electroencephalography and Clinical Neurophysiology in 1958; however, its focus is on defining the positions of electrodes on the human scalp rather than setting an exact number of electrodes to be used in different contexts. Numbers varied from a single (implanted) electrode to 256 electrodes. Some averaging may be useful but also creates delays in the responsiveness of the system. Finally, looking at the uses to which BCIs were put, most were focused on user control of the external environment, though a small number were applied to game playing and to user authentication instead of passwords. In a few cases, EEG or other BCI measures were used to monitor the individual for health status, emotional state, cognitive state, cognitive overload, working memory load, etc. No cases were found of applications to e-learning, even though some potential learners have little or no alternative. These analyses have revealed a number of important conclusions. Whilst there has been a significant move from surgically implanted electrodes to the use of non-invasive scalp electrodes, the cost of this move has been a reduction of signal strength. There is also a clear distinction between pure EEG systems (with consideration of signal-to-noise ratio and artifacts) and multi-measure systems. Clearly there are many agendas in operation, including a better understanding of the human EEG and better, practical control being given to the user. Finally, it is emerging that the calculation of the potential benefit of BCI (and different versions of BCI) to human control of systems depends strongly on a clear definition of the intended user population. If so, we may be a step closer to the use of BCIs for the creation of more accessible e-learning systems for users who would otherwise be excluded.
5 Conclusions We have attempted in this paper to highlight some of the potential of BCI for e-learning, while at the same time recognizing the enormous problems that must be overcome in order to implement even very basic functions. It is now 80 years since the EEG was first identified and almost 40 years since the first publications appeared on BCI-type systems. Only very limited progress has been made in that time, in spite of the enormous advances in semiconductor technology, delivering ever faster and more complex processing engines, in signal processing theory, and indeed in terms of our understanding of human brain function itself. Our survey has shown that the majority of current effort is focused on the technical challenges associated with the capture and processing of EEG activity and, where a target application is identified, most are
concerned with the use of BCI to replace motor type functions, especially for those with significant motor disabilities. The focus of our research is on how BCI may be used to identify mental activity associated with the learning process and thereby augment and enhance the capabilities and accessibility of e-learning systems. We believe this has the potential to revolutionize education and could prove fundamental in the development of BCI itself, providing a ‘bootstrap’ method by which users may be trained to operate the interface. The marriage of BCI and e-learning will provide an adaptive environment through which to enhance the learning process, accessible to all members of society.
References 1. Adams, R., Granić, A.: Creating Smart and Accessible Ubiquitous Knowledge Environments. In: Stephanidis, C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 3–12. Springer, Heidelberg (2007) 2. Adams, R., Bahr, G.S., Moreno, B.: Brain Computer Interfaces: Psychology and Pragmatic Perspectives for the Future. In: Annual convention of the Artificial Intelligence and Simulation of Behaviour (AISB) Society (2008) 3. Berger, H.: Über das Elektrenkephalogramm des Menschen. Arch. Psychiat., 16–60 (1929) 4. Borkotoky, C., Swapnil Galgate, S., Nimbekar, S.B.: Human Computer Interaction: Harnessing P300 Potential brain waves for Authentication of Individuals. In: Compute 2008, Bangalore, Karnataka, India, January 18-20, pp. 1–4 (2008) 5. Felzer, T., Ernst, M., Strah, B., Nordmann, R.: Accessibility Research at the Department of Mechatronics at Darmstadt University of Technology. Sigaccess Newsletter (88), 19–28 (2007) 6. Jung, T.-P., Makeig, S., Humphries, C., Lee, T.-W., McKeown, M., Iragui, V., Sejnowski, T.: Removing electroencephalographic artifacts by blind source separation. In: Psychophysiology, pp. 163–178. Cambridge University Press, Cambridge (2000) 7. Lawrence, S., Giles, C.L.: Accessibility of information on the web. Nature 400, 107 (1999) 8. Lebedev, M.A., Nicolelis, M.A.L.: Brain-Machine Interfaces: past, present and future. Trends in Neurosci. 29(9), 536–546 (2006) 9. Nielsen, J.: Usability engineering. Morgan Kaufmann, N.Y. (1994) 10. Popescu, F., Badower, Y., Fazli, S., Dornhege, G., Muller, K.-R.: EEG-based control of reaching to visual targets. In: Dynamical Principles for neuroscience and intelligent biomimetic devices - Abstracts of the EPFL-LATSIS Symposium 2006, pp. 123–124, 1–2 (2006) 11. Sanei, S., Chambers, J.A.: EEG Signal Processing. Wiley-Interscience, London (2007) 12. Stowell, H.: No future in the averaged scalp. Nature, 1074 (1970) 13. Szykman, S., Racz, J.W., Sriram, R.D.: The representation of function in computer-based design. In: Proceedings of the 1999 ASME Design Engineering Technical Conferences, Las Vegas, Nevada, September 12-15 (1999) DETC99/DTM-8742 14. Teplan, M.: Fundamentals of EEG Measurements. Measurement Science Review 2(2) (2002) 15. Walter, W.G.: The location of cerebral tumours by electroencephalography. The Lancet, 305–308 (1936)
16. Wills, S.A., Mackay, D.J.C.: DASHER – An efficient writing system for Brain-Computer Interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering 14, 244– 246 (2006) 17. Wolpaw, J.R., Birbaumer, N., Heetderks, W.J., McFarland, D.J., Peckham, P.H., Schalk, G., Donchin, E., Quatrano, L.A., Robinson, J., Vaughan, T.M.: Brain-computer interface technology: a review of the first international meeting. IEEE Transactions on Rehabilitation Engineering [also IEEE Trans. on Neural Systems and Rehabilitation 8, 164] (2000) 18. Zhao, Q., Zhang, L.: Temporal and Spatial Features of Single-Trial EEG for BrainComputer Interface. Hindawi Publishing Corporation, Computational Intelligence and Neuroscience (2007) Article ID 37695
Visualizing Thermal Traces to Reveal Histories of Human-Object Interactions Tomohiro Amemiya NTT Communication Science Laboratories 3-1 Morinosato Wakamiya, Atsugi, Kanagawa 243-0198 Japan
[email protected]
Abstract. Traces of human-object interactions remain on objects in the form of thermal information. This paper describes a human memory aid that exploits such traces to create a thermal ‘lifelog’ of one’s interactions with the environment, without disrupting ongoing activities and without any special apparatus or wires. The goal of the aid is to build a digitized surrogate memory to assist in recalling personal experiences. A system with an infrared camera that records the thermal traces left by human-object interactions was fabricated. Measurements obtained with this system can help us understand the nature of thermal traces and be used to develop thermal models that can describe the heat transfer process on object surfaces after contact. Keywords: thermal trace, lifelog, surrogate memory.
1 Introduction We are sometimes forced to attend to things in a specific or scheduled order. However, many people, especially the elderly or those with early Alzheimer's disease, have had the experience of losing track of what they were just doing. In these cases, recording and storing one's interactions with the world can be helpful for recalling one's own experiences. There has been growing interest in the 'lifelog', a concept introduced by the Defense Advanced Research Projects Agency (DARPA) in 2003. A lifelog records and stores a person's experiences in and interactions with the world, and its ultimate goal is the development of a digitized "surrogate memory" to assist in recalling those experiences [1-5]. Several wearable systems, typically equipped with a video camera and a microphone that record the user's visual and auditory experiences, have been developed to support this digitization of personal experiences [6][7]. The physical interactions between the user and surrounding objects can be inferred from visual information [6] or detected by pressure or force sensors [8] or by objects with RFID (radio frequency identification) tags [9][10]. However, the placement and wiring of the sensors may interfere with contact, and it is difficult to judge whether contact has truly happened when the object is occluded by the user's hand (Fig. 1).
Fig. 1. Classification of recording systems based on touch-based interaction (the panels show a camera, an RFID reader with RFID tags, a force/pressure sensor, and an IR camera)
2 Lifelog from Thermal Traces Since the resting temperature of the skin is typically higher than the temperature of the objects it encounters [11], the user and background can be differentiated easily with thermal vision as shown in Fig. 2(a). When the user interacts with an object, the temperature of the object surface increases. Accordingly, this ‘invisible’ evidence of human-object interaction is left on the object surface after contact in the form of thermal information as shown in Fig. 2(b). These unique characteristics of thermal information can be useful in detecting physical interactions and estimating the timing and duration of these interactions. By using thermal traces, a lifelog can be kept without interfering with ongoing activities, even when the objects are occluded by a part of the human body for a short duration. As a first step to achieving this goal, a system that records the thermal traces left by hand-object interactions was created. Measurements obtained with this system can help us understand the nature of thermal traces and be used to develop thermal models that can describe the heat transfer process on object surfaces after contact. The resting temperature of the skin on the hand ranges from 25 to 36 °C [11] and is typically higher than the temperature of objects encountered in the environment. The thermal interaction between the skin and an object in contact with the skin is a transient process. Heat from the skin flows across the skin-object interface and diffuses into the object by conduction. As a result, the region that is close to the contact area heats up and the temperature increases. The amount of heat received by the object and the temperature response of the object during contact can be estimated by using the model proposed in [12]. In general, objects with a high contact coefficient (defined as
Fig. 2. Visual and thermal images (a) during and (b) after typing, walking, and writing
the square root of the product of thermal conductivity, specific heat, and density), such as metals, absorb more heat from the skin than those with low contact coefficients, such as plastics. However, the surface temperature of an object with a high contact coefficient does not change much during contact compared with that of an object with a low contact coefficient. When the hand withdraws from the object after contact, there are three ways that the heat can escape from the heated region: infrared radiation, convection into the air, and conduction into the base of the object. The amount of heat removed by infrared radiation depends on the emissivity of the object surface; for objects with high emissivity, infrared radiation plays a major role in heat removal. The heat can also be removed by convective heat transfer, the amount depending on the natural air flow in the environment. The significance of conduction is determined by the thermal diffusivity of the object: heat left on objects with high thermal diffusivity tends to diffuse into the base faster than heat left on objects with low thermal diffusivity. Since an infrared camera measures the infrared radiation emitted from the object surface, a thermal trace will be more 'visible' when it is left on an object with high emissivity and a low contact coefficient, because there will be a larger increase in the surface temperature during contact.
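As a rough numerical illustration of the contact coefficient defined above, the sketch below (Python) computes b = sqrt(k*rho*c) for skin and two object materials and the classical interface temperature of two semi-infinite bodies brought into contact. The material constants are approximate textbook values, not measurements from this work, and the semi-infinite-body formula is a simplification of the fuller model in [12].

import math

# Approximate material constants (typical literature values, assumed here):
# thermal conductivity k [W/(m*K)], density rho [kg/m^3], specific heat c [J/(kg*K)]
MATERIALS = {
    "skin":   (0.37, 1100.0, 3400.0),
    "copper": (400.0, 8960.0, 385.0),
    "ABS":    (0.25, 1050.0, 1500.0),
}

def contact_coefficient(k, rho, c):
    # Thermal contact coefficient (effusivity) b = sqrt(k * rho * c).
    return math.sqrt(k * rho * c)

def interface_temperature(mat_a, t_a, mat_b, t_b):
    # Contact temperature of two semi-infinite bodies, weighted by effusivity.
    b_a = contact_coefficient(*MATERIALS[mat_a])
    b_b = contact_coefficient(*MATERIALS[mat_b])
    return (b_a * t_a + b_b * t_b) / (b_a + b_b)

# A 33 degC hand touching 24 degC objects: the high-effusivity metal keeps the
# interface close to room temperature, while the plastic surface warms markedly.
print(round(interface_temperature("skin", 33.0, "copper", 24.0), 1))  # ~24.3
print(round(interface_temperature("skin", 33.0, "ABS", 24.0), 1))     # ~29.9

This is consistent with the observation above that high-contact-coefficient objects absorb more heat yet show little surface temperature change, whereas low-contact-coefficient plastics leave a more visible thermal trace.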
3 Daily Scenarios The results of a pilot study [13] indicate that the visibility of a thermal trace left on an object's surface after contact depends on the material of which the object is made. Thermal traces may be more visible and persist longer on materials with both a high emissivity and a low contact coefficient (e.g., the thermoplastic acrylonitrile butadiene styrene, ABS) than on those with low emissivity and a high contact coefficient (e.g., copper). Materials such as copper are not suitable for the IR-based lifelog system because their extremely low emissivity makes them act like an infrared mirror. In the following experiments, we focused on objects made of plastic. 3.1 Detection of Hand-Object Interaction Consider a situation where a cup has been broken, and we want to know whether it was broken intentionally or accidentally. From a conventional video, we may not be able to determine what happened because of occlusion [as shown in Fig. 3(a)], which is often the case for surveillance cameras. With an infrared camera and a proper thermal model of the objects, we can distinguish whether the cup was broken intentionally or not [Fig. 3(b) and (c)]. A thermal trace on an object made of ABS will disappear about 20 seconds after contact, so the proposed system should be able to visualize such a trace within 10 seconds.
Fig. 3. Robustness against occlusion by a body part. (a) Normal vision cannot distinguish whether (b) the cup was broken intentionally or (c) accidentally.
3.2 Influence of Duration of Contact The duration of contact can influence the temperature response of the thermal trace. The results of the pilot study [13] indicate that exponential functions can describe the general trend of the cooling process on the surface after hand-object interactions. Figure 4 shows two bowls made of the same material (polyvinyl chloride). The left one in the picture was released from the holder's hand first, and then the right one was released. When the contact time was longer, the surface temperature was higher than that elicited by shorter contact times.
Numerous measurements would have to be performed in order to determine the exponential function for every object and condition that could appear in the proposed system. The time and labour needed for these measurements would be considerable. This drawback can be overcome by developing an analytical or numerical thermal model to describe the heat transfer process after contact and by using the experimental data to validate the thermal model. With a proper thermal model, it would be possible to estimate the timing and duration of the hand-object interactions based on the thermal traces.
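To illustrate how an exponential cooling law could be fitted to measured surface temperatures in order to estimate when contact ended, the sketch below (Python) fits T(t) = T_amb + dT*exp(-t/tau) to synthetic data with scipy.optimize.curve_fit; the numbers are invented for illustration, and the single-exponential form is only the simple empirical model suggested by the pilot study, not the full thermal model discussed above.

import numpy as np
from scipy.optimize import curve_fit

def cooling_model(t, t_ambient, delta_t, tau):
    # Single-exponential surface cooling after the hand is withdrawn.
    return t_ambient + delta_t * np.exp(-t / tau)

# Synthetic thermal-trace readings (seconds, degC) -- illustrative only.
time_s = np.linspace(0, 20, 21)
temp_c = cooling_model(time_s, 24.0, 5.0, 6.0) \
         + np.random.default_rng(1).normal(0, 0.05, time_s.size)

params, _ = curve_fit(cooling_model, time_s, temp_c, p0=(25.0, 4.0, 5.0))
t_amb, d_t, tau = params
print(f"ambient ~ {t_amb:.1f} degC, initial rise ~ {d_t:.1f} degC, "
      f"time constant ~ {tau:.1f} s")

# Inverting the fitted model gives a rough estimate of how long ago contact
# ended, given the current surface temperature of the trace.
current = 26.0
elapsed = -tau * np.log((current - t_amb) / d_t)
print(f"estimated time since contact: {elapsed:.1f} s")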
Fig. 4. The effect of contact duration. The two bowls are made from the same material (polyvinyl chloride). The left one in the picture was released first, and then the right one. A longer contact duration leaves a longer-lasting thermal trace on the object surface.
4 Conclusion This paper presented a lifelog system that uses thermal vision to visualize the 'invisible traces' produced during hand-object interactions. The results for two daily scenarios indicate that a thermal trace left on an object's surface after contact can be visualized with an infrared camera. Thermal traces may be more visible and persist longer on materials having a high emissivity and a low contact coefficient than on those having low emissivity and a high contact coefficient. The duration of contact can also influence the temperature response of the thermal trace. From these findings, it is possible to infer the material properties of objects and the timing and duration of hand-object interactions on the basis of the thermal traces and a thermal model that describes the heat transfer process on object surfaces after contact. With these capacities, the infrared system can be used to detect physical interactions between humans and objects and to log and 'compile' the temporal thermal information into a history of human-object interaction.
482
T. Amemiya
Acknowledgement. This study was supported by Nippon Telegraph and Telephone Corporation. The author thanks Dr. Hsin-Ni Ho for her valuable comments on thermal modelling.
References 1. Lamming, M., Flynn, M.: “Forget-me-not” Intimate Computing in Support of Human Memory. In: Proc. of FRIEND21 Symposium on Next Generation Human Interfaces (1994) 2. Rhodes, B.: The Wearable Remembrance Agent: A system for augmented memory. In: Proc. of the 1st International Symposium on Wearable Computers, pp. 123–128 (1997) 3. Gemmel, J., Williams, L., Wood, K., Lueder, R., Bell, G.: Passive capture and ensuring issues for a personal lifetime store. In: Proc. of the 1st ACM Workshop on Continuous Archival and Retrieval Personal Experiences, pp. 48–55 (2004) 4. Tancharoen, D., Yamasaki, T., Aizawa, K.: Practical experience recording and indexing of Life Log video. In: Proc. of the 2nd ACM workshop on Continuous archival and retrieval of personal experiences, pp. 61–66 (2005) 5. Gemmell, J., Bell, G., Lueder, R., Drucker, S., Wong, C.: MyLife Bits: fulfilling the Memex vision. In: Proc. of the 10th ACM international conference on Multimedia, pp. 235– 238 (2002) 6. Mann, S.: ‘WearCam’ (The Wearable Camera): Personal imaging systems for long-term use in wearable tetherless computer-mediated reality and personal photo/videographic memory prosthesis. In: Proc. of the 2nd IEEE International Symposium on Wearable Computers, p. 124 (1988) 7. Ueoka, R., Hirota, K., Hirose, M.: Study of wearable computer for subjective visual recording. In: Proc. of HCI International 2003, pp. 350–354 (2003) 8. Silva, J.G., Carvalho, A.A., Silva, D.D.: A strain gauge tactile sensor for finger-mounted applications. IEEE Trans. on Instrumentation and Measurement 51(1), 18–22 (1991) 9. Kawamura, T., Fukuhara, T., Takeda, H., Kono, Y., Kidode, M.: Ubiquitous Memories: a memory externalization system using physical objects. Personal and Ubiquitous Computing 11(4), 287–298 (2007) 10. Hirose, Y., Ikei, Y., Hirota, K., Hirose, M.: iFlashBack: A Wearable Electronic Mnemonics to Retain Episodic Memory Visually Real by Video Aided Rehearsal. In: Proc. of IEEE Virtual Reality Conference, pp. 273–274 (2005) 11. Verrillo, R.T., Bolanowski, S.J., Checkosky, C.M., McGlone, F.P.: Effects of hydration on tactile sensation. Somatosensory and Motor Research 15, 93–108 (1998) 12. Ho, H.-N., Jones, L.A.: Thermal model for hand-object interactions. In: Proc. of the IEEE Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 461–467 (2006) 13. Ho, H.-N., Amemiya, T., Ando, H.: Revealing Invisible Traces of Hand-Object Interactions with Thermal Vision. In: Proc. of the 2007 InfraMation Conference, pp. 431–438 (2007)
Interacting with the Environment through Non-invasive Brain-Computer Interfaces Febo Cincotti1, Lucia Rita Quitadamo1,2,3,4, Fabio Aloise1,4, Luigi Bianchi1,2,3, Fabio Babiloni5, and Donatella Mattia1 1
Neurofisiopatologia Clinica, Fondazione Santa Lucia, IRCCS, Rome, Italy 2 Dipartimento di Neuroscienze, Università Tor Vergata, Rome, Italy 3 Centro di Biomedicina Spaziale, Università Tor Vergata, Rome, Italy 4 Dipartimento di Ingegneria Elettronica, Università Tor Vergata, Rome, Italy 5 Dipartimento di Fisiologia e Farmacologia Umana, Università Sapienza di Rome, Italy
[email protected],
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. The brain computer interface (BCI) technology allows a direct connection between brain and computer without any muscular activity required, and thus it offers a unique opportunity to enhance and/or to restore communication and actions into external word in people with severe motor disability. Here, we present the framework of the current research progresses regarding noninvasive EEG-based BCI applications specifically devoted to interact with the environment. Despite of the technological advancement, the operability of a BCI device in an out-laboratory setting (i.e. real-life condition) still remains far from being settled. The BCI control is indeed, characterized by unusual properties, when compared to more traditional inputs (long delays, noise with varying structure, long-term drifts, event-related noise, and stress effects). Current approaches to this are constituted by post hoc processing the BCI signal in order to better conform to traditional control. A long-term approach is to devise novel interaction modalities. In this regard, BCI can offer an unusual and compelling testing ground for new interaction ideas in the Human Computer Interaction field. Keywords: BCI, EEG, Applications, Functional Model, Standards.
1 Introduction A Brain-Computer Interface (BCI) is a direct communication pathway between the user’s brain and an external device [1, 2]. From a neurological point of view, BCIs bypass the user’s peripheral nervous system (nerves) and his/her muscles, establishing a direct connection between the central nervous system (brain) and the environment the user operates in. In this interaction paradigm, it is not needed that the user contracts even a single muscle (e.g. to press a button, to vocalize his/her intent, or to direct his/her gaze), because the interface is able to recognize specific commands by recognizing his/her “brain states”. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 483–492, 2009. © Springer-Verlag Berlin Heidelberg 2009
484
F. Cincotti et al.
The concept of direct pathway does not necessarily imply a physical contact with the brain. Many BCI devices have been proposed, which are based on non-invasive detection of correlates of brain functioning, such as bioelectromagnetic fields (exploited in electroencephalography- EEG- and Magnetoencephalography –MEG[3]) and concentration of metabolic products (exploited in Functional Magnetic Resonance Imaging – fMRI[4, 5] – and Near Infrared Spectroscopy – NIRS[6]). At present, practical considerations (such as cost and portability) only permit the use of EEG-based BCIs as viable interfaces for a real-life use. In this paper, we will present the framework of the current research progresses regarding the field of non-invasive EEG-based BCI applications specifically devoted to interact with the environment. First, the neurological bases of the BCI control signals will be described with particular attention to event-related potential signals and the modulation of the sensorimotor rhythms; a second body of the paper will deal with the steps from the detection of BCI control signals to the translation of these electrophysiological signals into a semantic signals which is meaningful for the BCI interface application; as third issue we will introduce some applications of the BCI in terms of environmental interaction implemented in different related projects; finally, we will outline open questions related to the operability of a BCI device such as theoretical modeling and standardization whose ultimate goal is a formal introduction to a BCI general framework.
2 Neurological Principles EEG-based BCIs rely on automated detection of the user's intent, based on the classification of patterns of his/her brain waves. The most successful approaches are the P300[7,8], steady-state visual evoked potentials (SSVEP)[9, 10] and oscillatory components based BCIs[11]. In all cases, the user has to interact with the system through an interface which functions as an online feedback to the users. However, whereas the oscillatory components based BCIs require the user to learn regulation of the target EEG response by means of the online feedback, in the event-related potential-based BCIs, an evoked brain potential is elicited by external stimuli and learning of voluntary brain regulation is not necessary (Fig.1). The P300 event-related brain potential is a positive endogenous potential which occurs over the parietal scalp region when infrequent or particularly significant stimuli are interspersed with frequent or routine stimuli [12]. Because of its stability and reproducibility, the P300 has been proposed as a control signal for brain computer interface (BCI) systems [13]. The P300-based brain computer interface presents e.g. 36 characters on the screen that flashed up in a random order. The subject controlling the BCI has to look at the character he wants to spell. Whenever, the desired target character will flash up, the P300 component is produced in the brain and this reaction can be analyzed with parameter extraction and classification algorithms. The P300 concept has been the classical way to create a spelling device (see below; [8]). In the SSVEP based BCIs, the subject is presented with flashing lights that flicker with a specific frequency of e.g. 11, 12, 13 and 14 Hz. This interface presentation can generate, when the subject is looking at one of the lights, an EEG signal exactly at this frequencies and this will be detected with signal processing methods in the EEG raw data. Therefore a cursor movement (BCI application to train people) can be generated for instance, with 4 lights. Finally, in the BCI based on the EEG oscillatory
Interacting with the Environment through Non-invasive Brain-Computer Interfaces
485
components, selective EEG rhythm changes relevant to motor performances can be detected and function as control signal. In particular, in this type of interaction, the subject is confronted with a cursor moving on a screen and he/she has to learn to control the cursor movement toward a given target; the learning is based on a motor imagery task (e.g. a right-left hand/foot movement) that will induce an event-related desynchronization (ERD) of the motor–related EEG rhythms. Indeed, it is well known that sensorimotor rhythms (SMRs; in the alpha and lower beta EEG frequency ranges) decreases or desynchronizes with movement, preparation for movement or movement imagery and increases or synchronizes in the post-movement period [14]. This ERD can be again detected over the scalp motor regions, with dedicated parameters extraction and classification algorithms [15]. At the present state of the art, several BCI systems can be operated with all 3 EEG signals [1].
Fig. 1. The scheme illustrates the several conceptual steps fundamental to a classical EEG – based Brain Computer Interface (BCI) loop. The user’s intent is expressed by means of different attentive-cognitive tasks (for instance, attention to a “desired” target or motor imagery); these tasks induced a modification of the brain electrical activity (EEG evoked potentials or EEG oscillatory components). The stability of this link is achieved during the user’s training. Thus, some relevant features are extracted by the brain signals (EEG), classified and eventually translate into an action on the external world. What is crucial in this paradigm is the feedback to the user of her/his rain signal modulations which is instrumental for interacting via a BCI channel.
3 From Brain States to Control of Devices The two main functional blocks of a BCI system are the Transducer and the Control Interface [16; 17], and four main steps are needed to convert brain states into control of devices, namely biosignal collection, extraction of relevant features, decoding of feature patterns, and translation into a control signal (Fig. 2).
486
F. Cincotti et al.
Fig. 2. Functional model of a BCI system
The Transducer deals with the collection of data, the extraction of the features of interest and the first step of translation. In the collection phase, bioelectrical signals are collected from the surface of the scalp using electrodes, whose number ranges from 2 (simple applications) to 128 (brain mapping studies). These signals, whose amplitude is just a few microvolts, are amplified, digitized and sent to a processor. In the processing phase, relevant features are extracted from biosignals (Features Extractor). Processing may consist in averaging over a few repetitions of the same response of the brain to an external stimulus (as in the case of BCIs based on P300, see below), or in the analysis of spectral properties of the electroencephalographic signal (as in the case of BCIs based on sensorimotor rhythms). Then, these features are combined (linearly or nonlinearly), by means of the Features Translator, into a logical signal. These logical signals can be mapped into logical symbols (LS) that may either be an analogical signal (e.g. a degree of displacement from baseline values), or a discrete output (e.g. an actual classification). LSs constitute the input to the Control Interface which, in the second step of the translation, converts them, by means of some encodings, into a semantic symbol (SS), which is meaningful for the application control interface (e.g., in the case of a computer application, how many pixels a cursor should be displaced, which command was selected in a menu, etc.).Semantic symbols are finally translated into physical controls for the actual application, which may consist of, for instance, a computer program, an assistive device, a robot, or a domotic environment [18].
4 Current BCI Applications A non-exhaustive list of non-invasive BCI applications is given in the following. The aim is to provide a summary of current applications of BCI, and to form the base to discuss open issues. First, intuitive BCI application is communication and environmental control. Recently, non invasive EEG-based BCIs have gained interest as a control modality for robotic devices to increase mobility (like a wheelchair). 4.1 Support to Communication In a classical BCI application to increase or even to allow communication, introduced by Farwell and Donchin in 1988 [19], the subject can “mentally” select letters from a screen to write sentences. In this type of interaction, based on P300 control signals,
Interacting with the Environment through Non-invasive Brain-Computer Interfaces
487
users are presented with a 6x6 matrix where each of the 36 cells contains a letter or a symbol. The paradigm design is such that each row and column is intensified for 100 ms in random order and then, by instructing participants to attend to only one (the desired) of the 36 cells. Thus, in one trial of 12 flashes (6 rows and 6 columns), the target-desired cell will flash only twice constituting a rare event, compared to the 10 flashes of all other rows and columns and will therefore elicit a P300 [8]. Although with the limit of a low rate of information transfer (about 2 letters per minute, [20]) this BCI application has been successful in allowing communication in persons who have lost control of their muscles but have cognitive function preserved (so called locked-in syndrome) [21]. The “mental” text entry application ‘Hex-o-Spell’ incorporates principles of Human-Computer Interaction research into BCI feedback design [22]. Indeed, here the focus is on the challenge in designing a “mental” typewriter: to map a small number of BCI control states (typically two) to the high number of symbols (26 letters plus punctuation marks) while accounting for the low signal to noise ratio in the control signal. The system utilizes the high visual display bandwidth to help compensate for the extremely limited control bandwidth which operates with only two mental states (usually motor imagery of right hand and right foot), where the timing of the state changes encodes most of the information. The display is visually appealing (a moving arrow), and control is robust [23]. One of the aims of Hex-oSpell is to make the best use of the language model to reduce the effort required to enter text, without inducing enormous cognitive load or extensive training time. There are four common approaches to introducing language models into text entry systems: post hoc interpretation (e.g. as used in T9); adaptive target resizing (as in Dasher, [24]); dynamics adjustment (as in the original Hex); and layout re-ordering (used in Hex-o-Spell). The re-arrangement strategy does require visual search at every new letter input, but the minimal reorganization algorithm used in Hex-o-Spell significantly reduces the impact of this. Compared to other potential entry styles, such as Dasher or grid selection mechanisms, Hex-o-Spell is also very visually compact; the hexagonal display can potentially be used as a small overlay on top of a text being edited, giving the user an overview of the context in which they are editing. 4.2 Domotic Control In this section, we will illustrate a pioneering introduction of the BCI technology into the principles of the assistive technology (AT) devoted to the people’s daily life interaction with the environment. This introduction was one the aims of a project named ASPICE (Italian Telethon Foundation, GUP03562), addressing the implementation and validation of a technological aid that allows people with motor disabilities to improve or recover their mobility and communicate within the surrounding environment [18]. The key elements of the system are: (1) Interfaces for easy access to a computer: mouse, joystick, eye tracker, voice recognition, and utilization of signals collected directly but non-invasively from the brain using an EEG-based BCI system. 
The rationale for the multiple access capacities was twofold: (i) to widen the range of users, but tailoring the system to the different degrees of patient disability; (ii) to track individual patient’s increase or decrease (because of training or reduction of abilities, respectively) to interact with the system, according to the residual muscular activity
488
F. Cincotti et al.
present at the given moment of the disease course and eventually to learn to control the system with different accesses (up to the BCI) because of the nature of neurodegenerative diseases which provoke a time progressive loss of strength in different muscular segments. (2) Controllers for intelligent motion devices that can follow complex paths based on a small set of commands. (3) Information transmission and domotics that establish the information flow between subjects and the appliances they are controlling. Implementation of the prototype system core took advantage of advice and daily interaction with the users. It was eventually realized as follows. The core unit received the logical signals from the input devices and converted them into commands that could be used to drive the output devices. Its operation was organized as a hierarchical structure of possible actions, whose relationship could be static or dynamic. In the static configuration, it behaved as a “cascaded menu” choice system and was used to feed the feedback module only with the options available at the moment (i.e. current menu). In the dynamic configuration, an intelligent agent tried to learn from use which would have been the most probable choice the user will make. The user could select the commands and monitor the system behavior through a graphical interface (Fig. 3)
Fig. 3. A possible appearance of the feedback screen (icons in the graphical interface), including a feedback stimulus from the BCI (cursor moving on a screen towards a given target that is controlled by the user)
The prototype system allowed the user to operate remotely electric devices (e.g. TV, telephone, lights, motorized bed, alarm, and a front door opener) as well as monitoring the environment with remotely controlled video cameras. While input and feedback signals were carried over a wireless communication, so that mobility of the user was minimally affected, most of the actuation commands were carried via a powerline-based control system.
Interacting with the Environment through Non-invasive Brain-Computer Interfaces
489
4.3 Robot Control The non-invasive BCI technology has been successfully integrated with a complex robotic device for the continuous “mental” control of a wheelchair. This integration was one of main achievement of the MAIA (FP6-003758) project [25]. In this type of human-computer interaction via BCI technology, the subject was confronted with a display which simulated the robotic wheelchair, being in a first person view. The subjects were instructed to execute three mental tasks (imagination of movement, rest, and words association), and 2 tasks utilized as mental commands to operate the wheelchair, in a self-paced way. The mental task to be executed was selected by the operator in order to counterbalance the order, while the subjects decided when they started to execute the mental task. In successive experiments, the subject was asked to mentally drive both a real and a simulated wheelchair from a starting point to a goal along a pre-specified path. The pre-specified path was divided into seven stretches to assess the system robustness in different contexts. To further assess the performance of the brain-actuated wheelchair, subjects participated in a second experiment where he was asked to drive the simulated wheelchair following 10 different complex and random paths never tried before. Also, they can autonomously operate the BCI over long periods of time without the need for adaptive algorithms externally tuned by a human operator to minimize the impact of EEG non-stationarities. This is possible because of two key components: first, the inclusion of a shared control system between the BCI system and the intelligent simulated wheelchair; second, the selection of stable user-specific EEG features that maximize the separability between the mental tasks.
5 Open Issues: Transducer Features, Real World Applications, and Standardization Current research in the BCI field faces advancements in several aspects of its functioning. Here we will outline three main categories: intrinsic features of the BCI Transducer, deployment of BCIs in real world settings, theoretical modeling and standardization. Improving intrinsic features of the BCI transducer may regard many aspects, such as increasing the transfer rate of information, improving classification accuracy, speeding up training and calibration phases, addressing the “illiterates” issue. All of these issues are important as they can furnish disabled people a communication mean that is as much similar as possible to that of heal people and can lighten the communication load for them. Also, as the main purpose of BCI system is to help people to achieve some degree of independence in their daily life, usability of the interface in a non-laboratory setting is a fundamental aspect to consider and cannot prescind from the reduced obtrusiveness of sensors, the ease of configuration and operation, the on demand operability, dependability, portability and robustness of physical devices. The last open issue regarding BCI systems is the need of a theoretical model that is able to describe all the features of different BCI implementations and of a standardization of all the BCI-related components that can create a common BCI language.
490
F. Cincotti et al.
Formal models have been proposed to describe the general functioning of a BCI system [16, 26]; these models are important because they separately identify the main functional BCI blocks in all their functions and allow for combining and tuning them according to the final applications. In particular, the model defined in [26] describes, with unique static and temporal structures, different BCI implementations currently available in the literature, thus demonstrating that a unification of resources, and so their dissemination, in BCI research is possible. A standard model, in fact, leads to standards modules, for the implementation of BCI systems that can be independently designed and then matched or replaced according to the final application. Particular components of that standard should be interchangeable and independent (so different versions of each can be used without changes anywhere else in the system). Standards modules can finally lead to low-level technical standards that are not less important than the previous ones: for example, certain technical aspects, such as the layout of electrode connectors, are somewhat arbitrary. Because connectors are a mature technology, the definition of a standard for electrode connectors would provide the advantage of standards (i.e., improved interoperability) without being impacted by the disadvantage of that standard (i.e., stifled innovation in the area of electrode connectors). Technical standards have advantages and disadvantages that need to be considered. Use of technical standards can improve interoperability of components and thereby generally lessen the need for development and use expertise. FDA/CE certification is typically less costly. Technical standards might also provide the foundation to help solve possible future legal disputes arising from BCI development. On the other hand, technical standards might also stifle innovation in any area defined by a particular standard. Therefore, the choice of which areas should be standardized is an important one. In summary, standards should be chosen so that they specify only the interface between, but not the specific implementation of, particular BCI system components The standard should facilitate interaction among researchers. It should be practical so that it can facilitate diffusion and should not be covered by intellectual property protection such as patents.
6 Conclusions BCI research is a highly interdisciplinary field. Input is needed from clinical, engineering, neuroscience, psychology, and other fields, and interdisciplinary collaborations are required for further progress in BCI development. BCI offers an unusual and compelling testing ground for new interaction ideas in the HCI field. In fact, BCI control is characterized by unusual properties, when compared to more traditional inputs: long delays, noise with varying structure, long-term drifts, event-related noise, and stress effects. The current remedy to this is constituted by post hoc processing the BCI signal so that it better conforms to traditional control. Another possible long term approach is to devise novel interaction modalities. BCI should not be treated as if it were a “noisy” mouse; rather, unconventional interaction paradigms should be explored, independently from “Windows, Icons, Menus and Pointing devices” (WIMP) interfaces. This approach is a crucial cross-point in the SM4ALL[27] project, in which the one of the goals related to BCI is to go beyond
Interacting with the Environment through Non-invasive Brain-Computer Interfaces
491
command/execute but rather to infer user’s intention based on probabilistic notions and contextual information. Finally, a BCI technology could monitor the subject mental state (i.e. stress, detection of errors, attention) and adapt the dynamics of interactions appropriately. This novel approach represents a fundamental step in the TOBI [28] project, where the BCI technology will be moved towards applications in real life context. Acknowledgments. Part of the presented work is supported by FP7-224332 SM4ALL project; FP7-224156 TOBI project; DCMC Project of the Italian Space Agency.
References 1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Braincomputer interfaces for communication and control. Clin. Neurophysiol. 113(6), 767–791 (2002) 2. Kübler, A., Neumann, N.: Brain-computer interfaces–the key for the conscious brain locked into a paralyzed body. Prog. Brain. Res., 150513–150525 (2005) 3. Mellinger, J., Schalk, G., Braun, C., Preissl, H., Rosenstiel, W., Birbaumer, N., Kübler, A.: An MEG-based brain-computer interface (BCI). Neuroimage. 36(3), 581–593 (2007) 4. Yoo, S., Fairneny, T., Chen, N., Choo, S., Panych, L.P., Park, H., Lee, S., Jolesz, F.A.: Brain-computer interface using fMRI: spatial navigation by thoughts. Neuroreport. 15(10), 1591–1595 (2004) 5. Weiskopf, N., Mathiak, K., Bock, S.W., Scharnowski, F., Veit, R., Grodd, W., Goebel, R., Birbaumer, N.: Principles of a brain-computer interface (BCI) based on real-time functional magnetic resonance imaging (fMRI). IEEE Trans. Biomed. Eng. 51(6), 966–970 (2004) 6. Coyle, S.M., Ward, T.E., Markham, C.M.: Brain-computer interface using a simplified functional near-infrared spectroscopy system. J. Neural. Eng. 4(3), 219–226 (2007) 7. Nijboer, F., Sellers, E.W., Mellinger, J., Jordan, M.A., Matuz, T., Furdea, A., Halder, S., Mochty, U., Krusienski, D.J., Vaughan, T.M., Wolpaw, J.R., Birbaumer, N., Kübler, A.: A P300-based brain-computer interface for people with amyotrophic lateral sclerosis. Clin. Neurophysiol. 119(8), 1909–1916 (2008) 8. Sellers, E.W., Donchin, E.: A P300-based brain-computer interface: initial tests by ALS patients. Clin. Neurophysiol. 117(3), 538–548 (2006) 9. Müller-Putz, G.R., Pfurtscheller, G.: Control of an electrical prosthesis with an SSVEPbased BCI. IEEE Trans. Biomed. Eng. 55(1), 361–364 (2008) 10. Allison, B.Z., McFarland, D.J., Schalk, G., Zheng, S.D., Jackson, M.M., Wolpaw, J.R.: Towards an independent brain-computer interface using steady state visual evoked potentials. Clin. Neurophysiol. 119(2), 399–408 (2008) 11. Wolpaw, J.R., McFarland, D.J.: Control of a two-dimensional movement signal by a noninva-sive brain-computer interface in humans. Proc. Natl. Acad. Sci. U. S. A. 101(51), 17849–17854 (2004) 12. Sutton, S., Braren, M., Zubin, J., John, E.R.: Evoked-potential correlates of stimulus uncertainty. Science 150(700), 1187–1188 (1965) 13. Donchin, E., Spencer, K.M., Wijesinghe, R.: The mental prosthesis: assessing the speed of a P300-based brain-computer interface. IEEE Trans. Rehabil. Eng. 8(2), 174–179 (2000)
492
F. Cincotti et al.
14. Pfurtscheller, G., Aranibar, A.: Evaluation of event-related desynchronization (ERD) preceeding and following voluntary self-paced movement. Electroencephalogr. Clin. Neurophysiol. 46(2), 138–146 (1979) 15. Pfurtscheller, G., Neuper, C.: Future prospects of ERD/ERS in the context of braincomputer interface (BCI) developments. Prog. Brain. Res., 159433–159437 (2006) 16. Mason, S.G., Birch, G.E.: A general framework for brain-computer interface design. IEEE Trans. Neural. Syst. Rehabil. Eng. 11(1), 70–85 (2003) 17. Bianchi, L., Quitadamo, L.R., Garreffa, G., Cardarilli, G.C., Marciani, M.G.: Performances evaluation and optimization of brain computer interface systems in a copy spelling task. IEEE Trans. Neural. Syst. Rehabil. Eng. 15(2), 207–216 (2007) 18. Cincotti, F., Mattia, D., Aloise, F., Bufalari, S., Schalk, G., Oriolo, G., Cherubini, A., Marciani, M.G., Babiloni, F.: Non-invasive brain-computer interface system: towards its application as assistive technology. Brain Res. Bull. 75(6), 796–803 (2008) 19. Farwell, L.A., Donchin, E.: Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70(6), 510–523 (1988) 20. Krusienski, D.J., Sellers, E.W., McFarland, D.J., Vaughan, T.M., Wolpaw, J.R.: Toward enhanced P300 speller performance. J. Neurosci. Methods 167(1), 15–21 (2008) 21. Kübler, A., Birbaumer, N.: Brain-computer interfaces and communication in paralysis: extinction of goal directed thinking in completely paralysed patients? Clin. Neurophysiol. 119(11), 2658–2666 (2008) 22. Blankertz, B., Dornhege, G., Krauledat, M., Schröder, M., Williamson, J., Murray-Smith, R., Müller, K.: The Berlin Brain-Computer Interface presents the novel mental typewriter Hex-o-Spell, pp. 108–109. Verlag der Technischen Universität, Graz (2006) 23. http://www.dcs.gla.ac.uk/~rod/Videos/hexawrite_Sonne.mp4 (accessed on February 26, 2009) 24. Ward, D.J., Blackwell, A.F., MacKay, D.J.C.: DASHER—A data entry interface using continuous gestures and language models. Human-Computer Interaction 17(2-3), 199–228 (2002) 25. Galán, F., Nuttin, M., Lew, E., Ferrez, P.W., Vanacker, G., Philips, J., et al.: A brainactuated wheelchair: asynchronous and non-invasive Brain-computer interfaces for continuous control of robots. Clin. Neurophysiol. 119(9), 2159–2169 (2008) 26. Quitadamo, L.R., Marciani, M.G., Cardarilli, G.C., Bianchi, L.: Describing different brain computer interface systems through a unique model: a UML implementation. Neuroinformatics 6(2), 81–96 (2008) 27. Smart Homes for all, http://www.sm4all-project.eu (accessed on February 26, 2009) 28. Tools for Brain-Computer Interaction, http://www.tobi-project.org (accessed on February 26, 2009)
Movement and Recovery Analysis of a Mouse-Replacement Interface for Users with Severe Disabilities Caitlin Connor1, Emily Yu2, John Magee2, Esra Cansizoglu2, Samuel Epstein2, and Margrit Betke2 Department of Computer Science, Boston University, Boston, MA 02215, USA
[email protected], {eyu,mageejo,ataer,samepst,betke}@cs.bu.edu
Abstract. The Camera Mouse is a mouse-replacement interface for users with movement impairments. It tracks a selected body feature, such as the nose, eyebrow or finger, through a web camera and translates the user's movements to movements of the mouse pointer. Occasionally, the Camera Mouse loses the feature being tracked, when the user moves quickly or out of frame, or when the feature is occluded from view of the web camera. A new system has been developed to recognize when the tracked feature has been lost and to locate and resume tracking of the originally selected feature. In order to better understand the directions of movement which are most and least comfortable for users with disabilities, a game interface was developed to test the accuracy and speed of users across different trajectories. The experiments revealed that trajectories most comfortable for a user with severe cerebral palsy were along diagonal axes. Keywords: HCI, Assistive Technology, Camera Mouse, Video-based Interface.
1 Introduction Approximately 0.3 percent of the population worldwide suffers from a severe disability which can cause movement impairment [1]. This includes individuals with Cerebral Palsy, Spinal Muscular Atrophy, Amyotrophic Lateral Sclerosis, Multiple Sclerosis, and various neurological disorders. They would benefit from computer access but lack the ability to manipulate a traditional mouse. The "Camera Mouse" was developed to provide computer access to such users, who often are able to produce voluntary motions with their head or some part of their face [2,3]. The system uses a video camera or webcam to track a body feature, such as the nose, eyebrow, or finger, translating the user's movements into movements of the mouse pointer on the computer screen. For several years, the Camera Mouse has been helping users with disabilities successfully access a computer for purposes of communication, education, and entertainment [4]. In order to improve the Camera Mouse experience for users, two issues have been studied in depth. One issue is that for many users, use of the Camera Mouse requires the constant presence of a caretaker. When the system loses the selected feature during tracking, either by tracking a different feature or by attempting to track a piece of C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 493–502, 2009. © Springer-Verlag Berlin Heidelberg 2009
494
C. Connor et al.
background, it is necessary to reselect the original feature manually, which in most cases requires the intervention of a caretaker. For users who can operate independently otherwise, this can be a frustrating experience, especially for users who exhibit spastic motions, since such motions can cause the loss of the tracked feature. In response to this first issue, a new Camera Mouse system has been developed which automatically recognizes the loss of a tracked feature and automatically reselects that feature, so that the user can operate independently throughout the duration of his or her computer session. There are many software applications which work well with or are designed to work in conjunction with Camera Mouse [4]. They are designed with the needs of users with disabilities in mind. Generally, non-disabled users are able to move their heads comfortably in all directions when using the Camera Mouse to interact with a software application. However, it has been noticed over previous testing sessions for the Camera Mouse that many users with disabilities have difficulty moving their heads in certain directions. In response to this issue, the second part of this study aims to identify which trajectories are consistently more and less comfortable for the users with disabilities than others. Once the most comfortable trajectories have been determined, software applications can be created which require the most movement over the most comfortable axis. Our work relates to efforts by the Computer Vision community to develop facial feature tracking systems [5]. Many existing systems require some manual initialization, and few are able to recover from loss of track, for example due to occlusion [58]. The approach that is most similar to our own uses a threshold for correlation to identify loss of feature [6].
2 Methodology The Camera Mouse software is an accessible and inexpensive mouse replacement interface, as it is available online for free download and works in conjunction with a standard USB webcam [3]. Upon starting the system, the interface displays the video feed and prompts the user to manually select a feature in the image frame to track. Once a feature has been selected, tracking of that feature begins, as shown in Figure 1 left. The Camera Mouse interface allows toggling between mouse pointer control via the tracked feature and via the standard mouse. The transition from tracked feature to standard mouse control can be initiated by pressing the “Num” key or simply by moving the standard mouse. Such a transition is necessary if the user wants to select a different feature to be tracked or if the tracker lost the initially selected feature. The most common reasons that a feature being tracked is lost are that the user moves the feature out of the image frame, that an object moves in front of the feature, blocking it from view, or that the user makes a sudden movement so that the feature has moved far from where it was positioned in the previous frame. This last issue is of particular concern for users with disabilities because many exhibit spastic motions that can cause loss of the tracked feature. In all cases of loss, the Camera Mouse software will continue to track some part of the image, tracking either another feature on the user or a piece of the background and using the movement of that feature to direct the movement of the mouse pointer.
Movement and Recovery Analysis of a Mouse-Replacement Interface for Users
495
Fig. 1. The Camera Mouse interface which appears on the computer screen. Left: The interface during tracking. The green square appears over the feature being tracked. Right: The New Camera Mouse interface upon startup with oval graphic.
Our new system recovers a lost feature with a two-stage process. In the first stage, the system performs periodic checks on the tracked feature in order to recognize when the feature is lost. The second stage, the re-initialization stage, involves finding and reselecting the original feature to continue tracking it. An overview of the system is shown in Figure 2.
Fig. 2. Overview of new Camera Mouse system
2.1 Recognition of Breakdown of Interaction At the time the new Camera Mouse system starts, an oval graphic is inscribed on the currently viewed video feed, and the user is prompted to center his or her head in this shape before manually selecting the facial feature to track, as shown in Figure 1 right. Once the user does so, the coordinates of the pixel selected are stored for use in the re-initialization phase later. Once the feature has been selected manually, that feature is tracked. The system prepares for the possibility of tracked feature loss by saving a template, a sub-image of x by x pixels around the point which is manually selected, at the time of selection. The system detects that the original feature has been lost, or that the feature being tracked is not the originally selected feature, by periodically comparing the area of x by x pixels around the point being tracked in the current frame to the previously stored template every t image frames. The template and current sub-image
496
C. Connor et al.
are compared by calculating the normalized correlation coefficient (ncc) between them, based on the brightness values of their corresponding pixels [9]. If the normalized correlation coefficient falls below a threshold rn, then the feature being tracked and the originally selected feature are judged to be different, and the original feature is considered lost. To account for the possibility that the system is tracking a piece of background with brightness patterns similar to that of the original feature, the new Camera Mouse system also compares color values. The idea behind this comparison is that skin and hair colors have relatively more red and less blue tones than do most background colors, so comparing the color values should distinguish between them. At the time of manual feature selection, when the template is saved, the averages of the red, green, and blue (RGB) color values of the pixels in the template are computed and normalized. The normalization is computed for the red component as follows Normalized average red value = ∑(red values of each pixel) / [ ∑(red values of each pixel) + ∑(green values of each pixel) + ∑(blue values of each pixel) ], and for the green and blue components similarly. At the time of the system's periodic comparisons between template and tracked feature, the normalized RGB values of the tracked feature area are calculated, and so is the difference between them and the stored, normalized RGB values. If this difference is greater than d for any color, then the feature being tracked and the originally selected feature are judged to be different, and the original feature is considered lost. 2.2 Re-initialization of Interaction When the originally selected feature is considered to be lost, tracking is terminated and the re-initialization phase begins. The phase consists of three sub-phases which each implement a different searching strategy. The first sub-phase finds the lost feature in most cases, and the third sub-phase is rarely necessary. To find the lost feature, the new Camera Mouse tries to isolate the facial region by computing areas with changes in brightness from one frame to the next. The idea behind this is that if the user moves his or her head, the greatest brightness changes occur to the right and left of the user’s head. To detect the left and right sides of the face, the system compares the two most recent image frames, taking the difference in brightness values of the corresponding pixels between the images to create a difference image. Summing the differences in brightness for the pixels in vertical columns of the image that are w pixels wide, the system detects the two columns with the greatest total difference mark the boundaries of movement. The area between the two columns with greatest brightness difference delineates the region of the image where the face is positioned. If the feature became lost because of a sudden spastic movement of the user to the side, the face should be especially easy to find with this method. Once the left and right boundaries of the face are established, the top and bottom boundaries are determined to be y pixels above and below the y-coordinate of the originally selected feature. The y-coordinates are determined in this way because a user's vertical range of motion with a facial feature is much more limited than his or her horizontal range of motion, so the feature is almost always within this vertical range. Once the vertical and horizontal boundaries of the face have been
Movement and Recovery Analysis of a Mouse-Replacement Interface for Users
497
established, the bounded region is searched for the lost feature. The search is executed by calculating the normalized coefficient of the original feature template and the x by x pixel area that has its corner positioned every s pixels to the right and down from the upper left corner of the bounded region. Of these calculations, the ncc and location of the area with the maximum ncc are saved, and if the ncc is greater than the threshold rn, the system considers the feature to be found and tracks that area. If the ncc falls below the threshold rn, the feature is still considered lost, and the second sub-phase of search is initiated, which requires user cooperation. The oval graphic reappears, and the user is prompted to center his or her head in the oval. This way, the feature should be in the same area of the image that it was at the time of initial selection. After a pause in which the user can reposition his or her face, a region, extending u pixels left, right, above and below the coordinates of original selection is searched for the feature in the way described above. The user is given 3 seconds by default to center his or her head in the oval, but the amount of time can be changed in the Camera Mouse settings. If the feature is still not found, then the user may not be centering his or her head correctly, and, in the third and final sub-phase, the region bounded by the dimensions of the oval is searched. If at this point the feature is still not found, then the user did not position their face in the oval, and the prompt reappears. The user is given another three seconds the center his or her head, and the process repeats with the search described for the second sub-phase. If at any point, there is a manual mouse click on the image, which could happen if the user wants to switch which feature is being tracked, tracking is terminated, the oval graphic and prompt reappear, and then the user can select the new feature. The coordinates of the new selection are saved over the coordinates of the previous selection. 2.3 Movement Analysis In order to improve the Camera Mouse experience for users, we studied the directions of movement that are most and least comfortable for users with disabilities. We considered eight possible directions of movement: up, down, left, right and diagonally up/right, down/right, up/left, and down/left. We developed an interface to use in testing which features twelve targets spaced evenly around the edge of a square interface, as depicted in Figure 3 left. The user is prompted to select each target in turn. To make the testing an enjoyable experience, we reveal a humorous graphic on the target area to be selected (picture of face in Figure 3 right). The user selects a target by clicking on it anywhere in the target area, which in the Camera Mouse system is accomplished by dwelling over a small area for a pre-specified dwell time (e.g., 0.5 s). The order in which targets are revealed is predetermined so that each of the eight directions of movements are followed by the user (Figure 3 right). The ideal path that the user's movements should follow is considered the straightline path between the coordinates clicked on the previous target and the coordinates clicked on the current target. These ideal paths between targets correspond to the possible eight directions of movement. 
Over the duration of the test, the coordinates of the moving mouse pointer and the location of mouse clicks are recorded, so that the path taken by the mouse pointer can be analyzed in relation to the ideal path.
498
C. Connor et al.
The ease of movement for each direction of movement, or for each path between targets, was measured in two ways by comparing the ideal path with the actual path taken. The first measure defines the actual path as the sum of lengths of the straight line paths between each of the successive recorded mouse locations along the path the user took. To compute the second measure of ease of movement, each point on the actual path for which a mouse coordinate was recorded is projected perpendicularly onto the ideal path (Figure 4) and the distance between the point on the actual path and the projected point on the ideal path. The second measure is then defined by the average of these distances.
Fig. 3. The movement analysis interface. Left: The interface upon starting the test. Right: The trajectories the user follows during the test.
Fig. 4. Second measure of ease of movement. Mean distance from the ideal path (straight line) is calculated by averaging the shortest distances (brown lines) between each point (black disk) on the actual path (dashed line) and the ideal path.
3 Testing and Results 3.1 Recognition of Breakdown of and Recovery from HCI The system was tested with thirteen users without disabilities and with two users with disabilities in separate testing sessions. One initially tested user was a spastic quadriplegic, non-speaking person in his mid-forties who suffers from cerebral palsy. The
Movement and Recovery Analysis of a Mouse-Replacement Interface for Users
499
subject is a regular user of the Camera Mouse software. In a five-minute session in which the user played a game, Eagle Aliens, with the new Camera Mouse, the tracked feature was never lost, and the system never identified it as lost [3]. During this session, the user did not experience any spastic events which might have resulted in feature loss. We therefore designed the following experiment to simulate a common cause of feature loss. After the system was initialized to track the tip of the user's nose, we repeatedly forced the system to track the wrong feature by waving a hand between the camera and the user, obstructing the camera's view of the feature. This user went through the obstruction and recovery trial 8 times. Out of the 8 trials, the correct feature was identified 62.5% (5/8) of the time in sub-phase 1, 25% (2/8) of the time in sub-phase 2, and 12.5% (1/8) of the time, the system selected a part of the eyebrow to track rather than the nose tip. However, within seconds, the system determined that the piece of eyebrow was not the correct feature and selected the nose tip in sub-phase 1. We analyzed the reason that the system identifies a feature as lost. Each time the feature is identified as lost, the system records whether it was due to an ncc that was too low or due to RGB values that were too different. With this additional information, we conducted the same test involving deliberate feature occlusion. In tests consisting of 10 trials each with 13 users without disabilities, the correct feature was identified 76.1% (99/130) of the time in sub-phase 1, 20.8% (27/130) of the time in sub-phase 2, and 3.1% (4/130) of the time in sub-phase 3 (the feature was reselected correctly 100% of the time). The feature was identified as lost using the ncc 91.5% (119/130) of the time and because of color 8.5% (11/130) of the time. The same test was conducted with a user with cerebral palsy, consisting of 44 trials. The correct feature was identified 65.9% (29/44) of the time in sub-phase 1, 13.6% (6/44) of the time in sub-phase 2, and 20.4% (9/44) of the time in sub-phase 3. Again, the feature was reselected correctly 100% of the time. The feature was identified as lost using the ncc 93.2% (41/44) of the time and because of color 6.8% (3/44) of the time. 3.2 Movement Analysis The game interface described above was used to test six users, four with severe motion impairments due to cerebral palsy and two users without disabilities. Of the four subjects with disabilities, the test proved too cognitively challenging for three users, and thus no meaningful data could be retrieved from those testing sessions. The fourth subject, the same user who tested the new Camera Mouse system, ran two sessions successfully. The data from these two sessions and from the sessions run by the two subjects without disabilities were used in the analysis. The ease of movement for each direction of movement, or for each path between targets, was determined by the two measures defined above. On average, for users without disabilities, the length difference between shortest possible and actual path was 1,352 screen pixels, whereas for the subject with cerebral palsy, the average was 21,859 pixels. The mean distance for users without motion impairments was 13 pixels, and for the user with motion impairments it was 101 pixels. Each subject tested played the game twice, so that each was forced to move along every trajectory several times. The data from the individual paths recorded were averaged for each trajectory.
500
C. Connor et al.
Evaluating the movement patterns of subjects using these two measures, vertical and horizontal movements clearly proved the most difficult and least accurate directions of movement for the user with disabilities by having the greatest length differences between shortest possible path and actual path and having the greatest mean distance from the shortest path. The directions of movement which consistently proved the most natural and comfortable for the user with disabilities, those which had the least length differences between shortest possible path and actual path and having the least mean distance from the shortest path, were diagonal, particularly the up/right and down/left directions. On average, for users without disabilities, the mean distance from the ideal path was 8 screen pixels for horizontal and vertical trajectories and 18 pixels for diagonal trajectories, and the difference in path length from the ideal path was 1196 pixels for horizontal and vertical trajectories and 1503 pixels for diagonal trajectories. In contrast, the user with disabilities was least comfortable with vertical and horizontal movements, and most comfortable with diagonal movements. On average, for the user with disabilities, the mean distance from the ideal path was 124 pixels for horizontal and vertical trajectories and 83 pixels for diagonal trajectories, and the difference in path length from the ideal path was 22,971 pixels for horizontal and vertical trajectories and 18,522 pixels for diagonal trajectories.
4 Discussion and Conclusions Testing the new Camera Mouse with users with and without disabilities proved successful. The most encouraging result is that the correct feature was identified 100% of the time, rarely with a few-second delay between loss and recovery. This shows that users who have severe motion impairments can in fact have an independent session of computer use. Testing showed that in most cases, the feature was identified as lost as a result of an ncc below threshold rn, but the RGB color difference threshold d was still necessary for those cases where the system tracked a piece of background with a brightness pattern similar to the template. The results also showed that, of the three sub-phases of search for the original feature, the feature was identified within in sub-phase 1 in most cases (about 75% of the time). The system very rarely needed to employ sub-phase 3 (about 3% of the time). Since the user would only have to participate in the cooperative repositioning in about 25% of the instances of loss, he or she would rarely experience any interruption in tracking. Because the search in sub-phase 1 is based on the user's recent movements, the amount of activity that the user exhibits can alter the effectiveness of the search. For example, the second subject with disabilities that we tested had just taken medication which made him drowsy, so during the testing session, he exhibited minimal movement. As a result, the proportion of trials in which the search entered sub-phases 2 and 3 was much higher with him than with the first user or with the users without disabilities. For users who exhibit spastic movements, the opposite is likely to be true, since the system would be able to determine the facial boundaries with high accuracy if the user just made a sudden movement.
Movement and Recovery Analysis of a Mouse-Replacement Interface for Users
501
The Camera Mouse system accepts a range of cameras and image dimensions. The size of the template and tracked feature image (x by x pixels), the width w of the columns over which brightness is summed, as well as the distance y and length u which delineate the areas in which to search were optimized in relation to an image input size of 360 by 240 pixels. These dimensions were selected because they are the image dimensions of the web camera recommended on the Camera Mouse website. For example, with too small a value for x, the pattern becomes less distinct, and the system is less able to accurately distinguish between the tracked feature and similar patterns elsewhere in the image. But, with too large a value for x, the software operates too slowly. The ideal value for x we found was 21, so the sub-image represents 0.5% of the image. For the other variables, the optimal values were found to be 50 image frames for t, 10 pixels for w, 50 pixels for y, 5 pixels for s, and 30 pixels for u. The threshold values were also optimized so that they did not identify a part of the image that was not the original feature as the original feature but which was low enough that it allowed for variation in lighting and angle of the feature as he user moves. The threshold for the ncc, rn, is most accurate at 0.75, and the difference threshold for the normalized RGB values, d is best at 0.1. The most significant obstacle we encountered was the speed of the system. The more comprehensive the search becomes, the slower it is. We therefore had to compromise in the depth of the search in order to make sure that the Camera Mouse interface operates in real time. As the average user's computer becomes faster, we will be able to release newer versions of the system with a more comprehensive search procedure. In the future, we plan to try to improve the ease and independence with which Camera Mouse users access computers. One direction we will investigate is for the system to find the user's face and initialize a feature to track on startup, eliminating the need for manual selection. Currently, most software, including most software compatible with the Camera Mouse, has menus, scroll bars, and other interactive features which require the user to move the mouse pointer in horizontal and vertical directions. This is in keeping with our result that these directions of movement are the most natural for users without disabilities. Since our research has shown, in contrast, that horizontal and vertical trajectories are not necessarily the most comfortable directions of movement for users with disabilities, future software designed to work in conjunction with a mousereplacement assistive technology such as the Camera Mouse will consider relocating these interactive features along other axes. In particular, axes in diagonal directions should be considered if they turn out to be the more comfortable directions of movement for users with disabilities. Designing software this way will allow users with disabilities to have a less strenuous and time-consuming computer session. Acknowledgements. The authors thank the subjects for their efforts in testing the new Camera Mouse and participating in our movement analysis experiment. Caitlin Connor was supported as a Clare Booth Luce research fellow, and Emily Yu was funded by the CRA-W Distributed Mentor Program. The paper is based upon work supported by the National Science Foundation under Grant IIS-0713229 and the Boston University Undergraduate Research Opportunities Program. 
Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
502
C. Connor et al.
References 1. 2.
3. 4.
5.
6.
7. 8.
9.
World Health Organization, http://www.who.int/en/ Betke, M., Gips, J., Fleming, P.: The camera mouse: Visual tracking of body features to provide computer access for people with severe disabilities. IEEE Transactions on Neural Systems and Rehabilitation Engineering 10(1), 1–10 (2002) Camera Mouse, http://www.cameramouse.org/ Betke, M.: Camera-based Interfaces and Assistive Software for People with Severe Motion Impairments. In: Augusto, J.C., Shapiro, D., Aghajan, H. (eds.) Proceedings of the 3rd Workshop on Artificial Intelligence Techniques for Ambient Intelligence (AITAmI 2008), Patras, Greece, July 21-22 (2008) Castelli, T., Betke, M., Neidle, C.: Facial feature tracking and occlusion recovery in American Sign Language. In: Fred, A., Lourenço, A. (eds.) Pattern Recognition in Information Systems: Proceedings of the 6th International Workshop on Pattern Recogntion in Information Systems (PRIS 2006), Paphos, Cyprus, May 2006, pp. 81–90. INSTICC Press (2006) Strom, J.: Reinitialization of a Model-Based Coder. In: Proceedings of the EuroImage International Conference on Augmented, Virtual Environments and Three-Dimensional Imaging (ICAV3D 2001), Mykonos, Greece, 4 p. (May 2001) Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM Computing Surveys 38(4), 1–45 Chen, J., Tiddeman, B.: A Robust Facial Feature Tracking System. In: IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), Como, Italy, September 2005, pp. 445–449 (2005) Open Computer Vision Library, http://sourceforge.net/projects/opencvlibrary/
Sonification System of Maps for Blind – Alternative View Gintautas Daunys and Vidas Lauruska Siauliai University, Vilniaus str. 141, 76353 Siauliai, Lithuania {g.daunys,vidas.lauruska}@tf.su.lt
Abstract. An inexpensive sonification system of maps and charts for visually impaired is described. A digitiser (tablet) is used as systems input device, which helps to investigate the map. The maps are presented using xml technology – mainly svg language tags. Then the maps from svg are converted to RGB bitmap A system software is based on Microsoft .NET technology. Free Microsoft development systems as Visual C# 2008 Express Edition and Direct Sound are used to implememt sonification system. Keywords: Blind, sonification, map, svg language.
1 Introduction With the increasing usage of multimedia systems, there is a real need for developing tools able to offer aids for visually impaired or blind people in accessing graphical information. This technological development opened new prospects in the realization of man-machine interfaces for blind users. Many efforts have been devoted to the development of sensory substitution systems that may help visually impaired and blind users in accessing visual information such as text, graphics, or images. Some of them are based on transformation of visual information to auditive signal. These approaches assume a sufficient knowledge of both visual and auditory systems. At present time, we can consider that the various solutions suggested for text access are acceptable. However, the information presented in the form of graphics or images presents a major obstacle in the daily life of blind users. One of the first approaches of sonification signals used in human computer interaction is called earcons [1]. Sounds used for earcons should be constructed in such a way that they are easy to remember, understand and recognise. It can be a digitised sound, a sound created by a synthesiser, a single note, a motive, or even a single pitch. A method for line graph sonification invented in the mid 1980s was called SoundGraphs [2]. Movement along the x-axis in time causes notes of different pitches to be played, where the frequency of each note is determined by the value of the graph at that time. It was established by experiments with fourteen subjects that after a small amount of training, test subjects were able to identify the overall qualities of the data, such as linearity, monotonicity, and symmetry. The flexibility, speed, costeffectiveness, and greater measure of independence provided for the blind or sight-impaired using SoundGraphs was demonstrated. In the late 1980s a system called Soundtrack was developed [3]. It is a word processor for visually impaired people. The interface consists of auditory objects. An auditory object is defined by its spatial location, a name, an action, and a tone. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 503–508, 2009. © Springer-Verlag Berlin Heidelberg 2009
504
G. Daunys and V. Lauruska
Invention of haptic devices leaded to design of multi-modal interfaces to access graphical information. This technique was used in the GUIB system in which graphics were communicated using sound and text using synthesised voice or Braille [4].
2 Method Our aim was to develop widely available graphical information presentation system for blind user. We tried to use most common and cheapest hardware and open source or free software components. First we consider the system hardware. Computer mouse is optional graphic-input device. The device use relative motion, so when the user hits the edge he or she need merely pick up the mouse and drop it back. It is convenient during usual work with computer applications, but maps exploration system is one of exceptions. In our application we need devices which give absolute coordinates. For graphical input we selected digitiser (tablet). Other hardware is a standard PC computer with soundcard and Microsoft Windows operation system and load speakers. For graphical information description we selected SVG language because of reasons as was described earlier. Because we aimed achieved high interactivity, we don’t use standard products with SVG implementation as Adobe SVG Reader. We developed software using Visual C++ environment from Microsoft Visual Studio.NET. As a XML based language, SVG supports foreign namespaces. It is possible to define new elements or add new attributes. Elements and attributes in a foreign namespace have a prefix and a colon before the element or attribute name. Elements and attributes in foreign namespaces that the SVG viewer does not know, are ignored. However, they can be read and written by script. Foreign namespaces are used to introduce new elements (e.g. GUI elements, scalebars) and for the attachment of nongraphical attributes to SVG graphic elements. The SVG file, prepared for our system could be seen on other SVG viewers. Mapping represents a perfect application of SVG because maps are, by nature, vector layered representations of the earth. The SVG grammar allows the same layering concepts that are so crucial to Geographic Information Systems (GIS). Since maps are graphics that depict our environment, there is a great need for maps to be informative and interactive. SVG provides this interaction with very high quality output capability, directly on the web. Because of the complexity of geographic data (projection, coordinate systems, complex objects, etc.), the current SVG specification does not contain all the particularities of a GIS particularities. However, the current specification is sufficient to help the mapping community produce open source interactive maps in SVG format. Figure 1 is a example of represent map of Finland using SVG format. The hierarchical structure of file for storing map is shown in Figure 2[5]. There are shown only main elements. All map has text field with information about the map. This is information for presentation to user by speech synthesis. Other elements of the first level represent regions of maps. Actually, region is graphical tag of SVG, which describes contour of region. This tag has attributes related to sound, text and similar. Sound attribute allows to indicate sound file, which is played when cursor is over region. Text attribute is devoted to information about selected region.
Sonification System of Maps for Blind – Alternative View
505
<path id=“Finland“ fill="rgb(128,255,128)" M140 76C139.82 70.67 133.284 62.11 127 63.46C119.30 65.11 117.69 71.50 109.00 66.48C98.81 60.59 100.58 49.34 93.58 41.11C86.38 32.65 83.01 40.97 83 48L77 46L96 75C105.89 76.76 118.67 89.71 120.09 100C121.01 106.62 117.20 113.69 116 120L123 127 .... C143.73 65.90 144.32 72.16 140 76 z"/>
Fig. 1. Map with contour of Finland and example of contour description
Map Text
External contour
Region1
Sound
Region2
Text
Region3
Reference to other information
Fig. 2. The hierarchical structure for maps information storage in XML format file
A warning sound signal about boundary of two regions is issued, when pen of digitiser is close to the boundary. Because it is easy to jump through boundary without stop on it the signal is started to generate at some distance from boundary when the pen is approaching to it. Also it is necessary to indicate when the pen goes toward to boundary or is departing from boundary. So volume of sound is selected according to the distance from exact intersection of two regions. If the pen moves parallel to the boundary then the volume remains at constant level. When the pen is approaching to boundary the volume of warning sound is increased. The maximal volume is reached on exact boundary. The volume is decreased when pen crosses the boundary and recedes from it.
506
G. Daunys and V. Lauruska
3 Implementation In this section we will discuss implementation issues of the sonification system. For coding we selected C# language. We used the free Microsoft Visual C# 2008 Express Edition. The Windows application is based on System.Windows.Forms assembly. The developed software must be very stable because it will be impossible for a disabled to solve a software crash and respond to unpredicted dialog boxes. Best guarantee for stability should be found in widely used technologies. In recent years the .NET Framework by Microsoft has brought the ability to write much more robust and secure code on the Windows platform. Furthermore, .NET Framework is not operating system specific; there exist some projects where .NET Framework is implemented in other OS. For example, one of the projects is Mono leaded by Novel. One of the advantages of .NET Framework is its automatic memory cleaning, so called garbage collection. It is carried out when managed code is used. One of the simplest ways for managed code programming is the use of C# language. .NET Framework promises good options for interoperability. It is easy to combine code written in different .NET languages because all code is first translated into CIL (Common Intermediate Language). CTS (Common Type System) also exists and ensures compatibility of parameter types in functions calls. It is simpler to invoke methods on COM objects. There are also some choices for cross-machine communication between managed modules. The parsing of SVG document was implemented using XLINQ library functions, other called as LINQ to XML library. The abbrevation LINQ stands for NET Language-Integrated Query. LINQ defines a set of general purpose standard query operators that allow traversal, filter, and projection operations to be expressed in any .NET-based programming language. The standard query operators allow queries to be applied to any IEnumerable
-based information source. XLINQ provides both DOM and XQuery/XPath like functionality in a consistent programming experience across the different LINQ-enabled data access technologies. We used object-oriented programming technology. XLINQ allows parse data from XML file directly to classes of graphical objects. Graphical rendering was implemented with Windows GDI+ functions. PictureBox control allows draw stable pictures. Included bitmap in it allows organise navigation plane.
Map Text
External contour
Region1
Sound
Region2
Text
Region3
Reference to other information
Sonification System of Maps for Blind – Alternative View
507
For speech synthesis we used Speech library from NET. Framework version 3.0. It allows not only synthesize English speech but also some effects as emphasis of words or speech rate changes by 5 levels. Only one software component was used outside .NET Framework. It was DirectSound library from Microsoft DirectX version 9c. Attractive features of DirectSound are advanced sound playing control: some files in the same time with independent parameters control.
4 Discussion The differences of visual and auditory systems are pointed by Brewster [6]. Our visual system gives us detailed information about a small area of focus whereas our auditory system provides general information from all around, alerting us to things outside. Visual system has a good spatial resolution, while auditory system has preference in time resolution. So it is impossible to convey the same information by these two information channels. In the sonification report [7] it is stated that progress in sonification will require specific research directed at developing predictive design principles. There is also indicated about the need of research by interdisciplinary teams with funding that is intended to advance the field of sonification directly, rather than relying on progress through a related but peripheral agenda. Analysis shows that there many different sonification efforts including solutions for visually impaired but they are more as project results and are not widely available. The described sonification system can be easily implemented and easily integrated to bigger projects. The improvements mostly can concern selection of sounds.
5 Conclusions XML format files were successfully used for preparing information for sonification. The developed model of sonification was successfully implemented using free software development tools: Microsoft Visual C# 2008 Express Edition and Microsoft DirectSound library.
References 1. 2. 3. 4.
Blattner, M.M., Sumikawa, D.A., Greenberg, P.M.: Earcons and Icons: Their Structure and Common Design Principles. J. Human Computer Interaction 4(1), 11–44 (1989) Mansur, D.L., Blattner, M., Joy, K.: Sound-graphs: A Numerical Data Analysis Method for the Blind. Journal of Medical Systems 9, 163–174 (1985) Edwards, A.D.: Soundtrack: An Auditory Interface for Blind Users. J. Human Computer Interaction 4(1), 45–66 (1989) Mynatt, E.D., Weber, G.: Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches. In: Proceedings of the CHI 1994 Conference on Human Factors in Computer Systems, pp. 166–172. ACM Press, New York (1994)
508 5.
6. 7.
G. Daunys and V. Lauruska Daunys, G., Lauruska, V.: Maps Sonification System Using Digitiser for Visually Impaired Children. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2006. LNCS, vol. 4061, pp. 12–15. Springer, Heidelberg (2006) Brewster, S.A.: Non-speech Auditory Output. In: Jacko, J.A., Sears, A. (eds.) HumanComputer Interaction Handbook, pp. 220–239. Lawrence Erlbaum Associates, NY (2002) Kramer, G., Walker, B.: Sonification report: Status of the field and research agenda, http://www.icad.org/websiteV2.0/References/nsf.html
Scanning-Based Human-Computer Interaction Using Intentional Muscle Contractions Torsten Felzer, Rainer Nordmann, and Stephan Rinderknecht Department of Mechatronics in Mechanical Engineering Darmstadt University of Technology Petersenstr. 30, D-64287 Darmstadt, Germany {felzer,nordmann,rinderknecht}@mim.tu-darmstadt.de
Abstract. It has already been shown in the past that it is possible to leverage tiny muscular contractions produced at will (e.g., by frowning) in order to give someone complete control over a PC [1]. The underlying interaction technique is ideal for persons with severe motor impairments who are in need for an alternative, non-standard way to operate a computer. This paper deals with a scanning-based computer application of that approach to enable its user to control the immediate environment, e.g., by making a phone call, toggling the lights, or sending particular Infra-Red (IR) remote signals. Although the software is primarily targeted at people with disabilities, it is ready – and (in certain situations) even expected – to be used by able-bodied individuals as well. A user study evaluating the remote control module of the system has been conducted with twelve non-impaired subjects, and the results are discussed herein. keywords: Human-computer interaction, bio-signal interfaces, scanning, hands-free access, universal remote control, Speech API (SAPI).
1
Introduction
Technological progress is visible everywhere nowadays, in particular concerning the development of personal computers. For example, a modern graphics card manages a multiple of the memory capacity of a high-end hard disk drive from the late 1980s – at only a fraction of the cost. However, it is very interesting that the standard way to interact with a computer is basically the same as it was half a century ago. Of course, the typewriter-style keyboard was supplemented with a pointing device (a necessary step with the advent of GUI-based – as opposed to pure text-based – operating systems), but the operation still requires – in the standard case – the usage of the hands. This is also true for many computermediated (or “computer-like”) devices, such as a telephone or an infra-red remote control. Unfortunately, not everyone is able to reliably employ the hands. To be able to control a computer (and consequently the immediate environment), many persons with physical disabilities rely on appropriate alternative interfaces. Moreover, an able-bodied person may not want to leave the exercising machine when C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 509–518, 2009. c Springer-Verlag Berlin Heidelberg 2009
510
T. Felzer, R. Nordmann, and S. Rinderknecht
listening to music while working out (just to pick up the remote to skip to the next song), or probably he/she feels uncomfortable with touching the cordless phone with wet hands (when a call is coming in while lying in the bathtub). The next section discusses common (“hands-free”) alternatives in use so far. This is followed by a closer look at a certain muscle-based input principle, which is applied to an environment control system in section 4, while section 5 presents a user study with twelve able-bodied participants, evaluating an integral part of that system. A brief conclusion is given in section 6.
2
Related Work
As for hands-free computer access, persons who do not have any articulation problems can benefit from solutions based on speech recognition, i.e., “talk to their computer” (e.g., [2]). Eye trackers comprise a second group of (mostly rather expensive) alternatives (e.g., [3]). Switch-activated devices often work in conjunction with a particular scanning software (e.g., [4]), where the computer suggests output actions by cyclically highlighting a number of available options, and the user may select the highlighted option by activating a certain switch (also [5]). A fourth, very large group of alternatives is given by so-called bio-signal interfaces: those systems monitor the time series of a certain bodily function of the user and generate corresponding output commands in response to specific patterns in the monitored signal – willfully issued by the user [6]. An important representative is an EEG-based Brain-Computer Interface (BCI) which analyzes the on-going brainwaves of a subject (e.g., [7]). What makes this very popular as a concept helping persons with physical disabilities is that it requires mental activities only. The major disadvantage of BCI’s is that EEG recording is very slow and sensitive (e.g., [8]). The input idea detailed in this paper represents a mixture between a bio-signal interface and a single-switch device. On the one hand, the muscular activity of a single dedicated muscle is inspected, in order to detect intentional contractions of that muscle. On the other hand, the contraction events are used in analogy to the activation of a switch, i.e., processed in a scanning-like fashion. The same input concept has already been implemented in a mouse emulator (see fig. 1) allowing its user to control a PC (or a computer-mediated device) just by frowning (without the need to use the hands) [1]. However, as that tool is optimized for mouse actions, only its data acquisition engine can be reused here.
3
Input Principle
Instead of any manual interaction, the technique focused here relies on tiny contractions of an arbitrary muscle of choice. To detect those input signals, the amplitude of the muscular activity of the chosen muscle – recorded and amplified with the help of a piezo-based sensor – is compared to a user-dependent threshold. Exceeding the threshold triggers a contraction event.
Scanning-Based Human-Computer Interaction
a
511
b
Fig. 1. The “HAnds-free Mouse COntrol System” (HaMCoS) allows its user to operate a PC completely, only by issuing tiny muscle contractions. a) Various framework applications each deal with a specific task - the application shown here is for playing a certain board game against the computer (it is intended to provide a hands-on approach for getting familiar with the system); b) with HaMCoS, it is also possible to employ standard tools in a non-standard way, e.g., logging in to a banking site (displayed in an ordinary web browser) can be done by opening a special on-screen keyboard.
Depending on the amount of time between two contraction events, the HaMCoS tool mentioned above discerns a Single (SC) and a Double Contraction (DC). Those two input events are processed by switching among the states of a Finite State Machine (FSM), and the currently activated state governs mouse pointer movement. As a result, HaMCoS can emulate a mouse device, and its user just needs to issue SC’s and DC’s. In contrast to automatic scanning (where the highlight automatically cycles to the next option after a certain time period has elapsed), HaMCoS basically implements some form of self-paced scanning, in that it cycles to the next pointer movement direction (in a given order) upon detection of explicit contraction events. The application detailed in the next section reverts to conventional (automatic) scanning with contraction signals to select the highlighted option.
4
Environment Control Application
An Environment Control System (ECS) has been implemented (in C++, under R XP) as a target application, capable of processing the muscle-related Windows input signals. The ECS (depicted in fig. 2) basically offers four modules, e.g., turning the PC into a comfortable telephone set or into a Universal Remote Control (URC). Each module is associated with up to 32 buttons (arranged in an 8 × 4 grid) which are activated using a special variant of row-column scanning – or optionally using a manual mouse to cut short the entry of longer button sequences for able-bodied users. Only one of the four module displays is active at any given time. When the user triggers the selection of a button of that module, the display cycles through the representations symbolized in fig. 3. At the top level (following the selection of another button), the four rows are scanned from top to bottom, i.e., the
512
T. Felzer, R. Nordmann, and S. Rinderknecht
Fig. 2. The Environment Control System comprises four individual modules
a
b
c
d
Fig. 3. Different phases during the selection cycle of any button are accompanied by specific display layouts. This is illustrated using the example of toggling the lights: a) row scan; b) column scan; c) confirmation period; d) notification period.
(eight) buttons in a specific row are highlighted, and the highlight advances to the next row (cyclically) after the scan-delay1 τ . Following a contraction while the desired row is highlighted, the eight columns are scanned left to right (and the “intersecting” button is particularly marked). A second contraction pre-selects a button, but to make the system fault-tolerant, the associated action will not be generated unless the user confirms the selection with a third contraction in the subsequent period of length τ . A successful confirmation is answered with a notification (as depicted in fig. 3d)). The entire process starts anew with a row scan in either case. 4.1
Functionality
The main purpose of the ECS is to provide an accessible interface for persons with severe motor impairments. In this respect, each of the four modules deals with a different aspect of everyday life. The first (top left) module is for making telephone calls, answering incoming calls, managing phone numbers as well as call histories, or composing text 1
For the time being, the scan-delay is set to τ = 1.0s – it shall be made adjustable in future versions.
Scanning-Based Human-Computer Interaction
513
Fig. 4. The editor routine – shared by all four modules – helps the user in saving “keystrokes” by implementing Word Prediction (WP). Here, entering “En” leads to the 12 suggestions displayed in the box on the left.
messages – its purpose is to simply replace the usual telephone with a hands-free alternative2 . Whenever the user wishes to enter a text message (“SMS”), the module display is temporarily replaced by the editor routine depicted in fig. 4. In addition to an on-screen keyboard with 8 × 8 “character buttons” (to be selected just as any other button) on the left, the editor presents the entered text as well as some miscellaneous buttons on the right (e.g., for confirming or discarding the input). In order to help the user save keystrokes, the editor suggests prediction candidates of the currently “typed” word based on lists of frequently used terms [9]. Closing the editor with the “OK” button results in on-line adaptation of the personal frequency list. The same editor routine is invoked for all text entry requirements in the other modules as well. After the installation of switchable power outlets, the switch-board module on the lower left can be used to turn suitable appliances on or off. Possible target devices include lamps, a fan, a heater, blinds, or an electric door opener. The top-right module allows the user to transmit pre-recorded IR signals without the need to employ the hands. It turns the PC into a URC, and, once programmed (preferably to be done by an able-bodied caregiver) it can be of great help for a motor-impaired person. As this module is in the focus of the user study presented in the next section, it is described in more detail below. The fourth module enables a speech impaired user to “speak with the comR SAPI, this component not only offers puter’s voice”. Applying the Windows to pronounce standard, pre-determined utterances (like “Yes”, “No”, or “Thank You”), but also arbitrary user-entered sentences. The word prediction feature makes it even possible to have simple conversations via PC. 4.2
Universal Remote Control
The display of the “IR-Remote” module is dominated by a layout representing (a part of) an original remote control with (up to) 24 buttons associated with 2
It should be noted that only the user interface of this module is finished at this point. There are precise ideas for realizing the functionality, but the actual implementation remains a task for the near future.
514
T. Felzer, R. Nordmann, and S. Rinderknecht
a
b
c
Fig. 5. The remote control module runs in two modes: the default transmit mode (a) and a configuration mode offering to edit the button labels (b) or to capture IR data (c). R device of the company HOMElectronics, R appropriate IR signals. With the Tira the software is able to replace the corresponding remote. The user (or its able-bodied assistant, in case of a disabled user) can “copy” the remote of any appliance by creating a new layout, labeling the 24 IR code buttons, and assigning suitable IR signals to them (see fig. 5). There is no limit as to the number of layouts to be created – new layouts are stored on hard disk and (so far) accessed linearly.
5
Evaluation Study
A user study evaluating the remote control module of the environment control system described in the previous section has been conducted with two goals in mind. First, it was intended to verify the general usefulness of the software. The second objective was to investigate how able-bodied users react to this software. 5.1
Participant Recruitment and General Organization
It was decided to look for possible study participants among the non-impaired members of the technical staff at the authors’ university. It is true that that group is surely not representative of all non-impaired people, but as they are using a computer at work, they are able to estimate the value of a software from a rather comprehensive perspective. Twelve subjects (one female) accepted our invitation to take part in the study. The ages ranged between 26 and 38 years with an average of 28.4 years (standard deviation 3.4). The average usage of IR-controlled devices was 2.9 hours per week (standard deviation 4.8), while five participants did not use any IR-controlled device at all on a regular basis. All twelve were already familiar with the general concept of a URC, though none of them utilized one him-/herself. The user contribution was divided into three parts: one practical session (per subject!) and two questionnaires, one before and one after the practical session. The pre-evaluation questionnaire asked for certain demographic data as well as the participants’ prior exposure to IR-controlled devices. It also requested informed consent. The post-evaluation questionnaire asked about the (subjective) opinions of the participants regarding the software – after they had a chance to emulate a particular remote with the tool. Response options corresponded to seven-point Likert-style scales.
Scanning-Based Human-Computer Interaction
5.2
515
Practical Sessions
The practical sessions involved the experimental setup depicted in fig. 6. UnforR device has two major drawbacks (with quality being its tunately, the PHILIPS main advantage). On the one hand, it comes with an alternating remote control. This means that two consecutive keypresses on the same button of the remote control do not result in the transmission of the same IR code. Instead, the remote stores two codes for each button, say, one of some type A as well as one of type B, and the remote constantly alternates between A and B for each keypress. This is no problem as long as one is using the original remote, since the receiving unit also alternates in the code type to expect. However, it makes life extremely unpleasant for users of universal remotes, since they not only have to assign two URC buttons to every remote control function, they also have to do the alternation “by hand”.
R X41 tablet PC with a headband Fig. 6. The experimental setup involved an IBM R PET 1035 portable sensor connected to the microphone input (left) and a PHILIPS R device (right). DVD player as well as a USB-driven Tira
The second drawback relates to the timeout of certain menus (e.g., for configuring the subtitle settings). An open menu will automatically disappear, if the unit does not receive a code of the expected type for more than about 14 seconds. This is particularly frustrating if one has managed to descend three levels, but then just needs half a second too long for triggering the finalizing “Play” (as the menu has to be reopened at the top level) 3 . The individual practical sessions began with a detailed explanation of the setup and the used software tool. The participant’s task was then to emulate (partially) the remote of the DVD player – this involved the following four phases: 3
The DVD player originally designated for the study broke the day before the scheduled practical sessions – a replacement had to be obtained rapidly ...
516
T. Felzer, R. Nordmann, and S. Rinderknecht
1. assigning the 12 most important remote control functions to the IR code buttons of an own layout, i.e., labeling the 24 buttons4 , 2. learning the IR code of each button, i.e., capturing the signals issued by pressing corresponding keys on the original remote, 3. verifying that the signals were captured correctly, and 4. applying the hands-free input method to the newly created layout. Researchers recorded the duration of each phase, and they were also ready to answer questions or to give hints when requested. However, the subject was encouraged to perform all subtasks without further guidance (and it was their decision when to end a phase and move on to the next). 5.3
Results and Discussion
Since every participant was asked to copy the same 12 remote control functions, the resulting layouts looked quite similar – fig. 7 shows a typical representative. The numerical results regarding the practical sessions and the post-evaluation questionnaires are presented in tables 1 and 2.
Fig. 7. The resulting layouts resemble each other – that of Participant 07 is typical
It can be seen in table 1 that capturing the 24 IR codes (D2) can be done relatively fast with the program as opposed to labeling the buttons5 . The (mostly) large D4 value suggests that participants were quite satisfied being able to operate the DVD player hands-free. This satisfaction can also be deducted from table 2 (with positive scores for almost every question, particularly the one about the social value). The scores might be even better, if the chosen equipment did not have the drawbacks mentioned above6 . The only exception is question Q4: participants doubt they will use the program again. There might be three reasons for this. First, having a motor-impaired family member would greatly increase the usage probability, but this was no requirement when recruiting study participants. Second, an able-bodied person 4 5 6
The mouse was used here on the on-screen keyboard of the editor (without WP). Many participants wished to use the keyboard rather than mouse-based shortcuts for labeling – in the meantime, this functionality was added. An optional support of alternating remotes was implemented directly after the study.
Scanning-Based Human-Computer Interaction
517
Table 1. Objective, measurable results. The first (value) line in this table denotes the number of mouse clicks each participant performed while labeling all of the buttons of “her/his” layout (which relates to the total number of characters), D1 is the amount of time needed for that, and Rate is the quotient of those two values. D2, D3, and D4 stand for the time each participant spent capturing the IR codes of all buttons, verifying (testing) the stored codes (with the mouse), and evaluating the final layout with the help of the alternative (hands-free) input method, respectively. All duration values are given in minutes (rounded to the nearest integer). Feature “Keystrokes” D1 (Labels) Rate [kspm] D2 (Capture) D3 (Test) D4 (Altern.)
01 263 15 17.5 3 5 12
02 201 12 16.7 2 2 11
03 248 15 16.5 3 3 12
04 221 15 14.7 3 4 14
Participant 05 06 07 08 180 229 192 248 21 25 10 10 8.5 9.1 19.2 24.8 3 3 2 2 12 9 2 3 7 5 19 6
09 165 8 20.6 3 7 16
10 219 11 19.9 3 7 7
11 126 10 12.6 6 2 27
12 196 12 16.3 3 2 16
Avg. Std. Dev. 207.3 39.0 13.6 4.9 16.3 4.6 3.0 1.0 4.8 3.2 12.6 6.3
Table 2. Subjective evaluation (scores on Likert-type seven-point scales). The five Likert-type questions were as follows: Q1. How easy was it to use the program (1=very easy; 7=very hard); Q2. How enjoyable was it to use the program (1=very enjoyable; 7=very annoying); Q3. How would you rate your sympathy towards the program (1=liked it; 7=hated it); Q4. How probable is it that you want to use the program again (1=very probable; 7=very improbable); Q5. How would you rate the (social) value of the program (1=very valuable; 7=no value at all). Feature 01 Q1 (Easy) 3 Q2 (Enjoyable) 2 Q3 (Sympathy) 2 Q4 (Again?) 6 Q5 (Value) 2
02 3 5 2 4 2
03 2 3 3 4 3
04 4 3 3 2 3
Participant 05 06 07 08 09 4 3 3 3 2 3 2 2 2 2 2 1 2 3 1 2 6 6 5 3 1 1 3 2 1
10 2 3 3 5 2
11 3 4 3 6 1
12 3 2 2 6 1
Avg. Std. Dev. 2.9 0.6 2.7 0.9 2.2 0.7 4.5 1.5 1.8 0.8
probably prefers (and expects) features our program does not have (other than, e.g., providing hands-free access). And finally, as the usage of IR-controlled devices is rather modest among the members of the study group, it is not too probable that they will use any URC software in the first place.
6
Conclusion
An environment control system intended to enable someone with severe physical impairments to control her/his immediate surroundings, e.g., by turning on the lights or switching channels on the TV has been described. The tool merely relies on intentional muscle contractions as input signals – for instance, the system asks for no more physical contribution than simply frowning. The software proved to
518
T. Felzer, R. Nordmann, and S. Rinderknecht
be satisfactory in a user study with able-bodied subjects, yet it can hardly be considered a general alternative for non-impaired individuals (i.e., not assisting someone with a disability). Another user study investigating the reactions of persons with motor impairments is a necessary next step. The same input idea has already been applied to the PC operation tool HaMCoS as well as a system controlling an electric powered wheelchair [10]. All three systems are constantly being extended and ultimately combined with HaMCoS being the common platform, which greatly adds to the independence of a severely disabled person. Acknowledgments. This work is supported by DFG grant FE 936/3-1 ”The AID package – An Alternative Input Device based on intentional muscle contractions”.
References 1. Felzer, T., Fischer, R., Gr¨ onsfelder, T., Nordmann, R.: Alternative control system for operating a PC using intentional muscle contractions only. In: Online-Proc. CSUN Conf. (2005) 2. Assistive Technology & Accessibility at Oklahoma State University – Dragon NaturallySpeaking 9.0, http://access.it.okstate.edu/content/view/22/ 3. Wobbrock, J.O., Rubinstein, J., Sawyer, M., Duchowski, A.T.: Not Typing but Writing: Eye-based Text Entry Using Letter-like Gestures. In: Proc. COGAIN 2007, pp. 61–64 (2007) 4. Baljko, M., Tam, A.: Motor input assistance: Indirect text entry using one or two keys. In: Proc. ASSETS 2006, pp. 18–25. ACM Press, New York (2006) 5. Steriadis, C.E., Constantinou, P.: Designing human-computer interfaces for quadriplegic people. ACM Trans. Comput. Hum. Interact. 10(2), 87–118 (2003) 6. Felzer, T.: Verwendung verschiedener Biosignale zur Bedienung computergesteuerter Systeme (Using various kinds of bio-signals for controlling computermediated systems). Ph.D. Thesis (German). Wissenschaftl. Verlag Berlin (2002) 7. Birbaumer, N., Hinterberger, T., K¨ ubler, A., Neumann, N.: The thoughttranslation device (TTD): Neurobehavioral mechanisms and clinical outcome. IEEE Trans. Neural Syst. Rehabil. Eng. 11(2), 120–123 (2003) 8. McFarland, D.J., Sarnacki, W.A., Vaughan, T.M., Wolpaw, J.R.: Brain-computer interface (BCI) operation: Signal and noise during early training sessions. Clinical Neurophysiology 116, 56–62 (2005) 9. Felzer, T., Nordmann, R.: Speeding up hands-free text entry. In: Proc. CWUAAT 2006, pp. 27–36. Cambridge University Press, Cambridge (2006) 10. Felzer, T., Nordmann, R.: Alternative wheelchair control. In: Proc. RAT 2007, pp. 67–74. IEEE Computer Society Press, Los Alamitos (2007)
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction Albert Hein1 , Andr´e Hoffmeyer1,2 , and Thomas Kirste1 1
2
University of Rostock, Institute of Computer Science, Rostock, Germany [email protected] Fraunhofer Institute for Computer Graphics Research, Rostock, Germany [email protected]
Abstract. In this paper we present an approach for recognizing free-handed gestures using an embedded wireless accelerometric bracelet. We developed a very low complexity algorithm which can be directly implemented on the device and operate in real-time. New gestures can be easily added through supervised learning. An evaluation shows the feasibility of our approach. Simple gestures are detected and recognized at a very high rate (> 97%) while more complex ones were misclassified more often (48% – 95%).
1 Introduction Embedding computers into ambient environments induces the demand for new and intuitive input modalities and concepts. Gestures are currently widely seen as a promising and natural method for communicating with intelligent devices. Hollywood movies like “Minority Report” made this kind of user interface popular. Although continuously waving the arms around for hours perhaps isn’t really desireable in the real world, gestures could be well suited for short term or sparse interactions. Why not control a presentation on a white board with an arm stroke or zooming into a picture by drawing a circle with your hand? Recently the task of gesture recognition gained a lot of interest in the research community. While traditionally this is seen as an application for computer vision, this approach is not always suitable as it is dependent on camera infrastructure, adequate lighting and high computing capacity. The wide availability of small and low cost acceleration sensors and their integration into embedded devices rapidly increased the demand for sensor based gesture recognition methods. These sensors are able to detect and measure the motion of a device in a certain direction. Newer models of 3-axis accelerometers can capture the device orientation and movements in three degrees of freedom. While first research in this area was based on custom built sensing boards [1,2], later work is done mostly on handheld mobile devices [3,4,5]. A lot of investigation has recently gone into the commercially available Wii Controller as some kind of gesture aware remote control which already resulted in an open source gesture recognition library for desktop computers [6]. However, being dependent on holding a piece of hardware in the hand, these devices do not allow freehanded work, which is why we developed a bracelet prototype. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 519–527, 2009. c Springer-Verlag Berlin Heidelberg 2009
520
A. Hein, A. Hoffmeyer, and T. Kirste
Most of these approaches are built upon Hidden Markov Models [1,4,5,6,7]. They are well known from speech and handwriting recognition, and are also used for activity and visual gesture recognition. Most of the algorithms in literature show good results even for relatively complex gestures. But as we are focusing on real-time detection using a low-power microcontroller on a tiny embedded device we depend on simpler and model-free techniques as we don’t have the computing power for bayesian inference. We decided to use statistical classifiers as they have already proven to work well for other related pattern recognition tasks like base level activity and motion recognition [8,9,10]. Own previous work has shown that for instance decision trees usually produce good recognition rates in these use cases, while preserving a very low footprint and so are well applicable for embedded sensor data classification. In our work we are not primarily interested in the most successful algorithm possible for recognizing arbitrarly complex gestures. We focus on the specific application of an autonomous and low cost bracelet device. In this context we are concentrating on building a rather simplistic pattern recognition pipeline, trying to get as close to the recognition performance of model based approaches as possible, while keeping the computing complexity minimal. For this purpose we built a prototypic appliance called “GRacelet” for recording and testing our algorithm with basic free-handed gestures. Our system utilizes user independent training and is extensible regarding new gesture classes. This paper is organized as follows: First our understanding of gestures, specific requirements and algorithmic basic concepts are defined, followed by an explanation of the single steps of the recognition process. In the following section our prototypic implementation is presented. After that we decribe our experimental setup and show the results of our evaluation. Finally an outlook on future developments is given together with conclusions.
2 Gesture Recognition Before elaborating the algorithmic details we will have to clarify which kinds of gestures we intend to detect. Our gestures are executed with one single hand and should not last longer than 2 seconds. The specific finger posture is irrelevant and the subject’s body is in an upright position. Basic motions follow an approximately linear or circular path or are a combination of both.
Measurement
Training
Segmentation
Feature Extraction
Classification
Filtering
Gestures
Domain Knowledge
Fig. 1. Gesture recognition system overview: The recognition process consists of the steps Measurement, Segmentation, Feature Extraction, Classification and Filtering
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction
521
As mentioned earlier we built our pattern recognition system around a decision tree. This choice was made because of the very asynchronous ratio between training and working phase and the low complexity of the inference. While the complex offline training of such a tree is very expensive and time consuming, the actual model is a set of simple conditional rules based on numerical comparisons. They can be converted into a sequence of if-then-else statements which can be processed in real time on the device. The whole system consists of the following processing steps: Measurement, Segmentation, Feature Extraction, Classification and Filtering (see Fig. 1). Measurement. According to [11] accelerations of human body parts can reach an amplitude up to 6 g and frequencies of 60 Hz in the worst case, which always implicates some kind of impact. It is mentioned that in general accelerations do not exceed 5 Hz. Lange [12] determined 13 Hz as the maximal frequency for a voluntary movement of the fingers and a frequency of 16 Hz achievable by a reflex. Own tests for determining the amplitude measuring range during hand movements showed that even quick accelerations did not exceed 4 g in practice. We took those values as a basis configuration for our sensor (because of the Nyquist-Shannon sampling theorem the sampling frequency was set to 2 ∗ 16 = 32 Hz). Segmentation. For partitioning the continuous sensor data stream into processable portions, it is most prevalent to separate it into equally sized segments (windows). A window should be long enough to capture a whole gesture, but short enough not to merge and blur motions in quick succession. As most of our anticipated gestures do not exceed 2 seconds, we chose a window size of 64 samples. For keeping the system reactive and not loosing shifted gestures which overrun a window border we are calculating the segments half-overlapping. We do not apply a weighting function. Feature Extraction. Although high-level features arising from complex fourier or shape analysis usually result in a significant improvement of the recognition rate of a classifier, we decided to stick to time domain features because of the computing complexity. As we do not know in advance, which gestures will have to be disambiguated only generic and unspecific features were considered. For every window we are calculating the euclidean norm of the means of all three axis, and the covariances between all three axis and additionally the euclidian norm. Altogether this makes 11 direction independent and easily calculable features. Classification. As we intend to work with real sensor data on an embedded device the classifier has to meet several requirements. It must be able to handle numerical values, noisy data and missing samples, also it must work in real time and under very limited system resources. Additionally, learning should not require much user interaction or parameters and it should not tend to overfitting, both to simplify the training of new gestures. The C4.5 decision tree [13] is a non-metric supervised learning method. It is based on recursive binary partitioning of the feature space by constant thresholds in its nodes. Learning these “rules” follows specific selection criterions. This process terminates when a given exit condition is reached and the tree is pruned. As mentioned earlier if once the time-consuming learning phase has finished the actual classifier model is very straightforward and applicable in real-time even on embedded devices.
522
A. Hein, A. Hoffmeyer, and T. Kirste
Filtering. For smoothing out temporal inconsistencies a median filter can be applied on the preliminary gesture classes detected by the classifier. It turns out that a filter kernel width of three windows makes an acceptable tradeoff between reactivity and noise reduction for our window size. The median filter was not used during the tests presented in this paper.
3 Implementation For recording motion data and evaluating our algorithm we built the “GRacelet” – a gesture recognition bracelet – consisting of a wrist mounting and a custom built motion sensor board (Fig. 2). The sensor board is constructed as an autonomous embedded system containing an Atmel ATMEGA32-L 8 Bit low-power microcontroller, a Stollmann BlueMod+P25 Class 2 Bluetooth module (range 25 m+) and an integrated battery lasting for at least 24 hours of continuous operation. The attached Bosch Sensortec SMB380 acceleration sensor is able to capture a measuring range of ±2 g, ±4 g or ±8 g. The range was set to ±4 g and the raw sampling frequency to 32 Hz, as explained in section 2.
Fig. 2. GRacelet: Gesture recognition bracelet consisting of a wrist mounting and a motion sensor board
For evaluation and offline training purposes we first implemented the prototype software on a desktop computer in Java using the Weka toolkit [13] and used the GRacelet device only for recording and instantly transmitting the raw sensor data. This way we could experiment with various feature sets and setups and modify the gesture recognition pipeline without too much effort. The software also supports an “online” mode for demonstrating the detection of pre-trained gestures in real time on the screen. For the final application it is intended that the device works as an autonomous input device (not requiring any kind of external classifier), and just sends out recognized gesture events via bluetooth. Therefore the recognition pipeline has been reimplemented for running directly on the device (measurement, segmentation, feature extraction) while the model, generated by training the decision tree, has to be transferred manually.
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction
523
4 Evaluation During the evaluation we wanted to clarify three aspects: The feasibility of the system as a whole, whether or not our approach is able to classify simple gestures and how reliable our gesture recognition algorithm would work for simple and even more complex gestures for different persons. As we were not yet interested in the day to day performance but in a general proof of concept, we were conducting our tests under laboratory conditions. Therefore we conducted two experimental test runs with different sets of gestures and two subjects each. Every single gesture was performed 100 times by every subject which makes a total number of 200 examples for one specific gesture. The captured sensor data was transmitted to a stationary computer via Bluetooth and annotated manually. The classification results for each test run were estimated using stratified 10fold cross-validation. The final classification tree used on the device was trained using all recorded examples. Basic Gestures. For representing simple gestures we chose 4 different figures drawn into the air by hand and one idle gesture (Fig. 3). The five gesture classes were named “circle left”, “circle right”, “up/down”, “left/right”, and “idle”. 3D plots of the specific acceleration values are shown in Fig. 4. A total number of 1000 training examples were recorded.
Fig. 3. Basic gestures performed during first test run. Circular (left) and linear (right)
The gesture classifier achieved a recognition rate of 97.78% +/- 4.44%. As the confusion matrix points out, especially the linear motion and the disambiguation between idle state and performing a gesture was perfectly detected (Tab. 1). Also circular movements were detected highly reliable, but left and right could not be discriminated very well in every case. Probably a more specific feature set also incorporating the phase shift would have resulted in a higher accuracy. Complex Gestures. For evaluating the recognition accuracy of more complex gestures each test subject had to write all ten digits into the air plus one additional idle gesture.
524
A. Hein, A. Hoffmeyer, and T. Kirste
Fig. 4. 3D plot showing the accelerations of the 5 gestures of the first test run Table 1. Confusion matrix (in %) for first test run. Different gesture classes are “circle left”, “circle right”, “up/down”, “left/right”, “idle” simple gestures
estimate
circle left circle right up/down left/right idle
truth circle left circle right up/down left/right idle 100,00 0,00 0,00 0,00 0,00 0,00 90,48 4,76 4,76 0,00 0,00 0,00 100,00 0,00 0,00 0,00 0,00 0,00 100,00 0,00 0,00 0,00 0,00 0,00 100,00
For these 11 gestures again 2200 examples were recorded. The classnames were “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine”, “zero”, and “idle” (see Fig. 5). The second test run resulted in a recognition accuracy of 76.01% +/- 7.98%. Compared with the overall accuracy of the first test run this value is significantly lower but not as bad as we expected in advance. It has to be considered that the classifier had to distinguish 11 much more complicated and partially very similar gesture classes. The level for random guessing would be at 9% which still is much below the actual value. The confusion matrix for the complex gestures is shown in Tab. 2. As can be seen, the classifier could disambiguate perfectly between the idle state and the actual gestures (100% right). More unique shapes like the digits “zero” (95%), “one” (94%), “four” (89%), “nine”, “eight”, “two” (each about 80%) could be recognized much better than the other more similar ones (in this context similar and shape have to be understood in respect of acceleration and not actual displacement, see Fig. 5). Probably new high-level features which are adjusted according to these poorly detected gestures could slightly improve the recognition rate while keeping the classifier simple. This empirical manual feature selection requires either a lot of expert knowledge or plenty of trial and error. That is why in most cases a different classification method for example incorporating hidden markov models could be more promising.
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction nine
x
zero
idle
x
z
x
x
z
y five
z
y six
x
z
x
y
eight
x
z
y two
x
z
y seven
x
z
y one
525
z
y three
x
z
x
z
y
y four
z
y
y
Fig. 5. 3D plot showing the accelerations of the 11 gestures of the second test run Table 2. Confusion matrix (in %) for second test run. Different gesture classes are “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine”, “zero”, “idle” complex gestures
estimate
one two three four five six seven eight nine zero idle
one 94,12 0,00 0,00 0,00 0,00 0,00 0,00 18,75 0,00 0,00 0,00
two 0,00 80,95 4,17 0,00 0,00 0,00 3,70 0,00 0,00 0,00 0,00
three 0,00 14,29 54,17 8,33 10,34 2,86 7,41 0,00 0,00 0,00 0,00
four 0,00 0,00 4,17 88,89 3,45 0,00 0,00 0,00 0,00 5,00 0,00
five 0,00 4,76 12,50 2,78 62,07 5,71 11,11 0,00 0,00 0,00 0,00
truth six 0,00 0,00 4,17 0,00 6,90 74,29 18,52 0,00 0,00 0,00 0,00
seven 5,88 0,00 16,67 0,00 13,79 14,29 48,15 0,00 13,51 0,00 0,00
eight 0,00 0,00 0,00 0,00 3,45 0,00 3,70 81,25 0,00 0,00 0,00
nine 0,00 0,00 4,17 0,00 0,00 2,86 7,41 0,00 83,78 0,00 0,00
zero idle 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 2,70 0,00 95,00 0,00 0,00 100,00
5 Conclusion Ubiquitous computing requires new ways of human computer interaction. Free-handed gestures could become an unobtrusive and intuitive input method. In this work an approach for recognizing gestures in real-time with a prototype of a wireless embedded bracelet device was presented. We are using one single 3-axis accelerometer for motion sensing. Keeping in mind that future devices will have severe requirements regarding miniturization and power consumption we developed a tree-based very low complexity recognition algorithm avoiding complex calculations or probabilistic models. New gestures can be added easily through supervised learning. We conducted an evaluation where we could show the feasibility of our approach. Simple gestures could be
recognized at a very good rate, while the more complex gestures (digits) at least partially caused significant problems. The latter could still be detected much better than by random guessing, but some of them not well enough for everyday usage. Recent offline tests utilizing a principal component analysis and a support vector machine significantly improved the recognition accuracy for complex gestures to 93%. Unfortunately, these algorithms are too complex and could not be executed in real time on our hardware device. Our future work will therefore focus on investigating a model-based algorithm, which would probably be better suited for compound gestures. Gesture recognition on its own is not very spectacular without applications built around it. So besides further improving the recognition accuracy, especially practical applications and user studies will bring this research area forward and help to define specific requirements for new, enhanced algorithms. From our point of view, detecting two-handed gestures is an interesting challenge. In the longer term, the further integration of sensors into new devices, such as intelligent wristwatches, will allow new and subtle ways of interaction.
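For orientation, the offline principal component analysis and support vector machine combination mentioned above can be sketched in a few lines. The pipeline below is our own reconstruction with placeholder data and hyperparameters (assuming scikit-learn); it is not the authors' actual offline experiment.

# Hedged sketch of an offline PCA + SVM pipeline for gesture feature vectors.
# Data, dimensionality, and hyperparameters are placeholders only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(2200, 15))        # e.g. one feature vector per recorded example
y = rng.integers(0, 11, size=2200)     # 11 gesture classes

model = make_pipeline(StandardScaler(), PCA(n_components=8), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))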
Acknowledgements We are indebted to the Fraunhofer Institute for Computer Graphics Research Rostock, especially Gerald Bieber, for providing the accelerometric sensing device and supporting this work.
References 1. Mäntyjärvi, J., Kela, J., Korpipää, P., Kallio, S.: Enabling fast and effortless customisation in accelerometer based gesture interaction. In: MUM 2004: Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia, pp. 25–31. ACM, New York (2004) 2. Farella, E., Acquaviva, A., Benini, L., Riccò, B.: A wearable gesture recognition system for natural navigation interfaces. In: Proceedings of EUROMEDIA 2005, Toulouse, April 2005, pp. 110–115 (2005) 3. Niezen, G., Hancke, G.: Gesture recognition as ubiquitous input for mobile phones. In: DAP Workshop at UBICOMP 2008, University of Pretoria (2008) 4. Prekopcsák, Z., Halácsy, P., Gáspár-Papanek, C.: Design and development of an everyday hand gesture interface. In: MobileHCI 2008: Proceedings of the 10th international conference on Human computer interaction with mobile devices and services, pp. 479–480. ACM, New York (2008) 5. Pylvänäinen, T.: Accelerometer Based Gesture Recognition Using Continuous HMMs (2005) 6. Schlömer, T., Poppinga, B., Henze, N., Boll, S.: Gesture recognition with a wii controller. In: TEI 2008: Proceedings of the 2nd international conference on Tangible and embedded interaction, pp. 11–14. ACM, New York (2008) 7. Kela, J., Korpipää, P., Mäntyjärvi, J., Kallio, S., Savino, G., Jozzo, L., Marca, D.: Accelerometer-based gesture control for a design environment. Personal Ubiquitous Comput. 10(5), 285–299 (2006) 8. Hein, A.: Echtzeitfähige Merkmalsgewinnung von Beschleunigungswerten und Klassifikation von zyklischen Bewegungen. Master's thesis, University of Rostock (November 2007)
9. Bao, L., Intille, S.S.: Activity recognition from user-annotated acceleration data. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 1–17. Springer, Heidelberg (2004) 10. Ravi, N., Dandekar, N., Mysore, P., Littman, M.L.: Activity recognition from accelerometer data. In: AAAI, pp. 1541–1546 (2005) 11. Bouten, C., Koekkoek, K., Verduin, M., Kodde, R., Janssen, J.: A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity. IEEE Transactions on Biomedical Engineering 44(3), 136–147 (1997) 12. Lange, H.K.H.: Allgemeine Musiklehre und Musikalische Ornamentik. Franz Steiner Verlag (2001) 13. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann Publishers Inc, San Francisco (2005)
A Proposal of New Interface Based on Natural Phenomena and So on (2) Ichiro Hirata1, Toshiki Yamaoka2, Akio Fujiwara2, Sachie Yamamoto2, Daijirou Yamaguchi3, Mayuko Yoshida3, and Rie Tutui2 1
Hyogo Prefectural Institute of Technology, Product Innovation Dept 3-1-12, Suma-Ku, Kobe City, Hyogo, Japan 2 Wakayama University, Faculty of Systems Engineering 3 Wakayama University, Graduate school of Systems Engineering Sakaedani 930, Wakayama City, Wakayama, Japan [email protected], [email protected], {s105048,s105059,s085061,s115037}@sys.wakayama-u.ac.jp, [email protected]
Abstract. The purpose of this research is the "realization of a user interface that is kind to the person". We live together with nature; therefore, it is effective to use natural phenomena in the user-interface. To explore new user-interfaces based on natural phenomena and related sources, data in three categories ("Accustomed manners friendly", "Natural phenomena", "Movement and behavior of plants and animals") were gathered in a field survey. The gathered data were classified and structured, and a new user-interface that combines many user-interfaces (picture scroll interface, water lily interface, fish shoal interface, and so on) was constructed. This paper presents four example user-interfaces selected from the collected data. Keywords: user-interface, natural phenomena, manners, behavior.
1 Introduction
In this paper, a new user-interface based on accustomed manners is discussed. Humans tend to perceive objects from left to right; the way a kabuki actor makes his entrance is an example of the use of this physiological phenomenon, and "the direction of the fish on a food counter" and "the character mark" are further examples [1]. This paper discusses new interface designs based on natural phenomena and related sources. We live together with nature, so it is effective to use natural phenomena in the user interface. The purpose of this research is the "realization of a user interface that is kind to the person". Observing natural phenomena is one of the most important ways of thinking [2]. Designers observe a natural object, simplify some of its curves and patterns, and bring about a new design; we think this is most important when we design motions. In order to extract the elements of the user interface, we conducted a field survey. It investigated three categories of phenomena, and the interface design ideas obtained from the field survey were organized as follows:
1. Accustomed manners friendly
2. Natural phenomena
3. Movement and behavior of plants and animals
The field survey covered 33 places, from Hokkaido (the northern island) to Okinawa (the southern island). The collected data were classified and structured, and ideas for the user-interface were designed. Many elements of the user-interface were extracted, and finally the main element of each group was translated into a new user-interface.
2 Examples of New User-Interfaces
In this chapter, examples of the new user-interfaces constructed in this survey are discussed:
1. A new user-interface with a scroll function (an example based on accustomed manners and objects)
2. A new user-interface with direction (an example based on natural phenomena)
3. A new user-interface that gathers related information (an example based on the movement and behavior of plants and animals)
The three user-interface designs were visualized to evaluate them from the viewpoints of usability, emotion, and so on.
2.1 A New User-Interface with Scroll Function (Example of New User-Interface Based on the Picture Scroll)
Japanese traditional manners can effectively be applied to screen transitions. This is kind to Japanese people, and especially to Japanese elderly people. A new user-interface, picked from among all the ideas, is discussed here. The picture scroll is one of the traditional Japanese painting forms, as shown in Fig. 1. It is drawn on horizontally connected sheets of paper, the whole story is expressed on one page, and the view moves from the right to the left. Humans tend to perceive objects from left to right; therefore the archetypal user-interface is composed consecutively from left to right. The picture scroll, however, is composed consecutively from right to left, and the original Japanese document form, vertical writing, is also read from the upper right to the lower left. A new interface design based on the picture scroll is shown in Fig. 2. This user-interface is designed on the assumption of a Japanese hotel guide. The features of this user-interface idea are as follows:
1. All information is displayed on one page.
2. Information flows from the right to the left.
3. It is a traditional display method of Japan.
4. It is easy to confirm the operation flow.
Fig. 1. Picture scroll
With this user-interface it is easy to correct a mistake because the flow of the task can be confirmed. Moreover, this information flow is kind to Japanese elderly people. This can be explained by comparing the relationship between the "picture scroll" and the "picture-story show". The "picture scroll", which presents the whole story on one screen, is a "parallel information model". In contrast, the "picture-story show", in which the story is presented sequentially, is a "consecutive information model". The user-interface design was visualized to evaluate it from the viewpoints of usability, emotion, and so on, and it was evaluated highly. Generally, design 1 produced good results; especially the emotional aspect was evaluated highly.
Fig. 2. A new user-interface idea based on picture scroll
As another idea, a new interface design based on the picture scroll is shown in Fig. 3. This user-interface is designed on the assumption of a hotel reservation system. The features of this user-interface idea are as follows:
1. All operation menu items are laid out on one big sheet; the main menu on the display is a close-up of this big page.
2. When a button is pushed, the screen moves to the target menu.
This interface's structure follows the "parallel information model", but its operation follows the "consecutive information model", which suits inexperienced users. Therefore, this interface is easy to use. Moreover, the animation played when a button is pushed makes the screen change easy to understand.
Fig. 3. A new user-interface idea based on picture scroll 2
2.2 A New Interface with Direction (Example of New User-Interface Based on the Surface of Water and the Water Lily)
Interface designs based on natural phenomena proved effective for both the design and the interface structure. A new user-interface, picked from among all the ideas, is discussed here. The water lily's leaves float on the water, while its roots are sunk underwater (Fig. 4).
Fig. 4. Surface of water and Water lily
The interface design idea based on the water lily is shown in Fig. 5. This user-interface is designed on the assumption of an ATM (automated teller machine). The features of this interface design idea are as follows:
1. The navigation flow chart is displayed faintly in the background.
2. When a button is pushed, the screen moves to the target menu.
Fig. 5. Interface idea based on water lily
This user-interface is easy to use because the navigation flow is displayed faintly in the background. The user-interface design was visualized to evaluate it from the viewpoints of usability, emotion, and so on. While the interface aspects were evaluated highly, the navigation was not: although the navigation is a good idea, its design is not so good.
2.3 A New Interface that Gathers Related Information (Example of New User-Interface Based on Gathering Fish)
Interface designs based on the "movement and behavior of plants and animals" are effective for the motion design of the user interface. A new user-interface, picked from among all the ideas, is discussed here. Carp in a pond usually swim calmly, but they swarm around food rapidly when bait is dropped into the pond (Fig. 6).
Fig. 6. Fishes gathered to get bait
The user-interface idea based on gathering fish is shown in Fig. 7. This user-interface is designed on the assumption of a multifunction controller. This controller has the functions of three kinds of products (an air-conditioner, a television, and an audio component system). The main menu display has 7 buttons (Air-conditioner,
Television, Audio compo, Temperature, Reservation, Mode, Volume). When the user wants to adjust the volume of the audio component system, the operation can be carried out in two ways:
1. First the "Volume" button is pushed, then the "Audio compo" button is pushed.
2. First the "Audio compo" button is pushed, then the "Volume" button is pushed.
Either way adjusts the volume of the audio component system; the information in this system is not categorized. The features of this interface design idea are as follows:
1. When a button is pushed, the related menu items swarm around the pushed button.
2. The pushed button moves to the center of the screen.
3. Related pieces of information are gathered together.
4. This user-interface does not categorize the menu items.
Having the related menu items swarm around the pushed button is convenient.
Fig. 7. Interface idea based on gathered fishes
The user-interface design was visualized to evaluate it from the viewpoints of usability, emotion, and so on. This user-interface was evaluated highly because of its new functionality and its convenience.
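As an aside, the non-hierarchical selection logic of this fish-shoal interface can be modelled very compactly. The sketch below is our own illustration of the order-free pairing of a device button with a function button; the button names are taken from the description above, but the code is not part of the authors' prototype.

# Illustrative model only (not the authors' implementation): the fish-shoal
# controller pairs one "device" button with one "function" button in either
# order and then executes the combined command.
DEVICES = {"Air-conditioner", "Television", "Audio compo"}
FUNCTIONS = {"Temperature", "Reservation", "Mode", "Volume"}

class ShoalController:
    def __init__(self):
        self.device = None
        self.function = None

    def related(self, button):
        # buttons that would "swarm" around the pushed button
        return FUNCTIONS if button in DEVICES else DEVICES

    def push(self, button):
        if button in DEVICES:
            self.device = button
        elif button in FUNCTIONS:
            self.function = button
        print(f"'{button}' moves to the centre; related items gather: {sorted(self.related(button))}")
        if self.device and self.function:
            print(f"Execute: adjust {self.function} of {self.device}")
            self.device = self.function = None

controller = ShoalController()
controller.push("Volume")       # function first ...
controller.push("Audio compo")  # ... then device -> command executed
controller.push("Audio compo")  # or device first ...
controller.push("Volume")       # ... then function -> same command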
3 Conclusions
This paper described a new interface design approach. The purpose of this research is the "realization of a user interface that is kind to the person". The proposed interfaces have in common that they can be operated without a layered structure. Conventional user-interfaces require the user to understand the layered structure of their categories; however, many elderly people do not have such a mental model [4]. Mental models are of two types [5]: the functional model is a model for understanding how to use something, and the structural model is a model for understanding how it works. The results showed that many elderly people did not have the structural type of mental model when they operated new electrical appliances. Elderly people feel that a general interface
is not easy to use because it requires the structural type of mental model. The interfaces in this study do not require the structural model. A proof experiment will be conducted in the future.
Acknowledgments. This research was carried out with the subsidy of the survey study "Eco-innovation Promotion Project 2008". We would like to thank the New Energy and Industrial Technology Development Organization (NEDO), which promoted this project.
References
1. Toshiki, Y.: Ergonomic lecture, pp. 256–257. Musashino Art University Publisher (2002) (in Japanese)
2. Tomoyuki, S., Chisa, Y., Zhao, Y.Y.: Understanding Interaction Design Using Tiny Computer System. In: IASDR 2007 (2007)
3. Ishii, H.: Tangible bit, pp. 30–31. NTT Publishing, Tangible Media Group (2000)
4. Ichiro, H., Toshiki, Y.: A study on Operational Electrical Appliances for Elder Person. Japan Ergonomics Society in Kansai 2008, 137–138 (2008) (in Japanese)
5. Jenny, P., Yvonne, R., Helen, S.: Human-Computer Interaction, pp. 123–139. Addison-Wesley Publishing, Reading (1994)
Timing and Accuracy of Individuals with and without Motor Control Disabilities Completing a Touch Screen Task Curt B. Irwin and Mary E. Sesto Trace Center, University of Wisconsin-Madison [email protected], [email protected]
As touch screen technology improves in functionality and decreases in price, these input devices are becoming much more prevalent. People are increasingly required to interact with touch screens at places ranging from their local grocery stores to airport check-in kiosks. Since it is becoming necessary for people to use touch screens in order to access needed products or services, we conducted an experiment to examine how individuals with varying motor control disabilities perform on a simple number entry task. We feel this research is important because, to date, most of the usability research related to touch screens has only included young, healthy subjects. Several relevant studies have examined the effect of button size and spacing on the timing and accuracy of users’ touch screen performance. Colle (2004) examined the performance characteristics of young, healthy college students as button sizes and spacings were altered in a number entry task. Jin (2007) studied performance relating to button size and spacing for a single selection task with a sample of older adults with varying levels of manual dexterity. This pilot study included a total of 30 participants, 11 participants with Cerebral Palsy (CP) average age = 46.3 years (SD=9.4), 11 with Multiple Sclerosis (MS), average age =51.5 years (SD=6.8) and 8 age-matched, non-disabled controls, average age = 48.4 (SD=11.6). All disabled participants self-reported difficulty pushing buttons. Informed consent was obtained in accordance with the University of Wisconsin guidelines for the protection of human subjects. This study investigated performance using buttons ranging in size from 10mm square to 30mm square, in 5mm increments. The spacing between the buttons varied between 1mm and 3mm. Seated participants completed a four digit number entry task on a 4 x 4 keypad with a “Go” button on the upper left side and a “Done” button on the upper right side of the button array. After they pressed the “Go” button, subjects were presented with a random 4-digit number just above the keypad. After entering the 4-digit number, the subjects pressed the “Done” button. The buttons were activated with a “land-on” strategy and the touch screen required an average activation force of 0.5N. Practice sessions for all tasks were provided. Each button size/spacing combination had 6 repetitions and therefore the entire protocol consisted of 60 trials. All participants operated the touch screen in a front-approach position. The touch screen was mounted in an adjustable frame which positioned the touch screen with adequate knee clearance of at least 69 cm as specified in the ADA accessibility guidelines. Data from four of the participants with CP were eliminated as these participants could not successfully complete all tasks. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 535–536, 2009. © Springer-Verlag Berlin Heidelberg 2009
Timing, accuracy (errors, misses) and kinetic data were collected for every finger touch during the number entry task. The kinetic data is reported elsewhere and will not be reported here. An error was defined as when the operator activated an incorrect button and a miss was defined as when the operator activated the touch screen but did not activate a button. For this pilot work, the following variables were examined: 1) Total time from the first numbered button push to the fourth numbered button push (only trials that did not contain a miss or an error), 2) Percentage of trials containing an error, 3) Percentage of trials containing a miss, 4) Percentage of trials containing either a miss or an error. The results for total time were analyzed using a mixed effects repeated measures analysis of variance and the results for miss, error, and miss or error were analyzed using non-parametric methods (Friedman’s test). The results indicated significant differences between button sizes for all variables (p<0.05). On average, participants required 23% less time to complete the four-digit entry task when using 25mm buttons as compared to the 10mm buttons (p<0.05). The percentage of trials with an error was significant by button size (p<0.05) as was the percentage of trials with a miss (p<0.05). The percentage of trials with any inaccurate touch (either a miss or an error) was also significant by button size (p<0.05). For trials with an inaccurate touch, the differences were significant (p<0.05) between both the 10mm and 15mm buttons and all other button sizes. When increasing the button size from the smallest to the largest button, the percentage of trials with an inaccurate touch decreased from 35% to 1% for the non-disabled group, from 68% to 21% for the CP group and from 58% to 4% for the MS group. Performance improvements appear to be less dramatic for buttons larger than 20mm with 14.4% of trials having an inaccurate touch at 20mm, 9.6% at 25mm and 7.7% at 30mm. These results indicate that performance trends for errors and misses are similar between groups. The contents of this paper were developed under grant H133E030012 from the National Institute on Disability and Rehabilitation Research (NIDRR), U.S. Department of Education. However, these contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the federal government.
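As an illustration of the non-parametric analysis described above, a Friedman test across button-size conditions can be run as sketched below. The per-participant error rates are invented for illustration (assuming SciPy); this is not the study's data or analysis code.

# Illustration only: Friedman test over button-size conditions.
# Each row is one hypothetical participant's error rate (%) for the five
# button sizes (10, 15, 20, 25, 30 mm); the numbers are made up.
import numpy as np
from scipy.stats import friedmanchisquare

errors = np.array([
    [60, 40, 20, 12, 10],
    [55, 35, 15, 10,  8],
    [70, 50, 25, 15, 12],
    [45, 30, 10,  8,  5],
    [65, 45, 22, 11,  9],
])  # participants x button sizes

# friedmanchisquare expects one sample per condition (i.e. per column)
stat, p = friedmanchisquare(*errors.T)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")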
Gaze and Gesture Activity in Communication Kristiina Jokinen University of Helsinki, Finland [email protected]
Abstract. Non-verbal communication is important in order to maintain fluency of communication. Gestures, facial expressions and eye-gazing function as nonverbal means to convey feedback and provide subtle cues to control and organise conversations. In this paper, verbal and non-verbal feedback are discussed from the point of view of how they contribute to the communicative activity in conversations, especially the type of strategies that the speakers deploy when they aim to construct shared understanding of the tasks and duties in interaction in general. The study concerns conversational data, collected for the purposes of designing and developing more natural interactive systems.
1 Introduction Recently verbal and non-verbal aspects of communication have become popular research topics in interaction technology. Understanding how intonation conveys the speaker's attitudes and emotional state, and how gestures, facial expressions and body posture support, complement, and in some cases, override verbal communication, is necessary for modelling interaction management. The knowledge is crucial also in the design and implementation of systems that can adapt themselves to different users in different environments and in different cultural contexts: intelligent, interactive and context-aware applications require knowledge on natural interaction strategies and what it means to communicate in a natural way. Often these aspects have been overlooked in dialogue system design because of technological constraints or lack of larger theoretical views of how human communication takes place. However, information management and multimodal user interface are considered among the main research challenges for enabling technologies, and interactions with smart objects, services and environments need to address challenges concerning natural, intuitive, easy, and friendly interaction. For instance, Norros et al. [21] list speech, hearing, gesturing and gaze as serious modality candidates to enhance future Human Technology Interfaces, since these are extensively used in human-human interaction. Active research is going on concerning the types and communicative functions of various gestures and facial expressions in order to learn more about natural human communication, and to enable more intuitive and flexible interactions between human users and computer agents. Besides corpus-based empirical investigations, ECAs (Embedded Conversational Agents, [2, 8]) and virtual world characters have been used to experiment and develop more intuitive interaction techniques. Also robotic companions have been developed so that they can recognize speech and gestures, and so become engaged in multimodal communication [3]. New application areas have C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 537–546, 2009. © Springer-Verlag Berlin Heidelberg 2009
also appeared for non-verbal communication techniques: various games and educational toys that use the novel technology can allow users, especially children with special needs to enjoy and be empowered by the new technology. In second language learning and intercultural communication studies non-verbal communication also appears important: students need to learn to observe relevant communicative gestures and to produce suitable gestures themselves in order to communicate fluently. In this paper we look at verbal and non-verbal feedback from the point of view of the joint activity that the partners show in their gestures, facial expressions, and body posture. Through the collection of large corpora and examination of communication in natural situations, we study the regulating function of non-verbal communication, and how the speakers use different non-verbal means to manage dialogues in multiparty settings. The study concerns especially the synchronisation of communicative activity that the partners show in their reactions to the partner's presentations, and the kind of gesturing they use to construct shared understanding of the tasks and duties in communication. Non-verbal activity is thus tied to the notion of feedback and to the general dialogue strategies modelled for the purposes of designing and developing more natural interactive systems. The specific research questions concern: • What kind of non-verbal communication takes place in human interactions? • What kind of interrelations can be found among the various types of non-verbal communication? • Can the correlations be measured and modelled for interactive systems? The paper is structured as follows. Section 2 discusses different non-verbal signals, gestures, face and body posture in providing feedback, and Section 3 presents the data and examples of the gestures and body communication which function in the coordination of communication. Section 4 presents their contribution to the dialogue activity and synchrony among the participants. Section 5 draws conclusions and points to further research topics.
2 Non-verbal Feedback
Natural and intuitive dialogue phenomena do not only include spoken words and utterances, but also vocal aspects and gesticulation which do not appear in written language. Such phenomena include:
• Hesitations (silence, sound prolongation, hesitation markers, repetitions)
• Discourse markers (well, I mean, oh, ah)
• Backchannels (uhuh, mmhmm, ok)
• Speech signals (changes in voice quality, speaking rate, pitch, intensity)
• Eye movements (focus of attention)
• Head movement (related to focus of attention, turn taking)
• Facial expressions (reflecting states of mind)
• Hand gestures (indicating rhythm of the speech, pointing, icons)
• Body posture (controlling their involvement in the discussions)
Often these phenomena work in synchrony. For instance, turn-taking and backchannelling are based on prosodic and syntactic features [19] and can also be
supported by eye-contact which signals the end of turn or the speaker's intention to take turn. The importance of gaze can also be seen in establishing the focus of shared attention in classroom interactions and meetings: gaze direction serves to frame the interaction and establish who is going to speak to whom, and about what. Moreover, non-verbal signals serve social functions, creating bonds and shared knowledge, as well as reflecting attitudes, mood and emotions of the speakers [12]. Much of the conversational information exchange relies on the assumptions that are not necessarily made explicit in the course of the interaction. One of the main challenges in interaction management thus lies in the grounding of language: finding the intended referents for the partner's expressions and regulating the flow of information so as to construct shared knowledge of what the conversation is about [9, 13, 14, 22]. Non-verbal signals provide an effective means to contribute to the mutual understanding of the conversation, and to update one's knowledge without interrupting verbal presentation. For instance, looking at the conversational partner or looking away can provide indirect cues of the speaker's willingness to continue interaction, gazing at particular elements in the vision field tells what the focus of attention is, while gesturing usually catches the partner's attention and marks relevant parts of a message. Feedback can be classified according to the strength of commitment on the feedback giver's side: at its weakest we talk about backchannelling as providing feedback about the basic enablements of communication (contact, perception and understanding), and at its strongest the speaker gives feedback by agreeing on the content of the utterance. Non-verbal feedback is usually backchannelling, i.e. automatic signalling of “the channel being open” rather than conscious and intentional exchange of symbolic information: it contributes to the fluency of communication and allows the participants to monitor the state of interaction quickly and unobtrusively. However, non-verbal signals are usually ambiguous and their interpretation requires that the communicative context is taken into account. Different levels of context are relevant, ranging from the utterance itself to the roles and cultural background of the participants (see [17]). One of the necessary conversational skills is thus to know how and when to enable the right type of contextual reasoning: participants need to observe each others’ reactions and changes in their emotional and cognitive states so as to draw appropriate contextual inferences. The form of non-verbal signals also provides important information about the meaning of the signals in a given context. For instance, the form of gestures (hand shape, movement, use of fingers) can vary from rather straightforward picturing of a referent (iconic gestures) to more abstract types of symbolic gestures, up to culturally governed emblems (such as the sign of victory). Although gestures are often culturespecific, some forms seem to carry meaning that is typical of the particular hand shape itself. Kendon [18[, for instance, talks about different gesture families which describe the semantics and function of gestures in their context. This supports multifunctionality of gestures and also the gestures forming a continuum rather than a classification of categories. 
Furthermore, it allows wider and more communicative interpretation of gestures: gesture families guide the partner towards certain general pragmatic interpretations (e.g. “I'm offering this information for your consideration and expect your evaluation of it” vs. “I think these points are important and expect you to pay attention to them”), and the verbal content then specifies the meaning of the gesture by expressing what detailed information is being offered or considered important.
The gesture can of course also occur alone, in which case its interpretation is based on the semantic theme typical of the hand shape.
3 Conversational Data and Analysis The corpus consists of conversations collected in an international setting at the ATR Research Labs in Japan, and includes videos of four participants engaged in freeflowing conversation. One of the speakers knows all the others while the others are unfamiliar with each other, although share some background knowledge of the culture and living in Japan. All the interlocutors speak English but represent different cultural backgrounds and language skills. In order to collect as natural data as possible the participants' topics or activities were not restricted in advance. The three about 1,5 hour long conversations were recorded during three consecutive days and consist of casual conversations in an unobstructed technical setting. The technical setup for the collection is similar to the one described in [10] and [6], while the corpus annotation is reported in [16]. The ATR data is transcribed, and a small part of it is annotated with respect to nonverbal gesticulation. The analysis is based on the MUMIN coding scheme [1], developed as a general tool for the annotation of communicative functions of gestures, facial expressions, and body posture. The communicative functions are related to the use of the three non-verbal modalities in turn-taking and feedback giving processes, and also in sequencing information. Annotation also takes into account the general semiotic meaning of communicative elements: they can be indexical (pointing), iconic (describing) and symbolic (conventional) signs. Also the form is annotated for each communicative element: for gestures this includes e.g. the shape of hand, palm, fingers, and hand movement, for face and head this includes the shape and combination of eyes, eye-brows, mouth, and the movement of head, and for the body posture, the leaning back- and forward. One of the conversational situations is shown as an example in Figure 1. The speaker (second from left) had just explained how her friend had went to an electrical shop and bought a special massaging shower and also tried a massage chair that she had liked. The speaker in the back right had suggested it is a healing massage, aiming to share background information that the friend was interested in various types of Eastern healing practices. The gesturer (the one in the front right) now wants to make sure she has understood the concept correctly, and asks for a clarification do you mean this chair or ... or herself. The clarification is accompanied by a gesture with the open hand moving up-down and emphasizing the phrase the chair and, after a short pause and hesitation (or ... or), also the second alternative herself, although less visibly. During the speaking of the first alternative, all the partners look at her, but they move their heads simultaneously towards the original speaker during the second emphasis in anticipation of a clarifying response (Figure 2). The gesture is a typical emphasizing gesture which does not only accompany or complement the spoken content but also functions as an independent means for interaction management. It is related to what McNeill [20] calls catchment. Kendon [18] interprets this kind of gesture as a pragmatic gesture on the meta-discursive level. The particular shape of the gesture belongs to the gesture family of Palm Open Supine
Fig. 1. Gesture emphasising "do you mean this chair or ... or herself"
Fig. 2. Gesture and face turning to expect a response
Vertical with the semantic theme of cutting, limiting, structuring information. Jokinen & Vanhasalo [17] have called these gestures stand-up gestures since they can be distinguished from the normal flow of information presentation (i.e. beats) so as to direct the partner's attention, structure the information flow, and create mutual context (they also belong to the normal repertoire of gesturing in successful stand-up comedies). In this particular example, the gesture cuts information flow and marks a particular part of the verbal content as something that the speaker considers relevant in the context to be clarified. The gesture focuses the partners' attention onto the two particular alternatives, and at the same time it limits the conversational topic to the clarification of these items. Thus, instead of expressing a set of meanings explicitly in an utterance, the speaker uses a non-verbal gesture that conveys them in an unobstructed manner, and simultaneously with the verbal content: how the building of mutual context is progressing (clarification needed), which part of the message is in focus (the two alternatives), and how the conversation is to be structured (divided into segments: presentation – clarification – explanation). Notable differences can be found in the manner and frequency in which the speakers provide feedback. The basic statistics in Table 1 shows how the speakers differ concerning the types of non-verbal signals produced: although the average number of
Table 1. Basic statistics of non-verbal signals

Signal     Total   Average   D    Y     K     N
Head       295     74        25   119   88    63
Gesture    133     33        37   32    34    30
Posture    69      17        4    16    10    39
Total      497     124       66   167   132   132
gestures seems to be the same across the participants, the number of head and body movement differ greatly. There are of course personal and cultural reasons for the differences, but they may also be due to the underlying neurocognitive properties of non-verbal communication. For instance, if gesturing is closely related to thinking and motor control of speech, it is understandable that most speakers produce the same amount of gestures on average during speaking. Similarly, head turns and facial expressions, described with the help of the shape and movement of eyes, eye-brows, mouth, etc. have connections to the speaker's emotional state and focus of attention: facial expressions, at least as signs of true emotions, are produced via an innate mechanism of a cognitive stimulus firing neurons that control facial muscles (although the recognition and production of emotional expressions may be culturally conditioned too, see [11]). Face and head movements, however, show the participants' reactions to what has taken place, and thus vary among the individuals since they appear as automatic reactions based on their personalities; moreover, the underlying reactions seem to be related to those aspects of social life (emotions and attitudes) which are strongly controlled by cultural roles and expectations and which the speakers have internalised as acceptable communicative behaviour. Thus, as means of non-verbal feedback, face and head movement seem to belong to the behaviour patterns that are learnt in order to indicate appropriate level of understanding, interest, and willingness to listen to the partner, while gesturing is related to the speakers' own communication management which may be controlled by the speakers' needs and intentions to express themselves more than by social norms. Body movement, which to a large extent expresses “personal space”, can also indicate participation in the conversation, either as an active participant or an onlooker, and is thus also governed by social norms. Consequently, the observed differences in the frequencies of the different non-verbal signals across the participants can be related to their origin in the production of the speaker's own speech vs. reaction to the partner's speech, and to the culturally conditioned politeness and social norms which govern the appropriate level of expressiveness.
4 Dialogue Activity As seen from the previous data, conversations are full of simultaneous activity which requires subtle coordination by the participants. Models of feedback giving processes are thus relevant in order to design and develop more natural interactive systems, and active research using annotated corpora and machine-learning techniques is going on.
Some work in this respect, using the annotated MUMIN categories to classify and cluster the information, is reported e.g. in [15]. To see how the conversational activity is distributed as signal-level observations, we can visualize it as activity bars along the speakers' speech. Figure 3 below shows a visualization of the verbal and gesture activity during a 25-minute dialogue excerpt from the ATR data, for the two speakers Y and D.1 The horizontally depicted activity bars indicate which of the speakers are verbally active at a particular time, i.e. either speaking, laughing or backchannelling, while the vertical peaks show the speakers' movement as detected in the video: gesturing, head turns and nods, as well as body leaning forward and backward.
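As an illustration, an activity plot of this kind can be produced from interval annotations with a few lines of plotting code. The sketch below uses invented speech intervals and movement signals (assuming NumPy and matplotlib) and only approximates the style of Fig. 3; it is not the code used to produce the figure.

# Illustrative sketch only: speech intervals as horizontal bars, movement as a curve.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
speech_Y = [(0, 40), (70, 15), (120, 30)]   # (start, duration) in seconds, invented
speech_D = [(45, 20), (95, 60)]
t = np.arange(0, 180, 1.0)
movement_Y = np.abs(np.sin(t / 7.0)) * (rng.random(t.size) > 0.6)
movement_D = np.abs(np.cos(t / 9.0)) * (rng.random(t.size) > 0.8)

fig, (ax_y, ax_d) = plt.subplots(2, 1, sharex=True, figsize=(8, 4))
for ax, speech, movement, name in [(ax_y, speech_Y, movement_Y, "Y"),
                                   (ax_d, speech_D, movement_D, "D")]:
    ax.broken_barh(speech, (1.1, 0.3), facecolors="grey")  # verbal activity bar
    ax.plot(t, movement, linewidth=0.8)                    # movement "peaks"
    ax.set_ylabel(f"Speaker {name}")
ax_d.set_xlabel("time (s)")
plt.tight_layout()
plt.show()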
Fig. 3. Conversational activity of speaker Y (above) and speaker D (below)
Speaker Y (the upper verbal bar) is the most active interlocutor concerning the annotated non-verbal activity, while Speaker D (the lower verbal bar) is the least active speaker. As can be seen, Y is very active in the beginning of the conversational excerpt, and has long stretches of speech in the middle and at the end, while D is fairly active in the beginning and end of the conversation, and has a long monologue-type speech in the middle. Their speech overlaps and also shows coordinated turn-taking (however, since the speech of the other two speakers is cut off, the actual coordination of the interlocutors' turn-taking and verbal activity is not shown in the figure). Speaker Y provides verbal feedback more often than D, as can be seen from the several crosses and circles on Y's speech bar, for instance in the middle of the conversation where Speaker D has a long turn. These indicate that the length of Y's verbal activity was very short at these points (this kind of backchannelling is also supported by the top-down manual annotations).
1 I would like to thank Stefan Scherer for the video and speech analysis and for producing the pictures.
What is clearly seen in the figure is the connection between speech and non-verbal body and gesture activity. There are clear peaks in the speaker's movement and speaking: when the speaker starts to speak, there is typically an action with their hands and/or body. Movement activity appears less when the speaker is listening although this also depends on the speaker. In general, these observations match with the assumptions made about the use of different non-verbal feedback signs in the conversation. As for the other speakers, their conversational activities against the other participants are shown in Figures 4 and 5.
Fig. 4. Speaker K's activity against the other participants' speech
Fig. 5. Speaker N's activity against the other participants' speech
5 Conclusions In the beginning of the paper we asked three questions about the form and function of non-verbal communication, and we can now conclude the paper by providing answers on the basis of the research described above. The kind of non-verbal communication
that we have focussed on concerns gestures, facial expressions, and body posture, and their functioning in different communicative functions. On the basis of real conversational examples, we have shown that the participants coordinate their activities in an accurate manner, and effectively use the signals to give feedback, coordinate turn taking, and build shared context. We have also visualized the speakers' non-verbal activity against their verbal activity, and thus contributed to the on-going research on specifying correlations and interrelations among the various types of non-verbal communication signals. As future work, we plan to specify correlations between the non-verbal signals and speech properties such as intonation and the quality of voice. Work is also going on concerning the comparison of the top-down annotations and bottom-up signal processing, and we expect to learn more about the interplay between verbal and non-verbal interaction, and how observations of low-level signals match on the linguistic-pragmatic categories and human cognitive processing. As human-computer interactions get wider and more complex, the resulting models of interaction will provide us with a better understanding of the enablements for communication and the basic mechanisms of interaction that are crucial for flexible and intuitive interaction management. Intuitive communication strategies can improve the rigid and simple interactions that present-day systems exhibit, and thus the research also encourages interdisciplinary research where human and social sciences look into technological possibilities of application to the design and construction of interactive systems.
References 1. Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., Paggio, P.: The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. In: Martin, J.C., Paggio, P., Kuenlein, P., Stiefelhagen, R., Pianesi, F. (eds.) Multimodal corpora for modelling human multimodal behaviour. Special issue of the International Journal of Language Resources and Evaluation, vol. 41(3–4), pp. 273–287 (2007) 2. André, E., Pelachaud, C.: Interacting with Embodied Conversational Agents. In: Jokinen, K., Cheng, F. (eds.) Speech-based Interactive Systems: Theory and Applications. Springer, Heidelberg (2009) 3. Bennewitz, M., Faber, F., Joho, D., Behnke, S.: Fritz - A Humanoid Communication Robot. In: Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (2007) 4. Campbell, N.: On the Use of NonVerbal Speech Sounds in Human Communication. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds.) COST Action 2102. LNCS (LNAI), vol. 4775, pp. 117–128. Springer, Heidelberg (2007) 5. Campbell, N.: Speech and expression; the value of a longitudinal corpus. In: The 4th International Conference on Language Resources and Evaluation (LREC), pp. 183–186 (2004) 6. Campbell, N., Ohara, R.: How far can non-verbal information help us follow a conversation? Preliminary experiments with speech-style and gesture tracking. In: Proceedings of ATR Symposium on the Cross-Modal Processing of Faces & Voices. No laughing matter (2005) 7. Carletta, J.: Announcing the AMI Meeting Corpus. The ELRA Newsletter 11(1), 3–5 (2006)
8. Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds.): Embodied Conversational Agents. MIT Press, Cambridge (2003) 9. Clark, H.H., Schaefer, E.F.: Contributing to Discourse. Cognitive Science 13, 259–294 (1989) 10. Douglas, C.E., Campbell, N., Cowie, R., Roach, P.: Emotional speech: Towards a new generation of databases. Speech Communication 40, 33–60 (2003) 11. Ekman, P.: Universal and cultural differences in facial expression of emotion. In: Cole, J.R. (ed.) Nebraska Symposium on Motivation, pp. 207–283. Nebraska University Press, Lincoln (1972) 12. Feldman, R.S., Rim, B.: Fundamentals of Nonverbal Behavior. Cambridge University Press, Cambridge (1991) 13. Jokinen, K.: Constructive Dialogue Management: Speech Interaction and Rational Agents. John Wiley and Sons, Chichester (2009a) 14. Jokinen, K.: Natural Language and Dialogue Interfaces. In: Stephanidis, C. (ed.) The Universal Access Handbook, Cenveo, ch. 31, pp. 495–506 (2009b) 15. Jokinen, K.: Non-verbal Feedback in Interactions. In: Tao, J.H., Tan, T.N. (eds.) Affective Information Processing, pp. 227–240. Science+Business Media LLC. Springer, London (2008) 16. Jokinen, K., Campbell, N.: Non-verbal Information Sources for Constructive Dialogue Management. In: LREC 2008, Marrakech, Marocco (2008) 17. Jokinen, K., Vanhasalo, M.: Stand-up gestures – Annotation for Communication Management. In: Proceedings of the Multimodal Workshop at Nodalida Conference, Denmark (2009) 18. Kendon, A.: Gesture: Visible action as utterance. Cambridge University Press, Cambridge (2004) 19. Koiso, H., Horiuchi, Y., Tutiya, S., Ichikawa, A., Den, Y.: An analysis of turn taking and backchannels based on prosodic and syntactic features in Japanese Map Task dialogs. Language and Speech 41(3-4), 295–321 (1998) 20. McNeill, D.: Gesture and Thought. University of Chicago Press, Chicago (2005) 21. Norros, L., Kaasinen, E., Plomp, J., Rämä, P.: Human-Technology Interaction Research and Design. VTT Roadmap. VTT Industrial Systems, VTT Research Notes 2220. Espoo (2003), http://www.vtt.fi/inf/pdf/tiedotteet/2003/T2220.pdf 22. Traum, D.: Computational models of grounding in collaborative systems. In: Working Papers of the AAAI Fall Symposium on Psychological Models of Communication in Collaborative Systems, pp. 124–131. AAAI, Menlo Park (1999)
Augmenting Sticky Notes as an I/O Interface Pranav Mistry and Pattie Maes MIT Media Laboratory, 20 Ames Street, Cambridge MA 02139, USA {pranav,pattie}@media.mit.edu
Abstract. The design and implementation of systems that combine both the utilities of the digital world as well as intrinsic affordances of traditional artifacts are challenging. In this paper, we present ‘Quickies’, an attempt to bring one of the most useful inventions of the 20th century into the digital age: the ubiquitous sticky notes. ‘Quickies’ enriches the experience of using stickynotes by linking hand-written sticky-notes to the mobile phone, digital calendars, task-lists, e-mail and instant messaging clients. By augmenting the familiar and ubiquitous physical sticky-note, ‘Quickies’ leverages existing patterns of behavior, merging paper-based sticky-note usage with the user's informational experience. The project explores how the use of Artificial Intelligence (AI), Natural Language Processing (NLP), RFID, and ink recognition technologies can make it possible to create intelligent sticky notes that can be searched, located, can send reminders and messages, and more broadly, can act as an I/O interface to the digital information world. Keywords: Sticky notes, paper as an I/O interface, connecting the physical and information world, intelligent user interface.
1 Introduction Drawing (including in this definition also the concept of writing) is an essential part of human communicational and intellectual activities. It allows expressing thousands of different types of data; it can be done without paying active attention and it does not require users to be familiar with computers. Since the beginning of modern computer science, research has been conducted in order to develop interfaces that could enable users to draw. The development of digitizers was an important step, allowing users to input their drawings or writing using a digital pen on a tablet or directly on the screen. However, despite these developments, the use of paper as the primary medium for information organization has far from dwindled, but instead increased steadily. Today, the paperless office is more distant than when it was proposed [1]. Despite the enormous popularity of computers and personal digital assistants, along with improvements in screen technology, mobile computing technology, and navigational and input tools, paper usage continues to increase. Paper has visual (resolution, contrast, viewing angle) and functional (null power consumption, low cost, portability, small & light weight) features that can hardly be rivaled. Support for and augmentation of paper-based routines is an important step in the computerization of human work practices. Several studies [2] have showed that paper C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 547–556, 2009. © Springer-Verlag Berlin Heidelberg 2009
objects are very supportive, giving users control, flexibility, and overview of information in ways that are difficult to achieve with computer technology. However, digital benefits such as trace-and-search ability of documents are almost impossible to achieve with solutions based on paper only. Rather than trying to develop solutions that can replace the world of paper, it would be interesting if we can make it possible to use paper as the interface to computers, and augment and link paper-based work practices to our digital information world. 1.1 Sticky Notes and its Limitations Since 3M’s introduction of Post-it® Notes in 1980 [3], sticky notes, one of the variants of paper, have become an integral part of our everyday life - accumulating and keeping track of all sorts of information. In an office, sticky notes are often seen on desks as meeting reminders, to-do lists and contact information; on whiteboards as brainstorming devices; and on paper documents as brief notes to the recipient about the content or intended purpose. Sticky notes are also found at home mostly near telephones or on refrigerators as household reminders and messages. Sticky notes are usually seen in books as bookmarks. In addition, we also use sticky notes to tag our assets for personal or social usage. In short, sticky notes are everywhere. Unlike most of our modern digital information devices, sticky notes are portable, low cost and easy to use. However, as written sticky notes accumulate, keeping track of our stickies and the information they contain gets unruly. Desks, whiteboards, refrigerators, telephones and textbooks are inundated with sticky notes. As a result, stickies become lost, hidden or forgotten about. Furthermore, sticky notes have physical limitations; a particular sticky note cannot be in an office and at home simultaneously. Being a passive piece of paper, sticky notes lack the capability of dispatching reminders about upcoming events or deadlines. After scribbling details of a forthcoming occasion on a sticky note, one can still overlook the appointment due to forgetfulness or loss of the sticky note. Like most paper-based media, sticky notes fall short as a medium that can communicate to other, especially digital, information media we use. Given the wide popularity and practical usefulness of sticky notes, we are compelled to bring them along with us into the 21st century. At the same time, given sticky notes’ weakness in communicating with our digital information world in a more orderly and active way, we feel the need to augment the features of sticky notes. In this paper, we presents ‘Quickies’ that attempts to bridge the gap between the physical and digital worlds of information, linking hand-written sticky-notes to the mobile phone, digital calendars, task-lists, e-mail and messaging clients. Quickies system augments familiar and ubiquitous physical sticky-notes.
2 Related Work Several projects and products have tried to use the metaphor of sticky notes in the digital world. The Post-it® Digital [4] of 3M is a computer software program that provides users digital Post-it® Notes. Although Post-it® Digital features searchability, the scope is limited to the boundaries of a computer, isolated from the portable and convenient physical experience that paper sticky notes provide. There are more than
a dozen similar software applications available today, all trying to imitate the simplicity and ease of use of physical sticky notes in the digital realm. Stanford University’s Post-that Notes [5] project attempts to facilitate both searchability and portability, by creating a mobile phone application which captures regular Post-it® notes as pictures within the mobile phone platform. Inspired by the use of sticky notes on whiteboards and walls during the early stages of a project, the Designer’s Outpost [6] of the University of California, Berkeley presents a tangible user interface that combines the affordances of paper and a large physical workspace. The Designer’s Outpost contains an interactive whiteboard with augmented sticky notes that allow users to collaboratively author website architectures. Rasa [7] is a system designed to support situation assessments in military command posts, providing officers the capability of positioning written sticky notes on a paper map with digitizers that simultaneously update a digital database system. TeleNotes [8] was one of the first attempts to provide, in the computer, the lightweight and informal conversational interactions that sticky notes provide. Projects such as HayStack [9] use sticky notes as a metaphor to provide annotation for the semantic web. Projects such as TeamWorkStation [10] and XAX [11] were one of the first attempts to integrate traditional paper media with electronic media. PaperLink [12] system allows marks made on paper to have associations and meaning in an accompanying electronic world. DigitalDesk [13][14] uses augmented reality to provide an integrated experience of both paper and digital documents. Brief overviews of some of these projects are provided below. Designer’s Outpost [6] and Rasa [7] are designed for the specific needs of web developers and military officers, respectively, and as such are not generic systems. In addition, both Designer’s Outpost and Rasa require heavy hardware infrastructure and are targeted towards usage of Post-it notes in collaborative environment. They do not address the use of sticky notes by individuals for the information management task. TeleNotes [8] and HayStack [9] only use the metaphor of features of physical sticky notes in our digital information world. TeleNotes attempts to provide the lightweight and informal conversational interactions that sticky notes provide. HayStack use sticky notes as a metaphor to provide annotation for the semantic web. Post-it® Digital [4] and Post-that Notes [5] attempt to bring the familiarity and features of sticky notes to digital world. Rather than linking the physical and digital, they are confined and limited to computers and mobile phones respectively and thus loses the affordance and intuitive interaction of physical paper sticky notes. DigitalDesk [13][14], TeamWorkStation [10] and XAX [11] are great inspiration for Quickies project in devising integrated experience of both paper and electronic media. PaperLink brings the concept of hyper-linking to physical world by allowing marks made on a paper to have associations and meaning in an accompanying electronic world. Although PaperLink links electronic world and paper, it is limited to hyper-linking. 
There remains a need for an integrated system that combines the qualities and affordances of physical sticky notes – portability, adhesiveness, low cost – with the positive attributes of digital notes – effective information management and organization, automatic reminders, and compatibility with the rest of the digital world. Provided their usage can be made as intuitive and efficient as that of regular stickies, merging physical and digital stickies can be a genuine convenience in our fast-paced environment.
3 QUICKIES – Intelligent Sticky Notes

Quickies are regular paper sticky notes that have been augmented in a few ways. First, each sticky note contains a unique RFID tag, so that stickies can be located in different parts of a home or office. Second, we use a small digitizer, so that a digital copy is created while a note is being scribbled. Character and shape recognition is used to translate the note's content into machine-readable data. Finally, special-purpose knowledge, NLP (Natural Language Processing) and commonsense-based AI (Artificial Intelligence) techniques are used to interpret what the content of the note means and what relevant actions should be taken. As a result, Quickies updates your electronic calendar with the meeting reminder you wrote down on a paper sticky note and reminds you 15 minutes before your meeting via an SMS. It syncs the list of items to buy with your computer-based task list. You can locate documents or books tagged with Quickies in your home or office. To look up some information quickly from your computer, you can use Quickies instead of the keyboard and mouse. 'Quickies' is an attempt to link physical and digital informational media and combine the best of both worlds in one seamless experience.
Fig. 1. (A) Sticky notes at user’s desk (B) Example of a reminder sent to a user’s mobile phone
Quickies are sticky notes that offer portability, connectivity to the digital information world, smart information organization, findability (they are searchable as well as locatable), and the ability to send reminders and messages. These are just examples; Quickies can do a lot more. The following usage scenarios present some common problems or tasks for which Quickies offers a better solution than today's paper or electronic alternatives.

• Imagine you scribbled a sticky note about an upcoming meeting with a colleague and placed the note on your desktop. Unfortunately, you overlooked the note, completely forgot about the meeting, and went for lunch. Luckily, your intelligent sticky note added the meeting to your online calendar and reminds you about it via a text message on your phone 15 minutes before the meeting.
• You write down a person's name and phone number on a sticky note while talking on the phone. That new contact information is automatically entered into your computer's address book.
• You create a grocery list or to-do list on a paper sticky note. This list is automatically synchronized with the task lists on your mobile phone and computer. Now your mobile phone has a list of the things you noted down to buy, which comes in handy when you are at the grocery store.
• You use a sticky note to bookmark a section about the 'Platypus Paradox' in Peter Morville's 'Ambient Findability' book. Several weeks later, a discussion about the 'Platypus Paradox' arises and you remember bookmarking Morville's explanation. You can now use Quickies' graphical interface to search for the keywords 'Platypus Paradox'. As the system keeps track of all your notes in digital form, it shows all the relevant notes you have created in the past. The system also helps you locate that note (and hence the book) in the house.
• It is Saturday and you are at home. You forgot some important information that you noted down on a sticky note while at the office on Friday. You ask the Quickies graphical interface to show the notes located at your office. Your computer screen shows you all the notes located at your office. There are many, so you filter them by selecting 'notes created on Friday'. You find the particular sticky note and the information you were looking for.
• You are in a hurry to get to a doctor's appointment. You ask the Quickies system for the address of 'Dr. Smith' by writing 'Address of Dr. Smith' followed by a '?' mark on a sticky note (or a piece of paper). In just a few seconds, a small printer prints out the address along with driving directions to Dr. Smith's clinic.
• Your mom prefers using paper to mobile phones and computers. She leaves a message for you on a sticky note when leaving for the market. The note recognizes that this is a message to you, looks up your mobile number in the contact list, and sends you her message as an SMS.

The Quickies system allows sticky notes to be used as an interface to the digital world of information. As shown in Figure 1 (A), the user writes down a reminder for a meeting with a friend. Fifteen minutes before the meeting, at 2:15 PM, she receives a message on her mobile phone reminding her about the meeting (see Figure 1 (B)). The Quickies system reminds the user at the appropriate time or remembers things on her behalf. The system is also configurable according to the user's personal preferences, so that the user can decide what she wants the system to do in particular situations. For example, if she has a habit of putting a star ('*') in front of important notes, she can configure the system to interpret the star accordingly.

With the Quickies system, sticky notes (or paper) can be used not only as an input medium but also as an output medium. As shown in Figure 2 (A), the user writes down the query "? The address of Dr. Smith" on a sticky note. As shown in Figure 2 (B), a small handheld printer prints out Dr. Smith's address from the user's address book on the computer. It also prints out driving directions to Dr. Smith's clinic from the current location. One of the most interesting features Quickies provide is 'findability'. The user can use physical sticky notes to tag her assets or documents and later locate that tag, and hence the tagged object, at home or in the office using the Quickies graphical interface.
Fig. 2. (A) User writes a query on a sticky note (B) A handheld printer prints out the requested address and driving directions
Fig. 3. (A) User writes on a sticky note (B) User tags a book with the sticky note (C) User searches notes related to the word ‘Pattie’ (D) A sticky note with the RFID tag on back
At the back of each Quickie is a unique RFID tag, which makes it possible to locate Quickies in the house or office. As shown in Figure 3 (A) and (B), the user uses a Quickie to tag the book given to her by a friend with that friend's first name. Some weeks later, when the user wants to return the book, she uses the Quickies graphical user interface (Figure 3 (C)) to search through all the notes she has created. By searching for her friend's name she sees all the notes that mention it. She can see the digital version of the note saying "PATTIE'S BOOK", which she used to tag the book. As shown in Figure 3 (D), the note has an RFID tag on the back that gets picked up by one of the many RFID readers positioned in the house, so that the book can be located. The computer program also provides other information, such as when the user created the note and all the locations where that RFID tag (and hence the book) has been detected in the past.
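The location history described above can be sketched as a small lookup structure fed by RFID reader events. The following Python sketch is purely illustrative; class and method names are hypothetical, and the actual Quickies software is not described at this level of detail in the paper.

```python
from datetime import datetime

class TagTracker:
    """Keep a sighting history for each RFID tag reported by the room readers."""

    def __init__(self):
        self.sightings = {}   # tag_id -> list of (timestamp, reader_location)

    def report(self, tag_id: str, reader_location: str):
        """Called whenever a reader detects a tag in its vicinity."""
        self.sightings.setdefault(tag_id, []).append((datetime.now(), reader_location))

    def locate(self, tag_id: str):
        """Return the most recent known location of a tag, or None."""
        history = self.sightings.get(tag_id)
        return history[-1][1] if history else None

tracker = TagTracker()
tracker.report("tag-017", "living-room shelf")
print(tracker.locate("tag-017"))
```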
4 How Does 'QUICKIES' Work?

The Quickies system consists of digitizer hardware (a pad-and-pen device), a software program, and physical sticky notes. Optionally, the system can also include a
handheld printer, RFID readers and RFID tags. The user uses the digitizer pad-and-pen hardware to write on the paper sticky notes. All the handwritten notes created by the user are captured, and digital representations of the notes are saved in the note database. The system also interprets the content of the notes and categorizes them into one of several possible types. At present, the Quickies system can categorize notes into the following types: to-do list, meeting reminder, message, list of items, contact, payment, query, and tag. The Quickies system provides a highly visual interface for browsing these digital representations of the notes. The software interface lets the user sort, filter, or search for one or more specific notes by keyword, creation date, physical location of the note, and type of note. The system also performs a set of operations according to the type of the note.
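A minimal sketch of this note database and its type taxonomy is shown below, using hypothetical Python names. The real system stores notes in an XML database and interprets them with NLP and ConceptNet; the keyword heuristic here only stands in for that interpretation step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

NOTE_TYPES = ["todo_list", "meeting_reminder", "message", "item_list",
              "contact", "payment", "query", "tag"]

@dataclass
class Note:
    note_id: str
    text: str                    # output of handwriting recognition
    created: datetime
    note_type: str = "tag"
    location: str = "unknown"    # last RFID reader location that saw the tag
    image_path: str = ""         # rendered strokes saved as an image

def classify(note: Note) -> str:
    """Toy stand-in for the NLP/ConceptNet interpretation step."""
    text = note.text.lower()
    if "?" in text:
        return "query"
    if "meeting" in text or "meet " in text:
        return "meeting_reminder"
    if text.startswith(("buy", "to-do", "todo")):
        return "todo_list"
    return "tag"

def search(notes: List[Note], keyword: str,
           note_type: Optional[str] = None) -> List[Note]:
    """Keyword/type filter like the sort-filter-search offered by the GUI."""
    hits = [n for n in notes if keyword.lower() in n.text.lower()]
    if note_type is not None:
        hits = [n for n in hits if n.note_type == note_type]
    return hits
```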
Fig. 4. 'Quickies' system

Figure 4 presents a detailed explanation of how Quickies work. Physical sticky notes are captured and stored in the computer using commercially available digital-pen hardware, which captures the movement of the pen on the surface of a sticky note. The digital-pen hardware used in the prototype relies on an ultrasound sensing mechanism: two stationary sensors receive ultrasound waves emitted by a transmitter placed at the tip of the pen, and the device computes the location of the pen tip on the paper from the differences in the times at which the two stationary receivers pick up the signal. A software program stores the
handwritten notes as images/strokes and converts the stored handwritten notes into computer-understandable text using handwriting recognition algorithms. As shown in Figure 5, the computer program also provides a highly visual user interface for browsing or searching all of the user's notes based on keywords. The user can also use the 'Advanced Search' feature to search for notes created at a particular time or located at a particular place. For example, the user can search for all the Quickies on her desk at work that contain the word 'Urgent'. The recognized text is processed using a commonsense knowledge engine based on NLP and ConceptNet [15]. This process provides the note database with contextually rich information. Later, the computer program uses its understanding of the user's intentions, the content, and the intended purpose of the notes to provide the user with reminders, alerts, messages and just-in-time information.
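The pen-tip localization mentioned above can be illustrated geometrically. The paper does not give the exact formula used by the digital-pen hardware, so the sketch below assumes the common arrangement in which each receiver's time of flight yields a distance and the pen position is the intersection of the two resulting circles; all names and the receiver baseline are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second at room temperature (approximate)

def pen_position(t1: float, t2: float, baseline: float = 0.20):
    """
    Estimate the pen-tip position from ultrasound times of flight.

    t1, t2   -- times of flight (s) to the two stationary receivers, assumed
                to sit at (0, 0) and (baseline, 0) along the top of the pad
    baseline -- distance between the receivers in metres (illustrative value)
    """
    r1 = SPEED_OF_SOUND * t1              # distance to receiver 1
    r2 = SPEED_OF_SOUND * t2              # distance to receiver 2
    x = (r1 ** 2 - r2 ** 2 + baseline ** 2) / (2 * baseline)
    y_squared = r1 ** 2 - x ** 2
    if y_squared < 0:
        raise ValueError("inconsistent time-of-flight measurements")
    return x, math.sqrt(y_squared)        # take the root on the writing surface

print(pen_position(0.0004, 0.00045))      # example reading, roughly 14 cm from receiver 1
```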
Fig. 5. Graphical user interface of Quickies
5 Implementation

We implemented a fully working prototype of the 'Quickies' system [16]. Handwritten note capturing is performed by the Pegasus PC NoteTaker digital pen hardware. The ultrasonic sensing mechanism provides the system with the X and Y coordinates of the pen tip (X(t) and Y(t)), and a spring mechanism at the tip of the pen picks up pen-up/pen-down switching. Time samples of the pen-tip coordinates are captured as strokes. These strokes (also known as digital ink) are passed to the handwriting recognition engine. On-line handwriting recognition algorithms convert the pen strokes of text into digital text. The engine also analyses the layout of the written text and any primitive shapes on the sticky note. The output of the handwriting recognition engine, with the added layout and shape information, is passed to the interpretation engine, which uses ConceptNet [15], Natural Language Processing (NLP) and other computational methods to support categorizing and understanding the intended purpose of the notes. This engine categorizes and tags the note with its type. The system currently supports the following categories: to-do list, meeting reminder, message, list of items (not a list of tasks), contact, payment reminder, query to the system, and tag. Each note is saved in an XML database. Along with the content and type of the note, the system also captures extra information such as the note ID, creation date and time, and the author of the note. The actual graphical representation of the note is also saved as an image file and referenced in the XML database.
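Since the schema of the XML database is not specified in the paper, the element and attribute names in the following serialization sketch are hypothetical; it only illustrates the kind of record described above (content, type, metadata, and a reference to the stroke image).

```python
import xml.etree.ElementTree as ET

def note_to_xml(note_id, note_type, created, author, text, image_file):
    """Serialize one recognized note into a hypothetical XML record."""
    note = ET.Element("note", id=note_id, type=note_type)
    ET.SubElement(note, "created").text = created        # e.g. "2008-05-16T11:05"
    ET.SubElement(note, "author").text = author
    ET.SubElement(note, "content").text = text           # recognized handwriting
    ET.SubElement(note, "image").text = image_file       # rendered strokes
    return ET.tostring(note, encoding="unicode")

print(note_to_xml("q-0042", "meeting_reminder", "2008-05-16T11:05",
                  "user", "Meeting with Pattie 2:30 PM", "q-0042.png"))
```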
According to the type of the note, the system also performs relevant extra actions. For instance, the system updates the user's digital calendar with a 'meeting reminder' by adding an entry for the event at the specified date and time. It can also remind the user about the meeting via an SMS or an e-mail. For notes of type 'message', the system looks up the contact information of the person the message is written for in the address book and sends that person an SMS or e-mail with the message. To-do lists get synced with the user's digital task lists, and new contacts are added to the user's address book, even though they are written on sticky notes. The system is also capable of processing simple queries against the user's address book, e-mail client or digital calendar. Notes of type 'query' are answered with the requested information on the computer screen or on a printout; a portable pocket printer is used as the output medium for printing answers to the user's queries in the prototype. The most important feature of the Quickies system is that the user can customize what he or she wants the system to do in different situations or for different types of notes.

In order to make Quickies trackable at home or in the office, each sticky note carries a unique RFID tag on the back. Multiple RFID readers keep track of which RFID tags are in their vicinity. UHF (902–928 MHz) RFID readers and EPC Gen 2 tags are used in the prototype system. This mechanism gives sticky notes unique IDs and links those IDs to their content. The user can use the Quickies graphical interface to browse, search or filter the particular notes he is interested in. The user can also find the location of a note, and hence of the object he has tagged with the note, at home or in the office; the RFID tracking mechanism enables this feature.
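The per-type actions described above amount to a dispatch on the note's category. A minimal sketch follows, reusing the hypothetical note record from the earlier example; the handler functions are stubs, as the paper does not detail the calendar, SMS or printer integration.

```python
def add_calendar_entry(note): ...   # add event, schedule SMS/e-mail alert
def send_message(note): ...         # look up recipient in address book, send SMS/e-mail
def sync_task_list(note): ...       # merge items into the digital task list
def update_address_book(note): ...  # store the new contact
def answer_query(note): ...         # reply on screen or via the pocket printer

def handle_note(note):
    """Route a categorized note to the matching back-end action (sketch)."""
    actions = {
        "meeting_reminder": add_calendar_entry,
        "message":          send_message,
        "todo_list":        sync_task_list,
        "contact":          update_address_book,
        "query":            answer_query,
    }
    action = actions.get(note.note_type)
    if action is not None:
        action(note)
```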
6 Conclusion

This paper presented 'Quickies' – a system that bridges the gap between the physical and the digital world, linking handwritten sticky notes to the mobile phone, digital calendars, task lists, e-mail and messaging clients. We explained what 'Quickies' are and what they can do, and described the system design and implementation details of the 'Quickies' system. By augmenting the familiar and ubiquitous physical sticky note, 'Quickies' leverages existing patterns of behavior, merging paper-based sticky-note usage with the user's informational experience. By synchronizing with and connecting to popular digital devices for information management, such as personal computers and mobile phones, 'Quickies' – intelligent paper sticky notes – can prove to be an alternative and intuitive interface to digital information for people around the world, many of whom struggle with a strange digital world dominated by mouse and keyboard.
References

1. Liu, Z., Stork, D.: Is paperless really more? Communications of the ACM 43(11), 94–97 (2000)
2. Perry, M., O'Hara, K.: Display-Based Activity in the Workplace. In: Proceedings of INTERACT 2003, pp. 591–598 (2003)
3. The 3M Story, 3M Company (2002), http://www.3m.com
4. Post-it® Digital, http://www.3m.com/us/office/postit/digital
5. Post-that Notes, http://hci.stanford.edu/cs294h/projects/postthat.doc
6. Klemmer, S.R., Newman, M.W., Farrell, R., Bilezikjian, M., Landay, J.A.: The designers' outpost: a tangible interface for collaborative web site design. In: Proc. UIST 2001. ACM Press, New York (2001)
7. McGee, D.R., Cohen, P.R., Wu, L.: Something from nothing: augmenting a paper-based work practice via multimodal interaction. In: Proc. DARE 2000 on Designing augmented reality environments, pp. 71–80 (2000)
8. Whittaker, S., Swanson, J., Kucan, J., Sidner, C.: TeleNotes: managing lightweight interactions in the desktop. In: TOCHI 1997, vol. 4(2), pp. 137–168 (1997)
9. Haystack Project, http://groups.csail.mit.edu/haystack/
10. Ishii, H.: TeamWorkStation: towards a seamless shared workspace. In: Proceedings of the ACM conference on Computer-supported cooperative work, pp. 13–26 (1990)
11. Johnson, W., Jellinek, H., Klotz Jr., L., Rao, R., Card, S.: Bridging the paper and electronic worlds: the paper user interface. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 507–512 (1993)
12. Arai, T., Aust, D., Hudson, S.: PaperLink: a technique for hyperlinking from real paper to electronic content. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 327–334 (1997)
13. Newman, W., Wellner, P.: A desk supporting computer-based interaction with paper documents. In: Proceedings of the SIGCHI conference on Human factors in computing systems (1992)
14. Wellner, P.: Interacting with paper on the DigitalDesk. Communications of the ACM 36(7), 87–96 (1993)
15. Liu, H., Singh, P.: ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal 22(4), 211–226 (2004)
16. Mistry, P., Maes, P.: Intelligent Sticky Notes that can be Searched, Located and can Send Reminders and Messages. In: Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI 2008), Canary Islands, Spain (2008)
Sonification of Spatial Information: Audio-Tactile Exploration Strategies by Normal and Blind Subjects

Marta Olivetti Belardinelli 1,2, Stefano Federici 2,3, Franco Delogu 1, and Massimiliano Palmiero 2

1 Department of Psychology, 'Sapienza' University of Rome, Rome, IT
2 ECONA, Interuniversity Centre for Research on Cognitive Processing in Natural and Artificial Systems, IT
3 Department of Human and Educational Sciences, University of Perugia, Perugia, IT
[email protected]
Abstract. On the basis of a meta-analysis of the existing literature on sonification technologies, new experimental results on audio-tactile exploration strategies for geo-referenced sonified data by sighted and blind subjects are presented, discussing technology suitability, subjects' performance, and accessibility and usability in the user/technology interaction.

Keywords: sonification, blindness, mental mapping, audio-tactile exploration strategies.
1 Three Orders of Problems in the Cognitive Research on Sonification

In recent years researchers have been increasingly attracted by the possibility of conveying spatial information through non-visual sensory channels. In particular, sonification technology, which uses non-speech audio to represent data, allows "the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication and interpretation" [2]. Three orders of problems are tied to substituting the auditory sensory channel for the visual one: 1) from an objective point of view, the capacity of the acoustic medium to convey information comparable to visual information has to be proven; 2) from a subjective point of view, one may wonder whether the potentially available information is also effectively perceived; 3) from the point of view of the user/environment interaction, the problems concern the effective possibility for the user to explore and navigate a non-visual representation of space. Taking the objective point of view into account, several parameters of sound (timbre, frequency and intensity) may be combined into a meaningful percept in order to make sonification applicable to a large variety of fields, as well as to segregate or group multiple simultaneous sources while minimizing the working load. Spatial information can be provided by means of acoustic messages using speech, music and environmental sounds. In all cases a learning or training phase is mandatory for the sonification to be effective.
The objective point of view on sonification therefore redirects the researchers' interest toward a subjective perspective related to the allocation of attentional resources in auditory and visual spatial perception. In fact, it is questionable whether the representation of space is directly tied to visual experience or whether it is an a-modal one, either collecting information from different senses or forming equivalent representations from different sensory channel inputs. If spatial representations generated by sensory modalities other than vision are functionally equivalent to the visual representation of space [3], blind people should be able to build functionally equivalent spatial maps using tactile, auditory and kinaesthetic information, thereby challenging the claim that visual experience is absolutely necessary for spatial understanding [34]. On the other side, the subjective perspective cannot be separated from the interactive one, according to which the central focus of investigation is shifted toward the possibility of space exploration and navigation and toward the distinction between egocentric and allocentric space, founded on two different frames of reference: the egocentric and the allocentric one.
2 Coding and Processing Strategies of Non-visual Spatial Information

When exploring near-space, people with little or no visual experience generally prefer to code spatial relations by reference to their own body co-ordinates [26]. Following this view, blind and sighted individuals should perform similarly in tasks requiring an egocentric reference [18], although a recent investigation of the systematic distortions in blind haptic exploration showed a shift from an egocentric to an allocentric representation when a delayed or a verbal response is required [32]. Auditory coding aimed at supplying a mental representation of space [15], [28] has not been extensively investigated, due to the peculiarity of auditory processing at the sensory, neural and cognitive levels (once more the subjective perspective) and, perhaps even more, to the technical contingencies in developing suitable software for new communication modalities or augmented communication (objective perspective). From the subjective perspective, experimental results seem to converge on the idea that a combination of sound and touch works better than a single modality for the non-visual display of spatial information [39]. Already in 1984, Wickens [35] verified that sound can enhance a visual or haptic display. More recently, Ramloll, Yu, Riedel, and Brewster [33] found that a combination of touch and sound provides the optimal technology for reading line graphs. A combination of haptic and auditory information is used in iSonic, a new sonification tool developed at the University of Maryland to facilitate the exploration of geo-referenced information [36], [37], [38].
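At the core of such tools is a mapping from a data value (e.g. an unemployment rate for a map region) to an audible parameter such as pitch. The sketch below is not iSonic's actual mapping, which is described in [36]-[38]; it only illustrates the general idea of quantizing a value range onto a range of MIDI pitches, with all constants chosen for illustration.

```python
def value_to_midi_pitch(value, vmin, vmax, low_note=48, high_note=84):
    """Map a data value linearly onto a MIDI pitch range (C3..C6 by default)."""
    value = max(vmin, min(vmax, value))          # clip to the expected range
    span = high_note - low_note
    return int(round(low_note + span * (value - vmin) / (vmax - vmin)))

# Example: unemployment rates between 2% and 12% mapped onto three octaves
for rate in (2.0, 5.5, 12.0):
    print(rate, "->", value_to_midi_pitch(rate, 2.0, 12.0))
```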
3 Sonification as a Means for Communicating Spatial Information

Taking the objective perspective into specific account, the first systems were loudspeaker-based, simulating sound sources from different locations [11], [20]; they can be used only in indoor environments, with parameters such as distance and resolution fixed in advance.
Many more possibilities are offered by bat-like sonar systems that rely on distance cues to analyze the auditory scene and generate acoustic spatial maps by means of ultrasounds [16], [17] or a musical scale [14], although some difficulty in judging the height of obstacles has been reported [6], [25]. Most sonification tools implement a Head-Related Transfer Function (HRTF). These systems particularly rely on binaural cues and pinna filtering to analyze the auditory scene. Individualization of the HRTF system is also possible, especially when sound elevation is needed [4]. However, HRTF systems only allow blind users to localize objects within a limited perimeter in which they move around.
4 Sonification and the Blind: Is Technology Really Assistive?

One of the most important trends in sonification applications concerns sensory substitution and sensory integration for visually impaired people, although most research on sonification for blind people, being authored by computer scientists rather than psychologists, scarcely considers either the subjective or the interactive perspective, very often relying on unsatisfactory definitions of accessibility and usability. Moreover, most experiments are carried out only with sighted blindfolded subjects. Very few systems attempted to relate sonification to vision, as in Meijer's software [23]: here visual information is analyzed by a program that sweeps the image with a vertical scan line. However, it has been shown that continuous scanning of the environment from left to right may confuse the user and requires considerable concentration, even if, after intensive training, neural plasticity from hearing to vision may occur with the activation of the lateral-occipital tactile-visual area in sighted subjects [2], as well as in a congenitally and a late blind subject [24]. At a recent conference, Ag Asri Ag [1] presented an HCI sonification application model for usability inspection, based on the Toolkit Technology for Interactive Sonification by Pauletto and Hunt [29], without giving information about subjects and results. Similar shortages of information can also be found in Candey et al. [5]. In some studies, auditory information, either verbal or musical, was added to other sensory information to enhance performance [21], [30], [31], [10], [12]; nevertheless, none of these studies assessed the accessibility and usability of the tested devices with blind users. A slightly different situation characterizes research on the exploration and navigation of sonified spatial representations, which demonstrated blind users' capacity to identify mathematical concepts [22] and simple 2-D graphical shapes [1], and to locate and acquire table data [33]. Evreinova et al. showed that directional-predictive sounds are reliable and effective in guiding blind users' exploratory behaviour [9], and Heuten et al. [13] assessed with blind users the accessibility of a new sonification interface for exploring city maps. In the above-cited iSonic, the software's accessibility was tested first on blindfolded sighted subjects [8], then by comparing blindfolded sighted, congenitally blind and acquired blind subjects [27], [7], and finally through intensive use of the software by subjects who had been totally blind for a long time [38].
5 Audio-Tactile Exploration Strategies: Comparing Normal and Blind Subjects

In iSonic, different musical instruments indicate different map features and exploration contexts, while different pitches indicate different levels of a given geopolitical variable (e.g. unemployment or crime rate statistics). Map exploration may be performed using two different navigation tools: a computer keyboard or a touch-pad. Recently, Delogu et al. [8] demonstrated that congenitally blind, acquired blind and blindfolded people did not differ significantly in their good recognition performance with both interfaces. These results confirm the suitability of the acoustic medium for conveying spatial information (objective point of view). In the following we try to clarify whether sighted blindfolded, congenitally blind, and acquired blind subjects: 1) perform differently in the recognition of sonified maps (subjective point of view) and 2) deploy different strategies and modalities in the audio-haptic exploration of sonified maps (interactive point of view).

In the first experiment, 20 blind participants (10 early and 10 late) and 16 sighted blindfolded subjects explored three sonified auditory maps representing patterns of unemployment rates in the U.S.A. Four plastic tactile maps were used for each task in the recognition phase: one target (corresponding to the sonified map) and three distractors. After the auditory exploration of each map (either by means of the keyboard or the touch-pad), subjects performed a tactile recognition of the navigated map among the three distractors. The analyses showed that in all tasks the target tactile map was well recognized and that congenitally blind, acquired blind and sighted blindfolded subjects did not differ in detecting targets. No differences among the groups were found with respect to exploration exhaustiveness, the preferred direction toward the right, and direction changes, which generally coincided with variations in sound. Conversely, as regards the displacement velocity index, the congenitally blind subjects performed roughly twice as many steps as both the late blind and the sighted subjects. In the final questionnaire, the blind subjects answered very differently from the sighted ones, judging the tasks simpler and the stereo panning more important for orienting exploration.

To further investigate these differences, the paradigm described above was repeated with 20 new blind subjects. After each task they were asked to reproduce the sonified map by inserting plastic nails in a punched board to delimit the map's external borders, and three further kinds of nails, of different sizes, to indicate the unemployment rates. This way, we obtained a quantitative and tangible external representation of the subjects' mental maps. The analysis shows that the reproductions of the sonified maps explored by touch-pad users are much more accurate in terms of boundaries and inner details than those made by keyboard users. Moreover, the reproductions after keyboard navigation show a systematic error in the bottom left corner, probably due to the left/right direction of the sweeping.

To conclude, our results indicate: 1) Sonification integrated with tactile exploration may be a suitable tool for transmitting spatial geographical information (objective perspective).
2) The equivalent recognition performance of sighted, acquired blind and congenitally blind subjects is in accordance with the hypothesis of a possible equivalence of different sensory channels in transmitting spatial information, consistent with the hypothesis of an a-modal representation of space (subjective
perspective). 3) As for the interactive perspective, the higher speed of the congenitally blind subjects, as well as the better information reproduction after touch-pad navigation, provides interesting insights into multimodal integration in space navigation.
References

1. Alty, J.L., Rigas, D.: Communicating graphical information to blind users using music: the role of context. In: ACM CHI, pp. 574–581. ACM Press/Addison-Wesley Publishing Co, New York (1998)
2. Amedi, A., Stern, W.M., Camprodon, J.A., Bermpohl, F., Merabet, L., Rotman, S., Hemond, C., Meijer, P., Pascual-Leone, A.: Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nat. Neurosci. 10(6), 687–689 (2007)
3. Avraamides, M., Loomis, J., Klatzky, R.L., Golledge, R.G.: Functional equivalence of spatial representations derived from vision and language: Evidence from allocentric judgments. J. Exp. Psychol. Learn. 30, 801–814 (2004)
4. Berman, L., Danicic, S., Gallagher, K., Gold, N.: The Sound of Software: Using Sonification to Aid Comprehension. In: 14th IEEE International Conference on Program Comprehension, ICPC, IEEEXPLORE (2006)
5. Candey, R.M., Schertenleib, A.M., Diaz Merced, W.L.: Sonify sonification tool for space physics. In: ICAD, pp. 289–290 (2006)
6. Davies, T.C., Patla, A.E.: Obstacle Avoidance Strategies Using a Sonic Pathfinder. In: Canadian Society of Biomechanics, Halifax, Nova Scotia (2004)
7. Delogu, F., Federici, S., Olivetti Belardinelli, M.: La rappresentazione di mappe sonificate in soggetti ciechi: modalità e strategie di esplorazione audio-tattile. In: 14th AIP (2008)
8. Delogu, F., Olivetti Belardinelli, M., Palmiero, M., Pasqualotto, E., Zhao, H., Plaisant, C., Federici, S.: Interactive sonification for blind people exploration of geo-referenced data: comparison between a keyboard-exploration and a haptic-exploration interfaces. In: Cogn. Process. 7 (suppl. 1), pp. 178–179 (2006)
9. Evreinova, T., Vesterinen, L., Evreinov, G., Raisamo, R.: Exploration of directional-predictive sounds for nonvisual interaction with graphs. Knowl. Inf. Syst. 13(2), 221–241 (2007)
10. Garcia-Ruiz, M.A., Arthur, E., Aquino-Santos, R., Vargas Martin, M., Mendoza-Quezada, R.: Using Sonification to Teach Network Intrusion Detection: A Preliminary Usability Study. In: World Conference on Educational Multimedia, Hypermedia and Telecommunications, pp. 849–857. AACE, Cheasepeake (2007)
11. Golledge, R.G., Loomis, J.M., Klatzky, R.L., Flury, A., Yang, X.L.: Designing a personal guidance system to aid navigation without sight: progress on the GIS component. Int. J. Geo. Inf. Sci. 5(4), 373–395 (1991)
12. Guizatdinova, I., Guo, Z.: Sonification of Facial Expressions. University of Tampere, New Interaction Techniques Finland (2003)
13. Heuten, W., Wichmann, D., Boll, S.: Interactive 3D Sonification for the Exploration of City Maps. In: NordiCHI, pp. 155–164 (2006)
14. Heyes, T.: Sonic Pathfinder. Electr. & Wireless World 90, 26–29 (1984)
15. Kawai, Y., Kobayashi, M., Minagawa, H., Miyakawa, M., Tomita, F.: A Support System for Visually Impaired Persons Using Three-Dimensional Virtual Sound. In: ICCHP 2000, pp. 327–334 (2000)
16. Kay, L.: Auditory perception of objects by blind persons, using a bioacoustic high resolution sonar. J. Acoustical Soc. Am. 107, 3266–3275 (2000)
17. Kay, L.: Bioacoustic spatial perception by humans: a controlled laboratory measurement of spatial resolution without distal cues. J. Acoustical Soc. Am. 109, 803–808 (2001)
18. Klatzky, R.L., Golledge, R.G., Loomis, J.M., Cicinelli, J.G., Pellegrino, J.W.: Performance of blind and sighted persons on spatial tasks. J. Visual. Impair. Blin. 89, 70–82 (1995)
19. Kramer, G., Walker, B.N., Bonebright, T., Cook, P., Flowers, J., Miner, N., et al.: The Sonification Report: Status of the Field and Research Agenda. Report prepared for the National Science Foundation by members of the International Community for Auditory Display. International Community for Auditory Display (ICAD), Santa Fe, NM (2006)
20. Lakatos, S.: Recognition of complex auditory-spatial patterns. Perception 22(3), 363–374 (1993)
21. Maffiolo, V., Chateau, N., Mersiol, M.: The impact of the sonification of a vocal server on its usability and its user-friendliness. In: ICAD, pp. 130–134. Advanced Telecommunications Research Institute (ATR), Kyoto (2002)
22. Mansur, D.L., Blattner, M., Joy, K.: Sound-Graphs: A numerical data analysis method for the blind. J. of Med. Sys. 9, 163–174 (1985)
23. Meijer, P.B.: An experimental system for auditory image representations. IEEE Trans. Biomed. Eng. 39, 112–121 (1992)
24. Merabet, L., Pogge, D., Stern, W., Bhatt, E., Hemond, C., Maguire, S., Meijer, P., Pascual-Leone, A.: Activation of visual cortex using crossmodal retinotopic mapping. In: 14th HBM, the Annual Meeting of the Organization for Human Brain Mapping, Melbourne, Australia (2008)
25. Milios, E., Kapralos, B., Kopinska, A., Stergiopoulos, S.: Sonification of range information for 3-D space perception. IEEE Trans. Neural Sys. Rehab. and Eng. 11(4), 416–421 (2003)
26. Millar, S.: Understanding and representing space: theory and evidence from studies with blind and sighted children. Oxford University Press, Oxford (1994)
27. Olivetti Belardinelli, M., Delogu, F., Palmiero, M., Federici, S., Pasqualotto, E., Zaho, H., Plaisant, C.: Interactive sonification of geographical maps: a behavioural study with blind subjects. In: 15th ESCOP (2007)
28. Parente, P., Bishop, G.: BATS: the blind audio tactile mapping system. In: ACM Annual Southeast Regional Conference (2003)
29. Pauletto, S., Hunt, A.: A Toolkit for Interactive Sonification. In: ICAD (2004)
30. Pfeiffer, D.G., Maffiolo, V., Chateau, N., Mersiol, M.: Listen and Learn: an Investigation of Sonification as an Instructional Variable to Improve Understanding of Complex Environments. Comput. Hum. Behav. 28(2), 475–485 (2008)
31. Poguntke, M., Ellis, K.: Auditory attention control for human-computer interaction. In: Human System Interactions, IEEEXPLORE, pp. 231–236 (2008)
32. Postma, A., Zuidhoek, S., Noordzij, M.L., Kappers, A.M.L.: Keep an eye on your hands: on the role of visual mechanisms in processing of haptic space. Cogn. Process. 9(1), 6–68 (2008)
33. Ramloll, R., Yu, W., Riedel, B., Brewster, S.A.: Using non-speech sounds to improve access to 2D tabular numerical information for visually impaired users. In: Annual conference of British Computer Society (BCS) IHM-HCI, pp. 515–530. Springer, Heidelberg (2001)
34. Ungar, S.: Cognitive mapping without visual experience. In: Kitchin, R., Freundschuh, S. (eds.) Cognitive Mapping: Past, Present and Future, pp. 221–248. Routledge, London (2000)
35. Wickens, C.D.: Processing resources in attention. In: Parasuraman, R., Davies, R. (eds.) Varieties of attention, pp. 63–101. Academic Press, New York (1984)
36. Zhao, H., Plaisant, C., Shneiderman, B., Duraiswami, R.: Sonification of geo-referenced data for auditory information seeking: design principle and pilot study. In: ICAD (2004)
37. Zhao, H., Smith, B.K., Norman, K., Plaisant, C., Shneiderman, B.: Interactive Sonification of Choropleth Maps: Design and Evaluation. IEEE Multimedia 12(2), 26–35 (2005)
38. Zhao, H., Shneiderman, B., Plaisant, C.: Listening to Choropleth Maps: Interactive Sonification of Geo-referenced Data for Users with Vision Impairment. In: Lazar, J. (ed.) Universal Usability, pp. 141–174. John Wiley & Sons Ltd., Hoboken (2008)
39. Zuidhoek, S.: Representation of space based on haptic input. Febodruck, Utrecht (2005)
What You Feel Is What You Get: Mapping GUIs on Planar Tactile Displays

Maria Schiewe 1, Wiebke Köhlmann 1, Oliver Nadig 2, and Gerhard Weber 3

1 Universität Potsdam, Institut für Informatik, August-Bebel-Straße 89, 14482 Potsdam, Germany
{schiewe,koehlmann}@cs.uni-potsdam.de
2 Deutsche Blindenstudienanstalt e.V., Am Schlag 8/10, 35037 Marburg, Germany
[email protected]
3 Technische Universität Dresden, Institut für Angewandte Informatik, Nöthnitzer Straße 46, 01187 Dresden, Germany
[email protected]
Abstract. Exploiting the advantages of planar tactile displays, we aim for efficient and effective information retrieval for blind users. To facilitate orientation, we define four regions segmenting the available space: header, body, structure, and detail region. Furthermore, we suggest four views—layout, outline, symbol and operating view—that define how detailed and in which manner information from window-based applications is displayed on tactile displays. Keywords: Tactile user interface, tactile interaction, visually impaired, tactile devices, pin-matrix devices.
1 Introduction

The WIMP (Window, Icon, Menu, Pointing device) paradigm has been the predominant interaction technique for more than two decades. Sighted users naturally access and manipulate two-dimensional information when working with computers. Video screens render the information as needed, e.g. displaying complex spreadsheets and bar charts. In contrast, blind users are normally limited to a single-line Braille display and synthesized speech. Tables and charts must thus be linearized, making spatial dependencies difficult to perceive. Braille displays are usually 40 or 80 characters wide and allow only for a small amount of simultaneous information. As a consequence, no two-dimensional enrichment is possible. Two-dimensional tactile output is typically available through static heat-raised reliefs. However, information presented by such reliefs is not suitable for dynamic changes. Therefore, several refreshable tactile displays of various sizes have been developed in the last 20 years. Metec's DMD 120060, the Dot View Series from KGS,
the NIST display, and Handytech's GWP are the most prominent examples of pin-matrix devices, ranging in size from 24×16 up to 120×60 pins.1 Pin-matrix devices allow for a multimodal representation of information; examples of general guidelines include [2] and the forthcoming ISO 9241-9xx. Early approaches were already aiming at specific applications merging tactile graphics with Braille. A tutorial system for geometry, as described in [3], has allowed blind users to manipulate geometrical shapes such as triangles or to learn about parallel lines. The system demonstrates that shapes are recognized on a planar tactile display from their tactile gestalt while exploring a relief made of pins with one or two hands. Textures obtained from scanned documents and presented as bitmaps can be separated into textual and non-textual content and interpreted by blind users if appropriate pre-processing is performed [4]. One of the first tactile window systems for a planar tactile display was developed for a Personal Information Manager [5]. More recent work shows how web browsing benefits from the spatial layout of tactile graphics and text [6]. The challenge of the HyperBraille project2 is to make various applications accessible on a large planar tactile display with multitouch input: the newly developed BrailleDis 9000 [7] pin-matrix device, which is 120×60 equidistant pins large (see Fig. 1). Although a direct mapping of the video screen's two-dimensional information to tactile devices is trivial, it is not usable by blind users. The concepts of information visualization do not comply with those of tactile output. Hardware restrictions and the special needs of blind users impose additional requirements which must be considered.
Fig. 1. Two prototypes of the BrailleDis 9000
This paper thus presents a concept for the conversion of common visual (window-based) user interfaces into tactile interactive presentations. The paper is structured as follows. After presenting the user requirements for two-dimensional tactile representations, the concept of regions and views is introduced. Furthermore, the results of the concept's initial evaluation are discussed. The paper closes with a conclusion and an outlook.
1 For a comprehensive overview of tactile displays see [1].
2 HyperBraille project website: http://www.hyperbraille.com/
2 Requirements

The target group, blind users, is heterogeneous and, therefore, has differing needs. In contrast to Braille displays, users' experience with bi-manual work techniques plays a major role. On a planar tactile device, users expect that they can apply the work techniques they have been trained on.

Tactile representations aim to compensate for visual constraints. For visual designs, Shneiderman's information-seeking mantra [8] poses questions on how to gain an overview, how to zoom and filter, and how to get details on demand. Similar questions have to be answered for tactile designs. Furthermore, Shneiderman points out that the bandwidth is higher for the visual channel than for media addressing other senses, in our case the tactile sense. In contrast to video screens, the resolution of pin-matrix devices is rather low, e.g. 10 dpi for the BrailleDis 9000. Therefore, information has to be reduced or displayed in fragments, leading to an extremely low information density.

Compared to conventional Braille displays, larger tactile devices allow for displaying a considerably larger amount of information, for example, to obtain an overview of documents or of vertical relations in tables. Thus, less mental effort is needed to imagine these relationships and to create a mental model. But as such a large amount of information is unfamiliar to most blind users, its presentation easily leads to orientation problems. To guide users through the mapped information, tactile or acoustic cues can be utilized, similar to visual focus indicators. To avoid additional requests by users (e.g. via shortcuts or dialog boxes), certain types of frequently needed information have to be displayed permanently in fixed locations. Both suggestions could prevent disorientation and allow for familiarization with the new dimensionality. This is especially important as the display changes dynamically and users read actively, i.e. by moving their fingers over the display [9].

A dynamic planar tactile representation calls for bi-manual operation, but many blind users are not accustomed to working with parallel information and beginners often work one-handed. Therefore, a balance between the amount of simultaneously displayed information and unused space needs to be found to avoid information overload. To allow for goal-oriented working, consistent and quick access to information has to be provided. This calls for short distances between related information. For blind users this is more important than for sighted users, as tactile exploration is more sequential and punctual and, therefore, only allows minimal preprocessing of the data.

In terms of interaction, the aim of planar representations must be to augment control and to offer intuitive work techniques [10]. Working with two devices, for instance a keyboard and a tactile two-dimensional display, requires frequent hand movements between input and output device. This restricts users in their performance as they can only use one mode at a time. Hence, ergonomic operation needs to be facilitated through input possibilities, e.g. gestures [11], which can be performed on the device itself.

Blind users cannot take advantage of the What You See Is What You Get (WYSIWYG) principle. Line-based output provided by Braille displays does not enable them to check formatting and layout directly. When mapping graphical user interfaces on two-dimensional tactile devices, this functionality needs to be provided
in a similarly intuitive manner by adapting the principle to What You Feel Is What You Get (WYFIWYG), e.g. providing possibilities for layout control and for grasping adjacencies.

Considering the requirements discussed above, the main goals of the following concept are (1) to improve the efficiency of blind users' information retrieval on two-dimensional low-resolution space, (2) to improve effectiveness by avoiding mistakes that frequently occur with conventional assistive technologies due to the absence of spatial and overview information, and (3) to ensure the users' satisfaction with the adapted visual representation. The remainder of this paper focuses on a layout concept supporting bi-manual reviewing and tracking of applications with possibly multiple windows. As discussed in [12] and [13], spatial relationships may be kept in a tactile presentation but often have to be re-mapped into new sequences of textual or non-textual presentations that are more easily graspable by the users.
3 Regions

Orientation problems occur because of the high amount and parallelism of information. Therefore, complex information has to be structured according to importance and purpose while ensuring short distances between related data. This is achieved by distinct regions grouping information according to their properties. We defined four non-overlapping regions segmenting the available space: header, body, structure, and detail region (see Fig. 2), with each region holding distinct types of information and incorporating certain visual concepts. However, they are not limited only to displaying information found in these concepts. Regions are clearly distinguishable by users, e.g. separated by lines with a prominent pin pattern, and can be hidden when they are not needed.
Fig. 2. Schematic drawing of the four regions
The header region is predominantly used to inform users about the properties and status of the application, or to display the application's menu. The header region reflects the visual concepts of title, status, and menu bars and is located at the top of the display. On the BrailleDis 9000, one to two lines, i.e. 120 pins horizontally and 4 to 9 pins vertically, are allocated to the header region. The body region covers 34 to 44 pins vertically and 110 pins horizontally. It accounts for roughly two thirds of the overall space if all other regions are minimized and for half if they are maximized. However, the full space is allocated to the body
region when all other regions are hidden. As the prime region, the body region is used for displaying as much information from the application's content frame as possible and cannot be hidden.

The structure region highlights the position of elements according to certain properties, e.g. formatting or hierarchy. By default, the region is 5 pins wide, allowing it to display two Braille characters, and is always as high as the adjacent body region. Symbols encode the properties within the structure region and, thus, indicate the vertical position of the corresponding elements in the body region. Therefore, the structure region allows for efficient locating of headings, comments, or spelling mistakes in complex text documents. The position of the structure region can be configured to be located to the left or right of the body region. It may also be useful to provide a horizontal structure region or several structure regions, for example, when working with spreadsheets.

The detail region covers the same amount of space as the header region and is located below the body region. It contains all available details of the element that has been focused. Such details comprise, for example, the hotkey and status of a menu item, the coordinates of a spreadsheet cell, or the color and font size of text. The detail region may also display data that can be found in tooltips and other contextual information.

The four regions facilitate fast orientation and locating on planar tactile devices and, thus, support reviewing and tracking. In this context, bi-manual operation is far more efficient than working one-handed. On large displays, this is always preferable as reorientation requires more time than returning to a specific position on a single Braille line.
Table 1. Bi-manual operation in regions (rows: nondominant hand; columns: dominant hand)

                header   body   structure   detail
header            O        X
body              X        O        X          X
structure                  X        O
detail                     X                   O
The regions in which bi-manual operation can be more efficient than on Braille displays are shown by Xs in Table 1. Os represent combinations which are also possible on those devices. The dominance of a hand depends on ergonomic considerations rather than users’ handedness. The hands could be positioned in any region, but only the indicated combinations are sensible in our concept. For example, using one hand in the body and the other in the structure region helps to quickly locate headings in text documents. To obtain formatting information about the heading, one hand is placed on the heading while the detail region is read with the other hand. Changes within the header and detail region can easily be tracked as they occupy little space. Changes in the body region are harder to track, for example, if another heading is added to the document. However, the small structure region helps users to perceive such changes faster, as it is directly tied to the body region.
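The region dimensions quoted above can be expressed as a simple layout computation over the 120×60 pin grid. The sketch below uses illustrative pin counts and a hypothetical Rect type; it is not taken from the HyperBraille software, which may partition the display differently.

```python
from collections import namedtuple

Rect = namedtuple("Rect", "x y width height")   # positions and sizes in pins

def layout_regions(width=120, height=60, header_h=6, detail_h=6,
                   structure_w=5, structure_on_left=True):
    """Split a pin-matrix display into header, body, structure and detail regions."""
    body_h = height - header_h - detail_h
    body_w = width - structure_w
    body_x = structure_w if structure_on_left else 0
    struct_x = 0 if structure_on_left else body_w
    return {
        "header":    Rect(0, 0, width, header_h),
        "structure": Rect(struct_x, header_h, structure_w, body_h),
        "body":      Rect(body_x, header_h, body_w, body_h),
        "detail":    Rect(0, header_h + body_h, width, detail_h),
    }

print(layout_regions())   # header and detail span the full width, as in Fig. 2
```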
4 Views

Diverse tasks and the user group's heterogeneity impose the need for different representations of two-dimensional information. We call such different presentation modes views. While regions define where information is to be located, views define how information is displayed and how detailed. Views are characterized by this level of detail and their physical appearance: purely textual, semi-graphical, or graphical. We provide four predefined views that cover the majority of the users' needs: layout, outline, symbol and operating view. Some allow for a quick overview of an application; others display as much detail as possible for a focused element and its surroundings. Of course, other views could also be reasonable.

The layout view preserves the pixel information while mapping the content to low-resolution tactile screens. Thus, text does not appear as Braille but rather in a tactile version of print (see Fig. 3). In the layout view the original screen content can be explored in as much detail as possible. This allows control of the layout of a document, for instance, ensuring correct white spacing or avoiding other layout problems otherwise occurring in text documents produced by blind users [14]. In that respect, What You Feel Is What You Get. Furthermore, the layout view can be used to explore rather inaccessible applications that, for example, do not provide labels on their icons. Direct access to the graphical representation may compensate for the lack of semantic information retrievable from such applications.

The outline view provides an abstract representation of the screen's objects. It enables a quick overview of complex objects, such as applications or complete documents, at the expense of details. The outline view maintains spatial relations, but in contrast to the layout view, it does not fully preserve layout information; rather, it reflects the structure. This is achieved by exclusively displaying simple geometrical shapes, like lines and rectangles, that indicate groups of objects or of objects' elements. Thus, no textual information is displayed. For instance, the outline view illustrates the overall structure of a website by solely displaying the outlines of images, paragraphs, and embedded objects. It is also possible to indicate filtered information by highlighting its relative position within the outlines. For instance, links within paragraphs appear as lines.

The symbol view is similar to the layout view. It preserves the relative position of elements, but text appears in Braille and graphics are displayed semi-graphically, i.e. they are replaced by predefined shapes or symbols. Thus, the symbol view resembles to some extent the line mode of present screen readers that preserves spatial relations. For example, in a form containing check boxes, all elements are displayed according to their original position in the symbol view. While labels are given in Braille, the boxes are represented by brackets.

The operating view is optimized for a fast work flow and bears a resemblance to the screen readers' structured mode. Thus, the adjacencies of the focus are displayed in as much detail as needed to work efficiently. Graphical details are disregarded in favor of a logical and structured representation of the screen content, and information is
Fig. 3. Sample application (Microsoft Word) in the layout and operating view
provided exclusively textually (see Fig. 3). In the operating view, spatial dependencies are preserved only when necessary, i.e. when they result from semantics rather than from layout concerns, as in spreadsheets or diagrams. Views influence the display of information in the regions. As not every combination of regions and views is necessarily reasonable, we decided to allow the view to be switched in the body region only, while all other regions remain in the operating view by default.
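The restriction that only the body region may change its view can be captured in a few lines. The following sketch uses hypothetical names and is not part of the HyperBraille code base; it only mirrors the rule stated above.

```python
from enum import Enum

class View(Enum):
    LAYOUT = "layout"        # tactile print rendering, full pixel detail
    OUTLINE = "outline"      # geometric outlines only, no text
    SYMBOL = "symbol"        # Braille text, graphics replaced by symbols
    OPERATING = "operating"  # purely textual, screen-reader-like

class TactileScreen:
    def __init__(self):
        # All regions start in the operating view; only the body may change.
        self.views = {"header": View.OPERATING, "body": View.OPERATING,
                      "structure": View.OPERATING, "detail": View.OPERATING}

    def switch_view(self, region: str, view: View):
        if region != "body":
            raise ValueError("views can only be switched in the body region")
        self.views["body"] = view

screen = TactileScreen()
screen.switch_view("body", View.OUTLINE)   # e.g. get an overview of a document
```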
5 Evaluation

Our concept was tested with 11 legally blind adults, aged 35 to 60 years, divided into three groups. The first group consisted of four male subjects who were experts in computer usage. The second group was composed of two female subjects, and the third group of three male and two female subjects. The subjects of the second and third groups were intermediate computer users. Our evaluation method was based on an adaptation of Morgan's and Newell's methodology of narrative video [15], which uses video and live theater to address usability problems. However, instead of showing a video, audio recordings illustrating our concept's advantages and potential weaknesses were used to motivate discussions within the groups. In the following, this technique is referred to as audio confrontation. Sixteen embossed printings3 served as test material for all groups, because the BrailleDis 9000 was not available in such quantities that each subject could be provided with a display. The paper mock-ups were of the same resolution as the device, showing various combinations of regions and views of common programs. Each test lasted approximately six hours and was conducted by a moderator. Due to time restrictions, not all mock-ups could be tested with the third group. Throughout the tests, the subjects' hands were videotaped and their discussions recorded. In all settings, the subjects were seated around a table, each subject having a binder with paper mock-ups. In the beginning, the general concept of regions and views was explained. Afterwards, the moderator described the scenario shown on the current paper mock-up. The expert group discussed the presented scenario. At the conclusion, one subject summarized the group's impressions for the audio confrontation planned with the intermediate users. This procedure was repeated for all paper mock-ups.
3 Differences between traditional and tactile paper prototyping are discussed in detail in [16].
The intermediate groups also heard the moderator's explanation. Subsequently, they listened to the recorded comment of the expert group and discussed the corresponding scenario. These two steps took place in an alternating manner, i.e. either the audio confrontation was conducted directly after the moderator's explanation, or the subjects first discussed without bias and only then heard the audio and commented on it. We found that both expert and intermediate users approved of our concept of regions and views. However, some concern was expressed that visual work techniques might be imposed on blind users. Certainly, a balance between new and familiar work techniques is needed, and an introductory training is obligatory. Our tests already showed that after exploring only a few mock-ups, the subjects applied the region concept independently. The structure region was well liked, as it allows users to maintain an overview of which properties or formatting are present in a particular line in an unfamiliar multiline presentation. The subjects asked for a flexible position of the structure region according to their preferred work techniques. As already assumed in our concept, they also suggested providing an additional horizontal structure region for presenting two-dimensional relations. Our assumption that the outline view improves the conception of document structure and orientation and helps in learning about proportions, e.g. of cells in tables or paragraphs in documents, was affirmed. The subjects pointed out that the distances between related information need to be short. Otherwise usage might be inefficient and some information might go unnoticed. To ensure the retrieval of all relevant information with little searching, a consistent placement of types of information within the regions is needed. Some subjects in the two intermediate groups were overwhelmed by the amount of information, whereas subjects of the expert group enjoyed the display of parallel information and the efficient usage of available space. In addition, the expert group pointed out that additional information about a focused element in the detail region allows more independence from speech output. The two-dimensional arrangement of regions suggests working with both hands, e.g. retaining the current position in the body region with one hand and obtaining additional information in the detail region with the other hand. As there were subjects who mostly use one hand for reading, one-handed work techniques also have to be considered. The subjects also raised critical concerns about the ergonomics of using a keyboard and the pin-matrix device simultaneously. This would result in one-handed use of the display. In order to allow for bi-manual operation, interaction must be available through the device itself, whether through gestures or hardware controls. This requirement is supported by the subjects' preference for direct interaction through click gestures on the display.
6 Conclusion and Outlook Regions support bi-manual tracking and reviewing of information on planar tactile displays, thus allowing for more efficient information retrieval compared to Braille displays. Additionally, well-defined views ensure high effectiveness. Nevertheless,
they require a high degree of spatial and tactile skill from the users. Thus, the question concerning the balance between the amount of presented tactile information and its necessary space has to be discussed individually for every software application. Regions and views satisfy two demands of Shneiderman’s information-seeking mantra on planar tactile displays: the outline view provides the requested overview, and details can be obtained via the detail region. It still has to be investigated how zoom operations and filtering can be done adequately by blind users on said devices. Regions and views were accepted well within our concrete scenarios, but a comparative evaluation with other concepts has to follow. Furthermore, assumptions about users’ preferences of views have to be verified by long-term use. These assumptions include, for example, that experienced users will mostly use the operating view, while blind users that are less familiar with computers or a specific application will prefer the outline and layout view. Establishing a concept for the transformation of two-dimensional information into a tactile representation for planar tactile devices is only the first step towards an efficient, effective and satisfying user interface for the blind. The next step is to develop an appropriate concept for speech and sound output to create a multimodal environment. Only the tactile and auditory channels combined can form the basis of a state-of-the-art screen reading technology.
Acknowledgments We thank all blind subjects in Marburg, Pinneberg and Kiel for participating in the evaluation very actively and, thus, giving us a lot of valuable feedback. We also thank Thorsten Völkel for his helpful comments, and Megan Kramer for improving the paper’s grammar and style. The HyperBraille project is sponsored by the Bundesministerium für Wirtschaft und Technologie (German Ministry of Economy and Technology) under the grant number 01MT07003 for Universität Potsdam and 01MT07004 for Technische Universität Dresden. Only the authors of this paper are responsible for its content.
References 1. Vidal-Verdú, F., Hafez, M.: Graphical Tactile Displays for Visually-Impaired People. IEEE Transactions on Neural Systems and Rehabilitation Engineering 15(1), 119–130 (2007) 2. Proceedings of the Workshop on Guidelines for Tactile and Haptic Interfaces (2005), http://userlab.usask.ca/GOTHI/ 3. Schweikhardt, W., Fehrle, T.: Ein rechnergestützter Zeichenplatz für Blinde (A ComputerBased Drawing System for the Blind). In: Ebersold, J.M., Schwyter, T., Slaby, W.A. (eds.) Computerised Braille Production, Katholische Universität Eichstätt-Ingolstadt, pp. 251–261 (1986) 4. Lokowandt, G., Schweikhardt, W.: Distinguishing Pattern-Types in Printed Documents. In: Zagler, W.L., Busby, G., Wagner, R.R. (eds.) ICCHP 1994. LNCS, vol. 860, pp. 206–213. Springer, Heidelberg (1994)
5. Klöpfer, K.: Ein multifunktionaler Büroarbeitsplatz für Blinde (A Multifunctional Work Place for the Blind). Dissertation, Universität Stuttgart, Institut für Informatik (1987) 6. Rotard, M., Taras, C., Ertl, T.: Tactile Web Browsing for Blind People. Multimedia Tools Appl. 37(1), 53–69 (2008) 7. Völkel, T., Weber, G., Baumann, U.: Tactile Graphics Revised: The Novel BrailleDis 9000 Pin-Matrix Device with Multitouch Input. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 835–842. Springer, Heidelberg (2008) 8. Shneiderman, B.: Designing the User Interface—Strategies for Effective Human-Computer Interaction, 3rd edn. Addison-Wesley, Reading (1998) 9. Jürgensen, H., Power, C.: Information Access for the Blind—Graphics, Modes, Interaction. In: Proceedings of the Workshop on Guidelines for Tactile and Haptic Interfaces, pp. 13–25 (2005) 10. Weber, G.: Reading and Pointing—Modes of Interaction for Blind Users. In: Ritter, X.G. (ed.) Information Processing 1989, pp. 535–540. Elsevier, Amsterdam (1989) 11. Sturm, I., Schiewe, M., Köhlmann, W., Jürgensen, H.: Communicating through Gestures without Visual Feedback. Submitted to PETRA 2009, Corfu (2009) 12. Challis, B.P., Edwards, A.D.: Design Principles for Tactile Interaction. In: Murray-Smith, R. (ed.) Haptic HCI 2000. LNCS, vol. 2058, pp. 17–24. Springer, Heidelberg (2001) 13. Sjöström, C.: Non-Visual Haptic Interaction Design. Phd thesis, Certec (2002) 14. Diggle, T., Kurniawan, S., Evans, D., Blenkhorn, P.: An Analysis of Layout Errors in Word Processed Documents Produced by Blind People. In: Miesenberger, K., Klaus, J., Zagler, W.L. (eds.) ICCHP 2002. LNCS, vol. 2398, pp. 587–588. Springer, Heidelberg (2002) 15. Morgan, M., Newell, A.F.: Interface Between Two Disciples, the Development of Theatre as a Research Tool. In: HCI International 2007, pp. 184–193. Springer, Heidelberg (2007) 16. Miao, M., Köhlmann, W., Schiewe, M., Weber, G.: Tactile Paper Prototyping with Blind Subjects. Submitted to HAID 2009, Dresden (2009)
Multitouch Haptic Interaction Michael Schmidt and Gerhard Weber Technische Universität Dresden, Institut für Angewandte Informatik, Nöthnizer Str. 46, 01602 Dresden, Germany {mschmidt,gerhard.weber}@inf.tu-dresden.de
Abstract. Gestural user interfaces designed for planar touch-sensitive tactile displays require an appropriate concept for teaching gestures and other haptic interaction to blind users. We consider the proportions of hands and demonstrate gestures by tactile-only methods, without the need for Braille skills or verbalization. A user test was performed to confirm that blind users can learn gestures autonomously. Keywords: haptic interaction, assistive technology, gestures.
1 Introduction

Although the use of tactile displays for Braille is not new and is common when browsing the web or working with a GUI through a screen reader, a whole set of new possibilities opens up for blind or visually impaired people with the appearance of large pin-matrix devices [1, 2]. Access to contextual and layout information as well as graphical notations may become available. Unlike speech synthesis, competence in Braille is the basis for reading and writing a tactile notation. However, graphical notations such as maths have to be linearized in Braille in order to support reading and writing, avoiding drawn fraction bars, for example. As tactile displays convey layout information, they can present graphical information such as arrows, scrollbars or window frames [3]. Tactile displays are refreshable, and hence even non-verbal information may be expressed through animated tactile or vibrating patterns. Major limitations arise only when graphics are encountered for which no accessible alternative description has been developed. The integration of touch-sensitive sensors on a Braille display [4] seems to allow gestural input in the context of haptic interaction, even if problems like the Midas touch effect arise. By combining large pin-matrix devices with touch sensitivity, the use of more intuitive interaction techniques appears on the horizon. Gestural touch input could improve efficient user communication with the system. The options of pointing and direct manipulation, mnemonic vs. abstract interaction are promising, but for blind users the real advantage is the locality of input. While exploring a tactile display's content through touch, switching to conventional input devices means a loss of focus and therefore extra time and effort for reorientation. Active tactile interaction takes the direction and temporal structure of movements into account [2]. Feedback from tactile output while moving and feedback from other media ensure the perception of information and enhanced navigation. This has led to
non-visual multimodal systems such as the Nomad [5], which provide verbal and non-verbal acoustic feedback for touch input on a relief placed on a touch- or pen-sensitive surface. The HyperBraille project1 aims to allow exploring the screen beyond text and plain widgets. This is enabled by converting information to a tactile representation shown on the BrailleDis 9000 [2]. Besides information retrieval from screen content [3] and the creation of tactile counterparts to graphical user interfaces [6], part of this work is to enable the user to control a multimodal system while also reading with the fingers and the hand simultaneously. In this paper we investigate the implications for designing tactile interaction while the computer may guide the necessary movements. We outline some concepts and problems as well as explicitly target the problem of teaching gestures to blind persons. We present a prototype and our findings from an evaluation.
2 Related Work Gestures have been proposed for use by blind people for graphical applications with a non-visual user interface, in the domain of mobile devices, and for access to graphical user interfaces with a Braille display. Complexity of non-visual utilization of gestures has increased considerably over time. A blind user of the Nomad device receives auditory feedback after touching some tactile shape [5]. Such audio-haptic interaction techniques create an affordance to tap with a single finger by the type of shape. Many more different gestures may be memorized if the user is guided mechanically. Hill and Grieb show gestures may control a word processor and allow flexible text editing through an audio-haptic user interface [7]. An interactive tactile map of stars on a planar tactile display has shown that loss of overview after gestural input may occur [8]. In this application fingers explore a tactile display sequentially but both hands can be active simultaneously and one hand may be used for gesturing. Gestures change the scale of the map through circling and control the selected region of the sky by forming a caret into the intended direction of panning. After such gestural input users have to restart to orient themselves on the map. Although touch screens on mobile devices lack tactile output, they can be used in an audio-haptic user interface, if some vibration can be generated while audio is played back. Moving fingers along the edges on a touch screen of a PDA has been evaluated successfully with blind children for navigating geographical information [9]. This application intends to utilize gesturing with off-the-shelf mobile devices for cost-effectiveness. Only simple gestures with mechanical support from the casing can be formed. This study showed that users memorize between one and three strokes in various combinations, if auditory verbal feedback follows completion of gestural strokes. Multi-touch gestures have also been demonstrated [10] to be useful for blind people, if used for audio-haptic interaction on a mobile phone. Single strokes by a single finger, angular strokes by a single finger and a single stroke by two fingers simultaneously have been evaluated. The authors point out the need for error robustness and propose to confirm gestural input by tapping with a second hand. 1
http://www.hyperbraille.com
Another development is based on a Braille display with one line of 80 Braille modules. Gestural input is generated by moving a single finger forward or backward [4] while touching Braille pins. These gestures may be distinguished by their speed of movement. Speech output, for example, is triggered by a speed-up of the movement towards the end of the line. In summary, strokes, combined strokes, strokes from multiple fingers, circles and carets have been used, even at differing speeds, by blind users. But learning additional and possibly multi-touch gestures involving multiple fingers appears not to be achievable without a proper approach for training gestures.
3 Teaching Gestures to the Blind

To compensate for some drawbacks of gestural input and to obtain fluid interaction and therefore a high degree of efficiency, a set of gestures should meet several requirements, such as being intuitive and memorizable. But intuition is often misleading if gestures have no deictic nature, while at the same time people tend to forget gestures [11]. In terms of usability we need self-explanatory and learnable gestures. As most common gestures are not self-explanatory, we propose teachability, i.e. the ability to describe a gesture's execution, as a precondition for learnability. A printed manual's constricted way of communicating information on interface design is the bottleneck of learnability. Through teachability, a gesture becomes graspable from appropriate use of a planar tactile display. When it comes to the utilization of tactile interfaces by blind persons in scenarios that include gestural input, several methods of explaining are possible:

1. Keep gestures simple enough to describe them verbally.
2. A second person demonstrates gestures by guiding the user's hand.
3. Illustrations are provided for each gesture.

For many applications the first method would apply well. For instance, on mobile devices a small invariable set of strokes does fine. For evaluations under lab conditions, as they are done within HyperBraille, the second and third methods are adequate, too. When targeting an interaction concept on a tactile display capable of substituting mouse input and adding the option of users creating their own gestures for specific tasks, one cannot rely on these methods anymore. Autonomous work with such a system includes the ability not only to learn gestures initially, but also to recall self-defined gestures the same way as system-defined ones, if needed. A method is required that can serve as a fallback in case of missing memorization, needs no second person, works with gesture sets rather flexibly and, as a side effect, provides an instrument to measure a gesture's teachability.
4 System Evaluation

For a first insight into the possibilities of conveying the concept of gesture input for tactile interaction, a set of gestures was defined involving single and multiple fingers of the same hand. Our objective was to communicate non-verbal information on
several more or less complex spatial gestures performed with tactile-only feedback by multiple fingers.

4.1 Participants

Tests were performed by six subjects, three female and three male. Two of them were legally blind, the other four sighted but blind-folded during the tests. Due to organizational issues it was not possible for the congenitally blind person to go through the whole testing procedure. Nevertheless, we include her results as they give some useful indications.

Table 1. Test persons participating in our tests
Subject   Gender   Degree of Impairment   Handedness
1         female   congenital blind       dextral
2         male     gone blind             sinistral
3         male     blind-folded           sinistral
4         female   blind-folded           dextral
5         male     blind-folded           dextral
6         female   blind-folded           dextral
4.2 Experimental Set-Up

The BrailleDis 9000 [2] is a planar tactile display with 720 touch-sensitive modules arranged in 12 lines of 60 modules each. Each module contains 10 pins in 5 rows of 2 pins, resulting in a matrix of 7200 pins arranged in 60 rows. Touch intensity is measured in 256 steps, but for our purposes reducing these to binary values by a predefined threshold was sufficient. Our prototype showed some sensitivity to normal hand perspiration, causing modules to report touch input for some time after the fingers had left them. For this reason we covered the planar display with a foil thin enough to allow proper sensing of the pins. As hands did not slide easily on that foil, four subjects made use of magnesium carbonate (liquid chalk). Our modification noticeably improved the recognition rates of all gestures.

Application. A gesture guide was developed that is capable of detecting a comfortably laid down hand on the planar display. The program offers a compact graphical user interface for displaying the blobs caused by the hand's touch and ensures a random selection of specific tasks. Each task involves guiding the hand to an initial position and, on
arrival, displaying gestures in a tactile form. The gesture guide recognizes the different fingers as well as the size of the hand. It dynamically generates prototypical gestures, mostly as static relief patterns. Some gestures included an animated relief pattern built up according to the progress of the user's movement.
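The set-up above reports touch intensities per module on the 12 x 60 grid, which are reduced to binary contact information and grouped into candidate finger contacts (blobs). The following sketch illustrates that thresholding and grouping step; it is an illustrative reconstruction, not the HyperBraille code, and the threshold value and helper names are assumptions.

ROWS, COLS = 12, 60          # module grid of the BrailleDis 9000
THRESHOLD = 32               # assumed cut-off within the 0..255 intensity range

def binarize(intensities):
    """Map the 12x60 intensity matrix (0..255) to True/False contact values."""
    return [[v >= THRESHOLD for v in row] for row in intensities]

def blobs(contact):
    """Group 4-connected touched modules into blobs (candidate finger contacts)."""
    seen = [[False] * COLS for _ in range(ROWS)]
    result = []
    for r in range(ROWS):
        for c in range(COLS):
            if contact[r][c] and not seen[r][c]:
                stack, blob = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < ROWS and 0 <= nx < COLS \
                                and contact[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                result.append(blob)
    return result

frame = [[0] * COLS for _ in range(ROWS)]
frame[3][10] = frame[3][11] = 200     # one simulated finger contact
print(len(blobs(binarize(frame))))    # -> 1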
Fig. 1. Gesture Guide's data (left) of a hand on the display (right)

Table 2. Gesture set made of thumb (T), index finger (I) and middle finger (M)

Gesture                   Name                             Abbr.
(arrow drawing)           panning                          P
(arrow drawing)           zoom out                         ZO
(arrow drawing)           zoom in                          ZI
(arrow drawing)           drag & drop                      DD
(arrow drawing)           undo                             U
two finger check (T+I)    full screen (close areas)        FS
two finger caret (T+I)    desktop (minimize all windows)   D
caret                     minimize Window                  MW
check                     close area                       CA
left squared bracket      focus                            F
Additionally, the program includes a modified $1 classifier [12] to support multi-touch input; a generic sketch of the underlying matching step is given further below. This classifier came along with a set of user-defined gesture prototypes. The gesture guide fits them to match the display's resolution. Hence, in our tests we used gestures as they could occur after a user had defined them himself, and not as perfect geometric shapes.

Gestures. The provided gestures are all embedded in basic scenarios that could occur while working with a tactile user interface [6]. Our classifier is simple, and improved techniques have been developed elsewhere [13, 14]; these allow assisting the user's further input and may thus reduce error rates additionally. To support such a system, an early cut-down of possible gestures would be of advantage. This can be achieved by incorporating multiple modalities, interaction context (history), and static (fingers to use) or dynamic (locality) features. Diversification of gestures has been applied in this study. Keeping this in mind, we created gesture prototypes to open/close regions/windows, rearrange objects, perform panning, zoom in and out, mark an area and go back in the interaction history. Table 2 describes single and parallel finger movements by arrows.

4.3 Procedure

Each test consisted of 41 test runs, each made up of three phases. In phase 1, subjects were asked to comfortably lay down their preferred hand on the tactile display such that their fingertips and palm touched the surface. The program identified the palm as well as each finger and chose the fingers needed to draw the gesture with respect to the task. Selecting fingers means elevating all pins except the ones under the chosen finger tips. In other words, fingers involved in a gesture start to move from within a groove formed by lowered pins. In phase 2, the groove under each selected finger is extended and leads to an initial position. Subjects move their selected fingers to the initial position, where they are recognized, to prepare for phase 3. Phase 3 shows the actual gesture as a path of pins (one pin wide) for every selected finger, starting at its initial position, while all other pins are lowered. The subject now follows these possibly multiple paths of pins to their end.

The 41 test runs are made up of four sets A, B, C and D. The first five test runs (set A) serve as an introduction, explaining procedures and phases to the subjects. They consist of simple gestures of type DD (straight lines involving different initial points and targets). Additionally, information representing two execution speeds (slow and fast) is encoded. Each speed is coded in two ways: either by fast or slow blinking pins (alternating every second pin) or by gaps between every two set pins, where a gap of one pin indicates fast movement and a gap of two pins indicates slow movement. Repetition during this introduction was possible. Subjects were asked what kind of speed coding they preferred. The following test runs, taken from sets B, C and D, were presented in randomized order. Set B includes only panning and zooming gestures with variations of speed. Subjects were asked to identify the gesture (which includes the number of fingers used) and the speed information sensed in each case. We tested five variations of panning (normal and one of each speed coding) plus three versions (normal plus both blinking variants) of each zooming gesture.
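The single-stroke matching performed by the modified $1 classifier mentioned in the set-up can be sketched as follows. This is a generic, illustrative reconstruction of Wobbrock et al.'s method [12] applied per finger, not the authors' code; the rotation normalization of the original method is omitted for brevity, and the number of resampling points and the template strokes are arbitrary assumptions.

import math

N = 32  # assumed number of resampled points per stroke

def resample(points, n=N):
    """Resample a stroke (list of (x, y)) to n roughly equidistant points."""
    pts = list(points)
    length = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    if length == 0:
        return [pts[0]] * n
    step, acc, out, i = length / (n - 1), 0.0, [pts[0]], 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # continue measuring from the interpolated point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:
        out.append(pts[-1])    # guard against floating-point shortfall
    return out

def normalize(points):
    """Translate the centroid to the origin and scale to a unit bounding box."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    return [((x - cx) / w, (y - cy) / h) for x, y in points]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def classify(stroke, templates):
    """Return the name of the closest template for one finger's stroke."""
    probe = normalize(resample(stroke))
    return min(templates, key=lambda name: path_distance(probe, templates[name]))

templates = {"check": normalize(resample([(0, 0), (1, -1), (3, 2)])),
             "caret": normalize(resample([(0, 0), (1, 2), (2, 0)]))}
print(classify([(0.1, 0.0), (1.2, -0.9), (2.8, 2.1)], templates))   # -> check

In the actual system, such matching would be applied to the blob trajectories delivered by the touch-sensitive modules, once per finger involved in the gesture.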
Part C contains a selection of partly multi-touch, but single-stroke gestures (U, MW, CA, F, FS and D, see Table 2). Subjects were only asked to describe the gesture's figure along with the number of fingers they were using. If they could not describe it with certainty (some even recognized the check gesture as a check mark), subjects were asked to draw the gesture three times on the tactile display, where it was classified by our recognizer. If the gesture was classified correctly at least two out of three times (and was drawn with the right number of contact points), it was treated as properly learned. To resemble real-world conditions, a collection (part D) of 20 DD gestures was included randomly. All test persons could ask for repetitions of sequences if they were unsure about their answers. Every repetition was recorded as a new trial if it contained all phases. This was due to limitations of the software when the guiding channels of the second phase crossed positions of unselected fingers and thus led to misinterpretations. In these cases the test supervisor repeated the sequence before it reached the third phase, and the test person was asked to slightly rotate the hand.

4.4 Results

In the second part of our test procedure (panning and zooming) only two gestures were performed erroneously, as one subject did not realize he had to use three fingers for two of the panning gestures. Two panning sequences (normal and one-pin-gap lines) were not attempted by subject S1 due to a busy schedule. When asked for the preferred speed coding, four out of the six subjects said blinking was the better choice, although one of those subjects made two mistakes in its recognition. Nevertheless, two other persons (one of whom preferred blinking) made one mistake in recognizing the encoding by gaps. We conclude that gaps seem to be more challenging than blinking as an indication of movement speed. Results of the third part are shown in Figure 2.
Fig. 2. Number of trials and errors in performing gestures. *Note that only five subjects performed Close Area (CA).
There were 35 single tests, with an average of 1.8 trials and 0.3 errors for each gesture per subject. As Figure 2 shows by the number of additionally needed trials, our two-finger single-stroke gestures were the most challenging, followed by the "close area" gesture due to its bent form. Approximately one third of the total number of single
tests was not properly recognized, but this number requires qualification on closer inspection. For instance, there were four errors for the two types of check gestures (CA, FS) that were classified as a "v" with the correct number of fingers used. Interestingly, the double caret gesture was described like an arrow. It nevertheless challenged some subjects, as they were misled into twisting their two fingers around each other for input. Changes in palm orientation and the use of multiple fingers indicate a lower teachability of such gesture types.
5 Conclusions and Future Work

As presented, this work could only scratch the surface of the problems concerning non-visual haptic interaction on planar tactile displays. Further investigation is needed to enable blind users to define their own gestures. Nevertheless, we showed that gestures can be learned from tactile-only feedback. Our method suggests learning gestures in a way which requires no knowledge of Braille. This has several advantages, such as possibly addressing deaf-blind users or Braille-illiterate people. More parameters could be included, such as the width of reliefs, tactons and Braille. But it has its drawbacks, too, if gestures are difficult to teach because they cross their own path. An audio-haptic approach could provide audio feedback on automatically identifiable features, such as the number of fingers used or the execution speed, to confirm the user's own recognition. In addition, an audio-haptic approach may resolve issues where the user is not capable of touching with his hand due to the use of a pen or grip [15]. Furthermore, the software could not convey that the user may decide which fingers to use and how large the allowed amount of variation is. The decision how to input a specified gesture is not taken by the system. In fact, the user should find an ergonomic way to produce the touch input that is necessary for a specific gesture to work. Even at the size of an A4 sheet, the tactile display's size is limited. It is not always possible to guide the user to the most comfortable initial position. A slightly twisted hand or thumb complicates the proper sensing of multiple lines and, for instance, made following the zooming gestures somewhat fiddly. The presented work did not include self-crossing gestures like the single-stroke x. With reliable touch sensitivity on elevated pins, amending the software to dynamically draw the guiding lines should be easily possible to a certain degree. Of course, this approach comes to its limits, too, for more complex gestures, but it would offer the option of teaching writing or sketching with fingers. Far more challenging would be the teaching of multi-stroke and compound gestures with the available instruments. Finally, a very important question is whether it would be possible to allow the second hand to touch and read, too, while the first hand is monitored. This may become the future way of avoiding the Midas touch effect while allowing gesture input.

Acknowledgements. We kindly thank all subjects for participating and supporting us with valuable hints. The HyperBraille project is sponsored by the Bundesministerium für Wirtschaft und Technologie (German Ministry of Economy and Technology) under the grant number 01MT07004 (for Technische Universität Dresden). Only the authors of this paper are responsible for its content.
References 1. Rotard, M., Bosse, K., Schweikhardt, W., Ertl, T.: Access to Mathematical Expressions in MathML for the Blind. In: Proc. of the HCI International Conference, pp. 1325–1329 (2004) 2. Völkel, T., Weber, G., Baumann, U.: Tactile Graphics Revised: The Novel BrailleDis 9000 PinMatrix Device with Multitouch Input. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 835–842. Springer, Heidelberg (2008) 3. Kraus, M., Völkel, T., Weber, G.: An Off-Screen Model for Tactile Graphical User Interfaces. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 865–872. Springer, Heidelberg (2008) 4. Kipke, S.: Sensitive Braille Displays with ATC Technology (Active Tactile Control) as a Tool for Learning Braille. In: Miesenberger, K., et al. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 843–850. Springer, Heidelberg (2008) 5. Parkes, D.: Nomad, an audio tactile tool for the acquisition, use and management of spatially distributed information by visually impaired people. In: Proceedings of the Second International Symposium on Maps and Graphics for Visually Handicapped People, pp. 24–29. A.F.&Dodds, London (1988) 6. Schiewe, M., Köhlmann, W., Nadig, O., Gerhard Weber, G.: What You Feel is What You Get: Mapping GUIs on Planar Tactile Displays. In: Stephanidis, C. (ed.) Universal Access in HCI, Part II, HCII 2009. LNCS, vol. 5615, pp. 564–573. Springer, Heidelberg (2009) 7. Hill, D.R., Grieb, C.: Substitution for a restricted visual channel in multimodal computerhuman dialogue. IEEE Transactions on Systems, Man and Cybernetics 18(2), 285–304 (1988) 8. Weber, G.: Adapting direct manipulation for blind users. In: Ashlund, S., et al. (eds.) Adjunct Proceedings of INTERCHI 1993, pp. 21–22. Addison Wesley, Reading (1993) 9. Sánchez, J., Maureira, E.: Subway Mobility Assistance Tools for Blind Users. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 386–404. Springer, Heidelberg (2007) 10. Kane, S.K., Bigham, J.P., Jacob, O., Wobbrock, J.O.: Slide Rule: Making Mobile Touch Screens Accessible to Blind People Using Multi-Touch Interaction Techniques. In: Proceedings of the 10th international ACM SIGACCESS Conference on Computers and Accessibility, Halifax, Nova Scotia, Canada, October 13 - 15, 2008, pp. 73–80. ASSETS. ACM, New York (2008) 11. Wolf, C.G., Morrel-Samuels, P.: The Use of Hand-Drawn Gestures for Text Editing. International Journal of Man-Machine Studies 27(1), 91–102 (1987) 12. Wobbrock, J.O., Wilson, A.D., Li, Y.: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th Annual ACM Symposium on User interface Software and Technology, UIST 2007, Newport, Rhode Island, USA, October 07-10, 2007, pp. 159–168. ACM, New York (2007) 13. Rubine, D.: Specifying gestures by example. SIGGRAPH Comput. 25(4), 329–337 (1991) 14. Bau, O., Mackay, W.E.: OctoPocus: a dynamic guide for learning gesture-based command sets. In: Proceedings of the 21st Annual ACM Symposium on User interface Software and Technology, UIST 2008, Monterey, CA, USA, October 19 - 22, 2008, pp. 37–46. ACM, New York (2008) 15. Brewster, S., Brown, L.M.: Tactons: structured tactile messages for non-visual information display. In: Cockburn, A. (ed.) Proceedings of the Fifth Conference on Australasian User interface - Volume 28, Dunedin, New Zealand. ACM International Conference Proceeding Series, vol. 53. Australian Computer Society, Darlinghurst
Free-form Sketching with Ball B-Splines* Rongqing Song, Zhongke Wu, Mingquan Zhou, and Xuefeng Ao Institute of Virtual Reality and Visualization, Beijing Normal University, Beijing 100875 [email protected], [email protected], [email protected], [email protected]
Abstract. Quickly and conveniently generating a 3D freeform model is a challenging problem in the field of computer graphics. This paper proposes a new approach for rapidly building 3D freeform shapes through sketching, based on ball B-splines. Keywords: Sketch, Ball B-Spline, Free-form Shapes.
1 Introduction

It is a challenge to quickly and conveniently generate a 3D freeform model in the field of computer graphics. Although 3D scanning techniques are widely used, they are not suitable for model building in the initial or conceptual stage of design. The goal of this paper is to build 3D models rapidly and intuitively by sketching silhouette curves. Sketching is a fast and intuitive method of building 3D models from freehand 2D input. 3D shapes are generated by regarding the sketched 2D curves as silhouettes from the current point of view. There are two well-known sketching systems that create 3D geometric models from 2D sketches, SKETCH and Teddy [1, 2]. SKETCH is based on gesture recognition; Teddy uses a polygonal mesh model. Recent works propose systems based on implicit functions [6, 7]. In this paper, a new sketching method using ball B-spline curves and surfaces is proposed. Ball B-spline curves and surfaces represent 3D objects as well as their skeletons, explicitly and intuitively. This gives more potential for prototype modeling. Therefore, we primarily use ball B-spline curves to represent 3D objects in our method. The rest of this paper is organized as follows: in Section 2, fundamentals of ball B-spline curves are introduced; in Section 3, the skeleton algorithm on sketched contours is discussed; in Section 4, sketching methods are described; in Section 5, editing methods are further discussed as a more convenient way for solid generation; finally, some conclusions are given.
* The work is partly supported by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the National High Technology Research and Development Program of China (863 Program, No. 2008AA01Z301).
2 Introduction to Ball B-Spline Curve

2.1 Sphere Geometry

A sphere is defined as the set

\langle C; r \rangle = \{\, x \in \mathbb{R}^3 \mid \|x - C\| \le r,\ C \in \mathbb{R}^3,\ r \in \mathbb{R}^+ \}    (1)

Here C is the center of the sphere and r is the radius.

2.2 Ball B-Spline Curve

N_{i,p}(t) is the i-th B-spline basis of degree p with knot vector

[u_0, \dots, u_m] = \{ \underbrace{a, \dots, a}_{p+1}, u_{p+1}, \dots, u_{m-p-1}, \underbrace{b, \dots, b}_{p+1} \}    (2)

Now the ball B-spline curve (BBSC) can be defined:

\langle B \rangle(t) = \sum_{i=0}^{n} N_{i,p}(t)\, \langle P_i; r_i \rangle    (3)

The P_i are called control points and the r_i control radii, as shown in Fig. 1.

Fig. 1. Control polygon, radius (white) and center curve (red)

2.3 Ball B-Spline Surface

N_{i,p}(u) is the i-th B-spline basis of degree p with knot vector

[u_0, \dots, u_m] = \{ \underbrace{a, \dots, a}_{p+1}, u_{p+1}, \dots, u_{m-p-1}, \underbrace{b, \dots, b}_{p+1} \}    (4)

and N_{j,q}(v) is the j-th B-spline basis of degree q with knot vector

[v_0, \dots, v_m] = \{ \underbrace{d, \dots, d}_{q+1}, v_{q+1}, \dots, v_{m-q-1}, \underbrace{e, \dots, e}_{q+1} \}    (5)

Then the ball B-spline surface is defined as follows:

\langle S \rangle(u, v) = \sum_{i=0}^{m} \sum_{j=0}^{n} N_{i,p}(u)\, N_{j,q}(v)\, \langle P_{i,j}; r_{i,j} \rangle    (6)

The P_{i,j} are called control points and the r_{i,j} control radii, as shown in Fig. 2.

Fig. 2. Control polygon, radius (white) and center surface (red)
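As a concrete reading of equations (2) and (3), the following sketch evaluates the center point and radius of a ball B-spline curve at a parameter value, using the Cox-de Boor recursion for the basis functions. It is an illustrative implementation of the published definition, not code from the authors' system; the sample control data are made up, and t must lie in [0, 1) with the half-open interval convention used here.

def basis(i, p, t, knots):
    """Cox-de Boor recursion for the B-spline basis N_{i,p}(t)."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = (t - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, t, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, t, knots))
    return left + right

def bbsc_point(t, control_points, control_radii, knots, p):
    """Evaluate <B>(t) = sum_i N_{i,p}(t) <P_i; r_i>: returns (center, radius)."""
    cx = cy = cz = r = 0.0
    for i, (P, ri) in enumerate(zip(control_points, control_radii)):
        w = basis(i, p, t, knots)
        cx, cy, cz, r = cx + w * P[0], cy + w * P[1], cz + w * P[2], r + w * ri
    return (cx, cy, cz), r

# Example: a cubic BBSC with four control spheres (made-up data).
P = [(0, 0, 0), (1, 2, 0), (3, 2, 1), (4, 0, 1)]
R = [0.5, 0.8, 0.6, 0.3]
knots = [0, 0, 0, 0, 1, 1, 1, 1]          # clamped knot vector, p = 3
print(bbsc_point(0.4, P, R, knots, 3))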
3 Computing Skeleton

The user can create a new object by drawing a closed polygon; a 3D model is then automatically constructed based on the 2D input. In order to use ball B-spline curves and surfaces, the control points and control radii must be given; this information can be extracted from the skeleton of the 2D polygon. In our system, we use a Voronoi-based method to extract the skeleton. Blum was the first to introduce the concise definition of the skeleton or medial axis as the set of centers of maximal disks (2D) or balls (3D) [8]. Medial axis: the set of points of a region P with more than one closest point among the boundary points ∂P of the region. Equivalently, it is the set of centers of maximal balls, i.e. of balls in P that are themselves not enclosed in another ball in P [9]. The skeleton has a wide field of applications. It has proven useful in automatic navigation, compression, shape description and abstraction, tracking, etc., and there is a large variety of techniques for extracting skeletons and centerlines. The Voronoi-based method we use is well suited to polygon meshes. The skeleton we extract has some restrictions: (I) the centerline should be a sequential single-voxel chain; (II) the order of the centerline points is crucial for reconstruction;
(III) the skeleton needs to be pruned; the fewer branches it has, the better it is for our model. The skeleton of the 2D sketch is extracted in the following steps. First, while drawing the 2D sketch, there are several restrictions: the stroke should not be self-intersecting; the start point and end point of the stroke are connected automatically; and the operation fails if the stroke is self-intersecting. The algorithm used to create the 3D shape is described in detail in Section 4. When constructing the initial closed planar polygon, the system makes all edges a predefined unit length (Fig. 3(a)).
Fig. 3. The process of extracting the skeleton: (a) the initial closed planar polygon; (b) constrained Delaunay triangulation; (c) the skeleton without pruning; (d) the pruned skeleton
Next, a constrained Delaunay triangulation (Fig. 3(b)) of the polygon's points is constructed. After the triangulation, the edges can be classified into two categories: the edges of the initial polygon are called external edges, while edges added in the triangulation process are called internal edges. The skeleton is obtained by connecting the midpoints of the internal edges; it may have many branches (Fig. 3(c)), so in order to get one main centerline it has to be pruned. Our skeleton extraction algorithm and pruning are based on the ones described in [2]; after the pruning, only the main skeleton remains (Fig. 3(d)). The link points of the skeleton become the control points of the ball B-spline curves and surfaces. Third, we calculate the radius of each point on the skeleton using the method introduced in [2] and let the radius of the point be the control radius. After the above operations, an ordered main skeleton (including control points and control radii) has been extracted.
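The midpoint-chaining step just described can be sketched as follows. The snippet assumes the constrained Delaunay triangulation has already been computed by some external routine and is handed over as index triples plus a set of boundary edges (sorted index pairs); the pruning is reduced to keeping the longest path between two end points of the midpoint graph, which is a simplification of the pruning used in [2]. All names are illustrative only.

from collections import defaultdict

def internal_edges(triangles, boundary_edges):
    """Edges of the triangulation that are not part of the polygon boundary."""
    edges = set()
    for a, b, c in triangles:
        for e in ((a, b), (b, c), (c, a)):
            e = tuple(sorted(e))
            if e not in boundary_edges:
                edges.add(e)
    return edges

def skeleton_graph(points, triangles, boundary_edges):
    """Connect midpoints of internal edges that belong to the same triangle."""
    inner = internal_edges(triangles, boundary_edges)
    midpoint = {e: ((points[e[0]][0] + points[e[1]][0]) / 2,
                    (points[e[0]][1] + points[e[1]][1]) / 2) for e in inner}
    graph = defaultdict(set)
    for a, b, c in triangles:
        tri_inner = [tuple(sorted(e)) for e in ((a, b), (b, c), (c, a))
                     if tuple(sorted(e)) in inner]
        for i in range(len(tri_inner)):
            for j in range(i + 1, len(tri_inner)):
                graph[tri_inner[i]].add(tri_inner[j])
                graph[tri_inner[j]].add(tri_inner[i])
    return graph, midpoint

def farthest_path(graph, start):
    """BFS returning the longest path from start (the graph is tree-like)."""
    queue, seen, best = [(start, [start])], {start}, [start]
    while queue:
        node, path = queue.pop(0)
        if len(path) > len(best):
            best = path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return best

def main_skeleton(points, triangles, boundary_edges):
    graph, midpoint = skeleton_graph(points, triangles, boundary_edges)
    if not graph:
        return []
    end = farthest_path(graph, next(iter(graph)))[-1]   # double sweep: farthest node,
    chain = farthest_path(graph, end)                   # then farthest from there
    return [midpoint[e] for e in chain]

# Toy example: a hand-made triangulation of a 4 x 2 rectangle with six vertices.
pts = [(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]
tris = [(0, 1, 5), (1, 4, 5), (1, 2, 4), (2, 3, 4)]
boundary = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)}
print(main_skeleton(pts, tris, boundary))   # -> horizontal centerline midpoints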
4 Sketching Methods

4.1 2D Input and Data Obtaining

Our system accepts two ways of input. The first is drawing with a digital pen on a tablet; the other is importing a sketch image that was stored after previous sketching. As generating the 3D model directly from the pen input on the tablet is closest to the idea of sketching, we mainly discuss this method. While drawing, the points of each stroke are stored in an independent data structure for further processing. In order to avoid having to compute the intersection points of the stroke, a constraint is imposed on the input: the stroke should be open. The stroke is then closed automatically by the system, which simply connects the end point and the start point of the drawing. As the drawing speed of the users can be arbitrary while the precision of the tablet is fixed, the stored data points can be quite irregular in density: the points can be very dense where the user draws slowly and very sparse where the user draws quickly. Therefore, reforming the stroke is necessary before further processing. By setting a gate value for the distance between two adjacent points in the reformed stroke, points that are too close are discarded, and only points whose distance to the previously kept point is larger than the gate value are stored. Based on these points of the reformed stroke, the skeleton points and the corresponding radii can then be extracted for the later 3D surface construction.
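The reforming step with a distance gate can be sketched in a few lines; this is an illustrative reading of the description above, not the authors' code, and the gate value is an arbitrary assumption.

import math

GATE = 5.0   # assumed minimum distance (in tablet units) between kept points

def reform_stroke(points, gate=GATE):
    """Keep only points that are at least `gate` away from the last kept point."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= gate:
            kept.append(p)
    if kept[-1] != points[-1]:
        kept.append(points[-1])     # preserve the stroke's end point
    return kept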
Fig. 4. Input and data obtaining: (a) the original stroke; (b) the reformed stroke; (c) the skeleton points and radius; (d) the 3D object in mesh mode; (e) the 3D object in face mode
Figure 4 shows the process of inputting and obtaining the data used for later 3D model generation. Fig. 4(a) is an original stroke drawn with the digital pen; Fig. 4(b) is the stroke after reforming as described above; Fig. 4(c) shows the skeleton points and the corresponding diameters of the data spheres used for the following interpolation; Fig. 4(d) is the 3D mesh model generated from the skeleton points and corresponding radii by interpolation; Fig. 4(e) is the rendering result in face mode.
4.2 Reconstruction of the 3D Surface

As solids in reality usually have disk-shaped cross sections, it is very natural to use a BBSC to construct the 3D surface from the skeleton points and the corresponding radii.

BBSC interpolation. The skeleton points combined with their corresponding radii can be regarded as several spheres, from which a whole BBSC, whose center curve passes through these central points and interpolates the corresponding radii, can be obtained through interpolation. We use the B-spline interpolation method to interpolate the center curve and a B-spline scalar function to interpolate the radius of the BBSC, respectively. For more information on B-spline interpolation, please refer to [10]. For the detailed interpolation effect see Fig. 5.

Boundary surface computing. To evaluate the boundary of the region represented by a BBSC, we can regard the boundary as the envelope surface of a one-parameter family of spheres. For any t, the sphere of the family has center (x(t), y(t), z(t)) and radius r(t), where

\begin{cases}
x(t) = \sum_{i=0}^{n} N_{i,p}(t)\, x_i \\
y(t) = \sum_{i=0}^{n} N_{i,p}(t)\, y_i \\
z(t) = \sum_{i=0}^{n} N_{i,p}(t)\, z_i \\
r(t) = \sum_{i=0}^{n} N_{i,p}(t)\, r_i
\end{cases}    (7)

Then the sphere at parameter t satisfies

F(x, y, z, t) = (x - x(t))^2 + (y - y(t))^2 + (z - z(t))^2 - (r(t))^2 = 0    (8)

According to the envelope theorem [11],

\begin{cases}
F(x, y, z, t) = 0 \\
\partial F(x, y, z, t) / \partial t = 0
\end{cases}    (9)

From the second equation we obtain:

(x - x(t))\, x'(t) + (y - y(t))\, y'(t) + (z - z(t))\, z'(t) + r(t)\, r'(t) = 0    (10)

For any t, we can compute x(t), y(t), z(t), r(t) and their derivatives x'(t), y'(t), z'(t), r'(t) according to the de Boor algorithm for a B-spline curve. Now let

X = x - x(t),\ Y = y - y(t),\ Z = z - z(t),\ C = r(t),\ D = x'(t),\ E = y'(t),\ F = z'(t),\ G = r'(t)    (11)

So we get the following equation group:

\begin{cases}
X^2 + Y^2 + Z^2 = C^2 \\
D X + E Y + F Z = -C G
\end{cases}    (12)
This is the equation group of the intersection of a sphere and a plane, so its solution is a circle, which we call the characteristic circle. When r_i = constant for all i, i.e. r(t) = constant, the boundary is the 3D offset of the center curve by that constant. When the center curve is open and the end sphere's radius is not zero, the end part is a spherical cap whose center is the end point of the center curve; the intersection plane can be obtained from the above computation. Therefore any point on the boundary of the 3D region can be obtained. By computing the boundary points and then triangulating them, we get a 3D mesh surface which represents the 3D object. As shown in Fig. 5, (a) shows the data spheres used for the interpolation; (b) is the result of interpolation and boundary evaluation of these data spheres; (c) is the rendering of the 3D boundary surface which represents the 3D object.
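For a given parameter value, equations (11) and (12) reduce the envelope computation to intersecting a sphere with a plane. The following sketch computes the center, radius and normal of the resulting characteristic circle; it is a direct, illustrative transcription of those equations, with made-up input values for c(t), r(t) and their derivatives.

import math

def characteristic_circle(center, radius, d_center, d_radius):
    """Characteristic circle of equations (11)/(12).
    center, d_center: c(t) and c'(t) as (x, y, z); radius, d_radius: r(t) and r'(t)."""
    norm = math.sqrt(sum(d * d for d in d_center))
    if norm == 0.0:
        raise ValueError("c'(t) must not vanish")
    n = tuple(d / norm for d in d_center)                 # unit normal of the plane
    dist = -radius * d_radius / norm                      # signed distance along n
    if abs(dist) > radius:
        return None                                       # no real intersection
    circle_center = tuple(c + dist * ni for c, ni in zip(center, n))
    circle_radius = math.sqrt(radius * radius - dist * dist)
    return circle_center, circle_radius, n

print(characteristic_circle((1.0, 0.0, 0.0), 0.5, (0.0, 1.0, 0.0), 0.1))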
Fig. 5. Interpolation of BBSC and boundary evaluation: (a) data spheres; (b) interpolation and boundary evaluation (mesh); (c) rendered boundary surface
4.3 System Diagram

Using our system, we first draw a 2D input on the tablet; the input points are then reformed; next, the skeleton is extracted using the CDT algorithm; and with this skeleton information a 3D model is created with ball B-splines. In addition, several editing rules are defined to make the model more complex. The diagram in Fig. 6 describes the system flow.
5 Editing Operations Our system also provides various editing methods which help the user to generate the free-form objects more conveniently. The editing methods include extrusion, deformation and cutting.
Fig. 6. Diagram of our system
5.1 Extrusion

Extrusion is an editing method provided for users to generate an extra part added to the original 3D object. First, the user draws a stroke on the surface of the already generated object, which is regarded as the base stroke. Then the user draws a second, extrusion stroke in the plane orthogonal to the plane the base stroke lies in. Assuming the base stroke is approximately circular, its center point and radius can be computed. Furthermore, by applying the above method of computing the skeleton, the skeleton points and the corresponding radii of the second stroke are obtained. The center and radius of the base stroke are then put at the head of the skeleton point list and radius list for the following interpolation. Finally, by interpolating these skeleton spheres, the resulting 3D object with the added part is generated. Figure 7 shows some results generated by extrusion.
Fig. 7. Solid generated by extrusion
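The data preparation for extrusion, approximating the base stroke by a circle and prepending the resulting sphere to the extrusion stroke's skeleton spheres, can be sketched as follows. This is an illustrative reading of the description above, not the authors' code; the example strokes are made up, and the returned sphere list is what would subsequently be handed to the BBSC interpolation.

import math

def base_circle(base_stroke):
    """Approximate a roughly circular base stroke by its center and mean radius."""
    cx = sum(p[0] for p in base_stroke) / len(base_stroke)
    cy = sum(p[1] for p in base_stroke) / len(base_stroke)
    r = sum(math.dist((cx, cy), p) for p in base_stroke) / len(base_stroke)
    return (cx, cy), r

def extrusion_spheres(base_stroke, extrusion_skeleton, extrusion_radii):
    """Prepend the base sphere to the extrusion stroke's skeleton spheres."""
    center, radius = base_circle(base_stroke)
    return [(center, radius)] + list(zip(extrusion_skeleton, extrusion_radii))

base = [(math.cos(a) * 2, math.sin(a) * 2)
        for a in (i * 2 * math.pi / 12 for i in range(12))]
print(extrusion_spheres(base, [(0.0, 3.0), (0.0, 5.0)], [1.5, 1.0]))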
5.2 Deformation

As the BBSC has a solid mathematical representation, deformation can easily be performed by moving its control spheres or data spheres, or by changing its control radii or data radii. After such a movement or change, the interpolation is carried out again to obtain the new BBSC. Figure 8 shows the deformation effects when generating new objects.
Fig. 8. Object generated by deformation: (a) original object; (b) deformed object
5.3 Cutting

The system also provides a cutting method for users to cut away unwanted parts of the solid. The operation consists of drawing a line across the solid and specifying which side of the line should be cut off. The system then determines the data spheres within the cut-off part and removes them from the original list. By interpolating the remaining data spheres, the new solid after cutting is obtained very easily. Figure 9 shows how the cutting method is used to obtain a new solid: in Fig. 9(a), the cut line is drawn and the lower side of the line is specified to be cut; in Fig. 9(b), the upper side of the line is specified to be removed.
Fig. 9. Object generated by cutting method
6 Conclusions

The BBSC has very good features: a sound mathematical representation and precise evaluation. It is therefore well suited to representing 3D free-form objects.
Acknowledgments. I would like to thank Dr. Zhongke Wu, my supervisor, for his time, effort, and constant support during this research. His personal enthusiasm and serious attitude to research work have benefited me greatly. I am also thankful to Xuefeng Ao, who gave me a lot of help in my research.
References 1. Zeleznik, R.C., Herndon, K.P., Hughes, J.F.: SKETCH: an interface for sketching 3D scenes. In: Proceedings of SIGGRAPH 1996, pp. 163–170. ACM, New York (1996) 2. Igarashi, T., Matsuoka, S., Tanaka, H.: Teddy: A Sketching Interface for 3D Freeform Design. In: SIGGRAPH 1999 Conference Proceedings, pp. 409–416 (1999) 3. Lee, H.J., Chen, Z.: Determination of 3D Human Body Postures from a Single View. Computer Vision, Graphics, and Image Processing 30, 148–168 (1985) 4. Taylor, C.T.: Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image. Computer Vision and Image Understanding 80, 349–363 (2000) 5. Davis, J., Agrawala, M., Chuang, E., Popović, Z., Salesin, D.: A Sketching Interface for Articulated Figure Animation. In: Proceedings of the ACM SIGGRAPH/Euro graphics Symposium on Computer Animation, pp. 320–328 (2003) 6. Tai, C.-L., Zhang, H., Fong, C.-K.: Prototype Modeling from Sketched Silhouettes based on Convolution Surfaces. Computer Graphics Forum (2004) 7. Karpenko, O., Hughes, J.F., Raskar, R.: Free-form sketching with variational implicit surfaces. Computer Graphics Forum (2002) 8. Blum, H.: A transformation for extracting new parameter of shape. Models for the Perception of Speech and Visual Form (1967) 9. Goodman, J.E., O’Rourke, J.: Handbook of Discrete and Computational Geometry. CRC Press, Boca Raton (1997) 10. Piegl, L., Tiller, W.: The NURBS Book. Springer, Heidelberg (1995) 11. Pottmann, H., Peternell, M.: Envelopes - computational theory and applications. In: Falcidieno, B. (ed.) Spring Conference on Computer Graphics 2000. Proceedings of the conference in Budmerice, Comenius University, Bratislava, May 3-6, 2000, pp. 3–23 (2000) ISBN 80-223-1486-2 12. Wu, Z., Seah, H.S., Zhou, M.: Skeleton based Parametric Solid Models: Ball B-Spline Surfaces. In: Proceedings of 10th IEEE International Conference on Computer-Aided Design and Computer Graphics, Beijing, pp. 421–424 (2007) 13. Wu, Z., Zhou, M., Wang, X., Ao, X., Song, R.: An Interactive System of Modeling 3D Trees with Ball B-Spline Curves. In: PMA 2006: The Second International Symposium on Plant Growth Modeling, Simulation, Visualization and Applications. Beijing (CHINA PR), November 13-17 (2006) 14. Prasad, L.: Morphological analysis of shapes. CNLS Newsletter 139, 1–18 (1997) 15. Seah, H.S., Wu, Z.: Ball B-Spline Based Geometric Models in Distributed Virtual Environments. In: Workshop towards semantic virtual environments, SVE, Villans, Switzerland, pp. 1–8 (March 2005) 16. Fei, G., Li, X.: An adaptable sketch-based modeling system. In: Computer-Aided Design and Computer Graphics Conference Proceedings, pp. 371–376 (2007) 17. Masry, M., Lipson Aug, P.H.: A Sketch-Based Interface for Iterative Design and Analysis of 3D Objects. ACM SIGGRAPH 2007 courses (2007) 18. Sheng, B., Wu, E.: Laplacian-based Design: Sketching 3D Shapes. The International Journal of Virtual Reality 5(3), 59–65 (2006) 19. Sun, L., Jin, X., Feng, J., Peng, Q.: Generate 3D Models from 2D Silhouette Curves Using Metaballs. In: CCVRV 2003 (2003)
BC(eye): Combining Eye-Gaze Input with Brain-Computer Interaction Roman Vilimek1 and Thorsten O. Zander2 1
Siemens AG, Corporate Technology, User Interface Design Otto-Hahn-Ring 6, 81730 Munich, Germany 2 Technische Universität Berlin, Chair of Human-Machine Systems, Team PhyPA Franklinstr. 28-29, 10587 Berlin, Germany [email protected], [email protected]
Abstract. Gaze-based interfaces have gained increasing importance in multimodal human-computer interaction research with the improvement of tracking technologies over the last few years. The activation of selected objects in most eye-controlled applications is based on dwell times. This interaction technique can easily lead to errors if the users do not pay very close attention to where they are looking. We developed a multimodal interface involving eye movements to determine the object of interest and a Brain-Computer Interface to simulate the mouse click. Experimental results show that, although a combined BCI/eye-gaze interface is somewhat slower, it reliably leads to fewer errors in comparison to standard dwell-time eye-gaze interfaces. Keywords: Brain-Computer Interaction, BCI, multimodal, eye tracking, eye-controlled applications.
1 Motivation

With the idea of "eyes as output", Richard Bolt introduced eye-gaze input to facilitate human-computer interaction as early as 1982 [1]. Since then, numerous studies have been conducted on how to utilize the user's eye movements for working with graphical user interfaces (GUIs). These studies have shown that eye tracking can be a successful means of controlling the mouse cursor and more (cf. for instance [2]). Since the first activities in this field, gaze control has become an accepted input modality. It has proven to be very intuitive, fast and especially useful in hands-free operation scenarios [3-5]. However, whereas moving the mouse cursor with eye movements is quite intuitive, it is not that easy to find a good mechanism for performing the click operation. Ideas like using eye blinks for activation were rejected already in the first studies on this subject, as it is impossible for the user to exercise precise enough control over the blink reflex [6]. Most solutions are based on dwell times, i.e. the user has to fixate an item for a pre-defined period of time in order to activate it. This technique has to face the inherent problem of finding the optimal dwell time. If the dwell time is too short,
click events will be carried out unintentionally, leading to errors. If it is too long, fewer errors will be made, but more experienced users will get annoyed. In our study we developed and evaluated a completely different approach: using a Brain-Computer Interface (BCI) to confirm object selections made by eye tracking. Brain-computer interaction and eye-gaze input can be regarded as complementary modalities in the respect that they compensate for each other's disadvantages. The combination overcomes the BCI drawback of having problems differentiating between more than two commands, because only one activation thought needs to be tracked reliably. If this activation works properly, a new solution for performing click operations in gaze-based interfaces can be established by providing an explicit, yet not overtly visible, command under complete user control.
2 Eye-Gaze Input and BCI This section outlines general properties of eye-tracking and BCI as input modalities. In detail, it describes the advantages and some of the major challenges when using these technologies individually and describes how a combination of the two in a multimodal interface may leverage their potential. 2.1 Gaze-controlled User Interfaces Eye-gaze interaction can be a convenient and – with certain restrictions – a natural addition to human-computer interaction. The eye gaze of humans is basically an indicator for a person’s attention over time [7]. For human-computer interaction this means that the mouse cursor and visual focus usually correspond to each other which implies an intuitive substitution of the conventional mouse control by eye movements. However, this rule does not always apply. The design of gaze-based systems has to consider unintentional fixations and sporadic dwellings on objects that typically occur during visual search or when people are engaged in demanding mental activity (cf. [8]). This fact is known as the “Midas Touch” problem: Although it may be helpful to simply look at an object and have the corresponding actions occur without further activity, it soon becomes annoying as it gets almost impossible to let the gaze rest anywhere without issuing a command [9]. The problem directly points to the challenge of defining the mouse click operation in gaze-controlled environments. In past research dwell-time based solutions proved to be the best technique that can establish an even faster interaction process than using a mouse [10, 11]. However, choosing a dwell time duration is always a trade-off between speed and accuracy. Furthermore, a well defined feedback informing the user about the current state of the activation progress is crucial, but can be difficult to design [12]. Even with adaptive algorithms, like e.g. shortening the dwell time period with growing user experience, one major problem remains: The system can not know whether the user fixates a command button for a long time because he wants to trigger an action or because the description is difficult to read, he reflects about the corresponding system action, he tries to understand the meaning of a complex icon… it is simply not possible to find a perfect relation between gaze duration and user intention. Thus, it will be beneficial to replace this implicit way of issuing a command with a more direct and controllable user action.
A different solution to the Midas Touch problem consists of providing the user with a manual control key for activation, a so-called gaze button (for details cf. [13]). This activation key has the same functionality as a mouse button and allows the user to click an object which has been selected by gaze tracking. The gaze button offers a greater amount of control for the users and reduces activation errors. It is especially useful for people with muscle diseases like muscular dystrophy, who may not be able to fine-control mouse movements but can still perform some simple movements. The disadvantage is, of course, that systems with a gaze button are no longer hands-free.

2.2 Brain-Computer Interfaces

Brain-computer interfaces provide a unidirectional channel from human to computer without the involvement of any muscular activity. Contrary to most standard EEG analyses, BCIs isolate feature patterns from online EEG data. Thoughts or intentions of activity, as well as conscious or unconscious information processing, evoke specific neuronal activity that can be detected as specific patterns in the EEG. These patterns are extracted from the very noisy EEG signal by filtering techniques and methods of machine learning. As BCIs work in real time, no averaging over many trials is possible, so the challenge is to find the relevant pattern in one single trial. Once the signal of interest is detected, it can be used in two ways. Either BCIs allow the user to deliberately control system properties by brain signals, or they recognize specific mental states of the user, such as high workload peaks, so that intelligent systems can adapt to the user's current needs. The former, so-called active BCIs, which enable the users to perform direct commands, are typically operated by forming the intention of a motor movement, such as imagining moving the right hand. At present, most BCI research focuses on solutions for the medical care sector, where significant contributions have been made in assisting people with massively restricted motor abilities [14, 15]. These applications can be regarded as specialized high-end solutions for a relatively small number of users. Mass-market access will most likely come first through gaming devices: having to wear head-mounted equipment and a relatively low degree of accuracy may matter less when establishing completely new game experiences. However, most applications relying on active control suffer from the small number of available commands. BCIs can typically only differentiate between two commands, as they analyze whether an imagined movement is reflected in the right or left hemispherical primary motor cortex. Hence, the highest potential for BCIs lies in the realm of multimodal environments, where one or two explicit commands can suffice and may significantly increase the overall system performance. In order to obtain the EEG data needed for using a BCI, electrodes are positioned on the user's scalp. This is quite time-consuming; typically it takes about 20-30 minutes for 32 electrodes. Additional time is required to adjust the BCI itself. As the EEG signal varies considerably not only between users but also within one person at different times, a classifier needs to be trained before every usage. Because of these side conditions, current BCIs are impractical for use outside the laboratory. But researchers and industry are already working on new solutions. With the advance of more efficient algorithms, the training effort is steadily decreasing.
Several groups are concentrating on the development of dry electrodes and mobile EEG systems that
allow usage in a broader range of environments (cf. [16]). Last but not least, the high interest of the gaming industry in brain-controlled devices boosts development. Thus, although there are still difficulties, BCIs are a promising technology for HCI applications [17].

2.3 Combining Eye-Gaze Input and BCI

In the research reported here, we decided to evaluate brain-computer interaction as a supportive modality for eye-gaze input. Mouse movements, i.e. the selection of a target object on a GUI, are mapped to eye movements. A mouse click, i.e. the activation of a selected object, is only carried out if the user fixates an object and imagines a special movement of both hands at the same time. Obviously, other, more robust options exist in multimodal environments, such as speech or gesture control. However, a BCI offers the advantage of hands-free operation and does not demand any additional muscular activity or overt command. The combination of eye-gaze input and a BCI is especially suitable for demanding working environments, for example under sterile operating conditions, when wearing protective clothing, or when the working conditions severely restrict the range of body movement. In contrast to voice control, other people in the same room do not get distracted and there is no interference with human-to-human verbal communication. The eye movements themselves, however, are quite a challenge for an EEG-based BCI. The eye is a powerful dipole that disturbs the detection of the much weaker brainwaves. This means that a pattern needs to be found for the activation thought which places less weight on the frontal electrodes, and the recognition algorithm needs to be able to deal with the noise produced by eye movements. In this investigation, we did not limit the scope to the context of assisting physically challenged people but tried to learn more about the potential of a multimodal BCI/eye-gaze interface. This has several implications. First, anybody should be able to use the system after a short training session. Therefore, contrary to most experiments on BCIs, our participants had no prior working experience with a BCI. Second, replacing dwell times by the BCI needs to prove a better solution to the Midas Touch problem by yielding lower error rates in the selection tasks, while task completion times should be lower or at least comparable. Finally, using the new interface must be at least as convenient as the gaze-based interface: the workload associated with the BCI/eye-gaze interface may not be higher, and users should prefer it to conventional eye-tracking interfaces.
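The combination can be summarized, purely as an illustrative sketch with assumed interfaces (fixated_item, bci_probability and the threshold are hypothetical names, not part of the system described here), as a logical AND of gaze selection and BCI activation:

```python
# Illustrative fusion of gaze selection and BCI activation (assumed interfaces).
# A click is issued only while the user fixates an object AND the classifier
# detects the imagined-movement pattern with sufficient confidence.
def fused_click(fixated_item, bci_probability, threshold=0.8):
    """fixated_item: GUI object currently fixated (None if the gaze is roaming).
    bci_probability: classifier output in [0, 1] for the activation thought."""
    if fixated_item is not None and bci_probability >= threshold:
        return fixated_item   # perform the click on this item
    return None
```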
3 Experimental Evaluation of BC(eye)

This experiment compares a BCI-based activation of targets in an eye-controlled selection task against two conventional dwell time solutions with different activation latencies. The study aims to determine the degree to which the BCI can match or even exceed dwell time activation with respect to effectiveness, efficiency, and demands on cognitive resources when "clicking" the target stimulus.
Task difficulty in the selection task was varied by showing either simple visual stimuli with only a few random characters or more complex visual stimuli featuring a higher number of characters. Two different dwell times, short and long, were chosen to better represent the range of typical interaction situations with gaze-controlled applications. Assuming that signal extraction and pattern recognition of current BCIs still need a substantial minimum duration of the activation thought, and that processing these signals takes additional time, it does not seem very likely that subjects will be able to complete tasks faster with the BCI. The question of interest here is whether they are significantly slower with a BCI than with dwell times. The activation thought via BCI is a conscious, explicit command, in contrast to the implicit commands of dwell time solutions. Thus, the error rate in the BCI condition should be substantially lower, especially for difficult selection tasks.

3.1 Methods

Participants. Ten participants (five female, five male) took part in the present study. They were monetarily compensated for their participation. Their ages ranged from 19 to 36 years. Before engaging in the experiment, subjects were screened for lack of sleep, tiredness, and alcohol or drug consumption. All participants reported normal or corrected-to-normal vision.

Tasks. The participants had to perform a search-and-select task. They were presented with stimuli consisting of four characters in the "easy" condition and seven characters in the "difficult" condition. The reference stimulus was displayed in the middle of the screen. Around this item, twelve stimuli were shown in a circular arrangement: eleven distractors and one target stimulus, which was identical to the reference stimulus. The radial arrangement of the search stimuli ensured a constant spatial distance to the reference stimulus. All search strings consisted of consonants only. The distractors shared a constant number of characters with the target. Examples of the search screens are shown in Figure 1.
Fig. 1. Examples for easy (left) and difficult (right) search tasks
Subjects had to select the target stimulus either by fixating it for the given dwell time or by thinking the activation thought. It was not possible to use standard suggestions for dwell time durations from the literature (e.g. [4]), because the difficulty levels of the search task are not directly comparable to search tasks on a GUI in terms of the absolute time needed for identification. Rather, the tasks were chosen so that the stimuli could easily be kept in working memory in the "easy" condition and almost exceeded its storage capacity in the "difficult" condition. To make sure that the dwell times match the stimulus complexity, different versions were tested in pre-experiments. The selection criterion was that the short version is still well controllable and that the long activation latency is not perceived as slowing down the user. The short dwell time was 1,000 milliseconds, the long dwell time 1,500 milliseconds. For BCI activation, the participants had to imagine closing both hands to fists and then turning them against each other, as if wringing out a cloth by twisting it tightly. They were told not to involve any overt muscular activity.

Apparatus. Brain data were recorded using a 32-channel EEG system (Brain Products, actiCap). Electrodes were positioned according to the 10-20 system, covering all relevant areas. Signal processing focused on the sensorimotor areas C3 and C4. Grounding was established with electrode Fz. Eye movements were tracked with an infrared-camera-equipped remote eye tracker (SensoMotoric Instruments, iView X RED). Lighting conditions were held constant during the experiment.

Design and Procedure. Two levels of search difficulty (easy, difficult) and three levels of activation technique (dwell time short, dwell time long, BCI) were varied in a 2 × 3 within-subjects factorial design. Participants went through the levels of the factor activation technique in separate blocks. The order of these blocks was counterbalanced. Subjects completed 30 trials per condition. The experiment itself took about 1 hour, the whole test procedure about 2.5 hours. Effectiveness was measured in terms of errors in task completion. Efficiency was defined as the time needed to complete a search task. Mental workload was assessed with the unweighted version of the NASA Task Load Index (Raw Task Load Index, RTLX) [18]. After making sure that all EEG electrodes were in place and working, additional EMG electrodes were attached to the participants' arms to monitor for muscular activity. Before and during the technical preparations, subjects received a general overview of the procedure of the experiment and their tasks. A complete and summarized presentation of the test setting was given afterwards. To finalize the preparation phase, subjects practiced using the BCI command and trained the BCI classifier with a task that was very similar to the later search task. If the training was successful, a short calibration of the eye tracker followed and the experiment started. Each trial was terminated after 15 seconds if the participants were not able to locate the target stimulus; these trials were excluded from further analysis. The NASA TLX was filled in after each condition. At the end, the participants had the opportunity to discuss their experiences with the experimenter and were asked to rate the activation techniques according to their preferences.
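The paper does not detail the classification pipeline. For illustration only, a common approach to two-class motor-imagery detection over C3/C4 is to band-pass filter the EEG, compute log band-power features and train a linear classifier on calibration trials; the sketch below makes these assumptions and is not the classifier used in the experiment.

```python
# Illustrative two-class motor-imagery pipeline (NOT the authors' classifier):
# band-pass 8-30 Hz, log band-power over C3/C4, linear discriminant analysis.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bandpower_features(trials, fs=500, lo=8.0, hi=30.0):
    """trials: array (n_trials, n_channels, n_samples), e.g. channels = [C3, C4]."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=-1)
    return np.log(np.var(filtered, axis=-1))      # shape (n_trials, n_channels)

def train_classifier(calib_trials, labels):
    """labels: 1 = activation thought, 0 = rest, one label per calibration trial."""
    clf = LinearDiscriminantAnalysis()
    clf.fit(bandpower_features(calib_trials), labels)
    return clf

def activation_probability(clf, window):
    """window: (n_channels, n_samples) of the most recent EEG; returns P(activation)."""
    return clf.predict_proba(bandpower_features(window[np.newaxis]))[0, 1]
```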
3.2 Results and Discussion

Time needed for task completion and accuracy (data on errors) were averaged across all subjects for each selection method and level of search difficulty. Trials with errors were not included in the analysis of response time. First, an analysis of variance was conducted on the results; the alpha level for significance was set to .05. In a second step, the data of the easy and difficult conditions were pooled for each selection method. This allows a closer look through pairwise comparisons of BCI vs. long dwell time and BCI vs. short dwell time. To avoid problems associated with multiple testing, differences were regarded as significant at an alpha level of .025 for these comparisons.
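As an illustration of this analysis strategy (with a hypothetical data layout; the arrays below are not the study's data), the pooled pairwise comparisons could be computed as follows:

```python
# Sketch of the pooled pairwise comparisons (hypothetical data layout).
import numpy as np
from scipy import stats

# accuracy["BCI"], accuracy["DTL"], accuracy["DTS"]: arrays of shape
# (n_subjects, 2) with columns (easy, difficult).
def pooled_pairwise(accuracy, alpha=0.025):
    pooled = {name: np.asarray(scores).mean(axis=1)   # pool easy/difficult per subject
              for name, scores in accuracy.items()}
    for other in ("DTS", "DTL"):
        t, p = stats.ttest_rel(pooled["BCI"], pooled[other])
        df = len(pooled["BCI"]) - 1
        print(f"BCI vs {other}: t({df}) = {t:.2f}, p = {p:.3f}, "
              f"significant = {p < alpha}")
```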
Fig. 2. Percentage of correct selections: Brain-Computer Interface (BCI), long dwell times (DTL) and short dwell times (DTS)
The accuracy data are summarized in Figure 2. The "easy" condition yielded 88.0% correct selections when using the BCI. Correct answers were produced in 93.8% of all tasks in the "dwell time long" (DTL) condition and in 83.8% in the "dwell time short" (DTS) condition. Fewer correct selections were made in the "difficult" condition. Remarkably, the BCI leads to the best results with 78.7% correct selections, although the difference to the long dwell time, 75.6%, is only marginal. The short dwell time condition, however, led to a strong negative effect on performance, as the percentage of correct answers dropped to 51.1%. This change in the result pattern in the difficult condition is reflected in a significant search condition × activation technique interaction (F(2,18) = 13.30, p < .001). An analysis of the main effects confirms general differences between the activation techniques (F(2,18) = 12.47, p < .001) and that the difficult search condition leads to more errors (F(1,9) = 38.37, p < .001). The pooled BCI accuracy average is 83.3% correct selections; the corresponding values for dwell time long and dwell time short are 84.7% and 67.4%. Pairwise t-tests reveal that the better performance of the BCI compared to "dwell time short" is significant (t(9) = 3.66, p = .005). The small difference between BCI and "dwell time long" is not reliable (t(9) = 0.33, p = .75). As expected, the BCI allows users to
activate (click) GUI items more precisely than a dwell time solution with short latencies. Long dwell times are suited for precise object activation but do not prove to be substantially better than BCI-based selection. Task completion was fastest in both search conditions with short dwell times (easy: 3.98 s; difficult: 5.38 s). Next was the long dwell time (4.79 s; 7.37 s), leaving the BCI the slowest method of activation (5.90 s; 8.84 s). This general difference between the input methods is statistically confirmed (F(2,18) = 56.25, p < .001). The results are depicted in Figure 3.
Fig. 3. Task completion times: Brain-Computer Interface (BCI), long dwell times (DTL) and short dwell times (DTS)
These data also show that the difficult search task leads to longer search times, which is only of minor interest (F(1,9) = 102.38, p < .001). The significant search condition × activation technique interaction reflects the larger differences between the selection techniques in the difficult condition compared to the easy one (F(2,18) = 7.46, p < .01). The pairwise comparisons support the view that BCI selection was slowest (BCI: 7.37 s; dwell time long: 6.08 s; dwell time short: 4.68 s). These differences are significant (BCI – DTL: t(9) = 4.31, p = .002; BCI – DTS: t(9) = 13.57, p < .001). Overall, the TLX results show no differences in workload between the activation techniques. On a scale ranging from 0 to 10, with higher values standing for higher workload, the BCI yielded 4.7, DTL 4.6 and DTS 4.6 (F(2,18) = 0.18, p = .84). Judging on this basis, the BCI does not come at the cost of higher cognitive demands. In the preference ratings at the end of the experiment, 9 out of our 10 participants preferred using the combined BCI/eye-gaze interface over the standard gaze-based interface.
4 Conclusions and Outlook

Taken together, the current state of technology allows more accurate activations to be performed with a BCI than with dwell time solutions with short latencies. Also quite remarkable is the
strong user preference for using a BCI instead of dwell times for the activation of selected objects. However, using the BCI is still somewhat slower. Nonetheless, although statistically significant, the magnitude of the difference between the BCI and the dwell time solutions is remarkably small. Therefore, BCI has proven to be a real competitor for dwell time activation even at the current state of technological development. This clearly indicates that it is indeed a promising technology for multimodal interfaces. Furthermore, integrating brain-computer interaction into a multimodal system opens up the option of using it as a means of direct input on the one hand while simultaneously monitoring the user's workload on the other [19]. Thus, behind any work on BCIs also stands the vision of building more ergonomic workplaces with future UI technology [20].

Acknowledgments. We would like to thank Matthias Rötting and especially Matti Gärtner and Christian Kothe for their support in conducting the experiments. We also want to thank Sonja Pedell for helpful comments on an earlier draft of this document.
References 1. Bolt, R.A.: Eyes at the Interface. In: Proceedings of the 1982 Conference on Human Factors in Computing Systems, pp. 360–362. ACM Press, New York (1982) 2. Engell-Nielsen, T., Glenstrup, A.J., Hansen, J.P.: Eye Gaze Interaction: A New Media Not Just a Fast Mouse. In: Itoh, K., Komatsubara, A., Kuwano, S. (eds.) Handbook of Human Factors / Ergonomics, pp. 445–455. Asakura Publishing, Tokyo (2003) 3. Nilsson, S., Gustafsson, T., Carleberg, P.: Hands Free Interaction with Virtual Information in a Real Environment. In: Proceedings of COGAIN 2007, Leicester, UK, pp. 53–57 (2007) 4. Jacob, R.J.K.: What You Look at Is What You Get. IEEE Computer 26, 65–66 (1993) 5. Murata, A.: Eye-Gaze Input Versus Mouse: Cursor Control as a Function of Age. Int. J. Hum-Comput. Int. 21, 1–14 (2006) 6. Hutchinson, T.F.: Eye-Gaze Computer Interfaces: Computers That Sense Eye Positions on the Display. Computer 26, 65–67 (1993) 7. Kahneman, D.: Attention and Effort. Prentice Hall, Englewood Cliffs (1973) 8. Yarbus, A.L.: Eye Movements During Perception of Complex Objects. In: Riggs, L.A. (ed.) Eye Movements and Vision, pp. 171–196. Plenum Press, New York (1967) 9. Jacob, R.J.K., Legett, J.J., Myers, B.A., Pausch, R.: Interaction Styles and Input/Output Devices. Behaviour & Information Technology 12, 69–79 (1993) 10. Sibert, L.E., Jacob, R.J.K.: Evaluation of Eye Gaze Interaction. In: CHI 2000: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 281–288. ACM Press, New York (2000) 11. Jacob, R.J.K.: The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look at Is What You Get. ACM Transactions on Information Systems 9, 152– 169 (1991) 12. Beinhauer, W., Vilimek, R., Richter, A.: Eye-Controlled Applications - Opportunities and Limits. In: Proceedings of Human-Computer Interaction International 2005. Lawrence Erlbaum Associates, Mahwah (2005)
13. Salvucci, D.D., Anderson, J.R.: Intelligent Gaze-Added Interfaces. In: CHI 2000: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 273–280. ACM Press, New York (2000) 14. Birbaumer, N., Ghanayim, N., Hinterberger, T., Iversen, I., Kotchoubey, B., Kübler, A., Perelmouter, J., Taub, E., Flor, H.: A Spelling Device for the Paralyzed. Nature 398, 297– 298 (1999) 15. Leeb, R., Friedman, M.-P.G.R., Scherer, R., Slater, M., Pfurtscheller, G.: Self-Paced (Asynchronous) BCI Control of a Wheelchair in Virtual Environments: A Case Study with a Tetraplegic. Computational Intelligence and Neuroscience 7, 1–8 (2007) 16. Gargiulo, G., Bifulco, P., Calvo, R.A., Cesarelli, M., Jin, C., van Schaik, A.: A Mobile EEG System with Dry Electrodes. In: IEEE Biomedical Circuits and Systems Conference. IEEE, Baltimore (2008) 17. Birbaumer, N.: Breaking the Silence: Brain-Computer Interfaces (BCI) for Communication and Motor Control. Psychophysiology 43, 517–532 (2006) 18. Byers, J.C., Bittner, A.C., Hill, S.G.: Traditional and Raw Task Load Index (TLX) Correlations: Are Paired Comparisons Necessary? In: Mital, A. (ed.) Advances in Industrial Ergonomics and Safety, pp. 481–485. Taylor & Francis, London (1989) 19. Cutrell, E., Tan, D.: BCI for Passive Input in HCI. In: Proceedings of ACM CHI 2008. ACM Press, New York (2008) 20. Zander, T.O., Kothe, C., Welke, S., Rötting, M.: Enhancing Human-Machine Systems with Secondary Input from Passive Brain-Computer Interfaces. In: Proceedings of the 4th International BCI Workshop & Training Course. Graz University of Technology Publishing House, Graz (2008)
Colorimetric and Photometric Compensation for Optical See-Through Displays Christian Weiland1, Anne-Kathrin Braun2, and Wolfgang Heiden1 1
University of Applied Sciences, Bonn-Rhein-Sieg, Germany [email protected], [email protected] 2 Fraunhofer Institute for Applied Information Technology, Sankt Augustin, Germany [email protected]
Abstract. Optical see-through displays are an established technology within augmented reality. When wearing such a display, the user's eyes automatically adapt to the luminance of the real-world environment, while the virtual part is displayed at a steady brightness. This often results in clear differences between real and virtual elements. This paper presents a technique for colorimetric compensation which avoids this effect. Furthermore, algorithms for photometric compensation are demonstrated. The appearance of background shapes and colours arises from the combination of the luminance of the background and the projected luminance of the object; these "ghosts" are photometrically compensated.

Keywords: augmented reality, see-through display, colorimetry, photometry, compensation.
1 Introduction

The human eye can perceive a huge range of luminance, with absolute levels spanning 1:10^14 from nocturnal starlight to sunlight on a very bright day, and the dynamic range of light can exceed 1:10^4. Despite this wide range, the human visual system is able to generate valuable information for us. This becomes even more astonishing as one single photoreceptor is only capable of a dynamic range of about 1:10^3. The human eye uses four mechanisms to handle different levels of luminance and adapt to the specific situation: the pupil, the rod and cone system, bleaching and regeneration of photo pigments, and neural processes. The eye's receptor cells, called rods and cones, respond differently to luminance levels: while the achromatic rods work at scotopic illumination levels of 10^-6 to 10 cd/m2, the chromatic cones react in the photopic range of 0.01 to 10^8 cd/m2. In the mesopic range between 0.01 and 10 cd/m2, both receptor types contribute to vision. All these mechanisms affect our perception of luminance and color. They usually show non-linear behavior, and their functionality is still not understood completely.
A person wearing a see-through display will move around and encounter various scenes with various illumination levels, but an optical see-through display does not consider the current adaptation of the eye. This causes a visual difference between the real and the virtual part of the viewed scene, especially when the eye adapts to the real world while the virtual part remains uncorrected. In this paper, a technique for photometric compensation is developed which measures scene luminance with a camera and changes the virtual part of the scene so that it is adapted to the same luminance level and perception is uniform across the scene. The colorimetric compensation algorithms control the color of the virtual image per pixel. Furthermore, a technique is developed which uses a camera-recorded image to compensate radiance from reality in the virtual image. For realization, we attached an HDR camera to a head-mounted display. Although this camera position shifts the center of gravity to the front, wearing comfort is maintained and the see-through display does not slip. The user's field of view is not limited by the camera, as it is mounted high enough. The compensation algorithm was integrated into the VR/AR framework MORGAN [6].
2 Related Work

There are several publications containing valuable information for this work. Nevertheless, none of them covers the exact topic of colorimetric or photometric compensation for see-through displays, so this work is considered a novelty. Most information for the colorimetric part arises from the areas of tone reproduction and realistic image synthesis. Photometric compensation for see-through displays has not been developed yet, but there is literature covering compensation for video projectors, which cannot be applied directly. Ferwerda et al. [2] present an advanced model based on psychophysical data from experiments which produced a threshold-versus-intensity function. This function describes the threshold luminance at which a flashed disk is visible on a background of lower luminance. Another experiment analyzed the changing sensitivity of the rods and cones under different levels of luminance intensity. The results of both experiments are important parts of the tone reproduction system. The model includes four eye adaptation mechanisms: receptor system behaviour, pigment bleaching, and fast and slow neural processes. Ferwerda kindly provided images from his work, which are used for box filter calibration in the colorimetric compensation.
Fig. 1. Tone reproduction images from Ferwerda et al. [2]
Irawan et al. [3] base their method on Ferwerda's work and extend the model. They focus on processing HDR image streams and introduce an operator capable of computing adaptation from a single starting point in time. The main part of the operator is an exponential decay function which mimics biological behaviour. Work on photometric compensation for video projectors has been done by Bimber et al. [1]. They compensate the virtual image emitted by the projector by considering the environmental light and the surface material. Bimber implemented a version for smart projectors using shaders which is real-time capable. The main difference to see-through displays is the direct interaction of the emitted light with the surface, which allows manipulation of the result; for see-through displays the interaction is independent of the material term. Another interesting hardware solution is introduced by Kiyokawa et al. [5]. They mount an LCD panel directly in front of the combiner and can block the background light by opening and shutting its pixels.
3 Compensation

3.1 Colorimetric Compensation

The colorimetric compensation system is based on the model of Ferwerda et al. [2] for tone reproduction, which computes an image under certain levels of eye adaptation. Since Ferwerda's model is too complex to be computed in real time, we developed an approach that approximates its effects with image filters, e.g. Gaussian or box filters. The virtual image is read from the current framebuffer and the optimized filter is applied to it to achieve effects close to those Ferwerda's model would have produced. The result is written back to the buffer and displayed by buffer swapping.
Fig. 2. Colorimetric compensation applied to the Stanford Dragon. With increasing luminance, sharpness, brightness and chrominance are modulated accordingly. (The negative pattern arises from photometric compensation.)
The approach consists of six steps, shown in Figure 3. The mapping experiment, the sampling and the time adaptation determine the luminance level the user's eye is adapted to. Essential is a mapping from RGB pixel values to luminance values, which is created in an offline experiment. Using the results of this mapping, the environmental image is sampled: a number of sampling points are updated each frame and their corresponding luminance values are added. The sum is divided by the number of samples to obtain the mean luminance of the image.
Fig. 3. Overview of compensation algorithm
Sampling is implemented for rods and cones independently. The coordinates of the sampling points are determined randomly at start-up time. The cone points are distributed over a small region at the center of the captured image, while the rod points are uniformly distributed over the entire image. The number of sampling points is a critical parameter: with too many points, sampling the pixels would lead to considerable time costs; if chosen too low, deviations and per-frame jitter would occur. To solve this problem, a sample buffer is used as suggested in [8]. This buffer is divided into several slices, of which one is updated per frame. The sums of all samples of a buffer are held, and only the sum for the updated slice is recomputed. This technique allows the use of a sufficiently high number of sample points while causing minimal computation costs and reducing jitter significantly. In the final version of this work, 80 sample points each for the rod and cone system are used, divided into four slices. Higher numbers of up to 1000 samples were tested, which led to more stable mean values but higher computational costs. After the sampling step, the environment luminance is known and can be used to compute the eye's adaptation level, including time effects. Based on the time adaptation model presented in [3] and the time courses in [2], the form of the time adaptation function is known: the time course is modeled by an exponential decay function, and for the intensity of adaptation a scaling factor derived from the difference between adaptation and measured luminance is included [3]. Another important feature is the blindness resulting from rapid changes in the environment luminance. The eye adaptation functions above compute the alignment of the current adaptation state to the measured luminance level, but do not consider the effects a rapid luminance change causes in the human eye. These effects result in a temporary shift to the monochromatic domain for the whole image. To simulate this effect, a combination of two parameters was developed in this work and integrated into the colorimetric compensation system. The first parameter is obtained by a reverse lookup of the mapping experiment, which allows a simple transformation from the measured luminance to an RGB value, of which the arithmetic mean is taken. As this parameter depends not on the eye adaptation level but on the measured luminance, it gives a value that is independent of whether the measured luminance is larger or smaller than the adaptation level. The second parameter describes how strongly the first parameter influences the current pixel color.
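A rough sketch of the sliced sample buffer and the exponential time adaptation is given below; the function and parameter names (pixel_to_luminance, tau) are assumptions for illustration and not taken from the implementation described here.

```python
# Illustrative sliced sample buffer and exponential time adaptation (assumed parameters).
import math
import random

class LuminanceSampler:
    def __init__(self, pixel_to_luminance, width, height, n_samples=80, n_slices=4):
        self.to_lum = pixel_to_luminance                 # mapping from the offline experiment
        points = [(random.randrange(width), random.randrange(height))
                  for _ in range(n_samples)]
        self.slices = [points[i::n_slices] for i in range(n_slices)]
        self.slice_sums = [0.0] * n_slices
        self.n_points = n_samples
        self.frame = 0

    def update(self, image):
        """Re-sample one slice per frame (image assumed indexable as image[y][x]);
        return the current mean-luminance estimate."""
        i = self.frame % len(self.slices)
        self.slice_sums[i] = sum(self.to_lum(image[y][x]) for (x, y) in self.slices[i])
        self.frame += 1
        return sum(self.slice_sums) / self.n_points

def adapt(adaptation, measured, dt, tau=2.0):
    """Exponential-decay adaptation towards the measured luminance (tau is assumed)."""
    return measured + (adaptation - measured) * math.exp(-dt / tau)
```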
The further steps described in Figure 3, namely the filter experiment, the parameter determination and the filtering, modify the virtual image to adjust sharpness, brightness and chrominance. In an offline filter experiment, the parameters for the steerable filters are evaluated. For this, a reference image at 1000 cd/m² is chosen, as this luminance level is nearest to the level of an uncompensated virtual image. Taking this image as source, the other images at 10, 0.04 and 0.001 cd/m² are computed. The reduction of sharpness is effected naturally by any normalized Gauss or box filter, controlled by the parameter size. The pixel intensity can also easily be influenced by multiplying the whole kernel by a factor scale. Chrominance is more complex, as the pixel luminance must be preserved. Therefore another factor e is introduced, since the different color channels contribute differently to luminance perception in the human eye, e.g. green is perceived much better by the cone receptors (the empirically found final values are er = 0.35, eg = 0.59 and eb = 0.06). A simple Euclidean distance function between reference image pixels and filtered image pixels gives a criterion for the similarity of the images and thus for the accuracy of the current parameters.
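As a minimal sketch of how scale and the channel weights e might act on a single pixel (the blend towards the weighted luminance is an assumed formulation, since the paper does not state the exact damping formula):

```python
# Illustrative per-pixel brightness scaling and chrominance damping.
# The blend towards luminance is an ASSUMED formulation, not the paper's exact formula.
E = (0.35, 0.59, 0.06)   # empirical channel weights reported in the paper

def apply_scale_and_damp(rgb, scale, damp):
    """rgb in [0, 1]; scale multiplies brightness; damp in [0, 1] pulls colour towards grey."""
    luminance = sum(w * c for w, c in zip(E, rgb))
    return tuple(min(1.0, scale * (c + damp * (luminance - c))) for c in rgb)
```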
Fig. 4. This series of colorimetrically compensated images shows the color dampening effect from full monochromatic level (left) in a scotopic environment to fully restored color (right) in a photopic environment
This approximation works remarkably well for Gauss and box kernels, especially with the 10 cd/m² image, where no visual difference can be detected by observers. For the darker images, brightness and color intensity can be matched to a degree where differences are not noticeable, but sharpness cannot be reduced as well, since the form of the filter kernels leads to visible artifacts and larger distances. One essential parameter is the kernel size. For the 0.001 cd/m² image it reaches values around 20 for both Gauss and box kernels. With optimization techniques, the computational complexity of box filters can be made independent of the kernel size, which is the deciding argument for box and against Gauss filters in the final system. From the extracted luminance values, the parameters which steer the image filter have to be computed. From the filter experiment, the parameters size, scale and damp are known for each of the source images. For every measured luminance value in between, these parameters have to be interpolated. As the final solution, a linear interpolation is chosen which considers only the images with the next higher and next lower luminance.
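One standard way to decouple the box filter's cost from its kernel size, sketched here as an assumption about the kind of optimization meant (cf. [4]), is a prefix-sum (running-sum) formulation; applied separably along rows and then columns it yields the 2D box filter at constant cost per pixel.

```python
# Sketch of a 1D box filter whose per-pixel cost is independent of the kernel size.
def box_filter_1d(values, radius):
    n = len(values)
    prefix = [0.0]
    for v in values:                       # prefix[i] = sum(values[:i])
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))   # mean over the window
    return out
```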
In the final step of colorimetric compensation, the parameters from the former step are applied to a box filter which is used to change the virtual backbuffer image so that it is adapted to the environment luminance. Though Gauss filters give better visual results, the possibility to accelerate box filters and to decouple their runtime complexity from the kernel size is essential for a real-time application. Therefore box filters are chosen for the final system.

3.2 Photometric Compensation

The photometric compensation takes place after colorimetric compensation. Inputs are the (colorimetrically compensated) virtual image and the background image, which have to be aligned with pixel accuracy. We developed several compensation approaches: the trivial compensation, the subtraction compensation, and the simple and the advanced smooth compensation. The trivial compensation approach is based on work on background compensation for video projectors [1]. The computation step is to subtract the background from the virtual part pixel by pixel. The main advantage of this technique is its simplicity and therefore its speed. The worst case occurs when all background color channel values are larger than the virtual values (Figure 5), which renders the resulting pixel black. Though this can hardly be seen for single pixels, a bright background area can cause the virtual part to be completely black and thus invisible. Hence the trivial approach has serious problems with preserving both the luminance and the contrast of the virtual part. Background luminance can be compensated well for dark backgrounds, but this effect decreases with increasing luminance. To avoid the shortcomings of the former approach, in the subtraction compensation the virtual part is given a base, and only a fixed fraction is made dependent on the background. The constant parameter k denotes the fraction of the base. This small change weakens the disadvantages of the normal Trivial Compensation and already provides a decent compensation operator. Background luminance is compensated to a lesser degree than in Trivial Compensation due to the base k. Contrast and luminance of the virtual part are maintained to some extent, yet the luminance of the virtual image is darkened for every k < 1 and bi > 0. Thus, if the virtual image itself is dark and the background is bright, ri may no longer be noticeable. To cope with this problem, the operator is rewritten to:
ri = kv · vi − kb · bi                                                        (1)
Here both parameters k are made independent and can also be greater than 1. Adequate values found are 1.15 for kv and 0.25 for kb, yet optimal values can vary depending on the virtual scene and the environment. The difficulty of this approach lies in finding good parameters k that match not only one single setup of virtual objects and surroundings but many. In the experiments of this work, no suitable parameters k were found that are robust for all setups.
Fig. 5. This image shows the weakness of Trivial Compensation: Parts of the face disappear due to the brighter background
The simple smooth compensation arose from the experience with the former methods and fulfills several demands on a compensation algorithm: it has to meet the four quality criteria and must additionally have smooth transitions between its adjustments for all parameters. Thus a new formula is created as a fraction depending on the background value combined with the maximum intensity value max. This results in a smooth transition factor s which serves as a multiplier for the virtual part:

si = max / (bi + max),   ri = si · vi                                         (2)

This compensation formula increases linearly with vi and, due to s, decreases smoothly with rising bi. By this behavior, higher visual comfort for the user is achieved. Because of this smoothness, the compensation of the background is weaker compared to, e.g., Subtraction Compensation. Virtual luminance and contrast are preserved with reasonable success. The formula is about as simple as in the former algorithms. One disadvantage of the equation is that the pixel value is always decreased. Another point is that s is independent of the virtual part, as vi is not included in the formula; thus there is no additional scaling with respect to vi. The Simple Smooth Compensation method already achieves reasonable results, but its main weakness is that it does not include the virtual part in s. This is changed in the advanced version, where the new formula is a fraction of sums of differently weighted virtual and background values, resulting in a more complex s:
si = (c · (max − vi) + bi) / ((max − vi) + d · bi)                            (3)
ri = si · vi
c and d are weights used to adjust the slope of s. In this work, c = 1.4 and d = 2.0 were found to be appropriate values. For c > 1, the factor s can exceed 1, which is an intended increase of the virtual part. By inverting the virtual value in s as (max − vi), the slope of the function decreases for higher values of vi, which is a desired effect. This compensation responds smoothly to changes in both v and b, which achieves good visual comfort for the user. The main advantage of this operator is its robustness,
as it responds smoothly to the environment luminance and scales with the virtual pixel intensity vi. Nevertheless, to achieve the best results the parameters c and d can be adjusted to a specific setup. For c = 1.4 and d = 2.0, virtual luminance and contrast are preserved well except for extreme values of vi and bi. Despite its complexity, the formula does not significantly slow down the system compared to the former algorithms. Weaknesses originating from the formula appear at the minimum and maximum values. For b ≈ 0, the factor s reaches its peak c, which leads to an increase of the virtual part for c > 1. Experiments showed that even with this raised virtual value, the falsification goes unnoticed during use and virtual contrast is preserved. Nevertheless, for high-luminance virtual images there is the possibility of contrast loss. For b → max, s approaches 1/d, which results in a reduction of r. To avoid exaggerated darkening of the image, d should be chosen between 1 and approximately 4.
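For comparison, all four per-pixel operators can be written down compactly. The following sketch uses the parameter values reported above and assumes channel values in [0, 255]; it is an illustration of the formulas, not the shader code of the system.

```python
# Sketch of the four per-pixel photometric compensation operators described above.
MAX = 255.0  # assumed maximum channel intensity

def trivial(v, b):
    return max(0.0, v - b)

def subtraction(v, b, kv=1.15, kb=0.25):            # equation (1)
    return min(MAX, max(0.0, kv * v - kb * b))

def simple_smooth(v, b):                            # equation (2)
    return (MAX / (b + MAX)) * v

def advanced_smooth(v, b, c=1.4, d=2.0):            # equation (3)
    denom = (MAX - v) + d * b
    s = (c * (MAX - v) + b) / denom if denom > 0 else c
    return min(MAX, s * v)
```

For a fairly bright virtual pixel over a bright background (v = 200, b = 180), the trivial operator drops the result to 20, whereas the advanced smooth operator keeps it at roughly 124, illustrating the preservation of virtual luminance discussed above.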
4 Results

The colorimetric compensation system described in this paper adjusts the virtual part of an image to the adaptation state of the eye and thus improves visual quality for the user. The delay caused by updating the ring buffer affects the time function only on slower machines, as it is linked to the frame rate; future versions may couple it to the elapsed time instead. The runtime of the box filter was made independent of the kernel size, which allows the application of large kernels without a slowdown; this acceleration method is a major key to real-time operation. The exaggeration of edges in the image coming from the square form of the box filter is barely visible to the user. Irawan's model of adaptation over time [3] could not be reconstructed completely, but a similar model of our own was derived from its form. One disadvantage of all time adaptation models is a small frame-rate-dependent error. The different photometric compensation operators introduced differ in quality of compensation for different virtual contrast ranges, levels of background luminance, and applicability. Trivial Compensation is a basic approach that can work for video projectors [1], but can cause severe loss of image information for see-through displays; especially for bright backgrounds, parts of the virtual content can vanish. Though it shares the form of the equation with Trivial Compensation, Subtraction Compensation gives the best results and satisfies all quality criteria well. On the other hand, it has to be configured for a specific setup: virtual objects and scene luminance should not change much, as otherwise quality decreases. Therefore Subtraction Compensation is the best choice for setups with known and limited content. Figure 6 shows the difference between a compensated and an uncompensated image. In the uncompensated image (on the left side), the colored pattern is visible all over the face region and the grid shines through the image. Especially the yellow field behind the mouth and the light red field behind the left cheek appear disturbing. Subtraction Compensation is applied to the same virtual image and shown on the right side. One major compensation effect can be seen in the forehead, eye and cheek regions, where the background grid is no longer apparent. The colored background fields are also barely visible anymore. These regions are all bright, and there the compensation operator functions best. In the mouth region, virtual luminance is lower and the background becomes more visible. For the bright yellow field at the mouth, red
Fig. 6. Images captured by a webcam through the see-through display. Left is the uncompensated image, the right one is compensated with Subtraction Compensation
and green (which are composite colors of yellow) are subtracted partially. Together with the strong blue shift of the webcam, this causes the field to appear purplish. Another effect is a gain in the virtual image's contrast. Especially darker regions like the temples are enhanced. This effect originates in the fact that the compensation operator subtracts the background independently of the virtual pixel value, and thus darker pixels are reduced more. Both Smooth Compensation operators preserve virtual luminance and contrast and compensate the background, but not as strongly as a scene-optimized Subtraction Compensation operator. Their advantage is that they function over a wide range of luminance levels and need no recalibration. In direct comparison, the normal version is faster, while the advanced version compensates better due to the inclusion of the virtual value in the equation.
5 Conclusion

For colorimetric compensation, more measurements are needed to provide more sample values for interpolation. Using a luminance meter that can be configured for the scotopic, mesopic and photopic ranges would lead to a very exact pixel-to-luminance mapping and allow independent computation of rod and cone adaptation. Another task is the development of the hardware system. Integration of the camera into the see-through display could be implemented using a beam splitter. This improvement would not only benefit the weight distribution of the system, and thus wearing comfort, but also solve the problem that the viewing distance has to be known. Adaptation over time is another task for future research. More psychophysical experiments, models and equations for the time course are needed to improve the colorimetric compensation system. A special focus has to be laid on rapid luminance changes and their influence on subsequent eye adaptation effects. This focus is necessary for quickly changing environments, such as may occur in real-time applications with a moving user. An advance is also possible through new time course equations which reduce the per-frame error.
For photometric compensation there are two directions of research. The first is to develop further advanced per-pixel operators; both smooth operators are expected to be good candidates, as they are the most robust operators tested. A second way is to evaluate the effects and speed of operators that include neighborhood information or even information about the whole image; a strong decrease in speed is expected. Acceleration can be done not only with parallel processing, but also on the graphics processor. Its parallel shader units and native vector operations have great capability to accelerate the system, and this would also avoid reading from and writing to the GPU backbuffer. A distribution over several computers is possible as well, though the network speed is assumed to be the bottleneck. Communication has to be bidirectional: for colorimetric compensation, scene information and environment luminance have to be sent to the slave computers and the results read back; for photometric compensation, part of the background has to be sent to each slave, as they need the background information for the compensation operator, and the results read back. The transfer is expected to slow down the system and limit the frame rate.
References
1. Bimber, O., Emmerling, A., Klemmer, T.: Embedded entertainment with smart projectors. Computer 38(1), 48–55 (2005)
2. Ferwerda, J., Pattanaik, S., Shirley, P., Greenberg, D.: A Model of Visual Adaptation for Realistic Image Synthesis. ACM Transactions on Graphics, 249–258 (1996)
3. Irawan, P., Ferwerda, J., Marschner, S.: Perceptually Based Tone Mapping of High Dynamic Range Image Streams. In: Eurographics Symposium on Rendering (2005)
4. Jarosz, W.: Fast Image Convolutions. In: SIGGRAPH Workshop (2001)
5. Kiyokawa, K., Kurata, Y., Ohno, H.: An Optical See-Through Display for Mutual Occlusion of Real and Virtual Environments. In: Proceedings of the IEEE and ACM International Symposium on Augmented Reality (ISAR 2000) (2000)
6. Ohlenburg, J., Braun, A., Broll, W.: Morgan: A Framework for Realizing Interactive Realtime AR and VR Applications. In: Proceedings of the Workshop on Software Engineering and Architectures for Realtime Interactive Systems (SEARIS) at IEEE Virtual Reality 2008 (VR 2008) (2008)
7. Pattanaik, S., Ferwerda, J., Fairchild, M.: A Multiscale Model of Adaptation and Spatial Vision for Realistic Image Display. In: SIGGRAPH 1998 Conference Proceedings (1998)
8. Stamminger, A., Scheeland, M., Seidel, H.-P.: Tone Reproduction for Interactive Walkthroughs. In: Eurographics 2000 (2000)
A Proposal of New Interface Based on Natural Phenomena and so on (1) Toshiki Yamaoka1, Ichiro Hirata1, Akio Fujiwara1, Sachie Yamamoto1, Daijirou Yamaguchi2, Mayuko Yoshida2, and Rie Tutui1 1
Wakayama University, Faculty of Systems Engineering Sakaedani 930,Wakayama City, Japan [email protected], [email protected], {s5048,[email protected]}, [email protected], [email protected], [email protected]
Abstract. This study aimed at creating new user-interfaces based on natural phenomena, objects, accustomed manners and so on. First, a literature and script survey was conducted to establish the framework of the study. Next, 33 places in Japan, such as famous gardens, castles and temples, were surveyed to gather clues for creating new user-interfaces. New user-interfaces were created based on the large amount of collected data. Three selected user-interface designs were then visualized and evaluated from the viewpoints of usability and emotion. These user-interfaces were evaluated highly.

Keywords: user-interface, natural phenomena, manners, behavior, observation.
1 Introduction

Occasionally, user-interface designers have created new interfaces by imagining or referring to human or animal actions and to natural objects such as trees, flowers and rivers. Imagining or referring to them is therefore very important for interface design. For example, the user-interface of the iPod was designed based on human actions used in daily life. However, user-interfaces based on human behavior and natural phenomena have not been examined systematically. In this study, new findings were gathered at various places in Japan, from Hokkaido (the northern island) to Okinawa (the southern island).
2 Literature and Script Survey

A large amount of literature was collected in order to survey Japanese manners, customs and design in Japanese architecture, gardens and so on in association with user-interfaces. The findings of the survey were classified into the 3 groups below.
1. Accustomed manners: a. dwelling; b. utensils; c. behavior and manners
2. Natural phenomena
3. Movement and behavior of plants and animals
As the next step, many scripts were collected through discussion. A script here means knowledge about a series of behaviors in a given situation. The scripts were classified into 2 groups.
1. Daily scripts: cooking, bathing, taking a train, eating in a restaurant and so on
2. Non-daily scripts: going to an amusement park, staying at a hotel and so on
A new user-interface can be created from the above-mentioned findings. A good example is shown below.
Example: There are two kinds of slope on the hill of a shrine in Japan. One is called the Man slope, which is steep, and the other is called the Woman slope, which is gentle. It does not take much time to go up the Man slope, whereas the Woman slope takes much more time. This means the slopes were designed according to human abilities. This fact suggests that different kinds of user-interface are needed according to the user's skill.
3 Field Survey and New User Interface

Accustomed manners and objects were collected in a field survey. The 33 surveyed places are as follows.
1. Hokkaido: Asahiyama Zoo
2. Tokyo: famous parks, gardens, temples, shrines, department stores and so on
3. Kyoto: famous temples, shrines, castles, markets and so on
4. Kyushu: famous gardens, shrines, museums, castles
5. Okinawa: the former Nakijin Castle, the Ocean Expo Park, markets and so on
3.1 Exploring User Interface Based on Accustomed Manners and Objects

The collected data were classified into the 4 groups below.
1. Architecture: entrances, stairs, corridors, walls, windows, doors, roofs and so on. Change, emphasis and flexibility of space were observed.
2. Utensils and objects: clothing, signs, vehicles, signboards, displays, products and so on. Elements that show the user's situation, such as the distance to a goal and the direction, were observed.
3. Behavior and manners: behavior while eating, customs, hospitality, wording and so on. Methods or clues for providing information were observed.
4. Environment: gardens, trees, sound, smell, water, roads, space and so on. Elements that guide the user, such as navigation, emphasis and informing of the present location, were observed.
A lot of user-interfaces were created based on the above-mentioned 4 groups. Six examples are shown below. The collected data were analyzed from the viewpoints of "scene", "components" and "user-interface".
1. Example 1
a. scene: there are two observation routes in a hall of the Asahiyama Zoo
b. components: two different routes
c. new user-interface: select the route according to the situation
Fig. 1. Two routes displayed at the entrance
Fig. 2. A new user interface
2. Example 2
a. scene: the site of the former Nakijin Castle
b. components: a combination of 3 steps + landing + 5 steps + landing + 7 steps on the stairs
c. new user-interface: operate rhythmically, without monotony
Fig. 3. The site of former Nakijin castle.
Fig. 4. A new user interface
3. Example 3
a. scene: a window of a temple
b. components: a part of the scenery is cut out and framed
c. new user-interface: focus on the emphasized part
Fig. 5. A window of temple
Fig. 6. A new user-interface
4. Example 4
a. scene: looking up at a leopard in a cage at the Asahiyama Zoo
b. components: looking up at animals
c. new user-interface: look at an object from various viewpoints
Fig. 7. Look up animals
Fig. 8. A new user-interface
5. Example 5
a. scene: going through a shop curtain
b. components: being able to see the inside
c. new user-interface: partly see the next page
Fig. 9. Shop curtain
Fig. 11. A picture scroll
Fig. 10. New user interface
Fig. 12. A new user interface
6. Example 6
a. scene: looking at a picture scroll
b. components: a scroll
c. new user-interface: display all information on one page

3.2 Exploring User Interface Based on Natural Phenomena

A lot of data were collected in the field survey and classified into the 4 groups below.
1. Mountain: marvelous shapes of rocks, the surface of a mountain, the summit of a mountain, gently sloping hills and so on. Elements that inform the user of the present location and of seasonal change were observed.
2. Sea: the vast expanse of the sea, the water's edge, the surface of the water, the sound of water, water vapor, steam, sandy beaches, the shore and so on. Movement, such as waves, was observed.
3. Sky: clouds, rainbows, blue sky, a sea of clouds, the morning sun, the sunset and so on. Elements about the lapse of time, direction and weather were observed.
4. Others: weathering, sunlight filtering down through trees, sunny places, smoke and so on.
A lot of user-interfaces were created based on the above-mentioned 4 groups. Two examples are shown below. The collected data were analyzed from the viewpoints of "scene", "components" and "user-interface".
1. Example 1
a. scene: the summit of a mountain and a mountain trail
b. components: sky, trees, plants, open space
c. new user-interface: a flexible interface
Fig. 13. Mountain trail
2. Example2 a. a scene: surface of water b. components: surface, light, plants c. a new user-interface: navigated user-interface
Fig. 14. New user interface
Fig. 15. Surface of water
Fig. 16. New user interface
3.3 Exploring User Interface Based on Movement and Behavior of Plants and Animals

A lot of data were collected in the field survey and classified into the 4 groups below.
1. Dynamic movement and behavior involving locomotion. Elements with speed and movement were observed.
2. Dynamic movement and behavior without locomotion. Elements that indicate activity, the understanding of a response, etc. were observed.
3. Static movement and behavior. Static situations were observed.
4. Plants. Elements that display a situation and its change were observed.
A lot of user-interfaces were created based on the above-mentioned 4 groups. One example is shown below. The collected data were analyzed from the viewpoints of "scene", "components" and "user-interface".
1. Example
a. scene: fish gathering to get bait
b. components: a lot of fish, gathering
c. new user-interface: gather related information
Fig. 17. Fish gathered to get bait
Fig. 18. New user interface
4 Evaluation of the New User Interfaces
Three user interfaces were selected for evaluation.
Fig. 19. Design 1: A new user interface with a scroll function
Fig. 20. Design 2: A new user interface with direction
Fig. 21. Design 3: A new user interface that gathers related information
1. Method. The three user interfaces were visualized as real product interfaces. Participants: 16 persons (male: 2, female: 14), businesspeople and housewives. Method: participants answered questions while looking at each user interface design. Questionnaire: items regarding layout, color, operation, ease of viewing, usability and the characteristics of a new user interface.
2. Results and discussion. Design 1. Generally, Design 1 obtained good results. In particular, the emotional aspect was evaluated highly. Design 2. While the interface aspects were evaluated highly, the navigation was not. Although the navigation is a good idea, its design is not so good. Design 3. The interface of Design 3 was evaluated highly because of its new function and its convenience.
5 Conclusion
User interfaces focused on screens, such as operational panels, have become very important in many products. This study aimed at creating new user interfaces based on natural phenomena, objects, accustomed manners and so on. As we have always lived in nature, it is very reasonable to apply the dispensation of nature to a user interface. Moreover, by observing manners and objects influenced by the natural climate, we can construct a user-friendly interface easily. As animals and plants make us calm down, these calming factors lead to an emotional interface which feels familiar to us. A lot of new and original user interfaces were created based on natural phenomena, objects, accustomed manners and so on in this study.
Managing Intelligent Services for People with Disabilities and Elderly People Julio Abascal1, Borja Bonail2, Luis Gardeazabal1, Alberto Lafuente1, and Zigor Salvador3 1
Laboratory of HCI for Special needs. University of the Basque Country/Euskal Herriko Unibertsitatea. Manuel Lardizabal 1, 20018 Donostia. Spain {julio.abascal,luis.gardeazabal,alberto.lafuente}@ehu.es 2 Fatronik. Paseo Mikeletegi, 7 - Parque Tecnológico. 20009 Donostia. Spain [email protected] 3 CIC Tourgune. Paseo Mikeletegi, 56. 20009 Donostia. Spain [email protected]
Abstract. Ambient Supported Living systems for people with physical, sensory or cognitive restrictions have to guarantee that the environment is safe, fault tolerant and universally accessible. In addition it is necessary to overcome technological challenges, common to ubiquitous computing, such as the design of a middleware layer that ensures the interoperability of multiple wired and wireless networks and performs discovery actions. On top of that the system has to provide efficient support to the intelligent applications designed to assist people living there. In this paper we present the AmbienNet architecture designed to allow structured context information to be shared among the intelligent applications that support people with disabilities or elderly people living alone. Keywords: Supportive Ambient Intelligence, Users with disabilities, Elderly people, Ambient Assisted Living.
1 Introduction
The Ambient Intelligence concept is being successfully applied to the development of supportive environments for people with disabilities and elderly people, under the framework of Ambient Supported Living. These environments have to meet several conditions in order to be truly helpful to and usable by people with physical, sensory or cognitive restrictions. For instance, accessibility barriers to interaction must be avoided. In addition, due to the fact that many users will depend on the system, it must be safe and fault tolerant. From the technological point of view, the system must be able to handle heterogeneous wired and wireless networks, embedded processors and sensors by means of a well-designed middleware layer. Furthermore, the system must allow the efficient processing of the intelligent applications that provide the actual support to users. Therefore special attention must be paid to the design of environments oriented to efficiently support intelligent applications, taking into account that each context-aware
application has to process a large part of the huge amount of data collected by a great number of sensors. In addition, the information produced by each application (processing the information coming from sensors) in order to obtain knowledge to be used for its own purposes can be useful for other applications. This is the case, for instance, with adaptive user interfaces for intelligent environments [1] that have to create and maintain models of the user, the task and the environment, which can be shared with supportive intelligent applications that use models of the user, task and environment in order, for instance, to issue alarms or warnings when the user is trying to perform a task in an inappropriate place or at an inappropriate time [2]. Even if processors are currently able to process enormous quantities of data, ubiquitous computers can have a limited quantity of memory and processing capacity, which justifies the effort to minimize inefficient repetitions. In the following sections we will describe the technological AmbienNet approach to support applications and to allow structured information sharing among them.
2 The AmbienNet Environment
AmbienNet is a research project that aims to study, design and test a supporting architecture for Ambient Intelligence environments devoted to people with disabilities and elderly people living at home or in institutions. The AmbienNet infrastructure is currently composed of two wireless sensor networks, a middleware layer and a number of smart wheelchairs provided with range sensors [3]. The following subsections briefly describe them.
2.1 Sensor Networks
A key aspect of Ambient Intelligence is the deployment of diverse types of sensors throughout the environment. These are able to provide information about different physical parameters, such as temperature, light intensity, distance, presence/absence, acceleration, etc. The progress experienced in recent years by sensor technology allows cheaper and more accurate sensors, enabling the inclusion of a large number of them. For this reason the use of Sensor Network technology has been proposed in order to be able to efficiently handle them. AmbienNet uses two wireless networks based on ZigBee [4], which is currently the most widespread technology for sensor networks. ZigBee is chosen due to its advantages such as low power consumption (especially important for the mobile devices: wheelchairs, hand-held devices, etc.), the large number of devices allowed, low installation cost, etc. Furthermore, in public buildings with relatively frequent re-partitioning of space (administration or commercial buildings, hospitals, residences) devices can easily be moved and reconfigured.
Indoor Location System. The AmbienNet indoor location system is devoted to locating people living in an institution. This system has two main functions: (a) to monitor the current position of fragile elderly people and (b) to locate care personnel in order to call the closest and most appropriate person when urgent help is required. People are located only if they provide their informed consent [5].
The system has two levels of accuracy. In the proximity mode, a rough location (typically the room) of the target is computed. This operating mode requires just one or two beacons in every room from which we want to provide location information. The second one is the multilateration mode, where the location of the target is computed accurately by measuring the ultrasound times of flight. This mode requires some additional beacons (5 to 8) to enable multilateration and deal with non-line-of-sight errors. The indoor location system is able to locate people wearing a discreet tag with an accuracy of up to 10 centimetres [6]. Studies are currently being carried out in order to include an accelerometer in the tag, which could prove valuable for fall detection.
Image sensor network. The current availability of inexpensive, low-power hardware including CMOS cameras makes it possible to deploy a wireless network with nodes equipped with cameras acting as sensors for a variety of applications. AmbienNet includes a wireless sensor network that performs detection and tracking of a mobile object taking into account possible obstacles. A network of small video cameras with simple local image processing allows the detection of static and mobile items such as humans or wheelchairs [7,8]. The images are processed locally in order to reduce the data transmission to valid data only. Ceiling cameras are used to detect objects that can be distinguished from the floor, shadows, etc., using simple and well-known image processing algorithms. The embedded microcontroller of each camera computes a grid map of the room with cells corresponding to small areas of 10-20 cm. These areas may have two possible values: "occupied" cells correspond to obstacles as detected by the image processing algorithms; "free" cells represent the space where mobile objects and people can move.
2.2 Middleware
The middleware is a key part of the ubiquitous computing environment [9]. After studying the infrastructure requirements of general pervasive health care applications [10], three specific requirements for the applications of AmbienNet have been identified: (1) integration of heterogeneous devices, (2) provision of device discovery mechanisms, and (3) management of context information. Adopting a layered architecture, the middleware layer is able to abstract device-dependent features and to provide a homogeneous interface to the upper layers of the system, minimizing software complexity. Context in pervasive applications is related to high-level, user-centred information. Since context information is to be extracted from raw sensor data, an efficient software infrastructure has to process the sensor data, extract higher-level information and make it available to the application level. In addition, a flexible and powerful model is required for context representation. In addition to the role of efficiently integrating and dynamically managing heterogeneous resources and services, a second role is assumed by this layer: providing developers with a framework to build intelligent applications. In this way, the AmbienNet middleware layer offers an object-oriented interface to the applications. Interface functions refer to three different kinds of functional modules, which provide a unified interface to context, control and interaction drivers respectively, as described in [11]. Furthermore, an additional interface is provided to the intelligent context services as explained in the next subsection.
Operational interfaces to services and drivers follow the OSGi specification, and system components at any layer are defined as OSGi bundles. In order to distribute the OSGi components, we adopt the R-OSGi approach proposed by Rellermeyer et al. [12], which preserves the OSGi interface to applications, and hence portability.
2.3 Mobile Items: Smart Wheelchairs
Smart wheelchairs are usually standard electric wheelchairs provided with sonar- and infrared-based distance sensors connected to the bus that links the driving device and the power stage, and controlled by an embedded processor [13]. Since they can act as autonomous vehicles, many navigation algorithms are taken from the mobile robotics field [14]. In AmbienNet the distance sensors are used for indoor navigation by the wheelchairs. In addition, this information is shared with the environment in order to provide information about temporary obstacles, wheelchair traffic, etc. AmbienNet wheelchairs are provided with navigation algorithms and adaptive user interfaces. More information can be found in [15].
3 Sharing Contextual Information in AmbienNet
The AmbienNet approach for sharing knowledge between applications is to create a new level to support intelligent applications, providing them with pre-processed data. In this way, this level acts as a distributed intelligent system where each application processes the raw data coming from the sensors and provides structured, semantically annotated information. In addition, intelligent applications can be consumers of the shared information to complement and enhance their knowledge. In order to share this information, a new level, called "Context-Awareness and Location Service", has been designed in the AmbienNet project (see Figure 1). This level receives structured information from all the applications that are able to produce it and serves information to the applications that require it, effectively acting as a context broker.
Fig. 1. AmbienNet infrastructure model
Most intelligent applications use models to reason about the current situation and to be able to take decisions. Decision taking can involve user modelling to specify the characteristics of the user. These features must be observable and relevant for the application. In the case of people with physical, sensory or cognitive restrictions, user modelling is very useful to tune the interaction system to the user's functionalities. In addition, intelligent environments may require other models to describe the tasks that the user can (or cannot) do and the environment where these tasks can (or cannot) be performed. With this information an intelligent application can, for instance, supervise the users' behaviour and help them to perform everyday tasks, plan their time, or prepare exceptional activities, such as trips, medical visits, etc. Modelling is a time-consuming and processing-intensive task that should not be repeated by each application needing it. The Intelligent Context Service layer processes the information coming from the sensors through the middleware in order to build and maintain the models, and serves this information to all the applications requiring it. These applications can also be producers of information that is shared with the other applications through this layer.
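The broker role played by the Context-Awareness and Location Service can be illustrated with a small sketch. The actual AmbienNet layer is OSGi/Java based, so the following Python publish/subscribe broker is only a schematic illustration of the producer/consumer idea; the class, topic and field names are invented for the example.

```python
from collections import defaultdict

class ContextBroker:
    """Minimal publish/subscribe broker: producer applications publish structured
    context items and consumer applications subscribe to the topics they need."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks
        self._last_value = {}                    # topic -> most recent context item

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)
        if topic in self._last_value:            # serve the current state immediately
            callback(self._last_value[topic])

    def publish(self, topic, item):
        self._last_value[topic] = item
        for callback in self._subscribers[topic]:
            callback(item)

# Illustrative use: the location service produces a user position,
# the everyday-task-support application consumes it.
broker = ContextBroker()
broker.subscribe("user/position", lambda item: print("task support received", item))
broker.publish("user/position", {"user": "u01", "room": "kitchen", "xy": (2.3, 1.1)})
```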
4 Intelligent Applications
The information collected by all the sensors is delivered by the middleware to the Intelligent Context Services level, where it is materialized into the context-awareness and location services. These services process the information about location and sensors to extract data of a higher level to be used as support for the context applications. Moreover, this structure allows the design of enhanced applications and services, such as recognition of patterns in user behaviour, in order to identify or detect certain situations of risk or alarm. In order to demonstrate the validity of the Intelligent Context Services level designed for AmbienNet, three intelligent applications currently share data: Intelligent Adaptive Interfaces, Everyday Task Support, and Navigation Support. All of them are information consumers and producers.
4.1 Intelligent Adaptive Interface
Ambient Intelligence systems provide support to the user in a proactive way; that is, without waiting for users' explicit requests. Nevertheless, in some cases it is necessary to establish a direct communication between the user and the environment. For instance, they explicitly interact when the user issues commands and when the system requests extra information (command confirmation, supplementary data) or produces messages (warnings, reminders) [2]. Due to the fact that each user may have different capabilities and interests, it is necessary to adapt the interaction to the specific user at hand. In addition, the modality that best fits the type of communication for a specific situation has to be chosen. The models of the user, the task and the context (handled and maintained by the intelligent Context Support Layer) are used to adapt the interaction.
4.2 Everyday Task Support
This application supervises the activity of the users and provides them with recommendations to perform some tasks and warnings when potentially hazardous situations are detected. The information about the users, their activities and the adequate time and place for them is stored in the previously mentioned models run by the intelligent Context Support Layer. More information about this application can be found in [1].
4.3 Navigation Support
The Navigation Support is devoted to people with physical restrictions using smart wheelchairs. Even if this type of wheelchair is usually provided with range sensors that allow obstacle avoidance (short-term planning), they may experience difficulties with (long-term) trajectory planning. For this task, the information collected by the intelligent environment can be used. By means of the location sensor network deployed in AmbienNet, the environment is able to locate the user with two precision levels: in which room he/she is (rough location) and his/her absolute position with a precision of centimetres (detailed location). In addition, the video sensor network can place the wheelchair on a grid map (taking into account its position and orientation) together with the detected mobile and static obstacles [16]. This information is served to the wheelchair processor, which uses it to plan its trajectory [17]. Therefore the smart wheelchairs benefit from contextual information coming from the indoor location system and from the image sensor network, which provides the wheelchairs with accurate information about places, distances, and mobile and sporadic obstacles. This information can be used for global navigation. In this way crowded corridors, closed doors, dangerous zones, etc., can be avoided. In addition, the wheelchair's position and approximate speed, as well as obstacles, can be obtained from the ceiling cameras in order to perform a "local" navigation aid (avoiding obstacles and allowing specific goals to be reached). During the autonomous period between two successive updates from the environment, the wheelchair uses the well-known wavefront path planning algorithm [38] to set checkpoints. Due to inherent odometric errors, the estimated real position has to be maintained during the whole autonomous period until the next position update. For that, the Adaptive Monte Carlo Localization [39] algorithm was selected due to its good performance. This algorithm estimates the real position by comparing the given map and the one obtained through sensor readings. Additionally, the enhanced Vector Field Histogram [40] algorithm is used for obstacle avoidance. Furthermore, each wheelchair produces spatial information that can be useful for other applications. Therefore, the wheelchair controller has three different inputs: commands issued by the user through the user interface; distance information collected by range sensors installed on the wheelchair itself; and context information coming from the environment. In order to use it properly, a shared-control paradigm is used.
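The wavefront planner named above can be sketched in a few lines. This is a generic textbook version operating on the kind of occupancy grid described in Section 2.1 ("occupied"/"free" cells), not the AmbienNet code; the grid encoding (0 = free, 1 = occupied) and the plan() helper are assumptions made for the illustration.

```python
from collections import deque

def wavefront(grid, goal):
    """Breadth-first wavefront expansion from the goal over an occupancy grid.
    grid : 2-D list, 0 = free cell, 1 = occupied cell; goal : (row, col).
    Returns a distance map; occupied or unreachable cells keep the value None."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def plan(grid, start, goal):
    """Follow the wavefront downhill from start to goal to obtain checkpoints."""
    dist = wavefront(grid, goal)
    if dist[start[0]][start[1]] is None:
        return []                              # start occupied or goal unreachable
    path, (r, c) = [start], start
    while (r, c) != goal:
        neighbours = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        r, c = min((p for p in neighbours
                    if 0 <= p[0] < len(grid) and 0 <= p[1] < len(grid[0])
                    and dist[p[0]][p[1]] is not None),
                   key=lambda p: dist[p[0]][p[1]])
        path.append((r, c))
    return path

# Toy room map: a single obstacle column with a gap.
room = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(plan(room, start=(0, 0), goal=(2, 3)))
```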
5 Conclusions
In addition to guaranteeing universal accessibility, safety and fault tolerance, the design of supportive Ambient Intelligence environments has to ensure efficient processing of concurrent intelligent applications. The AmbienNet project proposed and tested an OSGi-based middleware to provide efficient interoperation among heterogeneous hardware and networks and intelligent applications. In addition, the new "Context-Awareness and Location Service" layer eases the creation of new supportive applications, providing them with pre-processed information in a proactive manner.
Acknowledgment. The AmbienNet project is developed by the Laboratory of HCI for Special Needs of the University of the Basque Country, in collaboration with the Robotics and Computer Technology for Rehabilitation Laboratory of the University of Seville and the Technologies for Disability Group of the University of Zaragoza. This work has been partially funded by the Spanish Ministry of Education and Science as a part of the AmbienNet project TIN2006-15617-C03, and the Basque Government under grant No. S-PE07IK03.
References 1. Abascal, J., Fernández de Castro, I., Lafuente, A., Cia, J.M.: Adaptive Interfaces for Supportive Ambient Intelligence Environments. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 30–37. Springer, Heidelberg (2008) 2. Abascal, J.: Users with Disabilities: Maximun Control with Minimun Effort. In: Perales, F.J., Fisher, R.B. (eds.) AMDO 2008. LNCS, vol. 5098, pp. 449–456. Springer, Heidelberg (2008) 3. Salvador, Z., Jimeno, R., Lafuente, A., Larrea, M., Abascal, J.: Architectures for ubiquitous environments. In: IEEE Int. Conf. on Wireless and Mobile Computing, Networking and Communications. IEEE Press, New York (2005) 4. ZigBee, http://www.zigbee.org/ 5. Casas, R., Marco, A., Falcó, J.L., Artigas, J.I., Abascal, J.: Ethically Aware Design of a Location System for People with Dementia. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2006. LNCS, vol. 4061, pp. 777–784. Springer, Heidelberg (2006) 6. Marco, A., Casas, R., Falco, J., Gracia, H., Artigas, J.I., Roy, A.: Location-based services for elderly and disabled people. Computer Communications 31(6), 1055–1066 (2008) 7. Rowe, A., Goel, D., Rajkumar, R.: FireFly Mosaic: A Vision-Enabled Wireless Sensor Networking System. In: RT Systems Symp. 2007, pp. 459–468 (2007) 8. Fernández, I., Mazo, M., Lázaro, J.L., Pizarro, D., Santiso, E., Martín, P., Losada, C.: Guidance of a mobile robot using an array of static cameras located in the environment. Autonomous Robots 23(4), 305–324 (2007) 9. da Costa, C.A., Corrêa Yamin, A., Resin Geyer, C.F.: Toward a General Software Infrastructure for Ubiquitous Computing. IEEE Pervasive Computing 7(1), 64–73 (2008) 10. Salvador, Z., Larrea, M., Lafuente, A.: Infrastructural Software Requirements of Pervasive Health Care. In: Procs. IADIS Int. Conf. on Applied Computing, Salamanca (Spain), pp. 557–562 (2007)
11. Salvador, Z., Larrea, M., Lafuente, A.: Smart Environment Application Architecture. In: Procs. of the 2nd Int. Conf. on Pervasive Computing Technologies for Healthcare, PervasiveHealth 2008, Tampere (Finland), pp. 308–309 (2008) 12. Rellermeyer, J.S., Alonso, G., Roscoe, T.: R-OSGi: Distributed Applications Through Software Modularization. In: Proceedings of the ACM/IFIP/USENIX 8th International Middleware Conference (2007) 13. Ding, D., Cooper, R.A.: Electric-Powered Wheelchairs: A review of current technology and insight into future directions. IEEE Control Systems Magazine, 22–34 (April 2005) 14. Dutta, T., Fernie, G.R.: Utilization of Ultrasound Sensors for Anti-Collision Systems of Powered Wheelchairs. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13(1), 24–32 (2005) 15. Abascal, J., Bonail, B., Cagigas, D., Garay, N., Gardeazabal, L.: Trends in Adaptive Interface Design for Smart Wheelchairs. In: Lumsden, J. (ed.) Handbook of Research on User Interface Design and Evaluation for Mobile Technology, pp. 711–729. Idea Group Reference, Pennsylvania (2008) 16. Takeuchi, E., Tsubouchi, T., Yuta, S.: Integration and Synchronization of External Sensor Data for a Mobile Robot. In: SICE Annual Conference, Fukui, Japan, pp. 332–337 (2003) 17. Jennings, C., Murray, D.: Stereo vision based mapping and navigation for mobile robots. In: IEEE Int. Conf. on Robotics and Automation, New, Mexico, pp. 1694–1699 (1998)
A Parameter-Based Model for Generating Culturally Adaptive Nonverbal Behaviors in Embodied Conversational Agents Afia Akhter Lipi1, Yukiko Nakano2, and Matthias Rehm3 1
Dept. of Computer and Information Sciences, Tokyo University of Agriculture and Technology, Japan [email protected] 2 Dept. of Computer and Information Science, Seikei University, Japan [email protected] 3 Institute of Computer Science, Augsburg University, Germany [email protected]
Abstract. The goal of this paper is to integrate culture as a computational term in embodied conversational agents by employing an empirical data-driven approach as well as a theoretical model-driven approach. We propose a parameter-based model that predicts nonverbal expressions appropriate for specific cultures. First, we introduce the Hofstede theory to describe socio-cultural characteristics of each country. Then, based on the previous studies in cultural differences of nonverbal behaviors, we propose expressive parameters to characterize nonverbal behaviors. Finally, by integrating socio-cultural characteristics and nonverbal expressive characteristics, we establish a Bayesian network model that predicts posture expressiveness from a country name, and vice versa. Keywords: conversational agents, enculturate, nonverbal behaviors, Bayesian network.
1 Introduction
When we meet someone, one of the first things we do is to classify the person as "in-group" or "out-group". This social categorization is often based on ethnicity [4]. When someone is identified as part of the in-group as opposed to the out-group, she or he is perceived as more trustworthy. In the same way, does the ethnicity of Embodied Conversational Agents (ECAs) also matter? Findings in previous studies support the claim that the ethnicity of embodied conversational agents affects users' attitudes and behaviors. Nass et al. [8] found that users showed more trust and were more willing to take the agent's suggestion if the agent was of the same ethnic group or from the same cultural background. Aiming at generating culture-specific behaviors, specifically postures, in ECAs, this study focuses on modeling cultural differences. Our method enables the user to experience exchanges of culture-specific posture expressions in human-agent interaction. However, defining culture is not an easy task and there are various definitions of this
notion around, and descriptive and explanatory theories are not very useful for computational purposes. Thus, to generate culturally appropriate nonverbal behaviors in ECAs, we propose a parameterized socio-cultural model which characterizes the group or the society using a set of numerical values, and selects the agent's nonverbal expressions according to the parameter set using probabilistic reasoning facilitated by a Bayesian network. As a data-driven approach, we have already collected a comparative multimodal corpus for two countries, an Asian country, Japan, and a European country, Germany, and extracted culture-specific posture shapes from the corpus [1]. In this paper, based on the results of our empirical study, we extend our research by employing a model-driven approach, introducing the Hofstede model [7] as a theoretical basis for describing socio-cultural characteristics. Hofstede's theory is appealing for establishing a computational model because Hofstede defines each culture using five dimensions, each of which has a quantitative nature. Integrating the Hofstede theory of culture [7] and the empirical data from our corpus [1], in this paper we implement a parameterized model which generates culture-specific non-verbal expressions. Our final goal is not restricted to building a model for embodied conversational agents, but is to propose a general model which estimates nonverbal parameters for various cultures. In the following sections, we first discuss related work in Section 2, and in Section 3 explain the approach of this study in addition to giving a brief description of the Hofstede model. Section 4 reports the empirical data in our corpus, and Section 5 proposes a Bayesian network which combines the Hofstede theory and the empirical data. Section 6 describes a nonverbal decision module, and Section 7 gives conclusions and future work.
2 Related Work
As research on the ethnicity of ECAs, Nass et al. [8] examined the question: "Does the ethnicity of a computer agent affect users' attitudes and behaviors?" They did a study of Korean subjects interacting with an American agent or with a Korean agent. They found that ethnic similarity had a significant effect on users' attitudes and behaviors. When the ethnicity of the subject was the same as the agent's, the subject took the agent to be more trustworthy and convincing. Nass et al. [8] claimed that users showed more trust and were more willing to take the agent's suggestion, or more willing to give their credit card number. These results suggest that culture-adapted agents are more positively accepted, and will provide more successful outcomes in e-commerce. Iacobelli and Cassell [11] assessed the index of ethnicity on the basis of language and non-verbal features, not by physical appearance such as skin color, hairstyle or clothing. They found that children had longer interactions with a virtual peer whose verbal and nonverbal behaviors matched their own than with an ethnically mismatched virtual peer. Isbister [9] pointed out the importance of non-verbal communicative behaviors, which are largely culture-specific. She reviewed a number of features of nonverbal communication such as eye gaze and gestures. Arabs treat sustained eye contact as a sign of engagement and sincerity, whereas Japanese interpret sparse use of direct eye contact as a sign of politeness. Another example is a simple head nod, which is interpreted as a sign of agreement in Germany but indicates only attention in Japan. The
frequency, the manner, and the number of gestures are also culturally dependent. Mediterranean people use far more gestures than North Americans do. Italians tend to use big gestures and gesture more frequently than the English or Japanese. Southern Europeans, Arabs, and Latin Americans use animated hand gestures, whereas Asians and Northern Europeans use quieter gestures [5]. As for studies on learning systems, Johnson et al. [10] described a language tutoring system that also takes cultural differences in gesture usage into account. Maniar and Bennett [12] proposed a mobile learning game to overcome culture shock by making the user aware of cultural differences. The eCIRCUS project (Education through characters with emotional intelligence and role playing capabilities that understand social interaction) [13] is aiming at developing models and innovative technologies that support social and emotional learning through role-plays. For example, children become aware of socially sensitive issues such as bullying through virtual role-plays with synthetic characters.
3 Describing Socio-cultural Characteristics
As a theoretical approach, we employ Hofstede's theory to describe socio-cultural characteristics. Then, as an empirical approach, we propose several nonverbal expressive parameters to characterize posture expressiveness. These two layers are integrated into a Bayesian network model to predict either behavioral characteristics or a culture. We start by introducing Hofstede's theory [7]. Hofstede's theory defines culture as a dimensional concept, consisting of the following five dimensions, which are based on a broad empirical survey.
1. Hierarchy/Power Distance Index: This dimension describes the extent to which an unequal distribution of power is accepted by the less powerful members. More coercive and referent power is used in high power distance societies and more reward, legitimate, and expert power in low power distance societies.
2. Identity: This is the degree to which individuals are integrated into a group. On the individualist side, ties between individuals are loose, and everybody is expected to take care of herself/himself. On the collectivist side, people are integrated into strong and cohesive groups.
3. Gender: The gender dimension describes the distribution of roles between genders. Feminine cultures place more value on relationships and quality of life, whereas in masculine cultures competition is rather accepted and status symbols are of importance.
4. Truth/Uncertainty: The tolerance for uncertainty and ambiguity is defined in this dimension. It indicates to what extent the members of a culture feel either uncomfortable or comfortable in unstructured situations which are novel, unknown, surprising, or different from usual.
5. Virtue/Orientation: This dimension distinguishes long- and short-term orientation. Values associated with long-term orientation are thrift and perseverance, whereas values associated with short-term orientation are respect for tradition, fulfilling social obligations, and saving one's face.
Since the cultural characteristics in Hofstede's theory are quantitative, a set of parameter values indicates the cultural profile. Table 1 gives Hofstede's ratings for three countries [2]. For example, in the Identity dimension, Germany (67) has a more individualistic culture than Japan (46), and the US (91) is the most individualistic of the three.

Table 1. Hofstede ratings for three countries

            Hierarchy   Identity   Gender   Uncertainty   Orientation
Germany     35          67         66       65            31
Japan       54          46         95       92            80
US          40          91         62       46            29
4 Characterizing Nonverbal Behaviors
4.1 Defining Posture Expressive Parameters
To define parameters that characterize posture expressivities, we reviewed previous studies. To describe cultural differences in gestures, Efron [14] proposed parameters such as spatio-temporal aspects, interlocutional aspects, and co-verbal aspects. Using a factor analysis, Gallaher [15] revealed four dimensions: expressiveness, expansiveness, coordination, and animation. Based on these previous studies, Hartmann et al. [16] defined gestural expressivity using six parameters: repetition, activation, spatial extent, speed, strength, and fluidity. Based on our literature study, we came up with five parameters which define the characteristics of posture. The five parameters are spatial extent, rigidness, mirroring, frequency, and duration. In the next section, the details of deriving values for each behavioral expressive parameter are explained.
4.2 Assigning Values
Since we found that the cultural difference in posture shifts is very clear in arm postures [1], we focus on predicting arm postures. Among the five expressive parameters we proposed in Section 4.1, we obtained the values of frequency and duration from our previous empirical study [1]. To find the values for spatial extent and rigidness, we conducted an experiment. Then, to derive the numerical value for mirroring, we analyzed our video data.
Frequency and duration. Frequency and duration can be assigned by referring to the results of our previous empirical study [1]. The average frequency of arm posture shifts in the German data is 40.38 per conversation and 22.8 in the Japanese data. On the other hand, the average duration of each posture is 7.79 sec in the German data and 14.8 sec in the Japanese data. Thus, Japanese people like to keep one posture longer than German people do.
Measuring impressions of spatial extent and rigidness. By spatial extent, we mean the amount of physical space required for a certain posture. Since the term rigidness seemed harder for subjects to interpret, we used the opposite word, relaxed, instead of rigidness to keep the term simple for the subjects.
Study Design: We extracted 15 video clips of postures from the Japanese video data and 15 posture video clips from the German data, and asked 7 Japanese subjects and 10 German subjects to rate each video clip. The rating was made using a questionnaire which asked the subjects to rate impressions of the shape of the arms, the lower body, and the whole body using 7-point scales, where 1 is the least value and 7 is the top-most value. For each video clip, the subjects answered their impression in two dimensions: spatial extent and relaxedness. Before starting the experiment, each subject was handed an explanation form which explained how to rate the video clips.
Result: The rating results are shown below. Table 2 shows that Germans make more relaxed postures than Japanese, and Japanese make smaller postures than Germans.

Table 2. Nonverbal expressive parameters from the experimental data

Country     Spatial Extent   Rigidness
Japan       5.25             8.58
Germany     7.33             7.62
Analyzing mirroring. Mirroring refers to an interpersonal phenomenon in which people unknowingly adjust the timing and content of their behavioral movements such that they mirror the behavioral cues exhibited by their social interaction partner. Mirroring has positive effects on interaction, and enhances the relationship between the conversants.
Study Design: We analyzed videos of 10 Japanese pairs and 7 German pairs (both speaker and listener) acting out the first-time meeting scenario [1], where two people meet for the first time and have a conversation to get to know each other. The dyadic conversation took place for 5 minutes. After annotating the posture shifts of both speaker and listener using Bull's Coding Scheme [3], we counted the number of postures common to both parties, speaker and listener, using the two conditions below:
(1) If person A shifts to a new posture while speaking, and within five seconds person B also changes to the same posture as person A, and vice versa.
(2) If person A shifts to a new posture, and soon person B also adopts the same posture, which overlaps with person A's posture.
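A small counting routine corresponding to one possible reading of these two conditions is sketched below; the (label, start, end) annotation format, the function name and the five-second window handling are assumptions made for illustration, not the annotation tooling actually used in the study.

```python
def count_mirroring(speaker, listener, window=5.0):
    """Count posture shifts of the speaker that the listener mirrors.

    speaker, listener : lists of (posture_label, start, end) annotations in seconds.
    A speaker's posture counts as mirrored if the listener adopts the same posture
    within `window` seconds of its onset (condition 1) or holds the same posture in
    an overlapping interval (condition 2). Each speaker posture is counted at most once.
    """
    count = 0
    for label_a, start_a, end_a in speaker:
        for label_b, start_b, end_b in listener:
            if label_a != label_b:
                continue
            within_window = 0.0 <= start_b - start_a <= window    # condition (1)
            overlapping = start_b < end_a and start_a < end_b     # condition (2)
            if within_window or overlapping:
                count += 1
                break
    return count

# "Vice versa": apply the count in both directions and sum the results, e.g.
# total = count_mirroring(person_a, person_b) + count_mirroring(person_b, person_a)
```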
Result: The average number of mirroring events for Japanese pairs is 6.2 per conversation, and for German pairs it is 0.57. This result suggests that Japanese people are more likely to synchronize with their conversation partner than German people, and tend to be more group-oriented and collective in nature.
5 Combining Theoretical and Empirical Approaches to Develop a Parameter-Based Model
Based on Hofstede's theory of culture, we propose a model where culture is connected to the Hofstede dimensions, which are in turn connected with nonverbal expressive parameters for postures.
5.1 Reasoning Using a Bayesian Network
To build this parameter-based model, we employ the Bayesian network technique. Figure 1 shows our Bayesian network, which models the relationship between socio-cultural aspects and behavioral expressiveness. Bayesian networks are acyclic directed graphs in which nodes represent random variables and arcs represent direct probabilistic dependences among them. Bayesian networks [2] handle uncertainty at every state. This is very important for our purpose, as the linkage between culture and nonverbal behavior is a many-to-many mapping. In addition, since the network can be used in both directions, it can infer the user's cultural background as well as simulate the system's (agent's) culture-specific behaviors.
5.2 Parameter-Based Socio-cultural Model
In order to build a Bayesian network for predicting socio-cultural aspects in posture
expressiveness, the GeNIe [6] modelling environment was used.
Fig. 1. Bayesian network model predicting Japanese posture expressiveness parameters
First Layer: The first part of the network is quite simple. The entry node of the Bayesian network is a culture node which is connected to Hofstede's dimensions. Currently we have inserted two countries, Germany and Japan.
Middle Layer: The middle layer defines Hofstede's five dimensions. We have already integrated all five dimensions: hierarchy, identity, gender, uncertainty, and orientation. The Hofstede ratings for each country shown in Table 1 are used as the probabilities in each node.
Lowest Layer: The lowest layer consists of a number of different behavioral parameters that depend on a culture's position on Hofstede's dimensions. We draw a connection between the cultural dimensions and the nonverbal behaviors. The lowest level consists of five nodes whose values were specified in Section 4.2.
a) Spatial Extent: Spatial extent describes the amount of physical space required for a certain posture. From our experimental data, we found that Germans make bigger postures than Japanese. When we compare the postures of male and female subjects, we find that Japanese females make smaller postures than males, and the difference is bigger than in the German data. So, we can say that Japanese society is more masculine than German society. Moreover, hierarchy affects the spatial extent. In highly hierarchical societies, people seem to make small postures [5].
b) Rigidness: How stiff the posture is. Our experimental data revealed that Japanese people seem to make more rigid postures than Germans, and Germans seem to be more relaxed than Japanese. In a highly hierarchical society, people are stiffer than in a less hierarchical one. Thus, we assume a linkage between hierarchy and rigidness.
c) Mirroring: Since mirroring is copying the conversation partner's postures during a conversation, we assume that the frequency of mirroring correlates with a collective nature. In our corpus study in Section 4.2, Japanese people indeed did mirroring more frequently than German people.
d) Frequency: German people change their posture more frequently than Japanese people. According to Hofstede's theory, Japanese culture is of long-term orientation; therefore we set links from truth and virtue to frequency.
e) Duration: Japanese people stay in a single posture for a longer period of time than German people. Thus, we assume that both truth and virtue affect duration.
For each node in the Bayesian network, a probability is assigned based on the data that we reported in Section 4. For example, since the posture shift frequency in the German data (40.38) is 1.77 times that of the Japanese data (22.8), we assigned probability values of 0.66 and 0.34 to the two countries respectively.
Output: When a country is chosen at the top level as evidence, the behavior expressive parameters are estimated. For instance, as shown in Figure 2, when Japan is chosen as evidence, the results of the estimation are: spatial extent is small (51%), rigidness is extreme (51%), mirroring is most (54%), frequency is low (59%), duration is long (56%). In the same way, when Germany is given as evidence, the estimation results are: spatial extent is big (51%), rigidness is least (53%), mirroring is least (66%), frequency is high (52%), and duration is short (52%).
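The paper builds the full network in GeNIe and queries it through the Netica Java API (Section 6.1). Purely as an illustration, the reduced sketch below reproduces the same idea with the Python pgmpy library (an assumption, not the tooling used in the study), using a single Hofstede dimension, a single expressive parameter, and conditional probabilities that are rough stand-ins derived from the Hofstede scores rather than the values of the full model.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Reduced network: Culture -> Hierarchy -> SpatialExtent (the full model links the
# culture node to all five Hofstede dimensions and five expressive parameters).
model = BayesianNetwork([("Culture", "Hierarchy"), ("Hierarchy", "SpatialExtent")])

cpd_culture = TabularCPD("Culture", 2, [[0.5], [0.5]],
                         state_names={"Culture": ["Japan", "Germany"]})
# P(Hierarchy | Culture): the 0-100 Hofstede index (54 vs. 35) is treated as the
# probability of the "high" state, purely for illustration.
cpd_hier = TabularCPD("Hierarchy", 2, [[0.54, 0.35],    # high
                                       [0.46, 0.65]],   # low
                      evidence=["Culture"], evidence_card=[2],
                      state_names={"Hierarchy": ["high", "low"],
                                   "Culture": ["Japan", "Germany"]})
# P(SpatialExtent | Hierarchy): highly hierarchical societies tend towards small postures.
cpd_space = TabularCPD("SpatialExtent", 2, [[0.65, 0.40],   # small
                                            [0.35, 0.60]],  # big
                       evidence=["Hierarchy"], evidence_card=[2],
                       state_names={"SpatialExtent": ["small", "big"],
                                    "Hierarchy": ["high", "low"]})
model.add_cpds(cpd_culture, cpd_hier, cpd_space)
assert model.check_model()

infer = VariableElimination(model)
# Forward use: predict posture expressiveness for a given culture ...
print(infer.query(["SpatialExtent"], evidence={"Culture": "Japan"}))
# ... and backward use: infer the likely culture from observed behaviour.
print(infer.query(["Culture"], evidence={"SpatialExtent": "big"}))
```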
5.3 Evaluation of the Model
As an evaluation of our model, we tested whether this model can properly predict the posture expressiveness of other countries. When the Hofstede scores for the US shown in Table 1 are applied, the model predicts that spatial extent for the US is big (51%), rigidness is least (52%), mirroring is least (90%), frequency is high (53%), and duration is short (53%). This prediction suggests that American postures are less rigid (in other words, more relaxed), and this supports what Ting-Toomey has reported [5].
6 Posture Selection Mechanism
This section presents our posture selection mechanism, which uses the Bayesian network model as one of its components. A simplified architecture is given in Figure 3. Basically it is divided into three main modules. The input to the mechanism is a country name and a text that the agent speaks.
6.1 Probabilistic Inference Module
The Probabilistic Inference Module takes a country name as input and outputs the nonverbal parameters for that country. To generate the outputs, the module refers to our Bayesian network model. We used the Java version of the Netica API as an inference engine. The outputs of this module are the values of the nonverbal expressive parameters for the given culture: spatial extent, rigidness, duration, and frequency.
6.2 Decision Module
This module is the most important module. It has two sub-modules.
b1: Posture computing module: This module takes the estimation results from the Bayesian network as inputs, and uses them as weights for the empirical data. Then, it calculates the sum of all the weighted values. For example, the score for a posture, PHFe (Put hand to face), which is frequently observed in the Japanese data, is shown below. 0.5183, 0.507, 0.58, and 0.56 are the weights for spatial extent, rigidness, frequency, and duration respectively, which are given by the Bayesian network. 4.19, 4.4, 2.725, and 1.01 are values obtained in the empirical studies in Section 4.
PHFe = {(0.5183 * 4.19) + (0.507 * 4.4) + (0.58 * 2.725) + (0.56 * 1.01)} * 10 = 65.49
b2: Posture distinguishing module: This sub-module separates the typical postures of each culture into German-like postures, Japanese-like postures, or common postures (used in both Germany and Japan). This is judged by checking the thresholds for each country. If the text is Japanese, and the posture value falls within the range of Japanese postures, it sends the posture to the Generation phase as a posture candidate.
1 Since various kinds of measures were used in the empirical data, they were normalized to a range of 1 to 7.
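The weighted sum computed by the posture computing module, and the threshold check of the posture distinguishing module, can be sketched as follows. The weights and the PHF values are taken from the example above, while the second posture, its values and the threshold range are invented placeholders for illustration.

```python
# Weights for Japan estimated by the Bayesian network
# (spatial extent, rigidness, frequency, duration).
bn_weights = {"spatial_extent": 0.5183, "rigidness": 0.507,
              "frequency": 0.58, "duration": 0.56}

# Empirical scores per posture, normalized to a 1-7 range (PHF values from the paper;
# the second posture and the threshold below are illustrative placeholders).
postures = {
    "PHF": {"spatial_extent": 4.19, "rigidness": 4.40, "frequency": 2.725, "duration": 1.01},
    "AoT": {"spatial_extent": 6.10, "rigidness": 2.30, "frequency": 3.100, "duration": 2.40},
}

def posture_score(values, weights):
    """Weighted sum of empirical parameter values, scaled by 10 as in the paper."""
    return 10 * sum(weights[k] * values[k] for k in weights)

scores = {name: posture_score(vals, bn_weights) for name, vals in postures.items()}
print(round(scores["PHF"], 2))   # ~65.49, matching the PHFe example in the text

# Posture-distinguishing step: keep postures whose score falls inside the
# (hypothetical) range associated with the requested culture.
JAPAN_RANGE = (60.0, 80.0)
candidates = [n for n, s in scores.items() if JAPAN_RANGE[0] <= s <= JAPAN_RANGE[1]]
print(candidates)
```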
Fig. 3. A simplified architecture of the system (the Probabilistic Inference Module, Decision Module and Generation Module draw on the Bayesian model, the empirical data, animations and a speech synthesizer; the Japanese and German agents interact with the user)
6.3 Generation Module
This module takes the postures recommended by the decision module and looks for the animation file for each posture in the animation database. Then, the Horde3D animation engine generates the animation. We use Hitachi HitVoice for TTS, which converts the text into a wav file, and then the agent speaks with appropriate culture-specific postures.
7 Future Work and Conclusions
Employing a Bayesian network, we combined the Hofstede model of socio-cultural characteristics with the posture expressive parameters that we proposed, and found that our model estimates culture-specific posture expressiveness quite well. As future work, we plan to apply our posture generation mechanism to a language exchange application on the web, where two users from different countries log on to the service, teach their own language to the partner, and learn a foreign language from her or his partner. In this application, the system not only helps the user teach a language, but also makes the learner familiar with culture-specific nonverbal behaviors.
Acknowledgment. This work is funded by the German Research Foundation (DFG) under research grant RE 2619/2-1 (CUBE-G) and the Japan Society for the Promotion of Science (JSPS) under a Grant-in-Aid for Scientific Research (C) (19500104).
References 1. Rehm, M., et al.: Creating a Standardized Corpus of Multimodal Interactions for Enculturating Conversational Interfaces. In: Proceedings of Workshop on Enculturating Conversational Interfaces by Socio-cultural Aspects of Communication, 2008 International Conference on Intelligent User Interfaces (IUI 2008) (2008) 2. Rehm, M., et al.: Too close for comfort? Adapting to the user’s cultural background. In: Proceedings of the 2nd International Workshop on Human-Centered Multimedia (HCM), Augsburg (2007) 3. Bull, P.E.: Posture and Gesture. Pergamon Press, Oxford (1987) 4. Nass, C., Isbister, K., Lee, E.: Truth is Beauty Researching Embodied Conversational Agents. In: Cassell, J., et al. (eds.) Embodied Conversational Agents, pp. 374–402. The MIT Press, Cambridge (2000) 5. Ting-Toomey, S.: Communication Across Culture. The Guildford Press, New York (1999)
6. GeeNIe and SMILE, http://genie.sis.pitt.edu/ 7. Hofstede, http://www.geert-hofstede.com/hofstededimensions.php 8. Lee, E.-J., Nass, C.: Does the ethnicity of a computer agent matter? An experimental comparison of human-computer interaction and computer-mediated communication. In: Prevost, S., Churchill, E. (eds.) Proceedings of the workshop on Embodied Conversational characters (1998) 9. Isbister, K.: Building Bridges through the Unspoken: Embodied Agents to facilitate intercultural communication. In: Payr, S., Trappl, R. (eds.) Agent Culture: Human –Agent Interaction in a Multicultural World, pp. 233–244. Lawrence Erlbaum Associates, Mahwah (2004) 10. Johnson, W., et al.: Tactical Language Training System: Supporting the Rapid Acquisition of Foreign Language and Cultural Skills. In: Proc. of InSTIL/ICALL - NLP and Speech Technologies in Advanced Language Learning Systems (2004) 11. Iacobelli, F., Cassell, J.: Ethnic Identity and Engagement in Embodied Conversational agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 57–63. Springer, Heidelberg (2008) 12. Maniar, N., Bennett, E.: Designing a mobile game to reduce cultural shock. In: Proceedings of ACE 2007, pp. 252–253 (2007) 13. http://www.e-circus.org/ 14. Efron, D.: Gesture, Race and Culture. Mouton and Co. (1972) 15. Gallaher, P.E.: Individual Differences in Nonverbal Behavior; Dimension of style. Journal of Personality and Social Psychology 63819, 133–145 (1992) 16. Hartmann, B., Mancini, M., Buisine, S., Pelachaud, C.: Design and evaluation of expressive gesture synthesis for embodied conversational agents. In: Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, pp. 1095–1096 (2005)
Intelligence on the Web and e-Inclusion Laura Burzagli and Francesco Gabbanini Institute of Applied Physics “Nello Carrara” – Italian National Research Council Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy {L.Burzagli,F.Gabbanini}@ifac.cnr.it
Abstract. Within the context of the Web, the word intelligence is often connected with the visions of the Semantic Web and Web 2.0. One of the main characteristics of the Semantic Web lies in the fact that information is annotated with metadata, and this gives the opportunity of organizing knowledge, extracting new knowledge and performing some basic operations like query answering or inference reasoning. Following this argument, the advent of the Semantic Web is often claimed to bring about substantial progress in Web accessibility (which is part of the e-Inclusion concept). Web 2.0 sites, favoring massive information sharing, could as well be of great importance for e-Inclusion, enabling new forms of social interaction, collective intelligence and new patterns of interpersonal communication. Benefits could be substantial also for people with activity limitations. The paper tries to highlight the possible roles and convergence of Web 2.0 and the Semantic Web in favoring e-Inclusion. It highlights the fact that examples of applications of these concepts to the e-Inclusion domain are few and limited to the e-Accessibility field. Keywords: e-Inclusion, Web 2.0, Semantic Web.
1 Introduction
Due to the evolution and increased complexity of the Web, intelligence is becoming a challenging functionality in the Web itself, and a number of forms in which it can manifest itself have been identified, such as the Semantic Web and Web 2.0. Enhancements that these forms of intelligence might bring in access to information and interpersonal communication could have a positive impact in the field of e-Inclusion. The term e-Inclusion is here considered in its widest definition, both as a support to accessibility of Information and Communication Technology and as a support to the daily activities of people, according to the European Union Riga declaration1, point 4: "e-Inclusion" means both inclusive ICT and the use of ICT to achieve wider inclusion objectives. It focuses on participation of all individuals and communities in all aspects of the information society. E-Inclusion policy, therefore, aims at reducing
1 See http://ec.europa.eu/information_society/events/ict_riga_2006/doc/declaration_riga.pdf, last visited on 2/27/2009.
gaps in ICT usage and promoting the use of ICT to overcome exclusion, and improve economic performance, employment opportunities, quality of life, social participation and cohesion. The analysis in this paper introduces Web 2.0 and the Semantic Web, tries to highlight their possible convergence, and summarizes their role in favoring e-Inclusion. The discussion points out the fact that, up to now, few examples of applications of the Semantic Web and Web 2.0 to the e-Inclusion domain exist, and they are mostly limited to the e-Accessibility field. An example of one such application is presented to stimulate the discussion.
2 Aspects of Intelligence on the Web
Web technologies have been constantly evolving, and the way in which the Web is used and perceived by its users is also evolving. As observed in [1], from the browsing of some pages containing text or images, going through the connection of pages, the Web has become an interactive, ubiquitous information system that leverages the wisdom of many users and makes it possible to reuse data through mashups. From the perspective of users, this structure offers information, services and powerful search engines to find them. Users can take advantage of an environment useful for research, learning, commerce, socialization, communication and entertainment. In order to fully exploit the potential of today's Web, benefits would come from the introduction of intelligence embedded in the system, to handle complex scenarios. Intelligence on the Web can be considered according to several different perspectives. In the complex and varied Web world, a noteworthy phenomenon is assuming great importance: Web 2.0. According to the opinion of several experts, this phenomenon is not based only on technological innovations, but draws its peculiarity from social aspects, from the cooperation between users and the wide variety of web contents which are directly generated by users. This combination of technological and human factors could represent a valuable support also in the e-Inclusion field. However, at the scientific level, the most advanced representation of intelligence on the Web seems to be the Semantic Web, the revolution for the web proposed by Tim Berners-Lee almost 10 years ago [2]. If referred to e-Inclusion, the concept seems to offer a number of features that not only can improve web accessibility and overcome several limitations which are present in today's Web, but also can lead to the creation of new services which can be useful for a wide variety of users (see [3], [4]). An interesting aspect to note is that the Semantic Web and Web 2.0 are not in contrast; it has become clear to many experts (see [5], [6]) that they are natural complements of each other, because the Semantic Web can give a formal representation of human meaning, while contents can be built and maintained using the techniques and data generation capabilities that are typical of Web 2.0.
2.1 Web 2.0: Social and Technological Aspects
Although there is not full agreement on a definition, Web 2.0 (also called the wisdom Web, people-centric Web, participative Web, see [7], [8]) is perceived as a second phase in the Web's evolution, in which web-based services have the characteristic of
aiming to facilitate collaboration and sharing between users, letting them engage in an interactive and collaborative manner, giving more emphasis to social interaction and collective intelligence. It is a fact that the term does not refer to an update to Web technical specifications, but to changes in the way software developers use known web technologies and in the way end-users use the internet as a platform. A number of researchers attribute to this phenomenon an element of intelligence (see [8], [9], [7]). Intelligence originates from interaction among users, when this interaction happens by means of the Internet, and differs from intelligence seen as a result of software routines implementing Artificial Intelligence procedures. In the literature, some describe this aspect with the term collected intelligence (i.e. the value of user contributions is in their being collected together), finding it more appropriate than collective intelligence (i.e. characterized by the emergence of truly new levels of understanding) (for example, see [10]).
Accessibility Issues in Web 2.0. From the technological point of view, many (but not all) Web 2.0 applications are supported by a series of new-generation web-based technologies that have existed since the early days of the web, but are now used in such a way as to exploit user-generated content, resource sharing and interactivity in a more sophisticated and powerful way (see [11]), giving rise to the so-called Rich Internet Applications (RIAs). Techniques such as AJAX have evolved that have the potential to improve the user experience in browser-based applications. The main impact on accessibility comes from dynamic and incremental updates in web pages. On the one hand, these may come unexpected to users, who may not notice that a part of the page has changed. On the other hand, it is to be noted that problems with asynchronous updates may be fatal for users relying on Assistive Technology (AT): in fact, updates can occur on a different area of the page than where the user is currently interacting, and ATs could fail to notify users that something on the page has changed. Issues concerning accessibility in RIAs are being faced by the WAI-ARIA2 of the W3C, which has formulated a series of best practices for rich internet application design. WAI-ARIA markup presents a solution to making these applications accessible. An interesting analysis of technologies to enable a more accessible Web 2.0, discussed in [12], points out that, basically, ARIA is built upon Semantic Web concepts in that it defines so-called "live regions", allowing semantic annotations to be added to the HTML and XHTML markup in order to better define the role of user interface components. This feature can be used, for example, to enable assistive technologies to give an appropriate representation of user interfaces. For example, a browser can interpret the additional semantic data and provide it to the assistive technology via the accessibility Application Programming Interface of the platform, which already implements mechanisms to describe user interface controls. Thiessen and Chen in [13] present a chat example that shows ARIA live regions in action and demonstrates several of their limitations.
Web 2.0 perspectives for e-Inclusion. As a result of a first survey, it appears that the role of Web 2.0 with respect to e-Inclusion (considered from the perspective of its general definition) has not yet been an object of interest in the scientific community.
See http://www.w3.org/WAI/intro/aria.php. Last visited on 2/27/2009.
644
L. Burzagli and F. Gabbanini
Up to now, Web 2.0 has almost exclusively been considered useful in particular fields of applications such as leisure, travel or e-commerce. The availability of a wide corpus of collective intelligence could give Web 2.0 an important role also in the field of e-Inclusion, where interaction among users has always been considered (see [12] as an example) to offer valuable support in helping people to overcome limitations (either physical or cultural), even if only a limited number of examples have been presented that follow this direction, yet. The new forms of social interaction and collective intelligence brought by Web 2.0 could enable new patterns of interpersonal communication for all users. Benefits could be substantial also for people with activity limitations. For example, through Web 2.0 sites motor impaired users could share their experiences about accessible accommodations and paths in towns: value to the application could increase as more users use it, putting their knowledge at the disposal of other users. An example of a service that exploits the capabilities of Web 2.0 and Semantic Web in a sub-domain of e-Inclusion, that is e-Accessibility, is described in Section 4. It is to be noted, moreover, that the interaction techniques that are typical of this Web 2.0 (although a number of problems related to their accessibility are emerging and are being handled by WAI) can represent a useful help for some group of users, especially for people with cognitive disabilities because of the possibility to implementing contextual help, tailoring interfaces to meet users’ experiences and because of the fact that techniques exist to learn users’ preferences (see [5]). 2.2 Semantic Web The World Wide Web is a container of knowledge as well as a mean to exchange information between peoples and let them communicate. As of today, even if pages are generated by CMS and stored in databases, they are presented in the form of pages written in mark-up languages such as HTML and XHTML. Thus, web contents are mostly structured to be readable by humans and not by machines. The aim of the Semantic Web initiative (originated by Tim Berners-Lee and now being developed within the World Wide Web Consortium3) is to represent web contents in a manner that is more easily processable by machines and to set up tools to take advantage of this representation. The initiative seeks to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. In this way, according to the Semantic Web vision, a person, or a machine, should be capable to start browsing information on the web and then move through a set of information sources which are connected not by wires and links, but by being “about the same thing” (see [14] for example). The Semantic Web is based on the concept of ontology: an ontology is used to describe formally a domain of discourse and consists of a list of terms and the relationships between these terms. Metadata, organized with ontologies, are used to identify information in web sources and logic is used for processing retrieved information and uncover knowledge and relationships found in ontologies. 3
See http://www.w3.org/2001/sw/. Last visited on 2/27/2009.
Intelligence on the Web and e-Inclusion
645
It is to be noted that the Semantic Web does not aim to exhibit a human-level intelligence, as the one envisioned by Artificial Intelligence (AI). Though it builds on the work carried out in AI, the Semantic Web stack (made up with RDF, RDFSchema, Ontology vocabularies, Rules, Logic and Proof) aims at building intelligent agents, in the sense that they are capable of uncovering unexpected relationships between concepts. Semantic Web Perspectives for e-Inclusion. Discussions on the role of Semantic Web and e-Inclusion are mainly focused on aspects related to e-Accessibility. It is highlighted by several authors that the Semantic Web is not an area that is very well explored for supporting Web accessibility (see [5], [15]). However, it is also generally acknowledged that developments connected to the Semantic Web can provide a valuable contribution to creating accessible content, especially if taken together with Web 2.0. One of the main characteristic of Semantic data is that it can be modeless: it is not already deliberately constructed as a page. Following this argument, the advent of the Semantic Web is often claimed to bring about substantial progress in Web accessibility and presentation on mobile devices (which are part of the e-Inclusion concept), as it facilitates providing alternatives for content and form for documents (see, e.g., [16]). Harper and Bechhofer, in [4], observe that semantic information built into general purpose Web pages could enable substantial improvements of accessibility of web pages. In fact, information is often rendered on web pages in an order defined by the designer and not in the order required by the user and web pages may bear implicit meaning that is connected to how to the information presented visually, while, for example, the same meaning cannot be interpreted by visually impaired persons that are forced to interact with systems in a serial manner. The availability of semantics would be of great value for Assistive Technology, which relies on semantic understanding to provide access to the user interfaces. Moreover, this could also provide benefits in that it could enable automatic reformulation and rearrangement of contents, based on metadata, in view of their fruition by people with different preferences and needs, in different contexts of use.
3 Convergence Between Web 2.0 and Semantic Web So far Web 2.0 and Semantic Web, which are considered two visions of the intelligence on the web, have been considered as completely different approaches to the future web. However, recently (see [6] for example), a number of authors have started to consider a possible convergence between them, merging their most successful aspects. From one side, the richness of Web 2.0 lies in its social dimension, and is characterized by an easy exchange of information in wide communities (social network) and in the collection of large amount of information, even if this is often unstructured. From the other side, the strength of Semantic Web is in its capability of interlinking and reusing structured information, but it needs data to be aggregated and recombined. In other words there is the need to merge human participation with well structured information.
646
L. Burzagli and F. Gabbanini
Fig. 1. The “collective knowledge system” general scheme
Two main solutions appear in literature. The first is related ([5]) to the creation of services and tool that are able to automatically analyze web pages and discover semantics, thus allowing structuring users generated content. A different solution (see [10]) is represented by systems that are defined by Gruber as “collective knowledge systems”, in which existing semantic structures are used to organize large amounts of knowledge generated by users (see Fig. 1). These systems are made up with a social network, supported by computing and communication technology, in which self-service problem solving discussions take place and where people pose problems and others reply with answers; a search engine to find questions and answers; users helping the system learning about which query/document pairs were effective at addressing their problems. According to Gruber [10], the role of Semantic Web is firstly seen in adding value to user data by adding structured data, related to the content of the user contributions in a form that enables more powerful computation. Secondly, Semantic Web technologies can enable data sharing and computation across independent, heterogeneous Social Web applications, whereas, up to now, these data are presently confined in a given application.
4 Existing Applications An existing example of the collective knowledge systems outlined in Section 3 is presented in the next section. The example is given by the IBM Social Accessibility Project. Though it regards eAccessibility, it is taken as a model to highlight potential benefits that could come from the exploitation of the convergence of Semantic Web and Web 2.0, in the wider domain of e-Inclusion.
Intelligence on the Web and e-Inclusion
647
4.1 The IBM Social Accessibility Project The Social Accessibility Project (see screenshot in Fig. 2) has been set up by IBM to improve Web accessibility by using the power of communities. It is a service whose goal is to make Web pages more accessible to people with disabilities, taking advantage from users' input and leveraging on the power of the open community while not changing any existing content. The system allows users encountering Web access problems to immediately report them to the Social Accessibility server. Volunteers (called supporters) can be quickly notified and can easily respond by creating and publishing the requested accessibility metadata, which will help other users who encounter the same problems. Specifically, supporters are able to discuss solutions among themselves through Web applications on the server and create a set of metadata to solve the problem; they then submit it to the server. When the user visits the page again, the page is automatically fixed and any user who installs a suitable software extension can access the accessible version of the page. This project delineates an interesting possible convergence between Web 2.0 and the Semantic Web because it takes advantage of a social network that discusses problems and tries to provide solutions in a collaborative manner. There is a potentially continuous interaction between users and supporters to discuss solutions and consider comments.
Fig. 2. Screenshot of the gust page of the IBM Social Accessibility Project
648
L. Burzagli and F. Gabbanini
Users can also create metadata: for example when a user finds an important position in a page, the position can be submitted as a “landmark” for other users. This process ends up with the creation of metadata that help identifying and overcoming access problems. More details on the service can be found at the address http://sa.watson.ibm.com/.
5 Conclusions The paper presents Semantic Web and Web 2.0 as two different approaches for the distribution of intelligence on the web. Their characteristics are presented and a summary of potential benefits coming from the application of Web 2.0 and Semantic Web vision to the e-Inclusion field is discussed. A class of applications (named “collective knowledge systems”), which in the future could serve as an example for exploiting the convergence of Semantic Web and Web 2.0 and which could have a positive impact on e-Inclusion is presented. An example referred to the improvement of accessibility of the Web is described. It is to be noted that the Web has certainly evolved from a collection of hypertext pages to an interactive ubiquitous information system that leverages the wisdom of many users to provide knowledge on virtually all fields. Web 2.0 has indeed had a relevant impact on the evolution of the Web, whereas the contribution of Semantic Web, though presently less visible, is expected to have an influence at longer terms. However, the evolution of Web and the uptake of Web 2.0 and Semantic Web seem to have had, up to now, a limited impact on e-Inclusion and they have been mostly studied in relation to the impact on a sub-domain of e-Inclusion, that is, eAccessibility. In fact, in the literature that was examined during the authors’ work, very few specific references of Web 2.0 to e-Inclusion were found, with the exception of a number of works dealing with problems that Web 2.0 technology can bring to the accessibility of web sites (which, indeed, is an aspect of e-Inclusion) or some minor applications built to provide accessible interfaces to some Web 2.0 applications like YouTube4 and SlideShare5. The IBM Social Accessibility Project, though focused again on accessibility, seems to represent a valid example of a new perspective in that it exemplifies a class of applications that take advantage of the power of Web 2.0 and Semantic Web and could have a positive impact on e-Inclusion. Acknowledgements. The contribution of Pier Luigi Emiliani in the development of the ideas presented in the paper is warmly acknowledged.
References 1. Kelly, K.: Four stages in the internet of things (November 2007), http://www.kk.org/thetechnium/archives/2007/11/ four_stages_in.php (last visited on 2/27/2009) 4 5
See http://icant.co.uk/easy-youtube/, last visited on 2/26/2009 See http://icant.co.uk/easy-slideshare/about/index.html, last visited on 2/26/2009
Intelligence on the Web and e-Inclusion
649
2. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American 284(5), 34–43 (2001) 3. Kouroupetroglou, C., Salampasis, M., Manitsaris, A.: A semantic-web based framework for developing applications to improve accessibility in the WWW. In: W4A: Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A), pp. 98–108. ACM Press, New York (2006) 4. Harper, S., Bechhofer, S.: Semantic triage for increased web accessibility. IBM Systems Journal 44(3), 637–648 (2005) 5. Cooper, M.: Accessibility of emerging rich web technologies: Web 2.0 and the Semantic Web. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 93–98. ACM Press, New York (2007) 6. Heath, T., Motta, E.: Ease of interaction plus ease of integration: Combining web2.0 and the semantic web in a reviewing site. Web Semantics 6(1), 76–83 (2008) 7. Murugesan, S.: Understanding web 2.0. IT Professional 9(4), 34–41 (2007) 8. O’Reilly, T.: What is web 2.0? Design Patterns and Business Models for the Next Generation of Software (September 2005), http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/ what-is-web-20.html 9. Lin, K.J.: Building Web 2.0. Computer 40(5), 101–102 (2007) 10. Gruber, T.: Collective knowledge systems: Where the Social Web meets the Semantic Web. Journal of Web Semantics 6(1), 4–13 (2008) 11. Knights, M.: Web 2.0. Communications Engineer 5(1), 30–35 (2007) 12. Gibson, B.: Enabling an accessible Web 2.0. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 1–6. ACM Press, New York (2007) 13. Thiessen, P., Chen, C.: Ajax live regions: chat as a case example. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 7–14. ACM Press, New York (2007) 14. Antoniou, G., Van Harmelen, F.: A Semantic Web Primer, 2nd edn. MIT Press, Cambridge (2008) 15. Yesilada, Y., Harper, S.: Web 2.0 and the semantic web: hindrance or opportunity? In: W4a - international cross-disciplinary conference on web accessibility 2007. SIGACCESS Accessibility and Computing, vol. (90), pp. 19–31 (2008) 16. Seeman, L.: The semantic web, web accessibility, and device independence. In: Proceedings of the 2004 international Cross-Disciplinary Workshop on Web Accessibility (W4A), vol. 63, pp. 67–73. ACM Press, New York (2004)
Accelerated Algorithm for Silhouette Fur Generation Based on GPU Gang Yang1,2 and Xin-yuan Huang1 2
1 Beijing Forestry University, 100083, Beijing, China Institute of Software, Chinese Academy of Sciences, 100080, Beijing, China {yanggang,hxy}@bjfu.edu.cn
Abstract. In the method that represents fur with multi-layer textured slices, representing silhouette fur is a time consuming work, which requires silhouetteedge detection and fin slices generation. In the paper, we present an accelerated method for representing silhouette fur by taking advantage of the programmable ability of Graphic Process Units (GPU). In the method, by appending edge info on each vertex, the silhouette-edge detection can be implemented in GPU; and by storing fin slices data in video memory in preprocessing, the time spent on fin slices generation and on data transmission from CPU to GPU can be saved. Experimental results show that our method accelerates silhouette fur representation greatly, and hence improves the performance of rendering furry objects. Keywords: fur rendering; GPU; silhouette fur; multi-layer textured slices.
1 Introduction Representation of realistic fur is a hot research topic in computer graphics, and it is also a challenging problem due to the high complexity of furry surface. The most direct method for representing fur is using geometric primitives to represent each fur fiber, such as the approaches presented in [1][2][3][4]. But it is difficult to achieve high rendering performance with these explicit geometric approaches because the number of furs over object surface is always very large. In 1989, Kajiya et al [5] put forward the method of representing fur with volume texture, and achieved excellent rendering results. However, the rendering speed of volume texture is very slow. Meyer et al [6] presented the idea of using multi-layer two-dimensional textured slices to represent the effect of three-dimensional volume textures, and achieved interactive rendering performance. Based on Meyer’s idea, Lengyel et al [7][8] presented a real-time fur rendering method. In the method, the furry surface is represented as a series of concentric, semi-transparent shells, and alpha-blending these shells together would produce the visual furry effect. Lengyel’s technique has great application value as it can represent the realistic fur results in real-time. Furthermore, Yang et al [9] improved Lengyel’s method by using non-uniform layered slices to represent fur, which improved the efficiency and flexibility. But the multi-layer slices method is not very appropriate for representing the fur near object silhouettes. Near the silhouettes, multi-layer slices are seen at grazing C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 650–657, 2009. © Springer-Verlag Berlin Heidelberg 2009
Accelerated Algorithm for Silhouette Fur Generation Based on GPU
651
angles, hence the gaps between layers become evident and fur appears to be overly transparent. Lengyel added textured fins to overcome the overly transparent problem near silhouette regions. However, the detection of silhouette edges and the generation of fin geometry often consume a lot of time, affecting the rendering speed apparently. In the paper, we propose a GPU-based accelerated method for generating silhouette fins. In the method, by designing a certain kind of data structure for each vertex, the detection of silhouette edge and the generation of fins can be totally transferred into the GPU pipeline. Thanks to the strong computation capability of GPU, the silhouette fins generation are accelerated greatly, hence improved the fur rendering performance. In addition, with the method, the fin data transmission from CPU to GPU is also avoided, which improved the speed further. In the remainder of the paper, Section 2 first introduces the multi-layer slices method and its limitation; then, Section 3 discusses our GPU-based silhouette edge detection and fin generation approach in detail. The experimental results are given in Section 4, and summary is presented in Section 5.
2 Multi-layer Slices Method and Its Limitation Lengyel et al[8] and Yang et al [9] used multi-layer textured slices to represent fur. In their methods, the furry surface is represented as a series of concentric, textured shells. In preprocessing, they constructed a patch of geometry model of fur through particle systems, and sampled it into several layers of semi-transparent textures called shell textures. In the runtime rendering, multi-layer concentric shells are generated by shifting the original mesh outwards from the model surface, and each shell was mapped with corresponding layer of shell texture. Alpha-blending these textured shells from innermost to outmost would produce furry effects. Using the multi-layer slices method, the furry results can be produced in real-time. But the method is effective only when the viewing direction is approximately normal to the surface. Near silhouettes where the viewing direction is approximately parallel to the surface, the gaps between slice layers become evident, and the fur appears to be overly transparent. Lengyel’s method added textured quadrilateral slices called fins on silhouette edges to overcome the problem. These fin slices are normal to the surface, and mapped with the fin texture that is generated by sampling a tuft of fur. By adding these textured fin slices, the layer gaps are covered up, and good rendering results can be achieved. Nevertheless, the cost for generating fins is quite considerable. In order to generate fin slices, we must traverse and detect each edge on object surface, and only those edges that are detected lying in silhouette region will be equipped by fin slices. The silhouette edge detection and fin slices generation are processed in CPU, and these generated fin slices data must be transferred from CPU to GPU for rendering. The edge detection, fin generation and data transfer processes often occupy about 30% times of the whole rendering process of fur, and hence affect the performance evidently.
652
G. Yang and X.-y. Huang
3 GPU Based Silhouette fur Representation Object’s silhouette is the line on object’s surface that is tangent to current viewing direction. It is the division between object’s front face and back face. The silhouette region of object can be defined as the nearby region around silhouette line. In order to use fin slices, the silhouette edge detection will first be implement for each edge, and then the fin slices will be generated only on those edges that are detected lying in silhouette region. Silhouette edge detection is implemented by computing the dot production of viewing direction V and current edge’s normal N. If the absolute value of the dot production is less than certain threshold β, the current edge is considered to be lying in silhouette region. Viewing direction V is the vector directing from view point Eye to current edge’s midpoint Epos. Silhouette edge detection can be represented as formula: | V*•N* | < β, where V = Eye – Epos .
(1)
和
V* and N* represent the unit vector of V N respectively. The value of β is set by users. In our experiments, β is set as 0.2 for achieving satisfied rendering results. Every edge that is detected lying in silhouette region will be equipped with a quadrilateral fin slice. The fin slice is placed vertically on the edge, and slice height is the fur’s length on the edge. In Lengyel’s and Yang’s methods, the silhouette edge detection and fin slice generation are all implemented in CPU. In this paper, by adopting appropriate data transfer strategy, the silhouette edge detection and fin slice generation can be implemented in GPU’s vertex shader; in addition, we also utilize GPU’s pixel shader to implement the texture mapping and rendering computation of fin slices. Our method is discussed in detail as follows. 3.1 GPU Based Silhouette Edge Detection and Fin Generation In the last ten years, the GPU’s programming capability and its computing power have been improved greatly. GPU provides programming ability in its two models: vertex shader and pixel shader. In graphical processing pipeline, objects’ vertex data will first be send to vertex shader for processing, and people can design program in vertex shader to deal with these vertex data flexibly; then, the processed vertex data is rasterized to generate pixel data, and the pixel data are send to pixel shader for processing. Although GPU’s vertex shader has powerful ability in dealing with vertex data, we can’t directly use vertex shader to implement silhouette edge detection and fin generation for two reasons: (1) In order to generate fin slices, a series of new vertices must be generated for representing these new quadrilateral slices; but vertex shader still hasn’t the ability of generating new vertices. (2) The processing object of vertex shader is vertex, not edge, and there is no concept of “edge” in vertex shader, which means we can’t implement edge detection directly in vertex shader. To solve the two problems, we take the following strategy: generating fin slice for every edge in preprocess, and sending all the quadrilateral fin slices to vertex shader for processing when rendering. A sort of edge info is generated for each vertex of these quadrilateral
Accelerated Algorithm for Silhouette Fur Generation Based on GPU
653
slices, and by using the edge info, vertex shader can judge whether the vertex belongs to a silhouette fin slice, that is, a fin slice located in the silhouette region. As show in figure 1, the fin slice on edge e is formed by four vertices V1, V2, V3, and V4. For each vertex, besides the basic vertex information like position, normal, and fin texture coordinates, we append two edge info: edge normal Enormal and edge midpoint Epos. Enormal can be calculated as the sum of two adjacent polygons’ normal. For an object with edgeNum edges, all its fin slices will involve edgeNum*4 vertices, and all these vertices’ data will be sent to vertex shader for processing when rendering. In vertex shader, according to Enormal and Epos, we can judge if current vertex belongs to a silhouette fin slice by computing formula (1). If the vertex is detected to be in a silhouette fin, it will be preserved; if not, the vertex will be discarded by moving it out of the viewing frustum. The range of projected viewing frustum in z-axis is [-1, 1], so the vertex can be discarded by setting its z coordinates out of [-1, 1], for example, setting the vertex coordinates as (0, 0, -2). By the way, all the fin slices that don’t lie in silhouette range will be moved out of viewing frustum and hence be clipped before rasterization.
Fig. 1. Fin slice and edge info. The quadrilateral v0v1v2v3 is the fin slice on edge v0v1, N1 and N2 are the normals of two adjacent triangles, Enormal = N1 + N2, Epos is the mid-point of edge v0v1. The Enormal and Epos will be included in the vertex data of v0, v1, v2 and v3.
Essentially, in this strategy, we just generate all the fin slices in advance, and use the vertex shader to discard all the non-silhouette fins. By this way, we slide over the problem (1) mentioned above; and by introducing the edge info to each fin vertex, we solve the problem (2). As mentioned before, the fin slices that generated in advance have edgeNum*4 vertices; each vertex’s data involve some basic vertex information and additional edge info. It will be time consuming if we transmit all the edgeNum*4 vertices’ data to GPU when rendering each frame. Fortunately, these vertices’ data are unchanged in each frame, so we can transmit these data to GPU just once and store them in video memory for using in each frame. We utilize GPU’s feature “Vertex Buffer Object” to do this. Consequently, the fin slice data transmission can be avoided completely during the real-time rendering process. Therefore, in our method, CPU needn’t do any computation about silhouette edge detection. We just generate all the fin slice data and store these data in video memory in advance; and GPU will complete all the remaining work when rendering, like silhouette edge detection and vertex discarding.
654
G. Yang and X.-y. Huang
Compared to the fin representation method in [8] and [9], our method has higher performance. As shown in table 1, our method can improve the performance in processing fins by 10-15 times. 3.2 Silhouette Fin Rendering After the silhouette edge detection and vertex discarding in vertex shader, all the fin slices that lie in silhouette region are preserved and then will be rendered. We use pixel shader to implement the rendering computation of fin. When rendering fin, besides mapping fin texture, a tangent texture that records the tangent vector of fur also need to be mapped for computing the lighting of fur. As shown in figure 2, the left is fin texture, and the right is the corresponding tangent texture in which each pixel’s value is the tangent vector of fur in the pixel. Fin texture and tangent texture are both generated by sampling a tuft of fur in preprocessing. The fin texture used in our experiments is 512*128. When mapping the fin texture on a fin slice, we needn’t map the whole strip of texture; alternatively, we can just randomly select a segment of the fin texture for mapping, and the width of the selected segment is relative to the width of current fin slice. It is noted that the two ends of the fin texture in figure 2 are consecutive and can be linked together seamlessly; thus the rendering results of silhouette fur can have better continuity.
Fig. 2. Fin texture (left) and the corresponding tangent texture (right). In the tangent texture, the R, G and B components of each pixel’s color record the tangent vector of fur on current pixel.
When rendering fin slices in pixel shader, we first fetch fur’s color from fin texture, and fetch fur’s tangent vector from tangent texture; then the lighting computation of fur is implemented according to the formula 2 that is adopted in [8]: FurLighting = ka + kd * Chair * (1-(T · L)2)pd/2 + ks * (1-(T · H)2)ps/2 .
(2)
In formula 2, T is the tangent vector of fur; L is lighting direction; H is the half-vector between the viewing and lighting direction. Ka, kd and ks are the ambient, diffuse and specular colors; pd and ps are the diffuse and specular power. To maintain the coherence of rendering effect when representing silhouette fur, the opacity of fin slices will be multiplied by an adjusting factorα. *· * , where V* is the viewing direction, and N* is the normal of current edge. Both V* and N* are unit vectors. When a fin slice approach the silhouette line,αwill approach 1; contrarily,αwill lessening. Therefore the fin slices will gradually fade out from the silhouette line to the fringe of the silhouette region, and hence alleviate the color discrepancy between fin slices and the multi-layer fur slices, producing the smoother rendering results. Figure 3 gives some of our rendering results.
α=1-|V N |
Accelerated Algorithm for Silhouette Fur Generation Based on GPU
655
4 Experimental Results In our experiments, we compare the performance of our GPU-based method and the previous method. Three models are selected for experimenting (the rendering results of the three models are shown in Figure 3). We render each model in several different view points and compute its average performance for comparison. The experiments are processed on a PC installed with a P4 3.0GHz CPU, 1.0G Ram and a GeForce 6800GT graphics card with 256M video memory. Experimental data are listed in Table 1.
Fig. 3. Rendering results. In each row, the left is the fur rendering result without adding fin, in which the fur is overly transparent in silhouette region, and you can even see the model’s silhouette border through the fur; the middle is the result of only rendering silhouette fin; the right is the final rendering result by rendering fur and fin together.
656
G. Yang and X.-y. Huang Table 1. Comparison results
Model Polygon number / edge number Processing time of fin with our method (ms) Processing time of fin with previous method Improvement ratio of rendering fur Memory size of fin slice data (KB)
Torus 576 / 864 0.083 1.204 17% 148.5
Bunny 3065 / 4599 0.243 2.964 22% 790.5
Camel 4072 / 6112 0.296 3.525 24% 1050.5
The “processing time of fin” listed in the Row three and Row four of Table 1 is the total time for processing fin in each frame, including the time of silhouette edge detection, fin data transmission from CPU to GPU and fin rendering. It can be seen from the data that our method is faster 10-15 times than the previous method in processing fin. Row five of Table 1 gives the improvement ratio of the whole rendering process of fur by using our method. The improvement ratio is computed as: (FPSnew – FPSold) / FPSold, where FPSnew is the rendering rate by using our method to process fin, and FPSold is the rendering rate by using previous method. Comparing to the previous method, the only expense of our method is the larger video memory occupation. In our method, the data of all the fin slices will be stored in video memory, and in previous method, only the data of fin slices lying in the silhouette region need to be stored. Suppose the edge number of model is edgeNum, and then all the fin slices have edgeNum*4 vertices. For each vertex, the data that need to be stored in video memory include its position coordinates, fin texture coordinates, edge normal and the coordinates of edge midpoint. The coordinates of position or normal require three floating point data to store, and the texture coordinates require two to store. Therefore, the total memory requirement is edgeNum*4*(3*3+2) floating point data = edgeNum*44*4 bytes. The Row six of Table 1 gives the video memory occupation of fin data in our method. Relative to current video memory space, such a memory occupation can be received.
5 Conclusion This paper presents a GPU-based acceleration method for rendering silhouette fur. In the method, by producing edge info for each vertex, the silhouette edge detection can be implemented in GPU; by generating and storing all the fin slice data in video memory in preprocessing, the problem that vertex shader can’t be used to generate new vertices is slide over, and the time spent in fin data transmission from CPU to GPU is also saved. The method accelerates the silhouette fin detection and generation greatly, and improves the wholly performance of rendering furry objects. Besides used in silhouette fur rendering, silhouette edge detection is widely used in many other research issues of computer graphics, for example some rendering methods of NPR (non-photorealistic rendering). Our GPU-based silhouette edge detection approach can certainly be adopted in these issues for accelerating.
Accelerated Algorithm for Silhouette Fur Generation Based on GPU
657
Acknowledgments. This work was supported by the open grant of LCS, ISCAS (no. SYSKF0705), Natural Science Foundation of China (no. 60703006), and National HiTech 863 Program of China (no. 2006AA10Z232).
References 1. Miller, G.: From Wire-Frame to Furry Animals. In: Proceedings of Graphics Interface 1988, pp. 138–146. Lawrence Erlbaum Associates, Mahwah (1988) 2. LeBlanc, A., Turner, A., Thalmann, D.: Rendering hair using pixel blending and shadow buffers. J. Vis. Comput. Animat. 2(1), 92–97 (1991) 3. Watanabe, Y., Suenega, Y.: A trigonal prism-based method for hair image generation. IEEE Computer Graphics and Application 12(1), 47–53 (1992) 4. Berney, J., Redd, J.: Stuart Little: A Tale of Fur, Costumes, Performance, and Integration: Breathing Real Life Into a Digital Character. SIGGRAPH 2000 Course Note #14 (2000) 5. Kajiya, J.T., Kay, T.L.: Rendering Fur with Three Dimensional Textures. In: Computer Graphics Proceedings, ACM SIGGRAPH. Annual Conference Series, pp. 271–280. ACM Press, New York (1989) 6. Meyer, A., Neyret, F.: Interactive Volumetric Textures. In: Proceedings of Eurographics Workshop on Rendering 1998, pp. 157–168. Springer, Heidelberg (1998) 7. Lengyel, J.: Real-time fur. In: Proceedings of Eurographics Workshop on Rendering 2000, pp. 243–256. Springer, Vienna (2000) 8. Lengyel, J., Praun, E., Finkelstein, A., et al.: Real-Time Fur over Arbitrary Surfaces. In: Proceedings of ACM 2001 Symposium on Interactive 3D Graphics, pp. 227–232. ACM Press, New York (2001) 9. Yang, G., Sun, H.Q., Wang, W.C., Wu, E.H.: Interactive Fur Shaping and Rendering Using Non-Uniform Layered Textures. IEEE Computer Graphics and Applications 28(4), 85–93 (2008)
An Ortho-Rectification Method for Space-Borne SAR Image with Imaging Equation Xufei Gao, Xinyu Chen, and Ping Guo Image Processing and Pattern Recognition Laboratory, Beijing Normal University, 100875 Beijing, China [email protected], [email protected], [email protected]
Abstract. An ortho-rectification scheme for space-borne Synthetic Aperture Radar (SAR) image is investigated in this paper. It was usually achieved by indirect mapping between real SAR image pixels and the Digital Evaluation Model (DEM) grids. However, the precise orbit data cannot be easily obtained and using the Newton algorithm needs more calculation. In order to reduce the time consumed during iteration and further improving the accuracy of the SAR image, we propose a new ortho-rectification method with imaging equation. It removes the coordinate conversion by uniformly using the World Geodetic System 1984 (WGS-84). Moreover, the initial time of each DEM grid can be set according to the iteration result of its adjacent point. Compared to other methods, such as Collinearity Equation method, it costs less time and makes the SAR image more accurate. It is also much easier to be implemented in practice. Keywords: Ortho-rectification; Synthetic Aperture Radar; Digital Elevation Model; Imaging Equation.
1 Introduction The Synthetic Aperture Radar (SAR) system is becoming a more sophisticated and effective tool for continuously monitoring changes in various Earth’s surface processes with little dependencies [1][9][11][12]. Comparing with airborne SAR, space-borne SAR plays a significant role in remote sensing and microwave imaging, exploring ground from space, which is mainly used in the field of military [2][3][19]. The SAR digital image processing consists of two parts, one of which is to rectify distortion caused by altitude variations, velocity variations, altitude errors of platform and rotation of the earth and so on. Many SAR image rectification methods have been proposed, such as orthorectification of Radarsat fine using orbital parameters and DEM, SAR geometry and practical ortho-rectification of remote sensing images [5][10][13]. Other approaches emphasize interpolation of satellite orbit vector on the whole arc, calculation of corresponding pixel position of every three-dimension DEM point either, or both [16][18][20]. In these cases, however, the given orbital data are not always precise, and the SAR image is not necessarily rectified by coordinates’ conversion of inertial system if parameters of satellite are provided in Geodetic Coordinate System (GCS). C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 658–666, 2009. © Springer-Verlag Berlin Heidelberg 2009
An Ortho-Rectification Method for Space-borne SAR Image with Imaging Equation
659
It is even not appropriate to set the initial iteration time of each grid as zero, and using the Newton algorithm for DEM points needs more calculation as described in [20]. In this paper, we investigate an ortho-rectification scheme of space-borne SAR briefly and propose a new method with imaging equation. Experiments have been conducted on a SAR image of mountain area to verify the effectiveness of the proposed method. The paper is organized as follows: Section 2 shows the related work, the proposed method is discussed in Section 3, Section 4 presents experiment results, and Section 5 comes to conclusions.
Sensor path
P2 SAR
P1
Azimuth
Squint angle
R0 R
Nadir
R
Plane of zero Doppler X Ground range (after processing to zero Doppler)
Slant range (before processing) Slant range (after processing to zero Doppler) Target Beam footprint
Fig. 1. A simple model of SAR system with side-looking mode (from ref. [8])
2 Related Work The SAR adopts side-looking imaging mode, and the side-looking angle of SAR image is much larger than that of optical image as shown in Fig. 1. This mode leads to a great influence to geometric distortion of SAR image. Consequently, it is very important for SAR application to rectify such distortion and create ortho-image [4]. The Range-Doppler method is commonly used in ortho-rectification, which primarily discusses the relationship of image points and target points from the view of SAR imaging geometry and mostly lies on the accuracy of fundamental catalogue data. Based upon polynomial function, polynomial rectification is a comparatively traditional method, which assumes general distortion of remote sensing image as combination of several of basic and high distortion. As a result, it cannot reach the same satisfied results for different areas in SAR image.
660
X. Gao, X. Chen, and P. Guo
Collinearity equation method, considering the change of sensor’s exterior azimuthal elements firstly and that of terrain later, works with the simulation and calculation of sensor’s position and attitude and does not take fully SAR image side-looking projective characteristics into account. Moreover, it needs elevation information of Ground Control Points (GCP). Because of its mathematical properties, this method can be achieved by using Range-Doppler equations as well as highprecision orbit data. In fact, there are still two problems to be dealt with: one is the acquisition of precise orbit data, which has become a bottleneck of SAR image rectification; another is time consumption of algorithm due to Newton iteration. In order to overcome these two problems, we adopt the WGS-84, as shown in Fig. 2, in cooperating with the small time error between two DEM grid points.
Z WGS-84 Earth’s rotation BIH (1984.0) zero meridian plane Geocenter O Y WGS-84 CTP equator X WGS-84 Fig. 2. Conventional terrestrial system (from ref. [15])
3 Method 3.1 Problem Description Imaging equation refers to the mathematical relationship of coordinates between SAR image (i, j) and ground points (φ, λ, h) on which remote sensing image rectification for any type of sensors is based [17]. From one side, the inertial system is a space fixed coordinate system, in which most of ephemeris parameters are set up. From another side, the Earth's rotation is one of the influences to geometric distortion, especially for space-borne SAR images. As ground points are represented in GCS, and both vectors of position RT and velocity VT of the DEM grid do not change with time. Therefore, the coordinate conversion from inertial system to geodetic system is no longer required and thus more parameters are involved in ephemeris files, such as the number of orbit state vector and the vector itself increasing with time, including position RS and velocity VS.
An Ortho-Rectification Method for Space-borne SAR Image with Imaging Equation
661
The goal of ortho-rectification is to find azimuth and range direction position of SAR image parallel to latitude and longitude of the grid point and in turn to assign power value of the SAR image pixel to the point. The former is calculated through distance from point position in SAR image to the SAR. The latter can be obtained by irradiation time from the SAR phase centers to the range-line of the point [7]. In spite of the difficulties to get imaging equation of (i, j) and (φ, λ, h) and its explicit solution, a nonlinear one with regard to irradiation time is established between standard Doppler frequency and that of (i, j). For solving this equation, we take the iteration result of adjacent point enlightened by [20] as the initial time of next DEM grid. Details are discussed as follows. 3.2 Procedures Fig. 3 shows the procedure of our proposed ortho-rectification method, which is summarized as the following steps: Step 1: Calculation of RS and VS. A formula describing them on the whole orbit arc in the WGS-84 is formed with a number of discrete points on arc segments so as to calculate their values at time t given later [6][20]. Step 2: Determination of RT and VT. Given a selected ellipsoid, the values of RT and VT of each (φ, λ, h) from DEM database are determined through the conversion from Referential Ellipse's Earth Core Geodetic Datum to the WGS-84. Step 3: Evaluation of the relative iteration time based on UTC of the image’s start line. The iteration is started at t = 0 if the grid is a starting point; otherwise, t equals to the time when iteration of its adjacent point is ended. That is,
Is a starting point ⎧ 0, . t=⎨ otherwise ⎩t + σ t ,
(1)
The σt in Equation (1) is defined by
σt = −
f d − f de , f dr
(2)
where fd, fde and fd are expressed as the Doppler frequency, Doppler centroid frequency and Doppler frequency rate, respectively [14]. The iteration of current point terminates when |σt| is less than a predefined threshold, or t is turned to calculate RS, VS, RT, and VT. It should be pointed out that the threshold is determined by
ρ VS σ t < a , 10
(3)
where ρa denotes the azimuth resolution of satellite, indicating that the error of ρa resulting from σt should be less than one-tenth itself. Step 4: Calculation of the azimuth and range direction of SAR image. The position of azimuth and range direction of (i, j) is calculated by
R = RS − RT
,
(4)
662
X. Gao, X. Chen, and P. Guo
and
⎧ i = t * PRF / N , ⎨ ⎩ j = (R − R 0 ) / ρr
(5)
R is the distance from satellite to ground point, PRF is pulse repetition frequency, N is the number of sampling points of whole image’s azimuth direction, R0 is the first slant range, and ρr is the slant range resolution. Step 5: Traversal of all grid points. The program is not finished until all DEM grids are processed.
(ij, Ȝ, h) in DEM
Formula of RT and VT Satellite orbit data
Is Starting point Y
N
Formula of RS and VS
t=0
fd, fde and fdr
RS and VS at t N
ıt = (fd –fde) / fdr
VSıt < ȡa /10
t = t + ıt
Y Real SAR image
᧤i, j᧥
Gray Interpolation
Rectified SAR image
Fig. 3. Flowchart of otho-rectification for space-borne SAR image with the proposed method. The shadowed rectangles represent given data and the dotted lines mean an iterative process.
An Ortho-Rectification Method for Space-borne SAR Image with Imaging Equation
(a)
(b)
(c)
(d)
663
Fig. 4. An image used in the experiment and its results: (a) the original space-borne SAR image; (b) the rectified image by the proposed method; (c) the lower left-hand corner of the original image; and (d) the rectified image of (c).
4 Experiments To verify the feasibility and effectiveness of the proposed method, a space-borne SAR image with size of 14184×11384 and resolution of 10m is tested, which displays a mountain area of longitude 140.12°E and latitude 38.25°N, and it is from the descending-pass data of satellite. The evaluation rang is from -96~3581m, and the precision of data from the DEM database is 90m×90m. 4.1 Performances Analysis Intuitively, the normal ratio relation between front and back of slope of the SAR image is resumed and it is particularly obvious in biggish region shown in Fig. 4. The calculation is decreased dramatically as a result of removal of coordinates and evaluation of iteration time. By comparison, the proposed one saves about four-fifth of time, as listed in Table 1.
664
X. Gao, X. Chen, and P. Guo
It is also encouraging that the accuracy of rectified image is slightly improved with the orbital data computed in Step 1. Given n being the number of GCP, the Root Mean Square (RMS) error of all points is defined as
RMS =
1 n
n
∑ ( XR
2
i
+ YRi ) , 2
(6)
i =1
(X, Y) is the coordinate, XRi and YRi are the residual of X and Y, respectively. Studies have demonstrated that the more GCP are selected, the better the result is. And when the number of GCP reaches a certain value, it does not work any more. After 12 GCP and quadratic polynomial model are employed, the general RMS error is controlled within 0.6 pixels, which guarantees the image’s quality. Table 1. Comparison of time consumed during rectification. (The unit of time is minute.) Round
Range Doppler
Polynomial Function
Collinearity Equation
Proposed Method
1 2 3 4 5 6
18.45 17.69 19.27 17.83 18.15 16.92
18.49 19.26 20.11 20.37 19.76 18.94
20.94 21.07 20.73 21.75 21.36 22.08
4.87 4.96 5.01 4.73 4.39 4.15
Average
18.05
19.49
21.32
4.68
4.2 Program Discussion
The program written by C++ is executed on a PC with CPU of Pentium (R) IV 3.00 GHz and 448MB RAM. Due to the large size of data files, memory status becomes a very important matter. To keep the computational complexity as low as possible, the whole DEM data is partitioned in terms of their relative independence between each two grids. In addition, multi-threaded techniques are used and the execution time is somewhat reduced when run on server. Note that the number of threads can be set optionally with reference to both PC performance and the number of DEM partition from a configuration file.
5 Conclusions In this paper, an ortho-rectification method for space-born SAR is proposed, which mainly focuses on uniformly using the WGS-84 and flexibly choosing starting iteration time together. Experiments show that the new method needs less calculation time, results in desirable accuracy of image, and is much easier to be implemented in practice. However, the SAR image with higher quality needs to be obtained if the inversion or reconstruction of satellite orbit is precisely carried out. This issue will be studied in the future.
An Ortho-Rectification Method for Space-borne SAR Image with Imaging Equation
665
Acknowledgements. The research work described in this paper was fully supported by the grants from the National Natural Science Foundation of China (Project Nos. 60675011, 90820010).
References 1. Dierking, W., Dall, J.: Sea Ice Deformation State from Synthetic Aperture Radar Imagery Part II: Effects of Spatial Resolution and Noise Level. IEEE Trans. on Geoscience and Remote Sensing, 2197–2207 (2008) 2. Evans, D., Apel, J., Arvidson, R., Bindschadler, R., Carsey, F.: Space-borne Synthetic Aperture Radar: Current Status and Future Directions. NASA Technical Memorandum 4679, p. 171 (1995) 3. Fornaro, G., Lombardini, F., Pardini, M., Serafino, F., Soldovieri, F., Costantini, M.: Space-borne Multi-dimensional SAR Imaging: Current Status and Perspectives. In: IEEE International Geoscience and Remote Sensing Symposium, pp. 5277–5280 (2008) 4. Huang, G., Guo, J., Lv, J., Xiao, Z., Zhao, Z., Qiu, C.: Algorithms and Experiment on SAR Image Ortho-rectification Based on Polynomial rectification and Height Displacement Correction. In: Proceedings of the 20th ISPRS Congress, Istanbul, vol. 35, pp. 139–143 (2004) 5. Ibrahim, M., Ahmad, S.: Ortho-rectification of Stereo Spot Panchromatic and Radarsat Fine Mode Data using Orbital Parameters and Digital Elevation Model. In: Proceedings of GISdevelopment, ACRS 2000, Digital Photogrammetry (2000) 6. Li, J.: Orbit Determination of Spacecraft. National Defense Industry Press, Beijing (2003) (in Chinese) 7. Liu, L., Zhou, Y.: A Geometric Correction Technique for Space-borne SAR Image on System Level. J. Radar Science and Technology 2, 20–24 (2004) (in Chinese) 8. Dastgir, N.: Processing SAR Data using Range Doppler and Chirp Scaling Algorithms. Master’s of Science Thesis in Geodesy Report No. 3096, TRITA-GIT 07-005 (2007) 9. Nakamura, K., Wakabayashi, H., Doi, K., Shibuya, K.: Ice Flow Estimation of Shirase Glacier by Using JERS-1/SAR Image Correlation. In: IEEE International Geoscience and Remote Sensing Symposium, pp. 4213–4216 (2007) 10. Pierce, L., Kellndorfer, J., Ulaby, F., Norikane, L.: Practical SAR Ortho-rectification. In: Geoscience and Remote Sensing Symposium, IGARSS 1996, vol. 4, pp. 2329–2331 (1996) 11. Sandia National Laboratories Synthetic Aperture Radar Homepage, http://www.sandia.gov/radar/sar.html 12. Shi, L., Ivanov, A.Y., He, M., Zhao, C.: Oil Spill Mapping in the Western Part of the East China Sea Using Synthetic Aperture Radar Imagery. J. Remote Sensing 29, 6315–6329 (2008) 13. Toutin, T.: Geometric Processing of Remote Sensing Images: Models, Algorithms and Methods. J. Remote Sensing 25, 1893–1924 (2004) 14. Wen, Z., Zhou, Y., Chen, J.: Accurate Method to Calculate Space-borne SAR Doppler Parameter. J. Beijing University of Aeronautics and Astronautics 32, 1418–1421 (2006) (in Chinese) 15. Wikimedia Commons WGS-84.svg file, http://commons.wikimedia.org/wiki/File:WGS-84.svg
666
X. Gao, X. Chen, and P. Guo
16. Xu, L., Yang, W., Pu, G.: Ortho-rectification of Satellite SAR Image in Mountain Area by DEM. J. Computing Techniques for Geophysical and Geochemical Exploration 26, 145– 148 (2004) (in Chinese) 17. Zhang, Y.: Remote Sensing Image Information System. The Science Press, Beijing (2000) (in Chinese) 18. Zhang, P., Huang, J., Guo, C., Xu, J.: The Disposing Method of DEM for the Simulation Imaging of SAR. J. Projectiles, Rockets, Missiles and Guidance 27, 347–350 (2007) (in Chinese) 19. Zhang, S., Long, T., Zeng, T., Ding, Z.: Space-borne Synthetic Aperture Radar Received Data Simulation Based on Airborne SAR Image Data. J. Advances in Space Research 41, 1818–1821 (2008) 20. Zhang, Y., Lin, Z., Zhang, J., Gan, M.: Geometric Rectification of SAR Image. J. Acta Geodaetica et Cartographica Sinica 31, 134–138 (2002) (in Chinese)
Robust Active Appearance Model Based Upon Multi-linear Analysis against Illumination Variation Gyeong-Sic Jo, Hyeon-Joon Moon, and Yong-Guk Kim School of Computer Engineering, Sejong University, Seoul, Korea [email protected]
Abstract. Independent Active Appearance Model (AAM) has been widely used in face recognition, facial expression recognition, and iris recognition because of its good performance. It can also be used in real-time system application since its fitting speed is very fast. When the difference between the input image and the base appearance of AAM is small, the fitting is smooth. However, when the difference can be large because of illumination and/or pose variation in the input image, the fitting result is unsatisfactory. In this paper, we propose a robust AAM using multi-linear analysis, which can make an Eigen-mode within the tensor algebra framework. The Eigen-mode can represent the principal axes of variation across the order of tensor and it can apply to AAM for increasing robustness. In order to construct both of original AAM and the present AAM, we employ YALE data base, which consists of 10 subjects, 9 poses, and 64 Illumination variations. The advantage of YALE data base is that we can use the coordinate of landmarks, which are marked for train-set, with ground truth. Because when the subject and the pose were same, the location of face isalso same. We present how we construct the AAM and results show that the proposed AAM outperforms the original AAM. Keywords: AAM, YALE data base, Multi-linear Analysis, Eigen-mode, Tensor.
1 Introduction The Active Appearance Model (AAM) is a non-linear, generative, and parametric model for the certain visual phenomenon. And it is used for face modeling frequently as well as for other object modeling. The AAM is proposed in [1] firstly, and then improved in [2], which model shape and appearance separately. The AAM is computed by train-set, which consists of pair of images and land marks, which is marked manually by hand. Generally the AAM fitting has performed successfully when error rate between input image and base appearance of the AAM is low. However, when error rate becomes high when illumination and 3D pose are changing, and its fitting result is unsatisfactory. In this paper, we propose a new AAM which contains Eigen-mode based upon multi-linear analysis. The multi-linear analysis is extension of Singular Value Decomposition(SVD) or PCA, and offers a unifying mathematical framework C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 667–673, 2009. © Springer-Verlag Berlin Heidelberg 2009
668
G.-S. Jo, H.-J. Moon, and Y.-G. Kim
suitable for addressing a variety of computer vision problems. The multi-linear analysis builds subspaces of orders of the tensor and a core tensor. The advantage of multilinear analysis is that the core tensor can transform subspace into Eigen-mode, which represent the principal axes of variation across the various mode (people, pose, illumination, and etc)[9]. In contrast, PCA basis vectors represent only the principal axes of variation across images. In other words, Eigen-mode covers the variation of each mode but PCA vectors cover all variations. We can build the AAM which includes not only the principal axes of variation across images but also variation across the various modes. To include the variation across the various modes, the AAM can contain the variations for several modes. This paper is organized as follow. Section 2 and 3 explain AAM and multi-linear analysis. Then, in section 4, we describe the method how to apply multi-linear analysis to AAM. Finally, we are going to show our experimental results and summarize our work in sections 5.
2 Independent Active Appearance Models Independent AAM models the shape and appearance separately [2]. The Shape of AAMs is defined by a mesh located at a particular the vertex location. Because AAM allows a linear shape variation, we can define the shape as follow: .
(1)
indicate the shape parameters. indicates a In equation (1), the coefficients base shape, and represent shape vectors. The shape vectors can be obtained by applying PCA to train-set after using Procrustes analysis in order to normalize the landmarks. The appearance of AAMs is defined within the base shape . This mean that pixels in image lie inside the base shape . AAMs allow a linear appearance variation. Therefore we can define the appearance as follow: .
(2)
Where λi indicate the appearance parameters, represent the appearance vectors, and is a base appearance. After finding both the shape parameters and the appearance parameters, the AAMs instance can be generated by locating each pixel of appearance to the inner side of the current shape with piecewise affine warp. A model instance can be expressed as equation (3): ;
A
The parameters of both shape and appearance are obtained by a fitting algorithm.
(3)
Robust Active Appearance Model Based Upon Multi-linear Analysis
669
Fig. 1. Unfolding a 3rd – order tensor of dimension 3ⅹ4ⅹ5
3 Multi-linear Analysis 3.1 Tensor Algebra Multi-linear analysis is based on higher-order tensor. The tensor, well-known as nway array or multidimensional matrix or n-mode matrix, is a higher order generalization of a vector and matrix. A higher order tensor N is could be given by … … . Therefore the order of vector, matrix, and tensor is 1st, nd th 2 , and N , respectively. In order to manipulate the tensor easily, we should unfold , ,… ,… the tensor to matrix A by stacking its mode-n vectors to column of the matrix. Figure.1 shows the unfolding process. … by a matrix The mode-n product of a higher order tensor … … M is a tensor , which can be denoted by M, and its entries are computed by M
…
…
∑
…
…
.
(4)
This mode-n product of tensor and matrix can be represented in terms of unfolded matrices, B
MA
.
(5)
3.2 Tensor Decomposing In order to decompose the tensor, we employee Higher Order Singular Value Decomposition (HOSVD). HOSVD is an extension of SVD that expresses the tensor as the mode-n product of N-orthogonal spaces
670
G.-S. Jo, H.-J. Moon, and Y.-G. Kim
U
…
U …
U .
(6)
In equation (6), U are mode matrix that contains the orthonormal vectors spanning the column space of the matrix D which is result of unfolding tensor . Tensor , called the core tensor, is analogous to the diagonal singular value matrix of conventional matrix SVD, but it is does not have a diagonal structure. The core tensor is in general a full tensor. The core tensor governs the interaction between the mode matrix U , where is 1, 2, …, N. Procedure of tensor decomposition using HOSVD can be expressed as follows • •
and set up matrix U with left
Compute the SVD of unfolded matrix D singular matrix of SVD. Solve for the core tensor as follows U
…
U …
T .
(7)
4 Applying Multi-linear Analysis to AAMs In section 2 and 3, we described AAMs and multi-linear analysis. Now, we explain how to apply multi-linear analysis to AAMs. Since, in independent AAMs, the appearance vectors of AAMs influence the fitting result poorly and the shape of AAMs is not influenced by changing illumination, we consider identify and pose for AAMs. To build AAMs using multi-linear analysis, we construct a third-order tensor to represent identity, poses, and features. Using HOSVD, we can decompose the tensor into three factors as follows U
U
U ,
(8)
where is the core tensor that governs the interaction among the three mode matrices(U , U , and U ). Using core tensor and mode matrix U , we can build eigen-mode as U .
(9)
Since the AAM add Eigen-mode, we rewrite equation (1) as follow: ,
(10)
where are parameters of Eigen-mode. The advantage of the AAMs based upon Eigen-mode is that the shape is stable under higher error rate between base appearance and input image, which can be happened by changing illumination and pose because each Eigen-mode considers only each mode, not all train-set. Figure 2 compares the fitting results between the present AAM and the traditional AAM.
Robust Active Appearance Model Based Upon Multi-linear Analysis
671
Fig. 2. The fitting results of the present AAM(bottom) and traditional AAM(top).
In Figure 2, the shape of traditional AAM(top) is not able to cover the darker region with the face. On the other hand, the shape of the present AAM(bottom) is covering the darker region very well with the face.
5 Experiments and Evaluation We employ YALE face data base B[8], which is consisted of 10 subjects, 9 poses, and 64 Illuminations, for AAM training and experiment. In YALE face data base, when the subject and the pose are the same, the location of face is also same although there is changing illumination. This property allows that we use landmarks, marked for train-set, with ground truth, because the coordinates of landmarks is not changed in a category which is the same for the subject and the pose. In order to build both of AAMs, we have constructed train-set which consists of images in 9 subjects, 9 poses, and 1 illumination and meshes made by marking 64 landmarks on each image. Ground truth was established by meshes for train-set and images have deferent subject and pose. Experiments were divided into two evaluations: one was a test about how speedily each AAMs ran fitting algorithm, and another was an evaluation about how correctly each model fitted for image. 5.1 Efficiency Comparison The fitting speed of AAMs is important for applying AAM to real-time system. We have compared the fitting speed of both of AAMs, which is performed based on Quad core computer with CPU 2.4GHz and RAM 2GB. The fitting algorithm was run for 5 iterations. We measured the spent time for running fitting algorithm per iteration and all iteration. The traditional AAM used 11 parameters (4 global transform parameters and 7 local transform parameters), and our AAM used 18 parameters (4 global transform parameters, 7 local transform parameters, and 7 mode transform parameters).
672
G.-S. Jo, H.-J. Moon, and Y.-G. Kim Table 1. the speed of fitting algorithm for both AAMs 1
2
3
4
5
Avg.
Traditional AAM
7ms
6ms
6ms
6ms
7ms
6.4ms
Present AAM
7ms
7ms
7ms
7ms
7ms
7ms
Table 1 illustrates that the elapsed times are similar, although our AAM used more parameters than traditional AAM. 5.2 Robustness Experiments We have evaluated about how our AAM correctly fits for images under higher error rate between base appearance and input image. Our evaluation procedure can be expressed as follows: • Dividing images into 4 categories. Each category consists of images which have the average pixel errors 40~49, 50~59, 60~69, and 70~79, respectively. • Fitting for images, and then we measure coordinate errors between the ground truth and the fitted shape, per iteration.
Fig. 3. Fitting error of both of AAMs. Each graph represents shape error per iteration under pixels error 40~49(top left), 50~59(top right), 60~69(bottom left), and 70~79(bottom right).
Robust Active Appearance Model Based Upon Multi-linear Analysis
673
We employed L1 norm for measuring coordinate errors. In Figure 3, each graph represents the fitting error per iteration for categories. When average pixels error is increasing, the fitting error of traditional AAMs is also increasing, but our AAM is not increasing the fitting error.
6 Conclusion In this paper, we proposed a AAM based upon Eigen-mode. In order to establish that AAM, we have built the Eigen-mode using multi-linear analysis, that employs HOSVD to decompose the tensor. We have shown that the present AAM has ability to fit for image speedily, even though parameters are increased. And it can fit for image under higher error rate. Since the present AAM is fast in fitting diverse images, it could be applied to any real-time systems. We plan to apply out AAM to real-time system to recognize face and facial expression tasks.
Acknowledgement This work was supported by the Seoul R&BD Program (10581).
References 1. Edwards, G.J., Taylor, C.J., Cootes, T.F.: Interpreting Face Images Using Active Appearance Models. In: Proc. International Conference on Automatic Face and Gesture Recognition, pp. 300–305 (June 1998) 2. Matthews, I., Baker, S.: Active Appearance Models revisited. International Journal of Computer Vision, 135–164 (2004) 3. Cootes, T., Edwards, G., Taylor, C.: Active appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001) 4. Gross, R., Matthews, I., Baker, S.: Constructing and fitting active appearance models with occlusion. In: Proceedings of the IEEE Workshop on face processing in Video (June 2004) 5. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Congitive Neuroscienc 3(1), 71–86 (1991) 6. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM Journal of Matrix Analysis and Applications 21(4) (2000) 7. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear analysis of image ensembles: TensorFaces. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 447–460. Springer, Heidelberg (2002) 8. Georghiades, A.S., Belhumeur, P.N., Kriegman, K.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
Modeling and Simulation of Human Interaction Based on Mutual Beliefs Taro Kanno, Atsushi Watanabe, and Kazuo Furuta 7-3-1 Hongo, Bunkyo-ku, Tokyo,113-8656, Japan {kanno,watanabe,furuta}@sys.t.u-tokyo.ac.jp
Abstract. This paper presents the modeling and simulation of human-human interaction based on a concept of mutual beliefs, aiming to describe and investigate the cognitive mechanism behind human interactions that is a crucial factor for system design and assessment. The proposed model captures four important aspects of human interactions: beliefs structure, mental states and cognitive components, cognitive and belief inference processes, and metacognitive manipulations. This model was implemented with a Bayesian belief network and some test simulations were carried out. Results showed that some basic qualitative characteristics of human interactions as well as the effectiveness of mutual beliefs could be well simulated. The paper concludes by discussing the possibility of the application of this model and simulation to universal access and HCI design and assessment. Keywords: Human Modeling, Team Cognition, Interaction, Sharedness, Mutual Beliefs, Agent Simulation, Design and Assessment.
1 Introduction Although receiving relatively little attention, one of the important issues in the studies of human-computer interaction and universal access is that of cognitive factors specific to the interaction among plural persons through computers and IT systems as well as that of a team as a whole with computers or IT systems. CSCW is one of the research fields studying such interaction through computer systems. The field, however, has heretofore focused mainly on how corroborative activities can be supported or mediated by means of computer systems [1], while paying less attention to the cognitive mechanism behind cooperative activities. One of the reasons for this human-centered, but not “humans-centered”, approach in HCI or UA studies seems to be the lack of a sound theory and user modeling that describes the mechanism behind human cooperation in terms of cognitive models. With such a cognitive user model for cooperation, various cognitive simulations similar to those of individual user’s cognition by such as ACT-R and SOAR will be possible [2,3], resulting in a further developed understanding of cognitive factors in cooperation. This paper presents the modeling and simulation of human-human interaction based on a concept of mutual belief. In Section 2, the details of the model are introduced, and in Section 3, the model’s implementation with a Bayesian belief network is explained. In Section 4, the results of some test simulations as well as the simulation architecture is explained. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 674–683, 2009. © Springer-Verlag Berlin Heidelberg 2009
Modeling and Simulation of Human Interaction Based on Mutual Beliefs
675
Section 5 concludes the paper by discussing the possible application of this model and simulation to universal access and HCI design and assessment.
2 Model of Team Cognition In our previous studies [4], we proposed a model of team cognition. The model was intended to describe inter-personal and intra-team factors in cognition in terms of beliefs about a partner’s cognition as well as one’s own cognition. The model consists of a set of three layers of mental components (both cognitive processes and beliefs) and their interactions. In a dyadic case (A and B), the model is composed of: a) Ma = A’s cognition, Mb = B’s cognition (individual cognition except beliefs b) Ma’= A’s belief about Mb, Mb’ = B’s belief about Ma (belief in another member’s cognition) c) Ma’’= A’s belief about Mb’, Mb’’ = B’s belief about Ma’ (belief in another member’s belief) Fig. 1 shows a schematic of this model, depicting three aspects of team cognition: belief structure, mental components, and the inter- and intra-personal interactions of these mental components. Details of each aspect are explained below.
Fig. 1. Team Cognition
2.1 Belief Structure The ability to infer or simulate the minds of others, that is, to obtain the beliefs of others, is believed to be innate and essential to human-beings [5]. It is necessary therefore to consider this aspect when modeling team cognition. We model team cognition with a structure of mutual belief based on the philosophical study of both team and collective intention [6]. Mutual belief is a set of beliefs hierarchically justifiable, such as in the above condition (b) and (c). Although theoretically mutual beliefs
676
T. Kanno, A. Watanabe, and K. Furuta
continue infinitely, empirically two or three are sufficient for actual cooperation. Most of the related theories have referred to the importance of the ability to infer and simulate the intentions of others in cooperative activities, while less attention has been paid to the function of beliefs in the third layer (belief in the beliefs of others). There is a high possibility that the third layer has a function in detecting and explaining, as well as recovering from, conflicts among team members. 2.2 Mental Components People can infer, simulate, feel, and share various aspects of mentality, including cognitive processes, mental states, knowledge, attitudes, and emotions. We refer to these as mental components in this paper. The circles of each layer in Fig. 1 represent mental components. If we can infer that we share some mental components or constructs with others, then they can be mapped onto one’s belief structure. For example, when one person gets angry (A’s first layer), then another person can easily understand or feel that anger (B’s second layer), and at the same time the person who has gotten angry can also infer or expect how the other person perceives their emotion (A’s third layer). Recent work has provided a listing of typical mental components identified by qualitative meta-analysis of recent HCI conference papers [7]. The mechanism and relations among the mental components in a single layer correspond to a model of individual cognition. It is therefore possible to incorporate such a model into each layer of the model shown in Fig.2. 2.3 Process and Manipulation The status of team cognition can be determined by the combination of its process and the status of its mental components. This combination is a key issue in understanding how a team member obtains and updates mental status for establishing and maintaining team cooperation. Communication and the observation of the behavior of partners are the main methods of human interaction. Much research on team cognition has analyzed such observable data to evaluate the efficiency and effectiveness of team cooperation. It is, however, obvious that the analysis of such phenotype interactions cannot directly explain or describe the mechanism behind cooperation because such observable behaviors are the results of the process of team cognition and not that of the reasoning involved in such a process. Indeed, there is another type of interaction involved in team cognition: intra-personal manipulation of mental components such as logical inferences, projections, and prototypes, including beliefs, in a single layer or between different layers,. Based on the status of one’s own mental components and the interrelations among the different layers, a person takes action (i.e., observing or communicating with others) to modify their own mental components as well as proactively influence those of their partner. Note that this type of interaction can be the sole reason for proactive interaction in team cooperation, thus providing a genotype of communication and behaviors. 2.4 Interaction Genotype Fig. 2 illustrates the relations between such observable behaviors and the mechanism and process behind them in communication. The upper two levels correspond to what
Modeling and Simulation of Human Interaction Based on Mutual Beliefs
677
is talked about and its function, which can be analyzed from verbal protocols. Conventional protocol analysis has dealt with these aspects by analyzing the transcripts of verbal protocols. The lower two levels correspond to the drive or reasoning behind such observable interactions. We call the former type of communication “phenotype” and the latter “genotype” using an analogy of the categorization of human errors [8]. Table 1. Interaction Genotypes
Category 1. To drive and modify the process of each single layer (cognition and beliefs) 2. To help partner drive their process (update their cognition and beliefs)
3. To modify partner’s cognition and beliefs
Genotype Code(Reason/Objective) - Lack of necessary/adequate information or knowledge of mental components - Lack of confidence in beliefs
Phenotype (Performative) Query Confirm
-
Inform
-
-
Belief in the lack of necessary/adequate information about mental components To avoid conflicts Look-ahead Just for sharedness To avoid and recover from conflict in the status of mental components Correct misunderstandings
Fig. 2. Phenotype and Genotype of Interaction
Inform Query Confirm
678
T. Kanno, A. Watanabe, and K. Furuta
To elicit such interaction genotypes in team cognition, we conducted a qualitative analysis of several kinds of data obtained in team tasks: verbal protocols, post experiment interviews, and descriptions of the team behaviors by observers. The results of the inner manipulation of mental components and the genotype of interactions obtained to date are listed in Table 1. The second column shows a code in the data which includes reasons for verbal communication (phenotype) and observable behavior. The left column shows the code categories.
3 Simulation Model This section describes how the conceptual model was converted into a computational model. 3.1 Cognitive and Inference Process and Cognitive Status To simulate the non-monotonic human reasoning/inference process based on an uncertain and limited amount of information, a Bayesian Belief Network (BBN) was adopted for the representation of such a process in each layer of the team cognition model. BBNs are probabilistic graphical models consisting of nodes and links. A node for the team cognition model represents a type of cognitive status, such as situation awareness, and the probability of each node represents the degree of belief in the occurrence of the event. A link represents a causal relationship between two different nodes and a conditional probability is assigned to it. The team cognition model can be implemented with six BBNs (three layers * two persons) and the interactions among them. The cognitive task for the simulation performed in this study was to cooperatively achieve situation awareness. Specifically, a two-person team first obtained information from the environment or a partner and updated the probability of the corresponding nodes, and then all the probabilities of the entire BBN were calculated. In the simulation, conscious awareness of the occurrence of events was defined by Equation 1. U represents a set of the nodes of which the person is aware. Pi represents the occurrence probability of Node i. T is the threshold of the probability of becoming aware of the occurrence of events.
U =
∑ {i | P
i
≥ T}
(1)
i
3.2 Interaction between Different Layers It is reasonable to suppose that there are unconscious or subconscious interactions between different layers, for example, between own cognitive processes and the processes used in inferring a partner’s cognitive status. In a previous study, some evidence for this interaction was observed [9], for example people sometimes tended to believe without evidence that a partner might see the same information as they saw. In a computational model, this is represented as the manipulation of the probabilities of the corresponding two nodes between different layers. Two interaction effects, defined by Equations 2 and 3, were implemented in the present study. in Equation2
α
Modeling and Simulation of Human Interaction Based on Mutual Beliefs
679
represents the effect of one’s own cognition on the belief layers, while βin Equation 3 represents the effect of the partner’s belief in their own cognition. Pi = αP1
(2)
P1 = βP2
(3)
3.3 Communication Generation As shown in Table 1, three types of interaction genotype have been obtained to date. In the following simulations, only the third one was implemented in the computational model. The rules derived from this genotype are defined by Equations 4 and 5. If Ua1 ≠Ua2 and If Ua2 is believed to be false then Inform (Ua1) to Modify (U1b). If Ua1 is believed to be false then Correct (Ua1) based on Ua2. If Ua1 ≠Ua3 then Inform(U1) to Modify(U2b).
(4) (5)
4 Simulation The process of obtaining shared situation awareness between agents A and B was simulated. Each agent has its own three layers of BBN. By the combination of these six BBNs, the distribution of knowledge, or heterogeneity of agents, can be represented. An example of the BBNs is shown in Fig. 3. The algorithm of the simulation is illustrated in Fig.4. The left upper nodes are those possessed only by Agent A, while the right-most node is the representative node for the events that Agent A cannot perceive but Agent B can.
Fig. 3. Agent A’s 1st Layer
680
T. Kanno, A. Watanabe, and K. Furuta
Fig. 4. Overview of the Simulation
This is a scenario-based simulation in which each agent obtains information from the environment sequentially based on the scenario and in which all occurrence probabilities are updated following the process shown in Fig.4. 4.1 Agent Characteristics The characteristics of an agent can be defined by its tendency in deciding the correct nodes between the 1st and 2nd layer, that is, the extent to which the agent has self confidence on their own cognition. The four characteristics shown in Table 2 were defined and implemented for the following simulation. Table 2. Agent Characteristics
Type 1 2 3 4
Character Strong self-confidence Following blindly Balanced Balanced (2)
Description Believe one‘s own cognition (U1 is correct) Follow one’s partner’s cognition (U2 is correct) Decide the one with more detailed knoweldge is correct Characteristics 3 without the third layer
4.2 Evaluation Criteria To assess the performance of the cooperation between the two agents, accuracy and sharedness, defined by Equations 6 and 7, respectively, were introduced. In Equations 6 and 7, U0 refers to the correct set of nodes that actually occurred in the scenario. Accuracy measures how correctly the team of Agent A and B is aware of the events that actually occur. The first term in sharedness is the completeness of the belief in the partner’s cognition (1st layer), while the second term represents the accuracy of the belief in the partner’s cognition (1st layer) [10, 11]. (6)
Modeling and Simulation of Human Interaction Based on Mutual Beliefs
681
(7)
5 Results and Discussion Simulation was conducted with the different agent characteristics combinations. The tested combinations are shown in Table 3, and comparisons of the accuracy and sharedness of each team are shown in Fig.5. 40 trials for each team condition were conducted. The results show that Team A received the lowest score for both accuracy and sharedness. It was observed from the communication log that each agent insisted on their correctness and did not complement their own cognitions with their partner’s. Team B scored the highest for sharedness but not accuracy because the members were strongly mutually dependent on their partners and did not take advantage of the merit of distributed knowledge. Teams C and D exhibited good performance for both accuracy and sharedness. It was also found from the comparison between Teams C and D that activation of the third layer (beliefs about beliefs) was effective on team performance. From the communication log, it was found that feedback (acknowledgement) to the speaker made communication more efficient and effective in Team D. This matches the concept of closed loop communication believed to be one of the important team competencies [12]. Table 3. Combinations of Agent Characteristics
Team A B C D
Agent A 1 2 4 3
Fig. 5. Accuracy and Sharedness Results
Agent B 1 2 4 3
682
T. Kanno, A. Watanabe, and K. Furuta
6 Conclusion This paper introduced a model for the simulation of human cooperative activities based on a concept of mutual belief. One of the characteristics of this model is the capturing of the mechanism behind cooperation not in terms of team function or macrocognition [13,14] but a cognitive user model (process and status). Another important characteristic is that the model separates metacognitive processes for cooperation (vertical) from cognitive/inference processes (horizontal). The model therefore can be used for almost all types of cognitive user models including Card’s information processing model [15], Norman’s model[16], and Simplex2[7], when applying them to the cognitive aspects of human cooperation. The simulation results showed that some basic qualitative characteristics of human cooperation were simulated, suggesting in particular that consideration of what one’s partner is thinking about oneself (activation of the third layer) is effective for good team performance. Although further testing under various conditions to assess the validity of this model is necessary, the current results show the potential of our simulation to provide a testbed environment for human cooperation that otherwise would be difficult to prepare using laboratory experiments or filed tests. This type of simulation also could be utilized for the design and assessment of HCI and UA for cooperation and collaboration, such as in the assessment of usability and accessibility through the simulation of the sharing processes of certain mental aspects or components.
References 1. Wilson, P.: Computer Supported Cooperative Work: An Introduction. Kluwer Academic Publishers, Dordrecht (1991) 2. Anderson, J.R., Lebiere, C.: The Atomic Components of Thought. Erlbaum, Mahwah (1998) 3. Newell: Unified Theories of Cognition. Harvard University Press (1990) 4. Kanno, T., Furuta, K.: Sharing Awareness, Intention, and Belief. In: Proc. 2nd Int. Conf. Augmented Cognition, pp. 230–235 (2006) 5. Baron-Cohen, S.: Mindblindnes. The MIT Press, Cambridge (1997) 6. Tuomela, R., Miller, K.: We-intentions. Philosophical Studies 53, 367–389 (1987) 7. Adams, R.: Decision and stress: cognition and e-accessibility in the information workplace. Univ. Access Inf. Soc. 5, 363–379 (2007) 8. Hollnagel, E.: The phenotype of erroneous actions: Implications for HCI design. In: Weir, G., Alty, J. (eds.) Human-Computer Interaction and Complex Systems. Academic Press, London (1990) 9. Kitahara, Y., Hope, T., Kanno, T., Furuta, K.: Developing an understanding of genotypes in studies of shared intention. In: Proc. 2nd Int. Conf. Applied Human Factors and Ergonomics, CD-ROM (2008) 10. Kanno, T.: The Notion of Sharedness based on Mutual Belief. In: Proc. 12th. Int. Conf. Human-Computer Interaction, pp. 1347–1351 (2007) 11. Shu, Y., Furuta, K.: An inference method of team situation awareness based on mutual awareness. Cognition, Technology, and Work 7, 272–287 (2005)
Modeling and Simulation of Human Interaction Based on Mutual Beliefs
683
12. Guzzo, R.A., Salas, E. (Associates eds.): Team effectiveness and decision-making in organizations, pp. 333–380. Pfeiffer (1995) 13. Letsky, P.M., Warner, W.N., Fiore, M.S., Smith, C.A.P.: Macrocognition in Teams, Ashgate (2008) 14. Salas, E., Fiore, M.S.: Team Cognition. American Psychology Association (2004) 15. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (1983) 16. Norman, D.A.: Cognitive Engineering. In: Norman, D.A., Draper, S.W. (eds.) User Centered System Design, ch. 3, pp. 31–61. Erlbaum, Hillsdale (1986)
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users Jan-Paul. Leuteritz1, Harald Widlroither1, Alexandros Mourouzis2, Maria Panou2, Margherita Antona3, and Asterios Leonidis3 1
Fraunhofer IAO / University of Stuttgart IAT, Nobelstr. 12, 70569 Stuttgart, Germany {jan-paul.leuteritz,harald.widlroither}@iao.fraunhofer.de 2 Centre for Research and Technology Hellas, Hellenic Institute of Transport, Thessaloniki, Greece {mourouzi,panou}@certh.gr 3 Foundation for Research and Technology - Hellas, Institute of Computer Science, Heraklion, Greece {anona,leonidis}@ics.forth.gr
Abstract. This paper describes the framework and development process of adaptive user interfaces within the OASIS project. After presenting a rationale for user interface adaptation to address the needs and requirements of older users, the paper presents and discusses the architecture and functionality of the OASIS adaptation framework, focussing in particular on an advanced library of adaptive widgets, as well as on the process of elaborating the adaptation rules. The results of the adopted approach are discussed and hints to future developments are provided. Keywords: Automatic user interface adaptation, Unified User Interface Design., adaptive widgets, adaptation decision-making.
1 Introduction Over the last 50 years, the number of older persons worldwide has tripled - and will more than triple again over the next 50-year period as the annual growth of the older population (1.9%) is significantly higher than that of the total population (1.02%). The European Commission has predicted that between 1995 and 2025 the UK alone will see a 44% rise in people over 60, while in the United States the baby-boomer generation which consists of about 76 million people and is the largest group ever in the U.S., is heading towards retirement [7]. This situation asks for new solutions towards improving the independence, the quality of life, and the active ageing of older citizens. Although substantial advances have been made in applying technology for the benefit of older persons, a lot of work remains to be done. Notably, only 13% of people aged over 65 are Internet users, while the average in Europe is 51%. Recent advancements in Information Society research have tremendous potential for meeting the emerging needs of older people and for further improving their quality C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 684–693, 2009. © Springer-Verlag Berlin Heidelberg 2009
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users
685
of life. OASIS is an Integrated Project of the 7th FP of the EC in the area of eInclusion that aims at increasing the quality of life and the autonomy of elderly people by facilitating their access to innovative web-based services. OASIS stands for “Open architecture for Accessible Services Integration and Standardisation”, which hints at the project’s way towards making this vision a reality: OASIS aims at creating an open reference architecture, which allows not only for a seamless interconnection of Web services, but also for plug-and-play of new services. In order to give the OASIS architecture a critical mass for widespread implementation, the project consortium will make the reference architecture in question, and the related tools, available as open source. 12 initial services have been selected for prototype development in the project’s lifetime. They are joined into three main categories considered vital for the quality of life enhancement of the elderly: Independent Living Applications, Autonomous Mobility, and Smart Workplaces Applications [8]. OASIS aims at creating an open system. Not only new Web services will be able to connect via the hyper-ontological framework. New applications, that process information from different Web services in an innovative manner, are expected to emerge frequently. One main advantage of this approach is that it enables developers to make any new Web service or application available to a large community of elderly users through the OASIS platform. The OASIS approach aims at delivering all such services in appropriate forms optimally tailored to diverse interaction needs through the OASIS advanced approach to user interface adaptation. This paper focuses on the R&D approach of the project towards ensuring high quality interaction for older users, building on personalisation and adaptation techniques. The chosen methods for automatic user interface adaptation and rules generation are introduced and discussed here. Their purpose is: • to facilitate the development of interactive applications and services for different platforms; • to develop various accessibility components that can be used across the range of interaction devices supported by the project; • to enable the personalisation of interactions, as well as automatic tailoring-to-device capabilities and characteristics, thus offering an individualised user experience; • to develop components that facilitate the rapid prototyping of accessible and selfadaptive interfaces for the project’s range of supported devices.
2 Background 2.1 Older Users as a Target Group Older people are increasingly becoming the dominant group of customers of a variety of products and services (both in terms of number and buying power) [7]. This user group, large and diverse in its physical, sensory, and cognitive capabilities, can benefit from technological applications which can enable them to retain their independent living, and ultimately reduce health care expenditure. Although older people are not generally considered to have disabilities, the natural ageing process carries some degenerative ability changes, which can include diminished vision, varying degrees of hearing loss, psychomotor impairments, as well as reduced attention, memory and learning abilities. All of these changes affect the way
686
J.-P. Leuteritz et al.
older people use Information and Communication Technology (ICT), which must be accommodated to ensure that they are not disadvantaged when using ICT. This accommodation can only be realized after a thorough understanding of the changes associated with ageing and of their impact on the needs of older people concerning the interaction with technical systems. 2.2 Rationale for a User Interface Adaptation-Based Approach According to ISO 9241, the usability of a technical system depends inter alia on the user and the context of use. This requirement becomes even more critical when designing for non–traditional and diverse user groups, such as the elderly. Therefore, appropriate, personalised, systematically-applicable and cost-effective interaction solutions need to be elaborated, and proactive approaches towards coping with multiple dimensions of diversity are a prerequisite. The concepts of Universal Access and Design for All [12], [13] have the potential to contribute substantially in this respect, as they cater for diversity in every dimension of human-computer interaction. Recent approaches towards Design for All imply the notion of intelligent user interface run-time adaptation, i.e., the capability of automatically adapting to individual user characteristics and contexts of use through the realization of alternative patterns of interactive behaviour. The Unified User Interface design method has been developed to facilitate the design of user interfaces with automatic adaptation behavior [11] These efforts have also pointed out the compelling need of making available appropriate support tools for the design process of user interfaces capable of automatic adaptation. In a parallel line of work to user-oriented adaptivity, user interface (UI) research has recently addressed the identification of, and adaptation to, the situational and technical context of interaction (see, e.g., [1] for an overview) – although most of the time, user- and context-oriented adaptivity are combined (e.g., see [3]). Adaptivity concerns systems that adapt to the form factor of the user’s device, the actual interaction devices available to the user, the user’s geographical location, etc. In the context outlined above, OASIS aims to provide high-quality, ambient user interfaces by effectively addressing diversity in the following dimensions: (i) target user population and changing abilities due to aging; (ii) categories of delivered services and applications; and (iii) different computing-platforms and devices (i.e., PDA, smartphone, desktops, laptops). In this context, new accessibility components and alternative interfaces are constructed within OASIS that will be used across the range of devices supported by the project, offering personalised, ambient, multimodal, and intuitive interaction. By design, the OASIS user interface will embed adaptations based on user, device and context characteristics. Furthermore, OASIS develops innovative tools to facilitate the rapid prototyping of accessible and self-adaptive interfaces for cutting-edge technologies and devices supported by the project.
3 User Interface Adaptation Methodology The OASIS user interface adaptation methodology is decomposed into two distinct but highly correlated stages: the specification and the alternative design. During the specification stage, the conditionally adjustable UI aspects and the discrete dimensions that are
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users
687
correlated with the adaptation decisions (user- and context- related parameters) are identified. During the alternative design stage, a set of alternative designs is created for each UI component. These alternatives are defined according to the requirements posed by each adaptation dimension (e.g., visual impairment) and parameter (e.g., red-green colour blindness or glaucoma). These alternatives need to be further encoded into a rule set, loaded by a rule inference engine, evaluated and finally propagated from the concept layer to the actual presentation layer. The OASIS project boosts adaptation by incorporating the above-outlined mechanisms into a complete framework that inherently supports adaptation. The Decision Making Specification Language (DMSL) engine and run-time environment [10] offer a powerful rule definition mechanism and promote scalability by utilizing external rule files while relieving the actual UI implementation code from any adaptationrelated conditionality. The Adaptive Widget Library developed in OASIS (see section 3.1) encapsulates all the necessary complexity for supporting adaptation of user interface components (from evaluation request till decision application). The OASIS adaptation platform infrastructure consists of the following components (see Fig. 1): the DMSL Server and the Adaptive Widget Library. The DMSL server is divided into the DMSL Engine Core and the DMSL Proxy. The Core is responsible for loading and evaluating the rules, while the Proxy acts as a mediator between the Core and external “clients”, by monitoring incoming connections, processing the requests and invoking the appropriate core methods.
Fig. 1. The OASIS Adaptation platform infrastructure
3.1 The OASIS Adaptive Widget Library The Adaptive Widget Library is a set of primitive (e.g., buttons or drop-down menus) and complex (e.g. file uploaders, image viewers) UI components that utilizes the DMSL Server facility to support adaptation. The library’s ease of use is ensured by relieving developers of the responsibility of manually adapting any widget attributes by offering a common “adapt” method. Each widget encloses a list of its adaptive attributes and when instructed to adapt itself, evaluates each attribute and applies the corresponding decision. Considering that the DMSL Server is a remote component, network connectivity is an essential precondition for the overall process; thus any lack of it should be handled beforehand. A fail-safe mechanism has been developed to minimize the side effects of potential connectivity loss, where the “last” known
688
J.-P. Leuteritz et al. Table 1. Adaptation steps in the OASIS framework 1.
2. 3. 4.
At compile time, the developer defines the rule file that the DMSL Server will load for the specific User Interface decision-making process and builds the user interface using the OASIS Adaptive Widget Library At runtime, the application – when necessary – invokes the adapt method for each contained widget Each widget asks the DMSL server to evaluate all the rules related to its subject to adaptation attributes Upon successful evaluation, it applies these decisions and updates its appearance to meet user and context needs
configuration is stored and maintained locally to facilitate “static” user interface generation without supporting on-the-fly adaptation. The adaptation process in the OASIS framework is outlined in Table 1. An example of user interface created using the Adaptive Widget Library is presented in Figure 2.
Fig. 2. An exemplary user interface developed with the Adaptive Widget Toolkit
Fig. 3. A simple example of widget adaptation
The panel, button and image UI components which appear in this interface are available through the library. Figure 3 depicts how this interface is automatically adapted through DMSL rules. In the left part of the figure, the interface displays a color combination, while in the right part a greyscale is used for enhanced contrast. This type of adaptation can be useful in case of visual limitations of older users.
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users
689
3.2 Interaction Prototyping Tool To further facilitate adaptation design using the Adaptive Widget Library, a tool for rapid prototyping of interactions is being implemented to bind together all the components of the framework. This tool is intended to facilitate the connection of application task models (i.e., services) with accessibility solutions and adaptivity. Specifically, it is going to enable interaction designers to: − create rough interaction models, − encapsulate preliminary adaptation logic and effects and − specify how adaptations are effected in the interactive front-end. This tool incorporates the facilities mentioned above in a form similar to reusable software design patterns. The output of the prototyping tool will facilitate further development of the interfaces, while preserving the possibility for full-cycle reengineering of the modified output.
4 Adaptation Rules Elaboration Methodology This section discusses the methodology adopted in the elaboration of the adaptation rules for the OASIS prototyping tool. The major challenge in creating adaptation rules for self-adaptive user interfaces lies within the complexity of the resulting design space. Even a relatively simple adaptation design space including 3 different aspects of the interface, each of which will have three different alternatives, can in theory produce 27 different interface instantiations. Hence, iterative user interface development, involving repetitive user testing in the early phases, is not a very attractive method for creating adaptation rules. Instead, a more cost- and time-efficient solution is a theory based approach. In the OASIS project the first step in the development of the adaptation routine was a review of general interface design guidelines (e.g., ISO 9241). This was primarily meant to ensure that the adaptation rules defined would not contradict existing standards. Afterwards, specific design guidelines for elderly users were examined (e.g., [4]), in order to determine where adaptations would be appropriate, according to the restrictions of the devices to be used. The result of this work is a matrix in which the lines contain the adaptation-trigger parameters (e.g. the user’s age or impairments profile) and the columns show the user interface elements to be adapted (e.g., font size and color profile). Parameters to be linked via an adaptation rule are indicated at the intersection of rows and columns. In the matrix, the trigger parameters are specified in a format that takes into account the exact definitions of these variables. This is meant to facilitate the translation of the rules into DMS. As a subsequent step, the matrix was sent out to the OASIS consortium in order to collect feedback and to ensure that application-dependent issues are appropriately taken into account. The option of also collecting user-based feedback was discarded in this phase, based on the rationale that both the matrix and the underlying concepts would be difficult for the users to fully comprehend and comment upon. Users should rather be confronted with a prototype of the adaptive interface and provide a direct statement of approval or disapproval, e.g., via a validated measurement instrument for user satisfaction [6].
690
J.-P. Leuteritz et al.
After updating the matrix according to the collected feedback, it was checked for possible conflicts between rules to be created. Finally, the adaptation rules were elaborated. Table 2 below summarizes the resulting adaptation trigger parameters, and Table 3 shows the user interface elements subject to adaptation in the context of OASIS. Table 4 displays two examples of adaptation rules. Table 2. Trigger parameters • • •
End devices: PCs (including laptops • and tablet PCs) PDAs • Symbian mobile phones
• •
Person-related parameters: All users o Language Elderly users: o Age o Occupation / Life situation o Computer literacy o Speech impairment o Vision impairment o Mobility- / Motor impairment o Cognitive impairment o Hearing impairment Caregivers o Profession Others o User subgroup
Context-related parameters: Location o Office o Home o Other points of interest • Ambient parameters o Illuminance o Noise level o Handling conditions • Occupation parameters o At work o Moving o Car o By feet o On bus / train • Device specification o Weight o Robustness •
Table 3. User Interface elements subject to adaptation Font size Icon size Color-profile Brightness Audio volume Cursor Size of edit fields
Animation Voice control Text-to-speech Touch screen On-screen keyboard Touch less interface Caution warnings
Table 4. Adaptation rules – examples 1
If
Then
2
If
Then
[Elderly user’s age = 1 or 2 or 3] or [Elderly user’s life situation = 2 or 3] or [Elderly user’s computer literacy level = 0] or [Vision impairment = 1 or 2 or 3] Resolution 640*480 pixels
[Elderly user’s life situation =1] or [Elderly user’s computer literacy level = 1] Resolution 800*600 pixels
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users
691
A further step to be accomplished is the final validation of the designed adaptations. Two approaches are under consideration towards this purpose: Hypothesis driven: each single adaptation rule could be tested by presenting all possible variations of an interface element to test users and asking them for their preference – or doing a performance test with interface instances that only differ concerning one variable. Comparison to standard device: one interface instance is considered as “standard” and each user in a user test is presented with both the standard and an adapted instance. This method will not tell the experimenter which adaptation is preferable, but it will show if all the adaptations together make sense. 4.1 Lessons Learned Significant experience was acquired through the process of creating adaptation rules. For example, it was found that that the brief descriptions of some interface characteristics in the matrix could cause misunderstandings. When using such a matrix to collect feedback, each of the dependent parameters should be accompanied by a short description. The use of scenarios and personas can be very useful in order to explain why an additional parameter is needed and how it is supposed to behave. Furthermore, phrasing scenarios can help developers of the adaptation rules to keep a focus on the overall usability of the system and avoid losing orientation between a large number of more or less important and even partially contradicting adaptation indications. Another important aspect of elaborating adaptation rules relates to the interpretation of existing design knowledge and guidelines. For many adaptation parameters, the literature does not provide precise thresholds for the trigger variables or the elements to be adapted. For example, older users are said to prefer bigger font sizes. Yet sources often do not give age-related cut-off-values, which is presumably due to the fact that the elderly are a very heterogeneous group. On the other hand, adaptation rules must be elaborated using precise thresholds, specifying, for example, at which user age the font size should grow, and to which extent. This issue was addressed by including adaptation rules with arbitrarily set thresholds based on design experience rather than excluding adaptations. It was assumed that the precision of the adaptation rules could still be fine tuned at a later stage through user testing. This decision was taken in order to allow the prototyping tool work with a rather large variety of rules. Corrective mechanisms will be included in the OASIS adaptation framework which support also the manual configuration of some interface aspects such font size or color profile. This empowers the user to reject any unwanted adaptation, resulting in an optimal personalisation of the UI.
5 Discussion and Conclusions This paper has presented the OASIS approach to user interface adaptation in the context of Web services for older users, addressing in particular the elaborated adaptation framework, the role of the OASIS Adaptive Widget Library and the process of designing the adaptations embodied in such a library.
692
J.-P. Leuteritz et al.
The design of user interface adaptation is a relatively novel undertaking. Although a general methodology is available, such as Unified User Interface Design, further research is necessary on how to best fine-tune various aspects of this methodology in different design cases. The work presented in this paper may serve as an example in this respect and offers hints for discussion. A first consideration that emerges is that tools are required in order to easily integrate adaptation knowledge into user interface development. The OASIS Widget Library has been developed in order to provide developers with fundamental support in applying adaptation. Through such a library, developers can easily embed user interface components’ adaptations in their user interfaces without having to design them from scratch or to implement the adaptive behaviour. However, it should be considered that adaptation not only affects the physical level of interaction, i.e., the presentation of interactive artifacts in the user interface, but also the interaction dialogue and overall structure. For example, the length of interactions, the number of interaction objects or options, the metaphors, wordings and operators, and the depth of menus are dialogue characteristics potentially subject to adaptation. Furthermore, additional adaptation triggers could also be considered, such as, for example, computer literacy and expertise. These aspects are not explicitly addressed in OASIS at the moment. However, it should be mentioned that the Unified User Interface Methodology provides techniques and tools for gradually expanding the types and levels of adaptation in a user interface, thus offering the opportunity to address increasing and evolving adaptation design requirements [2]. Additionally, the recent uptake of ontology-driven system development demands for general approaches targeted to linking adaptation to ontologies. A potential architecture for exploiting ontologies for adaptation purposes is presented in [9]. One fundamental challenge in creating self-adaptive interfaces lies in the difficulties encountered when translating the scientific state of the art into precise rules, as there are seldom concise thresholds defined for the triggers or even for the adaptive elements. Yet this challenge can be turned into a unique opportunity: adaptation rules are probably the most concise form of shaping theories about user behavior. Once a rule is established, it can be tested with specific user groups in experimental settings. If the results indicate that a rule improves interaction for certain user characteristics, than this rule is consolidated and can be re-used. If the rule turns out to be at least not generally right, it can be dropped. Eventually, the development of self-adaptive systems could bring new importance to basic research in human-computer-interaction.
References 1. Abowd, G.D., Ebling, M., Gellersen, H., Hunt, G., Lei, H.: Guest Editors’ Introduction: Context-Aware Computing. IEEE Pervasive Computing 1(3), 22–23 (2002), http://dx.doi.org/10.1109/MPRV.2002.1037718 2. Antona, M., Savidis, A., Stephanidis, C.: A Process–Oriented Interactive Design Environment for Automatic User Interface Adaptation. International Journal of Human Computer Interaction 20(2), 79–116 (2006) 3. Doulgeraki, C., Partarakis, N., Mourouzis, A., Stephanidis, C.: Adaptable Web-based user interfaces: methodology and practice. eMinds International Journal of Human Computer Interaction 1(5), 79–110 (2009)
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users
693
4. Fisk, A., Rogers, W., Charness, N.: Designing for older adults: Principles and creative human factor approaches. Crc. Pr. Inc., London (2004) 5. ISO 9241, Ergonomics of human-system interaction. International Organization for Standardisation (2008) 6. Leuteritz, J., Widlroither, H., Klüh, M.: Multi-level validation of the ISOmetrics Questionnaire during the quantitative and qualitative usability assessment of two prototypes of a wall-mounted touch-screen device. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009), San Diego, California, USA, July 19–24. Springer, Berlin (2009) 7. Kurniawan, S.: Age Related Differences in the Interface Design Process. In: Stephanidis, C. (ed.) The Universal Access Handbook. Taylor & Francis, Abington (in press, 2009) 8. OASIS Consortium, (OASIS) Grant Agreement no 215754 – Annex I - Description of Work. European Commission, Brussels, Belgium (2007) 9. Partarakis, N., Doulgeraki, C., Leonidis, A., Antona, M., Stephanidis, C.: User Interface Adaptation of Web-based Services on the Semantic Web. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009), San Diego, California, USA, July 19–24. Springer, Berlin (2009) 10. Savidis, A., Antona, M., Stephanidis, C.: A Decision-Making Specification Language for Verifiable User-Interface Adaptation Logic. International Journal of Software Engineering and Knowledge Engineering 15(6), 1063–1094 (2005) 11. Savidis, A., Stephanidis, C.: Unified User Interface Design: Designing Universally Accessible Interactions. International Journal of Interacting with Computers 16(2), 243–270 (2004) 12. Stephanidis, C. (ed.) Salvendy, G., Akoumianakis, D., Arnold, A., Bevan, N., Dardailler, D., Emiliani, P.L., Iakovidis, I., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Oppermann, C., Stary, C., Tamura, H., Tscheligi, M., Ueda, H., Weber, G., Ziegler, J.: Toward an Information Society for All: HCI challenges and R&D recommendations. International Journal of Human-Computer Interaction 11(1), 1–28 (1999) 13. Stephanidis, C. (ed.) Salvendy, G., Akoumianakis, D., Bevan, N., Brewer, J., Emiliani, P.L., Galetsas, A., Haataja, S., Iakovidis, I., Jacko, J., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Stary, C., Vanderheiden, G., Weber, G., Ziegler, J.: Toward an Information Society for All: An International R&D Agenda. International Journal of Human-Computer Interaction 10(2), 107–134 (1998)
User Individual Differences in Intelligent Interaction: Do They Matter? Jelena Nakić and Andrina Granić Faculty of Science, University of Split, Nikole Tesle 12, 21000 Split, Croatia {jelena.nakic,andrina.granic}@pmfst.hr
Abstract. As confirmed by research, the design of an intelligent system must address relevant individual characteristics of its users. This paper offers a brief review of the individual differences literature in the HCI field in general and in the e-learning area in particular. Research suggests that using adaptive e-learning systems may improve user learning performance and increase her/his learning outcome. The empirical study presented in this paper encompasses a comprehensive user analysis regarding a web-based learning application. Statistically significant correlations were found between user intelligence, experience and motivation for e-learning and the learning outcome accomplished in an e-learning session. These results contribute to the knowledge base of user individual differences and will be considered in estimating the possible benefits of enabling system adaptivity. Keywords: individual differences, user analysis, adaptive systems, e-learning, empirical study.
1 Introduction
A system's intelligent/adaptive behavior strongly relies on user individual differences, a claim that has already been confirmed and empirically supported by Human-Computer Interaction (HCI) research [6, 12, 13, 15, 22, 25]. Such an assumption is in line with related studies completed by the authors; see, for example, [17, 18]. However, developing adaptive systems is a process that requires comprehensive research related to the application domain of the particular system. Designing intelligent interaction needs to take into account several research questions, including (i) how to identify relevant user characteristics, (ii) how to model the user, (iii) what parts of the adaptive system shall change and in what way, and (iv) how to employ the user model to implement adaptivity [4]. This paper describes an empirical study considering the first question in the context of education. In particular, the study identifies and appraises user individual differences and their relevance in a learning environment. The paper is structured as follows. The introductory section provides a brief review of the individual differences literature in the HCI field in general and in the e-learning area in particular. Literature findings are discussed in the context of the objectives and motivation for the research. Subsequently, the exploratory study is presented, along with results and discussion. Finally, conclusions are drawn and future research work is identified.
1.1 Individual Differences in HCI: A Literature Review and Discussion
The first step in enabling a system to adapt to individual use is identifying and acquiring relevant information about users. The initial comprehensive overview of individual differences in the HCI field is Egan's (1988) report on diversities between users in completing common computing tasks such as programming, text editing and information search. He pointed out that the ambition of adaptivity (e.g., dynamic or real-time adaptation) is not only that "everyone should be computer literate" but also that "computers should be user literate", suggesting that user differences could be understood and predicted as well as modified through the system design. Since then, the diffusion of technology has brought computers to a wide user population with an extensive variety of knowledge, experience and skill dimensions in different areas. Accordingly, the identification of the individual differences relevant for system adaptation became a critical issue. In their early consideration of adaptivity, Browne, Norman and Riches (1990) provided one of the first classifications of candidate dimensions of user differences that may impact computer usage. They included diversities in cognitive styles (field dependence/independence, impulsivity/reflectivity, operation learning/comprehension learning), personality factors, psycho-motor skills, experience, goals and requirements, expectations, preferences, cognitive strategies and a number of cognitive abilities. Later on, Dillon and Watson (1996) reviewed a century of individual differences work in psychology, stressing the role of differential psychology in the HCI field. They identified a number of basic cognitive abilities that have reliably influenced the performance of specific tasks in predictable ways. Based on their own analyses, they concluded that measures of ability can account for approximately 25% of the variance in performance, thus being suitable for use in decision making for most systems, especially in addition to other sources of information (previous work experience, education, domain knowledge, etc.). According to their recommendations, psychological measures of individual differences should be used to increase the possibilities for generalization of HCI findings. There are a number of studies confirming these pioneering suggestions, showing for example that cognitive abilities, such as spatial and verbal ability, do affect the interaction, particularly the navigation performance of the user [2, 9, 23, 27, 34]. The influence of user goals, knowledge, preferences and experience on her/his interaction with an intelligent system is unquestionable [4]. Moreover, these characteristics have been successfully employed in many adaptive systems, for example AHA! (http://aha.win.tue.nl/) [11], InterBook (http://www.contrib.andrew.cmu.edu/~plb/InterBook.html) [5], KBS Hyperbook [19], ELM-ART (http://apsymac33.uni-trier.de:8080/Lisp-Course) [33], INSPIRE [26], AVANTI [30], PALIO [31]. On the other hand, the matter of adaptation to cognitive styles and learning styles was mainly ignored until the last decade. Nevertheless, newer research (e.g., [8, 16]) confirms that the navigation preferences of users reflect their cognitive styles. In the educational area, many authors have concluded that adaptation to learning styles, as defined by Kolb (1984) or Honey and Mumford (1992), could bring substantial benefits to students' learning activities. This is evident from an increasing number of adaptive
educational systems that have implemented some kind of adaptation (adaptability or adaptivity) to learning styles, see for example CS388 [7], INSPIRE [26] and AHA [28].
1.2 Motivation for the Research
Evidently, the effect of user individual differences on user performance has been the topic of very fruitful research for the last few decades. However, the obtained results are not quite consistent, partially because user performance while using a particular system depends greatly on the system itself [3]. In addition, the research area of cognitive styles and learning styles in the HCI field is very recent, so there is not yet strong evidence of their relevance to the user's interaction with an intelligent system (as discussed in [29]). Furthermore, even if these user styles were proven to be relevant, the question of the potential benefits of personalized interaction still remains. System adaptation, even when well designed, does not necessarily imply an improvement in user performance [8]. Moreover, it can be disadvantageous to some classes of users [10]. Before including adaptation in a system, it is worthwhile to consider the possible alternatives. One good alternative, as suggested by Benyon and Höök (1997), could be an enlargement of the learner's experience in order to overcome her/his low spatial ability. As a second alternative, an appropriate redesign of a non-adaptive interface can be considered [20]. Based on these reflections, the research presented in this paper encompasses a comprehensive user analysis regarding a web-based learning application. The empirical study reported in the following aims to provide an answer as to whether or not it is reasonable to implement adaptation in the system.
2 User Analysis in an e-Learning Environment: An Empirical Study
The methodology for this experiment is grounded mainly in our previous exploratory study reporting the relevance of user individual characteristics to learning achievements acquired in interaction with an e-learning system [18]. Although we found some statistically significant correlations between user individual characteristics and learning performance, the results were not suitable for generalization, mainly due to certain limitations of the participant sample and of the methodology applied. Encouraged by the results of the pilot experiment, but also aware of its limitations, we redesigned the methodology and conducted the second study elaborated in the following. The main objective of the research remains the same – to estimate the potential benefits of engaging adaptation in the system. Clearly, such an estimation should be based on an in-depth user analysis comprising both the analysis of user individual differences and of user behavior in the e-learning environment. In particular, the presented empirical study identifies and appraises those user characteristics that produce statistically significant differences in the "amount of knowledge" which students get in a learning session (i.e., learning outcomes). These characteristics are candidate variables for steering the adaptation process. It can be assumed that adaptation of the system to those user characteristics that significantly correlate with learning outcomes could bring substantial benefits to students' learning performance. Such a hypothesis still has to be confirmed or rejected experimentally for each one of the candidate variables.
2.1 Participants
Student volunteers were recruited from two faculties of the University of Split. The first group of participants was selected among 30 first-year undergraduate students (from two different study programs) attending The Computer Lab 1, a laboratory class at the Faculty of Science. The second group was chosen from 30 candidates among first-year graduate students who were taking the Human-Computer Interaction course at the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture. Overall, fifty-two students agreed to take part in the study, and five of them were engaged to assist in carrying out the procedure. The experiment was completed over four weeks in class. Consequently, there were a number of students who did not accomplish all phases of the procedure, partially because of certain technical limitations that occurred on the days of the learning sessions. A total of 33 students completed the study.
2.2 Variables and Measuring Instruments
The user individual differences considered as predictor variables include: age, personality factors, cognitive abilities, experience, background, motivation and expectations from e-learning. The Eysenck Personality Questionnaire (EPQ) was used to measure students' personality factors. According to Eysenck (1992), one of the two main personality factors is neuroticism, or the tendency to experience negative emotions. The second one is extraversion, the tendency to enjoy positive events, especially social events. The general factor of intelligence, or "g" factor, as defined by Sternberg (2003), is a cognitive ability measure assessed through the M-series tests, consisting of 5 subtests. We used a Likert-based questionnaire to measure students' experience, motivation and expectations. Three dimensions of experience were assessed: computer experience and Internet experience, which refer to the time students spend using the computer and the Internet at the present time, as opposed to prior experience in using computers, which refers to their previous education. Motivation for e-learning was the most difficult variable to measure. Although the learning sessions were integrated in the class, the students' learning performance did not affect their course grades. This prevented extrinsic motivation for learning from interfering. The motivation assessed through the questionnaire refers only to the intrinsic motivation of the students, i.e., the level of their interest in the subject matter and in the mode of its presentation as a web-based application. Students' expectations from e-learning are another subjective measure, estimated through their own opinion about the quality and efficiency of e-learning applications in general. Information about students' background, i.e., previous knowledge, was calculated from their grades on previously passed exams (for graduate students) or from entry tests and pre-exams of first-year courses (for undergraduate students), in addition to their high school grades in relevant subjects. The students' outcome acquired in the learning session is expressed as the gain between pre-test and post-test scores. The same paper-based 19-item multiple choice test served as pre-test and post-test. A lesson related to the communication and collaboration of Internet users, provided through a learning management system, was selected as the topic of the learning session. The selected lesson had not been taught previously in any university course at either faculty.
2.3 Procedure
The whole experiment procedure was conducted as part of the usual class time, integrated into the courses' curricula. It took four weeks to carry out all phases of the procedure. Through an introductory interview we informed the students about the purpose and nature of the experiment. They were told that participation in the study was on a voluntary basis and that their performance or scores on tests would not affect their course grades in any way. The obtained participants' data were used for the preparation of a finely tuned questionnaire used afterwards to assess their experience, motivation and expectations. In the second week of the experiment the participants took the M-series tests. Testing was conducted under the supervision of a psychologist and took 45 minutes for the completion of the 5 tests, each of them time-limited separately. The following week the students took the EPQ test and filled in the prepared questionnaire which measured the remaining personal characteristics. The last week of the procedure comprised four steps. First, the students were given the pre-test on the subject matter they were expected to learn afterwards using the e-learning system. They were allowed 10 minutes to complete the pre-test. Then the students started the web-based learning application. Time for learning was limited to 30 minutes. The students were permitted to take notes while reading, but not allowed to use any external material on the subject, such as textbooks or other web resources. These notes could serve them only in reviewing the lesson material. After completing the learning session, the students were given the post-test. Again, a maximum of 10 minutes was allowed for completing the test. Usage of the notes taken while learning was not permitted. On completion of the post-test, the students were asked to fill in the SUS questionnaire, thus measuring their satisfaction with the system they had just experienced.

3 Results
Data analysis was conducted using SPSS version 16.0 for Windows. Pearson correlations were calculated, with p < 0.05 as the acceptable level of significance for the experiment.

3.1 The Sample
A total of 33 datasets were analyzed. The sample consisted of 12 females (36.4%) and 21 males (63.6%). The age varied from 18 to 24, with a mean of 20.3. The distribution of gender and age is shown in Table 1, distinguished by the different study programs of the students.

Table 1. The distribution of gender and age within the sample

Study program                    Study          Female  Male  Age range  Average age
Computer Science and Technics    undergraduate  2       10    18-21      22.0
Mathematics                      undergraduate  8       0     18-20      19.1
Computer Science                 graduate       2       11    21-24      19.3
Total                                           12      21    18-24      20.3
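The analyses reported in this section use the standard sample Pearson coefficient (the formula is not spelled out in the paper; it is recalled here for reference). For paired observations $(x_i, y_i)$, $i = 1, \dots, n$,

\[ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}, \]

with significance assessed against the two-tailed p < 0.05 criterion stated above.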
Descriptive statistics of all measured variables are presented in Table 2. The sample is relatively heterogeneous; considerable differences are evident in prior experience as well as in background knowledge. This can be explained by the fact that the participants come from three different study programs of two faculties. Two groups of participants are composed of first-year undergraduate students, while the third group is recruited from first-year graduate students. Regardless of the differences in student experience, none of the participants had previously read any lesson from the learning management system used in the experiment.

Table 2. Descriptive statistics of the sample

                       Minimum  Maximum  Mean    Std. Deviation
Age                    18       24       20.30   1.72
Extroversion           3        21       14.18   4.43
Neuroticism            1        18       9.48    4.37
Intelligence           36       60       49.15   6.79
Prior experience       6        54       25.64   15.22
Computer experience    4        16       9.64    3.62
Internet experience    4        16       9.52    3.36
Motivation             0        8        6.30    1.81
Expectations           0        6        4.12    1.41
Background knowledge   8        56       27.30   12.34
Learning outcome       10       43       30.52   6.90
Satisfaction           47.5     95       74.545  12.63
3.2 Results and Interpretation
Data analysis showed a highly significant correlation of the M-series test results with learning outcome (r = 0.47, p < 0.01). Since learning outcome is measured as the gain between pre-test and post-test scores, this result suggests that more intelligent students have learned more in the e-learning session than the less intelligent ones. The probability of this occurring by chance is less than 0.01. Another statistically significant correlation was identified between the M-series test results and background knowledge (r = 0.39, p < 0.05), indicating that more intelligent students have also achieved better grades on their previously passed exams and/or pre-exams. In light of these two significant correlations, it seems that more intelligent students have better learning performance in a web-based than in a traditional learning environment. No significant correlations were found between personality factors and learning outcome. Table 3 shows the Pearson correlations of all psychological test scores with background knowledge, learning outcomes and satisfaction with the system.

Table 3. Correlations of personality and intelligence with knowledge and satisfaction

                       Extroversion       Neuroticism        Intelligence
Background knowledge   -.182 (p = .311)   .066 (p = .714)    .393* (p = .024)
Learning outcome       -.088 (p = .626)   .184 (p = .305)    .465** (p = .006)
Satisfaction           .043 (p = .810)    .260 (p = .143)    .035 (p = .845)

** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).

Conducting the age and experience analysis, we found that Internet experience significantly correlates with learning outcome (r = 0.37, p < 0.05), as shown in Table 4, suggesting that students who spend more time on the Internet use the web-based learning application more successfully than students who spend less time. Intrinsic motivation for e-learning positively correlates with learning outcome (r = 0.36, p < 0.05), suggesting that more motivated students have acquired more knowledge in the learning session than less motivated students. Another statistically significant correlation (r = 0.35, p < 0.05) was found between expectations from e-learning and satisfaction in using the system (SUS questionnaire). Apparently, users with greater expectations from the system have experienced higher levels of fulfillment in system usage. Those correlations are presented in Table 5, along with the correlations for background knowledge. No significant connections were identified between background knowledge and the other variables presented in this table.

Table 4. Correlations of age and experience with learning session results and satisfaction

                   Age               Prior experience   Computer experience   Internet experience
Learning outcome   .333 (p = .058)   .284 (p = .109)     .180 (p = .315)       .370* (p = .034)
Satisfaction       .276 (p = .120)   .139 (p = .441)     .116 (p = .521)       .094 (p = .602)

* Correlation is significant at the 0.05 level (2-tailed).

Table 5. Correlations of motivation and expectations from e-learning with background knowledge, learning outcome and satisfaction

                       Motivation         Expectations       Background knowledge
Background knowledge   .082 (p = .648)    .163 (p = .364)
Learning outcome       .357* (p = .041)   -.026 (p = .886)   .314 (p = .075)
Satisfaction           .184 (p = .306)    .346* (p = .049)   .205 (p = .251)

* Correlation is significant at the 0.05 level (2-tailed).
3.3 Discussion
Personality factors, namely extroversion/introversion and the level of neuroticism, seem to have no impact on learning outcome (Table 3), results which are in line with the related literature, cf. [12, 13].
Considering motivation for e-learning and expectations from it, the obtained results were as expected (Table 5) – while motivation for e-learning is related to learning outcome, expectations from e-learning correlate with user satisfaction. In order to offer a valuable interpretation of the results, it is important to distinguish motivation from satisfaction. Motivation includes the aspiration and effort to achieve a goal, while satisfaction refers to the fulfillment we feel due to the achievement of a goal. Thus the obtained connection between expectations and satisfaction seems very natural. Apparently, there is no connection between background knowledge and learning outcome. Such a connection was, however, expected, for the following reason. There are high correlations of background knowledge with all three dimensions of experience: prior experience (r = 0.79, p < 0.01), computer experience (r = 0.44, p < 0.05) and Internet experience (r = 0.44, p < 0.05). On the other hand, experience significantly correlates with learning outcome (Table 4). Consequently, a correlation between background knowledge and learning outcome was also expected, and such a result would be in line with related studies [4]. The absence of this particular connection may be explained by the fact that the topic of the learning session was previously unknown to the majority of participants, as confirmed by the pre-test scores.
4 Conclusion
Appraising the user characteristics that produce differences in learning performance plays an important role when considering adaptive educational systems. The conducted empirical study reveals that there are significant connections of user intelligence, experience and motivation with the learning outcome achieved in an e-learning environment. These results contribute to the knowledge base of user individual differences, and they should be taken into account when developing web-based instructional content. Nevertheless, further work is required in order to determine the way in which the relevant user characteristics could be exploited in enabling system adaptation. Additional research will be conducted to investigate what affects learning behavior, as well as to determine how learning behavior is reflected in learning outcomes. It will be particularly interesting to see if the predictors of learning behavior could predict learning outcome as well.
Acknowledgments. This work has been carried out within project 177-0361994-1998 Usability and Adaptivity of Interfaces for Intelligent Authoring Shells funded by the Ministry of Science and Technology of the Republic of Croatia.
References 1. Benyon, D., Höök, K.: Navigation in Information Spaces: Supporting the Individual. In: INTERACT 1997, pp. 39–46 (1997) 2. Benyon, D., Murray, D.: Adaptive systems: From intelligent tutoring to autonomous agents. Knowledge Based Systems 6, 197–219 (1993)
3. Browne, D., Norman, M., Rithes, D.: Why Build Adaptive Systems? In: Browne, D., Totterdell, P., Norman, M. (eds.) Adaptive User Interfaces, pp. 15–59. Academic Press. Inc., London (1990) 4. Brusilovsky, P.: Adaptive Hypermedia. User Modeling and User-Adapted Interaction 11, 87–110 (2001) 5. Brusilovsky, P., Eklund, J.: InterBook: an Adaptive Tutoring System. UniServe Science News 12 (1999) 6. Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.): Adaptive Web 2007. LNCS, vol. 4321. Springer, Heidelberg (2007) 7. Carver, C.A., Howard, R.A., Lavelle, E.: Enhancing student learning by incorporating learning styles into adaptive hypermedia. In: Proc. of 1996 ED-MEDIA World Conf. on Educational Multimedia and Hypermedia, Boston, USA, pp. 118–123 (1996) 8. Chen, S., Macredie, R.: Cognitive styles and hypermedia navigation: development of a learning model. Journal of the American Society for Information Science and Technology 53(1), 3–15 (2002) 9. Chen, C., Czerwinski, M., Macredie, R.: Individual Differences in Virtual Enviroments – Introduction and overview. Journal of the American Society for Information Science 51(6), 499–507 (2000) 10. Chin, D.N.: Empirical Evaluation of User Models and User-Adapted Systems. User Modeling and User Adapted Interaction 11, 181–194 (2001) 11. De Bra, P., Calvi, L.: AHA! An open Adaptive Hypermedia Architecture. The New Review of Hypermedia and Multimedia, 115–139 (1998) 12. Dillon, A., Watson, C.: User Analysis in HCI – The Historical Lessons From Individual Differences Research. International Journal on Human-Computer Studies 45, 619–637 (1996) 13. Egan, D.: Individual Differences in Human-Computer Interaction. In: Helander, M. (ed.) Handbook of Human-Computer Interaction, pp. 543–568. Elsevier Science B.V. Publishers, North-Holland (1988) 14. Eysenck, H.J.: Four ways five factors are not basic. Personality and Individual Differences 13, 667–673 (1992) 15. Ford, N., Chen, S.Y.: Individual Differences, Hypermedia Navigation and Learning: An Empirical Study. Journal of Educational Multimedia and Hypermedia 9(4), 281–311 (2000) 16. Graff, M.G.: Individual differences in hypertext browsing strategies. Behaviour and Information Technology 24(2), 93–100 (2005) 17. Granić, A., Stankov, S., Nakić, J.: Designing Intelligent Tutors to Adapt Individual Interaction. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 137–153. Springer, Heidelberg (2007) 18. Granić, A., Nakić, J.: Designing intelligent interfaces for e-learning systems: the role of user individual characteristics. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4556, pp. 627–636. Springer, Heidelberg (2007) 19. Henze, N., Nejdl, W.: Adaptivity in the KBS Hyperbook System. In: 2nd Workshop on Adaptive Systems and User Modeling on the WWW. Toronto, Banff (1999), Held in conjunction with the WorldWideWeb (WWW8) and the International Conference on User Modeling (1999) 20. Hook, K.: Steps to Take Before Intelligent User Interfaces Become Real. In: Interacting with Computers, vol. 12, pp. 409–426. Elsevier Science B.V (2000) 21. Honey, P., Mumford, A.: The Manual of Learning Styles, 3rd edn. Peter Honey, Maidenhead (1992) 22. Jennings, F., Benyon, D., Murray, D.: Adapting systems to differences between individuals. Acta Psychologica 78, 243–256 (1991)
23. Juvina, I., van Oostendorp, H.: Individual Differences and Behavioral Metrics Involved in Modeling web Navigation. Universal Access in the Information Society 4(3), 258–269 (2006) 24. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development. Prentice-Hall, Englewood Cliffs (1984) 25. Magoulas, G., Chen, S. (eds.): Proceedings of the AH 2004 Workshop, Workshop on Individual differences in Adaptive Hypermedia, The 3rd International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, Eindhoven, Netherlands (2004) 26. Papanikolaou, K.A., Grigoriadou, M., Kornilakis, H., Magoulas, G.D.: Personalising the Interaction in a Web-based Educational Hypermedia System: the case of INSPIRE. UserModeling and User-Adapted Interaction 13(3), 213–267 (2003) 27. Stanney, K., Salvendy, G.: Information visualization: Assisting low spatial individuals with information access tasks through the use of visual mediators. Ergonomics 38(6), 1184–1198 (1995) 28. Stash, N., Cristea, A., De Bra, P.: Adaptation to Learning Styles in ELearning: Approach Evaluation. In: Reeves, T., Yamashita, S. (eds.) Proceedings of World Conference on ELearning in Corporate, Government, Healthcare, and Higher Education, pp. 284–291. Chesapeake, VA, AACE (2006) 29. Stash, N., De Bra, P.: Incorporating Cognitive Styles in AHA (The Adaptive Hypermedia Architecture). In: Proceedings of the IASTED International Conference Web-Based Education, pp. 378–383 (2004) 30. Stephanidis, C., Paramythis, A., Karagiannidis, C., Savidis, A.: Supporting Interface Adaptation: the AVANTI Web-Browser. In: 3rd ERCIM Workshop on User Interfaces for All (UI4ALL 1997), Strasbourg, France (1997) 31. Stephanidis, C., Paramythis, A., Zarikas, V., Savidis, A.: The PALIO Framework for Adaptive Information Services. In: Seffah, A., Javahery, H. (eds.) Multiple User Interfaces: Cross-Platform Applications and Context-Aware Interfaces, pp. 69–92. John Wiley & Sons, Ltd., Chichester (2004) 32. Sternberg, R.J.: Cognitive Psychology, Wadsworth, a division of Thompson Learning, Inc., 3rd edn (2003) 33. Weber, G., Brusilovsky, P.: ELM-ART: An Adaptive Versatile System for Web-based Instruction. International Journal of Artificial Intelligence in Education 12, 351–384 (2001) 34. Zhang, H., Salvendy, G.: The implication of visualization ability and structure preview design for web information search tasks. International Journal of Human–Computer Interaction 13(1), 75–95 (2001)
Intelligent Interface for Elderly Games Changhoon Park Dept. of Game Engineering, Hoseo University, 29-1 Sechul-ri Baebang-myun, Asan, Chungnam 336-795, Korea [email protected]
Abstract. This paper proposes an intelligent interface to improve game accessibility for the elderly, based on a multimodal interface and dynamic game balancing. This approach aims to control the fidelity of feedback and the level of difficulty dynamically when the elderly become bored or frustrated with the game. Applying the proposed intelligent interface, we present the implementation of a rhythm game for the elderly with a drum-like specialized game controller. Keywords: Game Accessibility, Multimodal Interface, Dynamic Game Balancing, Rhythm Game.
1 Introduction
As the percentage of older persons in the world's population is continually increasing, the issue of accessibility to and usability of products and services has become more critical. There has also been an explosion of interest and involvement in the field of gerontechnology1 for innovative and independent living and social participation of older adults in good health, comfort and safety [1].
1 The term gerontechnology is a composite of two words: "gerontology", the scientific study of aging, and "technology", i.e., research, development, and design of new and improved techniques, products, and services.
In recent years, we have been studying serious games for the elderly, based on previous research that found positive effects of video game use on the cognitive and neuromotor skills of the elderly [2]. We have also started a project, "A research on serious games for the elderly toward human service", supported by Hoseo University. A research team from the faculties of game engineering, welfare for the elderly, nursing, and electronic engineering is working in collaboration. The goal of this project is to improve the quality of elderly people's lives by means of game play interaction. Our strategy to achieve this goal includes a change from technology development driven by technical feasibility towards development driven by knowledge about the behavior of the elderly, considered as a special category of users whose particular abilities and needs, at cognitive, social and health levels, have to be taken into account during the research process.
This paper presents a way of dynamically adapting a game to the elderly in order to keep them challenged and interested, based on the consideration and understanding of these users. In section 2, we introduce our previous experiment and the concept
of game accessibility. Section 3 proposes our approach to improving game accessibility for the elderly. Section 4 presents the implementation of a rhythm game with a drum-like specialized game controller. Finally, section 5 concludes the paper.
2 Related Work
This section introduces our previous experiment to identify barriers posed by current video games and to understand the content interests and skill sets of the elderly. We also present the multimodal interface and Design for Dynamic Diversity concepts, which can be applied to improve game accessibility.
2.1 Experiment
The objective of this study was to examine Korean elders' playing of video games. The total number of participants was forty. We recruited participants who were over the age of 65 at the time of the study. We conducted a series of four focus groups with the four games selected for the study (two Taiko Master games, Wii Sports, and WarioWare).
Fig. 1. Participants were encouraged by the researchers to take turns and to play a variety of games within the one hour. During the game play, participants’ comments were noted. In addition, the interview and participant's movement of hands and eyes were recorded by two video cameras in order to evaluate the difficulty or frustration of them.
Regarding the game controller, i.e., the input device used to control a game, the participants demonstrated no difficulty when using the drum-like game pad with sticks for Taiko Master, whereas participant remarks indicated difficulty with the Wii remote controller. While the drum-like game pad is simple and easy to use, participants needed more time to familiarize themselves with the Wii remote, especially the use of the buttons. This means that special-purpose devices are more familiar and intuitive to use than general-purpose devices. Regarding the game design, each mini-game of WarioWare lasts only about five seconds or so, which is too short for the elderly to understand and enjoy the challenge of the game. One participant commented after playing Wii Sports: "I don't know the rule of bowling. Make the games with common activity such as cleaning, dancing and
so on." Taiko Master, in spite of the long playtime, provides weak feedback about the progress and activity of game play. A majority of participants indicated after the game play that they would be interested in playing video games in the future. To appeal to elderly people, existing games need some modifications to the complexity of the controls and a simplification of the challenge of the activity. This study demonstrates that interactive games allow the elderly to enjoy new opportunities for leisure and entertainment, while improving their cognitive, functional and social skills.
2.2 Game Accessibility
Game accessibility is defined as the ability to play a game even when functioning under limiting conditions. Limiting conditions can be functional limitations or disabilities, such as blindness, deafness, or mobility limitation. A multimodal interface provides the user with multiple modes of interfacing with a system beyond the traditional keyboard and mouse. Modality refers to any of the various types of sensation, such as vision or hearing, and a sensory modality is an input channel from the receptive field. A well-designed multimodal application can be used by people with a wide variety of impairments. This means that the weaknesses of one modality or sensory ability can be offset by the strengths of another. For example, visually impaired users rely on the voice modality with some keypad input, and hearing-impaired users rely on the visual modality with some speech input. Among the most important reasons for developing multimodal interfaces is their potential to greatly expand the accessibility of computing for diverse and non-specialist users, and to promote new forms of computing not previously available [3].
[4] proposed a paradigm to support universal design called Design for Dynamic Diversity (DDD or D3). Traditional User Centered Design (UCD) does not support this paradigm, as the focus of UCD is placed on the "typical user". As has been described, "the elderly" encompasses a very diverse group of users in which individual requirements change over time, making it a group that UCD has difficulty coping with. That is why a new methodology has been introduced to accommodate Design for Dynamic Diversity. In this paper, we apply the concept of dynamic game balancing (DGB) to game balance. Game balance is a concept in game design describing fairness or balance of power in a game between multiple players or strategic options. We will control the level of challenge dynamically and individually in order to support the elderly's game play.
3 Intelligent Interface
In this section, we propose an intelligent interface based on an understanding of the skill sets of this significant population. We aim to keep the elderly in the mental state in which a person is fully immersed in what he or she is doing, with a feeling of energized focus, full involvement, and success in the process of the activity. The intelligent interface provides two methods to avoid the elderly becoming bored or frustrated with the game.
Fig. 2. Overview of intelligent interface
The first method is to select the most appropriate mode of feedback to encourage and assist the elderly. Feedback is an important part of video games for a fulfilling interactive experience, and it can be presented in several types, such as visual, audio, action, NPC feedback and so on. The more alternative types of feedback are made available through the multimodal interface, the greater the number of people who will be suited. For elderly people with dexterity and strength impairments, this method makes the game accessible through the use of another modality or sensory ability. In addition, each type of feedback is designed to have multiple levels of fidelity.
The second method is to change the level of difficulty dynamically in order to keep the elderly away from states where the game is far too challenging or way too easy. Traditional game design is well suited to covering particular clusters of players, as a developer's perception of what makes a good game is sure to appeal to someone. Designers have therefore relied on the provision of adaptable gaming experiences to achieve better audience coverage; for example, most games come equipped with easy, medium and hard difficulty settings. However, the variety of the elderly means that some players will inevitably lie outside the scope of predetermined adaptation [5]. Our approach is to keep the elderly interested from the beginning to the end, individually, by changing parameters, scenarios and behaviors in video games.
In order to realize the intelligent interface, we need to detect the difficulty the user is facing at a given moment. A challenge function maps a given game state into a value that specifies how easy or difficult the game feels to the user. Depending on this value, the intelligent interface can control the fidelity of feedback and the level of challenge in order to make the game adaptable to different users. The intelligent interface can intervene in a positive or negative and an explicit or implicit way to avoid the elderly becoming bored or frustrated with the game.
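A minimal sketch of the two mechanisms just described is given below; the profile attributes, thresholds and difficulty scale are illustrative assumptions and do not reflect the actual implementation of the system.

```cpp
#include <algorithm>
#include <iostream>

// Illustrative feedback modalities; a real interface would also offer several
// fidelity levels within each modality.
enum class Feedback { Visual, Audio, VisualAndAudio };

// Hypothetical user attributes relevant to feedback selection.
struct ElderlyUser {
    bool lowVision = false;
    bool lowHearing = false;
};

// Method 1: pick the feedback mode whose strengths offset the user's weaker
// sensory channel.
Feedback selectFeedback(const ElderlyUser& u) {
    if (u.lowVision && !u.lowHearing) return Feedback::Audio;
    if (u.lowHearing && !u.lowVision) return Feedback::Visual;
    return Feedback::VisualAndAudio;
}

// Method 2: nudge the difficulty level whenever the perceived challenge drifts
// away from a comfortable target, keeping the player neither bored nor frustrated.
int adjustDifficulty(int level, double challenge, double target = 0.5) {
    if (challenge > target + 0.15) level = std::max(1, level - 1);   // too hard: ease off
    if (challenge < target - 0.15) level = std::min(10, level + 1);  // too easy: push harder
    return level;
}

int main() {
    ElderlyUser user{true, false};       // e.g. low vision, normal hearing
    int level = 5;                       // difficulty on an assumed 1-10 scale
    double challengeEstimate = 0.8;      // value produced by a challenge function

    Feedback mode = selectFeedback(user);
    level = adjustDifficulty(level, challengeEstimate);

    std::cout << "feedback mode " << static_cast<int>(mode)
              << ", difficulty level " << level << "\n";
    return 0;
}
```

In practice the challenge estimate would come from the challenge function discussed above, sampled while the game is running.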
4 Implementation We have implemented a rhythm game for the elderly in order to apply the proposed intelligent interface. This game has been developed using Microsoft Visual C++ and DirectX as the graphic API. In rhythm games, the players must match the rhythm of the music by pressing specific buttons, or activating controls on a specialized game controller, in time with the
game's music. This kind of interaction helps the elderly improve perceptual motor skills and cognitive functioning. Motor skills can be defined as a refined use of small muscle controlling the hand, and fingers, usually in coordination with the eyes. This skill allows one to be able to complete tasks such as writing, drawing, and buttoning. A decline in perceptual-motor functions has serious consequences which affects a range of activities of daily living[6].
Fig. 3. Feedback and Difficulty Control for the Rhythm Game
Fig. 4. (a) The buk is a traditional Korean drum, with a round wooden body that is covered on both ends with animal skin. Performers usually beat their buk with bukchae (a drumstick) on one hand or two hands together. (b) We modified buk to detect when a sensor in the drum’s surface is hit. And, there are two blue buttons, left, and right, which are used to select and decide in the selection screens.
To implement the intelligent interface in this game, we need to define a challenge function to detect the difficulty of gameplay. The first way is to allow the user to press a button when he or she feels difficulty. The second way is to make an equation dependent on the game score: if the variation of the game score over a given time is positive, the challenge value sinks. The intelligent interface aims to keep this value stable to avoid the elderly becoming bored or frustrated with the game. We also need to control the level of difficulty for the intelligent interface in the rhythm game. In rhythm games, the player is required to hit the drum in time as large beat notes scroll across the screen. The difficulty of this game can therefore be controlled by
the speed and the number of notes on screen. This is a way to make the game easier or more difficult directly. Regarding the fidelity of feedback, to improve the accessibility of the game we developed a specialized game controller by modifying the buk2, a traditional Korean drum (Fig. 4). Instead of using standard interfaces like keyboard and mouse, the buk is so intuitive and simple that the elderly do not need to spend time learning how to use it. The game is played simply by hitting the drum in time with notes traveling across the screen, and we can design our game as a one-switch game that can be controlled by a single button.
2 The term buk is also used in Korean as a generic term to refer to any type of drum. Buk have been used for Korean music since the period of the Three Kingdoms of Korea (57 BC – 668 AD).
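The following sketch illustrates the score-based variant of the challenge function and the two difficulty parameters named above (note speed and the number of on-screen notes); the constants, sampling interval and function names are assumptions made for illustration rather than the game's actual code.

```cpp
#include <algorithm>
#include <iostream>

// The two difficulty parameters named in the text: how fast beat notes scroll
// and how many notes are on screen at once. Values are illustrative.
struct Difficulty {
    double noteSpeed = 1.0;   // scroll-speed multiplier
    int notesOnScreen = 4;    // simultaneous notes
};

// Score-based challenge function: if the score variation over the last interval
// is positive the player is coping, so the perceived challenge value sinks.
double challenge(int scoreNow, int scoreBefore) {
    int variation = scoreNow - scoreBefore;
    if (variation > 0) return 0.3;   // game currently feels easy
    if (variation < 0) return 0.8;   // game currently feels hard
    return 0.5;                      // about right
}

// Keep the challenge value stable by adjusting note speed and density.
void rebalance(Difficulty& d, double c) {
    if (c > 0.6) {                                        // too hard: slow down, thin out
        d.noteSpeed = std::max(0.5, d.noteSpeed - 0.1);
        d.notesOnScreen = std::max(2, d.notesOnScreen - 1);
    } else if (c < 0.4) {                                 // too easy: speed up, add notes
        d.noteSpeed += 0.1;
        d.notesOnScreen += 1;
    }
}

int main() {
    Difficulty d;
    int previousScore = 120, currentScore = 180;  // e.g. sampled every 30 seconds
    rebalance(d, challenge(currentScore, previousScore));
    std::cout << "note speed " << d.noteSpeed
              << ", notes on screen " << d.notesOnScreen << "\n";
    return 0;
}
```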
Fig. 5. Introduction and gameplay Screenshot
5 Conclusion
To identify barriers posed by current video games, we examined Korean elders' play with three popular games. We then presented an intelligent interface which enables the fidelity of feedback and the level of difficulty to be controlled dynamically. Our approach is based on the multimodal interface and dynamic game balancing for the accessibility of games. By applying the proposed interface, we developed a rhythm game and a specialized controller especially for the elderly. This game can help the elderly improve perceptual motor skills and cognitive functioning, and an intuitive and simple game controller was also developed by modifying a traditional Korean drum.
Acknowledgments. "This research was supported by the Academic Research fund of Hoseo University in 2008" (20080015).
References
1. Harrington, et al.: Gerontechnology: Why and How. Herman Bouma Foundation for Gerontechnology, 90–423 (2000)
2. Dustman, et al.: Aerobic exercise training and improved neuropsychological function of older individuals. Neurobiology of Aging (1984)
3. Oviatt: Advances in robust multimodal interface design. IEEE Computer Graphics and Applications 272, 03 (2003) 4. Gregor, et al.: Designing for dynamic diversity: making accessible interfaces for older people. In: Proceedings of the 2001 EC/NSF workshop on Universal (2001) 5. Gilleade, et al.: Using frustration in the design of adaptive videogames. In: Proceedings of the 2004 ACM SIGCHI International Conference (2004) 6. Drew, et al.: Video Games: Utilization of a Novel Strategy to Improve Perceptual-Motor Skills in the noninstitutionalised elderly. Cognitive Rehabilitation 4, 26–34 (1985)
User Interface Adaptation of Web-Based Services on the Semantic Web Nikolaos Partarakis1, Constantina Doulgeraki1, Asterios Leonidis1, Margherita Antona1, and Constantine Stephanidis1,2 1
Foundation for Research and Technology – Hellas (FORTH) Institute of Computer Science GR-70013 Heraklion, Crete, Greece 2 University of Crete, Department of Computer Science, Greece [email protected]
Abstract. The Web is constantly evolving into an unprecedented and continuously growing source of knowledge, information and services, potentially accessed by anyone anytime, and anywhere. Yet, the current uptake rates of the Web have not really reached their full potential, mainly due to the design of modern Web-based interfaces, which fail to satisfy the individual interaction needs of target users with different characteristics. A common practice in contemporary Web development is to deliver a single user interface design that meets the requirements of an “average” user. However, this “average” user is in fact an imaginary user. Often, the profiles of a large portion of the population, and especially people with disabilities, elderly people, novice users and users on the move, differ radically. Although much work has been done in the direction of providing the means for the development of inclusive Web-based interfaces that are capable of adapting to multiple and significantly different user profiles, the current evolution towards the Semantic Web poses several new requirements and challenges for supporting user and context awareness. Building upon existing research in the field of semantics-based user modeling, this paper aims to offer potential new directions for supporting User Interface Adaptation on the Semantic Web. In this context, the benefits gained from supporting semantically enabled ontology-based profiling are highlighted, focusing on the potential impact of such an approach on existing UI adaptation frameworks.
1 Introduction
Recently, computer-based products have become associated with a great number of daily user activities, such as work, communication, education, entertainment, etc. Their target population has changed dramatically. Users are no longer only the traditional able-bodied, skilled and computer-literate professionals. Instead, users are potentially all citizens of the emerging Information Society, and demand customised solutions to obtain timely access to any application, irrespective of where and how it runs. At the same time, the type and context of use of interactive applications is radically changing (e.g., personal digital assistants, kiosks, cellular phones and other
network-attachable equipment). This progressively enables nomadic access to information [15]. In computing, the notion and importance of adaptation, as the ability to adapt a system to the user’s needs, expertise and requirements, was only recently recognised. In this context the computationally empowered environment can adapt itself, to various degrees, to its ‘inhabitants’, thereby drastically reducing the amount of effort required from the users. Methods and techniques for user interface adaptation meet with significant success in modern interfaces, but most focus mainly on usability and aesthetics. The Unified User Interfaces methodology for UI adaptation [15] was conceived and validated as a vehicle to efficiently and effectively address, during the interface development process, the accessibility and usability of UIs for users with diverse characteristics, supporting also technological platform independence, metaphor independence and user-profile independence. Web-based user interfaces (WUIs) constitute a particular type of UIs that accept input and provide output by generating web pages that are transported via the Internet and are viewed by the user through a web browser. Adaptive Web-based User Interfaces support the delivery of a qualitative user experience for all, regardless of the user’s (dis)abilities, skills, preferences, and context of use. In the web context, factors such as visual experience and site attractiveness, quality of navigation organization (especially on large sites), placement of objects [3], colour schema, and page loading time also affect the overall user experience and satisfaction, and can be employed by adaptation mechanisms to personalize web user interfaces. On the other hand, the Semantic Web provides valuable means and raises great expectations for WUI adaptation. Research has already employed the features offered by the Semantic Web for generating adaptation recommendations using mining techniques [12]. In the same context, work has been conducted towards providing dynamically generated Web content to better meet user expectations through semantic browsing of information [8]. However, the potential of developing an adaptive web-based environment in the context of the Semantic Web has not yet been fully investigated. In this paper, a potential architecture for a development framework that supports the creation of adaptive Web User Interfaces is introduced by extending the architecture of an existing development framework (EAGER [5]). This paper is structured as follows. Section 2 discusses various approaches to User Interface Adaptation. In section 3, a potential architecture for supporting User Interface Adaptation on the Semantic Web is presented, based on the experience gained through the development of adaptive applications in various contexts. Section 4 outlines the main potential benefits of employing such a methodology in a semantically enabled environment. Finally, section 5 discusses further research and development steps in this direction.
2 Current Approaches to User Interface Adaptation 2.1 User and Context Profiling The scope of user profiling is to provide information regarding the user who accesses an interactive application. A user profile contains attributes either specified by the
user prior to the initiation of interaction or acquired by the system during interaction (through interaction monitoring). On the other hand, context profiling aims at collecting context attribute values (machine and environment) that are either (potentially) invariant, meaning unlikely to change during interaction (e.g., peripheral equipment), or variant, i.e., dynamically changing during interaction (e.g., due to environment noise, or the failure of particular equipment, etc.).
Static profiling. Static profiling entails the complete specification of attributes prior to the implementation of the reasoning engine of an interactive application. Where static profiling is employed, the process of altering the logic used for generating the adaptable behaviors of the system is semi-automatic and cannot be provided on the fly. More specifically, it is not feasible, when such an approach is followed, to enrich the decision logic while the system is running in order to perform meta-adaptation. This can only be achieved in the context of adaptations that occur based on collecting and analyzing usage data.
Extensible profiling using special purpose languages and Design Support Tools. A potential solution to the limitations of static profiling is to separate the logic under which adaptation occurs from the system performing the adaptation. This can be achieved, for example, through the creation of special purpose languages for the specification of the decision logic. An example of such a language is the Decision Making Specification Language (DMSL [14]). Special purpose design support tools, such as MENTOR [1], can be used to produce the decision logic of an application orchestrating user interaction.
2.2 User Interface Adaptation Toolkits
Data stemming from user and context profiling are used by adaptation toolkits for dynamically generating the interface instance that is most appropriate for a specific user in a specific context of use. Such toolkits, in their most advanced implementation, consist of collections of alternative interaction elements mapped to specific user and context parameters. The automatic selection of the appropriate elements is the key to supporting a large number of alternative interface instantiations. In the following sections, some indicative examples of existing tools that support the development of adaptive User Interfaces in various contexts are presented.
The EAGER toolkit. EAGER [5] is a development toolkit that allows Web developers to build adaptive applications using facilities similar to those offered by commonly used frameworks (such as ASP.NET [2] and Java Server Faces [6]). It is a developer framework built over ASP.NET providing adaptation-enabled, ready-to-use dialogs. By means of EAGER, a developer can produce Web portals that have the ability to adapt to the interaction modalities, metaphors and UI elements most appropriate to each individual user, according to profile information containing user and context specific parameters.
Advanced toolkit for UI adaptation in mobile services. The main concept of this toolkit [9] is to facilitate the implementation of adaptation-aware user interfaces for mobile services. UI widgets supported by this framework encapsulate all the necessary information and are responsible for requesting and applying the relative
decisions. The Toolkit employs DMSL to allow UI developers to turn hard-coded values of lexical attributes to adapted UI parameters specified in an external preference file. As a result, the UI Implementation is entirely relieved from adaptation-related conditionality, as the latter is collected in a separate rule file. 2.3 Case Studies In this section real life applications developed utilizing adaptation toolkits are briefly overviewed, focusing on highlighting their ability to cope with the diversity of the target user population and therefore offering qualitative user experience for all, regardless of the user’s (dis)abilities, skills, preferences, and context of use. The AVANTI web Browser. The AVANTI Web Browser [16] facilitates static and dynamic adaptations in order to adapt to the skills, desires and needs of each user including people with visual and motor disabilities. The Avanti’s unified interface can adapt itself to suit the requirements of three user categories: able-bodied, blind and motor impaired. Adaptability and adaptivity are used extensively to tailor and enhance the interface respectively, in order to effectively and efficiently meet the target of interface individualisation for end users. Additionally, the unified browser interface implements features, which assist and enhance user interaction with the system. Such features include enhanced history control for blind and sighted users, link review and selection acceleration facilities, document review and navigation acceleration facilities, enhanced intra-document searching facilities etc. The EDEAN portal. EDEAN is a prototype portal developed, as proof-of-concept, following the UWI methodology by means of the EAGER toolkit [5]. In order to elucidate the benefits of EAGER, an already existing portal was selected and redeveloped from scratch. In this way, it was possible to identify and compare the advantages of using EAGER, both at the developer’s site, in terms of developer’s performance, as well as at the end-user site, in terms of the user-experience improvement. The ASK-IT interface for mobile transportation services. The Home Automation Application developed in the context of ASK-IT facilitates remote overview and control through the use of a portable device. These facilities provided the ability to adapt themselves according to user needs (vision and motor impairments), context of use (alternative display types and display devices) and presence of assistive technologies (alternative input devices). 2.4 Discussion The approaches developed so far to support User Interface Adaptation have shown to be adequate for addressing a number of requirements. Especially in the context of web applications, previous work has proven that it is technologically feasible to develop web-based interfaces that are able to adapt to various user profiles and contexts of use. Limitations of current approaches include the difficulties faced when addressing the potential of change and the reduced reasoning capabilities resulting from the methods used for capturing user and context profiles. The semantic web brings new directions and challenges, and offers new paths through enhanced expressive power and advanced reasoning facilities. The next section discusses how these facilities can
User Interface Adaptation of Web-Based Services on the Semantic Web
715
be used towards enriching the adaptive behavior of existing frameworks. A potential implementation architecture for supporting User Interface Adaptation on the Semantic Web will be presented, focusing on the feasibility of such a concept and on its potential advantages.
3 User Interface Adaptation on the Semantic Web 3.1 Requirements for Effective User Modeling Requirements for creating effective user modeling systems have been documented in [7] and [4], and include: • Generality, including domain independence. User modeling systems should be usable in as many domains as possible, and within these domains for as many user modeling tasks as possible. • Expressiveness and strong inferential capabilities. Expressiveness is a key factor in user modeling systems; they are expected to express many different types of assumptions about the users and their context. Such systems are also expected to perform all sorts of reasoning, and to perform conflict resolution when contradictory assumptions are detected. • Support for quick adaptation. Time is always an important issue when it comes to users; User modeling systems are required to be adaptable to the users’ needs. Hence they need to be capable of adjusting to changes quickly. • Precision of the user profile. The effectiveness of a user profile depends on the information the system delivers to the user. If a large proportion of information is irrelevant, then the system becomes more of an annoyance than a help. This problem can be seen from another point of view; if the system requires a large degree of customization, then the user will not be willing to use it anymore. • Extensibility. A user modeling system’s success relies on the extensibility it offers. Companies may want to integrate their own applications (or API) into the available user models. • Scalability. User modeling systems are expected to support many users at the same time. • Import of external user-related information. User models should support a uniform way of describing users' dimensions in order to support integration of already existing data models. • Management of distributed information. The ability of a generic user modeling system to manage distributed user models is becoming more and more important. Distributed information facilitates the interoperability and integration of such systems with other user models. • Support for open standards. Adherence to open standards in the design of generic user modeling systems is decisive since it fosters their interoperability. • Load balancing. User modeling servers should be able to react to load increases through load distribution and possibly by resorting to less thorough (and thereby less time-consuming) user model analyses.
716
N. Partarakis et al.
• Failover strategies. Centralized architectures need to provide fallback mechanisms in case of a breakdown or unexpected situation. • Fault tolerance. In case a user inserts wrong data in his/her profile by mistake (i.e. a user denotes an opposite gender), the system must prompt the user to adjust the corresponding parameters, rather than reset his/her profile. • Transactional Consistency. Parallel read/write procedures on the user model should lead to the deployment of sufficient mechanisms that preserve and restore possible inconsistencies. • Privacy support. Another requirement of user modeling systems is to respect and retain the user's privacy. In order to meet these requirements, such systems must provide a way for the users to express their privacy preferences, as well as the security mechanisms to enforce them 3.2 User Interface Adaptation on the Semantic Web: Proposed Architecture Figure 1 presents the proposed implementation architecture for supporting adaptive interfaces on the Semantic Web.
Fig. 1. User Interface Adaptation on the Semantic Web: proposed architecture
Modeling User, Context and Interaction. In the proposed architecture, the Knowledge Base contains the ontology representing the modeled classes and properties for supporting the collection of parameters appropriate for modeling: • User Profile (Disability, Web Familiarity, Language, etc.) • Context Profile (Input-Output devices, screen capabilities, etc.) • User Interaction (monitoring user actions, user navigation paths, etc.)
User Interface Adaptation of Web-Based Services on the Semantic Web
717
The Knowledge Base can use web ontology languages such as OWL to store the appropriate information in the form of semantic web rules and OWL-DL [11] ontologies. This approach offers enough representational capabilities to develop a formal context model that can be shared, reused, and extended for the needs of specific domains, but can also combined with data originating from other sources, such as the Web or other applications. Moreover, currently the logic layer of the Semantic Web is evolving towards rule languages that enable reasoning about the user’s needs and preferences and exploiting available ontology knowledge [10]. An example of how user profile parameters can by modeled in an ontology is presented in Figure 2. User is a superclass that includes the user groups a user may belong to according to his/her functional limitations (NonImpairedUser, HearingImpairedUser, MotorImpairedUser or VisuallyImpairedUser), each of which is further analysed where appropriate.
Fig. 2. An example of an ontology representing user abilities
Designs Repository. The Designs Repository contains abstract dialogues together with their concrete designs. Following the Unified User Interface Design methodology [15], this is achieved through polymorphic decomposition of tasks that leads from abstract design pattern to a concrete artifact. Design Repositories for supporting adaptation of web-based services can consists of primitive UI elements with enriched attributes (e.g., buttons, links, radios, etc.), structural page elements (e.g., page templates, headers, footers, containers, etc.), and fundamental abstract interaction dialogues in multiple alternative styles (e.g., navigation, file uploaders, paging styles, text entry) [5]. Reasoner and Rule Engine. The Reasoner module, together with the Rule engine, undertakes the job of classifying instances and performing the overall decision making that is required for selecting the appropriate interaction elements to build the concrete user interface. In this context, the Reasoner classifies instances into classes that have a strict definition, taking into account the Open World Assumption (i.e., if there is a statement for which knowledge is not currently available, it cannot be inferred if it is true or false). The Rule Engine undertakes the classification into primitive classes and specifies and executes classification rules.
718
N. Partarakis et al.
Orchestration (Adaptation Core). The adaptation core undertakes the orchestration of the main modules of the proposed architecture. When a user profile is created, the Reasoner and Rule engine are invoked for classifying instances under various classes, computing inferred types and reasoning on the available context. The results are stored in the knowledge based and are used by the adaptation core for inferring specific actions regarding the activation and deactivation of alternative dialogs. The adaptation core is also responsible for re-invoking the aforementioned services when the data stemming from the user interaction monitoring process lead to the need of reevaluating existing user profile information through reevaluation of rules. 3.3 Benefits Regarding the adaptation process itself, the adoption of a semantically enabled inference mechanism potentially allows the evaluation of more complex rules, thus making reasoning more solid and enriching the application logic. Moreover, an ontology based specification of user, context and interaction profiles makes the potential extension of the system easier. Another important benefit of a semantically enabled adaptation approach is the increased possibility of learning user preferences. These attributes traditionally can be set by the user, but in most cases cannot be inferred from user actions. In the context o the proposed architecture it is possible to dynamically generate social tags that can in turn be used for performing adaptive filtering of information based on user preferences. A similar result can be also obtained by modeling user interaction data and performing batch analysis. This can be supported in the proposed architecture through introducing another layer of modeling beyond the designs repository used for strict UI purposes (i.e., a content modeling repository).
4 Conclusions and Future Work This paper has proposed an architecture for supporting the development of Adaptive User Interfaces on the Semantic Web, based on existing approaches which have been successfully used in the recent past for supporting adaptation of user interfaces in various contexts. Modifications to the architectural structure used in these adaptation frameworks have been proposed in order to cope with the requirements set in the context of the Semantic Web. Taking into account the enriched modelling and inference capabilities offered, this novel architecture aims at combining the benefits of the Semantic Web (such as extensibility, strong inference capabilities, etc.) with benefits of existing adaptation frameworks (such as the ability to address accessibility, user preference, various input output devices, etc). In future work, this implementation architecture will be employed in the context of the EAGER development framework. In this context, the Knowledge Base of Eager together with the inference mechanisms will be replaced by the modules proposed in the extended architecture (Knowledge base, Rule engine, Reasoner, etc.) allowing the reuse of facilities common to both architectures, such as the Designs Repository (which has been already put into use in the context of several interactive web based applications, such as the EDEAN portal, http://www.edean.org).
User Interface Adaptation of Web-Based Services on the Semantic Web
719
References 1. Antona, M., Savidis, A., Stephanidis, C.: A Process–Oriented Interactive Design Environment for Automatic User Interface Adaptation. International Journal of Human Computer Interaction 20(2), 79–116 (2006) 2. ASP.NET Web Applications, http://msdn.microsoft.com/en-us/library/ ms644563.aspx 3. Bernard, M.L.: User expectations for the location of web objects. In: Proceedings of CHI 2001 Conference: Human Factors in Computing Systems, pp. 171–172 (2001), http://psychology.wichita.edu/hci/projects/ CHI%20web%20objects.pdf [2007] 4. Çetintemel, U., Franklin, M.J., Giles, C.L.: Self-adaptive user profiles for large-scale data delivery. In: Proceedings of the 16th IEEE International Conference on Data Engineering (ICDE 2000) (2000) 5. Doulgeraki, C., Partarakis, N., Mourouzis, A., Stephanidis, C.: Adaptable Web-based user interfaces: methodology and practice. eMinds 1(5) (2009) http://www.eminds.hci-rg.com/index.php?journal=eminds&page= article&op=download&path=58&path=33 6. JavaServer Faces Technology, http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JSFIntro.html 7. Kobsa, A.: Generic user modeling systems. User Modeling and User-Adapted Interaction 11(1-2), 49–63 (2001) 8. Kostkova, P., Diallo, G., Jawaheer, G.: User Profiling for Semantic Browsing in Medical Digital Libraries. The Semantic Web: Research and Applications, 827–831 (2008) 9. Leuteritz, J.-P., Widlroither, H., Mourouzis, A., Panou, M., Antona, M., Leonidis, A.: Development of Open Platform Based Adaptive HCI Concepts for Elderly Users. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009, San Diego, California, USA, July 19–24. Springer, Berlin (2009) 10. Michou, M., Bikakis, A., Patkos, T., Antoniou, G., Plexousakis, D.: A Semantics-Based User Model for the Support of Personalized. Context-Aware Navigational Services. In: First International Workshop on Ontologies in Interactive Systems, 2008. ONTORACT 2008, pp. 41–50 (2008) 11. OWL Web Ontology Language Reference. W3C Recommendation, February 10 (2004), http://www.w3.org/TR/owl-ref/ 12. Robal, T., Kalja, A.: Applying User Profile Ontology for Mining Web Site Adaptation Recommendations. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds.) ADBIS 2007. LNCS, vol. 4690, pp. 126–135. Springer, Heidelberg (2007) 13. Savidis, A., Leonidis, A., Lilis, I., Moga, L., Gaeta, E., Villalar, J.L., Fioravanti, A., Fico, G.: Self-configurable User Interfaces. ASK-IT Deliverable - D3.2.1 (2004) 14. Savidis, A., Antona, M., Stephanidis, C.: A Decision-Making Specification Language for Verifiable User-Interface Adaptation Logic. International Journal of Software Engineering and Knowledge Engineering 15(6), 1063–1094 (2005) 15. Stephanidis, C.: The concept of Unified User Interfaces. In: Stephanidis, C. (ed.) User Interfaces for All - Concepts, Methods, and Tools, pp. 371–388. Lawrence Erlbaum Associates, Mahwah (2001) 16. Stephanidis, C., Paramythis, A., Sfyrakis, M., Savidis, A.: A Case Study in Unified User Interface Development: The AVANTI Web Browser. In: Stephanidis, C. (ed.) User Interfaces for All - Concepts, Methods, and Tools, pp. 525–568. Lawrence Erlbaum Associates, Mahwah (2001)
Measuring Psychophysiological Signals in Every-Day Situations Walter Ritter University of Applied Sciences Vorarlberg, Hochschulstraße 1, 6850 Dornbirn, Austria [email protected]
Abstract. Psychophysiological signals enable computer systems to monitor the emotional state of a user. Such a system could adapt its behavior to reduce stress, give assistance, or suggest well-being tips. All of this should lead to a technology that is more user-friendly and more accessible to older people. Measuring physiological signals in research labs has been done for many years. In such a controlled environment the quality of signals is very high because of the optimal placement of electrodes by research staff. Analysis techniques can therefore rely on high quality data. Measuring physiological signals in real-life settings without the assistance of well-trained staff, is much more challenging because of artifacts and signal distortions. In this paper we discuss the approach taken in the Aladin project to cope with the inferior and unreliable quality of physiological signal measurements. We discuss a sensor design intended for every-day use and present the variance of skin conductance we experienced within measurements, between different measurements of the same individual as well as between different persons. Finally, we suggest using trends instead of absolute values as a basis for physiology-enhanced human-computer interaction “in the wild”. Keywords: psychophysiology, skin conductance, heart rate, sensor technology, real-life settings, artifacts.
1 Introduction Measuring physiological signals has become widespread in an increasing number of application areas. Whilst it has been used within clinical applications for a long time1, recently it has also become popular in human computer interaction, as it provides clues as to what is going on psychologically inside a user. On the one hand, this provides valuable input in usability evaluations, identifying issues that could not be uncovered by questionnaires [8,10]. On the other hand physiology has been also proposed as an important input-channel for interactive computer applications [13]. Picard suggests the term Affective Computing for computers that also take emotions of their users into account for adjusting their behavior. Even though this term is mainly used 1
The origins of electrocardiogram (ECG) measurement date back as far as 1838, when Carlo Matteucci found out that each heart beat was accompanied by electrical current.
C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 720–728, 2009. © Springer-Verlag Berlin Heidelberg 2009
Measuring Psychophysiological Signals in Every-Day Situations
721
in the context of wearable computers, it also extends to a broader range of computer driven applications. Even intelligent building management systems and smart rooms come to mind. The physiological signals measured from human bodies range from electrocardiograms (ECG), electrodermal activity (EDA), electromyography (EMG) to electroencephalogram (EEG) and more. Each measure has its specific meaning, which is often dependent on other measurements. 1.1 Common Physiological Parameters ECG. Electrocardiograms show the electrical activity of the heart during the various phases of heart beats. The typical heart beat is characterized by the P, Q, R, S, and T waves as well as resulting segments [12]. The most prominent peak is the R-peak, which is often used to determine the RR-intervals also referenced as inter-beat intervals, describing the time between two consecutive heart beats. ECGs are applied in many contexts. In the context of human-computer interaction ECGs are often the basis for calculations derived from heart rate like heart rate variability. Heart rate variability is often analyzed in the frequency domain using spectral analysis techniques. Here the resulting power spectrum is divided into the frequency bands of very low frequency (VLF, 0.0033Hz - 0.04Hz), low frequency (LF, 0.04Hz 0.15Hz), and high frequency (HF, 0.15Hz - 0.4Hz). HF power spectrum is related to parasympathetic tone and variations due to spontaneous respiration, LF power spectrum indicates parasympathetic as well as sympathetic tone, and the VLF power spectrum, especially in shorter recordings, is considered an indicator for anxiety or negative emotions. The ratio of LF/HF is seen as balance indicator between the sympathetic and parasympathetic tone [1]. Whilst heart rate variability is a very promising indicator, variables in its processing, as well as the influence of artifacts or ectopic beats [3] make it very hard to use reliably in uncontrolled environments. EDA. Electrodermal activity is determined by measuring the electrical conductance of skin. Skin conductance is usually measured with electrodes placed on the index and middle finger or inside the palm [5]. For parameterization skin conductance usually is divided into the tonal skin conductance level (SCL) and skin conductance responses (SCR). Among the responses, a distinction between non-specific responses (responses without external stimuli) and specific responses to external stimuli is made. Specific SCRs are characterized by the latency time between the stimuli and the beginning of the rise, typically between one and three seconds, the amplitude of the rise (highly varying among test persons, also depending on habituation effects) and its half-value period [5, 16]. SCRs can therefore be used to detect physiological reactions to specific stimuli. If the occurrence of external stimuli is unknown, it is difficult to distinguish between specific responses and non-specific responses. Skin conductance level is often related to the concept of activation [6]. SCL is especially used as an indicator for the activation level [16], where longer term measurements allow for observations of the activation progress. EDA is a very popular measure in HCI, not the least because it can be easily measured on the fingers of the user, meaning there is no need for attaching electrodes underneath the clothes.
722
W. Ritter
EMG. Electromyography measures the activity of muscular tension. In this way, action of specific muscles can be identified and, for example, be used to count eye blinks, or tension of the neck muscles that may suggest frustration. Although the basic principle of measuring EMG is simple (two electrodes placed on the skin over the relevant muscle), the transformation of measured results into psychophysiological concepts is difficult. For example, a certain expression is usually caused by many contractions of multiple muscles [7]. Also, placement of electrodes on the surface often captures activity of various muscles, due to their relative size. Besides, electrodes may reduce the ability of an individual to move [2], thus also limiting the acceptance of applications in real-life settings. EEG. The electroencephalogram shows electrical activity on the head surface, allowing to deduce activity in specific parts of the brain. To be able to measure the very low amplitudes of activity, electrodes need to be attached using conductivity gel. The locations of electrode placement has been widely standardized in the so-called International 10-20 system. Basically, EEG measures the potential difference between two electrode sites. A theoretical distinction between active and inactive sites is made, where inactive sites are characterized as not being a source of electrical activity themselves. Among inactive sites are ear lobes, the vertex, or the nasion [14]. If the measurement is made by comparison of two active sites, the term bipolar recording is used. This is mostly prevalent in medical scenarios, whereas for psychophysiological purposes often monopolar recordings are used, meaning one site is inactive. According to Coles [4], EEG is typically recorded at sampling rates of 100Hz to 10000Hz. For the analysis of EEG signals, event-related brain potentials (ERP) play an important role. ERPs reflect brain activity caused by a specific discrete event, finally leading to a response. They are also regarded as indicators for psychological processes [4]. EEG is widely used in the field of brain-computer interfaces. However, its use in real-life settings for now has been constrained by the complicated measurement procedure involved. All of these measures described above work best with exact electrode placement on the human skin, especially EEG and EMG. Whilst in a laboratory, clinical, or otherwise controlled environment this is not much of a problem thanks to well-trained staff. Measuring these parameters in real-life settings by users is a completely different story. One problem may be the use of electrodes themselves. Telling a user to attach electrodes using a special gel to support conductance elicits negative associations reminding of medical treatments and thus leads to reluctance of use. But even if users were willing to use electrodes, the varying placements of them can lead to completely different measured values. In the following section we give an overview of the Aladin sensor glove, intended for physiology data collection in real-life settings, before we analyze the collected data regarding variances and their implications.
2 The Sensor Design In the Aladin project, psychophysiological measures were used as input for light adaptation algorithms as well as biofeedback applications. The project partners have developed an adaptive lighting system for elderly people with the aim to increase
Measuring Psychophysiological Signals in Every-Day Situations
723
mental and physical fitness [9]. As the project targets the elderly, in particular, it was essential that the sensor solution be accepted by potential customers. In a pre-study we learned that electrodes with the need for conductance gel were not accepted by the target group. Therefore we had to find alternative forms of electrodes that would be accepted but in turn may have the downside of not yielding equally reliable results. Wearable shirts would have been a nice solution, however, they only work well on skinny people. As regards psychophysiological measures we opted for heart rate and electrodermal activity, as both relate to the concept of activation and are also least susceptible to errors due to varying electrode placement, though still problematic. On the hardware side we developed a sensor design over two iterations. First we intended to use a sensor belt inspired by heart rate monitors, and also capable of measuring skin conductance at one side of the chest, temperature, and acceleration (see Figure 1). During a pre-test, however, we learned that attaching the sensor belt was a big problem for some persons of the target group due to their reduced mobility [15].
Fig. 1. Aladin sensor belt
Fig. 2. Aladin sensor glove
724
W. Ritter
As a result we opted for a glove based sensor design. A glove can be attached easily by all persons, and works well regardless of body size or figure. Heart rate in this sensor was measured via a method based on blood-volume pulse (BVP). Skin conductance was measured between the thumb and index finger (see Figure 2). Both sensor designs were based on a special mobile version of the Varioport recorders developed by Becker Meditec2. Sensor data is sent wirelessly to the measurement system via a Bluetooth connection. The decision for measuring heart rate based on BVP resulted in high levels of artifacts during movements, especially when moving the hand above the head, but in the typical usage scenarios this did not prove to be an issue. In the following section we describe the results we received using the Aladin sensor glove during field tests carried out in twelve test households.
3 Results of Psychophysiological Measurements from Field Tests Given the unpredictable usage scenarios during the field tests, the main questions were how reliable the signals would be within sessions of the same person on different occasions, as well as signal quality among different test persons, all compared to innersession variance, mostly caused by physiological processes. This should give an indication of which parameterization technique might be most suitable for the recorded data. As no information existed about what a person did at a specific time, all data was drawn from biofeedback exercises, where test persons should be reasonably calm and relaxed, thus not measuring results from unexpected actions. As skin conductance uses raw values and therefore is more prone to attachment-variations than the heart rate derived from BVP, we focus on skin conductance in this analysis. In the context of Aladin it was particularly important to get an idea of a person’s maximum span of skin conductance to give appropriate feedback of the relaxation progress in a biofeedback session. Table 1. Standard deviations within measurement sessions (Inner-STDEV) and between sessions (Inter-STDEV) for whole 4 minute periods and the first 30 seconds of the periods
2
Becker Meditec, Karlsruhe, Germany.
Measuring Psychophysiological Signals in Every-Day Situations
725
Fig. 3. Inter-person signal comparison
The following data is drawn from twelve field test persons - ten female and two male, aged between 64 and 82 years. A total of 1402 measurements each with a duration of four minutes were analyzed. Data was collected by the sensor glove described above. After initial instructions, the test persons attached the sensor glove by themselves without being observed or coached any further. As depicted in Figure 3 skin conductance differs enormously between individuals, but also within a single person, with the graphs showing minimum, maximum, aver-
726
W. Ritter
age and median of a measurement session. It can be seen that test person VP4 is much more stable than VP12, but even here variations between individual sessions outweigh the inner-session specific variations of skin-conductance. The graphs also show that average and median values are nearly identical. An analysis of standard deviations within individual measurement sessions and between them supports this impression. Table 1 lists individual standard deviations for whole 4-minute measurements, as well as for the first 30 seconds of them. Skin conductance values are not converted to µS units, but instead reflect raw device values, as for our purposes only relative changes are of interest. The average and standard deviation of all test persons' combined standard deviations support the hypothesis that the phenomenon of these highly varying valueranges between sessions is far more pronounced than changes within a session, and can be observed among all test persons. Figure 4 shows that for both period variants, 4 minutes and only the first 30 seconds, the proportion of changes between sessions and within sessions is about the same. The individual differences between persons are also apparent although the ratio between inter- and inner-session standard deviations is comparable.
Fig. 4. Standard deviations between measurements and within measurements within the whole 4 minute periods and the first 30 seconds of the periods
4 Discussion These results demonstrate that variations between individual measurements clearly outweigh changes within one measurement. In the context of our biofeedback application this means that we cannot use the history of past measurements, for example average or median values, directly to predict the span of values in the current session. As the graphs in Figure 3 illustrate, average and median values are nearly identical. This could be seen as an indicator that artifacts due to badly attached electrodes are minimal. Therefore the electrodes implemented into the sensor glove seem to be appropriate for use in such application scenarios. However, the results also seem to confirm that psychophysiology-based applications in real-life settings should not rely on absolute measures and amplitude-based
Measuring Psychophysiological Signals in Every-Day Situations
727
parameters, as these vary too much between usage sessions and also among different users. This is especially true for directly measured parameters like skin conductance, as opposed for example to heart-rate deduced from significant amplitude changes from the R-wave of ECG or pulse wave of BVP. One approach to circumvent the problem of highly varying levels and to predict the value-span would be to require a calibration procedure at the beginning of every usersession. This would give an idea of which values to expect and then scale feedback accordingly. This actually is common practice in laboratory situations to accommodate varying levels of skin conductance between individual test persons. From a usability standpoint, this would be highly annoying for users who just want to play a game or perform a biofeedback session, instead of helping the system to guess what to expect. A solution might be to embed the calibration procedure into the application, for example, as part of a game or exercise. In Aladin, we applied a different approach. Instead of having amplitude-based measures we suggest a simple approach based on trends. In the case of skin conductance this could be the integration of the number of occurrences of rising skin conductance, as well as periods of falling skin conductance, and then taking them as indicators for activation or relaxation. Such an approach based on trends already proved to be promising in our study of an adaptive memory game that subliminally changed its appearance based on physiological input in order to increase memory-performance of the user [11].
5 Conclusion The lessons learned from the field study and the analysis of the measured signals indicate that for psychophysiological appliances in real-life settings one might have to take a step back from the laboratory proven measurement and parameterization techniques and find new ways to utilize physiological signals in alternative human computer interaction. Whilst devices definitely will get better over time, and deliver more precise measurements of certain parameters, the basic problem is likely to remain: users have to accept these measurement devices, and as long as skin contact is required, simple and comfortable electrodes play an important role. This, however, also means that we have to make compromises regarding signal quality and reliability. In future we plan to further develop and evaluate alternative parameterization techniques for physiological values that are less demanding regarding signal quality than the standard laboratory routines, and therefore more appropriate as an additional channel in human-computer interaction for AAL applications in the wild. Acknowledgments. The Specific Targeted Research Project (STREP) Ambient Lighting Assistance for an Ageing Population (ALADIN) has been funded under the Information Society Technologies (IST) priority of the Sixth Framework Programme of the European Commission (IST-2006-045148).
728
W. Ritter
References 1. Acharya, R.U., Joseph, P.K., Kannathal, N., Choo Min, L., Suri, J.S.: Heart Rate Variability. In: Acharya, R.U., Suri, J.S., Spaan, J.A.E., Krishnan, S.M. (eds.) Advances in Cardiac Signal Processing, pp. 121–166. Springer, Berlin (2007) 2. Cacioppo, J.T., Tassinary, L.G., Fridlund, A.J.: The Skeletomotor System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 325–384. Cambridge University Press, Cambridge (1990) 3. Clifford, G.D.: ECG Statistics, Noise, Artifacts, and Missing Data. In: Acharya, R.U., Suri, J.S., Spaan, J.A.E., Krishnan, S.M. (eds.) Advanced Methods and Tools for ECG Data Analysis, pp. 55–100. Artech House, Boston (2007) 4. Coles, M.G.H., Gratton, G., Fabiani, M.: Event-Related Brain Potentials. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 413–455. Cambridge University Press, Cambridge (1990) 5. Dawson, M.E., Schell, A.M., Filion, D.L.: The Electrodermal System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 295–324. Cambridge University Press, Cambridge (1990) 6. Duffy, E.: Activation. In: Greenfield, N., Sternbach, R. (eds.) Handbook of Psychophysiology, pp. 577–622. Holt, Reinhart and Winston, New York (1972) 7. Eckman, P.: Methods for Measuring Facial Action. In: Scherer, K.R., Ekman, P. (eds.) Handbook of Methods in Nonverbal Behavior Research, pp. 45–90. Cambridge University Press, Cambridge (1982) 8. Izsó, L.: Developing Evaluation Methodologies for Human-Computer Interaction. Delft University Press, Delft (2001) 9. Kempter, G., Maier, E.: Increasing Psycho-physiological Wellbeing by Means of an Adaptive Lighting System. In: Cunningham, P., Cunningham, M. (eds.) Expanding the Knowledge Economy: Issues, Applications, Case studies, pp. 529–536. IOS, Amsterdam (2007) 10. Kempter, G., Ritter, W.: Einsatz von Psychophysiologie in der Mensch-Computer Interaktion. In: Heinecke, A.M., Paul, H. (eds.) Mensch & Computer 2006. Mensch und Computer im Strukturwandel, pp. 165–174. Oldenbourg, München (2006) 11. Kempter, G., Ritter, W., Dontschewa, M.: Evolutionary Feature Detection in Interactive Biofeedback Interfaces. In: Universal Access in HCI: Exploring New Interaction Environments. HCII 2005 Las Vegas Paper Presentation. CD-ROM (2005) 12. Papillo, J.F., Shapiro, D.: The Cardiovascular System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 456–512. Cambridge University Press, Cambridge (1990) 13. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997) 14. Ray, W.J.: The Electrocortical System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 385–455. Cambridge University Press, Cambridge (1990) 15. Ritter, W., Becker, K., Kempter, G.: Mobile Physiology Monitoring Systems. In: Maier, E., Roux, P. (eds.) Seniorengerechte Schnittstellen zur Technik, pp. 78–84. Pabst Science Publishers, Lengerich (2008) 16. Vossel, G., Zimmer, H.: Psychophysiologie. Verlag W. Kohlhammer, Stuttgart (1998)
Why Here and Now Antonio Rizzo1, Elisa Rubegni1,2, and Maurizio Caporali1 1
University of Siena, Computer Science Department, Via Roma 56, 53100 Siena, Italy 2 Università della Svizzera italiana, TEC-Lab, via Buffi 13, 6900 Lugano TI (CH) {rizzo,caporali}@unisi.it, [email protected]
Abstract. The paper presents our vision in the process of creating new objects and things, based on reducing the estrangement of Internet content consumption by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place. These aspects should be incorporated in interactive artefacts not only for the contents consumption but also for editing and manipulating information. We present some projects and concepts that go in this direction, and among them we show the design solutions developed in our laboratory that aim to enhance the role of the physical location, social and cultural environment in affecting the contents and the way to interact with them. Keywords: Human-Computer Interaction, Interaction design, Situated Editing, Design for all, Tangible User Interface, Ubiquitous computing, Internet of Things.
1 Introduction We share the view that computers, phones and game consoles are no longer the only devices in our environment deemed worthy of being “intelligent” and connected. But within this broad view of the process of creating new objects and things, we want to address two specific issues. First, most current technology solutions take the user out from the physical and social context in which he is actually involved – communication devices and data links are seldom used to empower in-presence social activities. Second, the development of most networked devices proposes the irrelevance of physical location as one of the key advantages of being constantly connected. In considering these two issues, we do not dispute the utility of distance communication and any-place-any-time accessibility to information. We seek to complement these two crucial factors of ubiquitous computing with their converse. We propose to respond to these challenges by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place. Our goal is to develop design solutions that mitigate or even eliminate the almost compulsory estrangement from the physical context when using communication technology. We challenge information and communication technologies that allow anytime-anywhere access to provide content that could be enhanced by the fact of the user being in a given location. Internet resources are barely affected by the context in which the user accesses them - the content and interaction modality remain the same irrespective of physical C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 729–737, 2009. © Springer-Verlag Berlin Heidelberg 2009
730
A. Rizzo, E. Rubegni, and M. Caporali
location or cultural environment. For most web technologies the specific location from which the user accesses information adds nothing to the user’s experience: whether the user is in New York, Hong Kong, or the countryside near Siena doesn’t seem to matter. In the paper, concrete example of products will be provided and the implications of this approach will be discussed. These aspects should be incorporated in interactive artefacts not only for the contents consumption but also in editing and manipulating information.
2 Background and Motivation Lucy Suchman 0 gives a fundamental contribution in the direction of understanding the relationship between “interactive artefacts” and the context of use, emphasizing the role of the environment in the cognitive process. From this standpoint, designing an interactive artefact means not only designing a device, but designing new human activities and behaviour [1]. Since the cognitive process is socially and historically situated, the physical location is important and the focus of designing interactive systems should be on the usage. The Situated action theory 0 influenced the research behind the design of interactive systems and contributed to define a methodological background for development of interactive system 0. In this view, the physical/social context, the range of human activities supported and the contents provided are the main factors that should be considered in the definition of the user interface [2]. Internet and mobile ICT promise the redefinition of spatial, temporal and contextual boundaries promoting “anytime, anywhere” access to information 0 allowing new interaction modalities among people, objects and machines located in the environment. Especially mobile interaction implies the possibilities for the compression and fluidification of space, time and context of interaction [6]. The patterns of spatial behaviour and temporal understanding change dramatically within mobile technology [8]. Mobile technology opens new visions for design and creates new challenges as the adaptation of the contents and of the interaction patterns to the context of use. Thus, designing user interaction needs to consider the temporal and spatial interdependencies. One of the most current topics of debate regards designing for Situated interaction [9], considering context as not merely the location, but also as the user activity supported. Recent studies on embedding information devices in the environment put significant attention on the understanding of possible patterns of interaction enabled by technology, the physical/social and cultural environment and the human activities potentially supported [12, 16, 21]. The issue of the compulsory estrangement from the physical context when using communication technology is a key challenge in research, especially in the domain of Ubiquitous computing [24], Physical Computing [15] Everyware [5], Tangible User Interface [22] and the Internet Of Things [14]. These are based on the same viewpoint: computers, phones and game consoles are no longer the only devices in our environment deemed worthy to embody computation and be connected. New kinds of devices potentially enable interaction modalities oriented to provide added value of being in a given location when accessing a given content [e.g. 23]. In this perspective, the design
Why Here and Now
731
has to be based on the creation of patterns of interaction in accordance with the specific type of place, contents and activity that differ according with the technology used. The vision given by the above mentioned emerging technology domains allows the definition of innovative interaction modalities in terms of contents consumption and editing. For example, current research considers possible ways in which contents consumption and manipulation can be affected by the location in which they are delivered. Therefore, even the manipulation of contents has to consider the context in which the activity takes place. In this direction, an interesting concept that emerged from research is the Situated Editing (SE). SE was developed within the POGO project as a way to join the “invisible computing” approach [12], that is, to allow a seamless integration of the physical and virtual world through intuitive interaction modalities. These concepts are extended from the POGO project research, which designed and tested prototypes for children of elementary school level to support narrative storytelling through interaction with digital devices [17]. SE enables real-time manipulation of educational assets, permitting students and teachers to share the production of course content. The environment developed in POGO has a number of tools that support the SE. Raw non-digital media elements (e.g. drawings, sounds) can be converted into digital assets using tools for rich asset creation. These digital assets are stored on physical media carriers (transponders) and can be used in tools that support story telling. With these tools assets come alive on a big projection screen, sound system, paper cardboard, paper sticks etc. The system offers tools to capture the creative end-results and share it with others using the Internet in movies or in digital or paper based storyboards. This concept orients the design of the few projects at our laboratory and inspires the development of several prototypes. The SE concept as has been elaborated in our project has been slightly modified. The interaction modalities designed for the consumption and manipulation of the Internet contents are defined for specific devices as computer or mobile phone. Recent elaborations of these modalities go in the direction of adding physical control devices to extend the interaction to physical world. Though these modalities could enhance user experience, they always enable the manipulation and fruition of contents in a traditional ways: using computer or mobile device in defined place (or in movement) and time. In our perspective the way people access, consume and manage Internet contents has to be enriched by a given location and context. It means that being in a physical space should affect people interaction not just in terms of the device used but also concerning the way people interact with it. Thus, the system should be aware if the user is alone or with others, at home or in the office and provide more suitable interaction. The idea is not to jeopardize the actual way of interacting with media, but to enrich and complement these modalities. Our research activity goes in the direction of extending this research line, by offering new patterns and tools that enable an interaction contingent on the user location (here and now) and activities. Our aim is to reduce the estrangement that actually characterizes the interaction through traditional devices (e.g. 
desktop, laptop, mobile phone) by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place.
732
A. Rizzo, E. Rubegni, and M. Caporali
Following we provide some examples of artefacts that embody the concept of situated editing as a metaphor to mediate interaction between tools and human activity. Several projects are oriented to improve the role of here and now in the fruition of contents. These projects span many modes of interaction, from gestural (gesture as the main input based interaction), visual (manipulation of images) and aural (sound based interface).
3 Situated Interaction and Editing: Some Examples The most consolidate interaction modalities to access Internet contents are based on keyboard and mouse but recently Ubiquitous computing, Physical Computing, Everyware, Tangible User Interface and the Internet Of Things propose richer interaction patterns. Nowadays, one of the main problems related to the contents consumption concerns the time spent for accessing and managing information, and different modes of accessing it in different places. The Web 2.0 enables Internet users to access contents in real time (e.g. video streaming). This is based on the idea that the contents and the way information is consumed and edited may change according with the location, timing, channel and technological device. In our perspective, contents may and have to be affected by these dimensions. The same contents can be accessible by the users in a huge variety of situations (being in motion or being in a specific physical space) through a specific channel (e.g. audio or video streaming) and using the devices that are more suitable to the context of usage. For example, while the user is driving (s)he can listen to the radio information that are located on his/her computer at office, or once at home (s)he can read news selected beforehand at the office, on the screen of his TV. In this case the interaction modalities and the delivery of contents change according with location and the activities in which the user is involved. Channels that deliver contents can be determined by the user location and activity. For example, the user can bookmark content and decide how and when to access it according to specific location, channel or timing. In order to exemplify our vision on situated editing and interaction we use some examples of projects and concepts. There is a huge quantity of projects that address the issue of manipulating media, from which we selected for illustration just those that better represent our vision. Some of the projects mentioned below concern editing and management of contents, tagging, social bookmarking, mobile social networking. Recently, the trend of editing and manipulating digital contents moves interaction from the desktop to the physical world. Aurora1 is a new concept for a browser that integrates Web, desktop and physical location of the user. The browser is aware of the user physical context and proposes patterns of interaction suitable to it, merging data and user behaviour. Whenever possible, Aurora leverages natural interactions with objects in space. The interaction with physical objects for the manipulation of media is a key feature of the Bowl project. The Bowl2 [10] focuses on the use of tangible interfaces for 1 2
http://adaptivepath.com/aurora/ http://www.thisplacement.com/2007/11/12/bowl-tokene-based-media-for-children-at-dux2007
Why Here and Now
733
handling media in the home environment. The bowl, placed on the living room table, can be used for the manipulation of media moving from physical towards online, social and time shifted distribution with services like YouTube, Vimeo, TiVo and Apple TV. The project aims to find solutions for the physical form of the interface, the types of interaction and the kinds of suitable content. In the same direction, Siftables3 [11] enables people to interact with information and media approaching interactions with compact devices with sensing, graphical display, and wireless communication capabilities. These compact devices can be physically manipulated as a group to interact with digital information and media. Items can be sorted into piles or placed one close to the other.
Fig. 1. On the left side a picture of the Bowl project; at the right side Siftable items
Several projects regarding the issue of location affects contents are related to the domain of the mobile social networking [19]. Many projects aim to enhance the engagement of people with the local environment. The project Whrrl4 allows the real-time personalization of the physical environment based on local discovery and user-generated content through the integration of Web and mobile experience. Whrrl enables the sharing of local knowledge with other members of the community. All these examples share the same vision of adding value to interaction by enabling the manipulation of physical objects for editing media and affecting contents with the physical location. Though these projects address these issues, they provide a way of interacting with content that is traditionally connected to the use of computers or mobile devices. These projects show that there is still a lack in reducing the estrangement of people in interacting with contents in a given location. Our purpose is to address to this challenge by conceiving innovative interaction patterns that consider the given location and context. Following, some examples of projects from our research laboratory. 3.1 Wi-roni and Other Examples The Wi-roni project [18] goes in the direction of enhancing the role of the territory (in both social and physical aspects) in affecting the contents (and the way to interact with contents). Wi-roni addresses to improve Internet contents consumption and 3 4
http://web.media.mit.edu/~dmerrill/siftables.html Launch Pad, O'Reilly Where 2.0 Conference, Burlingame, CA, May 12, 2008.
734
A. Rizzo, E. Rubegni, and M. Caporali
manipulation considering the added-value of being in a given location when accessing a given content. Wi-roni is a Urban Architecture project located in the La Gora public park in Monteroni d'Arbia, a small village in the province of Siena (Italy). For this project, we developed two interconnected solutions: Wi-wave, a vertical pillar for accessing web audio content in public spaces and Wi-swing, a children’s swing that tells stories while swinging. Wi-wave facilitates the interaction of people with web content in public spaces. Wi-wave uses ultrasonic sensor technology to capture physical gestures as a navigation interface for three channels of audio playback/streaming. The content offered by Wi-wave is a collection of two audio types: streaming radio and synchronized podcast. Wi-wave allows everyone to listen to Podcasts in a public area, and, from a research perspective, allows us to explore issues regarding the design of interaction through patterns of behavior that may have aesthetic and imaginative value. Users can also upload through their mobile device their podcast files and listen to these in the Park sharing the contents with their friends.
Fig. 2. On the left side the Wi-roni playground system; on the right side Wi-wave final prototype
Wi-wave is a fully implemented prototype installed in the public park, while Wi-swing is a concept under development. Wi-swing is the tool that associated with Wi-wave allows the situated editing. Wiswing is a tool for listening to storytelling and in general for broadcasting the output of Wi-wave (the other concept developed in the project). Children can browse and control the speed of the narrative through the movements of the swing and can edit contents delivered by adding sound to the original story. These modalities are complementary and they can be activated according with the context of use: if the child is alone s/he can just listen to the story, but if more children are playing with Wi-swing the author modality is switched on. They can play with sound objects and create the audio for the story.
Why Here and Now
735
Another concept developed at our laboratory is the “Parco della Luna”(“Moon park”). The project aims to support people activities in a public park namely “Parco della Luna” devoted at delivering astronomical contents. “Parco della Luna” extends the exploration of the sky, in the physical place, sitting or lying down in the park, through the Internet. The idea is to use spatial references such as trees or buildings for navigating information on the Internet and have an overlapping layer between what you are seeing in the physical space and information from Internet. Currently, between the place where you are living the experience with the action and the place where you are interacting with the on-line world, there is a gap. The prototypes and concepts mentioned above (Wi-roni and “Parco della Luna”) developed at our laboratory try to reduce this gap.
4 Conclusion Our vision is based on reducing the estrangement of Internet content consumption by conceiving interaction modalities suitable for social activities occurring in the hereand-now, in real-time and real-place. Using the metaphor of water distribution as a stand-in for the Internet: water should be in any location where we want life, but it is a matter of human cultural design whether to have a simple tap or to build a fountain. Our perspective offers room for design in the direction of creating “fountains” with unique interaction modalities where the emerging behaviour of people becomes a value itself. This perspective opens a huge space for the design of user experiences that integrate existing modalities of interaction (as those enabled by desktop and mobile phones) with innovative ones (e.g Wi-wave) in order to extend the opportunities offered by a given location and context for contents consumption and manipulation. Another important aspect regards the trend of merging digital and physical worlds that allow the use of pre-existing objects for the editing and consumption of media. For example, the interaction conceived in Wi-swing for allowing children to listen to narratives and to edit sounds is based on the traditional modalities of playing with a swing. In these situations, the user experience is enriched, bringing together familiar modes of interacting with physical objects and the interactive possibilities offered by the use of media. The SE principle moves from the opportunities for actions (affordances), which are available to specific people in given context, towards elaborating interaction modalities that seamlessly integrate the physical and the digital world. This integration is not only a way to empower existing opportunities for action with new goal-oriented effects and transformation in the world but also a way to transform constraints in resources. This happens since the design of the interaction follows the action pattern that specific users tend to privilege, according to their abilities and competence, in a given context and set of activities associated. Furthermore, the interaction modalities can overlap with one another, that is, different users could reach the same or close goals through their own interaction pattern. This seems to us a promising way to approach the design considering all the issues we have mentioned [4].
736
A. Rizzo, E. Rubegni, and M. Caporali
References 1. Bannon, L.J., Bødker, S.: Beyond the interface. Encountering artifacts in use. In: Carroll, J.M. (ed.) Designing interaction: Psychology at the human-computer interface, pp. 227–253. Cambridge University Press, Cambridge (1991) 2. Bødker, S.: A Human Activity Approach to User Interfaces. Human Computer Interaction 4(3), 171–195 (1989) 3. Bødker, S.: Creating conditions for participation: Conflicts and resources in systems design. Human Computer Interaction 11(3), 215–236 (1996) 4. Emiliani, P.L., Stephanidis, C.: Universal access to ambient intelligence environments: opportunities and challenges for people with disabilities. IBM Systems Journal 44(3), 605–619 (2003) 5. Greenfield, A.: Everyware: The Dawning of Ubiquitous Computing. New Riders Publishing (2006) 6. Kakihara, M., Sørensen, C.: Mobility: An Extended Perspective. In: Sprague Jr., R. (ed.) Thirty-Fifth Hawaii International Conference on System Sciences (HICSS-35), Big Island Hawaii. IEEE, Los Alamitos (2002) 7. Kleinrock, L.: Nomadicity: Anytime, Anywhere in a Disconnected World. Mobile Networks and Applications 1, 351–357 (1996) 8. Ling, R.: The Mobile Connection: The Cell Phone’s Impact on Society. Morgan Kaufmann, Amsterdam (2004) 9. Mccullough, M.: Digital Ground: Architecture, Pervasive Computing, and Environmental Knowing. MIT Press, Cambridge (2004) 10. Martinussen, E.S., Knutsen, J., Arnall, T.: Bowl: token-based media for children. In: Proceedings of the 2007 Conference on Designing For User Experiences. DUX 2007, Chicago, Illinois, November 05 - 07, 2007, pp. 3–16. ACM, New York (2007) 11. Merrill, D., Raffle, H., Aimi, R.: The Sound of Touch: Physical Manipulation of Digital Sound. In: The Proceedings the SIGCHI conference on Human factors in computing systems (CHI 2008), Florence, Italy (2008) 12. Norman, D.: The Invisible Computer: Why Good Products Can Fail, the Personal Computer Is So Complex, and Information Appliances Are the Solution. MIT Press, Cambridge (1999) 13. Norman, D., Draper, S.W.: User-Centered System Design: New Perspectives on HumanComputer Interaction. Lawrence Erlbaum Associates, Hillsdale (1986) 14. O’Reilly, T.: Web 2.0 Compact Definition: Trying Again (2007) (retrieved on 2007-01-20) 15. O’Sullivan, D., Igoe, T.: Physical Computing. Thompson, Boston (2004) 16. Redström, J.: Designing Everyday Computational Things. PhD dissertation (Report 20), Göteborg University (2001) 17. Rizzo, A., Marti, P., Decortis, F., Moderini, C., Rutgers, J.: The POGO story world. In: Hollnagen, E. (ed.) Handbook of Cognitive Task Design. Laurence Erlbaum, London (2003) 18. Rizzo, A., Rubegni, E., Grönval, E., Caporali, M., Alessandrini, A.: The Net in the Park. In: Special issue Interaction Design in Italy: where we are, Knowkege, technology and Politics Journal. Springer, Heidelberg (in printing) 19. Strachan, S., Murray-Smith, R.: GeoPoke: rotational mechanical systems metaphor for embodied geosocial interaction. In: Proceedings of the 5th Nordic Conference on HumanComputer interaction: Building Bridges, NordiCHI 2008, Lund, Sweden, October 20 - 22, 2008, vol. 358, pp. 543–546. ACM, New York (2008)
20. Suchman, L.A.: Plans and situated actions. The problem of human-machine communication. Cambridge University Press, Cambridge (1987)
21. Susani, M.: Ambient intelligence: the next generation of user centeredness: Interaction contextualized in space. Interactions 12(4) (2005)
22. Ullmer, B., Ishii, H.: Emerging frameworks for tangible user interfaces. IBM Systems Journal 39, 915–931 (2000)
23. Vazquez, J.I., Lopez-de-Ipina, D.: Social devices: Autonomous artifacts that communicate on the internet. In: Floerkemeier, C., Langheinrich, M., Fleisch, E., Mattern, F., Sarma, S.E. (eds.) IOT 2008. LNCS, vol. 4952, pp. 308–324. Springer, Heidelberg (2008)
24. Weiser, M.: Ubiquitous computing. Computer 26, 71–72 (1993)
A Framework for Service Convergence via Device Cooperation

Seungchul Shin1, Do-Yoon Kim2, and Sung-young Yoon3

1 Samsung Electronics, 2 LG Electronics, 3 Yonsei Univ.
232, Sinchon-Dong, Seodaemun-Gu, Seoul, 120-749, Republic of Korea
[email protected], [email protected], [email protected]
Abstract. Device convergence is one of the most significant trends in current information technology (IT): various kinds of existing services are incorporated into one device so that it can provide users with converged services. However, device convergence is not a complete solution; in some fields, customers prefer divergent, application-specific devices. In addition, services are hardware dependent: if a new service appears, a converged device cannot offer it when its hardware does not support it. In that situation, using the new service requires purchasing an entirely new device. Therefore, we propose a framework for service convergence via device cooperation, supported by a wireless network, to overcome this constraint of device convergence. Our framework provides a guideline that enables a device to offer a service by cooperating with other devices, even though it lacks the hardware support or provides only one or two specialized services.

Keywords: service convergence, device cooperation, device convergence, mobile computing.
1 Introduction

As hardware resources and mobile computing technology have improved, various fixed and mobile services such as games, digital multimedia broadcasting (DMB), location-based services (LBS) and RFID are now supported on a single handset. This device convergence approach makes the mobile device a phone, an MP3 player, a digital camera, a camcorder, a game player, a PDA and so on, while the device divergence approach focuses on one or two specialized services. Although there is considerable debate about the future direction of mobile devices, device convergence is still cited as a key trend. However, there are some issues with device convergence. First, supporting multiple functions makes the interface generic, because the functions have to be combined into one design, which loses its peculiarity [1]. Second, although hardware resources have improved, cost, battery consumption and the required processing power increase. Third, the key factor of success is the consumers' demand, not the technology [2].
Fig. 1. The concept of the Service Convergence via Device Cooperation
Finally, the boundary between devices will become ambiguous, so customers will lose a large part of their freedom of choice. On the other hand, device divergence has some notable features [1]. First, the interface and the layout of the device are closely related to its services. Second, it has developed a new market of widgets related to the device that provide additional specialized functions. For example, the Apple iPod, iPod nano and iPod shuffle [3] offer limited services but are supplemented by a wide range of accessories such as the iPod Hi-Fi, Armband, Nike+iPod Sport Kit, radio remote and so on. Third, as a result, a high degree of differentiation can be accomplished. Both [4] and [5] conclude that divergent, application-specific devices are more probable than perfectly converged devices. However, arguing over whether device convergence or device divergence will be the future trend is of little use when a new service appears but the hardware lacks the ability to support it. This occurs because services are hardware dependent. If the device has an interface for attaching supplementary hardware for the new service, there is no problem. But when the device has no such interface, no attachable hardware exists, or the attachable hardware is too expensive or too cumbersome because of its size, weight or design, then we have to buy a whole new device supporting that new service. We propose a framework for service convergence via device cooperation to overcome the constraint of hardware support and to support multiple functions in a mobile environment without requiring device convergence. Service convergence means incorporating disparate services together into a combined service. By using the ability of the wireless network, the framework enables a user to use the services of surrounding devices on his handset as if he carried a service-convergent device.
The remainder of this paper is organized as follows. Section 2 describes the design of the framework, including the architectures of the service consumer and the service provider and the protocols used for communication between them. Section 3 presents an implementation of the framework on several PDAs and an ultra-mobile PC (UMPC). Section 4 evaluates its performance, and Section 5 concludes the paper and discusses future work.
2 Framework Design

In this section, we introduce the framework for service convergence via device cooperation. The proposed framework is divided into components for the operations and protocols for communication between the mobile devices. The components are classified into three hierarchical layers, and only the communication protocols between the layers and between the devices are defined. This makes the architecture flexible and extensible. The framework is conceptually divided into two parts: one for the service provider and the other for the service consumer. The service consumer is the user's device, and the service provider is the cooperating device that provides its service through the wireless network. The roles of service consumer and service provider are interchangeable. Both are organized into the three hierarchical layers mentioned above: the interface layer, the core layer and the communication layer [19]. The details are described below.

The interface layer connects the service application and the framework. In this layer, a device communicates with the service using the service API. A module for managing the service API and for translating the messages received from the service API is loaded here. The layer is located between the application and the core layer. The core layer components include the core functions of the device; they manage the entire framework and control its operation. Moreover, the system log manager records all operations in the framework. The communication layer provides the components for communication between devices and delivers transferred messages to the upper layer.

2.1 Service Consumer

The service consumer is the device which requests the services supported by other devices. It contains the component that discovers nearby service lists. Figure 2 shows the architecture of the service consumer. It is divided into the interface layer, the core layer and the communication layer. The interface layer consists of the service manager and the message parser. The service manager manages the service list so that the framework can support simultaneous services. In addition, it requests services through the service API and reports the results. The message parser parses the input API parameters and converts them into a format the framework can understand. The core layer is composed of the service consumer controller, the data manager, the service list manager, the task manager and the log manager.
Fig. 2. Service Consumer System Architectures
The service consumer controller supervises all the components in the framework. The data manager stores the discovered service list and the service descriptions, and retrieves them when requested through the service API. The stored service provider list is continuously supervised. The task manager manages the task queue for pending service requests. The log manager is a component for developers: it records all operations executed on the service consumer device. The communication layer is organized into the service discovery protocol component and the service transfer/receiver protocol component, both related to packet transfer. The service discovery protocol collects service lists over the network and sends the results to the service database manager. The service transfer/receiver component is responsible for sending and receiving the requested services.

2.2 Service Provider

The service provider is the device that provides its service to other devices. Figure 3 shows its architecture and components. The service manager and the message parser are the components of the interface layer in the service provider; they function in the same way as in the service consumer. The core layer of the service provider consists of the service provider controller, the scheduler, the service database manager and the log manager. The service provider controller controls the overall components and operations. The service database manager contains the list of services which the device supports. The scheduler schedules the sending of a service when an event occurs or in a periodic cycle. The log manager component in the service provider is the same as in the service consumer. The service discovery and the service transmitter components form the communication layer; they implement the protocol used to access a service. The service discovery component passes a request from the service consumer to the upper layer.
Fig. 3. Architecture of Service Provider
The service provider controller component then decides whether the service list can be sent. The service transmitter component receives the appropriate service from the service provider controller component; the transmission time is decided by the scheduler.
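The paper does not specify the wire format of the discovery exchange, and the actual implementation was built on the .NET Framework. As an illustration only, the following Java sketch shows the request/response pattern described above: the consumer broadcasts a discovery request and collects service-list replies from nearby providers. All names (the DISCOVER and SERVICES tokens, the port number, class names) are invented for this example and are not taken from the paper.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the discovery exchange between consumer and provider.
public class DiscoverySketch {

    static final int DISCOVERY_PORT = 50000;   // illustrative port, not from the paper

    // Provider side: answer every DISCOVER request with the local service list.
    static void runProvider(String services) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(DISCOVERY_PORT)) {
            byte[] buf = new byte[512];
            while (true) {
                DatagramPacket request = new DatagramPacket(buf, buf.length);
                socket.receive(request);
                String msg = new String(request.getData(), 0, request.getLength(), StandardCharsets.UTF_8);
                if (msg.startsWith("DISCOVER")) {
                    byte[] reply = ("SERVICES " + services).getBytes(StandardCharsets.UTF_8);
                    socket.send(new DatagramPacket(reply, reply.length,
                            request.getAddress(), request.getPort()));
                }
            }
        }
    }

    // Consumer side: broadcast one request and gather replies until a timeout.
    static void runConsumer() throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.setSoTimeout(2000);                      // collect replies for 2 s
            byte[] req = "DISCOVER".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(req, req.length,
                    InetAddress.getByName("255.255.255.255"), DISCOVERY_PORT));
            byte[] buf = new byte[512];
            try {
                while (true) {
                    DatagramPacket reply = new DatagramPacket(buf, buf.length);
                    socket.receive(reply);
                    System.out.println(reply.getAddress() + " -> "
                            + new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8));
                }
            } catch (java.net.SocketTimeoutException done) {
                // discovery window closed; the collected list would be handed to the service database manager
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("provider")) {
            runProvider(args.length > 1 ? args[1] : "GPS");
        } else {
            runConsumer();
        }
    }
}

In the framework itself, the reply would be stored by the data manager and the corresponding service buttons would appear in the consumer's menu, as shown later in the implementation section.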
3 Implementation of Device Share

We have implemented the proposed framework for service convergence via device cooperation. The framework is based on a wireless network; however, not all mobile devices support wireless networking. We therefore chose PDAs, a smart phone and an ultra-mobile PC (UMPC) as the mobile devices, and limited the services that each chosen device provides for the testing. The framework was implemented on the Microsoft .NET Framework 2.0. The detailed specifications are shown in Table 1. Fig. 4 shows the mobile devices used for the implementation. Each device provides its own unique function, such as RFID, camera, DMB or GPS. All of them support wireless LAN and share their services through it. For the HP RX5965 PDA, we used an external GPS receiver for testing because the built-in GPS had poor receive sensitivity.

Table 1. The detailed specification of devices used for the implementation

Device Name (Type)        CPU      O/S      Wireless  Bluetooth  Services
Acer n50 (PDA)            520 MHz  WM 2003  O         O          RFID
HP RW6100 (Smart Phone)   520 MHz  WM 2003  O         X          Camera
Samsung Q1 (UMPC)         900 MHz  Win XP   O         O          DMB
HP RX5965 (PDA)           400 MHz  WM 5.0   O         O          GPS
Fig. 4. The snapshot of devices used for the implementation
For the UMPC, we intended to share the DMB service; however, we could not acquire the development kit for DMB control. Therefore, we used the UMPC as a service consumer that uses the services of the other devices. All of the mobile devices support Bluetooth and 802.11b, except the RW6100, which does not have Bluetooth. Only 802.11b was used for this evaluation.
Fig. 5. The screenshots of the implementation. (a)–(e), (g) and (h) show the services running on the PDA; (f) shows the application on the PC; (i) shows images taken using the implementation.
Fig. 5 shows the implementation of the proposed framework. Fig. 5(a) shows the initial main menu on a PDA; since the framework has not yet been executed, no service appears in the display. Fig. 5(b) shows the service list received from peripheral devices after execution of service discovery. As can be seen in the figure, each discovered service is shown with its service name together with an activated button; pressing this button lets the user use the service through the hardware of the peripheral service provider. Fig. 5(c) shows an RFID tag reader being used through the peripheral RFID service provider, even though this device is not itself equipped with an RFID reader. Fig. 5(d) shows location information being received from the GPS service provider; as can be seen, the location of the service provider is indicated on the screen. Fig. 5(e) shows a picture being taken via the camera service. Fig. 5(f) shows that the location information obtained in Fig. 5(d) is saved and displayed on Google Maps. Fig. 5(f) and 5(g) show the implementation using RFID tags in u-TourGuide, one of the results of our previous studies. The remaining figures show the services in actual use.
4 Experimental Results

In this section, we measure the performance of the framework implemented in Section 3. We measure the service discovery time as a function of the number of services in the discovery protocol, and we investigate the difference in service time between services provided through the framework and services provided locally. We also report the time as a function of the number of clients per service provider, and the battery consumption when a service is processed locally versus over the network. The developed system can only measure time in 1 ms units, so we measured time using the number of system ticks provided by the .NET Framework. In our testing environment, a 100 ms interval corresponded to about 102 tick counts, so one tick is approximately 1 ms. All values used in this test were recorded through the file I/O of the log manager; therefore, when the I/O load is heavy, there may be a small additional latency compared to the actual elapsed time. The test results are as follows.

4.1 Service Discovery Time

Fig. 6 shows the service discovery time as a function of the number of devices. For this test, in a small network set up in a testing room, we added one more PC to the devices described above and measured the time consumed by the discovery protocol. The PC was connected over wired LAN, and the remaining devices were connected wirelessly through an access point. In Fig. 6, the X axis shows the devices used in the test and the Y axis shows the tick count taken by the service discovery response; 'Send' denotes the time until the discovery request packet is transmitted. The results were measured on the Acer n50 PDA, which hosts the RFID service and allows a loopback response to the discovery request transmitted from the device itself. In the test, the PC, GPS and RFID devices showed constant response times. On the other hand, the RW6100 and the UMPC took relatively long times for discovery. The RW6100 is a smart phone running many more processes than the other devices, so it takes more time to respond to a discovery request.
Fig. 6. Service discovery protocol time depending on the number of service devices
Fig. 7. Data transmission time for each service
In the case of the UMPC, Windows XP is heavy for this device, so some delay in responding to requests is to be expected. Fig. 7 summarizes the difference in service time between providing each service locally and providing it from a remote device through the proposed framework. In this test we report results for three services only: camera, RFID and GPS. For the camera, we used the specific interface of the RW6100, which transmits roughly 80,000–110,000 bytes per image. Both GPS and RFID use a serial interface and transmit approximately 26 bytes at a time. The interface type and data amount for each service are summarized in Table 2. The camera also transmits a preview screen in real time, but the SDK used in this test cannot control the preview screen; therefore, we did not transmit the preview video and only measured the time taken to capture an image.
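As a rough illustration of the tick-based timing described above (the authors used the .NET tick counter; Java's millisecond clock plays the same role here), a measurement could be taken as in the following sketch. The requestService() call is a placeholder for whichever framework operation is being timed; it is an assumption, not part of the paper's API.

// Illustrative sketch of tick-based timing: read a millisecond-resolution
// counter before a request and after its response, and report the difference.
public class ServiceTimer {

    interface ServiceCall {
        void requestService() throws Exception;   // hypothetical framework call being measured
    }

    static long measureMillis(ServiceCall call) throws Exception {
        long start = System.currentTimeMillis();   // "tick" before sending the request
        call.requestService();
        long end = System.currentTimeMillis();     // "tick" after the response arrives
        return end - start;
    }

    public static void main(String[] args) throws Exception {
        long elapsed = measureMillis(() -> Thread.sleep(100));  // stand-in for a remote RFID read
        System.out.println("Service time: " + elapsed + " ms");
    }
}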
4.2 Service Transmission Time

We excluded the time for opening or initializing the camera. For RFID, it is necessary to execute both a trigger operation for selecting a tag and a read operation for reading it, but in this test we only measured the service time of the read operation. For GPS, the service provider was set to transmit coordinates only when location information was requested, with the GPS switched on the whole time, so no extra initialization time was measured. The test showed that the camera takes quite a long time to transmit its data through the proposed framework. As with discovery, the RW6100 showed more latency because of the CDMA functions it runs as a smart phone; the large amount of data to transmit and receive also contributes considerably.

Table 2. Service type, amount of data to transmit and interface

Device Name (Type)        Service Type  Interface  Min Data Size  Max Data Size
Acer n50 (PDA)            RFID          Serial     4 Bytes        64 Bytes
HP RW6100 (Smart Phone)   Camera        Unknown    85,700 Bytes   105,124 Bytes
HP RX5965 (PDA)           GPS           Serial     8 Bytes        8 Bytes
5 Conclusions and Discussion

In this paper, we proposed a framework for service convergence via device cooperation in a mobile environment. The framework is designed to cope with the limitations of hardware-dependent services. By letting devices cooperate, a user can use the combined set of services (service convergence) even though the user's own hardware cannot support all of them. This means that a device does not have to converge all kinds of services; it can support only one or two specialized, high-quality services. Moreover, the architecture of the framework is organized into three hierarchical layers, which makes it easily extensible. As a result, it can readily cope with any new service in the future. Through an actual implementation and test results, we showed that the proposed framework achieves reasonable performance in a mobile computing environment.
References

1. Norager, R.: Complex Devices, Widgets and Gadgets: Product Convergence. Copyright by Rune Norager 2005 (2005)
2. Yoffie, D.: Competing in the Age of Digital Convergence. Harvard Business School Press, Cambridge (1997)
3. iPod. Apple Inc., http://www.apple.com/ipod/ipod.html
4. Kim, Y.-B., Lee, J.-D., Koh, D.-Y.: Effects of Consumer Preference on the Convergence of Mobile Telecommunications Device. Applied Economics 37, 817–826 (2005)
5. Mueller, M.: Digital convergence and its consequences. Working Paper, Syracuse University (1999)
6. Bubley, D.: Device divergence always beats convergence, http://disruptivewireless.blogspot.com/2006/05/device-divergence-always-beats.html
7. Yasuda, K., Hagino, Y.: Ad-hoc Filesystem: A Novel Network Filesystem for Ad-hoc Wireless Networks. In: Lorenz, P. (ed.) ICN 2001. LNCS, vol. 2094, pp. 177–185. Springer, Heidelberg (2001)
8. Kravets, R., Carter, C., Magalhaes, L.: A Cooperative Approach to User Mobility. ACM Computer Communications Review 31 (2001)
9. Shin, S.-C., Kim, D.-Y., Cheong, C.-H., Han, T.-D.: Mango: The File Sharing System for Device Cooperation in Mobile Computing Environment. In: Proceedings of the International Workshop on Context-Awareness for Self-Managing Systems (CASEMANS 2007), Toronto, Canada (May 2007)
10. Sun Microsystems: Jini architecture specification, http://www.javasoft.com/products/jini/specs/jini-spec.pdf
11. UPnP Forum: Understanding Universal Plug and Play White Paper (2000), http://www.upnp.org
12. International Organization for Standardization: Information technology – Automatic identification and data capture techniques – Bar code symbology – PDF-417. ISO/IEC 15438 (2001)
13. International Organization for Standardization: Information technology – Automatic identification and data capture techniques – Bar code symbology – QR Code. ISO/IEC 18004 (2000)
14. ColorCode. ColorZip Media Inc. (2006), http://www.colorzip.com
15. Kim, D., Seo, J., Chung, C., Han, T.: Tag Interface for Pervasive Computing. In: Proceedings of the International Conference on Signal Processing and Multimedia, pp. 356–359 (2006)
16. Azfar, A.: Fixed mobile convergence - some considerations
17. Chung, Y.-W., et al.: A Strategy on the Fixed Mobile Convergence
18. D2. Cowon Corp., http://product.cowon.com/product/product_D2_dic.php
19. Han, T.-D., Yoon, H.-M., Jeong, S.-H., Kang, B.-S.: Implementation of personalized situation-aware service. In: Proc. of Personalized Context Modeling and Management for UbiComp Applications (ubiPCMM 2005), pp. 101–106 (2005)
Enhancements to Online Help: Adaptivity and Embodied Conversational Agents Jérôme Simonin and Noëlle Carbonell INRIA (Nancy Grand-Est Research Center), Henri Poincaré University LORIA, Campus Scientifique BP 70239, Vandoeuvre-lès-Nancy Cedex, France {Jerome.Simonin,Noelle.Carbonell}@loria.fr
Abstract. We present and discuss the results of two empirical studies that aim at assessing the contributions of adaptive-proactive user support (APH), multimodal (speech and graphics) messages (MH), and embodied conversational agents (ECAs) to the effectiveness and efficiency of online help. These three enhancements to online help were implemented using the Wizard of Oz technique. The first study (E1) compares MH with APH, while the second study (E2) compares MH with embodied help (EH). Half of the participants in E1 (8) used MH, and the other half used APH. Most participants who used MH, resp. APH, preferred MH, resp. APH, to standard help systems which implement text and graphics messages (like APH). In particular, proactive assistance was much appreciated. However, higher performances were achieved with MH. A majority of the 22 participants in E2 preferred EH to MH, and were of the opinion that the presence of an ECA, a talking head in this particular case, has the potential to improve help effectiveness and efficiency by increasing novice users' self-confidence. However, performances with the two systems were similar, save for the help consultation rate, which was higher with EH. Longitudinal (usage) studies are needed to confirm the effects of these three enhancements on novice users' judgments and performances.

Keywords: Adaptive user interfaces, Embodied conversational agents, Talking heads, Online help, Speech and graphics, Multimodal interaction, Eye tracking.
1 Introduction

The effectiveness of online help for the general public is still unsatisfactory despite continuous efforts from researchers and designers over the last twenty years. Help facilities are still ignored by most "lay users" who prefer consulting experienced users to browsing online manuals. This behavior is best accounted for by the "motivational paradox" [4], that is: users in the general public are reluctant to explore the functionalities of unfamiliar software and learn how to use them efficiently, as their main objective is to carry out the tasks they have in mind. The two studies reported here address the crucial issue of how to design online help that will actually be used by the general public, an essential condition for ensuring its effectiveness.
To achieve this goal, help systems should be capable of providing users with appropriate information right when they need it, in line with the "minimal manual" [4]. They should be aware of the user's current goal, knowledge and skills to meet both requirements; that is, they should have the capability to create and update an adaptive model of the current user's profile from the analysis of interaction logs. Such a model is needed for (i) tailoring help information to the user's current knowledge, and (ii) anticipating their information needs accurately so as to satisfy them through timely initiatives. This help strategy, which is both adaptive and proactive, is likely to improve help effectiveness and alleviate users' cognitive workload. Using speech for conveying user support information may also contribute to increasing online help usage, hence its effectiveness, by reducing the interference of help consultation with the user's main activity: users have to stop interacting with software applications to read textual help information, which is usually superimposed on the application window. In contrast, oral messages do not use screen space, and users can easily go on interacting with the application while listening to them; in particular, they can carry out a sequence of oral instructions while it is being delivered. Finally, novice users' motivation and emotional state strongly influence the effectiveness of knowledge or skill acquisition. Embodied online help has the potential to encourage requests for assistance; it may also increase novice users' self-confidence and reduce their stress. We present two empirical studies, E1 and E2, which attempt to assess the actual contributions of three possible enhancements to online help effectiveness and user-friendliness: adaptivity and proactivity, speech- and graphics-based messages, and embodied conversational agents (ECAs). The aim is to determine whether these enhancements have the potential to win user acceptance and improve novice users' performances, especially memorization of the procedural and semantic knowledge needed to operate standard applications effectively. The method is described in the next section. Results are presented and discussed in the third section.
2 Related Work

Adaptive online help has motivated large-scale research efforts, such as the Lumière project [8] or the Berkeley Unix Consultant [23]. According to [5] and [9], the bulk of research on adaptive human-computer interaction has been focused on designing efficient user models. Contrastingly, assessment of these models and evaluation of the effectiveness and usability of adaptive user interfaces are research areas which need to be further developed. Evaluation of the ergonomic quality of speech as an output modality has motivated few studies compared to speech input. Recent research has been centered on issues pertaining to the use of speech synthesis in contexts where displays are difficult to use; see, for instance, interaction with in-vehicle systems [24] and mobile devices [19], or interfaces for sight-impaired users [18]. Speech synthesis intelligibility [1] and expressiveness (especially the use of prosody for conveying emotions [22]) have also motivated a number of studies. In contrast, to the best of our knowledge, the use of speech for expressing help information has been investigated by only one research group.
Authors of [10] propose guidelines for the design and test of help information presented via voice synthesis to users of current commercial computer applications. ECAs with humanlike appearance and advanced verbal and nonverbal communication capabilities have been designed and implemented in many research laboratories. Current research efforts focus on creating ECAs which emulate, as best as possible, human appearance, facial expressions, gestures and movements, emotions and intelligent behaviors. Modeling expression of emotions [17], human conversational skills [16] and gaze activity during face-to-face exchanges [7] are very active research areas. Most evaluation studies of ECAs' contributions to human-computer interaction focus on computer-aided learning situations; for a survey, see [14]. For instance, [12] investigates students' motivations for acceptance or rejection of ECAs. In other application domains, the utility and usability of humanlike ECAs have motivated only a few ergonomic studies (e.g., only one session out of eight was devoted to evaluation at IVA 2007). Save for the pioneer work reported in [21], most evaluation studies address only a few issues in the design space for humanlike ECAs, such as the influence of the ECA's voice quality (extrovert vs. introvert) on children's performances [6], or the affective dimension of interaction with a humanlike ECA (7 out of the 9 studies mentioned in [20]).
3 Method

3.1 Overview

We used the same methodology and setup for E1 and E2, in order to compare the effects of the three proposed enhancements to online help on learning how to operate an unfamiliar software application. We chose Flash, a software tool for creating graphical animations, because computer-aided design of animations involves concepts which differ from those implemented in standard interactive software for the general public. Thus, participants in E1 and E2, who were unfamiliar with animation creation tools, had to acquire both semantic knowledge and procedural know-how in order to carry out the proposed animation creation tasks using Flash. Undergraduate students (16 for E1, 22 for E2) who had never used Flash or any other animation tool had to create two animations. E1 participants were divided into two gender-balanced groups of 8 participants each; one group could consult an adaptive-proactive help system (APH), and the other group a multimodal system (MH). Two online help systems, the MH system and an embodied help system (EH), were put successively at the disposal of E2 participants, one per animation; the presentation order of MH and EH was counterbalanced among participants. A within-participant design, which reduces the effects of inter-individual differences, was not adopted for E1 to ensure that participants would notice message content evolutions. The MH system (used in E1 and E2) delivered oral help information illustrated with copies of Flash displays. The same database of multimodal help messages (over 300 messages) was used for MH and EH. EH was embodied by a female talking head which "spoke" the speech components of messages. Speech was transcribed into printed text for APH so as to avoid implementing two enhancements in the same system.
Once participants had filled out a background information questionnaire (10 min.), they got acquainted with Flash basic concepts (e.g., 'scenario', 'interpolation', etc.) using a short multimedia tutorial they could browse through as long as they wanted to (15-20 min.). Then, they tried to reproduce two predefined animations (1 hour 15 min. or so). During the second study only, their gaze movements were recorded throughout their interactions with Flash and the help systems, so as to obtain objective data on participants' subjective reactions to the ECA's presence. Afterwards, they filled out two questionnaires, a verbal and a nonverbal one [2]; both questionnaires were meant to elicit their subjective judgments on the help system(s) they had experimented with; Likert scales and Osgood's semantic differentiators were preferably used for collecting verbal judgments. Next, their understanding and memorization of Flash basic concepts and procedures were assessed using a written post-test. Finally, they participated in a debriefing interview. In all, individual sessions lasted about two and a half hours.

3.2 Implementation of the Three Help Systems

To reduce interference between help consulting and animation design, the display included two permanent windows (see figure 1), a sizeable Flash window and a small help window (on the right of the screen). Based on earlier empirical work [3], participants could request four different types of help information using four dedicated buttons: procedural know-how (How?), semantic knowledge (What?), explanations of the application's current state (Why?) and confirmation or invalidation of the most recent Flash commands (Confirm?). Oral help messages were activated using colored buttons placed above Flash display copies illustrating their content. Speech acts were indicated by colors: warning (e.g., precondition), concept definition or procedure description, and explanations or recommendations. Colors were also used to denote speech acts in APH textual messages. The talking head was placed at the top of the EH window. To achieve realistic adaptive-proactive assistance, we resorted to the Wizard of Oz paradigm as a rapid prototyping technique. We also used it for EH and MH so as to ensure identical reaction times to participants' requests for the three simulated systems. The Wizard was given software facilities to adapt the information content of help messages to the current participant's actual knowledge and skills as s/he perceived them through observing their interactions with Flash and the help system. Three different versions of each message (M) were available to the Wizard:
− An initial version (V1) including all the semantic and procedural information needed by users unfamiliar with the information in M; the Wizard used it to answer participants' first request for M.
− To answer further requests for M, s/he had to choose between two other versions: a short reminder of the information in V1, with or without additional technical details, activated for participants who had shown a good understanding of the information in V1 during their interactions with Flash subsequent to their first request for M; or a detailed presentation of the information in V1, including explanations, examples and/or illustrations, intended for participants who had experienced difficulties in understanding and putting to use the information in V1.
Fig. 1. The three help systems, APH in the center, MH/EH on the right, and Flash window on the left
Practically, when a participant sent a request to the APH system, a software assistance platform displayed the name of the appropriate message on the Wizard’s screen. Thus, the Wizard had just to select the version of the message that best matched this participant’s current knowledge and skills, based on the observation of her/his interactions with Flash; the message was automatically displayed on the participant’s screen. As participants’ interactions with Flash and the APH system lasted one hour at the most, three versions of the same message were sufficient to achieve realistic simulations of dynamic adaptation to the evolution of participants’ familiarity with Flash through the session. When simulating MH or EH, the Wizard had just to activate the message selected by the assistance platform. To implement proactive user support, the Wizard was instructed to observe participants’ interactions with Flash, and assist them in creating the two predefined animations by anticipating their information needs and satisfying them on her/his own initiative using appropriate versions of messages in the database.
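The version-selection rule that the Wizard applied by hand can be summarized in code form. The following Java sketch is purely illustrative: the version names, the "understanding" flag and the class itself are inventions of this example, since in the study the choice was made by a human observer rather than by software.

import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the selection rule simulated by the Wizard.
public class HelpVersionSelector {

    enum Version { INITIAL, REMINDER, DETAILED }

    private final Set<String> alreadySeen = new HashSet<>();

    Version select(String messageId, boolean understoodPreviousVersion) {
        if (!alreadySeen.contains(messageId)) {      // first request for this message
            alreadySeen.add(messageId);
            return Version.INITIAL;                  // full semantic + procedural content (V1)
        }
        // later requests: short reminder if V1 was understood, detailed version otherwise
        return understoodPreviousVersion ? Version.REMINDER : Version.DETAILED;
    }

    public static void main(String[] args) {
        HelpVersionSelector selector = new HelpVersionSelector();
        System.out.println(selector.select("create-keyframe", false)); // INITIAL
        System.out.println(selector.select("create-keyframe", true));  // REMINDER
        System.out.println(selector.select("create-keyframe", false)); // DETAILED
    }
}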
3.3 Measures

Post-session questionnaires and debriefings were analyzed in order to gain an insight into participants' subjective judgments on the help system(s) they had experimented with. To assess the influence of each enhancement to online help on participants' performances and behaviors, we used post-test marks and analyses of manually annotated interaction logs. These data provided us with information on participants' assimilation of Flash concepts and operation, help usage and task achievement. Some analyses of annotated logs were restricted to the first animation creation task, which lasted about 40 minutes on average, as participants seldom requested help or needed proactive assistance while creating the second animation, most of the necessary knowledge to carry it out having been acquired during execution of the first task. To assess participants' affective responses to the presence of the ECA, behavior-based measures were collected in addition to verbal and nonverbal judgments, as recommended in [15]. E2 participants' gaze movements were recorded (at 60 Hz) throughout the session, using a head-mounted eye tracker (ASL-501) which allows free head and chest movements without loss of precision. As voluntary eye movements express shifts of visual attention, they are valuable indicators of users' engagement and involvement in interaction with an ECA [11]. Physiological measures, such as heart rate or galvanic skin response [13], were ignored, as they are more intrusive than eye tracking in standard HCI environments.

3.4 Software Developments

We developed a client-server platform to assist the Wizard in his/her task, and tools for facilitating annotation and analysis of interaction logs. These software developments in Java are briefly described in this section.

Software Assistance to the Wizard's Activity. The client-server platform can: display copies of the participant's screen on the Wizard's screen; display help messages activated by the Wizard on the participant's screen; assist the Wizard in the simulation of the three help systems by selecting the message, M, matching the participant's current request; and, for the APH system, provide her/him with a history of all versions of M sent to the participant previously. Messages are stored in a hierarchical database as multimedia Web pages. The platform also records logs of participants' interactions with both Flash and the current help system. Time-stamped logs comprise user and system events, mouse positions and clicks, screen copies, and eye tracking samples; they may also include recordings of the user's speech and gestures.

Software Assistance to Annotation and Analysis of Interaction Logs. Interaction logs saved by the platform can be "replayed" with gaze points or fixations superimposed on displays. Main annotation facilities include interactive segmentation and labeling of logs and, for eye movement analysis, automatic or manual definition of 'areas of interest' over display sequences. Graphical facilities are also provided for visualizing the results of eye tracking data analyses (e.g., heat maps).
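Since the platform is described as Java software but its log format is not given, the following sketch only illustrates one possible shape for such a time-stamped interaction log; the event categories, field names and output format are assumptions made for the example.

import java.util.ArrayList;
import java.util.List;

// Illustrative record structure for the time-stamped logs described above.
public class InteractionLog {

    enum Source { USER_EVENT, SYSTEM_EVENT, MOUSE, GAZE }

    record Entry(long timestampMillis, Source source, String detail) { }

    private final List<Entry> entries = new ArrayList<>();

    void log(Source source, String detail) {
        entries.add(new Entry(System.currentTimeMillis(), source, detail));
    }

    void dump() {
        for (Entry e : entries) {
            System.out.println(e.timestampMillis() + ";" + e.source() + ";" + e.detail());
        }
    }

    public static void main(String[] args) {
        InteractionLog log = new InteractionLog();
        log.log(Source.USER_EVENT, "help request: How?");
        log.log(Source.MOUSE, "click at 412,388");
        log.log(Source.GAZE, "fixation on ECA, 240 ms");
        log.dump();
    }
}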
4 Main Results

4.1 Participants' Subjective Judgments

First Study. 6 participants out of 8 preferred the APH system to standard online help. The two remaining participants preferred standard help, due to "the force of habit" according to them. Proactive user support raised enthusiastic comments, while message content evolution (adaptivity) went almost unnoticed. 7 participants rated the support provided by APH as very useful, and its initiatives as most effective. Similarly, 7 participants rated the MH system higher than standard online help. However, only 5 of them preferred audio to visual presentations of help information, based on the observation that one could carry out oral instructions while listening to them. The 3 other participants, who preferred visual presentations, explained that taking in spoken information is a more demanding cognitive task than assimilating the content of a text: one can read a displayed textual message at one's own pace, and freely select or ignore parts of it, which is impossible with spoken information.

Second Study. According to the verbal questionnaires, 16 participants (out of 22) preferred the EH system to the MH system. 5 participants rated MH higher than EH, whereas only one participant gave preference to standard help facilities. Most participants were of the opinion that the presence of an ECA had not disrupted their animation creation activity (17 participants), and that it could greatly increase the effectiveness (19) and appeal (21) of online help. Nonverbal judgments on the first scale of the SAM [2] are also very positive: 19 participants enjoyed the ECA's presence, the feelings of the 3 remaining participants being neutral. In addition, 14 participants had the impression that the ECA's presence increased their self-confidence and reduced their stress (third SAM scale). Analyses of annotated eye tracking data during the first task (40 min. or so) indicate that, from the beginning to the end of task execution, all participants (11) glanced at the ECA whether it was talking or silent. Each participant looked at it 75 times on average. Fixation duration was longer while the ECA was talking than when it was silent. These objective measures indicate that the ECA actually succeeded in arousing participants' interest and maintaining it throughout the first task execution, although all of them were primarily intent on achieving the first animation creation task. These results confirm the judgments expressed in the questionnaires.

4.2 Participants' Performances and Behaviors

First Study. The duration of the first task execution varied greatly from one participant to another. The inaction rate, that is, the percentage of time during which the mouse remained still, is noticeably higher for the APH group than for the MH group (62% versus 53%). This difference illustrates the efficiency of spoken compared to textual help messages: as mentioned by some participants, one has to stop interacting with the application while reading a textual message. APH participants consulted help as often as MH participants (58 requests vs. 60 on the whole), although the APH system displayed 183 messages on its own initiative. Pushing help information to novice users does not seem to reduce the number of help requests. This suggests that proactive assistance is an efficient strategy for increasing help effectiveness, as APH participants read most of the help messages pushed to them, according to the debriefings.
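The inaction rate defined above can be derived mechanically from the time-stamped mouse samples in the interaction logs. The sketch below is only one plausible way to compute it; the sampling format and the criterion for "still" (identical consecutive positions) are assumptions, not details given in the paper.

import java.util.List;

// Illustrative computation of the inaction rate from time-stamped mouse samples.
public class InactionRate {

    record MouseSample(long timeMillis, int x, int y) { }

    static double inactionRate(List<MouseSample> samples) {
        if (samples.size() < 2) return 0.0;
        long still = 0, total = 0;
        for (int i = 1; i < samples.size(); i++) {
            MouseSample prev = samples.get(i - 1), cur = samples.get(i);
            long dt = cur.timeMillis() - prev.timeMillis();
            total += dt;
            if (cur.x() == prev.x() && cur.y() == prev.y()) {
                still += dt;                      // mouse did not move during this interval
            }
        }
        return total == 0 ? 0.0 : 100.0 * still / total;
    }

    public static void main(String[] args) {
        List<MouseSample> samples = List.of(
                new MouseSample(0, 10, 10),
                new MouseSample(1000, 10, 10),   // 1 s still
                new MouseSample(2000, 50, 80));  // 1 s moving
        System.out.printf("Inaction rate: %.1f%%%n", inactionRate(samples));  // 50.0%
    }
}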
Analysis of post-test marks suggests that MH participants gained a better understanding of Flash concepts than APH participants and better recollected the procedures to activate its functions (average marks: 17.6/31 vs. 15.6/31). As for task achievement, the difference between the two groups (MH: 12.6/20 vs. APH: 11.1/20 for the first task) may be due to the necessity, for APH participants, to stop interacting with Flash to read help messages. These interruptions may have interfered with task achievement.

Second Study. We divided participants into two groups (of 11 participants each), according to the help system they had used first, EH or MH. For most measures, average values computed over each group are not noticeably different. This is the case for post-test marks and, concerning the first task, for task execution duration (40 min.), the number of interactions with Flash and the help system, and task achievement. The only noticeable difference between the two groups is the average number of help message activations per participant during the first task: 22 for EH vs. 16 for MH. This result suggests that the presence of an ECA may encourage help consultation.

4.3 Discussion

According to the E1 results, proactive online assistance is likely to arouse higher subjective satisfaction among novice users than adaptive help or multimodal (speech and graphics) messages. Evolutions of message content may have gone unnoticed because adaptivity is a basic feature of human communication, especially in the context of tutor-novice dialogues. Participants' rather balanced judgments on the usability of speech compared to text should not deter designers from considering speech as a promising alternative modality to text for conveying help information. Speech usability can easily be improved by implementing advanced audio browsing facilities. Taking up this research direction might prove to be more rewarding in the short term than implementing effective proactive help, which still raises unsolved scientific issues. Firstly, guessing novice users' intents accurately from their interactions with an unfamiliar application is a difficult challenge, as these users tend to perform actions unrelated to the achievement of their goals. Secondly, the MH group achieved better performances than the APH group. Efficient proactive help may have induced APH participants to rely on help information to achieve the two animation creation tasks and, hence, to put little effort into learning Flash. Additional empirical evidence is needed to validate this interpretation, as the number of participants in E1 was limited. The presence of an ECA was well accepted by E2 participants. The EH system was preferred to the MH system by most participants, who stated that the ECA's presence had not interfered with their animation creation activities. The vast majority of them considered that the presence of an ECA had the potential to improve help effectiveness through increasing users' motivation and self-confidence. However, these perceptions are at variance with their actual performances, which were similar for EH and MH. The ECA's presence had no noticeable effect on Flash semantic and procedural knowledge acquisition, task execution time or task achievement; it only encouraged help consultation. Nevertheless, observation of novice users' behaviors and activities over longer time spans may be necessary in order to perceive its possible influence on learning new concepts and skills. Longitudinal studies are
essential to obtain conclusive evidence on the effects of the presence of an ECA on novice users' performances and on its contribution to online help effectiveness and efficiency. Such studies are also needed to determine whether users will get bored with the presence of an ECA in the long run and come to prefer MH to EH, contrary to the E2 results. A few participants commented that the usefulness of the ECA would decrease as the user's knowledge and practice grew. Performing such studies is a long-term objective, as their design and implementation raise many difficult research issues.
5 General Conclusion

We have presented and discussed the results of two empirical studies that aim at assessing the contributions of adaptive-proactive user support (APH), multimodal (speech and graphics) messages (MH), and embodied conversational agents (ECAs) to the effectiveness and efficiency of online help. These three enhancements to online help were implemented using the Wizard of Oz technique. The first study (E1) compares MH with APH, while the second study (E2) compares MH with embodied help (EH). Half of the participants in E1 (8) used MH, and the other half used APH. Most participants who used MH, resp. APH, preferred MH, resp. APH, to standard help systems which implement text and graphics messages. In particular, proactive assistance was much appreciated. However, higher performances were achieved with MH. A majority of the 22 participants in E2 preferred EH to MH, and were of the opinion that the presence of an ECA, a talking head in this particular case, has the potential to improve help effectiveness and efficiency by increasing novice users' self-confidence. However, performances with the two systems were similar, save for the help consultation rate, which was higher with EH. These results open up several research directions. Improving browsing through oral messages is a promising short-term direction. Longitudinal studies are needed to assess whether APH and EH have the potential to improve online help effectiveness and efficiency, especially semantic and procedural knowledge learning. They are also needed to determine whether users will get bored or irritated in the long run by these two enhancements. The design and development of an adaptive-proactive help system is a long-term research direction, due to the numerous issues that still need to be solved.
References

1. Axmear, E., Reichle, J., Alamsaputra, M., Kohnert, K., Drager, K., Sellnow, K.: Synthesized Speech Intelligibility in Sentences: a Comparison of Monolingual English-Speaking and Bilingual Children. Language, Speech, and Hearing Services in Schools 36, 244–250 (2005)
2. Bradley, M., Lang, P.: Measuring emotion: the self assessment manikin and the semantic differential. Journal of Behavioral Therapy and Experimental Psychiatry 25, 49–59 (1994)
3. Capobianco, A., Carbonell, N.: Contextual online help: elicitation of human experts' strategies. In: Proc. HCI International 2001, LEA, vol. 2, pp. 824–828 (2001)
4. Carroll, J.M., Smith-Kerber, P.L., Ford, J.R., Mazur-Rimetz, S.A.: The minimal manual. Human-Computer Interaction 3(2), 123–153 (1987)
5. Chin, D.N.: Empirical Evaluation of User Models and User-Adapted Systems. User Modeling and User-Adapted Interaction 11, 181–194 (2001)
6. Darves, C., Oviatt, S.: Talking to digital fish: Designing effective conversational interfaces for educational software. In: Pelachaud, C., Ruttkay, Z. (eds.) From Brows to Trust: Evaluating Embodied Conversational Agents, Part IV, ch. 10. Kluwer, Dordrecht (2004)
7. Eichner, T., Prendinger, H., André, E., Ishizuka, M.: Attentive presentation agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 283–295. Springer, Heidelberg (2007)
8. Horvitz, E., Breese, J., Heckerman, D., Hovel, D., Rommelse, K.: The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users. In: Proc. UAI 1998, pp. 256–265 (1998)
9. Jameson, A.: Adaptive Interfaces and Agents. In: Jacko, J., Sears, A. (eds.) Human-Computer Interaction Handbook, ch. 15, pp. 305–330. Erlbaum, Mahwah (2003)
10. Kehoe, A., Pitt, I.: Designing help topics for use with text-to-speech. In: Proc. DC 2006, pp. 157–163. ACM Press, New York (2006)
11. Ma, C., Prendinger, H., Ishizuka, M.: Eye Movement as an Indicator of Users' Involvement with Embodied Interfaces at the Low Level. In: Proc. AISB 2005 Symp., U. of Hertfordshire, pp. 136–143 (2005)
12. Moreno, R., Flowerday, T.: Students' choice of animated pedagogical agents in science learning: A test of the similarity attraction hypothesis on gender and ethnicity. Contemporary Educational Psychology 31, 186–207 (2006)
13. Mori, J., Prendinger, H., Ishizuka, M.: Evaluation of an Embodied Conversational Agent with Affective Behavior. In: Proc. Workshop on Embodied Conversational Characters as Individuals, at AAMAS 2003, pp. 58–61 (2003)
14. Payr, S.: The university's faculty: an overview of educational agents. Applied Artificial Intelligence 17(1), 1–19 (2003)
15. Picard, R.W., Daily, S.B.: Evaluating Affective Interactions: Alternatives to Asking What Users Feel. In: Workshop on Evaluating Affective Interfaces: Innovative Approaches, at CHI 2005 (2005), http://affect.media.mit.edu/publications.php
16. Piwek, P., Hernault, H., Prendinger, H., Ishizuka, M.: T2D: Generating dialogues between virtual agents automatically from text. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 161–174. Springer, Heidelberg (2007)
17. Poggi, I., Pelachaud, C., de Rosis, F.: Eye Communication in a Conversational 3D Synthetic Agent. The European Journal on Artificial Intelligence 13(3), 169–182 (2000)
18. Ran, L., Helal, A., Moore, S.E., Ramachandran, B.: Drishti: An Integrated Indoor/Outdoor Blind Navigation System and Service. In: Proc. IEEE PERCOM 2004, pp. 23–30 (2004)
19. Roden, T.E., Parberry, I., Ducrest, D.: Toward mobile entertainment: A paradigm for narrative-based audio only games. Science of Computer Programming 67(1), 76–90 (2007)
20. Ruttkay, Z., Dormann, C., Noot, H.: Evaluating ECAs. What and How? In: Workshop on Embodied Conversational Agents. Let's Specify and Evaluate Them, at AAMAS (2002)
21. van Mulken, S., André, E., Müller, J.: An empirical study on the trustworthiness of lifelike interface agents. In: Proc. HCI International 1999, LEA, vol. 2, pp. 152–156 (1999)
22. Ververidis, D., Kotropoulos, C.: Emotional speech recognition: Resources, features, and methods. Speech Communication 48(9), 1162–1181 (2006)
23. Wilensky, R., Chin, D.N., Luria, M., Martin, J., Mayfield, J., Wu, D.: The Berkeley UNIX Consultant Project. Artificial Intelligence Review 14(1-2), 43–88 (2000)
24. Zajicek, M., Jonsson, I.-M.: Evaluation and context for in-car speech systems for older adults. In: Proc. ACM Latin American Conf. on HCI, pp. 31–39. ACM Press, New York (2005)
Adaptive User Interfaces: Benefit or Impediment for Lower-Literacy Users? Ivar Solheim Norwegian Computing Center, P.O. Box 114 Blindern, 0314 Oslo, Norway [email protected]
Abstract. This paper addresses web accessibility and usability for lower-literacy users with limited ICT skills. Although adaptive and adaptable user interfaces have been studied and discussed at least since the 80s, the potential of adaptive user interfaces is still far from realization. A main conclusion drawn in this paper is that simple, straightforward and intuitive adaptivity mechanisms may work well, but more complex and pervasive ones don't, and may even be counterproductive. A potential pitfall may be simplistic and "cognitivist" user and task modelling that fails to take the user's experience, competence and socio-psychological context—in short, the user's actual, real perspective and environment—into account.

Keywords: adaptive interfaces, personalisation, multimodality, user modelling, universal design.
1 Introduction

This paper addresses web accessibility and usability for lower-literacy (LL) users with limited ICT skills. It is a general problem that the needs of LL users, as well as those of the cognitively disabled, have been ignored in HCI (human-computer interaction) research, in contrast to the needs of the physically disabled, in particular the visually impaired. The paper focuses on research issues pertaining to adaptive and personalized user interfaces for LL users with weak ICT skills. It distinguishes between adaptability and adaptivity. Adaptability offers the user several options for personalization and adjustment according to the user's subjective preferences. Adaptivity means that the user interface can be dynamically and automatically tailored to the needs of the user: the system automatically recognizes the user's behaviour over time and improves the quality of the user interface interaction.
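To make the distinction concrete, the following toy Java sketch contrasts an adaptable setting (changed only on an explicit user request) with an adaptive one (changed by the system from observed behaviour). The font-size example and the adjustment rule are invented purely for illustration and are not taken from the paper.

// Toy illustration of adaptability vs. adaptivity.
public class AdaptableVsAdaptive {

    private int fontSize = 12;
    private int failedReadingAttempts = 0;

    // Adaptability: the user explicitly chooses a preference.
    void userSetsFontSize(int size) {
        fontSize = size;
    }

    // Adaptivity: the system observes behaviour over time and adjusts itself.
    void observeReadingProblem() {
        failedReadingAttempts++;
        if (failedReadingAttempts >= 3) {          // behaviour pattern detected
            fontSize += 2;                         // automatic adjustment
            failedReadingAttempts = 0;
        }
    }

    public static void main(String[] args) {
        AdaptableVsAdaptive ui = new AdaptableVsAdaptive();
        ui.userSetsFontSize(14);                                  // adaptable: user-driven
        for (int i = 0; i < 3; i++) ui.observeReadingProblem();   // adaptive: system-driven
        System.out.println("Font size after adaptation: " + ui.fontSize);  // 16
    }
}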
The research reported here contributes to existing research in two ways: first, by studying and identifying the specific user interface needs and requirements of LL users, and second, by providing knowledge on the interaction level that can shed light on the merits as well as the pitfalls of adaptive interface approaches. Although adaptive (and adaptable) user interfaces have been studied and discussed at least since the 80s [11], it seems clear that the potential of adaptivity is still far from realization. The merits of adaptivity, and in particular of adaptive personalization of web pages, have been disputed and are said to be overrated (see e.g. [9]). However, it is still a popular idea that adaptivity can play a significant role in enhancing accessibility and usability, also for elderly, LL users [1], [12]. Lower-literacy users can be defined as people that can read, but who have non-trivial difficulties doing so [12]. The group is a minority of our informants, but they represent a societal challenge that is often undervalued. For Norway it is estimated that at least 30% of the population are low-literate (Statistics Canada and OECD 2005), whereas in other countries the rate is higher; in the USA it is above 40% according to the National Assessment of Adult Literacy [4]. Low literacy can be related to several underlying factors, diagnoses and causes, but these are not emphasized in this study, since the focus is principally on functions and behaviour as users, not on medical or clinical diagnosis [10].
2 The Research and the Data

The research reported in this paper is based on data from several research projects that the Norwegian Computing Center has been involved in over the past four years. The visions of "universal design" and "design for all" have guided and inspired this research, acknowledging that a substantial proportion of citizens lack the skills and opportunities required to be fully integrated participants in the modern information society. Personal interviews and observations of more than 60 Norwegian users of electronic forms have been carried out, in particular with groups with visual impairments, elderly people aged 70 and older, and people with cognitive disabilities. This paper does not report and discuss the results for all groups, but focuses on the smaller sub-group of LL users, about 20 users. Some appear in several categories and some are hard to classify, because we do not base the categorization on clinical diagnoses but only on functional behaviour and difficulties observed in the field trials. We have found that, despite differences in age, background and other abilities, those whom we observe as LL users share several important characteristics as users of ICT. Importantly, the needs of this type of user highlight crucial challenges for design-for-all user interface development.

The projects are still ongoing, and involve collaboration with other countries in the EU-funded project DIADEM,¹ but the data and the analysis presented here are from Norway and Norwegian users only, and are not necessarily consistent with findings in other countries. The projects are guided by two general hypotheses. The first is that multimodality can make a difference: a significant part of the research has been to study to what extent various multimedia affordances (text, audio, video) can provide improved accessibility for certain types of users, e.g., by replacing text with audio and video for LL and dyslexic users. The second is that adaptivity and "intelligent" interface components can enhance accessibility and usability. These hypotheses are closely connected in the sense that multimodality is applied as a means for making adaptivity mechanisms user-friendly and accessible.
¹ DIADEM – Delivering Inclusive Access to the Disabled and Elderly. Available: http://www.projectdiadem.eu/
3 Lower-Literacy User Characteristics

Summers [12] and Payne [2] argue that the problems and challenges of lower-literacy ICT users are related to the fact that they do not have an appropriate mental model of the socio-technical environment, in this case electronic forms on PCs. LL users lack the mental models that would otherwise guide their work with the electronic forms. However, this is only one part of the story. The users' behaviour is not only a function of missing mental models, but also of various sociocultural and psychological contexts and factors. For example, their behaviour is shaped by the fact that they are often less comfortable in this working context, in which their weak skills and (for some) dyslexia may be exposed and challenged. Often these users also have weak ICT skills, which reinforces their challenges and problems. Weak literacy and ICT skills also lead to a lack of self-confidence and motivation. Some of the findings from our studies² shed light on how users act under these circumstances:
• Lower-literacy users are often not able to scan text and must most often read word for word [3], [12]. Typically, they spend much time and effort reading pages with a lot of text.
• Many LL users apply various evasion strategies. For example, rather than reading all the text on a page of a form, they skip it and go directly to the field to be filled out. Sometimes the users seem to guess what the page is about without reading it, by looking at what they think are key words.
• Complex page structure may be confusing. The user focuses on what he/she thinks is essential and does not look at additional information unless forced to.
• Because these users are not familiar with common ICT icons, Help buttons, for example, are not intuitively understood and are seldom used.
• The focus of attention is narrow and word-by-word oriented [3], which means that scrolling is difficult and confusing.
• Navigating between pages is difficult because users are not familiar with the electronic form.
• LL users appreciate predictability, simplicity and clear structure.
• Multimodality (e.g., Help messages as audio) must be used with caution; it can easily lead to confusion because the user is not prepared for it.
4 Adaptivity Process Attributes and Design Objectives

The research work is about providing interface adaptivity in electronic forms. In the DIADEM project the goal is to provide an adaptive web browser interface. This is being achieved by developing an intelligent interface component which monitors the user, adapting and personalising the computer interface to enable people to interact with web-based forms. Table 1 below summarizes some of the basic elements in the adaptivity process of the work reported here.
² Especially from the project UNIMOD – Universal Design in Multimodal Interfaces, http://www.unimod.no/unimod-web/Startside_Engelsk.html
Table 1. Basic elements in the adaptivity process

Adaptivity process attributes | Attribute objectives and categories | Activities and approaches in UI design practice
Interaction techniques – approach, hypothesis | Testing multimodality and multimodal interaction mechanisms | Implement feasible multimodal support for users: video, audio, animations, text, and in combination
User and task model definitions | The user model and requirements defined; electronic form task models defined | Key user and task characteristics derived from observations and analysis
Goals | Support for target users, basic interface requirements | A research-based prioritized list of requirements for elderly and lower-literacy users of forms
Rules | A variety of rules, related to various aspects and challenges in the form | Rules implemented, derived from user studies and analysis
There are basically four adaptivity process attributes [13]: interaction techniques, user and task models, goals and rules. An overall objective has been to test the hypothesis that multimodality can be beneficial for users, and, in line with this, multimodal support has been implemented. This includes, for example, help messages that are (sometimes, but not always) presented in both text and audio mode. User and task models are defined in order to capture key characteristics of LL users, and the forms are analyzed in order to identify accessibility challenges. User studies are important for user model definitions and requirements as well as for task modelling. Finally, and on the basis of the other attributes, objectives and activities, a number of logical, "expert system" rules are defined and implemented. The system's "intelligence" can provide personalised and tailored help; for example, a user with serious problems in filling out the form is provided with a different type of help than a user with minor difficulties. If the user appears to be stuck, quite pervasive (often multimodal) help is automatically provided, for example by forcing the user to stop working on the form and read short messages that appear on the screen. The type of help provided is designed on the basis of studies of how users work with electronic forms, in particular frequently observed problems as well as the problem-solving strategies and techniques of LL and other types of users.
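To make the role of such rules concrete, the following minimal sketch shows how expert-system-style adaptivity rules of this kind could be expressed in code. The event names, thresholds and help levels are illustrative assumptions made for the example; they are not the actual rule set implemented in the project.

```typescript
// Minimal sketch of rule-based help selection (illustrative; thresholds,
// state fields and help levels are assumptions, not the project's actual rules).

type HelpLevel = "none" | "inlineText" | "textAndAudio" | "modalOverlay";

interface FieldState {
  fieldId: string;
  errorCount: number;   // validation errors observed on this field
  idleSeconds: number;  // time since the user's last input event
}

interface AdaptivityRule {
  description: string;
  applies: (s: FieldState) => boolean;
  help: HelpLevel;
}

// Rules are ordered from most to least pervasive; the first match wins.
const rules: AdaptivityRule[] = [
  {
    description: "User appears stuck: repeated errors and long inactivity",
    applies: (s) => s.errorCount >= 3 && s.idleSeconds > 60,
    help: "modalOverlay",
  },
  {
    description: "Repeated errors on the same field",
    applies: (s) => s.errorCount >= 2,
    help: "textAndAudio",
  },
  {
    description: "Single error or slow progress",
    applies: (s) => s.errorCount === 1 || s.idleSeconds > 30,
    help: "inlineText",
  },
];

function selectHelp(state: FieldState): HelpLevel {
  const match = rules.find((r) => r.applies(state));
  return match ? match.help : "none";
}

// Example: a user who has made two errors on a hypothetical "postcode" field.
console.log(selectHelp({ fieldId: "postcode", errorCount: 2, idleSeconds: 12 }));
// -> "textAndAudio"
```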
5 Adaptivity Mechanisms: An Example

The screen captures below show the form, starting from its initial state before any modification or adaptivity measures were implemented. This electronic safety alarm application form has been used for several years in many municipalities in Norway.
In our research projects, the form has been modified and redesigned, and a number of new functionalities and affordances have been added in order to provide timely and personalised support for the user. Figure 1 below shows an example that illustrates the task, the interaction design and the adaptivity mechanisms. The design of the form is intended to satisfy universal access and "design for all" requirements. The design complies with Norwegian national guidelines³ for the design, accessibility and usability of electronic governmental forms, and according to these, the form is divided into three parts:
• Navigation area to the left
• Work areas in the middle
• Information area to the right
See the screen capture below:
Fig. 1. Task, interaction design and adaptivity mechanisms
The screen capture below shows the form in its modified version, with an activated adaptivity rule. First, we can see that the initial page design has been changed: the user can now focus on the field that he/she is supposed to fill out. The cursor is initially placed in the first field. As fields are filled out, they turn grey (not shown here), indicating that the user can proceed, and the cursor moves to the next field. The capture below (Figure 2) shows an error.
³ User Interface Guidelines for Governmental Forms on the Internet, see http://www.elmer.no/retningslinjer/pdf/elmer2-english.pdf
Fig. 2. An error
In this case the user has made an error in the first field, which changes the colour of the field and the related text from white to red, signalling that something is wrong. However, a mere change of colour may not be sufficient for the user to understand what is wrong and what he or she should do. Therefore, two additional elements are activated: first, a text message appears in the open field above the red-coloured one; second, the same message is presented in audio mode. The user is told that he or she has to click in the field in question before writing the answer (in Norwegian: "You must click on a field before you can write in it").
Fig. 3. Error response screen
If the user still does not succeed, or simply waits for a period of time, the whole page turns grey, with a message in the middle stating that he or she has to click the green OK button at the top of the page in order to proceed. Figure 3 illustrates this.
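The following sketch illustrates how the escalating, multimodal feedback described above could be wired up in a web form. The element identifiers, messages and timing are hypothetical, and the audio step uses the standard browser speech synthesis API as a stand-in for the project's actual audio playback.

```typescript
// Sketch of the escalating feedback described above, under assumed element ids
// and timings; the real form's markup and wording (Norwegian) differ.

function showFieldError(field: HTMLInputElement, message: string): void {
  field.classList.add("error");                       // assume CSS turns the field red
  const hint = document.getElementById("hint-area");  // open field above the erroneous one
  if (hint) hint.textContent = message;
  // Same message in audio mode, if speech synthesis is available.
  if ("speechSynthesis" in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(message));
  }
}

function escalateToOverlay(message: string): void {
  // Grey out the whole page and force attention to a single instruction.
  const overlay = document.createElement("div");
  overlay.id = "help-overlay";
  overlay.textContent = message; // e.g. "Click the green OK button at the top of the page to proceed"
  Object.assign(overlay.style, {
    position: "fixed", inset: "0", background: "rgba(128,128,128,0.85)",
    display: "flex", alignItems: "center", justifyContent: "center",
    fontSize: "1.5rem", zIndex: "1000",
  });
  document.body.appendChild(overlay);
}

// Escalate if the user neither corrects the error nor interacts for a while.
function watchForEscalation(field: HTMLInputElement, timeoutMs = 60000): void {
  const timer = window.setTimeout(
    () => escalateToOverlay("Click the green OK button at the top of the page to proceed."),
    timeoutMs,
  );
  field.addEventListener("input", () => window.clearTimeout(timer), { once: true });
}
```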
6 Discussion and Conclusion: When Adaptivity Becomes Counterproductive

The findings from the observations, evaluations and field trials were ambiguous, but a main conclusion to be drawn is that simple, straightforward adaptivity mechanisms may work well, whereas more complex and pervasive ones do not, and may even be counterproductive. The simple example above provides an illustration of why this is so. A basic problem is simplistic and "mentalistic" user and task modelling that fails to take the user's complex experience, competence and socio-psychological context into account.

On the one hand, the users clearly appreciated several simplifying changes in the user interface and functionality that helped them accomplish tasks and navigate in the form. For example, the users found that filling out the modified form was easy and simple, as long as they did not make mistakes that activated the adaptivity rules. One successful, simple mechanism was that fields turned grey when filled out, providing intuitive feedback and cues on how to proceed. Furthermore, when asked after completing the form, they were also, in principle, clearly positive towards the adaptivity mechanisms as such. A usability (SUMI) test was carried out and showed that LL users, too, were positive in general and had a tendency to blame themselves rather than the technology in case of difficulties.

On the other hand, observation of the users' actual behaviour showed that most users had problems with the more complex adaptivity mechanisms. The users in this study clearly favoured as little interaction with the system as possible, were highly sensitive to disturbances and unnecessary interaction, and were easily distracted and led astray when they had to deal with additional error handling. The observations showed that the adaptivity mechanisms illustrated above were often inaccurate and disturbing rather than relevant and helpful. A fundamental problem seems to be that users do not share the implicit mental model embodied in the adaptivity mechanisms. For example, the user is not always aware that he or she has done something wrong when the "intelligent" help is activated. When a mechanism is activated because the user (for example) is very slow in filling out the form, the user becomes confused and frustrated because he or she does not know why it has been activated. Furthermore, the additional audio function, which is meant to be beneficial for LL users, also becomes a source of confusion because the user does not see the relevance of the audio message.

The user modelling must take into account that, for this type of user, issues related to cognitive overload and a reduced cognitive ability to process textual input are important. In line with this, users favoured a simple interface structure; immediate, context-sensitive feedback and help; simple language; concise texts; and minimal use of icons. But there are also other critical issues, such as motivation, self-confidence, a feeling of mastery and the overall working context, factors that must also be taken seriously into account in the design.
Acknowledgement. The research reported here was supported by the Norwegian Research Council and the EU ICT FP6 programme.
References

1. Aula, A.: User study on older adults' use of the Web and search engines. Universal Access in the Information Society 4(1), 67–81 (2005)
2. Payne, S.: Users' Mental Models: The Very Ideas. In: Carroll, J. (ed.) HCI Models, Theories, and Frameworks, pp. 135–156. Morgan Kaufmann Publishers, San Francisco (2003)
3. Nielsen, J.: Lower-Literacy Users. Jakob Nielsen's Alertbox (2005), http://www.useit.com/alertbox/20050314.html
4. NCES, National Center for Education Statistics: Overview of the National Assessment of Adult Literacy (2003), http://nces.ed.gov/naal/PDF/NAALOverview.pdf
5. Statistics Canada and OECD: Learning a Living: First Results of the Adult Literacy and Life Skills Survey (2003), http://lesesenteret.uis.no/getfile.php/Lesesenteret/Hovedrapport_All.pdf
6. Hasselbring, T.: Interview in "Scholastic Offers Solutions in the Face of National Reading Crisis". Education Update 11(2), 11 (2005)
7. Boham, P.: Cognitive disabilities part 1: We still know too little and we do even less (2005), http://www.webaim.org/techniques/articles/cognitive_too_little/
8. Boham, P., Anderson, S.: A Conceptual Framework for Accessibility Tools to Benefit Users with Cognitive Disabilities. In: Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility, Chiba, Japan, May 10, pp. 85–89 (2005)
9. Jupiter Research: Beyond the Personalization Myth: Cost-effective Alternatives to Influence Intent. Report, Jupiter Research Corporation, 26 pages, Pub ID: JUPT949160 (2003)
10. Rowland, C.: Cognitive disabilities part 2: Conceptualising design considerations (2004), http://www.webaim.org/techniques/articles/conceptualise
11. Schneider-Hufschmidt, M., Kühme, T., Malinowski, U. (eds.): Adaptive User Interfaces: Principles and Practice. Elsevier Science Publishers B.V., Amsterdam (1993)
12. Summers, K., Summers, M.: Making the Web Friendlier for Lower-Literacy Users. Intercom, Magazine of the Society for Technical Communication, 19–21 (June 2004)
13. Karagiannidis, C., Koumpis, A., Stephanidis, C.: Supporting Adaptivity in Intelligent User Interfaces: The Case of Media and Modalities Allocation. In: ERCIM Workshop "Towards User Interfaces for All: Current Efforts and Future Trends", Heraklion, Greece, October 30–31 (2005)
14. Dieterich, H., Malinowski, U., Kühme, T., Schneider-Hufschmidt, M.: State of the Art in Adaptive User Interfaces. In: Schneider-Hufschmidt, M., Kühme, T., Malinowski, U. (eds.) Adaptive User Interfaces: Principles and Practice. Elsevier Science Publishers B.V., Amsterdam (1993)
Adaptive User Interfaces to Promote Independent Ageing

Cecilia Vera-Muñoz¹, Mercedes Fernández-Rodríguez¹, Patricia Abril-Jiménez¹, María Fernanda Cabrera-Umpiérrez¹, María Teresa Arredondo¹, and Sergio Guillén²
¹ Life Supporting Technologies, Technical University of Madrid, Ciudad Universitaria s/n, 28040 Madrid, Spain
{cvera,mfernandez,pabril,chiqui,mta}@lst.tfo.upm.es
² ITACA Institute, Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain
[email protected]
Abstract. In recent years the EU population has been experiencing an increasing ageing process. This tendency is creating new needs and motivating the appearance of diverse services and applications oriented towards improving the quality of life of senior citizens. The creation of such services requires the use of technological advances and design techniques specifically focused on addressing elderly requirements. This paper presents the adaptive user interfaces that have been developed in the context of an EU-funded project, PERSONA, which aims to provide different services to promote independent ageing. Keywords: adaptive user interfaces, ambient assisted living, services for elderly, independent ageing.
1 Introduction

The EU population is becoming increasingly older [1]. As a result of this demographic trend, European countries are expected to experience significant social and economic impacts, with enormous effects on welfare expenditure and, in particular, on employment and labor markets, pension systems and healthcare systems. The European social model is based on wellbeing for all citizens, and this wellbeing is frequently perceived in terms of "quality of life". Quality of life is a subjective concept but, from the perspective of an elderly person, it can be analyzed from different viewpoints or domains: physical, psychological, level of independence, social relationships, environment, and spirituality, religion or beliefs. It is a technological challenge to provide senior citizens with systems that can foster the different facets of their perception of quality of life. Such systems should improve the level of independence, promote social relationships, support immersion in the environment, and strengthen the psychological and physical state of the person. Ambient Assisted Living (AAL) is a concept that embraces all the technological challenges in the context of the Ambient Intelligence (AmI) paradigm to face the problem of the ageing population in Europe. AAL aims to create a technological context, transparent to the user, specifically developed to manage elderly people's needs and to increase their independence in daily life.
The PERSONA project, a European funded research project, is based on the conviction that the application of new AAL technologies can improve the quality of life of senior citizens [2]. The project aims at advancing the AmI paradigm through the harmonization of AAL technologies and concepts for the development of sustainable and affordable solutions that promote the social inclusion and independent living of the elderly. PERSONA has developed a common semantic framework that comprises a scalable, open-standard AAL technological platform and a broad range of AAL services. These solutions cover users' needs in the areas of social integration, daily activities, safety and protection, and mobility. The AAL services are offered to the users by means of adaptive interfaces developed as a result of a complex human-computer interaction design that involved the consideration of several aspects related to the users' needs and context information.
2 Methodology

The definition of the adaptive user interfaces developed within the PERSONA project has been based on the "Interaction Framework" design method, described as part of the goal-directed design methodology [3]. The modelling included several tasks, starting with the definition of input methods, in which the various means that a user could use to enter information into the system were assessed (e.g., keyboard, mouse, tactile screens). Then, the primary screens for presenting the information were described, following the "description of views" task. In a third step, the definition of functional and data elements established the concrete representations in the user interface of the functions and needs identified in the requirements phase. Additionally, the allowed operations on the diverse elements of the interface were determined. Finally, a sketch of the basic interaction and key path scenarios was described.

The project has required the consideration of diverse interaction options for providing the developed services. The study started with the designation of interaction channels, which are basic interaction modalities based on the five basic physiological senses (visual, auditory, haptic, olfactory, taste). For each of these channels, the possible interaction modes that could be used by the different services for interacting with the users were analyzed. The considered alternatives comprise icons and graphical elements for visual interaction, voice and sounds for auditory interaction, gesture recognition and tactile displays for haptic interaction, and taste and smell for olfactory interaction. Furthermore, a set of additional options, such as tangible user interfaces, avatar-based interaction, smart objects, multimodality and adaptive graphical user interfaces, has also been studied, all grouped under a so-called spanning channel.

Subsequently, a specific interaction channel/mode and a type of device were selected for each of the user target groups defined in the project: elderly at home, elderly outdoors, relatives and care staff (Table 1). These groups were identified using the International Classification of Functioning, Disability and Health (ICF) methodology, which makes it possible to classify users according to their capabilities [4].
Table 1. Interaction channel/mode and type of device per user target group

Target group | User to system interaction | System to user interaction
Elderly at home | Voice / Touch screen | Text and speech / Screen
Elderly outdoor | Graphics / Portable Mobile Device (PMD) | Graphics / PMD
Relatives | Graphics / Keyboard-PC | Graphics / PC
Care staff | Call / PC or PMD | SMS or call / PC or PMD
The classification showed a clear predominance of visual and auditory interaction as the most suitable alternatives in all possible scenarios. Thus, new metaphors and different formats for representing the information have been designed for these two options. In this context, special attention has been paid to the design of the graphical user interfaces (GUIs), following design-for-all principles and applying accessibility and usability criteria to create easy and intuitive interaction dialogues between the user and the system.
3 Results

The platform developed within the PERSONA project includes an interaction system designed to provide adaptive user interfaces for the diverse services offered. The adaptation is performed automatically based on different parameters: the information to be presented to the user, as required by each service, the user's profile, and the context information. The user interaction system includes two basic components: the dialog handler and the I/O handlers. These two components are closely related to a context-awareness framework and a profiling component (Fig. 1).
Fig. 1. PERSONA user interaction system architecture
The main purpose of the dialog handler component is to decide the type of device and interaction channel to be used in each of the services' user interfaces. This selection is needed whenever a service is invoked, and it is conducted considering the information to be shown to the user (provided by the service), along with data extracted from the user's profile and the environmental context. The output of the dialog handler is a specific interaction channel and a generic type of device to be used for interacting with the user (e.g., graphical interaction on a PC screen). With this information, an I/O handler selects the specific device that the system will use for interacting with the user (e.g., the PC screen in the bedroom) and presents the required information on it. The I/O handlers are application-independent, pluggable technological solutions, associated with specific interaction characteristics, which manage the respective I/O channels to particular devices. Basically, six possible I/O handlers have been defined, each one associated with a particular interaction mode or device type:
• The "Voice at home I/O handler" is responsible for any voice-based interaction with the user while he or she is located at home.
• The "GUI-1 I/O handler" manages all graphics-based interactions of indoor services.
• The "GUI-2 I/O handler" is responsible for content representation on portable mobile devices (PMD).
• The "SMS I/O handler" manages information interchange in SMS format for all services.
• The "Voice-graphical I/O handler" supports a combination of voice and graphical information.
• The "Voice-gesture I/O handler" deals with voice interactions combined with gestures. This option offers users the possibility of interacting with the system using gestures (e.g., pointing) combined with voice in order to emphasize an intended or desired action.
The procedure for adapting a user interface starts with the dialog handler, which selects the most appropriate output mode according to the user's profile, context data and the information to be presented to the user. Once this selection has been made, it sends the information to the corresponding I/O handler, which chooses a specific device based on the context parameters and presents the information required by the service on it. Additionally, the I/O handlers convert the services' output and the user's input into the appropriate format, according to the device characteristics. The final result is a framework that manages user interaction in a service- and device-independent way, while being completely adaptive in terms of the user's profile, context parameters and the type of information to be presented to the user.
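The following minimal sketch illustrates the dispatch logic described above: a dialog-handler step that maps profile and context to an interaction channel, and an I/O handler that picks a concrete device. The type names, target-group values and selection rules are illustrative assumptions, not the PERSONA platform's actual API.

```typescript
// Illustrative sketch of the dispatch described above; interfaces and handler
// names are assumptions made for the example, not PERSONA's actual API.

type Channel = "voiceHome" | "gui1" | "gui2" | "sms" | "voiceGraphical" | "voiceGesture";

interface UserProfile { targetGroup: "elderlyAtHome" | "elderlyOutdoor" | "relative" | "careStaff"; }
interface Context { location: "home" | "outdoor"; availableDevices: string[]; }
interface ServiceOutput { text: string; urgency: "low" | "high"; }

interface IOHandler {
  channel: Channel;
  present(output: ServiceOutput, context: Context): void;
}

// Dialog-handler step: choose a channel from the profile and context.
function selectChannel(profile: UserProfile, context: Context): Channel {
  switch (profile.targetGroup) {
    case "elderlyAtHome": return context.location === "home" ? "voiceHome" : "gui2";
    case "elderlyOutdoor": return "gui2";
    case "relative": return "gui1";
    case "careStaff": return "sms";
    default: return "gui1";
  }
}

// Example I/O handler for graphical output on a PC screen (GUI-1 in the text).
const gui1Handler: IOHandler = {
  channel: "gui1",
  present(output, context) {
    const device = context.availableDevices.find((d) => d.startsWith("pc-")) ?? "pc-default";
    console.log(`[${device}] ${output.text}`); // stand-in for real rendering
  },
};

const handlers: IOHandler[] = [gui1Handler /*, other handlers */];

function deliver(output: ServiceOutput, profile: UserProfile, context: Context): void {
  const channel = selectChannel(profile, context);
  const handler = handlers.find((h) => h.channel === channel);
  handler?.present(output, context); // the handler picks the concrete device
}
```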
4 Conclusions

The PERSONA project has applied the Ambient Intelligence paradigm to the design and development of the presented interaction system, which provides adaptive user interfaces for AAL services. The developed solution brings systems that better fit users' needs, lifestyles and contexts by further developing multimodal communication and
integrating information acquired from the environment into the process of interaction with the user. The PERSONA project is taking a step forward in the field of services for supporting the elderly by improving social connectedness and participation, and by providing control over the environment, mobility and prevention services. Elderly users will benefit greatly from having completely adaptable and personalized AAL services that can significantly improve their quality of life.

Acknowledgments. We would like to thank the PERSONA Project Consortium for their valuable contributions to the realization of this work. This project is partially funded by the European Commission.
References

1. European Commission, Directorate-General for Economic and Financial Affairs: The Ageing Report: Underlying Assumptions and Projection Methodologies for the EU-27 Member States (2007-2060). European Economy 7|2008 (2009)
2. PERSONA EU funded project (2007-2010), IST-045459, European Commission Sixth Framework Programme, http://www.aal-persona.org
3. Cooper, A., Reimann, R.: About Face 2.0: The Essentials of Interaction Design. John Wiley and Sons, New York (2007)
4. The International Classification of Functioning, Disability and Health (ICF), http://www.who.int/classifications/icf/
5. Aarts, E., Marzano, S.: The New Everyday: Views on Ambient Intelligence, pp. 78–83 (2003)
6. Akman, V., Surav, M.: The Use of Situation Theory in Context Modeling. Computational Intelligence 13(3), 427–438 (1997)
7. Mayrhofer, R.: Context Prediction Based on Context Histories: Expected Benefits, Issues and Current State-of-the-Art. ECHISE (2005)
Author Index
Abascal, Julio 623 Abril-Jiménez, Patricia 139, 766 Adams, Ray 467 Ahamed, Sheikh I. 189 Akhter Lipi, Afia 631 Albayrak, Sahin 150 Alexandris, Christina 92 Aloise, Fabio 483 Amemiya, Tomohiro 477 Antona, Margherita 684, 711 Ao, Xuefeng 583 Arredondo, María Teresa 49, 75, 139, 248, 766 Azuma, Kousuke 209 Babiloni, Fabio 483 Balandin, Sergey 3 Barbosa, Tiago 345 Basdekis, Ioannis 279 Benedí, Jose Miguel 160 Betke, Margrit 493 Bianchi, Luigi 483 Bieber, Gerald 289 Bjærum, Robert 317 Blumendorf, Marco 150 Bollow, Eckhard C. 258 Bonail, Borja 623 Braun, Anne-Kathrin 603 Bruegger, Pascal 297 Bühler, Christian 143 Burzagli, Laura 641 Cabrera-Umpiérrez, María Fernanda 49, 139, 766 Cansizoglu, Esra 493 Caporali, Maurizio 729 Carbonell, Noëlle 748 Carriço, Luís 384 Chang, Chia-Wei 455 Chen, Chien-Hsu 13, 455 Chen, Xinyu 658 Chien, Szu-Cheng 20 Cho, Hyunjong 57 Choi, Soo-Mi 394 Chong, Anthony 29
Chuang, Su-Chen 82 Cincotti, Febo 483 Comley, Richard 467 Connor, Caitlin 493 Daunys, Gintautas 503 de las Heras, Rafael 75 Delogu, Franco 557 Dias, Gaël 345 Doulgeraki, Constantina 711 Doulgeraki, Voula 279 Duarte, Carlos 384 Epstein, Samuel 493 Ezer, Neta 39
Faasch, Helmut 258 Federici, Stefano 557 Felzer, Torsten 509 Fernández, Carlos 160, 228 Fernández, María 49 Fernández-Rodríguez, Mercedes 766 Fisk, Arthur D. 39 Fujiwara, Akio 528, 613 Furuta, Kazuo 674 Gabbanini, Francesco 641 Gao, Xufei 658 Gardeazabal, Luis 623 Georgalis, Yannis 168 Georgila, Kallirroi 117 Ghoreyshi, Mahbobeh 467 Glavinic, Vlado 307 Grammenos, Dimitris 168 Granić, Andrina 694 Guillén, Sergio 766 Guo, Ping 658 Han, Dongil 57 Hasan, Chowdhury S. 189 Heiden, Wolfgang 603 Hein, Albert 178, 519 Hellenschmidt, Michael 228 Hellman, Riitta 317 Hirata, Ichiro 528, 613
Hirsbrunner, Béat 297 Hitz, Martin 355 Hoffmeyer, André 519 Hong, Seongrok 365 Huang, Xin-yuan 650 Irwin, Curt B. 535 Islam, Rezwan 189 Izsó, Lajos 67 Jansson, Harald K. 317 Jedlitschka, Andreas 199 Jeong, Kanghun 365 Jimenez-Mixco, Viveca 75 Jo, Gyeong-Sic 667 Jokinen, Kristiina 537 Jones, Brian 127 Joo, Ilyang 365 Kanno, Taro 674 Karampelas, Panagiotis 279 Kempter, Guido 218 Kim, Do-Yoon 738 Kim, Yong-Guk 667 Kirste, Thomas 178, 519 Kleinberger, Thomas 199 Kogan, Anya 445 Köhlmann, Wiebke 564 Kondratova, Irina 327 Kukec, Mihael 307 Lafuente, Alberto 623 Lauruska, Vidas 503 Lázaro, Juan-Pablo 160, 238 Lee, Chang-Franw 422 Lee, Jaehoon 365 Lee, Jeong-Eom 209 Lee, Ji-Hyun 29 Lee, Joo-Ho 209 Leitner, Gerhard 355 Leonidis, Asterios 684, 711 Leung, Cherng-Yee 82 Leuteritz, Jan-Paul 684 Li, Dingjun 335 Li, Hui 335 Liu, Jun 335 Liu, Ying 335 Ljubic, Sandi 307 Machado, David 345 Maes, Pattie 547 Magee, John 493
Mahdavi, Ardeshir 20 Maier, Edith 218 Malagardi, Ioanna 92 Martins, Bruno 345 Mattia, Donatella 483 Melcher, Rudolf 355 Mercalli, Franco 228 Mistry, Pranav 547 Miyashita, Satoshi 209 Montalvá, Juan Bautista 49 Moon, Hyeon-Joon 57, 365, 667 Morka, Sverre 317 Mourouzis, Alexandros 684 Müller, Katrin 238 Nadig, Oliver 564 Nakano, Yukiko 631 Nakić, Jelena 694 Naranjo, Juan-Carlos 139, 228, 238 Narzt, Wolfgang 374 Nien, Ken-Hao 13 Nordmann, Rainer 509 Nowakowski, Przemyslaw 100 Olivetti Belardinelli, Marta 557 Pais, Sebastião 345 Palmiero, Massimiliano 557 Panou, Maria 684 Park, Changhoon 704 Park, Gwi-Tae 209 Park, Jieun 29 Partarakis, Nikolaos 711 Peinado, Ignacio 248 Prueckner, Stephan 199 Quitadamo, Lucia Rita 483 Rau, Pei-Luen Patrick 335 Rehm, Matthias 631 Reis, Tiago 384 Renals, Steve 117 Rhee, Seon-Min 394 Rinderknecht, Stephan 509 Ritter, Walter 720 Rizzo, Antonio 729 Rogers, Wendy A. 39 Rubegni, Elisa 729 Ryu, Han-Sol 394
Sala, Pilar 228, 238 Salvador, Zigor 623 Sánchez, Jaime 402 Schiewe, Maria 564 Schmidt, Michael 574 Schmitzberger, Heinrich 374 Serrano, J. Artur 238 Sesto, Mary E. 535 Shin, Seungchul 738 Simonin, Jérôme 748 Solheim, Ivar 758 Song, Jaekwang 57 Song, Ji-Won 412 Song, Rongqing 583 Steinbach-Nordmann, Silke 199 Stephanidis, Constantine 168, 279, 711 Storf, Holger 199 Tanviruzzaman, Mohammad 189 Tjäder, Claes 108 Tsai, Wang-Chin 422 Tutui, Rie 528, 613 Urban, Bodo 289
Vanderheiden, Gregg C. 432, 438 Vera-Muñoz, Cecilia 139, 766 Vilimek, Roman 593 Villalar, Juan-Luis 75 Villalba, Elena 248 Vipperla, Ravichander 117 Voskamp, Jörg 289
Walker, Bruce N. 445 Waris, Heikki 3 Watanabe, Atsushi 674 Weber, Gerhard 564, 574 Weiland, Christian 603 Welge, Ralph 258 Wichert, Reiner 267 Widlroither, Harald 684 Winegarden, Claudia 127 Wolters, Maria 117 Wu, Fong-Gong 13, 455 Wu, Zhongke 583 Yamaguchi, Daijirou 528, 613 Yamamoto, Sachie 528, 613 Yamaoka, Toshiki 528, 613 Yang, Gang 650 Yang, Sung-Ho 412 Yao, Yan-Ting 82 Yoo, Seong Joon 57 Yoon, Sung-young 738 Yoon, Yeo-Jin 394 Yoshida, Mayuko 528, 613 Yu, Emily 493 Zander, Thorsten O. 593 Zhou, Mingquan 583