Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5061
Frode Eika Sandnes Yan Zhang Chunming Rong Laurence T. Yang Jianhua Ma (Eds.)
Ubiquitous Intelligence and Computing
5th International Conference, UIC 2008
Oslo, Norway, June 23-25, 2008
Proceedings
Volume Editors

Frode Eika Sandnes
Oslo University College, Oslo, Norway
E-mail: [email protected]

Yan Zhang
Simula Research Laboratory, Lysaker, Norway
E-mail: [email protected]

Chunming Rong
University of Stavanger, Stavanger, Norway
E-mail: [email protected]

Laurence T. Yang
St. Francis Xavier University, Antigonish, NS, Canada
E-mail: [email protected]

Jianhua Ma
Hosei University, Tokyo 184-8584, Japan
E-mail: [email protected]
Library of Congress Control Number: 2008929350
CR Subject Classification (1998): H.4, C.2, D.4.6, H.5, I.2, K.4
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-540-69292-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-69292-8 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2008 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12277919 06/3180 543210
Preface
This volume contains the proceedings of UIC 2008, the 5th International Conference on Ubiquitous Intelligence and Computing: Building Smart Worlds in Real and Cyber Spaces. The conference was held in Oslo, Norway, during June 23–25, 2008. The event was the fifth meeting of this conference series. USW 2005 (First International Workshop on Ubiquitous Smart World), held in March 2005 in Taiwan, was the first event in the series. It was followed by UISW 2005 (Second International Symposium on Ubiquitous Intelligence and Smart Worlds), held in December 2005 in Japan; by UIC 2006 (Third International Conference on Ubiquitous Intelligence and Computing: Building Smart Worlds in Real and Cyber Spaces), held in September 2006 in Wuhan and Three Gorges, China; and by UIC 2007, held in July 2007 in Hong Kong. Ubiquitous computers, networks and information are paving the road to a smart world in which computational intelligence is distributed throughout the physical environment to provide trustworthy and relevant services to people. This ubiquitous intelligence will change the computing landscape because it will enable new breeds of applications and systems to be developed, significantly extending the realm of computing possibilities. By embedding digital intelligence in everyday objects, our workplaces, our homes and even ourselves, many tasks and processes could be simplified and made more efficient, safer and more enjoyable. Ubiquitous computing, or pervasive computing, composes these numerous "smart things/u-things" to create the environments that underpin the smart world. A smart thing can be endowed with different levels of intelligence, and may be context-aware, active, interactive, reactive, proactive, assistive, adaptive, automated, sentient, perceptual, cognitive, autonomic and/or thinking. Intelligent/smart things constitute an emerging research area that spans many disciplines.
A series of grand challenges must be met to move from the world of ubiquitous computing, with universal services of any means/place/time, to the smart world of trustworthy services with the right means/place/time. The UIC 2008 conference offered a forum for researchers to exchange ideas and experiences in developing intelligent/smart objects, environments, and systems. This year, the technical program of UIC drew from a very large number of submissions: 102 papers submitted from 26 countries representing four regions: Asia Pacific, Europe, and North and South America. Each submission was reviewed (as a full paper) by at least three reviewers, coordinated by the international Program Committee. In order to accept as many papers as possible while keeping the high quality of the conference, we finally decided to accept 27 regular papers for presentation, reflecting a 26% acceptance rate. In addition, 28 special session papers were included in the proceedings. The accepted papers provide research contributions
in a wide range of research topics that were grouped into 14 conference tracks, including smart objects and embedded systems, smart spaces/environments/services, intelligent computing, wireless sensor networks, context-aware applications and systems, and wireless networks. In addition to the refereed papers, the proceedings include Jadwiga Indulska's keynote address on "Challenges in the Design and Development of Context-Aware Applications," Petter Øyan's keynote address on "The Importance of Including the Haptics Factor in Interaction Design," and an invited paper from Stephen S. Yau on "Security Policy Integration and Conflict Reconciliation for Collaborations among Organizations in Ubiquitous Computing Environments." We believe that the conference not only presented novel and interesting ideas but also stimulated future research in the area of ubiquitous intelligence and computing. Organizing a conference with such a large number of submissions requires a lot of hard work and dedication from many people. We would like to take this opportunity to thank the numerous people whose work made this conference possible and ensured its high quality. We wish to thank the authors of submitted papers, as they contributed to the conference technical program. We wish to express our deepest gratitude to the Program (Vice) Chairs for their hard work and commitment to quality when helping with paper selection.
We would also like to thank all Program Committee members and external reviewers for their excellent job in the paper review process, the Steering Committee and Advisory Committee for their continuous advice, and Jadwiga Indulska and Daqing Zhang for organizing a panel on the important question: "What do we expect from pervasive/intelligent computing and how far are we from achieving it?" Special thanks go to Yo-Ping Huang for organizing a special session on "Object Identification: Techniques and Applications." We are also indebted to the Publicity Chairs for advertising the conference, to the Local Organizing Committee for managing the registration and other conference organization-related tasks, and to Oslo University College for hosting the conference. We are also grateful to Kyrre Begnum for his hard work managing the conference website and the conference management system.
June 2008
Frode Eika Sandnes, Yan Zhang, Chunming Rong, Laurence Tianruo Yang, and Jianhua Ma
UIC 2008 Editors
Organization
Executive Committee

General Chairs:
Frode Eika Sandnes, Oslo University College, Norway
Mark Burgess, Oslo University College, Norway

Program Chairs:
Chunming Rong, University of Stavanger, Norway
Yan Zhang, Simula Research Laboratory, Norway

Program Vice Chairs:
D. Manivannan, University of Kentucky, USA
Michael Beigl, University of Hannover, Germany
Yo-Ping Huang, National Taipei University of Technology, Taiwan
Oliver Sinnen, University of Auckland, New Zealand
Graham Megson, University of Reading, UK

Honorary Chairs:
Stephen S. Yau, Arizona State University, USA
Norio Shiratori, Tohoku University, Japan

Steering Committee:
Jianhua Ma (Chair), Hosei University, Japan
Laurence T. Yang (Chair), St. Francis Xavier University, Canada
Hai Jin, Huazhong University of Science and Technology, China
Jeffrey J.P. Tsai, University of Illinois at Chicago, USA
Theo Ungerer, University of Augsburg, Germany

International Advisory Committee:
Leonard Barolli, Fukuoka Institute of Technology, Japan
Victor Callaghan, University of Essex, UK
Yookun Cho, Seoul National University, Korea
Sumi Helal, University of Florida, USA
Frank Hsu, Fordham University, USA
Ali R. Hurson, Pennsylvania State University, USA
Qun Jin, Waseda University, Japan
Beniamino Di Martino, Second University of Naples, Italy
Christian Muller-Schloer, University of Hannover, Germany
Timothy K. Shih, Tamkang University, Taiwan
Ivan Stojmenovic, Ottawa University, Canada
Makoto Takizawa, Tokyo Denki University, Japan
David Taniar, Monash University, Australia
Jhing-Fa Wang, National Cheng Kung University, Taiwan
Executive Committee (continued)

International Advisory Committee
Publicity Chairs
International Liaison Chairs
Award Chairs
Panel Chairs
Special Session Organizer Publication Chairs Web Chair Local Organizing Chair
Guangyou Xu, Tsinghua University, China Yaoxue Zhang, Tsinghua University, China Albert Zomaya, University of Sydney, Australia Yueh-Min Huang, National Cheng Kung University, Taiwan Jinhua Guo, University of Michigan - Dearborn, USA Mieso Denko, University of Guelph, Canada Hakan Duman, British Telecom, UK Ohbyung Kwon, Kyunghee University, Korea Wen Jing Hsu, National University of Singapore, Singapore Benxiong Huang, Huazhong University of Science and Technology, China Jong Hyuk Park, Kyungnam University, Korea Leonel Sousa, INESC-ID, Portugal Si Qing Zheng, University of Texas at Dallas, USA David Nelson, University of Sunderland, UK Jiannong Cao, Hong Kong Polytechnic University, Hong Kong Demissie Aredo, Oslo University College, Norway Jadwiga Indulska, University of Queensland, Australia Daqing Zhang, National Institute of Telecommunications, France Yo-Ping Huang, National Taipei University of Technology, Taiwan Tony Li Xu, St. Francis Xavier University, Canada Liu Yang, St. Francis Xavier University, Canada Ismail Hassan, Oslo University College, Norway Siri Fagernes, Oslo University College, Norway Simen Hagen, Oslo University College, Norway Kirsten Ribu, Oslo University College, Norway Kyrre Begnum, Oslo University College, Norway Jie Xiang, Simula Research Laboratory, Norway Qin Xin, Simula Research Laboratory, Norway Hai Ngoc Pham, University of Oslo, Norway
Program Committee

Waleed Abdulla, The University of Auckland, New Zealand
Sheikh Iqbal Ahamed, Marquette University, USA
Program Committee (continued)

Alexandra Branzan Albu, University of Victoria, Canada
Najah Abu Ali, United Arab Emirates University, UAE
Noureddine Boudriga, University of Carthage, Tunisia
Phillip G. Bradford, University of Alabama, USA
Tsun-Wei Chang, De Ling Institute of Technology, Taiwan
Jiming Chen, Zhejiang University, China
Ruey-Maw Chen, National Chin-Yi University of Technology, Taiwan
Shu-Chen Cheng, Southern Taiwan University, Taiwan
Wei Chen, Nanjing University of Posts and Telecommunications, China
Wen Chen, Shanghai Jiaotong University, China
Peter H.J. Chong, Nanyang Technological University, Singapore
Hung-Chi Chu, Chaoyang University of Technology, Taiwan
Alva Couch, Tufts University, USA
Alma Leora Culén, University of Oslo, Norway
Waltenegus Dargie, Technical University of Dresden, Germany
Reza Dilmaghani, King's College London, UK
Panayotis E. Fouliras, University of Macedonia, Greece
Alois Ferscha, Johannes Kepler Universität Linz, Austria
Edgar Gabriel, University of Houston, USA
Antonio Maña Gómez, University of Malaga, Spain
Zonghua Gu, Hong Kong University of Science and Technology, Hong Kong
Khoon Guan Winston Seah, Institute for Infocomm Research, Singapore
Arobinda Gupta, Indian Institute of Technology, India
Laurence Habib, Oslo University College, Norway
Hårek Haugerud, Oslo University College, Norway
Dag Haugland, Bergen University, Norway
Jianhua He, Swansea University, UK
Shang-Lin Hsieh, Tatung University, Taiwan
Bin Hu, University of Central England, UK
Lin-Huang Chang, National Taichung University, Taiwan
Yu Hua, Huazhong University of Science and Technology, China
Tao Jiang, University of Michigan, USA
Wenbin Jiang, Huazhong University of Science and Technology, China
Tore Møller Jonassen, Oslo University College, Norway
Audun Jøsang, Queensland University of Technology, Australia
James B.D. Joshi, University of Pittsburgh, USA
Akimitsu Kanzaki, Osaka University, Japan
Helen Karatza, Aristotle University of Thessaloniki, Greece
Andreas J. Kassler, Karlstad University, Sweden
Program Committee (continued)

Paris Kitsos, Hellenic Open University, Greece
Gabriele Kotsis, Johannes Kepler University of Linz, Austria
Lars Kristiansen, University of Oslo, Norway
Stefan Lankes, RWTH Aachen University, Germany
Matti Latva-aho, Oulu University, Finland
Cai Li, Communications University of China, China
Frank Li, Agder University, Norway
Jie Li, University of Tsukuba, Japan
Shiguo Lian, France Telecom R&D Beijing, China
Daw-Tung Lin, National Taipei University, Taiwan
Giuseppe Lo Re, University of Palermo, Italy
Arne Løkketangen, Molde University College, Norway
Sathiamoorthy Manoharan, University of Auckland, New Zealand
Hassnaa Moustafa, France Telecom R&D, France
Josef Noll, UniK/Movation, Norway
Paul Lukowicz, Universität Passau, Germany
Marius Portmann, University of Queensland, Australia
Ravi Prakash, University of Texas at Dallas, USA
Antonio Puliafito, University of Messina, Italy
Ragnhild Kobro Runde, University of Oslo, Norway
Hamid Sharif, University of Nebraska, USA
Albrecht Schmidt, University of Munich, Germany
Rose Shumba, Indiana University of Pennsylvania, USA
Isabelle Simplot-Ryl, University of Lille, France
Lingyang Song, University of Oslo, Norway
Min Song, Old Dominion University, USA
Lambert Spaanenburg, Lund University, Sweden
Zhou Su, Waseda University, Japan
Evi Syukur, Monash University, Australia
Chik How Tan, Gjøvik University College, Norway
Hallvard Trætteberg, Norwegian University of Science and Technology, Norway
Shiao-Li Tsao, National Chiao Tung University, Taiwan
Thanos Vasilakos, University of Western Macedonia, Greece
Javier Garcia Villalba, Complutense University of Madrid, Spain
Agustinus Borgy Waluyo, Institute for Infocomm Research (I2R), Singapore
Guojun Wang, Central South University, China
Tzone-I Wang, National Cheng-Kung University, Taiwan
Xinbing Wang, Shanghai Jiaotong University, China
Zhijun Wang, Hong Kong Polytechnic University, Hong Kong
Jiang (Linda) Xie, University of North Carolina at Charlotte, USA
Qin Xin, Simula Research Laboratory, Norway
Program Committee (continued)

Stephen Yang, National Central University, Taiwan
Wei Yen, Tatung University, Taiwan
Muhammad Younas, Oxford Brookes University, UK
Rong Yu, South China University of Technology, China
Hongwei Zhang, Wayne State University, USA
Chi Zhou, Illinois Institute of Technology, USA
Yuefeng Zhou, NEC Laboratories Europe, UK
Table of Contents
Keynote Speech

Challenges in the Design and Development of Context-Aware Applications . . . 1
Jadwiga Indulska
The Importance of Including the Haptics Factor in Interaction Design . . . 2
Petter Øyan and Anne Liseth Schøyen
Regular Papers

Ubiquitous Computing

Security Policy Integration and Conflict Reconciliation for Collaborations among Organizations in Ubiquitous Computing Environments . . . 3
Stephen S. Yau and Zhaoji Chen

Improved Weighted Centroid Localization in Smart Ubiquitous Environments . . . 20
Stephan Schuhmann, Klaus Herrmann, Kurt Rothermel, Jan Blumenthal, and Dirk Timmermann

A Formal Framework for Expressing Trust Negotiation in the Ubiquitous Computing Environment . . . 35
Deqing Zou, Jong Hyuk Park, Laurence Tianruo Yang, Zhensong Liao, and Tai-hoon Kim

Pervasive Services on the Move: Smart Service Diffusion on the OSGi Framework . . . 46
Davy Preuveneers and Yolande Berbers
Smart Spaces/Environments/Services

Robots in Smart Spaces - A Case Study of a u-Object Finder Prototype . . . 61
Tomomi Kawashima, Jianhua Ma, Bernady O. Apduhan, Runhe Huang, and Qun Jin

Biometrics Driven Smart Environments: Abstract Framework and Evaluation . . . 75
Vivek Menon, Bharat Jayaraman, and Venu Govindaraju
A Structured Methodology of Scenario Generation and System Analysis for Ubiquitous Smart Space Development . . . 90
Ohbyung Kwon and Yonnim Lee

Capturing Semantics for Information Security and Privacy Assurance . . . 105
Mohammad M.R. Chowdhury, Javier Chamizo, Josef Noll, and Juan Miguel Gómez
Context-Aware Services and Applications

A Framework for Context-Aware Home-Health Monitoring . . . 119
Alessandra Esposito, Luciano Tarricone, Marco Zappatore, Luca Catarinucci, Riccardo Colella, and Angelo DiBari

Semantic Learning Space: An Infrastructure for Context-Aware Ubiquitous Learning . . . 131
Zhiwen Yu, Xingshe Zhou, and Yuichi Nakamura

A Comprehensive Approach for Situation-Awareness Based on Sensing and Reasoning about Context . . . 143
Thomas Springer, Patrick Wustmann, Iris Braun, Waltenegus Dargie, and Michael Berger
Context-Adaptive User Interface in Ubiquitous Home Generated by Bayesian and Action Selection Networks . . . 158
Han-Saem Park, In-Jee Song, and Sung-Bae Cho
Use Semantic Decision Tables to Improve Meaning Evolution Support Systems . . . 169
Yan Tang and Robert Meersman
Combining User Profiles and Situation Contexts for Spontaneous Service Provision in Smart Assistive Environments . . . 187
Weijun Qin, Daqing Zhang, Yuanchun Shi, and Kejun Du
Ubiquitous Phone System . . . 201
Shan-Yi Tsai, Chiung-Ying Wang, and Ren-Hung Hwang
Utilizing RFIDs for Location Aware Computing . . . 216
Benjamin Becker, Manuel Huber, and Gudrun Klinker
Intelligent Computing: Middleware, Models and Services

A Component-Based Ambient Agent Model for Assessment of Driving Behaviour . . . 229
Tibor Bosse, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur
A Cartesian Robot for RFID Signal Distribution Model Verification . . . 244
Aliasgar Kutiyanawala and Vladimir Kulyukin
Self-Localization in a Low Cost Bluetooth Environment . . . 258
Julio Oliveira Filho, Ana Bunoza, Jürgen Sommer, and Wolfgang Rosenstiel
Penetration Testing of OPC as Part of Process Control Systems . . . 271
Maria B. Line, Martin Gilje Jaatun, Zi Bin Cheah, A.B.M. Omar Faruk, Håvard Husevåg Garnes, and Petter Wedum
Intersection Location Service for Vehicular Ad Hoc Networks with Cars in Manhattan Style Movement Patterns . . . 284
Yao-Jen Chang and Shang-Yao Wu
Ubiquitous and Robust Text-Independent Speaker Recognition for Home Automation Digital Life . . . 297
Jhing-Fa Wang, Ta-Wen Kuan, Jia-chang Wang, and Gaung-Hui Gu
Energy Efficient In-Network Phase RFID Data Filtering Scheme . . . 311
Dong-Sub Kim, Ali Kashif, Xue Ming, Jung-Hwan Kim, and Myong-Soon Park

Energy-Efficient Tracking of Continuous Objects in Wireless Sensor Networks . . . 323
Jung-Hwan Kim, Kee-Bum Kim, Chauhdary Sajjad Hussain, Min-Woo Cui, and Myong-Soon Park
Wireless Sensor Networks

Data Randomization for Lightweight Secure Data Aggregation in Sensor Network . . . 338
Abedelaziz Mohaisen, Ik Rae Jeong, Dowon Hong, Nam-Su Jho, and DaeHun Nyang

Mobile Sink Routing Protocol with Registering in Cluster-Based Wireless Sensor Networks . . . 352
Ying-Hong Wang, Kuo-Feng Huang, Ping-Fang Fu, and Jun-Xuan Wang
Towards the Implementation of Reliable Data Transmission for 802.15.4-Based Wireless Sensor Networks . . . 363
Taeshik Shon and Hyohyun Choi
An Energy-Efficient Query Processing Algorithm for Wireless Sensor Networks . . . 373
Jun-Zhao Sun
Special Session Papers

Smart Objects and Embedded Computing

Rule Selection for Collaborative Ubiquitous Smart Device Development: Rough Set Based Approach . . . 386
Kyoung-Yun Kim, Keunho Choi, and Ohbyung Kwon
An Object-Oriented Framework for Common Abstraction and the Comet-Based Interaction of Physical u-Objects and Digital Services . . . 397
Kei Nakanishi, Jianhua Ma, Bernady O. Apduhan, and Runhe Huang
Personalizing Threshold Values on Behavior Detection with Collaborative Filtering . . . 411
Hiroyuki Yamahara, Fumiko Harada, Hideyuki Takada, and Hiromitsu Shimakawa

IP Traceback Using Digital Watermark and Honeypot . . . 426
Zaiyao Yi, Liuqing Pan, Xinmei Wang, Chen Huang, and Benxiong Huang
Wireless Networks: Routing, Mobility and Security

Multi-priority Multi-path Selection for Video Streaming in Wireless Multimedia Sensor Networks . . . 439
Lin Zhang, Manfred Hauswirth, Lei Shu, Zhangbing Zhou, Vinny Reynolds, and Guangjie Han
Energy Constrained Multipath Routing in Wireless Sensor Networks . . . 453
Antoine B. Bagula and Kuzamunu G. Mazandu
Controlling Uncertainty in Personal Positioning at Minimal Measurement Cost . . . 468
Hui Fang, Wen-Jing Hsu, and Larry Rudolph
RFID System Security Using Identity-Based Cryptography . . . 482
Yan Liang and Chunming Rong
Ubiquitous Computing

RFID: An Ideal Technology for Ubiquitous Computing? . . . 490
Ciaran O'Driscoll, Daniel MacCormac, Mark Deegan, Fred Mtenzi, and Brendan O'Shea

An Experimental Analysis of Undo in Ubiquitous Computing Environments . . . 505
Marco Loregian and Marco P. Locatelli
Towards a Collaborative Reputation Based Service Provider Selection in Ubiquitous Computing Environments . . . 520
Malamati Louta
Petri Net-Based Episode Detection and Story Generation from Ubiquitous Life Log . . . 535
Young-Seol Lee and Sung-Bae Cho
Smart Spaces/Environments/Services

Protection Techniques of Secret Information in Non-tamper Proof Devices of Smart Home Network . . . 548
Abedelaziz Mohaisen, YoungJae Maeng, Joenil Kang, DaeHun Nyang, KyungHee Lee, Dowon Hong, and Jong-Wook Han
Universal Remote Control for the Smart World . . . 563
Jukka Riekki, Ivan Sanchez, and Mikko Pyykkönen
Mobile Navigation System for the Elderly – Preliminary Experiment and Evaluation . . . 578
Takahiro Kawamura, Keisuke Umezu, and Akihiko Ohsuga
Time Stamp Protocol for Smart Environment Services . . . 591
Deok-Gyu Lee, Jong-Wook Han, Jong Hyuk Park, Sang Soo Yeo, and Young-Sik Jeong
Intelligent Computing: Middleware, Models and Services

An Analysis of the Manufacturing Messaging Specification Protocol . . . 602
Jan Tore Sørensen and Martin Gilje Jaatun
A Long-Distance Time Domain Sound Localization . . . 616
Jhing-Fa Wang, Jia-chang Wang, Bo-Wei Chen, and Zheng-Wei Sun
Towards Data Integration from WITSML to ISO 15926 . . . 626
Kari Anne Haaland Thorsen and Chunming Rong
A SIP-Based Session Mobility Management Framework for Ubiquitous Multimedia Services . . . 636
Chung-Ming Huang and Chang-Zhou Tsai
Context-Aware Services and Applications

AwarePen - Classification Probability and Fuzziness in a Context Aware Application . . . 647
Martin Berchtold, Till Riedel, Michael Beigl, and Christian Decker
A Model Driven Development Method for Developing Context-Aware Pervasive Systems . . . 662
Estefanía Serral, Pedro Valderas, and Vicente Pelechano
Intelligent System Architecture for Context-Awareness in Ubiquitous Computing . . . 677
Jae-Woo Chang and Seung-Tae Hong
User-Based Constraint Strategy in Ontology Matching . . . 687
Feiyu Lin and Kurt Sandkuhl
Object Identification: Techniques and Applications

RFID-Based Interactive Learning in Science Museums . . . 697
Yo-Ping Huang, Yueh-Tsun Chang, and Frode Eika Sandnes
Real-time Detection of Passing Objects Using Virtual Gate and Motion Vector Analysis . . . 710
Daw-Tung Lin and Li-Wei Liu
A Ubiquitous Interactive Museum Guide . . . 720
Yo-Ping Huang, Tsun-Wei Chang, and Frode Eika Sandnes
Dynamic Probabilistic Packet Marking with Partial Non-Preemption . . . 732
Wei Yen and Jeng-Shian Sung
Fractal Model Based Face Recognition for Ubiquitous Environments . . . 746
Shuenn-Shyang Wang, Su-Wei Lin, and Cheng-Ming Cho
Author Index . . . 761
Challenges in the Design and Development of Context-Aware Applications

Jadwiga Indulska
School of Information Technology and Electrical Engineering, The University of Queensland and NICTA
[email protected]
Abstract. Context-awareness plays an important role in pervasive computing, as adaptations of applications to context changes (changes in the computing environment and in user activities/tasks) help to achieve the goal of computing services available everywhere and at any time. There is a growing body of research on context-aware applications that are adaptable and capable of acting autonomously on behalf of users. However, there are still many open research issues that challenge the pervasive computing community. In this talk I discuss several of these research challenges. First, I outline the state of the art in context information modelling, management and reasoning, as well as possible future research directions in this area. This is followed by a discussion of context information management that allows development of fault-tolerant and autonomic context-aware applications. As one of the challenges inhibiting the development of context-aware applications is their complexity, I discuss software engineering approaches that ease the task of developing such applications. Context-aware applications adapt to context changes using context information. However, this context information may be imprecise or erroneous and can therefore lead to incorrect adaptation decisions, creating usability problems and affecting the acceptance of context-aware applications. This creates a need for some balance between the autonomy of context-aware applications and user control of the applications. I describe some early approaches my team is working on to tackle this problem. Finally, I discuss research issues related to privacy of context information and how context can be used to enhance security mechanisms within pervasive computing environments.
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program; and the Queensland Government.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, p. 1, 2008.
© Springer-Verlag Berlin Heidelberg 2008
The Importance of Including the Haptics Factor in Interaction Design

Petter Øyan and Anne Liseth Schøyen
Abstract. One aspect of interaction design is the communication of information through the use of a screen. Another important aspect is the physical, or haptic, interaction with the device itself, especially for reducing errors when the device is used in critical situations. Østfold University College has cooperated with the Institute for Energy Technology in Halden and the Norwegian Defence Research Establishment in Kjeller, and several student groups have been given the opportunity to work on research data in collaboration with researchers from these two institutions within their final-year studies. The focus of the projects has been realistic, emphasizing the importance of minimizing operating errors to ensure safe operation in critical situations, where the ability to give correct feedback through haptic interaction is as important as the correct understanding of visual communication. The cases demonstrate the user-centered approach to problem solving used by industrial designers and the analogy between the design and research processes, especially focusing on the use of physical designs for testing and review, thereby exploring form as an interaction parameter.
Security Policy Integration and Conflict Reconciliation for Collaborations among Organizations in Ubiquitous Computing Environments Stephen S. Yau and Zhaoji Chen Department of Computer Science and Engineering School of Computing and Informatics Arizona State University Tempe, AZ 85287-8809, USA {yau,zhaoji.chen}@asu.edu
Abstract. The investigative capabilities and productivity of researchers and practitioners in various organizations can be greatly expanded through collaborations among these organizations in ubiquitous computing environments. However, due to the heterogeneity and dynamic nature of ubiquitous computing environments, it is critical to generate an integrated security policy set to govern the interactions among collaborating organizations during a collaboration. Existing approaches to security policy integration and conflict analysis have focused on static policy analysis, which cannot address the dynamic formation of collaborative groups in ubiquitous computing environments. In this paper, an approach to security policy integration is presented, including a similarity-based policy adaptation algorithm for changing collaborative groups and a negotiation-based policy generation protocol for the new resources generated by the collaboration as well as for conflict reconciliation. Our approach can be automated with user-defined control variables, which greatly improves efficiency and makes the system scalable with little human intervention.
Keywords: Security policy integration, conflict reconciliation, organization collaboration, similarity-based policy adaptation, negotiation-based policy reconciliation, ontology, ubiquitous computing environments.
1 Introduction
Due to the rapid development of mobile devices and embedded systems, computing resources are embedded in various devices and appliances, such as cell phones, PDAs, televisions, TiVo boxes, refrigerators and cars. All these devices can provide certain information processing capability. With technologies such as Bluetooth and Wi-Fi, these devices can easily communicate and share data and/or resources with each other. These advances in ubiquitous computing can greatly improve the investigative capabilities and productivity of researchers and practitioners in many fields, such as health care, e-business, e-government, telecommunication, and homeland security,
where the sharing of data, equipment and other resources among collaborating organizations may be extensive. In such an environment, computing resources are distributed throughout the physical environment, and are usually owned and managed by various organizations. An organization can be viewed as a set of nodes which has a set of users and certain resources and is capable of performing certain functions autonomously. To facilitate collaborations among participating organizations, we need to make their resources available across organizations seamlessly. By seamlessly, we mean that if a user from one of the participating organizations enters the environment controlled by another organization, the user can access and use the appropriate resources through direct interactions with the resource owners. Otherwise, the user would need to send a request to his/her affiliated organization, which would forward the request to the organizations owning the resources. The resource owners would then decide whether to allow the user to access their resources and inform the affiliated organization, which would in turn inform the user of their decisions. It is obvious that seamless resource sharing will greatly improve the effectiveness and efficiency of collaborations among various organizations. However, ubiquitous computing devices in such environments are more vulnerable to attacks than other distributed computing systems, especially when malicious users pretend to be legitimate users and try to sneak into the collaboration groups. With growing concerns about security and privacy, a major obstacle for this kind of collaboration to be widely adopted, especially for critical applications, is how to govern access to sensitive data and limited resources during the collaborations. Each organization may specify its own set of security policies to govern how its users access its resources.
But, in collaborations among organizations, we need to integrate all the relevant security policies from the participating organizations to govern the interactions throughout the collaborations. The heterogeneity and dynamic nature of ubiquitous computing environments poses certain unique challenges for generating an integrated congruous security policy set. In this paper, we will present an approach to integrating security policies of collaborating organizations in ubiquitous computing environments, including a similarity-based policy adaptation algorithm for security policy integration and a negotiation protocol for conflict reconciliation. An example will be given to illustrate our approach.
2 Challenges and Current State of the Art
Collaborations among organizations in ubiquitous computing environments have the following characteristics. First, each organization is a group of nodes which manages its own resources and can function autonomously. Second, a group of collaborating organizations can be formed and changed at runtime: each organization can choose to join or leave the group based on its needs. Third, organizations are peers to each other; normally, no organization serves as an authority to coordinate or manage the interactions among the collaborating organizations. Due to these characteristics, there are three major challenges for providing security in such environments.
1) Ambiguous security policy specifications: Each participating organization specifies its security policies describing how its resources should be used, including various limits and access control, and how these policies are enforced at runtime. Because each organization acts autonomously, the organizations can choose different vocabularies to specify their own security policies, which may cause confusion and difficulty when they try to understand each other’s policies.
2) Dynamic set of users: Individual organization’s security policies normally do not address the expanded user sets for collaborations. An organization’s security policies only address its own users and resources. Because there is no authority in the coalition to manage the requests from other organizations, and because the group of collaborating organizations may be changing, security policies specified by participating organizations need to be integrated together to govern the interactions during the collaboration, and the owner of a resource needs to know how to handle the requests from other organizations at runtime without prior knowledge of the collaborators. The collaboration may generate some resources that cannot be provided by any single participating organization. To govern future accesses to these new resources, the integrated security policy set should include policies for all these resources and the policies need to be acceptable by all participating organizations.
3) Conflicting security policies: Participating organizations may have inconsistent security policies among them. Because the group of collaborating organizations may continuously change at runtime, the organizations cannot anticipate all possible collaborative partners a priori and specify their security policies accordingly.
The fact that each participating organization has little knowledge about what has or has not been specified by the other organizations may lead to redundancy and conflicts when trying to integrate their security policies. To avoid inconsistency in security policy integration and enforcement, redundant policies should be removed, and conflicts must be resolved. To address the first challenge, several approaches [1, 2, 4] have been developed to provide unambiguous interoperable security policy specifications using XML. These approaches provide syntactic tags for the security policy specifications, and the tags show how the specifications should be interpreted. The general syntax of such approaches is rigid in the sense that specifications must adhere to the general rules of XML to ensure that all XML-aware software can at least read and understand the relative arrangement of information within them. This usually results in lengthy specifications, as the user needs to define many tags just to comply with XML standards. We have developed an approach based on ontology in [3], where a security policy has a fixed format, but users can derive individual components from an ontology to specify the application-specific information needed in the security policy specifications. All these approaches can address Challenge 1) sufficiently and provide unambiguous security policy specifications across collaborating organizations. In this paper, we present methods to address Challenges 2) and 3). To address Challenge 2), relevant security policies from the various collaborating organizations need to be combined. Bonatti et al. [5] dealt with the policy composition problem through a multi-level security model. Bidan et al. [6] introduced a set of
combination operators to combine security policies in a controlled manner. Several algebraic operators for combining enterprise privacy policies specified by EPAL were introduced in [7]. SPL [8] has stratified security rules, and there is a special layer of rules which is used to control policy integration. These approaches have in common that they all require prior knowledge about the other collaborating organizations. However, in ubiquitous computing environments, groups of collaborating organizations are continuously changing, with participating organizations joining or leaving the groups. Each organization has little prior knowledge of the security policies of other collaborating organizations and how they will impact its own security policies. To address Challenge 3), Ribeiro, Zuquete and Ferreira [8] specified special rules to decide which security policy should override the others when there is a conflict. Similarly, Lupu and Sloman [9] specified “policy precedence relationships” to determine which one of two conflicting policies should apply and which should be ignored. Agrawal et al. [10] presented an approach to eliminating conflicts by checking new policies and preventing them from being added to the integrated policy set if there is a conflict. All these approaches rely on some policy hierarchy structure, where certain policies are more important than others, and more important policies override less important ones in case of conflict. For [10], the security policies that are already in the integrated set are more important than those that are waiting to join the set. It is difficult to adopt these approaches in ubiquitous computing environments because there is no authority in the group of collaborating organizations to establish such a hierarchy and each participating organization is an autonomous entity. Hence, it is not reasonable to choose one organization’s security policies over the others in case of conflict.
So far, no comprehensive approach addresses both Challenges 2) and 3) for collaborations among organizations in ubiquitous computing environments, where groups of collaborating organizations are continuously changing with little prior knowledge about when other organizations will join and what kinds of security policies they have specified. This makes policy integration and conflict reconciliation more difficult than in the case where the groups of collaborating organizations remain the same.
3 Our Overall Approach
Our approach aims at generating an integrated, congruous security policy set based on the security policies specified by the individual participating organizations. Our overall approach is shown in Fig. 1 and consists of the following three major parts. Part 1) uses a specification language similar to the one we presented in [3] to generate security policy specifications for each collaborating organization, where a security policy is specified as a quadruple with components derived from an ontology containing the key concepts needed for specifying security policies. For the sake of completeness, we summarize the important aspects of this specification language in the Appendix.
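As a purely illustrative rendering (the field names are mine, not taken from the specification language of [3]), an atomic policy quadruple (S, A, O, C) might be represented as:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicPolicy:
    subject: str    # S: the user set the policy governs, e.g. "UG"
    action: str     # A: the governed action, e.g. "Access"
    obj: str        # O: the protected resource, e.g. "datAo"
    condition: str  # C: situational constraint; "" when unconditional

# An atomic component of a graduate-assistant policy (illustrative values):
p = AtomicPolicy("UG", "Access", "datAo", "")
print(p.subject, p.action)  # UG Access
```

Composite policies would then be built from such atoms with conjunction, disjunction and negation, as described later in Section 4.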
Part 2) adapts existing security policies to address the expanded user set for the collaboration. When access to a resource is requested by a user from another participating organization, the resource owner should decide whether to grant the request. However, since the requester is from another organization, the resource owner may not have much knowledge about this requester, which makes it difficult to make a decision on the request. In reality, it is natural to evaluate a
Fig. 1. Our overall approach for security policy integration and conflict reconciliation
stranger’s trustworthiness based on where he comes from and how he was trusted in the organizations he belonged to before. Thus, we use a similarity-based security policy adaptation algorithm to help the resource owner make a decision by evaluating whether the participating organization that the user comes from has similar security policies, and whether the user’s request would be granted under those policies. The details of this algorithm are presented in Section 4. Part 3) uses a negotiation-based approach to generating a congruous set of security policies acceptable to all participating organizations. Participating organizations first provide their inputs towards the generation of the policies, and then make compromises in order to resolve possible conflicts and reach an agreement. This part addresses possible conflicts arising from the integration, and specifies new security policies for the resources generated by the collaboration. In both cases, no single organization can claim sole ownership of the generated resources, and thus no security policies of any participating organization should take priority over those of other organizations. The details of this approach are presented in Section 5.
4 Similarity-Based Security Policy Integration Algorithm
The specification approach presented in the Appendix provides an unambiguous, interoperable security policy specification that is understandable by other participating organizations. To enable collaborations among various organizations, the security policies specified by these organizations need to be integrated to govern the interactions among them during collaboration. This is similar to corporate mergers, where company policies related to labor forces, equipment and intellectual property need to be integrated before the merger can be completed. In such a case, the integration of the policies of the different companies is done through negotiation between the management and legal departments of the merging companies. However, for dynamic collaborations in ubiquitous computing environments, it is simply not practical to halt the collaboration and involve humans every time an organization joins or leaves the group of collaborating organizations. In this paper, we present an alternative approach based on policy adaptation and negotiation to achieve dynamic security policy integration with minimal human intervention. During the collaboration, an organization may choose to make its resources available to other collaborating organizations so that users from those organizations can access the resources. Even though a collaboration may involve multiple organizations, it is rare that a user belongs to two or more of them. In case a user belongs to more than one organization during a collaboration, when that user requests access to a specific resource, we can require the user to specify under which organizational affiliation he wants to submit the request. This is similar to system administration, where a person can be a normal user or an administrator, but when the user wants to perform system administration tasks, he must sign in as an administrator.
So we assume that when a user requests access to a resource, he belongs to only one organization. A general pair-wise user-resource access scenario is shown in Figure 2.
Fig. 2. Pair-wise user-resource integration
There are two organizations OrgA and OrgB. OrgA has its user set UA = {ui, uj, …} and data set DatA = {datAp, datAq, …}. A set of security policies PA has been specified to govern data access behavior in OrgA. Similarly, OrgB has its user set UB, data set DatB, and security policy set PB. The collaboration will not only make existing data available to all the researchers, but may also generate some new data which appears in neither DatA nor DatB. For example, the collaboration can produce new data datk through analysis of datAq and datBm. We use DatAB to denote the set of newly generated data. The integrated security policy set should be able to handle all possible data access requests. Let u be the requester and dat be the requested data. There are three types of requests. Type1 consists of C1: dat ∈ DatA, u ∈ UA and C2: dat ∈ DatB, u ∈ UB, where u and dat are in the same organization. Type2 consists of C3: dat ∈ DatB, u ∈ UA and C4: dat ∈ DatA, u ∈ UB, where u and dat are in different organizations and the original security policies specified by the data owner cannot cover the new users from the other collaborating organization. Type3 is C5: dat ∈ DatAB, u ∈ UA ∪ UB, where dat is newly generated data and both collaborating organizations can claim ownership of it. For Type1 requests, the corresponding original security policies in PA and PB apply. For Type3 requests, the policy integration needs to generate a new security policy to address the security concerns of both organizations. To generate such policies, similarly to a conflict reconciliation process, we take both organizations’ security concerns about the newly generated data and, through a negotiation process, generate a common set of policies accepted by both organizations. The negotiation process is discussed in Section 5.
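The three request types above can be told apart mechanically. A minimal sketch, assuming users and data are modeled as plain Python sets (the identifiers are illustrative, not from the paper):

```python
def request_type(u, dat, UA, UB, DatA, DatB, DatAB):
    """Classify a request (u, dat) into Type1/Type2/Type3 as defined above."""
    if dat in DatAB:
        return 3  # C5: newly generated data, joint ownership
    if (dat in DatA and u in UA) or (dat in DatB and u in UB):
        return 1  # C1/C2: requester and data in the same organization
    if (dat in DatB and u in UA) or (dat in DatA and u in UB):
        return 2  # C3/C4: cross-organization access
    raise ValueError("request does not belong to the collaboration")

UA, UB = {"ui", "uj"}, {"um"}
DatA, DatB, DatAB = {"datAp", "datAq"}, {"datBm"}, {"datk"}
print(request_type("ui", "datBm", UA, UB, DatA, DatB, DatAB))  # 2
```

Type1 requests then route to the owner's original policies, Type2 to the adaptation algorithm below, and Type3 to the negotiation process of Section 5.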
For Type2 requests, we present a similarity-based security policy adaptation algorithm for generating integrated policies. Requests of this type have no existing security policies because the requester and the resource belong to different organizations: the original security policies specified by the data owner do not cover the new users from the other collaborating organization, and the requester’s organization has no policies governing access to data it does not own. However, because one organization is the sole owner of the data, that organization should control the generation of new security policies addressing the new users of the data. To deal with this type of request, adapting the existing security policies of the data-owning organization to handle the new user set is better than negotiating with the other organization. Because the data owner may have little knowledge about a requester who is not part of its original user set, it is difficult for the data owner to know how to adapt its security policies accordingly. Doing so without heavy human intervention is also difficult: an organization may have specified hundreds or even thousands of security policies for its shared resources, so manual adaptation would greatly reduce the efficiency of the collaboration and easily cause human errors due to the large number of requests. On the other hand, even though the data owner does not have a security policy governing the new requester’s access to this resource, the requester’s organization often has security policies covering this requester’s access to resources similar to the one currently requested. For example, a user u ∈ UB wants to access the raw medical data dat1 ∈ DatA, which contains patients’ SSNs. OrgA has no existing security policy covering u.
If OrgA knows that u can access dat2 ∈ DatB, which contains patients’ mental evaluation results, medication history, and other sensitive information, it is reasonable for OrgA to conclude that u is trusted to handle sensitive data in OrgB. It is less risky to extend access to dat1 to u than to some other user u’ from the same organization who does not have such access rights. Based on this observation, we develop the following policy adaptation algorithm based on policy similarity evaluation. We first need to define a metric for the similarities among different entities. Because our security policy specification is based on a vocabulary organized in a hierarchical structure [3], relative relations between the various elements in security policy specifications can be established. Let e1, e2, and e3 be any three elements, e4 be the lowest common ancestor of e1 and e2 in the hierarchical structure, and e5 be the lowest common ancestor of e1 and e3. Element e1 is said to be closer to e2 than to e3 if e4 appears at a lower level in the class hierarchy than e5. The similarity index SI(ei, ej) between two elements ei and ej is defined as follows:
SI(ei, ej) =
  1,  if ei = ej;
  SI(ek, ej) / n,  if ej is ei’s ancestor, where ek is ei’s parent and n is the number of children of ek (ei and its siblings);
  SI(ei, el) × SI(ej, el),  for all other cases, where el is the lowest common ancestor of ei and ej.
For example, e1: “SHA-1” is a child of e4: “Hash Function”. e2: “MD5” is also a child of e4. e3: “Fingerprint checking” is a child of e5: “Biometric Authentication”. Both e4 and e5 are derived from e6: “Security Mechanism”. If these are all the elements in this example, then the similarities between e1 and e2 and between e1 and e3 can be calculated as follows:
SI(e1, e2) = SI(e1, e4) × SI(e2, e4) = (1/2) × (1/2) = 0.25,
SI(e1, e3) = SI(e1, e6) × SI(e3, e6) = ((1/2)/2) × (1/2) = 0.125.
SI(e1, e2) > SI(e1, e3), which is in line with the fact that “SHA-1” is closer to “MD5” than to “Fingerprint checking”. The similarity between two atomic security policies, P = (S, A, O, C) and P’ = (S’, A’, O’, C’), is defined as follows:
SI(P, P’) = SI(S, S’) × SI(A, A’) × SI(O, O’) × SI(C, C’).
The similarity between a composite security policy P = (P1 ∧ P2) ∨ P3 ∨ ~P4 and an atomic security policy P’ is defined as follows:
SI(P, P’) = (SI(P1, P’) × SI(P2, P’)) + SI(P3, P’) + (− SI(P4, P’)).
The candidate policy for adaptation can be a composite security policy. When searching for similar policies for the request in the collaborating organization, we only search for the atomic policies related to the requester. If there is a composite security policy, we treat it as several atomic policies during the similarity calculation. Thus, we do not define the similarity between two composite security policies. Based on these definitions, our similarity-based security policy integration algorithm allows the data owner to decide whether to grant access to a requester from a participating organization based on a similarity analysis of the two organizations’ security policies. Let P be the security policy specified by the data owner for specific data dat, and let TS be a similarity threshold specified by the data owner. A security policy P’ in the collaborating organization is selected by the data owner to help it make a security decision only if SI(P, P’) is greater than TS. A greater TS guarantees that the selected policies are closer to P, but if the selection criterion is too tight, it may not return any policy at all, which defeats the purpose of this process.
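The similarity index over the hierarchy can be computed recursively. A sketch, assuming the hierarchy is given as a child-to-parent map (the condensed element names are mine, standing in for the example above):

```python
# Child -> parent map for the example hierarchy.
parent = {
    "SHA-1": "HashFunction", "MD5": "HashFunction",
    "FingerprintChecking": "BiometricAuthentication",
    "HashFunction": "SecurityMechanism",
    "BiometricAuthentication": "SecurityMechanism",
}

def children_count(node):
    # Number of children of a node (a leaf and its siblings).
    return sum(1 for c, p in parent.items() if p == node)

def ancestors(node):
    # Chain from the node up to the root, including the node itself.
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def si(ei, ej):
    if ei == ej:
        return 1.0
    if ej in ancestors(ei):
        # ej is an ancestor: divide by the number of ei's parent's children.
        return si(parent[ei], ej) / children_count(parent[ei])
    # Otherwise, go through the lowest common ancestor.
    anc_i = ancestors(ei)
    el = next(a for a in ancestors(ej) if a in anc_i)
    return si(ei, el) * si(ej, el)

print(si("SHA-1", "MD5"))                  # 0.25
print(si("SHA-1", "FingerprintChecking"))  # 0.125
```

The printed values reproduce the worked example: SI(e1, e2) = 0.25 and SI(e1, e3) = 0.125.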
The similarity-based security policy adaptation algorithm can be presented as follows:
Input: a request where dat ∈ DatA, u ∈ UB
Process:
1) Based on the requested action and the current situation, find the suitable policy PA in OrgA which would handle this kind of request if the user were from OrgA.
2) Find PS(u), the set of atomic policies regarding the requester u in OrgB.
3) For every P in PS(u), calculate SI(PA, P).
4) Select the subset PS’(u) of PS(u) such that for every P in PS’(u), SI(PA, P) > TS.
5) Divide PS’(u) into two subsets, PSP(u) and PSD(u), where PSP(u) is the set of rules that permit access for u, and PSD(u) is the set of rules that deny access for u.
6) Calculate
R = [ Σ_{P ∈ PSP(u)} SI(PA, P) − Σ_{P ∈ PSD(u)} SI(PA, P) ] / min_{P ∈ PS’(u)} SI(PA, P) × W,
where W is a confidence weight OrgA assigns to OrgB. W is initially set to 1; every time OrgA grants access to a user in OrgB based on this calculation and the user does not cause any security problem, W is increased by 1. On the other hand, if the user fails to meet the expectation, W is decreased by 1.
Output: If R > 1, the request is granted; otherwise it is denied.
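Step 6) reduces to simple arithmetic once the similarity indices are known. A sketch (the function and argument names are my own, not the paper's):

```python
def grant_decision(perm_sims, deny_sims, all_sims, W=1.0):
    """Combine similarity scores into the decision ratio R of step 6.

    perm_sims: SI(PA, P) for the policies in PSP(u) (permit access)
    deny_sims: SI(PA, P) for the policies in PSD(u) (deny access)
    all_sims:  SI(PA, P) for all selected policies in PS'(u)
    W: confidence weight OrgA assigns to OrgB (initially 1)
    """
    if not all_sims:
        return False  # nothing similar enough was found: deny by default
    R = (sum(perm_sims) - sum(deny_sims)) / min(all_sims) * W
    return R > 1

# Values from the worked example in Section 6: two permitting policies,
# no denying policies, first-time collaboration (W = 1).
sims = [65/1024, 9/128]
print(grant_decision(sims, [], sims))  # True (R = 137/65 > 1)
```

The fallback when `all_sims` is empty (deny) is my assumption; the paper leaves that case to the threshold discussion.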
The original similarity threshold can be generated based on sampling results of the collaboration together with domain knowledge. If, on average, every node has N children, two policies with a similarity index in the range N^-4 to N^-3 are considered to be similar. At runtime, the threshold can be updated based on previous evaluation results to make it more suitable for a specific collaboration. The advantage of this approach is that the process can be automated and is very efficient.
5 Negotiation-Based Conflict Reconciliation Protocol
We consider two security policies P1 and P2 to conflict with each other if there exists an element e such that e ∈ Q1 and e ∉ Q2 (or e ∈ Q2 and e ∉ Q1), where Q1 and Q2 are the satisfactory sets of P1 and P2, respectively, each containing all the elements that comply with its corresponding policy. As discussed in Section 1, due to the dynamic nature of collaborations among organizations in ubiquitous computing environments, when specifying its own security policies an organization can hardly know what security policies have been specified by the other organizations with which it may collaborate in the future. Thus, for a complex system, conflicts are hard to avoid. It is important that the collaborations do not undermine the security of any participating organization. On the other hand, security should not deny participating organizations the possibility of collaboration simply because there exist some conflicting security policies. Our conflict reconciliation protocol can improve the chance of collaboration. It is based on the following observations of real-world scenarios: it is difficult to find a suitable collaborator, and collaboration can generate extra synergy. Thus, instead of rejecting a possible collaborator because of some conflicting security policies, collaborating organizations often prefer to take a certain risk and move forward with the collaboration by relaxing their security policies and adopting some weaker security policies to resolve the conflicts. Policy P1 is said to be weaker than P2 if Q2 ⊂ Q1, where Q1 and Q2 are the satisfactory sets of P1 and P2. Because each collaboration goal and set of collaborating participants may be different, the security compromise should not be made in a static manner. Our reconciliation protocol captures these dynamic factors as situation-aware compromise thresholds, which specify how much compromise an organization is willing to make for a specific collaboration.
A compromise threshold, TC, is specified based on the following factors: (1) How much benefit can be generated if this collaboration goes through? (2) How much damage may be caused by adopting a weaker security policy? (3) What is the track record of this specific collaborating organization, i.e., how did it perform in previous collaborations? The value of TC should be updated when the above three factors change. We now present the negotiation-based conflict reconciliation protocol as follows:
Step1 For a specific resource, each organization specifies a chain of security requirements instead of just one. The head of the chain is the most desirable security policy; traveling along the chain from the head, the requirements become weaker and weaker.
Step2 Each organization specifies compromise thresholds under various situations as (TC, θ), meaning that when situation θ is true, the weakest security policy it is willing to accept is indexed by TC along the chain.
Step3 At runtime, the system obtains the compromise thresholds TC1 and TC2 for both organizations based on the current situation, and identifies the indices of the two conflicting policies in their chains. Test whether each current index is within its compromise threshold. If not, go to Step6.
Step4 One organization is randomly selected to make the first compromise, proceeding along its policy chain to select a weaker policy.
Step5 Test whether any conflict remains. If no conflict is detected, reconciliation is done. If there is still a conflict, repeat Step4, proceeding along the other organization’s policy chain.
Step6 Both organizations make compromises in turn, until one of the compromise thresholds is reached. If no reconciled policy has been generated by then, failure is reported for human intervention.
For two conflicting policies P1 and P2, the above steps can be summarized in the following pseudo-code:
// sp1[] is the policy chain containing P1; sp2[] is the policy chain containing P2
// x1 and x2 are the compromise thresholds (indices into the chains)
x1 = GetCompThreshold1(θ); x2 = GetCompThreshold2(θ);
flag = 0;
for (p = 0, q = 0; p <= x1 && q <= x2; ) {
    if (sp1[p] does not conflict with sp2[q]) {
        substitute P1 with sp1[p] and P2 with sp2[q];
        break;
    } else if (flag == 0) {
        p++; flag = 1;
    } else {
        q++; flag = 0;
    }
}
if (p > x1 || q > x2)
    report that the conflict cannot be automatically reconciled;
For the sake of simplicity, we only illustrate the reconciliation process between two collaborating organizations. The reconciliation process involving more than two organizations can be handled in a similar fashion: there will be multiple security policy chains, and the compromises may be made in a loop circling all participating organizations, instead of going back and forth between two organizations.
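The alternating-compromise loop can be made runnable. In this sketch, `conflicts` is a stand-in predicate (hypothetical) for the satisfactory-set check, and the toy conflict rule at the end is purely illustrative:

```python
def reconcile(sp1, sp2, t1, t2, conflicts):
    """Walk two policy chains (strongest first), alternating compromises.

    sp1, sp2: policy chains; indices 0..t1 and 0..t2 are within each
    organization's compromise threshold. Returns the reconciled pair,
    or None if human intervention is needed.
    """
    p = q = 0
    turn = 0  # whose turn it is to weaken its policy next
    while p <= t1 and q <= t2:
        if not conflicts(sp1[p], sp2[q]):
            return sp1[p], sp2[q]
        if turn == 0:
            p += 1
        else:
            q += 1
        turn ^= 1
    return None  # both thresholds exhausted: report for human intervention

# Toy example: policies are privilege levels, and two policies "conflict"
# while their combined strength exceeds some bound (an invented rule).
chain_a = [5, 4, 3]
chain_b = [5, 4, 3]
print(reconcile(chain_a, chain_b, 2, 2, lambda a, b: a + b > 8))  # (4, 4)
```

Each iteration weakens exactly one side, mirroring Steps 4–6 of the protocol.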
6 An Illustrative Example In this section, we will give an example to illustrate our approach. Continuing with the scenario described in Section 4, a university OrgA and a pharmaceutical company OrgB collaborate on a joint vaccine development project. UA is made up of a set of professors (UP), a set of graduate research assistants (UG) and a set of undergraduate students (UU). DatA includes some research papers (datAp), unpublished experiment results and paper drafts (datAo), and composition data on several promising chemicals, which the users in UA intend to use (datAq). UB consists of a set of research scientists (US), a set of lab technicians (UT) and several directors for research (UD). DatB includes patents of existing drugs related to this disease (datBn), a list of potential clinical trial participants (datBr), and a set of secret methods for fast vaccine prototype development (datBm). These users and data can be organized in a hierarchical manner as shown in Figure 3. Entity Is a
Fig. 3. The hierarchy of concepts of the example
Following our approach, for Part 1) we have the following security policies specified:

OrgA –
(1) All members can access the published papers at any time.
P1: (UP ∪ UG ∪ UU, Access, datAp, __ )
(2) Professors can always access and modify the unpublished paper drafts.
P2: (UP, Access ∪ Modify, datAo, __ )
(3) Graduate assistants can always access the unpublished paper drafts. However, they can only update a draft if they are co-authors.
P3: (UG, Access, datAo, __ ) ∧ (UG, Modify, datAo, C3), C3: Up ⊂ Author(datAo)

OrgB –
(1) All members can access the drug patents at any time.
P4: (UD ∪ US ∪ UT, Access, datBn, __ )
(2) Scientists and directors can access the trial participant list.
P5: (UD ∪ US, Access, datBr, __ )
(3) Only directors can update the trial participant list.
P6: (UD, Modify, datBr, __ )
Security Policy Integration and Conflict Reconciliation
15
For Part 2), if a professor u from OrgA wants to access the list of trial participants, OrgB's current policies cannot handle this request. Following our similarity-based algorithm, we have:
1) Because the request is to access datBr, P5 is the suitable policy for similarity comparison. We can transform P5 into P5': (UD, Access, datBr, __ ) ∨ P5'': (US, Access, datBr, __ ).
2) Select policies for professors in OrgA. We only consider the atomic component for professors if a policy also applies to other users. Thus, PS(u) = { P1': (UP, Access, datAp, __ ), P2': (UP, Access, datAo, __ ), P2'': (UP, Modify, datAo, __ ) }.
3) SI(P5, P1') = SI(P5', P1') + SI(P5'', P1'). SI(P5', P1') = SI(UD, UP) × SI(Access, Access) × SI(datBr, datAp) × SI(__, __) = 1/32 × 1 × 1/32 × 1 = 1/1024. Similarly, SI(P5'', P1') = 1/16. Thus SI(P5, P1') = 65/1024. We can also calculate SI(P5, P2') = 9/128 and SI(P5, P2'') = 9/3200.
4) For this example, assume that OrgB sets the threshold Ts = N^-4. Because on average every node has 3 children, Ts = 1/81, and hence both P1' and P2' are selected.
5) Both P1' and P2' grant access to u; thus PSP(u) = { P1', P2' } and PSD(u) = ∅.
6) Assume that this is the first time the two organizations collaborate; then W = 1. R = (SI(P5, P1') + SI(P5, P2')) / min(SI(P5, P1'), SI(P5, P2')) = (65/1024 + 9/128) / min(65/1024, 9/128) = 137/65. Because R > 1, the access is granted to u.
The system can make this decision automatically, without any human supervision, at runtime. Because professors in OrgA are the main contributors (Investigators) to this project, and professors can access confidential information in OrgA, it would also be reasonable for a human supervisor in OrgB to grant a professor from OrgA access to the participant list. The automatic decision is thus in line with the real-world situation.
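The arithmetic of steps 3)–6) can be checked mechanically. The sketch below takes the component similarities quoted in the example as given inputs; the combination rules (product over the four components, sum over the disjuncts of a composite policy) follow the text, while function names are ours:

```python
from fractions import Fraction as F

def si_atomic(component_sims):
    """Similarity of two atomic policies: the product of their four
    component similarities (subject, action, object, condition)."""
    prod = F(1)
    for s in component_sims:
        prod *= s
    return prod

# SI(P5', P1') from the given values SI(UD,UP)=1/32, SI(Access,Access)=1,
# SI(datBr,datAp)=1/32, SI(__,__)=1:
si_p5p_p1p = si_atomic([F(1, 32), F(1), F(1, 32), F(1)])
assert si_p5p_p1p == F(1, 1024)

# P5 = P5' ∨ P5'': similarities over a disjunction add up.
si_p5_p1p = si_p5p_p1p + F(1, 16)   # SI(P5'', P1') = 1/16 (given)
assert si_p5_p1p == F(65, 1024)

# Decision ratio R with confidence weight W = 1 and given SI(P5, P2') = 9/128.
si_p5_p2p = F(9, 128)
r = (si_p5_p1p + si_p5_p2p) / min(si_p5_p1p, si_p5_p2p)
assert r == F(137, 65)              # R > 1, so the access is granted
```

Using exact rational arithmetic reproduces the paper's values 65/1024 and 137/65 without rounding error.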
For Part 3), since negotiation is used both for conflict reconciliation and for generating security policies for newly created resources, we only need to show an example of the latter to illustrate the negotiation process. Assume that, based on datAq, the collaboration developed several vaccine prototypes using the methods listed in datBm and performed several clinical trials. This yields a new set of data, datk, containing the evaluation results of these vaccine prototypes. Neither OrgA nor OrgB can claim ownership of datk completely. Thus, a security policy for datk will be generated through negotiation. Following the negotiation protocol discussed in Section 5, for this example:
Step 1. OrgA presents its list of input policies sp1[], with the most desirable policy at the front. OrgB likewise presents its inputs sp2[].
Step 2. Both OrgA and OrgB specify their compromise thresholds for various situations as (TC, θ) pairs.
Step 3. After analyzing the collaboration history and the importance of the data, the compromise threshold for both organizations is 2. In the following, we only list the input policies up to the threshold.
OrgA:
(1) Only investigators can access datk, through a communication channel using 64-bit encryption.
sp1[0]: (UI, Access, datk, "64-bit encryption")
(2) Both investigators and participants can access datk through a communication channel using 64-bit encryption.
sp1[1]: (UI ∪ UPa, Access, datk, "64-bit encryption")
(3) Both investigators and participants can access datk.
sp1[2]: (UI ∪ UPa, Access, datk, __ )

OrgB:
(1) Both investigators and participants can access datk through a communication channel using 128-bit encryption.
sp2[0]: (UI ∪ UPa, Access, datk, "128-bit encryption")
(2) Both investigators and participants can access datk through a communication channel using 64-bit encryption.
sp2[1]: (UI ∪ UPa, Access, datk, "64-bit encryption")
(3) Anyone in the collaboration can access datk.
sp2[2]: (U, Access, datk, __ )

Both organizations start with their first choices, sp1[0] and sp2[0]. Obviously, there is a conflict, since sp1[0] grants access to datk only to investigators, while sp2[0] grants access to both investigators and participants.
Step 4. OrgA is randomly selected to initiate the negotiation. It proceeds along its input policy list and presents the next policy, sp1[1], which is weaker than sp1[0].
Step 5. After analyzing sp1[1] and sp2[0], there is still a conflict, since sp2[0] requires access through 128-bit encryption while sp1[1] only requires 64-bit encryption. Hence, Step 4 is repeated. This time, OrgB makes its compromise in the negotiation and presents the next policy in its input list, sp2[1].
Step 6. After analysis, no conflict is found between the two input policies sp1[1] and sp2[1], and hence a security policy for datk is generated: (UI ∪ UPa, Access, datk, "64-bit encryption"). Note that we have not reached the compromise thresholds.
Though there is no conflict between sp1[2] and sp2[2] either, the negotiation process stops as soon as a pair of non-conflicting policies is found, because policies earlier in a chain are stronger than those appearing later. This provides the best protection for the collaboration system without creating any conflict.
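Under the assumption that two policies for the same object and action conflict when their user sets or channel conditions differ, the example negotiation can be replayed as follows (a hypothetical encoding; the tuple layout and names are illustrative, not from the paper):

```python
# Policy chains from the example, as (users, action, object, condition) tuples.
UI, UPa = frozenset({"UI"}), frozenset({"UPa"})
U = UI | UPa | frozenset({"others"})
sp1 = [
    (UI,       "Access", "datk", "64-bit encryption"),
    (UI | UPa, "Access", "datk", "64-bit encryption"),
    (UI | UPa, "Access", "datk", None),
]
sp2 = [
    (UI | UPa, "Access", "datk", "128-bit encryption"),
    (UI | UPa, "Access", "datk", "64-bit encryption"),
    (U,        "Access", "datk", None),
]

def conflicts(a, b):
    # Same object/action assumed; conflict if the user sets or the
    # channel conditions differ.
    return a[0] != b[0] or a[3] != b[3]

# Alternate compromises, OrgA first, until a non-conflicting pair is found.
p = q = 0
while conflicts(sp1[p], sp2[q]):
    if (p + q) % 2 == 0:
        p += 1      # OrgA weakens its policy
    else:
        q += 1      # OrgB weakens its policy
result = sp1[p]
assert result == (UI | UPa, "Access", "datk", "64-bit encryption")
```

The loop stops at sp1[1]/sp2[1], reproducing the generated policy (UI ∪ UPa, Access, datk, "64-bit encryption") from Step 6.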
7 Conclusions and Future Work

In this paper, we have presented an approach to security policy integration and conflict reconciliation for collaborations among organizations in ubiquitous computing environments. Based on the similarities between security policies, a resource owner can adapt its existing security policies to cover new users from collaborating organizations, using similarity indices as the criterion for selecting similar policies of the collaborating organization and making a decision based on their inputs. Collaborating organizations with conflicting security policies can try to resolve the conflicts by gradually making security compromises and relaxing their security
policies, until either the conflict is resolved or the reconciliation process reaches a certain compromise threshold associated with that particular collaboration. This approach can also be used to generate security policies for new resources created as a result of the collaboration: all participating organizations specify their security concerns for the resource, and the negotiation process tries to find a common set of policies satisfying all the organizations' concerns. We plan to conduct case studies to establish applicable guidelines for specifying the similarity threshold and for updating the confidence weight in the decision-making process. In addition, we will investigate the impact of extending the security policy set through logical reasoning, and try to associate that impact with policy similarity, so that we can update the resource owner's security policies based on the similarity analysis rather than relying on similar security policies specified by the collaborating organizations.
References

[1] Xu, F., Lin, G., Huang, H., Xie, L.: Role-based Access Control System for Web Services. In: Proc. 4th Int'l Conf. on Computer and Information Technology, pp. 357–362 (2004)
[2] Bhatt, R., Joshi, J.B.D., Bertino, E., Ghafoor, A.: Access Control in Dynamic XML-Based Web Services with XRBAC. In: Proc. 1st Int'l Conf. on Web Services (2003), http://www.sis.pitt.edu/~jjoshi/ICWS_XRBAC_Final_PDF.pdf
[3] Yau, S.S., Chen, Z.: A Framework for Specifying and Managing Security Requirements in Collaborative Systems. In: Yang, L.T., Jin, H., Ma, J., Ungerer, T. (eds.) ATC 2006. LNCS, vol. 4158, pp. 500–510. Springer, Heidelberg (2006)
[4] OASIS: eXtensible Access Control Markup Language (XACML) Version 2.0. OASIS standard (2005), http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-core-spec-os.pdf
[5] Bonatti, P.A., Sapino, M.L., Subrahmanian, V.S.: Merging Heterogeneous Security Orderings. In: Martella, G., Kurth, H., Montolivo, E., Bertino, E. (eds.) ESORICS 1996. LNCS, vol. 1146, pp. 183–197. Springer, Heidelberg (1996)
[6] Bidan, C., Issarny, V.: Dealing with Multi-policy Security in Large Open Distributed Systems. In: Quisquater, J.-J., Deswarte, Y., Meadows, C., Gollmann, D. (eds.) ESORICS 1998. LNCS, vol. 1485, pp. 51–66. Springer, Heidelberg (1998)
[7] Backes, M., Durmuth, M., Steinwandt, R.: An Algebra for Composing Enterprise Privacy Policies. In: Samarati, P., Ryan, P.Y.A., Gollmann, D., Molva, R. (eds.) ESORICS 2004. LNCS, vol. 3193, pp. 33–52. Springer, Heidelberg (2004)
[8] Ribeiro, C., Zuquete, A., Ferreira, P.: SPL: An Access Control Language for Security Policies with Complex Constraints. In: Proc. Network and Distributed System Security Symp. (NDSS 2001), pp. 89–107 (2001)
[9] Lupu, E., Sloman, M.: Conflicts in Policy-based Distributed Systems Management. IEEE Trans. on Software Engineering 25(6), 852–869 (1999)
[10] Agrawal, D., Giles, J., Lee, K.W., Lobo, J.: Policy Ratification. In: Proc. 6th IEEE Int'l Workshop on Policies for Distributed Systems and Networks (POLICY 2005), pp. 223–232 (2005)
[11] Yau, S.S., Huang, D., Gong, H., Yao, Y.: Support for Situation-Awareness in Trustworthy Ubiquitous Computing Application Software. Software Practice and Experience 36(9), 893–921 (2006)
Appendix: Ontology-Based Security Policy Specification

Our approach uses the ontology-based security policy specification language in [3]. For the sake of completeness, we summarize the important features of this specification language. Natural-language security policy specifications can be expressed by a set of logic expressions, with the key vocabulary derived from a hierarchical security policy ontology. The ontology is shown in Fig. 4, where the generic classes in the core ontology are defined as follows:
• SecurityPolicy can be an AtomicSecurityPolicy or a CompositeSecurityPolicy.
• CompositeSecurityPolicy is composed of AtomicSecurityPolicies.
• AtomicSecurityPolicy has four components: Subject, Object, Action and Condition.
• Both Subject and Object are Entities.
• Entity can be User or Resource, and Data is also considered a Resource.
• Action specifies what action a Subject can perform on an Object; it includes Access, Share, Delegate, Delete and Modify.
• Condition can be specified as a specific SecurityMechanism, which must be adopted, or as a certain environment or system Situation, which will trigger the security policy. The value of a Situation expression is determined by situation-aware processors [11].
Fig. 4. An ontology for security policy specification
This ontology provides the fundamental vocabulary for specifying security policies. However, users can extend the basic concepts and introduce more application-specific terms when specifying their own security policies. The basic building block is the atomic security policy, which can be specified as a quadtuple P = (S, A, O, C), where S ⊂ Subject, A ⊂ Action, O ⊂ Object, and C ⊂ Condition. The quadtuple is interpreted as follows: security policy P specifies that a set of subjects S can perform
a set of actions A on a set of objects O when a set of conditions C is met. A request is granted if there exists a policy P = (S, A, O, C) such that the requester s ∈ S, the requested action a ∈ A, the requested target o ∈ O, and the conditions listed in C are satisfied when the request is made. If C is omitted, the policy applies under all conditions. Atomic security policies can be composed through the logic operators conjunction '∧', disjunction '∨' and negation '¬' to form more complex security policy specifications. Based on these definitions, for a given set of security policies, we can derive a satisfactory set Q which contains all the elements that comply with the given set of policies. We use P ⊢ Q to denote this logic reasoning process. This specification approach has two advantages. First, because all four elements within a specification are specified based on the ontology, each element is clearly defined. There is only one interpretation for a quadtuple, and thus a specified security requirement is unambiguous. Second, the hierarchical structure of the ontology can be used to establish a common understanding of the different terms used by different organizations in their security policy specifications. For example, one policy specifies "administrators can only access the account information through a 128-bit encrypted channel", while another specifies "administrators must use a 128-bit encrypted channel to access the username/password information". Based on the ontology used in our specifications, both "account information" and "username/password information" are Objects. If the two collaborating organizations establish a common understanding that the "username/password" used by OrgA is the "account information" used by OrgB, the system will be able to "understand" this fact and process the policies from both organizations accordingly. In this example, the two security policies will be treated as the same policy.
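As an illustration (not the paper's implementation), the quadtuple P = (S, A, O, C) and its request-granting rule can be encoded as follows; all names are ours, and the condition is modeled as an optional predicate:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

@dataclass(frozen=True)
class AtomicPolicy:
    """P = (S, A, O, C): subjects S may perform actions A on objects O
    when condition C holds (C = None means 'under all conditions')."""
    subjects: FrozenSet[str]
    actions: FrozenSet[str]
    objects: FrozenSet[str]
    condition: Optional[Callable[[], bool]] = None

    def grants(self, s: str, a: str, o: str) -> bool:
        # A request (s, a, o) is granted iff s ∈ S, a ∈ A, o ∈ O,
        # and the condition (if any) is satisfied.
        return (s in self.subjects and a in self.actions
                and o in self.objects
                and (self.condition is None or self.condition()))

p = AtomicPolicy(frozenset({"UP", "UG"}), frozenset({"Access"}),
                 frozenset({"datAo"}))
assert p.grants("UP", "Access", "datAo")
assert not p.grants("UU", "Access", "datAo")
```

Composite policies ('∧', '∨', '¬') could then be built as combinators over the `grants` predicate.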
The algorithm to bridge different terms used by different organizations and form common understandings was presented in [3].
Improved Weighted Centroid Localization in Smart Ubiquitous Environments Stephan Schuhmann1 , Klaus Herrmann1 , Kurt Rothermel1 , Jan Blumenthal2 , and Dirk Timmermann2 1 University of Stuttgart, Germany Institute of Parallel and Distributed Systems (IPVS) {firstname.lastname}@ipvs.uni-stuttgart.de 2 University of Rostock, Germany Institute of Applied Microelectronics and CE {jan.blumenthal,dirk.timmermann}@uni-rostock.de
Abstract. Location-awareness is a highly relevant subject in ubiquitous computing, as many applications exploit location information to provide adequate services or to adapt to a changing physical environment. While GPS provides reliable outdoor localization, indoor positioning presents a bigger challenge. Many indoor localization systems have been proposed; however, most of them rely on customized hardware or presume some dedicated infrastructure. In this paper, we focus on WLAN-based localization in smart ubiquitous environments. We propose an improved scheme of the Weighted Centroid Localization (WCL) algorithm that is robust and provides higher location accuracy than the original WCL algorithm. The improvements are based on dynamic weighting factors that depend solely on the correlation of the Received Signal Strength Indicators of the received beacon signals. Compared to the original WCL scheme, our approach does not increase the requirements on the environment. Real-world experiments in a typical environment, which we report on in this paper, confirm that the increased location accuracy determined in previous calculations is reproducible in a realistic noisy environment. This provides a simple, cost-efficient, battery-conserving, and yet adequate technique for obtaining accurate location information for mobile devices.
1 Introduction
Location-awareness is an essential service for many wireless ubiquitous computing scenarios, as many applications integrate location information to increase context knowledge. However, ubiquitous computing environments have specific characteristics which limit the approaches that are applicable to maintain location-awareness. For instance, pervasive applications have to deal with the fluctuating availability of devices due to mobility and network failures. Thus, environmental contexts are highly dynamic. Furthermore, the processor performance and available energy of mobile nodes in ubiquitous computing scenarios are often limited. Therefore, intensive communication and computation tasks are not

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 20–34, 2008. © Springer-Verlag Berlin Heidelberg 2008
feasible. In this context, algorithms are subject to strict requirements covering reduced memory consumption, communication, and processing time. In smart environments, available infrastructure devices can provide additional support for mobile devices. Determining the position of nodes in wireless networks, particularly in noisy environments, represents a real challenge. Identifying the exact coordinates of a device (also called Station Of Interest, or SOI) requires measuring distances, e.g., by measuring the Time of Arrival (ToA) or the Time Difference of Arrival (TDoA). Difficulties concerning time measurements result from the synchronization of the involved devices as well as the high computational effort needed to calculate the position. Measuring the Received Signal Strength Indicator (RSSI) offers a way to realize distance determination with minimal effort. A good localization algorithm should calculate a position as fast as possible and should be resistant to environmental influences as well as imprecise distances. Moreover, it is desirable to use standard off-the-shelf consumer products without the necessity for far-reaching customization, as this reduces the effort necessary for setting up the positioning system. Thus, our approach of combining Weighted Centroid Localization (WCL) [1] with standard Wireless LAN technology has high practical relevance in typical environments. The main contribution of this paper is an improved Weighted Centroid Localization scheme that uses weighting factors of dynamic degree to increase location accuracy in smart ubiquitous environments. We derived optimal dynamic weighting factors through theoretical calculations and approximated the obtained weighting factors by a power function using the least-squares method. Furthermore, we evaluated the improved localization scheme in an office room that represents a typical noisy ubiquitous computing environment.
The measurements confirmed that our approach increases location accuracy almost to the best value possible for weighted centroid schemes. The remainder of this paper is divided into six sections. Section 2 discusses the requirements to be met by the localization scheme and the environment. Section 3 gives a broad survey of existing localization approaches. In Section 4, we first describe the original WCL scheme and then derive an improved WCL scheme that uses dynamic weighting factors to increase location accuracy. We present our theoretical and practical evaluation results in Section 5, followed by conclusions and a brief outlook on future work in Section 6.
2 Requirements
In this paper, we focus on indoor localization for resource-restricted devices in smart ubiquitous environments. This requires avoiding high battery consumption, as mobile devices like PDAs or smart phones have limited battery and computation power. Hence, we aim at minimizing the requirements on the localization devices as well as on the infrastructure, to allow applicability to a wide
range of scenarios. This means we want to provide a flexible solution for indoor localization. Thus, our system has to maintain the following properties:

– Minor Environmental Requirements: The environment solely has to provide infrastructure nodes with fixed positions that send periodic beacon signals. Many realistic ubiquitous scenarios fulfill this property, as they include 802.11 WLAN Access Points (APs), for example. These devices have to know their exact positions and notify other devices of their locations, e.g., within the beacon signals, which requires slight adaptations to the APs. If changes to the APs are undesired, AP positions can instead be retrieved by queries to a location server that provides the clients with AP locations.
– Minimal Costs: Additional hardware costs should be avoided by relying on standard components that are available in typical ubiquitous scenarios, both on the client and on the infrastructure side. For instance, the localization scheme ought to integrate existing infrastructure devices without large changes to their configuration. To minimize costs, we rely on exactly four beacon nodes in this paper. These beacons are placed at the corners of a rectangular plain, as common rooms typically have a rectangular base. This AP positioning enables nodes to localize themselves in the whole region.
– Minimal Communication Needs on the Client Devices: Since power supply is critical for small mobile devices, the clients should avoid any additional communication during the localization process to maximize battery lifetime. Only the infrastructure devices should send the beacons needed for the localization process, while the mobile devices exploit the information provided by these beacons. As localization is supposed to be executed in various scenarios in real time, a solution that demands prior configuration of the client devices is not feasible, as this limits the use of the system.
– Applicability in Realistic Ubiquitous Scenarios: Contrary to laboratory environments, realistic scenarios often suffer from effects like multi-path propagation or interference with other wireless networks operating in the same frequency spectrum. This can significantly influence localization accuracy. A localization scheme must be able to cope with this interference in order to provide accurate localization nevertheless.
3 Related Work
In this section, we investigate several related localization approaches with respect to the above requirements in order to distinguish WCL from these approaches, and we clarify why they prove inadequate in our scenarios. Figure 1 provides a classification of the proposed schemes. At first, these schemes can be divided into those that are based on coordinates and those that are coordinate-free. Coordinate-free schemes, as presented by Fekete et al. [2], focus on an abstract notion of location-awareness. These schemes aim at achieving consistent solutions by the use of geographic clustering. Unfortunately, they rely on rather dense sensor networks to achieve adequate accuracy and are therefore not usable here. Thus, we focus on coordinate-based approaches, which can further be divided into
Fig. 1. Classification of proposed localization schemes
range-free and range-based schemes. Range-free schemes use implicit distance measurements, while range-based schemes use explicit distances for localization. Regarding range-free schemes, there exist three main approaches:

– Anchor-based approach: This approach uses anchor beacons containing two-dimensional location information (xi, yi) to estimate node positions. After receiving these beacons, a node estimates its location using a specific formula. Besides a scheme proposed by Pandey et al. [3], another anchor-based approach is Centroid Localization (CL), proposed by Bulusu et al. [4]. In this approach, the SOI calculates its position as the centroid of the coordinates of the stations whose beacons are received. While CL merely averages the coordinates of the beacons to localize SOIs, the Weighted Centroid Localization (WCL) algorithm [1] uses weights to improve the localization and, hence, represents a direct advance over CL.
– Hop-based approach: In this type of localization scheme, the SOI calculates its position based on the hop distance to other nodes [5],[6]. This scheme delivers an approximate position for all nodes in networks where only a limited fraction of nodes have self-positioning capability. Through this mechanism, all nodes in the network (including other anchors) obtain the shortest distance, in hops, to every anchor. However, this approach only works well in dense networks, and it relies on flooding of messages, which produces high communication overhead that is undesirable in our scenarios.
– Area-based approach: Area-based schemes [7],[8] perform location estimation by partitioning the environment into triangular regions between beaconing nodes. A node's presence inside or outside these triangular regions allows it to narrow down the area in which it can potentially reside. Area-based approaches normally perform particularly well in heterogeneous networks with high node density.
Unfortunately, this condition does not hold in our case and, thus, area-based approaches cannot be used here.
Besides range-free schemes, a number of approaches exist that are based on the transmission ranges of the nodes. Range-based approaches can be subdivided as follows:

– Signal Propagation Time: The Time of Arrival (ToA) technology is commonly used as a means of obtaining range information via signal propagation time. This approach is used by satellite navigation systems [9]. Though it guarantees high precision and can be used globally, it suffers from several major drawbacks: it can only be used outdoors, needs expensive hardware, and its receivers consume much energy. Thus, ToA approaches are inapplicable in our case.
– Signal Propagation Time Differences and Arrival Angles: This technology is based on measurements of the Time Difference of Arrival (TDoA) [10] or the Angle of Arrival (AoA) [11]. While TDoA estimates the distance between two communicating nodes (also called ranging), AoA allows nodes to estimate and map relative angles between neighbors. Like ToA, TDoA and AoA rely on special hardware that is expensive and energy-consuming. Thus, these approaches are unsuited here.
– Simple Signal Strength Measurements: This approach uses the Received Signal Strength Indicator (RSSI) of incoming beacon signals and has been proposed for hardware-constrained systems. Contrary to techniques like ToA, TDoA or AoA, RSSI is the only feature measurable with reasonably priced, current commercial hardware. RSSI techniques use either theoretical or empirical models to translate signal strengths into distance estimates, where each point is mapped to a signal strength vector [12] or to a signal strength probability distribution [13]. This technology suffers from problems such as background interference or multi-path fading, which make range estimates inaccurate, as shown in [14].
Furthermore, many signal-strength-based approaches rely on special hardware, like small infrared badges [15], magnetic trackers [16], or multiple cameras [17], which usually are not included in typical environments and mobile devices. Some approaches use sniffer-based instead of client-based localization [18]. However, this technique cannot be used here, as it assumes that the access points can localize other devices, which would require major changes to the APs. Many approaches use location fingerprinting for localization in wireless networks [12],[13]. This method consists of a training phase and a positioning phase. Unfortunately, it needs additional preconditioning and the creation of a training database for each environment beforehand, which makes it inapplicable in our case. In summary, most approaches cannot be chosen, since they rely on extensive special hardware, need additional preconditioning to build a training database, use sniffer-based localization, or depend on high node densities to perform well. This narrows the huge field of localization schemes down to a few anchor-based approaches. Among those, Weighted Centroid Localization uses more fine-grained parameters than CL to weight the received beacon signals, as
CL simply uses binary weighting (weight 1 for those APs whose beacons were received, weight 0 for the others). This ensures higher location accuracy for WCL.
4 Improved Weighted Centroid Localization
After a short introduction to the original WCL scheme with static weighting factors [1], we propose an advanced scheme that uses dynamic weighting factors based on the correlation of the received RSSI values.

4.1 Static Degree Weighted Centroid Localization (SWCL)
We assume that the network consists of SOIs that want to localize themselves, and n beacons. In our case, the beacons are WLAN access points whose positions are assumed to be known exactly. We rely on a small number of n = 4 beacons, placed at the corners of a rectangular plain. The SOIs are mobile devices that initially do not know their own position. Algorithms such as CL use centroid determination to calculate a device's position [4]. In the first phase, each beacon Bj sends its position (xj, yj) to all nodes within its transmission range, which can simply happen through the periodically sent beacon signals. In the second phase, the SOI Pi calculates an approximation (x'_i, y'_i) of its real position (xi, yi) by a centroid determination from all n beacon positions in range:

$(x'_i, y'_i) = \frac{1}{n} \sum_{j=1}^{n} (x_j, y_j)$  (1)

The localization error fi for the SOI Pi is defined as the distance between the exact position (xi, yi) and the approximated position (x'_i, y'_i):

$f_i = \sqrt{(x_i - x'_i)^2 + (y_i - y'_i)^2}$  (2)

While CL only averages the coordinates of the beacon devices to localize SOIs, WCL uses weights to improve the localization. In CL, all weights of received beacon signals are implicitly equal to 1. WCL generalizes this scheme by introducing variable weights wij for each device Pi and each beacon Bj. These weights depend on the distance between the two, as explained later in this section. In the more general WCL equation, the number of beacons n is replaced by the sum of the weight factors wij, and each beacon position (xj, yj) is multiplied by its weight factor wij. Hence, (1) is expanded to the WCL formula (3), yielding the new approximation (x'_i, y'_i) for Pi's position:

$(x'_i, y'_i) = \frac{\sum_{j=1}^{n} w_{ij} \cdot (x_j, y_j)}{\sum_{j=1}^{n} w_{ij}}$  (3)

The weight wij is a function of the distance and the characteristics of the SOI's receiver. In WCL, shorter distances cause higher weights. Thus, wij
and dij (the distance between beacon Bj and SOI Pi) are inversely proportional. As an approximation, the correlation is equivalent to the function 1/dij. To weight longer distances marginally lower, the distance is raised to a higher power hs. For a concentric wave expansion with a linear receiver characteristic and a uniform density of beacons, we form (4):

$w_{ij} = \frac{1}{(d_{ij})^{h_s}}$  (4)
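Formulas (1)–(4) translate directly into code; the following sketch is illustrative (function names are ours, not the paper's):

```python
import math

def wcl(beacons, weights):
    """Weighted centroid, formula (3): position = sum(w_j * b_j) / sum(w_j).
    With all weights equal to 1 this reduces to plain CL, formula (1)."""
    total = sum(weights)
    x = sum(w * bx for w, (bx, by) in zip(weights, beacons)) / total
    y = sum(w * by for w, (bx, by) in zip(weights, beacons)) / total
    return x, y

def distance_weight(d, h_s):
    """Static-degree weight, formula (4): w = 1 / d^h_s."""
    return 1.0 / (d ** h_s)

def localization_error(true_pos, est_pos):
    """Formula (2): Euclidean distance between exact and estimated position."""
    return math.dist(true_pos, est_pos)

# Plain CL over four corner beacons yields the center of the region.
beacons = [(0, 0), (10, 0), (0, 10), (10, 10)]
assert wcl(beacons, [1, 1, 1, 1]) == (5.0, 5.0)
```

A larger weight pulls the estimate toward the corresponding beacon, which is exactly the effect the static degree hs controls.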
The static degree hs has to ensure that remote beacons still impact the position determination. Otherwise, in case of a very high hs, the approximated position moves toward the closest beacon's position and the positioning error fi increases. There exists a minimum of fi where hs is optimal [19]. However, as the next section clarifies, this use of static weight values leads to suboptimal accuracy compared to adapted dynamic weight factors that are based on the correlation between the RSSIs received from several APs. According to Friis' free space transmission equation (5), the detected signal strength decreases quadratically with the distance to the sender:

$P_{rx} = P_{tx} \cdot G_{tx} \cdot G_{rx} \cdot \left( \frac{\lambda}{4 \pi d_{ij}} \right)^2$  (5)
Prx = remaining power of the wave at the receiver
Ptx = transmission power of the sender
Gtx, Grx = gain of transmitter and receiver
λ = wavelength
dij = distance between sender and receiver

In embedded devices, the received signal strength is converted to the RSSI, which is defined as the ratio of the received power to a reference power Pref. Typically, the reference power is the absolute value Pref = 1 mW. The actually received power Prx can be obtained from the RSSI and vice versa, as denoted in the following equations.

Prx = Pref · 10^(RSSI/20)  ⟺  RSSI = 20 · log10(Prx / Pref) (6)
An increasing received power results in a rising RSSI. Thus, the distance dij is inversely proportional to Prx, and (7) with the power gs (≠ hs) is formed.

wij = (Prx)^gs (7)
So, it can be seen that knowledge of dij is not necessary, as the algorithm can simply use the received power Prx or, alternatively, the RSSI. In this paper, we use the RSSI, i.e., we take (6) and (7) and form (8).

wij = (Pref · 10^(RSSI/20))^gs (8)
Improved Weighted Centroid Localization
Fig. 2. a) Distribution of gopt on a rectangular plane (p = 4/3), b) Potential for improvement of the location error on a rectangular plane (p = 4/3), relative to the longer side l1
In practical scenarios, the ideal distribution of Prx does not hold, because radio signal propagation is affected by many effects, like reflection on objects, diffraction at edges, or superposition of electromagnetic fields. Thus, the ideal and the actually received power represent different values, which yields localization errors.

4.2 Dynamic Degree Weighted Centroid Localization (DWCL)
Using an optimally chosen static degree gs, as the original WCL does, results in improved location accuracy compared to CL [1]. However, the degree gopt that leads to minimal error can differ significantly from gs in some regions, as this section will clarify. Therefore, we used (5) and (6), and set the measured power equal to the ideal Prx, to calculate the expected RSSI at various positions within an idealized region under observation. Then, we applied the WCL algorithm with different powers g to determine the optimal gopt which yields the highest location accuracy at a particular place in the region. Figure 2a shows the distribution of the factors gopt in a rectangular region where the proportion p of the length l1 of the longer side to the length l2 (≤ l1) of the shorter side is p = l1/l2 = 4/3. It can be seen that the distribution of gopt is very uneven. The values at the regions R1,1 and R1,2 exactly between two adjacent anchors a distance l1 apart (e.g., AP1 and AP3) are relatively high (up to 5.0), as are those at the regions R2,1 and R2,2 between two anchors a distance l2 apart (e.g., AP1 and AP2), where the maximum gopt is approximately 2.5. In the special case of a square plane (l1 = l2, i.e., p = 1), the distribution of gopt is symmetric about the center of the region. By selecting the determined optimal dynamic powers shown in Figure 2a, the mean error could be decreased by about 30% (compared to WCL with an optimal static weight factor of gs = 0.87 in a region with p = 4/3) to the best possible value for WCL. Thus, there is obviously huge potential for improvement in using dynamic weight factors. Figure 2b shows this potential, which represents the difference between the error if gs is used and the minimum possible error if the optimal gopt is used. This figure clarifies that the potential for improvement is likewise very unevenly distributed. Particularly the regions Ri,j (i, j ∈ {1, 2}) offer an enormous potential of more than 10% of l1 for improvement, which corresponds to the noticeably increased optimal weight factor at these regions, as Figure 2a shows. Obviously, the influence of the two farthest access points was too high there, as the value of gs was chosen too low; this shifted the calculated position toward the centre of the region. DWCL obtains the optimal dynamic weighting factor gd as follows: first, the SOI identifies the approximate subregion it is located in; then, it looks up subregion-specific parameters; finally, it calculates the value of a balancing function and, based on this result, the approximated dynamic weight gd and its location within the whole region under observation. We focus on these steps in the following.

Subregion Determination. First of all, a SOI has to determine the approximate subregion in which it is located, solely by regarding the RSSIs of incoming beacon signals. Figure 2a suggests defining four different subregions, as there are four peaks for the optimal values of g. A simple way to divide the region into adequate subregions is to use the perpendicular bisectors of the sides from AP1 to AP4 and from AP2 to AP3, as illustrated in Figure 3. A SOI can then easily determine in which of the emerging subregions C1,1, C1,2, C2,1, and C2,2 it is located by simply comparing the RSSIs pairwise, as denoted in Table 1.

Table 1. Determination of subregions

RSSI constraints | Subregion
(RSSI(AP1) > RSSI(AP4)) ∧ (RSSI(AP3) > RSSI(AP2)) | C1,1
(RSSI(AP4) > RSSI(AP1)) ∧ (RSSI(AP2) > RSSI(AP3)) | C1,2
(RSSI(AP1) > RSSI(AP4)) ∧ (RSSI(AP2) > RSSI(AP3)) | C2,1
(RSSI(AP4) > RSSI(AP1)) ∧ (RSSI(AP3) > RSSI(AP2)) | C2,2
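The two pairwise comparisons of Table 1 can be sketched directly (a hypothetical helper; RSSI values are assumed tie-free and keyed 1..4 for AP1..AP4):

```python
def subregion(rssi):
    """Return the subregion label per Table 1 from the four received RSSIs.

    rssi: dict mapping AP index (1..4) to the RSSI received from that AP.
    """
    r14 = rssi[1] > rssi[4]   # stronger signal from AP1 than from AP4?
    r32 = rssi[3] > rssi[2]   # stronger signal from AP3 than from AP2?
    if r14 and r32:
        return "C1,1"
    if not r14 and not r32:
        return "C1,2"
    if r14:                   # r14 and not r32
        return "C2,1"
    return "C2,2"             # not r14 and r32
```

Only two boolean tests are needed, so the subregion determination adds virtually no computational cost on the client.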
Determination of Subregion-Specific Parameters. After determining the subregion it is located in, a SOI needs to obtain further parameters, which we derive in this paragraph. Regarding the beacon signals received within Ri,j, the RSSIs have several special characteristics there:
– The RSSIs of the two strongest received signals are almost equal, as the distances from the SOI to the corresponding access points are almost the same. The same holds for the two weakest signals.
– At those points in Ri,j where gopt is maximal, the quotient Qi (i ∈ {1, 2}) of the distance to the far APs to the distance to the close APs can be calculated by Pythagoras' formula, as depicted in Figure 3:

Q1 = x1/y1 = sqrt(l2² + (½·l1)²) / (½·l1); Q2 = x2/y2 = sqrt(l1² + (½·l2)²) / (½·l2) (9)
Fig. 3. Calculation of parameters Q1 and Q2
Fig. 4. Typical values in common rooms: a) for D1 and D2, b) for a1, a2, b1, and b2
According to Equation (6), the RSSI differences between the received signals from the corresponding APs are

Di = 20 · log10 Qi, i ∈ {1, 2}. (10)

Thus, a SOI only needs to know l1 and l2 to obtain the parameters Qi and Di. The values of l1 and l2 can easily be obtained from the fixed AP positions. The SOI has to use D1 if it is situated in region C1,1 or C1,2; otherwise, it has to choose D2, as it is then situated in region C2,1 or C2,2. Common values for Di are shown in Figure 4a. For example, in a region with p = 4/3, the calculation results for Di are D1 = 9.0908 dB and D2 = 5.1188 dB. The differences between the received RSSIs can generally be expressed as follows: Δk = RSSIk − RSSIk+1, k ∈ {1, 2, 3}, where RSSIk represents the k-th strongest received RSSI. Within the regions Ri,j, the values of Δ1 and Δ3 are typically very low (close to 0), while those of Δ2 are close to D1 (in R1,j) or D2 (in R2,j).
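Transcribing (9) and (10) directly gives a small helper (the function name is ours; the Qi labelling follows (9) as printed, with l1 the longer side in meters):

```python
import math

def region_parameters(l1, l2):
    """Q_i per (9) and D_i = 20*log10(Q_i) per (10), for a region l1 x l2."""
    q1 = math.sqrt(l2 ** 2 + (l1 / 2) ** 2) / (l1 / 2)
    q2 = math.sqrt(l1 ** 2 + (l2 / 2) ** 2) / (l2 / 2)
    return 20 * math.log10(q1), 20 * math.log10(q2)
```

For p = 4/3 (e.g. l1 = 4 m, l2 = 3 m) this yields the two values 5.1188 dB and 9.0908 dB quoted above.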
S. Schuhmann et al.
Fig. 5. Distribution of calculated gi (S) in theory (p = 1, i.e. g1 (S) = g2 (S))
Balancing Function. To decide whether a SOI is located in a region Ri,j where gopt is considerably high, we define the following balancing function S, based on the special RSSI characteristics in Ri,j mentioned above:

S = Δ1 + |Di − Δ2| + Δ3, i ∈ {1, 2} (11)

As already mentioned, Δ1 and Δ3 are close to zero within Ri,j, while Δ2 depends on the parameter p and is around Di in Ri,j. Thus, (Di − Δ2) is around zero in Ri,j. Consequently, S is minimal within Ri,j and increases outside of these regions. Hence, the use of S helps to state the location of the SOI more precisely.

Approximation. We calculated S for various positions in the whole region, and compared the values with the optimal gopt at each position. The distribution of S dependent on g is shown in Figure 5 and suggests approximating gd by the power function denoted in (12):

gd = gi(S) = ai / S^(bi), i ∈ {1, 2} (12)

with the unknown parameters a1 and b1 if the SOI is within C1,1 or C1,2, and with a2 and b2 if the SOI is located in C2,1 or C2,2. We then approximated the power functions with the least-squares method for various values of p and obtained the parameters ai and bi denoted in Figure 4 that yield minimum root mean square (RMS) errors. For example, in the case of p = 1, we got a1 = a2 ≈ 1.580 and b1 = b2 ≈ 0.248, which led to a very small RMS error of only 0.002, a small fraction of the RMS error of SWCL with optimal static weight.
5 Evaluation
First, we present the theoretical results of DWCL and compare them with those of SWCL. Then, we concentrate on the indoor tests with 802.11 WLAN access
points, operating at a wavelength of λ = c/f = (3·10^8 m/s) / (2.44·10^9 Hz) ≈ 0.123 m, and compare our practical results with the calculated ones.

Fig. 6. Evaluation results of theoretical calculations

5.1 Theoretical Results
We compared the calculated location accuracies of DWCL and SWCL with the results obtained by choosing the optimal gopt at each position within the observed region. The results are shown in Figure 6. The mean error of SWCL in a square room (p = 1) is 7.6%, while the mean error of DWCL is 5.3% and hence very close to the best possible WCL value of 5.1%. Regarding the maximum error within the region, DWCL and the WCL optimum are almost equal, while SWCL's accuracy is worse by 2.5 percentage points. The results for p = 2 are quite similar, even though DWCL's mean gain decreases slightly.

5.2 Indoor WLAN Tests
We implemented the WCL scheme on a common smart phone¹, which served as the client device. The infrastructure beacons were standard 802.11 access points². The transmission power was set to the maximum possible value of 50 mW with a fixed bandwidth of 11 Mbit/s in 802.11b mode. According to [20], this provides a maximum range of about 67 meters in indoor scenarios. Figure 7a shows the setting for our indoor tests. We used four beacons, A1 to A4, which were placed in the corners of a square area of 4.8 m × 4.8 m (i.e., p = 1) in a first test, and a rectangular area of 5.8 m × 3.2 m (i.e., p ≈ 1.81) in a second test. The tests were performed in a wall-enclosed office within a
¹ T-Mobile MDA Pro with PXA 270 CPU (520 MHz), 48 MB RAM, IEEE 802.11b.
² Cisco Aironet 1200, [20].
Fig. 7. a) Indoor test setup, b) Distribution of calculated gd at indoor tests (p = 1)
building where interference with other WLANs occurred. Furthermore, typical office furniture like chairs, desks, and a flip chart was present in the room. The SOI and the APs were placed on a flat table. We deactivated rate shifting on the APs to guarantee fixed transmission powers, and performed 20 localization processes at various positions within the region. Then, we compared the averaged calculated positions with the real ones to determine the location errors. Figure 7b displays the values of the balancing function S at the test positions and the approximated power function for p = 1. The mean square error (0.330) is
Fig. 8. Evaluation results of indoor tests
higher than in theory, as the signals suffered from noise and reflections. However, this error is still about 33.7% lower than that of SWCL with gs = 0.93. Figure 8 presents a comparison of the mean errors of DWCL, SWCL, and calculations with gopt in these tests. It shows that for p = 1, DWCL performs about 0.7 percentage points worse than the theoretical WCL optimum, but at the same time about 2.3 percentage points better than SWCL regarding the mean error, which implies it exploits most of the existing potential for improvement. Concerning worst cases, DWCL's difference to the optimum in accuracy is close to zero, while SWCL performs about 4.2 percentage points worse than the optimum. These results confirm the increased location accuracy achieved by dynamic weighting factors, particularly regarding worst-case accuracies in almost square rooms, where DWCL chooses nearly optimal values for gd.
6 Conclusion and Outlook
In this paper, we proposed an improved Weighted Centroid Localization scheme called DWCL, in combination with 802.11 access points, as a simple, cost-efficient, applicable, and accurate localization scheme for ubiquitous computing scenarios. We introduced the problem domain, discussed the requirements, and presented the original WCL scheme with static weight factors (SWCL). We then developed an improved WCL scheme called DWCL that uses dynamic weighting factors depending on the RSSIs to increase location accuracy. From our experiments, we conclude that DWCL represents a very effective and yet simple method for indoor localization, as it yields almost optimal location accuracy for weighted centroid schemes with rectangular anchor positioning. It poses no additional requirements on either the clients or the infrastructure beacons. Standard WLAN access points suffice for achieving acceptable localization errors, and the computational burden put on mobile client devices is mild. Because WLAN-based WCL can be used in environments with off-the-shelf WLAN hardware, it is appropriate for achieving ad hoc localization in many ubiquitous smart environments with minimal prior setup effort. The research group at the University of Stuttgart is currently investigating whether this improved WCL scheme is applicable to localization across several rooms with larger distances between the access points, and to alternative positionings of the access points. The research team at the University of Rostock concentrates on the reduction of the positioning error that is mainly caused by border effects. Their current focus is the adaptation of weight functions according to the number of known beacons and the adjusted transmission range of the beacons.
Acknowledgement This work is funded by the German Research Foundation (DFG) within Priority Programme 1140 - Middleware for Self-organizing Infrastructures in Networked Mobile Systems.
34
S. Schuhmann et al.
References

1. Blumenthal, J., Reichenbach, F., Timmermann, D.: Position Estimation in Ad hoc Wireless Sensor Networks with Low Complexity. In: Joint 2nd Workshop on Positioning, Navigation and Communication and 1st Ultra-Wideband Expert Talk, pp. 41–49 (March 2005)
2. Fekete, S.P., Kröller, A., Buschmann, C., Fischer, S., Pfisterer, D.: Koordinatenfreies Lokationsbewusstsein. it - Information Technology 47(2), 70–78 (2005)
3. Pandey, S., Anjum, F., Kim, B., Agrawal, P.: A low-cost robust localization scheme for WLAN. In: Proceedings of the 2nd Int'l Workshop on Wireless Internet (2006)
4. Bulusu, N., Heidemann, J., Estrin, D.: GPS-less Low Cost Outdoor Localization For Very Small Devices. IEEE Personal Communications Magazine 7(5), 28–34 (2000)
5. Niculescu, D., Nath, B.: DV Based Positioning in Ad hoc Networks. Journal of Telecommunication Systems (2003)
6. Nagpal, R.: Organizing a global coordinate system from local information on an amorphous computer. Technical Report AI Memo No. 1666, MIT A.I. Laboratory (1999)
7. He, T., et al.: Range-free localization and its impact on large scale sensor networks. Transactions on Embedded Computing Systems 4(4), 877–906 (2005)
8. Elnahrawy, E., Li, X., Martin, R.P.: Using Area-Based Presentations and Metrics for Localization Systems in Wireless LANs. In: Proceedings of the 29th IEEE International Conference on Local Computer Networks (LCN 2004), Washington, USA (2004)
9. Parkinson, B.W., Spilker Jr., J.J.: Global Positioning System: Theory and Applications, Vol. 1. In: Progress in Astronautics and Aeronautics, American Institute of Aeronautics and Astronautics, vol. 163, pp. 29–56 (1996)
10. Harter, A., Hopper, A., Steggles, P., Ward, A., Webster, P.: The Anatomy of a Context-Aware Application. Mobile Computing and Networking, 59–68 (1999)
11. Niculescu, D., Nath, B.: Ad hoc positioning system (APS). In: Proceedings of GLOBECOM, San Antonio (November 2001)
12. Bahl, P., Padmanabhan, V.N.: RADAR: An In-Building RF-Based User Location and Tracking System. In: INFOCOM, vol. 2, pp. 775–784 (2000)
13. Youssef, M., Agrawala, A., Shankar, U.: WLAN Location Determination via Clustering and Probability Distributions. In: Proceedings of IEEE PerCom 2003 (March 2003)
14. Ganesan, D., Krishnamachari, B., Woo, A., Culler, D., Estrin, D., Wicker, S.: Complex Behavior at Scale: An Experimental Study of Low-Power Wireless Sensor Networks. Technical Report CSD-TR 02-0013, UCLA (February 2002)
15. Want, R., Hopper, A., Falcão, V., Gibbons, J.: The Active Badge Location System. ACM Transactions on Information Systems (TOIS) 10(1) (1992)
16. Paperno, E., Sasada, I., Leonovich, E.: A New Method for Magnetic Position and Orientation Tracking. IEEE Transactions on Magnetics 37(4) (July 2001)
17. Krumm, J., Harris, S., Meyers, B., Brumitt, B., Hale, M., Shafer, S.: Multi-Camera Multi-Person Tracking for EasyLiving. In: Proceedings of the 3rd IEEE International Workshop on Visual Surveillance (VS 2000), Washington, USA (2000)
18. Ganu, S., Krishnakumar, A., Krishnan, P.: Infrastructure-based location estimation in WLAN networks. In: Proceedings of IEEE WCNC 2004 (March 2004)
19. Blumenthal, J., Grossmann, R., Golatowski, F., Timmermann, D.: Weighted Centroid Localization in WLAN-based Sensor Networks. In: Proceedings of the 2007 IEEE International Symposium on Intelligent Signal Processing (WISP 2007) (2007)
20. Cisco Systems, Aironet 1200 Series Access Point Data Sheet, http://www.cisco.com/en/US/products/hw/wireless/ps430
A Formal Framework for Expressing Trust Negotiation in the Ubiquitous Computing Environment* Deqing Zou1, Jong Hyuk Park2, Laurence Tianruo Yang 3, Zhensong Liao1, and Tai-hoon Kim4 1 Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China 2 Department of Computer Science and Engineering, Kyungnam University, Korea 3 St. Francis Xavier University, Canada 4 Division of Multimedia, Hannam University, Korea
Abstract. There are many entities in the ubiquitous computing environment. In a traditional Public Key Infrastructure (PKI), every entity must be issued a valid certificate by the certificate authority. However, it is hard to construct a centralized trust management framework and assign a valid certificate to every entity in the ubiquitous computing environment because of the large number of dynamic entities. Trust negotiation (TN) is an important means to establish trust between strangers in ubiquitous computing systems through the exchange of digital credentials and access control policies specifying what combinations of credentials a stranger must submit. Existing TN technologies, such as TrustBuilder and KeyNote, focus on solving particular problems with special-purpose techniques. In this paper, we present a formal framework for expressing trust negotiation. The framework specifies the basic concepts, elements, and semantics of TN. By analyzing TN, we point out how to build a TN system in practice.
1 Introduction

With the comprehensive spread of computer literacy and the great development of computer networks, people have growing requirements for resource sharing over the Internet. Common access control mechanisms, such as MAC [1] and RBAC [2], do not work well because of their limitations in design and application. In the ubiquitous computing environment, it is hard to construct a centralized trust management framework for large numbers of dynamic entities. The exchange of attribute credentials is a means to establish a mutual trust relationship between strangers in the ubiquitous computing environment who wish to share resources or conduct business transactions [3][4][5]. As a new access control mechanism, TN successfully provides a method for strangers from different security domains in the ubiquitous computing environment to share resources. *
The project is supported by National Natural Science Foundation of China under Grant No. 60503040.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 35–45, 2008. © Springer-Verlag Berlin Heidelberg 2008
D. Zou et al.
In TN, automated trust negotiation (ATN), as a new kind of TN, is a great improvement over previous TN. Generally, ATN differs from TN in the following aspects: 1) ATN discloses the access control policy that states under which conditions the resource is available, while in TN the requestor does not know what credentials he must submit before the negotiation starts. From this, we can see that ATN saves much time, which helps to improve negotiation efficiency. 2) ATN has done much work on privacy protection. In TN, the requestor is required to submit his certificate to gain access; in ATN, the trust relationship is established by the exchange of credentials and access control policies. The disclosure of a certificate may pose a threat to the user's privacy, while the disclosure of a credential does not. 3) ATN pays much attention to information protection, while TN does not. In ATN, the transmitted messages are encrypted, while in most TN systems the exchanged data are plaintext. Currently, there are still some problems unsolved in TN. First, there are no unified, secure definitions for TN's concepts. For example, what is a credential? What information must a credential include? Which credential is secure? Second, TN is difficult to formalize with a single language, which leads to a low security level. Third, since most information protection techniques are immature or heavyweight, how to protect privacy and sensitive information is still a serious issue. Finally, there is no guideline instructing how to build a TN system. In this paper, we aim at expressing TN theoretically, specifying the internal relationships between TN's concepts and components. Generally, our work makes the following contributions: 1) A common general-purpose framework for TN is proposed in this paper. Previous work focused on how to solve a certain problem, such as giving a model, using some special techniques, and so on.
Our work aims at the unification of all kinds of models and techniques. 2) We propose many useful suggestions on how to build a TN system. Usually, before developing a TN system, some preparations should be made: for example, which policy language will be selected? How should negotiation efficiency be improved? Previous work did not deal with such problems. 3) We present a new way to improve TN's security level. According to the specification of TCSEC (Trusted Computer System Evaluation Criteria) [6], TN has a low security level. If TN is formalized by a certain policy language, its security level can be improved. The rest of this paper is organized as follows. Section 2 describes the related work, including a TN project survey and information protection techniques. Section 3 is the main part of the paper; it describes the framework in detail. Section 4 proposes useful suggestions on how to build and develop a TN system. Section 5 concludes the paper and outlines future work.
2 Related Work

Our work is originally motivated by the existing automated trust negotiation research [3][4][5], whose goal is to enable trust establishment between strangers in a decentralized or open environment, such as the Internet. Our work focuses on analyzing TN, including the concepts, the components, and the relationships between the concepts. To make this clear, existing TN projects and systems should be surveyed. Meanwhile, how to protect sensitive information or personal privacy is still a hot issue. Based on this, in this part, we mainly investigate TN projects/systems and information protection. Among the existing TN systems, TrustBuilder [7][8] and KeyNote [7][9][10] are well known. Some similar systems include PolicyMaker [11], SD3 [12], PRUNES [13], and Trust-X [14]. Here, we mainly discuss TrustBuilder and KeyNote. TrustBuilder represents one of the most significant proposals in the negotiation research area. It provides a broad class of negotiation strategies, as well as a strategy- and language-independent negotiation protocol that ensures the interoperability of the defined strategies within its architecture. Each participant in a negotiation has an associated security agent that manages the negotiation. During a negotiation, the security agent uses a local negotiation strategy to determine which local resources to disclose next and whether to accept new disclosures from other parties. In KeyNote, the predicates are written in a simple notation based on C-like expressions and regular expressions, and the assertions always return a Boolean (accepted or denied) answer. Moreover, credential signature verification is built into the KeyNote system. Assertion syntax is based on a human-readable "RFC-822"-style syntax, and the trusted actions in KeyNote are described by simple attribute/value pairs. As far as information protection is concerned, some typical and influential techniques are the following. W.H. Winsborough and N. Li [4] presented the ACK policy to control the disclosure of credentials and policies, and developed the TTG protocol to construct a policy tree so that it is easy to detect whether the credentials match the policies or not. The ACK policy is useful to protect sensitive information, and the TTG protocol enables two parties to do joint chain discovery in an interactive manner, as well as to use policies to protect sensitive credentials and the attribute information contained in the credentials. However, the ACK policy and TTG are limited in application because of the difficulty of constructing them. N. Li et al. [15] proposed the OSBE protocol to prevent important information from leaking and to avoid the negotiation being attacked. OSBE bases its idea on digital signatures and checks the message's integrity so as to detect whether the negotiator has the right signature. OSBE protects messages from unauthorized access, but it is heavyweight to build, and the signature computation has a great cost. Holt et al. introduced hidden credentials in [16]. They gave a formal description of hidden credentials, including the concepts of credential and policy indistinguishability, and showed how to build them using IBE. Their work also gave some compelling examples of the utility of hidden credentials. In short, they provided a good model for trust negotiation to implement hidden credentials. Based on this, Robert W. et al. [17] used hidden credentials to conceal complex policies, and Keith F. et al. [18] used them
to hide access control policies. Since the hidden credential system cannot prevent invalid inference, its implementation is limited to some degree. J. Li and N. Li proposed the notions of OACerts and OCBE in [19]. The OCBE protocol adopts the idea of zero-knowledge proofs and ensures that the recipient gets the resource if and only if he is the specified one; otherwise he gets nothing. However, the method does not discuss how to guarantee the security of messages during transmission over the insecure Internet.
3 Framework for Expressing Trust Negotiation

In this section, we describe the TN framework in detail. First, the basic concepts and elements are listed. Then, the framework is introduced. Finally, the corresponding expressions are given.

3.1 Basic Concepts and Elements

TN is performed by the exchange of credentials and access control policies. In this part, we give the basic concepts of TN and put forward their corresponding formalized descriptions as follows:
• Access control policy (ACP): the constraint on accessing the resource, which indicates what credentials the accessor should prepare and submit. In TN, an access control policy has two types: compound policy and meta policy. A meta policy is the minimal element composing a compound policy. To realize it, a set of conjunctive predicates is required, such as <∧, ∨, ¬> and so on.
• Access request (AR): a structural message to tell the resource provider that the resource is expected to be accessed. An access request should at least include: 1) the accessor, 2) the requested resource, 3) the requested operation.
• Certificate: identifies a person's identity or attribute information, such as principal or public key. In previous TN, the certificate was the main tool to carry a person's information. Currently, as a person's privacy is a concern, certificates are hardly disclosed and transferred any more.
• Credential: a kind of temporary certificate whose content comes from the user's certificate. In other words, a certificate contains many credentials. When credentials are used in TN, the user's certificate need not be released.
• Negotiation strategy (NS): a mechanism to specify how to disclose access control policies and credentials. There are two extreme negotiation strategies: the eager strategy and the parsimonious strategy. The former discloses zealously without being requested, while the latter releases nothing until a request is received.
• Negotiation protocol (NP): a rule to specify what actions are taken during the negotiation process, including authentication, delegation, authorization, communication mode (TCP/UDP), encryption mode (symmetric or asymmetric), encryption function, and so on. For example, are transmitted messages transferred as plaintext or ciphertext? How are the encryption functions selected?
Accordingly, the basic elements of TN are as follows:
• ACP: access control policy set. ACP = {acp1, acp2, …, acpn}, where acpi (1 ≤ i ≤ n) is a meta policy. Usually, an access control policy can be depicted as ACP = f(acp1, acp2, …, acpn), where f is a compound function built from the specified conjunctive predicates.
• AR: access request set. AR = {ar1, ar2, …, arn}, where ari (1 ≤ i ≤ n) is a specific access request towards a certain resource.
• C: credential set. C = {c1, c2, …, cn}, where ci (1 ≤ i ≤ n) is a credential. Every credential carries only one kind of attribute information.
• R: the resource which can be accessed. In TN, R stands for the available resource. It can be the resource itself, or the access entry. When R is disclosed, the trust relationship is established and the negotiation process ends.
• SEQ: disclosure sequence set, used to store the disclosed access requests, access control policies, and credentials. SEQ = SEQAR ∪ SEQACP ∪ SEQC ∪ R, where SEQAR (resp. SEQACP, SEQC) denotes the disclosed access request (resp. access control policy, credential) sequence. When R ∈ SEQ, the negotiation succeeds.
• NS: negotiation strategy set. NS = {ns1, ns2, …, nsn}, where nsi (1 ≤ i ≤ n) is a selected negotiation strategy. In practice, new negotiation strategies may be developed according to the application.
• NP: negotiation protocol set. NP = {np1, np2, …, npn}, where npi (1 ≤ i ≤ n) is a certain negotiation protocol specifying how to exchange information during the negotiation process.
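For illustration only (the names and types below are ours, not part of the framework), these elements can be modelled directly: a credential carries a single attribute, an access request carries its three mandatory fields, and the negotiation succeeds once R enters the disclosure sequence SEQ:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Credential:
    attribute: str        # every credential carries exactly one attribute
    value: str

@dataclass(frozen=True)
class AccessRequest:
    accessor: str         # the three mandatory fields of an AR
    resource: str
    operation: str

@dataclass
class Negotiation:
    seq: list = field(default_factory=list)   # the disclosure sequence SEQ

    def disclose(self, item):
        """Append a disclosed request, policy, credential, or the resource R."""
        self.seq.append(item)

    def succeeded(self, resource) -> bool:
        return resource in self.seq           # R ∈ SEQ ends the negotiation
```

SEQ doubles as the negotiation log: replaying it reproduces both parties' disclosure actions, which is exactly the role it plays in Section 3.3.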
Fig. 1. The relationship between AR and ACP
3.2 Framework for Expressing Trust Negotiation

In our framework, TN can be described as follows:

TNparameter = <ATN/TN, AR, ACP, C, NS, NP> (1)

In (1), TNparameter denotes the parameters to deploy a TN system. When TNparameter is confirmed, the work mode of TN is established. ATN/TN stands for the negotiation type: when ATN is selected, it is an ATN system. AR (resp. ACP, C, NS, NP) is the corresponding access request (resp. access control policy, credential, negotiation strategy, negotiation protocol) set. There are many internal relationships and constraints among TN's parameters.
1) The relationship between AR and ACP. In TN, when the resource provider receives an access request ar, he analyzes it and checks whether it is valid and effective. If ar passes the examination, the resource provider returns a related access control policy set Policy (including one or more meta policies) as a response; otherwise, ar is denied. Generally, every request should have a corresponding policy set, i.e., ∀ar∈AR, ∃Policy⊆ACP ⇒ Correlate(ar, Policy), in which the function Correlate() is a binary logic predicate. Meanwhile, multiple requests may correspond to the same policy set; this is caused by the access granularity. In short, the relationship between AR and ACP is shown in Fig. 1.
Fig. 2. The relationship between ACP and C
2) The relationship between ACP and C. In TN, credentials are submitted to meet the requirements of a compound policy. To avoid ambiguity, each credential can only satisfy one meta policy. That is to say, ∀acp∈ACP, ∃c∈C ⇒ Match(acp, c) and ∀c∈C, ∃acp∈ACP ⇒ Match(acp, c), in which the function Match() is also a binary logic predicate. The relationship between ACP and C is shown in Fig. 2.

3) The relationship between NS and NP. In TN, the negotiation strategy decides the disclosure mode. Different negotiation strategies have different requirements towards the system, network, user, resource provider and negotiation protocol. For example, if the parsimonious strategy is adopted, the negotiation needs a tight coupling communication, which causes a great network burden and places a strict requirement on the negotiation protocol. Accordingly, the negotiation protocol decides the negotiation process. Usually, the negotiation protocol and negotiation strategy should cooperate with each other; otherwise, the negotiation easily fails. The relationship between NS and NP is shown in Fig. 3.
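As an illustration, the Match() predicate and a compound policy f can be sketched in a few lines of Python. This is our own simplification: f is assumed to be a pure conjunction, and each meta policy compares a single attribute with a '≥' predicate:

```python
def match(meta_policy: dict, credential: dict) -> bool:
    """Match(acp, c): a credential satisfies a meta policy when its item
    matches and its value passes the policy's predicate (here only '>=')."""
    return (credential["item"] == meta_policy["item"]
            and credential["value"] >= meta_policy["value"])

def satisfied(compound_policy: list, credentials: list) -> bool:
    """A compound policy f(acp1, ..., acpn), taken here as a conjunction:
    every meta policy must be matched by some disclosed credential."""
    return all(any(match(acp, c) for c in credentials)
               for acp in compound_policy)
```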
A Formal Framework for Expressing Trust Negotiation
41
Fig. 3. The relationship between NS and NP
3.3 Trust Negotiation Expression

From the above, we can see that the state of TN can be depicted as:

STN = SX + SY (2)
SX = NS × NP (3)
SY = AR × ACP × C (4)
SX denotes a relatively static state of TN. When the negotiation starts, NS and NP are ascertained; once SX is fixed, the disclosure sequence SEQ is largely determined. In short, SX decides the qualitative change of SEQ. SY stands for a relatively dynamic state of TN. SY records the whole negotiation process, which can be used to track both negotiators' actions; it is also important information for composing the log file. If the negotiation fails, SY can be used to produce feedback. In short, SY decides the quantitative change of SEQ. Taking the negotiation protocol into account, NP can be depicted as:

NP = ⟨A1, D, A2, PL/CI, TCP/UDP, S/A⟩ (5)
In (5), A1 is authentication. In TN, authentication is an important means to verify a user's identity. Usually, the authenticating process ensures that the accessor holds the private key matching his certificate. Let p and l be a user's principal (also called public key) and license (also called private key). The authenticating process can be: 1) the server randomly picks a string str, encrypts it with the user's principal p and passes it to the user; 2) the user decrypts it and returns it through a secure VPN (virtual private network) channel; 3) the server compares the original string with the received string and decides the authentication result. Since setting up a VPN channel is costly, authentication can instead be realized by mutual encryption over an insecure channel such as the Internet, without VPN. Based on this, we present a mutual authentication scheme shown in Fig. 4.
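The three-step challenge–response above can be sketched with textbook RSA. The tiny hard-coded key pair below is purely illustrative; a real deployment would use a cryptographic library with proper padding:

```python
import secrets

# Toy textbook-RSA key pair (illustrative only).
p, q = 61, 53
n = p * q    # modulus 3233
e = 17       # public exponent  -> the principal p in the text
d = 2753     # private exponent -> the license l in the text

def server_challenge():
    """Step 1: the server picks a random value and encrypts it with the
    user's principal (public key)."""
    m = secrets.randbelow(n - 2) + 1
    return m, pow(m, e, n)

def user_respond(ciphertext):
    """Step 2: the user recovers the value with the license (private key)."""
    return pow(ciphertext, d, n)

# Step 3: the server compares the original value with the response.
original, ct = server_challenge()
assert user_respond(ct) == original
```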
Fig. 4. Mutual authentication for TN
In short, authentication can be depicted as:

A1 = Principal × License (6)
In (5), D is delegation. In TN, delegation is to endow others with one's privileges. To finish a delegation, two users (say A and B) should: 1) perform the authentication of A and B; 2) perform the delegation of B towards A, to let A own B's privileges. Based on this, the delegation can be depicted as:

DB→A = A1A × A1B × PrivilegeB (7)
In (5), A2 is authorization. In TN, authorization is to let the accessor have the specified privileges to operate on the resource. In short, authorization can be depicted as:

A2 = Principal × License × PrivilegeR (8)
In (5), PL/CI denotes the message encapsulation means: PL is plaintext, while CI is ciphertext. TCP/UDP denotes the communication means: TCP stands for tight coupling communication (when NS is the parsimonious strategy, TCP must be selected), while UDP stands for loose coupling communication. S/A denotes the encryption means: S stands for symmetric encryption, while A stands for asymmetric encryption. When S is selected, the relevant algorithms could be AES [20], Rijndael [21] and so on. Correspondingly, when A is selected, the relevant algorithms could be RSA [22], ECC [23], etc.
4 Guideline for Building a TN System

Theoretical research is useful for guiding teams in building practical systems. Before building a TN system, several questions should be answered.
1) Application type. The team should make clear what services the TN system will provide. What are the security requirements? Should the online service be highly available or have high computing power? How is QoS to be ensured? When these problems are clear, the team can make good preparations for the development.

2) Policy language. Not every language can serve TN; TN places many requirements [3][7] on policy languages. The selection of the policy language should be adapted to the application type.

3) Specification of AR, ACP and C. To attain the goals of high efficiency and least human participation, TN should be as intelligent as possible. To realize this, TN should improve the readability of access requests, access control policies and credentials. So before developing TN, the formats of AR, ACP and C should be ascertained. This helps to perform all kinds of operations. An example of the formats of AR, ACP and C can be:

AR = (owner):(resource):(operation):(period) (9)
ACP = (owner):(recipient):(item):(predicate):(value):(period) (10)
C = (owner):(recipient):(item):(value):(period) (11)
In (9), owner denotes the holder of the AR, usually the principal of the user; resource denotes the requested resource or service; operation denotes the requested operation on the resource; and period denotes the valid lifetime of the AR. In (10), owner denotes the holder of the ACP; recipient denotes the receiver of the ACP; item denotes the attribute content, such as role, age, name, etc.; predicate denotes the comparison symbol, predicate ∈ {>, <, ≥, ≤, =, ≠, ∈, ∉, …}; value denotes the comparison value, which may be a threshold; and period denotes the valid lifetime of the ACP. In (11), owner denotes the holder of the C; recipient denotes the receiver of the C; item denotes the attribute content; and value denotes the attribute value. Note that when the period of an AR, ACP or C is specified, all these digital tokens are temporary and automatically become invalid. In this respect, storing access requests, access control policies and credentials brings little overhead.
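To illustrate, the colon-delimited formats (9) and (11) can be parsed and checked for expiry as follows. Representing period as an ISO end date is our assumption; the field names simply mirror the definitions above:

```python
def parse_ar(text: str) -> dict:
    """Parse an access request in the format of (9):
    AR = (owner):(resource):(operation):(period)"""
    owner, resource, operation, period = text.split(":")
    return {"owner": owner, "resource": resource,
            "operation": operation, "period": period}

def parse_credential(text: str) -> dict:
    """Parse a credential in the format of (11):
    C = (owner):(recipient):(item):(value):(period)"""
    owner, recipient, item, value, period = text.split(":")
    return {"owner": owner, "recipient": recipient,
            "item": item, "value": value, "period": period}

def is_expired(record: dict, today: str) -> bool:
    """A request, policy or credential automatically becomes invalid once
    its period (taken here as an ISO end date, e.g. '2008-12-31') passes."""
    return today > record["period"]
```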
5 Conclusions and Future Work

As a new kind of access control mechanism, TN can establish trust relationships through the exchange of access control policies and digital credentials in ubiquitous computing environments. In this paper, we present a mathematical framework for expressing trust negotiation. The framework specifies the basic concepts, elements and semantics of TN. By analyzing TN, we propose some suggestions on how to build a practical TN system. Since TN still has a low security level, as a next step we will pay more attention to how to combine our proposed framework with strong security mechanisms, such as Public Key Infrastructure (PKI).
References

[1] Xin, L.L., Min, C.W., Lian, H.S.: Realizing Mandatory Access Control in Role-Based Security System. Journal of Software 11(10), 1320–1325 (2000) (in Chinese with English abstract)
[2] Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-based access control models. IEEE Computer 29(2), 38–47 (1996)
[3] Liao, Z.S., Jin, H., Li, C.S., Zou, D.Q.: Automated trust negotiation and its development trend. Journal of Software 17(9), 1933–1948 (2006) (in Chinese with English abstract)
[4] Winsborough, W.H., Li, N.: Towards practical automated trust negotiation. In: Proceedings of the 3rd International Workshop on Policies for Distributed Systems and Networks, pp. 92–103. IEEE Computer Society Press, Los Alamitos (2002)
[5] Jin, H., Liao, Z.S., Zou, D.Q., Qiang, W.Z.: A new approach to hide policy for automated trust negotiation. In: Yoshiura, H., Sakurai, K., Rannenberg, K., Murayama, Y., Kawamura, S.-i. (eds.) IWSEC 2006. LNCS, vol. 4266, pp. 168–178. Springer, Heidelberg (2006)
[6] Trusted Computer System Evaluation Criteria. US Department of Defense, CSC-STD-001-83 (1983)
[7] Seamons, K.E., Winslett, M., Yu, T., Smith, B., Child, E., Jacobson, J., Mills, H., Yu, L.: Requirements for Policy Languages for Trust Negotiation. In: Proceedings of the 3rd IEEE International Workshop on Policies for Distributed Systems and Networks, pp. 68–79. IEEE Computer Society Press, Los Alamitos (2002)
[8] The TrustBuilder Project, http://isrl.cs.byu.edu/
[9] Blaze, M., Feigenbaum, J., Lacy, J.: Decentralized Trust Management. In: Proceedings of the 17th IEEE Symposium on Security and Privacy, pp. 164–173. IEEE CS Press, Los Alamitos (1996)
[10] Blaze, M., Feigenbaum, J., Ioannidis, J., Keromytis, A.D.: The KeyNote Trust-Management System (Version 2). IETF RFC 2704 (September 1999)
[11] Blaze, M., Feigenbaum, J., Strauss, M.: Compliance Checking in the PolicyMaker Trust Management System. In: Proceedings of the 2nd Financial Crypto Conference, pp. 205–216. IEEE Press, Los Alamitos (1998)
[12] Jim, T.: SD3: A trust management system with certificate evaluation. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 106–115. IEEE CS Press, Los Alamitos (2001)
[13] Yu, T., Ma, X., Winslett, M.: PRUNES: An Efficient and Complete Strategy for Automated Trust Negotiation over the Internet. In: Proceedings of the 2000 ACM Conference on Computer and Communications Security, pp. 88–97. ACM Press, New York (2000)
[14] Bertino, E., Ferrari, E., Squicciarini, A.: Trust-X: A peer-to-peer framework for trust negotiations. IEEE Transactions on Knowledge and Data Engineering, pp. 132–138. IEEE CS Press, Los Alamitos (2004)
[15] Li, N., Du, W., Boneh, D.: Oblivious signature-based envelope. In: Proceedings of the 22nd ACM Symposium on Principles of Distributed Computing, pp. 182–189. ACM Press, New York (2003)
[16] Holt, J.E., Bradshaw, R., Seamons, K.E., Orman, H.: Hidden credentials. In: Proceedings of the 2nd ACM Workshop on Privacy in the Electronic Society, pp. 1–8. ACM Press, New York (2003)
[17] Bradshaw, R.W., Holt, J.E., Seamons, K.E.: Concealing Complex Policies with Hidden Credentials. In: Proceedings of the 4th ACM Conference on Computer and Communications Security, pp. 245–253. ACM Press, New York (2004)
[18] Frikken, K., Atallah, M., Li, J.: Hidden Access Control Policies with Hidden Credentials. In: Proceedings of the 3rd ACM Workshop on Privacy in the Electronic Society, pp. 130–131. ACM Press, New York (2004)
[19] Li, J., Li, N.: OACerts: Oblivious Attribute Certificates. In: Proceedings of the 3rd Conference on Applied Cryptography and Network Security, pp. 108–121. ACM Press, New York (2003)
[20] Bloemer, J., Seifert, J.P.: Fault Based Cryptanalysis of the Advanced Encryption Standard (AES). In: Wright, R.N. (ed.) FC 2003. LNCS, vol. 2742, pp. 162–181. Springer, Heidelberg (2003)
[21] Ferguson, N., Kelsey, J., et al.: Improved Cryptanalysis of Rijndael. In: Proceedings of the 7th International Workshop on Fast Software Encryption, vol. 1987, pp. 136–141. Springer, Heidelberg (2001)
[22] Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public key cryptosystems. Communications of the ACM, 120–126 (1978)
[23] Gura, N., Eberle, H., Shantz, S.C.: Generic implementations of elliptic curve cryptography using partial reduction. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, pp. 177–189. ACM Press, New York (2002)
Pervasive Services on the Move: Smart Service Diffusion on the OSGi Framework Davy Preuveneers and Yolande Berbers Department of Computer Science, K.U. Leuven Celestijnenlaan 200A, B-3001 Leuven, Belgium {davy.preuveneers, yolande.berbers}@cs.kuleuven.be http://www.cs.kuleuven.be
Abstract. The ubiquity of wireless ad hoc networks and the benefits of loosely coupled services have fostered a growing interest in service oriented architectures for mobile and pervasive computing. Many architectures have been proposed that implement context-sensitive service discovery, selection and composition, or that use a component-based software engineering methodology to facilitate runtime adaptation to changing circumstances. This paper explores live service mobility in pervasive computing environments as a way to mitigate the risk of disconnections during service provision in mobile ad hoc networks. It focuses on context-aware service migration and diffusion to multiple hosts to increase accessibility and expedite human interaction with the service. We analyze the basic requirements for service mobility and discuss an implementation on top of OSGi. Finally, we evaluate our approach to service mobility and illustrate its effectiveness by means of a real-life scenario.
1 Introduction
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 46–60, 2008. © Springer-Verlag Berlin Heidelberg 2008

The late Mark Weiser [1] predicted that the next wave in the era of computing would be outside the realm of the traditional desktop. Everything of value would be on the network in one form or another with smart interacting objects adapting to the current situation without any human involvement. Ubiquitous computing will become the next logical step to mobile computing where people are already connected anytime and anywhere. The growing presence of WiFi and 3G wireless Internet access and sensor network technologies will give rise to this new paradigm, in which information and intelligent services are invisibly embedded in the environment around us. Service-oriented computing (SOC) [2] is a key enabler of Weiser's vision. It represents the current state of the art in software architecture [3] that utilizes services as fundamental building blocks for the rapid development and deployment of applications. It relies on a service-oriented architecture (SOA) to organize loosely coupled services with functions to bind them and manage their life-cycle, such as the deployment and updating of services. These key features are crucial for service interaction in a ubiquitous computing environment. By breaking up an application into a configuration of independent services with well-defined
interfaces, SOA enables the creation of new applications with the required flexibility to customize services to changing demands and different contexts. The focus of this paper is on how we can improve the availability and accessibility of services in a highly dynamic ad hoc network where intermittent network connectivity can disrupt an application when multiple services on mobile and stationary devices are temporarily coupled to behave as one single application. We propose to replicate the same service and its state at runtime (i.e. live mobility of services) on multiple devices near the user that can provide a guaranteed quality of service (QoS), with support for state synchronization between the replicated services. This way, we increase the number of opportunities a user has to interact with a service, and we can better deal with volatile network connections by handing over to a replica if the connection between two services breaks down. As such, we say that the same service has diffused to multiple hosts. In order to keep this diffusion approach scalable, the targets selected for diffusion need to be well-chosen. That is why context-awareness [4] is a necessity for service migration and diffusion in the ubiquitous computing paradigm. It addresses the inner characteristics of the services by collecting relevant information about the service requirements and the devices in the vicinity. It helps to select appropriate targets for service mobility, coordinates synchronization and access to services deployed on these devices, and learns for each service the devices the user will most likely interact with. The applicability of smart service diffusion in a distributed setting will be illustrated on top of the OSGi framework [5]. The paper is structured as follows. In section 2 we list the requirements to enable seamless service migration and diffusion. Section 3 provides details on how we realized these requirements on top of the OSGi framework.
In section 4 we conduct experiments that illustrate the effectiveness of service diffusion in a real-life scenario. We measure the overhead of state transfer and synchronization as well as any benefit in improved availability and accessibility. Section 5 provides an overview of related work on distribution and mobility of services. We end with conclusions and future work in section 6.
2 Requirements for Service Migration and Diffusion
Pervasive services offer a certain functionality to nearby users. They are accessed in an anytime-anywhere fashion and deployed on various kinds of mobile and stationary devices. When users or devices become mobile, proactive live service mobility to multiple hosts can provide a solution to the increased risk of disconnecting remote services. In this section we review several non-functional concerns and requirements that need to be fulfilled to ensure that the migration and diffusion of a service in a mobile and pervasive setting can be accomplished.

2.1 Device, Service and Resource Awareness
In a pervasive services environment, the multi-user and multi-computer interaction causes a competition for resources on shared devices. Therefore, knowledge
about the presence, type and context of the devices (including resource-awareness about the maximum availability and current usage of processing power, memory, network bandwidth, battery lifetime, etc.) is a prerequisite to guarantee a minimum usability and quality of service. Before relocating a service to another host, a service discovery protocol (SDP) can be used to verify that the service is not already available [6].

2.2 Explicit Service State Representation and Safe Check-Pointing
Stateless services can be replaced with similar ones or redeployed on another host at runtime without further ado, as long as the syntax and semantics of the interfaces remain the same such that the binding can be reestablished. Stateful services require a state transfer after redeployment before they can continue their operations on a different host. Furthermore, the service must be able to resume from the state it acquired. Therefore, services should model the properties that characterize their state and make sure that check-points of their state are consistent [7,8].

2.3 State Synchronization and Support for Disconnected Operation
Due to possible wireless network disruptions in the mobile setting of the user, complete network transparency of remote service invocations can have a detrimental effect on service availability and accessibility. The service platform should provide support to deal with intermittent network connectivity in order to recover from temporary failures in network connectivity between two services. With handover to replicated services whose states are synchronized, the effect of network disruptions can be reduced. Moreover, with discrete state synchronization, the requirement for continuous network connectivity can be mitigated.

2.4 Enhanced Service Life Cycle Management
The service platform must provide functions to manage the life-cycle of services, such as deployment, withdrawal, starting, stopping and updating. If a service cannot run locally because of a lack of resources, it needs to be moved to a remote, more powerful device in the vicinity of the user. The service life cycle management should move services to new hosts and replicate the state if necessary. We propose two kinds of service mobility (see Fig. 1):

– Service Migration: Once the replicated service has moved to a new host, the original one is stopped and uninstalled. This is the typical behavior of 'follow-me' applications that move along with the user.
– Service Diffusion: In this adaptation, the original service becomes active again once the state has been extracted. New replicas acquire the service state and get activated. After activation, service states are synchronized.

Of course, a combination where a service is replicated on multiple hosts but the original is uninstalled is also possible. Also note that plain service migration does not require any state synchronization at all, but then handing over to a service replica is no longer possible when trying to recover from a network failure.
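The two kinds of mobility can be contrasted in a small sketch. Host and Service below are minimal stand-ins of our own, not the OSGi classes discussed later in the paper:

```python
class Service:
    """Minimal stand-in for a deployable, stateful service."""
    def __init__(self, bundle, state=None):
        self.bundle = bundle
        self.state = dict(state or {})
        self.active = False

class Host:
    """Minimal stand-in for a device that can host services."""
    def __init__(self, name):
        self.name = name
        self.services = []

    def deploy(self, service):
        replica = Service(service.bundle)
        self.services.append(replica)
        return replica

def migrate(service, source, target):
    """Service migration: redeploy with the extracted state on the target,
    then stop and uninstall the original ('follow-me' behaviour)."""
    replica = target.deploy(service)
    replica.state = dict(service.state)
    replica.active = True
    source.services.remove(service)
    return replica

def diffuse(service, targets):
    """Service diffusion: replicas acquire a copy of the state and are
    activated, while the original service becomes active again."""
    replicas = [t.deploy(service) for t in targets]
    for replica in replicas:
        replica.state = dict(service.state)
        replica.active = True
    service.active = True
    return replicas
```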
Fig. 1. Live migration and diffusion of services. Service migration completely relocates the service to another host, while service diffusion redeploys the same service on other hosts and ensures service state replication and synchronization.
As the effect of network disruptions cannot be mitigated completely, we must also clean up stale replicated services. This problem is similar to distributed garbage collection where heartbeats or lease timeouts are used to detect disconnections and recycle memory. Similar algorithms can be reused to clean up these services.
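A lease-timeout reaper of the kind alluded to can be sketched as follows; the lease length and replica identifiers are illustrative:

```python
import time

class ReplicaRegistry:
    """Track replicas by last heartbeat; reap those whose lease expired."""
    def __init__(self, lease_seconds):
        self.lease = lease_seconds
        self.last_seen = {}

    def heartbeat(self, replica_id, now=None):
        """A replica renews its lease by sending a heartbeat."""
        self.last_seen[replica_id] = now if now is not None else time.time()

    def reap(self, now=None):
        """Remove (and return) all replicas whose lease has run out, so
        their resources can be recycled."""
        now = now if now is not None else time.time()
        stale = [r for r, t in self.last_seen.items() if now - t > self.lease]
        for r in stale:
            del self.last_seen[r]
        return stale
```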
3 Context-Aware Service Mobility on the OSGi Framework
In this section we discuss how the previous requirements have been implemented on top of the OSGi framework [5]. This lightweight service oriented platform has been chosen for its flexibility to build applications as services in a modularized form with Java technology. OSGi already offers certain functionalities that are needed to implement the service migration and diffusion requirements. OSGi is known for its service dependency management facilities and its ability to deal with dynamic availability of services. Moreover, OSGi can be deployed on a wide range of devices, from sensor nodes, home appliances and vehicles to high-end servers, and allows the collaboration of many small components, called bundles, on local or remote computers. As such, OSGi is a viable platform for service orientation in a ubiquitous computing environment.

3.1 Extending Service Descriptions with Deployment Constraints
Services in OSGi are published as a Java class or interface along with a set of service properties. These service properties are key-value pairs that help service requesters differentiate between service providers that offer services with the same service interface. Service providers and requesters are packaged into an OSGi bundle, i.e. a JAR file with a manifest file that contains
information about the bundle, such as version numbers and service dependencies. Service descriptions are stored in the service registry of OSGi. Service requesters can discover and bind to a service implementation by actively querying for the service or by subscribing to a notification mechanism in order to receive events when changes in the service registry occur. A recent addition to the OSGi R4 framework that simplifies the registration of POJOs (plain old Java objects) as services and the handling of service dependencies is the Declarative Services (DS) specification [5]. Dependency information that is currently not mentioned in the service descriptor deals with non-functional properties such as hardware, software and resource constraints. For example, one OSGi bundle could be successfully deployed on a J2ME CDC Foundation Profile, while another could need at least a J2ME CDC Personal Profile or a J2SE virtual machine because it uses AWT for a graphical user interface. Moreover, resource (memory, processing power, storage) and hardware (screen, audio, keyboard) dependencies need to be specified as well. The current key-value pair format for service properties is too limited to describe these complex constraints. As discussed in our previous work [9,10,11], ontologies provide a convenient, richer specification format to describe and discover pervasive services with support for QoS and context-awareness. We therefore opted to add a 'deployment' entry into the Declarative Service descriptor in which we refer to an ontology that is included in the JAR file of the OSGi bundle:

<service>
  <provide interface="communication.ChatClient" />
  <deployment>
    <require name="resources1" class="descriptor.owl#MemoryDependency" />
    <require name="software1" class="descriptor.owl#JavaVMDependency" />
    <require name="hardware1" class="descriptor.owl#DisplayDependency" />
    <require name="hardware2" class="descriptor.owl#KeyboardDependency" />
  </deployment>
</service>
It models the above dependencies as class restrictions on concepts defined in our context ontologies.¹ A device should comply with these constraints if the service is to be deployed successfully. For example:

[OWL class restriction: a memory resource instance of class RAM (or a subclass) whose currAvailable value is ≥ 98304]
¹ See http://www.cs.kuleuven.be/~davy/ontologies/2008/01/ for the latest revision of our context ontologies.
The previous constraint declares that a device should have at least one memory resource instance (of class RAM or one of its subclasses) with a property currAvailable whose value is larger than or equal to 98304. The type of the property is specified in the Hardware.owl ontology; for this value the unit is bytes.
[OWL class restriction: a Java virtual machine instance with a GUI rendering engine belonging to the JavaAWT class]

The above constraint declares that a device should have at least one Java virtual machine instance with a GUI rendering engine instance that belongs to the JavaAWT class. The main advantage of our context ontologies [9] is that complex semantic relationships only need to be specified once and can be reused by every service descriptor. Moreover, each device profile can specify its characteristics with the same ontologies. If a device specifies that it runs OSGi on top of JDK 1.6, then an AWT rendering engine dependency would semantically match, but a Java MIDP 2.0 LCDUI rendering engine would not. Resource and hardware constraints can be expressed in a similar way. The matching itself, to verify that a service can move to a particular host, is carried out by a context enabling service that is described in the following section.
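The matching step can be approximated in a few lines once the ontology has been flattened into a subclass table. This Python sketch mirrors the RAM example; the dictionaries are stand-ins of our own for the OWL reasoner actually used:

```python
def constraint_satisfied(constraint: dict, device: dict, subclasses: dict) -> bool:
    """Check one deployment constraint against a device profile: the device
    must have an instance of the required class (or a subclass of it) whose
    property value meets the lower bound."""
    allowed = {constraint["cls"]} | subclasses.get(constraint["cls"], set())
    return any(inst["cls"] in allowed and
               inst.get(constraint["prop"], 0) >= constraint["min"]
               for inst in device["instances"])

# Hypothetical class hierarchy and device profile, mirroring the RAM example.
subclasses = {"RAM": {"DDR2RAM"}}
device = {"instances": [{"cls": "DDR2RAM", "currAvailable": 131072}]}
memory_dep = {"cls": "RAM", "prop": "currAvailable", "min": 98304}
```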
3.2 Context-Awareness as an OSGi Distributed Declarative Service
In order to diffuse OSGi services in a way that takes into account the heterogeneity and dynamism of a ubiquitous computing environment, we need to be aware of the characteristics and capabilities of these devices, where they are located, what kind of services they offer and what resources are currently available. The COGITO service provides this information on each host. It gathers and utilizes context information to positively affect the provisioning of services, such as the personalization and redeployment of services tailored to the customer's needs. The core functions of the enabling service are provided as a set of OSGi bundles that can be subdivided into the following categories:

– Context Acquisition: These bundles monitor changing context and gather information from resource sensors, device profiles and other repositories within the system itself or on the network.
– Context Storage: A context repository ensures persistence of context information. It collects relevant local information and context that remote entities have published, in a way that processing can be handled efficiently without losing the semantics of the data.
Fig. 2. Building blocks of the COGITO enabling service
– Context Manipulation: These bundles reason on the context information to verify that context constraints are met. Besides a rule and matching engine that exploits semantic relationships between concepts [11], this category also provides adapters that transform context information into more suitable formats.

A high-level overview of the building blocks of the context-awareness enabling service is given in Figure 2. The advantage of decomposing the context-awareness framework into multiple OSGi bundles is that the modular approach saves resources when certain bundles are not required. COGITO is implemented as a distributed Declarative Service. As the Declarative Services specification does not cover the registration of remote services, each device announces its presence and that of its context enabling service with a service discovery protocol like UPnP, SLP or ZeroConf. Upon joining a network, every other node in the network creates a remote proxy to the enabling service of the joining node, and the Declarative Services bundle ensures lazy registration and activation of the proxy whenever remote context is acquired. When a node leaves the network, the proxies on the other nodes are deactivated. This approach to distributed Declarative Services simplifies the collection of context information on the network, but more importantly it also enables transparent sharing of computation-intensive context processing services (such as a rule or inference engine bundle) with resource-constrained devices.

3.3 Service State Representation, Extraction and Synchronization
In order to replicate stateful services on multiple hosts with support for state synchronization, we need to explicitly model the state variables of the service, extract them at runtime and send them to the replicated peers. In order to keep
state representation and exchange straightforward and lightweight, we encapsulate the state of a service in a JavaBean. JavaBeans have the benefit of (1) being able to encapsulate several objects in one bean that can be passed around to other nodes, (2) being serializable, and (3) providing get and set methods to automate the inspection and updating of the service state. The approach is quite similar to stateful session beans in the Enterprise JavaBeans specification [12]. Stateful session beans maintain the internal state of web applications and ensure that during a session a client will invoke method calls against the same bean in the remote container. In our approach, however, a JavaBean is situated locally within each replicated service and does not involve remote method calls. Instead, the contents of the JavaBean are synchronized with those of the replicated services. Our current implementation tries to synchronize as soon as the state of an application has changed. When network failures arise, the state updates are put in a queue and another attempt is carried out later on. The queue holds the revision number of the last state that was successfully exchanged with each of the replicated applications. If two or more replicated services independently continue without state synchronization, a state transfer conflict will occur when the involved hosts reconnect. In that case, a warning is shown and the user can choose to treat the services as separate applications, or have one service push its state to the others. This approach is rather crude, but acceptable as long as we are mainly dealing with single-user applications.
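The revision-number bookkeeping and conflict detection described above can be sketched as follows. This is a simplified two-party Python sketch of our own; the real implementation synchronizes serialized JavaBeans between OSGi nodes:

```python
class Replica:
    """One replicated service's state bean plus sync bookkeeping."""
    def __init__(self):
        self.state = {}
        self.revision = 0   # bumped on every local write
        self.synced = {}    # per peer: peer's revision at last successful sync

    def write(self, key, value):
        self.state[key] = value
        self.revision += 1

    def sync_with(self, name, peer, peer_name):
        """Exchange state with a peer. A conflict is reported when both
        sides advanced independently since the last successful exchange."""
        self_advanced = self.revision > peer.synced.get(name, 0)
        peer_advanced = peer.revision > self.synced.get(peer_name, 0)
        if self_advanced and peer_advanced:
            return "conflict"   # user decides: split apps, or one side pushes
        if peer_advanced:
            self.state = dict(peer.state)
        elif self_advanced:
            peer.state = dict(self.state)
        self.synced[peer_name] = peer.revision
        peer.synced[name] = self.revision
        return "ok"
```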
Fig. 3. New service states in the life-cycle of a migrating or diffusing service. Some of the transitions of the original service are shown in green, the ones of the replicated service(s) in red.
3.4 Extending the Life-Cycle Management to Relocate Services
In a ubiquitous computing environment where pervasive services are replicated, a user may switch from one replicated service to another. Therefore, we add extra information to the service state that helps to coordinate the synchronization of the state changes. For example, we add a revision number that is increased after each write operation on the bean and use Vector Clocks [13] among all peers to order the state updating events. Fig. 3 shows that an active service can not only be stopped, but that in another transition the state can be extracted from the service and synchronized, or that the service can continue to either migrate or replicate to a new host. We use a lease timeout algorithm to garbage collect stale replicated services in order to recycle resources. The timeouts are application dependent, but can be reconfigured by the user.
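The use of Vector Clocks [13] to order state-updating events among peers, and to detect the concurrent updates that constitute a conflict, can be sketched as:

```python
from collections import defaultdict

class VectorClock:
    """One counter per peer; used to order state-updating events and to
    detect concurrent (conflicting) updates among replicas."""
    def __init__(self):
        self.clock = defaultdict(int)

    def tick(self, node):
        """Increment this node's counter on a local state update."""
        self.clock[node] += 1

    def merge(self, other):
        """Take the element-wise maximum when receiving a remote update."""
        for node, count in other.clock.items():
            self.clock[node] = max(self.clock[node], count)

    def __le__(self, other):
        return all(count <= other.clock[node]
                   for node, count in self.clock.items())

def ordering(a, b):
    """Classify two events: 'before', 'after', 'equal' or 'concurrent'
    (the last one is the conflict case)."""
    if a <= b and not b <= a:
        return "before"
    if b <= a and not a <= b:
        return "after"
    if a <= b and b <= a:
        return "equal"
    return "concurrent"
```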
4 Experimental Evaluation
The usefulness and feasibility of dealing with context-aware service migration and diffusion are illustrated with three applications: (1) a grocery list application, (2) a Jabber instant messaging client, and (3) a Sudoku game. The devices used in the experiment include two PDAs, a laptop and a desktop, all connected to a local WiFi network. This setup simulates the real-life scenario described in the next section.

4.1 Scenario
The scenario goes as follows: It is Saturday morning and mom and dad go shopping. Everybody at home uses the grocery list application to make up a shared grocery list. In the past, mom sometimes forgot her grocery list at home, but now she only needs to take along her smartphone. The application simply diffuses along. Dad also has his PDA with him and runs the same replicated application. They decide to split up the work and go shopping separately. At the mall, each time mom or dad finds a product they need, it is removed from the shared sticky note application. They get each other's updates when the state of the application is synchronized. Mom is still checking out while dad is already at the car. As he is waiting, he has another go at the Sudoku puzzle that he was trying to solve on the home computer that morning. His daughter is online and starts a conversation. She wants to send him a picture, but the display of his PDA is too small. He takes his briefcase, turns on his laptop, the application migrates, and he continues the conversation there. Unfortunately, he forgot to charge his battery, so after a while his laptop gives up on him. "Back to the PDA but then without the picture", dad decides. In the meantime, mom has arrived and dad tells her about the picture. She will have a look at it at home once the application synchronizes with the desktop computer.
Fig. 4. The grocery list is automatically adapted to fit the new screen size if needed. After redeployment the states of all the replicated grocery lists are synchronized.
4.2 Experiments
In the experiment, all the applications are launched on the desktop PC. The Jabber instant messaging client and the Sudoku puzzle diffuse to one handheld, while the grocery list application diffuses to both PDAs. The application states are continuously synchronized between all hosts. Later on, the instant messaging application migrates from one PDA to the laptop and back (while synchronizing with the desktop PC). The mobility of the grocery list is illustrated in Fig. 4. Additionally, these three applications all make use of two extra general-purpose services: (i) a stateless audio-based notification service, and (ii) a stateful logging service. By deploying them only on the desktop computer and synchronizing them on the laptop, we ensure that these two services can only be reached from a PDA through a remote connection. The applications invoke them when (1) the grocery list changes, (2) the online status of somebody changes, or (3) the puzzle is solved. The bindings to the remote services are used to test handover to a replicated service after network disruptions. We let the setup run for 30 minutes, during which the following actions are simulated:
– Every 30 seconds the grocery list is modified and synchronized. This event also triggers an invocation to both remote services.
– Every 5 seconds the state of the Jabber client is synchronized. Every minute the online status of somebody in the contact list changes, and this triggers an invocation to the remote services.
– The Sudoku puzzle changes every 15 seconds and the 81 digits of the puzzle are synchronized. A puzzle is solved after 10 minutes, and this initiates an invocation to the remote services.
Table 1. Overhead results of state transfer, synchronization and handover

                                Grocery List    Jabber Client    Sudoku Puzzle
Bundle size                     23023 bytes     288235 bytes     54326 bytes
State size                      7024 bytes      18235 bytes      1856 bytes
State synchronization           53 Kbytes/min   112 Kbytes/min   37 Kbytes/min
Relocation time                 9 seconds       13 seconds       7 seconds
Synchronization delay           350 msec        518 msec         152 msec
Handover delay (audio service)  47 msec         61 msec          54 msec
Handover delay (log service)    213 msec        78 msec          167 msec
– Every 3 minutes we shut down a random network connection for 1 minute to verify that the replicated applications can recover from the missed state updates and that handover to one of the remote replicated services succeeds.

We measured the overhead of state transfer and synchronization and compared it to the resources the applications required. We also measured service availability and responsiveness via the delay of handing over to another remote replicated service after a network disconnection. The experiment was repeated 3 times and the averaged results are shown in Table 1.

4.3 Discussion of the Results
The test results show that the overhead of state transfer is small. This is mainly a consequence of the size of the applications: they are rather small, as they would otherwise not run on mobile devices. For data-intensive applications, such as multimedia applications, we expect the overhead to be considerably larger if data files have to be moved along as part of the state of the service. As the overhead was limited, we did not implement incremental state transfers, but for media applications this could be an improvement. The delays in state synchronization are low because the experiments were carried out on a local WiFi network. Other tests in multi-hop networks showed longer but still acceptable state synchronization delays (at most 5 times longer). More interesting, though, is the difference between service relocation time and state exchange time. Because the service is moved proactively in advance, obtaining an up-to-date replica through state synchronization is much faster than through service relocation. As such, service diffusion with state synchronization provides much better usability than plain service migration. Of course, this assumes the required bandwidth is available. The handover to the remote services worked in all cases because we avoided the case where all network connections to the remote replicated services are disrupted. In theory, our framework can handle this particular case if the remote service calls can be buffered and invoked asynchronously. However, our current
implementation does not support this. A noteworthy observation for the handover to the replicated services (in our experiment, the stateless audio service and the stateful log service) is that handover to another log service takes somewhat longer on average. This is because, for the audio service, we do not need to wait until the replicated service has acquired the latest revision of the service state. If state synchronization is taking place during handover, the handover delay increases accordingly. In our experiment, the remote services were only available on two hosts (the laptop and the desktop). If more devices hosted a replicated service, the decision to hand over to a particular device could depend on which replica is already fully synchronized.
5 Related Work
In recent years, many researchers have addressed the issues of state transfer, code mobility and service-oriented computing in ubiquitous computing environments. This research has greatly improved our knowledge of how context-aware service-oriented architectures can accommodate new functionality requirements in changing environments. Providing a detailed review of the state of the art in all of these domains is beyond the scope of this paper. Instead, we focus on the contributions that are most related to the work presented here. In [14], the OSGi platform was used to perform context discovery, acquisition, and interpretation, using an OWL ontology-based context model that leverages Semantic Web technology. This paper inspired us to use the OSGi platform for context-aware service orientation, and more specifically service mobility. In [15], the authors present an OSGi infrastructure for context-aware multimedia services in a smart home environment. Although the application domain that we target goes beyond multimedia and smart home environments, we envision that data-driven services like multimedia applications are good candidates to further evaluate our service mobility strategies. Other context-awareness systems leveraging the OSGi framework include [16,17]. Optimistic replication is a technique to achieve higher availability and performance in distributed data sharing systems. Its advantage is that it allows replicas to diverge. Although users can observe this divergence, the copies will eventually converge during a reconciliation phase. In [18], the authors provide a comprehensive survey of techniques developed to address divergence. Our current service replication and state synchronization are closer to traditional pessimistic replication, because user-perceived divergence might cause confusion.
As discussed in [19], we will further investigate how optimistic replication can improve the quality of service in mobile computing, while keeping the user perceived divergence at an acceptable level. Remote service invocation on the OSGi framework is an issue that has been addressed by several researchers and projects. The Eclipse Communication Framework [20] provides a flexible approach to deal with remote services and service discovery on top of the Equinox OSGi implementation. Remote OSGi
services can be called synchronously and asynchronously on top of various communication protocols. The R-OSGI [21] framework also deals with service distribution. While both projects address network transparency for OSGi services, neither of them deals with explicit service mobility. The Fluid Computing [22] and flowSGI [23] projects explicitly deal with state transfer and synchronization in a similar way to ours. They also discuss state synchronization and provide a more advanced state conflict resolution mechanism. Our contribution builds upon this approach and specifically focuses on service oriented applications. Our method also enhances the peer selection for service replication by using context information of the service and the device. This approach provides better guarantees that the application will actually work on the device.
6 Conclusions and Future Work
This paper presents a context-driven approach to live service mobility in pervasive computing environments. It focuses on service migration and diffusion to multiple hosts to increase accessibility and expedite human interaction with the service. We summarized the basic requirements for service mobility, including enhanced service descriptions, context-awareness to select appropriate targets for service migration, and life-cycle management support to carry out service relocation with state transfer and state synchronization. We discussed how we implemented these requirements on top of the OSGi framework and validated our prototype by means of several applications that are replicated in a small-scale network with enforced network failures. We studied the effects of service migration and service diffusion. Service migration moves an application, including its state, from one host to another, while service diffusion replicates the service and its state on multiple hosts. State synchronization ensures that each service can be used interchangeably. Experiments have shown that the overhead of state transfer and synchronization is limited for relatively small applications. The results illustrate that, if the available network bandwidth permits, the time to keep the state in sync is much smaller than the time to migrate a service from one host to another. This means that if a user wants to use another device because it provides a better quality of service (e.g., the bigger screen in the scenario), then proactive service diffusion provides much better usability (i.e., shorter delays). Future work will focus on better handling of state synchronization conflicts and on improving the state encapsulation approach to deal with incremental state transfers for data-intensive applications. However, our main concern will always be to keep the supporting infrastructure flexible and lightweight enough for low-end devices, with room left to actually run applications.
The outcome of this future work could be a better policy for deciding on how many remote devices a service should be replicated, and under which circumstances service migration would be a better approach than service diffusion.
Pervasive Services on the Move
59
References

1. Weiser, M.: The computer for the 21st century. Scientific American 265, 94–104 (1991)
2. Papazoglou, M.P., Georgakopoulos, D.: Service oriented computing. Commun. ACM 46, 24–28 (2003)
3. Shaw, M., Garlan, D.: Software architecture: perspectives on an emerging discipline. Prentice-Hall, Inc., Upper Saddle River (1996)
4. Dey, A.K.: Understanding and using context. Personal Ubiquitous Comput. 5, 4–7 (2001)
5. Open Services Gateway Initiative: OSGi Service Gateway Specification, Release 4.1 (2007)
6. Helal, S.: Standards for service discovery and delivery. Pervasive Computing, IEEE 1, 95–100 (2002)
7. Kramer, J., Magee, J.: The evolving philosophers problem: Dynamic change management. IEEE Trans. Softw. Eng. 16, 1293–1306 (1990)
8. Vandewoude, Y., Ebraert, P., Berbers, Y., D'Hondt, T.: An alternative to quiescence: Tranquility. In: ICSM 2006: Proceedings of the 22nd IEEE International Conference on Software Maintenance, Washington, DC, USA, pp. 73–82. IEEE Computer Society, Los Alamitos (2006)
9. Preuveneers, D., den Bergh, J.V., Wagelaar, D., Georges, A., Rigole, P., Clerckx, T., Berbers, Y., Coninx, K., Jonckers, V., Bosschere, K.D.: Towards an extensible context ontology for Ambient Intelligence. In: Markopoulos, P., Eggen, B., Aarts, E., Crowley, J.L. (eds.) EUSAI 2004. LNCS, vol. 3295, pp. 148–159. Springer, Heidelberg (2004)
10. Mokhtar, S.B., Preuveneers, D., Georgantas, N., Issarny, V., Berbers, Y.: EASY: Efficient SemAntic Service DiscoverY in Pervasive Computing Environments with QoS and Context Support. Journal of Systems and Software 81, 785–808 (2008)
11. Preuveneers, D., Berbers, Y.: Encoding semantic awareness in resource-constrained devices. IEEE Intelligent Systems 23, 26–33 (2008)
12. Burke, B., Monson-Haefel, R.: Enterprise JavaBeans 3.0, 5th edn. O'Reilly Media, Inc., Sebastopol (2006)
13. Mattern, F.: Virtual time and global states of distributed systems. In: Parallel and Distributed Algorithms: Proceedings of the International Workshop on Parallel and Distributed Algorithms. North-Holland (1989)
14. Gu, T., Pung, H.K., Zhang, D.Q.: Toward an OSGi-based infrastructure for context-aware applications. Pervasive Computing, IEEE 3, 66–74 (2004)
15. Yu, Z., Zhou, X., Yu, Z., Zhang, D., Chin, C.Y.: An OSGi-based infrastructure for context-aware multimedia services. Communications Magazine, IEEE 44, 136–142 (2006)
16. Lee, H., Park, J., Ko, E., Lee, J.: An agent-based context-aware system on handheld computers, pp. 229–230 (2006)
17. Kim, J.H., Yae, S.S., Ramakrishna, R.S.: Context-aware application framework based on open service gateway 5, 200–204 (2001)
18. Saito, Y., Shapiro, M.: Optimistic replication. ACM Comput. Surv. 37, 42–81 (2005)
19. Kuenning, G.H., Bagrodia, R., Guy, R.G., Popek, G.J., Reiher, P.L., Wang, A.I.: Measuring the quality of service of optimistic replication. In: ECOOP Workshops, pp. 319–320 (1998)
20. The Eclipse Foundation: Eclipse Communication Framework (2007), http://www.eclipse.org/ecf/
21. Rellermeyer, J.S., Alonso, G., Roscoe, T.: Building, deploying, and monitoring distributed applications with Eclipse and R-OSGi. In: Eclipse 2007: Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology eXchange, pp. 50–54. ACM, New York (2007)
22. Bourges-Waldegg, D., Duponchel, Y., Graf, M., Moser, M.: The fluid computing middleware: Bringing application fluidity to the mobile internet. In: SAINT 2005: Proceedings of the 2005 Symposium on Applications and the Internet, Washington, DC, USA, pp. 54–63. IEEE Computer Society, Los Alamitos (2005)
23. Rellermeyer, J.S.: flowSGI: A framework for dynamic fluid applications. Master's thesis, ETH Zurich (2006)
Robots in Smart Spaces - A Case Study of a u-Object Finder Prototype

Tomomi Kawashima1, Jianhua Ma1, Bernady O. Apduhan2, Runhe Huang1, and Qun Jin3

1 Hosei University, Tokyo 184-8584, Japan
[email protected], {jianhua, rhuang}@hosei.ac.jp
2 Kyushu Sangyo University, Fukuoka 813-8503, Japan
[email protected]
3 Waseda University, Saitama 359-1192, Japan
[email protected]
Abstract. A smart space is a physical spatial environment, such as a room, that can provide automatic responses according to the contextual information occurring in the environment. The context data is usually acquired via various devices which are installed somewhere in the environment and/or embedded into special physical objects, called u-objects, which are capable of computation and/or communication. However, the devices used in current research on smart spaces are either fixed or reside in real objects that are incapable of moving by themselves. Likewise, more and more robots have been produced, but most of them are designed and developed on the assumption that the space surrounding a robot is an ordinary one, i.e., a non-smart space. Therefore, it is necessary to study what additional services can be offered and what specific technical issues arise when adding robots to smart spaces. To identify potential novel services and technology problems, this research focuses on a case study of a u-object finding service performed by a robot in a smart space. This paper presents the design and development of the system prototype robot which can communicate with other devices and can find a u-object with an attached RFID tag in a smart room. Some preliminary experiments were conducted and the results verified the functionalities of the system.
1 Introduction

The essence of ubiquitous computing (ubicomp), as emphasized by Mark Weiser, is "enhancing computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user" [1, 2]. The synergy of ubiquitous computing and communications is the natural result of two fundamental technology trends: (1) the continuing miniaturization of electronic chips and electro-mechanical devices following Moore's law, and (2) their interconnection via networks, especially using wireless communications. Aside from the widespread use and popularity of wireless technology such as IEEE 802.11a/b/g [3] and Bluetooth [4], WPANs (Wireless Personal Area Networks) [5, 6] and high-speed wireless

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 61–74, 2008. © Springer-Verlag Berlin Heidelberg 2008
62
T. Kawashima et al.
networks using UWB (Ultra Wide Band) [7] will soon be available and will play important roles in short-range ubiquitous communications. Due to the continuing miniaturization and availability of high-speed communications, many devices can be distributed in the surrounding ambient environments and make ordinary environments capable of computing. That is, a real spatial environment such as a room may become smart and take automatic responses according to the contextual information [8, 9] occurring in this environment. The context data is usually acquired via various devices which are installed somewhere in the environment and/or embedded into special physical objects. Real physical objects are called u-objects [10], as opposed to virtual e-things, if they are attached, embedded or blended with computers, networks, sensors, actuators, IC-tags and/or other devices [11]. The smart space/environment, as one of the major ubicomp areas, has recently received much attention, and various smart space systems have been developed to create smart homes, smart offices, smart classrooms, etc. [12, 13]. However, the devices used in current smart space research are either fixed or reside in real objects that cannot move on their own. On the other hand, robotics is an exciting field that has been getting a great deal of attention for decades. Many robots have been produced and deployed in various applications such as games, entertainment, household chores, rescue, military, and manufacturing [14]. Robots can also be seen as special kinds of smart u-objects with partial human-like or animal-like behaviors. The most distinctive characteristic of robots is that they can move by themselves according to received instructions or perceptual information. However, most of them are designed and developed on the assumption that the space surrounding a robot is an ordinary one, i.e., a non-smart space [15, 16].
It is then interesting and worth investigating to know what new services can be offered by smart spaces in which robots exist, and what new technical issues arise when combining smart spaces and robots into smart systems. Flipcast [17] showed a design of an agent-based platform for communications between robots and ubiquitous devices, but it was not based on a smart space and did not give any concrete examples. Generally, there is almost no systematic research on integrated systems that combine robots and smart spaces. One way to approach such systematic research is to start by building concrete and representative sample applications. Therefore, this research focuses on a case study of a u-object finding service performed by a robot in a smart space. This paper presents the design and development of a system prototype to manage robots and u-objects attached with RFID tags in a smart room space, and for a robot to communicate with other devices and find a tagged u-object in the smart space. We have conducted some preliminary experiments, and the results verify and evaluate the aforementioned functionalities. In what follows, we first examine the possible roles of robots in a smart space as well as the relevant technical problems in general, and then present the u-object finding robot prototype in detail in the remaining sections. After giving the prototype overview in Section 3, we explain how to manage robots and u-objects in Section 4, and how to control and communicate with the robots in Section 5. The experimental
Robots in Smart Spaces
63
environment and results in finding u-objects are described in Section 6. Finally, the conclusion and future work are addressed in the last section.
2 General Services and Issues of Smart Spaces with Robots

This section discusses in general the special roles of robots in a smart space and the related basic issues to be solved in order to provide the desired robot services. A smart space that can offer automatic services mainly relies on sensing devices to acquire contextual information, and on actuation devices to take responsive actions. As mentioned earlier, these devices are usually installed in some parts of the physical environment, such as walls, or embedded in physical objects such as books. Because such devices are fixed and do not move by themselves, they can sense information or act only in specific locations/directions. Different from these devices, robots are capable of moving from one location to another in a space. In addition, many robots can carry sensors and/or actuators. Therefore, a robot can move to a place to sense information or take an action there. For example, a movable robot can monitor a place from any direction using its camera to record the condition of a house, office, classroom, shop, street, and so on. Another example is that a robot, under instruction by a virtual host/manager of the smart environment, may bring an object from somewhere in the environment. In general, the additional service functions that robots can perform in smart spaces fall into the following categories.

Location-Varied Information Acquisition: When carrying sensors, a robot can get the requested information not only at a specified location, but also from data distributed in a space. For example, a robot with a microphone can move around to locate a sound source, which is usually hard to detect precisely with fixed microphones. The temperature distribution in a room can also be easily gathered by a movable robot carrying only a single temperature sensor. Otherwise, many temperature sensors would have to be installed in different parts of a relatively large room.
That is, robots can function as mobile sensors to collect information at any point in the smart space where they can possibly go.

Location-Dependent Responsive Action: According to the sensed information, a smart space can take responsive actions. Some reactions are best conducted at specific locations. For example, the sound-aware smart room we developed [18] can sense ambient sounds using a microphone array and output speech to one of the speakers installed at fixed locations in the room. It sends a be-quiet reminder to the speaker near a person who is talking or singing loudly and may disturb others in the same location. With a robot carrying a speaker in the room, the robot can move near the person and then give the reminder. In general, a robot may conduct some action by moving to a place requested by the smart space.

Location-Aware Interaction with Devices/u-Objects: When a robot works in a non-smart environment, it has to sense and interact with the environment using multimedia data from cameras, microphones, and so on. Therefore, sophisticated robots rely on complex intelligent technologies for multimedia processing and understanding,
which are not yet perfect and are very costly. In a smart environment, however, there are various electronic devices which are either independent or embedded in some objects, i.e., being u-objects. Thus, robots will be able to interact with these devices or u-objects via wireless electronic communications, instead of using audiovisual communications. Through communications, a robot may interact with nearby u-objects to identify their locations, know what they are doing, etc. By means of interaction, the robots and devices/u-objects in a smart room can work collaboratively to carry out some tasks. In summary, robots can be applied to add flexible location-based service functions to smart spaces because of their capability to move. To realize the new functions, a robot must collaboratively work with a virtual space host/manager and other devices/u-objects in the space. Therefore, the two fundamental issues for robots to conduct collaborative work in a smart space are how to manage the robots, and how to enable robots to communicate with the space host and various devices/u-objects. A long-term solution to completely solve the management and communication issues would be a scalable framework or middleware capable of supporting many applications in different smart spaces with multiple robots and heterogeneous devices/u-objects. This preliminary study is focused on the necessary management and communications in a specific application, i.e., a prototype called u-object finder, a robot for finding u-object(s).
3 The u-Object Finder Overview

It is a very common experience that we sometimes forget where we put common items that we badly need, such as keys or books, and cannot immediately find them. This is even more common among small children and elderly people, since they often forget what they have just seen or done. The u-object finder (UOF) is an application of the smart space in which a robot moves around to search for a lost object. Figure 1 shows the UOF conceptual environment used in our study and its development.
Fig. 1. The u-object finder environment
The environment is a typical room space equipped with computers, devices, and robots, all connected by wireless networks. Basic assumptions about the environment are as follows.

– Every object to be searched must be attached or embedded with at least one device that is used to uniquely identify the object and communicate with other devices. In the present prototype, we attached an RFID tag to every searchable u-object, also called a tagged object. A single object may have multiple attached tags, which can be used either to identify different parts of the object or to improve search efficiency.
– An RFID reader is installed at the room entrance to read the RFID tag of a u-object when the object passes through the entrance. It is used to detect what u-object has been brought into or out of the room, and when.
– There are one or more movable robots in the room, which move according to the commands and instructions from a room manager. Each robot may carry multiple sensors for various purposes. To find a tagged object, an RFID reader is carried by the robot in our system.
– A number of RFID tags are set at fixed positions on the room floor. These tags are used by the robot to precisely locate its position, since all the position data has been measured and kept in a room database. Usually a robot can remember the route it has traversed, which can be used to estimate its present position. The electromagnetic signal strength received by the robot from wireless routers can also be used to estimate its position, but this exhibits some error. The position-fixed RFID tags on the floor are therefore read as the robot moves, and exploited to reduce or correct the errors.
– All machines and devices, including robots, are able to conduct wireless communications via either some specific communication scheme or a common wireless LAN router.
– There exists a virtual manager, i.e., an application residing on a PC, that manages all devices, u-objects, RFID tags and robots. It also provides an interface for a user to request a robot to find an object and to receive the search result via our present UOF prototype.
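The floor-tag correction described above can be sketched as simple dead reckoning that snaps to a known coordinate whenever a position-fixed tag is read. The tag identifiers, the coordinate table, and the hard position reset are illustrative assumptions, not the prototype's actual code.

```java
import java.util.Map;

// Illustrative dead-reckoning sketch: the robot accumulates its own
// movement estimate (which drifts) and resets to a known coordinate
// whenever it reads one of the position-fixed floor tags.
public class PositionEstimator {
    private double x, y;                            // estimated position (m)
    private final Map<String, double[]> floorTags;  // tag id -> known (x, y)

    public PositionEstimator(Map<String, double[]> floorTags) {
        this.floorTags = floorTags;
    }

    public void moved(double dx, double dy) {       // odometry, error-prone
        x += dx;
        y += dy;
    }

    public void tagRead(String tagId) {             // floor tag cancels drift
        double[] p = floorTags.get(tagId);
        if (p != null) { x = p[0]; y = p[1]; }
    }

    public double[] position() { return new double[] { x, y }; }

    public static void main(String[] args) {
        PositionEstimator est = new PositionEstimator(
                Map.of("T1", new double[] { 2.0, 3.0 }));
        est.moved(1.9, 3.2);   // drifting odometry estimate
        est.tagRead("T1");     // tag T1 resets to its surveyed position
        System.out.println(est.position()[0] + "," + est.position()[1]);
    }
}
```

A real implementation would likely blend the tag fix with the odometry and signal-strength estimates rather than overwrite them, but the reset shows the role the fixed tags play.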
The u-Object Finder system, as shown in Fig. 2, is mainly composed of the u-Object Finder Manager (UOF-Manager), which functions as the central administrator of all devices including robots, and the u-Object Finder Robot (UOF-Robot), which is mainly in charge of searching for u-objects in the room. The UOF-Manager has two main components: the u-Object Manager (O-Manager) to manage u-objects, and the Robot Manager (R-Manager) to manage robots. The UOF-Robot, in turn, has two main components: the Robot Controller (R-Controller) and the Robot Machine (R-Machine). The R-Machine is a robot that has a motor, wheels, an ultrasonic sensor, etc., and is thus movable by itself. When the robot moves near a barricade or blocks, it can turn and change direction using the data from the ultrasonic sensor. The movement direction, destination and route are determined according to the robot's task, current position, surrounding
Fig. 2. The u-object finder components
situations, and so on. The R-Controller functions as a mediator between the R-Manager and the R-Machine, and runs on a PDA that is carried by the robot. The PDA provides a popular programming environment and adopts the general TCP/IP communication protocol on top of common wireless LANs. An RFID reader and other wired sensors can also be connected directly to the PDA. Therefore, the PDA can greatly extend and enhance the functions of a low-cost robot. The PC, PDA, robot and devices are connected via several networks, including an IEEE 802.11b LAN, Bluetooth, and radio frequency communication between an RFID reader and an RFID tag. The general procedure to search for a u-object is conducted in the following steps. First, a user informs the UOF-Manager which u-object to search for. The UOF-Manager determines whether the object is in the room by checking the object's existence status kept in the u-object database. If it is in the room, the UOF-Manager checks whether any UOF-Robot is available in the room to do the search job. When a robot is available, the UOF-Manager sends an object-finding command and the RFID tag number(s) of the u-object to the UOF-Robot. Then, the robot moves around and reads the nearby RFID tags. Once the tag number of the sought object is detected, the UOF-Robot stops and sends the UOF-Manager a search success message. At the same time, the robot gives a speech or a beeping sound to notify the user of its location, which is near the found u-object. If the u-object cannot be found after a certain time, the UOF-Robot sends an unsuccessful message to the UOF-Manager. The next two sections explain the UOF-Manager and the UOF-Robot in detail, respectively.
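The manager-side steps of the search procedure above can be condensed into a short sketch. The in-memory database, robot pool, and message strings below are assumptions for illustration, not the prototype's real interfaces.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical condensed version of the u-object search flow:
// look up the tag, check the entrance-reader status, pick a robot,
// and build the command to dispatch to it.
public class UofSearch {
    private final Map<String, String> tagByObject;  // object name -> tag id
    private final Set<String> tagsInRoom;           // entrance-reader status
    private final Deque<String> availableRobots;

    public UofSearch(Map<String, String> tagByObject,
                     Set<String> tagsInRoom,
                     Collection<String> robots) {
        this.tagByObject = tagByObject;
        this.tagsInRoom = tagsInRoom;
        this.availableRobots = new ArrayDeque<>(robots);
    }

    // Returns the command to dispatch, or a failure reason.
    public String search(String objectName) {
        String tag = tagByObject.get(objectName);
        if (tag == null || !tagsInRoom.contains(tag)) return "NOT_IN_ROOM";
        String robot = availableRobots.pollFirst();
        if (robot == null) return "NO_ROBOT_AVAILABLE";
        return robot + ":FIND," + tag;  // command,data pair for the robot
    }

    public static void main(String[] args) {
        UofSearch search = new UofSearch(
                Map.of("keys", "TAG42"), Set.of("TAG42"), List.of("UOF-Robot-1"));
        System.out.println(search.search("keys"));  // UOF-Robot-1:FIND,TAG42
        System.out.println(search.search("book"));  // NOT_IN_ROOM
    }
}
```

The success/timeout reply from the robot then flows back along the same path, which is why the manager only needs the tag number and a robot handle to run a search.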
4 UOF-Manager

The UOF-Manager is an application running on a PC. A user can operate the UOF-Robot and manage u-Objects through the UOF-Manager. The UOF-Manager consists of a UOF-GUI, an R-Manager (Robot Manager) and an O-Manager (u-Object Manager), as shown in Fig. 3. The user can register and delete a u-Object by simply holding it over an RFID reader. The data of the registered u-Object is saved in the u-Object database by the O-Manager.
Fig. 3. The UOF-Manager components (UOF-GUI; R-Manager with Operation, Retrieval and Connection Modules on top of the UFCP stack; O-Manager with Registration and Status Modules on top of the u-Object database)
4.1 UFCP (u-Object Finder Control Protocol)

To manage robots and let them work in a smart space, the R-Manager and the robots must first be able to communicate with each other. One objective of this study is to develop the UFCP stack based on TCP/IP, which runs on the R-Manager, R-Controller and R-Machine. UFCP is a protocol that lets the UOF-Manager retrieve and operate the UOF-Robot via message exchanges. UFCP messages come in two forms: UFCP-TCP and UFCP-UDP. UFCP-UDP is used to retrieve the UOF-Robots in the same network segment, while UFCP-TCP is used to operate the UOF-Robots. UFCP packets are plain ASCII and have two fields, command and data, separated by a comma. Table 1 shows the message formats of UFCP-TCP and UFCP-UDP.

4.2 R-Manager

The R-Manager has the UFCP client, which can search for a UOF-Robot, make a connection, start and stop the UOF-Robot, change the threshold values of devices (ultrasonic sensor, motor), and get the status of the UOF-Robot. Through the UOF-GUI, a user can conduct the following management functions.
・ Retrieving the UOF-Robot
When receiving a robot retrieval demand from a user, the R-Manager broadcasts a UFCP robot search message over the connected wireless LANs. Each UOF-Robot receiving the message replies with an acknowledgment message carrying its name and current position. The list of retrieved UOF-Robots is shown in the UOF-GUI.
・ Connection of the UOF-Robot
After the user selects one UOF-Robot from the retrieved list, the user can make a UFCP connection to the selected UOF-Robot. Once the connection is established, the user can start to operate the robot.
Table 1. UFCP message format
UFCP-TCP
COMMAND          DATA                   DESCRIPTION
UOF-ROBOT START  RFID                   Starting u-Object finding
UOF-ROBOT STOP                          Stopping u-Object finding
ACK              VALUE, POSITION, etc.  Acknowledgement
NAK              ERROR MESSAGE          Negative acknowledgement
GET STATUS                              Getting UOF-Robot status
GET VALUE        DEVICE NAME            Getting specified device's value
GET POSITION                            Getting UOF-Robot position
SET VALUE        DEVICE NAME, VALUE     Setting specified device's value
FOUND            RFID                   Informing found RFID tag
SOUND                                   Request for playing sound

UFCP-UDP
COMMAND          DATA                   DESCRIPTION
SEARCH                                  Searching UOF-Robot
ACK              UOF-ROBOT NAME         Acknowledgement
・ Operation of the UOF-Robot
The UOF-Manager can specify, via its RFID tag number, which u-object a robot should search for, can start and stop the UOF-Robot, and can get the status of the UOF-Robot. In addition, the UOF-Manager can ask the UOF-Robot to produce a beeping sound to inform the user of its current position in the smart room.
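Given the ASCII "COMMAND,DATA" packet layout of Sect. 4.1 and the commands of Table 1, a minimal codec could look as follows. This is our own sketch; the paper does not show the actual UFCP implementation.

```java
// Minimal sketch of a UFCP packet codec, assuming the ASCII "COMMAND,DATA"
// layout of Sect. 4.1; not the authors' actual code.
public class UfcpCodec {

    // Encodes a command and optional data field into one ASCII packet.
    public static String encode(String command, String data) {
        return (data == null || data.isEmpty()) ? command : command + "," + data;
    }

    // Decodes a packet into {command, data}; data is "" when absent. Only the
    // first comma separates the fields, so the data field may itself contain
    // commas (e.g. "SET VALUE,MOTOR,50").
    public static String[] decode(String packet) {
        int comma = packet.indexOf(',');
        if (comma < 0) return new String[] { packet, "" };
        return new String[] { packet.substring(0, comma), packet.substring(comma + 1) };
    }

    public static void main(String[] args) {
        System.out.println(encode("FOUND", "0123456789"));  // FOUND,0123456789
        String[] p = decode("SET VALUE,MOTOR,50");
        System.out.println(p[0] + " | " + p[1]);            // SET VALUE | MOTOR,50
    }
}
```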
4.3 O-Manager

The O-Manager handles the registration and deletion of u-Objects. When the user holds the RFID tag attached to a u-Object over an RFID reader, the O-Manager obtains the RFID data through the reader and checks whether the RFID is already registered. If not, the O-Manager shows a popup window for the user to input the name of the u-Object associated with the RFID. After the user has entered the name, the u-Object can be selected for searching in the smart room. The user can also register a u-Object without an RFID reader by entering the RFID data and name manually. Another function of the O-Manager is to manage the existence status of all registered u-objects. A registered object can be brought out of or back into the room; the u-object's in/out state and the corresponding time are detected by the RFID reader installed at the room entrance. When a UOF-Robot moves around the room, it may encounter many u-objects besides the one being searched for. The position data of all u-objects encountered by the robot can be kept in the u-object database. This existence and position information can be further used to speed up the searching and tracing of u-objects.
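The O-Manager's bookkeeping described above (registration checks plus the in/out state driven by the entrance reader) can be sketched as follows. The class and method names are our own illustration, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the O-Manager's registration and in/out bookkeeping;
// illustrative names, not the authors' code.
public class OManagerSketch {
    private final Map<String, String> nameByRfid = new HashMap<>();    // u-object database
    private final Map<String, Boolean> inRoomByRfid = new HashMap<>();

    // Registers a u-object unless its RFID is already known; returns true
    // when a new entry was added to the database.
    public boolean register(String rfid, String name) {
        if (nameByRfid.containsKey(rfid)) return false;  // already registered
        nameByRfid.put(rfid, name);
        inRoomByRfid.put(rfid, true);                    // assume registered while in-room
        return true;
    }

    // Entrance-reader event: a registered object passing the door toggles
    // its in/out state.
    public void entranceEvent(String rfid) {
        inRoomByRfid.computeIfPresent(rfid, (k, in) -> !in);
    }

    public boolean isInRoom(String rfid) {
        return inRoomByRfid.getOrDefault(rfid, false);
    }
}
```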
5 UOF-Robot

As explained in the previous sections, the UOF-Robot is mainly in charge of searching for a u-object in the room space. The search is based on two special functions: first, the UOF-Robot can move around the space; second, the robot carries an RFID reader that can detect an RFID tag within a certain distance, e.g., 10 cm. The UOF-Robot consists of an R-Controller, an R-Machine and an R-Reader, as shown in Fig. 4.
Fig. 4. The UOF-Robot components (R-Controller on a PDA with UFCP stack and device driver; R-Machine with UFCP stack, motor and ultrasonic sensor; R-Reader connected via USB; Bluetooth between the PDA and the robot)

Fig. 5. A snapshot of the UOF-Robot (ultrasonic sensor, motor, R-Reader)
The R-Controller resides on a PDA and is used to communicate with and control both the R-Machine and the R-Reader. The R-Reader is an RFID reader connected to the PDA via a USB cable. A manufacturer-dependent device driver must be installed on the PDA for the R-Controller to communicate with the R-Reader. The R-Reader's task is relatively simple: mainly reading nearby tags and sending the tags' codes to the R-Controller. The R-Machine is a self-movable robot with a motor, wheels, an ultrasonic sensor, etc. Figure 5 shows a snapshot of the UOF-Robot used in our system. The R-Machine starts to move or stops by following instructions from the R-Controller. It can move around by itself by sensing its surroundings with the ultrasonic sensor, which measures the distance between the R-Machine and the object in front. When the distance to an encountered barricade or block falls below some threshold value, the robot moves away from it and changes its moving direction. The rules for the robot's turning behavior and turning angle are programmable, and can be set by the user in the UOF-GUI. Although the R-Machine is operated by the R-Manager, their interaction messages are mediated by the R-Controller. The R-Controller communicates with the R-Manager over a wireless LAN (IEEE 802.11b), and with the R-Machine over a Bluetooth link embedded in the robot. As on the R-Manager, the UFCP stack runs on both the R-Controller and the R-Machine. Therefore, the three of them can exchange
messages with each other. When receiving a robot retrieval request message from the R-Manager, the R-Controller sends a UFCP-UDP response message to the R-Manager. Operation requests, on the other hand, are sent as UFCP-TCP messages. When the robot moves, the R-Reader detects whether there is any RFID tag nearby. If it detects one, it sends the tag code to the R-Controller, which checks whether the code matches that of the u-object being searched for. When the R-Controller receives the RFID code of the searched u-object from the R-Reader, it sends a found message in UFCP-TCP format to the UOF-Manager. In addition, when it detects a position tag set on the floor of the smart room, it also sends the UOF-Manager the tag code, which can be used to obtain the current robot position. An operation message from the UOF-Manager is forwarded to the R-Machine by the R-Controller via the Bluetooth network.
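The R-Controller's tag-handling decision described above can be written as a small dispatch rule. The "FOUND" message appears in Table 1, but the "POSITION" message name below is our own invention for the position-tag report, which the paper does not name.

```java
import java.util.Set;

// Sketch of the R-Controller's tag-handling logic. "FOUND" follows Table 1;
// "POSITION" is a hypothetical name for the unsolicited position-tag report.
public class TagHandler {

    // Returns the UFCP-TCP message to send to the UOF-Manager for a tag code
    // read by the R-Reader, or null when the tag is irrelevant.
    public static String onTagDetected(String tagCode, String targetCode,
                                       Set<String> floorPositionTags) {
        if (tagCode.equals(targetCode)) return "FOUND," + tagCode;
        if (floorPositionTags.contains(tagCode)) return "POSITION," + tagCode;
        return null;
    }
}
```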
6 System Implementation and Experiment

A preliminary u-object finding prototype has been implemented, with emphasis on communications between heterogeneous devices across different physical networks and on management of the robots and u-objects. The following devices and software were used in our current implementation.
・ Robot: Mindstorm NXT by LEGO [19], including an ultrasonic sensor and a motor. It uses lejOS [20] to control the robot movement and set some moving parameters. The ultrasonic sensor is installed at the front of the robot.
・ RFID tag and reader by Phidgets, Inc. [21], with a Java-based API that runs on Windows, Linux and Mac OS X platforms. RFID tags were attached to all objects to be searched for, and were also placed at some locations on the room floor. The readable distance between a tag and a reader is about 5 cm.
・ PDA: AXIM series by Dell, Inc. It runs Windows Mobile OS and supports the .NET Compact Framework.
・ Networks: wireless LAN (IEEE 802.11a/b/g) between the PC and the PDA, and Bluetooth between the PDA and the NXT robot. A USB cable connects the Phidgets tag reader to the PDA.
The u-object finding system is developed using the Java programming language. Its software consists of three main parts. The first is the UFCP, a TCP/IP-based common communication platform running on the PC, PDA and robot. The UFCP is a very light protocol, only about 50 Kbytes, and is thus able to run on low-spec devices. The second is the R-Controller running on the PDA. It has multiple functions, such as interacting with the UOF-Manager to receive commands and send back results, checking the tag codes from the R-Reader, controlling the R-Machine, and mediating messages between the R-Manager and the R-Machine. The third is the UOF-Manager running on a PC, which manages both robots and u-objects, and also provides a GUI for interacting with a user. To test our system, an experimental environment was set up in our laboratory room; a partial setting is shown in Fig. 6. The space is 5 m x 5 m, in which there are one tagged object to be searched for, a robot to search for the object, some position tags fixed on the floor, etc.
Fig. 6. The UOF experiment environment
When a user wants to find an object in the space, he/she first starts the UOF-Manager, a Java application program running on a PC. Figure 7 shows a snapshot of the manager's UOF-GUI. All u-objects registered in this room are listed at the upper left of the GUI. From the u-object list, the user selects the desired object to be found. Next, the user can check which UOF-Robots are available in the room by clicking a robot retrieval button. As shown at the bottom left of Fig. 7, one robot is retrieved through communication between the R-Manager on the PC and the R-Controller on the PDA carried by the robot. When the UOF-Manager successfully makes a connection with the UOF-Robot, a new tab window is added to the right side of the GUI. At the same time, the robot appears at its present position on the middle-right map of the GUI. Some parameters related to the robot's movement are displayed as well. The user can operate the UOF-Robot using the buttons and the text field in the tab window. Upon receiving the instruction to find a u-object, the UOF-Manager sends a "START" command carrying the u-object tag code to the R-Controller of the selected UOF-Robot. The R-Controller then gives the "START" command to the R-Machine and asks the R-Reader to read RFID tags. While the robot moves around, the R-Reader sends every tag code it detects to the R-Controller. The R-Controller compares each tag code received from the R-Reader with the tag code of the searched u-object. If the two codes are the same, the searched object is very near the robot. The R-Controller therefore sends a "STOP" command to the R-Machine, and a "FOUND" message to the UOF-Manager. The robot then stops and generates a beeping sound to notify the user of its location. If the u-object is not found within the specified period of time, the search operation is considered a failure and the process is terminated.
Fig. 7. A snapshot of the UOF-Manager GUI

In our experimental environment, the u-object was found in many cases, but the search time varied greatly depending on the positions of the robot and the u-object. This is mainly due to the short readable distance between an RFID reader and a tag, and to the robot's moving course, which in our current implementation is based on a rather simple algorithm: the UOF-Robot monitors the distance value from its ultrasonic sensor and turns left or back whenever the distance between the robot and a barrier becomes less than 10 cm. Although the search performance is unsatisfactory, the whole system works well in terms of communications and management of the robots and devices in the smart space.
7 Conclusion and Future Work

Smart spaces/environments and robotics have traditionally been studied as two separate research disciplines. It is therefore very worthwhile to study their combination so as to form a new merged research area and generate novel applications. Based on this idea, we first investigated the potential services and possible new technical issues of integrating robots into smart spaces, and then focused on a case study: a movable-robot prototype that searches for a u-object, tagged with RFID, in a smart room space. We have discussed the design, implementation and experiments of the developed system prototype. As mentioned in the previous section, the research presented in this paper mainly addresses communications and management of robots and u-objects in a smart space. However, the object searching procedure took a long time and is thus inefficient. This is because of two problems: the short reading distance (about 10 cm) between the RFID reader and a tag, and the simple course of the robot's movement. To overcome the former limitation, RFID tags with a longer reading distance, such as 30-50 cm, will be used in our next study. Multiple tags can be attached to different sides of a large object to improve the searching efficiency. In addition, a set of RFID readers that can read tags within 1-2 m may be installed at fixed locations to first detect the region where the u-object is likely located, so that the robot can go directly to the
region to find the more precise position of the u-object. To overcome the second limitation, the robot must be able to remember the route it has traveled and the obstacles encountered, to avoid visiting the same places repeatedly. Another interesting approach is to have two or more robots search for a u-object together, and to study how the robots collaborate with each other. Beyond the above, there are still many technical problems and challenging issues to tackle when combining robots and smart spaces, which should be studied along with many kinds of applications in the future.
Acknowledgement This work was partially supported by Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research, 19200005.
References 1. Weiser, M.: Ubiquitous Computing. IEEE Computer (October 1993) 2. Weiser, M., Brown, J.S.: The Coming Age of Calm Technology. In: Denning, P.J., Metcalfe, R.M. (eds.) Beyond Calculation: The Next Fifty Years of Computing (1996) 3. IEEE802.11, http://www.ieee802.org/11/ 4. Bluetooth, http://www.bluetooth.org/ 5. Callaway, E., Gorday, P., Hester, L., Guiterrez, J.A., Naeve, M.: Home Networking with IEEE802.15.4: A Developing Standard for Low-Rate Wireless Personal Area Networks. IEEE Communications Magazine, 70–77 (August 2002) 6. IEEE802.15 Working Group for WPAN, http://www.ieee802.org/15/ 7. Porcino, D., Hirt, W.: Ultra-Wideband Radio Technology: Potential and Challenges Ahead. IEEE Communications Magazine, 66–67 (July 2003) 8. Dey, A.K.: Understanding and Using Context. Personal and Ubiquitous Computing 5(1), 4–7 (2001) 9. Abowd, G.D., Mynatt, E.D., Rodden, T.: The Human Experience. IEEE Pervasive Computing, 48–57 (January-March 2002) 10. Ma, J.: Smart u-Things – Challenging Real World Complexity. In: IPSJ Symposium Series, vol. 2005(19), pp. 146–150 (2005) 11. Private, G.: Smart Devices: New Telecom Applications & Evolution of Human Interfaces. In: CASSIS Int’l Workshop, Nice (March 2005) 12. Cook, D.J., Das, S.K.: Smart Environments: Technologies, Protocols and Applications. Wiley-Interscience, Chichester (2005) 13. Ma, J., Yang, L.T., Apduhan, B.O., Huang, R., Barolli, L., Takizawa, M.: Towards a Smart World and Ubiquitous Intelligence: A Walkthrough from Smart Things to Smart Hyper-spaces and UbicKids. International Journal of Pervasive Comp. and Comm. 1(1) (March 2005) 14. Pinto, J.: Intelligent Robots Will Be Everywhere, in Automation.com (2003), http://www.jimpinto.com/writings/robots.html 15. Sakagami, Y., Watanabe, R., Aoyama, C., Matsunaga, S., Higaki, N., Fujimura, K.: The intelligent ASIMO: System Overview and Integration. Intelligent Robots and System (2002)
16. Kaneko, K., Kanehiro, F., Kajita, S., Hirukawa, H., Kawasaki, T., Hirata, M., Akachi, K., Isozumi, T.: Humanoid Robot HRP-2 (ICRA 2004) (April 2004) 17. Ueno, K., Kawamura, T., Hasegawa, T., Ohsuga, A., Doi, M.: Cooperation between Robots and Ubiquitous Devices with Network Script Flipcast. In: Proc. Of Network Robot System: Toward Intelligent Robotic Systems Integrated with Environments (IROS) (2004) 18. Ma, J., Lee, J., Yamanouchi, K., Nishizono, A.: A Smart Ambient Sound Aware Environment for Be Quiet Reminding. In: Proc. of IEEE ICPADS/HWISE 2005, Int’l Workshop on Heterogeneous Wireless Sensor Networks, Fukuoka (July 2005) 19. Lego Mindstorm NXT, http://mindstorms.lego.com 20. lejOS, http://lejos.sourceforge.net/ 21. Phidgets, Inc., http://www.phidgets.com
Biometrics Driven Smart Environments: Abstract Framework and Evaluation

Vivek Menon¹, Bharat Jayaraman², and Venu Govindaraju²
¹ Amrita University, Coimbatore 641 105, India
vivek [email protected]
² University at Buffalo, Buffalo, NY 14260, USA
[email protected], [email protected]

Abstract. We present an abstract framework for 'smart indoor environments' that are monitored unobtrusively by biometrics capture devices, such as video cameras, microphones, etc. Our interest is in developing smart environments that keep track of their occupants and are capable of answering questions about the whereabouts of the occupants. We abstract the smart environment by a state transition system: each state records a set of individuals who are present in various zones of the environment. Since biometric recognition is inexact, state information is probabilistic in nature. An event abstracts a biometric recognition step, and the transition function abstracts the reasoning necessary to effect state transitions. In this manner, we are able to accommodate different types of biometric sensors and also different criteria for state transitions. We define the notions of 'precision' and 'recall' of a smart environment in terms of how well it is capable of identifying occupants. We have developed a prototype smart environment based upon our proposed concepts, and provide experimental results in this paper. Our conclusion is that the state transition model is an effective abstraction of a smart environment and serves as a basis for integrating various recognition and reasoning capabilities.
1 Introduction
The vision of pervasive computing [1] provides the inspiration for a smart environment saturated with sensors, computing, and communication devices that are gracefully interfaced with human users [2]. In this paper, we focus on smart indoor environments such as homes, offices, etc. Indoor environments do not suffer from the problems of power or battery life that confront outdoor environments. The goal of our research is to develop smart indoor environments that can identify and track their occupants as unobtrusively as possible and answer queries about the occupants. Such 'context-aware' systems can identify and track people in environments ranging from homes for the elderly or disabled, office workplaces, department stores and shopping complexes, to larger arenas such as airports, train stations, etc.
This work was done while the author was a Visiting Research Scientist at the Center for Unified Biometrics and Sensors, University at Buffalo.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 75–89, 2008. c Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Architecture of a Biometrics Driven Smart Environment
Identification approaches vary from tag-based approaches, such as those involving RFID, to those based on biometrics of the user. Tag-based methodologies tend to be obtrusive, requiring the individual to continuously carry them, however small the tag may be. Some biometric techniques, such as fingerprint and iris scans, require a 'pause-and-declare' interaction with the human [3]. They are less natural than face, voice, height, and gait, which are less obtrusive and hence are better candidates for use in our smart environments. Figure 1 shows the overall architecture of a biometrics-driven smart environment. Although the figure illustrates the single biometric modality of face recognition, the architecture is also applicable to other biometric modalities. For example, zone 1 might use voice recognition, zone 2 might use face recognition,
(Face images in Fig. 1 are blurred to preserve anonymity.)
and zone 3 might use height estimation. However, in all cases the output of a biometric recognition is a set of person-probability pairs, as discussed in more detail below. The main contribution of this paper is a framework for abstracting the behavior of a biometrics-driven smart environment in terms of a state transition system. Our proposed framework supports multiple biometric modalities in a uniform manner and facilitates a precise statement of the performance aspects of a smart environment. The state of an environment is expressed in terms of the probabilities of the occupants being present in the different zones of the environment. The state information is probabilistic because a biometric recognizer typically provides a set of scores indicating the degree of match between the subject and the candidates in the database. Therefore, in our approach an event abstracts a biometric recognition step - whether face recognition, voice recognition, etc. - and is represented as a set of pairs ⟨o, p(o)⟩, where p(o) is the probability that occupant o has been recognized at this event. The transition function abstracts the reasoning necessary to effect state transitions. Effectively, the transition function takes as input a state and an event, and determines the next state by assigning revised probabilities to the occupants in the environment based upon the probabilities in the event. In this manner, we are able to accommodate different types of biometric sensors and also different criteria for state transitions, including those that incorporate declarative knowledge of the individuals and the environment. It is not necessary for us to consider non-deterministic transitions, since a state itself is represented as a set of occupants and their probabilities. We introduce the concepts of precision and recall in order to provide a quantitative measure of the performance of a smart environment.
Precision captures how well an occupant is recognized, while recall captures whether an occupant is recognized at all. These are complementary concepts and together capture the overall performance of a smart environment. The concepts of precision and recall are standard performance measures in the information retrieval literature [6], but we have adapted the definitions to suit our context. The rest of this paper is organized as follows. The related work is presented in Section 2 while the details of our model are discussed in Section 3. The experimental prototype is described in Section 4 and the conclusions and the future work are presented in Section 5.
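Since the paper adapts precision and recall from information retrieval [6], their standard IR form is a useful reference point. The code below shows only the standard definitions; the paper's adapted, occupant-specific definitions are given later in the paper and are not reproduced here.

```java
// Standard information-retrieval precision and recall, which the paper
// adapts to occupant recognition; the adapted definitions appear later in
// the paper and are not reproduced here.
public class PrecisionRecall {
    // Fraction of reported identifications that are correct.
    public static double precision(int truePositives, int falsePositives) {
        return truePositives / (double) (truePositives + falsePositives);
    }
    // Fraction of actual occupants that are identified at all.
    public static double recall(int truePositives, int falseNegatives) {
        return truePositives / (double) (truePositives + falseNegatives);
    }
}
```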
2 Related Work
There has been considerable interest in the subject of smart environments. A major difference between our proposed approach and several of the approaches surveyed below is our use of a framework in which the details of biometric recognition are abstracted simply as events in a state transition system. We also
(Such scores can be mapped to probabilities by running the recognizer on a large number of samples, as shown in [4,5].)
characterize the performance of the smart environment in terms of the concepts of precision and recall. We briefly survey closely related efforts below and highlight their main features. The survey paper by Cook and Das [7] provides a good account of the state of the art in smart environments. Several projects focus on smart homes (MavHome [8], the Intelligent Home [9], the House n [10]) and on the development of adaptive control of home environments that also anticipates the location, routes and activities of the inhabitants.

2.1 Location Estimation
An extensive survey and taxonomy of location systems for ubiquitous computing applications is given in [11], while [12] provides a more recent survey of position location techniques in mobile systems and draws a comparison between their parameters. Fox et al. [13] highlight the relevance of Bayesian filtering techniques in dealing with the uncertainty characteristic of sensors in pervasive computing environments, and apply them in the context of estimating an object's location. The data association problem [14] is a pertinent issue in scenarios that involve tracking multiple people using anonymous sensors. The ambiguity in identity estimation arising from the absence of ID sensors and corresponding tags on people is difficult to resolve. Thus identity estimation, absolute or relative, is a desirable and integral subsystem of any object tracking mechanism.

2.2 Tracking
Bui et al. [15] propose an Abstract Hidden Markov Model (AHMM) based approach for tracking human movement in an office-like spatial layout and for predicting object trajectories at different levels of detail. For location tracking in a single-inhabitant smart space, Roy et al. [16] propose an optimal algorithm based on compressed dictionary management and online learning of the inhabitant's mobility profile. In related work [17], they highlight the complexity of optimal location prediction across multiple inhabitants in a smart home and propose a stochastic game-theoretic approach that learns and estimates the inhabitants' most likely location profiles. This study uses RFID tags to track the inhabitants. A combination of particle filters with Kalman filters to track multiple objects, combining the accuracy of anonymous sensors with the identification certainty of ID-sensors, is discussed in [13,18]. Krumm et al. [19] investigate the nuances of visual person tracking in intelligent environments in the context of the EasyLiving project [20], deploying multiple cameras to track multiple people. However, this tracking experiment was confined to a single room, and the identity estimation and maintenance used to facilitate tracking dealt only with non-absolute, internally system-generated identities of tracked persons.

2.3 Biometrics in Smart Environments
Pentland and Choudhury [3] highlight the importance of deploying audio- and video-based recognition systems in smart environments, as these are modalities
similar to those used by humans for recognition. They summarize face recognition efforts and discuss various commercial systems and applications, as well as novel applications in smart environments and wearable computing. Chen and Gellersen [19] propose a new method to support awareness based on the fusion of context information from different sources in a work environment, integrating audio and video sources with more specific environment sensors and with logical sensors that capture formal context. Reasoning over the context information generated by applying different perception techniques to the raw data is used to generate a refined context. Aghajan et al. [22] propose a vision-based technology coupled with AI-based algorithms for assisting vulnerable people and their caregivers in a smart home monitoring scenario. However, users are expected to wear a wireless identification badge that broadcasts a packet when one of the accelerometers senses a significant signal. Gao et al. [23] propose a new distance measure for authentication in their face recognition system for a ubiquitous computing environment, which relies on a fusion of multiple views of each person. Their work focuses on optimizing a single modality to improve robustness rather than deploying a multimodal approach that fuses different biometrics. Hong et al. [24] discuss the structure, operation and performance of a face verification system using Haar-like features and an HMM algorithm in a ubiquitous network environment. Zhang et al. [25] propose a distributed and extensible architecture for a continuous verification system that verifies the presence of the logged-in user. A Bayesian framework that combines temporal and modality information holistically is used to integrate multimodal passive biometrics, including face and fingerprint. Driven by the proliferation of commercially available hand-held computers, Hazen et al.
[26] research improvements in identification performance by adopting a bimodal biometric approach to user identification on mobile devices, integrating audio and visual biometric information in the form of voice and face. They also report significant improvements in identification performance from using dynamic video information instead of static image snapshots on a database of 35 different people. Highlighting the privacy concerns of video recording discussed by Bohn et al. [27] and the reliability issues of face recognition techniques for user authentication, Vildjiounaite et al. [28] deploy a fusion of accelerometer-based gait recognition and speaker recognition as an unobtrusive and only marginally privacy-threatening means of user authentication on personal mobile devices. They report improved performance in the combined mode as opposed to individual modalities for a user base of 31. More recently, Bernardin and Stiefelhagen [29] have implemented a system for the simultaneous tracking and incremental multimodal identification of multiple users in a smart environment, which opportunistically fuses person track information, localized speaker ID and high-definition visual ID cues to gradually refine the global scene model and thus increase the system's confidence in the set of recognized identities. In terms of its focus on non-obtrusive biometrics-based recognition and location estimation, our work is similar to [29]. However, in our research,
we propose an abstract framework wherein a variety of biometric modalities can be incorporated in a uniform manner. Our approach to identity estimation deals with the absolute identity of people across multiple zones of a facility. However, we attempt to highlight the inherent uncertainty of automated face recognition by recasting the eigen-distances generated by the eigenface algorithm into a probability distribution over the registered faces, instead of the conventional approach of taking the face with the least eigen-distance as the match. This probabilistic approach to biometric recognition is a key theme around which we construct our abstract framework for a biometrics-driven smart environment.

2.4 State Space Representation
It might appear that a Hidden Markov Model (HMM) would serve as an elegant basis for representing the state space. From an HMM perspective, a smart environment with n occupants and m zones can have m^n distinct possible states. Probabilities are then associated not with the states but with the transitions between them; these transition probabilities must be learnt from past behavior or by simulation [15]. Thus an HMM approach is computationally more complex due to the state space explosion and the requirement of a priori probabilities of trajectories. In our approach, the size of a state is m * n, meaning that for each of the m zones we record the probabilities of each of the n occupants being present in that zone. Therefore, in Section 4 (Evaluation), we depict a typical state as a table with n rows and m columns. The transitions from one state to another are deterministic: given any event in a zone, the next state is unambiguously determined. In contrast with the HMM approach, we do not need to learn transition probabilities in order to determine the next state, because a biometric recognition (or event) provides a direct means for effecting state transitions. Our state transition model is discussed in the next section.
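The size comparison above (m^n joint HMM states versus one m x n probability table per state) is easy to check numerically; the class below is our own illustration.

```java
// Compares the two state-space sizes discussed above: m^n joint states in
// the HMM view versus an m x n table of probabilities per state here.
public class StateSpaceSize {
    public static long hmmStates(int m, int n) {
        long states = 1;
        for (int i = 0; i < n; i++) states *= m;   // m^n
        return states;
    }
    public static long stateTableSize(int m, int n) {
        return (long) m * n;   // one probability per (zone, occupant) pair
    }
    public static void main(String[] args) {
        // e.g. 4 zones, 10 occupants:
        System.out.println(hmmStates(4, 10));       // 1048576
        System.out.println(stateTableSize(4, 10));  // 40
    }
}
```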
3 Framework
Definition (Smart Environment). An n-person smart environment is abstracted as a state transition system (S, E, Δ) where S is the set of states labeled s0, s1, ..., sx; E is the set of events labeled e1, e2, ..., ex; and Δ : S × E → S is a function that models the state transition on the occurrence of an event. The state transitions may be depicted as follows:

s0 →(e1) s1 →(e2) s2 ... →(ex) sx

We shall consider a smart environment as being divided into a number of zones, each of which may be a region (or a set of rooms). We include two special zones, an external zone and a transit zone, for the sake of convenience.

Definition (State). Given n occupants o1, ..., on and m zones labeled 1, ..., m, a state sk of the environment is represented by an m-tuple ⟨Z1k, ..., Zmk⟩ where, for 1 ≤ j ≤ m, Zjk = {⟨oi, pjk(oi)⟩ : 1 ≤ i ≤ n}. Also, in each state sk and for each occupant oi, Σ_{j=1}^{m} pjk(oi) = 1.
Biometrics Driven Smart Environments
The state of an environment is expressed in terms of the probabilities of the occupants being present in the different zones of the environment. The constraint Σ_{j=1}^{m} pjk(oi) = 1 states that the probabilities of any occupant being present across all the zones in any state sum to one. In the initial state s0, we may assume without loss of generality that all occupants are in the external zone with probability 1. Given a smart environment with n occupants, m zones, and x events, the total size of the state space is m × n × (x + 1). Thus, the size of the state space is quadratic in m and n rather than exponential, as in HMMs.

In this paper we model all exit events as entry events into a transit zone. Hence it suffices in our model to consider only entry events. An event is essentially an abstraction of a biometric or feature recognition step performed in the environment.

Definition (Event). Given n occupants o1, ..., on, an (entry) event ek occurring at zone j (1 ≤ j ≤ m) at time t is represented as ⟨t, j, P⟩, where P = {⟨oi, pjk(oi)⟩ : 1 ≤ i ≤ n} and pjk(oi) is the probability that occupant oi was recognized at zone j in event ek. As noted earlier, an event is an abstraction of a recognition step. For simplicity, we assume that events happen sequentially in time, i.e., simultaneous events across different zones are ordered arbitrarily in time. That is, the entry of occupant oi into zone zi and of occupant oj into zone zj at the same time t can be modeled as oi before oj or oj before oi.

Definition (Transition Function). Δ : S × E → S maps state sk−1 into state sk upon an event ek = ⟨t, j, P⟩ occurring at time t in zone j, where P = {⟨oi, pjk(oi)⟩ : 1 ≤ i ≤ n}. Let sk−1 = ⟨Z1k−1, ..., Zjk−1, ..., Zmk−1⟩ and Zjk−1 = {⟨oi, pjk−1(oi)⟩ : 1 ≤ i ≤ n}. Then Δ determines state sk = ⟨Z1k, ..., Zjk, ..., Zmk⟩ as follows. Let xi = 1 − pjk(oi). Then,

Zjk = {⟨oi, pjk(oi) + xi · pjk−1(oi)⟩ : 1 ≤ i ≤ n}    (1)

Zlk = {⟨oi, xi · plk−1(oi)⟩ : 1 ≤ i ≤ n}, for 1 ≤ l ≤ m and l ≠ j    (2)
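A minimal Python sketch of the transition function, implementing update rules (1) and (2); the state representation (a list of per-zone dictionaries) and all names are our own illustrative choices, not prescribed by the paper.

```python
from typing import Dict, List

State = List[Dict[str, float]]  # state[l] maps each occupant to p_lk(o_i)

def delta(state: State, j: int, event_probs: Dict[str, float]) -> State:
    """Apply an entry event at zone j with recognition probabilities
    event_probs (occupant -> p_jk(o_i)), per equations (1) and (2)."""
    new_state: State = []
    for l, zone in enumerate(state):
        updated: Dict[str, float] = {}
        for o, p_prev in zone.items():
            p_new = event_probs.get(o, 0.0)
            x = 1.0 - p_new                       # complement of event probability
            if l == j:
                updated[o] = p_new + x * p_prev   # equation (1)
            else:
                updated[o] = x * p_prev           # equation (2)
        new_state.append(updated)
    return new_state

# Two zones, one occupant, initially in zone 0 (the external zone) with certainty.
s0 = [{"o1": 1.0}, {"o1": 0.0}]
s1 = delta(s0, j=1, event_probs={"o1": 0.8})
# Zone 1: 0.8 + 0.2*0.0 = 0.8; zone 0: 0.2*1.0 = 0.2.
```

Because xi multiplies every old probability while pjk(oi) is added only at zone j, each occupant's probabilities across zones still sum to one after the update.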
The transition function maps a state sk−1 to a state sk upon an event ek occurring at zone j. For zone j, we add the new probability pjk(oi) for an occupant generated by event ek to the complement of that new probability, 1 − pjk(oi), apportioned by a factor of the existing probability pjk−1(oi). Such a revision could violate the constraint that the sum of probabilities for any occupant across all zones equals one (Σ_{j=1}^{m} p(oi) = 1). To restore this constraint, for each occupant oi we apportion to the probability of oi being present in each zone l ≠ j a share of the complement of the new probability value, 1 − pjk(oi), in the ratio of the existing probability plk−1(oi). This transition function ensures that both the probability values associated with a new event and the current state figure in the determination of the new state, as in a Markov process.

Since we are dealing with a closed environment with a fixed set of occupants, o1, ..., on, we can, in general, utilize a priori declarative knowledge regarding
the occupants, such as their schedules, or knowledge of the environment, such as the distance between rooms and whether an occupant could move between a pair of rooms within a certain interval of time. However, the transition function presented in the above definition does not make any use of such declarative knowledge. The declarative knowledge may itself be fuzzy, to factor in the probabilistic nature of the environment. The reasoning component can alleviate some of the weaknesses in the recognition component; however, it should ensure that its results do not outright contradict the recognition system, thereby generating inconsistencies.

We now define the concepts of precision and recall for a smart environment. These are defined in terms of the ground truth, which, for a given input event sequence, is a sequence of states of the environment wherein the presence or absence of any occupant in any zone is known with certainty (0 or 1). Precision captures how well an occupant is recognized, while recall captures whether an occupant is recognized at all.

Definition (Ground Truth). Given n occupants O = {o1, ..., on} and an event sequence e1, ..., ex, the ground truth is the sequence of states g1, ..., gx where each gk = ⟨T1k, ..., Tjk, ..., Tmk⟩ and Tjk = {⟨oi, qjk(oi)⟩ : 1 ≤ i ≤ n ∧ qjk(oi) ∈ {0, 1}}. Also, ⟨oi, 1⟩ ∈ Tjk → ⟨oi, 0⟩ ∈ Tlk, for all l ≠ j in state gk.

Given a zone of a state, the precision for that zone of the state is defined as the average probability of those occupants that are present in that zone of the state as given in the ground truth. The average precision across all zones (where at least one occupant is present as per the ground truth) is the precision for the state, and the average precision across all states is the precision for a given ground truth. Finally, the average across multiple ground truths is the precision of the smart environment.

Definition (Precision).
Given an environment with m zones, n occupants O = {o1, ..., on}, an event sequence E = e1, ..., ex, a ground truth G = g0, g1, ..., gx, and state transitions S = s0, s1, ..., sx, we define the precision, π, with respect to G as follows. Let πjk = ajk / bjk, where

ajk = Σ {pjk(oi) : 1 ≤ i ≤ n ∧ qjk(oi) = 1}

bjk = |{oi : 1 ≤ i ≤ n ∧ qjk(oi) = 1}|

Then πk = Σ_{j=1}^{m} πjk / m, and we define π = Σ_{k=1}^{x} πk / x.
Now, given a set of ground truths {G1, G2, ..., Gt} with the corresponding precisions {π1, π2, ..., πt}, the precision of the smart environment is Π = Σ_{l=1}^{t} πl / t.

For a given ground truth, state and zone, we define recall with respect to a threshold θ as the ratio a/b, where a is the number of occupants of that zone with probabilities greater than θ and who are present in the ground truth, and
b is the number of occupants who are present in the ground truth for that zone. The recall for a state is the average of these ratios across all zones where at least one occupant is present as per the ground truth. The average recall across all states is the recall for a given ground truth, and the average across multiple ground truths is the recall of the smart environment.

Definition (Recall). Given an environment with m zones, n occupants O = {o1, ..., on}, an event sequence E = e1, ..., ex, a ground truth G = g0, g1, ..., gx, and state transitions S = s0, s1, ..., sx, we define the recall, ρ, with respect to a threshold θ as follows. Let ρjk = ajk / bjk, where

ajk = |{oi : 1 ≤ i ≤ n ∧ qjk(oi) = 1 ∧ pjk(oi) > θ}|

bjk = |{oi : 1 ≤ i ≤ n ∧ qjk(oi) = 1}|

Then ρk = Σ_{j=1}^{m} ρjk / m, and we define ρ = Σ_{k=1}^{x} ρk / x.
Now, given a set of ground truths {G1, G2, ..., Gt} with the corresponding recalls {ρ1, ρ2, ..., ρt}, the recall of the smart environment is R = Σ_{l=1}^{t} ρl / t. Clearly, recall decreases as the threshold θ increases, since lowering the threshold results in more occupants being identified. The threshold is generally arrived at experimentally for a given smart environment. A reasonable choice of θ is 0.5, and this is also the value that we adopt in our experiments. In the above definition, recall was defined zone-wise. An alternative approach is to disregard the zones when taking the ratio; doing so would increase the overall recall. Our definition gives due importance to zones, and hence is relatively more conservative.
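The per-state precision and recall can be sketched as follows. Note that the formulas above divide by m, whereas the prose averages only over zones where the ground truth places at least one occupant; this illustrative sketch follows the prose, and all names are our own.

```python
from typing import Dict, List

ZoneProbs = Dict[str, float]   # occupant -> probability
ZoneTruth = Dict[str, int]     # occupant -> 0 or 1 (ground truth)

def precision_state(s: List[ZoneProbs], g: List[ZoneTruth]) -> float:
    """Average, over occupied zones, of the mean probability assigned
    to occupants that the ground truth places in that zone."""
    per_zone = []
    for zone_p, zone_t in zip(s, g):
        present = [o for o, q in zone_t.items() if q == 1]
        if present:
            per_zone.append(sum(zone_p[o] for o in present) / len(present))
    return sum(per_zone) / len(per_zone) if per_zone else 0.0

def recall_state(s: List[ZoneProbs], g: List[ZoneTruth], theta: float = 0.5) -> float:
    """Average, over occupied zones, of the fraction of truly present
    occupants whose probability exceeds the threshold theta."""
    per_zone = []
    for zone_p, zone_t in zip(s, g):
        present = [o for o, q in zone_t.items() if q == 1]
        if present:
            hits = sum(1 for o in present if zone_p[o] > theta)
            per_zone.append(hits / len(present))
    return sum(per_zone) / len(per_zone) if per_zone else 0.0
```

Averaging the per-state values over all states, and then over multiple ground truths, yields π and Π (likewise ρ and R).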
4 Evaluation
We have developed an experimental prototype embodying the ideas presented in this paper. Figure 1(b) illustrates a 4-zone smart environment (the fourth zone representing the external zone) with 25 occupants who are monitored by video cameras installed in each of the zones. Our experimental prototype collects sample face images of the 25 occupants of an office facility and pre-registers them in a training database. The image sensors deployed in each zone detect the presence of occupants as they move through the zones and verify the face images extracted from the video against the database. The distance scores generated by eigenface are recast into probability values [4,5] which denote the posterior probabilities of the detected face matching the pre-registered occupants. The resulting set of person-probability pairs essentially constitutes an event as defined in Section 3.
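The paper relies on the score-to-probability mapping of [4,5] and does not reproduce it here. As a purely illustrative stand-in, distance scores can be turned into a distribution with a softmax over negative distances; the function name and the parameter beta are our assumptions, not the method of [4,5].

```python
import math
from typing import Dict

def distances_to_probabilities(distances: Dict[str, float],
                               beta: float = 1.0) -> Dict[str, float]:
    """Map eigenface distance scores (smaller = better match) to a
    posterior-like distribution over the registered occupants."""
    weights = {o: math.exp(-beta * d) for o, d in distances.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

# The closest registered face receives the highest probability, and the
# probabilities across all registered occupants sum to one.
probs = distances_to_probabilities({"o1": 1.0, "o2": 3.0, "o3": 4.0})
```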
Fig. 2. Event Sequence
Although our abstract framework is independent of the details of any particular biometric modality, we illustrate our concepts in terms of face recognition. Automated face recognition has yet to attain levels of robustness comparable to those of humans. Factors such as viewing angle, distance, background clutter, lighting spectrum, intensity, angle and diffuseness of lighting, and differences between posed photographs and spontaneous expressions can cause fluctuations in the performance of computer vision based on statistical classifiers [30]. Our prototype is based upon OpenCV's [31] implementation of the eigenface algorithm [32], which provides a basis for a simple though not robust implementation of face recognition. This prototype can be extended to incorporate other biometric recognizers in a similar manner. For example, for voice recognition, voice samples of occupants are pre-registered in a database instead of face image samples. The ambient voice that is picked up by the voice sensors installed in different zones can be
Fig. 3. Sample State Transition
Fig. 4. Illustrating Precision and Recall (panels (a) and (b))
matched against the voice database to generate a set of person-probability pairs. In this manner the different biometric recognizers are interfaced in a uniform way with the rest of the system.

We illustrate, with Figures 2, 3, and 4, the computation carried out by the state transition system embodied in the 4-zone smart environment. We
have presented the observations and results in tabular form for ease of understanding. Figure 2 presents 10 events involving four of the occupants of the smart environment. Each event is presented as a column of probability values; these form the output of the face recognition module as it matches a given face against the training database. Shown in boldface are the probability values corresponding to the actual occupants who were involved in the corresponding events, as per the ground truth. Italicized values indicate the probabilities corresponding to occupants who were not involved in the events but were falsely assigned values by the recognizer. This ambiguity may arise for any of the reasons already discussed above. Figure 3 illustrates a sample transition from state s8 to s9 upon the occurrence of event e9 in zone 1. The probabilities of the occupants in zone 1 in state s9 are obtained as per equation (1) of the transition function definition; for the remaining zones of s9, the probability values are obtained as per equation (2). Figure 4 illustrates the precision and recall results for the ground truth corresponding to the event sequence of Figure 2. The low values for precision at zone 3, corresponding to states s3 and s10 in particular, can be traced to the ambiguity arising in the face recognition step at events e3 and e10, both occurring at zone 3, which results in a low probability of recognition of occupants o1 and o2 at these events, respectively. For the same reason, the values for recall at zone 3 also suffer, thereby affecting the average recall of states s3 ... s6.
5 Conclusions
We have presented a novel framework for non-obtrusive, biometrics-based identification and tracking in indoor smart environments. Our contributions are:

1. A state transition framework in which events abstract different biometric recognition steps and transitions abstract different reasoning steps.
2. A characterization of the performance of the smart environment in terms of the concepts of precision and recall.

Our state transition system is fundamentally probabilistic because the biometric recognition that underlies events is inexact in nature. Our formulation of precision and recall succinctly characterizes the performance of a smart environment. We believe that our state transition model is an effective abstraction of a smart environment and serves as a basis for integrating different recognition and reasoning capabilities. In our model, a state provides location information at a zone level, and a sequence of consecutive states implicitly contains zone-level tracking information for all occupants. While it is possible to define precision and recall with respect to any query of interest, we have formulated them in a query-independent manner, which we believe is more general. Our model is capable of supporting spatial and temporal queries, such as: the location of an occupant in the facility; the time of entry/exit of an occupant; the first/last person to enter/leave the facility;
the current occupants present in the facility; etc. A query of special interest is tracking of individuals over time and space. We plan to enhance our current prototype by incorporating a variety of biometric recognizers, such as for height, gait, voice, etc. It is possible to fuse two or more biometric modalities to enhance the overall performance of recognition. We also plan to incorporate spatio-temporal reasoning based upon declarative knowledge of the environment as well as the occupants. Through such enhancements in recognition and reasoning, we can improve the overall precision and recall of the smart environment. We plan to test this approach on larger environments and support speech-based information retrieval queries about the environment.
Acknowledgements. Thanks to Philip Kilinskas for his help in developing the experimental prototype; Dr. Jason J. Corso for discussions on Markov models; and members of the Center for Unified Biometrics and Sensors for their comments and suggestions.
References 1. Weiser, M.: The Computer for the 21st Century. Scientific American 265(3), 66–75 (1991) 2. Satyanarayanan, M.: Pervasive Computing: Vision and Challenges. IEEE Personal Communications 8(4), 10–17 (2001) 3. Pentland, A., Choudhury, T.: Face Recognition for Smart Environments. IEEE Computer 33(2), 50–55 (2000) 4. Cao, H., Govindaraju, V.: Vector Model Based Indexing and Retrieval of Handwritten Medical Forms. In: Proc. of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), pp. 88–92. IEEE Computer Society, Washington (2007) 5. Bouchaffra, D., Govindaraju, V., Srihari, S.: A Methodology for Mapping Scores to Probabilities. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9), 923–927 (1999) 6. van Rijsbergen, C.J.: Information Retrieval. Butterworths, Boston (1979) 7. Cook, D., Das, S.: How smart are our environments? An updated look at the state of the art. Pervasive and Mobile Computing 3(2), 53–73 (2007) 8. Youngblood, M., Cook, D.J., Holder, L.B.: Managing adaptive versatile environments. In: Proc. of the Third IEEE Intl. Conf. on Pervasive Computing and Communications, pp. 351–360. IEEE Computer Society, Washington (2005) 9. Lesser, V., et al.: The Intelligent Home Testbed. In: Proc. of the Autonomous Agents 1999 Workshop on Autonomy Control Software. ACM, New York (1999) 10. House n Living Laboratory Introduction (2006) 11. Hightower, J., Borriello, G.: Location Systems for Ubiquitous Computing. IEEE Computer 34(8), 57–66 (2001) 12. Manesis, T., Avouris, N.: Survey of position location techniques in mobile systems. In: Proc. of the Seventh Intl. Conf. on Human Computer interaction with Mobile Devices and Services. MobileHCI 2005, pp. 291–294. ACM, New York (2005)
13. Fox, D., Hightower, J., Liao, L., Schulz, D., Borriello, G.: Bayesian filtering for location estimation. IEEE Pervasive Computing 2(3), 24–33 (2003) 14. Bar-Shalom, Y., Li, X.-R.: Multitarget-Multisensor Tracking: Principles and Techniques, Yaakov Bar-Shalom (1995) 15. Bui, H.H., Venkatesh, S., West, G.: Tracking and surveillance in wide-area spatial environments using the Abstract Hidden Markov Model. Intl. Journal of Pattern Recognition and Artificial Intelligence 15(1), 177–195 (2002) 16. Roy, A., Bhaumik, S., Bhattacharya, A., Basu, K., Cook, D.J., Das, S.K.: Location aware resource management in smart homes. In: Proc. of the First IEEE Intl. Conf. on Pervasive Computing and Communications, pp. 481–488. IEEE Computer Society, Washington (2003) 17. Das, S.K., Roy, N., Roy, A.: Context-aware resource management in multiinhabitant smart homes: A framework based on Nash H-learning. Pervasive and Mobile Computing 2(4), 372–404 (2006) 18. Schulz, D., Fox, D., Hightower, J.: People tracking with anonymous and id-sensors using Rao-Blackwellised particle filters. In: Proc. of the 18th Intl. Joint Conference on Artificial Intelligence (IJCAI), pp. 921–926 (2003) 19. Krumm, J., Harris, S., Meyers, B., Brumitt, B., Hale, M., Shafer, S.: Multi-Camera Multi-Person Tracking for EasyLiving. In: Proc. of the 3rd IEEE Intl. Workshop on Visual Surveillance (VS 2000). IEEE Computer Society, Washington (2000) 20. Brumitt, B., Meyers, B., Krumm, J., Kern, A., Shafer, S.A.: EasyLiving: Technologies for Intelligent Environments. In: Thomas, P., Gellersen, H.-W. (eds.) HUC 2000. LNCS, vol. 1927, pp. 12–29. Springer, Heidelberg (2000) 21. Chen, D., Gellersen, H.: Recognition and reasoning in an awareness support system for generation of storyboard-like views of recent activity. In: Proc. of the Intl. ACM SIGGROUP Conference on Supporting Group Work. GROUP 1999. ACM Press, New York (1999) 22. Aghajan, H., et al.: Distributed Vision-Based Accident Management for Assisted Living. 
In: Okadome, T., Yamazaki, T., Makhtari, M. (eds.) ICOST. LNCS, vol. 4541, pp. 196–205. Springer, Heidelberg (2007) 23. Gao, Y., Hui, S.C., Fong, A.C.: A Multi-View Facial Analysis Technique for Identity Authentication. IEEE Pervasive Computing 2(1), 38–45 (2003) 24. Hong, K., et al.: Real Time Face Detection and Recognition System Using Haar-Like Feature/HMM in Ubiquitous Network Environments. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganà, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3480, pp. 1154–1161. Springer, Heidelberg (2005) 25. Zhang, S., et al.: Continuous Verification Using Multimodal Biometrics. In: Zhang, D., Jain, A.K. (eds.) ICB 2005. LNCS, vol. 3832, pp. 562–570. Springer, Heidelberg (2005) 26. Hazen, T., Weinstein, E., Heisele, B., Park, A., Ming, J.: Multi-Modal Face and Speaker Identification for Mobile Devices. In: Hammoud, R.I., Abidi, B.R., Abidi, M.A. (eds.) Face Biometrics for Personal Identification: Multi-Sensory Multi-Modal Systems, pp. 123–138. Springer, Heidelberg (2006) 27. Bohn, J., Coroama, V., Langheinrich, M., Mattern, F., Rohs, M.: Social, Economic, and Ethical Implications of Ambient Intelligence and Ubiquitous Computing. In: Weber, W., Rabaey, J., Aarts, E. (eds.) Ambient Intelligence, pp. 5–29. Springer, Heidelberg (2005)
28. Vildjiounaite, E., et al.: Unobtrusive Multimodal Biometrics for Ensuring Privacy and Information security with Personal Devices. In: Fishkin, K.P., Schiele, B., Nixon, P., Quigley, A. (eds.) PERVASIVE 2006. LNCS, vol. 3968, pp. 186–201. Springer, Heidelberg (2006) 29. Bernardin, K., Stiefelhagen, R.: Audio-visual multi-person tracking and identification for smart environments. In: Proc. of the 15th International Conference on Multimedia, pp. 661–670. ACM, New York (2007) 30. Hewitt, R., Belongie, S.: Active Learning in Face Recognition: Using Tracking to Build a Face Model. In: Proc. of the 2006 Computer Vision and Pattern Recognition Workshop, pp. 157–157. IEEE Computer Society, Washington (2006) 31. OpenCV, http://www.intel.com/technology/computing/opencv/index.htm 32. Turk, M., Pentland, A.: Eigenfaces for Recognition. J. Cognitive Neuroscience 3(1), 71–86 (1991)
A Structured Methodology of Scenario Generation and System Analysis for Ubiquitous Smart Space Development

Ohbyung Kwon¹ and Yonnim Lee²

¹ School of International Management, Kyunghee University, Seochun, Ghiheung, Yongin, Kyunggi-do, South Korea
[email protected]
² Research Center for Ubiquitous Business and Services, Kyunghee University, Seochun, Ghiheung, Yongin, Kyunggi-do, South Korea
[email protected]
Abstract. Ubiquitous smart space (USS) has been regarded as a promising extension of ubiquitous services, and it is currently the subject of worldwide development. In one USS development methodology, scenario development is performed before system analysis and design. However, even though many redundant elements can be found between scenarios and system analysis results, developers have not taken any structured approach to joining them together for more consistency and, eventually, higher productivity. Hence, the aim of this paper is to propose a methodology that increases consistency in the early steps of USS development. To do so, scenario and requirement analysis are integrated in a structured manner. Keywords: ubiquitous smart space, scenario analysis, requirement analysis, structured method.
1 Introduction

Advances in ubiquitous technologies have led to the development of many new types of ubiquitous services. Unlike earlier approaches, ubiquitous computing services are defined and used as services that are provided not just in physical spaces but also in logical spaces. This extension of the service concept has drawn interest to USS (Ubiquitous Smart Space). A USS is an artificial space augmented with ubiquitous computing technology. Recently, ubiquitous computing services have been developed for USS environments all over the world. In the United States, for example, there are several systems including HP's Cool Town, Microsoft's Easy Living, Cisco's MIE, Intel's Digital Home, Xerox's Smart Media Spaces, NIST's Smart Space, Georgia Tech's e-Class, and MIT's Auto-ID Center. In Europe, systems include the EU's 2Wear, Belfast's MIME, Orange's Orange at Home, and Siemens' Smart Home. In Japan, systems include the Ubiquitous ID Center's Ubiquitous ID Technology, Matsushita's eHII system, OMRON & ODAKYU's GOOPAS, NTT's Power Conversion System, and NTT Docomo's Cmode.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 90–104, 2008. © Springer-Verlag Berlin Heidelberg 2008

In Korea,
systems include ETRI's u-Post, SKT's Moneta, Samsung Electronics' Digital Home, and CVnet's Cyber Village. Most of these USSes are presented through feasible service scenarios that involve various ubiquitous computing elements, technologies, and products. In the first stage of development, the developers write a service scenario that shows a possible future lifestyle and carry out a requirement analysis from that scenario. From this, they design and implement the USS. A USS is totally different from previous physical spaces, so its services also deploy very differently. The scenario method, widely used in futures research, has therefore been recognized as an effective tool for developing future services. The scenario method has proven very useful for eliciting USS service ideas [1]. Moreover, the scenario method is useful for explaining new ideas to other developers or for capturing requirements for system implementation. Hence, the scenario method is widely used to develop USS services [2]. Carnegie Mellon University's Aura project, started by Peter Steenkiste in 1999 with the goal of achieving invisible computing, first used the scenario method and then designed the Aura system to realize that scenario.

However, to date, no standardized methodology exists for writing scenarios. Most scenario methods suggest simple guidelines that do not span all USS development phases. Therefore, many scenarios have been developed by brainstorming, and their contents have been decided by personal taste or by an organization's preferences. Brainstorming is a group creativity technique designed to generate a large number of ideas for the solution of a problem. Although brainstorming is very useful for generating creative ideas, it is limited when sufficient resources are unavailable. Potential brainstorming problems also include distraction, social loafing, evaluation apprehension, and production blocking.
Existing scenario methods are not related to requirement analyses for software development, which can cause communication problems between the scenario writer and the requirement analyst. Requirements analysis for developing services has progressed independently of efforts in scenario writing. As a result, one-sided and subjective scenarios written without regard for the requirement analysis phase have created problems: double writing, inconsistency between scenario and requirement specification, and waste of time and cost. Moreover, such scenarios are difficult to analyze and evaluate objectively. It is also difficult to validate the degree to which a scenario is reflected in requirements specifications, since finding objective standards against which to evaluate a subjective scenario is difficult. Therefore, this paper proposes a standardized and structured methodology for writing scenarios, and develops an integration methodology for requirement analysis that proceeds automatically and continuously from the written scenario. In this methodology, we focus on a technology-push scenario that validates the utility of technology developed in the short term.

This paper is organized in five sections. First we review previous research; then Section 3 describes the scenario writing methodology and the integration methodology for requirement analysis. A case study of our methodology is presented in Section 4. Finally, we provide concluding remarks in Section 5.
2 Scenario-Based Requirement Engineering

Scenario-based requirements analysis methods express system requirements by using scenarios that were elicited through examples of real-world experience and invented experiences [3,4]. Scenario-based requirements analysis methods can be broadly divided into two methods: ScenIC and SCRAM [5,6].

ScenIC (Scenario in the Inquiry Cycle) proposes a schema of scenario-related knowledge composed of goals, objectives, tasks, obstacles, and actors [7]. Scenarios are composed of episodes and actions carried out by actors, who are usually people but may also be machines. Goals are classified into achieving, maintaining, or avoiding states, while obstacles prevent goals from being achieved or inhibit successful completion of tasks. The method proceeds in a cycle of expressing scenarios in a semi-structured format, then criticizing and inspecting the scenarios in walkthroughs, which leads to refined requirements and specifications and to the next cycle. Guidelines are given for formatting scenario narratives and identifying goals, actions, and obstacles. Scenario episodes are assessed with challenges to see whether goals can be achieved by the system tasks, whether the actors can carry out the tasks, whether obstacles prevent the actors from carrying out the tasks, and so on. Thus, dependencies between goals, tasks, actors, and resources can be checked to make sure the system meets its requirements. The method also includes dependency analysis and means-ends analysis, in which tasks and the capabilities of actors are examined to ensure goals can be achieved.

In SCRAM (Scenario-based Requirement Analysis Method) [8,9,10], scenarios are used with early prototypes to elicit requirements in reaction to a preliminary design. This approach does not explicitly cover modeling and specification, as these are assumed to progress in parallel, following the software engineering method of the designer's choice.
According to Alistair [5], the method consists of four phases: (1) initial requirements capture and domain familiarization, conducted by conventional interviewing and fact-finding techniques to gain sufficient information to develop a first concept demonstrator; (2) storyboarding and design visioning; (3) requirements exploration; and (4) prototyping and requirements validation, which develops more fully functional prototypes and continues refining requirements until a prototype is agreed to be acceptable by all users. However, in legacy scenario-based requirement analysis methods, requirement analysis is actually separated from scenario writing; they support system analysis only after scenarios are written completely.
3 Methodology Overview

3.1 Overall Framework

To write scenarios and analyze requirements in a USS development environment, an overall framework is proposed, as shown in Fig. 1. Broadly, there are two ways to write a scenario. The first is a guided approach: we first analyze the environment and then write scenarios based on the results. The other is an open-ended approach, which writes scenarios freely without any environment analysis [24]. As previously stated, our research focuses on a technology-push scenario that validates the usefulness of technology developed in the short term. Hence, we apply a guided approach because we aim to make our scenario
[Fig. 1 depicts the proposed pipeline as a sequence of steps: Define the Technology Matrix → Define the Service Matrix → Sketch out the Scenario Outline → Draw the Scenario Description Diagram → Generate the Scenario → Generate the Requirement Specification.]
Fig. 1. Proposed methodology for generating scenario and requirement specification from technology and service matrix
practical to implement. In this paper, we define the technologies, devices, and services that we can use in the scenario at present; this is the environment analysis phase. We then write the scenario based on the results.

The scenario writer defines the technology matrix after collecting usable technologies and devices. The technology matrix is a table that summarizes the available technologies and the corresponding devices for making services available. Through combinations of technologies and devices in the defined technology matrix, we extract services that we can use in scenarios. We then define a service matrix, which describes in detail the content of each service extracted from the technology matrix, and evaluate its service level to decide whether it is suitable for use in scenarios. Based on this evaluation result, we confirm the services that we will use in the scenario. The scenario outline is then sketched out from the final confirmed services. After that, we draw the scenario diagram using the scenario outline and the confirmed services. The scenario diagram is a technique that represents scenarios with predefined notations and processes; it consists of a context diagram, a level-0 diagram, and a level-1 diagram. Scenarios are generated by transforming a completed scenario diagram according to automatic transformation logic. Requirement specifications are written by transforming a generated scenario according to automatic transformation logic. Meanwhile, we create a scenario dictionary to improve the understanding of the scenario reader and the requirement specification writer.

3.2 Technology Matrix Definition

First, the scenario writer collects technologies and devices that can be used in the scenario and writes a brief explanation for each. The scenario writer may then draw up a table summarizing those technologies and devices, as shown in Table 1:
94
O. Kwon and Y. Lee

Table 1. Technology matrix template

Category             | Name            | Explanation
Available Technology | Technology name | Brief explanation for the technology
Possible Device      | Device name     | Brief explanation for the device
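The technology matrix template can be sketched as a small data structure. The following Python is our own illustration, not part of the methodology; entry names such as "RFID" and "u-PDA" are placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    category: str      # "Available Technology" or "Possible Device"
    name: str
    explanation: str

# Hypothetical entries following the Table 1 template
technology_matrix = [
    Entry("Available Technology", "RFID", "Radio-frequency tag identification"),
    Entry("Available Technology", "uPAN", "Short-range personal-area networking"),
    Entry("Possible Device", "u-PDA", "Handheld smart device carried by the customer"),
]

def by_category(matrix, category):
    """Return the names recorded under one category of the matrix."""
    return [e.name for e in matrix if e.category == category]

print(by_category(technology_matrix, "Possible Device"))  # ['u-PDA']
```

Such a record-per-row shape mirrors the template directly: one row per technology or device, with the category column deciding which half of the matrix it belongs to.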
3.3 Service Matrix Definition

(1) Extract services. Through combinations of technologies and devices in the previously defined technology matrix, we extract the services that can be used in scenarios. The extracted services take concrete shape in the service matrix and are confirmed in the service evaluation phase. Table 2 is the template for service extraction. Among these methodologies, this paper focuses on the first six steps, from need identification to validation of service selection.

Table 2. Service extraction template
Service      | Tech. 1 | Tech. 2 | Device 1 | Device 3 | Device 4
Service name |         |         |          |          |

Mark '○' for each technology or device used in the service.
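The extraction step walks every technology × device cell and collects the marked combinations into services. This is a sketch under our own assumptions; the `known_services` rule table and all names are invented for illustration:

```python
from itertools import product

# Hypothetical technology and device lists (illustrative names)
technologies = ["RFID", "uPAN"]
devices = ["u-PDA", "Smart Wall"]

# In the template, a mark in a cell says a combination yields a service;
# here a simple mapping plays the role of those marks.
known_services = {
    ("RFID", "u-PDA"): "customer-aware recommendation",
    ("uPAN", "Smart Wall"): "image sync service",
}

def extract_services(techs, devs):
    """Visit every technology x device cell and collect the marked services."""
    found = []
    for tech, dev in product(techs, devs):
        svc = known_services.get((tech, dev))
        if svc:
            found.append((svc, tech, dev))
    return found

print(extract_services(technologies, devices))
```

Running the sketch lists each extracted service together with the technology and device combination that produced it.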
(2) Define service matrix. The service matrix is a table that displays in detail the contents of each extracted service. Here, the functionality of a service that is expressed as an adjective or an adverb in the service description must be explained in as much detail as possible. Table 3 is the template for the service matrix.

Table 3. Service matrix template

Service | Description | Functionality (Enabler) | Functionality (Function)
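The service matrix rows can likewise be held as records that pair each service with its enabler/function list. A minimal sketch with invented entries (not the paper's data):

```python
# Hypothetical service-matrix rows: each service carries a description and a
# list of (enabler, function) pairs, following the Table 3 template.
service_matrix = {
    "image sync service": {
        "description": "share images among remote spaces",
        "functionality": [
            ("u-PDA", "transmit shared images"),
            ("Smart Wall", "display shared images"),
        ],
    },
}

def enablers_of(service):
    """List the enablers that realize one service."""
    return [enabler for enabler, _ in service_matrix[service]["functionality"]]

print(enablers_of("image sync service"))  # ['u-PDA', 'Smart Wall']
```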
A Structured Methodology of Scenario Generation and System Analysis
95
(3) Evaluate service level. We evaluate the service level of each service based on a checklist. Then we review the evaluation result with the people who requested the scenario.

3.4 Scenario Outline

(1) Identify the theme and background. Identifying the topic and background of the scenario might proceed as follows.

• Define the value that the services used in the scenario can provide for the end user.
• Identify the stakeholders related to the scenario.
• Define the main character of the scenario and his goal. If needed, we can set up obstacles and complicating elements of the scenario.
• Describe the brief contents of the scenario.
Table 4 is the template for identifying the theme and the background of the scenario.

Table 4. The template for identifying the theme and the background of scenario

Category    | Description
Name        | The name of the scenario
Theme       | The theme of the scenario
Stakeholder | Stakeholders are people who affect, or can be affected by, the scenario financially
Abstract    | Abstract of the scenario
(2) Define the actors and assign their roles. We define the actors who carry out the services in the scenario and assign their roles. Actors may be categorized by their properties (device, agent, ontology, u-Product). Table 5 is the template for defining the actors and assigning their roles.

Table 5. The template for defining the actors and assigning their role

Category  | Actor      | Role
Device    | Actor name | Actor's role
Agent     |            |
Ontology  |            |
u-Product |            |

3.5 Scenario Definition Diagram

We draw the scenario diagram based on the sketch of the scenario. This diagram is a technique that represents scenarios by predefined notations and processes. It consists of a context diagram, a level-0 diagram, and a level-1 diagram. Table 6 shows the components of the scenario diagram.

Table 6. Components of the scenario definition diagram

Component        | Description
Sequence         | Represents the service offered in the scenario; one sequence has more than one service.
Scene            | Represents the actual service action performed using the data; it has an actor who is the object of the service and an action that is the service content.
Stakeholder      | Represents the stakeholders, people who affect, or can be affected by, the scenario financially.
Data Flow        | Represents the data flow among services or actors' actions; notation variants denote optional and looped data flows.
Data Store       | Represents the data store that saves the data; it could have a number of various physical shapes.
Information Flow | Represents the information flow between a scene and a data store; when a scene requests specific information, the data store provides it.

(1) Draw the scenario context diagram. The scenario context diagram represents the business model of the scenario. It explains the relationships among the stakeholders who affect, or can be affected by, the scenario financially. Hence, the scope of the services is decided through this diagram. The scenario context diagram is drawn as in Fig. 2.

(2) Draw the scenario level-0 diagram. The level-0 diagram represents all services offered in a scenario; it depicts the information flow and the transforms that are applied as data moves from input to output. The scenario level-0 diagram is drawn as a series of steps, as in Fig. 3.
Fig. 2. Scenario context diagram

Fig. 3. Scenario level-0 diagram
(3) Draw the scenario level-1 diagram. The scenario level-1 diagram represents the actors and their actions in each service. Every actor has an Event, an Action, and a Response: the Event is the cause action that leads to the actor's action, and the Response is the result action of the actor's action. Table 7 is the template for the action matrix that organizes these actors and actions. The scenario level-1 diagram is drawn as in Fig. 4.

Table 7. The template for action matrix
Actor      | Event                                             | Action                 | Response
Actor name | The cause action that leads to the actor's action | The actor's own action | The result action of the actor's action
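The action matrix can be sketched as rows of (event, action, response) per actor. The rows below are invented for illustration, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class ActionRow:
    actor: str
    event: str     # the cause action that leads to the actor's action
    action: str    # the actor's own action
    response: str  # the result action of the actor's action

# Hypothetical rows for one scene (all names are illustrative)
action_matrix = [
    ActionRow("Mall Agent", "customer enters", "read customer context",
              "send context to Shop Agent"),
    ActionRow("Shop Agent", "context received", "build recommendation list",
              "push list to customer's UMO"),
]

def chain(rows):
    """Render the event -> action -> response chain of each actor."""
    return [f"{r.actor}: {r.event} -> {r.action} -> {r.response}" for r in rows]

for line in chain(action_matrix):
    print(line)
```

Each rendered line corresponds to one scene of the level-1 diagram, which is what the later transformation step consumes.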
Fig. 4. Scenario level-1 diagram
3.6 Scenario Generation

We generate the scenario by transforming the completed level-1 diagram according to automatic transformation logic on a per-sequence basis. Since the transformation logic is highly dependent on the target language, only Korean has been considered so far.

3.7 Generate Requirement Specifications

In this paper, we use UML, a modeling language widely used for requirement analysis. A software system can be visualized, described, developed, and documented with UML. In particular, we choose the sequence diagram among the 4 models and 8 diagrams of UML, which is an OMG public standard. The sequence diagram is drawn in the following steps.

Step 1. On the basis of the level-1 diagram, collect all data flows in the scenario and give them serial numbers in order. Then rearrange each data flow with its input and output as in Table 8.
Step 2. Place all actors appearing in the scenario horizontally, using the predefined symbols of Fig. 5, and draw a vertical line starting from each actor.
Step 3. Draw all data flows arranged in Step 1 from the input actor to the output actor in order, representing each data flow as an arrow along the vertical lines.

Table 8. Template for rearranging the data flow in scenario
No | Action                 | Actor          | Next Actor
1  | The actor's own action | Actor who acts | Actor who is affected by the action
2  | …                      | …              | …
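Steps 1-3 can be sketched in a few lines of Python. This is our simplified illustration, not the authors' transformation logic; the flow contents are placeholders:

```python
# Hypothetical numbered data flows: (data, source actor, destination actor)
flows = [
    ("A enters", "Chip sensor", "Mall Agent"),
    ("Send A's context info.", "Mall Agent", "Shop Agent"),
]

def to_sequence_diagram(flows):
    """Number the flows, collect actor lifelines, and emit one arrow per flow."""
    actors, lines = [], []
    for no, (data, src, dst) in enumerate(flows, start=1):  # Step 1: serial numbers
        for a in (src, dst):                                # Step 2: actor lifelines
            if a not in actors:
                actors.append(a)
        lines.append(f"{no}. {src} -> {dst}: {data}")       # Step 3: arrows in order
    return actors, lines

actors, lines = to_sequence_diagram(flows)
print(actors)   # ['Chip sensor', 'Mall Agent', 'Shop Agent']
print(lines[0]) # 1. Chip sensor -> Mall Agent: A enters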
Fig. 5. Symbols for the sequence diagram (human, u-product, sensor, event handler agent, ontology)
4 Case Study

This case study describes the 'U-Shopping Aid' scenario, to which we applied our methodology. The 'U-Shopping Aid' scenario was proposed to make customers

Table 9. Technology matrix

Category   | Name                                      | Description
Technology | Agent Technology                          | Offers information on product reputation and payment through negotiation among agents
Technology | Semantic Web                              | Provides semantic information through data written in the Web Ontology Language (OWL)
Technology | uPAN                                      | Transfers data between the network AP of the shop and the u-PDA
Device     | Scale-free Ubiquitous Mobile Object (UMO) | A mobile object or device that people carry; it can change its own functions depending on context and share computing resources with smart objects
Device     | UMO                                       | Smart object that the customer has; it plays an important role in receiving the service
Device     | Smart Wall or AR Table                    | Smart display device installed in the shop or the customer's house
Device     | RFID Sensor                               | Performs a recognizing role among devices
Table 10. Extract services

Service (rows): customer-aware and product recommendation, virtual put-on service, image sync service, payment service 1, payment service 2. Technology and device (columns): uAgent, USN, PDA, augmented reality, teleconference, smart wall, smart table. Each cell is marked '○' where the service uses the corresponding technology or device.
Table 11. Service matrix

Service name | Definition | Enabler | Function
Customer-aware and product recommendation | When a customer enters, the system is automatically aware of that fact and recommends products the customer would like | UMO | Notify the user's current location information
 | | Mall Agent | Detect the embedded chip in the UMO automatically
 | | Customer Ontology | Store the user profile information
 | | Shop Agent | Make recommendation lists according to the user's preferences
Virtual put-on service | Provides a virtual image of the customer wearing the chosen clothes, using the pictures saved in the smart object | Augmented reality device | Display virtual images when the user views the chosen clothes through the AR device
Image sync service | Enables sharing images among remote spaces and communicating with each other | AR Table or Smart Wall | Display the created picture
 | | u-PDA | Transmit shared images
 | | AR Table or Smart Wall | Display shared images and transmit the selected information
Payment service 1 | When the customer decides to buy, automatically transmits the payment information to the seller and the delivery company simultaneously | u-PDA | Transmit payment information
Payment service 2 | When the customer decides to buy, automatically transmits the payment information to the seller | u-PDA | Transmit payment information
Table 12. Scenario actors and their roles

Category | Actor                            | Role
Agent    | Shop Agent                       | Offers the optimized service by being aware of various customer needs and manages the agents of the shops.
Agent    | Mall Agent                       | Resides in the shopping mall; it is aware of the customer's context information and transmits it to the Shop Agent.
Sensor   | Chip sensor at the Mall entrance | An RFID chip sensor at the shopping mall entrance; it detects the customer's smart objects.
Ontology | Customer info Ontology           | Stores a customer profile.
more comfortable in their shopping in a practical way. The 'U-Shopping Aid' scenario consists of the customer-aware and product recommendation service, the virtual put-on service, the image sync service, and the payment services. Suppose that the technology matrix is defined as in Table 9. Then the five services in Table 10 are extracted, and the service matrix is defined as listed in Table 11, based on Table 9. The actors and their roles for our scenario are then defined as shown in Table 12. Lastly, the scenario context diagram and the scenario level-0 diagram are prepared as in Appendix A.
5 Conclusion

Recently, USS developers have come under increasing pressure due to the growing importance of scenario writing and requirement analysis; hence the increasing need for structured methodologies. So far, however, there has been no standardized methodology for writing scenarios. Most scenario methods suggest simple guidelines that do not take all USS development phases into consideration. Therefore, many scenarios have been developed by brainstorming, and their contents have been decided by personal taste or organizational preference. Moreover, existing scenario methods are not connected to requirement analysis for software development. This creates communication problems between the scenario writer and the requirement analyst, and leads to problems such as double writing, inconsistency between scenario and requirement specification, and waste of time and cost. Such scenarios are difficult to analyze and evaluate objectively. We therefore propose a standardized and structured methodology for writing scenarios and develop an integrated methodology for requirement analysis that proceeds from the written scenario automatically and continuously. We are applying our methodology to the 'U-Shopping Aid' service development supported by a Korean governmental project.
Acknowledgment

This research is partially supported by the Ubiquitous Computing and Network (UCN) Project of the Ministry of Information and Communication (MIC) 21st Century Frontier R&D Program.
Appendix A. Example Results

(1) Context Diagram

(Diagram showing the financial relationships around the 'U-Shopping Aid' service among the user (A), the u-shop at the COEX Mall, and the distribution and delivery service provider: context-aware information, customized purchasing, payment, and delivery information flow among them.)
(2) Scenario Level-0 Diagram

(Diagram showing the four sequences of the scenario: "Aware & recommend", "Virtual Put On", "Image Sync", and "Payment and delivery".)
(3) Data Flows in Scenario

No | Data flow                                                  | Actor                        | Next Actor
1  | A enters                                                   | Chip sensor at Mall entrance | Mall Agent
2  | Send A's context info.                                     | Mall Agent                   | Customer Info. Ontology
3  | Request for A's info.                                      | Mall Agent                   | Customer Info. Ontology
4  | Send A's info.                                             | Customer Info. Ontology      | Mall Agent
5  | Send A's context info.                                     | Mall Agent                   | Shop Agent
6  | Ask A's info.                                              | Shop Agent                   | Customer Info. Ontology
7  | Send A's info.                                             | Customer Info. Ontology      | Shop Agent
8  | Provide product list to recommend                          | Shop Agent                   | A's UMO
9  | Send a message to suggest the VPO service                  | A's UMO                      | A
(Scenes from 10 to 17 are omitted here.)
18 | Display the created augmented reality to share with family | IS service device            | AR Table
19 | The info. about the family choices                         | AR Table                     | A's UMO
20 | Send the family choices                                    | A's UMO                      | A
21 | Decision to purchase and pay                               | A                            | A's UMO
22 | Info. about payment                                        | A's UMO                      | Shop Agent
23 | Info. about payment and delivery                           | Shop Agent                   | A
Capturing Semantics for Information Security and Privacy Assurance

Mohammad M.R. Chowdhury¹, Javier Chamizo², Josef Noll¹, and Juan Miguel Gómez²

¹ UniK-University Graduate Center, Post Box 70, N-2027 Kjeller, Norway
{mohammad, josef}@unik.no
² Escuela Politécnica Superior, Universidad Carlos III de Madrid, Avda. de la Universidad 30, Leganés, Madrid, Spain
[email protected], [email protected]
Abstract. Security and privacy assurance is indispensable for ubiquitous access to information and resources. This paper focuses on security and privacy provisions in a restricted organizational environment through an access control mechanism. It includes the representation of the semantics of an organization and its access control mechanism using the Web Ontology Language. The system controls access to the resources of an organization through differential access privileges, which are formulated based on the roles of the individuals and the projects and departments they belong to. Instead of explicit definitions, some additional facts of the mechanism are inferred by executing semantic rules with the Jess rule engine over the designed ontology. This information is then passed back to the ontology to enrich it. The ontology is designed to cope with organizational restructuring with minimal effort.
1 Introduction

Ubiquitous computing and connectivity, together with the extensive diffusion of portable devices, allow users to access information, resources, and services anytime and anywhere, even when they are on the move. However, these access scenarios demand security and privacy assurance, which is not a trivial job in today's increasingly connected but dynamic systems. In this regard, Professor Eugene Spafford said [1], "The only truly secure system is one that is powered off, cast in a block of concrete and sealed in a lead-lined room with armed guards - and even then I have my doubts." This paper focuses on security and privacy provisions in a restricted organizational environment through access control mechanisms. Access control in distributed and dynamic systems is crucial for secure service access. It is included

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 105–118, 2008. © Springer-Verlag Berlin Heidelberg 2008
106
M.M.R. Chowdhury et al.
as one of the main sections in the ISO/IEC 27000 series¹, an information security standard published by the ISO and the IEC. We believe that the capabilities of semantic technologies can help mitigate these problems. The impact of Semantic Web technology is wide-ranging. A study by Project10X (a consulting firm)² found that more than 190 companies, including Adobe, Google, HP, Oracle, and Sony, are involved in developing Semantic Web based tools. But making it easier to comb through online data carries security implications. Among the security challenges, policy awareness and access control to Web resources play a major role, particularly given that these are two of the most significant requirements of information access and exchange. The design and maintenance of access control constraints in organizations are challenging problems, as company structures, roles, user pools, and security requirements are always changing. The conceptual semantics of an organization and its access control mechanisms are formally represented in an ontology using the Web Ontology Language. The company system controls access to the resources of the organization by providing differential access privileges. In this ontology, these privileges are formulated based on the roles of the individuals; in addition, it considers which projects or departments they belong to. The proposed solution is designed to cope with the dynamic nature of an organization. This work is an extension of our previous work [2], where the concepts were so static that they could not reflect organizational changes. This paper deals with a more complex situation, where an employee plays multiple roles across different departments and projects. The paper is organized as follows. The next section discusses the problem statement and our use case scenario. Section 3 briefly describes the Semantic Web technologies. The ontology representing the organizational semantics is described in Section 4.
In Section 5, we illustrate the access control mechanism and the processes and results of inference based on the proposed ontology. In Section 6, the proposed solution is evaluated in the context of organizational restructuring. Section 7 gives an overview of related work, and the paper concludes with a summary and a brief statement of future work.
2 Problem Statement: Information Security and Privacy Assurance

Nowadays, people in business organizations increasingly work in project-oriented environments. Some projects deal with company-sensitive information that should not be leaked to unauthorized employees. The project members come from different departments, and they do not all enjoy the same rights or privileges within a project environment. This is most prevalent when accessing resources owned by departments or projects. There are situations where a person
¹ ISO 27002 - The Information Security Standard, http://www.standardsdirect.org/iso17799.htm [accessed on Jan. 4, 2008]
² The Project10X Special Report, http://www.semantic-conference.com/semanticwave.html
holds multiple roles and privileges. Employees with different levels of privileges are expected to access resources through the Intranet or the Internet. Fig. 1 illustrates the use case scenario, which describes a specific organizational environment. It deals with roles such as employee (of a department in general), supervisor, project leader (PL), and project member (PM). Telenor R&I (Department A) and its planning department (Department B) are both involved in the projects Release 7 and Release 8; Release 9 resides only in the planning department. Each department and project has its own resources. A person may hold multiple roles: for example, Josef Noll is not only a supervisor (Department A) and a project leader (Release 9) but also a project member (Release 7). He should have access to the resources of every department and project in which he is involved. Thus, access rights depend on one's role in the respective departments and projects. The following are examples of restricted access scenarios:

1. The supervisor is the head of a department. A department owns some resources (administrative resources, documents, deliverables). The supervisor can read and write the documents, edit the administrative resources, and give final approval to the deliverables. The supervisor can also monitor the status of the department's employees who work in different projects.
2. A department's employees have only read and write privileges for its documents.
3. Departments participate in different projects. A project leader leads a project. He can read and write the project's documents, edit its administrative resources, and give final approval to the project deliverables.
4. Besides the leader, projects have members. They can only read and write the project's documents.

Therefore, the architecture manages access to resources based not only on roles but also on involvement in organizational divisions (departments, projects). The access scenarios of the use case are described in Table 1.
Fig. 1. The use case scenario
Table 1. Roles and privileges to access corresponding resources

Employee       | Role           | Privilege      | Access to Resources
Josef Noll     | Supervisor     | Administrator  | Admin. Dept.A
               |                | Final Approval | Deliverables Dept.A
               |                | Read Write     | Documents Dept.A
               | Project Leader | Administrator  | Admin. Rel9
               |                | Final Approval | Deliverables Rel9
               |                | Read Write     | Documents Rel9
               | Project Member | Read Write     | Documents Rel7
Hans Christian | Supervisor     | Administrator  | Admin. Dept.B
               |                | Final Approval | Deliverables Dept.B
               |                | Read Write     | Documents Dept.B
               | Project Leader | Administrator  | Admin. Rel7 & Rel8
               |                | Final Approval | Deliverables Rel7 & Rel8
               |                | Read Write     | Documents Rel7 & Rel8
George Kalman  | Employee       | Read Write     | Documents Dept.A
               | Project Member | Read Write     | Documents Rel8, Documents Rel9
Erik Swansson  | Employee       | Read Write     | Documents Dept.A
               | Project Member | Read Write     | Documents Rel8
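The role-privilege-resource triples of Table 1 can be read as a simple lookup table. A minimal sketch (ours, not the paper's implementation; the instance names follow the Sup_Josef/PM_Josef naming used later in Section 4):

```python
# Sketch of the access table: (role instance, privilege) -> set of resources.
access = {
    ("Sup_Josef", "Administrator"): {"Admin_DeptA"},
    ("Sup_Josef", "Read_Write"): {"Documents_DeptA"},
    ("PM_Josef", "Read_Write"): {"Documents_Rel7"},
}

def can_access(role, privilege, resource):
    """Check whether a role instance holds a privilege over a resource."""
    return resource in access.get((role, privilege), set())

print(can_access("PM_Josef", "Read_Write", "Documents_Rel7"))   # True
print(can_access("PM_Josef", "Read_Write", "Documents_DeptA"))  # False
```

The paper's point is precisely that this table should not be maintained by hand: the entries are derived by inference from roles, work units, and resource ownership, as Section 5 shows.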
3 Representation of Organizational Semantics
We advocate that access control solutions should adopt semantic technologies as the key building blocks for supporting expressive representation of organizational semantics and reasoning. This section briefly describes these technologies and clarifies our claims.

3.1 Using Ontology
A common format of information and data representation will likely never be achieved. Efficient management of information and data is possible by capturing their common, understandable meaning formally and unambiguously [16]. Ontologies [15] are the cornerstone technology of the Semantic Web, providing structured vocabularies that describe a formal specification of a shared conceptualization. An ontology is used to capture knowledge about a domain of interest in the form of concepts and their relationships. It permits the description of organizational structures, roles, privileges, and resources at different levels of abstraction and supports reasoning about both the structure and the properties of the elements that constitute the system. We believe that designing a consistent ontology based on a sound conceptual foundation is worthwhile because it can be reused in various organizations to control access to their resources.
3.2 Introduction to the Web Ontology Language
Among the different ontology languages, we focus on the Web Ontology Language (OWL³) recommended by the World Wide Web Consortium (W3C). It is a markup language that builds on RDF⁴ and RDF Schema. OWL was chosen because it provides more vocabulary for describing concepts and properties (e.g., relations between concepts, cardinality, equality, richer typing of properties) than XML, RDF, and RDFS. There are three species of OWL: OWL Lite, OWL DL, and OWL Full, designed to be layered according to their increasing expressiveness.

3.3 Description Logics for OWL
Of the three sublanguages OWL offers, we decided to use OWL DL, which is based on Description Logics (hence the suffix DL). These are the decidable fragment of First Order Logic⁵ and are therefore amenable to automated reasoning. Though OWL DL lacks the expressive power of OWL Full, it maintains decidability⁶ and regains computational efficiency. Computational efficiency is an important feature, since the ontology is expected to support scores of relations. As the mechanism is supposed to evaluate and grant permissions to access resources, it seems necessary to add reasoning support to it. In order to achieve more expressivity while retaining decidability, we use the Semantic Web Rule Language (Section 5.1), which is designed as an extension of OWL DL.
4 Descriptions of Organizational Semantics

We assume that company employees are already authenticated to the system through some secure means. The ontology models the organizational structures described in Section 2.

4.1 Defining Concepts through Classes
Fig. 2 illustrates the proposed ontology at the conceptual level. OWL classes are the concrete representations of concepts. In the proposed ontology, the Identity class defines the identities of the company employees. We specify Company, Department, and Project as subclasses of Work_Unit in order to avoid defining explicit relationships between departments/projects and roles. We follow set theory (eq. 1) in defining the class hierarchy of Role, considering the fact that the supervisor of
³ OWL Overview: http://www.w3.org/TR/owl-features/
⁴ RDF builds on URI and XML technologies. The specifications provide a lightweight ontology system.
⁵ First Order Logic (FOL), http://en.wikipedia.org/wiki/First-order_logic
⁶ Logics are decidable if computations/algorithms based on the logic terminate in finite time.
Fig. 2. The ontology: class, property, instance
a department is also an employee of it; the same is true for project leader and member. The class hierarchy is a critical issue in the inheritance of properties.

{Supervisor, ProjectLeader, ProjectMember} ⊆ Employee ⊆ Role    (1)
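A minimal sketch of the subsumption in eq. (1), walking a subclass chain upwards; this stands in for what an OWL reasoner provides and is our illustration, not part of the ontology:

```python
# Subclass axioms of eq. (1): every Supervisor, ProjectLeader and
# ProjectMember is an Employee, and every Employee is a Role.
subclass_of = {
    "Supervisor": "Employee",
    "ProjectLeader": "Employee",
    "ProjectMember": "Employee",
    "Employee": "Role",
}

def is_a(cls, ancestor):
    """Transitive subsumption check: follow the subclass chain upwards."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

print(is_a("Supervisor", "Role"))     # True
print(is_a("Role", "ProjectMember"))  # False
```

Because subsumption is transitive, any property whose domain is Employee automatically applies to Supervisor, ProjectLeader, and ProjectMember instances, which is exactly the inheritance effect the text relies on.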
We divide the resources into subclasses: administrative resources and web resources. Web resources are further divided into documents and deliverables. These resources are related to appropriate privileges. Privileges are designed in accordance with the individuals' roles in the organization; here, a role reflects the positional hierarchy of employees. Through this, individuals are restricted to accessing the correct resources. For example, administrative resources are related to Admin_Privilege to ensure that only the roles having administrative privilege can access them.

4.2 Realizing Concepts through Instances
OWL classes are interpreted as sets that contain instances or individuals. Fig. 2 shows the instances of classes in ellipses. Instances of the identities are defined here simply as names. The privilege instances Admin, Final_approval, and Read_write are added, but new instances can be added whenever necessary (Section 6). Instances of the resources correspond to the resources owned by the departments or projects. Instances of the subclasses of Role are added in accordance with the individuals' roles, to realize the multiple roles of a person. For example, the Supervisor instance Sup_Josef corresponds to the supervisor role of Josef Noll; similarly, PM_Josef corresponds to his project member role.
4.3 Defining Relations through Properties
Properties are binary relations between two things, more specifically between instances of classes. A property relates instances from its domain to instances from its range. Syntactically, the domain links a property to a class, and the range links a property to either a class or a data range. Due to the class hierarchy (eq. 1) and the domain and range specifications, subclasses inherit the relationships between the respective classes. Fig. 2 also shows the properties and their relationships with the classes. The rolePlaysIn property specifies in which Work_Unit (department/project) a Role instance plays its role. The ontology is supposed to answer who can see or access which resources, and the hasVisibilityOf property captures this. isSupervisorOf states who is the supervisor of whom in a department. The relationships hasVisibilityOf and isSupervisorOf are not defined explicitly; they are filled in through the inference process.
5 Access Control by Enhancing the Expressivity of OWL

Access control is achieved by enhancing the expressivity of OWL through the inference process. This section describes the access control logic and the expressivity needs, reviewing the language used and its benefits for the goals pursued.

5.1 Introduction to the Semantic Web Rule Language
The expressivity provided by the OWL is limited by tree like structures [17]. This means that knowledge cannot be inferred from indirect relations between the entities, however the solution spends most part of his power in inferring indirect relationships that will determine whether a subject has access to a resource or not and which are its privileges over it. Hierarchical structures as defined before and inherent relationships between working units and hierarchies of resources are a perfect field where inference can extract these knowledge. We did inference through the rule support over the ontology. and used Semantic Web Rule Language (SWRL7 ) which pretends to be a complimentary feature of OWL. SWRL is roughly the union of Horn Logic and OWL. As any Horn Logic based language, rules are defined as a set of precedent and consequent states. 5.2
Inference Results
Objects of the properties, hasVisibilityOf and isSupervisorOf are filled in through the inferred knowledge from executing the rules. We use Jess rule engine to run the rules. First, OWL ontology and SWRL rules are transferred to Jess. Running the engine then initiates the inference process, generates knowledge as Jess facts. This inferred knowledge can be used by the external interface or can optionally be passed back to the ontology to enrich it. All these actions are user-driven. Rules are formulated using the SWRL as follows, 7
7 The Semantic Web Rule Language, http://www.w3.org/Submission/SWRL/
112
M.M.R. Chowdhury et al.
– Rule 1: Over which resources does a Role have visibility/access?
Employee(?Em) ∧ rolePlaysIn(?Em, ?X) ∧ hasPrivilege(?Em, ?Y) ∧ belongsTo(?Z, ?X) ∧ needPrivilege(?Z, ?Y) −→ hasVisibilityOf(?Em, ?Z)
– Rule 2: Who is the supervisor of whom?
Dept_Employee(?DepEm) ∧ hasRole(?Y, ?DepEm) ∧ Department(?Dep) ∧ rolePlaysIn(?DepEm, ?Dep) ∧ Corporate_Identity(?ID) ∧ Supervisor(?Sup) ∧ hasRole(?ID, ?Sup) ∧ rolePlaysIn(?Sup, ?Dep) −→ isSupervisorOf(?ID, ?Y)
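Read as a Horn clause, Rule 1 simply joins its antecedent atoms over the fact base. The following Python sketch simulates this join (a toy illustration only; the paper runs the rule in Jess, and the individuals ProjectA, Spec_Doc and the privilege name are invented, with PL_Hans borrowed from the discussion below):

```python
# Toy fact base mirroring the atoms of Rule 1; the individuals are
# illustrative and do not come from the paper's ontology.
role_plays_in  = {("PL_Hans", "ProjectA")}    # rolePlaysIn(Em, X)
has_privilege  = {("PL_Hans", "ReadWrite")}   # hasPrivilege(Em, Y)
belongs_to     = {("Spec_Doc", "ProjectA")}   # belongsTo(Z, X)
need_privilege = {("Spec_Doc", "ReadWrite")}  # needPrivilege(Z, Y)

def infer_visibility():
    """hasVisibilityOf(Em, Z) <- rolePlaysIn(Em, X), hasPrivilege(Em, Y),
                                  belongsTo(Z, X), needPrivilege(Z, Y)"""
    inferred = set()
    for em, x in role_plays_in:
        for em2, y in has_privilege:
            if em2 != em:
                continue
            for z, x2 in belongs_to:
                # the resource must belong to the same work unit and
                # require exactly the privilege the role holds
                if x2 == x and (z, y) in need_privilege:
                    inferred.add((em, z))
    return inferred

print(infer_visibility())  # {('PL_Hans', 'Spec_Doc')}
```

Each inferred pair corresponds to one hasVisibilityOf relationship that a rule engine such as Jess would assert as a new fact.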
Fig. 3 and Fig. 4 illustrate the inference results of the rule execution. Not all relationships in the ontology have been defined explicitly. For example, the hasVisibilityOf relationship of project leader Hans (PL Hans) has not been defined (circled in Fig. 3). The figure shows that 25 relationships are inferred, answering which resources the roles have the required visibility of or access to (Rule 1). Our investigation shows that the knowledge is inferred as expected. The inference results are exported back to the ontology to fill these empty relationships (Fig. 3 shows that 25 relationships are transferred back to the OWL knowledge base). From Tab. 1, it is evident that Josef Noll is the supervisor of George Kalman, which was not explicitly defined. Execution of Rule 2 shows a similar result (Fig. 4). This can be used in a special situation, when a supervisor wants to check the status of an employee of his department in a project in which the supervisor is not involved.
6 Evaluation
The proposed solution is more maintainable, since there exists a general schema that any organization can easily adapt, thanks to an expressivity closer to human understanding. The expressivity and inference capabilities avoid the inclusion of redundant information in the ontology. The proposed ontology can reflect the organizational changes of a company with minimal effort. We now describe such a situation. A new department, Audit, has been created in the company. Roman Stanek and Peter Johansson have joined as the supervisor and employee (auditor) of the department, respectively. The department has budgets, and Peter audits them. The Audit department prepares the audit reports of the company. It is expected that only the auditor (Peter) and the supervisors of the departments can check a department's budget. Roman and Peter alone have access to the company audit reports, which belong to the Audit department only. As supervisor of the Audit department, Roman can give final approval to these. The corresponding actions to reflect these changes in the ontology are described in the following points; through them, we evaluate the strength of the proposed ontology. – Add new identity instances: Roman Stanek and Peter Johansson. – Add a new department instance: Audit
Capturing Semantics for Information Security and Privacy Assurance
Fig. 3. Inferred results executing Rule 1
Fig. 4. Inferred results executing Rule 2
Fig. 5. Inferred relationships when Rule 1 is executed
Fig. 6. Inferred relationships when Rule 2 is executed
– Add a new subclass of Resource: Budget & Audit. It contains three instances: the company audit report (Audit Company), BudgetDept.A and BudgetDept.B. A new resource subclass is created because it needs a different privilege to support its privacy requirements. Besides, the department contains an administration resource (AdminResAuditDept). – Add a new subclass, Auditor, within the class tree Role–CompanyEmployee, and add the instance Auditor Peter to it. – Add a new instance of the Supervisor role: Sup Stanek.
– Add a privilege instance Admin Budget to ensure that only the auditor and the supervisor have access to the budgets. This instance is related to Auditor Peter and to Budget Dept.A & Budget Dept.B. – Fill in all the corresponding relationships. – If we execute Rule 1, new relationships are inferred for the hasVisibilityOf property. These are exported back to the ontology to fill in the empty relationships (circled in the figure). Fig. 5 shows these new relationships for the auditor role instance Auditor Peter. As expected, Peter only has access to the company audit reports and the budgets of Dept.A and B. Similarly, Fig. 6 shows that Roman Stanek is the supervisor of Peter Johansson. Only a few changes to the ontology are required. Among these changes, only two new subclasses have to be created; all the remaining additions concern instances, which is quite straightforward. Conceptually, the proposed ontology can follow organizational changes. One way of simplifying access rights management is to store the access control matrix as access control lists (ACLs). Though ACLs are simple to implement, their management is quite tedious, especially when somebody's privilege has to be revoked. The proposed ontology provides a simple revocation mechanism: one can simply delete the relationship between the role instance and the privilege instance to withdraw the privilege, and afterwards delete the corresponding role instance to repeal the subject's role entirely. Among the few disadvantages, the approach is computationally expensive, and decidability is not guaranteed by SWRL. In addition, XML processing is slower than database processing, but ontologies can be stored in relational databases or even be mapped from them.
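The two-step revocation described above can be sketched as follows (illustrative Python with the instance names from the evaluation scenario; the actual system edits OWL relationships rather than Python dictionaries):

```python
# Sketch of ontology-style revocation: relationships are modelled as
# mappings from instances to sets of related instances.
has_privilege = {"Auditor_Peter": {"Admin_Budget"}}     # role -> privileges
has_role = {"Peter_Johansson": {"Auditor_Peter"}}       # identity -> roles

def revoke_privilege(role, privilege):
    """Withdraw one privilege: delete the role-privilege relationship."""
    has_privilege.get(role, set()).discard(privilege)

def repeal_role(identity, role):
    """Repeal the subject's role entirely: drop the role instance itself."""
    has_role.get(identity, set()).discard(role)
    has_privilege.pop(role, None)

revoke_privilege("Auditor_Peter", "Admin_Budget")
print(has_privilege["Auditor_Peter"])       # set()
repeal_role("Peter_Johansson", "Auditor_Peter")
print("Auditor_Peter" in has_privilege)     # False
```

Contrast this with an ACL, where the same revocation would require locating and editing every list entry granting the privilege.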
7 Related Work
The significance of adding privacy-enhancing technologies (PET) to virtual community networks is overwhelming [5], [6]. Information security and privacy requirements are even more evident in a business environment. This paper handles these issues through access control mechanisms in the context of a project-oriented corporate working environment. We adopted the concept of Role-Based Access Control (RBAC) as part of our access control mechanism. Sandhu et al. introduced RBAC in [4] for early multiuser computer systems: users' access permissions are associated with roles, and users are made members of appropriate roles. A major advantage of RBAC is its ability to constrain access based on the concept of separation of duties, which significantly simplifies the management of permissions. In the proposed solution, access control also depends on how the organization is structured. There are two types of RBAC constraints: dynamic and static. The authors of [3] described an approach to RBAC with dynamic constraints based on an automated reasoning technique. Though we focused on static constraints on roles, rules were included within the ontology to infer new knowledge, which can be passed back to the ontology. Through this, verification of the access control constraints defined
in the ontology is also achieved. The proposed solution can also adapt to an ever-changing company structure with little effort. Ubiquitous connectivity not only permits users to access resources anytime and anywhere but also complicates access control due to user/device mobility. To integrate this pervasive behaviour, a context-aware access control framework has been developed for secure collaborations in pervasive computing environments [18]. It followed two main design principles: context-awareness to control resource access and to enable dynamic adaptation of policies based on context, and the use of OWL DL for context/policy specification. The authors considered the location and time of a meeting as context information collected dynamically. We used static contexts, such as which departments/projects someone is involved in, though these were predefined. Among access control technologies, ACLs are widely used, but their semantics have proved insecure in many situations. Instead of maintaining a centralized ACL, a trust-based group has been proposed to delegate access rights in [7], [8], where FOAF (friend of a friend) [9] based social networks act as a means for the delegation. A private-key-based signature scheme was proposed to ensure the privacy of networks and users, but it requires secure distribution and maintenance of keys. A similar concept of trust or reputation has also been used in [10] to create and access communities. A distributed trust management approach is considered one of the main components for securing the Semantic Web [11]. These works intend to provide access to community resources and privacy solutions only by means of trust/reputation management. But access to sensitive business resources based on trust mechanisms does not provide adequate security in business contexts. In addition, trust is affected by various factors and is therefore difficult to quantify.
In [12], the authors presented an approach to reducing the inefficiencies of managing (coordinating, verifying, validating, and enforcing) many role-based access control policies and mechanisms using OWL. They focused on the representation of XACML (eXtensible Access Control Markup Language) policies in DL. In [13], Kolovski introduces an algorithm for translating a subset of XACML into DL with the goal of offering relevant analysis services using an OWL DL reasoner, Pellet [14]. In this paper, we also formalize the organizational semantics, roles and access privileges using OWL DL. Finin [11] also proposed using OWL to construct ontologies that define policies/privileges.
8 Conclusions
In this paper, we addressed the security and privacy challenges of project-based organizations through an access control mechanism exploiting Semantic Web technologies. To this end, we developed an ontology to represent the conceptual structure of an organization and the roles of individuals. In the ontology, not all relationships between the entities have been defined explicitly; semantic rules facilitated expressing this additional knowledge. The Jess rule engine executed
these rules, and the new facts were transferred back to the ontology to enrich it and check its validity. In addition, we evaluated the inference capabilities of the proposed solution by restructuring the organization. As the proposed solution is based on a centralized architecture, future work should consider its scalability, especially for large enterprise systems. Our ultimate goal is to integrate this solution with a web application that controls access to resources.
References
1. Spafford, E.H.: Director of the Purdue Center for Education and Research in Information Assurance and Security, Selected Quotes [accessed January 4, 2007], http://homes.cerias.purdue.edu/~spaf/quotes.html
2. Chowdhury, M.M.R., Noll, J., Gomez, J.M.: Enabling Access Control and Privacy through Ontology. In: 4th International Conference on Innovations in Information Technology (Innovations 2007), Dubai, UAE (2007)
3. Dury, A., Boroday, S., Petrenko, A., Lotz, V.: Formal Verification of Business Workflows and Role Based Access Control Systems. In: International Conference on Emerging Security Information, Systems and Technologies (SECUREWARE 2007), Valencia, Spain (2007)
4. Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-Based Access Control Models. IEEE Computer 29(2), 38–47 (1996)
5. Chewar, C.M., McCrickard, D.S., Carroll, J.M.: Persistent Virtual Identity in Community Networks: Impact to Social Capital Value Chains. Technical Report TR-03-01, Computer Science, Virginia Tech (2003)
6. Walters, G.J.: Privacy and Security: An Ethical Analysis. Computers and Society, 8–23 (2001)
7. Kruk, S.R., Grzonkowski, S., Gzella, A., Woroniecki, T., Choi, H.-C.: D-FOAF: Distributed Identity Management with Access Rights Delegation. In: 1st Asian Semantic Web Conference, Beijing, China (2006)
8. Kruk, S.R., Gzella, A., Grzonkowski, S.: D-FOAF: Distributed Identity Management based on Social Networks. In: Demo Session of ESWC 2006 (2006)
9. FOAFRealm project, http://www.foafrealm.org/
10. Choi, H.-C., Kruk, S.R., Grzonkowski, S., Stankiewicz, K., Davis, B., Breslin, J.G.: Trust Models for Community-Aware Identity Management. In: Identity, Reference and the Web (IRW 2006), WWW 2006 Workshop, Scotland, May 23 (2006)
11. Finin, T., Joshi, A.: Agents, Trust, and Information Access on the Semantic Web. ACM SIGMOD Record 31(4), 30–35 (2002), Special Section on Semantic Web and Data Management
12. Smith, M.A., Schain, A.J., Clark, K.G., Griffey, A., Kolovski, V.: Mother, May I? OWL-based Policy Management at NASA. In: European Semantic Web Conference (ESWC 2007) (2007)
13. Kolovski, V., Hendler, J., Parsia, B.: Analyzing Web Access Control Policies. In: 16th International World Wide Web Conference (WWW 2007), Alberta, Canada, May 8–12 (2007)
14. Pellet, an OWL DL reasoner, http://pellet.owldl.com/
15. Fensel, D.: Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce. Springer, Heidelberg (2001) 16. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American (May 2001) 17. Motik, B., Sattler, U., Studer, R.: Query Answering for OWL-DL with Rules. In: International Semantic Web Conference 2004, pp. 549–563. Springer, Heidelberg (2004) 18. Toninelli, A., Montanari, R., Kagal, L., Lassila, O.: Semantic Context-Aware Access Control Framework for Secure Collaborations in Pervasive Computing Environments. In: International Semantic Web Conference 2006. LNCS, pp. 473–486. Springer, Heidelberg (2006)
A Framework for Context-Aware Home-Health Monitoring Alessandra Esposito, Luciano Tarricone, Marco Zappatore, Luca Catarinucci, Riccardo Colella, and Angelo DiBari University of Salento, Lecce, Italy {alessandra.esposito,luciano.tarricone, luca.catarinucci}@unile.it
Abstract. This paper presents a proposal for a context-aware framework. The framework is organized according to a general purpose architecture, centred around an ontological context representation. The ontology provides the vocabulary upon which software agents interoperate and perform rule-based reasoning, in order to determine the system response to context changes. The system components and their coordinated operations are described by providing a simple example of concrete application in a home-care scenario. Keywords: context-aware, ontology, rule, logic, agent, ubiquitous, health.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 119–130, 2008. © Springer-Verlag Berlin Heidelberg 2008
1 Introduction
Progress in mobile devices, wireless networks and software technologies is bringing healthcare a new generation of systems, known as pervasive, context-aware systems. The term ‘pervasive’ [1] refers to the seamless integration of devices into the user’s everyday life. Appliances vanish into the background to make the user and his tasks the central focus, rather than computing devices and technical issues. Context-aware systems adapt their operations to the current context without explicit user intervention. These kinds of applications pose several technological challenges. First of all, a pervasive application is made up of heterogeneous and dispersed components which must be able to interoperate in a transparent manner. Moreover, in order to adapt to context variations, the system must elaborate the raw data sensed by context sources, such as sensors, and extract high-level context information from them. Both requirements lead to the identification of the core component of the system: a robust, reusable and sharable context representation. Indeed, the definition of a shared context model provides the basis for 1) allowing the system components to cooperate and build together the system reaction to incoming events, and 2) structuring sensed data in a meaningful and interpretable form. Currently, there is no standard modelling option for context [2]; available frameworks therefore adopt proprietary approaches. Among them, ontology-based ones [3,4,5] are increasingly recognized as the most promising [6], as they provide a rich formalism for specifying contextual information and are reusable and sharable. The adoption of ontologies is encouraged by their capability of being embedded within multi-agent and rule-based systems. Indeed, designing the application in a multi-agent fashion allows one to organize it around the
coordinated interaction of autonomous reasoning entities, which wrap the “natural” components of the pervasive system (such as sensing devices and consumer services). Rule-based logic supports agents in implementing advanced reasoning and in deriving high-level concepts from sensed information, thus opening the application to sophisticated adaptation and pro-active behaviour. This work proposes a framework for context-aware pervasive systems built around the three above-mentioned technologies: ontology representation, the multi-agent paradigm and rule-based logic. The suitability of these technologies for context-aware pervasive environments has been demonstrated in a number of recent works. Among them, we recall CoBrA [5], an agent-based framework for intelligent meeting rooms centred around an OWL ontology. Gu et al. [6] propose SOCAM, an ontology-based middleware for context-aware services in smart-space domains. An ontology-based context model and context reasoning for ubiquitous health-care is presented in [4]. Health-care is also the focus of the work of Paganelli and Giuli [3], which describes a system built around OWL and rule-based inference. A further impulse to the integration of the three technologies is provided by our work, which is the result of an effort to enhance their interoperability. As a result, the ontology provides the knowledge codification needed to support both agent reasoning and communication. The effectiveness of the adopted model is shown by means of a simple and concrete example in a home-care scenario. The paper is organized as follows. Section 2 introduces the general-purpose framework and its prototypal implementation. Section 3 is centred around context modelling: first the ontology representation and then the rule-based reasoning are illustrated. Section 4 focuses on how agent-based reasoning supports context elaboration. Section 5 focuses on data collection strategies.
It describes the “S-tag”, a novel low-cost device enabling the integration of sensor networks with RFID systems. A home-health scenario is adopted throughout the entire paper as a specific example of application and practical result.
2 System Architecture
The architecture of our system (Fig. 1) follows a widely accepted abstraction [7], according to which context-aware systems are organized into three layers: context sources, the context management middleware, and the context consumer level. Context sources include entities providing raw context data. They are conceptually partitioned into two groups: physical and virtual sources [8]. Physical sources include all hardware devices able to sense context data, such as RFID, sensors, positioning systems, etc. Virtual sources include software services able to gather context data, such as GUIs for user preference input, databases, etc. Such data must be elaborated in an “intelligent” manner, so that the overall system reacts properly to context changes. This requires the availability of a machine-interpretable representation of context and of software components (agents) able to suitably process such knowledge. Both of them are conceptually situated at the intermediate level of the system architecture, the so-called middleware layer, as they provide the core building blocks of a context-aware application. Agents interoperate with one another thanks to the availability of a unified model of reality. Their behaviour is strongly influenced by
data provided by context sources and substantially determines the activities to be performed at the highest layer. Indeed, the context consumer layer includes all the entities, such as mobiles, Web interfaces, laptops, which interact with final users in response to meaningful context changes, thus determining the behaviour of the context-aware application as a whole.
Fig. 1. The three-layer system architecture. The lowest layer includes physical and virtual sensors, i.e. devices and software entities forwarding context data. The middle level hosts middleware components, such as software development kits and APIs. The context model is the core of the system and belongs to the middle level as well. End user devices and applications are positioned at the highest level.
The prototypal implementation of the framework is shown in Fig. 2. As the figure shows, the distributed system is made up of a team of interoperating agents, which share a common knowledge representation and reason through an inference engine. The agents are dispersed over several nodes and play different roles. As explained in Section 4, Context Provider Agents (CPAs) wrap context sources and are in charge of converting raw context data into high-level context information. CPAs cooperate with Context Interpreter Agents (CIAs), which are responsible for managing high-level context information and for identifying the set of actions to be triggered as a consequence of an emergency. Finally, Context Consumer Agents (CCAs) forward the message/signal describing the alarm situation to the most suitable destination. Input data are provided by a physical context source, obtained by integrating sensors with an RFID device (see Section 5), and by a virtual context source containing static information. In the following sections, the system components and their way of operating are described with reference to a home-care scenario.
Fig. 2. System prototype. The system was implemented on three nodes (CN) connected by a local area network. It follows a multi-agent paradigm. Context Provider Agents (CPA) filter and integrate data coming from physical and virtual sensors, thus converting raw context data into high-level context. Context Interpreter Agents (CIA) process high-level context information to identify the actions and the actors best suited for managing an emergency. Context Consumer Agents (CCA) forward the alarm information to the final destination.
3 Context Representation
An application is context-aware if it is able to adapt its behaviour to context by suitably reacting to context variations. In other terms, a context-aware application must exhibit a sort of “situation awareness”, i.e. it must manage and interpret context and its real-time evolution. Moreover, a context-aware application is generally made up of several components, possibly deployed on dispersed nodes, which need to interact with one another. It is therefore fundamental for such components to share a common representation of context knowledge. These premises suggest a twofold approach to context modelling. First, the fundamental entities, objects and relations of a situation must be represented. This is needed in order to ground system knowledge on a common basis and to provide a unified concept taxonomy to the several components interacting in the application framework. Second, context changes originated by meaningful data variations must produce recognizable “high-level” events, so that situation switching can be interpreted and suitably managed by the system. In other terms, the system must be able to deduce high-level, implicit context from the low-level, explicit context directly acquired from sensors. This operation often requires reasoning over complex combinations of different data and context information. For example, an
increase of blood pressure (low-level context) exceeding the patient threshold (low-level context) may produce a switch to an “alarm” situation (high-level context), which in turn may produce a series of operations aimed at identifying and invoking the most suitable available and reachable doctor. Our system addresses the former need, i.e. the machine representation of context entities, by the use of ontologies, which make it possible to describe the meaningful events, objects and relations of a context and support several fundamental forms of reasoning, such as concept satisfiability, class subsumption, consistency and instance checking. The latter need, situation-switching management, is approached by implementing logical rules and by embedding context events into facts. In this way, an iterative process is activated every time rules are matched. Indeed, fired rules infer new facts, i.e. new scenario switches, which in turn may fire other rules. This process allows the system to convert low-level context into high-level context, with the final result of producing the system reaction to the context switch which originated the process. For example, the occurrence of low-level events sensed by physical and virtual sensors may originate the facts “diastolic blood pressure of patient X = 100” and “threshold of patient X = 90”. These facts may fire a rule inferring the high-level fact “alarm for patient X = true”. The alarm situation, combined with facts concerning the availability of doctors, may produce the fact “doctor Y has to be called”, which generates the system response to the detected patient anomaly. As explained in the following subsections, the ontologies were implemented in the OWL language, whilst the rule-based domain knowledge was implemented with Jess on top of the OWL ontologies.
3.1 Context Ontology
Several context modelling techniques exist [9], such as key-value, mark-up scheme, graphical, object-oriented, and ontology-based modelling.
According to [9], the ontology-based approach fits well with the requirements of ubiquitous/context-aware computing. Indeed, ontologies facilitate knowledge sharing and reuse. Knowledge sharing is enabled by providing a common knowledge model to computational entities, such as agents and services, which need to interoperate with one another. Knowledge reuse is promoted by the amenability of ontologies to be extended to different domains and to be integrated within wider ontology-based frameworks. Many ontology languages exist, including Resource Description Framework Schema (RDFS) [10], DAML+OIL [11], and OWL [12]. OWL is a key to the Semantic Web and was proposed by the Web Ontology Working Group of the W3C; it was therefore chosen for our prototype. A common practice when developing ontologies is to adopt a top-level (upper) shared conceptualization [13] on top of which domain ontologies are built. Top-level ontologies codify general terms which are independent of a particular problem or domain. Once the top-level ontology is available, several lower-level ontologies can be introduced, with the aim of incrementally specializing concepts from the high-level, generic point of view of the upper ontology down to the low-level, practical point of view of the application. This way of structuring knowledge promotes the sharing and reuse of ontologies in different application domains.
Fig. 3. Some classes from the context domain ontology referring to the “patient environment”
Therefore, we divided the context ontology into two levels: the top-level context ontology and the domain context ontology. The top-level context ontology includes concepts which refer to context-aware computing, independently of the application domain. The domain context ontology refers explicitly to the health-care domain. The vocabulary related to context entities was defined starting from the widely accepted definition of context provided in [14]: “Context is any information that can be used to characterize the situation of an entity.” An entity is thus a person, place, computational entity, or object which is considered relevant for determining the behaviour of an application. As shown in Fig. 3 and Fig. 4, “Person”, “Device” and “TriggeredAction” are contextual entities which specialize into different concepts depending on the context subdomain they belong to. For instance, a person can be a patient, a relative of the patient or a health operator. Devices can be of different types as well. For the sake of brevity, the figures show just small ontology fragments referring to the example used throughout the remaining part of the paper.
3.2 Context Rules
The proposed framework utilizes Jess [15] to structure knowledge in the form of declarative rules. Jess is a widely adopted rule engine and scripting environment written in Java. It adopts the Rete algorithm [16] to implement rule matching efficiently. Jess rules are used to convert low-level information, given in a raw form by sensors, into high-level context. This is conceptually performed in an incremental
Fig. 4. Some classes from the context domain ontology referring to the “application consumer environment”
fashion. When the system starts to work, the sensor network or other devices get data from the physical world. Depending on the incoming events captured by sensors and on the context, the facts in the Jess working memory are updated. A set of first-order rules determines whether an alarm has to be triggered and which alarm level should be activated, according to the measurement values and the corresponding thresholds. The Jess pattern matcher then automatically searches through the available combinations of facts to figure out which rules should be fired. Such rules, when matched, infer new facts which express the context switch to a “situation of alarm”. In other terms, the system acquires a sort of anomaly awareness, i.e. the raw data interpretation infers facts which express the occurrence of an anomaly. For instance, the following example shows a rule activating an alarm when the systolic blood pressure (“sbp-c” variable) exceeds the patient threshold (“sbp-nmax” variable). When the rule is fired, the fact “status abnormal is true” (“sbp-s” variable) is inferred and the action “notify the abnormal event” is activated (“sbp-anomaly-notification” action):

(defrule verify-SystolicBloodPressure
  (measurement (hasPID ?m-pid)
               (hasMeasurementDate ?d)
               (hasMeasurementTime ?t))
  (patient (hasPatientID ?pid)
           (hasCurrentSBPValue ?sbp-c)
           (hasNormalMaxSBPValue ?sbp-nmax)
           (SBPStatusAbnormal ?sbp-s))
  (test (> ?sbp-c ?sbp-nmax))
  =>
  (bind ?sbp-s true)
  (sbp-anomaly-notification))

The “anomaly awareness” facts may fire other rules, which may in turn infer other facts. This determines the switch to a higher-level context. In other terms, the context switches to a new situation, which we may call “procedure awareness”, in which the activities to be performed in order to manage the alarm situation are known. The following example shows a rule fired as a consequence of an abnormal status due to both systolic and diastolic blood pressure. The rule infers the fact “alarm is true” and triggers the action “find-available-physician”:

(defrule set-alert-level
  (patient (hasPatientID ?pid)
           (SBPStatusAbnormal ?sbp-s)
           (DBPStatusAbnormal ?dbp-s)
           (HighAlertLevel ?hal))
  (test (eq ?sbp-s ?dbp-s true))
  =>
  (bind ?hal true)
  (find-available-physician))

Once the procedures needed to manage the anomaly have been identified, the context consumers come into action by performing suitable “anomaly management” actions. As detailed in the following section, this kind of reasoning is carried out with the support of dedicated agents.
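The incremental chaining of these rules can be restated in Python to show how first-level “anomaly awareness” facts feed the second-level “procedure awareness” rule (a simulation only — the framework itself runs these rules in Jess, and the measurement values below are invented for the example):

```python
# Simulated patient fact, mirroring the slots of the Jess patient template.
# Values are illustrative, not taken from the paper.
patient = {
    "hasCurrentSBPValue": 150, "hasNormalMaxSBPValue": 140,
    "hasCurrentDBPValue": 100, "hasNormalMaxDBPValue": 90,
    "SBPStatusAbnormal": False, "DBPStatusAbnormal": False,
    "HighAlertLevel": False,
}
actions = []  # stands in for the notification actions triggered by rules

def verify_blood_pressure(p):
    """First-level rules: raw measurements -> 'anomaly awareness' facts."""
    if p["hasCurrentSBPValue"] > p["hasNormalMaxSBPValue"]:
        p["SBPStatusAbnormal"] = True
        actions.append("sbp-anomaly-notification")
    if p["hasCurrentDBPValue"] > p["hasNormalMaxDBPValue"]:
        p["DBPStatusAbnormal"] = True
        actions.append("dbp-anomaly-notification")

def set_alert_level(p):
    """Second-level rule: anomaly facts -> 'procedure awareness'."""
    if p["SBPStatusAbnormal"] and p["DBPStatusAbnormal"]:
        p["HighAlertLevel"] = True
        actions.append("find-available-physician")

verify_blood_pressure(patient)  # fires on both measurements
set_alert_level(patient)        # chains on the newly inferred facts
print(actions[-1])              # find-available-physician
```

In Jess the chaining happens automatically through the Rete pattern matcher; here the explicit call order makes the two inference levels visible.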
4 Agent-Based Reasoning
All the agents are implemented by using the Java Agent Development Environment (JADE). JADE [17] is a software framework to develop and run agent applications in compliance with the FIPA specifications [18] for interoperable intelligent multi-agent systems. Inter-agent communication is based on the FIPA ACL, which specifies a standard message language by setting out the encoding, semantics and pragmatics of the messages. As shown in Fig. 5, the semantics of agent messages and reasoning is built over OWL concepts and predicates, which are matched with the Jess and JADE vocabulary. Figure 6 shows the proposed multi-agent framework, which assigns three fundamental roles to agents:
Context provider agents (CPA). These agents wrap context sources to capture raw context data and instantiate the ontology representation. CPAs may encapsulate single
A Framework for Context-Aware Home-Health Monitoring
127
OWL:
  SystolicBloodPressure (ontology class)
JADE (Java class):
  public class SystolicBloodPressure extends BloodPressure {
    private int hasNormalMaxSBPValue;
    public void setHasNormalMaxSBPValue(int value) {
      this.hasNormalMaxSBPValue = value;
    }
  }
JESS (template):
  (deftemplate MAIN::SystolicBloodPressure
    (declare (from-class SystolicBloodPressure)))
Fig. 5. The system implementation is based on the matching of the OWL vocabulary with the agents' inner context representation (in the form of Java classes) and the Jess fact codification
sensors or multiple sources. In the former case ("single domain CPAs") they are mainly responsible for gathering and filtering data from sensor devices. In the latter case [19], they interact with single domain CPAs in order to aggregate context information from various context sources (for instance, sensed data must be aggregated with patient thresholds). Both kinds of CPAs are also responsible for making low-level context inferences and putting relevant context information into the rule engine as facts.
Context interpreter agents (CIA). Context interpreter agents are responsible for observing context changes sensed by CPAs and, as a consequence of these changes, for identifying the set of actions that should be performed by context consumer agents.
Context consumer agents (CCA). Context consumer agents are responsible for performing the actions triggered by CIAs. Actions provide the application reaction to context changes and may assume diverse forms, such as the generation of a signal, the delivery of a notification or a web service request.
5 Context Sources
As previously stated, the raw input data of the proposed architecture are represented by the set of values, usually physical parameters, collected by the so-called physical sources. Nevertheless, in order to be effectively and conveniently integrated in the scheme proposed in Fig. 2, the capability to measure a physical value (such as temperature, blood pressure or oxygen saturation) is only one of the several
128
A. Esposito et al.
Fig. 6. The multi-agent framework. Context Provider Agents (CPA) are responsible for inferring a potential “anomaly awareness” situation from data provided by context sensors. Context Interpreter Agents (CIA) process high level knowledge provided by CPAs to acquire a sort of “procedure awareness”. Context Consumer Agents (CCA) forward signals and messages to the suited destination as requested by CIAs (anomaly management). The three categories of agents embed a Jess rule engine, update the working memory of Jess and interoperate by using ACL messages.
requirements a physical source should satisfy. The measured data, for instance, should be sent to a data collector by using a wireless connection, and the choice of the most adequate one is not univocal: Wi-Fi, Bluetooth, GPRS, UMTS and GSM are only a few of the many possible candidates. In order to allow the indispensable capillary diffusion of physical sources, though, the benefit/cost ratio cannot be left out of consideration, thus imposing the choice of a cost-saving wireless technology. Moreover, the capability to be easily interfaced with the Internet is another added value. On such a basis, the integration of sensor networks with RFID systems appears to be the most practicable way. RFID technology, in fact, is quite inexpensive (passive RFID tags are as cheap as a few euro-cents) and naturally compatible with the Internet [20]. Moreover, as demonstrated in the following, it can be slightly modified in order to transmit sensor-like data. Indeed, the scheme proposed in Fig. 7 represents the designed and realized (patent pending) general-purpose Sensor-Tag (S-Tag) connected to a generic sensor. The architecture of the S-Tag does not substantially differ from that of standard RFID systems, thus allowing us to maintain compatibility between this system and devices already available and internationally standardized. The working principle is as
[Fig. 7 block diagram: a sensor with digital output feeds, through a digital in/out switch, a multi-ID chip holding several identity codes (ID1-ID4), connected to the sensor-tag antenna]
Fig. 7. A simplified scheme of the RFID sensor tag (S-tag)
easy as effective: the data measured by a generic sensor are used as input to the S-Tag; when the tag is in the region covered by the RFID reader, it sends back a signal containing a different combination of several identity codes (IDs) depending on the value of the input itself, thus enabling the transmission of sensor data. More specifically, the internal microwave circuit of the S-Tag samples the value at its input (which has been measured by the sensor) and quantizes it by using a number of bits equal to the number of available different IDs. For each bit with value equal to 1, the integrated micro-switch selects the corresponding ID to be transmitted; the combination of bits can hence be received by a standard RFID reader and easily decoded in order to rebuild the sensor-measured waveform. A more exhaustive discussion of the implementation issues of the S-Tag is beyond the scope of this paper; we only observe that, as apparent from the figure, the sensor is an external unit. In this way, generic sensors, with the only requirement of a digital output, can be used. Such sensors are not integrated into the S-Tag, so they do not influence the tag cost. Moreover, thanks to an accurate electromagnetic design of the tag antenna and of the microwave circuit (microcontroller, RF switch and so on), the implemented technological innovation is also reasonably inexpensive.
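A hedged sketch of the S-Tag working principle as described in the text: the sampled sensor value is quantized with as many bits as there are IDs, and each 1-bit selects the corresponding ID for transmission. The four-ID configuration, the ID names and the full-scale value are illustrative, not the patented design.

```python
NUM_IDS = 4
IDS = ["ID1", "ID2", "ID3", "ID4"]  # one identity code per quantization bit

def encode(sample, full_scale):
    """Tag side: quantize a sensor sample, return the IDs to transmit."""
    level = round(sample / full_scale * (2 ** NUM_IDS - 1))
    return [IDS[i] for i in range(NUM_IDS) if (level >> i) & 1]

def decode(received_ids, full_scale):
    """Reader side: rebuild the quantized value from the received IDs."""
    level = sum(1 << IDS.index(tag) for tag in received_ids)
    return level / (2 ** NUM_IDS - 1) * full_scale

# Hypothetical reading, e.g. a temperature on a 0-50 scale
sent = encode(36.6, full_scale=50.0)
print(sent, round(decode(sent, 50.0), 1))
```

With only four IDs the reconstruction error is coarse (16 levels); repeating the sampling over time yields the "sensor-measured waveform" mentioned in the text.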
6 Conclusions
In this paper we presented a framework for context-aware computing and its prototype implementation in a home-care scenario. The framework is based on a context codification obtained by integrating an ontology model and a rule-based representation. The main strengths of the proposed system are its multi-agent
behaviour, as well as the harmonization of heterogeneous technologies, such as agents, ontologies and rule-based inference engines, combined with the low-cost and flexible properties of RFID systems. The overall system is now available and has been tested in real-life home-health monitoring situations.
References
1. Weiser, M.: The computer for the 21st century. Scientific American, 94–104 (1991)
2. Baldauf, M., Dustdar, S., Rosenberg, F.: A survey on context-aware systems. Int. J. Ad Hoc and Ubiquitous Computing 2(4), 263–277 (2007)
3. Paganelli, F., Giuli, D.: An ontology-based context model for home health monitoring and alerting in chronic patient care networks. In: 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW 2007) (2007)
4. Ko, E.J., Lee, H.J., Lee, J.W.: Ontology-based context modeling and reasoning for U-HealthCare. IEICE Trans. Inf. & Syst. E90-D(8), 1262–1270 (2007)
5. Chen, H., Finin, T., Joshi, A.: An ontology for context-aware pervasive computing environments. Special Issue on Ontologies for Distributed Systems, Knowledge Engineering Review (2003)
6. Gu, T., Pung, H.K., Zhang, D.Q.: A service-oriented middleware for building context-aware services. Journal of Network and Computer Applications 28, 1–18 (2005)
7. Indulska, J., Sutton, P.: Location management in pervasive systems. In: CRPITS 2003, Proceedings of the Australasian Information Security Workshop, pp. 143–151 (2003)
8. Strang, T., Popien, C.L.: A context modeling survey. In: Workshop on Advanced Context Modeling, Reasoning and Management as part of UbiComp 2004, The 6th International Conference on Ubiquitous Computing, pp. 33–40 (2004)
9. W3C: RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation (February 10, 2004), http://www.w3.org/TR/rdf-schema/
10. DAML site, http://www.daml.org
11. W3C: OWL Web Ontology Language Overview. W3C Recommendation (February 10, 2004), http://www.w3.org/TR/2004/REC-owl-features-20040210/
12. Guarino, N.: Formal ontology and information systems. In: Guarino, N. (ed.) Proceedings of the 1st International Conference on Formal Ontologies in Information Systems (FOIS 1998), Trento, Italy, pp. 3–15. IOS Press, Amsterdam (1998)
13. Chen, H., Finin, T., Joshi, A.: An ontology for context-aware pervasive computing environments. The Knowledge Engineering Review 18, 197–207. Cambridge University Press, Cambridge (2003)
14. Dey, A.K., Abowd, G.D.: Towards a better understanding of context and context-awareness. GVU Technical Report GIT-GVU-99-22 (1999)
15. Friedman-Hill, E.: Jess in Action: Rule-Based Systems in Java. Manning Publications (2003)
16. Jess documentation, The Rete Algorithm, http://herzberg.ca.sandia.gov/jess/docs/52/rete.html
17. JADE, http://jade.cselt.it
18. FIPA, http://fipa.org/repository/index.html
19. Dockhorn Costa, P., Ferreira Pires, L., van Sinderen, M.: Architectural patterns for context-aware services platforms. In: 2nd International Workshop on Ubiquitous Computing (IWUC 2005), in conjunction with ICEIS 2005, Miami, USA (2005)
20. Finkenzeller, K.: RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification. John Wiley and Sons Ltd, Chichester (2003)
Semantic Learning Space: An Infrastructure for Context-Aware Ubiquitous Learning
Zhiwen Yu¹, Xingshe Zhou², and Yuichi Nakamura¹
¹ Academic Center for Computing and Media Studies, Kyoto University, Japan
[email protected], [email protected]
² School of Computer Science, Northwestern Polytechnical University, P.R. China
[email protected]
Abstract. In order to facilitate the development and proliferation of context-aware ubiquitous learning services, there is a need for architectural support for user context processing and learning content management. In this paper, we propose a context-aware ubiquitous learning infrastructure called the Semantic Learning Space. It leverages the Semantic Web technologies to support explicit knowledge representation, flexible context reasoning, interoperable content integration, expressive knowledge query, and adaptive content recommendation. The architectural design and enabling technologies are described in detail. Several applications and experiments are presented to illustrate and evaluate the key features of the infrastructure.
1 Introduction
The emergence of e-learning allows learners to access electronic course contents easily and conveniently through the Web. With the vision of ubiquitous computing coming to reality, people will be living in an environment surrounded ubiquitously by many networked computers (e.g. PCs, TVs) and mobile devices such as PDAs, cellular phones, etc. Learners can therefore access their desired learning content anytime, anywhere with the accessible devices. These two trends have precipitated the advent of ubiquitous learning. One of the most important features of ubiquitous learning is adaptability – learners can get the right information at the right place in the right way [1]. To achieve learning adaptability, the provisioning of content needs to take into account the learner’s changing context (e.g., prior knowledge, learning style, current activities and goals), which we call context-aware learning. In a smart learning environment, a context-aware learning service can, for example, analyze the user’s knowledge gap by comparing his current competencies and tasks, and then suggest suitable learning content for him [2]. While many e-learning systems have been developed and used in recent years, building context-aware learning systems in ubiquitous environments is still complex and time-consuming due to inadequate infrastructure support, e.g., for context processing, content provisioning, and content selection.
We propose a context-aware ubiquitous learning infrastructure called Semantic Learning Space that leverages the Semantic Web technologies to support explicit
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 131–142, 2008. © Springer-Verlag Berlin Heidelberg 2008
knowledge representation, flexible context reasoning, interoperable content integration, expressive knowledge query, and adaptive content recommendation. The Semantic Web enables computers and people to work in cooperation by representing machine-interpretable information and automating some processes. It satisfies the user-centric, personalized and active learning requirements of context-aware learning. The Semantic Learning Space exploits the Semantic Web standards and tools to provide architectural support for facilitating the development and proliferation of context-aware ubiquitous learning services. The main characteristics of the infrastructure include: (1) systematically handling context management, i.e., aggregation, inference, and query of context; (2) interoperably integrating content from diverse sources; (3) adaptively recommending learning content according to various kinds of context. The rest of this paper is organized as follows. Section 2 discusses previous work relevant to this paper. Section 3 presents the ontology model to express knowledge about the learner, content, and the domain being learned. Section 4 describes the Semantic Learning Space infrastructure and its components in detail. The prototype implementation and evaluation are presented in Section 5. Finally, Section 6 concludes the paper.
2 Related Work
There has been much work done in the area of context-based learning in the past few years. The LIP project [2] aims to provide immediate learning on demand for knowledge-intensive organizations by incorporating context into the design of e-learning systems. iWeaver [3] is an e-learning system which offers learners different media experiences based on their different learning styles. Sakamura and Koshizuka [4] propose ubiquitous learning that enables people to learn about anything at anytime and anywhere by deploying RFIDs onto a variety of objects, such as foods, medicines, vegetables, and resorts. JAPELAS [5] is a context-aware language-learning support system for learning Japanese polite expressions. It provides the learner with appropriate polite expressions derived from the learner’s situation and personal information. Paraskakis [6] proposes a paradigm of ambient learning aiming at providing access to high-quality e-learning material at a time, place, pace and context that best suit the individual learner. To achieve this aim, it includes three main functions: multimodal broadband access, content management and integration, and context management. KInCA [7] is an agent-based system supporting personalized, active and socially aware e-learning. The personal agent is aware of the user’s characteristics and cooperates with a set of expert cognitive agents.
Recently, some efforts have been put into applying the Semantic Web technologies to learning. Stojanovic et al. [8] present an e-learning architecture with ontology-based description of the learning materials. Tane et al. [9] propose an ontology tool (displaying, interaction, and evolution of ontologies) for semantic resource management in e-learning. The Elena project [10] works towards a smart space for learning, based on peer-to-peer and Semantic Web technologies. It adopts Semantic Web ontology languages to describe educational services.
Although much work has been done to provide efficient and smart learning services by taking into account the user’s context as well as utilizing the Semantic Web
technologies, our work differs from and perhaps outperforms previous work in several aspects. While existing work focuses either on context-based learning or on Semantic Web-based learning, we integrate the two, i.e., we apply the Semantic Web technologies to accomplish context-aware learning. Although Huang et al. [11] propose to exploit Semantic Web technologies to represent context in e-learning, which is similar to our approach, they do not describe how to use context to accomplish content adaptation. Second, our work enhances the development of context-aware learning services by systematically handling context management (e.g., context aggregation, inference, storage and query), as opposed to the ad hoc manner of existing systems. Third, we propose an adaptive recommendation approach to realize context-awareness in learning content provisioning, while existing systems simply use matching or rule-based selection.
3 Ontology Model
We use ontologies to model context about the learner, knowledge about the content, and the domain knowledge (the taxonomy of the domain being learned). Within the domain of knowledge representation, the term ontology refers to the formal and explicit description of domain concepts, which are often conceived as a set of entities, relations, instances, functions, and axioms [12]. By allowing learners or contents to share a common understanding of knowledge structure, the ontologies enable applications to interpret learner context and content features based on their semantics. Furthermore, the ontologies’ hierarchical structure lets developers reuse domain ontologies (e.g., of computer science, mathematics, etc.) in describing learning fields and build a practical model without starting from scratch. Upper-level ontologies are designed to provide a set of basic concepts that allows the definition of new concepts as subclasses complementing the upper-level classes. In our system, we have designed three ontologies: the Context Ontology, the Learning Content Ontology, and the Domain Ontology.
The Context Ontology shown in Fig. 1a depicts context about a learner, e.g., content already mastered, learning goal, available learning time, location, learning style, and learning interests. It also describes the characteristics of the terminal that the learner uses (e.g., display features and network connection). The learning goal may be an abstract subject or a particular content item. lco and do stand for Learning Content Ontology and Domain Ontology, respectively.
Properties of contents as well as relationships between them are defined by the Learning Content Ontology (see Fig. 1b). The relation hasPrerequisite describes content dependency information, i.e., content that must be taken before the target content. In fact, most university departments nowadays provide a course dependency chart when issuing their courses.
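The hasPrerequisite relation naturally drives learning-path generation: a topological sort over the course dependency chart yields an order in which no course precedes its prerequisites. The course names below are invented for illustration; the paper's recommender (Section 4.6) is more elaborate than this sketch.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency chart: course -> list of prerequisite courses
has_prerequisite = {
    "MachineLearning": ["LinearAlgebra", "Probability"],
    "Probability": ["Calculus"],
    "LinearAlgebra": [],
    "Calculus": [],
}

# static_order() emits every prerequisite before the courses that need it
path = list(TopologicalSorter(has_prerequisite).static_order())
print(path)
```

A cycle in the chart (a course that is transitively its own prerequisite) would raise `graphlib.CycleError`, which is also a useful consistency check on the ontology instances.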
The Domain Ontology is proposed to integrate existing consensus domain ontologies such as those of computer science, mathematics, chemistry, etc. The domain ontologies are organized as a hierarchy to demonstrate the topic classification. For instance, the hierarchical ontology of the computer science domain is presented in Fig. 1c. It derives from the well-known ACM taxonomy (http://www.acm.org/class/1998/).
We adopt OWL (Web Ontology Language) [13] to express the ontologies, enabling expressive knowledge description and data interoperability. The description basically includes ontology class definitions and ontology instance markups.
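The OWL class and instance mark-ups can be pictured as subject-predicate-object triples. The sketch below stores a few hypothetical instances (the course and learner names are invented) and answers a pattern query by simple matching; a real system would use a triple store and a query engine rather than this toy matcher.

```python
# A tiny in-memory triple store with hypothetical instances; prefixes co/lco/do
# mirror the paper's Context, Learning Content and Domain ontologies.
triples = [
    ("Course:DataMining", "rdf:type", "lco:LearningContent"),
    ("Course:DataMining", "lco:hasPrerequisite", "Course:Databases"),
    ("Course:DataMining", "do:topic", "do:ArtificialIntelligence"),
    ("Learner:alice", "co:hasGoal", "Course:DataMining"),
]

def query(pattern):
    """Match an (s, p, o) pattern against the store; None acts as a variable."""
    return [t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))]

# Which content does the learner's goal depend on?
goal = query(("Learner:alice", "co:hasGoal", None))[0][2]
prereqs = [o for _, _, o in query((goal, "lco:hasPrerequisite", None))]
print(prereqs)
```

Chaining two pattern queries like this is the essence of what the knowledge query engine of Section 4.5 does over the Context, Content and Domain KBs.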
Fig. 1. Ontology design: (a) context ontology, (b) learning content ontology, (c) computer science domain ontology
4 Semantic Learning Space Infrastructure
The Semantic Learning Space infrastructure consists of seven collaborating components (see Fig. 2): context aggregator, context reasoner, learning content integration, ontology mapping, knowledge query engine, learning content recommender, and content deliverer. The context aggregator is responsible for aggregating context data from physical sensors as well as virtual sensors, transforming the raw context data into context mark-ups, and asserting them into the context knowledge base. The context reasoner infers high-level context information, e.g., user behaviour, from basic sensed contexts, and checks for knowledge consistency in the context KB. Learning content integration is responsible for integrating multiple heterogeneous content repositories. Ontology mapping transforms different kinds of content description metadata into generic ontology mark-ups. The knowledge query engine handles persistent queries and allows the learning content recommender to extract the desired content descriptions, context information, and domain knowledge from the knowledge bases. The learning
content recommender selects learning content and determines its presentation form according to the learner’s context. Finally, the content deliverer retrieves and delivers the selected learning content to users.
Fig. 2. Semantic Learning Space architecture
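The data flow through the architecture can be sketched as a chain of functions, each standing in for one component. Every stage here is a deliberately trivial stub with invented sensor events and repository items; the point is only the pipeline shape, not the components' real logic.

```python
def context_aggregator(sensor_events):
    """Merge physical/virtual sensor readings into one context record."""
    return {"location": sensor_events["rfid"], "action": sensor_events["video"]}

def context_reasoner(context):
    """Infer a high-level context fact from basic sensed context."""
    if context["location"] == "StudyRoom" and context["action"] == "Sitting":
        context["behavior"] = "Studying"
    return context

def content_recommender(context, repository):
    """Stub for semantic matching: pick content fitting the behaviour."""
    return [c for c in repository if c["for"] == context.get("behavior")]

def content_deliverer(items):
    """Stub delivery: return the titles that would be streamed/downloaded."""
    return [c["title"] for c in items]

repository = [{"title": "Family Album USA", "for": "Studying"},
              {"title": "Cooking Basics", "for": "Relaxing"}]
events = {"rfid": "StudyRoom", "video": "Sitting"}
delivered = content_deliverer(
    content_recommender(context_reasoner(context_aggregator(events)), repository))
print(delivered)
```

In the real infrastructure the stages do not call each other directly; they communicate through the three knowledge bases and the query engine shown in Fig. 2.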
4.1 Context Aggregator
Nowadays many kinds of sensors are deployed in smart homes and smart classrooms. Such sensors can be utilized as sources of context about learning. It is necessary to combine different information from video sensors, RFID sensors, and other types of sensors to analyze complicated context, e.g., learning behavior. The context aggregator therefore aggregates context information from an array of diverse information sources, i.e., physical sensors and virtual sensors. Physical sensors like cameras, microphones, RFID sensors and pressure sensors can detect the user’s basic context. Virtual sensors, like the Outlook calendar service and the organization workflow, on the other hand, can extract the user’s semantic context, such as the learning schedule, tasks, etc. Context aggregation helps to merge the required information related to a particular learner. The aggregator then asserts the information into the context knowledge base for further reasoning.
4.2 Context Reasoner
Higher-level contexts (What is the user doing? What is the activity in the room?) augment context-aware learning by providing summary descriptions of a user’s state and surroundings. The context reasoner infers abstract, high-level contexts from basic sensed contexts, resolves context conflicts, and maintains knowledge base consistency. It includes both certain and uncertain reasoning. For certain reasoning, we can apply OWL ontology reasoning using DL (Description Logic), and user-defined reasoning based on first-order logic. To perform user-defined context inference, an application developer needs to provide horn-logic rules for a particular application based on its needs. To enable context reasoning to handle uncertainty, we can use mechanisms such as Bayesian networks, fuzzy logic, and probabilistic logic. Our current system applies the Jena2 generic rule engine [14] to support user-defined reasoning over the OWL-represented context. The context reasoner is responsible for interpreting rules, connecting to the context KB, and evaluating rules against the stored context. We have specified a rule set based on the forward-chaining rule engine to infer high-level learning contexts, e.g., a user’s learning behavior, from basic sensed context. For instance, the following rule examines whether a given person is currently engaged in English studying on the basis of his location, action, and the book on the desk — if he is in the study room, he is sitting, and the book on the desk is an English book, he is likely to be studying English. The user’s location is tracked by RFID. The user’s actions, namely standing, sitting, and walking, are recognized by video. The English book is detected by an RFID sensor to be on the desk.
type(?user, Person), locatedIn(?user, StudyRoom),
action(?user, Sitting), on(EnglishBook, Desk)
=> involvedBehavior(?user, StudyingEnglish)

4.3 Learning Content Integration
Learning content integration is responsible for integrating multiple heterogeneous content repositories. Our current system applies the Sakai 2.1 tools [15] to integrate varied learning content repositories from a wide range of content providers. It makes it possible to manage integration without the complexity inherent in supporting heterogeneous means of communication and data exchange, and therefore to gain access to content in a manner that hides the technical details by which that content is provided. To facilitate flexible integration, each repository needs to be wrapped with a plug-in developed with the O.K.I. (Open Knowledge Initiative) Repository OSID (Open Service Interface Definition) [16]. It defines contracts between service consumers and providers. The OSIDs for e-learning services include repository, assessment, grading, and course management. The Repository OSID also defines methods for managing object lifecycle, data maintenance, and searching. Sakai 2.1 offers the Repository OSID as a repository service. The repository service works with many plug-ins. It finds out what plug-ins are available, loads the plug-ins, and federates requests and responses among them.
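The federation idea behind the repository service can be sketched as a service that fans a search out to every registered plug-in and merges the responses, hiding each repository's access details from the consumer. The plug-in classes and their holdings below are hypothetical, not the O.K.I. Repository OSID interface itself.

```python
class RepositoryPlugin:
    """A wrapped content repository exposing a uniform search method."""
    def __init__(self, name, holdings):
        self.name, self.holdings = name, holdings

    def search(self, keyword):
        return [h for h in self.holdings if keyword.lower() in h.lower()]

class RepositoryService:
    """Loads plug-ins and federates requests and responses among them."""
    def __init__(self):
        self.plugins = []

    def register(self, plugin):
        self.plugins.append(plugin)

    def search(self, keyword):
        results = []
        for p in self.plugins:               # fan out, then merge
            results.extend((p.name, hit) for hit in p.search(keyword))
        return results

service = RepositoryService()
service.register(RepositoryPlugin("CourseRepo", ["Calculus I", "English Grammar"]))
service.register(RepositoryPlugin("MediaRepo", ["English Listening (video)"]))
print(service.search("english"))
```

The consumer never learns which repository answered over which protocol; that opacity is exactly what the text attributes to the Sakai repository service.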
4.4 Ontology Mapping
Different organizations have developed description languages for learning objects which combine terms from multiple vocabularies (e.g., LOM, Dublin Core, and SCORM). To facilitate the semantic interoperability of learning content, we should provide a universal query interface. The ontology mapping mechanism transforms different kinds of content description metadata into generic ontology mark-ups. It enables universal queries across heterogeneous content repositories that are described using different metadata vocabularies. The ontology mapping involves two ontologies: the core ontology and the mapping ontology. The core ontology of learning content is defined in Section 3. It provides a common understanding of the basic entities and relationships of learning content. We adopt the W3C SKOS Mapping [17] to describe the mapping ontology. It contains a set of properties for specifying mapping relations between concepts from different organizations’ ontologies (e.g., exactMatch, broadMatch, narrowMatch, majorMatch, minorMatch). Our current system maps LOM, SCORM, ARIADNE, IMS, OKI, and Dublin Core metadata. The mapping is done semi-automatically. We first manually specify the mapping relations used for mapping concepts in a particular learning content ontology to the core ontology. After this, the term mapping in content metadata querying is done automatically according to the manual setting.

4.5 Knowledge Query Engine
There are mainly three knowledge bases in our system: the Context Knowledge Base (Context KB), the Content Knowledge Base (Content KB) and the Domain Knowledge Base (Domain KB). The knowledge bases provide persistent knowledge storage and event-based retrieval by the knowledge query engine via proper interfaces. The Context KB also allows some manipulations, e.g., context reasoning. The Domain KB integrates existing consensus domain ontologies such as the ontologies of computer science, mathematics, chemistry, etc.
The domain ontologies are organized as a hierarchy to demonstrate the topic classification. Since the three kinds of knowledge are represented with ontologies and described with OWL, we can inherently provide a universal query interface. The knowledge query engine handles expressive queries triggered by the learning content recommender. It provides an abstract interface for applications to extract the desired knowledge from the Context KB, Content KB and Domain KB. In our current system, we adopt RDQL (RDF Data Query Language) [18] to query knowledge about content, context, and domain. It is widely used for querying RDF- or OWL-represented ontology-based information.

4.6 Learning Content Recommender
The content recommender provides the right content, in the right form, to the right person based on the learner’s context. It addresses two problems in providing learning content: (a) which content should be provided, and (b) in which form should the selected content be presented.
For the purpose of efficient context processing in content recommendation, we classify context into two categories: personal context (e.g., prior knowledge and goals) and infrastructure context (the physical running infrastructure, e.g., terminal capability). The right content is determined by the personal context, while the right presentation form depends on the infrastructure context and the user’s QoS requirements. The content recommender first utilizes knowledge-based semantic recommendation [19] to determine the right content, i.e., what the user really wants and needs to learn. It computes the semantic similarity between the learner and the learning contents and generates a learning path for a particular target course. The content recommender also suggests content according to the user’s situation context. For example, knowing that the user has studied a subject (e.g., mathematics) for a long time and is somewhat tired of it, it will recommend another subject, e.g., English. Then, the content recommender determines the appropriate presentation form of the learning content according to the user’s QoS requirements and the device/network capability [20]. The recommendation process is conducted based on fuzzy theory. An adaptive QoS mapping strategy is employed to dynamically set the quality parameters of the learning content at run time according to the capabilities of the client devices.

4.7 Content Deliverer
The content deliverer is responsible for streaming or downloading learning content to various devices through different networks. If the modality of the recommended content is video, audio or Flash, i.e., continuous media, the content deliverer streams the content to the terminal. On the other hand, if the modality is image or text, i.e., discrete media, the content deliverer merely downloads the content. The content deliverer should support a wide variety of video, audio and image formats, adapting to different terminal and network conditions.
In our system, the media delivery supports transferring content over networks such as the wired Internet and wireless IEEE 802 networks. MPEG-4 is employed as the video codec for media streaming.
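The deliverer's modality dispatch described above (continuous media are streamed, discrete media are downloaded) reduces to a small decision function. The modality labels follow the text; the transport calls are stubs standing in for a real streaming server or file download.

```python
# Modalities as classified in the text
CONTINUOUS = {"video", "audio", "flash"}
DISCRETE = {"image", "text"}

def deliver(content):
    """Choose the transfer mode for a content item by its modality."""
    modality = content["modality"]
    if modality in CONTINUOUS:
        return f"streaming {content['title']}"     # stub for a streaming session
    if modality in DISCRETE:
        return f"downloading {content['title']}"   # stub for a plain download
    raise ValueError(f"unknown modality: {modality}")

print(deliver({"title": "Family Album USA", "modality": "video"}))
print(deliver({"title": "Lesson notes", "modality": "text"}))
```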
5 Implementation and Evaluation
5.1 Implementation
We have developed a prototype of the Semantic Learning Space infrastructure. It consists of an embodiment of the architecture defined above and a set of APIs supporting the integration of sensors and content repositories and the development of context-aware applications. Several context-aware learning applications supported by the proposed infrastructure have been implemented to illustrate its key features. Content adaptation is conducted within the applications based on different kinds of contexts. One application recommends learning content according to the user’s learning goal and prior knowledge. The aim of developing this application is to test the infrastructure support for semantic recommendation of learning content. Fig. 3
Semantic Learning Space
139
Fig. 3. Main interface for semantic recommendation
shows the main interface for the semantic content recommendation. It mainly consists of four parts. The top part provides GUI for context capturing, i.e., learning goal and the courses already learned. When the learner wants to get content based on these context, the system queries context and makes recommendation. Then, the recommendation list is presented below. The learning path is generated and shown in the left bottom column, while the recommendation package for the content selected from the path is presented in the right bottom column. For learners, how to learn the materials in a more efficient way is very important. The second application developed aims to improve learning efficiency through giving learning advices in arranging learning subjects based on user’s behavior and task. For example, through inference on basic sensed information the context reasoner gets the high-level behavior context that the user has studied a subject (e.g., mathematics) for a long time and is somewhat tired for it. The system then suggests the learner to study another subject (e.g., English), that belongs to today’s learning task and has yet to be finished. The everyday learning task can be made up by the teacher or the parents. The advice message can be given in different ways, e.g., displaying a message on the computer screen or sending an email to parents’ mobile phone. Fig. 4 shows an advice message displayed on the computer. The learner’s current learning behavior, today’s learning task, and learning advice are presented.
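The advice rule of the second application can be pictured as follows. The threshold, function name and data shapes are assumptions for illustration; the paper does not specify them.

```python
# Illustrative sketch of the behavior-based advice rule: if the learner has
# studied one subject for too long, suggest an unfinished subject from
# today's task list instead. The 90-minute threshold is hypothetical.

LONG_STUDY_MINUTES = 90  # assumed threshold, not from the paper

def suggest_subject(current_subject, minutes_on_subject, todays_tasks, finished):
    """Return a subject to switch to, or None if no advice is needed."""
    if minutes_on_subject < LONG_STUDY_MINUTES:
        return None  # the learner has not studied long enough to be tired
    for subject in todays_tasks:
        if subject != current_subject and subject not in finished:
            return subject  # first unfinished subject from today's task
    return None

advice = suggest_subject("mathematics", 120, ["mathematics", "English"], set())
# advice == "English": mathematics was studied too long, English is unfinished
```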
Fig. 4. Learning advice
140
Z. Yu, X. Zhou, and Y. Nakamura
The third application is named ELAA (English Learning Anytime Anywhere) and illustrates the infrastructure's support for content adaptation according to infrastructure context such as device capability and network condition. The user can access English learning material on-line through different networks (wired and wireless) and display it on different devices (PC, laptop, PDA, and mobile phone). Suppose a learner is currently using a Sony VAIO handheld PC that accesses the popular English learning material Family Album USA over either wired Ethernet or a wireless network. As the network condition changes, the content presentation form varies accordingly. For example, on the low-bandwidth wireless network the video plays at a 240*176 frame size and a 5 fps frame rate (see Fig. 5a), while a high-quality video with a larger size (480*360) and a high frame rate (30 fps) is displayed when the user uses the wired network (see Fig. 5b).
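The adaptation between the two reported configurations can be pictured as a minimal mapping from available bandwidth to video parameters. Only the two profiles come from the text; the function and the bandwidth cut-off are assumptions for illustration.

```python
# Minimal stand-in for the adaptive QoS mapping in ELAA: pick frame size and
# frame rate from the available bandwidth. The 1000 kbps cut-off is assumed;
# the two profiles mirror the configurations reported in the paper.

def video_profile(bandwidth_kbps: int) -> dict:
    """Return hypothetical video quality parameters for the given bandwidth."""
    if bandwidth_kbps < 1000:  # low-bandwidth wireless link (assumed cut-off)
        return {"size": (240, 176), "fps": 5}
    return {"size": (480, 360), "fps": 30}  # wired network
```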
Fig. 5. System screenshots: (a) playing low quality video; (b) playing high quality video
5.2 Evaluation

We evaluated the Semantic Learning Space from a system perspective. Among the infrastructure functions, content recommendation and context reasoning are relatively time-consuming. We therefore mainly tested the overhead of these two components in terms of response time. The experiments were conducted on a Linux workstation with a 2.4 GHz Pentium 4 CPU and 512 MB RAM. We measured the content recommending time by varying the number of learning content metadata records from 50 to 250. The result is shown in Fig. 6a. We observed that the recommendation process is quite computationally intensive. In particular, as the amount of metadata (or contents) grows, the time spent increases proportionally to the size of the learning content database. For context reasoning, we used five context data sets of 1000, 2000, 3000, 4000, and 5000 RDF triples to test the inference engine's scalability. The rule set contains 20 user-defined rules. The results (Fig. 6b) show that the response time largely depends on the size of the context knowledge base. As the content database and context knowledge base become larger, the response time will lead to human-perceivable delays. However, our user test shows that most of the invited learners found the response time acceptable. We believe the infrastructure is generally feasible for non-time-critical learning applications.
Fig. 6. System performance: (a) content recommending time (ms) over the number of content metadata; (b) context reasoning time (ms) over the number of RDF triples
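The overhead measurements above follow a simple pattern that can be sketched generically: run the component on data sets of increasing size and record the wall-clock response time for each. This harness is illustrative, not the authors' test code, and the dummy component stands in for the recommender or reasoner.

```python
# Generic timing harness of the kind used for such overhead measurements
# (illustrative; function names are hypothetical).

import time

def measure_response_ms(component, data_sets):
    """Run component on each data set and return {size: response time in ms}."""
    results = {}
    for size, data in data_sets.items():
        start = time.perf_counter()
        component(data)
        results[size] = (time.perf_counter() - start) * 1000.0
    return results

# Example with a dummy component standing in for the context reasoner:
timings = measure_response_ms(lambda data: sorted(data), {1000: list(range(1000))})
```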
6 Conclusion and Future Work

In this paper, we propose an infrastructure for context-aware learning in ubiquitous computing environments. It supports explicit knowledge representation, flexible context reasoning, interoperable content integration, expressive knowledge query, and adaptive content recommendation. Three applications are presented to illustrate the key features of the infrastructure. In future work, we will continue to develop more context-aware ubiquitous learning applications using the infrastructure and improve its design. We also plan to measure users' satisfaction with the system by conducting a user study.
Acknowledgement

This work was partially supported by the Ministry of Education, Culture, Sports, Science and Technology, Japan under the project of "Cyber Infrastructure for the Information-explosion Era", the High-Tech Program of China (863) (No. 2006AA01Z198), and the Innovation Fund of Northwestern Polytechnical University of China (No. 2006CR13).
References

1. Chen, Y., et al.: A Mobile Scaffolding-Aid-Based Bird-Watching Learning System. In: The 1st IEEE International Workshop on Wireless and Mobile Technologies in Education (WMTE 2002), pp. 15–22 (2002)
2. Schmidt, A., Winterhalter, C.: User Context Aware Delivery of E-Learning Material: Approach and Architecture. Journal of Universal Computer Science 10(1), 28–36 (2004)
3. Wolf, C.: iWeaver: Towards 'Learning Style'-based e-Learning in Computer Science Education. In: The 5th Australasian Computing Education Conference (ACE 2003), Adelaide, Australia, pp. 273–279 (2003)
4. Sakamura, K., Koshizuka, N.: Ubiquitous Computing Technologies for Ubiquitous Learning. In: WMTE 2005, Tokushima, Japan, pp. 11–20 (2005)
5. Ogata, H., Yano, Y.: Context-Aware Support for Computer-Supported Ubiquitous Learning. In: The 2nd IEEE International Workshop on Wireless and Mobile Technologies in Education (WMTE 2004), Taoyuan, Taiwan, pp. 27–34 (2004)
6. Paraskakis, I.: Ambient Learning: a new paradigm for e-learning. In: The 3rd International Conference on Multimedia and Information & Communication Technologies in Education (m-ICTE 2005), Spain, pp. 26–30 (2005)
7. Nabeth, T., Razmerita, L., Angehrn, A., Roda, C.: InCA: a Cognitive Multi-Agents Architecture for Designing Intelligent & Adaptive Learning Systems. Computer Science and Information Systems 2(2), 99–114 (2005)
8. Stojanovic, L., Staab, S., Studer, R.: eLearning based on the Semantic Web. In: Proceedings of the World Conference on the WWW and Internet (WebNet 2001), Orlando, Florida, USA (2001)
9. Tane, J., Schmitz, C., Stumme, G.: Semantic Resource Management for the Web: An E-Learning Application. In: The 13th International World Wide Web Conference on Alternate Track Papers and Posters, New York, USA, pp. 1–10 (2004)
10. Simon, B., Miklós, Z., Nejdl, W., Sintek, M., Salvachua, J.: Smart Space for Learning: A Mediation Infrastructure for Learning Services. In: The 12th International Conference on World Wide Web, Budapest, Hungary (2003)
11. Huang, W., Webster, D., Wood, D., Ishaya, T.: An intelligent semantic e-learning framework using context-aware Semantic Web technologies. British Journal of Educational Technology 37(3), 351–373 (2006)
12. Gruber, T.: A Translation Approach to Portable Ontology Specification. Knowledge Acquisition 5(2), 199–220 (1993)
13. McGuinness, D.L., van Harmelen, F.: OWL Web Ontology Language Overview. W3C Recommendation (2004)
14. Carroll, J., et al.: Jena: Implementing the Semantic Web Recommendations. In: WWW 2004, New York, pp. 74–83 (2004)
15. Sakai, http://sakaiproject.org/
16. Open Service Interface Definition, http://www.okiproject.org/
17. W3C SKOS Mapping, http://www.w3.org/TR/swbp-skos-core-guide/
18. Miller, L., Seaborne, A., Reggiori, A.: Three Implementations of SquishQL, a Simple RDF Query Language. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002. LNCS, vol. 2342, pp. 423–435. Springer, Heidelberg (2002)
19. Yu, Z., et al.: Ontology-Based Semantic Recommendation for Context-Aware E-Learning. In: UIC 2007, Hong Kong, China, pp. 898–907 (2007)
20. Yu, Z., et al.: Fuzzy Recommendation towards QoS-Aware Pervasive Learning. In: AINA 2007, Niagara Falls, Canada, pp. 846–851 (2007)
A Comprehensive Approach for Situation-Awareness Based on Sensing and Reasoning about Context

Thomas Springer (1), Patrick Wustmann (1), Iris Braun (1), Waltenegus Dargie (1), and Michael Berger (2)

(1) TU Dresden, Institute for Systems Architecture, Computer Networks Group
(2) Siemens AG, Corporate Technology, Intelligent Autonomous Systems CT IC 6
{Thomas.Springer, Patrick.Wustmann, Iris.Braun, Waltenegus.Dargie}@tu-dresden.de, [email protected]
Abstract. Research in Ubiquitous Computing and Ambient Intelligence aims at creating systems able to interact in an intelligent way with the environment, and especially with the user. To be aware of and react to the current situation, a complex set of features describing particular aspects of the environmental state usually has to be captured and processed. Currently, no standard mechanisms are available to model and reason about complex situations. In this paper, we describe a comprehensive approach for situation-awareness which covers the whole process of context capturing, context abstraction and decision making. Our solution comprises an ontology-based description of the sensing and reasoning environment, the management of sensing devices and reasoning components, and the integration of these components into applications for decision making. These technological components are embedded into a conceptual architecture and generic framework which enable an easy and flexible development of situation-aware systems. We illustrate the use of our approach based on a meeting room scenario.
1 Introduction
Research in Ubiquitous Computing and especially Ambient Intelligence aims at creating systems able to interact in an intelligent way with the environment, especially the user. "Machines that fit the human environment instead of forcing humans to enter theirs will make using a computer as refreshing as a walk in the woods" [1]. Thus, a system able to recognise the current situation can adapt its behaviour accordingly. For instance, an application supporting mobile workers during their tasks in the field could adapt the interaction modalities to improve the interaction with the user (e.g. speech input and output could be used if the worker's hands are not free, or gesture input could be used if the surrounding noise level is very high). In a similar way, an assistance application for elderly people could intelligently support the planning of daily activities, e.g. selecting convenient connections on public transportation systems for shopping, visiting the doctor or meeting relatives and friends.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 143–157, 2008. © Springer-Verlag Berlin Heidelberg 2008
To create such systems, sensing the environment and adapting the behaviour according to its current state are major prerequisites. A system has to be able to capture information about the environment and the involved users from heterogeneous and distributed information sources, e.g. sensor networks, extracted application data, user monitoring or other methods for gathering context information. The information captured in this way is usually low-level and has to be abstracted and fused to create an understanding of the overall situation a system is currently within. A large set of schemes for reasoning about the current situation exists, including a wide range of logic- and probability-based estimation, reasoning and recognition schemes, most of which have been employed in artificial intelligence, image and signal processing, control systems, decision theory, and stochastic processes. All these schemes have their advantages and disadvantages and can be applied to different types of sensed data and application areas.

Currently, no standard mechanisms are available to model and reason about complex situations. There is no common understanding about which features are relevant for a certain situation and how such features and their interrelations can be identified and modelled. Moreover, the adoption of different reasoning schemes is in an early state, especially with respect to their combination and the overall performance. In this paper, we describe a comprehensive approach for situation-awareness. In particular, we cover the capturing of low-level context information based on sensor networks and further sensing devices, the abstraction of higher-level context using heterogeneous reasoning schemes, and the derivation of system decisions.
Our solution comprises an ontology-based description of the sensing and reasoning environment, the management of sensing devices and reasoning components, and the integration of these components into applications for decision making. These technological components are embedded into a generic framework and a design methodology which enable an easy and flexible development of situation-aware systems. We illustrate the use of our approach based on a meeting room scenario. Our paper is organised as follows: related work in the areas of context sensing and reasoning as well as architectures and frameworks for context-awareness is discussed in chapter 2. We introduce our concepts, giving a detailed description of the requirements, the major concepts, and the proposed architecture together with its components and a development methodology, in chapter 3. In chapter 4 we describe the implementation of the generic framework, and our feasibility study based on an example scenario is presented in chapter 5. We conclude the paper by summarising the lessons learned and giving an outlook on future work.
2 Related Work
Several context reasoning architectures and prototypes have been proposed in the recent past. The Sylph [2] architecture consists of sensor modules, a proxy core, and a service discovery module. A sensor module provides a standard means for initialising and accessing sensor devices. A service discovery module advertises
sensors through a lookup service, and a proxy core manages application queries and serves as a semantic translator between applications and sensor modules. The iBadge prototype, developed with the Sylph architecture, tracks and monitors the individual and social activities of children in kindergartens (loneliness, aggressive behaviour, etc.). It incorporates orientation and tilt sensing, environmental sensing, and a localisation unit. The Mediacup [3] is an ordinary coffee mug into which programmable hardware for sensing, processing, and communicating context is embedded. The hardware is a circular board designed to fit into the base of the cup. It incorporates a micro controller, a memory for code processing and data storage, an infrared transceiver for communication, and an accelerometer and a temperature sensor for sensing different contexts. The same system architecture was used to embed a sensing system into a mobile phone. A three-layered context recognition framework is used to reason about the status of the mug and the mobile phone. It consists of a sensor layer, a cue layer, and a context layer. The sensor layer is defined by an open-ended collection of sensors which capture some aspects of the real world. The cue layer introduces cues as abstractions of raw sensory data. This layer is responsible for extracting generic features from sensed data, hiding the sensor interfaces from the context layer. The context layer manipulates the cues obtained from the cue layer, and computes context as an abstraction of a real-world situation. SenSay [4] takes the context of a user into account to manage a mobile phone. This includes adjusting the functional units of the mobile device (e.g. setting the ringer style to a vibration mode), or it can be call related. For the latter case, SenSay prompts remote callers to label the degree of urgency of their calls. Korpipää et al. [5] exploit several sensors to recognise various everyday human situations.
The authors propose a multi-layered context-processing framework to carry out a recognition task. The bottom layer is occupied by an array of sensors enclosed in a small sensor box and carried by the user. The upper layers include a feature extraction layer incorporating a variety of audio signal processing algorithms from the MPEG-7 standard; a quantisation layer based on fuzzy sets and crisp limits; and a classification layer employing a naïve Bayesian classifier which reasons about a complex context. Their implementation involves a three-axis accelerometer, two light sensors, a temperature sensor, a humidity sensor, a skin conductance sensor and a microphone. The approaches above support dynamic binding to context sources. On the other hand, their deployment setting as well as the context reasoning task is predetermined at the time the systems are developed. This limits the usefulness of the systems as users' and applications' requirements evolve over time. Consequently, there is a need for the dynamic integration of sensing, modelling and reasoning. We build upon the experiences of previous work to support flexible context reasoning and usage. In parallel, efforts were started to create more general models and services to model and reason about context based on ontologies. Recent projects (e.g., [6]) covered the creation of comprehensive and generic context models with the
goal of identifying and integrating characteristics of contextual information. In particular, ontology-based modelling and reasoning is addressed in [7,8,9], with a focus on knowledge sharing and reasoning. In [8,7], approaches for defining a common context vocabulary based on a hierarchy of ontologies are described. An upper ontology defines general terms, while domain-specific ontologies define the details for certain application domains. Both approaches use a centralised architecture and work on local scenarios from the smart home or intelligent spaces domain. These solutions focus on the modelling of context and apply particular reasoning schemes. Thus, a comprehensive approach starting with sensor integration and classification is not considered. Moreover, the systematic placement of sensors and the decomposition of situations are not addressed.
3 Conceptual Architecture
Real-world situations usually have to be derived from a complex set of features. A situation-aware system thus has to capture a set of features from heterogeneous and distributed sources and process these features to derive the overall situation. The major challenges in creating situation-aware systems are therefore to handle the complexity of the situation and the related features, to manage the sensing infrastructure, and to find appropriate reasoning schemes that efficiently derive the overall situation from low-level context features. In the following we present a conceptual architecture for the creation of situation-aware systems. The approach is intended to be comprehensive, i.e. it comprises all components and processing steps necessary to capture a complex situation, starting with the access and management of sensing devices up to the recognition of a complex situation based on multiple reasoning steps and schemes. To handle complex situations, the concept of decomposition is applied, splitting the situation into a hierarchy of sub-situations. These sub-situations can be handled autonomously with respect to sensing and reasoning. In this way, the handling of complex situations can be simplified by decomposition. We focus on sensors installed in the environment, i.e. they are immobile and usually not part of a mobile device carried by the user. The main idea is to exploit cheap and already available sensors to create new or extended applications. For example, buildings are increasingly equipped with sensors, e.g. for measuring temperature or light intensity to enable building automation. Such installations can be extended and then exploited to create "more intelligent" situation-aware applications. We start with a discussion of the requirements for our solution and then discuss our conceptual architecture in detail.

3.1 Requirements
The goal of our research work was to develop a comprehensive approach for situation-awareness which covers the whole process of context capturing, situation recognition based on context abstraction, and decision making. Because of the dynamic properties in the field of situation-awareness, systems have to be flexible and extensible. Therefore, the approach should be independent
of the type of information sources involved, i.e. the types of sensors, the structure of particular sensor networks and the different real-world situations which could occur. Furthermore, the approach should not depend on the application scenario, the reasoning schemes used or the type of the derived higher-level context. Based on the situation- and application-specific use of the sensing infrastructure, the capturing and abstraction should be (re-)configurable, e.g. regarding the frequency of data collection, the classification mechanisms used and the aggregation of sensor values. Moreover, scalability and modularity are important. That means the approach should not restrict the number of deployed sensors or their distribution. It should also allow a flexible combination of different schemes for reasoning about higher-level context information. For better reuse, the resulting system should be designed in building blocks, so that it is easy to replace a single building block with another one (e.g. a particular reasoning scheme with a different one). In particular, a situation should not be modelled in a monolithic way, and the varying availability of sensors and the resulting incompleteness of information should be manageable. Last but not least, the system should be independent from a certain application scenario. Rather, it should support the set-up of different scenarios regarding the involved sensors and their configuration, the reasoning schemes used and the overall situation captured. Thus, a formalised scenario description which can be interpreted, instantiated and exchanged should be supported by the system.

3.2 Conceptual Architecture Based on Situation Decomposition
Our conceptual framework is based on the decomposition of complex situations. From previous experiments we have observed that complex situations can be decomposed into sub-situations which can be handled independently with respect to sensing and reasoning. Each sub-situation represents a certain aspect of the overall situation and has to be fused with other sub-situations at a certain level of a hierarchical reasoning process. Below that point, a sub-situation can be handled separately. This especially enables a parallel and distributed sensing and reasoning process. Moreover, for each sub-situation the appropriate reasoning scheme can be applied. At the same time, the decomposition enables the modularisation of the system, because sub-situations can be assigned to separate processing modules. Extensibility and flexibility are supported because, during the decomposition process, alternative sub-situations can be identified and new aspects can easily be added to the hierarchical situation description. According to our validation example, the overall situation could be the current use of a meeting room. Among others, the meeting room could be empty, used for a discussion between two people, a discussion in a larger group, a presentation, or a party. Each of these instances of the overall situation can be captured based on the aggregation and processing of different aspects of that complex situation. For instance, the number of persons in the room, their activity, the lighting, the noise level and the state of the beamer could be used to detect the overall situation. Thus, the initial step for creating a situation-aware system according to our conceptual framework is the decomposition of the complex situation the
system should be aware of. The resulting hierarchy of sub-situations can be refined or extended during the development process as well as during the whole lifetime of the system. The leaves of the situation hierarchy represent atomic sub-situations which cannot or should not be decomposed further. These atomic sub-situations are the starting points for the creation of the situation-aware system. To reflect all necessary steps for deriving the overall situation from sensed information, our conceptual architecture consists of three layers: a sensing layer, a feature extraction layer and a reasoning layer. These layers are depicted in figure 1.

Fig. 1. Abstraction process for situation detection based on sensor data

Each of these layers is described in more detail below.

Sensing Layer. The sensing layer addresses two major issues, namely the integration of heterogeneous sensing devices and the placement and organisation of sensing devices according to their semantic relation to sub-situations. There is a broad range of different sensors which can be considered for gathering context information, such as audio, video, or a whole wireless sensor network. Each sensor has to be accessed through a specific programming interface, usually provided by the manufacturer, and delivers its own types of (raw) values. Most sensors can therefore be used directly with little implementation effort; only special sensor configurations, such as a wireless sensor network, require additional implementation work. Usually, the sensors relevant for a certain sub-situation belong to a certain location. Thus, in contrast to other approaches, which often assume a uniform or random distribution of sensing devices and calculate average values from several sensors of the same type in a certain area, we
identify distinct "areas of interest" which are relevant for capturing sensor data for a certain sub-situation. For instance, the presentation area or a table could represent such an area of interest (see figure 3). At these areas of interest, different types of sensor devices should be placed and logically grouped. Thus, in our concept each sensor is dedicated to a certain area of interest. In particular, sensors of the same type which belong to different areas of interest are handled separately in the classification and lower-level reasoning steps. The idea is that an average value of all light-level sensors at a certain location could be useless for capturing a certain situation, while the values of the light level at certain areas of that location could have a high relevance (e.g. at the projection area of a beamer to decide whether the beamer is on or off, or at the surface of a seat to decide whether the seat is taken or free). The information about the organisation of a set of sensors into an area of interest is exploited in the classification and reasoning steps described next.

Feature Extraction Layer. Because of the heterogeneous sensing devices and the resulting different sensor values, each value has to be classified by an appropriate classifier. Classifiers divide the sensor data into individual, application-dependent classes. These classes are labelled with a symbolic name. The classifiers used can be neural networks, hidden Markov models, decision trees, rule sets, Bayesian nets, clustering with a matching heuristic, or simply a quantisation over the data. Because different classifiers may be used for the different sensors, it is difficult, if not impossible, to obtain a common quality statement from the different classifiers, in addition to the classified sensor data, that could influence the reasoning steps.
Therefore, the results of the classifiers are logic facts of the particular sensors on their areas of interest. These facts are then forwarded to the reasoning steps.

Reasoning Layer. In our concept, reasoning is done in multiple hierarchical steps. Based on the facts resulting from classification, new facts, i.e. new high-level context attributes, are inferred. This is done with the help of different reasoning schemes, which can be deployed separately or work in parallel. Example reasoning schemes are ontology reasoning applying description logic reasoners, rule-based reasoning, case-based reasoning, neural networks and Bayesian nets. Schemes like the last two can be used, but they have to be trained with a sufficiently large training set. Their advantage is that contradictory facts resulting from measurement failures of the sensors can be handled better. In fact, no wrong high-level context facts are reasoned from such wrong sensor facts, because these reasoning schemes yield statements about the quality or the probability of the inferred context attributes. For example, a trained Bayesian network for a context attribute calculates the probability of that attribute. Based on that probability it can be decided whether the attribute should be considered further in the following reasoning steps or not. The disadvantage is the application-dependent and high training effort required in advance. The resulting new facts can be forwarded further to the reasoners of the next superior areas, which combine these facts into other, more abstract facts. After a certain
level of abstraction is reached, the context attributes of all levels can be delivered to a context-aware application, which can then use this information for internal decision making, i.e. triggering actions, performing adaptations, etc. To create situation-aware systems according to our conceptual architecture, the developer can intuitively decompose the relevant situation. Based on experience and knowledge about classification and reasoning schemes, the different layers of the conceptual architecture can be defined. Usually, the whole process of situation decomposition and the identification of the components of the different layers of the conceptual architecture is determined in an iterative process. Starting with a small set of sub-situations, the developer can test the system and extend it stepwise to create an increasingly complex system. In particular, our experience shows that the understanding of a complex situation grows during testing and practical trials.
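The three-layer abstraction process for the meeting room scenario can be sketched end to end: raw readings from areas of interest are quantised into qualitative facts (feature extraction), and those facts are fused by rules into the overall room situation (reasoning). All thresholds, labels and fusion rules below are invented for illustration; the paper prescribes the layering, not these values.

```python
# Illustrative end-to-end sketch of the three-layer abstraction process.
# Thresholds, labels and fusion rules are hypothetical, not from the paper.

def quantise(value, thresholds):
    """Feature extraction: map a raw reading to a qualitative class label."""
    for upper, label in thresholds:
        if value < upper:
            return label
    return thresholds[-1][1]

LIGHT = [(100, "dark"), (500, "medium"), (float("inf"), "bright")]
NOISE = [(40, "quiet"), (70, "medium"), (float("inf"), "loud")]

def infer_situation(raw):
    """Reasoning: fuse per-area facts into the overall room situation."""
    # Facts per area of interest, derived from the sensing layer's raw values.
    facts = {
        "projection_light": quantise(raw["projection_lux"], LIGHT),
        "noise": quantise(raw["noise_db"], NOISE),
        "persons": raw["persons"],
    }
    beamer_on = facts["projection_light"] == "bright"  # bright projection area
    if facts["persons"] == 0:
        return "empty"
    if beamer_on and facts["noise"] == "quiet":
        return "presentation"
    if facts["persons"] == 2 and facts["noise"] == "medium":
        return "two-person discussion"
    if facts["persons"] > 2:
        return "group discussion"
    return "unknown"

situation = infer_situation({"projection_lux": 800, "noise_db": 30, "persons": 6})
# situation == "presentation"
```

In the actual framework, each rule block could live in a separate reasoning module, and probabilistic schemes (e.g. a Bayesian net) could replace the crisp rules where sensor facts may be contradictory.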
4 Generic Framework for Situation-Awareness
For implementing situation-aware systems according to our conceptual model we propose a generic framework. The framework integrates several sensor devices and provides a set of classifiers and reasoners which can be adopted in different scenarios. Moreover, it supports the specification of application scenarios based on an ontology which describes the sensor infrastructure, relevant physical values, applied classifiers and reasoners together with their particular configurations, and database settings for storing the captured raw data. The architecture of our framework is depicted in figure 2. In the following sections all components of the framework are described in detail.

4.1 Scenario Manager
The Scenario Manager is the control component of the framework. It is configured by an ontology modelling the settings, sensors, locations and physical values relevant for the current scenario. It evaluates this information, instantiates and configures the required components, and invokes the components at runtime. Thus, by providing a new scenario ontology to the Scenario Manager, the framework can be completely reconfigured for another scenario.

4.2 Sensor Integration
The concept of the framework supports the integration of arbitrary sensing devices. We have currently integrated two types of sensors into our framework. Firstly, we implemented a wireless sensor network (WSN) with a flexible number of sensor nodes at different locations. These sensor nodes are able to obtain several types of sensed data, e.g. light level and temperature, at once. We have used MicaZ Motes from Crossbow in our current implementation. Crossbow offers a wide range of sensor boards which can be connected to the MicaZ and which include many kinds of sensors. For example, the basic sensor board (MTS310CA) provides the following sensors: Photo (Clairex CL94L), Temperature (Panasonic
A Comprehensive Approach for Situation-Awareness
151
Fig. 2. Architecture of the proposed generic framework for situation-awareness
ERT-J1VR103J), Acceleration (ADI ADXL202-2), Magnetometer (Honeywell HMC1002), Microphone (max. 4 kHz), Tone Detector and Sounder (4.5 kHz). Based on TinyOS, we have implemented basic sensor access in the programming language NesC, including commands for initialization, read access, reset, setting of the sample rate, and network statistics. Secondly, a microphone was utilized for obtaining audio context information. It is cheap and easy to integrate into a sensor environment. The aim is to extract audio information from the environment, to classify these data according to experience, and to identify the most likely context. The microphone was integrated into the prototype simply via the javax.sound package.
4.3
Classifiers
For the classification of sensed data the framework comprises a mechanism for the integration of classifiers and a repository containing a set of classifier implementations. Currently, we have implemented several classifiers for the different sensor types available with the WSN. The results of the classification operations are facts in the form of qualitative values (e.g. dark, medium and bright for the light level). For the microphone, the underlying classification model we used is the Hidden Markov Model (HMM). Before the HMM can be applied to audio data, two steps have to be performed: the extraction of audio features followed by a quantization of these features. Based on training data provided as wave files for a specific scenario, the classifiers can recognise the specified situations (e.g. discussion, presentation or panic in a meeting room).
4.4
Reasoners
Because we assume that all classifiers produce facts we focused on deterministic reasoning schemes. In the current implementation we support rule-based and
152
T. Springer et al.
ontology-based reasoning. In particular, for reasoning about the overall situation we adopt an ontology-based approach. A situation is modelled as a set of concepts and roles in the TBox of the ontology. The current values of concepts and roles related to sensor data or lower-level reasoning steps are included as individuals in the ABox by the Scenario Manager. To reason about ontologies, a description logic reasoner, namely Pellet [10], is applied. In particular, we use the DL reasoning service Realization, which works on the ABox.
Realization. Given an individual I, ABox A and TBox T, return all concepts C from T such that I is an instance of C w.r.t. A and T, i.e., I lies in the interpretation of concept C.
If Realization is performed for all individuals in the ABox, we speak of the Realization of the ABox. The ABox updates are implemented based on the Semantic Web framework Jena. Based on Realization, concepts can be identified which represent the situation according to the facts in the ABox. For instance, the concept BeamerOn can be defined for the meeting room example.
4.5
Scenario Settings
To ensure the configurability of the framework we introduce an ontology-based scenario description. The description consists of two parts: an upper ontology defining general concepts and roles for the scenario description, and a scenario-specific ontology containing concepts, roles and individuals for the definition of a concrete scenario. The upper ontology defines basic concepts for modelling application scenarios, i.e. available sensors, sensor locations, physical values measured by the sensors, and the assignment of sensors to sensor motes (if existing). The concept Sensor represents the sensors available in the environment. Each sensor is described by a certain location, which is assigned to the sensor with the role locatedAt. The concept Location allows the modelling of semantic locations relevant in the scenario. The sensors available at a certain location are modelled by the role hasSensor, which is the inverse of locatedAt. Furthermore, each sensor is described by the physical value which can be measured by the particular sensor type. The concept PhysicalValue represents a value captured from the environment, e.g. temperature or light intensity. The relation between a sensor and its physical value is modelled by the role measures. Further scenario settings are related to available reasoners, a microphone, the database for storing sensed values, and general settings for the scenario. These values are defined as datatype properties of the concepts. In the lower ontology, which is specific to a certain scenario, the general concepts of the upper ontology can be refined by deriving scenario-specific concepts. To define concrete instances of reasoners or classifiers, individuals have to be defined in the lower ontology. To add new sensors, classifiers or reasoners, appropriate components have to be implemented and registered with the Scenario Manager.
If these components require additional settings, the upper ontology and the controller have to be extended as well. All changes can be done easily based on the current framework implementation. Especially, as long as the
upper ontology is extended without changing existing concepts and properties, all existing scenario settings remain usable without any changes.
4.6
User Interface
For visualizing the functionality of the framework and for the easy setup of demonstrations, a graphical user interface was created. The user interface consists of three parts enabling the configuration of the WSN, the monitoring of sensor data and captured context, and the reasoning about the current situation.
5
Validation Example
In the following, the usage of the proposed conceptual architecture and the generic framework is illustrated by an application scenario, namely capturing the current situation in a meeting room. In most companies and universities, conference and lecture rooms, special places in cafeterias and lounges, and other public places have to be reserved in advance for meetings, presentations, parties, etc. This ensures that business is conducted as planned and that no inconvenience or conflict of interests occurs which inhibits the overall goal of the company or university. On the other hand, this well-planned and well-organised execution of business should also accommodate rooms for impromptu or short-term decisions to organise get-togethers, staff meetings and bilateral discussions. For the meeting room scenario, the goal is to determine the current activity in the room based on the state of devices (e.g. the beamer), the noise level and the occupation of seats (number of persons) in the room. The activity inside a room can be a meeting, a casual discussion of two or three people, a presentation involving many people, a party, a person who is reading, or an idle room. Based on this knowledge, the usage of the meeting room could be optimized. For instance, an empty meeting room could be found for a spontaneous meeting, or the location of a certain meeting could be determined.
5.1
Areas of Interest
The identified areas of interest are shown in figure 3. To distinguish between several activities in the room (e.g. one person reading, two persons discussing, a working meeting, a presentation or a party) and an empty room, the individual chairs were identified as areas of interest, similarly to the train compartment application scenario. For each seat area we installed a temperature and a light sensor to measure the light level and the temperature directly at the surface of the seat. Based on the measured values we can distinguish between occupation by a person, occupation by an object, and an empty seat. Moreover, the table represents an area of interest. In that area we installed a microphone to capture the audio data from the room, which helps us to distinguish between a discussion between two people, a larger discussion, a party and a presentation (i.e. one person speaking at a certain distance from the table). A light sensor installed in that area captures the ambient light in the room.
Fig. 3. Areas of interest in the meeting room scenario
Additionally, we identified a so-called presentation area, the area onto which the image of the beamer is projected and where a person stands while giving a talk. In that area we installed a light sensor to capture the beamer state (i.e. on or off). In combination with the light level in the room we can identify the situation that the beamer is on and the ambient light in the room is dimmed, as is the case during a presentation.
5.2
Lower Ontology
For the scenario the sensors TemperatureSensor, LightSensor and AudioSensor are defined. These sensors measure the physical values Temperature, Light and Audio respectively. For Temperature the three qualitative values HighTemperature, MediumTemperature and LowTemperature are defined. For Light four qualitative values are defined, namely Dark, Medium, Bright and VeryBright. The values for Audio represent an activity captured by microphone measurements in the room. Currently we distinguish between Discussion and Presentation. By adding more training examples and extending the ontology more values can be added. To reason about the situation in a meeting room we modelled the chairs around a table in the meeting room. Table and Seat as well as PresentationArea are modelled as sub concepts of MeetingroomLocation which itself is a sub concept of Location. Free seats are modelled by the following concept:

FreeSeat = ∃hasLightSensor(∃uso:hasValue(Bright ∨ Medium)) ∧
           ∃hasTemperatureSensor(∃uso:hasValue(LowTemperature ∨ MediumTemperature))
For occupied seats we distinguish between seats occupied by persons and seats occupied by objects. A seat occupied by a person is described as follows:

PersonOccupiedSeat = ∃hasLightSensor(∃uso:hasValue(Dark)) ∧
                     ∃hasTemperatureSensor(∃uso:hasValue(HighTemperature))
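The two seat concepts above can be read as simple rules over the classified qualitative sensor values. The following Python sketch is illustrative only: the actual system evaluates these as description-logic concepts with Pellet, and the ObjectOccupiedSeat fallback is our assumption about the remaining case.

```python
# Illustrative rule versions of the seat concepts; the paper evaluates these
# as DL concepts, not Python code. The fallback branch is an assumption.

def seat_state(light, temperature):
    """Classify a seat from qualitative light and temperature values."""
    if light in ("Bright", "Medium") and temperature in ("LowTemperature",
                                                         "MediumTemperature"):
        return "FreeSeat"
    if light == "Dark" and temperature == "HighTemperature":
        return "PersonOccupiedSeat"
    # assumed: dark but not warm, e.g. an object lying on the seat
    return "ObjectOccupiedSeat"

print(seat_state("Bright", "LowTemperature"))  # FreeSeat
print(seat_state("Dark", "HighTemperature"))   # PersonOccupiedSeat
```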
5.3
Reasoning
The reasoning about the situation in the meeting room is based on the individual MyMeetingroom. Again, the type of situation is determined by computing the types of this individual based on the reasoning service Realization. MyMeetingroom belongs to the concept MeetingRoom, which represents the overall situation. All partial situations, namely BeamerOff, BeamerOn and PresentationMeetingRoom, are modelled as sub concepts of MeetingRoom. Based on the light sensor installed at the presentation area, the state of the beamer can be determined:

BeamerOff = ∃hasPresentationArea(∃hasLightSensor(∃uso:hasValue(Bright ∨ Medium ∨ Dark)))

BeamerOn = ∃hasPresentationArea(∃hasLightSensor(∃uso:hasValue(VeryBright))) ∧
           ∃hasRoom(∃hasLightSensor(∃uso:hasValue(Medium ∨ Dark)))
In combination with the audio signal the overall situation is determined:

PresentationMeetingRoom = BeamerOn ∧
           ∃hasRoom(∃hasAudioSensor(∃uso:hasValue(Presentation)))
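The realization step can be mimicked in a few lines: each situation concept becomes a membership test, and the inferred types of the meeting-room individual are all tests that succeed. This is a toy stand-in for the Pellet reasoner, and the flattened fact names (presentation_light, room_light, room_audio) are hypothetical simplifications of the roles used above.

```python
# Toy stand-in for DL realization (the paper uses Pellet); the flat fact
# names are hypothetical simplifications of the role chains above.
SITUATIONS = {
    "BeamerOff": lambda f: f["presentation_light"] in ("Bright", "Medium", "Dark"),
    "BeamerOn":  lambda f: f["presentation_light"] == "VeryBright"
                           and f["room_light"] in ("Medium", "Dark"),
}
SITUATIONS["PresentationMeetingRoom"] = (
    lambda f: SITUATIONS["BeamerOn"](f) and f["room_audio"] == "Presentation"
)

def realize(facts):
    """Return all situation concepts the meeting-room individual belongs to."""
    return {name for name, test in SITUATIONS.items() if test(facts)}

facts = {"presentation_light": "VeryBright", "room_light": "Dark",
         "room_audio": "Presentation"}
print(sorted(realize(facts)))  # ['BeamerOn', 'PresentationMeetingRoom']
```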
New scenarios can be implemented by creating a new scenario ontology based on the upper ontology for configuring the generic framework.
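Concretely, a new scenario description boils down to data such as the following sketch. All names here (light1, ThresholdClassifier, etc.) are hypothetical, and the real framework expresses this information in OWL rather than Python.

```python
# Hypothetical scenario description mirroring the upper-ontology concepts
# Sensor, Location and PhysicalValue; the real framework uses OWL, not Python.
scenario = {
    "locations": ["Seat1", "Seat2", "Table", "PresentationArea"],
    "sensors": [
        {"name": "light1", "type": "LightSensor", "measures": "Light",
         "locatedAt": "Seat1"},
        {"name": "temp1", "type": "TemperatureSensor", "measures": "Temperature",
         "locatedAt": "Seat1"},
        {"name": "mic1", "type": "AudioSensor", "measures": "Audio",
         "locatedAt": "Table"},
    ],
    "classifiers": {"Light": "ThresholdClassifier", "Audio": "HMMClassifier"},
    "reasoner": "Pellet",
}

# hasSensor is simply the inverse of locatedAt:
def has_sensor(location):
    return [s["name"] for s in scenario["sensors"] if s["locatedAt"] == location]

print(has_sensor("Seat1"))  # ['light1', 'temp1']
```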
6
Summary and Outlook
The major result of our work is a comprehensive solution for modelling and reasoning about complex situations. The solution is comprehensive in the sense that it starts at the sensor layer and comprises all steps necessary to abstract the captured low-level sensor values into an overall notion of a complex situation. In our solution we considered the installation and access of sensors, the classification of captured sensor data, several steps of abstracting sensor data, and reasoning about sub-situations and the overall situation. In particular, we developed a systematic approach for the decomposition of complex situations. Based on the identification of sub-situations we have shown that it is feasible to place sensor devices in so-called areas of interest. Each area of interest can then process the captured sensor data separately (at least the aggregation and classification of sensed values). In higher-level reasoning steps the sub-situations are then integrated into the overall situation. To demonstrate the feasibility of our approach we designed and implemented a generic framework for situation awareness. The framework comprises implemented components for every level of situation awareness as described in Section 3 (see figure 1). Each layer is extensible with regard to new components and technologies (i.e. sensors, classifiers and reasoners).
Our solution enables an easy, fast and flexible development of situation-aware applications. New scenarios can be implemented by creating a new scenario ontology, based on the upper ontology, for configuring the generic framework. By using an OWL ontology, scenarios and sensor configurations are clearly defined, easily readable and easy to understand.
6.1
Lessons Learned
From our approach we learned the following lessons:
1. It is possible and reasonable to decompose complex situations into partial situations. At least in all scenarios considered in our project it was possible to decompose complex situations in a way that each sub-situation characterises a certain aspect of the complex situation and is meaningful by itself. Moreover, the partial situations can then be composed to infer the overall situation even if only a subset of all modelled partial situations is considered, i.e. the information about the environment is incomplete.
2. The identification of areas of interest and the well-defined placement and combination of sensors in these areas seems to be reasonable and efficient. Compared to an equal distribution of sensors in the environment, the placement of sensors in areas of interest is more focused, and the environmental state can be captured systematically.
3. The combination of several classifiers and reasoners is possible. In our generic framework a set of classifiers and reasoners is available, and we combined them in different ways in the two scenarios. Based on the ontology-based scenario definition, the classification and reasoning algorithms can be configured and thus reused for different scenarios to some degree.
4. We found that the programming of WSNs and the placement/organisation of sensors is highly scenario-dependent. Usually, WSNs have to be reprogrammed and the sensor placement has to be adapted for different scenarios. Thus, the reusability of particular sensor networks and sensor settings is limited. Nevertheless, the generic framework itself is configurable and can thus be reused in different scenarios.
6.2
Future Work
The results demonstrate that our approach is reasonable and advantageous compared to scenario-specific and monolithic approaches. While our approach is more flexible, it assumes a closed sensing infrastructure; in particular, the sensors have to be under the control of the system developer. To exploit existing sensor infrastructures, our approach could be extended to support sensor discovery and the dynamic integration of sensing devices. In particular, a location-based discovery of sensors should be considered. Another important point is a further decoupling of all components and layers of our framework. Classifiers and reasoners are available in a local repository and cannot be distributed in the current implementation. Thus, a distribution of these components as well as a discovery and integration mechanism similar to the sensor integration will be considered in the future. Moreover, we will apply our approach to further application scenarios.
References
1. Weiser, M.: The Computer for the 21st Century. Scientific American, 66–75 (September 1991)
2. Srivastava, M., Muntz, R., Potkonjak, M.: Smart kindergarten: sensor-based wireless networks for smart developmental problem-solving environments. In: The 7th Annual International Conference on Mobile Computing and Networking, pp. 132–138 (2001)
3. Peltonen, V., Tuomi, J., Klapuri, A., Huopaniemi, J., Sorsa, T.: Computational auditory scene recognition. In: International Conference on Acoustics, Speech and Signal Processing (2002)
4. Siewiorek, D., Smailagic, A., Furukawa, J., Krause, A., Moraveji, N., Reiger, K., Shaffer, J., Wong, F.: SenSay: A context-aware mobile phone. In: The 7th IEEE International Symposium on Wearable Computers (2004)
5. Korpipää, P., Mäntyjärvi, J., Kela, J., Keränen, H., Malm, E.-J.: Managing context information in mobile devices. IEEE Pervasive Computing 2(3), 42–51 (2003)
6. Henricksen, K., Livingstone, S., Indulska, J.: Towards a hybrid approach to context modelling, reasoning and interoperation. In: UbiComp 1st International Workshop on Advanced Context Modelling, Reasoning and Management, pp. 54–61 (2004)
7. Chen, H., Finin, T., Joshi, A.: An ontology for context-aware pervasive computing environments (2003)
8. Gu, T., Pung, H.K., Zhang, D.Q.: Toward an OSGi-based infrastructure for context-aware applications. IEEE Pervasive Computing 3(4), 66–74 (2004)
9. Christopoulou, E., Goumopoulos, C., Kameas, A.: An ontology-based context management and reasoning process for UbiComp applications. In: sOc-EUSAI 2005: Proceedings of the 2005 Joint Conference on Smart Objects and Ambient Intelligence, pp. 265–270. ACM Press, New York (2005)
10. Sirin, E., Parsia, B.: Pellet: An OWL DL reasoner. In: Haarslev, V., Möller, R. (eds.) Proc. of the 2004 Description Logic Workshop (DL 2004), vol. 104 (2004), http://www.mindswap.org/2003/pellet/index.shtml
Context-Adaptive User Interface in Ubiquitous Home Generated by Bayesian and Action Selection Networks Han-Saem Park, In-Jee Song, and Sung-Bae Cho Department of Computer Science, Yonsei University 134 Shinchon-dong, Seudaemun-gu, Seoul 120-749, Korea {schunya,sammy}@sclab.yonsei.ac.kr, [email protected]
Abstract. Recent home theater systems require users to control various devices such as a TV, audio equipment, DVD and video players, and a set-top box simultaneously. To obtain the services they want in this situation, users have to know the functions and positions of the buttons on several remote controllers. Since manipulating them is usually difficult and the number of controllable devices keeps increasing, confusion grows as the ubiquitous home environment is realized. Moreover, there are many mobile and stationary controller devices in a ubiquitous computing environment, so the user interface should be adaptive in selecting the functions that the user wants and in adjusting the features of the UI to fit a specific controller. To implement the user- and controller-adaptive interface, we model the ubiquitous home environment and use the modeled context and device information. We have used a Bayesian network to obtain the degree of necessity of each device in each situation. An action selection network uses the predicted user situation and the necessary devices, and selects the functions necessary in the current situation. The selected functions are used to construct an adaptive interface for each controller using a presentation template. For the experiments, we have implemented a ubiquitous home environment and generated controller usage logs in this environment. We have confirmed that the BN predicted user requirements effectively by evaluating the inferred controller necessities on a generated scenario. Finally, comparing the adaptive home UI with a fixed one with 14 subjects, we confirm that the generated adaptive UI is more useful for general tasks than the fixed UI. Keywords: Adaptive user interface, ubiquitous home, Bayesian network, action selection network.
1 Introduction
People use the remote controllers belonging to their appliances when they want to control them. As they purchase more appliances, the number of controllers grows accordingly. For example, users have to control several devices such as a cable TV set-top box, DVD player, television, audio equipment, video and DivX players to use a recent home theater system. Controllers for those devices have different interfaces depending on their manufacturers even though they look similar. Thus, it is not easy to get accustomed to all controller interfaces [1]. Though the controllers have various
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 158–168, 2008. © Springer-Verlag Berlin Heidelberg 2008
Context-Adaptive User Interface
159
functions, only about a third of them are used in practice. If the ubiquitous home becomes common, most home devices such as lights and boilers, as well as home appliances, will be controlled using remote controllers. Therefore, it is necessary to study context-adaptive user interfaces that provide users with the necessary functions among the many available. In this context, the PUC (Personal Universal Controller), which automatically generates user interfaces on PDAs or smart phones, has been investigated recently. J. Nichols and B.A. Myers at CMU developed a system which can generate an optimized controller interface for smart phones using a hierarchical menu navigation system [2] and presented HUDDLE, which automatically generates task-based user interfaces [1]. Besides, they verified through a user study that an automatically generated interface has better usability, in terms of time efficiency and consistency, than general interfaces for device control [3]. These studies generated and provided useful PUCs, but they cannot consider the user's context. We implemented a ubiquitous home environment and used the space and device information. In addition to the user and sensor input, the system uses a Bayesian network to infer the necessity of devices in a given context. Also, an action selection network uses the necessity of devices as input and selects the functions which are necessary in the current context. The adaptive user interface consists of these selected functions arranged using a presentation template. Modeling with Bayesian networks provides good performance while effectively incorporating domain knowledge [4]. We also modeled the user interface with an action selection network, so it can adapt to the user input continually. For the experiment, we have generated logs of device usage based on a scenario and inferred the necessity of devices from these logs. After that, we evaluated the proposed method using the GOMS model. 14 subjects were asked to perform 10 tasks with both the conventional fixed home UI and the proposed adaptive home UI.
The results showed that subjects dealt with the general tasks more effectively using the proposed UI than using conventional one.
2 Related Works
2.1 Bayesian Networks for Context-Awareness
Context can have several meanings. Dey defined context as any information that can be used to characterize the situation of an entity such as a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves [4]. Generally, context influences the user's preference for a service, and it does so for controlling devices in the home, because the devices a user wants to control can change according to the user's context. A Bayesian network, one of the models used for context-awareness, is a model that infers the context and provides reliable performance with uncertain and incomplete information [5]. It can be modeled from data or designed using expert knowledge, and based on these strengths it has been used to classify and infer mobile context. Korpipää et al. at VTT utilized a naive Bayes classifier to learn and classify the user's context [6]. E. Horvitz et al. at MS Research proposed a system that infers what the user concentrates on at a certain time point in an uncertain environment [7].
160
H.-S. Park, I.-J. Song, and S.-B. Cho
2.2 Action Selection Network
The action selection network was presented by P. Maes for robot agent control [8]. Figure 1 illustrates an example of a representative action selection network. In this network, competition between actions is the basic characteristic, and it is expressed as a change
Fig. 1. An example of action selection network: Solid line represents predecessor link and dashed line represents successor link
of the activation level of each action. The action with the highest activation level is selected after activation spreading, which is calculated as follows. Nodes can have different types of links that encode various relationships and stimulate one another, and each node has five components: preconditions, an add list, a delete list, an activation level, and the executable code. Links divide into internal and external ones, and internal links subdivide into predecessor, successor and conflictor links. External links are connected to sensors and goals. If the value of a certain sensor S1 is true and S1 is in the precondition list of an action node A, a link from S1 to A is activated. If goal G1 has an activation larger than zero and G1 is in the add list of A, a link from G1 to A is activated. The procedure to select an action node to execute at each step is as follows:
1. Calculate the excitation from sensors and goals.
2. Spread excitation along the predecessor, successor and conflictor links, and normalize the activation levels so that the average becomes equal to a constant π.
3. Check for executable nodes, select the one with the highest activation level, execute it, and finish. A node is executable if all its preconditions are satisfied and its activation level is greater than a threshold. If there is no executable node, reduce the threshold and repeat the process.
Links are set by a designer according to the given task. Using this, the system can select the proper action for its goal given a set of sensor states.
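The selection loop above can be sketched in a stripped-down form. This sketch assumes equal link weights, a single spreading step, and no conflictor inhibition (the original algorithm iterates and also spreads inhibition along conflictor links), and all node names are invented for illustration.

```python
# Minimal sketch of Maes-style action selection: one spreading step,
# equal link weights, no conflictor inhibition (the full algorithm has both).

def select_action(actions, sensors, goals, threshold=0.5):
    """actions: name -> dict with 'preconditions', 'add_list', 'successors'."""
    activation = {name: 0.0 for name in actions}
    # 1. Excitation from sensors (true preconditions) and goals (add list).
    for name, a in actions.items():
        activation[name] += sum(1.0 for s in a["preconditions"] if sensors.get(s))
        activation[name] += sum(1.0 for g in a["add_list"] if g in goals)
    # 2. Spread excitation along successor links, then normalize so the
    #    average activation becomes a constant (here 1.0).
    spread = dict(activation)
    for name, a in actions.items():
        for succ in a["successors"]:
            spread[succ] += 0.5 * activation[name]
    mean = sum(spread.values()) / len(spread)
    if mean > 0:
        spread = {n: v / mean for n, v in spread.items()}
    # 3. Pick the executable node with the highest activation above threshold.
    executable = [n for n, a in actions.items()
                  if all(sensors.get(s) for s in a["preconditions"])
                  and spread[n] >= threshold]
    return max(executable, key=spread.get) if executable else None

actions = {
    "open_curtain": {"preconditions": ["curtain_closed"],
                     "add_list": ["room_bright"], "successors": []},
    "turn_on_tv":   {"preconditions": ["tv_off"],
                     "add_list": ["tv_on"], "successors": ["open_curtain"]},
}
print(select_action(actions, sensors={"curtain_closed": True, "tv_off": True},
                    goals={"room_bright"}))  # open_curtain
```

The goal `room_bright` and the successor link from `turn_on_tv` both feed activation into `open_curtain`, so it wins the competition.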
3 Context-Adaptive User Interface for Ubiquitous Home
Figure 2 summarizes the proposed process of adaptive UI generation for ubiquitous home devices. To begin with, a Bayesian network infers the necessary devices in the current context. With this result and the descriptions of devices and controllers, an action selection network is constructed in order to select the necessary functions for each device. Finally, the user interface for a given controller is generated using a UI template.
Fig. 2. Adaptive UI generation process
3.1 Modeling Ubiquitous Home
The ubiquitous home environment for adaptive user interface generation is modeled as follows.
- E = {Location_List, Device_List, Sensor_List}
- Location_List = {Location1, Location2, ..., LocationN}
- Device_List = {Device1, Device2, ..., DeviceN}
- Sensor_List = {Sensor1, Sensor2, ..., SensorN}
- Location = {Location_Name, Location_ID, Neighbors}
- Device = {Device_Name, Device_ID, Location_ID, Function_List}
- Function_List = {Function1, Function2, ..., FunctionN}
- Function = {Function_Name, Function_ID, Function_Type, Function_Value, Add_List, Delete_List}
- Sensor = {Sensor_Name, Sensor_ID, Sensor_Type, Sensor_Value}
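The model above maps directly onto record types; the following is an illustrative rendering in Python dataclasses (the paper represents these lists as XML, and the sample TV device is invented).

```python
# Illustrative rendering of the environment model; the paper stores
# these lists as XML, and the example device below is invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Function:
    name: str
    id: str
    type: str            # e.g. "Enum" or "Range"
    value: object = None
    add_list: List[str] = field(default_factory=list)     # display together with
    delete_list: List[str] = field(default_factory=list)  # do not display with

@dataclass
class Device:
    name: str
    id: str
    location_id: str
    functions: List[Function] = field(default_factory=list)

@dataclass
class Sensor:
    name: str
    id: str
    type: str
    value: object = None

tv = Device("TV", "dev-tv", "living-room",
            [Function("Power", "fn-power", "Enum"),
             Function("Volume", "fn-volume", "Range")])
print([f.name for f in tv.functions])  # ['Power', 'Volume']
```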
The ubiquitous home environment (E) includes the location list, which gives the locations of users, rooms and devices; the device list, which includes appliances like TVs and videos and equipment like the boiler and lights; and the sensor list, which covers the sensors observing the states of users and the environment. The device information includes the name, location and functions of each device, and the function information includes the functions of each device and their constraints. Each function also includes an Add_List and a Delete_List for the action selection network. The Add_List contains functions which are desirable to display together with that function; the Delete_List, on the other hand, contains functions which are not desirable to display with it. Location information is represented as a room name.
3.2 Predicting Necessary Devices Using Bayesian Network
We have used a Bayesian network to infer the devices that seem to be necessary in a given context in the ubiquitous home. Since a Bayesian network can be built from expert knowledge even when there is little or no data for learning [9], the system provides reliable performance in an uncertain home environment where it has just been set up. After a sufficient amount of log data is obtained, it is possible to learn the Bayesian network; a personalized model can be learned using the logs of individual users. We have used the K2 algorithm proposed by Cooper and Herskovits [10]. To calculate the necessity of each device, basic context such as the user location, the current time, and the date (whether the day is a holiday or not) is used as evidence. The logs of devices related to each device are also used as evidence; related devices are devices with a similar function or in the same room. After all evidence is set, BN inference is conducted to predict the necessary devices.
3.3 Selecting Necessary Functions Using Action Selection Network
Once we get the necessities of each device, an action selection network is designed using the Device_List (D) and Function_List (F) explained in Section 3.1.
Assuming these necessities of devices as a set of environment nodes E and the states of the devices' functions as a set of function nodes B, E and B are defined as Equations (1) and (2) using F and D:

E = {ei | ei ∈ D ∧ executable(ei)}    (1)

B = {bi | bi ∈ F ∧ executable(bi)}    (2)
After making the nodes, the predecessor, successor and conflictor links between these nodes are generated. A predecessor link is generated when Equation (3) is satisfied. A predecessor link, which connects two function nodes or a function node and an environment node, is generated when both nodes belong to the same device and they are positively related. Here, a positive relation holds among functions that are likely to be executed after a certain function is executed. This link is used to build a hierarchical structure of the functions of one device. Successor and conflictor links are used to represent the relations of functions in different devices. A successor link is decided by Equation (4). It is similar to the predecessor link, but differs in that a successor link connects the functions of different devices and connects functional
nodes only. A conflictor link is generated if Equation (5) is satisfied. Differing from the successor link, it is generated when two functions have a negative relation (confliction).

precondition(np, ns) = 1 if ((np ⊂ B) ∨ (np ⊂ E)) ∧ Device(np) = Device(ns) ∧ relation(execute(np), execute(ns)); 0 otherwise    (3)

successor(np, ns) = 1 if ((np ⊂ B) ∨ (np ⊂ E)) ∧ Device(np) ≠ Device(ns) ∧ relation(execute(np), execute(ns)); 0 otherwise    (4)

conflictor(np, ns) = 1 if ((np ⊂ B) ∨ (np ⊂ E)) ∧ Device(np) = Device(ns) ∧ confliction(execute(np), execute(ns)); 0 otherwise    (5)
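Equations (3)-(5) amount to three predicates over node pairs. A direct transcription might look as follows, where the relation and confliction tables of positively and negatively related function pairs are supplied by the designer, and all device and function names are invented for illustration.

```python
# Direct transcription of the link predicates (3)-(5); 'relation' and
# 'confliction' are designer-supplied tables, and all names are invented.

def predecessor(np, ns, nodes, device, relation):
    return np in nodes and device[np] == device[ns] and (np, ns) in relation

def successor(np, ns, nodes, device, relation):
    return np in nodes and device[np] != device[ns] and (np, ns) in relation

def conflictor(np, ns, nodes, device, confliction):
    return np in nodes and device[np] == device[ns] and (np, ns) in confliction

device = {"tv_power": "TV", "tv_channel": "TV", "audio_play": "Audio"}
nodes = set(device)                       # B ∪ E
relation = {("tv_power", "tv_channel")}   # channel selection follows power-on
confliction = set()

print(predecessor("tv_power", "tv_channel", nodes, device, relation))  # True
print(successor("tv_power", "audio_play", nodes, device, relation))    # False
```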
Fig. 3. Change of activation level in an action selection network and constructed user interface after the user controlled TV
The constructed action selection network is basically represented as a tree which has device nodes as parents and their functions as children, and each link is added to that tree based on the functional relations. In this network, activation functions are evaluated as F : E × B → [0...1]. Using the inferred necessities of devices, the set of environment nodes E is made. After that, the procedure to select a necessary function in the action selection network follows as explained in Section 2.2. After the procedure, we obtain an activation level for each function node. An active function node bi(t) is selected by Equation (6). We let several functions be selected at once, while the conventional action selection network does not, so the user interface can use these selected functions for display at the next step.

bi(t) = 1 if αi(t) ≥ θ ∧ executable(bi, t) = 1; 0 otherwise    (6)

The constructed network is evaluated again when the device list the user needs changes considerably or when the user requests it. As explained in Figure 3, the necessity of the TV gets larger if a user watches it, so the network evaluation is updated. After the evaluation, the user interface includes more functions to control the TV.
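Unlike the classic network's single-winner rule, Equation (6) keeps every executable node whose activation reaches the threshold. A minimal sketch, with invented function names and an assumed threshold:

```python
# Equation (6): select ALL executable function nodes whose activation
# reaches the threshold, so the UI can display several functions at once.
# Function names and the threshold value are illustrative assumptions.

def select_functions(activation, executable, theta=0.6):
    return [b for b, a in activation.items() if a >= theta and executable(b)]

activation = {"tv_power": 0.9, "tv_channel": 0.7, "audio_play": 0.3}
print(select_functions(activation, executable=lambda b: True))
# ['tv_power', 'tv_channel']
```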
4 Experiments

For evaluation, we conducted experiments in a simulated ubiquitous home environment. Comparing the proposed adaptive UI with the conventional fixed UI, we confirmed that the adaptive UI provides better performance.
Fig. 4. A floor plan of the ubiquitous home environment
Context-Adaptive User Interface
165
4.1 Experimental Environment
The simulated home environment is illustrated in Figure 4. It has five rooms (a living room, a dining room, a bedroom, a study, and a bathroom), and each room has the devices summarized in Table 1. The lists of devices and their functions are represented in XML.

Table 1. Devices and functions in each room
Place        Device                      Function          Type
Living room  TV                          Power             Enum
                                         Channel           Enum
                                         Volume            Range
                                         Brightness        Range
                                         Light intensity   Range
             Audio equipment             Power             Enum
                                         Mode              Enum
                                         Play              Enum
                                         Volume            Range
             Ceil light                  Power             Enum
                                         Light intensity   Range
             Wall light                  Power             Enum
                                         Light intensity   Range
             Window                      Open/Close        Enum
             Curtain                     Open/Close        Enum
Dining room  Coffee maker                Power             Enum
                                         Status            Enum
             Ceil light                  Power             Enum
                                         Light intensity   Range
Bedroom      Ceil light                  Power             Enum
                                         Light intensity   Range
             Window                      Open/Close        Enum
             Curtain                     Open/Close        Enum
             Bed                         Fold/Unfold       Enum
             Alarm                       Status            Enum
                                         Set (am/pm)       Enum
                                         Set (hour)        Range
                                         Set (minute)      Range
Study        Computer                    Power             Enum
                                         Status            Enum
             Phone                       Status            Enum
             Ceil light                  Power             Enum
                                         Light intensity   Range
Restroom     Ceil light                  Power             Enum
                                         Light intensity   Range
             Instantaneous water heater  Power             Enum
                                         Temperature       Range
             Fan                         Power             Enum
                                         Mode              Enum
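The paper does not give the XML schema for these lists, but a minimal sketch of what such a device list and a reader for it could look like follows. The element and attribute names here are assumptions for illustration, not the actual schema.

```python
# Hypothetical XML device list for one room, and a reader for it.
# Element/attribute names are illustrative, not the paper's schema.
import xml.etree.ElementTree as ET

DEVICE_LIST = """
<room name="Living room">
  <device name="TV">
    <function name="Power" type="Enum"/>
    <function name="Volume" type="Range"/>
  </device>
</room>
"""

def load_functions(xml_text):
    """Return {device: [(function, type), ...]} for one room element."""
    room = ET.fromstring(xml_text)
    return {
        dev.get("name"): [(f.get("name"), f.get("type")) for f in dev.findall("function")]
        for dev in room.findall("device")
    }
```

Enum functions would enumerate their allowed values and Range functions their bounds; both are omitted here for brevity.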
4.2 Usability Test for Adaptive User Interface
To evaluate the usability of the proposed method, we assumed 10 situations that could happen in a home environment and evaluated the generated adaptive interface with the GOMS model [11]. The situations are listed in Table 2. Each situation has detailed tasks. For example, situation #1 is "Having a breakfast while listening to music with audio" and its detailed tasks are as follows: "Turn on the ceil light in the dining room → Turn on the coffee maker → Turn on the audio equipment in the living room → Set the frequency → Set the volume → Have a breakfast → Turn off the ceil light in the dining room".

Table 2. 10 situations for usability test

Number  Situation
1       Having a breakfast while listening to music with audio
2       Watching TV in the evening
3       Playing computer games
4       Getting up from the bed in the morning
5       Taking a shower
6       Closing all windows and curtains
7       Turning off all lights
8       Watching TV while having dinner
9       Having a phone conversation in a bed at night
10      Going to sleep at night
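A keystroke-level flavor of GOMS estimates task time by summing standard operator times. The sketch below uses the commonly cited KLM default operator times; the operator sequences are illustrative, not the paper's measured data.

```python
# Keystroke-Level Model estimate: task time as a sum of operator times.
# Operator times are the commonly cited KLM defaults (Card et al.);
# the example operator sequences are hypothetical.
KLM_SECONDS = {
    "K": 0.28,  # keystroke / button press (average user)
    "P": 1.10,  # point with a pointing device
    "H": 0.40,  # home hands between devices
    "M": 1.35,  # mental preparation
}

def estimate_time(operators):
    return sum(KLM_SECONDS[op] for op in operators)

# e.g. fixed UI: think, point at place tab, press, point at device tab,
# press (two tab steps) before the function is even reachable
fixed_ui = ["M", "P", "K", "P", "K"]
# adaptive UI: the needed function is already displayed, so point and press
adaptive_ui = ["M", "P", "K"]
```

Summing such estimates over the detailed tasks of each situation gives a per-situation completion time that can be compared between the two interfaces.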
Fig. 5. A user interface used in current home network environment (A fixed UI only)
Using these situations, we evaluated the user interfaces of Figure 5 and Figure 6 with GOMS. Figure 5 uses a tab-type interface for selecting place and device: to control a certain device, two steps of tab selection are required. This interface is based on a controller design widely used in home networks. The interface in Figure 6 adds a context-adaptive interface at the right side, which changes as the user controls the devices or as the context changes.

Fig. 6. A user interface including a context-adaptive interface generated by the proposed method (A fixed UI with an adaptive UI)

Using these two interfaces, we compared the time consumed by 14 users to perform the tasks in the 10 situations. Figure 7 summarizes the result. When the adaptive interface was used together with the fixed one, task completion time was reduced by 38.63% compared to using only the fixed user interface.
Fig. 7. Time consumed to perform tasks in 10 situations
5 Conclusion

This paper proposed a method for context-adaptive user interface generation. Since there can be several devices and controllers in a ubiquitous home environment, the proposed method generates the user interface considering both the controller and the devices. A Bayesian network was used to infer the necessary devices, and an action selection network was used to select the necessary functions effectively as the context and user selection change. Finally, we conducted a usability test of the user interface generated by the proposed method. For future work, we are planning to compare the proposed model with others.
Acknowledgements

This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2008(C1090-0801-0046)).
References

1. Nichols, J., Rothrock, B., Chau, D.H., Myers, B.A.: Huddle: Automatically generating interfaces for systems of multiple connected appliances. In: UIST 2006, pp. 279–288 (2006)
2. Nichols, J., Myers, B.A.: Controlling home and office appliances with smart phones. IEEE Pervasive Computing 5(3), 60–70 (2006)
3. Nichols, J., Chau, D.H., Myers, B.A.: Demonstrating the viability of automatically generated user interfaces. In: CHI 2007, pp. 1283–1292 (2007)
4. Kleiter, G.D.: Propagating imprecise probabilities in Bayesian networks. Artificial Intelligence 88(1-2), 143–161 (1996)
5. Dey, A.K.: Understanding and using context. Personal and Ubiquitous Computing 5, 20–24 (2001)
6. Korpipaa, P., Koskinen, M., Peltola, J., Makela, S.-M., Seppanen, T.: Bayesian approach to sensor-based context awareness. Personal and Ubiquitous Computing 7(2), 113–124 (2003)
7. Horvitz, E., Kadie, C.M., Paek, T., Hovel, D.: Models of attention in computing and communications: From principles to applications. Communications of the ACM 46(3), 52–59 (2003)
8. Maes, P.: How to do the right thing. Connection Science Journal 1(3), 291–323 (1989)
9. Cooper, G., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9(4), 309–347 (1992)
10. Card, S., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates (1983)
Use Semantic Decision Tables to Improve Meaning Evolution Support Systems Yan Tang and Robert Meersman Semantic Technology and Application Research Laboratory (STARLab), Department of Computer Science, Vrije Universiteit Brussel, Pleinlaan 2 B-1050 Brussels, Belgium {yan.tang, robert.meersman}@vub.ac.be
Abstract. Meaning Evolution Support Systems have recently been introduced as real-time, scalable, community-based cooperative systems to support ontology evolution. In this paper, we address the problems of accuracy and effectiveness in Meaning Evolution Support Systems in general, using Semantic Decision Tables to tackle them. A Semantic Decision Table separates general decision rules from the processes, and bootstraps policies and template dependencies in the whole system. DOGMA-MESS ("Developing Ontology Grounded Methodology and Applications" framework based "Meaning Evolution Support Systems") has recently been developed at VUB STARLab as a collection of meaning evolution support systems. We embed Semantic Decision Tables in DOGMA-MESS to illustrate our approach. Semantic Decision Tables play roles in both the top-down and bottom-up processes of the meaning evolution cycle. The decision rules that consist of template dependency rules are mainly responsible for the top-down process execution; the bottom-up process execution relies on those that contain the concept lifting algorithms.

Keywords: ontology, Meaning Evolution Support System, Semantic Decision Table.
1 Introduction

Nowadays, a vast number of ontology capturing methodologies and tools are available. In [9] and on the OnToWorld wiki website1, a general survey of ontology capturing methodologies is given, covering the Grüninger and Fox method [11], the Uschold and King method [27], Methontology [8], CommonKADS [18], the linguistic based methodologies (such as in [2, 15]), the Opjk methodology [4], and the DOGMA methodology [22]. DOGMA (Developing Ontology-Grounded Methods and Applications) is in general a framework for ontology capturing methodologies and applications. Amongst the mentioned methodologies, the DOGMA methodology is designed partly based on the best practice of the others: 1) the Grüninger and Fox method for the TOVE project uses competency questions as a way to scope the domain of
1 http://ontoworld.org/wiki/Ontology_Engineering#Ontology_Building_Methodologies
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 169–186, 2008. © Springer-Verlag Berlin Heidelberg 2008
interests and conceptualization evaluation; 2) the Uschold and King method for the Enterprise Ontology emphasizes the importance of incorporating brainstorming and defining/grouping terms in a natural language; 3) Methontology and CommonKADS focus on structural knowledge management activities, where each activity produces a deliverable as its output; 4) the linguistic based methodologies ground the concepts on the basis of natural languages, so it is necessary to use natural language processing technologies when building multilingual ontologies from scratch; 5) the Opjk methodology adapts the argumentation method (so called "Diligent") and underlines the social aspect. Given the importance of the community aspect in the notions of ontology [10, 12], the Semantic Web2, Web 2.0 [3], and some socially focused methodologies (e.g. [4]), the trend towards community impact on ontology engineering results in a growing interest in community-grounded, knowledge-intensive methodology. The DOGMA Meaning Evolution Support System (DOGMA-MESS) was thus developed at the VUB STARLab [6]. As an extension to DOGMA, DOGMA-MESS is a machine-guided ontology versioning, merging and alignment system to support scalable ontology engineering. In practice, we observe that scalable ontology engineering is hard in an interorganizational setting, where there are many pre-existing organizational ontologies and rapidly evolving collaborative requirements. Current ontology merging systems3 mainly focus on how to integrate different ontologies, such as in [14]. Research concentrates on combining several (sub-)ontologies into one ontology by removing inconsistency and reducing conflicts among them. DOGMA-MESS does not focus on solving these problems, but on gradually building interoperable ontologies within a large, knowledge-intensive community: each organization communicates its needs to the others, trying to find the overlapping interests from which interorganizational ontologies are made.
The core activity in meaning evolution support systems is to reach final consensus on the conceptual definitions; how to manage the negotiation among the members (also called "domain experts") of the community is crucial. In DOGMA-MESS, the technology of meaning negotiation [5] is therefore integrated, constituting the kernel of community management. However, the community behaviors are not thoroughly studied, so the outcomes of the DOGMA-MESS processes often become "messy". Therefore, a novel design embedding Semantic Decision Tables (SDTs) in the DOGMA-MESS micro process was proposed in our early paper [26]. Semantic Decision Tables are used to capture the community's behaviors at the macro level and to guide the community at the micro level. Recently, we have received increasing requirements for managing the ontological structure at a high level, automatically checking the dependencies of different knowledge blocks, and quickly and accurately adapting the knowledge elicitation processes in DOGMA-MESS. These requirements are the challenges of this paper. Based on our early work [26], we enrich the model of DOGMA-MESS by integrating Semantic Decision Tables into it. We focus on how Semantic Decision Tables are used in both the bottom-up and top-down processes of the
2 http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21
3 We consider the ontology merging systems as a kind of systems in scalable ontology engineering.
meaning evolution cycle in DOGMA-MESS. The accuracy and effectiveness that Semantic Decision Tables can bring to meaning evolution support systems in general are stressed in this paper. The remainder of this paper is structured as follows. In Section 2, we present the background of the paper. We compare our work with existing technologies, and discuss both the advantages and disadvantages of our work, in Section 3. We design the model of DOGMA-MESS embedded with Semantic Decision Tables in Section 4. Different Semantic Decision Tables hold different semantically rich decision rules: the decision rules that consist of template dependency rules are mainly responsible for the top-down process execution (Section 4.1), while the bottom-up process execution relies on those that contain the selection algorithms, which can be evaluated (Section 4.2). Section 5 contains the conclusion and future work.
2 Background

This section is organized as follows. First, we explain the background of DOGMA (Developing Ontology-Grounded Methods and Applications, [21]) and its philosophical foundation of double articulation (Section 2.1). Second, we recall the model of the DOGMA-MESS (Meaning Evolution Support System) methodology in Section 2.2. Third, we explain the notion of Semantic Decision Tables in Section 2.3.

2.1 DOGMA

The research efforts on the DOGMA approach to ontology engineering have been carried out at the VUB STARLab for over ten years. It was designed as a methodological framework inspired by the tried-and-tested principles of conceptual database modeling. In the DOGMA framework one constructs (or converts) ontologies by the double articulation principle into two layers: 1) the lexon base layer, which contains a vocabulary of simple facts called lexons, and 2) the commitment layer, which formally defines rules and constraints by which an application (or "agent") may make use of these lexons. A lexon is a quintuple <γ, t1, r1, r2, t2>, where γ is a context identifier. γ is assumed to point to a resource, and serves to disambiguate the terms t1, t2 into the intended concepts. r1, r2, which are "meaningful" in this specific context γ, are the roles referring to the relationships that the concepts share with respect to one another. For example, a lexon <γ, Driver's license, is issued to, has, Driver>4 expresses the facts that "a driver's license is issued to a driver" and "a driver has a driver's license". The linguistic nature of a lexon reflects a fundamental DOGMA characteristic: its grounding in the linguistic representation of knowledge. The community of domain experts chooses (or has to agree on) a given natural language, e.g. English, to store and present lexon terms and roles.
In this paper, we do not focus on the discussion of the context identifier γ , which is omitted in other lexons. E.g. < γ , Driver’s license, is issued to, has, Driver> is thus written as .
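As an illustration of lexons and uniqueness commitments only (not STARLab's implementation; all names here are hypothetical), the quintuple structure and a UNIQ check over instance-level facts could be encoded along these lines:

```python
# Illustrative encoding of a DOGMA lexon and a UNIQ commitment check;
# class and function names are hypothetical, not the DOGMA tooling.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexon:
    context: str   # the context identifier (gamma)
    term1: str
    role: str
    corole: str
    term2: str

def uniq(facts):
    """UNIQ over term1: each term1 instance maps to at most one term2 instance."""
    holders = Counter(t1 for t1, _ in facts)
    return all(count <= 1 for count in holders.values())

p1 = Lexon("traffic", "Driver's license", "is issued to", "has", "Driver")
# Instance-level facts committing to p1: (license, driver) pairs.
facts_ok  = [("L-001", "Alice"), ("L-002", "Bob")]
facts_bad = [("L-001", "Alice"), ("L-001", "Bob")]  # one license, two drivers
```

Here `uniq` plays the role of the UNIQ constraint discussed below: it rejects any population in which one driver's license is issued to more than one driver.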
A commitment corresponds to an explicit instance of an intensional logical theory interpretation of applications. It contains a set of rules in a given syntax, and describes a particular application view of reality, such as the use by the application of the (meta-)lexons in the lexon base. This describing process is also called 'to commit ontologically'. The commitments need to be expressed in a commitment language that can be easily interpreted. Suppose we have a lexon <Driver's license, is issued to, has, Driver> with the constraint "one driver's license is issued to at most one driver". We apply the uniqueness constraint UNIQ to the lexon as follows: p1 = [Driver's license, is issued to, has, Driver]: UNIQ(p1).5 Just as the same database can be viewed and used by different database applications, the same lexon base can be queried, constrained and used by different ontology based applications. The commitments interface between the lexon base and the different applications. The commitments can further be modeled graphically with many popular modeling tools, such as ORM (Object Role Modeling, [13]), CG (Conceptual Graph, [19]) and UML (Unified Modeling Language6). Ontologies modeled in DOGMA can be implemented in an ontology language such as OWL7 or RDF(S)8.

2.2 Meaning Evolution Support Systems

DOGMA-MESS (Meaning Evolution Support System, [6]) organizes the process of interorganizational9 ontology engineering. Its basic characteristic is to find the relevance of concepts defined by different domain experts in order to capture the overlapping interests in the community. The goal is not to create a "complete" domain ontology that covers all the concept definitions in the domain, but a dynamically evolving "interorganizational" domain ontology. A generic model of interorganizational ontology engineering is presented in Fig. 1. An interorganizational ontology cannot be produced once, but needs to evolve over time. The process results in different versions, e.g. v_1 and v_m in Fig. 1.
Each version of the interorganizational ontology contains three different layers: OO (Organizational Ontology), LCO (Lower Common Ontology) and UCO (Upper Common Ontology).

• Each domain has its own UCO, which contains the common concept type hierarchy and the domain canonical relations. For example, "employee", "employer" and "personnel" are common concept types in the domain of human resource management; the concept types "employee" and "employer" are subtypes of "personnel" in the type hierarchy. The domain canonical relations are the relations that are common to the domain. For example, the ones between "employee" and "employer" can be "hire" or "fire".
• An OO is created by individual domain experts based on the ontological structure at UCO level. Each OO within one ontology version is represented by one domain expert, who may have different insights from the others.
• At LCO level, the concept definitions from the different OOs are aligned and merged, and the overlapping domain interests are elicited. This process happens within a version.
• When the interorganizational ontology evolves, all the definitions of the concepts at LCO level of version N are standardized and lifted to UCO level of version N+1.

5 The syntax of the formalized commitment and the examples can be found at: http://www.starlab.vub.ac.be/website/SDT.commitment.example
6 UML is specified by OMG (Object Management Group), http://www.uml.org/
7 OWL Web Ontology Language: http://www.w3.org/TR/owl-features/
8 RDF(S) Resource Description Framework (Schema): http://www.w3.org/TR/rdf-schema/
9 The name of "Interorganizational ontology" was coined by de Moor in [6]. The name "Interorganizational" indicates that the ontology only contains the overlapping interests and is used between the organizations.
Fig. 1. A Model of Interorganizational Ontology Engineering, [6]
DOGMA-MESS supports users playing three main roles: the knowledge engineer, the domain experts and the core domain experts. The knowledge engineer has thorough knowledge of analyzing formal semantics in general; he helps the core domain experts to standardize the concept definitions at LCO level. The core domain experts are recognized domain experts who have excellent domain expertise and a clear overview of the domain. The domain experts constitute the majority of the system users: every enterprise has at least one domain expert who has his own insights into his own domain. They are responsible for creating their own organizational ontologies.

2.3 Semantic Decision Table

A Semantic Decision Table (SDT, [24]) uses the tabular presentation as a decision table does. There are three basic elements in a decision table: the conditions, the actions (or decisions), and the rules that describe which actions might be taken based on the combination of the conditions. A condition is described by a condition stub and a condition entry. A condition stub contains a statement of a condition. Each condition entry indicates the relationship between the various conditions in the condition stub.
An action (or decision) contains an action stub and an action entry. Each action stub has a statement of what action is to be taken. The action entries specify whether (or in what order) the action is to be performed for the combination of the conditions that are actually met in the rule column.

Table 1. A Simple Example of a traditional decision table

                           1    2    3
Condition
  Bad weather              Yes  Yes  No
  It's far from home       Yes  No   Yes
  Money in pocket          Yes  Yes  Yes
Action
  Take a taxi back home    *
  Walk back home                *    *
Table 1 is a simple decision table with three conditions: “Bad weather”, “It’s far from home” and “money in pocket”; and two actions: “Take a taxi back home” and “Walk back home”. The condition “Bad weather” has two condition entries - “Yes (The weather is bad now)” and “No (The weather is not bad now)”. The rule column with ID ‘1’ expresses a decision rule as “If the weather is bad, it’s far from home, and, there is money in pocket, then the decision is to take a taxi back home”. In the collaborative settings, one decision maker might misunderstand (or have his own comprehension of) the meaning of a decision item designated by others. For example, the condition “It’s far from home - Yes” in Table 1 can have different measures of distance. Does it mean that the distance is more than 1 km, or more than 3 km? A traditional decision table itself doesn’t support the collaborative setting. In [24], we have listed the following problems that occur when using traditional decision tables: 1) ambiguity in the information representation of the condition stubs or action stubs, 2) conceptual duplication amongst the conditions, 3) uncertainty in the condition entries, and 4) difficulties in managing large tables. The notion of Semantic Decision Table (SDT, [24]) was introduced to tackle the mentioned problems. What makes an SDT different from a traditional decision table is its semantics. Unlike traditional decision tables, the concepts, variables and decision rules are explicitly defined. A decision group shares the most important decision knowledge within a group decision making session. SDTs are modeled in DOGMA (section 2.1). Accordingly, an SDT contains a set of SDT lexons, SDT commitments and a specific decision task. The question on how to construct an SDT within a decision group is answered in our recent publications [23, 24, 25]. Although an SDT contains SDT lexons and SDT commitments, SDT itself is not an ontology. 
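The rule semantics of such a table can be sketched as a lookup from condition-entry combinations to marked actions. This is a toy encoding for illustration, not the SDT formalism itself:

```python
# Toy evaluation of the decision table above: each rule column maps a
# combination of condition entries to the actions marked with '*'.
RULES = {
    # (bad weather, far from home, money in pocket) -> actions
    (True,  True,  True): ["Take a taxi back home"],   # rule 1
    (True,  False, True): ["Walk back home"],          # rule 2
    (False, True,  True): ["Walk back home"],          # rule 3
}

def decide(bad_weather, far_from_home, money_in_pocket):
    """Return the actions for this condition combination ([] if no rule matches)."""
    return RULES.get((bad_weather, far_from_home, money_in_pocket), [])
```

An SDT adds to this plain rule lookup the explicit, shared definitions of the concepts involved, which is precisely what such a table lacks in a collaborative setting.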
This is because each SDT is used for a specific decision task, and the SDT commitments can contain both static, ontological axioms and temporal, changeable rules. Also note that the usage of SDTs is not restricted to a specific system, such as DOGMA-MESS in this paper. Instead, we use DOGMA-MESS as an example to
demonstrate how SDTs can improve community-grounded systems. SDTs can be applied in many group decision making domains, such as collaborative human resource management.
3 Related Work and Discussion

The consensual knowledge base introduced in [7], the meeting techniques applied in [4], the cooperative domain ontology studied in [1], and "Diligent" in the Opjk methodology [4] are promising related work on building consensus at the ontology level. However, the authors in [6] argue that those methodologies lack community consideration, although they work out some basic principles for building ontological consensus. DOGMA-MESS focuses on the community aspects of scalable ontology engineering, and provides a "fat" community grounded model. In our current projects (e.g. the EC Prolix project15), we observe several advantages and disadvantages of applying Semantic Decision Tables to DOGMA-MESS. We stress the advantages as follows:

• The tabular reports generated based on SDT commitments are, in general, extremely convenient and user-friendly for non-technical domain experts.
• Semantic Decision Tables are used to constrain the dependencies between the templates; the accuracy of the system is thus improved.
• Semantic Decision Tables are used to capture the behaviors of the community, and to manage and guide the community systematically and automatically; the effectiveness of the system therefore increases.
• The flexibility at the system management level increases, because the knowledge engineers can create different algorithms and decision rules based on their needs.
A big disadvantage is the complexity: the knowledge engineers need to know how to construct Semantic Decision Tables. However, we observe in the projects that it is rather easy for the experts to write Semantic Decision Tables if they already know DOGMA. This is because the notion of Semantic Decision Table is modeled in the DOGMA framework, and the ontology developed at STARLab also takes the DOGMA format.
4 Embed Semantic Decision Tables in a Meaning Evolution Support System

Modeling layered ontologies has been studied recently. The scalable ontology model we describe focuses on neither the typology of ontology nor the construction of layered ontologies; instead, we focus on the idea of how to gradually build ontologies within layered ontologies. Based on the work in [28, 6], we model a scalable ontology into four layers: Top Level Ontology (TLO), Upper Common Ontology (UCO), Lower Common Ontology (LCO) and Organizational/Topical Ontology (OTO). In [28], a topical ontology structure for scalable ontology engineering is introduced to represent the knowledge
structure of the domain experts (the stakeholders of a community) by involving different viewpoints of analysis and modeling. Later on, the interorganizational ontology model was designed to match the requirements for meaning evolution [6]. We integrate SDTs into the topical ontology model and the interorganizational ontology model illustrated in Fig. 1. Fig. 2 shows our design.

Fig. 2. Interorganizational ontology engineering model with the integration of Semantic Decision Tables

The dotted lines with arrows in Fig. 2 indicate the specialization dependencies between the ontologies of different levels. Compared to Fig. 1, Fig. 2 contains an extra ontological level: the level of Top Level Ontology (TLO)10. It defines the abstract concept types, such as 'Actor', 'Object', 'Process' and 'Quality'. Conceptualization at this level is not allowed to be changed. The relations between these concept types fall into two categories: i) the hierarchical relations reflected by the type hierarchical construct, also called subsumption ontological roles (e.g. the "subtype of" relationship in [20]); ii) other Core Canonical Relations, such as the "part-of" mereological relation in [12], the "property-of" relation and the "equivalent" relation. Another difference is the OTO level. In Fig. 1, the lowest level is the OO (organizational ontology) level; in Fig. 2, the lowest level is the OTO (organizational and topical ontology) level, which includes the OO level. Organizational Ontology and Topical Ontology (OTO) seek to represent systematically the knowledge structure that the domain experts have on the given themes (or tasks) individually. A Topical Ontology "lays foundation for application (or task) specific ontologies and conceptual models… its semantic space covers multiple subjects and dynamic evolution of the core concepts within a topic" [28]. The concepts within a topic represent the terminology of the application structure, assumption and framework. Within a version, every domain expert (or every enterprise-wise stakeholder group) is responsible for building his own OTO based on the ontology models in the UCO.

10 It was called the "MO (Meta Ontology)" level in the earlier papers [6, 26]. However, we debated whether to name it the MO level at the OTM'07 conferences (http://www.cs.rmit.edu.au/fedconf/). As the structures at this level do not necessarily model the ontology itself, we concluded that it is better to call it TLO.

How does DOGMA-MESS execute the model in Fig. 2? Suppose we have some top level models at TLO level designed by the knowledge engineer. The whole process in Fig. 2 starts with creating the templates at UCO level based on the models at TLO level. These templates at UCO level form the first version of the interorganizational ontology (ontology version V1). Then, these templates are delivered to the domain experts from different enterprises. When they receive the templates, they start to create the ontologies at OTO level. After a time period, the system collects the newly introduced concepts, selects a few and lifts them from OTO level to LCO level. When the lifting process is finished, the core domain experts empty the concept set at LCO level by standardizing and formalizing the concepts. Then, the core domain experts merge them at UCO level, and a new ontology version V2 is created. As the starting point of creating ontology V3, the domain experts (again) introduce new concepts based on the updated templates at UCO level, and so forth. By executing the model visualized in Fig. 2 recursively, the ontology evolves.

4.1 Top Down: Use Semantic Decision Tables to Manage Template Dependencies
Among all kinds of templates dependencies, multivalued dependencies, subset dependencies, join dependencies, mutual dependencies, and generalized mutual dependencies are the mostly used. Except the multivalued dependencies, the others are also called “functional dependency”. Based on their work, we carefully reuse the notions for ontology engineering. The dependencies in [17] are the constraints at the instance level. Note that the definitions of “instance” in DB theory and in ontology engineering are disparate. In DB theory, a record in a table is called an instance, e.g. in [17]. Suppose that we have a table column called “vehicle” with two record “bus” and “TOYOTA VIP-234”. These two records are both called the instances of “bus”. In ontology engineering, “bus” is a type at the conceptual level. “TOYOTA VIP-234”, which is a car with a plate license, points to a unique object in the UoD. Hence, “TOYOTA VIP-234”, in ontology engineering, is an instance at the instance/application level. In this paper, we follow the definition of “instance” in ontology engineering. We use SDT to manage the dependencies at both the instance level and the conceptual level. In the following subsections, different kinds of dependencies are explained. 4.1.1 Multivalued Dependencies The multivalued dependencies are used to constrain the generating process of the templates. For example, we have two templates that are relevant to the concept “course” (Fig. 3).
Fig. 3. Two templates in Conceptual Graph [20] at UCO level
If we apply the multivalued dependency of "course" on these two templates, and a course type (e.g. "Java course") is introduced by using one template in Fig. 3, then "Java course" will automatically be added to the other template as a subtype of "Course". The constraint is stored as the following SDT commitment: (P1 = [Teacher, teach, , Course], P2 = [Teaching, teach, , Course]): MULTIVAL_DEP(P1(Course), P2(Course)). Seeing that the same templates can sometimes be applied to different contexts11 in an ontology in DOGMA-MESS, we stress that using the multivalued dependencies is necessary.

4.1.2 Subset Dependencies

The subset dependencies in [17] are similar to the is-a subsumption relationship in ontology engineering. We say that concept A is a subset of concept B when the subtypes and instances of A belong to the set of concept B. For example, we have two templates: one is related to the concept "Teacher" and the other to the concept "Lecturer" (Fig. 4). In the context of "university", "Lecturer" is a subtype of "Teacher".
Fig. 4. One template relevant to “Teacher” and the other to “Lecturer”
We store this subset dependency in SDT commitments in two ways: 1) create a new lexon constrained with a subset constraint; 2) write a subset constraint directly between the two templates. The SDT commitments are shown as follows:

P3 = [Lecturer, is a, , Teacher]: SUBSET(P3(Lecturer), P3(Teacher)). //method 1

(P1 = [Teacher, teach, , Course], P4 = [Lecturer, has ID, , Personnel ID]): SUBSET(P4(Lecturer), P1(Teacher)). //method 2
11 For example, the members of “course” in the context of “university” are different from those in the context of “middle school”.
SUBSET(P4(Lecturer), P1(Teacher)) (in method 2) records the subsumption relationship between “Lecturer” and “Teacher” defined in lexons P4 and P1: a “Teacher” is a supertype of “Lecturer”. The SUBSET() constraint can also be applied within a single lexon, as in method 1. The two methods are used equivalently.

4.1.3 Join Dependencies

The join dependencies are constraints applied when joining two templates. We often use them together with other constraints, such as subset and equality. Taking the same example as in Section 4.1.2 (Fig. 4), we write the following SDT commitment to indicate that “Teacher” is joined with “Lecturer”:

(P1 = [Teacher, teach, , Course], P4 = [Lecturer, has ID, , Personnel ID]): JOIN_DEP(P1(Teacher), P4(Lecturer)), SUBSET(P4(Lecturer), P1(Teacher)).

JOIN_DEP(P1(Teacher), P4(Lecturer)) means that the concepts “Teacher” and “Lecturer” defined in lexons P1 and P4 have a join dependency. In this example, the join dependency is used together with a subset constraint: “Lecturer” is a subset of “Teacher”. Thus, when we join “Lecturer” into “Teacher”, the structure of “Lecturer” is not changed. Fig. 5 shows the result of applying this SDT commitment in the system. If we apply a join dependency combined with an equality constraint instead, the templates of both concepts are updated.
Fig. 5. The result of applying the join dependency to the two templates in Fig. 4
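The joining behavior described above can be approximated with a small sketch: a template is modeled as a set of lexon triples, and a join dependency combined with a subset constraint grafts the subtype’s template onto the supertype’s template via an is-a link, leaving the subtype’s own structure unchanged. All names here are illustrative assumptions, not the actual implementation.

```python
# Hedged sketch of JOIN_DEP + SUBSET: joining the "Lecturer" template into
# the "Teacher" template. A template is a set of (subject, role, object) triples.

def join_templates(super_template, sub_template, super_concept, sub_concept):
    """Graft the subtype's lexons onto the supertype's template through an
    is-a link; the subtype's own structure is left unchanged."""
    joined = set(super_template)
    joined.add((sub_concept, "is a", super_concept))  # subsumption link
    joined |= set(sub_template)                       # Lecturer's lexons kept intact
    return joined

teacher = {("Teacher", "teach", "Course")}            # cf. lexon P1
lecturer = {("Lecturer", "has ID", "Personnel ID")}   # cf. lexon P4
result = join_templates(teacher, lecturer, "Teacher", "Lecturer")
# result holds both templates' lexons plus the ("Lecturer", "is a", "Teacher") link
```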
4.1.4 Other Dependencies

Other dependencies have also been studied, e.g. the constraints of information modeling and relational databases in [13]. We mostly use the following constraints:

• Uniqueness. The uniqueness constraint and how to write its commitment are discussed earlier in Section 2.1.

• Mandatory. A role is mandatory “if and only if, for all states of the database, the role must be played by every member of the population of its object type…” [13, pp. 166]. For example, to state the fact that “a lecturer must have a Personnel ID”, we apply a mandatory constraint to the lexon, written as:

P4 = [Lecturer, has ID, , Personnel ID]: MAND(P4(has ID)).

• Equality. The equality constraint discussed in [13, pp. 231] is very similar to the multivalued dependency we dealt with in Section 4.1.1. A big difference between them is that an equality constraint is applied to two concepts with different names, while a multivalued dependency deals with two concepts
with the same name. For example, the following SDT commitment means that “Course” and “Lecture” are equivalent:

(P1 = [Teacher, teach, , Course], P5 = [Teacher, teach, , Lecture]): EQUAL(P1(Course), P5(Lecture)).
• Exclusion. Types A and B are mutually exclusive if and only if they do not share any members. This dependency constraint is checked when a new concept type is introduced as a member of type A or B. If a domain expert then tries to add the same concept type to the other type (B or A), he violates this constraint. Suppose we do not allow a teacher to be a student at the same time. We apply the exclusion dependency to “Teacher” and “Student”, written in the following SDT commitment:
(P1 = [Teacher, teach, , Course], P6 = [Student, learn, , Course]): OR12(P1(Teacher), P6(Student)).

4.1.5 Tabular Reports Generated by SDT in the DOGMA-MESS Top Down Process

Earlier in this section, we discussed different kinds of template dependencies and how to write the respective SDT commitments. In each DOGMA-MESS top down process iteration, SDT is used not only to constrain the template dependencies, but also to generate tabular reports. Once a domain expert introduces a new concept, the system checks the dependencies stored in the SDT commitments. A tabular report, generated when a domain expert tries to add new concepts, stores the dependency information of every new concept. Table 2 is a simple example generated when a domain expert adds “Trainer” as a subtype of “Lecturer”, and “Tutor” as a subtype of “Teacher”. As there is a subset dependency between “Lecturer” and “Teacher” (“Lecturer” is a subset of “Teacher”, Section 4.1.2), the subtypes of “Lecturer”, e.g. “Trainer”, are automatically added as subtypes of “Teacher” (the column named “Trainer”, Table 2). The reason is explained in the column named “1Lecturer” in Table 2: “Lecturer” is the subset of “Teacher”.

Table 2. A table that shows validity of introduced concepts
Candidate                          | Trainer       | Tutor    | Apprentice
Introduced as subtype of           | 1Lecturer     | Teacher  | 2Teacher, Student
Dependency                         | 1: SUBSET (Lecturer, Teacher) |          | 2: EXCLUSION (Teacher, Student)
Action/Decision
  Add                              | *             | *        |
  Add to others                    | *1 {Teacher}  |          |
  Add conflicted; add action denied |              |          | *2
12 OR is derived from the logical disjunction operator “or”. The exclusive-or of logical theory can be considered equivalent to the mutual exclusion of set theory.
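The subset and exclusion checks behind such a tabular report can be sketched in a few lines. The data structures and function below are hypothetical illustrations of the check, not the DOGMA-MESS code.

```python
# Illustrative validity check for adding a new subtype, combining the
# SUBSET and OR (exclusion) dependencies of Sections 4.1.2 and 4.1.4.

SUBSETS = {"Lecturer": "Teacher"}        # SUBSET: Lecturer is a subset of Teacher
EXCLUSIONS = [("Teacher", "Student")]    # OR: Teacher and Student are disjoint

def check_add(candidate, supertypes):
    """Return the decision for adding `candidate` under `supertypes`."""
    # Exclusion: deny if the candidate is placed under two disjoint types.
    for a, b in EXCLUSIONS:
        if a in supertypes and b in supertypes:
            return f"Add action denied (exclusion: {a}/{b})"
    # Subset: also add the candidate under each supertype's superset.
    extra = {SUBSETS[s] for s in supertypes if s in SUBSETS}
    if extra:
        return f"Add; also added to {sorted(extra)} (subset)"
    return "Add"

print(check_add("Trainer", {"Lecturer"}))               # also added to Teacher
print(check_add("Apprentice", {"Teacher", "Student"}))  # denied
```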
If the domain expert tries to add “Apprentice” as a subtype of both “Teacher” and “Student”, it cannot be added, because “Teacher” and “Student” have an exclusion dependency (see the SDT commitment of exclusion in Section 4.1.4). Thus, the add action for “Apprentice” is denied (the column named “Apprentice”, Table 2); the reason is explained in the column named “2Teacher” in Table 2. Table 2 shows a very simple example of how the DOGMA-MESS top down processes can benefit from SDT. In the next subsection, we discuss how to use SDT to guide the concept elicitation in a bottom up process.

4.2 Bottom Up: Use Semantic Decision Tables to Guide the Concept Elicitation

Within an ontology version, the system needs to select a few concepts amongst those at the OTO level. We use Semantic Decision Tables (SDTs) to store the lifting algorithms and manage the process.
Fig. 6. The concept “Teacher” is designed at UCO level
Fig. 7. A new relevant concept “Patience” is introduced by a domain expert at OTO level
Let’s first look at a simple lifting algorithm. When we lift a concept from the OTO level to the LCO level, we need to choose some concepts at the OTO level. Let Sc be the concept set at the OTO level, and let Sl be the resulting lifted concept set at the LCO level. In order to carry out this process automatically, we hereby introduce two important condition stubs used to form SDT condition lexons: the relevance score Rc and the interest point Ip. A concept Ci at the OTO level is considered a relevant candidate concept when it reaches a certain relevance score Rc. Rc is set to zero when a new concept is defined for the first time. It increases when the concept is defined in other organizational ontologies designed by different domain experts. For example, if we get the concept “skill of safety” defined within the same context in two different organizational ontologies, the Rc of this concept is increased by one. The interest point Ip also starts from zero. Ip is assigned to an existing concept at the UCO level. It increases by one when a new concept, which is connected to this existing concept, is introduced. For example, we have the concept definition of “Teacher” at the UCO level (Fig. 6). When a domain expert adds a new relevant concept “Patience” to “Teacher” at the OTO level (Fig. 7), the Ip of “Teacher” is increased by one. Ip reflects the focal interests of the stakeholder community. We consider the concept “Patience” as a candidate concept that will be lifted to the UCO level only when the Ip of its connected concept “Teacher” meets a certain threshold value. When a new concept at the OTO level is connected to more than one concept at the UCO level, we choose the largest Ip. Accordingly, we formalize a lifting rule in the SDT commitments as illustrated in Table 3.
Table 3. An example of SDT commitments and their explanations in natural language

ID | Commitment | Verbalization
1 | (P1 = [concept, has, is of, Ip], P2 = [Ip, is of, has, concept]): UNIQ(P1, P2). | Each concept has at most one interest point, and each interest point is of at most one concept.
2 | (P3 = [concept, has, is of, Rc], P4 = [Rc, is of, has, concept]): UNIQ(P3, P4). | Each concept has at most one relevance score, and each relevance score is of at most one concept.
3 | (P5 = [concept, has, is of, Rc], P6 = [concept, has, is of, Ip], P7 = [concept, is lifted to, contain, UCO level]): IMP(AND(P5(Rc) >= T1, P6(Ip) >= T2), P7). | Whether a concept is lifted to the UCO level depends on two conditions: 1. whether its relevance score is at least T1; and 2. whether its interest point is at least T2.
Commitment 1 in Table 3 uses the uniqueness constraint (‘UNIQ’) to express the one-to-one relationship between ‘concept’ and ‘interest point’; so does commitment 2 for ‘concept’ and ‘relevance score’. Commitment 3 uses the propositional connectives ‘IMP’ (implication) and ‘AND’ (conjunction) to express that a candidate concept can be lifted to the UCO level if and only if its relevance score and interest point meet their thresholds (‘T1’ and ‘T2’). Based on Table 3, two concepts at the OTO level, ‘Patience’ and ‘Oral comprehension’, which are considered two candidate concepts, are analyzed in Table 4. Table 4 contains the decision whether the concepts ‘Patience’13 and ‘Oral comprehension’ at the OTO level can be lifted to the LCO level or not. The tabular report is automatically generated by the SDT plug-in in the DOGMA-MESS tool14. As the relevance score (20) of the concept ‘Oral comprehension’ does not reach the threshold (25), it is kept at the OTO level for the next MESS iterations.

Table 4. A tabular report generated based on the SDT, which contains the commitments in Table 3 (T1=25, T2=25)
Candidate concept              | Patience | Oral comprehension | …
Condition
  Super Type                   | N/A      | Competence         | …
  Relevance Score              | 30       | 20                 | …
  Relevant concept at UCO      | Teacher  | Teacher            | …
  Interest Point               | 30       | 30                 | …
  …                            | …        | …                  | …
Action/Decision
  Keep for next MESS iteration |          | *                  | …
  Lift to LCO                  | *        |                    | …
13 Its concept definition is given in Fig. 7.
14 The DOGMA-MESS tool currently developed at STARLab is a web portal that assists domain experts in designing ontologies: http://www.dogma-mess.org/
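The lifting rule of commitment 3 in Table 3 amounts to a simple threshold test. A minimal sketch follows; the function name and the candidate dictionary are our own, with the Rc/Ip values taken from Table 4.

```python
# Minimal sketch of the lifting rule IMP(AND(Rc >= T1, Ip >= T2), lift)
# from Table 3, with T1 = T2 = 25 as in Table 4.

T1, T2 = 25, 25  # thresholds for relevance score and interest point

def lift_decision(rc, ip):
    """Lift to the LCO level only when both thresholds are met."""
    if rc >= T1 and ip >= T2:
        return "Lift to LCO"
    return "Keep for next MESS iteration"

# (Rc, Ip) values as reported in Table 4
candidates = {"Patience": (30, 30), "Oral comprehension": (20, 30)}
for name, (rc, ip) in candidates.items():
    print(name, "->", lift_decision(rc, ip))
```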
The resulting concept set is then provided to the core domain experts, who are responsible for standardizing the concepts and merging them at the UCO level. As the concepts are defined and visualized in Conceptual Graph models, the core domain experts can use many available conceptual graph matching approaches, such as [16], to merge the new concepts automatically into the existing concepts. During this merging phase, some extra links between new concepts and existing concepts need to be constructed. For example, conceptually equivalent concepts need to be linked with the “equivalent” canonical relation. The merging process results in several reorganized conceptual graphs at the UCO level.

4.2.1 Process Evaluation Using SDT

Table 3 is an example of a lifting rule. In practice, users are free to choose their preferred lifting rules. Furthermore, the algorithm in an SDT can be evaluated. Let Ncan be the number of candidate concepts residing at the OTO level, and let Nsel be the number of selected concepts that are lifted. The concept selection rate O is defined as: O = Nsel/Ncan. The selection rate O can be studied by setting up different values of T1 and T2 (Table 5).

Table 5. A result of the selection rate O in a tabular report15, taken from the PROLIX project16
   | Exp1 | Exp2 | Exp3 | Exp4 | Exp5
T1 | 8    | 20   | 37   | 44   | 56
T2 | 15   | 15   | 15   | 15   | 15
O  | 25%  | 12%  | 12%  | 10%  | 0%
In practice, core domain experts need to adjust the parameters T1 and T2 to the real situation. For example, they may set T1 and T2 higher when there are many concept candidates and little time to standardize them. An advantage of the generated tabular reports (e.g. Table 5) is that they help the core domain experts determine which values to assign to T1 and T2. In this section, we focused on how Semantic Decision Tables can be used in both the bottom-up and top-down processes of the meaning evolution cycle in DOGMA-MESS. In the top-down processes, Semantic Decision Tables are used to constrain the dependencies between the templates at the Upper Common Ontology level; the accuracy of the concepts provided by the domain experts is thereby improved. In the bottom-up elicitation processes, Semantic Decision Tables bring effectiveness to the system, for the behaviors of the community are captured, managed and guided by Semantic Decision Tables. The concepts created by the domain experts are no longer selected manually. In addition, the lifting algorithms are no longer hard-coded in the system; thus, the flexibility increases.
15 This report is also automatically generated in the tool.
16 The objective of PROLIX is to align learning with business processes in order to enable organizations to improve the competencies of their employees faster according to continuous changes of business requirements. In this project, DOGMA-MESS is used to create the competency ontology for every domain of the test beds. URL: http://www.prolixproject.org/
5 Conclusion and Future Work

In this paper, we focused on using Semantic Decision Tables in community-grounded, knowledge-intensive Meaning Evolution Support Systems for ontology evolution. As DOGMA-MESS was recently developed at our STARLab as a collection of meaning evolution support systems, we use it as a sample17 system to illustrate the strengths of Semantic Decision Tables (SDTs). We improve the accuracy and the effectiveness of DOGMA-MESS by embedding SDTs in it. More explicitly, SDTs are utilized when designing the template dependencies in the top-down process, and when lifting new relevant concepts to the domain level in the bottom-up process. In the top-down process, the decision rules of SDTs are used to check the validity of the concepts; by doing so, the accuracy is improved. We specified seven categories of template dependencies that are most commonly used: multivalued dependencies, subset dependencies, join dependencies, uniqueness dependencies, mandatory dependencies, equality dependencies and exclusion dependencies. Apart from the multivalued dependencies, the others are also called “functional dependencies”. In the bottom-up process, the decision rules of SDTs are utilized to decide whether the concepts are lifted or not. The decisions are executed automatically in the processes; thus, the system is enhanced with effectiveness. In addition, we emphasize the importance of capturing the community’s behavior and guiding it accordingly. The community’s behaviors are coded as the parameters in the lifting algorithm stored in SDTs. After the execution of each process (no matter whether it is the top-down or the bottom-up process), a tabular report is generated based on the SDTs. We consider such a tabular report a complementary mechanism for non-technical users. We have developed a tool to support constructing and visualizing SDTs. The current version supports modeling some specific commitment types.
A web portal to support the DOGMA-MESS methodology has been developed [6]. In this paper, we focused on the system management involved in introducing new concepts that matter to the overlapping interests of the community. In practice, we observe that the concepts in an ontology can become obsolete after a long time period. Therefore, we need to update the ontology by modifying (e.g. deleting, replacing, and redefining) the concepts. A future work is to use SDTs in such modification processes.

Acknowledgments. The research is partly supported by the EC PROLIX project.
References

[1] Aschoff, F.R., Schmalhofer, F., van Elst, L.: Knowledge mediation: a procedure for the cooperative construction of domain ontologies. In: Proc. of the ECAI 2004 Workshop on Agent-Mediated Knowledge Management, pp. 29–38 (2004)
[2] Benjamin, P., Menzel, C., Mayer, R., Fillion, F., Futrell, M., de Witte, P., Lingineni, M.: IDEF5 Method Report. Technical Report F33615-C-90-0012, Knowledge Based Systems Inc., Texas (1994)

17 We argue that the usage of SDTs is not limited to the DOGMA-MESS system.
[3] Braun, S., Schmidt, A., Walter, A.: Ontology maturing: a collaborative web 2.0 approach to ontology engineering. In: Proceedings of WWW 2007 (2007)
[4] Casanovas, P., Casellas, N., Tempich, C., Vrandecic, D., Benjamins, R.: OPJK modeling methodology. In: Lehmann, J., Biasiotti, M.A., Francesconi, E., Sagri, M.T. (eds.) Legal Ontologies and Artificial Intelligence Techniques, International Conference on Artificial Intelligence and Law (ICAIL) Workshop Series, vol. 4, pp. 121–134. Wolf Legal Publishers (2007)
[5] de Moor, A.: Ontology-Guided Meaning Negotiation in Communities of Practice. In: Mambrey, P., Gräther, W. (eds.) Proc. of the Workshop on the Design for Large-Scale Digital Communities at the 2nd International Conference on Communities and Technologies (C&T 2005), Milan, Italy (2005)
[6] de Moor, A., De Leenheer, P., Meersman, R.: DOGMA-MESS: A Meaning Evolution Support System for Interorganizational Ontology Engineering. In: Schärfe, H., Hitzler, P., Øhrstrøm, P. (eds.) ICCS 2006. LNCS (LNAI), vol. 4068, pp. 189–203. Springer, Heidelberg (2006)
[7] Euzenat, J.: Building consensual knowledge bases: context and architecture. In: Mars, N.J.I. (ed.) Towards Very Large Knowledge Bases – Proc. of the KB&KS 1995 Conference, pp. 143–155. IOS Press, Amsterdam (1995)
[8] Fernández, M., Gómez-Pérez, A., Juristo, N.: Methontology: from ontological art towards ontological engineering. In: Proceedings of the AAAI 1997 Spring Symposium Series on Ontological Engineering, Stanford, USA, pp. 33–40 (1997)
[9] Gómez-Pérez, A., Fernández-López, M., Corcho, O.: Ontological Engineering. Advanced Information and Knowledge Processing Series. Springer, Heidelberg (2004)
[10] Gruber, T.R.: Toward Principles for the Design of Ontologies Used for Knowledge Sharing. In: Workshop on Formal Ontology, Padova, Italy. Also in: Guarino, N., Poli, R. (eds.) Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers, Dordrecht (1993)
[11] Grüninger, M., Fox, M.: Methodology for the design and evaluation of ontologies. In: Skuce, D. (ed.) IJCAI 1995 Workshop on Basic Ontological Issues in Knowledge Sharing (1995)
[12] Guarino, N., Poli, R. (eds.): Formal Ontology in Conceptual Analysis and Knowledge Representation. The International Journal of Human and Computer Studies 43(5/6) (1995) (special issue)
[13] Halpin, T.: Information Modeling and Relational Databases: from Conceptual Analysis to Logical Design. Morgan Kaufmann, San Francisco (2001)
[14] Madhavan, J., Bernstein, P., Domingos, P., Halevy, A.: Representing and reasoning about mappings between domain models. In: Eighteenth National Conference on Artificial Intelligence (AAAI 2002), Edmonton, Canada, pp. 80–86. American Association for Artificial Intelligence (2002) ISBN 0-262-51129-0
[15] Mizoguchi, R.: Tutorial on ontological engineering. New Generation Computing 21(2) (2003)
[16] Myaeng, S.H., Lopez-Lopez, A.: Conceptual graph matching: a flexible algorithm and experiments. International Journal of Pattern Recognition and Artificial Intelligence 4(2), special issue on the Conceptual Graphs Workshop, pp. 107–126. World Scientific Publishing Co., Singapore (1992) ISSN 0218-0014
[17] Sadri, F., Ullman, J.D.: Template dependencies: a large class of dependencies in relational databases and its complete axiomatization. Journal of the ACM (JACM) 29(2), 363–372 (1982)
[18] Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., Van de Velde, W., Wielinga, B.: Knowledge Engineering and Management — The CommonKADS Methodology. The MIT Press, Cambridge (1999)
[19] Sowa, J.F.: Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, Massachusetts (1984)
[20] Sowa, J.: Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks Cole Publishing Co., Pacific Grove (2000)
[21] Spyns, P., Meersman, R., Jarrar, M.: Data Modeling versus Ontology Engineering. SIGMOD Record: Special Issue on Semantic Web and Data Management 31(4), 12–17 (2002)
[22] Spyns, P., Tang, Y., Meersman, R.: A model theory inspired collaborative ontology engineering methodology. Journal of Applied Ontology, special issue on Guizzardi, G., Halpin, T.: Ontological Foundations for Conceptual Modeling, vol. 4(1), pp. 1–23. IOS Press, Amsterdam (2007)
[23] Tang, Y.: On Conducting a Decision Group to Construct Semantic Decision Tables. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM-WS 2007, Part I. LNCS, vol. 4805, pp. 534–543. Springer, Heidelberg (2007)
[24] Tang, Y., Meersman, R.: Towards building semantic decision tables with domain ontologies. In: Man-chung, C., Liu, J.N.K., Cheung, R., Zhou, J. (eds.) Proceedings of the International Conference of Information Technology and Management (ICITM 2007), pp. 14–21. ISM Press (2007) ISBN 988-97311-5-0
[25] Tang, Y., Meersman, R.: On constructing semantic decision tables. In: Wagner, R., Revell, N., Pernul, G. (eds.) DEXA 2007. LNCS, vol. 4653, pp. 34–44. Springer, Heidelberg (2007)
[26] Tang, Y., Meersman, R.: Organizing Meaning Evolution Supporting Systems Using Semantic Decision Tables. In: Meersman, R., Tari, Z. (eds.) OTM 2007, Part I. LNCS, vol. 4803, pp. 272–284. Springer, Heidelberg (2007)
[27] Uschold, M., King, M.: Towards a methodology for building ontologies. In: Skuce, D. (ed.) IJCAI 1995 Workshop on Basic Ontological Issues in Knowledge Sharing (1995)
[28] Zhao, G., Meersman, R.: Architecting ontology for Scalability and Versatility. In: Meersman, R., Tari, Z. (eds.) OTM 2005. LNCS, vol. 3761, pp. 1605–1614. Springer, Heidelberg (2005)
Combining User Profiles and Situation Contexts for Spontaneous Service Provision in Smart Assistive Environments

Weijun Qin1,2, Daqing Zhang2, Yuanchun Shi1, and Kejun Du2,3

1 Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
[email protected], [email protected]
2 TELECOM & Management SudParis, Evry 91000, France
[email protected], [email protected], [email protected]
3 School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, P.R. China
[email protected]
Abstract. Wireless hotspots are permeating our workplaces, homes and public places, bringing interesting services and spontaneous connectivity to mobile users. Since mobile users will be immersed in thousands of different kinds of pervasive services, it is of paramount importance to provide appropriate services to the right person in the right form. Existing approaches offer a rather simplistic solution to the problem and seldom consider multi-dimensional contextual information. In this paper we present a service provision mechanism which enables effective service provision based on a semantic similarity measure combining user profiles and situation contexts in a WLAN-enabled environment. We illustrate our ideas through a scenario of service discovery and provision in public places (e.g. a shopping mall), where a disabled person can select appropriate services within the constraints of his mobile device.
1 Introduction
Recent research advances in the areas of pervasive and ubiquitous computing focus on novel application scenarios that are related to various living spaces such as the home, office, hospital, shopping mall, museum, etc. Mobile and pervasive technologies are opening new windows not only for ordinary people, but also for elderly and dependent people with physical/cognitive restrictions. As wireless hotspots permeate the globe, bringing interesting services and spontaneous connectivity to mobile users, one trend is to gradually transform various physical environments into “assistive smart spaces”, where dependent people can be supported across heterogeneous environments and be fully integrated into society with a better quality of life and independence.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 187–200, 2008.
© Springer-Verlag Berlin Heidelberg 2008
188
W. Qin et al.
In order to provide services in various assistive environments, impromptu service discovery and provision with heterogeneous mobile devices become critical. There are two possible ways to make the services accessible to mobile users. One way is to get connected to a global service provider and access location-based services through that provider. In this case, the service provider needs to aggregate all the services in the smart spaces and know the indoor/outdoor location of the user. Due to the evolutionary nature of the autonomous smart spaces, it is unlikely that such a powerful service provider will exist in the near future. The other way, which we are in favor of, is to enable the mobile user to interact with an individual smart space directly when he physically enters the space; that is, the mobile user can automatically discover and provision the services provided by the smart space. In the latter case, the mobile user needs to get connected directly to the individual smart space using short-range wireless connectivity (such as WLAN or Bluetooth). There are two key challenges to achieving impromptu service discovery and provision. The first challenge is how to automatically discover the relevant services upon entering the smart space, making reasonable assumptions on the mobile devices and the service providers who offer services in the specific smart space. The second challenge is how to automatically provide appropriate services to the right person in the right form, taking the contextual information in the smart environment into consideration. To address the first challenge, various software [1][2] and hardware [3] “plug-in” solutions have been proposed, in addition to service discovery standards such as UPnP [4], Jini [5], etc.
The key idea was to embed specialized software or hardware components, which follow the same service discovery protocol, in both the mobile device and the environment; however, there is still no dominant solution accepted by the whole community. Compared to the first challenge, the second challenge has received much less attention. Existing approaches either offer a rather simplistic solution to the problem and seldom consider multi-dimensional contextual information, or are still limited to key-value based semantic matching to select the appropriate service for the user [6]. In this paper, we aim to propose a service provision framework combining user profiles and situation contexts to select appropriate services for end-users from thousands of desultory services within the constraints of the mobile device. The rest of the paper is organized as follows: Section 2 starts with a use scenario of assisting a person in a shopping mall, from which six system requirements are identified for supporting impromptu service discovery and access in unfamiliar assistive environments. After presenting the overall system for impromptu service discovery in Section 3, a context-aware service provision process based on a semantic similarity measure is described in Section 4. Then Section 5 gives the design issues. Finally, some concluding remarks are made.
2 Use Scenario
In this section we introduce an assistive service scenario in a shopping mall that needs secure impromptu service discovery and access in smart environments. In
this paper, we focus on the issue of context-aware service provision with the consideration of user profiles and situation contexts.

2.1 Service Provision in Shopping Mall Scenario
Bob, a disabled person in a wheelchair, would like to buy a dust coat in a shopping mall. Upon entering the shopping mall, he is authenticated implicitly and presented with 3 general services on his wireless PDA: a store direction service, a goods directory service, and a discount service. He uses the goods directory service and the discount service to search for several dust coat brands with discounts according to his preferences, such as Burberry, Boss, etc. He decides to look around the Boss speciality store first, and uses the store direction service to find its location. When Bob goes to the store, he is presented with 2 in-store services: one is a coat finder which helps him locate a coat on a specific shelf; the other is a coat recommendation service that recommends coats based on his preferences. Besides, Bob can use the general goods directory service to compare different brands to help him make a decision.

2.2 Requirements of the Scenario
From the scenario described above, we can elicit the following requirements that support impromptu, secure and context-aware service discovery and provision with mobile devices:

- R1: Minimum assumptions on mobile devices: One of the first ideal features for impromptu service access is that the mobile device doesn’t need to install any specialized software or hardware in order to interact with the services in a certain environment.
- R2: Automatic discovery of services in unfamiliar environments: Another ideal feature for impromptu service access with mobile devices is the automatic discovery of the relevant services in new environments, when needed.
- R3: Heterogeneous service aggregation: As the devices and services are heterogeneous, they also vary greatly across environments. There should be an effective mechanism and platform that facilitates the management of both static and dynamic services associated with each smart space.
- R4: User and environment context consideration: In order to present the right services to the right person in the right form, user and environment context should be taken into account for service discovery and access in each smart space.
- R5: User interface generation according to user context and device capability: As the users and access devices vary, impromptu and personalized user interface generation based on user preference and device capability should be supported.
- R6: Secure service access with ensured privacy: When mobile users access certain services, user inputs and the output of the services need to be exchanged in a secure manner. User privacy should be guaranteed so that a user might choose not to reveal his identity during service discovery and access.
3 Impromptu Service Discovery and Access
To address the above-mentioned requirements, we propose and implement an impromptu service discovery and access framework which makes the services in unfamiliar environments accessible to any WLAN-enabled mobile device, according to the individual user’s preference, situation and device capability. While an open standard-based service platform, OSGi [7], has been adopted for service aggregation (R3), the Semantic Space [8] is deployed for user and environment context management (R4). The spontaneous interaction framework (R3), the security and privacy issues (R6) and user interface generation (R5) have been studied and discussed in separate papers. In this section, we mainly focus on what the minimum requirements are for mobile devices in our design (R1) and how we achieve impromptu service discovery under these minimum assumptions (R2). In the next section we present the service provision process combined with user and situation contexts (R4).

3.1 Minimum Assumptions on Mobile Devices
In order to communicate locally with the smart space, we assume only that a mobile device has a built-in Wi-Fi chipset for wireless connectivity and a web browser to access the web server hosting the services of the smart space. This requirement is minimal, as it does not rely on any specialized software or hardware.

3.2 Impromptu Service Discovery Mechanism
Following the minimum assumptions on mobile devices, we require that services are associated with a specific "smart space", such as a shopping mall, a lift, or a coat shop. When a mobile device is detected within a certain smart space, the services can be automatically discovered and presented on the device. To enable service aggregation and automatic discovery, a captive portal and service portal mechanism is proposed. We adopted the open standard-based service platform OSGi [7] to aggregate various services and provide the service portal in a specific smart space, as various devices and communication protocols are already supported. The main function of the captive portal is to connect the mobile device to a wireless network automatically and direct the device's web browser to the service portal of a specific smart space; it enables the automatic discovery of the services provided by each service provider available in the smart space. In our approach, we suppose that each service provider records the user's behavior and infers the semantic dependency between the service and the user using a reasoning model (e.g., a Dynamic Bayesian Network) [9]. Different from the normal captive portal mechanism, our captive portal is able to detect the preferred-networks list of the client device; it can then configure the WLAN access point in such a way that a mobile device with wireless auto-configuration capability connects to the network automatically. Thus, when the mobile device is switched on, it is connected to the network without
Combining User Profiles and Situation Contexts
Fig. 1. When the user enters a smart space, his mobile device is detected by the Captive Portal and connected to the wireless access point (1); after launching the web browser, the browser is directed to the Service Portal (2). He can then browse the available services (3) and invoke a service through his mobile device (4).
configuration. When the user launches the web browser, the mobile device is directed to the service portal of the smart space, where the user can choose to invoke the service needed. The whole process is illustrated in Fig. 1.

3.3 Overall System Architecture
The overall system architecture is shown in Fig. 2, where each smart space has an impromptu service discovery and provision framework embedded as a server. When a dependent person moves from one space to another with a WLAN-enabled mobile device, her mobile device is detected and connected to a wireless network in the smart space, and the web browser is directed by the captive portal in the framework to the service portal of the new smart space, which aggregates and hosts the services offered. The middleware core refers to the OSGi service platform. The context manager is responsible for context representation, aggregation, reasoning, storage and query processing, which facilitates service discovery and access with mobile devices [8]. Combining user profiles and situation context, the SemanticMatcher delivers the appropriate associated services to the users automatically.
4 Context-Aware Service Provision Process

4.1 Context Modeling
Context is a description of semantic entities such as the user, a device, the situation and the environment [10]. It’s obvious that there are many different types of contextual information that can be used by applications in a smart assistive
[Fig. 2 depicts a layered architecture: an application layer (Semantic Matcher with situation profile and service description repositories, Reasoning Engine, UI Generation, Service Discovery and Invocation, Context Manager, Security, Administration), a service layer of domain-specific services (e.g., home, office, public places) over the OSGi service framework and Java Runtime Environment, a system layer (operating system), and a physical interface layer (e.g., IEEE 802.11, Ethernet, Bluetooth).]

Fig. 2. Impromptu service discovery and provision framework for multiple spaces
environment, including physical contexts (e.g., time, location), personal contexts (e.g., identity, preference, mood), device contexts (e.g., display size, power), and activity contexts (e.g., shopping list, meeting schedule). Besides those, other types of information are still considered crucial context even though they may be invisible to the participants, e.g., systematic contexts (e.g., CPU power, network bandwidth), application contexts (e.g., agents, services), environmental contexts (e.g., light, temperature), etc. In order to distinguish the specific properties or attributes of different kinds of contexts, context modeling plays an important role in designing and developing context-aware applications for smart assistive environments. Based on previous work on context modeling in [11], we build an ontology-based context model with the powerful capabilities of RDF (Resource Description Framework) and OWL (Web Ontology Language). In our model, we divide the context ontology into two sets: a core context ontology for general conceptual entities in the smart assistive environment, and an extended context ontology for domain-specific environments, e.g., the shopping mall domain. The core context ontology attempts to define the general concepts for context in smart environments that are universal and sharable for building context-aware applications. The extended context ontology attempts to define additional concepts and vocabularies for supporting various types of domain-specific applications. The core context ontology comprises seven basic concepts: User, Location, Time, Activity, Service, Environment and Platform, which are considered the basic and general entities existing in a smart assistive environment. Part of the core context ontology is adopted from several widely accepted consensus ontologies, e.g., DAML-Time [12], OWL-S [13], etc.
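As an illustration of the triple-based RDF representation underlying this model, the sketch below encodes a few hypothetical facts over the seven core concepts and answers a simple pattern query. The instance names and property names (e.g., `locatedIn`, `engagedIn`) are our own simplified stand-ins, not the paper's actual ontology vocabulary:

```python
# Illustrative sketch: context facts as RDF-style (subject, predicate, object)
# triples over the seven core concepts named in the text. All instances and
# property names below are hypothetical examples.

CORE_CONCEPTS = {"User", "Location", "Time", "Activity",
                 "Service", "Environment", "Platform"}

triples = [
    ("Bob",          "rdf:type",  "User"),
    ("Bob",          "locatedIn", "ShoppingMall"),
    ("ShoppingMall", "rdf:type",  "Location"),
    ("Shopping",     "rdf:type",  "Activity"),
    ("Bob",          "engagedIn", "Shopping"),
]

def query(store, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [t for t in store
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# e.g. where is Bob?
print(query(triples, subject="Bob", predicate="locatedIn"))
```

A real implementation would store such triples in an RDF store and query them with an ontology-aware query language rather than a linear scan.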
Since user profiles, environmental contexts and device capability deeply influence the service provision process for mobile users in a smart assistive environment, the model considers context information ranging from user profiles, location, time and activity to environment capability as the user's situation context. Fig. 3 shows a partial context ontology used in the shopping mall scenario
Fig. 3. A partial context ontology for service provision process in smart assistive environment, which delineates the concepts and relationships between the user’s situation context and service context
above, delineating the concepts and relationships between the user's situation context and the service context. The User ontology in our sense consists of a classification of user types and of user features, and each categorized class is linked with associated information, such as preferences, knowledge, capabilities and so on. The Service ontology defines multi-level service specifications in order to support service discovery and composition. Each instance of Service presents a ServiceProfile description, is describedBy a ServiceModel description, and supports a ServiceGrounding description in the OWL-S specification. The Condition property and the additional hasPrecondition parameter in OWL-S specify the preconditions of the service, which can be considered instances defined according to the restrictions of the user's situation contexts. User Profiles. User profiles are of major importance for providing intelligent and personalized services in smart environments [6]. Based on the User ontology, we divide the user profile into two parts: a static part which is relevant to the user's type and characteristics, such as Knowledge, PersonalData (e.g., demographic features), Characteristics (e.g., profession, language), Capabilities (e.g., competencies), PhysicalState (e.g., blood pressure), CognitiveState (e.g., mood) and so on, and a dynamic part which is relevant to the user's behavior, such as Interest, Preference, InvolvedActivity, InvolvedLocation, and their subclasses [14]. The profiles can be stored on the PDA (client) and updated according to specific contexts. The same user can have multiple profiles, each corresponding to the current user's location or activity. Table 1 shows a possible profile for the same person in his role as a customer in the scenario.
Defining user ontologies facilitates providing general, shared common concepts and properties relevant to the user's personalized information, preferences and activities.
Table 1. A partial example of a user profile used in the shopping mall

  ID: QIN001
  ...
  InvolvedActivity: Shopping
  InvolvedLocation: Shopping Mall
  Gender: Male
  Age: 40
  Language: French
  Capabilities: Disabled
  Interest: Coat
  Preference: Shape, Color, Material, Size, Brand
  ...
Situation Context Description. The situation context captures the current user's situation, including Location, Activity, Time and PhysicalEnvironment information obtained from sensors embedded in the smart environment. After the representation, aggregation and calculation processes of the context-aware system, the situation context can be represented in RDF description form. Table 2 shows a partial example of a situation context description in the system. On the basis of the context model, the hierarchical structures of Location, Time, Activity and PhysicalEnvironment are well organized with the capability of ontologies. For example, the Store is a subclass of the Shopping Mall at a different granularity of Location. In sensor-based context-aware systems, each sensor reading may be affected by imprecision factors such as sensor disconnection, signal distortion, etc. To account for this imperfectness of sensor data, a probabilistic model is adopted to represent imperfect data in the form Pr(A) = a, where A represents the situation context instance and a represents the probability value calculated by the context reasoner.

Table 2. A partial example of a situation context description used in a store
  ...
  Location: Boss speciality store
  Device: PDA
  Time: 1:00pm EST, Jan. 1st, 2008
  ...
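The probabilistic form Pr(A) = a described above can be sketched as follows; the context instances and probability values are hypothetical examples, and the threshold-based filter is our own illustration of how a reasoner's confidence values might be used:

```python
# Sketch of Pr(A) = a: each situation-context instance carries the
# probability computed by the context reasoner for that instance.
situation_context = [
    (("Bob", "locatedIn", "BossSpecialityStore"), 0.92),
    (("Bob", "engagedIn", "Shopping"), 0.85),
]

def confident_facts(contexts, threshold=0.8):
    """Keep only the instances whose probability meets the threshold."""
    return [fact for fact, p in contexts if p >= threshold]

print(confident_facts(situation_context, threshold=0.9))
```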
Service Description. The service description represents the capabilities and properties of services and the degree of required information. Current initiatives such as UDDI (Universal Description, Discovery and Integration), WSDL (Web Service Definition Language) and WSFL (Web Services Flow Language) are considered complementary for the expression of service definitions and property
descriptions; however, these initiatives do not consider combination with the situation contexts in a real smart environment. From the viewpoint of end-user interaction, a service taxonomy provides a schema to describe the features of a service instance, including UUID, ServiceName, ServiceProvider, Category, Precondition, InputParameter, OutputParameter, Dependency, etc. The UUID provides the unique identity of the service in the global domain. The ServiceName presents the service instance's name. The Category identifies the type of service for end users, such as Navigation Service or Recommendation Service. The InputParameter and OutputParameter deliver the interface parameters from/to the system. The Dependency describes the semantic dependency between the service and the user, which is captured, inferred and measured by the service provider. The Dependency has two properties, Feature and Value, in the form of a key-value pair: the former describes the context objects related to the service interaction, and the latter measures the semantic relevance degree between the two objects, ranging from -1 to 1. For example, the Coat Recommendation Service enumerates the semantic dependencies on the user's features, such as Gender, Age, etc., and evaluates the semantic relevance degree between each feature and the service. Table 3 shows a partial example of the Coat Recommendation Service provided by the store.

Table 3. A partial example of a service description provided by a store
  ...
  UUID: S001
  ServiceName: Coat Recommendation Service
  ServiceProvider: Boss speciality store
  Category: Recommendation Service
  Feature: Gender   DependencyDegree: 0.95
  Feature: Age      DependencyDegree: 0.8
  ...
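A minimal sketch of this service taxonomy as a data structure, mirroring the Coat Recommendation Service example in Table 3 (the class layout is our own illustration; the actual system represents services as OWL-S descriptions, not Python objects):

```python
from dataclasses import dataclass, field

# Sketch of the service-description fields listed in the text; values
# mirror the Coat Recommendation Service example from Table 3.
@dataclass
class ServiceDescription:
    uuid: str
    service_name: str
    service_provider: str
    category: str
    # Dependency: feature name -> semantic relevance degree in [-1, 1]
    dependency: dict = field(default_factory=dict)

coat_recommender = ServiceDescription(
    uuid="S001",
    service_name="Coat Recommendation Service",
    service_provider="Boss speciality store",
    category="Recommendation Service",
    dependency={"Gender": 0.95, "Age": 0.8},
)
print(coat_recommender.dependency["Gender"])
```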
4.2 Semantic Matching Based on Similarity Measure
The goal of semantic matching between context objects is to determine which user or service profile item with ratings is most relevant for the current context. For example, in the scenario above, when Bob wants to enter the shopping store, the Recommendation Service provided by the store may be more appropriate for him based on his location and involved activity. Therefore, for each context type, a quantifiable measure of similarity is needed to evaluate the semantic relevance between two context objects. Formally, we define a context (including user profile, situation context, and service) here as an n-dimensional vector based on the Vector Space Model [15]:

$C = (c_1, c_2, \ldots, c_n)$   (1)
where $c_i$, $i \in 1..n$, is quantified as a context type (e.g., Activity, Location) from feature extraction, ranging from -1 to 1. We use Pearson's correlation coefficient to calculate the similarity weight between two different contexts and their rating values. The similarity of contexts $C_1$ and $C_2$ is formally defined as

$\mathrm{Similarity}(C_1, C_2) = \dfrac{C_1 \cdot C_2}{\|C_1\| \times \|C_2\|} = \dfrac{\sum_{i=1}^{n} \alpha_i \beta_i}{\sqrt{\sum_{i=1}^{n} \alpha_i^2}\,\sqrt{\sum_{i=1}^{n} \beta_i^2}}$   (2)

where $C_1 = (\alpha_i)$, $C_2 = (\beta_i)$, $i \in 1..n$, and $\mathrm{Similarity}(C_1, C_2)$ returns the relevance of the two contexts over all the item ratings in the system. The advantage of the semantic similarity measure is that similarity-based approaches combine the semantics of both the object's hierarchical components (subclass relationships) and the relevant non-hierarchical components (cross links), while the results retrieved by keyword-based approaches have low recall (some relevant items are missing) and low precision (some retrieved items are irrelevant) [6]. The disadvantage of the Pearson correlation coefficient is that the correctness of the vector-item assumption influences the similarity between two context objects, since Pearson correlation indicates the strength of a linear relationship between two n-dimensional vectors. Given the set of user profiles and the situation context captured from the system, the service provision process first selects the appropriate user profile based on the current user's location or involved-activity information. Then, combining the selected user profile $U^* = \{u^*_1, u^*_2, \ldots, u^*_m\}$ and the situation context $C = \{w_1, w_2, \ldots, w_n\}$ into a composite context $CC = \{v_1, v_2, \ldots, v_p\} = \{u^*_1, \ldots, u^*_m, w_1, \ldots, w_n\}$, where $p = m + n$, and given a service description $S = \{s_1, s_2, \ldots, s_p\}$, the cosine similarity is calculated as

$\mathrm{Similarity}(S, CC) = \dfrac{S \cdot CC}{\|S\| \times \|CC\|} = \dfrac{\sum_{i=1}^{p} v_i s_i}{\sqrt{\sum_{i=1}^{p} v_i^2}\,\sqrt{\sum_{i=1}^{p} s_i^2}}$   (3)

Based on the similarity result, the M services with the highest ratings are selected as the nominated set presented to the end user. The number M depends on the UI design of the user's client device:

$\{S^*\} = \{S_j \mid \max_M\{\mathrm{Similarity}(S_j, CC)\}\}$   (4)
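The similarity measure and top-M selection of Eqs. (2)-(4) can be sketched as follows; the service vectors and composite context below are hypothetical examples, not values from the paper:

```python
import math

def similarity(a, b):
    """Cosine similarity of two equal-length context vectors (Eqs. 2/3)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_services(services, composite_context, m):
    """Return the M service IDs with the highest similarity ratings (Eq. 4)."""
    ranked = sorted(services.items(),
                    key=lambda kv: similarity(kv[1], composite_context),
                    reverse=True)
    return [sid for sid, _ in ranked[:m]]

# Hypothetical composite context CC (user profile + situation context),
# each component quantified in [-1, 1] as described in the text.
cc = [0.9, 0.7, 0.2]
services = {
    "CoatRecommender":   [0.95, 0.8, 0.1],
    "NavigationService": [0.1,  0.2, 0.9],
}
print(select_services(services, cc, m=1))  # -> ['CoatRecommender']
```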
5 Design and Implementation Issues
In this section, we discuss design issues in implementing spontaneous service provision combining user profiles and situation context.

5.1 Captive Portal Enabling Spontaneous Access with Zero Installation and Automatic User Profile Infusion
By means of the captive portal technique, clients can spontaneously discover the services available in the smart space. When the client’s mobile device connects
to the wireless hotspot, it issues a DHCP request, and the router assigns an IP address to the device, but the device is initially firewalled. When the client opens the browser and requests a website, the listener on the router returns the IP address of the captive portal instead of the requested site. In our solution, clients are seamlessly authenticated at the captive portal and redirected to the Service Portal, where they can discover available services. Since the captive portal mechanism transparently redirects the client to the Service Portal through DNS manipulation, the client only needs wireless connectivity and a web browser; no additional software or hardware beacons are needed. The process of capturing browser requests and redirecting to the Service Portal is illustrated in Fig. 1. After authentication, the system creates a user profile infusion service to transport multiple user profiles from the mobile device to the central server automatically, implicitly and securely.
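The capture-and-redirect decision described above can be sketched as a request handler; the portal address and function names are hypothetical, and a real captive portal implements this at the DNS/firewall level rather than in application code:

```python
# Sketch of the captive portal's decision logic: unauthenticated clients
# are redirected to the Service Portal regardless of the site requested.
SERVICE_PORTAL = "http://192.168.1.1/portal"  # hypothetical portal address

def handle_request(requested_url, client_authenticated):
    """Return an (HTTP status, location) pair as the capture step would."""
    if not client_authenticated:
        # 302 redirect to the Service Portal instead of the requested site
        return (302, SERVICE_PORTAL)
    # authenticated clients pass through to the site they asked for
    return (200, requested_url)

print(handle_request("http://example.com", client_authenticated=False))
```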
5.2 Thin Client
Aggregation of services and content is done on the server side at a Spontaneous Interaction Server at each location. The information is presented in webpage format to the client browser through HTTP. Existing Wi-Fi-enabled client devices can easily function in such smart environments, since they do not need special hardware or software to detect other devices or existing services.
Fig. 4. The core is a thin middleware to provide a means to add services as modules. Services installed can be added to the core directly through built-in aggregators, or through aggregator services as sub-services. In the latter case, the sub-services need not be visible to the core layer.
5.3 Aggregation of Services and Contents
The system contains a core with the functionality to install services as modules (see Fig. 4). Whole ranges of services can be added when aggregators based on existing protocols (such as UPnP, Jini, and X10) are added to the core. Some
services can also be aggregator services which collate sub-services; however, these sub-services may not be directly visible to the core. The Media Service, for example, is an aggregator service that aggregates media content from various media sources while itself being a service in the home smart space.
5.4 Semantic Matching Combining User Profiles and Situation Context
Using the aggregation of situation contexts and user profiles, a rating or scoring process can be performed to generate the vector description set. The system can then calculate the semantic similarities of all kinds of services existing in the smart environment and provide an appropriate number of personalized services to end users according to the capabilities of the mobile device, such as screen display size, supported media types, and so on.
5.5 Middleware Support
Building on the Semantic Space [8] and CAMPS [11], the middleware consists of several individual, collaborating modules, such as ContextWrapper, ContextAggregator, InferenceEngine, KnowledgeBase, ServiceProvision and QueryFilter, as shown in Fig. 5.
Fig. 5. Middleware support for service provision
5.6 Initial Prototype Implementation
We have implemented a prototype of the impromptu service discovery and access framework, as shown in Fig. 6(a), using the ProSyst OSGi service platform as the middleware core. All the assistive services are wrapped as OSGi bundles in the framework. In each logical smart space, we install the captive portal on a Linksys WRT54GL WLAN router. On entering the smart space, a mobile device (Samsung Q1 UMPC)
is detected via its WLAN card and connected automatically to the router. After the web browser is launched on the UMPC, it is automatically directed to the service portal showing the offered services. For aggregation of devices, we implemented a UPnP wrapper service built over Kono's CyberLink for Java UPnP library, providing web access capabilities. Links generated in the service portal point to the presentation URLs of the UPnP services. For the shopping scenario in a coat shop, a snapshot of assistive services, including a Coat finder and a Coat recommender, is shown in Fig. 6(b). By choosing the Coat finder service, the user can locate the coat collections on a certain shelf (Fig. 6(c)).
Fig. 6. Initial prototype implementation: (a) Impromptu service discovery and provision platform; (b) User interface for the Coat recommender; (c) An in-store service: the Coat finder
6 Conclusion and Future Work
This paper suggests combining user profiles and situation contexts to provide a more pervasive service experience in smart assistive environments with mobile devices. Leveraging a captive portal and service portal mechanism, we allow mobile clients with only a web browser and wireless connectivity to spontaneously discover and access services. With the semantic-similarity combination of user profiles and situation context, the scope of assistive services is reduced efficiently, personalized to the user's interaction intention and adapted to the capabilities of the user's mobile client. In future work, we will evaluate service provision based on the semantic similarity measure.
References

1. Lee, C., Helal, S., Lee, W.: Universal interactions with smart spaces. IEEE Pervasive Computing Magazine 5(1), 16–21 (2006)
2. Nakajima, T., Satoh, I.: A software infrastructure for supporting spontaneous and personalized integration in home computing environments. Personal and Ubiquitous Computing, 379–391 (2006)
3. Rukzio, E., Leichtenstern, K., Callaghan, V., Holleis, P., Schmidt, A., Chin, J.: An experimental comparison of physical mobile interaction techniques: Touching, pointing and scanning. In: Proc. UbiComp, pp. 87–104 (2006)
4. Universal Plug and Play (UPnP) Forum: UPnP Device Architecture, Version 1.0 (December 2003), http://www.upnp.org
5. Jini.org, http://www.jini.org
6. Yu, S., Al-Jadir, L., Spaccapietra, S.: Matching user's semantics with data semantics in location-based services. In: Workshop on Semantics in Mobile Environments (SME 2005) (2005)
7. Open Service Gateway Initiative (OSGi), http://www.osgi.org
8. Gu, T., Pung, H.K., Zhang, D.Q.: A service-oriented middleware for building context-aware services. Journal of Network and Computer Applications 28(1), 1–18 (2005)
9. Lin, Z.H., Fu, L.C.: Multi-user preference model and service provision in a smart home environment. In: Proc. CASE, pp. 759–764 (2007)
10. Strang, T., Linnhoff-Popien, C.: A context modeling survey. In: Proc. UbiComp Workshop on Advanced Context Modeling, Reasoning and Management (2004)
11. Qin, W.J., Suo, Y., Shi, Y.C.: CAMPS: A middleware for providing context-aware services for smart space. In: Proc. GPC, pp. 644–653 (2006)
12. DAML-Time, http://www.cs.rochester.edu/~ferguson/daml/
13. OWL-S: Semantic Markup for Web Services, http://www.daml.org/services/owl-s/1.1/
14. Carmagnola, F.: The five Ws in user model interoperability. In: Proc. IUI 2008 Workshop on Ubiquitous User Modeling (2008)
15. Salton, G., Wong, A., Yang, C.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)
Ubiquitous Phone System

Shan-Yi Tsai, Chiung-Ying Wang, and Ren-Hung Hwang
Dept. of Computer Science & Information Engineering, National Chung-Cheng University, 621, Chiayi, Taiwan
[email protected], {wjy, rhhwang}@cs.ccu.edu.tw
Abstract. Portable smart devices, such as Pocket PC Phones and Smartphones, have become common in our daily life. However, they still provide a technology-oriented human interface. In the future ubiquitous environment, context-awareness is the key to making the human interface of smart devices human-centric. In this paper, we design and develop a ubiquitous phone system, referred to as UbiPhone, which provides a context-aware, human-centric phone service based on rich context acquired from various sensing sources. The most unique feature of UbiPhone is that a user needs only one click; UbiPhone automatically chooses the right communication channel and device to connect to the callee. UbiPhone also provides intelligent feedback services when the callee is not available, such as when to call back according to the callee's calendar, whether to automatically dial back when the callee becomes available, and whom to contact if the call is an emergency and the callee is not reachable. In particular, a social network is adopted to reason about who would be the most appropriate person to help the caller find the callee immediately. A prototype of UbiPhone has been developed to show its feasibility.

Keywords: Ubiquitous computing, smart world, context-awareness, social network, intelligent phone system.
1 Introduction

With the rapid advancement of technology, the revolutionary view of our future life articulated by Mark Weiser [1] is bit by bit coming true. Most digital equipment, such as the smart phone, is equipped with computing and communication capabilities in this age of information revolution. Through these advanced technologies, a ubiquitous environment full of human-centered services can be fulfilled soon. Context-awareness is the key feature of human-centered services, as it enables services to adapt to human needs. By embedding context-aware intelligence into the devices of our daily life, many tasks could become simpler and more efficient. For example, the personal mobile phone has become one of the necessary devices in our daily life, as many of us use it for communication. Therefore, making the mobile phone context aware could be a first step toward the ubiquitous smart world. Much attention has been paid to integrating context and phone systems in recent years, e.g., ContextPhone [2], the iCAM system [3], the Live Contacts project [4], Enhanced

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 201–215, 2008. © Springer-Verlag Berlin Heidelberg 2008
S.-Y. Tsai, C.-Y. Wang, and R.-H. Hwang
Telephony [5], Live Addressbook [6], Context Phonebook [7], and Awarenex [8]. ContextPhone [2] is a context-aware smart phone project. Its components include ContextLogger for recording the user's location movement, ContextContacts for exchanging context with other users, and ContextMedia for sharing users' media. The iCAM system [3] displays location information of the callee on the contact list when the caller is making the call. According to the callee's location information, iCAM also provides the caller an ordered list of the callee's phone numbers, such as house phone, office phone and cellular phone. Similarly, the Live Contacts project [4] integrates various contexts to provide a list of communication means, such as office phone, cellular phone, SMS, IM and e-mail, from which the caller can choose his favorite way to communicate with the callee. In the aforementioned projects, although various contexts have been used to help the caller make a phone call, most of them are not able to place an automatic call to the callee through the most appropriate communication tool. Furthermore, social context, an important context for intelligent phone service, has not been adopted and analyzed. Social context has been applied to several applications in recent years. In a social network [9, 10], any two individuals (persons) in the network may be connected through a link set of intermediate acquaintances. In [11], social context was used to find the most appropriate participant to talk with during a conference; the social relationships were discovered by analyzing the contents of research papers. In [12], Alex Pentland proposed a perspective on social dynamics based on the signals and behavior of interactions among people. He concluded that measurements of social interactions can form powerful predictors of behavioral outcomes in several important social dynamics, such as getting a raise, getting a job or getting a date.
VibeFones [13, 14] gained a sophisticated understanding of people's social lives by acquiring such social signals and phone interactions. Observations showed that a user's social rank can be discovered by analyzing his ordinary social interactions. In this paper, we propose a ubiquitous phone service, referred to as UbiPhone. Our goal is to provide a human-centric, context-aware phone service which makes a phone call as easy as a single click on the callee in the contact list. UbiPhone automatically connects to the callee using the most appropriate phone system based on current context information, such as the caller's and callee's locations, presence status, network status, available phone systems, calendars, social relationship, etc. When the callee is not reachable, UbiPhone prompts the caller with when the callee will become reachable according to the callee's calendar. If it is an emergency call, UbiPhone provides the caller with the person most likely to be near the callee, or to know how to reach the callee immediately, based on the social network model we developed. Moreover, we integrate IM and SMS (Short Message Service) with UbiPhone to provide a short message service when the callee is unable to receive voice-based communication. Obviously, context management is very important to the implementation of UbiPhone. In UbiPhone, a framework for context management, including context acquisition, context representation, context retrieval, and context inference, is proposed based on state-of-the-art technologies. It adopts ontology as the underlying context model. RDF (Resource Description Framework) [15, 16] and OWL (Web Ontology Language) [17] are adopted to implement the ontology model. SPARQL [18] is adopted to query and reason over context from the ontology.
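The one-click channel selection described above can be sketched as a simple rule cascade; the rule order and the context field names are our own illustrative assumptions, not UbiPhone's actual policy:

```python
# Sketch: pick an appropriate communication channel for a callee from
# current context, in the spirit of UbiPhone's one-click call.
def choose_channel(callee_ctx):
    if callee_ctx.get("presence") == "in_meeting":
        return "SMS"        # avoid a voice call during a meeting
    if callee_ctx.get("voip_online"):
        return "VoIP"       # prefer IP telephony when the callee is online
    if callee_ctx.get("gsm_reachable"):
        return "GSM"
    return "IM"             # fall back to instant messaging

print(choose_channel({"presence": "available", "voip_online": True}))
```

A full implementation would also weigh the caller's context, calendars and the social network model when no channel reaches the callee.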
A prototype of UbiPhone has been implemented to demonstrate its feasibility. PDA phones are used to carry out the human-centric phone service. In this paper, we also demonstrate that the use of the social network model is helpful in finding a callee who is currently unreachable via phone. The rest of this paper is organized as follows. Section 2 presents the architecture and design of UbiPhone. Implementation details of a prototype system are given in Section 3. Performance evaluation results are shown in Section 4. Finally, Section 5 concludes the paper.
2 Design of UbiPhone

In this section, we present the architecture of UbiPhone in Section 2.1. The context and social models adopted in UbiPhone are described in Section 2.2. The intelligent services provided by UbiPhone to achieve human-centric services are described in Section 2.3.

2.1 Architecture of UbiPhone

The architecture of UbiPhone is shown in Figure 1. The UbiPhone servers consist of a service providing server and a context management server. The service providing server provides the intelligent human-centric services with the help of an inference engine. The context management server mainly acts as a semantic database server to manage raw data collected from the environment. UbiPhone clients include users' smart devices, sensors and other network equipment. Between servers and clients, we designed a middleware as an interface to transmit all messages and signals. Web Services is adopted for the implementation of the middleware. More specifically, both contexts acquired from the client's device and messages to be sent to the client from a server are encapsulated into XML-based SOAP messages. The Java Web Service (JWS) framework is adopted to generate WSDL description files, which are deployed on the server using the Apache Tomcat Servlet Container.
Fig. 1. Architecture of UbiPhone
In Figure 1, the agents provide functions for service enquiry, context retrieval and statistics acquisition. When an agent receives certain events from sensing devices, it provides the appropriate service. The call agent and messaging agent help connect users to the right communication channel at the time. The status agent can automatically change a user's status according to the user's location, calendar and network status. The schedule agent, location agent and network status agent transmit the user's calendar, current location and available network systems to the context management server. The statistics agent derives statistics from a user's phone call records and location logs and uploads the collected statistics to the context management server. Figure 2 shows the flowchart of how context is acquired from various devices or updated to user devices. For example, the networking status (context) of the user device, such as whether it is connected to the GSM network, is acquired by the network agent and sent to the context management server whenever there is a change. Figure 3 shows another example of how presence status is maintained.
Fig. 2. Flow chart of context acquisition and update
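Figure 2 shows context updates being written to the context management server via SPARQL. A hypothetical update for a user's location context might look as follows; the graph prefix and predicate names are invented, since the paper does not spell out its ontology.

```python
def sparql_location_update(user_uri: str, room: str) -> str:
    """Build a SPARQL 1.1 Update that replaces a user's location triple.
    The `ubi:` vocabulary is hypothetical."""
    return (
        "PREFIX ubi: <http://example.org/ubiphone#>\n"
        f"DELETE {{ <{user_uri}> ubi:locatedIn ?old }}\n"
        f"INSERT {{ <{user_uri}> ubi:locatedIn \"{room}\" }}\n"
        f"WHERE  {{ OPTIONAL {{ <{user_uri}> ubi:locatedIn ?old }} }}"
    )

query = sparql_location_update("http://example.org/users/masa", "EA315")
```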
2.2 Models of UbiPhone

We develop a context digest model and a social network model in UbiPhone to provide human-centric service.

2.2.1 Context Model

The context digest model adopts ontology as the underlying technique to manage context. It acquires context from various agents and smart devices. These
Ubiquitous Phone System
Fig. 3. Flow chart of context presence status maintenance
context data can be classified into three types: static, dynamic and statistic. Static context, such as a user's profile and preferences or a device's profile, seldom changes. Dynamic context, such as a user's presence status and location, is updated to the context management server frequently and pushed to smart devices immediately. Statistic context, such as phone call logs, location logs and event logs, is derived from statistics over historic logs.

2.2.2 Social Network Model

In UbiPhone, a social network is modeled based on the user's phone call records. The social network consists of a set of relations between pairs of individuals. The weight ω of the social relation between any two users is calculated by Equation (1):

\omega_{u,b_i} = \frac{R_{u,b_i} + R_{b_i,u}}{\max_{\forall x}\{R_{u,x} + R_{x,u}\}}    (1)
where R_{a,b} is the number of phone calls placed from a to b. The social network model is used by the inference module of the service providing server to provide personalized service. Moreover, in order to implement the emergent-contact service, we define P_relation(u,b,t), the possibility of reaching u through b at time t, based on the social relation, as shown in Equation (2).
P_{relation}(u,b,t) = (1 - \beta_\omega) \cdot P_u(b \mid t) + \beta_\omega \cdot \omega_{u,b}    (2)
where β_ω is the weight of the social network model in calculating this possibility, and P_u(b|t) is u's preference for being reached through contact buddy b at time t. P_u(b|t) is pre-configured by u. β_ω should be set to 1 if the user has not configured his emergent-contact preference P_u(b|t).

2.3 Intelligent Services of UbiPhone

Among the many intelligent services of UbiPhone, the three most interesting are the ubiquitous call, AnyCall and emergent-contact services. They are described in detail in the following. All of these services are implemented on the rule-based reasoning engine of the service providing server.

2.3.1 Ubiquitous Call Service

The ubiquitous call service aims to provide a novel human-centric phone call service. When a user wants to call a buddy from his contact list, he simply clicks on the buddy's name. The call agent on the user's PDA phone communicates with the service providing server and provides the most appropriate service to the user: a VoIP call, a GSM call, or, if the callee is not available to answer a phone call, a suggestion to send a short message. Figure 4 shows the flowchart of the ubiquitous call. When the user initiates the ubiquitous phone call service, the call agent sends the request to the service providing server. The service providing server first checks the presence status of the callee. If the callee is not busy and can be reached via voice communication tools, the service providing server then checks whether VoIP is available to both caller and callee. If so, it instructs the call agent to establish a VoIP call. Otherwise, the callee's phone number is retrieved based on his location information, e.g., the office phone number if he is in the office or the GSM phone number if he is outdoors, and the service providing server instructs the call agent to make a GSM call to the selected number.
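The routing decision of the ubiquitous call service can be sketched as follows. The dictionary keys and return values are hypothetical, not UbiPhone's actual data model.

```python
def route_call(caller, callee):
    """Sketch of the ubiquitous-call decision: text channel when busy,
    VoIP when both ends support it, otherwise a location-dependent GSM call.
    `caller`/`callee` are dicts of context values with illustrative keys."""
    if callee["status"] == "busy":
        # Callee cannot take a voice call: fall back to a text channel and
        # report the callee's expected available time to the caller.
        return ("message", callee.get("available_at"))
    if caller["voip"] and callee["voip"]:
        return ("voip_call", callee["voip_id"])
    # Otherwise pick a phone number from the callee's location context,
    # e.g. the office phone when in the office, the GSM number otherwise.
    number = callee["office_phone"] if callee["location"] == "office" else callee["gsm_phone"]
    return ("gsm_call", number)

decision = route_call(
    {"voip": True},
    {"status": "free", "voip": True, "voip_id": "sip:septhiorth@example"},
)
```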
In case the callee is busy, e.g., he is in a meeting or a class, an appropriate text communication tool is suggested to the user through the message agent. In the latter case, the call agent also shows the caller the callee's expected available time and presence status.

2.3.2 AnyCall Service

AnyCall is similar to the anycast service in IP networks; it allows the user to call the most appropriate buddy within a group. Consider an emergent scenario where the user needs to call anyone in his family group to report an accident. In such a situation, any buddy from the family group will be able to help the caller. Thus, as a human-centric service, AnyCall provides an interface in which a single click on the group name connects the phone call to a group member who is available to answer at that time. Figure 5 shows the flowchart of AnyCall. In this case, presence status is used to form a list of candidate callees whose status is currently free. The location context is then
Fig. 4. Flowchart of Ubiquitous Call
Fig. 5. Flowchart of AnyCall
used to decide which buddy is geographically nearest. If more than one buddy is at the same shortest distance, UbiPhone selects the buddy that is cheapest for the caller to call. (In Taiwan, phone rates differ because there is more than one cellular phone provider.) Finally, after the callee is determined, a procedure similar to the ubiquitous call is used to find the most appropriate communication channel (or phone number) to reach the callee.

2.3.3 Emergent-Contact Service

The emergent-contact service helps the caller contact a third party from the callee's contact list, or a phone at a particular location, when the callee is not reachable. Currently, social, calendar and location information are used to infer the most appropriate third party in UbiPhone. Specifically, the function P_relation(u,b,t), defined in Equation (2), gives the possibility of reaching the callee based on social context. In Equation (3) we define another function, P_location(u,b,t), which gives the possibility of reaching the callee based on historic location context. Specifically, the presence locations of u and b over the last W weeks are recorded in the context management server. The possibility that u and b are at the same location (defined as the same building) is given by the number of times they were in the same building at time t (the same day of the week and the same time of day) during the last W weeks, divided by the number of weeks W:
P_{location}(u,b,t) = \frac{\lvert\, location_u(t) = location_b(t) \,\rvert}{W}    (3)
The possibility of reaching u through b at time t, considering both social and location context, is then defined by the following equation:
P_{contact}(u,b,t) = \alpha_{location} \cdot P_{location}(u,b,t) + \alpha_{relation} \cdot P_{relation}(u,b,t)    (4)

where \alpha_{location} + \alpha_{relation} = 1.
Note that Equations (1)–(4) are defined as possibilities, not probabilities: they are indicators of possibility and do not necessarily follow the laws or semantics of probability.
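Equations (1)–(4) can be sketched directly in code. The default weights mirror those used later in the evaluation (α_location = 0.65, α_relation = 0.35, β_ω = 0.3); the call-log representation is an assumption.

```python
from collections import Counter

def social_weight(calls, u, b):
    """Equation (1): symmetric call volume between u and b, normalized by
    u's strongest relation. `calls` is a list of (caller, callee) records."""
    count = Counter(calls)
    others = {p for rec in calls for p in rec if p != u}
    peak = max(count[(u, x)] + count[(x, u)] for x in others)
    return (count[(u, b)] + count[(b, u)]) / peak

def p_relation(pref, omega, beta=0.3):
    """Equation (2): pref = Pu(b|t); beta should be 1 when the user has not
    configured an emergent-contact preference."""
    return (1 - beta) * pref + beta * omega

def p_location(same_slot_count, weeks):
    """Equation (3): fraction of the last W weeks in which u and b shared a
    building in this time slot."""
    return same_slot_count / weeks

def p_contact(p_loc, p_rel, a_location=0.65, a_relation=0.35):
    """Equation (4): the two weights must sum to 1."""
    assert abs(a_location + a_relation - 1.0) < 1e-9
    return a_location * p_loc + a_relation * p_rel
```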
Fig. 6. Flowchart of Emergent-contact Service
Figure 6 shows the flowchart of the emergent-contact service. After the call agent sends the request to the service providing server, the server first retrieves from the context management server the list of u's buddies who are currently free. It then retrieves u's schedule (calendar) for this time. If a schedule entry exists and u's location can be retrieved from the schedule context, the service finds a buddy at that location, or the phone number of that location, to call. Otherwise, u's current location is predicted from his past location logs. Equations (1)–(4) are then used to compute the possibility for each buddy in the contact list, and the buddy with the highest possibility is suggested.
3 Prototyping and Implementation

In this section, we demonstrate the feasibility of UbiPhone by prototyping a real system. Our demonstration scenario involves four users with Dopod PDA phones equipped with WiFi, GSM, GPS and RFID. Each user joined one or more research groups. An example of the users' contexts is shown in Table 1. Users configured their contact lists at will, as shown in Figure 7.

Table 1. Users' context

Name       Status  Location  Group
Hushpuppy  busy    library   Web 2.0 group
Lunsrot    free    EA315     Web 2.0 group
Zqq        busy    EA105     Ubiquitous Learning group
The scenario is demonstrated as follows. Lunsrot read an interesting paper and wished to discuss his novel idea with any buddy in the e-Learning project. He invoked the AnyCall service (implemented as the "I'm Feeling Lucky" function on the PDA phone) by clicking on "e-Learning 2.0" and selecting the "I'm Feeling Lucky" function. Since only Septhiorth was free at that time and he was on the Internet, UbiPhone connected Lunsrot to Septhiorth via a VoIP connection, as shown in Figure 8. After talking to Septhiorth, Lunsrot continued his survey of the Web 2.0 literature. Later on, he read about an interesting topic and decided to share it with Hushpuppy, so he invoked the ubiquitous call service to Hushpuppy. Since Hushpuppy was busy in the library, UbiPhone blocked the voice channel and suggested that Lunsrot use the messaging service, as shown in Figure 9. Now, consider an emergent scenario where Septhiorth needs to deliver an important message to zqq immediately. Assume that zqq is not reachable because he is busy and his mobile phone's battery has run out. Septhiorth therefore invokes the
Fig. 7. The Screen of Four Users’ Smart Devices
Fig. 8. The Demonstration of “AnyCall”
emergency-contact service to find out who can help him reach zqq. Based on the flowchart of Figure 6, the emergency-contact service suggests that Septhiorth call Lunsrot, as shown in Figure 10. Since Lunsrot and zqq are in the same building, Septhiorth is able to reach zqq through Lunsrot.
Fig. 9. The Demonstration of “Ubiquitous Call”
Fig. 10. The Demonstration of “Emergency-contact Service”
4 Performance Evaluations

In this section, we evaluate the performance of the emergent-contact service. Consider a scenario in which the user masa has been chosen as the callee. Table 2 shows masa's contact list, which includes each buddy's ID, name and group. Table 3 shows the default preference, P_u(g|t), of each group configured by masa. The experiment was carried out for five weeks. During the first four weeks, masa's location logs and phone call logs were collected and stored in the context management server. Statistics of the location logs were then derived for each hour based on a weekly cycle. For example, the location logs may show that masa was at (EA315, D4304, EA315, EA315) from 8:00 AM to 9:00 AM on Mondays during the first four weeks. The possibility that masa will be in room EA315 during that slot is then estimated to be 0.75 for the fifth week. On the other hand, masa's phone call logs were used to build a social network model in which a node corresponds to a buddy and the weight of a link between two nodes is computed by Equation (1). Based on this social network model and the statistics above, the experiment in the fifth week was conducted as follows. For each hour during the fifth week, we selected a random time and assumed masa turned off his PDA phone at that time. We then randomly selected a buddy to invoke the emergent-contact service and obtain the suggestion from the service providing server. The weighting parameters were set as follows: α_location = 0.65, α_relation = 0.35, β_ω = 0.3. We then evaluated whether the suggested emergent-contact buddy was near masa.
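The hourly location statistic described above (e.g., the 0.75 estimate for EA315) amounts to a frequency count over the same weekly slot; a minimal sketch:

```python
from collections import Counter

def location_possibility(slot_logs):
    """Estimate, per location, the possibility of being there in a given
    hourly slot from the same slot over the past weeks (cf. the EA315
    example above: 3 of 4 Mondays, 8-9 AM, give 0.75)."""
    counts = Counter(slot_logs)
    return {loc: n / len(slot_logs) for loc, n in counts.items()}

est = location_possibility(["EA315", "D4304", "EA315", "EA315"])
```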
Table 2. The Experimenters List

No.  Nickname    Relation (Group)
b1   masa        Myself
b2   yuwen       Graduate Mate
b3   puzi        Graduate Mate
b4   calais      Graduate Mate
b5   iamshyang   Graduate Mate
b6   bluejam     Graduate Mate/University Mate/Dorm Mate
b7   zqq         Graduate Mate/University Mate
b8   sigwa       University Mate
b9   des33       University Mate
b10  hopper      University Mate
b11  septhiorth  Younger Graduate Mate
b12  hushpuppy   Younger Graduate Mate
b13  Lunsrot     Younger Graduate Mate
b14  yanmin      Younger Graduate Mate
b15  Nick        Graduate Mate for Doctor's degree
b16  rhhwang     Tutorial Professor
b17  Mei         Family
b18  Brian       Family
b19  Dad         Family
b20  yuan        Friend
b21  hsg         Friend
b22  sue         Friend
Table 3. The Default Preference of Group Defined by Masa

Group Name                          Preference  Pu(g|t)
Myself                              0           0
Family                              1           1
Dorm Mate                           2           1/2
Friend                              3           1/3
University Mate                     4           1/4
Graduate Mate                       5           1/5
Younger Graduate Mate               6           1/6
Graduate Mate for Doctor's degree   7           1/7
Advisor                             8           1/8
Table 4 shows the statistics of the location logs for each hour on a weekly cycle. Table 5 shows masa's location logs for the fifth week. In addition, the social network derived from masa's phone call logs of the first four weeks is shown in Figure 11. Here, the service is considered successful if the suggested emergent-contact buddy is actually near masa (in the same building or outdoor location) and can get masa on the phone right away. Table 6 shows the results of our experiment, where “s” denotes success and “f” denotes failure. We can observe that the service succeeds most of the time; on average, the success rate is 0.7738.
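As a cross-check, the per-day success counts read from Table 6 (24 hourly trials per day, Mon–Sun) reproduce the reported average:

```python
# Successful trials per day out of 24 hourly trials each, Mon..Sun,
# as counted from the s/f entries of Table 6.
successes = [19, 23, 20, 14, 16, 18, 20]
trials = 24 * len(successes)
rate = sum(successes) / trials
print(round(rate, 4))  # 0.7738
```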
Table 4. The Statistics of Masa’s Location Log During the First Four Weeks
[The body of Table 4 is not reliably recoverable from the source. For each hour of a weekly cycle (Mon–Sun), it lists the set of locations at which masa was observed during the first four weeks, e.g., D4304, EA315, EA501, EA105, Meal, Squash Court (S.C.), C.Y., K.H., Driving and others.]
Table 5. The Statistics of Masa’s Location Log During the Fifth Week
[The body of Table 5 is not reliably recoverable from the source. For each hour of the fifth week (Mon–Sun), it lists masa's observed location, e.g., D4304, EA315, EA501, EA105, Squash Court, CCU campus, Driving Range, Great NightFair, Cheng Clinic, Orange Grange.]
Table 6. The Experimental Results
Time         Mon Tue Wed Thu Fri Sat Sun
00:00~01:00   f   s   s   f   s   f   s
01:00~02:00   f   s   s   s   s   s   f
02:00~03:00   f   s   s   s   s   s   f
03:00~04:00   f   s   s   s   s   s   s
04:00~05:00   f   s   s   s   s   s   s
05:00~06:00   s   s   s   s   s   s   s
06:00~07:00   s   s   s   s   s   s   s
07:00~08:00   s   s   s   s   s   s   s
08:00~09:00   s   s   s   s   s   s   s
09:00~10:00   s   f   f   f   s   s   s
10:00~11:00   s   s   f   f   f   f   s
11:00~12:00   s   s   f   s   f   f   s
12:00~13:00   s   s   s   f   f   f   s
13:00~14:00   s   s   s   f   f   f   s
14:00~15:00   s   s   s   f   f   s   s
15:00~16:00   s   s   s   f   s   s   s
16:00~17:00   s   s   s   f   s   s   s
17:00~18:00   s   s   s   f   s   s   s
18:00~19:00   s   s   s   f   s   f   f
19:00~20:00   s   s   s   s   f   s   f
20:00~21:00   s   s   s   s   f   s   s
21:00~22:00   s   s   s   s   f   s   s
22:00~23:00   s   s   s   s   s   s   s
23:00~24:00   s   s   f   s   s   s   s
Fig. 11. Social Network
5 Conclusion

In this paper, we have proposed a ubiquitous phone service, UbiPhone, which emphasizes human-centric, context-aware phone service. In particular, three human-centric services have been presented in this paper to demonstrate that a phone call to a buddy or a group can be made as simply as a single click on the contact list. We have also shown how social context and location context can be used to reach a buddy in an emergency using the emergent-contact service. UbiPhone also adopts a general platform for manipulating context. Specifically, an ontology is used for context modeling and Web Services are used for client-server communication. Software agents serve as the middleware to send and receive context between client and server, so that applications can provide context-aware services in a unified form. Our experimental results show that the emergent-contact service can suggest to the caller a buddy near the callee most of the time. The prototype of UbiPhone also demonstrates the feasibility of the proposed human-centric concepts. Several issues of the proposed UbiPhone require further study; for example, vertical handoff between heterogeneous networks is necessary to keep a phone call uninterrupted while the user moves around.
Acknowledgment

The research is supported by grants NSC 96-2219-E-194-005 and NSC 96-2219-E-194-006, National Science Council, ROC.
Ubiquitous Phone System
215
Utilizing RFIDs for Location Aware Computing

Benjamin Becker1, Manuel Huber2, and Gudrun Klinker2

1 EADS InnovationWorks, Germany, Advanced Industrial Design and Visualization, 81663 München, [email protected]
2 Technische Universität München, Germany, Institut für Informatik, Boltzmannstraße 3, 85748 Garching b. München, [huberma, klinker]@in.tum.de
Abstract. This paper focuses on the possibilities of utilizing infrastructural RFIDs for location aware computing. A key to automated computer aided work is location and situation awareness. This allows the system to operate independently from direct user input, minimizing the impact on the original process. For demonstration purposes a large area tracking system using existing RFID technology and infrastructure is developed and evaluated. As a first possible application we embed the system into an aircraft cabin. The localization procedure is based on the coordinates of the RFID transponders detected by the RFID reader, as well as the orientation information measured by a gyroscope. Position estimation is accomplished through an iterative algorithm, optimizing the position of the antenna’s readability field with respect to the currently visible tags. The average error is below half a meter. Possible use cases range from maintenance assistance to way finding in commercial shops or underground parking lots.
1 Introduction

In order to automatically adapt to the ever-changing contexts in which a user may act, situation and especially location awareness have emerged as key concepts in ubiquitous computing. A system that is aware of the user's location can react to current demands without explicit input and may assist in various novel ways. Unfortunately, location information is not always easily accessible. Often, tracking relies either on public infrastructure, which may not be present everywhere, especially indoors, or on specialized infrastructure, which usually has to be deployed at a high cost per area. Furthermore, the installation of specialized hardware is likely to be unfeasible in certain environments because it is usually obtrusive and distracting. The consequence of these constraints is the idea of using whatever technology is present as infrastructure for position estimation.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 216–228, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Computer Assisted Maintenance Worker
In an industrial scenario it is possible to introduce new technology in a controlled manner; such environments are therefore more accessible, as more diverse technologies are likely to be present. In particular RFID technology, which has already seen some attention in ubiquitous computing, is at the moment predominantly used in supply chain management and life-cycle management applications. RFID technology is therefore expected to become more and more ubiquitous in the near future and may thus become part of our common environments, possibly even enabling its use in domestic settings. We focus on the industrial setup of aircraft cabin maintenance (Figure 1), where we can assume RFID tags, already deployed in the cabin for quality assurance, as part of our infrastructure. We will show that using these tags it is possible to perform inside-out tracking and pose estimation. We demonstrate this on a first prototype of an automated maintenance support system. Location awareness herein reduces the need for direct user input, since the system is able to detect the current state of the worker. Computer-aided maintenance and automated worker support for maintenance tasks in general have already been studied in the past (see for example [1] or [2]) and their benefit has been established. This allows us to focus mainly on the characteristics of the tracking concept, including scalability, low-impact integration and precision.
2 Exposition
The following paragraphs describe the prototype's overall hardware setup as well as the position estimation algorithm currently used. This is still a prototype and will require further adaptation for integration into concrete future scenarios.

2.1 Technical Background
In contrast to existing infrastructural large-area outdoor trackers such as GPS and GSM, there is a general lack of scalable and robust large-area trackers that work indoors. This paper introduces the possibility of using RFID technology for localization and tracking in large, dense indoor areas. The prototype illustrates the concept on the example of an aircraft cabin testbed, where RFID tags have been deployed independently for the setup and evaluation of various use cases. As already stated, logistics, quality assurance and life-cycle management are major domains of RFID technology. To match these requirements, tags vary in complexity from simple ID transmitters to smart cards. Passively powered tags use the energy transmitted via the RF field for the communication response, whereas active tags depend on an additional power source. Currently there are two common physical communication layers [3]: near field (LF at 134.2 kHz; HF at 13.56 MHz and 27.125 MHz) and far field (UHF at 433 MHz, or 868 MHz (EU) and 915 MHz (US); MW at 2.45 GHz or 5.8 GHz), which differ especially in their operating distance, ranging up to a few centimeters in the first case and typically up to 6 m in the second. Actively powered tags reach further and often implement additional features such as complex encryption, but suffer from a short lifespan and additional maintenance issues due to their battery. Summing up, active tags allow more complex applications at a higher investment per tag. Usually they are used to identify only a few special objects for a short period of time, whereas passively powered tags are often used for long-term identification of structural parts. For our aircraft cabin scenario we focus on passively powered tags, which are being used for long-term quality assurance of structural parts. This supports our assumption that RFIDs are massively distributed as part of the infrastructure.
Recent use cases show the need for long-range readability beyond 1 m, from which our proposed tracking method profits much more than it would from short-range RFIDs, as these would severely limit the operating volume.

2.2 Related Work
Tracking methodologies are generally classified into outside-in and inside-out tracking. In outside-in tracking, a global set of observers determines the positions of all participating entities, whereas in inside-out tracking each participant determines its pose independently, using landmarks in the environment. Existing approaches to RFID-based tracking can be classified into the same categories. Inside-out RFID tracking uses distributed tags and a single mobile reader to estimate the position of the reader, whereas outside-in tracking uses a number of fixed readers to track a set of mobile transponders.
Outside-in systems generally suffer from the need to distribute numerous readers across the tracking area, which impedes scalability to large areas due to complex communication needs and high investments. Although outside-in tracking does not match the general constraints of the aircraft use cases, it uses the same technological basis, and our system should therefore be able to compete with it. Commercial applications exist [4] that use angle of arrival for tracking proprietary, active ultra-wide-band tags. Another system [5] demonstrates the possibility of enabling tracking by distance grouping of transponders through variation of the antenna's power. Inside-out tracking has already been used to update a robot's position estimate [6] by reading tags with known locations. Another promising approach [7] introduces the Super-Distributed RFID Tag Infrastructure. This scenario uses a small robot vehicle equipped with a very short-range RFID reader pointing downwards; the tags are distributed on the floor, and their IDs reference specific positions.

2.3 Prototype Scenario
The prototype demonstrates the possibility of 6DOF localization and tracking of a mobile device in an aircraft cabin using a long-range reader and a cloud of distributed passive tags, taking a gyroscope's current orientation estimate into account. The goal of this application is to allow automated maintenance support based on location awareness. Maintenance of aircraft is time critical, as every minute on the ground means lost revenue to the airline. Clearly, the more promising approach for our case is an inside-out tracking system, which has little or no impact on the environment because the required hardware and software are bundled on a single mobile device. The system's design around a single reader and inexpensive passive RFID tags allows us to cover a large area with little investment. Besides, the required infrastructure is often already present from other RFID applications, as listed above. Also, a mobile computing device can easily be extended with additional sensors such as a gyroscope or WLAN tracking, increasing the range, robustness and precision of the localization algorithm.

Hardware Setup. Since RFIDs are only just being introduced into the aircraft production and maintenance process, the tags in the prototype's cabin mock-up were added supplementarily. This paragraph describes how the tracking system was embedded into the aircraft cabin and gives additional detail about the hardware in use. It is useful to have a large and reliable reading range to increase robustness as well as the quality of the position estimate. The RFID tags that had already been deployed in the aircraft cabin testbed are passive ultra-high-frequency transponders, which match these requirements (Figure 2, Rafsec G2 ShortDipole at 868 MHz). The reader used was a Feig LRU2000 UHF Long Range Reader. Since neither this nor any other investigated reader made tag quality metrics (such as received signal strength indicators) available, each scan only resulted in
220
B. Becker, M. Huber, and G. Klinker
Fig. 2. RFID Tags Distributed in Aircraft Cabin
binary "detected" or "not-detected" information per tag. The miniature long-range reading device (Figure 3) is equipped with a directional antenna. The antenna's gain increases the reading range along a single axis, allowing this system to detect tags up to a distance of 6.0 meters. As illustrated in Figure 2 (right and lower left), the transponders in this scenario are distributed within the aircraft's cabin, especially attached to the hatrack, the seating and the floor panels. In the final scenario in a real aircraft cabin, usually every major structural part has its own identification tag, giving a far denser distribution. For measuring the current orientation, a gyroscope is directly connected to the antenna (Figure 4) with a precalibrated offset transformation. Basic Localization Concept. The localization procedure is based on two sensors: the RFID reader and the gyroscope. The reader's output is a list of tags read by the antenna within the last 200 milliseconds. Note that these measurements only indicate the presence of the relevant tags in the reading range of the antenna and do not contain further distance information. The developed RFID tracking system has three basic characteristics:
– A large number of RFID tags are attached to structural parts of the cabin. For each of them the exact location and ID are known and available to the system as part of the Digital MockUp (DMU) data.
Fig. 3. RFID Reader
Fig. 4. Xsens Gyroscope
Utilizing RFIDs for Location Aware Computing
221
– A gyroscope measures the current orientation. This is used internally to increase the accuracy of the position estimate and returned as part of the pose estimation.
– The reading range of the RFID antenna is taken into consideration. It has to be determined beforehand.
The localization is based on the coordinates of the currently detected transponders, which can be derived from the database. The antenna is estimated to be at a position from where it is able to read all currently detected tags best. This is accomplished through an iterative process which imposes requirements on how the detection range is modeled, as seen in the following section. In our case the position of each RFID tag and the corresponding ID was measured in a preprocessing step. In a future scenario this could be automated by having predefined IDs and positions of tags on certain constructions or building parts. This assumption is realistic, since RFID technology is in the process of being integrated into the aircraft manufacturing process. The following algorithm is designed to strongly reduce the computational cost of the pose estimation. This is required since mobile devices often lack computational power in favor of mobility and uptime. Antenna Readability Model. Before actually using the system, the detection range of the RFID reader's antenna has to be determined. Since details of the antenna's assembly are not available, it has to be estimated empirically through experiments. To characterize the readability field, a series of tests was run. Since the system uses a directional antenna with a rotationally symmetric RF field, the dimension of the testbed was reduced to a 2D plane through the central axis of the antenna. Numerous tags were attached to polystyrene cubes and placed on a grid to evaluate the reading range and probability, resulting in a discrete readability field.
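Such a discrete readability field, and a displacement field derived from it, can be represented as simple lookup grids. The sketch below is illustrative only: the Gaussian lobe is an invented stand-in for the measured probabilities, with its maximum placed at the paper's sweet spot of (0, 0.85 m); grid resolution and names are our own assumptions.

```python
import numpy as np

# Hypothetical discrete readability field: probability of detecting a tag
# at grid position (x, z) relative to the antenna (antenna at the origin,
# main lobe along +z). Values are invented for illustration.
xs = np.linspace(-2.0, 2.0, 41)   # lateral offset in meters
zs = np.linspace(0.0, 6.0, 61)    # distance along the antenna normal
X, Z = np.meshgrid(xs, zs, indexing="ij")
# toy model: a Gaussian lobe peaking at (0, 0.85 m)
readability = np.exp(-((X / 0.8) ** 2) - (((Z - 0.85) / 2.5) ** 2))

# Gradient of the readability, pointing toward relative positions with
# a higher detection probability (precomputed once, used as a lookup table).
gx, gz = np.gradient(readability, xs, zs)

def lookup(p):
    """Readability and gradient vector for a point p = (x, z) in the
    antenna's reference frame (nearest-neighbour lookup)."""
    i = np.abs(xs - p[0]).argmin()
    j = np.abs(zs - p[1]).argmin()
    return readability[i, j], np.array([gx[i, j], gz[i, j]])
```

A tag close to the sweet spot yields a readability near 1, and for a tag left of the antenna axis the gradient points back toward the axis.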
For each point p relative to the antenna, the position estimation algorithm requires a scaled displacement vector from the antenna model. This vector v(p) indicates in which direction the antenna needs to be moved to increase the chance of detecting a transponder at location p. A scalar d(p), defining the magnitude of the displacement vector for a tag at this position, is used to increase the influence of a wrongly positioned tag on the pose estimation. Both are derived from the probability of a tag being read. Since real-time computation of the readability for all tags is computationally expensive, a gradient field was derived by interpolation (Figure 5), which serves as a lookup table. The resulting gradient field is used to scale the displacement, which is shown in Figure 6. Both fields use coordinates in the antenna's reference frame. The antenna is placed at p = (0, 0) and the point with the maximum reading probability is located at pmax = (0, 0.85 m). Optimization of the Position Estimate. The reader scans the antenna field for visible tags during 100 ms slices. Initially the position estimation assumes
Fig. 5. Antenna Readability
Fig. 6. Displacement
that the antenna is in the centroid of the visible tag positions. At each step a simple optimization procedure is performed. First, the visible tags are rotated into the reference frame of the antenna by applying the inverse of the orientation measured by the gyroscope. This relative position is projected onto the 2D plane and used to access the readability as well as the displacement field. In the next step the median over all displacement vectors is computed and added to the current position, resulting in a new position estimate. The estimated quality of this position is calculated as the sum over the readability of all detected tags, taken from the readability field. This procedure iteratively optimizes the position estimate until the improvement falls below a lower bound. When finished, the position estimate with the highest quality rating is returned, together with its likelihood. In case no tags have been detected, the current position is kept untouched.
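The iterative procedure above can be sketched as follows. This is a simplified 2D variant under our own assumptions: the gyroscope rotation into the antenna frame is omitted, the antenna model is a toy Gaussian with the paper's sweet spot at (0, 0.85 m), and the step size and stopping threshold are invented; it is not the authors' implementation.

```python
import numpy as np

SWEET_SPOT = np.array([0.0, 0.85])   # best reading position (paper: pmax)

def lookup(rel, scale=0.25):
    """Toy antenna model (our assumption, not the measured field):
    readability falls off as a Gaussian around the sweet spot; the
    displacement says where to move the ANTENNA so that a tag at
    relative position `rel` is read more reliably."""
    d = rel - SWEET_SPOT
    readability = float(np.exp(-np.dot(d, d) / 2.0))
    return readability, scale * d

def estimate_position(tag_positions, max_iter=100, eps=1e-4):
    """Start at the centroid of the detected tags, then repeatedly shift
    the estimate by the median displacement until the update is negligible."""
    tags = np.asarray(tag_positions, dtype=float)
    if len(tags) == 0:
        return None, 0.0                       # no tags: keep old estimate
    pos = tags.mean(axis=0)                    # centroid initialization
    quality = 0.0
    for _ in range(max_iter):
        results = [lookup(t - pos) for t in tags]
        quality = sum(r for r, _ in results)   # sum of readabilities
        step = np.median(np.array([v for _, v in results]), axis=0)
        if np.linalg.norm(step) < eps:
            break
        pos = pos + step
    return pos, quality
```

For a single tag at (0, 2.0), the estimate converges to sweet-spot distance below the tag, i.e. approximately (0, 1.15).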
2.4 Prototype Analysis
This section focuses on the evaluation of the technical properties of the prototype and compares it with other existing setups. Error Analysis. The setup for the test is shown in Figure 7. For the estimation of the ground truth position an optical infrared tracker is used, allowing the reflective marker on the antenna to be localized with millimeter precision. The offset between the infrared reflection marker and the antenna was also calibrated beforehand. This assures that the tracking system used for validation and testing introduces a negligible error compared to the expected precision of the RFID tracking. We present here only a one-minute excerpt from our test runs, which exemplifies the typical performance of the system. The mobile device was moved in the cabin with varying orientations. The currently visible tags and the resulting position estimate of the RFID tracker, as well as the ground truth position,
Fig. 7. Antenna with IR-Marker and A.R.T. Tracking System
Fig. 8. Antenna Field (Median Readability) and Cabin Coordinates
were timestamped and logged. In a post-processing step the recorded data was examined to determine the precision and to analyze the current setup. To differentiate the error induced by the distribution of the tags from that induced by the antenna model, the coordinate system axes are distinguished with a trailing 'a' for antenna or 'c' for cabin. In Figure 8, the cabin has the xc axis (red) pointing to the side, the yc axis (green) to the front and the zc axis (blue) to the ceiling. The antenna's coordinate system has the xa axis pointing to the side, the ya axis along the antenna's normal and the upward axis za. The overall performance of our approach can be extracted from Figure 9. It shows the error distributed along all three axes of the cabin as well as the average error across the entire testing time. The output is the raw data taken from the position estimator, without any filtering or continuous position estimation for jitter reduction. The Euclidean distance between the RFID tracker's estimate and the ground truth ranges up to 1.5 meters, with an average of 0.48 meters. In the beginning of the test run, the antenna was moved while pointed towards the front of the cabin. This results in a small yc axis error, since a more homogeneous distribution of the RFID tags (Figure 8, right) allows the antenna model to fit perfectly with its long reading range along its ya axis. Therefore the effect is homogeneous in both reference systems. In the time range between 10 and 40 seconds the antenna was additionally pointed upwards and downwards while still moving through the cabin, increasing the error along the zc axis. The uncertainty along the other two axes as well as
Fig. 9. Error in the Test Run in the Cabin's Coordinate System (error distance in meters over 60 seconds; Euclidean sum and x, y, z components)
the overall error are also increased, since the antenna's reading field is mainly outside of the cabin's volume where no tags were placed. Within the final time interval beginning at 40 seconds, the antenna's position was kept still but was rotated mainly around its za axis and therefore often pointing along the side of the cabin (xc axis). This shifts the error between the yc and the xc axis. The distribution density of the tags towards the side of the cabin is also inappropriate. Not only the distribution and the homogeneity of the tags are important for a proper function of the algorithm; the number of tags used for the position estimation is also of interest. Looking at Figure 10, it is evident that the simple approach of using the centroid of the tag cloud is inferior to the optimization based on the antenna model. The overall median error is reduced from 0.71 to 0.48 meters. It is evident that the possible uncertainty introduced by the antenna's long reading range in direction of the ya axis could be compensated
Fig. 10. Error in the Antenna's Reference Frame (x, y, z) and Histogram (gray bars) with Respect to the Number of Detected Tags (left panel: initial centroid; right panel: optimized position estimation)
with the orientation based method. The errors on the xa and za axes could be improved only slightly. For the case of zero detected tags the error is high because no estimation can be performed without measurement information. In the case of a single tag the algorithm's estimate induces an even larger error. This indicates that tracking with a motion model based on preceding estimates could improve performance, since a good initial estimate seems superior to an estimate based on a single tag. Single-tag discoveries often occur when the reader points directly into an obstacle, reducing the range of the RF field. Although the error should be inversely proportional to the number of tags, this cannot be derived from our current test results. As the histogram in Figure 10 illustrates, only about 3% of the frames have more than 4 tags detected. Nevertheless the error appears to increase when many tags are detected. A possible reason is that the reader buffers recently read tags when a quick rotation is performed, resulting in more detected tags, but partially from an outdated orientation. Although neither the algorithm nor the implementation can be expected to be optimal, the results clarify four main error sources in the system setup:
– inhomogeneous distribution of tags causes the position algorithm to fail
– low density of RFID tags yields an inferior pose estimation
– obstacles changing the antenna's reading probability field increase the error
– accumulation of visible tags over a certain period of time introduces an error when quick rotations are performed
These characteristics need to be considered in future optimizations of the system. Error Evaluation and Comparison. Considering the simple algorithm and optimization model used, the achieved quality is encouraging. As the results clarify, the antenna model and the position estimation algorithm are not optimal. Nevertheless, it is obvious that an algorithm using a representation of the antenna model in combination with the orientation information of a gyroscope can significantly increase the quality of an RFID tracker. For a comparison with existing RFID based tracking systems, the outside-in based approaches are analyzed first. Although they have a totally different target application and differ considerably from our setup, it is important to see what results are possible when using RFID technology under ideal conditions. Disregarding the already mentioned disadvantages, their precision is only slightly better. The Ubisense tracker reaches 95% of measurements with an error below 30 cm [4]. The passive outside-in system LANDMARC [5] achieves a median error of about one meter. Both systems highly depend on the number of readers used in the tracking area. Inside-out based approaches, as used in the field of robotics, can benefit from the advantages of continuous tracking with control inputs from the robot's steering sensors. Besides this, the localization is reduced to a 2D position estimation.
The system presented in Super-Distributed RFID Tag Infrastructure [7] allows a high quality position estimation. When a tag is detected, the short reading range of the antenna provides a high-accuracy localization, giving the robot the chance to correct its internal estimate. This still suffers from a major drawback: when carried by a worker, and therefore without control input, continuous tracking fails since the short reading range means that tags are only seldom discovered.

2.5 Application and Further Use Cases
For our maintenance application the tracking is precise enough to embed location aware assistance software. It is possible to guide the worker to the correct frame of the aircraft and to prioritize the tasks according to his current position, so that maintenance tasks in the current working area are favored, thus minimizing time spent between tasks. User input, which would otherwise occupy the worker, can also be reduced. The use of a tracking system allows extending the existing workflow of a mechanic. Nowadays it is common to have a digital task list on a mobile device to verify that all important maintenance tasks have been performed correctly. With the additional location sensitivity it is possible to guide the worker towards the next task and automatically provide additional information regarding the current step. For further scenarios, it will be interesting to see whether RFID technology is feasible for standalone tracking, as a supporting tracker for the initialization of a more precise but local tracking method (e.g., optical model-based tracking) [8], and for large-area tracking. It is also promising to evaluate RFID technology as an additional source of information for ubiquitous computing or Augmented Reality applications regarding location awareness and object identification. Furthermore, RFID based infrastructure can serve as an additional tracking modality in ubiquitous tracking setups [9], extending them to otherwise inaccessible environments. As stated before, RFID technology is expected to become more common and enter areas of our daily life. As a consequence, a low-cost tracking infrastructure becomes ubiquitously available, which can lead to numerous applications. One such envisioned possibility is assisted wayfinding and navigation in supermarkets. This scenario is further supported by the fact that many consumer products are already starting to be tagged with RFIDs, as some preliminary field studies show.
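The location-aware task prioritization described above could, at its simplest, amount to sorting the open task list by distance to the current position estimate. The sketch below is an invented illustration: task names, coordinates, and the distance-only policy are our own assumptions, not the paper's assistance software.

```python
from dataclasses import dataclass

@dataclass
class MaintenanceTask:
    name: str
    position: tuple  # (x, y, z) cabin coordinates of the work location

def prioritize(tasks, current_pos):
    """Order open tasks so that those closest to the worker's current
    position estimate come first (illustrative policy)."""
    def dist2(task):
        return sum((a - b) ** 2 for a, b in zip(task.position, current_pos))
    return sorted(tasks, key=dist2)

# hypothetical task list and a position estimate from the RFID tracker
tasks = [MaintenanceTask("check hatrack latch", (4.0, 1.0, 1.8)),
         MaintenanceTask("inspect floor panel", (1.0, 0.5, 0.0)),
         MaintenanceTask("verify seat belt", (1.2, 0.4, 0.5))]
ordered = prioritize(tasks, current_pos=(1.0, 0.5, 0.2))
```

With the half-meter average error reported above, a per-seat-row granularity like this is within reach of the tracker's precision.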
Another similar scenario could be the installation of tags in underground parking lots to facilitate customer routing and orientation between parking decks and cars.
3 Conclusion and Outlook
Our system extends the existing approaches to a complete 6DOF tracker by combining the 3DOF orientation of the device with the 3DOF position optimized considering the antenna’s alignment.
The precision of the position estimate is of similar quality to that of the existing commercial outside-in tracking systems. Contrary to those approaches, the tracking system presented here allows simple and cost-effective scaling to tracking within large areas. A major difference to the existing inside-out tracking systems is that it does not require the control input of a robot to function, allowing it to be applied to tracking mobile devices. This prototype confirms the idea of using RFID technology as a basis for location aware computing. The precision is acceptable for location based applications but leaves room for improvement. Our work shows that there is high potential in using RFID technology for tracking purposes. For some scenarios it might not be feasible to spend additional money on a gyroscope to reduce the error; instead, a less directional antenna could be used, taking the centroid of the detected tags as the pose estimate. Further Improvements. One possible way to improve the application is to examine the readability of tags in more detail. Not only the antenna's characteristic defines the probability of a tag being detected, but also the orientation of the tag's dipole in the RF field. This leads to the idea of not only storing the position of the distributed RFID tags in a database, but extending it to a full 6DOF pose. Also, different reader hardware setups will be investigated. Other readers allow dynamic reconfiguration of the radio parameters and do not exhibit the buffering problem observed above. Different antennas with more pronounced directional radiation patterns may also influence the performance of the system. Another influence on the readability is the close environment of the tag: the material it is attached to, as well as any material preventing it from being read by the reader in case the tag is placed inside an object. This should also be taken into account when trying to estimate the reader's position.
Currently ignored, yet of major importance to the position estimation, is the impact of undetected tags on the position estimate. An estimate not only becomes more likely if the detected tags map well to the antenna model, but must also decline in likelihood if many tags within the readability field remain undetected. Our next extension will be a Kalman Filter based model that takes a tag list as measurement input and compares it to the readability derived from the antenna model. This allows the algorithm to also exploit undetected tags and to reach an even better estimate. Additionally, the Kalman Filter will use an internal motion model for continuous tracking.
Acknowledgments. This paper benefits from the work done by Marcus Fey in his diploma thesis. Our research was supported by the BFS (Bayerische Forschungsstiftung) project trackframe.
References
1. Ong, S.K., Nee, A.Y.C. (eds.): Virtual and Augmented Reality Applications in Manufacturing. Springer, Heidelberg (2004)
2. Speckmann, H., Ley, M.: Multi-media NDT procedures and online-maintenance assistance in the frame of the European R&D project INDeT (integration of NDT). In: ECNDT 2006, 9th European Conference on Non-Destructive Testing, Berlin, Germany, Deutsche Gesellschaft für Zerstörungsfreie Prüfung e.V. (2006)
3. Finkenzeller, K.: RFID-Handbuch: Grundlagen und praktische Anwendungen induktiver Funkanlagen, Transponder und kontaktloser Chipkarten, 3rd edn. Hanser (2002)
4. Steggles, P.: Ubisense Hardware Datasheet (2006)
5. Ni, L.M., Liu, Y., Lau, Y.C., Patil, A.P.: LANDMARC: Indoor location sensing using active RFID. In: First IEEE International Conference on Pervasive Computing and Communications, vol. 1, pp. 407–415 (2003)
6. Hähnel, D., Burgard, W., Fox, D., Fishkin, K., Philipose, M.: Mapping and localization with RFID technology. In: Proc. of the IEEE International Conference on Robotics and Automation (ICRA) (2004)
7. Bohn, J., Mattern, F.: Super-Distributed RFID Tag Infrastructures. In: Markopoulos, P., Eggen, B., Aarts, E., Crowley, J.L. (eds.) EUSAI 2004. LNCS, vol. 3295, pp. 1–12. Springer, Heidelberg (2004)
8. Najafi, H., Navab, N., Klinker, G.: Automated initialization for marker-less tracking: A sensor fusion approach. In: Proc. of International Symposium on Mixed and Augmented Reality, Arlington, VA, USA (November 2004)
9. Newman, J., Wagner, M., Bauer, M., MacWilliams, A., Pintaric, T., Beyer, D., Pustka, D., Strasser, F., Schmalstieg, D., Klinker, G.: Ubiquitous tracking for augmented reality. In: Proc. IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2004), Arlington, VA, USA (November 2004)
A Component-Based Ambient Agent Model for Assessment of Driving Behaviour Tibor Bosse, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands {tbosse, mhoogen, mcaklein, treur}@cs.vu.nl http://www.cs.vu.nl/~{tbosse, mhoogen, mcaklein, treur}
Abstract. This paper presents an ambient agent-based model that addresses the assessment of driving behaviour. In case of negative assessment, cruise control slows down and stops the car. The agent model has been formally specified in a component-based manner in the form of an executable specification that can be used for prototyping. A number of simulation experiments have been conducted. Moreover, dynamic properties of components at different aggregation levels and interlevel relations between them have been specified and verified.
1 Introduction Recent developments within Ambient Intelligence provide new technological possibilities to contribute to personal care for safety, health, performance, and wellbeing; cf. [1], [2], [9]. Applications make use of possibilities to acquire sensor information about humans and their functioning, and knowledge for analysis of such information. Based on this, ambient devices can (re)act by undertaking actions in a knowledgeable manner that improve the human's safety, health, performance, and wellbeing. The focus of this paper is on driving behaviour. Circumstances may occur in which a person's internal state is affected in such a way that driving is no longer safe. For example, when a person has taken drugs, either prescribed by a medical professional or on their own initiative, the driving behaviour may be impaired. For the case of alcohol, specific tests are possible to estimate the alcohol level in the blood, but for many other drugs such tests are not available. Moreover, a bad driver state may have other causes, such as highly emotional events, or being sleepy. Therefore assessment of the driver's state by monitoring the driving behaviour itself and analysing the monitoring results is a more widely applicable option. A component-based ambient agent-based model is presented to assess a person's driving behaviour, and in case of a negative assessment to let cruise control slow down and stop the car. The approach was inspired by a system that is currently under development by Toyota. This ambient system, which will in the near future be incorporated as a safety support system in Toyota's prestigious Lexus line, uses sensors that can detect the driver's steering operations, and sensors that can detect the focus of the driver's gaze.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 229–243, 2008. © Springer-Verlag Berlin Heidelberg 2008
The ambient agent-based system model that is presented and analysed here includes four types of agents: sensoring agents, monitoring agents, a driver assessment agent, and a cruise control agent (see also Figure 1). Models for all of these types of agents have been designed as specialisations of a more general component-based Ambient Agent Model. Within the model of the driver assessment agent, a model of a driver's functioning is used, expressing that an impaired internal state leads to observable behaviour showing abnormal steering operation and unfocused gaze. The monitor agent model includes facilities to automatically analyse incoming streams of information by verifying them against temporal patterns that are to be monitored (for example, unstable steering operation over time). The design has been formally specified in the form of a component-based executable agent-based model that can be used for prototyping. A number of simulation experiments have been conducted. Moreover, dynamic properties of components at different aggregation levels and interlevel relations between them have been formally specified and verified against these traces. The paper is organised as follows. First, the modelling approach is introduced in Section 2. In Section 3 the global structure of the agent-based model is introduced, whereas Section 4 presents a generic ambient agent model. Specialisations of this generic agent model for the specific agents within the system are introduced in Section 5, and in Section 6 simulation results using the entire model are described. Section 7 shows the results of verification of properties against the simulation traces, and finally Section 8 is a discussion.
2 Modelling Approach This section briefly introduces the modelling approach used. To specify the model conceptually and formally, the agent-oriented perspective is a suitable choice. The modelling approach uses the Temporal Trace Language TTL for formal specification and verification of dynamic properties [3], [7]. This predicate logical language supports formal specification and analysis of dynamic properties, covering both qualitative and quantitative aspects. TTL is built on atoms referring to states, time points and traces. A state of a process for (state) ontology Ont is an assignment of truth values to the set of ground atoms in the ontology. The set of all possible states for ontology Ont is denoted by STATES(Ont). To describe sequences of states, a fixed time frame T is assumed which is linearly ordered. A trace γ over state ontology Ont and time frame T is a mapping γ : T → STATES(Ont), i.e., a sequence of states γt (t ∈ T) in STATES(Ont). The set of dynamic properties DYNPROP(Ont) is the set of temporal statements that can be formulated with respect to traces based on the state ontology Ont in the following manner. Given a trace γ over state ontology Ont, the state in γ at time point t is denoted by state(γ, t). These states can be related to state properties via the formally defined satisfaction relation |=, comparable to the Holds-predicate in the Situation Calculus [8]: state(γ, t) |= p denotes that state property p holds in trace γ at time t. Based on these statements, dynamic properties can be formulated in a sorted first-order
predicate logic, using quantifiers over time and traces and the usual first-order logical connectives such as ¬, ∧, ∨, ⇒, ∀, ∃. A special software environment has been developed for TTL, featuring a Property Editor for building TTL properties and a Checking Tool that enables formal verification of such properties against a set of traces. To specify simulation models and to execute these models, the language LEADSTO, an executable sublanguage of TTL, is used (cf. [4]). The basic building blocks of this language are causal relations of the format α →→e,f,g,h β, which means:

If state property α holds for a certain time interval with duration g, then after some delay (between e and f) state property β will hold for a certain time interval of length h,

where α and β are state properties of the form 'conjunction of literals' (where a literal is an atom or the negation of an atom), and e, f, g, h are non-negative real numbers.
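To illustrate the semantics, a minimal discrete-time executor for such rules might look as follows. The rule representation, unit time steps, firing at the minimum delay e, and refiring while the antecedent keeps holding are all our own simplifications of LEADSTO, not its actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    alpha: frozenset  # literals that must hold (antecedent)
    beta: frozenset   # literals made to hold (consequent)
    e: int            # minimum delay (we fire at e for determinism)
    f: int            # maximum delay (unused in this sketch)
    g: int            # antecedent must have held for g steps
    h: int            # consequent holds for h steps

def simulate(rules, initial, steps):
    """Run LEADSTO-style rules over discrete unit time steps.
    trace[t] is the set of atoms holding at time t."""
    trace = [set(initial)]
    pending = []  # scheduled consequents: (start_time, end_time, atoms)
    for t in range(1, steps + 1):
        # schedule consequents of rules whose antecedent held for g steps
        for rule in rules:
            recent = trace[max(0, t - rule.g):t]
            if len(recent) == rule.g and all(rule.alpha <= s for s in recent):
                pending.append((t + rule.e, t + rule.e + rule.h, rule.beta))
        # the state at t is the union of all currently active consequents
        state = set()
        for start, end, atoms in pending:
            if start <= t < end:
                state |= set(atoms)
        trace.append(state)
    return trace
```

For instance, with the rule a →→1,1,1,2 b and "a" holding initially, "b" holds during time steps 2 and 3 and then expires.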
3 Global Structure

For the global structure of the model, first a distinction is made between those components that are the subject of the system (e.g., a patient to be taken care of), and those that are ambient, supporting components. Moreover, from an agent-based perspective (see, e.g., [5], [6]), a distinction is made between active, agent components (human or artificial), and passive, world components (e.g., part of the physical world or a database). Agents may interact through communication. Interaction between an agent and a world component can be either observation or action performance; cf. [5]. An action is generated by an agent, and transfers to a world component to have its effect there. An observation result is generated by a world component and transferred to the agent. In Figure 1 an overview of the system is shown. Table 1 shows the structure in terms of different types of components and interactions.

Fig. 1. Ambient Driver Support System (driver assessment agent, cruise control agent, steering and gaze-focus monitoring agents, steering and gaze-focus sensoring agents, the car and environment, and the driver)

3.1 State Ontologies Used at the Global Level

For the information exchanged between components at the global level, ontologies have been specified. This has been done in a universal order-sorted predicate logic
format that easily can be translated into more specific ontology languages. Table 2 provides an overview of sorts and predicates used in interactions at the global level.

Table 1. Components and Interactions of the Ambient Driver Support System

subject components
  subject agent: human driver
  subject world component: car and environment
subject interactions
  observation and action by subject agent in subject world component: the driver observes the car and environment; the driver operates the car and directs his gaze
ambient components
  ambient agents: steering and gaze-focus sensoring agents; steering and gaze-focus monitoring agents; driver assessment agent; cruise control agent
ambient interactions
  communication between ambient agents: the steering sensoring agent communicates to the steering monitoring agent; the gaze-focus sensoring agent communicates gaze focus to the gaze-focus monitoring agent; the gaze-focus monitoring agent reports unfocused gaze to the driver assessment agent; the steering monitoring agent reports abnormal steering to the driver assessment agent; the driver assessment agent communicates the state of the driver to the cruise control agent
interactions between subject and ambient components
  observation and action by ambient agent in subject world component: the steering sensoring agent observes the steering wheel; the gaze-focus sensoring agent observes the driver's gaze; the cruise control agent slows down or stops the engine
Table 2. Ontology for Interaction at the Global Level SORT
Description an action an agent an information element, possibly complex (e.g., a conjunction of other info elements) a world component Predicate Description performing_in(A:ACTION, W:WORLD) action A is performed in W observation_result_from(I:INFO_EL, W:WORLD) observation result from W is I communication_from_to(I:INFO_EL, X:AGENT, information I is communicated by X to Y ACTION AGENT INFO_EL WORLD
Y:AGENT) communicated_from_to(I:INFO_EL,X:AGENT,Y:AGENT)
information I was communicated by X to Y
3.2 Temporal Relations at the Global Level

Interaction between global level components is defined by the following specifications. In such specifications, for state properties the prefix input, output or internal is used. This is an indexing of the language elements to indicate that it concerns specific variants of them, either present at the input, at the output, or internally within the agent.

Action Propagation from Agent to World Component
∀X:AGENT ∀W:WORLD ∀A:ACTION output(X)|performing_in(A, W) → input(W)|performing_in(A, W)

Observation Result Propagation from World to Agent
∀X:AGENT ∀W:WORLD ∀I:INFO_EL output(W)|observation_result_from(I, W) → input(X)|observed_result_from(I, W)
Communication Propagation Between Agents
∀X,Y:AGENT ∀I:INFO_EL output(X)|communication_from_to(I, X, Y) → input(Y)|communicated_from_to(I, X, Y)
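These three propagation rules amount to routing state atoms from one component's output buffer to the appropriate input buffers. A minimal sketch under our own assumptions (component class, tuple-based atoms); note that the observation-result rule quantifies over all agents X, so it is modeled here as a broadcast:

```python
class Component:
    """A component (agent or world) with output and input state buffers."""
    def __init__(self, name, is_agent):
        self.name = name
        self.is_agent = is_agent
        self.output = set()  # atoms produced this step
        self.input = set()   # atoms received this step

def propagate(components):
    """Apply the global-level temporal relations: route each output atom
    to the input buffer(s) of the addressed component(s)."""
    by_name = {c.name: c for c in components}
    agents = [c for c in components if c.is_agent]
    for c in components:
        for kind, *args in c.output:
            if kind == "performing_in":              # action A in world W
                action, world = args
                by_name[world].input.add(("performing_in", action, world))
            elif kind == "observation_result_from":  # broadcast to all agents X
                info, world = args
                for a in agents:
                    a.input.add(("observed_result_from", info, world))
            elif kind == "communication_from_to":    # from agent X to agent Y
                info, x, y = args
                by_name[y].input.add(("communicated_from_to", info, x, y))
```

For example, an assessment communicated by the driver assessment agent arrives at the cruise control agent's input as a communicated_from_to atom.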
4 Component-Based Ambient Agent Model

This section focuses on the Ambient Agent Model (AAM) used for the four types of ambient agents in the system. These agents are assumed to maintain knowledge about certain aspects of human functioning, and information about the current state and history of the world and other agents. Based on this knowledge they have some understanding of the human processes and can behave accordingly. In Section 5 it is shown how the Ambient Agent Model AAM has been specialised to obtain models for monitor agents, a driver assessment agent, and a cruise control agent.

4.1 Components within the Ambient Agent Model

Based on the component-based Generic Agent Model (GAM) presented in [5], a model for ambient agents (AAM) was designed. Within AAM, as in GAM, the component World Interaction Management takes care of interaction with the world, and the component Agent Interaction Management takes care of communication with other agents. Moreover, the component Maintenance of World Information maintains information about the world, and the component Maintenance of Agent Information maintains information about other agents. In the component Agent Specific Task, specific tasks can be modelled. Adopting this component-based agent model GAM, the ambient agent model has been obtained as a refinement in the following manner. The component Maintenance of Agent Information has three subcomponents in AAM. The subcomponent Maintenance of a Dynamic Agent Model maintains the causal and temporal relationships for the subject agent’s functioning. The subcomponent Maintenance of an Agent State Model maintains a snapshot of the (current) state of the agent; as an example, this may model the gaze focussing state. The subcomponent Maintenance of an Agent History Model maintains the history of the state of the agent; this may, for instance, model gaze patterns over time.
Similarly, the component Maintenance of World Information has three subcomponents for a dynamic world model, a world state model, and a world history model, respectively. Moreover, the component Agent Specific Task has the following three subcomponents: Simulation Execution extends the information in the agent state model based on the internally represented dynamic agent model for the subject agent’s functioning, Process Analysis assesses the current state of the agent, and Plan Determination determines whether actions have to be undertaken and, if so, which ones (e.g., to determine that the cruise control agent has to be informed). Finally, as in the model GAM, the components World Interaction Management and Agent Interaction Management prepare (based on internally generated information) and receive (and internally forward) interaction with the world and other agents.
234
T. Bosse et al.
4.2 State Ontologies within Agent and World

To express the information involved in the agent’s internal processes, the ontology shown in Table 3 was specified.

Table 3. Ontology used within the Ambient Agent Model

Ontology element                                 Description
belief(I:INFO_EL)                                information I is believed
world_fact(I:INFO_EL)                            I is a fact that is true in the world
has_effect(A:ACTION, I:INFO_EL)                  action A has effect I
leads_to_after(I:INFO_EL, J:INFO_EL, D:REAL)     state property I leads to state property J after duration D
at(I:INFO_EL, T:TIME)                            state property I holds at time T
As an example, belief(leads_to_after(I:INFO_EL, J:INFO_EL, D:REAL)) is an expression based on this ontology which represents that the agent has the knowledge that state property I leads to state property J with a certain time delay specified by D.

4.3 Generic Temporal Relations within AAM

The temporal relations for the functionality within the Ambient Agent Model are:

Belief Generation based on Observation and Communication
∀X:AGENT, I:INFO_EL, W:WORLD
input(X)|observed_from(I, W) ∧ internal(X)|belief(is_reliable_for(W, I))
→→ internal(X)|belief(I)

∀X,Y:AGENT, I:INFO_EL
input(X)|communicated_from_to(I,Y,X) ∧ internal(X)|belief(is_reliable_for(Y, I))
→→ internal(X)|belief(I)
Here, the first rule is a generic rule for the component World Interaction Management, and the second for the component Agent Interaction Management. When the sources are assumed to be always reliable, the conditions on reliability can be left out.

Belief Generation based on Simulation
∀X:AGENT ∀I,J:INFO_EL ∀D:REAL ∀T:TIME
internal(X)|belief(at(I, T)) ∧ internal(X)|belief(leads_to_after(I, J, D))
→→ internal(X)|belief(at(J, T+D))
The last generic rule within the agent’s component Simulation Execution specifies how a dynamic model that is explicitly represented as part of the agent’s knowledge (within its component Maintenance of Dynamic Models) can be used to perform simulation, thus extending the agent’s beliefs about the world state at different points in time. This can be considered an internally represented deductive causal reasoning method. Another option is a multiple effect abductive causal reasoning method: Belief Generation based on Multiple Effect Abduction
∀X:AGENT ∀I,J1,J2:INFO_EL ∀D:REAL ∀T:TIME
J1≠J2 ∧ internal(X)|belief(at(J1, T)) ∧ internal(X)|belief(leads_to_after(I, J1, D))
∧ internal(X)|belief(at(J2, T)) ∧ internal(X)|belief(leads_to_after(I, J2, D))
→→ internal(X)|belief(at(I, T-D))
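To make the deductive and abductive rules concrete, here is a small Python sketch. The tuple encoding of beliefs as (info, time) pairs and of the dynamic model as (cause, effect, delay) triples is an illustrative assumption, not the paper's representation:

```python
# Deductive simulation: belief(at(I,T)) + belief(leads_to_after(I,J,D))
# yields belief(at(J, T+D)).
def simulate(beliefs, model):
    derived = set(beliefs)
    for (i, t) in beliefs:
        for (cause, effect, d) in model:
            if cause == i:
                derived.add((effect, t + d))
    return derived

# Multiple effect abduction: if two distinct effects J1 != J2 of the same
# cause I are believed at time T, abduce belief(at(I, T-D)).
def abduce(beliefs, model, t):
    abduced = set()
    for cause in {c for (c, _, _) in model}:
        believed = [(e, d) for (c, e, d) in model
                    if c == cause and (e, t) in beliefs]
        if len({e for (e, _) in believed}) >= 2:
            for (_, d) in believed:
                abduced.add((cause, t - d))
    return abduced

model = {("impaired_state", "abnormal_steering_operation", 1),
         ("impaired_state", "unfocused_gaze", 1)}
observed = {("abnormal_steering_operation", 5), ("unfocused_gaze", 5)}
```

With this model, abduce(observed, model, 5) yields the belief ("impaired_state", 4), mirroring how the Driver Assessment Agent of Section 5.2 infers an impaired driver state from two observed effects.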
4.4 Generic Temporal Relations within a World

For World Components the following specifications indicate the actions’ effects and how observations provide their results.

Action Execution and Observation Result Generation in a World
∀W:WORLD_COMP ∀A:ACTION ∀I:INFO_EL
input(W)|performing_in(A, W) ∧ internal(W)|has_effect(A, I)
→→ internal(W)|world_fact(I)
∀W:WORLD_COMP ∀I:INFO_EL
input(W)|observation_focus_in(I, W) ∧ internal(W)|world_fact(I)
→→ output(W)|observation_result_from(I, W)

∀W:WORLD_COMP ∀I:INFO_EL
input(W)|observation_focus_in(I, W) ∧ internal(W)|world_fact(not(I))
→→ output(W)|observation_result_from(not(I), W)
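A world component applying these rules can be sketched as follows; the function name and the dictionary of action effects are invented for illustration:

```python
# Generic world component behaviour: actions produce world facts via
# has_effect, and observation foci are answered from the world facts
# (positively if the fact holds, negatively otherwise).
def world_step(world_facts, has_effect, actions, observation_foci):
    for a in actions:
        for fact in has_effect.get(a, []):
            world_facts.add(fact)
    results = []
    for i in observation_foci:
        if i in world_facts:
            results.append(("observation_result_from", i))
        else:
            results.append(("observation_result_from", ("not", i)))
    return results

# Illustrative use: the slow-down action makes the car stop.
facts = set()
effects = {"slow_down_car": ["car_not_driving"]}
out = world_step(facts, effects, ["slow_down_car"],
                 ["car_not_driving", "engine_running"])
```

Note that this sketch only adds facts; retracting facts that an action makes false would require an explicit negative-effect list, which the generic rules above do not spell out.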
5 Instantiations of the Ambient Agent Model

This section provides instantiations of the Ambient Agent Model for, respectively, Ambient Monitor Agents, a Driver Assessment Agent, and a Cruise Control Agent.

5.1 Ambient Monitor Agents

As a refinement of the Ambient Agent Model AAM, an Ambient Monitoring Agent Model AMAM has been designed, and instantiated for steering monitoring and gaze monitoring. Table 4 indicates the components within these monitoring agents. These agents relate temporal patterns of gaze, resp. steering, to qualifications of abnormality.

Table 4. Ambient Monitor Agent Model: Components

Maintenance of Agent and World Information
  history models: maintenance of model of gaze/steering patterns over time
Agent Specific Task
  process analysis: determine whether a gaze/steering pattern is abnormal
  plan determination: for abnormality state decide to communicate to driver assessment agent
Agent Interaction Management
  prepare communication to driver assessment agent
A monitor agent receives a stream of information over time, obtained by observation of a world component or by communication from other agents. Typical sources of information are world parts equipped with sensor systems or sensoring agents that interact with such world parts. Any monitoring agent has some properties of the incoming information stream that are to be monitored (monitoring foci), e.g., concerning the value of a variable, or a temporal pattern to be detected in the stream. As output a monitoring agent generates communication that a certain monitoring focus holds. A monitor focus can be a state property or a dynamic property. An example of a simple type of state property to be used as a monitor focus is a state property that expresses that the value of a certain variable X is between two bounds LB and UB: ∃V [ has_value(X, V) ∧ LB ≤ V ∧ V ≤ UB ]. In prefix notation, this can be expressed as follows: exists(V, and(has_value(X, V), and(LB ≤ V, V ≤ UB))). It is possible to obtain abstraction by using (meaningful) names of properties. For example, stable_within(X, LB, UB) can be used as an abstract name for the example property expressed above by specifying: has_expression(stable_within(X, LB, UB), exists(V, and(has_value(X, V), and(LB ≤ V, V ≤ UB))))
The fact that a property stable_within(X, LB, UB) is a monitor focus for the monitor agent is expressed by: monitor_focus(stable_within(X, LB, UB)). An example of a monitor property is:
∀t [ t1≤t ∧ t≤t2 ∧ at(has_value(X, V1), t) → ∃t’, V2 t ≤ t’ ≤ t+D ∧ V2≠V1 ∧ at(has_value(X, V2), t’) ]

This property expresses that between t1 and t2 the value of variable X is changing all the time, which can be considered a type of instability of that variable. This dynamic property is expressed in prefix notation as:

forall(t, implies(and(t1≤t, and(t≤t2, at(has_value(X, V1), t))), exists(t’, exists(V2, and(t≤t’, and(t’≤t+D, and(V2≠V1, at(has_value(X, V2), t’))))))))
This expression can be named, for example, by instable_within_duration(X, D). It is assumed that the monitor focus on which output is expected is an input for the agent, communicated by another agent. This input is represented in the following manner. communicated_from_to(monitor_focus(F), A, B) communicated_from_to(has_expression(F, E), A, B)
Note that it is assumed here that the ontology elements used in the expression E are elements of the ontology used for the incoming stream of information. Moreover, note that for the sake of simplicity, a prefix such as input(X)|, which indicates in which agent a state property occurs, is sometimes left out. Within AMAM’s World Interaction Management component, observation results get a time label: observed_result_in(I, W) ∧ current_time(T) →→ belief(at(I, T)). Similarly, within the Agent Interaction Management component communicated information is labeled: communicated_from_to(I, X, AMAM) ∧ current_time(T) →→ belief(at(I, T)). The time-labeled consequent atoms belief(at(I, T)) are transferred to the component Maintenance of Agent History and stored there. Within the component Process Analysis two specific subcomponents are used: Monitoring Foci Determination and Monitoring Foci Verification.

Monitoring Foci Determination. In this component the monitor agent’s monitoring foci are determined and maintained: properties that are the focus of the agent’s monitoring task. The overall monitoring foci are received by communication and stored in this component. However, to support the monitoring process, it is useful when an overall monitor focus is decomposed into more refined foci: its constituents (the subformulas) are determined in a top-down manner, following the nested structure. This decomposition process was specified in the following manner:

monitor_focus(F) →→ in_focus(F)
in_focus(E) ∧ is_composed_of(E, C, E1, E2) →→ in_focus(E1) ∧ in_focus(E2)
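Operationally, this decomposition amounts to closing the set of foci under the composition relation. A Python sketch, with an invented tuple encoding; for the binding operators forall and exists only the body argument is a subexpression, since the third argument names the bound variable:

```python
def determine_foci(root, composed_of):
    """Top-down closure of in_focus under is_composed_of(E, C, E1, E2)."""
    foci, stack = set(), [root]
    while stack:
        e = stack.pop()
        if e in foci:
            continue
        foci.add(e)
        for (parent, op, e1, e2) in composed_of:
            if parent == e:
                if op in ("forall", "exists", "not"):
                    stack.append(e2)       # e1 is the bound variable (or a duplicate for not)
                else:
                    stack.extend((e1, e2)) # and, or, implies: both are subexpressions
    return foci

# Illustrative decomposition facts in the style of gp(1) from Section 6.
composed_of = [("gp1", "forall", "t", "gp2"),
               ("gp2", "implies", "gp3", "gp4")]
```

Here determine_foci("gp1", composed_of) puts gp1, gp2 and both subexpressions of the implication in focus, while the bound variable t is skipped.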
Here is_composed_of(E, C, E1, E2) indicates that E is an expression obtained from subexpressions E1 and E2 by a logical operator C (i.e., and, or, implies, not, forall, exists).

Monitoring Foci Verification. The process to verify whether a monitoring focus holds makes use of the time-labeled beliefs that are maintained. If the monitoring focus is an atomic property at(I, T) of the state of the agent and/or world at some time point, beliefs about these state properties are involved in the verification process:

in_focus(E) ∧ belief(E) →→ verification(E, pos)
Verification of more complex formulae is done by combining the verification results of the subformulae following the nested structure in a bottom-up manner:

in_focus(and(E1, E2)) ∧ verification(E1, pos) ∧ verification(E2, pos) →→ verification(and(E1, E2), pos)
in_focus(or(E1, E2)) ∧ verification(E1, pos) →→ verification(or(E1, E2), pos)
in_focus(or(E1, E2)) ∧ verification(E2, pos) →→ verification(or(E1, E2), pos)
in_focus(implies(E1, E2)) ∧ verification(E2, pos) →→ verification(implies(E1, E2), pos)
in_focus(implies(E1, E2)) ∧ not verification(E1, pos) →→ verification(implies(E1, E2), pos)
in_focus(not(E)) ∧ not verification(E, pos) →→ verification(not(E), pos)
in_focus(exists(V, E)) ∧ verification(E, pos) →→ verification(exists(V, E), pos)
in_focus(forall(V, E)) ∧ not verification(exists(V, not(E)), pos) →→ verification(forall(V, E), pos)
The negative outcomes not verification(E, pos) can be obtained by a Closed World Assumption on the verification(E, pos) atoms. If needed, from these negations explicit negative verification outcomes can be derived:

not verification(E, pos) →→ verification(E, neg)

The following relates verification of an expression to its name:

verification(E, S) ∧ has_expression(F, E) →→ verification(F, S)

Eventually, when a monitoring property E has been satisfied that is an indication for a certain type of abnormal behaviour of the driver, the monitoring agent will indeed believe this; for example, for the Steering Monitoring Agent:

verification(E, pos) ∧ internal(monitoring_agent)|belief(is_indication_for(E, I)) →→ internal(monitoring_agent)|belief(I)
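The bottom-up verification rules, including the Closed World Assumption for negative outcomes, can be sketched recursively in Python. The nested-tuple encoding of expressions is an illustrative assumption; the quantifier rules are omitted since they require an explicit variable domain:

```python
def verify(expr, beliefs):
    """True iff verification(expr, pos) is derivable. Closed World
    Assumption: anything not derivable as positive counts as not verified."""
    op = expr[0]
    if op == "atom":
        return expr[1] in beliefs           # leaf: a time-labeled belief holds
    if op == "and":
        return verify(expr[1], beliefs) and verify(expr[2], beliefs)
    if op == "or":
        return verify(expr[1], beliefs) or verify(expr[2], beliefs)
    if op == "implies":
        # positive if the consequent holds or the antecedent is not verified
        return verify(expr[2], beliefs) or not verify(expr[1], beliefs)
    if op == "not":
        return not verify(expr[1], beliefs)
    raise ValueError("unknown operator: %s" % op)

# Illustrative focus over time-labeled belief atoms (names invented).
beliefs = {"at(unfocused_gaze, 12)"}
focus = ("implies", ("atom", "at(unfocused_gaze, 12)"),
                    ("or", ("atom", "at(unfocused_gaze, 12)"),
                           ("atom", "at(abnormal_steering, 12)")))
```

The recursion mirrors the bottom-up combination rules exactly: each connective case returns a positive verification precisely when the corresponding rule above would derive one.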
5.2 Driver Assessment Agent

As another refinement of the Ambient Agent Model AAM, the Driver Assessment Agent Model DAAM has been designed; see Table 5 for an overview of the different components. For the Driver Assessment Agent, a number of domain-specific rules have been identified in addition to the generic rules specified for the Ambient Agent Model presented in Section 4. Some of the key rules are expressed below. First of all, within the Driver Assessment Agent an explicit representation is present of a dynamic model of the driver’s functioning. In this model it is represented how an impaired state has behavioural consequences: abnormal steering operation and gaze focusing.

Table 5. Driver Assessment Agent Model: Components

Maintenance of Agent and World Information
  maintenance of dynamic models: model relating impaired state to abnormal steering behaviour and gaze focussing
  maintenance of state models: model of internal state of driver, and of abnormality of gaze and of steering wheel
Agent Specific Task
  process analysis: determine impaired driver state by multiple effect abduction
  plan determination: for impaired driver state decide to communicate negative assessment to cruise control agent
Agent Interaction Management
  receive and prepare communication (from monitor agents, to cruise control agent)
The dynamic model is represented in component Maintenance of Dynamic Models by:

internal(driver_assessment_agent)|belief(leads_to_after(impaired_state, abnormal_steering_operation, D))
internal(driver_assessment_agent)|belief(leads_to_after(impaired_state, unfocused_gaze, D))
The Driver Assessment Agent receives information about abnormality of steering and gaze from the two monitoring agents. When relevant, by the multiple effect abductive reasoning method specified by the generic temporal rule in Section 4, the Driver Assessment Agent derives a belief that the driver has an impaired internal state. This is stored as a belief in the component Maintenance of an Agent State Model. Next, it is communicated to the Cruise Control Agent that the driver assessment is negative.

5.3 Cruise Control Agent

The Cruise Control Agent Model CCAM is another agent model obtained by specialisation of the Ambient Agent Model AAM. It takes the appropriate measures whenever needed. Within its Plan Determination component, the first temporal rule specifies that if it believes that the driver assessment is negative, and the car is not driving, then the ignition of the car is blocked:

internal(cruise_control_agent)|belief(driver_assessment(negative)) ∧ internal(cruise_control_agent)|belief(car_is_not_driving)
→→ output(cruise_control_agent)|performing_in(block_ignition, car_and_environment)

If the car is already driving, the car is slowed down:

internal(cruise_control_agent)|belief(driver_assessment(negative)) ∧ internal(cruise_control_agent)|belief(car_is_driving)
→→ output(cruise_control_agent)|performing_in(slow_down_car, car_and_environment)
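The two plan determination rules reduce to a simple case distinction over the agent's beliefs. A hedged sketch, with belief names shortened for illustration:

```python
def plan_actions(beliefs):
    """Plan Determination of the Cruise Control Agent: on a negative driver
    assessment, slow down a driving car or block the ignition of a standing one."""
    actions = []
    if "driver_assessment_negative" in beliefs:
        if "car_is_driving" in beliefs:
            actions.append(("performing_in", "slow_down_car", "car_and_environment"))
        else:
            actions.append(("performing_in", "block_ignition", "car_and_environment"))
    return actions
```

Note that the two cases together reproduce the behaviour seen in the simulation of Section 6: first slow down while driving, then block the ignition once the car stands still.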
6 Simulation Results

Based upon temporal rules as described in the previous section, a specification within the LEADSTO software environment (cf. [4]) has been made, and simulation runs of the system have been generated; an example trace is shown in Figure 2. In the figure, the left side indicates the atoms that occur during the simulation, whereas the right side indicates a time line where a dark box indicates that the atom is true at that time point and a grey box indicates that it is false. Note that for the sake of clarity, merely the outputs and internal states of the various components are shown in the trace. The driver starts the car and accelerates, resulting in a driving car:

internal(car_and_environment)|world_fact(car_driving)
After a short time, between time points 10 and 20, the driver shows signs of inadequate behaviour: the gaze becomes unfocused and the steering unstable. Over short time intervals an alternation occurs of:
On the other hand, the gaze focus becomes fixed for long time intervals: output(driver)|observation_result_from(gaze_focus(far_away), driver)
The temporal sequences of these observed steering positions and gaze focus are communicated moment by moment by the respective sensoring agent to the
corresponding monitoring agent.

[Figure 2 shows the example simulation trace: on the left the atoms occurring during the simulation (world facts, driver actions and observations, communications of the sensoring, monitoring, driver assessment and cruise control agents, and the verification atoms verification(gp(1), pos) and verification(gp(15), pos)), plotted against a time line from 0 to 60.]

Fig. 2. Example Simulation Trace

The following dynamic monitor property is used as monitor focus within the Steering Monitoring Agent:

∀t [ t1≤t ∧ t≤t2 ∧ belief(at(steer_position(centre), t)) → ∃t’ t ≤ t’ ≤ t+D ∧ not belief(at(steer_position(centre), t’)) ]
This property expresses that between t1 and t2, whenever the steer is in a central position, there is a slightly later time point at which it is not in a central position (in other words, the driver keeps on moving the steer). This dynamic property is expressed in prefix notation as:

forall(t, implies(and(t1≤t, and(t≤t2, belief(at(steer_position(centre), t)))), exists(t’, and(t≤t’, and(t’≤t+D, not(belief(at(steer_position(centre), t’))))))))
In LEADSTO this property was expressed as:

is_composed_of(gp(1), forall, t, gp(2, t))
is_composed_of(gp(2, t), implies, gp(3, t), gp(8, t))
is_composed_of(gp(3, t), and, gp(4, t), gp(5, t))
has_expression(gp(4, t), t1≤t)
is_composed_of(gp(5, t), and, gp(6, t), gp(7, t))
has_expression(gp(6, t), t≤t2)
has_expression(gp(7, t), belief(at(steer_position(centre), t)))
is_composed_of(gp(8, t), exists, t’, gp(9, t, t’))
is_composed_of(gp(9, t, t’), and, gp(10, t, t’), gp(11, t, t’))
has_expression(gp(10, t, t’), t≤t’)
is_composed_of(gp(11, t, t’), and, gp(12, t, t’), gp(13, t, t’))
has_expression(gp(12, t, t’), t’≤sum(t, D))
is_composed_of(gp(13, t, t’), not, gp(14, t, t’), gp(14, t, t’))
has_expression(gp(14, t, t’), belief(at(steer_position(centre), t’)))
Note that during the process within the Steering Monitoring Agent the overall monitoring focus given by this dynamic property is decomposed into a number of smaller expressions (using the predicate is_composed_of). The top level expression (that is checked by the Steering Monitoring Agent) is called gp(1). The atomic expressions
have the form of a belief that a state property holds at a certain time point (e.g., belief(at(steer_position(centre), t))), or of an inequality (e.g., t≤t2).

The following dynamic monitor property is used as monitor focus within the Gaze Focus Monitoring Agent: ∃t ∀t’ [ t ≤ t’ ≤ t+D → belief(at(gaze_focus(far_away), t’)) ]. This property expresses that there is a time period from t to t+D in which the gaze of the driver is focused at a point far away. It is expressed in prefix notation as: exists(t, forall(t’, implies(and(t≤t’, t’≤t+D), belief(at(gaze_focus(far_away), t’))))). Within the LEADSTO model, this property was expressed as:

is_composed_of(gp(15), exists, t, gp(16, t))
is_composed_of(gp(16, t), forall, t’, gp(17, t, t’))
is_composed_of(gp(17, t, t’), implies, gp(18, t, t’), gp(21, t, t’))
is_composed_of(gp(18, t, t’), and, gp(19, t, t’), gp(20, t, t’))
has_expression(gp(19, t, t’), t≤t’)
has_expression(gp(20, t, t’), t’≤sum(t, D))
has_expression(gp(21, t, t’), belief(at(gaze_focus(far_away), t’)))
Here, the top level expression (that is checked by the Gaze Focus Monitoring Agent) is called gp(15). Given these monitoring foci, the monitoring agents detect the patterns in this sensor information, classify them as abnormal, and communicate this to the Driver Assessment Agent. By the multiple effect abductive reasoning method, this agent generates the belief that the driver is having an impaired state, upon which a negative driver assessment is communicated to the Cruise Control Agent. The Cruise Control Agent first slows down the car, and after it stopped, blocks the ignition: output(cruise_control_agent)|performing_in(slow_down_car, car_and_environment) output(cruise_control_agent)|performing_in(block_ignition, car_and_environment)
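The gaze monitoring focus gp(15) can also be evaluated directly over the time-labeled beliefs. A minimal sketch, assuming integer time points and an invented atom encoding:

```python
def gaze_unfocused(beliefs, t_min, t_max, D):
    """exists t in [t_min, t_max] such that for all t' in [t, t+D]:
    belief(at(gaze_focus(far_away), t')) holds."""
    return any(all(("gaze_focus_far_away", tp) in beliefs
                   for tp in range(t, t + D + 1))
               for t in range(t_min, t_max + 1))

# Illustrative time-labeled beliefs: gaze far away from time 12 to 19.
beliefs = {("gaze_focus_far_away", tp) for tp in range(12, 20)}
```

With D = 3 (the value used in Section 7.1) the pattern is detected; with a required duration longer than the fixed-gaze interval it is not.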
7 Verification of Dynamic Properties

This section addresses specification and verification of relevant dynamic properties of the cases considered, for example, requirements imposed on these systems.

7.1 Properties of the System as a Whole

A natural property of the Ambient Driver Support System is that a driver with impaired driving behaviour cannot continue driving. The global property is:

GP1 No driving when symptoms of impaired driving occur
If the driver exposes symptoms that indicate that it is not safe to drive anymore, then within 30 seconds the car will not drive and the engine will be off.
∀γ:TRACE, t:TIME (unfocused_gaze(t, γ) ∧ abnormal_steering_behaviour(t, γ)) ⇒ ∃t2:TIME t2 < t + 30 ∧ [state(γ, t2, internal(car_and_environment))|= world_fact(car_not_driving)]
This property makes use of two other properties:

UG Unfocussed gaze has occurred for some time
In trace γ, during the time period D just before t, the gaze of the driver was focussed at a far distance.
∀t2:TIME ((t2 <= t) ∧ (t2 >= t-D)) ⇒ [state(γ, t2, internal(driver))|= world_fact(gaze_focus(far_away))]
AS Abnormal steering behaviour has occurred
In trace γ, during a time period P just before t, whenever the steer is in a central position, there is a time point within D time steps at which the steer is not in a central position.
∀t2:TIME ((t-P-D < t2) ∧ (t2 < t-D) ∧ [state(γ, t2, internal(car_and_environment))|= world_fact(steer_position(centre))]) ⇒ ∃t3:TIME ((t2 <= t3) ∧ (t3 <= t2+D) ∧ not([state(γ, t3, internal(car_and_environment))|= world_fact(steer_position(centre))]))
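GP1-style checks can be run programmatically over a generated trace. The following Python sketch mirrors the TTL check; encoding the trace as a list of per-time-point sets of state atoms is an assumption of this illustration:

```python
def holds_gp1(trace, horizon=30):
    """If symptoms of impaired driving occur at time t, then within `horizon`
    time points the car must not be driving."""
    for t, state in enumerate(trace):
        if "unfocused_gaze" in state and "abnormal_steering" in state:
            window = trace[t:t + horizon + 1]
            if not any("car_not_driving" in s for s in window):
                return False
    return True

# Illustrative trace: symptoms at time 15, car stopped by time 40.
trace = [set() for _ in range(60)]
trace[15] = {"unfocused_gaze", "abnormal_steering"}
trace[40] = {"car_not_driving"}
```

A trace in which the symptoms occur but the car never stops within the horizon makes the check fail, which is the counterexample behaviour a TTL checker would report.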
The global property GP1 has been automatically verified (using the TTL checker tool [3]) against the trace shown in the paper. For D a value of 3 has been used, which means that the driver should have an unfocussed gaze for at least 3 time steps, and that steering corrections should occur within 3 time steps. For P a value of 10 has been used, which means that continued steering corrections should be present for at least 10 time steps. Under these conditions, GP1 proved to hold for the generated trace.

7.2 Interlevel Relations between Properties at Different Aggregation Levels

Following [7], dynamic properties can be specified at different aggregation levels. For the Ambient Driver Support system, three levels are used: properties of the system as a whole, properties of subsystems, and properties of agents and the world within a subsystem. In Table 6 it is shown for the Ambient Driver Support System how the property at the highest level relates to properties at the lower levels (see also Figure 2). The lower level properties in the fourth column are described below.

Table 6. Properties and their interlevel relations

subsystem           property   components                        component properties
sensoring           S1         steering, gaze-focus sensoring    SSA1, GSA1
monitoring          M1         steering, gaze-focus monitoring   SMA1, GMA1
assessment          A1         driver assessment                 DAA1
plan determination  P1         cruise control                    CCA1, CCA2
subject process     SP1        driver, car/env                   CE1, CE2
The property GP1 of the system as a whole can be logically related to properties of the subsystems (shown in the second column of the table) by the following interlevel relation: S1 & M1 & A1 & P1 & SP1 ⇒ GP1. This expresses that the system functions well when all of the subsystems for sensoring, monitoring, assessment, plan determination and the subject process function well.

7.3 Properties of Subsystems

S1 Sensoring system
If the sensoring system receives observation input from the world and driver concerning gaze focus and steering operation, then it will provide this information as output for the monitoring system.

M1 Monitoring system
If the monitoring system receives sensor information input concerning gaze focus and steering operation from the sensoring system, then it will provide as output monitoring information concerning qualification of gaze focus and steering operation for the assessment system.

A1 Assessment system
If this system receives monitoring information concerning specific qualifications of gaze focus and steering operation, then it will provide as output a qualification of the driver state.

P1 Plan determination system
If the plan determination system receives an overall qualification of the driver state, then it will generate as output an action to be undertaken.
SP1 Subject process
If the subject process receives an action to be undertaken, then the effects of this action will occur. If an impaired internal driver state occurs, then the driver will operate the steering wheel abnormally and the driver’s gaze is unfocused.
7.4 Properties of Components

As indicated in the fourth column of Table 6, each property of a subsystem is logically related to properties of the components within the subsystem. For example, the interlevel relation SSA1 & GSA1 ⇒ S1 expresses that the sensoring subsystem functions well when each of the sensoring agents functions well. Similarly, for the monitoring subsystem: SMA1 & GMA1 ⇒ M1. Properties characterising proper functioning of components are the following; the properties for the gaze sensoring and monitoring agents (GSA1, GMA1) mirror those for steering.

SSA1 Steering Sensoring agent
If the Steering Sensoring agent receives observation results about steering wheel operation, then it will communicate this observation information to the Steering Monitoring agent.

SMA1 Steering Monitoring agent
If the Steering Monitoring agent receives observation results about the steering wheel, and this operation is abnormal, then it will communicate to the Driver Assessment Agent that steering operation is abnormal.

GSA1 Gaze Sensoring agent
If the Gaze Sensoring agent receives gaze observation results, then it will communicate this observation information to the Gaze Monitoring agent.

GMA1 Gaze Monitoring agent
If the Gaze Monitoring agent receives gaze observation results, and these show an abnormal pattern, then it will communicate to the Driver Assessment Agent that gaze is abnormal.
The property for the Driver Assessment Agent is:

DAA1 Assessment of driving behaviour
If the Driver Assessment Agent receives input that steering operation is abnormal and gaze is unfocused, then it will generate as output communication to the Cruise Control agent that the driver state is inadequate.
For the Cruise Control Agent the properties are:

CCA1 Slowing down a driving car
If the Cruise Control agent receives communication that the driver state is inadequate, and the car is driving, then it will slow down the car.

CCA2 Turning engine off for a non-driving car
If the Cruise Control agent receives communication that the driver state is inadequate, and the car is not driving, then it will turn off the engine.
The properties for the Car and Environment are:

CE1 Slowing down stops the car
If the Car and Environment component performs the slowing down action, then within 20 seconds the car will not drive.

CE2 Turning off the engine makes the engine off
If the Car and Environment component performs the turn off engine action, then within 5 seconds the engine will be off.
8 Discussion

The ambient agent-based model introduced in this paper is described at an implementation-independent conceptual design level. It has built-in facilities to represent models for human states and behaviours, dynamic process models, and analysis methods on the basis of such models. The model involves both generic and specific content and provides a detailed component-based executable design for a working prototype system. The specific content, together with the generic methods to operate on it, enables ambient agents to react in a knowledgeable manner. Thus a reusable application model was obtained that can be considered an agent-based Ambient Intelligence system (cf. [1], [2], [9]). It was shown how the different types of agents work together to support the safety of the driving and the driver. Simulation experiments have been conducted and the outcomes have been formally analysed, thus showing to what extent the system indeed supports safety. For the monitoring agents, specific patterns of gaze and steering behaviour were chosen and formalised in a temporal language as monitor foci. However, as the approach is more general, it is easy to use different, more sophisticated monitoring foci. It would be interesting to conduct further experimental research to find out which types of observable deviations of driving behaviour occur as effects of different types of impaired internal states, for example caused by drugs or by becoming sleepy, and to use these results to obtain more sophisticated monitoring foci and actions.
References
1. Aarts, E., Collier, R.W., van Loenen, E., de Ruyter, B. (eds.): EUSAI 2003. LNCS, vol. 2875, p. 432. Springer, Heidelberg (2003)
2. Aarts, E., Harwig, R., Schuurmans, M.: Ambient Intelligence. In: Denning, P. (ed.) The Invisible Future, pp. 235–250. McGraw Hill, New York (2001)
3. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Cognitive Agent Models. In: Nishida, T., et al. (eds.) Proceedings of the Sixth International Conference on Intelligent Agent Technology, IAT 2006, pp. 247–254. IEEE Computer Society Press, Los Alamitos (2006)
4. Bosse, T., Jonker, C.M., van der Meij, L., Treur, J.: A Language and Environment for Analysis of Dynamics by Simulation. International Journal of Artificial Intelligence Tools 16, 435–464 (2007)
5. Brazier, F.M.T., Jonker, C.M., Treur, J.: Compositional Design and Reuse of a Generic Agent Model. Applied Artificial Intelligence Journal 14, 491–538 (2000)
6. Brazier, F.M.T., Jonker, C.M., Treur, J.: Principles of Component-Based Design of Intelligent Agents. Data and Knowledge Engineering 41, 1–28 (2002)
7. Jonker, C.M., Treur, J.: Compositional Verification of Multi-Agent Systems: a Formal Analysis of Pro-activeness and Reactiveness. International Journal of Cooperative Information Systems 11, 51–92 (2002)
8. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, Cambridge (2001)
9. Riva, G., Vatalaro, F., Davide, F., Alcañiz, M. (eds.): Ambient Intelligence. IOS Press, Amsterdam (2005)
A Cartesian Robot for RFID Signal Distribution Model Verification Aliasgar Kutiyanawala and Vladimir Kulyukin Computer Science Assistive Technology Laboratory (CSATL) Department of Computer Science Utah State University Logan, UT, 84322 [email protected], [email protected]
Abstract. In our previous research, we addressed the problem of automating the design of passive radio-frequency (PRF) surfaces. An optimal PRF surface is one that offers a maximum probability of localization at a minimum instrumentation cost, i.e., a minimum number of surface-embedded passive RFID transponders. Our previous results were based on the assumption that the signal distribution model of an individual RFID transponder can be approximated as a circle. The problem of automated PRF surface design was then formulated as the problem of packing a surface with circles of a given radius. However, in practice, this approach leads to some loss of optimality: some areas of the surface may not be covered, or too many transponders may be required. More exact methods are needed for verifying and constructing signal distribution models of surface-embedded RFID transponders that can be used by surface packing algorithms to optimize the design. In this paper, we present the design and implementation of a Cartesian robot for verifying and constructing signal distribution models of surface-embedded RFID transponders. A model is characterized by four high-level parameters: an RFID transponder, an RFID antenna, an RFID reader, and a surface type. The robot moves an RFID reader-antenna unit over a PRF surface, e.g. a carpet, and systematically collects readings for various antenna positions over the surface. The collected readings are subsequently processed to verify or construct signal distribution models. We describe experiments with the robot to verify the localization probability of automatically designed PRF surfaces. We also present experiments with the robot to verify and construct the signal distribution models of a specific RFID transponder.
1 Introduction A smart environment is a regular everyday environment instrumented with embedded sensors and computer systems that make use of the data they receive from those sensors to support a quality-of-life function [1]. Since many smart environments are composed of surfaces, one can pose the question of how PRF sensors can be embedded into those surfaces to improve the functionality of mobile units operating in those environments. In our previous research [11,12], we addressed the problem of automating the design of PRF surfaces. An optimal PRF surface is one that offers a maximum localization probability at a minimum cost, i.e., a minimum number of embedded RFID F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 244–257, 2008. © Springer-Verlag Berlin Heidelberg 2008
transponders. The cost of the surface material is presently not taken into account. Our previous results were based on the explicit assumption that the signal distribution model of an individual RFID transponder can be approximated as a circle. The problem of automated PRF surface design was then formulated as the problem of packing a surface with circles. However, in practice, this approach leads to some loss of optimality: some areas of the surface may not be covered, or too many transponders may be required. More exact methods are needed for verifying and constructing the signal distribution models of surface-embedded RFID transponders that can be used by surface packing algorithms to optimize the design. In this paper, we present the design and implementation of a Cartesian robot for verifying and constructing signal distribution models of surface-embedded RFID transponders. A model is characterized by four high-level parameters: an RFID transponder, an RFID antenna, an RFID reader, and a surface type. The robot moves an RFID reader-antenna unit over a PRF surface, e.g. a carpet, and systematically collects readings for various antenna positions over the surface. The collected readings are subsequently processed to verify or construct signal distribution models. Several research efforts are related to our research. Patterson et al. [6] use glove-embedded RFID readers that detect RFID stickers on various household objects to monitor the activities of seniors in their homes. Willis and Helal [7] propose an assisted navigation system where an RFID reader is embedded into a blind navigator's shoe and passive RFID sensors are placed in the floor. Kantor and Singh use RFID tags for robot localization and mapping [8]. Once the positions of the RFID transponders are known, their system uses time-of-arrival information to estimate the distance from detected tags.
Tsukiyama [10] developed a navigation system for mobile robots using RFID transponders under the assumption of perfect signal reception and zero uncertainty. Hähnel et al. [9] developed a probabilistic robotic mapping and localization system to analyze whether RFID can be used to improve the localization of mobile robots in office environments.
2 Reading an RFID Transponder Figure 1 shows a typical setup of a passive RFID system. The RFID reader takes a command (usually in the form of a string) from the user, generates the required signals, and transmits them as electromagnetic waves through the antenna. This electromagnetic signal excites a small coil in the RFID transponder and charges a capacitor inside the transponder. The energy stored in the capacitor powers up circuitry inside the transponder, and a unique ID is transmitted back through the coil. The antenna receives this unique ID as an electromagnetic signal, which is decoded by the RFID reader. The reader sends the unique ID back to the user in the form of a string. For the antenna to read the transponder, enough voltage must be induced in the transponder's coil to charge the capacitor. The amount of voltage induced in the coil depends upon the transponder's position with respect to the antenna (in terms of x, y, z coordinates), the transponder's orientation with respect to the antenna (θ), the dielectric constant (k) of the material between the antenna and
Fig. 1. Typical RFID Setup
Fig. 2. Isofield diagrams for the stick antenna (a) Orientation of tag = 0 degrees (b) Orientation of tag = 90 degrees
the transponder, the type of antenna and transponder, and the power (V) given to the RFID reader. If we assume that the dielectric constant and the power given to the RFID antenna are constant, we can develop a function f(x, y, z, θ) → {true, false} for a given antenna-transponder unit. This function describes whether a given transponder placed at the coordinates (x, y, z) and at an orientation θ with respect to the antenna can be read by the antenna. An isofield or charge-up diagram provides a visual description of such a function. Figures 2(a) and 2(b) show the isofield diagrams for the Series 2000 stick antenna from Texas Instruments, obtained from the Texas Instruments Antenna Reference Guide [14]. The read area of an RFID transponder is defined as the area where the antenna can read the transponder; read areas of a transponder can be visualized in Figures 9(a) through 9(d). Formally, the read area of a transponder can be defined as R = {(x, y, z, θ) | f(x, y, z, θ) = true}. In our previous research [11], we proposed several algorithms for designing PRF surfaces on the assumption that the read area is a circle of radius r. However, the actual read area has a butterfly-like shape.
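As a concrete illustration, the circle approximation used in the earlier work can be phrased as a read predicate of this form (a minimal sketch; the radius value is illustrative, and under this approximation the height z and orientation θ are ignored):

```python
import math

def f_circle(x, y, z, theta, r=15.0):
    """Circle-model read predicate: the tag is assumed readable whenever
    the antenna lies within radius r (cm, illustrative) of the tag in the
    surface plane, independent of height z and orientation theta."""
    return math.hypot(x, y) <= r

# Under this model the read area R = {(x, y, z, theta) | f = true}
# is simply a disc of radius r around the transponder.
```

The butterfly-shaped read area measured later in the paper is exactly what this disc-shaped predicate fails to capture.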
3 Optimality of PRF Surfaces A PRF surface is defined as a surface embedded with passive RFID tags. A mobile unit, such as a walker for the elderly, that operates in a smart environment can utilize
either proprioception (action is determined relative to an internal frame of reference) or exteroception (action is determined from a stimulus originating in the environment itself). The primary purpose of a PRF surface is to make it possible for a mobile device, e.g. a walker for the elderly [13], to perform exteroceptive localization reliably. In order to localize, the device must be able to read at least one of the transponders embedded in the PRF surface. Once a transponder is read, the unit immediately localizes by retrieving the transponder's position from a previously compiled database that maps transponder IDs to positions. We define the localization probability as the probability that the mobile device reads at least one transponder as it crosses the surface from one side to the other on a straight line. One way to increase the localization probability is to populate the surface with more transponders. While this method is guaranteed to increase the localization probability, it will also increase the cost of the surface. To address this trade-off, we define the optimality of a PRF surface as the ratio of the localization probability to the cost, i.e. the number of surface-embedded transponders. Thus, to increase the optimality, the localization probability must be increased without a proportionate increase in the number of transponders embedded in the PRF surface. Optimality can be increased by placing the transponders at strategic locations on the surface. We used this idea to develop four algorithms for automatically designing optimal PRF surfaces. To keep the presentation self-contained, we give only a brief description of each algorithm below; the interested reader is referred to [11].
– Brute-Force Method: All possible placement patterns are computed for a given PRF surface. The localization probability is computed for each pattern. The pattern with the highest localization probability is chosen as the final design.
– Static Greed: RFID transponders are placed at intersection points of the different paths taken by a mobile unit to cross the PRF surface. The weight of each intersection is computed as the number of paths passing through it. The first transponder is placed at the intersection point with the highest weight, the second transponder at the available intersection point with the second-highest weight, and so on.
– Dynamic Greed: RFID transponders are placed at intersection points of the different paths taken to cross the surface. The weight of each intersection is computed as the number of paths passing through it. The first transponder is placed at the intersection point with the highest weight. Paths passing through this intersection point are excluded from subsequent transponder placements and the weights are recomputed. The next transponder is placed at the intersection point with the new highest weight, and so on.
– Hill-Climbing: Transponders are initially placed at random positions. A transponder is randomly chosen and moved by a random amount in a random direction, and the localization probability is calculated at the new position. If this localization probability is greater than the previous one, the move is accepted; otherwise it is rejected. This process continues until the localization probability does not change over several consecutive iterations.
Of the four algorithms described above, only the brute-force method guarantees an optimal PRF surface design. However, it runs in exponential time and is not practical for
Fig. 3. Probability of localization of a PRF surface
large PRF surfaces. The other algorithms produce designs that are reasonably optimal and run in polynomial time. All algorithms repeatedly compute the localization probability. Figure 3 shows a PRF surface embedded with one transponder. Suppose a mobile unit equipped with one RFID antenna crosses the PRF surface from side A to side B. Since the width of the surface is assumed to be small, we assume that the unit travels only along a straight line (path). To reduce the computational complexity, we discretize each side (A and B) into n points and further assume that the unit crosses the surface only along one of the lines connecting the n points on A to the n points on B. The read area of the transponder is assumed to be a circle with the transponder as its center. The unit can read the transponder if it crosses the surface along one of the paths that intersect the circle (shown by light gray lines) and cannot read it if the path does not intersect the circle (shown by dark gray lines). The localization probability is computed as p = n_r / n_t, where n_r is the number of paths that cross the circle and n_t is the total number of paths. For the surface in Figure 3, the localization probability is 62%. The optimality of the surface is computed as p / n_tags. In this case, the optimality of the surface is also 62%, since n_tags = 1.
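The path-counting computation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: a path "crosses the circle" when its minimum distance to a tag is at most the read radius, and both sides are discretized into n points as in Figure 3.

```python
import math

def point_segment_dist(c, p, q):
    """Distance from point c to the line segment p-q."""
    (cx, cy), (px, py), (qx, qy) = c, p, q
    dx, dy = qx - px, qy - py
    t = ((cx - px) * dx + (cy - py) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                 # clamp onto the segment
    return math.hypot(cx - (px + t * dx), cy - (py + t * dy))

def localization_probability(tags, radius, width, length, n=100):
    """Fraction of straight-line crossings from side A (x = 0) to side
    B (x = length) that pass within `radius` of at least one tag."""
    ys = [width * i / (n - 1) for i in range(n)]  # n points per side
    hits = 0
    for ya in ys:                                  # start point on A
        for yb in ys:                              # end point on B
            path = ((0.0, ya), (length, yb))
            if any(point_segment_dist(t, *path) <= radius for t in tags):
                hits += 1
    return hits / (n * n)                          # p = n_r / n_t
```

Dividing the returned probability by the number of tags then gives the optimality measure p / n_tags defined above.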
4 A Cartesian Robot To visualize the read area of a transponder, we performed a simple experiment: we placed a transponder on a surface and then placed an antenna at various points around it to check whether the transponder could be read. This experiment, though tedious and error-prone, gave us a rough estimate of the read area. It also gave us the idea of designing a robot that would automate the entire process. The requirements for this robot were to scan an RFID transponder automatically and provide the read area as the final output.
Fig. 4. Model of the Cartesian Robot
We found a range of ±25 cm along the X and Y axes to be sufficient for transponder scanning. We allowed the antenna's height to range from 2.5 cm to 10 cm so that read areas could be obtained at different antenna heights with respect to the transponder. The granularity of the scan depends upon the resolution at which the transponder is scanned; we found that a minimum resolution of 2 mm was necessary to produce a sufficiently detailed image of the read area. We minimized the use of ferromagnetic materials in the robot to reduce ferromagnetic interference. The last requirement did not allow us to buy an off-the-shelf Cartesian robot, as most of them are made of metal. We therefore decided to design our own Cartesian robot that would satisfy the requirements. Figure 4 shows the robot's design and Figure 5 shows the
Fig. 5. The Cartesian Robot
actual robot. Its design uses two linear actuators (part number E57H42-2.7-007ENG from Haydon Switch and Instruments) with a range of 60 cm. The actuators are placed perpendicular to each other to enable scanning along the X and Y axes, respectively. An earlier design had one linear actuator driven by another, so that the PRF surface would lie on the ground and the antenna would move in a raster pattern. Since the two linear actuators were connected to each other, there were issues with balancing the entire system due to uneven weight distribution. This design had to be abandoned in favor of the current design, which disconnects the actuators from each other. In the current design, one linear actuator drives a table-top (on which the PRF surface rests) along the X axis while the other actuator drives the antenna along the Y axis. Thus, the antenna can scan the surface in a raster pattern. The table-top rests on two sliders that restrict its movement to the X axis. The antenna is placed on a wooden block, and two wooden guides restrict the movement of this block to the Y axis. Since most of the construction is done with wood, the ferromagnetic interference is minimal. The electronic subsystem is shown in Figure 6. The linear actuators are connected to their respective stepper motor drivers (part number DCM 8028 from Haydon Switch and Instruments). These drivers move the stepper motors in the actuators by receiving pulses from an OOPIC microcontroller that is connected to the PC/laptop via a USB-to-serial converter. The amount of movement, and hence the resolution of the scan, is controlled by the number of pulses given to the stepper motor driver. The RFID antenna (Series 2000 stick antenna from Texas Instruments) is connected to an RFID reader (Series 2000 Standard Reader from Texas Instruments), which is connected to the PC/laptop via another USB-to-serial converter.
A Java program runs on the PC/laptop and communicates with the OOPIC microcontroller and the RFID reader to scan the PRF surface. To scan a PRF surface, it is placed on the table-top and the actuators are reset to their default positions. The antenna is then moved along the Y axis by a
Fig. 6. Electronic Subsystem of the Cartesian Robot
small amount (equal to the resolution of the scan). A transponder on the PRF surface is read n times. The boundary of the read area is fuzzy, and near it the reader typically reads the transponder only m times, where m ≤ n. This allows us to characterize the fuzziness of the boundary of the read area as the percentage of times the transponder can be read at a given place. In our experiments, we chose n = 3. A higher value of n would characterize the fuzziness of the boundary at a higher resolution, but would also take more time to scan. The value of m and the corresponding position (x, y) are saved in the database. The antenna is moved again along the Y axis and the transponder is read again. This procedure is repeated until the antenna is at a distance of 50 cm from the starting position. The table-top is then moved by a small distance (again equal to the resolution of the scan) and the PRF surface is scanned again by moving the antenna backwards along the Y direction while attempting to read the tag. The antenna is moved backwards until it returns to its original location. The table-top is once more moved along the X axis and the PRF surface is scanned along the Y axis. This procedure is repeated until the table-top is at a distance of 50 cm from its original location, which ensures that the PRF surface is scanned uniformly along both dimensions. The range and resolution of movement along either axis can be controlled independently from the computer.
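The scanning procedure above can be summarized in code form. In this sketch, `move_x`, `move_y`, and `try_read` are hypothetical stand-ins for the serial commands sent to the OOPIC microcontroller and the RFID reader; only the back-and-forth raster logic itself follows the text.

```python
def scan_surface(move_x, move_y, try_read,
                 range_cm=50.0, step_cm=0.2, n_reads=3):
    """Raster-scan the surface and return {(x, y): m}, where m is the
    number of successful reads (out of n_reads) at each position."""
    readings = {}
    steps = int(round(range_cm / step_cm))
    y, direction = 0.0, 1              # antenna sweeps back and forth in Y
    for ix in range(steps + 1):
        x = ix * step_cm
        for _ in range(steps):
            m = sum(1 for _ in range(n_reads) if try_read())
            readings[(round(x, 1), round(y, 1))] = m
            move_y(direction * step_cm)    # advance the antenna in Y
            y += direction * step_cm
        direction = -direction             # reverse the Y sweep
        move_x(step_cm)                    # advance the table-top in X
    return readings
```

The stored value m out of n_reads at each (x, y) is exactly the fuzziness measure described above.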
5 Experiments with the Cartesian Robot 5.1 Verifying Localization Probabilities of PRF Surfaces Each of the four algorithms used to design PRF surfaces produces the localization probability of the designed surface. These localization probabilities are computed assuming that the read area of the transponder is a circle centered on the transponder. In our first experiment, we wanted to investigate how well the localization probabilities computed by the algorithms approximate the actual localization probabilities. We designed a PRF surface with each of the four algorithms and noted the algorithmic localization probability of each surface. We then built the PRF surfaces using the placement patterns computed by the algorithms and placed each surface on the table-top of the Cartesian robot. The robot then scanned the surface; for each position (x, y) of the antenna with respect to the transponder, it was recorded whether the antenna could read the tag. Finally, the robot's output was analyzed to produce the actual localization probability. All four algorithms were given the same inputs for designing the surface: width = 30 cm; length = 60 cm; number of transponders = 2. Table 1 shows the positions of the transponders on the PRF surfaces and compares the algorithmic and actual localization probabilities for the four surfaces. The designs generated by the two greedy methods are the same, and the designs generated by the Brute Force method and the Hill Climbing method are very similar. The Brute Force and Hill Climbing methods produced better designs than the greedy methods, as can be observed from their localization probabilities. The last two columns of the table show that the actual localization probabilities compare well to the algorithmic ones. The results of a t-test on this sample show that the differences between
Table 1. Verification of Probability of Localization using the Cartesian Robot
Method                 Tag 1     Tag 2     Theoretical Probability  Actual Probability
Brute Force Method     (15, 11)  (15, 29)  87.77%                   88.72%
Static Greed           (15, 18)  (15, 40)  79.22%                   81.22%
Dynamic Greed          (15, 11)  (15, 29)  79.22%                   81.22%
Hill Climbing Method   (16, 19)  (15, 40)  87.77%                   89.33%
the algorithmic and actual probabilities are not statistically significant. The slight differences can be attributed to the coarseness of the data sample, discretization errors in calculating the actual probabilities, and the approximation of the read area as a circle in the algorithms. 5.2 A Signal Distribution Model of an RFID Transponder We also conducted experiments with the robot to verify the isofield diagrams from Texas Instruments and to construct a mathematical signal distribution model of the transponder. The procedure for both experiments was as follows. The transponder was placed on the table-top of the Cartesian robot at an angle of zero degrees with respect to the antenna, and a scan was performed. The transponder was then rotated in increments of 90 degrees (until it was back at its original orientation of zero degrees with respect to the antenna), and a scan was performed after each increment. This procedure gave us the read area of the transponder for all four orientations with respect to the antenna. The experiment was repeated for different heights (1.5 inches and 2 inches) of the antenna with respect to the transponder. The verification of the isofield diagrams was performed visually by comparing the pictures of the transponder's read area obtained in the experiment with the ones provided by Texas Instruments in their manual. This procedure did not require high-resolution scans of the transponder; the robot therefore took readings every 10 mm along the X and Y axes. Figures 9(a) through 10(d) show the output (read areas of the transponder) of the eight scans (four scans per height). The transponder is denoted by a small square-shaped dot. The read area is represented by rings of various shades of gray: a black ring means that the transponder was read every time, whereas a light gray ring means that the tag was read only some of the time.
The light gray rings are usually present at the boundary of the read area and indicate that this boundary is fuzzy. The isofield diagrams obtained from Texas Instruments, inverted and appended to themselves, are shown in Figures 11(a) and 11(b). The figures show that our results are reasonably similar to the ones from Texas Instruments; thus, the robot was able to verify their isofield diagrams. Looking at the read areas, we can also see that they are symmetric and that the read areas for orientations differing by 180 degrees are similar. We can observe that the read area decreases in size as the height of the antenna above the transponder increases. The transponder had to be scanned at a much finer resolution to obtain a mathematical model. Specifically, the robot took readings in increments of 2 mm along both axes. This increased the number of readings by a factor of 25. Readings were taken
Fig. 7. Dividing the read area into four quadrants
at all four orientations and for various antenna heights with respect to the transponder. The readings were imported into MATLAB and segmented into four quadrants as shown in Figure 7. A mathematical model was developed for each quadrant. Since each quadrant looks like an oval, it was observed that developing a mathematical model in polar coordinates would be easier. The center of each quadrant was found by computing its center of mass: if the transponder could be read at a point, a mass of one was assigned to that point; otherwise, a mass of zero was assigned. Once the center of the quadrant was found, the distance (r_φ) of the farthest point having a mass of one along an angle φ was calculated, and the pair (r_φ, φ) was saved. A function f(φ) = r_φ was then fitted using the MATLAB function polyfit. Curve fitting was performed for different degrees of the resulting polynomial, and the error between the curve-fitted polynomial model and the actual data was calculated using the MATLAB function polyval. The polynomial models with the least error were chosen as the final mathematical model. The following equations describe the mathematical model for each of the four quadrants:
– Quadrant I: f(φ) = −0.0006φ³ + 0.0167φ² − 0.1915φ + 15.460
– Quadrant II: f(φ) = 0.0001φ³ − 0.0018φ² − 0.0051φ + 16.7399
– Quadrant III: f(φ) = −0.0004φ³ + 0.0142φ² − 0.0953φ + 17.2427
– Quadrant IV: f(φ) = −0.0006φ³ + 0.0151φ² − 0.2012φ + 16.1954
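The quadrant-fitting procedure described above can be sketched as follows, with NumPy's `polyfit`/`polyval` standing in for their MATLAB counterparts. The one-degree angular binning is our own simplification, and the degree is fixed at 3 for illustration.

```python
import numpy as np

def fit_quadrant(points, degree=3):
    """Fit f(phi) = r_phi, the farthest readable distance as a
    polynomial in the angle phi (degrees), for one quadrant.
    `points` are (x, y) positions where the tag was readable."""
    pts = np.asarray(points, dtype=float)
    cx, cy = pts.mean(axis=0)          # center of mass (unit masses)
    dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
    phi = np.degrees(np.arctan2(dy, dx))
    r = np.hypot(dx, dy)
    # farthest readable point along each (1-degree-binned) angle
    bins = np.round(phi)
    r_max = {b: r[bins == b].max() for b in np.unique(bins)}
    angles = np.array(sorted(r_max))
    radii = np.array([r_max[a] for a in angles])
    coeffs = np.polyfit(angles, radii, degree)
    err = np.abs(np.polyval(coeffs, angles) - radii).mean()
    return coeffs, err
```

Repeating the fit for several degrees and keeping the coefficients with the smallest `err` reproduces the model-selection step described in the text.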
Figures 8(a) through 8(d) show the plots of the derived mathematical model and the actual data. It can be observed that the resulting error between the two is very low and that the derived polynomials are adequate models for this type of RFID transponder.
Fig. 8. Models for (a) Quadrant I (b) Quadrant II (c) Quadrant III (d) Quadrant IV
Fig. 9. Antenna Height = 1.5 inches (a) 0 Degrees (b) 90 Degrees (c) 180 Degrees (d) 270 Degrees
These four polynomials can be integrated into one model by choosing the appropriate model at run time. One of the quadrants would be selected depending on the position
Fig. 10. Antenna Height = 2 inches (a) 0 Degrees (b) 90 Degrees (c) 180 Degrees (d) 270 Degrees
Fig. 11. (a) Isofield Diagram 0 Degrees (b) Isofield Diagram 90 Degrees
of the antenna and the distance (d) and the angle (φ) of the antenna with respect to the center of the quadrant would be computed. The corresponding polynomial would give
the distance r_φ = f(φ) for the angle φ. If d ≤ r_φ, it is determined that the antenna lies in the read area and can read the transponder.
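The run-time test can be sketched directly from the four quadrant polynomials reported above (angles in degrees). Quadrant selection and the computation of d and φ from the antenna position are assumed to have been done already, as described in the text.

```python
# Quadrant polynomials from Section 5.2: (a3, a2, a1, a0) for
# f(phi) = a3*phi^3 + a2*phi^2 + a1*phi + a0, with phi in degrees.
QUADRANT_MODELS = [
    (-0.0006,  0.0167, -0.1915, 15.460),    # Quadrant I
    ( 0.0001, -0.0018, -0.0051, 16.7399),   # Quadrant II
    (-0.0004,  0.0142, -0.0953, 17.2427),   # Quadrant III
    (-0.0006,  0.0151, -0.2012, 16.1954),   # Quadrant IV
]

def can_read(d, phi, quadrant):
    """True if an antenna at distance d (cm) and angle phi (degrees)
    from the quadrant center lies inside the modelled read area."""
    a3, a2, a1, a0 = QUADRANT_MODELS[quadrant]
    r_phi = a3 * phi**3 + a2 * phi**2 + a1 * phi + a0
    return d <= r_phi
```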
6 Future Work We intend to further refine the development of mathematical models of RFID transponders. One approach is to analyze the electromagnetic interaction between the antenna and the transponder. We can model the coils of the antenna and the transponder using well-developed antenna models (monopole, dipole, etc.) and analyze the electromagnetic signal distribution between them. Once we develop a parameterized model of the electromagnetic signal distribution, we can obtain the parameters of the model by performing more experiments with our robot. Another approach is to perform very fine-resolution scans of the read area of the transponder using the Cartesian robot and develop a “black box” model of the transponder by curve-fitting the data. This kind of model would be in polar coordinates centered on the transponder and would provide the maximum distance at which the transponder can be read along a particular angle. We would prefer to develop a three-dimensional model that also incorporates the height of the antenna with respect to the tag. If such a model is not possible, we will develop different two-dimensional models for different heights of the antenna with respect to the transponder.
7 Conclusion Optimally designed PRF surfaces can help reduce the cost of indoor localization. We have presented a Cartesian robot that can verify the optimality of PRF surface designs by computing the actual probability of localization on a surface. The robot can also be used to verify and develop signal distribution models of RFID transponders, which in turn can be used to design more optimal PRF surfaces. We performed experiments to verify the optimality of PRF surface designs, showing that the differences between the actual and algorithmic localization probabilities are not statistically significant. We also performed experiments to model the read area of an RFID transponder for different orientations and heights of the antenna. Using these read areas, we verified the isofield diagrams reported by Texas Instruments. Finally, we developed a mathematical model of the transponder and showed that the error between the model and the actual data is very low.
Acknowledgments The second author would like to acknowledge that this research has been supported, in part, through NSF CAREER grant (IIS-0346880) and three Community University Research Initiative (CURI) grants (CURI-04, CURI-05, and CURI-06) from the State of Utah.
References
1. Zita Haigh, K., Kiff, L.M., Myers, J., Guralnik, V., Geib, C., Phelps, J., Wagner, T.: The Independent LifeStyle Assistant: AI Lessons Learned. In: Proceedings of the 2004 IAAI Conference (2004)
2. Pollack, M.: Intelligent Technology for the Aging Population. AI Magazine (2005)
3. Kautz, H., Arnstein, L., Borriello, G., Etzioni, O., Fox, D.: An Overview of the Assisted Cognition Project. In: Proceedings of the 2002 AAAI Workshop on Automation as Caregiver: The Role of Intelligent Technology in Elder Care (2002)
4. Marston, J., Golledge, R.: Towards an Accessible City: Removing Functional Barriers for the Blind and Visually Impaired: A Case for Auditory Signs. Technical Report, Department of Geography, University of California at Santa Barbara (2000)
5. AMS Project: Autonomous Movement Support Project, http://www.ubin.jp/press/pdf/TEP040915-milt01e.pdf
6. Patterson, D., Fishkin, K., Fox, D., Kautz, H., Perkowitz, M., Philipose, M.: Contextual Support for Human Activity. In: Proceedings of the 2004 AAAI Spring Symposium on Interaction between Humans and Autonomous Systems over Extended Operation (2004)
7. Willis, S., Helal, S.: A Passive RFID Information Grid for Location and Proximity Sensing for the Blind User. University of Florida Technical Report TR04-009 (2004)
8. Kantor, G., Singh, S.: Preliminary Results in Range-Only Localization and Mapping. In: IEEE Conference on Robotics and Automation (2002)
9. Hähnel, D., Burgard, W., Fox, D., Fishkin, K., Philipose, M.: Mapping and Localization with RFID Technology. Technical Report IRS-TR-03-014, Intel Research Institute (2003)
10. Tsukiyama, T.: Navigation System for Mobile Robots using RFID Tags. In: IEEE Conference on Advanced Robotics (2003)
11. Kulyukin, V., Kutiyanawala, A., Jiang, M.: Surface-Embedded Passive RF Exteroception: Kepler, Greed, and Buffon's Needle. In: Indulska, J., Ma, J., Yang, L.T., Ungerer, T., Cao, J. (eds.) UIC 2007. LNCS, vol. 4611. Springer, Heidelberg (2007)
12. Minghui, J., Kulyukin, V.: Connect-the-Dots in a Graph and Buffon's Needle on a Chessboard: Two Problems in Assisted Navigation. In: Proceedings of the 10th Joint Conference on Information Sciences and the 10th International Conference on Computer Science and Informatics (JCIS-CSI-07), Salt Lake City, UT (July 2007)
13. Kulyukin, V., LoPresti, E., Kutiyanawala, A., Simpson, R., Matthews, J.: A Rollator-Mounted Wayfinding System for the Elderly: Proof-of-Concept Design and Preliminary Technical Evaluation. In: Proceedings of the 30th Annual Conference of the Rehabilitation Engineering and Assistive Technology Society of North America (RESNA 2007) (2007)
14. Texas Instruments Antenna Reference Guide (1999), http://focus.ti.com/lit/ug/scbu025/scbu025.pdf
Self-Localization in a Low Cost Bluetooth Environment

Julio Oliveira Filho, Ana Bunoza, Jürgen Sommer, and Wolfgang Rosenstiel

Wilhelm-Schickard-Institut für Informatik, Department of Computer Engineering, University of Tübingen, Sand 13, 72076 Tübingen, Germany
{oliveira,bunoza,jsommer,rosen}@informatik.uni-tuebingen.de
Abstract. Personal navigation systems enjoy great popularity. This work presents an indoor self-localization system that resembles outdoor GPS positioning. A user-friendly Bluetooth-based solution has been implemented that provides the user with orientation information within a building. Bluetooth technology keeps the system low-budget and low-power. Our region-based localization method is tailored to run completely on the personal platform and thus assures user privacy. Our approach stands out through platform independence, privacy and power awareness.
1 Introduction
Location Based Services (LBS) have become commonplace. To offer such services, the underlying technology must be capable of both localization and data communication [1]. In this respect, Bluetooth was designed as a cable-replacement technology, so connectivity and communication are among its strengths, but positioning is not. Nevertheless, this work argues that it is possible to implement reliable indoor localization in a low-power, low-cost, Bluetooth-based environment. Our approach, region-based localization, relies on a number of distributed low-cost fixed Bluetooth nodes and a mobile agent, typically a mobile phone or PDA. When entering a building equipped with our system, the mobile agent may download the navigation software along with the building information; this is done with user confirmation. The software enables the mobile node to send inquiries on demand to the fixed Bluetooth nodes in range. After evaluating the received inquiry responses, the actual location is determined with a reliability of 96%. Privacy concerns are respected because the localization software is executed entirely by the mobile agent. No interconnections of the fixed Bluetooth nodes
This work is supported by the Förderprogramm Informationstechnik Baden-Württemberg (BW-FIT) and Programme Alban, European Union Programme of High Level Scholarships for Latin America, no. E04D045457BR.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 258–270, 2008. c Springer-Verlag Berlin Heidelberg 2008
are necessary. Moreover, there is no futile traffic: the mobile node enters the inquiry state only on demand, and the fixed nodes switch from the low-power inquiry-scan state to the inquiry-response state and back only when receiving an inquiry. This paper is organized as follows. In Section 2, we present an overview of the state of the art in location sensing systems. Section 3 introduces the theoretical background of our region-based approach to Bluetooth localization. Section 4 describes the experimental setup we implemented, while Section 5 presents selected results of our work. The paper ends with conclusions and future work in Section 6.
2 State-of-the-Art
Fig. 1 sketches an overview of common real-time locating systems. With regard to the user's instrumentation, the assortment of available indoor-localization-enabling devices reduces to Bluetooth, WLAN and GSM/UMTS. In comparison to the sub-decimeter accuracy of infrared [2] and ultrasonic [3] systems, Bluetooth, WLAN [4] and GSM [5] offer a precision of a few meters under good conditions, with the advantage of ubiquitous availability. These technologies may be applied almost everywhere and are available in a broad range of consumer equipment. Bluetooth has become a quasi-standard in consumer short-range wireless connectivity. Empirical studies on using Bluetooth signal propagation for localization can so far be divided into reachability or cell-of-origin (COO) methods [6] [7] [8] and received-signal-strength evaluation (RSSI) [9] [10] [11] [12]. In some investigations, the so-called bit error rate (BER) was used as an additional indicator [9]. RSSI measurements that perform well in outdoor environments lose a considerable amount of accuracy inside buildings [7]. Much can also be learned from previous 802.11-based investigations: RADAR [13] proposes a dual-mode operation for signal-strength-based measurement, VORBA [14] takes angle-of-arrival data into account, and Günther and Hoene present in [15] a novel method adapted from statistical analysis of round-trip times (RTT). A comparative study mapping RSSI, RTT and reachability to three different radio frequency techniques from Fig. 1 (Bluetooth, WLAN, RFID) is carried out in [16].
[Figure: a classification tree grouping indoor systems (LF/HF/UHF RFID, ISM-band Bluetooth, WLAN and ZigBee, infrared Active Badge, ultrasound Active Bat, visual tags), mobile phone systems (GSM, UMTS) and outdoor satellite systems (GPS, GloNass, Galileo).]

Fig. 1. Classification of locating systems with regard to frequency spectrum
Approaches that allow direct distance estimation from sensor data, such as signal strength (RSSI) or response time (e.g. RTT, ToA, DToA), have been well studied. Since cost-effective Bluetooth stacks often do not provide direct access to the HCI interface, where the signal strength value can be read, our solution seeks to make the best of reachability readings using probabilistic calculus and optimization. The most significant distinction in range-free localization algorithms, where trilateration cannot be directly applied, is how the estimated location is represented. A single-point representation, for instance as a result of triangulation or trilateration, is agnostic to ambiguity and noise: even small changes in the error-prone sensing process can result in widely different position estimates. In our approach, a location is characterized by regions consisting of all points that satisfy the decision rules to a certain extent. A similar region-based representation was proposed by Galstyan et al. [17] and Guha et al. [18]. As in our approach, negative information is used as well for building up regions. In comparison to Sextant, we are not limited to regions composed of Bézier curves, and the threshold percentage of successful transmissions is optimized with regard to the degree of confidence. Unlike Galstyan, our location constraints are not built upon sensing models such as radial binary detection or distance-bound estimation.
3 Principles of Region-Based Localization
We propose a new and systematic method for assembling the points of a region based on the probability of successful communication between sender and receiver. Our main advantage is the ability to control the reliability of the localization during the region-building phase. As in other approaches, typical regions may extend over a large area, so we allow these regions to overlap, as depicted in Fig. 2. The overlap creates subregions called sectors. In order to indicate that an object is in one sector, it is necessary to identify that it is simultaneously within all the regions that compose that sector. The main advantage of our region-based methodology is to obtain a high degree of confidence when asserting the presence of a device in a region. Accuracy may be improved by increasing the number of regions and allowing them to overlap, creating sectors. In our approach to self-localization, the mobile device to be located itself runs a decision rule to determine whether it is within a given region of the space or not. The decision rule observes some measurable aspects of the system and tries to assert or negate the presence of the mobile device in the region. A correct decision is made whenever the decision rule evaluates positive and the mobile device is indeed in the region. Similarly, the decision rule is also correct when it evaluates negative and the mobile device is outside the region. The probability of making a correct decision CdΩ about a region Ω can then be evaluated as

Pr(CdΩ) = Pr(d = 1, Md ∈ Ω) + Pr(d = 0, Md ∉ Ω)    (1)
Fig. 2. Regions and sectors form the base for region-based localization
assuming that the decision rule d evaluates to 1 if it is decided that the device Md is in Ω, and evaluates to 0 otherwise. In order to illustrate this concept, consider the diagram in Fig. 3. We assume that, a priori, a mobile device may equally be found at points a, b, c or d of the space. Additionally, one event X can be observed there with probability pa = 0.7, pb = 0.3, pc = 0.6 or pd = 0.3, respectively. Let us also assume the mobile device runs the following decision rule:

d(X) = 1, if X was observed; 0, if X was not observed.    (2)

According to this, the mobile device decides that it is in region Ω after observing the occurrence of the event X. Equivalently, it decides to be outside the region if the event is not observed (X̄). Thus, we may rewrite Equation 1 considering the occurrence of the event X, as shown in Equation 3. The space can also be described as a bi-partition of points ωi that are exclusively either inside or outside the region, so the expression can be expanded as in Equation 4 using the law of total probability. Considering our example, we evaluate the probability of making a correct decision in Ω to be 0.675.
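The calculation above can be reproduced in a few lines. The following sketch (Python, using the example's four hypothetical points and a uniform prior, as in the paper's toy scenario) computes the degree of confidence for a candidate region:

```python
# Degree of confidence Pr(Cd) for a region, following Equations 1-4.
# Points, probabilities and the region {a, c} are the paper's toy example;
# a uniform prior over the four points is assumed.
def degree_of_confidence(event_probs, region):
    """event_probs: point -> Pr(X observed there); region: set of points."""
    n = len(event_probs)
    pr_in = len(region) / n  # Pr(Md in region) under the uniform prior
    inside = [event_probs[w] for w in region]
    outside = [1 - event_probs[w] for w in event_probs if w not in region]
    # Pr(X | inside) * Pr(inside) + Pr(not X | outside) * Pr(outside)
    return (sum(inside) / len(inside)) * pr_in + \
           (sum(outside) / len(outside)) * (1 - pr_in)

probs = {"a": 0.7, "b": 0.3, "c": 0.6, "d": 0.3}
print(degree_of_confidence(probs, {"a", "c"}))  # matches the 0.675 worked out above
```

Choosing a different region, e.g. {a} alone, yields a lower value, which is exactly what the threshold optimization in the training phase exploits.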
[Figure: region Ω contains points a and c, where X is observed with probability 70% and 60%; points b and d lie outside Ω, each with probability 30%.]

Fig. 3. Calculating the degree of confidence for Ω
262
J.O. Filho et al.
Pr(CdΩ) = Pr(X, Md ∈ Ω) + Pr(X̄, Md ∉ Ω)
        = Pr(X | Md ∈ Ω) Pr(Md ∈ Ω) + Pr(X̄ | Md ∉ Ω) Pr(Md ∉ Ω)    (3)
        = (Σ_{ωi∈Ω} Pr(X | Md in ωi) Pr(ωi | Md ∈ Ω)) Pr(Md ∈ Ω)
          + (Σ_{ωi∉Ω} Pr(X̄ | Md in ωi) Pr(ωi | Md ∉ Ω)) Pr(Md ∉ Ω)    (4)
        = (0.7 · 0.5 + 0.6 · 0.5) · 0.5 + (0.7 · 0.5 + 0.7 · 0.5) · 0.5 = 0.675

Based on this concept, we call Pr(CdΩ) the degree of confidence with which a decision can be made for a region. We also call a region whose degree of confidence is at least ρ0 a region of confidence, designated Ω^ρ0; equivalently, Pr(CdΩ) ≥ ρ0, with ρ0 ∈ [0, 1]. Implementing our localization approach requires two steps: a training phase and a system-run phase. During the training phase: (a) we acquire the probability of one observable event using experiments, (b) we define a decision rule based on the observable event, (c) we define a minimum degree of confidence ρ0 to be reached in a region, and (d) we calculate regions of confidence Ω^ρ0 based on the chosen decision rule. During the system-run phase, the mobile device is programmed to track the occurrence of events, execute the decision rule, and decide in which regions it is to be found. Information relating the regions to physical areas of the building is also embedded in the mobile device. This allows the localization to be displayed in a user-friendly way, as in commercial personal localization systems. During the training phase, we mark a grid of points over the space where the localization should take place, as depicted in Fig. 4(a). Then we position one Bluetooth-capable device at a fixed position O_fix. We call this point, and the device installed at it, a fixed node. In this work, one BTnode was used to implement this role. For each point ωi of the grid, we evaluate the probability that the BTnode at position O_fix responds to an inquiry command started by a mobile device at ωi. According to the Bluetooth standard, an inquiry command is issued when the identification number, or address, of another device is not yet known. The inquired device answers the inquirer with its ID number. We
[Figure: (a) a grid of measurement points ωi around the fixed node O_fix; (b) the measured response probabilities pi per point (0–95%), with the region Ω(0.9) covering the points above 90%.]

Fig. 4. Assessing the probability of receiving a response for an inquiry
Self-Localization in a Low Cost Bluetooth Environment
263
use the probability of receiving an inquiry answer at each point as our observable aspect of the environment, as depicted in Fig. 4(b). The next step is to determine a value ρ0 for the desired degree of confidence and to propose a decision rule based on the observation of inquiries. Details on how this was done for our experiments are discussed in the next section. It remains to find a set of points ωi of the space to compose a region of confidence Ω^ρ0. We start by defining a region as the set of points ωi such that the probability p^{O_fix}_{ωi} of obtaining an answer from fixed node O_fix when an inquiry is started at ωi is greater than a threshold value t. In other words, Ω(t) = {ωi ∈ space | p^{O_fix}_{ωi} > t}. Fig. 4(b) depicts an example of a region Ω(0.9). Given Ω(t), it is possible to calculate the probability of making a correct decision for that region using Equation 4. The problem is now reduced to finding a value of t that maximizes Pr(Cd^{Ω(t)}) subject to Pr(Cd^{Ω(t)}) ≥ ρ0. If no region of confidence can be found that meets the constraint, it is necessary to substitute the decision rule with a more robust one. Another alternative is to include new observable aspects in the system. Specific details for our experiments will be discussed in the next section. When the training phase ends, we have, for each position O^k_fix, a region of confidence Ω^ρ0_k. During the system-run phase, a mobile device running self-localization software is allowed to move within the building. Periodically, the software issues one or more inquiry commands, depending on the decision rule adopted in the training phase. Because the mobile device only issues inquiries, there are no actual connections or exchanges of personal information. This feature assures the privacy of the node performing the localization. Additionally, we only require the fixed nodes to be able to answer inquiries. This has a twofold advantage.
First, the hardware used to implement a fixed node may be very simple and cost-effective: it may implement only the Bluetooth functions related to the inquiry command. Second, it consumes very little power, because the inquiry command is answered only on demand and very rapidly. In our experiments, the fixed node (Bluetooth chip) consumed only 3.3 mW in inquiry-scan mode and 29 mW when answering an inquiry command, which takes only 625 microseconds. After acquiring information about the environment, the mobile device runs the decision rule to decide in which regions it is to be found. The software uses the information about regions to identify the sector where the mobile device is located. Additionally, each sector is correlated with the spatial information about the building. Thus, it is possible to present the localization result as a region on the building floorplan map on the mobile device's display. The next sections show the realization of a self-localization and personal navigation system using the concepts explained here.
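The threshold search in training step (d) can be sketched as follows (Python; the six-point grid of success probabilities and the 0.01-step sweep are illustrative assumptions, not the paper's measured data):

```python
# Sketch of training step (d): build Omega(t) from per-point inquiry-success
# probabilities and sweep t for the highest degree of confidence that still
# meets rho0. The six-point grid below is illustrative, not measured data.
def region(probs, t):
    return {w for w, p in probs.items() if p > t}

def confidence(probs, reg):
    inside = [probs[w] for w in reg]
    outside = [1 - probs[w] for w in probs if w not in reg]
    if not inside or not outside:
        return 0.0  # degenerate region: nothing to decide between
    pr_in = len(inside) / len(probs)  # uniform prior over grid points
    return (sum(inside) / len(inside)) * pr_in + \
           (sum(outside) / len(outside)) * (1 - pr_in)

def best_threshold(probs, rho0=0.9):
    best = None
    for step in range(100):  # sweep t over [0, 1) in steps of 0.01
        t = step / 100
        c = confidence(probs, region(probs, t))
        if c >= rho0 and (best is None or c > best[1]):
            best = (t, c)
    return best  # None if no Omega(t) meets the constraint

grid = {0: 0.98, 1: 0.95, 2: 0.95, 3: 0.10, 4: 0.05, 5: 0.0}
print(best_threshold(grid))  # smallest t reaching the maximal confidence
```

With the grid above, every t between the low and high probability clusters selects the same three-point region, so the sweep returns the smallest such t, mirroring the paper's preference for the smallest region with the highest degree of confidence.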
4 Experiments
The region-based localization methodology explained in the last section was applied to realize an indoor localization and personal navigation system. We describe our experiments considering the two phases of the implementation: the training and the system-run phase.
4.1 Training Phase
A section of the Wilhelm-Schickard-Institut building at the University of Tübingen was used as a real scenario for our experiments. Its basic floorplan can be seen in Fig. 5. A grid of points with one-meter spacing was set up over the area. During the training phase, the mobile devices were only positioned at those points. Four points of this grid, depicted in Fig. 5 as O_fix^f1, O_fix^f2, O_fix^r1 and O_fix^r2, were chosen to permanently accommodate one Bluetooth Smart Node (BTnode) each, developed at the Swiss Federal Institute of Technology (ETH). Those points compose the set of fixed nodes. Each fixed node was placed considering aspects of the building topology, because we wanted to investigate how the topology would affect the shape of the respective region of confidence. O_fix^f1 was positioned in the middle of a straight corridor, while O_fix^f2 was placed at a bent corridor corner. The regions of confidence for O_fix^f1 and O_fix^f2 are expected to spread along the corridor, with a smaller coverage in the second case due to additional reflections at the corner. O_fix^r1 and O_fix^r2 were placed inside a small and a big room, respectively. The idea was to investigate how the shape of the region of confidence would spread out of the room.
Fig. 5. Experimental setup. Points in the grid are spaced by 1 meter. Fixed nodes O_fix answer inquiry commands issued by a cell phone placed at the indicated measuring points on the grid.
Fig. 6. Probability maps for the answered inquiry commands
The first task in the training phase is to measure the probability of the inquiry event at each point in the grid. This was accomplished by executing the following procedure at each point. First, a BTnode and a Sony Ericsson V600i mobile phone were placed at the point. We used two different devices to track possible variation when using diverse equipment. Each device executed one hundred inquiry commands. In each round, it was logged which fixed nodes answered the command. When all the commands had been issued, the rate of successful inquiries was determined individually for each fixed node. This rate is used as an estimator for the probability p^{O^k_fix}_{ωi} explained in Section 3. The result of this procedure can be seen in Fig. 6, considering each fixed point individually. The variation in the rate of successful inquiries between the two devices was very small. The mobile devices obtained a high rate (nearly 100%) of answered inquiries from O_fix^f1 and O_fix^f2 when they were located in the corridor, because they were in the line of sight of the fixed BTnodes. When leaving the line of sight or in the presence of obstacles, a fast drop-off appears and the mobile devices can hardly reach the fixed BTnodes. We assumed that the fixed nodes O_fix^r1 and O_fix^r2 would answer all inquiries issued within the room in which they are placed with probability 1. In both cases there is a region in the corridor with a radius of about 10 meters where the probability of getting an inquiry response is still high. Outside these bounds, this probability vanishes very rapidly. During our experiments, the doors were kept open to avoid attenuation of the inquiry signal. However, we observed that doors and metallic obstacles do affect the measured probability, and a detailed study of their effects is part of our ongoing work.
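The rate estimation at one grid point amounts to simple counting over the logged rounds. A sketch (Python; the 100 logged rounds below are hypothetical, not the paper's measurements):

```python
# Sketch of the training-phase rate estimator: at one grid point, the fraction
# of inquiry rounds answered by each fixed node estimates p for that node.
# The 100 logged rounds below are hypothetical, not the paper's measurements.
def success_rates(rounds, fixed_nodes):
    return {node: sum(1 for r in rounds if node in r) / len(rounds)
            for node in fixed_nodes}

# Each round records the set of fixed nodes that answered the inquiry.
rounds = [{"f1", "f2"}] * 90 + [{"f1"}] * 5 + [set()] * 5
print(success_rates(rounds, ["f1", "f2"]))  # {'f1': 0.95, 'f2': 0.9}
```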
Regions of Confidence. The regions of confidence Ω^ρ0_k for each fixed node O^k_fix should now be determined. To do so, it is necessary to fix the desired degree of confidence ρ0 and adopt a decision rule. For the remainder of this work, we arbitrarily set ρ0 = 0.9, which guarantees decisions with at least 90% confidence for all regions. Additionally, we investigate two possible decision rules. The first decision rule is given in Equation 5. It considers the mobile device to be in region Ω^ρ0_k if the fixed node O^k_fix answers one inquiry command issued by the mobile device:

d1(one inquiry) = 1, if O^k_fix answered the inquiry; 0, if O^k_fix did not answer the inquiry.    (5)

The second decision rule considers the last three inquiry commands issued by the mobile device. It decides that the mobile device is in region Ω^ρ0_k if the fixed node O^k_fix answered at least two of the three commands:

d2(three inquiries) = 1, if O^k_fix answered at least two inquiries; 0, if O^k_fix answered fewer than two inquiries.    (6)

We are now able to determine the regions of confidence for a given fixed node O_fix. As an example, we consider the fixed node O_fix^f1. We grouped the points using a threshold value t as explained in Section 3. Fig. 7 shows the obtained degree of confidence ρ as a function of t, considering d2 as the decision rule. The maximum probability of making a correct decision is 96.9%. It is achieved when t = 56%. Our region includes every point of the measured grid
[Figure: degree of confidence (0–1) plotted as a function of the threshold t.]

Fig. 7. Degree of confidence for Ω(t) of O_fix^f1
where the rate of answered inquiry commands lies between 100% and 56%. For this region, we can make a correct decision with a probability of 96.9%. Interestingly, that result also holds for regions defined with thresholds t between 56% and 31%. We keep t = 56%, which corresponds to the smallest region with the highest degree of confidence; smaller regions lead to greater accuracy. The same procedure was applied to determine the regions for the other fixed nodes. Table 1 shows the maximal value of the degree of confidence reached when using decision rules d1 and d2. Decision rule d1 fails to provide regions of confidence with ρ ≥ ρ0 = 0.9 in two cases and was therefore rejected. Decision rule d2, however, provided adequate regions for all fixed nodes and was adopted for the system-run phase. The threshold value t used to build the region of each fixed node is also indicated in Table 1. The regions defined for O_fix^f1, O_fix^f2, O_fix^r1 and O_fix^r2 can be seen in Fig. 8. Those regions have degrees of confidence ρ equal to 0.969, 0.907, 0.907 and 0.941, respectively. Thus, they are all regions of confidence Ω^0.9.

Table 1. Maximal degree of confidence according to decision rule

Fixed Node | d1    | d2    | Final threshold t when using d2
O_fix^f1   | 94.5% | 96.9% | 56%
O_fix^f2   | 87.7% | 90.7% | 51%
O_fix^r1   | 85.7% | 90.7% | 51%
O_fix^r2   | 90.4% | 94.1% | 52%
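The gain of d2 over d1 in Table 1 follows from binomial reasoning: if a single inquiry at some point succeeds with probability p, the two-of-three rule fires with probability 3p²(1−p) + p³, which amplifies high rates and suppresses low ones. A quick check (Python; the sample values of p are illustrative):

```python
# Probability that decision rule d2 (at least 2 answers out of 3 inquiries)
# fires, given single-inquiry success probability p: 3*p^2*(1-p) + p^3.
def d2_prob(p):
    return 3 * p**2 * (1 - p) + p**3

for p in (0.9, 0.56, 0.3):
    print(p, "->", round(d2_prob(p), 3))
# high p is pushed higher (0.9 -> 0.972), low p lower (0.3 -> 0.216)
```

This sharpened separation between points inside and outside a region is why d2 reaches ρ ≥ 0.9 for all four fixed nodes where d1 does not.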
Fig. 8. Regions of confidence calculated for O_fix^f1 (top left), O_fix^f2 (top right), O_fix^r1 (bottom left) and O_fix^r2 (bottom right)
After the training phase, the environment is prepared for running the self-localization and navigation system: the fixed nodes are installed and the regions have been determined. The system-run phase for our scenario is explained in detail in the next section.
5 Running the Localization and Navigation System
The information for each region of confidence obtained in the training phase was merged into one unique building floorplan map. The result can be seen in Fig. 9(a). The floorplan view now allows us to split the space into sectors. Each sector corresponds to an area of the space "covered" by one or more regions of confidence. Therefore, it is possible to relate each sector to the corresponding regions that cover it. This information is tabulated and made available to the self-localization software. As an application example, a self-localization software was implemented using Java 2 Micro Edition and the JSR-82 Bluetooth API. The implementation in Java allows the software to run on nearly every mobile device, such as a mobile phone or a PDA. The Java application has two main functionalities. First, a localization mode enables the user to find out his location in the building. Second, a navigation mode allows the user to choose a destination; the system then navigates him to the selected place. When the localization mode is active, the software starts the Bluetooth service on the mobile device and issues three inquiry commands. The system analyzes the inquiry responses to find out which fixed BTnodes have answered and how often. Due to our decision rule, a BTnode must answer at least twice to yield a valid predicate. Answers for the different regions are combined to determine in which sector the mobile device is to be found. After this evaluation, the resulting sector
(a) Regions overlap to form sectors on the building plan.
(b) Software output screen for the localization mode.
Fig. 9. Example system for localization and navigation
is displayed. It takes about 30 seconds from starting the application to obtaining the actual location. Fig. 9(b) shows the output screen for the localization mode. Up to three mobile devices were allowed to start inquiry commands simultaneously in order to investigate concurrency issues in the system; no collision or delay problems were observed. The navigation functionality is an extended localization mode in which the software on the mobile device periodically issues inquiry commands. After each round, it executes the decision rule d2 and refreshes the output screen accordingly. Each round takes about 5 seconds. Note that the whole operation is executed by the software on the mobile device. With the small, battery-operated Bluetooth landmarks, only minimal infrastructure hardware is required. Even so, the system is reliable enough to support localization and navigation services. The accuracy of the localization may be customized for each building by introducing new fixed nodes.
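The sector lookup driving the display can be sketched as a table keyed by the set of covering regions (Python; the region ids and sector names are invented for illustration, not taken from the building in Fig. 9):

```python
# Sketch of the system-run sector lookup: each sector is keyed by the set of
# regions of confidence that cover it (the table built during training).
# Region ids and sector names are invented for illustration.
SECTORS = {
    frozenset({"f1"}): "corridor, west end",
    frozenset({"f1", "f2"}): "corridor, near the corner",
    frozenset({"f1", "r1"}): "corridor, outside the small room",
    frozenset({"r1"}): "small room",
}

def locate(inquiry_log):
    """inquiry_log: fixed-node id -> answers received out of 3 inquiries.
    Decision rule d2: a node's region counts only with at least 2 answers."""
    hit = frozenset(n for n, answers in inquiry_log.items() if answers >= 2)
    return SECTORS.get(hit, "unknown sector")

print(locate({"f1": 3, "f2": 2, "r1": 1}))  # corridor, near the corner
```

The lookup is a plain table match, which is why the whole evaluation can run on the mobile device without contacting any infrastructure.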
6 Conclusion and Future Work
We presented a low-cost, low-power, Bluetooth-based self-localization and navigation system. The proposed region-based localization concept is suitable for being executed completely on the mobile device to be localized; nevertheless, a high degree of confidence in the localization is achieved. Self-localization, without the need to exchange personal information with the environment, makes our approach a safe and privacy-aware solution. Our experience shows that it is possible to build reliable, broadly usable and low-cost personal navigation systems using Bluetooth. The system was implemented in a real scenario at the University of Tübingen. It uses Java software, suitable for execution on any Java-capable mobile device, such as mobile phones and PDAs. Accuracy remains a point to be improved; this may be achieved by optimizing the positions of the fixed Bluetooth nodes that compose the environment infrastructure. A methodology to optimally place these nodes, an investigation of the influence of obstacles such as doors, and a study of the system behaviour when operated simultaneously by several mobile devices are part of our ongoing work.
References

1. Sommer, J., Hoene, C.: Indoor positioning applications. Localisation in wireless environments using Bluetooth and WLAN. In: Third Heidelberg Innovation Forum (2006)
2. Aitenbichler, E., Mühlhäuser, M.: An IR local positioning system for smart items and devices. In: ICDCSW 2003: Proceedings of the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, p. 334. IEEE Computer Society, Los Alamitos (2003)
3. Fukuju, Y., Minami, M., Morikawa, H., Aoyama, T.: DOLPHIN: An autonomous indoor positioning system in ubiquitous computing environment. In: WSTFES 2003: Proceedings of the IEEE Workshop on Software Technologies for Future Embedded Systems, Washington, DC, USA, p. 53. IEEE Computer Society, Los Alamitos (2003)
4. Teker, U.: Realisierung und Evaluation eines Indoor-Lokalisierungssystems mittels WLAN. Diploma thesis, Universität Bremen (2005), www.uni-bremen.de
5. Otsason, V., Varshavsky, A., LaMarca, A., de Lara, E.: Accurate GSM indoor localization. In: Beigl, M., Intille, S.S., Rekimoto, J., Tokuda, H. (eds.) UbiComp 2005. LNCS, vol. 3660, pp. 141–158. Springer, Heidelberg (2005)
6. Nilsson, M., Hallberg, J., Synnes, K.: Positioning with Bluetooth. In: 10th International Conference on Telecommunications ICT 2003, pp. 954–958 (2003)
7. Forno, F., Malnati, G., Portelli, G.: Design and implementation of a Bluetooth ad hoc network for indoor positioning. IEE Proceedings – Software 152(5), 223–228 (2005)
8. Anastasi, G., Bandelloni, R., Conti, M., Delmastro, F., Gregori, E., Mainetto, G.: Experimenting an indoor Bluetooth-based positioning service. In: ICDCSW 2003: Proceedings of the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, p. 480. IEEE Computer Society, Los Alamitos (2003)
9. di Flora, C., Ficco, M., Russo, S., Vecchio, V.: Indoor and outdoor location based services for portable wireless devices. In: ICDCSW 2005: Proceedings of the First International Workshop on Services and Infrastructure for the Ubiquitous and Mobile Internet (SIUMI), Washington, DC, USA, pp. 244–250. IEEE Computer Society, Los Alamitos (2005)
10. Madhavapeddy, A., Tse, A.: A study of Bluetooth propagation using accurate indoor location mapping. In: Beigl, M., Intille, S.S., Rekimoto, J., Tokuda, H. (eds.) UbiComp 2005. LNCS, vol. 3660, pp. 105–122. Springer, Heidelberg (2005)
11. Wendlandt, K., Robertson, P., Berbig, M.: Indoor localization with probability density functions based on Bluetooth. In: PIMRC 2005, pp. 2040–2044. VDE Verlag (2005)
12. Subramanian, S.P., Sommer, J., Schmitt, S., Rosenstiel, W.: SBIL: Scalable indoor localization and navigation service. In: WCSN 2007: Third International Conference on Wireless Communication and Sensor Networks, pp. 27–30 (2007)
13. Bahl, P., Padmanabhan, V.N.: RADAR: An in-building RF-based user location and tracking system. In: INFOCOM, vol. 2, pp. 775–784 (2000)
14. Niculescu, D., Nath, B.: VOR base stations for indoor 802.11 positioning. In: MobiCom 2004: Proceedings of the 10th Annual International Conference on Mobile Computing and Networking, pp. 58–69. ACM Press, New York (2004)
15. Günther, A., Hoene, C.: Measuring round trip times to determine the distance between WLAN nodes. Networking, 768–779 (2005)
16. Vorst, P., Sommer, J., Hoene, C., Schneider, P., Weiss, C., Schairer, T., Rosenstiel, W., Zell, A., Carle, G.: Indoor positioning via three different RF technologies. In: 4th European Workshop on RFID Systems and Technologies (RFID SysTech 2008), Freiburg, Germany (2008)
17. Galstyan, A., Krishnamachari, B., Lerman, K., Pattem, S.: Distributed online localization in sensor networks using a moving target. In: IPSN 2004: Proceedings of the Third International Symposium on Information Processing in Sensor Networks, pp. 61–70. ACM Press, New York (2004)
18. Guha, S., Murty, R., Sirer, E.G.: Sextant: A unified node and event localization framework using non-convex constraints. In: MobiHoc 2005: Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pp. 205–216. ACM Press, New York (2005)
Penetration Testing of OPC as Part of Process Control Systems

Maria B. Line1, Martin Gilje Jaatun1, Zi Bin Cheah2, A.B.M. Omar Faruk2, Håvard Husevåg Garnes3, and Petter Wedum3

1 SINTEF ICT, N-7465 Trondheim, Norway {maria.b.line,martin.g.jaatun}@sintef.no
2 Kungliga Tekniska Högskolan, Stockholm, Sweden {zbcheah,aofaruk}@kth.se
3 Google, Trondheim, Norway {hgarnes,wedum}@google.com
Abstract. We have performed penetration testing on OPC, which is a central component in process control systems on oil installations. We have shown how a malicious user with different privileges – outside the network, access to the signalling path and physical access to the OPC server – can fairly easily compromise the integrity, availability and confidentiality of the system. Our tentative tests demonstrate that full-scale penetration testing of process control systems in offshore installations is necessary in order to sensitise the oil and gas industry to the evolving threats. Keywords: Information Security, Process Control, Penetration Testing, OPC.
1 Introduction

Process control systems (PCS) are used to support and control a specific industrial process. The setting for the kind of system discussed in this paper is oil production on offshore installations. Traditionally, hardware and software components in such systems were proprietary, designed for use in a specific context. The current trend is heading towards commercial off-the-shelf technologies because of the goal of integrated operations, which means extensive cooperation between onshore and offshore staff, and the possibility of controlling installations completely from onshore. This transition exposes the systems to an entirely new set of security threats. Many components and applications in use were developed before this transition without any thought of making them intrusion-proof, because access used to be more limited, both physically and logically.

With respect to security, the ideal solution would be to build all parts of a process control system from scratch, taking all new threats into account. In reality this is not achievable: economically it would be a disaster for the operators, and the set of threats relevant today may change during the period of development. Moreover, new security flaws are usually introduced when new systems are developed, so even brand new systems would not be perfect.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 271–283, 2008. © Springer-Verlag Berlin Heidelberg 2008
The more realistic solution is to add security functionality or security measures where they are needed. In order to identify where the needs are, components and applications already in use must be analyzed in detail. This paper presents how a test network was set up to simulate a process control system, and how intrusion tests were performed against OPC. The concept of OPC is described in the following section.

1.1 OPC – OLE for Process Control

OLE1 for Process Control (OPC) [1] is a mechanism for exchanging process control data among numerous data sources in a process control system. OPC is implemented as a client-server architecture. The OPC server aggregates data from the available control units in the system. It is capable of reading and writing values from/to these units, and offers mechanisms for interacting with these values to an OPC client. OPC is widely used in process control systems due to its stability and reliability. It also allows transmission of double-precision real numbers, which other protocols usually do not allow without rewriting [2]. The OPC interface is defined on top of Microsoft's OLE/COM (where COM stands for Component Object Model) [3], and most OPC servers run on MS Windows servers, even though some implementations exist for other operating systems, such as OpenSCADA. The OPC interface is implemented as a set of function calls to COM objects defined by the OPC server, and can be accessed via DCOM (Distributed COM) from other computers. Hence, OPC does not implement security itself, but makes use of the security offered by DCOM.

We have used OPC Tunneller and OPC Simulation Server from Matrikon2 in our simulated process control system. The reason for choosing Matrikon is that it provides a freely available and fully functioning OPC server for use in a non-production setting. The simulation server is normally used for testing OPC connections and services before production servers are put to use.
It is therefore considered to be a suitable alternative to a real OPC server.

1.2 Penetration Testing

Penetration testing is a way of evaluating the security of a system by simulating attacks by a malicious user against the system. Such testing can be performed in different ways: by running known vulnerability scanners like Nessus3 and Nmap4, or as research into and testing of new and unknown vulnerabilities.

Nessus is well suited for testing systems in production, and it is low-cost. It is continuously updated with new modules for newly discovered vulnerabilities, and running Nessus gives a good indication of how secure a system is, without giving a 100% complete result, which is impossible to achieve in any case.

Testing of unknown vulnerabilities requires extensive knowledge of the system in focus and is hence out of the question for our tests this time, as time and resources

1 OLE stands for Object Linking and Embedding.
2 http://www.matrikonopc.com
3 http://www.nessus.org
4 http://insecure.org/nmap/
put limitations on our work. In the future we might be able to move on to deeper analyses of OPC, including testing of unknown vulnerabilities.

1.3 Our Tests in Brief

Our aim is to show how a malicious user with varying privileges – outside the network, with access to the signalling path, or with physical access to the OPC server – can fairly easily compromise the integrity, availability and confidentiality of the system. The resources available to us include a lab where we can set up a test network that simulates a process control system, and software freely available on the Internet. We define a blue team and a red team: the blue team is responsible for setting up the test network and detecting intrusions, while the red team acts as malicious hackers and runs different sets of exploits/attacks.

1.4 Paper Outline

Section 2 describes how the test network was set up. Section 3 is about vulnerabilities that apply to OPC, which will be exploited. Section 4 describes the performed attacks together with the achieved results, while Section 5 discusses our findings. We offer concluding remarks and directions for further work in Section 6.
2 Test Network Setup

The blue team set up a simulated process control system in a test lab [4]. The topology reflects a typical offshore network inspired by the SeSa method [5]. The network is divided into three subnets, as follows:

• DMZ subnet
• Admin subnet, with an OPC client
• Process Network subnet, with an OPC server

DMZ is an abbreviation for demilitarized zone. In common computer networks, a DMZ contains an organization's services that are available for access from untrusted networks, typically the Internet. Services in a DMZ are usually needed by external hosts in order to communicate successfully with hosts inside the network and vice versa. The purpose of the DMZ is to add an additional layer of security to an organization's network by housing frequently accessed services in this layer.

The Admin layer models a typical process control system setup where hosts in the Admin Network are privileged users that usually have access to the Process Network layer. In order for users in the Admin network to access the Process Network services, they must have the proper credentials. For example, a manager in the Admin layer might use OPC client software such as the Matrikon OPC Client to access information from an OPC server (which is placed in the Process Network). In order to do this successfully, the manager has to have the proper login/password credentials.
The Process Network layer is the deepest layer of the system and houses the most critical services in an offshore network. If an attacker manages to take control over this network, he or she has succeeded in compromising the entire network. The specifications of the computers used in the network are listed in Table 1 below, and the topology is illustrated in Fig. 1.
Fig. 1. The network topology as set up by the blue team

Table 1. Computer hardware specification

Host Name        Operating System        # NICs
Gateway          Linux Fedora Core 6     3
Honeywall        Linux Fedora Core 6     3
DMZ Gateway      Linux Fedora Core 6     2
DMZ Host         Linux Fedora Core 6     1
Admin Gateway    Linux Fedora Core 6     2
OPC Server       Windows XP SP2          1
The honeywall and the router are for administrative and monitoring purposes. The OPC client is placed on the Admin subnet, while the OPC server resides on the Process Network subnet.

2.1 The Honeywall

A honeypot is a host set up to detect, monitor and/or trap attackers, and a honeywall is a host that is usually placed between honeypot(s) and non-honeypot components in a network. It is regarded as a bridge between honeypots and regular network segments. The two main tasks of a honeywall are traffic monitoring and intrusion prevention. Our honeywall has an IP address on only one interface, and we use the gateway to block anyone trying to access it. This interface is used as a remote management interface, while the other two interfaces (without IP addresses) act as the bridge. The bridge operates like a hub, passing traffic along from one interface to the next. As a honeywall, it also reads, copies, and analyzes all the passing traffic. A honeywall should be "non-existent" from a logical point of view: no one will know there is a honeywall host present unless that person enters the laboratory and sees our network.

2.2 The Gateway

All traffic that enters or exits the network has to go through the gateway. If the gateway crashes, the connection between our test network and the Internet is broken. In addition, the gateway performs firewall filtering and Network Address Translation (NAT). The purposes of the main gateway are:

• Internet entry-exit point
• Firewall packet filtering
• Firewall NAT
Our network provides an open port (21379) for the OPC client to connect to our OPC server using OPC tunnelling. Furthermore, the services SSH (port 22) and HTTPS (port 443) are also open for remote connection. An attacker might wish to exploit such open services and try to attack remotely from the Internet.

We selected iptables5 as our firewall implementation due to its flexibility. The firewall not only blocks traffic, but shapes traffic, too. Traffic can be denied or allowed based on port numbers, host names and IP addresses. For our case we use the filter table and the NAT table in iptables. In the filter table, we used ALLOW ALL: any traffic may come in and out of the network. The only exception is that we have configured our gateway to deny access to the private IP addresses 10.0.0.X. The 10.0.0.X range is the private network between the gateway and the honeywall; we want this subnet to be logically invisible so that no one can access it. Usually, ICMP (Internet Control Message Protocol) is used to ping a host to determine if it is alive. We have configured the firewall to reply with "icmp port unreachable" when anyone tries to ping the 10.0.0.X subnet. This will lead others to think that the subnet does not exist. The only way

5 http://www.netfilter.org
to access this subnet is via HTTPS from the Internet. In this case, the outside host will not know that the subnet exists because NAT has already translated the addressing. The NAT table in iptables allows us to forward traffic arriving on certain ports to ports that we decide. Since we only have one public IP address, NAT directs the traffic to the appropriate destinations. In our configuration we forward traffic meant for the ports 3389 (RDP)6, 21379 (OPC Tunneller) and 135 (RPC)7 to the OPC server, and 443 (HTTPS)8 to the honeywall.

2.3 Intrusion Prevention

We do not want an attacker who has successfully compromised our network to launch attacks from our network against others. However, we do not want to deny all outbound traffic, only malicious traffic. Outbound informational queries, such as an ICMP ping, a finger query, or a simple HTTP GET command, should be allowed. To realize this we deploy a special Snort function called snort_inline9 that is integrated in the honeywall. This function lets us limit outbound, but welcome inbound, malicious traffic. One way to limit outbound traffic is to drop all outbound malicious packets; another is to limit the amount of outbound traffic per time unit – we decided to do both. Limiting outbound traffic is especially important for preventing Denial-of-Service attacks from our network towards others. We do not set the outbound traffic to zero, as this would arouse the attacker's suspicion as to why no outbound connections are allowed.

2.4 OPC

By default, Matrikon OPC Tunneller does not use encryption or authentication to connect with another OPC Tunneller. We can, however, use a shared secret to encrypt and authenticate communication between the client- and server-side OPC Tunnellers. This option was not employed in our tests. Throughout the penetration testing we have assumed that the IP address of the OPC server is known to the attackers (red team).
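The filter and NAT behaviour described above can be sketched with a few iptables rules. This is an illustrative reconstruction, not the blue team's actual configuration; the interface name (eth0) and the internal host addresses are assumptions:

```shell
# Hypothetical internal addresses (assumptions for illustration only)
OPC_SERVER=192.168.2.10
HONEYWALL=10.0.0.2

# Filter table: allow everything, but hide the private 10.0.0.X subnet by
# answering probes with "icmp port unreachable"
iptables -P FORWARD ACCEPT
iptables -A FORWARD -d 10.0.0.0/24 -j REJECT --reject-with icmp-port-unreachable

# NAT table: forward selected inbound ports to the right internal hosts
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 3389  -j DNAT --to-destination $OPC_SERVER  # RDP
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 21379 -j DNAT --to-destination $OPC_SERVER  # OPC Tunneller
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 135   -j DNAT --to-destination $OPC_SERVER  # RPC
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443   -j DNAT --to-destination $HONEYWALL   # HTTPS management
```

Note how the single public address multiplexes onto two internal hosts purely by destination port, which is why the outside host cannot tell that the hidden subnet exists.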
2.5 Red Team's Equipment

The red team set up a Matrikon OPC Simulation Server on Windows XP SP2 with all available updates. The server was set up as described in the guide published by CERN [6]. The password for this server was known by the red team, as the login service was not to be tested. This server is in addition to the blue team's simulated OPC network. Besides this test server, the red team operated with two regular computers and a switch that connected their units with "the Internet", where "the Internet" is represented by a switched Ethernet LAN.
6 3389: Needed for remote desktop management of the OPC server.
7 21379 and 135: Needed to run the OPC server.
8 443: Needed for configuration of the honeywall via a browser.
9 http://snort-inline.sourceforge.net
3 Vulnerabilities in OPC

In order to map vulnerabilities related to OPC, we have to consider vulnerabilities related to DCOM and RPC. This is because OPC is based on DCOM, which uses RPC, as mentioned in the Introduction. Frequently used vulnerability databases like the National Vulnerability Database (NVD)10, the Open Source Vulnerability Database (OSVDB)11, and the database provided by the United States Computer Emergency Readiness Team (US-CERT)12 reveal 555 vulnerabilities in RPC. 71 vulnerabilities related to DCOM are listed, and 40 OPC-specific vulnerabilities are reported. Here are some examples of vulnerabilities:

− NETxAutomation NETxEIB OPC Server fails to properly validate OPC server handles
− MS Windows RPC DCOM Interface Overflow
− MS Windows DCOM RPC Object Identity Information Disclosure
− MS Windows DNS RPC buffer overflow
− MS Windows RPC service vulnerable to denial of service
− MS Windows RPCSS Service contains heap overflow in DCOM request filename handling
− MS Windows 2000 RPC Authentication Unspecified Information Disclosure

These vulnerabilities open the door to buffer overflows, Denial-of-Service, and information disclosure, especially in the case of RPC. Carter et al. [7] describe OPC-specific vulnerabilities in detail, and the main points of their discussion are summarized below.

Lack of Authentication in OPC Server Browser: Configuration guidance from many vendors recommends allowing remote Anonymous Login so that OPCEnum will work when DCOM Authentication is set to "None". If a buffer overflow is discovered in the OPC Server Browser code, the result could be arbitrary code execution or a Denial-of-Service attack against any computer running the OPC Server Browser. Fortunately, no such overflow has been discovered yet.

Lack of Integrity in OPC Communications: The default DCOM settings do not provide message integrity for OPC communication.
If the underlying network is compromised and the attacker can sniff and insert traffic, it is likely that rogue messages could be injected once the client and server have been authenticated during the initial connection establishment. A number of "Man-in-the-Middle" tools and techniques are available, and it is likely that these could be modified or enhanced to conduct attacks against OPC communication.

Lack of Confidentiality in OPC Traffic: Although DCOM supports message encryption, most OPC vendors do not recommend enabling Packet Privacy for their OPC Server or the OPC Server Browser. Some vendors recommend VPN tunnelling as a means of providing secure remote access. Matrikon uses client- and server-side tunnelling components with encryption for this purpose.

10 http://nvd.nist.gov/
11 http://osvdb.org/
12 http://www.kb.cert.org/vuls/
In a different article, Lluis Mora [8] discusses the vulnerabilities of OPC servers. These include attacks using invalid server handles, invalid or crafted configuration files, resource starvation, and the like. He has developed the tool "OPC Security Tester" (OPCTester)13 for testing these vulnerabilities.
4 Exploits and Results

The red team's task was to act as attackers towards the test network and the OPC server specifically. Their initial knowledge included the IP address of the OPC server. In the following, the different types of mapping and attacks they performed are described, together with the results they achieved. More details about their work can be found in a separate report [9].

4.1 Initial Network Mapping – Scanning and Probing the Network

Many tools are available for gathering information about networks, operating systems and services offered by a network. Network administrators use such tools to manage network services, monitor hosts, service uptime, etc. But as such tools are freely available, attackers can also use them for their own purposes. Nmap and Nessus can be used for scanning a process control system. Nessus also supports testing a list of vulnerabilities on a specific remote host. This can be thought of as the first step for any attacker in order to explore the list of services and ports exposed by a network.

The introduction of a packet sniffer in the network clearly showed that when the DCOM security level for the OPC server is set to "connect", all the OPC traffic is sent over the network in plain text, both from our test server and from the simulation network. Without any knowledge of the packet layout, we could still easily read out string values from the packets. Closer inspection and cross-checking with a proper OPC client also enabled us to identify other types of values in the packets, especially numerical values. This experience indicates that CERN does not value confidentiality of OPC data, as they have recommended this configuration setting.

Further examinations of the network were done with the network mapper Nmap and the vulnerability scanner Nessus. Neither of these gave any indications of easily exploitable vulnerabilities in either our test server or the test network.
Both scanners reported correct information about the operating system and other known details of our test server, and indicated Linux Fedora as the operating system running on the front end of the simulation network. However, these tools do not test OPC in particular, but rather the general setup of the servers with their operating systems and services.

OPCTester performs automated tests of around 30 known vulnerabilities and access faults. We ran OPCTester to scan both the test server and the simulation network and did not get any useful results for either of the OPC servers. Run locally on our test server, OPCTester showed all available tests as passed, without known vulnerabilities.

13 http://www.neutralbit.com/en/rd/opctest/
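As an illustration of the very first step such scanners perform, a minimal TCP connect() probe can be written in a few lines of standard-library Python. This is a toy sketch for exposition only; nmap and Nessus do far more (stealth scans, service and OS fingerprinting, vulnerability modules):

```python
import socket

def probe_tcp_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection.

    A toy version of the first reconnaissance step a scanner performs:
    a full connect() scan, with no fingerprinting of the services found.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

Pointed at the gateway's public address, such a probe would reveal the forwarded ports 135, 3389 and 21379 described in Section 2.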
To sum up, network mapping did not yield any information on exploitable technical issues on the simulation network or on our test server. Network sniffing did clearly show that the confidentiality of an OPC server with a standard setup is void.

4.2 Entering the Signalling Path

As "the Internet" in our setup is a switched Ethernet, we utilized the well-known technique of ARP spoofing in order to be able to realistically perform network sniffing. We used the tool ARPspoof for this purpose, which is included in the Linux software package dsniff14. With ARPspoof we continuously sent out gratuitous ARP packets to two given IP addresses, informing each that the other IP address is located at the attacker's MAC address. This way all traffic between the two hosts is switched onto the attacker's port on the switch, effectively putting us (the attacker) in the middle.

4.3 Packet Sniffing

A network packet analyzer, e.g. Wireshark15, can be used to capture network packets and analyze them for future attacks. In the case of OPC traffic, communication between the server- and client-side tunnellers is by default not encrypted. Information about the OPC server, and about groups and items added or updated in the OPC server, can hence be read. A Man-in-the-Middle attack can be used due to the lack of access control mechanisms: an ARP spoofing attack allows an attacker to spoof the MAC address in order to sniff, modify, redirect or stop the traffic to the IP address of the target system.

By entering the signalling path we were able to read the packets sent between the client and the server. By monitoring the authentication process of the client we were able to read in clear text the user name and machine name of the client and the machine name of the server. The client response to the server challenge includes both the NTLM version 1 response and the LM response. There is no need for both responses, which only further reduces security.
Both of these protocols use DES, which is known to be a weak encryption algorithm. NTLM version 2 has been accepted as a de facto Internet standard since 2000 [10], is widely used, and is deemed much more secure than NTLM version 1.

4.4 Denial-of-Service Attacks

We used SYN flooding and ARP spoofing to launch Denial-of-Service attacks. TCP uses a three-way handshake (SYN, SYN-ACK, ACK) to establish a session. In a SYN flood, the attacker sends many SYN packets but never sends the final ACK back to the server, or spoofs the source IP address of the SYN message so that the server sends the SYN-ACK to the false IP address and never receives the final ACK. ARP spoofing can stop the traffic to the spoofed target system. With the previously mentioned middleman status we performed a Denial-of-Service attack by simply dropping all the packets destined for both of the spoofed hosts, thereby acting as a black hole in the signalling path. As expected, this attack totally destroyed the communication between the test server and the client.
14 http://www.monkey.org/~dugsong/dsniff/
15 http://www.wireshark.org/
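The mechanism behind the SYN flood can be illustrated with a toy model of the server's half-open connection queue. This is a didactic sketch with invented capacity and timeout values, not a model of the actual Windows TCP stack:

```python
class SynBacklog:
    """Toy model of a TCP listener's half-open connection queue,
    illustrating why un-ACKed SYNs deny service (not a real TCP stack)."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity
        self.timeout = timeout
        self.half_open = {}          # source address -> time the SYN arrived

    def syn(self, src, now):
        # Expire half-open entries whose SYN-ACK was never acknowledged.
        self.half_open = {s: t for s, t in self.half_open.items()
                          if now - t < self.timeout}
        if len(self.half_open) >= self.capacity:
            return False             # backlog full: the SYN is dropped
        self.half_open[src] = now
        return True                  # server replies SYN-ACK, slot reserved

backlog = SynBacklog(capacity=128, timeout=60)
# Attacker floods spoofed SYNs; the fake sources never send the final ACK ...
for i in range(200):
    backlog.syn(f"10.66.0.{i}", now=0)
# ... so a legitimate client arriving shortly afterwards is refused,
assert backlog.syn("legit-client", now=1) is False
# but once the half-open entries time out, service recovers.
assert backlog.syn("legit-client", now=61) is True
```

The model also shows why the attack is hard to attribute: from the server's point of view, each spoofed SYN is indistinguishable from a genuine connection attempt.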
A second successful Denial-of-Service attack was performed as a SYN flood run by a single attacker. On the test server, this attack not only disabled incoming communication from a client, but also slowed down the system so much that the server was practically unresponsive during the attack. The effect lasted until about one minute after the attack was called off. As this attack used fake source addresses in the packets, it can potentially be very difficult to distinguish from genuine heavy load. As seen, both the black-hole and the SYN-flood attacks were able to destroy the availability of the test system; the SYN flood even resulted in lost local availability.

4.5 Man-in-the-Middle Attack

A tool was written to do string replacement and forwarding of packets in the ARP-spoofed network setup between the OPC client and the OPC test server. As the recommended setup of DCOM did not seem to contain any packet integrity mechanisms, we expected this attack to be able to compromise the integrity of the entire OPC network setup.
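The core of such a rewriting tool can be sketched as follows. Note that DCOM actually marshals strings as length-prefixed UTF-16, so a working tool must also keep the length fields consistent; this hypothetical byte-level sketch only illustrates the direction-dependent substitution:

```python
def rewrite_payload(payload: bytes, upstream: bool) -> bytes:
    """Direction-dependent substitution in the spirit of our MITM tool:
    swap 'Simulation' and 'Real' depending on which way the packet flows.

    Sketch only: real DCOM strings are length-prefixed UTF-16, so an
    actual tool must also patch lengths when the replacement differs
    in size from the original.
    """
    if upstream:   # client -> server: undo the lie before forwarding
        return payload.replace(b"Real", b"Simulation")
    else:          # server -> client: present simulated data as real
        return payload.replace(b"Simulation", b"Real")

# The client is shown "Real Items" although the server said "Simulation Items",
assert rewrite_payload(b"Branch: Simulation Items", upstream=False) == b"Branch: Real Items"
# and references to "Real" coming back are mapped to "Simulation" again.
assert rewrite_payload(b"Open: Real Items", upstream=True) == b"Open: Simulation Items"
```

Because the substitution is symmetric, neither endpoint ever sees an inconsistent name, which is what makes the session in Table 2 look genuine.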
Fig. 2. Man-in-the-middle attack on OPC; packets between server and client are modified
The tool we wrote simply replaced the string "Simulation" as sent from the OPC server A with the string "Real" and forwarded the modified packets to the OPC client B. Likewise the tool changed the string "Real" in the packets from the client to the server into the string "Simulation" and again forwarded these packets. Our tool was written as a proof-of-concept in the sense that the strings replaced were arbitrary. Any value and/or string can be replaced in the packets in the same way. The same tool can also be easily modified to shield the client from the server and in effect hijack the session. The resulting setup is described in Fig. 2.

Table 2. Excerpt from printouts from OPC client from two different OPC sessions

Without packet modification:
2007-12-05 14:00:05,061 [main] DEBUG org.openscada.opc.lib.da.browser.BaseBrowser – Browsing with a batch size of 10
Leaf: Clients [Clients]
Branch: Simulation Items
Branch: Bucket Brigade
Leaf: ArrayOfReal8 [Bucket Brigade.ArrayOfReal8]
Leaf: ArrayOfString [Bucket Brigade.ArrayOfString]

Man-in-the-middle attack:
2007-12-05 14:02:25,345 [main] DEBUG org.openscada.opc.lib.da.browser.BaseBrowser – Browsing with a batch size of 10
Leaf: Clients [Clients]
Branch: Real Items
Branch: Bucket Brigade
Leaf: ArrayOfReal8 [Bucket Brigade.ArrayOfReal8]
Leaf: ArrayOfString [Bucket Brigade.ArrayOfString]
As we can see from the sampled output in Table 2, we have a situation that to the eye looks genuine, but in reality is completely false. With the Man-in-the-Middle attack we were able to gain complete control of the system with the user privileges and access rights of the user in session. We were able to read, drop, change, and create packets as we saw fit in both directions, to the server and to the client.

4.6 Configuration Errors

In the process of setting up the simulation network, the blue team installed an OPC Tunneller. As we connected to this OPC Tunneller, there were no access control mechanisms in place, so we had complete access to the OPC server in the simulation network. The blue team also gave us instructions on how to set up access control on the OPC client, but these instructions led to the setup screen for the Windows XP DCOM run-as-user configuration management. We assume the instructions for the server setup were misunderstood, such that the blue team in reality had set their OPC Tunneller to run as a specific user on the server, rather than setting the access control credentials needed for a client to log onto it. We can see that a small misunderstanding in the setup of the server allowed the red team, and in reality anyone, to take complete control of the simulation OPC server while the blue team was certain that access control was in place.
5 Discussion

Process control networks in the oil and gas industry have traditionally had a very strong focus on safety, but safety-related risks are quite different from security-related risks. One way of differentiating between security and safety is to say that safety consists of protecting the world from the system, while security is protecting the system from the world [11]. Another aspect is that safety-related incidents normally have clearly defined statistical properties that relate to empirically established values for, e.g., Mean Time Between Failures (MTBF) for components and systems. The same cannot be said for security-related incidents, since they predominantly involve a human attacker – and we have yet to develop a useful statistical model for Homo sapiens.

In the early days of the Internet, there was a Usenet newsgroup known as alt.folklore.urban16, devoted to the topic of urban legends. A recurring joke (or mantra) in this newsgroup was: "It could have happened, so it must be true." Ludicrous as this statement may be in "real life," this is exactly how we must think in the realm of computer security. The professional paranoia this reflects also testifies to a distinct difference in mindset compared to the safety domain. Our main contribution in this paper is to highlight the "could have happened" aspect of computer security in the Integrated Operations context. We have no empirical evidence that the specific attacks in our laboratory tests have been successfully wielded in a real PCS environment, although there are certainly news reports of other attacks against PCSs in the past [12].
In our contact with representatives from the Norwegian oil & gas industry, we have been confronted with the sentiment "We don't care what they say they can do over in the States; unless you can show us something that is directly applicable to our setup, we're not interested." This tells us that there is a need for full-scale penetration testing activities – and our tests indicate that such activities would yield interesting results.

The most conclusive security breach that we performed was achieved due to a configuration error, which may at first glance seem an insignificant result. However, it is clear that the human factor plays a major role in the defence of systems as well as in attacks on them, and the safety concept of "fail safe" should clearly be applied to security settings as well.

It may be argued that the other attacks successfully implemented in our tests also have limited value, since they all require physical access to the communications medium at some level, and these networks are all protected by multiple layers of firewalls and other protective mechanisms [5]. However, previous breaches tell us that not only is the insider threat an important factor in PCS environments [12], but the increasing integration of offshore and land-based networks also means that the attack surface is growing. This, in turn, implies that it is important to follow the doctrine of defence in depth, ensuring that there are sufficient (and independent) security mechanisms every step of the way from the external barrier to the heart of the PCS. Clearly, a protocol that uses outdated cryptography for authentication and transmits data in plain text is not a good match for this doctrine. It is our opinion that the OPC implementation should implement NTLM version 2 or other more secure authentication protocols, such as Kerberos.

16 It's still out there – see http://tafkac.org/
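A back-of-envelope calculation illustrates why the LM response in particular weakens authentication. The charset sizes below are rough approximations, not exact figures:

```python
# LM hashes the password after uppercasing it, padding it to 14 bytes and
# splitting it into two 7-byte halves, each used *independently* as a DES key.
# An attacker therefore brute-forces two 7-character upper-case-only strings
# instead of one 14-character mixed-case one.
FULL = 95        # ~printable ASCII: mixed-case letters, digits, symbols
UPPER_ONLY = 69  # ~LM charset: printable ASCII minus lower-case letters

full_space = FULL ** 14            # naive 14-character search space
lm_space = 2 * (UPPER_ONLY ** 7)   # two independent 7-character halves

# The LM construction cuts the work factor by many orders of magnitude.
assert lm_space < full_space // 10**13
```

This is why the mere presence of the LM response alongside the NTLMv1 response undermines whatever strength the rest of the exchange might have.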
6 Conclusion

We have shown that confidentiality in OPC networks is non-existent in the default setup as recommended by CERN. Furthermore, the authentication process of the client reveals confidential information in clear text and provides only weak encryption of the client password. We have seen that DoS attacks can easily be accomplished, not only making the server unavailable over the network, but also causing denial of local control over the server. We have demonstrated a breach of the integrity of the OPC server that to the eye looks perfect, as a consequence of the lack of DCOM packet integrity measures in the setup recommended by CERN, and we have demonstrated how fragile an OPC network is in the face of configuration errors.
Acknowledgements The research presented in this paper was performed as part of project assignments at the Norwegian University of Science and Technology (NTNU).
References

1. OPC Overview 1.0, OPC Foundation (1998), http://www.opcfoundation.org/Downloads.aspx?CM=1&CN=KEY&CI=282 (accessed 2008-01-17)
2. Understanding OPC and How it is Deployed, Byres Research (2007), http://csrp.inl.gov/Recommended_Practices.html (accessed 2008-01-17)
3. DCOM Technical Overview, Microsoft Developer Network (1996), http://msdn2.microsoft.com/en-us/library/ms809340.aspx (accessed 2008-01-17)
4. Cheah, Z.B., Faruk, A.B.M.O.: Identifying and Responding to External Threats in a PCS Network. Norwegian University of Science and Technology Project Assignment, Trondheim (December 2007), http://sislab.no/blueteam.pdf
5. Grøtan, T.O., et al.: The SeSa Method for Assessing Secure Remote Access to Safety Instrumented Systems. SINTEF Report A1626, Trondheim (June 2007), http://www.sintef.no/content/page1_16321.aspx
6. Puget, M.B.J.-P., Barillere, R.: IT-CO Recommended DCOM Settings for OPC. CERN, Geneva (2005)
7. Carter, J., et al.: OPC Security. Digital Bond (2007)
8. Mora, L.: OPC Server Security Considerations. In: SCADA Security Scientific Symposium 2007, Miami, FL (2007)
9. Garnes, H.H., Wedum, P.: Innbruddstesting på prosesskontrollsystemer på oljeplattform [Penetration Testing of Process Control Systems on Oil Platforms]. Norwegian University of Science and Technology Project Assignment, Trondheim (December 2007)
10. Zorn, G.: RFC 2759: Microsoft PPP CHAP Extensions, Version 2. The Internet Society (2000)
11. Line, M.B., et al.: Safety vs. Security? In: Eighth International Conference on Probabilistic Safety Assessment and Management, New Orleans, USA (2006)
12. GAO-07-1036: Critical Infrastructure Protection: Multiple Efforts to Secure Control Systems Are Under Way, but Challenges Remain. United States Government Accountability Office (2007), http://www.gao.gov/htext/d071036.html
Intersection Location Service for Vehicular Ad Hoc Networks with Cars in Manhattan Style Movement Patterns Yao-Jen Chang and Shang-Yao Wu Department of Electronics Engineering Chung Yuan Christian University Taoyuan, Taiwan 320 {yjchang,g9476025}@cycu.edu.tw
Abstract. For inter-vehicular communications that happen in metropolitan areas, moving cars travel in restricted directions along streets and make frequent stops at intersections. In this paper, we present the Intersection Location Service (ILS), a distributed hashing-based location service algorithm that makes use of the features of street intersections and the Chord algorithm as the location query and fault tolerance mechanism. Performance comparisons of ILS with two well-known location service algorithms, Grid Location Service (GLS) and Hierarchical Location Service (HLS), are demonstrated in ns-2 simulations of moving cars in various city environments. We show by means of simulation that ILS achieves good results in terms of query success ratios and remarkable scalability with respect to network size.
1 Introduction We assume that the nodes are able to determine their own position, e.g. by means of GPS. The task of locating the destination in a communication session is then fulfilled by a location service, which employs a distributed algorithm maintained by all the participating nodes in the Vehicular Ad Hoc Network (VANET). Most research on location services has focused on nodes whose motions follow random waypoint patterns. However, we investigate a protocol that introduces novel mechanisms to exploit the features of the urban scenario affecting vehicular mobility, such as the presence of intersections. For car-to-car communications that occur in metropolitan areas, moving cars obviously travel in restricted directions along streets and make frequent stops at intersections to gain right of way according to local traffic rules. Several location service protocols, including DREAM [10], Homezone [11], and GHLS [8], have been proposed in the literature, exploiting different approaches for tracking the nodes' movements, such as flooding-based, rendezvous-based, and hash-based approaches. The solution proposed in this paper, called Intersection Location Service, is a variant of the hash-based approach. The main difference from the other schemes proposed in the literature is that the ILS algorithm is tailored for a vehicular environment. It does not use flooding of information, and it includes fault-tolerant mechanisms. In the ILS scheme, each node is assigned a single location server, by hashing the node's unique identifier into an intersection identifier. Only vehicles moving in the intersections act as location servers, although every node is assumed to be capable of relaying packets. The duties of cars at temporarily empty intersections are transferred to cars at other nonempty intersections according to the proposed scheme. The location information cached by cars at a previously nonempty intersection is handled so that subsequent queries on such information can be directed to the cars at a certain nonempty intersection that takes over the responsibility as location server. The simulation analysis confirms the effectiveness of the ILS scheme in terms of query success ratios under multiple network scenarios. In addition, ILS performs better in street environments than two other well-known location service protocols (GLS [7] and HLS [5]) designed without taking street environments into account. The remainder of this paper is structured as follows: In Section 2 we give an overview of related work. The ILS model is described in detail in Section 3. In Section 4 we present how the Chord algorithm is applied in the protocol of ILS. Section 5 contains the results of the simulation with ns-2. A comparison with two selected location algorithms is also presented. The paper is concluded with a summary and future work in Section 6.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 284–296, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Related Work DREAM, the Distance Routing Effect Algorithm for Mobility [10], is a location service that uses flooding to spread position information. The capacity of the network is substantially decreased as a result of message flooding, especially in situations where the nodes make frequent position changes. A location algorithm that does not require flooding is Homezone [11]. Each node is assigned an area called a homezone via a hash function well known to all the nodes in the network. Location updates that reflect the position changes of nodes are sent to all the nodes in the designated homezone. Problems may arise when the homezone becomes empty, and some cars may get into situations where their default location servers are temporarily unavailable. The Grid Location Service (GLS) [7] divides the area of the entire ad hoc network into a hierarchy of squares forming a quad-tree. Each node selects one node in each element of every level of the quad-tree as a location server. Therefore, the density of location servers for a particular node is high in areas close to the node and becomes exponentially sparser as the distance to the node increases. GLS has demonstrated good scalability in the random waypoint mobility model. The Hierarchical Location Service (HLS) [5] partitions the area of the ad hoc network into cells. The cells are grouped into regions level by level. A node therefore uses a hierarchy of location servers to store its location information. The problem of empty cells that location updates or requests may run into is solved by temporary servers through a handover mechanism that bridges the gap. DREAM, Homezone, GHLS, GLS, and HLS are all designed with random waypoint motions in mind. These location services apply to MANETs where nodes can change directions at will; there are no fixed pathways for mobile nodes to follow.
However, there are increased needs to provide location services for MANETs that consist of vehicles moving along streets within metropolitan areas. ILS is proposed in this paper to address those needs. HLS and GLS, although not designed for street conditions, are selected for comparisons due to their good performance results for nodes in random waypoint motions.
3 Sketch of Intersection Location Service Protocol In this section, we briefly sketch the basic underlying idea of ILS, which will be detailed in the next section. In ILS, the street intersections covered by the network are numbered from 0 to N. In the ILS scheme, each node is assigned a single location server, by hashing the node's unique identifier into an intersection identifier. Cars within radius r of intersection M cache the location information of all the vehicles K for which M = Hash(K), where Hash is a hash function that distributes cars evenly among the intersections. In the ideal case where all the street intersections are nonempty, queries for the location of Car K, where Hash(K) = M, will be directed to Intersection M, the default location server for Car K. Due to the highly dynamic nature of VANETs, it is possible that some intersections are empty for certain periods of time. Without a fault-tolerance design, location queries directed to temporarily empty intersections would fail. A priori selection of backup location servers is of little use because any intersection that acts as a location server also suffers from becoming temporarily empty. ILS therefore uses the Chord algorithm [1,9] to locate a successor location server when the default intersection is empty. Once a location server is determined for a certain query, the query issued by a car is routed to the target location server with a position-based routing protocol like GPSR [4]. The position-based routing in ILS is done by the cars that serve as intermediate relays to overcome the multiple-hop delivery problems typical in MANETs.
4 Protocol of Intersection Location Service This section describes the ILS protocol, which specifies how to find the locations of cars moving along the streets and how the unavailability of intersections can be overcome with relatively small overhead. 4.1 Consistent Hashing The consistent hash function assigns each intersection an m-bit identifier using SHA-1 [2] as a base hash function. Vehicles moving within radius r of an intersection act as location servers for cars by caching their location information. Consistent hashing assigns Car K the default location server M if M = SHA1(K), where SHA1 stands for SHA-1. However, a location server M can be nonexistent in the 2^m identifier space, or M may go out of service because the intersection has no cars for a period of time and none of the location information can be cached in any car within radius r of the center of intersection M. Therefore, a successor location server for Car K, denoted successor(K), is chosen as the first location server whose identifier is equal to or follows SHA1(K). If identifiers are represented as an imaginary circle of
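The hashing and successor-selection steps can be sketched in Python. This is a minimal illustration, not the paper's implementation: the m = 3 identifier space matches the paper's running example, and the string car identifier is an assumption.

```python
import hashlib

M_BITS = 3                      # m-bit identifier space, as in the paper's examples
RING_SIZE = 1 << M_BITS

def default_server(car_id: str) -> int:
    """Hash a car's unique identifier into an intersection identifier
    using SHA-1, reduced modulo the 2^m identifier space."""
    digest = hashlib.sha1(car_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def successor(ident: int, active_servers) -> int:
    """First active location server clockwise from ident on the Chord ring,
    used when the default intersection is temporarily empty."""
    servers = sorted(active_servers)
    for s in servers:
        if s >= ident:
            return s
    return servers[0]           # wrap around the ring
```

For example, a car whose identifier hashes to 2 is served by intersection 3 when the active servers are {1, 3, 6, 7}, mirroring the Fig. 1 example.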
numbers from 0 to 2^m − 1, then successor(K) is the first node clockwise from M, where M = SHA1(K). The circle is referred to as the Chord ring [1,9]. As an illustrative example, Fig. 1 shows a Chord ring with m = 3. The Chord ring has 3 location servers and 4 cars. The successor of Car 2 is location server 3, so location server 3 will store the location information about Car 2 while location server 2 is out of service.
Fig. 1. A Chord ring with 4 nodes and 3 intersections where cars within radius r of the intersections act as location servers
4.2 Query for Car Location When a car requests location service to acquire the location of other cars before car-to-car communication can be set up, it first submits its query to a nearby intersection. ILS is based on the Chord protocol, which works by pruning the candidate location servers by half at each iteration; the binary search continues until the exact location server is determined. This is not without price: ILS maintains additional information to facilitate the binary search. The additional information is kept by each location server in a data structure called the finger table [1,9]. As before, let m be the number of bits in the location server identifiers. A location server M maintains a finger table with up to m entries. The i-th entry in the table at location server M contains the identity of the first node S that succeeds M by at least 2^(i−1) on the identifier circle, i.e., S = successor(M + 2^(i−1)), where 1 ≤ i ≤ m and all arithmetic is modulo 2^m. S is called the i-th finger of M and is denoted by M.finger[i]. A finger table entry includes both the identifier of the location server and its coordinates. Note that the first finger of M is the immediate successor of M on the circle and is referred to as the successor for convenience. Figure 2 is an example of the finger tables maintained by vehicles within radius r of nonempty intersections. For a network with N location servers, the finger table at each location server has O(log N) entries, relating to information about only a small number of other location servers. The Chord algorithm that finds a successor with the use of finger tables operates as follows.
Finger tables for the four location servers on the ring (m = 3):

  Server 1:  finger 1+2^0=2 → successor 3;  1+2^1=3 → 3;  1+2^2=5 → 6
  Server 3:  finger 3+2^0=4 → successor 6;  3+2^1=5 → 6;  3+2^2=7 → 7
  Server 6:  finger 6+2^0=7 → successor 7;  6+2^1=8 → 1;  6+2^2=10 → 3
  Server 7:  finger 7+2^0=8 → successor 1;  7+2^1=9 → 3;  7+2^2=11 → 3

Fig. 2. Finger tables as cached by vehicles within radius r of intersections
//For a location server M to find the successor of car K
M.find_successor(K) {
    if (K ∈ (M, successor])
        return successor;
    else {
        S = closest_preceding_finger(K);
        return S.find_successor(K);
    }
}
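The finger-table construction above can be sketched in Python under the same m = 3 assumptions. This is a simplified, centralized model used only to illustrate the arithmetic (real lookups route hop by hop through the servers):

```python
M_BITS = 3
RING = 1 << M_BITS

def successor(ident, servers):
    """First active location server at or clockwise after ident on the ring."""
    servers = sorted(servers)
    for s in servers:
        if s >= ident % RING:
            return s
    return servers[0]           # wrap around

def finger_table(node, servers):
    """The i-th finger of `node` is successor(node + 2^(i-1)) for i = 1..m."""
    return [successor((node + (1 << (i - 1))) % RING, servers)
            for i in range(1, M_BITS + 1)]
```

For the servers {1, 3, 6, 7}, finger_table(6) yields [7, 1, 3] and finger_table(1) yields [3, 3, 6], agreeing with the tables of Fig. 2.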
Fig. 3. A car looking up the location of Car 7 via the finger tables of location servers 1 and 6
vehicle at Intersection 7 that knows the location of Car 7 replies to the query, and the result gets back to the originating vehicle. Note that only O(log N) messages need to be generated in the query and reply process. 4.3 Dynamic Operation and Failure Recovery A fault-recovery mechanism based on the Chord algorithm is used to retrieve the location information of vehicles when the corresponding location servers go out of service. The location servers are informed about the arrival or departure of their successors or predecessors by periodically exchanging hello messages with them. Pinging nodes beyond successors or predecessors is not required. The Chord ring shrinks by one node upon the departure of the last location server belonging to a certain intersection. As a consequence, its successor and predecessor update their finger tables to reflect the new situation, which may have a ripple effect and trigger other location servers to perform their own table updates; the process continues until at some point all the finger tables in the Chord ring stabilize. A location server may enter an intersection n that went out of service earlier and join the network. Its status is sent to intersection n', where n' > n and n' is the closest such identifier to n. Finger tables of nodes in the Chord ring then undergo updates like ripples starting from the new location server until they stabilize. Therefore, ILS is capable of dealing with location servers joining and leaving the network.
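The hand-over when the last car leaves an intersection can be sketched as follows. This is a minimal model under stated assumptions: the dict-based cache stands in for the location information held by cars at an intersection, and the successor is found on a sorted ring.

```python
def handover_on_departure(caches, leaving, active_servers):
    """Transfer the location entries cached at a departing intersection to its
    successor on the Chord ring, then drop it from the active server set."""
    remaining = sorted(s for s in active_servers if s != leaving)
    succ = next((s for s in remaining if s >= leaving), remaining[0])
    caches.setdefault(succ, {}).update(caches.pop(leaving, {}))
    return remaining
```

With servers [1, 3, 6, 7] and server 3 departing, its cached entries move to server 6, so subsequent queries that would have reached 3 are answered by its successor.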
4.4 Location Update As cars travel, their positions change all the time. A threshold d is set so that whenever a car travels a distance greater than d, a location update is triggered. The update is processed by the intersection n nearest to car k. If the intersection cannot be reached in one hop, a geographic forwarding algorithm is invoked to forward the location update packet to n. Once the packet is received by a car, say k*, which is within the radius of n, car k* triggers the sequence to find k's successor, say p, by looking up its finger table. Within O(log N) hops, the packet arrives at intersection p, where the location update is performed by caching the information in the cars moving within radius r of intersection p.
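The distance-triggered update check is the only update logic the paper specifies; a sketch with a Euclidean distance test (the coordinate representation and the 100 m value, taken from the simulation parameters, are illustrative):

```python
import math

UPDATE_THRESHOLD_D = 100.0      # metres, as in the simulation parameters

def needs_update(last_reported, current):
    """Trigger a location update once the car has moved farther than the
    threshold d from the last position it reported to its location server."""
    dist = math.hypot(current[0] - last_reported[0],
                      current[1] - last_reported[1])
    return dist > UPDATE_THRESHOLD_D
```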
5 Simulation Results In this section, we evaluate the ILS algorithm by simulation, with comparisons to GLS and HLS. We selected GLS and HLS as benchmarks because both are scalable, well understood [3], and have simulation programs available [5, 12]. The discrete event simulator ns-2 [6], version 2.29, was used with the IEEE 802.11 11 Mbps MAC layer and a transmission range of 250 meters. GPSR [4] was used by ILS, GLS, and HLS as the transport mechanism for location queries and updates. 5.1 Basic Simulations We started the experiments with the basic simulations on 5*5 imaginary streets in a square area with a total of 25 intersections, as shown in Figure 4. The distance between adjacent streets is 200 meters. The streets at the borders have margins of 50 meters. The cars are assumed to be randomly placed at the intersections initially. When cars reach intersections, they pause for 5 seconds and then decide whether to continue straight ahead,
Fig. 4. The topology used in the basic simulation with 25 intersections and 5*5 streets in a square of 900*900 m²
make a right turn, or make a left turn, with equal probability. After the pause, the cars also select new speeds according to a uniform distribution, and they maintain their speeds and courses until the next intersection is reached. If a car hits the border of the simulation area, it has fewer choices of directions; U-turns are not permitted. For the simulation of HLS, we set the cells to be squares with a side length of 112.5 meters; 4 cells are grouped into regions of level one, 4 level-one regions form 1 level-two region with 16 cells, and so on. For the simulation of GLS, we also set the length of order-one squares to be 112.5 meters. Therefore, in the area of size 900*900 m², there are 64 order-one squares that are hierarchically grouped into 16 order-two squares, 4 order-three squares, or 1 order-four square. The car density influences the performance of position-based routing. Therefore, we set the value to be comparable to the ones used in the previous studies [4,5]. In order to understand the impact of car density on the three location service algorithms, we also performed simulations with a wide range of car densities. Table 1 contains the parameters used for the basic simulations. Table 1. Parameters for the study of car speeds
Number of cars: 80
Area size (m²): 900x900
Mean velocity (m/s): 10, 20, 30, 40
Velocity deviation (m/s): 5
Pause time (s): 5
Simulation time (s): 50
Update threshold (m): 100
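The Manhattan-style mobility rule described above can be sketched as follows. Interpreting the mean velocity and deviation as a uniform draw over mean ± deviation is an assumption consistent with, but not spelled out by, the parameter table:

```python
import random

def at_intersection(heading_deg, mean_speed, dev=5.0, pause_s=5.0):
    """At each intersection a car pauses, then continues straight or turns
    left or right with equal probability, at a freshly drawn speed."""
    turn = random.choice([0, 90, -90])       # straight, right turn, left turn
    speed = random.uniform(mean_speed - dev, mean_speed + dev)
    return (heading_deg + turn) % 360, speed, pause_s
```

Border handling (fewer turn choices, no U-turns) is omitted here for brevity.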
The performance evaluation is based on queries. A car sends a query, which is a location lookup for another car it wishes to set up communication with. The query success ratio (QSR) is the percentage of queries that have been successfully answered by the location service. A query is answered successfully if the location service determines the position of the target car with a precision of at least 250 m, namely the radio distance in our simulation. The results are shown in Figure 5. In the curves of Figure 5, ILS achieves success rates of 93% to 95%. This demonstrates that the ILS strategy, which keeps the location information at the intersections, and the Chord algorithm, which manages it, work well together even for fast-moving cars in a street environment. GLS performance varies between 86% and 89%, while HLS varies between 75% and 83%. The QSR of HLS is consistently lower than
Fig. 5. The percentage of successfully answered queries of ILS, HLS, and GLS given average velocities of 10, 20, 30, and 40 m/s
that of the other two algorithms and drops faster as speeds increase. The regions selected as location servers in HLS can become empty of cars and cause query or update failures; the situation gets worse as car speeds get higher. In contrast, ILS selects street intersections for location service, and the cars make frequent stops at those intersections. As long as there are cars at an intersection, the intersection functions as the location server for the cars it is responsible for according to the Chord protocol. The probability of intersections going out of location service is lower than that of out-of-service regions in HLS, which does not take advantage of intersections. ILS is also better than GLS because at higher velocities GLS suffers from frequent changes of location servers. Significant overhead can result from such frequent changes, which happen when cars move across the boundaries of squares in GLS. The next set of experiments studies the impact of car density on the query success rates of the three location services, where we vary the number of cars from 20 to 80 in the same area, and the car speed is uniformly distributed in the interval [35,45] m/s. The other parameters remain the same, as listed in Table 2. The result can be found in Figure 6. ILS again performs better than GLS or HLS. At lower car densities, failures of packet forwarding during location queries or updates happen at a higher frequency because it is more likely that no cars can serve as relays in the middle of the query or update process. However, the intersections that ILS is based on still have a higher probability of finding cars that can either relay the query packets or hold the location information to be retrieved. Meanwhile, the Chord algorithm within ILS copes comparatively better through its successor mechanism in case the default street intersections turn empty for a certain period of time.
Therefore, ILS is more immune to the effect of low car density than HLS and GLS. Although the design of HLS uses neighboring cells to back up the location service, the neighbors themselves are likely to be empty at the same time because of low car density. On the contrary, GLS assigns at most four mobile nodes, one for each square of the same order to hold the same copy of location information for the purpose of redundancy; therefore, for certain scales of area sizes, it outperforms HLS in terms of query success rates at various car densities.
Table 2. Parameters for the study of car density

Number of nodes (cars): 20~80
Area size (m²): 900x900
Mean velocity (m/s): 40
Velocity deviation (m/s): 5
Pause time (s): 5
Simulation time (s): 50
Update threshold (m): 100
In the next set of simulations, we study the performance of the three algorithms when car positions experience a higher degree of randomness. We do so by changing the pause time at intersections to be uniformly distributed in the interval [0,5] s instead of a constant 5 s. In addition, the initial position of each car is set so that its distance to the nearest intersection is uniformly distributed in [0,100] m instead of a constant 0 m. We vary the number of cars from 20 to 80 and measure the query success ratio given the motion pattern described here. All the simulation parameters are the same as those in Table 2 except the pause time.
Fig. 6. The percentage of successfully answered queries of ILS, HLS, and GLS given speeds uniformly distributed in [35,45] m/s and car density varied from 20 to 80 nodes in an area of size 900*900 m²
Fig. 7. The percentage of successfully answered queries of ILS, HLS, and GLS given the car density varied from 20 to 80 nodes in an area of size 900*900 m²
The result is shown in Figure 7. In comparison to Figure 6, we see HLS and GLS experience a QSR improvement of 3% to 4% for the number of cars ranging from 30 to 80 in the area of size 900*900 m². The improvement goes up to 8% to 10% when the number of cars is as low as 20. Meanwhile, ILS shows an insignificant performance improvement as the degree of position randomness increases. Therefore, we see the performance of HLS become comparable to that of ILS. Since empty cells in HLS or empty squares in GLS degrade their query performance, the higher position randomness translates to smaller numbers of empty cells or squares and consequently higher success ratios. By contrast, the probability of an intersection becoming empty depends more on the number of cars that stop at it than on the mobility model. Therefore, the curves of ILS in Figure 6 and Figure 7 are nearly identical.

Table 3. Parameters for the large scale simulations

Number of nodes: 80, 180, 320
Area size (m²): 900x900, 1350*1350, 1800*1800
Mean velocity (m/s): 40
Velocity deviation (m/s): 5
Pause time (s): [0,5]
Simulation time (s): 50
Update threshold (m): 100
5.2 Large Scale Simulations We are interested in the scalability of ILS with respect to network size. To evaluate it, we increase the size of the area while maintaining a constant car density of 99 per square km, namely 80 cars in an area of 900*900 m². The parameters of the large scale simulations are presented in Table 3. We increase the size of the area from 900*900 m² to 1350*1350 m² and 1800*1800 m². Meanwhile, the number of cars also increases from 80 to 180 and 320 to keep the density unchanged. For ILS, the number of intersections increases from 25 to 64 and 100 as the size of the area increases. Both the size of cells in HLS and the size of order-one squares in GLS increase from 112.5*112.5 m² to 168.75*168.75 m² and 225*225 m². Therefore, there are 8*8 order-one squares in the simulation of GLS and 8*8 cells in the simulation of HLS. The mobility model is the same as the one used in Figure 7. The result can be found in Figure 8. For HLS and GLS, the scalability observed in the random waypoint mobility model worsens significantly in the street environment here. Since the cars can only move along streets and make stops only at intersections, and the choice of directions is limited to at most three, HLS and GLS can easily find themselves with certain empty regions. The larger the network is, the greater the number of such "holes" can be. ILS shows greater scalability with respect to network size. The probability of intersections becoming empty depends more on car density than on the area size. However, as the area size increases, a successful query involves more relaying intersections, namely more hops. The success ratio becomes more susceptible to any empty intersections that may exist. On the other hand, the recovery mechanism based on the Chord protocol attempts to find successor intersections to bypass the out-of-service intersections.
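The constant-density scaling can be checked with quick arithmetic over the three configurations from Table 3 (cars per square kilometre for each area):

```python
# (number of cars, side length of the square area in metres), as in Table 3
configs = [(80, 900), (180, 1350), (320, 1800)]

# density in cars per square kilometre for each configuration
densities = [cars / (side / 1000.0) ** 2 for cars, side in configs]
```

All three configurations come out at roughly 99 cars per square kilometre, confirming that the density is held constant as the network grows.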
Fig. 8. The percentage of successfully answered queries of ILS, HLS, and GLS given the area size varying from 900*900 m² to 1350*1350 m² and 1800*1800 m²
6 Conclusions In this paper, we have studied the performance of ILS, a location service algorithm designed to work in city environments with streets and intersections. A
fault-recovery mechanism based on the Chord algorithm is used to retrieve the location information of vehicles when the corresponding location servers go out of service. The simulation analysis compares the performance of the proposed scheme with two other location-based algorithms (GLS and HLS). The simulation results confirm the effectiveness of the ILS scheme in terms of increased success ratios under different network topologies. We also found that the ILS algorithm not only remains robust at high car speeds but also demonstrates better scalability than the other two with respect to network size.
References
1. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In: Proc. ACM SIGCOMM 2001, San Diego, CA (2001)
2. Secure Hash Standard, U.S. Dept. Commerce/NIST, National Technical Information Service, Springfield, VA, FIPS 180-1 (1995)
3. Mauve, M., Widmer, J., Hartenstein, H.: A Survey on Position-based Routing in Mobile Ad Hoc Networks. IEEE Network, 30–39 (November/December 2001)
4. Karp, B., Kung, H.T.: GPSR: Greedy Perimeter Stateless Routing for Wireless Networks. In: Proceedings of the Sixth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), pp. 243–254 (2000)
5. Kieß, W., Füßler, H., Widmer, J.: Hierarchical Location Service for Mobile Ad-hoc Networks. ACM SIGMOBILE Mobile Computing and Communications Review 8, 47–58 (2004)
6. The ns-2 network simulator, http://www.isi.edu/nsnam/ns/
7. Li, J., Jannotti, J., De Couto, D.S.J., Karger, D.R., Morris, R.: A Scalable Location Service for Geographic Ad Hoc Routing. In: Proc. 6th Annual ACM/IEEE Int'l Conf. Mobile Computing and Networking, Boston, MA, pp. 120–130 (2000)
8. Das, S.M., Pucha, H., Hu, Y.C.: Performance Comparison of Scalable Location Services for Geographic Ad Hoc Routing. In: Proc. of IEEE INFOCOM, pp. 1228–1239 (2005)
9. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M.F., Dabek, F., Balakrishnan, H.: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking 11(1), 17–32 (2003)
10. Basagni, S., Chlamtac, I., Syrotiuk, V.R., Woodward, B.A.: A Distance Routing Effect Algorithm for Mobility (DREAM). In: Proc. Fourth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), pp. 76–84 (1998)
11. Giordano, S., Hamdi, M.: Mobility Management: The Virtual Home Region. Technical Report SSC/1999/037, EPFL-ICA (1999)
12. Käsemann, M., Hartenstein, H., Füßler, H., Mauve, M.: Analysis of a Location Service for Position-Based Routing in Mobile Ad Hoc Networks. In: Proceedings of the 1st German Workshop on Mobile Ad-hoc Networking (WMAN 2002), GI – Lecture Notes in Informatics, pp. 121–133 (2002)
Ubiquitous and Robust Text-Independent Speaker Recognition for Home Automation Digital Life Jhing-Fa Wang, Ta-Wen Kuan, Jia-chang Wang, and Gaung-Hui Gu Department of Electrical Engineering, National Cheng-Kung University No.1, Dasyue Rd., East District, Tainan City 701, Taiwan, R.O.C. [email protected]
Abstract. This paper presents a ubiquitous and robust text-independent speaker recognition architecture for home automation digital life. In this architecture, a multiple-microphone configuration is adopted to receive the pervasive speech signals. The multi-channel speech signals are then added together with a mixer. In a ubiquitous computing environment, the received speech signal is usually heavily corrupted by background noise. An SNR-aware subspace speech enhancement approach is used as a pre-processing step to enhance the mixed signal. For text-independent speaker recognition, this paper applies a multi-class support vector machine (SVM) [10][11] instead of conventional Gaussian mixture models (GMMs) [12]. In our experiments, the speaker recognition rate reaches 97.2% on average with the proposed ubiquitous speaker recognition architecture.
1 Introduction This paper adopts a multi-class SVM as the classifier for speaker recognition. The basic idea of SVM is to create a classifier from a separating hyperplane that maximizes the margin between two classes. A significant advantage of the SVM-based classification method is its robustness to small numbers of training samples. With the kernel trick, SVM can deal with very high-dimensional feature data without altering its formulation. In this paper, three different kernel functions (linear, polynomial, and radial basis) with various parameter settings [7] are assessed. Three types of experimental evaluations were conducted: (i) training and testing with individual microphones; (ii) training and testing with the same microphone; and (iii) training and testing with multiple microphones. Instead of using a popular speech database such as the NIST telephony speech database, all the speech utterances used in this paper are live recorded. The objective of generating a live-recorded speech database is to simulate a home-like environment with utterances from up to 12 family members, which are then used for speaker identification. This paper is organized as follows. The proposed ubiquitous speaker recognition architecture is presented in Section 2. The architecture comprises the SNR-aware subspace speech enhancement module, the feature extraction module, and the SVM
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 297–310, 2008. © Springer-Verlag Berlin Heidelberg 2008
module. The experimental setting and results are described in Section 3. Finally, concluding remarks are given in Section 4.
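The multi-class decision rule mentioned in the introduction can be illustrated with a one-vs-rest linear SVM sketch: each speaker model contributes a hyperplane (w, b), and the predicted speaker is the one with the largest decision value. The weights below are hypothetical hand-set values, not trained models, and the linear kernel is only one of the three kernels the paper evaluates:

```python
def predict_speaker(feature, models):
    """One-vs-rest multi-class SVM decision: score each speaker model
    (w, b) as w . x + b and return the speaker with the highest score."""
    def score(w, b):
        return sum(wi * xi for wi, xi in zip(w, feature)) + b
    return max(models, key=lambda name: score(*models[name]))
```

In practice the feature vectors would be the LPCC or MFCC features described in Section 2, and the (w, b) pairs would come from SVM training.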
2 The Proposed Architecture 2.1 Overview The proposed ubiquitous speaker recognition architecture is shown in Fig. 1, which is composed of four blocks: Mixer, Speech Signal Pre-processing, Feature Extraction, and Support Vector Machine. The first block mixes the input signals of six microphones into one signal, and the background noise is removed via speech enhancement. The second block extracts features from the input signal through LPCC or MFCC. The final block classifies the extracted features with the SVM method.
Fig. 1. Proposed ubiquitous speaker recognition architecture. Multiple microphone inputs are combined by a mixer and cleaned by SNR-aware subspace speech enhancement; speech signal pre-processing (end-point detection, pre-emphasis, frame blocking, Hamming windowing) feeds LPCC or MFCC feature extraction, whose output is classified by the support vector machine against the trained speaker models (Speaker Model 1 to Speaker Model N) with likelihood cross-validation to identify the speaker.
2.2 SNR-Aware Subspace Speech Enhancement The subspace-based signal enhancement proposed by Ephraim and Van Trees [5] seeks an optimal estimator that minimizes the signal distortion subject to the constraint that the residual noise falls below a preset threshold. Using the eigenvalue decomposition of the covariance matrix, it is shown that the decomposition of the vector space of the noisy signal into a signal subspace and a noise subspace can be obtained by applying the Karhunen-Loeve transform (KLT) to the noisy speech. The KLT components representing the signal subspace are modified by a gain function determined by the estimator, while the remaining KLT components, representing the noise subspace, are nulled. The enhanced signal is obtained from the inverse KLT of the modified components. In our previous work, a new subspace-based speech enhancement algorithm was presented [6]. The block diagram of our subspace-based enhancement is depicted in Fig. 2. The input noisy signal is first divided into critical-band time series by the wavelet analysis filter bank. For each critical band, individual subspace analysis is applied.
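A sketch of the per-component gain in the spirit of the Ephraim-Van Trees approach: KLT components whose eigenvalues exceed the estimated noise variance are treated as the signal subspace and attenuated by a Wiener-like gain, while the rest are nulled. This simplified gain is an illustration only; the constrained estimator in [5] and the SNR-aware variant in [6] are more involved:

```python
def klt_gains(eigenvalues, noise_var, mu=1.0):
    """Gain for each KLT component: Wiener-like attenuation for the signal
    subspace (eigenvalue above the noise variance), zero for the noise
    subspace. mu trades residual noise against signal distortion."""
    return [lam / (lam + mu * noise_var) if lam > noise_var else 0.0
            for lam in eigenvalues]
```

The enhanced frame would then be the inverse KLT of the component-wise product of these gains with the KLT coefficients of the noisy frame.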
Ubiquitous and Robust Text-Independent Speaker Recognition
Fig. 2. Block diagram of the speech enhancement algorithm
2.3 Feature Extraction

2.3.1 Diversity Coefficient Tuning for MFCC

The number of MFCC coefficients usually ranges from order 0 (12 coefficients plus log power) to order 2 (39 coefficients). This paper evaluates multiple MFCC orders experimentally to obtain the optimal speaker recognition performance. MFCCs are well-known speech features derived from a cepstral representation of the speech clip. The difference between the cepstrum and the mel-frequency cepstrum is that in the mel scale the frequency bands are positioned logarithmically, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands obtained directly from the FFT or DCT. MFCCs are commonly derived as follows: (i) take the Fourier transform of a windowed signal; (ii) map the log amplitudes of the resulting spectrum onto the mel scale, using triangular overlapping windows; (iii) take the discrete cosine transform of the list of mel log-amplitudes, as if it were a signal. The MFCCs are the amplitudes of the resulting spectrum. One way to capture temporal information is to use delta coefficients, which measure the change in the coefficients over time: the first-order (Δck) and second-order (ΔΔck) delta MFCCs. The feature vector used for recognition is typically a combination of these features [13][14].
X_k = (c_k, Δc_k, ΔΔc_k)^T

Fig. 3 below illustrates the general steps required to build this front end, which models the frequency response of the human ear.
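The delta terms stacked in the feature vector are commonly computed with a regression over neighboring frames, Δc_t = Σ_{n=1}^{N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{N} n²). This is the standard textbook formula, sketched here as an assumption about the front end rather than the paper's exact implementation:

```python
# Standard regression formula for delta coefficients (assumed, not quoted
# from the paper); edge frames are handled by clamping the index.

def delta(coeffs, N=2):
    """coeffs: per-frame values of one cepstral dimension; returns the deltas."""
    T = len(coeffs)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        num = 0.0
        for n in range(1, N + 1):
            right = coeffs[min(t + n, T - 1)]   # clamp at the right edge
            left = coeffs[max(t - n, 0)]        # clamp at the left edge
            num += n * (right - left)
        out.append(num / denom)
    return out

c = [1.0, 2.0, 3.0, 4.0, 5.0]
dc = delta(c)        # first-order deltas
ddc = delta(dc)      # second-order (delta-delta) coefficients
# the frame-t feature vector then stacks c[t], dc[t] and ddc[t]
```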
Fig. 3. Mel-Frequency cepstrum coefficients extraction procedure
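The mel scale underlying the triangular band-pass filters in Fig. 3 is commonly defined by O'Shaughnessy's formula, mel(f) = 2595 · log10(1 + f/700). A small sketch (the filter-bank size and frequency range below are illustrative, not the paper's):

```python
# Mel <-> Hz conversion and evenly mel-spaced filter centres; a common
# construction for the triangular filter bank, assumed rather than quoted.
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centres(fmin, fmax, n_filters):
    """Centre frequencies (Hz) of n_filters triangular filters, spaced
    uniformly on the mel scale between fmin and fmax."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

print(round(hz_to_mel(1000.0), 1))   # 1000 Hz maps to roughly 1000 mel
```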
J.-F. Wang et al.
2.3.2 Order Selection for LPCC

This paper selects and evaluates multiple LPCC orders P, computed from the pre-processed input speech signal, to find the optimal parameters for robust speaker recognition. The major advantage of LPCC is its lower computational cost compared with MFCC. LPCC extraction consists of three parts: autocorrelation, Durbin's recursion, and cepstrum coefficient conversion. The autocorrelation function is given by:

R(k) = Σ_{n=0}^{N−1} x(n) x(n+k)

In general, k is the order of the autocorrelation coefficient. After autocorrelation, we calculate the LPC (Linear Predictive Coding) coefficients [a_n]_{n=1}^{P} with Durbin's recursive algorithm. Combining LPC and cepstral analysis of a signal gives the benefits of both techniques and improves the accuracy of the extracted features. The basic idea of linear predictive cepstral coefficients (LPCC) is that, instead of taking the inverse Fourier transform of the logarithm of the spectrum, the cepstrum is derived from the LPC coefficients. Given the LPC coefficients [a_n]_{n=1}^{P}, the cepstral coefficients C[n] can be computed with a recursive formula, without computing the Fourier transform:

C[n] = a[n] + Σ_{k=1}^{n−1} (k/n) C[k] a[n−k],   1 ≤ n ≤ P
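The three stages named above (autocorrelation, Durbin's recursion, cepstrum conversion) can be sketched in a few lines. Variable names are ours and the sketch is illustrative, not the paper's implementation:

```python
# Hedged sketch of the LPCC pipeline: autocorrelation -> Levinson-Durbin
# recursion for LPC coefficients a[1..p] -> recursive LPC-to-cepstrum
# conversion C[n] = a[n] + sum_{k=1}^{n-1} (k/n) C[k] a[n-k].

def autocorr(x, p):
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(p + 1)]

def durbin(R, p):
    """Levinson-Durbin recursion: LPC coefficients from R[0..p]."""
    a = [0.0] * (p + 1)
    E = R[0]                                  # prediction error energy
    for i in range(1, p + 1):
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / E                           # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        E *= (1.0 - k * k)
    return a[1:]                              # a[1..p], zero-based list

def lpc_to_cepstrum(a, p):
    """Cepstral coefficients C[1..p] from LPC coefficients, no FFT needed."""
    C = [0.0] * (p + 1)
    for n in range(1, p + 1):
        C[n] = a[n - 1] + sum((k / n) * C[k] * a[n - k - 1] for k in range(1, n))
    return C[1:]
```

For n = 1 the recursion gives C[1] = a[1], and each further coefficient reuses the ones already computed, which is why LPCC is cheaper than the FFT-based cepstrum.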
where P is the LPCC order, C[n] are the cepstrum coefficients and a[n] are the LPC coefficients.

2.4 Tuning Parameters for SVM Kernel Functions
Three types of SVM kernel functions [1][2][3] are applied to classify the extracted features for speaker recognition; using the default parameters of the kernel functions in the experiments yields weak results. The ESVM method [7] uses C and γ as the control parameters of the SVM and adjusts both variables within the range 10^−6 to 10^6. This paper uses the same principle to tune the two variables of the SVM radial basis kernel function and applies them to the speaker recognition experiments. In our experiments, the optimal values of C and γ are 50 and 0.0005 for the radial basis kernel function, and 1 and 3 for the polynomial kernel function. The standard formulation of the SVM is briefly reviewed as follows:

min_{w,b,ξ}  (1/2) w^T w + C Σ_{i=1}^{l} ξ_i

subject to  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0
where w ∈ R^m is the weight vector of the training instances, b is a constant, C is a real-valued cost parameter, and ξ_i is a penalty (slack) variable. If φ(x_i) = x_i, the SVM finds a linear separating hyperplane with maximal margin. The SVM is called a nonlinear SVM when φ maps x_i into a higher-dimensional space. For the nonlinear SVM, the dimension of the vector w can be large or even infinite, so the primal problem is extremely difficult to solve directly. The general method is to use the following dual formulation:

min_α  (1/2) α^T Q α − e^T α

subject to  y^T α = 0,  0 ≤ α_i ≤ C,  i = 1, …, n

where C > 0 is the upper bound, the α_i are Lagrange multipliers whose magnitude is bounded by the parameter C, Q is an n×n positive semi-definite matrix with Q_ij ≡ y_i y_j k(x_i, x_j), and k(x_i, x_j) = φ(x_i)^T φ(x_j) is a kernel function. The dual formulation is easier to solve than the primal, because the number of variables equals the size n of the training dataset, which is smaller than the dimensionality of φ(x_i). It can be shown that if α is an optimal solution of the dual, then w = Σ_{i=1}^{n} α_i y_i φ(x_i) is the optimal solution of the primal.
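In the nonlinear case w = Σ_i α_i y_i φ(x_i) is never formed explicitly; the decision value is evaluated through the kernel as f(x) = Σ_i α_i y_i k(x_i, x) + b. A toy evaluation with made-up support vectors, multipliers, and bias (the γ value matches the radial basis setting quoted above, everything else is hypothetical):

```python
# Toy evaluation of an SVM decision function from a dual solution; all
# support vectors, alphas, labels and b below are invented for illustration.
import math

def rbf(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, support_vectors, alphas, labels, b, gamma):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b; sign gives the class."""
    return sum(a * y * rbf(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

svs = [[0.0, 0.0], [1.0, 1.0]]      # hypothetical support vectors
alphas = [1.0, 1.0]                 # must satisfy 0 <= alpha_i <= C
labels = [-1, +1]
print(decision([1.0, 1.0], svs, alphas, labels, b=0.0, gamma=0.0005))
```

A point near the positive support vector gets a positive decision value, a point near the negative one a negative value, without ever computing φ.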
Fig. 4. Linear separator vs. non-linear separator
If the training set cannot be separated linearly, we can map the training vectors x_i into a higher-dimensional space by the function φ; the SVM then finds a separating hyperplane with maximal margin in this higher-dimensional space. Here k(x_i, x_j) = φ(x_i)^T φ(x_j) is called the kernel function. Generally, four default kernel functions are defined (Table 1):

Table 1. Defined SVM kernel functions

Linear:                 k(x_i, x_j) = x_i^T x_j
Polynomial:             k(x_i, x_j) = (γ x_i^T x_j + 1)^d,  γ > 0
Radial basis function:  k(x_i, x_j) = exp(−γ ‖x_i − x_j‖²),  γ > 0
Sigmoid:                k(x_i, x_j) = tanh(γ x_i^T x_j + r)
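The four kernels of Table 1 written out in plain Python (illustrative; `gamma`, `d` and `r` correspond to the table's parameters, with default values chosen only for the demonstration):

```python
# The four SVM kernel functions of Table 1.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    return dot(u, v)

def polynomial(u, v, gamma=1.0, d=3):
    return (gamma * dot(u, v) + 1.0) ** d

def radial_basis(u, v, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))   # ||u - v||^2
    return math.exp(-gamma * sq_dist)

def sigmoid(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v))        # 11.0
print(polynomial(u, v))    # (11 + 1)^3 = 1728.0
```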
3 Experimental Results

3.1 Experimental Databases
Three speech databases are used for the experiments: (i) training and testing with individual microphones, (ii) training and testing with the same microphone, and (iii) multiple microphones. Instead of using popular speaker databases, e.g., the TALUNG database [8] of telephone data, the King database [9] of conversational speech, or radio broadcast speaker databases, novel diverse speaker databases are proposed in this paper. The goal of using diverse utterance databases generated from three types of microphone setups is to simulate a ubiquitous home environment for speaker recognition; considering this limitation, the number of test persons is kept at 12 or fewer. The individual microphone speech database is generated from 12 persons, each recorded with an individual microphone while reading a book named "tsmc DNA". On average 18 minutes of speech per person is divided into segments of 1 to 12 seconds to obtain the test samples; the first 5 and 10 seconds of each person's speech are used as training samples. The individual microphone databases exhibit strong discrimination between the utterances of different persons. The same microphone speech database is generated from 10 persons using the same microphone, each reading randomly chosen website articles; on average 300 seconds of speech per person is divided into segments of the same lengths to obtain the test samples, and the same prefixes as above are used as training samples. The multiple microphone experimental environment places six microphones on the laboratory ceiling, as shown in Fig. 5. The input speech signals of the six microphones are converted into one output signal by the mixer, following the process in Fig. 1.
This speech database is generated from each person reading a book with different article contents for 5 minutes while walking back and forth under the lab ceiling.
Fig. 5. Multiple microphones experimental environment (unit: meter)
3.2 Performance Evaluation
For the three types of microphone speaker recognition tests, the LPCC order is tuned from 12 to 20 in steps of 2 and the MFCC order from 0 to 2; the training utterances are 5 and 10 seconds long, and the test utterances are 1, 2, 3, 4, 6, 8, 10 and 12 seconds long. Both MFCC and LPCC features are applied to the three SVM kernel functions, and the SVM kernel parameters are adjusted to reach the optimal speaker recognition rate.

3.2.1 Individual Microphone Speaker Recognition Test

The individual microphone speaker recognition tests are shown in Fig. 6(a) to Fig. 6(c), which give the experimental results for different LPCC orders: orders 12 to 20 in steps of 2, applied to the three SVM kernel functions. The training samples are 5- and 10-second utterances and the test samples are 1-, 2-, 3-, 4-, 6-, 8-, 10- and 12-second utterances. Fig. 7(a) to Fig. 7(c) give the experimental results for different MFCC orders and coefficient counts, i.e., orders 0 to 2 (13 to 39 coefficients including log power), applied to the three types of SVM kernel functions with the same training and test sample lengths. The individual microphone test results show that the speaker recognition rate can reach 99.19% (right part of Fig. 6(c)) using a 10-second training utterance, 12-second test utterances, LPCC features, and the SVM radial basis kernel function.
3.2.1.1 LPCC Features with SVM Kernel Functions
Fig. 6. (a) Different LPCC orders with SVM linear kernel function
Fig. 6. (b) Different LPCC orders with SVM polynomial kernel function
Fig. 6. (c) Different LPCC orders with SVM radial basis kernel function
3.2.1.2 MFCC Features with SVM Kernel Functions
Fig. 7. (a) Different MFCC orders with SVM linear kernel function
Fig. 7. (b) Different MFCC orders with SVM polynomial kernel function
Fig. 7. (c) Different MFCC orders with SVM radial basis kernel function
3.2.2 The Same Microphone Speaker Recognition Test

The same microphone speaker recognition tests are shown in Fig. 8(a) to Fig. 8(c), which give the experimental results for different LPCC orders: orders 12 to 20 in steps of 2, applied to the three SVM kernel functions. The training samples are 5- and 10-second utterances and the test samples are 2-, 3-, 4-, 5- and 6-second utterances. Fig. 9(a) to Fig. 9(c) give the experimental results for different MFCC orders and coefficient counts, i.e., orders 0 to 2 (13 to 39 coefficients including log power), applied to the three types of SVM kernel functions with the same training and test sample lengths. The same microphone test results show that the speaker recognition rate can reach 97.44% (right part of Fig. 8(b)) using a 10-second training utterance, 6-second test utterances, 14th-order LPCC features, and the SVM polynomial kernel function.
3.2.2.1 LPCC Features with SVM Kernel Functions
Fig. 8. (a) Different LPCC orders with SVM linear kernel function
Fig. 8. (b) Different LPCC orders with SVM polynomial kernel function
Fig. 8. (c) Different LPCC orders with SVM radial basis kernel function
3.2.2.2 MFCC Features with SVM Kernel Functions
Fig. 9. (a) Different MFCC orders with SVM linear kernel function
Fig. 9. (b) Different MFCC orders with SVM polynomial kernel function
Fig. 9. (c) Different MFCC orders with SVM radial basis kernel function
3.2.3 Multiple Microphones Speaker Recognition Test

The multiple microphone speaker recognition tests are shown in Fig. 10(a) to Fig. 10(c), which give the performance for different LPCC orders: orders 12 to 20 in steps of 2, with and without speech enhancement, applied to the three types of SVM kernel functions. The training samples are 10-second utterances and the test samples are 2-, 3-, 4-, 5- and 6-second utterances. Fig. 11(a) and Fig. 11(b) give the experimental results for MFCC order 2, i.e., 39 coefficients including log power, with and without speech enhancement, applied to the three types of SVM kernel functions; the training samples are 5- and 10-second utterances and the test samples are 2-, 3-, 4-, 5- and 6-second utterances. The multiple microphone test results show that the speaker recognition rate can reach 97.2% (right part of Fig. 10(c)) using a 10-second training utterance, 6-second test utterances, 20th-order LPCC features, and the SVM radial basis kernel function with speech enhancement.
3.2.3.1 LPCC Features with SVM Kernel Functions
Fig. 10. (a) Different LPCC orders with SVM linear kernel function
Fig. 10. (b) Different LPCC orders with SVM polynomial kernel function
Fig. 10. (c) Different LPCC orders with SVM radial basis kernel function
3.2.3.2 MFCC Features with SVM Kernel Functions
Fig. 11. (a) Different MFCC orders and log power with three types of SVM kernel functions
Fig. 11. (b) Different MFCC orders and log power with three types of SVM kernel functions
4 Conclusions

This paper proposed a ubiquitous architecture for text-independent speaker recognition. The architecture applies speech enhancement to remove background noise and tunes the parameters of the three SVM kernel functions. Three microphone configurations are adopted to evaluate speaker recognition performance, and the experiments show that the proposed method reaches a high speaker recognition rate.
References

1. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)
2. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
3. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
4. Schölkopf, B., Mika, S., Burges, C., Knirsch, P., Müller, K.-R., Rätsch, G., Smola, A.: Input space vs. feature space in kernel-based methods. IEEE Transactions on Neural Networks 10(5), 1000–1017 (1999)
5. Ephraim, Y., Van Trees, H.L.: A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 3(4), 251–266 (1995)
6. Jia-Ching, W., Hsiao-Ping, L., Jhing-Fa, W., Chung-Hsien, Y.: Critical band subspace-based speech enhancement using SNR and auditory masking aware technique. IEICE Transactions on Information and Systems E90-D(7), 1055–1062 (2007)
7. Hui-Ling, H., Fang-Lin, C.: ESVM: Evolutionary support vector machine for automatic feature selection and classification of microarray data. BioSystems 90, 516–528 (2007)
8. Shung-Yung, L.: Efficient text independent speaker recognition with wavelet feature selection based multilayered neural network using supervised learning algorithm. Pattern Recognition 40, 3616–3620 (2007)
9. Shung-Yung, L.: Wavelet feature selection based neural networks with application to the text independent speaker identification. BioSystems 90, 516–528 (2007)
10. Vincent, W., Steve, R.: Speaker verification using sequence discriminant support vector machines. IEEE Transactions on Speech and Audio Processing 13(2) (March 2005)
11. Campbell, W.M., Campbell, J.P., Gleason, T.P., Reynolds, D.A., Shen, W.: Speaker verification using support vector machines and high-level features. IEEE Transactions on Audio, Speech, and Language Processing 15(7) (September 2007)
12. Burget, L., Matĕjka, P., Schwarz, P., Glembek, O., Cĕrnocký, J.H.: Analysis of feature extraction and channel compensation in a GMM speaker recognition system. IEEE Transactions on Audio, Speech, and Language Processing 15(7), 1979–1985 (2007)
13. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)
14. Huang, X., Acero, A., Hon, H.: Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice-Hall, Englewood Cliffs (2001)
Energy Efficient In-Network Phase RFID Data Filtering Scheme Dong-Sub Kim, Ali Kashif, Xue Ming, Jung-Hwan Kim, and Myong-Soon Park Department of Computer and Radio Communications Engineering, Korea University, Seoul, 136-713, Korea {gastone, ali, xueming, glorifiedjx, myongsp}@ilab.korea.ac.kr
Abstract. One of the limitations that prevent the proliferation of RFID technology is redundant data transmission within the network, usually caused by the unreliability of readers and by duplicate readings generated by adjacent readers. Such redundancies unnecessarily consume network resources and degrade the performance of an RFID installation. In this paper, we propose CLIF, an energy-efficient filtering scheme that detects and eliminates in-network redundant data. The simulation results show that CLIF significantly reduces the number of comparisons required for detecting duplicates while achieving a relatively high duplicate data elimination ratio by considering the locations of the readers. Consequently, CLIF removes a considerable amount of transmission from the network.
1 Introduction
RFID (Radio Frequency Identification) technology provides identification of objects using radio frequency signals. Unlike conventional techniques such as barcodes, RFID does not have a 'line of sight' limitation. Recent development of RFID technology is boosting its applications rapidly, while RFID devices such as tags are becoming increasingly smaller and cheaper. The Auto-ID Center [9] has proposed methods that could lower the cost to approximately 5 U.S. cents per RFID chip. Consequently, many industries, such as supply chain management, inventory, and military applications, have shown great demand for RFID technology.

1.1 Applications and Challenges
RFID technology is facilitating business and industry in various ways and is gradually being adopted in many application areas. In applications such as military equipment management and large-scale inventory systems, RFID installations have to be deployed over a wide area. Covering these large areas with RFID technology requires system infrastructure such as wired facilities or APs (Access Points). In the case of military equipment management systems, however, it will be
This work was supported by the Second Brain Korea 21 Project. Corresponding Author.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 311–322, 2008. c Springer-Verlag Berlin Heidelberg 2008
D.-S. Kim et al.
Fig. 1. Example of RFID System with Mobile RFID Readers
hard to implement such infrastructure, since the area is not defined and the presence and movement of enemies can be agile and fast. Therefore, implementing RFID systems without a defined infrastructure might be beneficial for this application. Implementation of infrastructure-less RFID systems can be facilitated by using a WSN (Wireless Sensor Network) together with mobile RFID readers. A mobile RFID reader is a handheld device with RFID antennas and wireless communication capability. Fig. 1 represents our RFID system architecture and shows the infrastructure-less nature of the system. When working with a WSN topology, given its characteristics, energy consumption is a critical point that needs to be considered. In RFID, moreover, readers are unreliable in nature and generate duplicate readings. The observed read rate (i.e., the ratio of the number of tags read to the actual number of tags in a reader's vicinity) in real-world RFID installations is 60-70% [8]; in other words, each reader produces many duplicate readings to achieve accuracy [11]. Moreover, readers are deployed densely, so they usually have overlapping read ranges, which also causes duplication in the network. The scarce resources of wireless sensor networks put constraints on the network lifetime, energy being the most critical one. Duplicate readings cause network congestion and transmission delay, since massive amounts of RFID data are transmitted over the limited bandwidth between readers in the wireless sensor network [1]. Such redundant transmission consumes a lot of energy. Filtering duplicate data can therefore save much of the network's energy and enhance its lifetime.

1.2 RFID Background
RFID installations usually consist of three components: readers, tags, and antennas. An RFID reader communicates with tags through its antenna using RF signals to produce a list of tag IDs. Each antenna has a pre-determined reading range and can read the information written on tags within that range. The unreliable nature of readers and overlapping reading vicinities among readers can cause duplicate data generation. [2] defines data redundancy in RFID systems, which can be divided into two parts: the data level and the reader level.

Fig. 2. Overlapped Reading Area of RFID Readers

• Redundancy at Data Level
When a reader interrogates a tag multiple times for accuracy, or reads multiple copies of tags attached to one object to reduce the missing rate, it generates duplicate readings.

• Redundancy at Reader Level
Usually numerous readers are installed densely to cover the whole subject area, and these readers may have reading vicinities that overlap with those of their neighbors. When two or more neighboring readers interrogate tags in the overlapped area, duplicate readings are generated, as shown in Fig. 2. For instance, readers R1, R2, R3 and R4 are redundant with each other, so they generate duplicate readings.

1.3 Motivation
Clusters with a dense deployment of readers generate a huge amount of duplicate data. The output stream of a reader usually contains many duplicate values, described above as redundancy at the data level. In a cluster, every reader sends its data to the cluster head, and because readers overlap, more duplication is added to the input stream of the cluster head's buffer, defined above as redundancy at the reader level. These duplicate data values cause network congestion and increase transmission delay, and they need to be removed within the network to save energy and enhance the network lifetime. Therefore, an energy-efficient filtering scheme is proposed in this paper that outperforms previous in-network duplicate data elimination approaches by considering the readers' locations, and is a notable enhancement in energy efficiency for RFID systems built on WSNs.
1.4 Contributions
In this paper, an energy-efficient RFID data filtering scheme named CLIF (Cluster-based In-network phase Filtering scheme) is proposed. In contrast to filtering approaches at the BS, CLIF filters duplicate data within the network. For a better understanding of in-network redundancy, we divide data redundancy into the following parts:

• Intra-Cluster Redundancy
When the member nodes want to send data to the BS, they send it to the CH first, and the CH filters the data to remove duplication before sending it via a route to the base station. Redundancy that occurs within the premises of a cluster is called intra-cluster redundancy. It includes:
- Redundancy at the data level
- Redundancy at the reader level

• Inter-Cluster Redundancy
Densely deployed readers have areas that overlap with those covered by neighboring readers, and after clusters are formed, readers of one cluster may have areas overlapping with readers of neighboring clusters; this is called redundancy at the cluster level. This kind of redundancy has not yet been considered by any previous in-network filtering approach.

CLIF provides a solution to both intra- and inter-cluster duplication and removes a considerable amount of network congestion caused by duplicate data values, and hence network delay. The main challenge for RFID data filtering in cluster-based networks is the elimination of redundant data generated by adjacent clusters. Our approach exploits the readers' locations to eliminate the duplicate data. Simulation shows that CLIF requires fewer comparisons for detecting duplicates and achieves a relatively high duplicate data elimination ratio compared to INPFM [1]. We also show that CLIF's processing time is much shorter than that of previous approaches. The rest of the paper is organized as follows: Section 2 gives related work on filtering algorithms. Section 3 is divided into two sub-sections: 3.1 describes the detection of duplicates and 3.2 presents the filtering algorithm in detail. The next section shows the simulation results of our approach, and lastly the conclusion of this paper is given in Section 4.
2 Related Works
Most of the literature [4], [5], [6] concentrates on data filtering at the base station or sink node. In these approaches, RFID readers read tags, the readings are sent to the BS via a route, and the BS filters the data. Such approaches are too heavy to adopt within the network, since our readers have very limited resources and executing these weighty algorithms on a reader can reduce the lifetime of the device. Moreover, being executed at the BS, they cannot reduce the amount of in-network transmission. One well-known algorithm running at the BS is SMURF, a sliding-window RFID filter proposed by Shawn R. Jeffery et al. [5] for the RFID middleware or BS. It computes an adaptive window size to detect duplication, and RFID data filtering is then done with a sliding window. The authors assume that adjacent readers can read the same object, so the proposed filter eliminates data having the same EPC value within a sliding window; however, the window size is a critical issue in SMURF, and a larger window yields more reliable data. [7] suggests a simple solution for the case of redundant readers: it chooses some of the readers to be active while the rest may be turned off. However, finding the optimal solution for redundant reader elimination, i.e., which readers to turn on, is an NP-hard problem [10]. [1] proposed an In-Network Phased Filtering Mechanism (INPFM). This mechanism works on every node; however, the nodes closer to the sink consume a lot of energy relaying the data of downstream clusters to the BS, so a node filters RFID data when the value 'k' is less than the number of hops between the node itself and the BS, and otherwise transmits the data to the next intermediary node directly. Filtering thus depends on the value of 'k': if k is high, a huge number of CHs are involved in filtering. Since we may know the probability of duplication by considering the readers' locations, we can apply the filtering procedure on only a few nodes. Moreover, [1] removes only reader-level duplication, and it performs unnecessary comparisons during filtering which could be avoided to save a lot of energy and network overhead.
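The sliding-window idea used by SMURF-style filters can be reduced to a minimal sketch: a reading (EPC, timestamp) is dropped as a duplicate if the same EPC was already accepted within the last `window` time units. Names and structure below are ours, not SMURF's implementation (which additionally adapts the window size):

```python
# Minimal fixed-window duplicate filter over an RFID reading stream;
# illustrative only, SMURF itself adapts the window size per tag.

def filter_duplicates(readings, window):
    """readings: list of (epc, timestamp) tuples in time order."""
    last_seen = {}                 # epc -> timestamp of last accepted reading
    accepted = []
    for epc, t in readings:
        if epc in last_seen and t - last_seen[epc] <= window:
            continue               # duplicate inside the window: discard
        last_seen[epc] = t
        accepted.append((epc, t))
    return accepted

stream = [("EPC1", 0), ("EPC1", 2), ("EPC2", 3), ("EPC1", 9)]
print(filter_duplicates(stream, window=5))
# ("EPC1", 2) is dropped; ("EPC1", 9) is far enough from the last accepted
# EPC1 reading at t=0 to be kept
```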
3
CLIF (Cluster-Based In-Network Phase Filtering Scheme)
For our network model, we apply a cluster-based wireless sensor network to an RFID installation with mobile RFID readers, where each reader acts as a node and the nodes are organized into clusters. We assume that standard sensor network protocols are followed. To detect and eliminate duplicate readings, we propose an energy-efficient filtering approach named CLIF, which is divided into two phases as given below.
– Phase 1: a duplicate detection scheme for inter- and intra-cluster duplication, explained in Section 3.1.
– Phase 2: an efficient in-network RFID data filtering method called RSF (Reverse Search Filtering), presented in Section 3.2.
3.1
Duplicate Detection Scheme
In the cluster-based RFID network, there exist two kinds of duplicate data: Intra-cluster Duplication (IRD) and Inter-cluster Duplication (IED). For IRD, every member sends its data to the CH (Cluster Head), and the CH filters the data. Through this procedure, the CH can eliminate redundancy at both the data level and the reader level.
316
D.-S. Kim et al.
Fig. 3. Duplicate Readings in Cluster-Based RFID System
A cluster-based network generates one more kind of duplication, IED, where readers of neighboring clusters can be redundant. In a dense network topology there can be immense redundancy at the cluster level, as shown in Fig. 3. This kind of duplication cannot be removed in the intra-cluster filtering phase, since a CH can only eliminate the duplicate readings of its own cluster. Therefore, IED is detected and eliminated by an upstream cluster while it forwards data to the base station. The main challenge of IED filtering is to find duplicate data among clusters efficiently. For this reason, we consider the readers' locations. To eliminate duplicate data in inter-cluster areas, we select an appropriate CH that has a chance of being redundant. A CH on the route path to the base station filters duplicate data, reducing the number of transmitted data packets. CLIF addresses intra- and inter-cluster filtering separately, as given below.
• Intra-Cluster Filtering Step: When the member nodes want to send data to the BS, they first send it to the CH, and the CH filters the data to remove duplications. After filtering, the CH sends the data to the base station via a route.
• Inter-Cluster Filtering Step: If a CH receives data from another cluster, it checks the source's cluster ID; if the source is a neighboring cluster, it performs the inter-cluster filtering step.
Duplicate Decision Phase: A node that receives data first checks whether it is a CH. If not, it forwards the data to its own cluster head. If it is a CH, it checks whether the source cluster ID belongs to a neighboring cluster or to itself. The address of a node can be used to
generate the cluster ID. This algorithm allows us to select only those CHs that have a probability of holding duplicate data.
Algorithm 1. Filtering Step Decision
filtering_step_decision (params: incoming data)
loop for incoming data
    if I am a CH then
        s_cid ← cluster ID of incoming data
        if s_cid is the same as my cluster ID then
            filtering step
        else
            search s_cid among my neighbor cluster IDs
            if found then
                filtering step
            else
                send the data along the route
            end if
        end if
    else
        send the data to the CH
    end if
end loop
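Algorithm 1 can be sketched in Python as follows. The `Node` and `Data` structures and their field names are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass

@dataclass
class Node:
    is_ch: bool                      # is this node a cluster head?
    cluster_id: int
    neighbor_cluster_ids: frozenset  # IDs of neighboring clusters

@dataclass
class Data:
    cluster_id: int                  # source cluster ID carried by the data

def filtering_step_decision(node, incoming):
    """Per Algorithm 1: decide, for each incoming item, whether this node
    filters it, forwards it along the route, or hands it to its CH."""
    decisions = []
    for data in incoming:
        if not node.is_ch:
            decisions.append("send to CH")   # members never filter themselves
        elif data.cluster_id == node.cluster_id:
            decisions.append("filter")       # intra-cluster filtering step
        elif data.cluster_id in node.neighbor_cluster_ids:
            decisions.append("filter")       # inter-cluster filtering step
        else:
            decisions.append("forward")      # not a neighbor: pass untouched
    return decisions

ch = Node(is_ch=True, cluster_id=1, neighbor_cluster_ids=frozenset({2, 3}))
print(filtering_step_decision(ch, [Data(1), Data(2), Data(7)]))
# -> ['filter', 'filter', 'forward']
```

Only CHs whose own or neighboring cluster ID matches the source run the filtering step, which is how CLIF restricts filtering to the few CHs likely to see duplicates.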
3.2
An Efficient RFID Data Filtering Method
First, we define duplication and the data model based on the EPC [11]. When an RFID reader reads a tag, the following triplet is generated:
RFID Triplet = (EPC Code, Reader ID, Timestamp)
Given two triplets, duplication occurs if the following three conditions are satisfied:
• Definition of Duplication
Triplet A = (EA, RA, TA); Triplet B = (EB, RB, TB)
1) The value of EA equals the value of EB.
2) RA and RB are the same or neighboring readers.
3) The difference between TA and TB is less than a constant T.
An RFID reader integrated with a sensor node eliminates duplicate tag data. One of the conditions of duplication is an identical EPC value. An EPC consists of four parts: a header, a general manager number, an object class, and a serial number. In this EPC structure, objects usually have different serial numbers. Although different objects can have the same serial number, this situation rarely occurs in the domain of RFID applications.
Algorithm 2. Filtering Method
tagdata { epc, reader_id, timestamp }
Filtering (params: tagdata) {
    hash key hk1 = hash(tagdata.epc.serial_number)
    search hk1 in the hash table
    if not found then
        insert (hk1, tagdata) into the hash table
    else
        if tagdata.timestamp − hk1.tagdata.timestamp < time_interval then
            if is_neighbor() then
                compare tagdata.epc.others and hk1.tagdata.epc.others
                if same then
                    drop tagdata as duplicate data
                end if
            end if
        end if
        update tagdata in the hash table using hk1
    end if
}
RSF (Reverse Search Filtering) is proposed to filter the data efficiently so that it can be forwarded quickly upstream. [1] discusses a similar filtering method: it compares the serial part of the EPC first, and then compares the other parts of the EPC, the reader ID, and the timestamp, respectively. However, comparing the other parts of the EPC right after the serial number is inefficient. RSF instead compares the reader ID or timestamp first, which is more efficient because, among EPCs with the same serial part, the probability of differing in the other EPC parts is lower than the probability of differing in reader ID or timestamp.
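A self-contained Python sketch of Algorithm 2 with the RSF comparison order. The class layout, the neighbor representation, and the value of the duplication window T are illustrative assumptions, not taken from the paper:

```python
TIME_INTERVAL = 1e-3  # duplication window T (seconds); value assumed

class RSFFilter:
    """Hash on the EPC serial part, then compare timestamp and reader ID
    before the rest of the EPC (the reverse search order of RSF)."""

    def __init__(self, neighbors):
        self.table = {}            # serial-hash -> last kept (epc_rest, reader, ts)
        self.neighbors = neighbors # set of reader-ID pairs considered adjacent

    def is_neighbor(self, r1, r2):
        return r1 == r2 or (r1, r2) in self.neighbors or (r2, r1) in self.neighbors

    def accept(self, serial, epc_rest, reader, ts):
        """Return True if the reading is kept; False if dropped as duplicate."""
        hk = hash(serial)
        prev = self.table.get(hk)
        if prev is not None:
            p_rest, p_reader, p_ts = prev
            # Cheap, high-discrimination checks first; EPC rest compared last.
            if ts - p_ts < TIME_INTERVAL and self.is_neighbor(reader, p_reader) \
                    and epc_rest == p_rest:
                self.table[hk] = (epc_rest, reader, ts)
                return False
        self.table[hk] = (epc_rest, reader, ts)
        return True

f = RSFFilter(neighbors={("R1", "R2")})
print(f.accept("S1", "H.GM.OC", "R1", 0.0))     # -> True  (first reading kept)
print(f.accept("S1", "H.GM.OC", "R2", 0.0005))  # -> False (neighbor re-read, dropped)
print(f.accept("S1", "H.GM.OC", "R1", 0.5))     # -> True  (outside window T, kept)
```

Because most collisions on the serial hash come from genuine re-reads, the timestamp and reader-ID checks usually resolve the decision before the remaining EPC fields are touched.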
4
Simulation Results
In this section, we evaluate the performance of our proposed algorithm. We calculate the amount of raw data, IRD, and IED in Section 4.2, and we compare the NC and DER of CLIF with those of INPFM [1].
• Definition 1. The Number of Comparisons (NC): We define the NC as the number of comparisons performed to examine the duplication of
data through in-network transmitters. For example, when data A is transmitted from a specific node to the sink and the comparison to examine the presence of duplication is performed at some CHs, say 3 times, during the transmission, the NC value is 3.
• Definition 2. Duplicate data Elimination Ratio (DER): The DER represents the ratio of duplicate data eliminated. 4.1
Simulation Settings
In this part, we consider a grid network topology; the simulation environment is given in Table 1. According to Table 1, the duplication ratio between two neighboring nodes is 10%, which means that if a node has 2 neighbors, they produce 4 duplicate readings. We consider a cluster-based network, and the number of clusters varies from 16 to 225. Each cluster has 9 members, and the top-right node of each cluster is selected as the Cluster Head (CH). The sink node is at the top-right corner of the network, as shown in Fig. 3. 4.2
Relation between the Number of Clusters and Duplicate Data
We now discuss the relation among raw data, IRD, and IED as the number of clusters changes. Suppose the number of nodes in each square cluster area is M² and the amount of data generated by one node is n. Further, suppose the number of clusters is C² and the duplication ratio between a node and one neighbor is dp. Then we can calculate both the total data and the data redundancy:
• Raw Data = n M² C²
• Duplicate data in the intra-cluster area (IRD) = 2 C² n dp M(M−1)
• Duplicate data in the inter-cluster area (IED) = 2 n dp M C(C−1)
Assigning our simulation parameters to the equations above, we obtain the graph shown in Fig. 4.
Table 1. Parameters used in simulation
Item                                    Value
Number of nodes                         144 ~ 2025
Reading range                           5 m
Distance between nodes                  8 m
Number of tags in each reading area     20
Duplication ratio per neighbor node     10%
Number of members in each cluster       9
Number of clusters                      16 ~ 225
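As a check, the three formulas above can be evaluated directly with the Table 1 parameters (M = 3, n = 20, dp = 0.1). This is a sketch of the arithmetic, not the authors' simulation code:

```python
# Raw data / IRD / IED from the formulas above, with Table 1 parameters:
# M^2 = 9 nodes per cluster, n = 20 readings per node, dp = 0.1.
M, n, dp = 3, 20, 0.1

def amounts(C):
    raw = n * M**2 * C**2                  # Raw Data = n M^2 C^2
    ird = 2 * C**2 * n * dp * M * (M - 1)  # intra-cluster duplicates
    ied = 2 * n * dp * M * C * (C - 1)     # inter-cluster duplicates
    return raw, ird, ied

for C in (4, 8, 15):                       # i.e., 16, 64, and 225 clusters
    raw, ird, ied = amounts(C)
    print(f"{C*C:4d} clusters: raw={raw:6d}  IRD={ird:7.1f}  IED={ied:7.1f}")
```

For 16 clusters this gives 2880 raw readings, 384 intra-cluster duplicates, and 144 inter-cluster duplicates; both duplicate counts grow with the number of clusters, matching the trend in Fig. 4.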
Fig. 4. Ratio of Raw data and IRD, IED data with varying number of Clusters
The number of nodes per cluster (M²) = 9; duplication probability (dp) = 0.1; number of readings at one node (n) = 20. In Fig. 4, as the number of clusters increases, IRD and IED also increase. In the next section, we vary the number of clusters from 4² to 15² to observe the NC and the DER compared to INPFM. 4.3
Evaluation of NC and DER
Fig. 5(a) shows that CLIF requires a smaller NC than INPFM as the number of clusters varies. We vary the value of 'k' from 1 to 4. In this simulation, as the number of clusters increases, the NC also increases. In other words, as the number of hops between nodes and the sink increases, the filtering efficiency of INPFM decreases. For a cluster size of 25 with k = 4, the NC of CLIF is 71% less than that of INPFM. A smaller NC does not by itself imply a better DER. Fig. 5(b) shows the DER as the number of clusters varies. Even though the NC of INPFM is 4 times greater than that of CLIF, INPFM has a lower DER, except when k is 1. Overall, CLIF offers a good trade-off between NC and DER, as it guarantees more than 50% DER while requiring a smaller NC than INPFM. 4.4
Filtering Time
To evaluate the performance of our RSF algorithm, we simulated one reader for 10 ms and set the constant T in the duplication condition to 1 ms. The processing time of CLIF is approximately 8–10% shorter than that of INPFM, as shown in Fig. 6. Furthermore, even when the data exceed 20,000 items, the processing time remains below 40 ms. On the other
(a) NC in varying number of clusters
(b) DER in varying number of clusters
Fig. 5. NC and DER in varying number of clusters
Fig. 6. Comparing Processing Time between INPFM and Proposed Filter
hand, INPFM takes 45 ms. Hence, CLIF outperforms INPFM: it processes duplicate data much faster and decreases the transmission delay.
5
Conclusion
Transmission of redundant data packets consumes an immense amount of energy and causes delay in the network. By eliminating duplicate data, we can save the network's precious resources. In this paper, we defined duplication in cluster-based networks, distinguishing two categories: (1) Intra-cluster Duplication (IRD) and (2) Inter-cluster Duplication (IED). IED, which occurs when readers of neighboring clusters overlap with each other, is considered for the first time in this paper. Moreover, we proposed an energy-efficient duplicate data filtering scheme, CLIF (Cluster-based In-network phase Filtering scheme). CLIF provides an efficient solution to both IRD and IED and consists of two steps: first, the detection of duplicate data within the network, and second, an algorithm named RSF (Reverse Search Filtering) that expeditiously removes the duplicate values. We evaluated our
approach with simulations showing that even though CLIF requires a smaller NC for detecting duplicate data, its DER is still higher than that of other in-network data filtering schemes.
References
[1] Choi, W., Park, M.-S.: In-Network Phased Filtering Mechanism for a Large-Scale RFID Inventory Application. In: 4th International Conference on IT & Applications (ICITA 2007), Harbin, China, January 15–16, vol. 2, pp. 401–405 (2007)
[2] Derakhshan, R., Orlowska, M., Li, X.: RFID Data Management: Challenges and Opportunities. In: IEEE First International Conference on RFID, March 27–28 (2007)
[3] Bohn, J., Mattern, F.: Super-Distributed RFID Tag Infrastructures. Technical Report, Institute of Pervasive Computing, ETH Zurich (2004)
[4] Wang, F., Liu, P.: Temporal Management of RFID Data. In: Proceedings of the 31st Very Large Data Bases Conference (VLDB 2005), August 2005, pp. 1128–1139 (2005)
[5] Jeffery, S.R., Garofalakis, M., Franklin, M.J.: Adaptive Cleaning for RFID Data Streams. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), September 12–15, pp. 163–174 (2006)
[6] Bai, Y., Wang, F., Liu, P.: Efficiently Filtering RFID Data Streams. In: First International VLDB Workshop on Clean Databases (CleanDB), September 11 (2006)
[7] Carbunar, B., Ramanathan, M., Koyuturk, M., Hoffmann, C., Grama, A.: Redundant Reader Elimination in RFID Systems. In: Sensor and Ad Hoc Communications and Networks, pp. 176–184 (2005)
[8] Chawathe, S.S., Krishnamurthy, V., Ramachandran, S., Sarma, S.: Managing RFID Data. In: Proceedings of the 31st Very Large Data Bases Conference (VLDB 2005), August 2005, pp. 1189–1195 (2005)
[9] Sarma, S.E.: Towards the Five-Cent Tag. Technical Report MIT-AUTOID-WH-006, MIT Auto-ID Center (2001), http://www.autoidcenter.org/research/MIT-AUTOID-WH-006.pdf
[10] Carbunar, B., Ramanathan, M., Koyuturk, M., Hoffmann, C., Grama, A.: Redundant Reader Elimination in RFID Systems. In: Sensor and Ad Hoc Communications and Networks, pp. 176–184 (2005)
[11] EPCglobal Tag Data Standard (TDS) Version 1.3 document, http://www.epcglobalus.org/dnn epcus/KnowledgeBase/Browse/tabid/277/DMXModule/706/Command/Core Download/Default.aspx?EntryId=297
Energy-Efficient Tracking of Continuous Objects in Wireless Sensor Networks* Jung-Hwan Kim, Kee-Bum Kim, Chauhdary Sajjad Hussain, Min-Woo Cui, and Myong-Soon Park** Department of Computer Science and Engineering Korea University, Seoul, Korea {glorifiedjx,givme,sajjad,minwoo,myongsp}@ilab.korea.ac.kr
Abstract. The proliferation of research on target detection and tracking in wireless sensor networks has kindled the development of techniques for tracking continuous objects such as fires and bio-chemical material diffusion. In this paper, we propose an energy-efficient algorithm that detects and monitors a moving event region by selecting only a subset of nodes near the object boundaries. The paper also shows that we can effectively reduce the report message size. Simulation results verify that the overall size of the report messages, as well as the number of nodes that transmit report messages to the sink, can be significantly reduced, especially when the density of nodes deployed over the network field is high. Keywords: wireless sensor networks, object tracking, target tracking, boundary, edge, continuous objects, energy-efficient.
1 Introduction
Large-scale wireless sensor networks have been enabled by rapid technological advances in MEMS and wireless communication. They are used in a wide variety of monitoring applications, ranging from habitat/environmental monitoring to military surveillance. One typical and prominent research area in wireless sensor networks is target tracking, where there have been enormous research achievements. The majority of this work identifies and tracks one or more small targets: sensors collaboratively detect and track objects through the energy, noise, light, or seismic waves the objects emit. However, there have been relatively few research efforts on detecting and tracking large phenomena or objects such as forest fires, mud flows, bio-chemical material diffusion, and oil spills. Such phenomena can span large geographic extents, and by monitoring these continuous and randomly changing objects we can, for instance, prevent possible harm such as the diffusion of hazardous gas in advance, or find safe routes in such circumstances.
* This work was supported by the Second Brain Korea 21 Project.
** Corresponding author.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 323–337, 2008. © Springer-Verlag Berlin Heidelberg 2008
Estimating such objects usually requires an inordinate amount of message exchange to collaboratively track the objects' movement and location in real time. Furthermore, it is well known that the cost of communication between sensors is much higher than that of computation. Therefore, it is judicious to devise an efficient algorithm that minimizes communication costs as much as possible in order to prolong the lifetime of the sensor network.
(a) The simplest approach
(c) COBOM in [2]
(b) boundary sensors in [1]
(d) TOCOB (our approach)
Fig. 1. Comparison of number of nodes that send report messages to sink
There are various ways to monitor continuous objects. The simplest way to collect data would be to let every node that actually detects an object transmit its reading to the sink node. Figure 1(a) illustrates this straightforward approach: all sensors inside a phenomenon report data to the sink. However, this approach causes the sensors inside the phenomenon to dissipate energy at a breakneck pace. In [1], Xiang Ji et al. propose a mechanism that selects only a few nodes near object boundaries to save energy. The solid dots in Figure 1(b) depict the boundary sensors selected near the boundary of an object. Clearly, this saves much more energy than the approach of Figure 1(a), since only the nodes near the object boundary are chosen to send data to the sink. [1] also discusses
forming clusters among the boundary sensors; we discuss this in more detail in Section 2. In [2], Cheng Zhong and Michael Worboys propose a new energy-efficient boundary node selection algorithm, described in Figure 1(c). According to [2], although more boundary sensors are selected than in previous research, the nodes that actually report to the sink are the representative nodes drawn as red dots in Fig. 1(c). In this paper, we present an energy-efficient algorithm that detects the boundaries of moving phenomena so as to monitor their shapes and movement in wireless sensor networks. We focus on reducing the number of boundary sensors, which in turn lowers the number of representative sensors and thus reduces both the traffic to the sink node and the communication between sensor nodes. Figure 1(d) illustrates our proposed representative node selection. In our architecture, we assume that if a node detects a phenomenon in its local area and the sensed value is beyond a predetermined threshold, the node is regarded as being inside the phenomenon. Moreover, although it would be more realistic to consider three-dimensional space, we examine only two-dimensional space for simplicity. The rest of this paper is organized as follows: Section 2 discusses and critiques previous work. In Section 3, we present the definitions related to our proposed idea as well as the assumptions required for our algorithm. Section 4 presents the proposed algorithm in detail, Section 5 analyzes the simulations, and finally, Section 6 concludes the paper and points out directions for future work.
2 Related Work
There has been a great deal of research on detecting and tracking single or multiple targets in sensor networks [7][8]. In the last few years, some researchers have begun to analyze continuous object detection and monitoring, e.g., of forest fires, mud flows, bio-chemical material diffusion, and oil spills. In [4][9], the authors analyze the detection of non-local events, which is closely related to our topic. The main difference is that they estimate not the boundary of continuously changing objects but non-local phenomena that are static. Therefore, they do not examine situations where the phenomena move randomly and unexpectedly in real time. Furthermore, [4] may lead to massive energy consumption since all boundary sensors report data to the sink. Xiang Ji et al. [1] propose a dynamic cluster-based algorithm that tracks the movement of continuous objects by monitoring their boundaries. In [1], when a sensor detects the emergence of a phenomenon at the current time, it immediately broadcasts a query message to its neighbors to ask for their readings, and the neighbors reply with their current readings. If any neighbor has a different detection status, i.e., if the sensor receives at least one different detection status from any neighbor, the sensor becomes a boundary sensor. In a word, only those sensors inside the phenomena that are nearby (i.e., in
communication range of) the boundary of the objects are selected as boundary sensors. After boundary node selection, cluster formation takes place. However, the cluster formation explanation in [1] is somewhat ambiguous, and we argue that cluster formation itself is not a good approach when the goal is to save energy, since the application we consider is tracking objects that diffuse and drift randomly and unexpectedly in real time. If, for example, the objects move, expand, or shrink quickly, cluster reformation across all boundary sensors must happen at every moment the object changes or moves. Further, if a cluster is wide and large, propagation between members and a cluster head attenuates rapidly, which requires more energy. Above all, the main drawback of [1] is that every boundary sensor is directly or indirectly involved in routing data to the sink. Since each boundary sensor is either a member or a cluster head, every boundary sensor must at least send data over the distance to its cluster head, i.e., every boundary node consumes a certain amount of energy in any case. In [2], COBOM, an energy-efficient algorithm for boundary detection and continuous monitoring, is proposed. If a sensor's detection status changes, the sensor broadcasts its reading and ID. A neighbor node that receives them stores the received reading in an array called the BN-array. Any sensor whose BN-array contains a detection status different from its own becomes a boundary node. Among the boundary sensors, a few representative nodes are selected. The more different detection statuses a sensor has in its array, the more likely it is to become a representative node, which eventually reports all the gathered detection results to the sink.
This algorithm is energy-efficient in two ways: 1) only a few representative nodes, which actually send reports to the sink, are chosen, and 2) by using the BN-array, the report message size does not increase considerably, since each message contains only the neighbors' detection status information, rather than also keeping the neighbors' IDs, which requires only a few bits while the precision of boundary monitoring is preserved. Figure 2 illustrates the BN-array.
Fig. 2. The BN-array is shown, where c is the start node. It is assumed that the sink knows the start node, and so do the other neighbors.
In this paper, however, we claim that we can obtain fewer boundary nodes than [2], which also leads to fewer representative nodes. As fewer
representative nodes are chosen, more energy is saved. We discuss this in more detail in Sections 4.1.2 and 4.1.3. Furthermore, we also question whether we truly get a more precise prediction of the shape of the objects when representative nodes report all their neighbors' detection statuses to the sink. We discuss this argument in more detail in Section 4.2.
3 Preliminaries
In this section, we make fairly general assumptions about the capabilities of sensor nodes and the framework of sensor networks, and we present the definitions employed in our algorithm.
3.1 Assumptions
• Sensor nodes and the sink are stationary.
• The nodes are homogeneous, i.e., every node has the same capability.
• Each node has the same communication range R.
• The sensor nodes are densely and arbitrarily deployed in the network.
• The sink knows all the nodes' IDs and locations.
• We are not concerned with possible data loss or contention, i.e., all communication between sensors is error free.
• Each sensor node knows its own location, possibly by using the global positioning system (GPS) [5] or other techniques such as triangulation [6] or localization [3].
3.2 Definitions
Network Model: Our sensor network can be modeled by a graph G = (V, E) in the 2-D plane, where the vertices V = {v1, v2, …, vn} represent sensor nodes. An edge eij exists between two vertices (nodes) vi and vj when they are within each other's communication range, and E denotes the set of all edges in the network.
Definition 1. Interior (IN) of a phenomenon. We define the interior of a phenomenon (IN) to be the spatial region in ℜ2 in which a sensor detects an event, that is, identifies a value higher than a predefined threshold at the present time (i.e., at time slot t), and is thus assumed to be placed inside the region. Its reading function evaluates to 1 (or true).
Definition 2. Exterior (OUT) of a phenomenon. The exterior (OUT) of the phenomenon is defined analogously to Definition 1. The spatial region in ℜ2 where no phenomenon is discovered, or where the reading is lower than the threshold, is called OUT. The reading function of the nodes in OUT evaluates to 0 (or false).
Definition 3. Neighbors (Nu). Let u and Nu denote a node and the neighbors of u, respectively. The neighbors Nu are the nodes within the communication range R of u.
Definition 4. Changed Value Nodes (CVNs). Given the assumption that all sensors in the network periodically activate and make local observations (sensing) to detect target objects, we define Changed Value Nodes (CVNs) as nodes that detect the emergence of the object at the current time slot t when they did not identify any phenomenon at the previous time slot (t−1), or vice versa. That is, a sensor u becomes a CVN if dt−1 = 0 and dt = 1, or dt−1 = 1 and dt = 0, where dt−1 and dt denote the detection results from its local area at the previous and current sensing times, and 0 and 1 represent the reading results false and true, respectively. Figure 3 shows the CVNs when the continuous object expands and shrinks.
Definition 5. CompareOneZero message (COZ message). CVNs broadcast a COZ message to their neighbors. This message includes the CVN's ID and detection status.
Definition 6. Boundary Nodes (BNs). A boundary node u is a node that receives at least one COZ message with a different detection status. For example, if u's current detection status is 0 (false) and it receives from one of its neighbors, say v, a COZ message that includes v's reading 1 (true), then u is a boundary node, since u's current detection status differs from v's.
Definition 7. Representative Nodes (RNs). A representative node is a node that actually sends data to the sink. Only a few representative nodes are selected among the BNs to save energy.
(a) CVNs when expanding
(b) CVNs when shrinking Fig. 3. CVNs in definition 4
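Definitions 4 and 6 can be sketched as two simple predicates. The message representation as (sender ID, status) pairs is a simplification for illustration:

```python
def is_cvn(prev_reading, curr_reading):
    """Definition 4: a node becomes a CVN when its detection status flips
    between the previous time slot (t-1) and the current time slot t."""
    return prev_reading != curr_reading

def is_bn(my_status, coz_messages):
    """Definition 6: a node becomes a BN if at least one received COZ
    message carries a detection status different from its own.
    `coz_messages` is a list of (sender_id, status) pairs."""
    return any(status != my_status for _, status in coz_messages)

# A node outside the object (status 0) hears a neighbor that just turned 1:
print(is_cvn(0, 1))             # -> True  (the neighbor is a CVN)
print(is_bn(0, [("cvn7", 1)]))  # -> True  (different status: this node is a BN)
print(is_bn(1, [("cvn7", 1)]))  # -> False (same status: not a BN)
```

Note the asymmetry this creates: when the object expands, only nodes in OUT hear a differing status and become BNs; when it shrinks, only nodes in IN do.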
4 Tracking Continuous Objects with Boundary Detection
In this section, we illustrate the TOCOB algorithm in detail. Section 4.1 explains each step carefully, and Section 4.2 discusses the precision of the expected boundary and the BN-array.
(a) This diagram well describes which sensors become boundary nodes in TOCOB
(b) BNs selected in TOCOB when the object expands (left) and shrinks (right)
(c) BNs selected in COBOM. All nodes located close to the boundary become BNs.
Fig. 4. CVNs and BNs when expanding and shrinking
4.1 TOCOB Algorithm
4.1.1 CVNs and COZ Message Exchange
Step 1: Each sensor, say u, activates and makes local observations periodically.
Step 2: CVNs appear when the shape of a phenomenon changes in the network. When the detection statuses of u at the previous time slot t−1 and the present time slot t differ, i.e., when dt−1 = 0 and dt = 1 or dt−1 = 1 and dt = 0, u becomes a CVN.
Step 3: Each CVN broadcasts a CompareOneZero (hereafter, COZ) message to its neighbors. The COZ message includes its own ID and detection status. The more significant the expansion, shrinking, or change of the object, the more sensors are involved in broadcasting COZ messages, since more sensors' readings change from 0 (false) to 1 (true) or vice versa.
4.1.2 Boundary Nodes
Step 4: A sensor u may receive COZ messages from its neighbor CVNs (i.e., CVNs within u's communication range) and compares them with its own current detection status. If the detection result is the same, u ignores the COZ messages and simply stays idle. If u receives any COZ message with a detection status different from its own, u becomes a BN. It then counts the number of COZ messages received during a specific given time and, based on this count, sets a waiting time; this is how a few RNs are selected among the BNs. The following step describes representative node selection in more detail. Figure 4 clarifies the explanation of CVNs and BNs. Figure 4(a) shows which sensors become BNs when the continuous object is expanding. As illustrated, among the sensors adjacent to the boundary of the object, only the nodes in OUT become BNs when the object expands. Conversely, only the nodes in IN become BNs when the object shrinks, according to our algorithm.
This is the main difference between the COBOM algorithm in [2] and our proposed algorithm. As depicted in Figures 4(b) and (c), COBOM selects as BNs all the nodes located close to the boundary of an object, since every node that has any reading in its BN-array different from its own becomes a BN, whereas in our architecture only the nodes that receive a different reading from their neighbors can become BNs. Therefore, the number of BNs will certainly be smaller, no matter how much the object changes.
4.1.3 Representative Node Selection and Back-Off Time
Step 5: It would still be energy inefficient if all the BNs were involved in sending their data to the sink. Since our main objective is to construct an energy-efficient algorithm that monitors changeable continuous objects, we try to choose as few RNs as possible to actually transmit reports to the sink while preserving tolerable accuracy. An effective method to determine the RNs without excessive message exchange is to factor in the number of received COZ messages. Based on the number of received COZ messages that include a detection status different from its
own, each BN sets a waiting time. The higher the number of COZ messages a BN receives, the shorter its random back-off time is set, so that BNs located closer to the boundary of an object have a higher probability of becoming RNs. The back-off timer is set according to the following equation:

D = W/COZ_total + U( W/(COZ_total − 1) − W/COZ_total ),          COZ_total > 2
D = W/COZ_total + U( (W/(COZ_total − 1) − W/COZ_total) / 2 ),    COZ_total = 2
D = W/COZ_total − U( (W/(COZ_total + 1) − W/COZ_total) / 2 ),    COZ_total = 1

(1)
where D denotes the back-off time, W is the maximum waiting time, COZ_total is the total number of COZ messages received, and U represents a uniform random draw from [0, W). With this equation, there is a very low chance that two or more RNs within communication distance of one another are assigned the same back-off time.
Step 6: The BNs with shorter back-off times wake up earlier and broadcast a message to suppress their neighbors from becoming RNs. A node that sends this broadcast becomes an RN, and on behalf of the nodes around it, only the RNs send report data to the sink. The report includes the RN's own ID and the ID of the CVN with the strongest signal strength, i.e., one of the CVNs that sent a COZ message to the RN in Step 4. Whenever the RN receives a new COZ message within the given time, it compares the signal strength of the previous COZ message with that of the new one and drops the one with the lower signal strength. In this way, the RN keeps the closest CVN's COZ message, which results in a better estimation of the continuous object.
4.2 Discussion on the Precision of the Expected Boundary and More on the BN-Array
In this part, we discuss the precision of boundary tracking achieved by our proposed idea and by the algorithm in [2]. We claim that the expected shape of a typical object derived from our algorithm is as accurate as the one generated by [2], while ours is more energy efficient. In [2], an RN sends all of its neighbors' detection statuses to the sink; in ours, each RN sends only one neighbor's ID. Figure 5 compares the expected shapes that can result from [2] and from our proposed idea. We define a boundary point as a virtual point halfway between an RN and the node in its communication range with the strongest signal strength (i.e., the one explained in Step 6), and we let the expected boundary be the connection between all the boundary points.
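Returning to the back-off timer of equation (1), a small sketch in Python. Reading U(x) as a uniform random draw on [0, x) (and on (x, 0] for negative x) is our interpretation of the paper's notation, not something the text states explicitly:

```python
import random

def backoff(coz_total, W):
    """Back-off time D per equation (1). U(x) is interpreted as a uniform
    random draw on [0, x) for x >= 0 and on (x, 0] otherwise (assumption)."""
    def U(x):
        return random.uniform(0, x) if x >= 0 else -random.uniform(0, -x)

    base = W / coz_total  # base slot: shrinks as more COZ messages arrive
    if coz_total > 2:
        return base + U(W / (coz_total - 1) - base)
    if coz_total == 2:
        return base + U((W / (coz_total - 1) - base) / 2)
    return base - U((W / (coz_total + 1) - base) / 2)  # coz_total == 1

# More COZ messages -> smaller base slot -> shorter expected back-off,
# so BNs nearest the boundary tend to wake first and suppress neighbors.
for c in (1, 2, 4, 8):
    print(c, round(backoff(c, W=1.0), 3))
```

With W = 1, a BN that heard 4 COZ messages draws D from [0.25, 1/3), while a BN that heard only 1 draws a value of at least 1.0, so the random ranges for different COZ counts do not overlap.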
In this paper, we adopt these two new definitions, boundary points and the expected boundary, to measure and compare the precision of the shape of an object. As shown in Figure 5, knowing more boundary sensors may not guarantee that we can
332
J.-H. Kim et al.
accurately localize the boundary of an object or obtain a shape close to the real boundary, since the sensors are deployed arbitrarily, i.e., not arranged in a line. What we can expect is that somewhere between the two nodes there exists the authentic boundary of the object, but we cannot measure or know exactly where it resides. Certainly, the more sensors are deployed over the whole network area, the more precise the localization of the object will be, since the distances between nodes will be shorter. Nevertheless, this still does not mean that an RN's awareness of all its neighbors yields a better shape.
(a) Expected boundary line when an RN knows all nodes that have a different status (left); expected boundary line when an RN knows only one node that has a different status (right)
(b) Expected boundary line in COBOM (left) and TOCOB (right)
Fig. 5. Expected shapes
Secondly, we claim that the report data to the sink in our algorithm can be lighter than or equal to that of [2] when we assume that each node's ID is at most 1 byte and the node density is high. There is considerable research achieving ID assignment with few bits in wireless sensor networks [10][11]. In our architecture, as mentioned before, each RN transmits its own ID, its reading and one neighbor's ID, whereas in [2] the report message consists of an RN's ID, its reading and the BN-array. The size of the BN-array can vary; in particular, when the sensors are densely deployed, it will be large because the number of neighbors of each sensor is high. The report message size estimation is performed in Section 5.2.
Energy-Efficient Tracking of Continuous Objects
333
5 Performance Evaluations

In this section, we evaluate the performance of TOCOB based on simulation results. We developed a simulator in Java to evaluate and compare the performance of Dynamic Structure [1], the COBOM algorithm [2] and our proposed idea. Unlike the COBOM algorithm, we do not consider static objects in this paper, due to page limitations, but focus entirely on moving objects. Although our simulations do not model possible data loss, contention between nodes, or how data is routed to the sink, they empirically show that TOCOB is more energy-efficient than previous works owing to its smaller number of RNs. Each simulation is run 1000 times and we assume the node ID is 1 byte.

Table 1. Simulation Parameters

Parameter                                        Value
field size (m)                                   100 x 100
number of nodes                                  500 (sparse setting) or 1000 (dense setting)
communication range (m)                          6
object radius increase (m/time slot)             0.5
total time slots                                 60
sensing and reporting periodicity (time slots)   3
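As a companion to Table 1, the deployment and expanding-object model can be sketched as follows. The BN classification rule below (OUT nodes with at least one IN neighbor in range, matching the expanding-object case) is our paraphrase, and all names are ours:

```python
import math
import random

# Sketch of the simulation model: nodes scattered uniformly over a
# 100 x 100 m field, a circular object centered at (50, 50) whose
# radius grows 0.5 m per time slot, sensing every 3 slots.
FIELD, CENTER, RANGE = 100.0, (50.0, 50.0), 6.0

def deploy(n, rng):
    return [(rng.uniform(0, FIELD), rng.uniform(0, FIELD)) for _ in range(n)]

def inside(node, radius):
    return math.dist(node, CENTER) <= radius

def boundary_nodes(nodes, radius):
    """Nodes outside the object with at least one IN neighbor in range
    (one plausible reading of the BN rule for an expanding object)."""
    ins = [p for p in nodes if inside(p, radius)]
    return [p for p in nodes
            if not inside(p, radius)
            and any(math.dist(p, q) <= RANGE for q in ins)]

rng = random.Random(7)
nodes = deploy(1000, rng)          # dense setting
for slot in range(0, 60, 3):       # sense and report every 3 time slots
    radius = 0.5 * (slot + 1)
    bns = boundary_nodes(nodes, radius)
```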
5.1 Simulation Model

In our simulation, as the number of BNs and RNs surely depends on the density of the deployed nodes, we vary the number of nodes while fixing the field size. Sensor nodes are distributed over a 100 x 100 m2 field. In each experiment, 500 or 1000 sensor nodes are deployed arbitrarily in the field to simulate a sparse or a dense setting. The communication range of each node is fixed at 6 m. We simulate a continuous object in the square area with a circle. The circle is initially centered at (50,50) and continually expands during 60 time slots. In each time slot its radius increases by 0.5 m, and all sensors in the network activate and make local observations every 3 time slots. The simulation parameters for our proposed algorithm are given in Table 1.

5.2 Simulation Results

5.2.1 Fewer Boundary Nodes and Representative Nodes

In this part, we compare the numbers of BNs and RNs selected during simulation for two different node densities in a fixed-size region. Figure 6 compares the numbers of BNs that are directly or indirectly involved in transmitting data to the sink. According to Figure 6 (a) and (b), it is obvious that COBOM [2] generates many more BNs than Dynamic Structure [1] and our proposed algorithm (TOCOB), since any node whose detection status differs from a neighbor's becomes a BN (i.e., the nodes near the boundary of the object both IN and OUT), whereas in Dynamic Structure and TOCOB, BNs are those near the boundary of the object either IN or OUT. TOCOB produces slightly more BNs than Dynamic Structure because, when an object expands (rather than shrinking or merely changing), nodes outside the object are determined to be BNs in our approach
whereas Dynamic Structure takes its BNs from inside the object whether it shrinks or expands. However, given that 1) each CVN in Dynamic Structure broadcasts a request first and the nodes that receive the request messages must also reply to it, and 2) all selected BNs are eventually required to participate in transmitting data to the sink, it is unreasonable to claim that the smaller number of BNs in Dynamic Structure saves more energy than TOCOB does.
(a) Sparse setting (N=500)
(b) Dense setting (N=1000)
Fig. 6. Comparing the number of Boundary Nodes selected
(a) Sparse setting (N=500)
(b) Dense setting (N=1000)
Fig. 7. Comparing the number of Representative Nodes selected
In Figure 7, we compare the number of RNs selected in COBOM and TOCOB. Since Dynamic Structure does not select RNs, we do not take it into consideration. As clearly illustrated, the difference in the number of RNs grows as the object's radius increases. As the object grows, the region that BNs and RNs must cover also gets larger, and it is therefore natural that the number of RNs involved in detection increases in both algorithms. In COBOM, however, the RNs must cover both IN and OUT, whereas in our proposed architecture only nodes that are either IN or OUT can become RNs; thus the number of RNs in TOCOB slowly
increases compared to that of COBOM, as the uncovered area (i.e., OUT when IN is chosen, and vice versa) also becomes larger. This effect becomes more apparent when the density of deployed nodes is high. As seen in Figure 7(b), when the object's radius reaches 30 meters, the difference in the number of RNs has more than doubled, whereas there was almost no difference when the object was small. In summary, the difference in RNs becomes significantly large when the continuous object expands over a network with a high node density. From Figures 6 and 7, we can therefore conclude that our algorithm produces fewer BNs and RNs, and that the difference grows as the object expands where node density is high. It follows that our algorithm, TOCOB, reduces the traffic between the RNs and the sink as well as the communication between the RNs and their neighbors.

5.2.2 Comparison of Average Report Data Size

In this part, we compare the average report data size in COBOM and TOCOB. We consider two density settings, i.e., 1000 and 500 deployed nodes. The left three columns in Table 2 show the average report data size when the number of nodes is 1000, and the columns on the right-hand side assume 500 deployed nodes. The meta data in the Density column is the object's radius, and each figure in the Proposed and COBOM columns is the average total report data size, where each experiment is run 1000 times. Since, in our algorithm, each RN sends its own ID (1 byte), reading (1 bit) and only one neighbor's ID (1 byte), we assume that the RNs in our architecture send 3 bytes for each report message to the sink. For COBOM, we assume that the report message size can vary, since it contains an RN's ID (1 byte), reading (1 bit) and a BN-array.
Because the size of the BN-array depends on the number of neighbors, if an RN has more than 7 neighbors the message size becomes the same as ours, and if it has more than 15 neighbors the message size becomes 4 bytes, which is bigger than the report message in our architecture, and so on. Each figure in the Proposed and COBOM columns is computed as follows:

Total average data size = average # of RNs × corresponding report data size    (2)
For example, in COBOM, suppose the total average number of RNs is 10.367 when the object's radius is 7.5 meters, and the average numbers of RNs sending 2, 3 and 4 bytes are 1.175, 7.815 and 1.377, respectively. Then the total average data size is 2 (bytes) × 1.175 + 3 × 7.815 + 4 × 1.377 = 31.303 bytes, whereas in our algorithm it is 10.367 × 3 (bytes) = 31.101 bytes. After calculating the total average report data size for each period, we simply add up all the computed values and compare the totals. In this way, we can deduce which method is more effective in reducing the report data size. Table 2 shows that as the density of nodes in the network increases, our proposed architecture produces a smaller report data size overall. However, it is also clearly demonstrated that when the density is low, our average report message is heavier than that of COBOM. Therefore, we conclude that our idea for reducing the report data size is less effective than COBOM when the density is low and becomes more effective as the density grows.
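The worked example above can be checked directly; the figures are taken from the text, and the helper name is ours:

```python
def total_avg_size(counts_by_size):
    """Equation (2) applied per message-size bucket:
    sum over buckets of (average # of RNs) x (report size in bytes)."""
    return sum(n * size for size, n in counts_by_size.items())

# COBOM at radius 7.5 m: RNs averaging 2-, 3- and 4-byte reports
cobom = total_avg_size({2: 1.175, 3: 7.815, 4: 1.377})
# TOCOB: every RN sends a fixed 3-byte report
tocob = total_avg_size({3: 10.367})
print(round(cobom, 3), round(tocob, 3))  # 31.303 31.101
```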
Table 2. Average report data size

Density N=1000   Proposed (3 bytes)   COBOM (2, 3, 4 or 5 bytes)
1.5m             2.172                2.189
3.0m             11.421               11.509
4.5m             21.27                21.312
6.0m             25.155               25.189
7.5m             31.101               31.303
9.0m             37.479               37.759
10.5m            43.764               43.985
12.0m            49.887               49.596
13.5m            56.43                57.265
15.0m            63.567               64.749
16.5m            69.873               69.913
18.0m            77.157               77.358
19.5m            82.908               83.008
21.0m            89.343               89.637
22.5m            96.897               97.067
24.0m            102.375              102.597
25.5m            110.214              110.232
27.0m            116.673              117.033
28.5m            122.394              122.904
30.0m            128.889              129.406
total            1338.969             1344.011

Density N=500    Proposed (3 bytes)   COBOM (2, 3, 4 or 5 bytes)
1.5m             0.948                0.704
3.0m             3.798                2.807
4.5m             7.974                5.866
6.0m             10.653               7.835
7.5m             13.515               9.88
9.0m             16.332               11.965
10.5m            19.194               14.058
12.0m            22.143               16.237
13.5m            24.843               18.196
15.0m            27.534               20.181
16.5m            30.879               22.609
18.0m            33.552               24.526
19.5m            36.582               26.764
21.0m            39.042               28.538
22.5m            41.745               30.587
24.0m            44.502               32.612
25.5m            47.709               34.963
27.0m            50.568               37.054
28.5m            53.286               39.01
30.0m            56.649               41.511
total            581.448              425.903
6 Conclusion

This paper proposes the TOCOB algorithm for boundary detection and monitoring of continuously moving phenomena in wireless sensor networks. The algorithm selects, from a small subset of boundary nodes, only a few representative nodes that transmit report data to the sink, thus reducing the traffic between the RNs and the sink node as well as the communication between the RNs and their neighbors. Furthermore, by sending only one neighbor's ID, possibly the closest among the neighbors with a different current reading, we have verified that the report message size can be smaller than in previous work, especially when the density of deployed nodes is high. We also believe that the expected shape of the object can be as precise as in previous works. In Section 5, we presented simulations that support our claims. The results show that our proposed idea outperforms the previous works in terms of energy-efficiency, and is clearly dominant when the density of deployed nodes is high. Future work will include verification of the precision of the expected boundary and the design of a new algorithm that considers the residual energy of each node.
References 1. Ji, X., Zha, H., Metzner, J.J., Kesidis, G.: Dynamic Cluster Structure for Object Detection and Tracking in Wireless Ad-Hoc Sensor Networks. In: ICC conference (2004) 2. Zhong, C., Worboys, M.: Energy-Efficient Continuous Boundary Monitoring in Sensor Networks. Technical Report (2007), http://www.spatial.maine.edu/~czhong/boundary_monitoring.pdf
3. Liao, P.-K., Chang, M.-K., Jay Kuo, C.-C.: Distributed Edge Detection with Composite Hypothesis Test in Wireless Sensor Networks. In: IEEE Globecom (2004)
4. Chintalapudi, K., Govindan, R.: Localized Edge Detection in Sensor Fields. Ad Hoc Networks Journal, 273–291 (2003)
5. US Naval Observatory (USNO) GPS Operations (2001), http://tycho.usno.navy.mil/gps.html
6. Bulusu, N., Heidemann, J., Estrin, D.: GPS-less Low Cost Outdoor Location for Very Small Devices. IEEE Pers. Commun. 7, 28–34 (2000) (Special Issue on Smart Space and Environments)
7. Singh, J., Madhow, U., Kumar, R., Suri, S., Cagley, R.: Tracking Multiple Targets Using Binary Proximity Sensors. In: IPSN (2007)
8. Jeong, J., Hwang, T., He, T., Du, D.: MCTA: Target Tracking Algorithm Based on Minimal Contour in Wireless Sensor Networks. In: Proc. IEEE INFOCOM (2007)
9. Liu, J., Cheung, P., Guibas, L., Zhao, F.: A Dual-Space Approach to Tracking and Sensor Management in Wireless Sensor Networks. In: ACM International Workshop on Wireless Sensor Networks and Applications, Atlanta (2002)
10. Kang, J.H., Park, M.-S.: Structure-based ID Assignment for Sensor Networks. IJCSNS International Journal of Computer Science and Network Security 6(7) (2006)
11. Ould-Ahmed-Vall, E., Blough, D.M., Heck, B.S., Riley, G.F.: Distributed Unique Global ID Assignment for Sensor Networks. In: Proc. IEEE International Conference on Mobile Ad Hoc and Sensor Systems, Washington DC (2005)
Data Randomization for Lightweight Secure Data Aggregation in Sensor Network

Abedelaziz Mohaisen (1), Ik Rae Jeong (2), Dowon Hong (1), Nam-Su Jho (1), and DaeHun Nyang (3)

(1) Electronics and Telecommunication Research Institute, Daejeon 305-700, Korea
{a.mohaisen,dwhong,nsjho}@etri.re.kr
(2) The Graduate School of Information Security, Korea University, Seoul, Korea
[email protected]
(3) Information Security Research Laboratory, Inha University, Incheon, Korea
[email protected]
Abstract. Data aggregation is one of the main purposes for which sensor networks are developed. However, securing data aggregation schemes has raised several security-related issues, including the need for efficient implementations of cryptographic algorithms, the design of secure key management schemes, and many others. Several works have been introduced in this direction and have succeeded, to some extent, in providing relatively efficient solutions. Yet one question remains: can we aggregate the sensed data with less security-related computation while maintaining a marginal level of security and accuracy? In this paper, we consider data randomization as a possible approach to data aggregation. Since an individual sensed record is of little concern when data is used for aggregation, we show how data randomization can explicitly hide the exact individual data records so that they can be exchanged securely between nodes. To improve the security and accuracy of this approach, we introduce a hybrid scheme that uses the cryptographic approach for a fraction of the nodes. We study the efficiency of our schemes in terms of estimate accuracy and overhead. Keywords: security, sensor network, data aggregation, computation efficiency, data randomization, experimental justification.
1
Introduction
Data aggregation is one of the main functions for which wireless sensor networks (WSN) are developed. In data aggregation networks, the different sensors are scattered over a field to sense some physical phenomenon (e.g., temperature, light, humidity, etc.). The accumulated aggregate of several readings over time is of more interest than any single reading. To enable nodes to perform this in-network processing, the concept of secure data aggregation (SDA) has been introduced. SDA has been studied intensively in the context of
This work was supported by the IT R&D program of MIC/IITA. [2005-Y-001-04, Development of next generation security technology].
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 338–351, 2008. c Springer-Verlag Berlin Heidelberg 2008
Data Randomization for Lightweight Secure Data Aggregation
339
the security study of WSN, where several cryptographic-based schemes have been introduced. A nice survey of these works is given in [1]. In WSN, cryptographic-based aggregation schemes have been used so far [2,3,4,5]. In these schemes, the sensing node encrypts the raw sensed data using a previously shared key (in the symmetric key model) or the public key of the other node (in the public key model) and forwards the encrypted data to the destination that holds the corresponding key; the destination in this case is the aggregator. Upon receiving the forwarded encrypted records, the aggregator decrypts them, obtains the raw data, and performs the aggregation function on the aggregated data. For the above model, both public and symmetric key techniques have been investigated. Public key algorithms have been shown to be computationally feasible, to some extent, on typical sensor nodes [6,7,8]. As public key authentication is a crucial requirement for the deployment of public keys, authentication services have been introduced in [9,10]. Also, secret key pre-distribution schemes have been introduced in [11,12,13,14,15], and key revocation techniques in [16], to make these algorithms and techniques more feasible on typical sensor nodes. However, to deploy public key algorithms widely in WSN, several further algorithms and designs need to be considered, as the existing ones do not solve the aforementioned problems perfectly. On the other hand, the applicability of symmetric key algorithms in WSN, though computationally feasible, is subject to the resiliency and connectivity tradeoff [9]. As another direction for performing aggregation, we investigate the applicability of data randomization for efficient data aggregation. The work is motivated by the question: can we reduce the overhead while still performing the same task of marginally securing the data aggregation?
To answer the above question, we try data randomization as a solution. Data randomization has been studied intensively in the context of privacy preserving data mining (PPDM) [17]. In PPDM, the data owner needs to publish an image of his private data to be used by a third party for applying data mining algorithms without compromising the data's privacy [18]. That is, the data itself is modified using some mechanism so that the modified data is statistically similar to the original data, meaning that some aggregate functions can still be applied to the modified data with acceptable accuracy. An example of such modification techniques is data perturbation. A promising feature that makes randomization more feasible in sensor networks is that many randomization components, such as TinyRNG [19] and RandomLFSR [20], are already used as part of the sensor node design. Data perturbation algorithms, however, face an accuracy/privacy trade-off: increasing the privacy of the data by increasing the deviation of the added noise (in the case of normally distributed noise) results in a high loss of aggregate accuracy [18,21]. One fact that helps reduce the impact of this problem is the huge amount of data delivered by the different sensor nodes over time, which minimizes the impact of the accuracy loss.
340
A. Mohaisen et al.
In this paper, we consider data randomization as a possible technique for secure data aggregation in sensor networks. We begin with a randomization-only scenario in place of the existing cryptographic-based aggregation. Facing the accuracy and security problems arising from it, we extend this scheme to a more secure and accurate hybrid scheme in which both randomization and cryptographic approaches are utilized. To evaluate our scheme and demonstrate the goal behind its design, we analyze the overhead in terms of computation. We also study the accuracy of the aggregation estimate. The rest of this paper is organized as follows: Section 2 introduces the assumptions and network models used throughout the paper, Section 3 introduces the details of our scheme, Section 4 introduces the analysis of our scheme, and finally, Section 6 draws concluding remarks for future works.
2
Definitions and Network Model
The nodes in the network are represented as s1, s2, ..., sn, where the group itself is represented as S. The data sensed by the nodes is denoted by the random variable D, where d1, d2, ..., dn ∈ D. Also, we define the random variable X, which is used to generate noise such that x1, x2, ..., xn ∈ X. The above random variables have statistical characteristics such as the mean and deviation; the means are d̄ and x̄ for D and X, respectively. Also, we define the noise addition operation ⊕, which is invertible by ⊖. The following operations are applied to the random variable realizations:

– d_i ⊕ x_i → y_i, with d_i ∈ D, x_i ∈ X, y_i ∈ Y. That is, D ⊕ X → Y.
– y_i ⊖ x_i → d_i, with y_i ∈ Y, x_i ∈ X, d_i ∈ D. That is, Y ⊖ X → D.

Throughout this paper, D represents the sensed data, X represents the noise and Y represents the randomized data. In the following, we give the set of definitions used throughout the rest of the paper. 2.1
Definitions
Definition 1 (aggregation function). For a set of sensed data (d1, d2, ..., dn) ∈ D sensed by the set of sensor nodes s1, s2, ..., sn, in the context of this paper the aggregation function f(d1, d2, ..., dn) is a function that computes a single-value result from a collection of inputs. Here, we mainly define the following aggregate function instances:

– summation: f(d1, d2, ..., dn) = Σ_{i=1}^{n} d_i.
– average: f(d1, d2, ..., dn) = (1/n) Σ_{i=1}^{n} d_i.
– maximum: f(d1, d2, ..., dn) = max{d_i | i = 1, 2, ..., n}.
– minimum: f(d1, d2, ..., dn) = min{d_i | i = 1, 2, ..., n}.
– median: f(d1, d2, ..., dn) = d_r : r = (n+1)/2, where {d1, d2, ..., dn} are sorted.
– count: f(d1, d2, ..., dn) = |{d_i | i = 1, 2, ..., n}|.
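The aggregate function instances of Definition 1 map directly to code; a minimal sketch (the table name and sample values are ours, and the count follows the definition literally as the size of the collection):

```python
import statistics

AGGREGATES = {
    "summation": sum,
    "average":   lambda d: sum(d) / len(d),
    "maximum":   max,
    "minimum":   min,
    "median":    statistics.median,  # d_r with r = (n+1)/2 for sorted, odd-length data
    "count":     len,
}

readings = [21.0, 19.5, 22.5, 20.0, 23.0]   # sample sensed values d_1..d_5
results = {name: f(readings) for name, f in AGGREGATES.items()}
```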
Fig. 1. Illustration: network model
More precisely, in the context of sensor networks, the average function is the one most used.

Definition 2 (distribution function). Let X be a discrete random variable where x1, x2, ..., xn ∈ X. The random variable has a distribution D with mean value μ and standard deviation σ. The cumulative distribution function (CDF) of X is defined as F(x) = P(X ≤ x) = Σ_{x_i ≤ x} P(X = x_i) = Σ_{x_i ≤ x} p(x_i). This function is used to generate all the instances (a.k.a. nonces) of the random variable X.

Definition 3 (Derandomization). In the context of this paper, and unlike the common use of the word, the derandomization process is defined as the operation of generating a sequence of random integers belonging to a random variable whose statistical parameters are equivalent to those of the noise added in the randomization phase. Two random variables X and Y are equivalent if they are equal in distribution, that is, P(X ≤ x) = P(Y ≤ x) for all x. 2.2
Network Model
In this paper, we use the well-known 3-tier model [22,23]. The 3-tier network is common in large networks and consists of a huge number of sensor nodes that are grouped into clusters [24,25]. A cluster is a functional grouping of nodes in which each cluster has a cluster head (CH). The communication pattern follows a forwarding mechanism in which each cluster has backbone nodes that are updated frequently to keep power consumption fair. The cluster head communicates directly with the base station (BS) or through the sink. The base station can typically be a special kind of sensor node connected to a computer. An illustration of the network model is shown in Fig. 1.
3
Randomization for Lightweight Aggregation
In this section, we introduce the details of our scheme, including the basic randomization-only scheme, its shortcomings, and the hybrid scheme. Both of our schemes build on the above notation and definitions. 3.1
Randomization for Secure Data Aggregation
The randomization-only scheme consists of two stages, namely the offline and online phases, which are performed as follows:
Offline Phase: In the offline phase, the set of nodes within the cluster agrees on the distribution parameters required as inputs to the randomization function. This is equivalent to statically assigning the different parameters (e.g., μ, σ, etc.) to the different nodes.

Online Phase: In the online phase, data randomization is performed when data must be forwarded to the cluster head. That is, the procedure interaction in Fig. 2 is performed (for a cluster of size c). The protocol in Fig. 2 can be summarized as follows:

1. At each node: each node si generates a random nonce xi ∈ X using its own parameters, adds the generated xi as noise to the sensed di, resulting in yi = xi ⊕ di, and forwards yi to the cluster head.

2. At the cluster head: the cluster head (CH) receives the forwarded randomized data [y1, y2, ..., yc] from the different nodes [s1, s2, ..., sc] in the cluster, generates a vector of random nonces [z1, z2, ..., zc] ∈ Z, and removes noise equivalent (in distribution) to the added noise, resulting in D̃ = Y ⊖ Z = [y1, y2, ..., yc] ⊖ [z1, z2, ..., zc], where ⊖ is applied to items with corresponding indexes, resulting in [d̃1, d̃2, ..., d̃c]. Then, on the resulting [d̃1, d̃2, ..., d̃c], the CH performs the aggregation function, resulting in A = f(d̃1, d̃2, ..., d̃c), and finally, using some previously shared key K and encryption algorithm Enc, the CH encrypts the result as A' = EncK(A).

3. At the base station (BS): using some previously agreed-on key K and decryption algorithm Dec, the BS retrieves A = DecK(A') for each A' value received from each CH in the network, and then the BS performs its own aggregation function f(A1, A2, ...) to estimate the final aggregated value.

Note that the aggregation works well because the modification of the data maintains the mean of the modified data.
That is, the statistical properties of X and Z are the same, so that E[X + D] = E[Z + D] = E[Y] when using simple addition for ⊕. When using simple subtraction for ⊖, we get E[D̃] = E[Y] − E[Z] = E[Y] − E[X] = E[D].

Limitations: the above scheme has two limitations. First, the inaccuracy due to randomization grows rapidly when we set σ to a value large enough to guarantee good security (as shown in Fig. 4(b)). Even though a small deviation could suffice for randomization if we set the mean of the noise to some non-zero value, a data filtering attack becomes possible once σ is small. The second shortcoming is that not all of the aggregation functions in Definition 1 can be applied accurately to the perturbed data, owing to the high variance; for example, the min, max and median functions cannot be applied with the required precision. To overcome these shortcomings, we introduce the hybrid scheme, which offers reasonable solutions to both problems.
at each node si:
  Gen xi ∈ X
  yi ← xi + di
  send yi →

at the cluster head CH:
  receive [y1, ..., yc]
  Gen [z1, ..., zc] ∈ Z : μZ = μX, σZ = σX
  [d̃1, ..., d̃c] ← [y1, ..., yc] ⊖ [z1, ..., zc]
  A ← f(d̃1, ..., d̃c)
  A' ← EncK(A)
  send A' →

at the base station BS:
  receive [A'1, ..., A'c]
  [A1, ..., Ac] ← DecK([A'1, ..., A'c])
  Afin ← f(A1, ..., Ac)

Fig. 2. The online phase of the randomization-only algorithm
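The mean-preservation argument behind this protocol can be checked numerically. In the sketch below, the choice of Gaussian noise and all concrete parameter values are our assumptions, and the encryption step between CH and BS is omitted since it does not affect the estimate:

```python
import random

rng = random.Random(3)
MU, SIGMA, N = 10.0, 5.0, 10000          # shared distribution parameters

# Node side: each node adds a nonce x_i to its sensed value d_i
true_data = [20.0 + rng.uniform(-1, 1) for _ in range(N)]
noisy = [d + rng.gauss(MU, SIGMA) for d in true_data]        # y_i = d_i (+) x_i

# Cluster head: generate equivalent-in-distribution nonces z_i and remove them
z = [rng.gauss(MU, SIGMA) for _ in range(N)]
estimate = sum(y - zi for y, zi in zip(noisy, z)) / N        # average aggregate
true_avg = sum(true_data) / N
```

Because E[Z] = E[X], the residual noise in the average cancels in expectation, so `estimate` converges to `true_avg` as the number of records grows; individual records, however, remain hidden by the per-record noise.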
3.2
Hybrid Scheme: Randomization with Encryption
In the above scheme, once the attacker has access to a sufficient number of points in the randomized data, he can study its distribution (when some hint about the type of the distribution is given). Therefore, it is a good choice to harden against this kind of natural attack. To do so, the number of randomized data records must be carefully assigned so that they do not reveal further information. In that case, the attacker can still attempt to study the distribution, but with a very high estimation error. Our solution to this is the hybrid scheme, in which not only randomization but also encryption is performed. The hybrid scheme also consists of two phases, namely the offline and online phases, which are detailed in the following subsections.

Offline Phase: in this phase, the different nodes in the network are predetermined to use either the encryption scheme or the randomization scheme for data delivery. That is, the following is performed:

– The operator divides the sensor nodes used within each cluster into two parts: the nodes that will use the encryption scheme and the nodes that will use data randomization. The numbers of nodes are nr = n − ne for randomization and ne for encryption.
– For the set of nodes using the randomization scheme, the operator assigns the randomization parameters.
– For the set of nodes using the encryption scheme:
  • The operator assigns the encryption scheme to each of the different nodes.
  • Based on the encryption scheme (say, the symmetric key model), the operator pre-assigns the keys (as in [13]) or keying material (as in [11,12]).
Data Separation: for packets whose data has been treated by the randomization method, a '0' flag is attached, and for data that has been encrypted, a '1' flag is attached. That is, if <0||di> is received, the cluster head performs derandomization on di, and if <1||di> is received, the CH uses decryption.

Online Phase: In the online phase, the raw data is encrypted or randomized according to the class of the node. This is performed in the following steps.

1. At the node side: for each node si in the cluster c, the following is performed on the sensed data item di:
   (a) If si is in the randomization group, si generates a random nonce xi ∈ X, performs the noise addition to generate yi = xi ⊕ di, and forwards <0||yi> to the cluster head.
   (b) If si is in the encryption group, si encrypts di using the pre-assigned key and the pre-loaded encryption scheme, resulting in yi = EncK(di), and forwards <1||yi> to the cluster head.

2. At the cluster head: for each item of data received from the different nodes, the following is performed:

at si [σ, μ] or [K, Enc, Dec]:
  if si ∈ Rn: Gen xi ∈ X; yi ← xi + di; f = 0
  if si ∈ En: yi ← EncK(di); f = 1
  send <f||yi> →

at CH [σ, μ and K, Enc, Dec]:
  if f == 0: Gen zi ∈ Z; d̃i ← yi ⊖ zi
  if f == 1: di ← DecK(yi)
  repeat for each si ∈ CH
  A ← f(d̃1, ..., d̃nr, d1, ..., dne); A' ← EncK(A)
  send A' →

at BS [K, Enc, Dec]:
  A ← DecK(A')
  repeat for each CH
  Af ← f(A1, A2, ...)

Fig. 3. The online phase of the hybrid scheme
(a) If <0||yi> is received, then using the same randomization parameters as at the node side, the cluster head CH generates a vector of random nonces [z1, z2, ..., zc−nr] ∈ Z. The CH then applies the inverse of the addition operation, ⊖, to the received Y and the generated Z, resulting in D̃ = Y ⊖ Z = [y1, y2, ..., yc−nr] ⊖ [z1, z2, ..., zc−nr] = [d̃1, d̃2, ..., d̃c−nr]. Finally, the CH performs the aggregation function on [d̃1, d̃2, ..., d̃c−nr], which yields Ar = f(d̃1, d̃2, ..., d̃c−nr).

(b) If <1||yi> is received, then using some previously agreed-on key K and decryption algorithm Dec, the CH retrieves di = DecK(yi) and performs the aggregate function on the resulting set d1, d2, ..., dne, which yields Ae = f(d1, d2, ..., dne).

(c) The CH performs the aggregation function on the results Ae and Ar of the previous two steps, resulting in A = f(Ar, Ae), and, using some key K previously shared with the BS and an encryption algorithm Enc, the CH encrypts the result as A' = EncK(A).

3. At the base station (BS): using some previously agreed-on key K and decryption algorithm Dec, the BS retrieves A = DecK(A'). This is performed for the different cluster heads in the network. Finally, the BS performs its own aggregation function f(A1, A2, ...) to estimate the final aggregated value. A brief description of this protocol is shown in Fig. 3.
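A toy end-to-end run of the hybrid online phase can be sketched as follows. The XOR "cipher" here is a deliberately trivial, reversible placeholder for EncK/DecK (a real deployment would use an actual symmetric cipher), and the one-quarter encryption fraction, noise parameters, and all names are our assumptions:

```python
import random

rng = random.Random(9)
KEY = 0x5A  # toy placeholder key; XOR is NOT a real cipher

def enc(v, key=KEY):   # stand-in for Enc_K: reversible integer XOR
    return int(v) ^ key

def dec(c, key=KEY):   # stand-in for Dec_K
    return c ^ key

MU, SIGMA = 10.0, 3.0
readings = [rng.randint(15, 25) for _ in range(1000)]    # integer sensed data
n_enc = len(readings) // 4                               # ne nodes use encryption

# Node side: encryption group sends <1||y_i>, randomization group <0||y_i>
packets = [(1, enc(d)) for d in readings[:n_enc]]
packets += [(0, d + rng.gauss(MU, SIGMA)) for d in readings[n_enc:]]

# Cluster head: branch on the flag, then aggregate both groups together
enc_vals = [dec(y) for f, y in packets if f == 1]
rand_vals = [y - rng.gauss(MU, SIGMA) for f, y in packets if f == 0]
estimate = (sum(enc_vals) + sum(rand_vals)) / len(packets)
true_avg = sum(readings) / len(readings)
```

The encrypted fraction contributes exact values, so the estimate's error comes only from the randomized fraction; increasing ne therefore trades computation for accuracy, which is the point of the hybrid design.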
4 Analysis and Evaluation
In this section, we present the analysis of our scheme. This includes the evaluation of the overhead in terms of the required computational power, possible attack scenarios and their countermeasures, and finally the accuracy of the aggregation estimate for some commonly used aggregation functions.

4.1 Overhead Evaluation
For the overhead evaluation, there are three scenarios: the randomized only, the encrypted only, and the hybrid scheme. The memory and communication requirements of the three schemes are the same; however, the computation overhead differs.

Scenario 1 (fully randomized): In the fully randomized scenario, the overall computation overhead consists of the computation required for the randomization and de-randomization operations at the node and the aggregator, respectively. That is, CO = Σ_{i=1..n} Prand + Σ_{i=1..n} Pderand = n(Prand + Pderand). However, Prand is equivalent to Pderand (based on Definition 3), assuming that the computation power required for the addition operation is equal to that for its inverse, so the overall computation overhead is CO = 2 Σ_{i=1..n} Prand = 2n·Prand. From that, we define the average overhead per node as

CO_avg = (2/n) Σ_{i=1..n} Prand = 2·Prand.    (1)
A. Mohaisen et al.
Scenario 2 (fully encrypted): In the fully encrypted-data scenario, Pe and Pd denote the power required for encryption and decryption, respectively. The overall required computation power is CO = Σ_{i=1..n} Pe + Σ_{i=1..n} Pd = n(Pe + Pd). Similar to Scenario 1, we define the average required power per node as

CO_avg = (1/n)(Σ_{i=1..n} Pe + Σ_{i=1..n} Pd) = Pe + Pd.    (2)

Scenario 3 (hybrid scheme): In the hybrid scheme, both randomization and encryption are used, each for a portion of the network. Let nr be the number of nodes whose data is randomized and ne the number whose data is encrypted and decrypted, with nr = n - ne. The overall overhead is CO = Σ_{i=1..nr} Prand + Σ_{i=1..nr} Pderand + Σ_{i=1..ne} Pe + Σ_{i=1..ne} Pd = 2(n - ne)Prand + ne(Pe + Pd). Similarly, we define the average overhead per node as CO_avg = CO/n, which results in

CO_avg = [2(n - ne)Prand + ne(Pe + Pd)] / n.    (3)
By evaluating the above equation with the experimental values for the parameters Pe, Pd, and Prand (given in Section 5.1), the average computation per node when using TinyRNG can be written as a function of the number of nodes ne that use the encryption scheme and the network size n: CO_avg = 22.8 + 10.08·(ne/n) μJ.
4.2 Possible Attacks
Several attacks have been studied in the data-perturbation literature. Some of these attacks are general and some are scheme-specific. In all of them, however, regardless of type, the adversary tries to derive the original data from the perturbed data given some a priori knowledge of part of the original data [21,26]. For example, in [26] the independent component analysis (ICA) technique [27] is used to derive the original data from the perturbed data under certain conditions. This attack will not work against our scheme for two reasons: (i) the data in our scheme is modified separately; (ii) the non-Gaussianity condition on the original data cannot be satisfied. For similar reasons, the PCA attack in [21] cannot be directly applied to our scheme. Another, more serious attack on additive noise has also been studied in [28]. However, to achieve a high-precision estimate of the perturbed data, the deviation σ needs to be as small as possible. In our scheme, by contrast, we can set the deviation dynamically considering the required aggregation accuracy and security level.

4.3 Accuracy of Aggregation Estimate
Since we assign the statistical parameters of the random variable from which the noise is generated, the resulting aggregate after the
derandomization process will have a small deviation from the values calculated on the raw data prior to randomization. For example, the deviation of the mean of the de-randomized data D' from the mean of the raw data D is defined as follows:

Δd = |d'_avg - d_avg| = |(1/n)(Σ_{i=1..n} di' - Σ_{i=1..n} di)|.    (4)

In Table 2, this deviation is used to express the error of the estimate due to the randomization as a percentage of the original, prior-to-randomization estimate. That is, these values are expressed as (Δd / d_avg) × 100% for the above mean deviation Δd.
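Equation (4) and the percentage error used in Table 2 can be computed directly; the function name is ours and the data in the usage example is synthetic, not the Intel Lab traces.

```python
def aggregation_error(raw, derandomized):
    """Return (delta_d, percent_error) per Eq. (4): the absolute deviation
    of the de-randomized mean from the raw mean, and that deviation as a
    percentage of the raw mean."""
    n = len(raw)
    d_avg = sum(raw) / n                 # mean over the raw data D
    d_prime_avg = sum(derandomized) / n  # mean over the de-randomized data D'
    delta_d = abs(d_prime_avg - d_avg)
    return delta_d, 100.0 * delta_d / d_avg
```

For instance, `aggregation_error([20.0, 21.0, 22.0], [19.0, 21.0, 22.0])` gives a deviation of 1/3 and an error of about 1.59% of the raw mean of 21.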
5 Experimental Results
In this section, we detail the evaluation of our proposed scheme in terms of the required average computation overhead, the aggregation estimation over the randomized data, and finally the accuracy of the results compared to those calculated before randomization. To experimentally estimate the required overhead, we consider RandomLFSR [20] and TinyRNG [19] as random number generators. As the corresponding symmetric-key algorithm, we consider AES-128. For evaluating the impact of randomization on the accuracy, we use the Intel Lab Data¹. The data reflects the sensing of four different phenomena: voltage, temperature, humidity, and light. It was collected over 32 days using 54 typical sensor nodes. For our purposes, we consider a fraction of 1296 readings per node and perform our simulation on them.

5.1 Numerical Results of Power Consumption
The power consumed on the typical Crossbow Mica2 [29] to generate a 64-bit random number using the RandomLFSR algorithm [20] is 0.75 μJ [19]. For the same settings, the consumption is 11.4 μJ using the TinyRNG algorithm [19]. The symmetric-key encryption and decryption operations using AES-128 are estimated at 12.96 μJ and 19.92 μJ, respectively [6]. For ne = nr = 50% of n, the overhead of the hybrid scheme can be rewritten as CO_avg = Prand + 0.5(Pe + Pd). Based on that, Table 1 is derived. In Table 1, I denotes the energy saving as a percentage of the energy used by the encryption-only scheme. That is, I is defined as follows:

I = [(CO_avg)_encryption only - (CO_avg)_specified scheme] / (CO_avg)_encryption only × 100%,    (5)
where the specified scheme can be the hybrid scheme or the randomized scheme using either of the randomization algorithms. Though TinyRNG is computationally heavier than RandomLFSR, resulting in a smaller I, the former (i.e., TinyRNG) is recommended due to its more accurate results [19].

¹ Available at: http://db.csail.mit.edu/labdata/labdata.html
Table 1. Comparison between the three scenarios in terms of computation

Protocol             | CO_avg (RandomLFSR)   | CO_avg (TinyRNG)
Encryption Only      | 32.88 μJ              | 32.88 μJ
Randomization Only   | 1.5 μJ (I = 95.44%)   | 22.8 μJ (I = 30.66%)
Hybrid (ne = 50%)    | 17.19 μJ (I = 47.72%) | 27.84 μJ (I = 32.39%)
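Plugging the cited Mica2 per-operation costs into Eqs. (1)-(3) and (5) reproduces the average-overhead values of Table 1; the function and variable names below are illustrative, not from the paper.

```python
P_RAND = {"RandomLFSR": 0.75, "TinyRNG": 11.4}  # uJ per 64-bit nonce (cited values)
P_E, P_D = 12.96, 19.92                         # uJ per AES-128 enc/dec (cited values)

def avg_overhead(ne_frac, rng_alg):
    """Average per-node computation from Eq. (3), with ne_frac = ne / n.
    ne_frac = 0 recovers Eq. (1); ne_frac = 1 recovers Eq. (2)."""
    p_rand = P_RAND[rng_alg]
    return 2.0 * (1.0 - ne_frac) * p_rand + ne_frac * (P_E + P_D)

def saving(co_avg):
    """Energy saving I of Eq. (5), relative to the encryption-only scheme."""
    co_enc = P_E + P_D
    return 100.0 * (co_enc - co_avg) / co_enc
```

For example, `avg_overhead(0.0, "RandomLFSR")` gives 1.5 μJ and `avg_overhead(0.5, "TinyRNG")` gives approximately 27.84 μJ, matching the corresponding table rows.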
5.2 Data Aggregation and Accuracy: Results
To evaluate the accuracy under data randomization, we perform the experiment on the different sensed data records of the above scenario using the same noise parameters regardless of the records' values and the corresponding value intervals. Fig. 4 shows representative plots of the original raw sensed data and Fig. 5 shows the randomized data records. To estimate the accuracy, Table 2 summarizes the aggregation error for the different values. Note that, when using the same σ for the different sensed data regardless of their domains, data records with a small value interval are fully distorted and their aggregation accuracy is low (see Fig. 4(d) and 5(d)). Conversely, when σ is relatively small compared to the original data's interval, as in the case of the light readings, the distortion is limited and the accuracy is high (see Fig. 4(c) and 5(c)). To deal with this limitation, σ needs to be chosen with the interval of the original data in mind (e.g., with a maximum σ of 200% of the mean value of the original data).

5.3 Impact of Randomization on the Accuracy
To guarantee minimum standards of security, the deviation σ needs to be as high as possible. However, doing so lowers the accuracy of the aggregated data. Fig. 6(a) shows the aggregated mean calculated over the raw versus the randomized sensed data, and Fig. 6(b) translates the difference into an accuracy ratio. From the two experiments we observe that the accuracy of the aggregation decreases as the deviation σ grows. Note that even when σ is as large as the original data, the

Table 2. Error estimation in the aggregation results due to data randomization

Temperature (noise: σ = 10, μ = 0)
Data   Average    Summation  Count
D      21.0341    27260      1296
D'     20.5600    26646      1296
error  2.55%      2.55%      0

Humidity (noise: σ = 10, μ = 0)
Data   Average    Summation  Count
D      35.8392    46448      1296
D'     35.3651    45833      1296
error  1.32%      1.32%      0

Light (noise: σ = 10, μ = 0)
Data   Average    Summation  Count
D      177.7460   230360     1296
D'     177.2719   229740     1296
error  0.27%      0.27%      0

Voltage (noise: σ = 10, μ = 0)
Data   Average    Summation  Count
D      2.7105     3512.8     1296
D'     2.2364     2898.3     1296
error  17.49%     17.49%     0
Fig. 4. Raw sensed data over a 24-hour day from a real sensing system, representing four different phenomena from the viewpoint of a single node: (a) Temperature (in Celsius), (b) Relative humidity (in percent), (c) Light (in Lux), (d) Battery voltage (in Volt), each plotted against time (in hours)
Fig. 5. Randomized versus raw sensed data over a 24-hour day from a real sensing system, representing four different phenomena from the viewpoint of a single node: (a) Temperature, (b) Humidity, (c) Light, (d) Voltage
Fig. 6. The impact of σ as a security parameter on the accuracy of the aggregation: (a) comparison between the non-randomized and randomized aggregation (average temperature) for different σ; (b) the accuracy of aggregation as a percentage for different σ values
accuracy achieved is still higher than 96%. The simulation considers the temperature readings; the same observations apply to the other sensed data under the same considerations.
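The trade-off plotted in Fig. 6 can be reproduced on synthetic data: as σ grows, the de-randomized mean drifts further from the true mean. The setup below (constant readings around 21 °C, 1296 samples as in the temperature trace, Gaussian noise added at the node and subtracted at the cluster head) is an illustrative model, not the paper's experiment.

```python
import random

def aggregation_accuracy(sigma, n=1296, true_mean=21.0, seed=0):
    """Accuracy (percent) of the mean after randomization at the nodes and
    de-randomization at the cluster head, with independent N(0, sigma)
    noise on both sides; mirrors the ratio plotted in Fig. 6(b)."""
    rng = random.Random(seed)
    derand = [true_mean + rng.gauss(0, sigma) - rng.gauss(0, sigma)
              for _ in range(n)]
    est = sum(derand) / n
    return 100.0 * (1.0 - abs(est - true_mean) / true_mean)
```

With sigma = 0 the accuracy is 100%, and it degrades gradually as sigma approaches the scale of the data itself, in line with the curve in Fig. 6(b).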
6 Conclusion and Future Works
Aggregation functions over raw sensed data are meant to compute statistical summaries in which the exact individual values are of less importance. In this paper, we utilize this fact and introduce data randomization as a means of
data hiding. For an attacker to learn the statistical properties of the randomized data, he needs to take control of a large fraction of the communication pattern. We showed the efficiency of the randomization in terms of the required computation as the main resource, and discussed several attack scenarios, including an extension to a hybrid scheme that generates a trade-off between resources, accuracy, and security. In the near future, it will be valuable to study the impact of different statistical distributions on the hardness of estimating the original data from the randomized data. We will also study the impact of applying multiple randomizations to single data records on the accuracy of the estimate and on the security.
Acknowledgment. The authors would like to thank the anonymous reviewers for their comments. They would also like to acknowledge the efforts of Peter Bodik, Wei Hong, Carlos Guestrin, Sam Madden, Mark Paskin, Romain Thibaux, Joe Polastre, and Rob Szewczyk from Intel Lab, Berkeley, for making their sensor data set available online.
References 1. Sang, Y., Shen, H., Inoguchi, Y., Tan, Y., Xiong, N.: Secure data aggregation in wireless sensor networks: A survey. In: PDCAT, pp. 315–320 (2006) 2. Chan, H., Perrig, A., Przydatek, B., Song, D.X.: Sia: Secure information aggregation in sensor networks. Journal of Computer Security 15(1), 69–102 (2007) 3. Chan, H., Perrig, A., Song, D.X.: Secure hierarchical in-network aggregation in sensor networks. In: ACM Conference on Computer and Communications Security, pp. 278–287 (2006) 4. Cam, H., Ozdemir, S., Sanli, H.O., Nair, P.: Secure differential data aggregation for wireless sensor networks 5. Yang, Y., Wang, X., Zhu, S.: Sdap: a secure hop-by-hop data aggregation protocol for sensor networks. In: Proceedings of the seventh ACM international symposium on Mobile ad hoc networking and computing, pp. 356–367 (2006) 6. Wander, A., Gura, N., Eberle, H., Gupta, V., Shantz, S.C.: Energy analysis of public-key cryptography for wireless sensor networks. In: PerCom, pp. 324–328 (2005) 7. Watro, R.J., Kong, D., fen Cuti, S., Gardiner, C., Lynn, C., Kruus, P.: Tinypk: securing sensor networks with public key technology. In: SASN, pp. 59–64 (2004) 8. Malan, D.J., Welsh, M., Smith, M.D.: A public-key infrastructure for key distribution in tinyos based on elliptic curve cryptography. In: First IEEE Int. Conf. on Sensor and Ad Hoc Comm. and Networks, pp. 71–80 (2004) 9. Du, W., Wang, R., Ning, P.: An efficient scheme for authenticating public keys in sensor networks. In: MobiHoc, pp. 58–67 (2005) 10. Nyang, D., Mohaisen, A.: Cooperative public key authentication protocol in wireless sensor network. In: UIC, pp. 864–873 (2006) 11. Liu, D., Ning, P.: Establishing pairwise keys in distributed sensor networks. In: ACM CCS, pp. 52–61 (2003)
Data Randomization for Lightweight Secure Data Aggregation
351
12. Du, W., Deng, J., Han, Y.S., Varshney, P.K., Katz, J., Khalili, A.: A pairwise key predistribution scheme for wireless sensor networks. ACM Trans. Inf. Syst. Secur. 8(2), 228–258 (2005) 13. Eschenauer, L., Gligor, V.D.: A key-management scheme for distributed sensor networks. In: ACM CCS, pp. 41–47 (2002) 14. Mohaisen, A., Maeng, Y., Nyang, D.: On the grid based key pre-distribution: Toward a better connectivity in wireless sensor networks. In: SSDU, pp. 527–537 (2007) 15. Mohaisen, A., Nyang, D.: Hierarchical grid-based pairwise key pre-distribution scheme for wireless sensor networks. In: R¨ omer, K., Karl, H., Mattern, F. (eds.) EWSN 2006. LNCS, vol. 3868, pp. 83–98. Springer, Heidelberg (2006) 16. Maeng, Y., Mohaisen, A., Nyang, D.: Secret key revocation in sensor networks. In: Indulska, J., Ma, J., Yang, L.T., Ungerer, T., Cao, J. (eds.) UIC 2007. LNCS, vol. 4611, pp. 1222–1232. Springer, Heidelberg (2007) 17. Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: SIGMOD Conference, pp. 439–450 (2000) 18. Bertino, E., Fovino, I.N., Provenza, L.P.: A framework for evaluating privacy preserving data mining algorithms*. Data Min. Knowl. Discov. 11(2), 121–154 (2005) 19. Francillon, A., Castelluccia, C.: Tinyrng: A cryptographic random number generator for wireless sensors network nodes. In: WiOPT (2007) 20. Lee, N., Philip Levis, J.H.: Mica high speed radio stack (2002) 21. Liu, K., Giannella, C., Kargupta, H.: An attacker’s view of distance preserving maps for privacy preserving data mining. In: F¨ urnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 297–308. Springer, Heidelberg (2006) 22. 
Arora, A., Ramnath, R., Ertin, E., Sinha, P., Bapat, S., Naik, V., Kulathumani, V., Zhang, H., Cao, H., Sridharan, M., Kumar, S., Seddon, N., Anderson, C., Herman, T., Trivedi, N., Zhang, C., Nesterenko, M., Shah, R., Kulkarni, S.S., Aramugam, M., Wang, L., Gouda, M.G., Choi, Y.-r., Culler, D.E., Dutta, P., Sharp, C., Tolle, G., Grimmer, M., Ferriera, B., Parker, K.: Exscal: Elements of an extreme scale wireless sensor network. In: RTCSA, pp. 102–108 (2005) 23. Dutta, P., Hui, J., Jeong, J., Kim, S., Sharp, C., Taneja, J., Tolle, G., Whitehouse, K., Culler, D.E.: Trio: enabling sustainable and scalable outdoor wireless sensor network deployments. In: IPSN, pp. 407–415 (2006) 24. Bohge, M., Trappe, W.: An authentication framework for hierarchical ad hoc sensor networks. In: WiSe 2003: Proceedings of the 2nd ACM workshop on Wireless security, pp. 79–87. ACM, New York (2003) 25. Shah, R., Roy, S., Jain, S., Brunette, W.: Data MULEs: modeling and analysis of a three-tier architecture for sparse sensor networks. Ad Hoc Networks 1(2-3), 215–233 (2003) 26. Guo, S., Wu, X.: Deriving private information from arbitrarily projected data. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 84–95. Springer, Heidelberg (2007) 27. Hyvarinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. In: Proceedings of 15th Conference on Uncertainty in Artificial, vol. 14, pp. 21–30 (2000) 28. Huang, Z., Du, W., Chen, B.: Deriving private information from randomized data. In: SIGMOD Conference, pp. 37–48 (2005) 29. Inc., C.T.: Wireless sensor networks, http://www.xbow.com/
Mobile Sink Routing Protocol with Registering in Cluster-Based Wireless Sensor Networks Ying-Hong Wang, Kuo-Feng Huang, Ping-Fang Fu, and Jun-Xuan Wang Department of Computer Science & Information Engineering, Tamkang University, Tamsui, Taipei, Taiwan, R.O.C. [email protected], [email protected], [email protected], [email protected]
Abstract. Wireless Sensor Networks (WSNs) are wireless networks consisting of sink nodes and multiple sensor nodes. While wireless sensor nodes have several advantages, such as compact size and low cost, corresponding constraints on resources result. The greatest challenge among them is the constraint on energy. Therefore, how to minimize energy consumption while maintaining an extended network lifetime is the most critical issue in the design of routing protocols for wireless sensor networks. Keywords: Wireless Sensor Networks, routing.
1 Introduction

With the development of wireless communication technology, wireless sensor networks have become a highly active research area. The compact size of micro sensors and the unique characteristics of communication among the sensor nodes make such networks highly applicable in fields such as the military, business, medical treatment, environmental protection, and disaster assistance and rescue. Typical applications of wireless sensor networks include monitoring and tracking because of their many inherent advantages. Numerous examples from recent years of research and development show that wireless sensor networks are employed in place of human labor for tasks that either require a long period of monitoring or involve high risks. Each wireless sensor node features data processing, wireless communication, and sensing capabilities. In addition, wireless sensor networks composed of such nodes offer rapid construction, easy operation, self-organization, and mobility. Hence, research results in this area have always been deemed of great importance. Generally speaking, the sensing and dissemination range of a wireless sensor node is fixed, and the energy consumed increases steeply with the dissemination distance. The route selected for the data transmission process thus has a direct impact on the efficiency of the network. Developing an efficient routing algorithm that improves both the performance and the operational time

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 352–362, 2008. © Springer-Verlag Berlin Heidelberg 2008
of the entire wireless sensor network environment has become one of the most important ongoing research topics. The main objective of this paper is to design a mobile sink routing protocol which can be applied to cluster-based wireless sensor networks. It is expected that, with such a protocol, evenly distributed energy consumption among all sensor nodes can be achieved, and consequently the overall lifetime of the wireless sensor network will be improved. The "Mobile Sink Routing Protocol with Registering in Cluster-based Wireless Sensor Networks" proposed in this paper is aimed at improving the network lifetime. The major design concepts are as follows:
• Eliminate complicated computation during operation.
• Reduce electric energy consumption while improving overall network lifetime.
• Decrease the relay frequency of sensor nodes near the sink to prevent failures in sensed-data dissemination.
The "Mobile Sink Routing Protocol with Registering in Cluster-based Wireless Sensor Networks" presented in this paper incurs very small energy consumption in all related operations, from the establishment of the wireless sensor network in the clustering phase, through the registration of the clusterhead nodes and mobile sink nodes in the register phase and the data dissemination phase, to the maintenance of the entire wireless sensor network. It efficiently and evenly distributes the electric power consumption among all sensor nodes in the network and, as a result, extends the lifetime of the entire wireless sensor network. The paper is organized as follows. Section 2 gives a brief introduction to the relevant background technology and studies, and to the relevant routing protocols for wireless sensor networks together with their particular features. Section 3 provides a detailed description of the proposed mobile sink routing protocol for cluster-based wireless sensor networks. Section 4 compares and analyzes the simulation results. The last section presents the conclusion and future works.
2 Related Works

The study of routing-protocol algorithms for wireless sensor networks has always been of great interest to many researchers, and the combined efforts have produced various solutions optimized for different aspects of wireless sensor networks. Despite the variation, all of the proposed routing protocols share one common goal: to increase the network lifetime and in turn extend the applicability of wireless sensor networks. "Low Energy Adaptive Clustering Hierarchy" (LEACH) [2] is a routing protocol proposed by W. B. Heinzelman. As suggested by its name, it is based on a clustering architecture. Under this architecture, the sensor nodes in a wireless sensor network are grouped into clusters depending on their locations. Within each cluster, one sensor node is selected as the clusterhead node, whose major task is to collect
sensed data from all other sensor nodes in the same cluster. The data is then periodically transmitted to an access point or other sink nodes at a longer distance. The selection of the clusterhead node is completely autonomous: a new clusterhead is selected within the same cluster, based on a randomly generated number, upon completion of each data transmission round. Instead of concentrating all high-energy operations on a single head node, the energy consumed by data transmission between the clusterhead node and the access point is distributed equally among all sensor nodes in the network, which in effect increases the operational time of the entire network. This routing protocol is not only one of the most representative cluster-type routing algorithms, but also the most widely adopted in existing wireless sensor networks. Hierarchical Cluster-based Data Dissemination in Wireless Sensor Networks with Mobile Sink (HCDD) [4] is a routing protocol introduced in 2006 and, similar to LEACH, it is also a clustering-type routing protocol. However, HCDD differs from LEACH in its concepts of mobility and hierarchy. Mobile sink nodes, rather than merely static sensor nodes, are used in HCDD. In addition, the nodes in the network are first grouped into cluster groups, and the clusterhead nodes of each group then form layers to establish a fully hierarchical architecture. Under this structure, the cluster groups are able to communicate with one another. The sensor nodes at the fringe of each cluster are responsible for communication with other clusters, while the clusterhead node is in charge of communication with the mobile sink nodes. Through the hierarchical structure, the clusterhead node is able to track down the mobile sink nodes, despite their high mobility, in such a way that it facilitates disseminating data to the sink nodes.
This protocol also successfully reduces the overall electric power consumption of the entire sensor network by means of mobile sink nodes and multi-layer routing. A Two-tier Data Dissemination Model for Large-scale Wireless Sensor Networks (TTDD) [9], proposed by Haiyun Luo et al., is a grid-based routing protocol. This routing algorithm first partitions the network into several grid units consisting of a number of nodes. In this architecture, the location information of nodes can only be managed by those sensor nodes which also serve as dissemination nodes, and the data source has to be one of the dissemination nodes configured within this architecture. When a sink node requests data, it sends out a query message, which can only be forwarded among dissemination nodes, to the data source node. Upon receipt of such a query message, the data source node sends the relevant data back to the sink node via the reverse path. In TTDD, however, each data source must form its own grid unit, which consumes a lot of energy, and the result is typically far from efficient. Even if the data is only disseminated within the same grid unit, the amount of energy consumed is too high for practical use. In light of the above discussion of related studies, the importance of a good routing protocol to the overall operation of a wireless sensor network is apparent. How to efficiently disseminate sensed data to the sink nodes while minimizing energy consumption is of the utmost concern in routing-protocol design. To enhance the performance and applicability of wireless sensor networks, a scheme improving on the above methods will be presented, with detailed descriptions of all related architectures and methods, in the following sections.
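For reference, LEACH's autonomous cluster-head election summarized above compares a per-node uniform draw against the published threshold T(n) = P / (1 - P * (r mod 1/P)), where P is the desired cluster-head fraction and r the round index. The sketch below is ours, under the simplifying assumption that every node is still eligible in the current epoch.

```python
import random

def leach_threshold(P, r):
    """LEACH election threshold T(n) = P / (1 - P * (r mod 1/P)) for a
    node that has not yet served as cluster head in the current epoch."""
    return P / (1.0 - P * (r % round(1.0 / P)))

def elect_heads(node_ids, P, r, rng):
    """One LEACH round: each eligible node draws a uniform number
    independently and becomes cluster head if it falls below T."""
    t = leach_threshold(P, r)
    return [n for n in node_ids if rng.random() < t]
```

With P = 0.1, the threshold starts at 0.1 in round 0 and rises to 1.0 by round 9, so every node that has not yet served is guaranteed a turn once per epoch of 1/P rounds.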
3 Proposed Method

In the initialization stage of the network, each sensor node exchanges information about its own geographical address and energy status with its neighbors, saves the relevant information, and establishes a Neighbor Information Table (NIT), which is then used in subsequent operations such as the selection of clusterhead nodes and data dissemination. The proposed method has four main phases: the Clustering Phase (CP), the Register Phase (RP), the Data Dissemination Phase (DDP), and the Maintenance Phase (MP).
3.1 Clustering Phase (CP)

After all sensor nodes are properly configured, each of them starts building its own Neighbor Information Table (NIT), and the nodes are divided into clusters. A node with greater energy capacity, because of its shorter wait time, has a higher priority in acquiring an available gateway. After an available gateway is acquired, the node updates its status to clusterhead and broadcasts an ADV-CH message to notify all neighboring nodes in its NIT. When a neighbor node receives the ADV-CH message, it stops its timer and joins the newly formed cluster. If a sensor node receives several ADV-CH messages at the same time, it makes its selection based on the signal strength and joins the cluster closest to its location. To join a cluster, the sensor node replies with a JOIN(node_ID, t) message, which contains the node_ID of the sensor node and the delivery time of the JOIN message. During the cluster dividing phase, we set up a parameter t_CF to count down the time spent on the cluster division. When t_CF reaches 0, all sensor nodes not yet associated with any cluster send out the JOIN(node_ID, t) message to all neighboring nodes listed in their NITs to request joining the clusters formed by neighboring nodes. Upon completion of the cluster division, each clusterhead node establishes an Intra-Cluster Schedule Table (IACST) in which, according to the remaining electric power capacity of each node, an entry for each future clusterhead candidate is recorded. The format of the IACST is shown in Table 1.

Table 1. Intra-Cluster Schedule Table (IACST)
3.2 Register Phase (RP)

The wireless sensor network enters the Register Phase (RP) upon completion of the clustering process described in the previous section. During this phase, the mobile sink starts to move around in the network according to a movement pattern calculated based on the Hidden Markov Model (HMM) [7] of the Random Waypoint model.
Meanwhile, all sensor nodes and the clusterhead node, depending on their current state out of the three possible ones, register, transmission, and sleep, which define the three primary aspects of their tasks, also start their own tasks respectively. The sink node, as it moves around in the network, sends out a Search_CH message containing its moving velocity v. When the mobile sink node enters the valid dissemination range of some clusterhead node and the Search_CH message is intercepted by that head node, the clusterhead node starts measuring the stationary time T_s of the mobile sink node within its valid dissemination range. In addition, it sends out a Request To Register (RTR) message, with its address information included, to the mobile sink node to request registration. The sink node, once it accepts the RTR message packet, stops sending out the Search_CH message and instead sends an Agree To Register (ATR) message packet back to the node whose address information is provided in the RTR message packet, allowing data transmission from that node. The RTR and ATR message formats are as follows.

Table 2. Request To Register (RTR) message format
Table 3. Agree To Register (ATR) message format
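The register-phase handshake can be sketched as a small message exchange. The RTR/ATR names follow Tables 2 and 3; the classes, field names, and return values below are illustrative, not the paper's packet layout.

```python
from dataclasses import dataclass

@dataclass
class RTR:               # Request To Register, sent by a clusterhead
    ch_address: str

@dataclass
class ATR:               # Agree To Register, sent back by the mobile sink
    ch_address: str      # address copied from the accepted RTR

def sink_handle_rtr(rtr):
    """Mobile sink: accept the RTR, stop broadcasting Search_CH, answer ATR."""
    return ATR(ch_address=rtr.ch_address)

def ch_handle_atr(my_address, atr, have_data):
    """Clusterhead reaction to a received ATR: transmit if it names us;
    otherwise retry with an RTR while sensed data is pending, else drop."""
    if atr.ch_address == my_address:
        return "SEND_DATA"
    return "SEND_RTR" if have_data else "DISCARD"
```

A clusterhead whose address matches the ATR starts disseminating; any other clusterhead with pending data re-attempts registration, mirroring the rules described in the text.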
Upon receipt of the ATR message packet from the mobile sink node, the clusterhead node compares the address information contained in the message with its own address. If the addresses match, it disseminates the sensed data to the mobile sink node. If, on the other hand, the addresses do not match while there is sensed data to be disseminated, the clusterhead node sends an RTR message to the mobile sink node to attempt registration with the sink for data dissemination; otherwise, the ATR message packet is discarded. The mobile sink node, upon receipt of all sensed data from the clusterhead node, sends out an ACK message to confirm the receipt of the data and records the relevant information of the clusterhead node in its Clusterhead Register Table (CHRT).

3.3 Data Dissemination Phase (DDP)

After the registering phase, the wireless sensor network enters its next phase, the Data Dissemination Phase (DDP). We split our discussion into two parts: data dissemination inside the cluster and data dissemination outside the cluster. According to the greedy strategy, the most favorable choice should be made at each selection step. Therefore, when designing our routing protocol for data dissemination inside the cluster, we adopt this greedy concept, using the remaining energy capacity as the indicator, to select the most favorable path, which in turn minimizes the overall
energy consumption, evenly distributes and balances load among the sensor nodes, and thereby extends the lifetime of the entire network. We illustrate path selection for data dissemination inside the cluster using Figure 1. If node 42 within a cluster needs to disseminate sensed data to the clusterhead node 31, node 42 first inspects its NIT and selects the neighbor node closest to itself, breaking ties by the highest remaining electric power capacity, as the next relay node. In this example, neighbor node 24 is closest to node 42 and therefore receives from node 42 all sensed data, along with the related data required by clusterhead node 31. Once node 24 receives the relayed data, it inspects its own NIT to select the next relay node; since node 33 is closest to node 24, node 24 disseminates all related data to its neighbor, node 33. Node 33, upon receipt of the data, checks its NIT and finds two equally close neighbor nodes, node 20 and node 30. Node 33 therefore re-inspects its NIT to compare the remaining energy capacity of the two; node 20 is found to have the higher electric power capacity and is selected as the next relay node. Finally, node 20 checks its NIT, finds that clusterhead node 31 is its closest neighbor, and, as no further relay is necessary, disseminates all related data directly to clusterhead node 31.
Fig. 1. Example for data dissemination route inside the cluster
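The greedy in-cluster path selection walked through above can be sketched as follows. The NIT is modeled here as a neighbor list plus position and energy tables; this data layout, and the rule of never relaying back to an already-visited node, are illustrative assumptions rather than the protocol's actual table format.

```python
import math

def next_relay(current, candidates, positions, energy):
    """Greedy next-hop choice: pick the candidate neighbor closest to the
    current node (as recorded in its NIT); break distance ties in favor of
    the higher remaining energy, as in the Figure 1 example."""
    def dist(a, b):
        (ax, ay), (bx, by) = positions[a], positions[b]
        return math.hypot(ax - bx, ay - by)
    return min(candidates, key=lambda n: (dist(current, n), -energy[n]))

def route_to_head(source, clusterhead, neighbors, positions, energy):
    """Follow the greedy choice hop by hop until the clusterhead is reached."""
    path, node = [source], source
    while node != clusterhead:
        # Assumption: data is never relayed back to a node it has visited.
        candidates = [n for n in neighbors[node] if n not in path]
        node = next_relay(node, candidates, positions, energy)
        path.append(node)
    return path
```

With positions chosen to mimic Figure 1 (nodes 20 and 30 equidistant from node 33, node 20 with more energy), the route reproduces the 42, 24, 33, 20, 31 sequence of the example.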
We use Figure 2 to illustrate data dissemination outside the cluster. Once a clusterhead node registers with the mobile sink node through the exchange of the RTR and ATR packets, the mobile sink node receives confirmation from that clusterhead node that it is ready to disseminate data. When the mobile sink node later moves into the cluster of a clusterhead node and finds that the cluster has been registered before, i.e., the clusterhead node has been saved in its Clusterhead Register Table (CHRT), it reconfirms, based on the previous IACST, that the node is still the clusterhead of the cluster and then sends a Request To Send (RTS) message packet asking the clusterhead node to disseminate its sensed data. Upon receipt of the RTS message packet, the clusterhead node disseminates the data to the mobile sink. If the mobile sink node does not receive any message from the clusterhead node after a certain wait
Y.-H. Wang et al.
time, Tnext, it moves on to the next cluster along its planned path to collect the related data there.
Fig. 2. Flowchart for data dissemination route outside the cluster
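The sink-side logic of Figure 2 can be sketched as below; the callback interfaces, return conventions, and the handling of the Tnext timeout are assumptions made for illustration.

```python
def visit_cluster(sink_chrt, head_addr, send_rts, wait_for_data, t_next):
    """Mobile-sink behavior when entering a cluster (Figure 2 sketch).
    sink_chrt: set of clusterhead addresses already registered in the CHRT.
    send_rts / wait_for_data stand in for the transport layer;
    wait_for_data returns the data, or None once the wait time t_next expires.
    All names here are illustrative."""
    if head_addr not in sink_chrt:
        return None          # not registered: fall back to the registering phase
    send_rts(head_addr)      # ask the known clusterhead to disseminate
    data = wait_for_data(t_next)
    if data is None:
        return None          # timeout: move on to the next cluster on the path
    return data
```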
3.4 Maintenance Phase (MP)

For cluster maintenance, a new sensor node that wants to join the network sends out a JOIN(node_ID, t) message containing its own ID number and the message send time, and sets a wait time for the available clusters to respond. If responses arrive from clusterhead nodes within the wait time, the node joins the cluster whose clusterhead is closest to it. If no reply is received within the wait time, it resends the JOIN(node_ID, t) message, up to a total of three attempts. Should all three attempts fail, it updates its own status and becomes a clusterhead node itself; the new clusterhead then sends out an ADV-CH message to inform all nearby neighbors and starts forming its own cluster. Another important maintenance issue is the replacement of the clusterhead node. Prior to replacement, a competition mechanism based on the IACST is activated to select a backup_head. The current clusterhead node then compares its own energy capacity with that of the backup_head. If the energy capacity of the backup_head exceeds that of the clusterhead node by more than a threshold, the backup_head is reserved as the clusterhead for the next round of operation, and the clusterhead node about to be replaced is moved to the end of the IACST. Changes in the schedule list of the current clusterhead node trigger the
dispatch of the updated IACST and an update of the NIT; otherwise, only the NIT is updated. The "Mobile Sink Routing Protocol with Registering Mechanism" proposed in this paper is based on a cluster-based architecture. The algorithm, as described above, efficiently distributes the energy consumption across the different phases of operation; it not only extends the lifetime of the entire sensor network but also improves the efficiency of the whole routing protocol.
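The maintenance rules of Section 3.4, the three-attempt JOIN procedure and the threshold-based backup-head replacement, can be sketched as follows; the radio-layer callbacks and the reply format are illustrative assumptions.

```python
def try_join(send_join, wait_replies, max_attempts=3):
    """New-node join procedure: broadcast JOIN(node_ID, t) up to three times;
    join the closest responding clusterhead, otherwise self-promote.
    send_join/wait_replies stand in for the radio layer; wait_replies
    returns a list of (head_addr, distance) pairs collected in the wait time."""
    for _ in range(max_attempts):
        send_join()
        replies = wait_replies()
        if replies:
            return min(replies, key=lambda r: r[1])[0]   # closest clusterhead
    return "become_clusterhead"   # all attempts failed: become a head, send ADV-CH

def maybe_replace_head(head_energy, backup_energy, threshold):
    """Clusterhead replacement rule: hand over to the backup_head only when its
    remaining energy exceeds the current head's by more than a threshold."""
    return backup_energy - head_energy > threshold
```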
4 Simulation Results

We use GloMoSim [6], developed primarily for wireless mobile networks by the University of California, Los Angeles (UCLA), as our simulation tool. GloMoSim is implemented on top of Parsec and can perform simulation tests for large-scale wireless communication networks. The parameters and environmental settings for our simulations are as follows: (1) the sensing area is 500 m × 500 m; (2) 1000 sensor nodes are deployed randomly, but evenly and with equal density, in the network; (3) the size of the dissemination packet is 100 bytes; (4) the sensing radius of the sensor nodes is 25 m, and sensor nodes are randomly selected to act as data source nodes; (5) the movement of the mobile sink is computed based on the Hidden Markov Model (HMM) [7] of the Random Waypoint model [8]. Given the similarity, in their use of a mobile data sink, between the proposed mobile sink routing protocol with registering mechanism and two other well-known routing methods, TTDD and HCDD, we compare the performance of our protocol with that of TTDD and HCDD and analyze in depth the features of the three protocols and the goals they achieve. First, we observe the energy consumption of the entire wireless sensor network under the different routing methods during the simulation. The results are shown in Figure 3.
Fig. 3. The comparison chart showing the energy consumption of network
As shown in the comparison chart, the routing algorithm presented in this paper has a lower overall energy consumption than the other two methods. A further analysis of the
simulation results shows that a similar outcome is observed because HCDD, like our algorithm, is also based on the concept of clustering. However, owing to the additional overhead attached to maintaining the hierarchical structure every time a new layer is added, the overall energy consumption of HCDD increases accordingly; this overhead also makes it almost impossible to monitor the energy consumption of the wireless network. Based on this observation, our algorithm clearly performs better in managing the energy consumption of the entire network. The average energy consumption of TTDD, on the other hand, is almost 50% higher than that of our algorithm. In setting up its architecture, TTDD requires more energy because each dissemination node must establish its own grid unit; in addition, each dissemination node incurs high energy consumption in handling the data dissemination within its grid unit. As a result, the energy reserves of the entire network are severely depleted. The simulation results of the different routing methods on network lifetime are shown in Figure 4.
Fig. 4. The comparison chart showing the lifetime of network
Based on the simulation results, our algorithm preserves a longer network lifetime for the same number of data source nodes. Compared with HCDD, our algorithm yields an average increase of nearly 23% in overall network lifetime. This result shows that the clusterhead replacement mechanism in our algorithm can truly balance the load among nodes and effectively manage the electric energy consumption in each cluster. HCDD, with its hierarchical clustering architecture, has the worst overall network lifetime among the three routing methods. This can be attributed to the higher failure rate of the clusterhead nodes in the upper hierarchy: as the number of data source nodes in the network increases, the number of data relay operations performed by the clusterhead nodes in each layer, particularly in the upper layers, increases as well. Additional load is placed on those clusterhead nodes precisely because they sit higher in the hierarchy, and the constant, long-distance data relay operations lead to rapid energy consumption and, as a result, a higher failure rate of the clusterhead nodes.
Compared with TTDD, our algorithm achieves, on average, about a 14% increase in overall network lifetime. The analysis shows that TTDD, a grid-based routing algorithm, fixes the position of each dissemination node via GPS when configuring the grid unit. Since the location of any given node is fixed and known in advance, the energy spent on network maintenance is relatively lower than in HCDD, so TTDD's network lifetime is better than that of HCDD. Nevertheless, the dissemination of related data relies heavily on these fixed nodes, and as the number of data source nodes increases, the data dissemination volume and the number of data relay operations increase as well; these are the primary sources of energy consumption in TTDD. In contrast, with its clusterhead replacement mechanism, our algorithm successfully balances the load among all dissemination nodes, and it prevents the clusterhead nodes from wasting energy by having the mobile sink nodes actively communicate with them. Taken as a whole, our algorithm delivers a better overall network lifetime than TTDD. The analysis of the simulation results makes it apparent that our algorithm outperforms the other two routing protocols, with their different architectures, in both energy consumption and overall network lifetime. In particular, the electric power and energy consumption of the entire network is evenly distributed across the network and, based on the simulation results, is not seriously impacted by an increase in the number of data source nodes. Hence, our algorithm extends the network lifetime more effectively than the other two routing methods.
5 Conclusion and Future Work

In this paper, we propose a mobile sink routing algorithm with a registering mechanism in a cluster-based architecture. During the initializing phase of the sensor network, we introduce a path competition and scheduling mechanism, using the remaining energy capacity as the indicator, to ensure that sensor nodes with more remaining energy have a greater chance of being selected as clusterhead; this ensures load balancing in the system. Furthermore, the location information of the mobile sink node is recorded and retained through the registration procedure between the clusterhead node and the mobile sink node. To model the mobility of the sink node, we adopt the Hidden Markov Model (HMM) to compute its moving path and use random waypoint as its movement model. For the simulation analysis, we use GloMoSim, a simulation tool developed by the University of California, Los Angeles (UCLA) primarily for wireless communication networks. We test our algorithm in this simulator and compare the results with those of two well-known protocols, TTDD and HCDD. According to the simulation results, our algorithm performs better in terms of both overall lifetime and the electric power and energy consumption of the entire network. The delay of data transmissions and the use of multiple mobile sinks will be addressed in future work.
References

1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless Sensor Networks: a survey. Computer Networks (Elsevier), 393–422 (March 2002)
2. Heinzelman, W., Chandrakasan, A., Balakrishnan, H.: Energy-Efficient Communication Protocol for Wireless Microsensor Networks. In: Proc. of the 33rd Annual Hawaii International Conf. on System Sciences, pp. 3005–3014 (2000)
3. Karl, H., Willig, A.: Protocols and Architectures for Wireless Sensor Networks. John Wiley & Sons Ltd., Chichester (2006)
4. Lin, C.-J., Chou, P.-L., Chou, C.-F.: HCDD: Hierarchical Cluster-based Data Dissemination in Wireless Sensor Networks with Mobile Sink. In: International Wireless Communications and Mobile Computing Conference, July 2006, pp. 1189–1194 (2006)
5. Ye, F., Luo, H., Cheng, J., Lu, S., Zhang, L.: A two-tier data dissemination model for large-scale wireless sensor networks. In: Proceedings of the 8th ACM Annual International Conference on Mobile Computing and Networking, pp. 148–159 (2002)
6. Global Mobile Information Systems Simulation Library, http://pcl.cs.ucla.edu/projects/glomosim/
7. Hyytiä, E., Virtamo, J.: Random waypoint model in n-dimensional space. Operations Research Letters 33(6), 567–571 (2005)
8. Bettstetter, C., Hartenstein, H., Pérez-Costa, X.: Stochastic properties of the random waypoint mobility model. ACM/Kluwer Wireless Networks: Special Issue on Modeling and Analysis of Mobile Networks 10, 493–619 (2004)
9. Jiang, Q., Manivannan, D.: Routing protocols for sensor networks. In: Proc. of the 1st IEEE Conf. on Consumer Communications and Networking, January 2004, pp. 93–98 (2004)
10. Al-Karaki, J.N., Kamal, A.E.: Routing techniques in wireless sensor networks: a survey. Wireless Communications 11(6), 6–28 (2004)
11. Sohrabi, K., Gao, J., Ailawadhi, V., Pottie, G.: Protocols for Self-organization of a Wireless Sensor Network. IEEE Personal Communications 7(5), 16–27 (2000)
12. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application-specific protocol architecture for wireless microsensor networks. IEEE Transactions on Wireless Communications 1(4), 660–670 (2002)
13. Resta, G., Santi, P.: An analysis of the node spatial distribution of the random waypoint model for Ad Hoc networks. In: Proceedings of ACM Workshop on Principles of Mobile Computing (POMC), October 2002, pp. 44–50 (2002)
Towards the Implementation of Reliable Data Transmission for 802.15.4-Based Wireless Sensor Networks

Taeshik Shon and Hyohyun Choi

U-Convergence Lab, Telecommunication R&D Center, Samsung Electronics, Dong Suwon P.O. Box 105, 416 Maetan-3dong, Suwon-si, Gyeonggi-do, 442-600, Korea
{ts.shon, hyohyun.choi}@samsung.com
Abstract. Reliable data transmission in a wireless sensor network is one of the most significant factors in guaranteeing dependable and efficient sensor data delivery. As sensor network applications diversify and multiply, the transmission reliability of sensor data is of utmost importance. In this paper, we propose a hybrid hop-by-hop reliability approach based on IEEE 802.15.4, with adaptive link control and enhanced hop-by-hop reliability. The adaptive link control scheme considers the packet's application type and the destination's link status. The proposed hybrid hop-by-hop approach selects an optimized MAC transmission state from the related parameters, followed by an additional operation of finding alternative paths for reliable transmission. In addition, node caching and one-hop acknowledgement schemes are used to further enhance hop-by-hop reliability. We evaluated the proposed approach using NS-2 and C code simulation. Compared with the IEEE 802.15.4 MAC scheme, the simulation results show that the proposed approach is much more robust and reliable in situations with many hops and high error rates. Finally, we present our own sensor H/W platform and analysis tool.
1 Introduction

In the early stages of wireless sensor networks, most applications were driven mainly by military monitoring and periodic sensing of environmental information. However, current wireless sensor network research and deployments show that the target of wireless sensor networks is evolving from simple periodic monitoring solutions to human-centric solutions with user on-demand and event-driven data. Following this trend, the importance of reliable and efficient sensor data delivery keeps growing. IEEE 802.15.4 [1], with its medium access control (MAC) protocol, has been used as one solution for implementing wireless sensor networks, where reliable data transmission is a very significant factor in monitoring users' interests. Considering data-centric communication and data processing (e.g., duplicate data filtering, data aggregation) for sensor network longevity, reliable data transmission is a must for the deployment of energy-efficient wireless sensor networks. The standard, IEEE 802.15.4, provides a retransmission function to cope with one-hop data transmission failures. F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 363–372, 2008. © Springer-Verlag Berlin Heidelberg 2008
T. Shon and H. Choi
However, it does not guarantee a fully supported reliable data transmission function. More specifically, the IEEE 802.15.4 MAC only attempts to resend data, with a specific timeout period, three times when a transmission error occurs. If these retransmission attempts all fail, the data is discarded and the MAC notifies the upper layer of the transmission result. The ZigBee Alliance [2] considers including a reliable data transport function in the application support layer, i.e., end-to-end data transport. However, such a reliable data transmission scheme can also cause additional overhead and long request delays, since transmission failures are detected only on the destination side; this becomes worse as the number of hops increases. In this paper, we propose a novel hybrid hop-by-hop reliable data transport scheme, which runs between the network and MAC layers, for IEEE 802.15.4-based wireless sensor networks. Our approach consists of an adaptive link control scheme and an enhanced hop-by-hop scheme with node-cache and hop-acknowledgement, supporting energy-efficient, reliable end-to-end data transport in collaboration with IEEE 802.15.4; its implementation does not require any modification of IEEE 802.15.4. This paper is structured as follows. An overview of IEEE 802.15.4/ZigBee and existing reliability schemes is given in Section 2. The proposed approach is discussed in Section 3. In Section 4, a performance evaluation is presented. Implementation and security issues are discussed in Section 5. The paper is concluded in Section 6.
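The standard retransmission behavior just described (retry up to three times, then discard the frame and notify the upper layer) amounts to the following sketch; tx_once is our stand-in for one MAC transmission attempt, and the timeout handling is abstracted away.

```python
def mac_send(tx_once, max_retries=3):
    """IEEE 802.15.4-style link behavior as described above: transmit the
    frame, retry up to three times on failure, then give up and report the
    result to the upper layer.  tx_once() models a single attempt within
    the MAC timeout and returns True when an acknowledgement arrives."""
    for _ in range(1 + max_retries):   # one initial attempt plus the retries
        if tx_once():
            return True    # delivered; upper layer notified of success
    return False           # frame discarded; upper layer notified of failure
```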
2 Background

Until now, wireless sensor networks have been implemented and deployed over various radio frequencies with light-weight network protocols. These days, ZigBee/IEEE 802.15.4-based networking is rapidly rising as one of the best choices for realizing wireless sensor networks with a variety of applications. Moreover, to provide reliable communication for such sensor network applications, many studies on reliable transmission have been carried out. In this section, we give an overview of IEEE 802.15.4/ZigBee and of existing reliable transport research. IEEE 802.15.4 is a standard for implementing wireless personal area networks with easy installation, short-range operation, reasonable battery life, and a simple protocol stack. The standard can be applied not only to home networks but also to networks of very small devices called sensors. The IEEE 802.15.4 standard, published in 2003, defines the wireless medium access control and physical layers for low-rate wireless personal area networks (LR-WPANs). Its purpose is to provide ultra-low complexity, ultra-low cost, ultra-low power consumption, and low-data-rate wireless connectivity among inexpensive devices. The ZigBee Alliance is developing very low-cost, very low-power, two-way wireless communication standards based on the IEEE 802.15.4 standard. The ZigBee specification is managed by the ZigBee Alliance, a group of over 170 companies creating related semiconductors, development tools, and products. ZigBee version 1.0 was ratified in December 2004, and a specification with higher capabilities was scheduled for 2007. The ZigBee architecture consists of the ZigBee Network layer (NWK), Application Support Sublayer (APS), ZigBee Device Object (ZDO), and Application Framework (AF) [3,4,5]. A lot of approaches have been proposed and implemented for providing a reliable transport
feature in wireless sensor networks. Specifically, when applying packet-transmission reliability to a wireless sensor network, various sensor network parameters must be carefully considered, including architectural characteristics. Such reliability-related parameters include the packet delivery pattern (single, blocked, and streamed packets), the layer supporting reliability (MAC, network, transport, and application), the quality of packet delivery (best effort and guaranteed), and the communication type (sensor-to-sensor, sink-to-sensor, and sensor-to-sink) [6,7,8,9,10].
3 Approach for Hybrid Hop-by-Hop Reliable Transmission

In this section, we present a novel hybrid hop-by-hop approach to provide reliable transmission in wireless sensor networks based on the IEEE 802.15.4 MAC. We call it a hybrid hop-by-hop reliable transmission approach because it provides two kinds of hop-by-hop transport reliability. The approach consists of two schemes: a novel retransmission policy based on adaptive link control (ALC), and an enhanced hop-by-hop reliability scheme using hop-acknowledgement and node caching. Each scheme is compared with an existing reliability method: the adaptive link control approach is analyzed against the retransmission scheme of the IEEE 802.15.4 MAC, and the enhanced hop-by-hop reliability is compared with the end-to-end reliable transmission scheme of the ZigBee Alliance specification.

3.1 Adaptive Link Control

Even though many sensor network protocols have been developed, IEEE 802.15.4 is one of the most widely used. However, as discussed in Section 2, IEEE 802.15.4 covers only the physical and MAC layers, not upper layers such as the network layer. The MAC layer of IEEE 802.15.4 has only simple retransmission operations to cope with failed link transmissions. As shown in Figure 1, if a packet transmission fails, the MAC layer only tries to retransmit the packet up to three times. If all three retransmissions fail, the MAC takes no additional recovery action but passes the failure notification to the upper layer. We know
Fig. 1. IEEE 802.15.4 Transmission
that hop-by-hop reliability is not an important consideration in the IEEE 802.15.4 standard and that a more efficient approach is strongly required. Therefore, in this paper, we first propose an approach that provides adaptive hop-by-hop reliability without modifying the IEEE 802.15.4 MAC. The approach is based on the dynamic link status and the characteristics of the application (the type of packet). When a packet is sent, the success or failure of the transmission depends mainly on the link quality. However, not all kinds of packets need to be delivered with the same guarantee; in other words, a packet can have a different priority according to the characteristics of its application, such as periodic, event-driven, on-demand, or control. Accordingly, the adaptive link control approach adjusts MAC retransmission-related parameters such as the timeout period and the number of retransmissions. If the adaptive link control approach works together with the network layer, alternative path discovery (route recovery in the upper layer) can be provided as well. Figure 2 shows the approach in more concrete terms. To apply the adaptive link control scheme, the type of packet is first determined from the application generating the packets, and the link quality is obtained from the link quality table using the next-hop address of the packet to be sent. If the link quality table has no appropriate value, an LQI (Link Quality Indication) probing packet is sent. Given the packet type and the measured link quality, an appropriate link level is assigned using the Adaptive Link Table (ALT). Then an adaptive link transmission policy, drawn from the Adaptive Transmission Table (ATT), is applied to the MAC layer; the policy specifies optimized MAC transmission parameters. Cooperative path recovery is described with the end-to-end reliability mechanism in the next section, and a more realistic example with the ALT and ATT is given in Section 4 for the performance evaluation.
Fig. 2. Adaptive Link Control Approach
3.2 Enhanced Hop-by-Hop Reliability

In Section 3.1 we examined the existing IEEE 802.15.4 scheme and our proposed link transmission approach to improve hop-by-hop reliability between wireless sensor nodes. However, this does not guarantee fully supported end-to-end transmission reliability in a wireless sensor network environment. ZigBee (the upper layers above IEEE 802.15.4 PHY/MAC), on the other hand, supports end-to-end reliable transport, which
increases the reliability of transactions above that available from the ZigBee network layer alone by employing end-to-end retries. The ZigBee end-to-end mechanism is similar to the existing TCP/IP reliable transport, using a sequence number and timeout. Thus, if a target wireless sensor network spans many hops, the expected round-trip delay increases dramatically in a highly error-prone link environment. From Figure 3, we can see that when the first packet transmission succeeds, the destination node waits for the second packet with the next sequence number. If the transmission of the second packet fails in the middle of the end-to-end path, the destination node must wait out a long timeout period, send a request packet for retransmission, and wait for the reply, again and again. In the worst case, the delay time increases drastically.
Fig. 3. End-to-End Reliability in ZigBee
Therefore, in this paper, we propose an enhanced hop-by-hop reliability approach, using node caching and hop-by-hop acknowledgement, to provide efficient end-to-end reliability on top of the IEEE 802.15.4 MAC, in contrast to ZigBee. The proposed scheme cooperates with the adaptive link control-based hop-by-hop reliability scheme: by using adaptive link control as the link-level reliability mechanism, it secures reliability between neighboring nodes, so that end-to-end reliability is provided as an extension of link-to-link reliability. In other words, to obtain end-to-end reliability with adaptive link control, node-cache and hop-ack schemes are added. If packet retransmission using adaptive link control fails, the node caching scheme first stores the failed packet temporarily and then tries to resend it over an alternative path. Moreover, the hop-by-hop acknowledgement notifies the packet's origin in order to decrease the end-to-end delay; without hop-by-hop acknowledgement, as in ZigBee, the waiting time of an end node can grow in proportion to the number of nodes. This enhanced end-to-end reliability scheme is depicted in Figure 4, where each packet is stored in the node cache before being sent to the next node. When a hop-ack arrives at the sending node, the sending node deletes the cached packet, recognizing that the packet has reached the relaying node, and delegates the responsibility for the packet transmission to it. If the sending node receives a normal acknowledgement, the packet has arrived at the end node successfully. If, on the other hand, the packet transfer fails, the
sending node first tries to find an alternative path and then resends the packet from its own cache memory, without sending a resend request to the original source node. The end node, likewise, does not need to keep waiting for packets to arrive.
Fig. 4. Enhanced Hop-by-Hop Reliability
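The node-cache and hop-ack behavior of Figure 4 can be sketched as follows; the send interface and the single alternative-path retry are simplifying assumptions of ours, standing in for the cooperating adaptive link control and route recovery machinery.

```python
class RelayNode:
    """Hop-by-hop reliability sketch: cache each packet until the next hop
    acknowledges it; on a hop-ack, drop the cache entry (responsibility is
    delegated downstream); on failure, resend the cached copy over an
    alternative path instead of asking the source to resend."""
    def __init__(self, send):
        self.send = send          # send(packet, path) -> True if a hop-ack arrives
        self.cache = {}

    def forward(self, seq, packet):
        self.cache[seq] = packet              # cache before sending (Figure 4)
        if self.send(packet, "primary"):
            self.on_hop_ack(seq)
            return True
        # Primary next hop failed: retry from the cache over an alternative
        # path, with no resend request to the original source node.
        if self.send(self.cache[seq], "alternate"):
            self.on_hop_ack(seq)
            return True
        return False

    def on_hop_ack(self, seq):
        # Hop-ack received: delegate responsibility to the next node.
        self.cache.pop(seq, None)
```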
4 Performance Evaluation

In this section, we present the simulation approach and results for evaluating the performance of the proposed hybrid hop-by-hop reliability approach with respect to several simulation metrics. The performance is compared with that of the IEEE 802.15.4 MAC.

4.1 Simulation

Our proposed hop-by-hop reliability approach is evaluated against an IEEE 802.15.4 MAC-based wireless sensor network. ZigBee-based end-to-end reliable transport is not a focus of this simulation, because the ZigBee specification (November 2006) does not yet concretely describe its end-to-end reliability, and it is easy to see that the end-to-end reliability of ZigBee is very sensitive to the number of relaying nodes and to the end node's timeout period. Therefore, to obtain more realistic and valuable evaluation results, the actual simulation deals with the link-level reliability comparison, although our proposed end-to-end approach operates below a reliable transport in an upper layer such as the ZigBee application layer.

Table 1. Simulation Parameters

Parameter                Value
Simulation Time          3600 sec
RTT                      0.192 ms
BPS                      250 Kbps
IEEE 802.15.4 Timeout    54 (symbol)
Processing Delay         0.05 ms
ALC Retries              2~5
ALC Timeout              27~120 (symbol)
ALC Copies               None/Use
ALC Pkt Type             Periodic/On-Demand/Event-Driven/Control
ALC Alternative Path     None/Use
The experiments were run in ns-2 with C code implementing the IEEE 802.15.4 MAC. The specific simulation parameters are given in Table 1: the simulation time is set to 3600 s, the basic round-trip time is about 0.192 ms, the transmission rate is 250 Kbps, the default IEEE 802.15.4 timeout is 54 symbols, and the processing delay is 0.05 ms. The simulated hybrid hop-by-hop approach uses the scenario parameters and operations illustrated in Tables 2 and 3. As discussed in Section 3, the proposed hybrid hop-by-hop approach provides adaptive link control and enhanced hop-by-hop reliability. For adaptive link control, the link status (LQI: Link Quality Indicator) and the packet type are examined first. Based on these, a link level is selected from the adaptive link level table, and the corresponding entry of the adaptive transmission table then determines the actual link parameters and operations. Note that the table values used in this simulation (Tables 2 and 3) are not optimized; finding optimized parameters would require many more experiments under realistic networking environments and is outside the scope of this paper, whose purpose is to apply and verify the newly proposed reliability approach rather than to find optimized parameters. Table 2 shows an example of an adaptive link level table: given an LQI value and a packet type, it selects an appropriate link level. The link level in turn determines adaptive MAC parameters and additional operations such as timeout periods, the number of retries, copies, and alternative paths.

Table 2. An Example of Adaptive Link Level Table

LQI value          Periodic   On-Demand   Event-Driven   Control
LQI (0x00~0x63)    Level 2    Level 3     Level 3        Level 3
LQI (0x64~0xC7)    Level 1    Level 2     Level 3        Level 3
LQI (0xC8~0xFF)    Level 1    Level 2     Level 2        Level 3
Table 3. An Example of Adaptive Transmission Table

Action                     Level 1   Level 2   Level 3
Timeout Period (symbols)   27        54        120
Number of Retries          2         3         5
Copies                     None      None      Use
Alternative Path           None      None      Use
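Read together, Tables 2 and 3 define a two-stage lookup: LQI and packet type select a link level, and the link level selects the MAC transmission parameters. A minimal Python sketch of this lookup follows; the function and table names are ours, and the values are the illustrative ones from the tables above.

```python
# Adaptive link control lookup, following Tables 2 and 3.
# LQI bands: 0 = 0x00-0x63 (poor), 1 = 0x64-0xC7, 2 = 0xC8-0xFF (good).

LINK_LEVEL_TABLE = {
    (0, "periodic"): 2, (0, "on-demand"): 3, (0, "event-driven"): 3, (0, "control"): 3,
    (1, "periodic"): 1, (1, "on-demand"): 2, (1, "event-driven"): 3, (1, "control"): 3,
    (2, "periodic"): 1, (2, "on-demand"): 2, (2, "event-driven"): 2, (2, "control"): 3,
}

TRANSMISSION_TABLE = {
    # level -> (timeout in symbols, retries, use copies, use alternative path)
    1: (27, 2, False, False),
    2: (54, 3, False, False),
    3: (120, 5, True, True),
}

def lqi_band(lqi: int) -> int:
    """Map a raw LQI byte to one of the three bands of Table 2."""
    if lqi <= 0x63:
        return 0
    return 1 if lqi <= 0xC7 else 2

def mac_parameters(lqi: int, packet_type: str):
    """Return (link level, MAC parameter tuple) for a packet."""
    level = LINK_LEVEL_TABLE[(lqi_band(lqi), packet_type)]
    return level, TRANSMISSION_TABLE[level]
```

For example, a periodic packet on a poor link (`mac_parameters(0x30, "periodic")`) maps to level 2, i.e., a 54-symbol timeout and 3 retries with no copies or alternative path.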
4.2 Simulation Results

The simulation experiments of Section 4.1 show that the proposed approach improves transmission efficiency and decreases transmission delay compared with the IEEE 802.15.4 MAC. In Figure 5, the adaptive link control scheme with
370
T. Shon and H. Choi
node-cache and hop-ack maintains a high delivery ratio even as the number of hops increases. In contrast, the delivery ratio of the IEEE 802.15.4 MAC drops dramatically with the number of hops. For instance, at about 30 hops, the delivery ratio of the proposed scheme is over 60% while that of the IEEE 802.15.4 MAC is under 20%. Moreover, Figure 6 shows that the proposed scheme is robust in highly error-prone situations: its average delay increases only gradually. The IEEE 802.15.4 MAC, by contrast, is not only very sensitive to the number of hops but also vulnerable to high error rates. The simulation results therefore indicate that our hybrid hop-by-hop approach can provide reliable transport in wireless sensor networks even under highly error-prone conditions and highly complex relaying conditions.
Fig. 5. Simulation result (hops vs. delivery ratio, averaged)
Fig. 6. Simulation result (error rate vs. delay)
5 Discussion

Our proposed approach turns an IEEE 802.15.4-based wireless sensor node into one with hybrid hop-by-hop reliability functionality. Figure 7 shows the hardware prototype of our sensor node with a basic network stack based on the IEEE 802.15.4 MAC. The prototype is built around ChipCon's CC2431 with an 8-bit 8051 core [11] and 128 KB of internal flash memory, and the sensor board measures 60 x 35 mm. For the operating system, TinyOS 2.0 is ported to the CC2431, and our approach and software are implemented in NesC. Work towards a more reliable wireless sensor network system is still ongoing: the current sensor node has simple adaptive link control functions with node-cache and alternative path finding, and in the near future the full hybrid hop-by-hop reliability function illustrated by the prototype in Figure 7 will be implemented. Our experiments are analyzed with our own wireless sensor network analyzer, shown in Figure 8, which provides sensor network packet sniffing, topology viewing, and sensor event monitoring with a network camera.
Fig. 7. IEEE 802.15.4-based Sensor Node Prototype
Fig. 8. Our Wireless Sensor Network Analyzer
6 Conclusion

In this paper, we have proposed a hybrid hop-by-hop reliable transport approach based on the IEEE 802.15.4 standard for wireless sensor networks. The approach is an efficient and simple transmission mechanism that guarantees high link reliability between sensor nodes and short end-to-end delay. Its distinguishing features are an adaptive link control scheme and an enhanced hop-by-hop scheme. The key design idea of the adaptive link control scheme is that the application type of a packet and the link status are considered first when a packet has to be sent; appropriate MAC transmission parameters are then chosen according to these two features. Another important advantage of our approach is that it greatly improves end-to-end reliability using node-cache and hop-acknowledgement with alternative path finding, in comparison with existing upper-layer support schemes. To the best of our knowledge, the enhanced hop-by-hop reliability is a very stable and efficient solution without long end-to-end round-trip delay: even if a link goes down, the packet is cached and a new path is sought to preserve hop-by-hop reliability. Based on this idea, we evaluated the proposed approach and compared its performance with the IEEE 802.15.4 MAC. Results show that our proposed adaptive link control scheme is much less sensitive than the IEEE 802.15.4 MAC to a highly
increasing number of hops. Also, when packet transmissions fail in a highly error-prone environment, the delay of IEEE 802.15.4 increases sharply compared with the proposed scheme. The proposed hybrid hop-by-hop reliability scheme thus shows robust and stable behavior compared with the IEEE 802.15.4 MAC, and it is well suited to wireless sensor networks that are error-prone and have complex hop connections. Future work is to evaluate the hybrid hop-by-hop reliability approach in a real wireless sensor network with our own hardware sensor node (Figure 7). In realizing fully supported reliability, security is a further concern; a near-future approach combining a secure and reliable transport mechanism is therefore required.
References [1] Institute of Electrical and Electronics Engineers, Inc., IEEE Std. 802.15.4-2003. Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low Rate Wireless Personal Area Networks (LR-WPANs). IEEE Press, New York, October 1 (2003) [2] ZigBee Alliance, ZigBee Specifications, version 1.1, November 3 (2006) [3] Baronti, P., Pillai, P., Chook, V.W.C., Chessa, S., Gotta, A., Fun Hu, Y.: Wireless sensor networks: A survey on the state of the art and the 802.15.4 and ZigBee standards. Computer Communications 30(7), 1655–1695 (2007) [4] Xue, G., Hassanein, H.: On current areas of interest in wireless sensor networks designs. Computer Communications 29(4), 409–412 (2006) [5] Akyildiz, I.F., Su, W.J., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38, 393–422 (2002) [6] Willig, A., Karl, H.: Data Transport Reliability in Wireless Sensor Networks – A Survey of Issues and Solutions. Praxis der Informationsverarbeitung und Kommunikation 28, 86– 92 (2005) [7] Deb, B., Bhatnagar, S., Nath, B.: Information assurance in sensor networks. In: Proc. 2nd ACM Intl. Workshop on Wireless Sensor Networks and Applications (WSNA), San Diego, CA (September 2003) [8] Wan, C.-Y., Campbell, A.T., Krishnamurthy, L.: PSFQ: A reliable transport protocol for wireless sensor networks. In: Proc. First ACM Intl. Workshop on Wireless Sensor Networks and Applications (WSNA 2002), Atlanta, GA (2002) [9] Stann, F., Heidemann, J.: RMST: Reliable data transport in sensor networks. In: Proc. 1st IEEE Intl. Workshop on Sensor Network Protocols and Applications (SNPA), Anchorage, Alaska (May 2003) [10] Sankarasubramaniam, Y., Akan, O., Akyildiz, I.: ESRT: Event-to-sink reliable transport in wireless sensor networks. In: Proc. ACM MOBIHOC 2003, Association of Computing Machinery. ACM Press, Annapolis, Maryland (2003) [11] Chipcon, Chipcon Products from Texas Instruments, http://www.chipcon.com
An Energy-Efficient Query Processing Algorithm for Wireless Sensor Networks Jun-Zhao Sun Academy of Finland Department of Electrical and Information Engineering, University of Oulu, 90014, Finland [email protected]
Abstract. Sensor networks have recently attracted significant attention for many military and civil applications, such as environment monitoring, target tracking, and surveillance. A high-level abstraction of a sensor network is the distributed database view, in which queries are used to retrieve data from the network. Sensor nodes have limited energy resources and function only until their energy drains. Queries for sensor networks should therefore be designed to extend the lifetime of the sensors. This paper presents a query optimization method for wireless sensor networks based on a user-specified accuracy item. When issuing a query, the user may specify a value/time accuracy constraint, from which an optimized query plan is created to minimize energy consumption. At each sensor node, instead of directly delivering each reading, the proposed algorithms reduce both data sensing and data transmission. Evaluation shows that the proposed methods achieve better performance than querying without the optimization.
1 Introduction

Sensor networks represent a significant improvement over traditional sensors in many ways [1, 2]. However, sensor nodes have a very limited supply of energy and should remain functional for an extremely long time (e.g., a couple of years) without being recharged. Energy conservation therefore needs to be a key consideration in the design of the system and its applications. Extensive research has addressed the problem of energy conservation, including energy-efficient MAC protocols [3], clustering [4], localization [5], routing [6], data management [7], and applications [8]. A sensor field is like a database with dynamic, distributed, and unreliable data spread across geographically dispersed nodes. These features make the database view [9-11] challenging, particularly for applications with low-latency, real-time, and high-reliability requirements. Under a database view, a wireless sensor network is treated as a virtual relational table, with one column per attribute and one row per data entry. Sensor network applications use queries to retrieve data from the network. The result is a logical sub-table of the whole virtual table of the network, with each data

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 373–385, 2008. © Springer-Verlag Berlin Heidelberg 2008
374
J.-Z. Sun
entry having an associated timestamp denoting the time of measurement. The real data table of a node differs from the virtual table in that it contains only one attribute. Query processing is employed to retrieve sensor data from the network [12-14]. A general scenario for querying a sensor network is as follows: when a user requires some information, he or she specifies queries through an interface at the sink (also known as the gateway, base station, etc.). The queries are then parsed and query plans are made. After that, the queries are injected into the network for dissemination; one query may eventually be distributed to only a small set of sensor nodes for processing. The end nodes execute the query by sampling the phenomenon or object of interest, and when a sensor node has the sampled data ready, results flow up out of the network to the sink, where the data can be stored for further analysis and/or visualized for the end user. Query optimization can be applied throughout this process, in all stages. This paper presents a novel query optimization method for wireless sensor networks, and in particular for the last stage of query processing: query result collection. The proposed method is applicable to the optimization of periodical queries. Its key novelty lies in the careful consideration of accuracy alongside the energy consumed by data communication. Accuracy, as an application-level QoS requirement, is thus employed by our methods. By taking advantage of the value and/or time accuracy constraint specified with a query, the method can find the optimal sensing and transmission of attribute readings to the sink node. The remainder of this paper is organized as follows. Sections 2 and 3 present value-based and time-based optimization algorithms, respectively. The performance of the proposed method is evaluated in Section 4. Section 5 analyzes related work. Finally, Section 6 concludes the paper.
2 Value-Based Optimization Algorithms

In this section, we consider query optimization based on a value-accuracy constraint set by the application. One example of such a query is shown below.

// Query Example I
SELECT temperature
FROM sensors AS s
WHERE s.location = Area_C
ACCURACY value = 1
SAMPLE ON 00:00:00 INTERVAL 10 s LOOP 500

This query collects temperature data from the Area_C region, starting from 00:00:00, 500 times with an interval of 10 seconds. The clause "ACCURACY" specifies the value-accuracy constraint, which in this example is 1 degree. This constraint gives the hint that the accuracy of the temperature readings acceptable to the application is within one degree above
An Energy-Efficient Query Processing Algorithm
375
or below the real reading. In other words, the application does not need the exact measurement; a rough range of the reading is sufficient. Such a query gives an opportunity to use the value-accuracy constraint to optimize query execution. Instead of directly delivering every reading to the sink, we report only updates, i.e., when a new reading falls outside the previous value range. In this way data communication is minimized, and thus energy is saved. This method is obviously most applicable when the attribute changes smoothly, but even when it changes sharply it is never worse than direct delivery. Fig. 1 illustrates the basic idea. With a value-accuracy constraint of 1, instead of directly delivering 17 readings, one at each interval, the optimization needs to send only 9 reports, largely reducing data transmission. The blue line shows the curve as approximated at the sink node.
Fig. 1. Value-based optimization
Formally, suppose a query is specified with a value-accuracy constraint Δv, and a series of attribute readings r0, r1, … becomes available over time t0, t1, …. The problem is to produce a series of values v0, v1, … to be delivered to the sink as a representation of the real reading series. The key question is how to choose a value vi (i = 0, 1, …) to represent a set of readings. In this paper we propose two methods: fixed-value and adaptive-average.

1. Fixed-value method. Before query execution, the system sets a fixed point V and thus a set of value points Vk = V + kΔv, where k is an integer. Then, for a new reading ri ∈ [Vk, Vk+1), ri is replaced with Vk. The detailed algorithm is shown in Algorithm I. This method is relatively simple and easy to implement. However, when a set of readings varies exactly around a fixed point, the sensor node will generate a series of unnecessary reports due to the ping-pong effect. Considering this, we also propose an adaptive-average method.

2. Adaptive-average method. In contrast with the fixed-value method, in which the reportable values are defined before query execution, this is a run-time method: the reported value is determined during execution. The reported value is thus more accurate, though possibly with a long delay when the attribute value is stable.
376
J.-Z. Sun
Algorithm I. Fixed-value method.

Read r0;
Find k so that r0 ∈ [Vk, Vk+1);
CV = Vk;
Report (CV, 0);
FOR (i = 1; i < LOOP; i++) {
    Read ri;
    Find k so that ri ∈ [Vk, Vk+1);
    IF (CV ≠ Vk) {
        CV = Vk;
        Report (CV, i);
    }
} // End of FOR

Suppose a set of readings (r0, r1, …, ri); we define rMAX = MAX(r0, r1, …, ri), rMIN = MIN(r0, r1, …, ri), and rAVG = AVG(r0, r1, …, ri), with rMAX − rMIN < Δv. When at the next time point a new reading ri+1 is available, if rMAX − Δv < ri+1 AND ri+1 < rMIN + Δv, then rMAX, rMIN, and rAVG are adjusted with ri+1; otherwise, rAVG is reported. The detailed algorithm is shown in Algorithm II.

Algorithm II. Adaptive-average method.

Read r0;
rMAX = rMIN = rAVG = r0;
Report (rAVG, 0);
Counter = 1;
FOR (i = 1; i < LOOP; i++) {
    Read ri;
    IF (rMAX − Δv < ri AND ri < rMIN + Δv) {
        rMAX = MAX(rMAX, ri); rMIN = MIN(rMIN, ri);
        rAVG = (rAVG × Counter + ri) / (Counter + 1);
        Counter = Counter + 1;
    } ELSE {
        Report (rAVG, i);
        rMAX = rMIN = rAVG = ri;
        Counter = 1;
    }
} // End of FOR
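The two reporting methods can be sketched in Python as follows. This is a simplified, batch-style sketch of Algorithms I and II: the whole reading series is passed in at once, the fixed point V defaults to 0, and the report index is taken as the index at which the change is detected.

```python
import math

def fixed_value_reports(readings, dv, v0=0.0):
    """Fixed-value method (Algorithm I): report the lower bound Vk of the
    width-dv bin containing each reading, but only when the bin changes."""
    reports = []
    cv = None
    for i, r in enumerate(readings):
        vk = v0 + math.floor((r - v0) / dv) * dv
        if cv is None or vk != cv:
            cv = vk
            reports.append((cv, i))
    return reports

def adaptive_average_reports(readings, dv):
    """Adaptive-average method (Algorithm II): grow a window while all its
    readings stay within dv of each other; when a new reading falls outside,
    report the window average and start a new window at that reading."""
    reports = []
    window, start = [readings[0]], 0
    for i, r in enumerate(readings[1:], start=1):
        if max(window + [r]) - min(window + [r]) < dv:
            window.append(r)
        else:
            reports.append((sum(window) / len(window), start))
            window, start = [r], i
    reports.append((sum(window) / len(window), start))
    return reports
```

For the series 0.2, 0.3, 1.1, 1.5, 0.9 with Δv = 1, the fixed-value method sends three reports (at indices 0, 2, and 4), while the adaptive-average method sends only two window averages, illustrating how both reduce transmissions relative to the five direct deliveries.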
3 Time-Based Optimization Algorithms

In this section, we consider query optimization based on a time-accuracy constraint set by the application. One example of such a query is shown below.

// Query Example II
SELECT temperature
FROM sensors AS s
WHERE s.location = Area_C
ACCURACY time = 1 s
SAMPLE ON 00:00:00 INTERVAL 10 s LOOP 500

This query is almost the same as the example in Section 2; the only difference is the "ACCURACY" clause, which here specifies a time-accuracy constraint of 1 second. This constraint gives the hint that readings taken within one second before or after the requested sampling time are acceptable to the application. In other words, the application does not need the measurement at an exact time point; a measurement within a rough time range is sufficient. Such a query gives an opportunity to use the time-accuracy constraint to optimize multi-query execution. When a sensor network hosts multiple queries for several applications simultaneously, some of them may have the same attribute and selection predicates, so that more than one query executes at the same sensor node. If some attribute readings at a node can be shared by multiple queries, then energy for both sensing and delivering results can be saved.

Formally, without loss of generality, suppose there are two queries at the sensor node of interest, QA and QB, specified with time-accuracy constraints ΔtA and ΔtB and sampling intervals tA and tB, respectively. Query QA will sample the attribute value at times tA0, tA1, …, tAi, …, with tAi = tA0 + i·tA. Similarly, query QB samples the attribute value at times tB0, tB1, …, tBj, …, with tBj = tB0 + j·tB. If for QA's ith sample time tAi and QB's jth sample time tBj we have tBj − ΔtB ≤ tAi + ΔtA ≤ tBj + ΔtB or tAi − ΔtA ≤ tBj + ΔtB ≤ tAi + ΔtA, i.e., the two sampling windows overlap, then the shared sampling time tij is given by

tij = (MIN(tAi + ΔtA, tBj + ΔtB) + MAX(tAi − ΔtA, tBj − ΔtB)) / 2.

Fig. 2 illustrates the basic idea: three queries are executed at the local node, with different INTERVAL and ACCURACY settings, and the figure shows the overlapping sampling windows in which readings can be shared between queries. In this example, without optimization a total of 15 readings and deliveries would be needed; with time-based optimization, only 8 readings and deliveries are needed, largely reducing energy consumption.

Fig. 2. Time-based optimization

The detailed algorithm is given in Algorithm III. We consider two simultaneous queries at the local sensor node; it is easy to extend the algorithm to more than two queries.

Algorithm III. Time-based optimization.

i = 0; j = 0;
FOR (n = 0; n < LOOPA + LOOPB; n++) {
    IF ([tAi − ΔtA, tAi + ΔtA] and [tBj − ΔtB, tBj + ΔtB] overlap) {
        Sample at tij = (MIN(tAi + ΔtA, tBj + ΔtB) + MAX(tAi − ΔtA, tBj − ΔtB)) / 2;
        Report the reading to both QA and QB;
        i = i + 1; j = j + 1;
    } ELSE IF (tAi < tBj) {
        Sample at tAi; Report to QA; i = i + 1;
    } ELSE {
        Sample at tBj; Report to QB; j = j + 1;
    }
} // End of FOR
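The window-overlap test and the shared sampling time tij can be sketched in Python as follows; this is a simplified two-query merge under the assumptions stated in the text, and the function names are ours.

```python
def shared_sample_time(tA, dtA, tB, dtB):
    """If the accuracy windows [tA-dtA, tA+dtA] and [tB-dtB, tB+dtB] overlap,
    return the midpoint of their intersection (the paper's t_ij); else None."""
    lo = max(tA - dtA, tB - dtB)
    hi = min(tA + dtA, tB + dtB)
    return (lo + hi) / 2 if lo <= hi else None

def merged_schedule(t0A, intA, dtA, loopA, t0B, intB, dtB, loopB):
    """Merge the sample times of two periodic queries QA and QB, sharing a
    single reading whenever their accuracy windows overlap."""
    times_A = [t0A + k * intA for k in range(loopA)]
    times_B = [t0B + k * intB for k in range(loopB)]
    schedule = []
    i = j = 0
    while i < len(times_A) and j < len(times_B):
        t = shared_sample_time(times_A[i], dtA, times_B[j], dtB)
        if t is not None:
            schedule.append((t, "shared"))
            i += 1
            j += 1
        elif times_A[i] < times_B[j]:
            schedule.append((times_A[i], "QA"))
            i += 1
        else:
            schedule.append((times_B[j], "QB"))
            j += 1
    schedule += [(t, "QA") for t in times_A[i:]]
    schedule += [(t, "QB") for t in times_B[j:]]
    return schedule
```

For example, two queries with interval 10, accuracy 1, and start times 0 and 1 need only one shared sample per period (at 0.5, 10.5, 20.5, …) instead of two separate samples.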
4 Experiments and Results

4.1 Model and Setup

In this section, we compare the performance of queries with optimization to that of queries without optimization. We denote the query cost with optimization by Eop and without optimization by Enop; the energy saving (ES) is thus Enop − Eop.
Furthermore, without loss of generality, we assume the data size for all queries is the same, denoted by K. Communication energy denotes the cost of transmitting K amount of data from node u to node v over link e = (u, v). The energy consumption includes that at both u and v, for sending and receiving at the transmitter and receiver respectively, i.e., E(K) = ETx(K) + ERx(K). Our energy model below is based on [15, 16]. We assume a linear relationship for the energy spent per bit at the transmitter and receiver circuitry, and a d² path loss due to channel transmission. Thus, to transmit a K-bit message over a distance d using this radio model, the energy used by the transmitter with power amplifier and by the receiver circuitry, ETx and ERx, can be expressed as

ETx(K, d) = ETx-elec(K) + ETx-amp(K, d) = K (eTc + eTa d²),
ERx(K) = ERx-elec(K) = K eRc,

where eTc, eTa, and eRc are hardware-dependent parameters. Typical numbers for current radios are eTc = eRc = 50 nJ/bit and eTa = 100 pJ/bit/m². Formula (5) therefore becomes E = K (eRc + eTc + eTa d²), and we can define the unit transmission consumption Eunit = eRc + eTc + eTa d². It is worth noting that there are two possible power scenarios for data transmission. With variable transmission power, the radio dynamically adjusts its transmission power. With fixed transmission power, the radio uses a fixed power for all transmissions; this case is considered because commercial radio interfaces have limited capability for dynamic power adjustment, and optimal power adjustment requires feedback, which itself consumes energy. In this case the transmit-amplifier term is fixed to a certain value at the transmitter.

Table 1. Parameters, their value ranges, and values used in performance analysis

Parameter   Value range       Used value
eTc         10-100 nJ/bit     50 nJ/bit
eTa         ∼100 pJ/bit/m²    100 pJ/bit/m²
eRc         5-80 nJ/bit       50 nJ/bit
d           0.1-10 m          0.1-10 m
LOOP        1-10000           100
K           5-30 bytes        10 bytes
hops        1-50              1-5
nodes       20-2000           1-100
accuracy    0.01-30 %         0.1-30 %
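With the Table 1 constants, the per-hop cost E = K (eRc + eTc + eTa d²) can be computed directly. The following sketch uses the typical values above; the helper names are ours.

```python
# First-order radio energy model from the text: transmitting K bits over
# distance d costs K*(eTc + eTa*d^2), receiving costs K*eRc, so one hop
# costs E = K*(eRc + eTc + eTa*d^2). Constants are the Table 1 used values.

E_TC = 50e-9    # transmitter electronics, J/bit
E_RC = 50e-9    # receiver electronics, J/bit
E_TA = 100e-12  # transmit amplifier, J/bit/m^2

def hop_energy(k_bits: int, d: float) -> float:
    """Energy (joules) to move k_bits across one link of length d metres."""
    e_tx = k_bits * (E_TC + E_TA * d ** 2)
    e_rx = k_bits * E_RC
    return e_tx + e_rx

def query_energy(k_bits: int, d: float, hops: int, reports: int) -> float:
    """Total energy for delivering `reports` results over a `hops`-hop path."""
    return reports * hops * hop_energy(k_bits, d)
```

For example, a 10-byte (80-bit) message over a 10 m link costs 80 × (50 + 50 + 10) nJ = 8.8 µJ per hop; cutting the number of reports (as the optimizations above do) scales this total down linearly.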
Table 1 shows the parameters, their typical value ranges, and the values used in the performance analysis. We assume a uniform distribution of the attribute value over its range R. In this case, a value-accuracy constraint can be represented as the percentage of the constraint relative to this range. Similarly, in the case of a time-accuracy constraint, the accuracy
can be represented by the ratio of the constraint to the sample interval. Note that since we consider two concurrent queries, the accuracy is the sum of the two ratios.

4.2 Results

First, the impact of network size is studied. Figure 3 illustrates the power consumption of collecting query results from a set of 1 to 100 sensor nodes at different distances from the sink, with the accuracy set to 0.2. Obviously, total power consumption increases with network size. From the figure, it is clear that by performing optimization, power consumption can be reduced by a factor on the order of the accuracy. Energy saving also increases with network size. It is also clear that the adaptive-average method outperforms the fixed-value method, as expected.
Fig. 3. Impact of network size to energy consumption in case of value accuracy
Fig. 4 shows the impact of the value accuracy on energy consumption. In this simulation, the number of nodes is set to 100. The effect of the proposed methods is easy to observe: when increasing (i.e., loosening) the value-accuracy constraint, power consumption decreases when optimization is performed, whereas without optimization power consumption is approximately constant. Again, the adaptive-average algorithm performs better than the fixed-value approach. Similarly, Fig. 5 shows the impact of network size on energy consumption when using time-based optimization. In this simulation, the accuracies of the two queries are set to (0.1, 0.2) and (0.2, 0.3) for Eop-1 and Eop-2, respectively. Power consumption clearly increases with the size of the network, and performing time-based optimization saves energy.
Fig. 4. Impact of value accuracy to energy consumption in case of value accuracy
Fig. 5. Impact of network size to energy consumption in case of time accuracy
Fig. 6 shows the impact of the time accuracy on power consumption. In this simulation, Enop and Eop-1 use the same network deployment. For Eop-2, a new network layout is used in which node distances are on average 2/3 of the previous ones, so the overall number of hops is smaller. As shown in the figure, power consumption decreases when a higher time-accuracy setting is employed (i.e., the constraint is loosened).
Fig. 6. Impact of time accuracy to energy consumption in case of time accuracy
Also, when the network coverage is small, less power is needed to collect the same amount of data. In summary, our proposals consume less energy than the direct-delivery method. In both value-based and time-based optimization, the energy saving is on the order of the accuracy constraint.
5 Related Work

A large body of research has been carried out on query processing for sensor networks, based on a database view of the sensor network. In [9] the authors architect sensor networks as virtual databases, providing a well-understood non-procedural programming interface suitable for data management and allowing the community to realize sensornet applications rapidly. They also argue that in order to achieve an energy-efficient and useful implementation, query processing operators should be implemented within the sensor network, and that approximate query results will play a key role. In [10] the authors define a new concept and model of sensor database systems in which queries dictate which data are extracted from the sensors. Stored data are represented as relations, while sensor data are represented as time series. Each long-running query formulated over a sensor database defines a persistent view, which is maintained during a given time interval. The paper also describes the design and implementation of the COUGAR sensor database system. A detailed description of the query processing mechanism is presented in [14], in which the authors evaluate the design of a query layer that accepts queries in a declarative language and optimizes them to generate efficient query execution plans with in-network processing, which can significantly reduce resource requirements.
An Energy-Efficient Query Processing Algorithm
383
In [11, 12] the authors are concerned with query processing in sensor networks and describe in detail the design and implementation of TinyDB, an acquisitional query processing system for sensor networks. Acquisitional issues are those that pertain to where, when, and how often data are physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, the designed system can significantly reduce power consumption compared with traditional passive systems that assume the a priori existence of data. Berkeley and Cornell have built two prototype sensor network query processors (SNQPs), TinyDB and Cougar, that run on a variety of sensor platforms; paper [13] reports their architecture and methods, as well as their query processing optimization methods. Among other works, in [17] the authors propose analytical models to evaluate the performance of three methods for processing historical spatio-temporal queries in sensor networks. In [18] the authors present TiNA, an in-network aggregation scheme that maintains a user-specified quality-of-data requirement while significantly reducing overall energy consumption. Paper [19] presents the author's progress to date in building TeleTiny, with a particular focus on two components, server side and sensor side, and the interfaces between them. In [20] the authors consider multi-query optimization for aggregate queries on sensor networks by developing a set of distributed algorithms. Compared with these related works, the unique novelty of our proposal lies in the consideration of an application-specified QoS constraint (accuracy) together with time and energy consumption. By taking advantage of the application QoS requirements, an optimal strategy can be figured out in which the QoS requirements are fulfilled and the performance is improved to the best extent.
6 Conclusion and Future Work

An application-level QoS-based method is proposed to optimize the execution of multiple queries in a sensor network, by weighing the energy consumption against the application-specific value/time accuracy requirement. The algorithms are described in detail, and experiments are conducted to validate the method. The results show that the proposed method achieves the goal of query optimization. Dynamically adjusting the accuracy constraint value is useful, especially when no prior knowledge is available. On the other hand, when the distribution of the sensor readings is known, e.g., a normal rather than uniform distribution, the optimization can be further developed to take this knowledge into account. These two issues are problems for future investigation.
Acknowledgment Financial support by Academy of Finland (Project No. 209570) is gratefully acknowledged.
384
J.-Z. Sun
References 1. Gehrke, J., Liu, L.: Sensor-network applications. IEEE Internet Computing 10(2), 16–17 (2006) 2. Gharavi, H., Kumar, S.P.: Special Issue on Sensor Networks and Applications. Proceedings of the IEEE 91(8) (August 2003) 3. Miller, M.J., Vaidya, N.H.: A MAC Protocol to Reduce Sensor Network Energy Consumption Using a Wakeup Radio. IEEE Transactions on Mobile Computing 4(3), 228–242 (2005) 4. Fukushima, Y., Harai, H., Arakawa, S., Murata, M.: Distributed clustering method for large-scaled wavelength routed networks. In: Proc. Workshop on High Performance Switching and Routing, pp. 416–420 (May 2005) 5. Hu, L., Evans, D.: Localization for Mobile Sensor Networks. In: Proc. Tenth Annual International Conference on Mobile Computing and Networking (MobiCom 2004), Philadelphia, pp. 45–57 (September-October, 2004) 6. Al-Karaki, J.N., Kamal, A.E.: Routing techniques in wireless sensor networks: a survey. IEEE Wireless Communications 11(6), 6–28 (2004) 7. Demers, A., Gehrke, J., Rajaraman, R., Trigoni, N., Yao, Y.: Energy-Efficient Data Management for Sensor Networks: A Work-In-Progress Report. In: 2nd IEEE Upstate New York Workshop on Sensor Networks, Syracuse (October 2003) 8. Zou, Y., Chakrabarty, K.: Energy-Aware Target Localization in Wireless Sensor Networks. In: Proc. 1st IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), Dallas-Fort Worth, Texas, USA, pp. 60–67 (March 2003) 9. Govindan, R., Hellerstein, J.M., Hong, W., Madden, S., Franklin, M., Shenker, S.: The sensor network as a database. USC Technical Report No. 02-771 (September 2002) 10. Bonnet, P., Gehrke, J.E., Seshadri, P.: Towards Sensor Database Systems. In: Proceedings of the Second International Conference on Mobile Data Management, Hong Kong (January 2001) 11. Madden, S.R., Franklin, M.J., Hellerstein, J.M., Hong, W.: TinyDB: An Acquisitional Query Processing System for Sensor Networks. ACM Transactions on Database Systems 30(1), 122–173 (2005) 12. 
Madden, S., Franklin, M.J., Hellerstien, J.M., Hong, W.: The design of an acquisitional query processor for sensor networks. In: Proceedings ACM SIGMOD, San Diego, CA, USA, June 2003, pp. 491–502 (2003) 13. Gehrke, J., Madden, S.: Query processing in sensor networks. IEEE Pervasive Computing 3(11), 46–55 (2004) 14. Yao, Y., Gehrke, J.: Query processing for sensor networks. In: Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, California (January 2003) 15. Rabiner Heinzelman, W., Chandrakasan, A., Balakrishnan, H.: Energy-Efficient Communication Protocols for Wireless Microsensor Networks. In: Proc. Hawaii International Conference on System Sciences (HICSS 2000) (January 2000) 16. Heinzelman, W., Sinha, A., Wang, A., Chandrakasan, A.: Energy-Scalable Algorithms and Protocols for Wireless Microsensor Networks. In: Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000) (June 2000) 17. Coman, A., Sander, J., Nascimento, M.: An Analysis of Spatio-Temporal Query Processing in Sensor Networks. In: Proceedings of the 21st International Conference on Data Engineering Workshops, April 2005, p. 1190 (2005)
18. Beaver, J., Sharaf, M.A., Labrinidis, A., Chrysanthis, P.K.: Power-Aware In-Network Query Processing for Sensor data. In: Proceedings of the 2nd Hellenic Data Management Symposium (HDMS 2003), Athens, Greece (September 2003) 19. Madden, S.: Query Processing for Streaming Sensor Data. Ph.D. Qualifying Exam Proposal, http://db.lcs.mit.edu/madden/html/madden_quals.pdf 20. Trigoni, N., Yao, Y., Demers, A., Gehrke, J., Rajaraman, R.: Multi-query Optimization for Sensor Networks. In: Proceedings International Conference on Distributed Computing in Sensor Systems (June 2005)
Rule Selection for Collaborative Ubiquitous Smart Device Development: Rough Set Based Approach

Kyoung-Yun Kim 1,*, Keunho Choi 1, and Ohbyung Kwon 2

1 Dept. of Industrial and Manufacturing Engineering, Wayne State University, Detroit, MI 48202, USA
[email protected], [email protected]
2 School of International Management, Kyung Hee University, Yongin, Gyeonggi-do, Korea
[email protected]
Abstract. Compared with general mobile devices, a Ubiquitous Smart Device (USD) is characterized by its capability to generate or use context data for autonomous services, and it provides users with personalized and situation-aware interfaces. While USD development requires a more knowledge-intensive and collaborative environment, the capture, retrieval, accessibility, and reusability of that design knowledge are increasingly critical. In design collaboration, the cumulative, evolutionary design information and the design rules behind a USD design are infrequently captured and are often difficult to manage due to their complexity. Rough set theory synthesizes approximations of concepts, analyzes data by discovering patterns, and classifies them into decision classes. Such patterns can be extracted from data by means of methods based on Boolean reasoning and discernibility. In this paper, rough set theory is used to generate demanded rules and to select the appropriate minimal rules, among the demanded rules, associated with USD physical component design. The presented method shows the feasibility of rough-set based rule selection considering the complex design data objects of USD physical components.
1 Introduction

Ubiquitous smart devices (USDs) are physical and functional components that realize a ubiquitous smart space (USS), such as a u-home, u-office, or u-city. An appropriate USD should include a natural interface (e.g., hands free, adaptable I/O), portability, networking capability, and situation sensibility. To realize a successful and competitive USD, multidisciplinary stakeholders, including customers, device manufacturers, space builders, and even software engineers, should be involved in its development processes and should collaborate to determine an optimal USD design comprising physical and software components. While USD development requires a more knowledge-intensive and collaborative environment, the capture, retrieval, accessibility, and reusability of that knowledge are increasingly critical. Typically, a collaborative product development environment involves multiple designers and heterogeneous tools *
Corresponding Author.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 386–396, 2008. © Springer-Verlag Berlin Heidelberg 2008
Rule Selection for Collaborative Ubiquitous Smart Device Development
387
wherein designers often use their own terms and definitions to represent a product design; however, the cumulative, evolutionary design knowledge and the rationale behind the product are infrequently captured or retained. Furthermore, the needs for a USD often change dynamically, and new needs are identified from a USS. Therefore, dynamic and evolutionary design information change, computational complexity, and incompleteness of information are often barriers to achieving truly collaborative USD development. This improper dissemination of USD information causes a lack of understanding of the physical and software components and their effects over the USD life-cycle. This paper presents a rough-set based approach to systematically generate a minimal set of semantic rules for USD physical components. Typically, USD design information comes from various sources and changes rapidly; design is evolutionary. A minimal set of rules is required to make an appropriate USD design decision. The USD components can be semantically represented in an ontology. However, in autonomous collaboration for USD design, the computational complexity of semantic reasoning is very high. Moreover, it requires significant effort and time to semantically represent the evolutionary, dynamically changing design data and to generate appropriate sets of decision rules. Therefore, in this research, rough set theory is used to generate demanded rules and to select the appropriate minimal rules among the extracted rules for the USD design decision. The following sections describe ontology construction, rough-set based rule generation, and finally a case study illustrating how rough set theory can be utilized to generate a minimal set of design rules for a USD design decision.
2 Background and Literature Review

2.1 Ubiquitous Smart Device

Compared with general mobile devices, a USD is characterized by its capability to generate and/or use context data for autonomous services, and it provides users with personalized and situation-aware interfaces. Attention on USDs has gradually increased, and a few research activities have been reported. First, making use of context in mobile devices is receiving increasing attention in making USDs available. Augmenting mobile devices with awareness of their environment and situation as context has been regarded as one of the challenging issues [7]. The context data may be acquired by the USD itself or delivered by another USD nearby in a wireless manner. Second, USDs should collaborate with each other for timely and personalized services. X10 series products (http://www.smarthome.com/about_x10.html), the MIT media-cup, and wearable computers are conventional outcomes that enable such collaboration. Third, smart devices that adopt more sophisticated interaction techniques for applications in an intuitive, efficient, and enjoyable manner are required in the ubiquitous computing domain. As Ballagas et al. [3] have pointed out, mobile phones as well as cameras could be a good starting point to realize USDs. However, there are still significant technological and scientific gaps in making legacy mobile phones truly 'natural'. To realize a natural interface or allow highly serendipitous interactions, innovative interface technologies, such as multimodal interaction, marker-based interaction, and augmented reality, have been
388
K.-Y. Kim, K. Choi, and O. Kwon
considered. Since these requirements generate more complex design requirements for USD producers and other stakeholders, timely collaboration between suppliers and potential buyers is highly desirable.

2.2 Rough-Set Theory

Rough set theory was developed by Pawlak [15]. Its main goal is to synthesize approximations of concepts from acquired data. Doing so requires tools aimed at discovering patterns in data and classifying them into certain decision classes. Classifying a design object according to patterns is what distinguishes the rough set approach from population based approaches. A population based approach forms a model over the entire training data set, whereas rough set theory follows an individual, data-object-based approach. With patterns of design objects, rough set theory is able to accurately capture relationships between input features and the decision, while population-based approaches carry approximation errors. Little research has classified design objects in connection with design decision making. There is some research regarding data representation to support decision making with rough set theory. Huang et al. [10] used a rough set based approach to enrich document representation for manufacturing process document retrieval. With the generated rules, various qualitative data and attributes can be analyzed that cannot easily be handled by current statistical approaches. Agard and Kusiak [1] treat customers as groups and analyze customers' requirements and associated rules. However, they did not address the complexity issues of semantic rules for design objects. Kusiak and Kurasek [13] argued that statistical process control and design-of-experiment approaches did not provide conclusive results for a quality problem in electronics assembly.
2.3 Ontology

Ontologies are explicit formal specifications of the terms in a domain and the relations among them [8]; that is, a formal, explicit specification of a shared conceptualization. "Conceptualization" refers to an abstract model of some phenomenon, which identifies the relevant concepts of that phenomenon. "Formal" refers to the fact that the ontology should be machine-readable [6]. Mizoguchi [14] presented the roles of ontologies as a common vocabulary, common data structure, explication of what is left implicit, semantic interoperability, explication of design rationale, systematization of knowledge, meta-model function, and theory of content. Ontologies have been developed for a variety of domains. The broadest are the upper-level ontologies, such as CYC, developed by Cycorp [5], which describe common-sense-level knowledge. Narrower in scope than upper-level ontologies, enterprise-level ontologies attempt to formalize the practices and processes that occur within an organization [6],[20]. The narrowest ontologies have been developed to represent conceptual and functional engineering information [12], design features [9], and manufacturing [19]. Of all the ontologies reviewed, the majority have been developed and applied to broader business applications that do not have the level of detail required for mechanical design. In spite of these important advances, there is still a critical gap in semantically capturing and intelligently propagating USD design information across the entire USD development process.
3 Ontology Construction for USD Physical Component

While maintaining the universality of semantic definitions (e.g., for machine interpretation by software agents), the semantic notions, definitions, and primitives of a USD design should be represented in a standard (machine-interpretable) manner. We used the Semantic Web Rule Language (SWRL) for this research. SWRL is a combination of the OWL DL and OWL Lite sublanguages of the Web Ontology Language (OWL) with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language. SWRL is intended to be the rule language of the Semantic Web. The SWRL submission package contains three components in addition to the principal prose document: (1) an RDF Schema partially describing the RDF Concrete Syntax of SWRL; (2) an OWL ontology partially describing the RDF Concrete Syntax of SWRL; and (3) an XML Schema for the SWRL XML Concrete Syntax [21],[22]. To generate the ontology and rules, the OWL and SWRL editors in Protégé are used in this work. A USD physical component is rarely monolithic, and requires assembly components and joining operations. Before implementing a standard ontology for USD assembly design for the physical components, we thoroughly investigated the definitions, possible terms, and concept representations of assembly. Based on this investigation, the terms of the assembly design ontology were carefully defined. Terms including Individual, Product, Assembly, Assembly Component, Part, Sub-assembly, Assembly Feature, Form Feature, Joint, Joint Feature, Mating Feature, and others are defined and used to construct the assembly design ontology, as shown in Figure 1.

[Figure 1 (class hierarchy diagram, not reproduced): the ontology is rooted at owl:Thing, with top-level classes including Individual, Product, Material, Feature, SpatialRelationship, and Manufacturing. Subclasses cover form, assembly, joint, and mating features; spatial relationships (against, aligned, offsets); joining processes such as fusion welding (GMAW, GTAW, FCAW, SMAW, PAW), adhesive bonding, brazing, mechanical fastening, riveting, compression, metal stitching, crimping, snap-in, push-on fastening, spot welding, and soldering; joint configurations (butt, corner, T, and lap joints); and degrees of freedom.]

Fig. 1. Assembly design ontology class hierarchy

For
example, the definition of an assembly feature in engineering design is "a group of assembly information," which includes form features, joint features, mating relations, assembly/joining relations, spatial relationships, material, engineering constraints, etc. The developed ontology also contains detailed classes of various joining processes. The joining concepts for the ontology are based on the definitions put forth by the American Welding Society [2], the National Institute of Standards and Technology [18], and the work of Yao et al. [23]; however, not all definitions and specifications were used in this work. The advancement of the assembly design ontology has been reported in Kim et al. [11]. Table 1. Example of SWRL rules
In representing USD design semantics, one of the challenges is to manage USD design rules (e.g., SWRL rules) for rational design decisions. Typically, USD design information and the associated semantic rules come from various sources and change rapidly, because USD design is evolutionary and requires autonomous collaboration. Table 1 illustrates examples of SWRL rules that can be associated with a USD design. As mentioned above, a design can be semantically represented in an ontology. However, when a USD is designed in a semantic and autonomous collaborative environment, the computational complexity of semantic reasoning is very high. Moreover, it requires significant effort and time to semantically represent the evolutionary, dynamically changing design data and to generate appropriate sets of decision rules. In this paper, rough set theory generates demanded rules and selects the appropriate minimal rules among the demanded design rules for the USD design decision.
4 Rough-Set Based Minimal Design Rule Selection

Rough set theory synthesizes approximations of concepts and analyzes data by discovering patterns and classifying them into certain decision classes. Such patterns can be extracted from data by means of methods based on Boolean reasoning and discernibility [4],[16]. Discernibility relations express the ability to discern between perceived objects. By using the discernibility relation, the discernibility function for a design decision can be determined. The discernibility function supports classification with respect to decision making, in which design data objects can be categorized by attributes. The extracted attribute relations become rules for the design decision. When rules are acquired through discernibility relations, a great number of rules arise from considering each attribute and its values, which increases complexity. Current ontology reasoners face computational limits when reasoning with enormous complexity and a huge number of rules. Accordingly, the rules should be efficiently minimized without losing the capability to represent design data objects. In this paper, Boolean reasoning is employed to obtain the least set of rules (i.e., prime implicants) extracted from the acquired design data. As a result, a decision maker can make an efficient decision while handling a great number of complex rules, without losing the inducing capability of design data objects.

4.1 Rough Set and Design Decision System

A design information system is a pair Α = (U, A), where U is a non-empty finite set of design data objects called the universe and A is a non-empty finite set of conditional attributes such that a : U → V_a for every a ∈ A. The set V_a is called the value set of a. A design decision system is any design information system of the form Α = (U, A ∪ {d}), where d ∉ A is the decision attribute.
In the rough set approach, a discernibility relation DIS(B) ⊆ U², where B ⊆ A is a subset of attributes of an information system (U, A), is defined by x DIS(B) y if and only if not (x I(B) y), where I(B) = {(x, y) ∈ U² : ∀a ∈ B, a(x) = a(y)} is the B-indiscernibility relation [17]. The equivalence classes of the B-indiscernibility relation are denoted [x]_B. Set approximation is used to delineate concepts that cannot be defined crisply. In other words, it is not possible to induce a crisp (precise) description of different objects in terms of the decision attribute when they are not discernible in terms of a set of conditional attributes of the decision system. However, those objects can belong to a boundary between the certain cases. If this boundary is non-empty, the set is rough. Let B ⊆ A and X ⊆ U be a subset of conditional attributes of the decision system and a subset of the universe, respectively. We can construct the B-lower and B-upper approximations of X, denoted B(X) and B̄(X), respectively, where B(X) = {x : [x]_B ⊆ X} and B̄(X) = {x : [x]_B ∩ X ≠ ∅}. A rough set can also be characterized numerically by the coefficient α_B(X) = |B(X)| / |B̄(X)|, called the accuracy of approximation, where |X| denotes the
cardinality of X. Obviously, 0 ≤ α_B(X) ≤ 1. If α_B(X) = 1, X is crisp (precise) with respect to B; otherwise, if α_B(X) < 1, X is rough (vague) with respect to B.

4.2 Reduct for Design Decision System

A crucial concept in the rough set approach is that of a reduct. The term "reduct" corresponds to a wide class of concepts. Reducts are used to reduce information (decision) systems by removing redundant attributes. Given a design information system Α = (U, A), a reduct is a minimal set of attributes B ⊆ A such that I(B) = I(A), where I(B) and I(A) are the indiscernibility relations defined by B and A, respectively [15],[17]. The intersection of all reducts is called the core. A decision-relative reduct is nothing but a minimal non-empty set of attributes that preserves the classification power of the original decision table. Table 2 includes core functions that are used to discern design data objects. The discernibility function f_Α describes constraints which must hold to preserve discernibility between all pairs of discernible objects from Α. The decision-relative discernibility function f_Α^r for Α is constructed from the decision-relative discernibility matrix M′(Α) in the same way as f_Α is constructed from M(Α).
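The indiscernibility and set-approximation notions of Section 4.1 can be sketched directly in code. The following is a minimal illustration (the object and attribute names, and all values, are invented for this example; they are not the paper's case-study data):

```python
def indiscernibility_classes(U, B, value):
    """Partition the universe U into equivalence classes [x]_B of the
    B-indiscernibility relation: x I(B) y iff a(x) == a(y) for all a in B."""
    classes = {}
    for x in U:
        key = tuple(value[x][a] for a in B)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def lower_upper(U, B, value, X):
    """Return the B-lower and B-upper approximations of a concept X."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(U, B, value):
        if cls <= X:        # [x]_B is contained in X
            lower |= cls
        if cls & X:         # [x]_B intersects X
            upper |= cls
    return lower, upper

# A toy decision system (hypothetical values, not the paper's case data):
U = ["u1", "u2", "u3", "u4"]
value = {
    "u1": {"f7": 1, "f8": 1},
    "u2": {"f7": 1, "f8": 0},
    "u3": {"f7": 0, "f8": 0},
    "u4": {"f7": 0, "f8": 0},
}
X = {"u1", "u2"}                      # the concept to approximate

low, up = lower_upper(U, ["f7"], value, X)
accuracy = len(low) / len(up)         # alpha_B(X) = |B(X)| / |B-bar(X)|
print(low, up, accuracy)              # X is crisp w.r.t. {f7}: alpha = 1.0

low2, up2 = lower_upper(U, ["f8"], value, X)
print(len(low2) / len(up2))           # X is rough w.r.t. {f8}: alpha = 0.25
```

The second call shows the rough case: with respect to {f8} alone, u2 is indiscernible from u3 and u4, so the boundary is non-empty and the accuracy drops below 1.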
To create the minimal rules corresponding to an object x, construct the decision-relative discernibility function f_Α^r by considering the row corresponding to x in the decision-relative discernibility matrix for Α, and compute f_Α^r. On the basis of its prime implicants, create the minimal rules corresponding to x. To do this, for each prime implicant I, consider the set A(I) of attributes corresponding to the propositional variables in I, and construct the rule:

(∧_{a ∈ A(I)} (a = a(x))) ⇒ (d = d(x))

Table 2. Core functions to discern design data objects
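The rule-construction step can likewise be sketched: build the decision-relative discernibility matrix M′(Α), then turn a prime implicant for an object x into a rule of the form above. This is an illustrative sketch with invented objects, attributes, and values, not the authors' implementation:

```python
def decision_relative_matrix(U, A, d, value):
    """Decision-relative discernibility matrix M'(A): for each pair of
    objects with different decision values, record the set of conditional
    attributes on which the two objects differ."""
    M = {}
    for i, x in enumerate(U):
        for y in U[i + 1:]:
            if value[x][d] != value[y][d]:
                M[(x, y)] = {a for a in A if value[x][a] != value[y][a]}
    return M

def rule_for(x, attrs, value, d):
    """Render the rule (AND_{a in A(I)} a = a(x)) => d = d(x) for a prime
    implicant whose attribute set A(I) is given as `attrs`."""
    lhs = " AND ".join(f"{a}={value[x][a]}" for a in sorted(attrs))
    return f"IF {lhs} THEN {d}={value[x][d]}"

# Toy decision system (hypothetical names and values):
U = ["u1", "u2", "u3"]
A = ["f7", "f8"]
value = {
    "u1": {"f7": 1, "f8": 1, "d": 0},
    "u2": {"f7": 1, "f8": 0, "d": 1},
    "u3": {"f7": 0, "f8": 0, "d": 1},
}

M = decision_relative_matrix(U, A, "d", value)
print(M)   # only pairs with differing decisions appear
# {f8} is a prime implicant of f8 AND (f7 OR f8), so it yields a minimal rule:
print(rule_for("u1", {"f8"}, value, "d"))   # IF f8=1 THEN d=0
```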
[Figure 2 (diagram, not reproduced): customer needs (demands, cost, price, functions) feed into the Ubiquitous Smart Device, with the meta-attributes Ubiquity, Design, Manufacturing, and Maintenance.]
Fig. 2. Design meta-attributes and relationships of USD
5 Case Study: Rule Reduct for USD Physical Components

To realize a competitive and innovative USD, what customers really want from the product should be identified, and the core technological requirements should be realized through hardware and software design and manufacturing. Figure 2 depicts important inputs and outputs for realizing a USD, and presents the meta-attributes associated with the USD. Table 3 lists more detailed attributes. These attributes are very important for realizing an autonomous collaboration environment for developing a USD. By employing rough set theory, the relationships and degrees of importance between these attributes can be determined. With these relationships and degrees of importance, the rules required for the USD design decision can be extracted systematically, and they form a minimal set of rules. This set of simplified rules enables software agents to collaborate autonomously.

Table 3. Examples of meta-attributes

Meta-attribute   Attributes
Ubiquity         Natural Interface, Adaptable I/O, Portability, Networking, Situation Sensing
Design           Part, Assembly, Geometry, Topology, Assembly relation
Manufacturing    Fabrication method, Joining method, Material, Cost
Maintenance      Proactive maintenance, Adaptive maintenance, Diagnostics
The following paragraphs describe a case of rule reduct with USD attributes, particularly the assembly relationships among USD design attributes. Assembly relationships are one of the core considerations before launching a successful, quality USD to the market. As with typical mobile devices (e.g., mobile phones, PDAs, etc.), many assembly considerations exist in designing a USD. Examples of associated design
decisions include the choice between soldering a System on Board (SoB) and riveting for a printed circuit board with a composite case. Table 4 includes an example of a decision table, which consists of the design decision and conditional attributes (e.g., form features, parts, mating areas, and their mereotopological relationships). Rough set theory can induce a set of rules for the design decision by generating the discernibility matrix (Table 5) corresponding to the decision table shown in Table 4.

Table 4. The decision table considered in this case

A    f1   f2   f3   f4   f5   f6   f7   f8   f9   d
U1   F11  F31  W1   -    1    J1   1    1    0    0
U2   F11  F31  W1   -    1    J1   1    0    1    0
U3   F11  F21  -    FS1  0    J3   0    0    0    1
U4   F11  F21  -    FS2  0    J3   0    0    0    1
Table 5. The discernibility matrix corresponding to the decision table in Table 4

D(A)      f2,1  f3,1  f4,1  f4,2  f5,1  f6,1  f7,1  f8,1  f9,1
(u1,u3)   1     1     1     -     1     1     -     1     1
(u1,u4)   1     1     1     -     1     1     1     -     1
(u2,u3)   1     1     1     1     1     1     -     1     1
(u2,u4)   1     1     1     1     1     1     1     -     1
The Boolean function required from Table 5 is given by the following: (f2,1 ∨ f3,1 ∨ f4,1 ∨ f5,1 ∨ f6,1 ∨ f8,1 ∨ f9,1) ∧ (f2,1 ∨ f3,1 ∨ f4,1 ∨ f5,1 ∨ f6,1 ∨ f7,1 ∨ f9,1) ∧ (f2,1 ∨ f3,1 ∨ f4,1 ∨ f4,2 ∨ f5,1 ∨ f6,1 ∨ f8,1 ∨ f9,1) ∧ (f2,1 ∨ f3,1 ∨ f4,1 ∨ f4,2 ∨ f5,1 ∨ f6,1 ∨ f7,1 ∨ f9,1). The rules induced from the discernibility function are still too complicated to adopt for a USD design decision. To reduce the complexity, we employ Boolean reasoning to reduce the induced rules to a minimal set of rules. In the example, the discernibility function D(A) has the following prime implicants: f2,1, f3,1, f4,1, f5,1, f6,1, f7,1 ∧ f8,1, and f9,1, which constitute a minimal set of rules for the assembly design of the USD physical components. As this result shows, the minimal set of rules is represented by much simpler rules, which are associated through OR relationships. This set of simpler rules has multiple advantages: 1) the capability to handle incomplete information (a comprehensive rule associated with AND relationships must acquire all the information before the rule can be used); and 2) classification of the unique features among various attributes, which often exist in a mixed form of a rule. While SWRL rules represent a specific single design object with multiple attributes, the rough set rules determine which attributes (or features) define design objects by comparing multiple design objects.
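Because every clause of D(A) contains only positive literals, its prime implicants are exactly the minimal hitting sets of the clause family, which can be found by brute force for a table of this size. The sketch below (written for this note, not taken from the paper) reproduces the seven prime implicants listed above:

```python
from itertools import combinations

def prime_implicants(clauses):
    """For a CNF with only positive literals (each clause given as a set of
    variables), the prime implicants are exactly the minimal hitting sets
    of the clause family; enumerate subsets by increasing size."""
    variables = sorted(set().union(*clauses))
    minimal = []
    for r in range(1, len(variables) + 1):
        for combo in combinations(variables, r):
            s = set(combo)
            # keep s if it hits every clause and no smaller hitting set is inside it
            if all(s & c for c in clauses) and not any(m < s for m in minimal):
                minimal.append(s)
    return minimal

# The four clauses of D(A), one per row of the discernibility matrix:
clauses = [
    {"f2,1", "f3,1", "f4,1", "f5,1", "f6,1", "f8,1", "f9,1"},          # (u1,u3)
    {"f2,1", "f3,1", "f4,1", "f5,1", "f6,1", "f7,1", "f9,1"},          # (u1,u4)
    {"f2,1", "f3,1", "f4,1", "f4,2", "f5,1", "f6,1", "f8,1", "f9,1"},  # (u2,u3)
    {"f2,1", "f3,1", "f4,1", "f4,2", "f5,1", "f6,1", "f7,1", "f9,1"},  # (u2,u4)
]

implicants = prime_implicants(clauses)
for imp in implicants:
    print(sorted(imp))
# Six singletons (f2,1 f3,1 f4,1 f5,1 f6,1 f9,1) plus the pair {f7,1, f8,1},
# matching the minimal rule set reported in the text; f4,2 alone is not a
# prime implicant because it misses the (u1,u3) and (u1,u4) clauses.
```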
6 Conclusion and Future Work

Future USDs will be considerably different from legacy mobile devices, in the sense that new functionalities and use cases will become available as new ways of interaction
between humans and devices, or even between devices, are elaborated. Moreover, the need for personalized USDs will grow as personal computing becomes more usable; the requirements of USDs will be much more complicated, and hence a gap between supply and demand will widen for the time being. These changes require timely collaboration between suppliers and potential buyers. To realize a truly collaborative environment, the capture, retrieval, accessibility, and reusability of USD design knowledge are increasingly important. However, it is very difficult to define the complete taxonomy and all terms before a USD design, which requires highly personalized design elements. An example is a natural interface that can be customized for users or by users, as originally introduced by M. Weiser. USD design requires an iterative process to converge on a satisfactory design, with interactions between multidisciplinary stakeholders. In this respect, the proposed design rule selection paradigm can leverage formal communication between collaborators and design information reuse by mining design rules from the acquired design information. This research highlights the complexity of design decision rules in autonomous collaboration for USD physical component design. A USD design can be semantically represented in an ontology; however, semantic reasoning over it is a computationally complex and time-consuming task. This paper presented a rough set based methodology to generate an appropriate minimal set of design rules for USD design collaboration. In future research, the presented rough set framework will be validated by comparing manually defined SWRL rules with the induced rules, to highlight the capability of rough set theory to handle complex design rules without losing the inducing capability of design data objects. Also, the search for an optimal set of cuts from the minimal set of rules, which is NP-hard, will be tackled.
Efficient heuristics will be developed to obtain reasonable sets of cuts in an acceptable computational time.
Acknowledgement This research is supported by the ubiquitous Autonomic Computing and Network Project, through the Ministry of Information and Communication (MIC) 21st Century Frontier R&D Program in Korea.
References 1. Agard, B., Kusiak, A.: Data-mining-based methodology for the design of product families. International Journal of Production Research 42(15), 2955–2969 (2004) 2. AWS A3.0-01: Standard Welding Terms and Definitions. The American Welding Society (2001) 3. Ballagas, R., Borchers, J., Rohs, M., Sheridan, J.G.: The Smart Phone: A Ubiquitous Input Device. IEEE pervasive computing 5(1), 70–77 (2006) 4. Brown, F.: Boolean Reasoning. Kluwer Academic Publishers, Dordrecht (1990) 5. Cycorp, Inc. (2007), http://www.cyc.com/cyc 6. Fox, M.S., Gruninger, M.: Enterprise Modeling. AI Magazine, 109–121 (1998)
7. Gellersen, H.W., Schmidt, A., Beigl, M.: Multi-Sensor Context-Awareness in Mobile Devices and Smart Artifacts. Mobile Networks and Applications 7(5), 341–351 (2002) 8. Gruber, T.R.: A Translation Approach to Portable Ontology Specification. Knowledge Acquisition 5(2), 199–220 (1993) 9. Horváth, I., Pulles, J.P.W., Bremer, A.P., Vergeest, J.S.M.: Towards an Ontology-based Definition of Design Features. In: SIAM Workshop on Mathematical Foundations for Features in Computer Aided Design, Engineering, and Manufacturing (1998) 10. Huang, C.C., Tseng, T.L., Chuang, H.F., Liang, H.F.: Rough-set-based approach to manufacturing process document retrieval. International Journal of Production Research 44(14), 2889–2911 (2006) 11. Kim, K.Y., Manley, D.G., Yang, H.J.: Ontology-based Assembly Design and Information Sharing for Collaborative Product Development. Computer-Aided Design (CAD) 38, 1233–1250 (2006) 12. Kitamura, Y., Kashiwase, M., Masayoshi, F., Mizoguchi, R.: Deployment of an ontological framework of function design knowledge. Advanced Engineering Informatics 18(2), 115–127 (2004) 13. Kusiak, A., Kurasek, C.: Data Mining of Printed-Circuit Board Defects. IEEE Transactions on Robotics and Automation 17(2) (2001) 14. Mizoguchi, R.: Tutorial on Ontological Engineering Part 1: Introduction to Ontological Engineering. New Generation Computing 21(4), 365–384 (2003) 15. Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991) 16. Pawlak, Z., Skowron, A.: Rough sets and Boolean reasoning. Information Sciences 177(1), 41–73 (2007) 17. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177(1), 3–27 (2007) 18. Rippey, W.G.: NISTIR 7107, A Welding Data Dictionary. National Institute of Standards and Technology (2004) 19. Schlenoff, C., Ivester, R., Libes, D., Denno, P., Szykman, S.: An analysis of existing ontological systems for applications in manufacturing and healthcare.
NISTIR 6301, National Institute of Standards and Technology (1999) 20. Uschold, M., King, M., Moralee, S., Zorgios, Y.: The Enterprise Ontology. Knowledge Engineering Review 13 (Special Issue on Putting Ontologies to Use) (1998) 21. World Wide Web Consortium: OWL Web Ontology Language Guide, http://www.w3c.org/TR/owl-guide 22. World Wide Web Consortium: SWRL: A Semantic Web Rule Language Combining OWL and RuleML, http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/ 23. Yao, Z., Bradley, H.D., Maropoulos, P.G.: An Aggregate Weld Product Model for the Early Design Stages. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing 12, 447–461 (1998)
An Object-Oriented Framework for Common Abstraction and the Comet-Based Interaction of Physical u-Objects and Digital Services Kei Nakanishi1, Jianhua Ma1, Bernady O. Apduhan2, and Runhe Huang1 1
Hosei University, Tokyo 184-8584, Japan [email protected], {jianhua, rhuang}@hosei.ac.jp 2 Kyushu Sangyo University, Fukuoka 813-8503, Japan [email protected]
Abstract. One essential feature of ubiquitous computing is to process information about real objects in the real environments surrounding users, in order to offer novel services to people in their daily lives. Such information is often taken from small devices embedded into physical objects, called u-objects, distributed in surrounding places such as rooms and offices. A ubiquitous system generally involves many physical u-objects and their dynamic interactions with digital services, i.e., software entities residing on different machines. To overcome the great heterogeneity of u-objects and digital services, as well as of the platforms and networks used, various frameworks have been proposed and tested. However, many problems remain in gracefully integrating these physical u-objects and digital services and letting them seamlessly interact with each other. Therefore, this study focuses on a ubiquitous framework that maps all physical u-objects and digital services into corresponding common abstracted objects, and enables all the objects to interact through message exchanges via a Comet Web server using the HTTP protocol, which is platform independent and able to run on many kinds of physical networks.
1 Introduction

Ubiquitous computing, in contrast to cyber computing, which is mainly for virtual e-things on computers and the Web, is primarily for physical things in the real world, whose functions are enhanced or extended by adding computing abilities to sense information and conduct communication and processing. Real physical things (objects and spaces) are called u-things (u-objects and u-spaces), as opposed to virtual e-things, if they are attached, embedded, or blended with computers, networks, sensors, actors, IC-tags, and so on [1, 2]. Due to the continuing miniaturization of electronic chips and electro-mechanical devices, more and more u-objects have been developed, ranging from handheld devices (cell phone, PDA, etc.) and home appliances (TV, refrigerator, etc.) to ordinary goods, such as books attached with RFID tags that store the books' related information, e.g., the book's title, author's name, year of publication, ISBN number, etc. As a result,

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 397–410, 2008. © Springer-Verlag Berlin Heidelberg 2008
398
K. Nakanishi et al.
u-objects can be regarded as a special kind of physical entity that can keep some data electronically and interact with other entities via wired or wireless communication. On the other hand, there are many digital services, such as software applications, tools, or components on computers and the Internet, that can process physical-object-related information to provide specific service functions. A digital service in this context means a software entity residing in a local or remote system that can be accessed over a network by a user or another service. A service can usually perform some operation when activated by a request and send the result to the client. For example, an online bookstore may have a book information service that provides catalog search, purchasing procedures, latest publication information, and so on. Indeed, many digital services, such as map services, news search, and online banking, are now available on the Internet. A ubiquitous system generally involves many physical u-objects and digital services, which interact with each other to accomplish assigned tasks. To support their universal interactions, one of the main challenges is how to deal with the great heterogeneity of different u-objects and various digital services, as well as of the varying platforms and networks used. Many systems have adopted ad hoc approaches and implementations, which leave the developed systems with poor interoperability and extensibility. To overcome these drawbacks, one solution is to build a general middleware or framework: a reusable program design and a set of prefabricated software building blocks that can be used, extended, or customized for a specific computing system, so that an application does not need to be built from scratch every time. To build such a framework, it is essential to model the system components and their relations.
Recently, much research attention has been paid to the service-oriented architecture (SOA), in which everything, i.e., any hardware or software entity, is regarded as a service [3]. This approach is quite similar to the object-oriented paradigm, which models everything as an object [4]. Several object-oriented languages such as C++ and Java are widely used, and a great deal of object-oriented software, including embedded device drivers and digital services, has been developed and made available. To integrate both u-objects and digital services easily, it seems more direct and simple to adopt the object-oriented model in a ubiquitous computing framework. Therefore, this study focuses on the development of an object-oriented framework that abstracts all physical u-objects and digital services into corresponding virtual objects in cyberspace. All interactions between the abstracted objects are based on message exchanges via a Comet Web server using HTTP, which is platform independent and capable of running on many kinds of physical networks. The rest of this paper presents the framework in detail and shows some applications developed with it. Section 2 describes related research on the acquisition of physical object information and the interaction of digital services on a network, as well as various representative frameworks. Section 3 explains our basic design ideas and discusses the key concepts of object abstraction and message exchange using a Comet Web server. Section 4 explains the framework's layers, architecture, and work flow. Some experimental applications built with the framework are shown as case studies in Section 5. Conclusions and future work are addressed in the last section.
2 Related Work
One of the main issues in ubiquitous systems is how to acquire, represent, and use physical-object-related information. Such information may come from u-objects with embedded devices such as RFID (Radio Frequency Identification) tags and sensors. For example, RFID can be used to track locations [5], update goods quantities [6], and detect tagged objects [7]. The sensed data is usually converted into some special format such as XML (Extensible Markup Language), which can be easily processed by software services [8]. The Distributed Architecture for a Ubiquitous RFID Sensing Network [9] focuses on the management of various sensors, which are represented in RDF (Resource Description Framework) and OWL (Web Ontology Language). The service-oriented architecture (SOA) [3] is one of the popular concepts and models for building large-scale computer systems. The most popular SOA instances are Internet-based software systems, e.g., Web services and the Semantic Web. DAML-S [10] and OSGi (Open Services Gateway initiative) [11] support the management of remote software components and their cooperation. The SOA concept can be extended from software to hardware devices. In particular, it can be applied to robotize a real space, i.e., a distributed system may have robot components, such as artificial eyes, hands, and legs, installed in a room, so that the room becomes a set of robots connected by networks [12]. The Sun Java System RFID Software [13] with EPC (Electronic Product Code) technology can bridge an RFID device and a service, capture RFID events, and collect the captured data and store it in a server. It is designed for large-scale implementations in enterprises that need to integrate real-time data from a large number of RFID tags. The Ubiquitous Service Finder [14] makes it possible to invoke a service with a physical object's data through simple drag-and-drop operations on a mobile phone.
Flipcast [15] is a lightweight agent framework with scripts for interaction between different devices. All the above frameworks and middleware generally provide the following functions. First, they make it possible to invoke services with the collected physical-object-related data. Second, they support either device-to-device interaction or service-to-service interaction. Third, they provide services dynamically according to users' requests or actions. In addition to these general functions, our framework further provides device-to-service interaction by mapping both physical artifacts and digital services to abstracted objects. The abstracted objects are similar to the objects in our previous groupware systems, in which all shared objects are Java applet clients [16] and JXTA peers [17], respectively. That is, the interactions between the abstracted objects can be conducted in the same way as collaborations among the shared objects in these groupware systems, and the collaboration control approaches of groupware systems can likewise be used to manage the abstracted objects and their collaborations. Since all interactions of abstracted objects are carried out by message exchanges using HTTP, our framework works on any Web-enabled platform and HTTP-supported network.
3 Object, Message, and the Comet Web Server
As indicated previously, our framework is designed and developed using an object-oriented software engineering approach that models a system as a group of interacting objects. Our basic idea is to map all physical u-objects and digital services into corresponding virtual objects, as shown in Fig. 1.
Fig. 1. Object abstraction and interaction (UO: u-object, DS: digital service)
Although the implementation of the communication between an abstracted object and its corresponding u-object or digital service is device or service dependent, all abstracted objects share common features in our framework. The abstracted objects can in principle be implemented in any programming language, and may exist on various devices and machines running on different platforms and networks. To support interactions between objects on various machines in heterogeneous environments, it is therefore necessary to use a common communication scheme and protocol for message exchange between all abstracted objects. The communication adopted in our framework is based on message exchange via a Comet Web server, as shown in Fig. 1. The rest of this section describes the concepts of object and message, as well as object interaction based on message exchanges on the Comet Web server.
3.1 Object Abstraction and Message
An object is a software entity consisting of a private memory that keeps its own data and a set of operations that take actions. Interactions between objects are carried out via message exchanges, as in the Smalltalk-80 software system [18]. A message sent by object A specifies which operation is requested of object B without caring how the operation is carried out. Object B decides how to carry out the requested operation and what should be returned to object A according to A's request message. A crucial property of an object is that its private memory can be manipulated only by its own operations. A crucial property of messages is that they are the only way to invoke an object's operations. These properties ensure that the implementation of one object does not depend on other objects. In our framework, all physical artifacts and digital services are abstracted to counterpart objects in cyberspace, and their interactions are conducted via messages, as shown by the example in Fig. 2.
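The message-passing discipline described above, where private memory is touched only by an object's own operations and messages are the sole way to invoke them, can be sketched roughly as follows. This is an illustrative Python sketch, not the framework's actual Java code; the class and operation names mirror Fig. 2, but the details are hypothetical.

```python
class AbstractedObject:
    """Base class for abstracted objects: private memory plus operations."""
    def __init__(self):
        self._memory = {}  # private memory, touched only by the object's own operations

    def receive(self, message, *args):
        """Messages are the only way to invoke an object's operations."""
        operation = getattr(self, message, None)
        if operation is None:
            raise AttributeError(f"no operation {message!r}")
        return operation(*args)

class PositionReader(AbstractedObject):
    """Abstraction of an RFID reader device (cf. Fig. 2)."""
    def __init__(self, name, position):
        super().__init__()
        self._memory.update(name=name, position=position)

    def getName(self):
        return self._memory["name"]

    def getPosition(self):
        return self._memory["position"]

class LocationViewer(AbstractedObject):
    """Abstraction of a location service; asks a reader for a position."""
    def showMap(self, reader):
        # The viewer does not care *how* getPosition is carried out.
        return f"user at {reader.receive('getPosition')}"

reader = PositionReader("reader-1", (3, 4))
viewer = LocationViewer()
print(viewer.showMap(reader))   # -> user at (3, 4)
```

Note that the viewer only ever sends the `getPosition` message; swapping in a different reader implementation would not affect it, which is the decoupling property the paper relies on.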
Fig. 2. Objects and messages
The physical objects, such as reader devices that sense or read RFID data, are abstracted and mapped onto corresponding objects as software entities. A class denotes a set of objects that share the same characteristics but have different names to identify them. For example, the RFID reader shown in Fig. 2 is abstracted as a position reader object with its device name and position data kept in its private memory. It has a getName() operation that returns the name, and a getPosition() operation that reads an RFID tag and replies with the tag's data to the requesting object. The location service for tracking a user and displaying the user on a map is abstracted as a location viewer object that has the map data in its private memory and a showMap(room:String) operation. The location viewer object sends a getPosition() message to the position reader object and receives the position data from the reader object without needing to know how the getPosition() operation is accomplished by the reader.
3.2 Message Exchange on the Comet Web Server
All objects connect to the Web server as clients and communicate with the server using HTTP. When an object wants to interact with another object, it sends an HTTP request message to the server. All objects constantly listen for new messages from the server. When a request message is detected, the object processes the message and sends an HTTP message as a response to the server. HTTP is widely used as a communication protocol, since it can cross machine and software boundaries as well as pass through firewalls. However, HTTP is a request-response protocol that does not keep a persistent connection. Traditionally, an HTTP server's message, as shown at the top of Fig. 3, is delivered to a client only when the client makes a request, after which the connection is terminated. To promptly detect a new event or message arriving at the server, a client has to repeatedly
make polling requests to the server. If the polling is repeated too frequently, the computation and communication overhead increases greatly. If the interval between two polling requests is too long, objects may receive messages with long latency, and the whole system cannot respond to requests or context changes in real time.
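This trade-off can be made concrete with back-of-the-envelope arithmetic; the numbers below are illustrative, not measurements from the paper. With plain HTTP polling, the worst-case notification latency equals the polling interval, while the request overhead grows as the interval shrinks.

```python
def polling_tradeoff(interval_s, period_s=3600):
    """Requests issued over a period and worst-case latency for plain polling."""
    requests = period_s / interval_s      # requests sent over the period
    worst_latency = interval_s            # an event may arrive just after a poll
    return requests, worst_latency

for interval in (1, 10, 60):
    req, lat = polling_tradeoff(interval)
    print(f"poll every {interval:2d}s: {req:5.0f} requests/hour, worst latency {lat}s")
```

Polling every second costs 3600 requests per hour per object; polling every minute keeps overhead low but allows up to a minute of latency, which is exactly the dilemma Comet avoids.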
Fig. 3. Traditional HTTP and Comet models
Comet, as shown at the bottom of Fig. 3, is an event-driven, server-push data streaming model in which a Web server suspends a request from a client and delivers data to the client only when some event occurs [19]. Instead of repeatedly polling for new events, a Comet application or client, upon receiving a pushed message from a Comet server, can post the next request immediately so as to keep a persistent HTTP connection between the server and the client. By adopting a Comet-based Web server, an object can receive messages from other objects almost in real time. Several platforms support Comet. Cometd is a scalable Comet platform that consists of a protocol specification called Bayeux, JavaScript libraries (the Dojo toolkit), and an event server. Jetty has an implementation of the Cometd event server. Sun Microsystems' GlassFish also supports the Comet mechanism via the Grizzly framework. In Tomcat, a new HTTP connector using the NIO API has been added for Comet applications. Resin implements a Comet servlet API that enables streaming communication. Lingr is a browser-based chat service using Comet for real-time message notification, developed by Infoteria Corporation [20].
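A minimal in-process sketch of the long-polling loop in Fig. 3: the server suspends a request until an event arrives, and the client re-issues its request immediately after each push. This is a simulation using a thread and a queue, not Lingr's or Cometd's actual API.

```python
import queue
import threading

class CometServer:
    """Simulated Comet endpoint: a request blocks until an event is posted."""
    def __init__(self):
        self._events = queue.Queue()

    def post(self, event):                 # another object posts an event
        self._events.put(event)

    def long_poll(self, timeout=2.0):      # request is suspended until an event occurs
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None                    # no event before timeout

server = CometServer()
received = []

def client():
    # Re-issue the request immediately after each push,
    # keeping a logically persistent connection.
    while True:
        event = server.long_poll()
        if event is None:
            break
        received.append(event)

t = threading.Thread(target=client)
t.start()
for msg in ("event-1", "event-2"):
    server.post(msg)
t.join()
print(received)   # -> ['event-1', 'event-2']
```

The client receives each event essentially as soon as it is posted, with no polling interval, which is the near-real-time delivery the framework relies on.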
4 Framework Layer and Architecture
Figure 4 shows the layers of our framework, which is implemented on three Web servers: an application server (Tomcat), a Comet server (Lingr), and a database server (Derby).
Fig. 4. Framework layers
The framework works on any machine and physical network as long as HTTP is supported. The framework consists of three parts, namely the object manager, the object chat space, and the object coordinator, which are implemented in Java. The framework architecture and work flow are shown in Fig. 5, in which objects corresponding to an RFID device, a PDA, a robot, and a location service exist. All objects implement an HTTP connector and a message receiver for communicating with the Web servers via HTTP message exchange.
Object Manager. The object manager manages physical objects and digital services as well as their abstraction. When they log in to a system using this framework, the system authenticates whether they are registered. If so, they are mapped to the corresponding abstracted objects. After that, the objects can exchange messages in a common object chat space. The authentication server is implemented on Tomcat.
Object Chat Space. The object chat space is a common virtual place for message exchanges between objects. An object can send messages to and receive messages from others in the chat space. An object's message is sent to all other objects in the chat space, and the object that is the intended receiver invokes the operation specified in the message; other objects ignore it. The chat space is implemented on the Comet server using Lingr.
Object Coordinator. The object coordinator provides two functions for object interactions. The first is to analyze a received object event message and publish semantic data to the object chat space. Since the generated semantic data is sent to the object chat space, all objects learn of this data. The other function is to save and load an object together with the object's state information.
When a physical artifact or digital service logs out of or leaves the framework, its corresponding abstracted object is saved in the object database to preserve its configuration and state, which will be readily available the next time the object is used. The context server is implemented on Tomcat, and the object database uses the Derby database server.
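The chat space's broadcast delivery, where objects authenticate on login, every message is visible to all objects, and only the named target invokes the requested operation, might be sketched as follows. The "target operation" message format and the class names here are hypothetical simplifications; the real framework exchanges HTTP messages via the Lingr Comet server.

```python
class ChatSpace:
    """Toy object chat space: authenticated login plus broadcast message delivery."""
    def __init__(self):
        self.objects = {}

    def login(self, name, obj, registered):
        if name not in registered:          # the object manager authenticates first
            raise PermissionError(name)
        self.objects[name] = obj

    def send(self, message):
        target, operation = message.split()
        results = []
        for name, obj in self.objects.items():
            if name == target:              # the target invokes the requested operation
                results.append(getattr(obj, operation)())
            # all other objects see the message but simply ignore it
        return results

class Robot:
    def getLocation(self):
        return (2, 7)

space = ChatSpace()
space.login("robot", Robot(), registered={"robot"})
print(space.send("robot getLocation"))   # -> [(2, 7)]
```

An unregistered entity is refused at login, mirroring the object manager's authentication step before any abstraction or message exchange takes place.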
Fig. 5. Framework architecture
5 Case Studies Using the Framework
This section describes three sample applications developed using the framework as case studies, to explain how object interactions are carried out.
5.1 Case 1 – User Location Tracking
This example focuses on the object coordinator's function of analyzing a message from the location notifier object and publishing a semantic context position message in the object chat space in order to display a user's location on a floor map, as shown in Fig. 6. The location notifier is an object that reads an RFID tag and sends the user's name and the tag ID. The tags are placed around a room. When a user carrying a PDA as a location notifier gets near a tag, the notifier detects the tag and transmits a message with the user's name and tag ID to the object chat space. The message in this example is "/location 01023c5ded". The object coordinator analyzes the message and publishes the context "kei is in room W4024" on the object chat space. The location viewer is an object that acquires position data and displays a user's current location on the map. It connects to the object chat space using long-polling to receive messages. Since the published context is notified in XML format, the location viewer parses the received XML and extracts the user's name and location data. The icon of user "kei" is then shown on the location viewer application.
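The coordinator's analysis step in Case 1 can be sketched roughly as below. Only the message "/location 01023c5ded" and the resulting context "kei is in room W4024" come from the paper; the tag lookup table and the XML element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-(user, room) table; the real coordinator's mapping
# is not specified in the paper.
TAG_TABLE = {"01023c5ded": ("kei", "W4024")}

def analyze(message):
    """Turn a raw '/location <tag-id>' event into a semantic XML context."""
    command, tag_id = message.split()
    assert command == "/location"
    user, room = TAG_TABLE[tag_id]
    context = ET.Element("context")
    ET.SubElement(context, "user").text = user
    ET.SubElement(context, "room").text = room
    return ET.tostring(context, encoding="unicode")

xml_context = analyze("/location 01023c5ded")
print(xml_context)   # -> <context><user>kei</user><room>W4024</room></context>

# A location viewer would parse the published XML back out:
parsed = ET.fromstring(xml_context)
print(parsed.findtext("user"), "is in room", parsed.findtext("room"))
```

The viewer never sees the raw tag ID; it only consumes the published semantic context, which is the separation the coordinator provides.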
Fig. 6. Location tracking application
5.2 Case 2 – Robot's Activity Continuation
This example explains the object coordinator's function of saving and loading an object. Figure 7 shows activity continuation carried out by two different robots. Each robot object has its own memory, which keeps its current position and destination data, and its own move operation. Robot A is moving to the destination position (100, 50) to do some work. The value of the current position in the robot object depends on the physical robot's location. Robot A may be physically obstructed by a wall and unable to reach the destination. In this case, it has to stop and then save its state
Fig. 7. Robot's activity continuation
including the destination position data (100, 50) into the object database. Then, robot A can send a message to the chat space asking robot B to take over the work. Robot B can get the destination and work information by loading robot A's data, and then continue the work.
5.3 Case 3 – Robot Motion Control
A ubiquitous application may involve many objects and a sequence of interactions among them to cooperatively accomplish a task. This case is an application for robot motion control that demonstrates how relatively complicated interactions are carried out using our framework. In this example, there are one robot and a set of digital services. The robot has a move operation and can move according to the received route data. To get the route data, several external service objects are involved, and they interact and cooperate with each other. The robot and digital services used are described first, and then their interaction flows are explained.
Robot. The robot and devices used in this application are shown in Figure 8, and their details are given below.
• ROBODESIGNER RDS-X03 Platform+CDE
• Series 2000 - Low Frequency Micro Evaluation Kit RI-K3A-001A-00
• Zaurus SL-C3100
• Phidget RFID Reader
Fig. 8. Robot, RFID reader and tags
The robot is equipped with a PDA that runs the program for logging in to the framework and also functions as the robot controller, receiving motion data from the route service and passing it to the robot. The route of the robot towards the destination is calculated by the external route service based on the robot's current position and destination as well as the room layout. The robot's current position is detected by reading a nearby tag with the RFID reader carried by the robot.
Graphical User Interface: Figure 9 shows the interface for visually testing a system built on the framework. When a user logs in to the system, the interface shows all available services, i.e., the lookup service, shell service, and map service, together with the objects' messages.
Fig. 9. Interface panel
Lookup Service: Returns a list of the services, with descriptions, that are available in the room where the user is located.
Shell Service: Used to type a command line specifying the objects to use and their interactions. For example, the command line "TestMessage | RouteService | CompileService | Robot" means that the four objects are involved and connected in a pipeline fashion.
Map Service: Returns a map to the requester. The map may contain a room's physical objects and their layout data, which the route service can use to calculate a robot route.
Route Service: Returns the route from a starting point to a destination using a search method such as depth-first search (DFS), thinning, or a potential method with dynamic programming.
Compile Service: Generates the motion data for the robot in the following steps. First, it receives a message including the route data. Next, it rewrites the C source code dynamically according to the route message and compiles this source code with SDCC (the Small Device C Compiler) into an S19 file, which contains the motion data for the robot. Finally, it outputs the motion data in the S19 file as the response of the service.
Interactions in Robot Motion Control: Figure 10 shows the sequence of interactions in robot motion control, in which all of the above services are used to find the robot's position, calculate the route, and send motion code to the robot. The following steps are carried out in the object chat space.
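Before the step-by-step walkthrough, the shell service's pipeline command could be interpreted along these lines. The "A | B | C" syntax comes from the paper; the stage implementations below are stand-ins, not the real services.

```python
# Stand-in stages; the real RouteService, CompileService, and Robot are
# separate abstracted objects reached through the chat space.
def route_service(message):
    return [(0, 0), (1, 0), (1, 1)]        # stand-in for the route search

def compile_service(route):
    return f"S19:{len(route)} waypoints"   # stand-in for SDCC-compiled motion data

def robot(motion_data):
    return f"moving with {motion_data}"

STAGES = {"RouteService": route_service,
          "CompileService": compile_service,
          "Robot": robot}

def run_pipeline(command):
    """Parse 'A | B | C' and feed each stage's output to the next, shell-style."""
    head, *names = [part.strip() for part in command.split("|")]
    data = head                            # the first element is the initial message
    for name in names:
        data = STAGES[name](data)
    return data

print(run_pipeline("TestMessage | RouteService | CompileService | Robot"))
```

Each stage only sees its predecessor's output, so services can be recombined freely by editing the command line, which is the point of the shell abstraction.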
Fig. 10. Interaction in robot motion control
Login Steps
Step 1: The robot and services above log in to the system using the framework.
Step 2: The object manager authenticates the robot and services, and abstracts them as objects.
Step 3: The objects join the object chat space to exchange messages with each other.
Interaction Steps
Step 4: The interface sends the message getServiceList() to the lookup service, and then receives the returned service list.
Step 5: The interface sends the shell command and the message connectServices() to the shell service.
Step 6: The shell service sends the message getLocation() to the robot, and then receives the location data.
Step 7: The shell service sends the message searchRoute() with the robot's location data and destination to the route service.
Step 8: The route service sends the message getMap() with the location data to the map service, and then receives the returned map, which includes the robot.
Step 9: The route service invokes its own operation generateRoute() and returns the generated route to the shell service.
Step 10: The shell service sends the message outputSource() with the received route to the compile service.
Step 11: The compile service invokes its own operation compileSource() with the generated source code, and then returns the motion data to the shell service.
Step 12: The shell service sends the message move() with the received motion data to the robot, and the robot starts to move.
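As one concrete possibility for the route search invoked in Step 9, a depth-first search over a grid map might look like this. The 3×3 grid and wall layout are hypothetical; the paper only names DFS as one of several usable methods.

```python
def dfs_route(grid, start, goal):
    """Depth-first route search on a grid; 0 = free cell, 1 = wall."""
    rows, cols = len(grid), len(grid[0])
    stack, seen = [(start, [start])], {start}
    while stack:
        (r, c), path = stack.pop()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in seen:
                seen.add((nr, nc))
                stack.append(((nr, nc), path + [(nr, nc)]))
    return None   # no route: destination unreachable (cf. the obstruction in Case 2)

grid = [[0, 0, 0],
        [1, 1, 0],   # 1 = wall
        [0, 0, 0]]
route = dfs_route(grid, (0, 0), (2, 0))
print(route)
```

The returned waypoint list is the kind of route data the compile service would then turn into motion code; a `None` result corresponds to the obstructed case where the robot must stop and hand its task over.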
6 Conclusions and Future Work
This paper described a framework for the common abstraction and interaction of physical artifacts and digital services. The framework has been implemented on a Comet Web server, using Comet to enable seamless interaction between any objects. The mapping of physical artifacts and digital services to abstracted objects provides seamless object interaction across heterogeneous platforms and networks. Although the framework provides platform-independent interaction using HTTP messages, a general event analysis mechanism should be studied so that objects can easily determine the semantic meaning of messages. In the current framework, all objects exchange messages via a single Comet Web server, which makes application development easy but may cause a scalability problem in a very large ubiquitous system with many objects. One possible solution is to use a group of distributed Comet Web servers in a scalable hierarchical structure so that they can work collaboratively.
Acknowledgement
This research was supported by the Strategic International Cooperative Program of the Japan Science and Technology Agency (JST).
References
1. Ma, J.: Smart u-Things – Challenging Real World Complexity. In: IPSJ Symposium Series, vol. 2005(19), pp. 146–150 (2005)
2. Ma, J., Yang, L.T., Apduhan, B.O., Huang, R., Barolli, L., Takizawa, M.: Towards a Smart World and Ubiquitous Intelligence: A Walkthrough from Smart Things to Smart Hyperspaces and UbicKids. International Journal of Pervasive Computing and Communications 1(1) (March 2005)
3. Papazoglou, M.P., Georgakopoulos, D.: Service Oriented Computing. Communications of the ACM 46(10), 25–28 (2003)
4. Hunt, J.: Smalltalk and Object Orientation – An Introduction. Springer, Heidelberg (1997)
5. Okuda, K., Yeh, S.-y., Wu, C.-i., Chang, K.-h., Chu, H.-h.: The GETA Sandals: A Footprint Location Tracking System. In: Proc. of Location- and Context-Awareness, pp. 120–131 (May 2005)
6. Borriello, G., Brunette, W., Hall, M., Hartung, C., Tangney, C.: Reminding About Tagged Objects Using Passive RFIDs. In: Proc. of UbiComp 2004: Ubiquitous Computing, pp. 36–53 (September 2004)
7. Fishkin, K.P., Jiang, B., Philipose, M., Roy, S.: I Sense a Disturbance in the Force: Unobtrusive Detection of Interactions with RFID-tagged Objects. In: Proc. of UbiComp 2004: Ubiquitous Computing, pp. 269–282 (September 2004)
8. Ito, Y., Kitamura, Y., Kikuchi, H., Watanabe, K., Ito, K.: Development of Interactive Multimedia Contents Technology That Unites Real Space and Virtual Space in Seamless – Interactive Block System, http://www.ipa.go.jp/SPC/report/03fypro/mito/15757d.pdf
9. Ranasinghe, D.C., Leong, K.S., Ng, M.L., Engels, D.W., Cole, P.H.: A Distributed Architecture for a Ubiquitous RFID Sensing Network. In: Intelligent Sensors, Sensor Networks and Information Processing Conference, pp. 7–12 (December 2005)
10. Ankolekar, A., et al.: DAML-S: Web Service Description for the Semantic Web. In: Proc. of 1st International Semantic Web Conference (ISWC 2002), pp. 348–363 (2002)
11. Yamasaki, I., Yata, K., Kaeomichi, H., Tsutsui, A., Kawamura, R.: Security Functions of a Distributed Network Middleware CSC on OSGi Frameworks. IEICE Technical Report, Information Networks 105(113), 35–40 (2005)
12. Nishida, Y., Hori, T., Suehiro, T., Hirai, S.: Sensorized Environment for Self-communication Based on Observation of Daily Human Behavior. In: Proc. of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000), pp. 1364–1372 (2000)
13. Sun Microsystems: The Sun Java System RFID Software Architecture (March 2005)
14. Kawamura, T., Ueno, K., Nagano, S., Hasegawa, T., Ohsuga, A.: Ubiquitous Service Finder – Discovery of Services Semantically Derived from Metadata in Ubiquitous Computing. In: Gil, Y., Motta, E., Benjamins, V.R., Musen, M.A. (eds.) ISWC 2005. LNCS, vol. 3729. Springer, Heidelberg (2005)
15. Ueno, K., Kawamura, T., Hasegawa, T., Ohsuga, A., Doi, M.: Cooperation between Robots and Ubiquitous Devices with Network Script Flipcast. In: Proc. of Network Robot System: Toward Intelligent Robotic Systems Integrated with Environments (IROS) (2004)
16. Ma, J., Huang, R., Nakatani, R.: Towards a Natural Internet-Based Collaborative Environment with Support of Object Physical and Social Characteristics. International Journal of Software Engineering and Knowledge Engineering 11(2), 37–53 (2001)
17. Kawashima, T., Ma, J.: TOMSCOP – A Synchronous P2P Collaboration Platform over JXTA. In: IEEE CS Proc. of the International Workshop on Multimedia Network Systems and Applications (MNSA 2004), Tokyo, Japan (March 2004)
18. Goldberg, A., Robson, D.: Smalltalk-80: The Language and its Implementation. Addison-Wesley (1983)
19. Russell, A.: Comet: Low Latency Data for the Browser. Continuing Intermittent Incoherency (November 2007), http://alex.dojotoolkit.org/?p=545
20. Lingr, http://www.lingr.com/
Personalizing Threshold Values on Behavior Detection with Collaborative Filtering Hiroyuki Yamahara, Fumiko Harada, Hideyuki Takada, and Hiromitsu Shimakawa Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, 525-8577 Shiga, Japan [email protected], {harada,htakada,simakawa}@cs.ritsumei.ac.jp
Abstract. We are developing a system that assists users through collaboration between users and their environment. Our collaboration system proactively provides services according to user behavior in homes when the environment detects high-level user behavior such as "leaving the home". To realize such a collaboration system, this paper proposes a method for detecting high-level user behavior. The proposed method dynamically sets the thresholds used for detection to values suited to each user's individual behavioral pattern. A conventional method determines threshold values common to all users; however, the common values are not always suitable for every user. Our method determines threshold values suitable for a user by applying collaborative filtering to the data of other users whose characteristics are similar to the user's.
Keywords: threshold, context, behavior, ambient, proactive.
1 Introduction
With the progress in the downsizing of computers and the development of wireless network technology, it is becoming possible to make environments intelligent by embedding computers into them. We are developing a collaboration system for assisting users in their homes as one attempt at making environments intelligent. Suppose a user uses a home network that manages and controls all appliances in his home. For example, the user can turn off the lights in his home before he goes to bed through the home network. However, the user must consciously send commands to the home network with a mobile phone or PC to operate the environment. In such a case, it is useful if the environment automatically turns off the lights without any explicit commands. This means that the environment collaborates with the user actively by providing services automatically according to the user's situation. Such collaboration can prevent user mistakes. To realize the collaboration, it is important that the environment knows the user's situation and operates appropriately for the user's context. For example, the environment should not mistakenly turn off the lights in his home when he is still awake.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 411–425, 2008.
© Springer-Verlag Berlin Heidelberg 2008
H. Yamahara et al.
Collaboration between the environment and the user may bring him comfort, relief, and safety. We target situations such as the following. One day, the user leaves home and carelessly leaves the windows open. To prevent such a danger, our system informs the user that the windows are open before he leaves. Such a service is valuable because it not only improves user amenity but also brings relief and safety. In this example, the timing of the service is important: if the user is notified after leaving home, he must go back into the house to close the windows, so he should be notified before going outside. As another example, if an attempted-delivery notice has arrived at the home server when the user comes home, our system recommends that he go to pick up the package before he sits on the sofa and relaxes. We refer to such services, which should be provided proactively according to user behavior, as proactive services. In this paper, aiming to realize collaboration between the environment and the user, we propose how to detect characteristic user behavior in home situations such as leaving the home, coming home, and going to bed, so that the environment can actively provide proactive services. Some existing studies propose methods for detecting user motions, such as "walking" and "standing up", and simple actions, such as "making tea" and "brushing teeth" [1,2,3,4]. However, for providing proactive services, it is not these low-level behaviors but high-level behaviors, such as "leaving the home" and "coming home", that need to be detected. A high-level behavior is a complex behavior in which several actions are interleaved; it is difficult to provide proactive services by detecting low-level behaviors alone. We are developing a system for detecting high-level behaviors [5,6].
Context-aware applications, including our system, are built on a model that collects online sensor data, acquired according to user behavior, as behavior logs and matches the logs with behavioral patterns for recognition. First, such a system collects a certain number of sample behavior logs, which show the characteristics of user behavior. Next, a behavioral pattern is created from the logs for every situation to be detected. User behavior is then detected by matching behavior logs acquired online from current user behavior against the behavioral pattern of each situation. These systems need a certain number of personal behavior logs as samples to create a behavioral pattern for recognition, so their services are unavailable until enough sample behavior logs have been collected. If collecting sample behavior logs from the user's activity takes a long period, the user is dissatisfied by the long wait. To avoid this, a behavioral pattern must be created from a small number of sample behavior logs that can be collected in a short duration. Most existing methods create a behavioral pattern with a stochastic model such as the Hidden Markov Model (HMM) [7,8]; these methods need many sample behavior logs. Consider creating a behavioral pattern for the situation of leaving the home: only about 30 sample behavior logs can be collected even in a month, so a behavioral pattern cannot be created in a short duration. These methods
Personalizing Threshold Values on Behavior Detection
are therefore not adequate for creating patterns in a short duration. In contrast, a system we developed previously detects user behavior using a behavioral pattern created from only 5 sample behavior logs, which can be collected within a week [5,6]. Our system must set threshold values, which are used both for creating a behavioral pattern and for matching online sensor data against the pattern. The first threshold is the extraction threshold. A behavioral pattern is created by extracting characteristics that occur frequently in the sample behavior logs; the extraction threshold is the threshold on this occurrence frequency. If an unsuitable value is set for the extraction threshold, behavior recognition accuracy is low because the user's characteristics are not extracted adequately. The second threshold is the detection threshold. When a user's online sensor data is matched with a behavioral pattern, if the degree of conformity exceeds the detection threshold, our system detects the user behavior and provides services. Naturally, an unsuitable detection threshold also lowers recognition accuracy. Not only our system but most context-aware applications require such thresholds for creating a behavioral pattern and for matching against it, and suitable threshold settings are necessary for high recognition accuracy. After many sample behavior logs have been collected, the initial threshold values can be changed to more proper values by learning from the logs; this learning issue is not discussed in this paper. Instead, this paper addresses how to set initial threshold values that achieve high recognition accuracy under the constraint of a small number of sample behavior logs. Setting suitable initial threshold values is important so as not to dissatisfy the user.
The conventional method has some test users use the system on a trial basis before the system is introduced into a user's actual environment, and determines threshold values common to all users from the test users' data. However, it is difficult to achieve high recognition accuracy with the common values, because suitable threshold values vary with each individual behavioral pattern. This paper aims to create behavioral patterns that yield higher recognition accuracy by setting more suitable threshold values than the conventional method, particularly for users whose behavior is not recognized well with the common values. Because it is difficult to determine threshold values from only a small number of personal sample behavior logs, we also utilize the data of test users, as in the conventional method. Unlike the conventional method, however, we can neither determine threshold values directly nor create a behavioral pattern from test-user data in advance, because the characteristics of high-level behavior vary with each user. The proposed method dynamically determines threshold values suited to the behavioral pattern of each user, using collaborative filtering with both a small number of personal sample behavior logs and statistical data derived from test-user data. The method estimates the recognition accuracy of every threshold setting from the relationship between threshold settings and recognition accuracy, obtained from the statistics of the test-user data. Finally, the threshold setting whose estimated recognition accuracy is highest is adopted. The proposed method has the following advantages.
– Even when the available personal sample behavior logs of an individual user are limited, the method dynamically determines threshold values that bring high recognition accuracy, by estimating the recognition accuracy of every threshold setting with collaborative filtering.
– Unlike the conventional method, which determines common threshold values so that recognition accuracy is as high as possible on average over all users, the method determines threshold values suited to individuals by collaborative filtering.

Compared to the conventional method using pre-determined thresholds common to all users, the proposed method improves recognition accuracy with thresholds suited to the individual behavioral pattern. The remainder of this paper is organized as follows. Section 2 describes our behavior detection system. Section 3 considers the effect of threshold settings on behavior recognition accuracy. Section 4 explains how to determine individual thresholds with collaborative filtering. Section 5 discusses related work. Section 6 concludes this paper.
2 Behavior Detection in the Home

2.1 Detection of High-Level Behavior
We consider the situations of leaving the home, coming home, getting up, and going to bed as situations in which proactive services can be provided effectively. For example, when the user gets up, our system provides a reminder service that reminds him of the day's schedule and of things to be completed before he leaves home. By providing this reminder before the user starts preparing to leave, or before the meal that follows the series of actions after getting up, the service can support the user's decision about his next action. When the user goes to bed, our system provides services that bring relief and safety; for example, it informs him that the windows are not closed. We consider proactive services valuable because they can proactively prevent regret and danger that the user might otherwise face if the services were not provided. Proactive services should not be provided mistakenly when "the user gets out of bed just to go to the toilet in the middle of the night" or when "the user goes outside just to pick up a newspaper". High-level behaviors such as "leaving the home" and "going to bed" cannot be detected correctly only by recognizing simple actions as in the existing methods [3,4]. We consider a high-level behavior to be a long behavior of around ten minutes in which several actions are interleaved. In addition, the characteristics of a high-level behavior vary with the individual user. Therefore, a behavioral pattern for detecting a high-level behavior must be created from the personal behavior logs of the individual user. So as not to dissatisfy the user by a long wait while personal behavior logs are collected, we consider that services must become available within a week at the latest, using only a small number of personal behavior logs.
Fig. 1. Objects with embedded RFID tags
2.2 Individual Habit in Touched Objects
To detect high-level behaviors, we must collect, as behavior logs, data that clearly shows the characteristics of individual user behavior. We focus on the observation that, in situations such as leaving the home and going to bed, most people perform habitual actions in a habitual order so as not to forget things to do. Each user has his own characteristic behavior in such specific situations; that is, the user habitually touches the same objects every time in the same situation. Which objects the user touches, and in what order, depends on the individual. We record histories of touched objects as behavior logs using 13.56 MHz RFID tags. As shown in Fig. 1, the tags are embedded in various objects of the living space, such as a doorknob, a wallet, and a refrigerator. Every object can be identified by the unique tag-ID stored in its tag. Meanwhile, the user wears a
Fig. 2. Examples of behavior logs: histories of objects touched by users A and B in the situations of leaving home and coming home (e.g., toothbrush, lavatory cup, lavatory faucet, hanger, cell phone, wallet, key case, wrist watch, bag, refrigerator)
finger-ring-type RFID reader. With this RFID system, the history of touched objects is recorded in a database as the behavior log of the user, according to his behavior. Fig. 2 shows actual behavior logs recorded by our system. Each table shows the behavior logs of two users in the situations of leaving the home and coming home. In the situation of leaving the home, for example, the habitual actions of user A differ from those of user B. From the log, it is inferred that user A brushed his teeth, changed his clothes, picked up some portable belongings, and took a milk carton out of the refrigerator, whereas user B brushed his teeth, set his hair, operated a VCR, and then picked up some portable belongings. These behavior logs show that the kinds of touched objects and their order differ among individual users even in the same situation. Similarly, comparing each user's leaving-home situation with his coming-home situation, we find that a user touches different kinds of objects, or the same objects in a different order, in different situations.

2.3 Behavior Detection with Ordered Pairs
To detect high-level behavior, we create a behavioral pattern represented by a set of ordered pairs, which express the order relation among touched objects, from histories of touched objects used as sample behavior logs. The flow for creating a behavioral pattern is shown in Fig. 3, with an example for the situation of leaving the home. Existing methods based on probabilistic models such as HMM generally create a behavioral pattern with high recognition accuracy by using as sample behavior logs both behavior logs of the situation of leaving the home and logs of other situations. However, in our problem a behavioral pattern must be created from a small number of sample behavior logs. Even behavior logs of leaving the home cannot be collected frequently, and we cannot expect to collect enough logs of other situations to make recognition accuracy high. Therefore, a behavioral pattern must be created only from behavior logs of leaving the home. First, behavior logs of w cases are collected as sample behavior logs. The time length tl of a sample behavior log is fixed. If m objects are sequentially touched in a behavior log l, then l is represented as a sequence {o1, o2, ..., oi, ..., om}, where oi−1 ≠ oi (1 < i ≤ m). Second, all ordered pairs of objects are enumerated from all collected sample behavior logs. If object oj is touched after object oi, the ordered pair p is represented as {oi → oj}, which includes the case oi = oj. For example, the ordered pairs enumerated from a behavior log {o1, o2, o3} are p1: {o1 → o2}, p2: {o1 → o3}, and p3: {o2 → o3}. Next, the occurrence count of every ordered pair is computed. The occurrence count is not the number of times each ordered pair occurs within a sample behavior log, but the number of the w sample behavior logs that include the ordered pair.
Finally, the ordered pairs whose ratio of occurrence count to w exceeds the extraction threshold e% are extracted as the behavioral pattern π.
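The pattern-creation step above can be sketched in Python as follows. This is a minimal illustration; the function names are ours, and we use "at least e%" at the boundary (the paper says "more than", but its own settings go up to e = 100 × w/w, which only makes sense with a non-strict comparison):

```python
def ordered_pairs(log):
    # All ordered pairs {oi -> oj} with oj touched after oi; the case
    # oi == oj (the same object touched again later) is included.
    return {(log[i], log[j]) for i in range(len(log)) for j in range(i + 1, len(log))}

def create_pattern(sample_logs, e):
    """Create a behavioral pattern from w sample behavior logs.

    sample_logs: w behavior logs, each a sequence of touched object IDs.
    e: extraction threshold in percent; pairs occurring in at least e%
    of the logs are extracted into the pattern (a non-strict comparison
    is assumed here).
    """
    w = len(sample_logs)
    count = {}  # ordered pair -> number of logs containing it
    for log in sample_logs:
        for p in ordered_pairs(log):
            count[p] = count.get(p, 0) + 1
    return {p for p, c in count.items() if 100.0 * c / w >= e}
```

For example, with three sample logs {a, b, c}, {a, c, b}, {a, b, d}, the pair {a → b} occurs in all three logs and survives e = 100%, while {a → c} occurs in two of the three and survives only lower thresholds.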
Fig. 3. How to create a behavioral pattern: (1) collect behavior logs (histories of tag-IDs and object names, e.g., pants hanger, lavatory faucet, lavatory cup, toothbrush, toothpaste); (2) enumerate ordered pairs (e.g., pants hanger → lavatory cup); (3) count, for each ordered pair, the number of sample logs containing it; (4) extract the ordered pairs whose occurrence exceeds the threshold as a behavioral pattern represented by a set of ordered pairs
The behavioral pattern π is matched against the current behavior log of time length tl, which is acquired online from current user behavior. If more than the detection threshold d% of the ordered pairs composing π exist in the behavior log, the user behavior of leaving the home is detected. For example, an ordered pair such as {toothpaste → toothbrush} indicates a habitual action of the user, such as "brushing teeth". An ordered pair such as {toothpaste → pants hanger} indicates a habitual order of the user's actions, such as "the user puts on pants after brushing his teeth". A behavioral pattern consisting of a set of ordered pairs can thus represent the user's habitual actions and their order. As shown in our previous work [5], compared to the method using a Bayesian network (BN) [3,4] and the method using time-series association rules [9], this detection method has the advantage that it can represent the characteristics of complex user behavior with an automatically created, simply structured behavioral pattern composed of the smallest units of order.
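The online matching step can be sketched as below. Again the names are ours, and detection is taken to fire when the degree of conformity reaches d% (a non-strict comparison, as an assumption):

```python
def detect(pattern, current_log, d):
    """Match the online behavior log of time length tl against pattern pi.

    pattern: set of ordered pairs (oi, oj) composing the behavioral pattern.
    current_log: sequence of object IDs touched in the current time window.
    d: detection threshold in percent; detection fires when at least d%
    of the pattern's ordered pairs appear in the current log.
    """
    if not pattern:
        return False
    observed = {(current_log[i], current_log[j])
                for i in range(len(current_log))
                for j in range(i + 1, len(current_log))}
    conformity = 100.0 * len(pattern & observed) / len(pattern)
    return conformity >= d
```

With a two-pair pattern, a log containing only one of the pairs yields a conformity of 50%, so it is detected at d = 50% but ignored at d = 51%.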
3 Discussion on Setting of Threshold Values

3.1 Difficulty of Setting Threshold Values
We previously conducted an experiment in which we detected user behavior in the situations of leaving the home, coming home, getting up, and going to bed, using our detection method [5]. We evaluated recognition accuracy with both the true-positive rate (TPR) and the true-negative rate (TNR). TPR is the rate at which behavior logs of a specific situation, referred to as true cases, are correctly detected with the behavioral pattern of that situation. TNR is the rate at which behavior logs of situations other than the specific situation, referred to as false cases, are correctly ignored by that behavioral pattern. It is preferable that both TPR and TNR are high. Tables 1 to 4 show the TPR and TNR of eight subjects for the four situations as results of the experiment. The recognition rates of some subjects
Table 1. Result of "Leave Home" (unit: %)
subject   TPR      TNR
A          94.00    96.02
B          98.00    85.44
C          78.00    83.20
D          95.00    98.00
E          99.00    98.96
F          96.00    97.00
G         100.00    96.36
H          98.00    95.18
average    94.75    93.77

Table 2. Result of "Come Home" (unit: %)
subject   TPR      TNR
A          89.00    95.93
B          99.00    98.12
C          81.00    83.37
D          98.00    78.40
E          93.00    99.60
F          99.00   100.00
G         100.00    96.80
H         100.00    98.27
average    94.88    94.34

Table 3. Result of "Get Up" (unit: %)
subject   TPR      TNR
A          73.00    99.12
B          90.00    96.78
C          63.00    84.35
D         100.00    99.22
E          64.00    87.32
F          97.00    99.68
G         100.00    74.33
H          56.00    83.60
average    80.38    90.55

Table 4. Result of "Go To Bed" (unit: %)
subject   TPR      TNR
A          62.00    85.34
B          91.00    71.84
C          95.00    96.92
D          78.00    94.66
E          28.00    91.24
F          95.00    99.14
G          98.00    99.32
H          58.00   100.00
average    75.63    92.31
were more than 90%, while the recognition rates of a few subjects were below 80%; the rates vary among subjects. In the situations of leaving the home and coming home, recognition rates are high for most subjects. In comparison, more subjects have low recognition rates in the situations of getting up and going to bed. The main cause of these differences is that the extraction threshold and the detection threshold are pre-determined values common to all users. Based on the half total true rate (HTTR), the average of TPR and TNR, these threshold values were determined so that the HTTR averaged over all users is maximal. However, the best values of these thresholds intrinsically vary with the behavioral pattern of each user. It is necessary to improve the recognition accuracy of users whose recognition rates are low with the common threshold values, by setting suitable initial threshold values for individuals. Although recognition accuracy can be improved later by learning a behavioral pattern and suitable threshold values once many sample behavior logs have been collected, it is difficult to determine threshold values suited to the behavioral patterns of individual users at the early stage, when there are only a small
Table 5. Variation of the best value of the detection threshold for each user
subject   LeaveHome  ComeHome  GetUp  GoToBed
A         28%        31%       20%    45%
B         40%        35%       47%    64%
C         46%        42%       43%    72%
D         23%        56%       55%    68%
E         34%        24%       45%    40%
F         28%        30%       46%    47%
G         35%        50%       67%    61%
H         36%        43%       28%    36%
common    33%        31%       47%    63%
number of sample behavior logs. On the other hand, it is impossible to determine the threshold values and create behavioral patterns in advance without an individual user's sample behavior logs, because the characteristics of high-level behaviors vary with the individual user.

3.2 Effect of Detection Threshold on Recognition Accuracy of User Behavior
Table 5 shows the best values of the detection threshold, obtained by analyzing the results of the experiment. The "common" value in the bottom row is the common threshold value used in the experiment. In the situations of leaving the home and coming home, the differences between the common values and each subject's best values are small; in the situations of getting up and going to bed, the differences are larger. In addition, the best values differ among subjects. Comparing the recognition rates in Tables 1 to 4 with the differences between the common values and each subject's best values in Table 5, it is apparent that larger differences bring lower recognition rates. The recognition rates of subjects A, G, and H in the situation of getting up and those of subjects E and H in the situation of going to bed show this trend significantly. The detection threshold values directly affect the recognition accuracy of user behavior, so the detection threshold should be set to values suited to the individual user.

3.3 Effect of Extraction Threshold on Recognition Accuracy of User Behavior
The number of ordered pairs composing a behavioral pattern changes with the extraction threshold and affects the quality of the created behavioral pattern. A behavioral pattern should include many ordered pairs that characterize the user's behavior in true cases, and at the same time few ordered pairs that could also characterize behavior in false cases. If a behavioral pattern is composed of too few ordered pairs because the extraction threshold is set high, then the behavioral
pattern may miss ordered pairs that should normally be included as user characteristics, and false cases will mistakenly conform to it. On the other hand, if a behavioral pattern is composed of too many ordered pairs because the extraction threshold is set low, the pattern may include excessive ordered pairs that are not really user characteristics, and true cases will fail to conform to it. Such fluctuation is a particularly sensitive problem under the constraint of a small number of sample behavior logs. If an unsuitable value is set for the extraction threshold, ordered pairs cannot be extracted adequately, without excess or shortage. Accordingly, recognition accuracy is low because the differences between true cases and false cases are small when those cases are matched against a behavioral pattern created from such ordered pairs. A suitable extraction threshold sharpens the differences between true cases and false cases; in other words, it improves robustness against an unsuitable setting of the detection threshold and consequently achieves a high recognition rate. In Table 5, the differences between the common value and the best values of subjects C and F in the situation of going to bed are not small, yet their recognition rates in Table 4 are high. These results show that our detection method can distinguish between true cases and false cases even with unsuitable detection threshold values, because extraction threshold values suited to the individual behavioral patterns were used to create them. The best values of the extraction threshold vary among users; moreover, because the best values are also affected by the content of the sample behavior logs used to create the patterns, they vary among behavioral patterns.
In a previous experiment, we investigated the best values by searching for a suitable number of ordered pairs to compose a behavioral pattern [10]. That approach was effective to a degree; however, it is preferable that the extraction threshold values be more closely suited to the individual behavioral pattern.
4 Dynamic Threshold Determination with Collaborative Filtering

4.1 Dynamic Determination of Thresholds Suitable for Individuals
We determine threshold values dynamically for each individual behavioral pattern in order to set suitable values. For that purpose, unlike the conventional method, which uses fixed common threshold values, this paper proposes a method that acquires from test-user data a rule for determining the threshold values individually for each behavioral pattern. The conventional method using common threshold values is illustrated on the left side of Fig. 4, and our method using collaborative filtering on the right side. The horizontal center line divides the two phases of introducing a context-aware system into an actual user environment. The upper portion is the development phase, before introducing the system to the actual
Fig. 4. Dynamic threshold determination with collaborative filtering. Left (conventional method with common thresholds): before introduction, a common threshold is determined from the behavior logs of test users; after introduction, a personal behavioral pattern is created from personal sample behavior logs and the common threshold is applied. Right (proposed method): before introduction, statistics on recognition rates are gathered from test-user data; after introduction, an individual threshold is determined with collaborative filtering from the statistical data and the personal sample behavior logs used to create the personal behavioral pattern
environments of individual users. The lower portion is the operation phase, after introducing the system. As shown in Fig. 4, the conventional method determines common threshold values in the development phase. First, it collects behavior logs of test users. Next, for every test user, it repeatedly creates a behavioral pattern from the logs while matching the logs against the pattern. Analyzing the resulting recognition accuracy, it determines the threshold values for which the recognition rate averaged over all test users is highest. In the operation phase, it creates an individual behavioral pattern from personal behavior logs, but the threshold values are common irrespective of the user. However, because suitable threshold values vary with the individual behavioral pattern of each user, the behavior recognition accuracy of some users may be low with the common values. To dynamically determine suitable threshold values for individuals, it is preferable to acquire knowledge from the personal behavior logs of each user; however, it is difficult to determine suitable threshold values from only a small number of personal behavior logs. Therefore, the proposed method dynamically determines threshold values using both knowledge acquired by analyzing test-user data and knowledge acquired from personal behavior logs. First, our method collects sample behavior logs of test users. Second, for every test user, it repeatedly creates a behavioral pattern from the logs and matches the logs against the pattern. Next, it considers the correlation between threshold values and recognition accuracy. Our method derives from the test-user data not common threshold values themselves but a rule for determining the values; we use collaborative filtering as this rule.
Collaborative filtering is a method for automatically estimating unknown information about a target user from some known information about him. Given known information that characterizes other users, collaborative filtering estimates the unknown information of the target user by combining the target user's known information with the known information of other users whose characteristics are similar to those of the
target user. This estimation method is used for recommendation and personalization, as at Amazon.com [11]. In Table 5, the best values do not vary arbitrarily among all users; some users' best values are close to those of other users. Therefore, collaborative filtering can set threshold values suitable for the target user by adopting the values of test users whose characteristics are similar to those of the target user. The threshold values are not determined in the development phase; in the operation phase, they are determined by collaborative filtering with both the statistical data of the test users and the personal behavior logs.

4.2 Determination of Thresholds with Values Estimated by Collaborative Filtering
Using the example of a behavioral pattern of user X in the situation of leaving the home, we describe how the threshold values are determined with collaborative filtering. In the development phase, our method calculates the recognition accuracy of the individual behavioral pattern of each test user. First, behavior logs of the situation of leaving the home are collected as true cases, and behavior logs of other situations are collected as false cases. Second, the following steps are executed for every test user, repeatedly k times. Here, w is the number of sample behavior logs used for creating a behavioral pattern; it is a given value common to all users.

1. Select w true cases as sample behavior logs and, using them, create w behavioral patterns, one for each setting of the extraction threshold value e = 100 × 1/w, 100 × 2/w, ..., 100 × w/w.
2. Count the number of ordered pairs composing each of the w behavioral patterns.
3. For every setting of the detection threshold d from 1% to 100%, match all true cases and all false cases against the w behavioral patterns.
4. Calculate TPR, TNR, and HTTR by gathering statistics over all results of the matching in the previous step.

In this way, an HTTR is calculated for each combination of w true cases. The recognition rates of various combinations of w true cases for every test user are obtained as statistical data. The threshold values are then determined when a behavioral pattern is created in the operation phase, by collaborative filtering with this statistical data. Fig. 5 shows examples of the statistical data and the data of user X that are used for collaborative filtering; in the example, w is 5. The rows from "combination a" to "combination g" are the statistical data obtained by the above calculation. Each row shows the following information for one combination of w true cases selected in step 1.
– the number of ordered pairs composing the behavioral patterns created for each setting of the extraction threshold e = 100 × 1/w, 100 × 2/w, ..., 100 × w/w
– the HTTR values for each combination of the extraction threshold setting e and the detection threshold setting d from 1% to 100%
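The development-phase statistics gathering (steps 1–4 above) can be sketched as follows. The names are ours, and as a simplification one fixed combination of w true cases is used instead of iterating over k combinations; thresholds are compared non-strictly:

```python
def ordered_pairs(log):
    # Ordered pairs {oi -> oj} with oj touched after oi (oi == oj allowed).
    return {(log[i], log[j]) for i in range(len(log)) for j in range(i + 1, len(log))}

def httr_statistics(true_cases, false_cases, w):
    """Statistics for one test user and one combination of w true cases.

    Returns (sizes, httr): sizes[i] is the number of ordered pairs in the
    pattern built with e = 100*i/w; httr[(i, d)] is the HTTR of that
    pattern matched at detection threshold d = 1..100.
    """
    samples = true_cases[:w]  # one combination; the paper repeats this k times
    count = {}
    for log in samples:
        for p in ordered_pairs(log):
            count[p] = count.get(p, 0) + 1

    def conforms(pattern, log, d):
        if not pattern:
            return False
        return 100.0 * len(pattern & ordered_pairs(log)) / len(pattern) >= d

    sizes, httr = {}, {}
    for i in range(1, w + 1):
        e = 100.0 * i / w
        pattern = {p for p, c in count.items() if 100.0 * c / w >= e}
        sizes[i] = len(pattern)
        for d in range(1, 101):
            tpr = 100.0 * sum(conforms(pattern, l, d) for l in true_cases) / len(true_cases)
            tnr = 100.0 * sum(not conforms(pattern, l, d) for l in false_cases) / len(false_cases)
            httr[(i, d)] = (tpr + tnr) / 2.0  # half total true rate
    return sizes, httr
```

One (sizes, httr) result corresponds to one row of the statistical data in Fig. 5: the pattern sizes are the known features, and the HTTR grid is what collaborative filtering will estimate for a new user.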
Fig. 5. Determination of thresholds for user X by collaborative filtering with test user data
From the statistical data we obtain both how many ordered pairs compose each behavioral pattern at each setting of the extraction threshold, and how high a recognition rate each combination of the two threshold settings yields. For user X, on the other hand, when his initial behavioral pattern is created, the only available information comes from a small number of personal sample behavior logs. As shown in the bottom row of Fig. 5, the proposed method utilizes the number of ordered pairs composing each behavioral pattern created, at each setting of the extraction threshold, from the personal sample behavior logs of user X. At this point it is unknown how high a recognition rate each behavioral pattern of user X yields. Our method estimates the HTTR of the behavioral patterns of user X by collaborative filtering, using the values of the rows from combination a to combination g together with the 5 numbers of ordered pairs, which are the known information of user X. E1,1 through E5,100 denote the estimated HTTR values, where Ei,j means the estimated HTTR at the setting e = 100 × i/w % and d = j %. After the estimation, our method selects one of the estimated values as follows.

1. Select the maximum estimated value.
2. If two or more estimated values are selected in the above step, pick the longest sequence of consecutive entries attaining the maximum and select the estimated value in the center of that sequence.
3. If two or more estimated values still remain, select the one with the smallest value of e among the remaining candidates.

Suppose the three entries {E4,62, E4,63, E4,64} form the longest sequence of maximum estimated values in Fig. 5. In that case, E4,63, in the center of the three, is selected. Finally, because E4,63 is the estimated value at e = 80% and d = 63%, our method determines these as the threshold values of user X.
The values estimated by collaborative filtering are close to the HTTR values of the test users whose characteristics are close to those of user X. Therefore, our method can determine threshold values suitable for user X.
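The three-step selection rule can be sketched directly. Representing the estimates as a dict keyed by (i, d), where e = 100 × i/w, is our assumption:

```python
# Sketch of the selection rule: pick the maximum estimated HTTR, break ties
# by the center of the longest consecutive run over d, then by the smallest e.
# E is assumed to be a dict {(i, d): estimated_HTTR} with i in 1..w, d in 1..100.
def select_thresholds(E, w):
    best = max(E.values())
    # Collect, per row i (i.e., per e = 100*i/w), maximal runs of consecutive
    # d values that attain the maximum estimate.
    runs = []
    for i in range(1, w + 1):
        d = 1
        while d <= 100:
            if E.get((i, d)) == best:
                start = d
                while d <= 100 and E.get((i, d)) == best:
                    d += 1
                runs.append((d - start, i, start, d - 1))
            else:
                d += 1
    # Longest run wins; among equally long runs, smallest e (smallest i) wins.
    length, i, start, end = max(runs, key=lambda r: (r[0], -r[1]))
    e = 100 * i // w
    d = (start + end) // 2          # center of the run
    return e, d
```

With the Fig. 5 example (w = 5, maximum attained at E4,62 through E4,64), this sketch returns e = 80 and d = 63, matching the paper's walkthrough.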
5 Related Work
There are several approaches to setting suitable threshold values in a variety of fields. In image processing, a method for setting the threshold used to extract a specific
area from a target image has been proposed [12]. This method can be used only if parts to be extracted and parts not to be extracted coexist in the recognition target. Our problem does not meet this condition, because behavior recognition in this paper considers whether or not a current behavior log conforms to a behavioral pattern. This image-processing approach therefore cannot be applied to our problem. In other approaches, Support Vector Machines and boosting have been used for text categorization [13,14], and HMMs have been used for speech and gesture recognition [15,16]. These approaches can set suitable threshold values under the assumption that many samples of the recognition target can be collected and analyzed, or, failing that, many samples of other data with similar characteristics. However, our problem is constrained to a small number of sample behavior logs for creating a behavioral pattern. In addition, because the characteristics of high-level behavior in homes differ among individual users, behavior logs of people other than the user cannot serve as sample behavior logs. Although these methods can learn suitable threshold values after many personal behavior logs have been collected, they cannot set suitable initial threshold values. In the field of behavior recognition, most existing works [9,3,4] do not discuss setting thresholds suited to individual behavioral patterns.
6 Conclusion
This paper proposed a system for detecting high-level behavior to realize collaboration between users and the environment in homes. Our system aims to realize this collaboration by providing proactive services according to user behavior. To detect user behavior precisely, our detection method dynamically determines threshold values suitable for the behavioral patterns of individuals using collaborative filtering. Our future work includes experimental evaluation of the determination method.
References

1. Barbič, J., Safonova, A., Pan, J.Y., Faloutsos, C., Hodgins, J.K., Pollard, N.S.: Segmenting motion capture data into distinct behaviors. In: Proc. the 2004 Conference on Graphics Interface, pp. 185–194 (2004)
2. Moore, D.J., Essa, I.A., Hayes III, M.H.: Exploiting human actions and object context for recognition tasks. In: Proc. IEEE International Conference on Computer Vision 1999 (ICCV 1999), pp. 80–86 (1999)
3. Patterson, D.J., Fox, D., Kautz, H.A., Philipose, M.: Fine-grained activity recognition by aggregating abstract object usage. In: Proc. the 9th IEEE International Symposium on Wearable Computers (ISWC 2005), pp. 44–51 (2005)
4. Wang, S., Pentney, W., Popescu, A.M., Choudhury, T., Philipose, M.: Common sense based joint training of human activity recognizers. In: Proc. the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 2237–2242 (2007)
5. Yamahara, H., Takada, H., Shimakawa, H.: Detection of user mode shift in home. In: Ichikawa, H., Cho, W.-D., Satoh, I., Youn, H.Y. (eds.) UCS 2007. LNCS, vol. 4836, pp. 166–181. Springer, Heidelberg (2007)
6. Yamahara, H., Takada, H., Shimakawa, H.: An individual behavioral pattern to provide ubiquitous service in intelligent space. WSEAS Transactions on Systems 6(3), 562–569 (2007)
7. Aoki, S., Iwai, Y., Onishi, M., Kojima, A., Fukunaga, K.: Learning and recognizing behavioral patterns using position and posture of human body and its application to detection of irregular state. The Journal of IEICE (D-II) J87-D-II(5), 1083–1093 (2004)
8. Kidd, C.D., Orr, R.J., Abowd, G.D., Atkeson, C.G., Essa, I.A., MacIntyre, B., Mynatt, E., Starner, T.E., Newstetter, W.: The Aware Home: A living laboratory for ubiquitous computing research. In: Streitz, N.A., Hartkopf, V. (eds.) CoBuild 1999. LNCS, vol. 1670, pp. 191–198. Springer, Heidelberg (1999)
9. Mori, T., Takada, A., Noguchi, H., Harada, T., Sato, T.: Behavior prediction based on daily-life record database in distributed sensing space. In: Proc. the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 1833–1839 (2005)
10. Yamahara, H., Takada, H., Shimakawa, H.: Behavior detection based on touched objects with dynamic threshold determination model. In: Kortuem, G., Finney, J., Lea, R., Sundramoorthy, V. (eds.) EuroSSC 2007. LNCS, vol. 4793, pp. 142–158. Springer, Heidelberg (2007)
11. Linden, G., Smith, B., York, J.: Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7(1), 76–80 (2003)
12. Kimura, Y., Watabe, D., Sai, H., Nakamura, O.: New threshold setting method for the extraction of facial areas and the recognition of facial expressions. In: Proc. the IEEE Electrical and Computer Engineering, Canadian Conference (CCECE/CCGEI 2006), pp. 1984–1987 (2006)
13. Cai, L., Hofmann, T.: Text categorization by boosting automatically extracted concepts. In: Proc. the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), pp. 182–189 (2003)
14. Shanahan, J.G., Roma, N.: Boosting support vector machines for text classification through parameter-free threshold relaxation. In: Proc. the 12th International Conference on Information and Knowledge Management (CIKM 2003), pp. 247–254 (2003)
15. Asami, T., Iwano, K., Furui, S.: A stream-weight and threshold estimation method using AdaBoost for multi-stream speaker verification. In: Proc. the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 5, pp. 1081–1084 (2006)
16. Niu, F., Abdel-Mottaleb, M.: HMM-based segmentation and recognition of human activities from video sequences. In: Proc. the 2005 IEEE International Conference on Multimedia and Expo (ICME 2005), pp. 804–807 (2005)
IP Traceback Using Digital Watermark and Honeypot*

Zaiyao Yi1, Liuqing Pan2, Xinmei Wang3, Chen Huang1, and Benxiong Huang1

1 Dept. of E.I.E, Huazhong Univ. of Sci. & Tech., Wuhan, Hubei, China
2 School of National University of Defense Technology, Changsha, Hunan, China
3 XiDian University, Xi'an, Shanxi, China
[email protected]
Abstract. Today's widespread networks are under intense threat from Internet attacks. It is highly urgent to trace back an attack to its origin, neutralize it, and punish the malicious attackers. Many IP traceback methods exist, but none of the existing solutions fulfills all the requirements of effective traceback. This paper proposes a novel IP traceback scheme. In this scheme, an elaborate digital watermark is placed inside a honeypot; the honeypot probes, scans, and entraps attacks, which in turn induces the attacker to access the digital watermark. In the overlay network, the trail of the digital watermark then reconstructs the attack route so that the attacker's address can be located. Single-packet traceback is very difficult in traditional methods, whose measures depend heavily on router capability, on network administrators, and on cooperation between ISPs. Our proposal solves these problems and can counter attacks launched through proxies or slave hosts.

Keywords: IP traceback, digital watermark, honeypot.
1 Introduction

From the very beginning of its development, the Internet has suffered from all kinds of attacks, both involuntary and malicious. Attacks on Internet services increase with the ever-growing popularity of Internet applications. The Internet is rather vulnerable due to its inherent design characteristics. Criminal activity in the cyber world grows even more rapidly than the network itself, given its administrative hierarchy and the variety of its users, and it is especially harmful to the real world. Intrusion detection is the main resort for detecting network attacks. It monitors the activities of hosts and the network to discover attack behaviors in time, then takes action to block or alleviate the attack. However, it is a purely passive approach. To deter troublemakers and future network crime, it is necessary to trace back the attacks and locate the real attackers.
* This work was supported by the China Hubei Science & Technology Department through the project SBC in 3G CN (2006AA102A04), by the Program for New Century Excellent Talents in University (NCET-06-0642), and by the National High-Technology Research and Development Program ("863" Program) of China under grants No. 2006AA01Z267 and No. 2007AA01Z215.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 426–438, 2008. © Springer-Verlag Berlin Heidelberg 2008
IP traceback is a method for finding and locating ongoing or past network attacks. It collects information about attackers and derives the attack routes to finally locate the attacker. However, it is quite difficult to devise a perfect IP traceback scheme, since there are many restrictions:

1. The original TCP/IP protocol did not take security into consideration, so it lacks blocking mechanisms against suspicious activities, to say nothing of traceback features;
2. Network bandwidth is almost exhausted by the huge volume of network traffic, and the variety of network environments and tunneling techniques make effective traceback even more difficult to implement;
3. Hacking techniques have been developing rapidly; for example, proxies and slave hosts set up obstacles that are hard to overcome.

Aiming to remedy these deficiencies, this paper proposes a novel IP traceback method that combines honeypots and digital watermarks in a creative way, actively entrapping and tracing network attacks by using "forged" sensitive data embedded with a digital watermark. First, a honeypot acts as a secured resource that initiates no service traffic; its reason for existence is to be scanned, attacked, and compromised. Thus, all network traffic connected to the honeypot can be regarded as probes or intrusions. We therefore suggest deploying some "forged" sensitive data in the honeypot and monitoring any access to that data, so that we can precisely judge whether the honeypot has been intruded. Second, digital watermarks are commonly used as security IDs for digital publications, with proven concealment, robustness, and verifiability. By embedding the digital watermark into the messages or packets to be traced, the traceback of a network attack can be transformed into identifying the digital watermark within those messages or packets.
Moreover, to reduce the cost of inserting the digital watermark, the watermark can be carried by the intrusion traffic itself: the intruder will inevitably access the sensitive "forged" data. Compared with traditional IP traceback methods, our proposal lessens the cost of modifying messages or packets. Meanwhile, the network traffic is not noticeably changed, so intruders cannot easily become aware of it. The rest of this paper is structured as follows. In Section 2 we give a brief introduction to related work on IP traceback; our motivation comes from the shortcomings of traditional IP traceback schemes. Our design is presented in Section 3. We analyze the performance of the proposed scheme in Section 4 and summarize our work in Section 5.
2 Related Works

Generally speaking, existing IP traceback methods can be classified as packet-based or linkage-based, reactive or proactive, host-based or network-based, and global Internet-based or local ISP-based. The typical methods are the following:
2.1 Link Testing

Link testing locates the attack source through inter-router linkage. The router nearest to the victim is checked for "intrusion-featured" data packets via its routing table and the source IP addresses of the packets. Once such "malicious" packets are detected, the upstream router is logged on to inspect them. If a diffusive attack is detected, the search proceeds to higher-level routers to re-check for "malicious" packets. This process is repeated until the actual attack source is located. Link testing is a reactive method that is initiated after an attack has been detected, and it must be completed while the attack is still active, since it is ineffective once the attack ceases. Link testing measures include input debugging [1] and controlled flooding [2]. Input debugging is implemented at the router level. A filter is placed on the egress port of the router upstream of the victim, and the ingress port is determined by matching the attack signature in packets arriving at the router. Once an ingress port has been identified, the procedure is repeated at upstream routers, hop by hop, until the originating network is identified. Input debugging requires router support, incurs tremendous cost at intranet borders, and requires cooperation between ISPs; it also demands time and labor at both the victim computer and the ISPs. Controlled flooding works by flooding upstream links with large amounts of UDP traffic and observing whether the attack traffic rate decreases. When an affected link is identified, the cooperative upstream hosts must be willing to repeat this process recursively until a specific network or network segment where the attack appears to originate is identified. The appropriate network operator is then contacted to ultimately identify the attack source. The major advantages are its effectiveness and the ease of implementation.
It does not greatly raise the cost on routers and requires no modification of router configurations. However, the process is itself a kind of DoS attack, which endangers the trusted upstream routers and other valid network traffic. Moreover, a high degree of upstream host cooperation is required, and it is rather difficult to implement in complex network environments since loads must be applied manually.

2.2 Ingress Filtering

Ingress filtering [3] mainly deals with source IP address spoofing. The ISP deploys an ingress filter that forwards only packets whose source IP addresses fall within the valid scope of that ISP; packets with invalid sources are dropped. However, this method can hardly guarantee that all routers have the resources needed to check every packet's source IP address and to judge its validity.

2.3 Packet Recording

This method derives attack source information through data-mining technology from packets, or packet features, that are recorded by some key routers. The recorded packets can be analyzed by precise intrusion reconstruction, so packet
recording can be performed post-mortem. However, there are many difficulties in implementing effective packet recording. First of all, it needs recording machines with tremendous storage capacity. Even if a sliding-window recording tactic is deployed, storage remains a crucial bottleneck for high-traffic networks, or the traceback would be cut off after the incident. Second, the recording machines must be capable of automatic analysis; manual analysis of such enormous volumes of network data is simply impossible.

2.4 ICMP Message Trace (iTrace)

The basic idea of iTrace [4][5] is as follows. When a packet goes through a router, the router generates, with a certain probability, an ICMP trace message and sends this ICMP message along with the original packet to the destination. Each ICMP message includes path information such as the IP address of the router that sent it, the previous-hop router's IP address, the next-hop router's IP address, and a timestamp. The "victim" machine is responsible for collecting these ICMP messages, which can be used to rebuild the complete attack path. Nevertheless, this method is effective only when enough ICMP messages are received. Furthermore, the additional ICMP traffic differs somewhat from normal traffic, so it is quite likely to be filtered out by routers, and some routers cannot support it due to their lack of input debugging. Even worse, if the troublemaker sends out forged ICMP packets, a distributed structure is needed as a countermeasure.

2.5 Logging

Logging [6][7][8][9] is a direct and simple method. The log function records information about the packets a router transmits. Once an intrusion is detected, the traceback can take advantage of the logs. The key routers record feature information and the upstream router address for each packet, and the packet's source path is traced back through data-mining technology.
It’s independent on the attack to be real-time, but it also requires tremendous resources upon the storage and processing capability to deal with the logs. Besides, the integrality and security of the log database is a crucial issue in case of malicious deletion or demolishment. 2.6 Packet Marking Packet marking [10] is implemented by adding ID information into the packets, so the “victim” machine can rebuild the attack route via such IDs. Its basic implementation is to add the transfer routers’ IP addresses. Thus the destination host can get every packet with the complete path information. Obviously, there’s no enough space in the packet for the insertion so it would increase the router cost. Besides, the attacker may forge packets for runaround.
A variant named PPM (probabilistic packet marking) inserts the ID information probabilistically and selectively. Once the destination host receives enough packets with IDs, traceback becomes feasible. This method needs extra support from routers and is not practical for very-low-traffic attacks. Besides deterministic packet marking (DPM), Savage [11] describes a variant of PPM intended to overcome these limitations, introducing the concept of compressed edge fragment sampling. Song and Perrig [12] proposed modifications to Savage's edge-ID-based PPM method to further reduce the storage requirements by storing a hash of the IP address instead of the address itself. Park and Lee [13] assessed the effectiveness and limitations of PPM methods using a constrained minimal optimization approach. Dean [14] proposed a modified PPM method that employs algebraic techniques from the fields of coding theory and machine learning to encode/decode path information as points on polynomials.

2.7 Watermarking Scheme

The watermark, an important tool for enforcing copyright protection, has also been used to trace the sources of data flows and network attacks. Julien P. Stern and Jean-Pierre Tillich [15] first embedded different watermarks into different documents and allowed detection with the help of a single private key. Recently, some advanced watermarking schemes have been proposed, for example, embedding a watermark into traffic timing [16], injecting a unique watermark into the inter-packet timing domain of a packet flow [17], and an adaptive watermarking scheme [18]. The adaptive scheme requires fewer packets to embed the watermark and is more tolerant of distortions from deliberate timing jitter. It offers two ways to control the adaptation of the watermark: one uses measured traffic timing, while the other uses measured packet sizes.
However, none of the schemes mentioned above gives details about how to design and deploy the watermark, and due to this complexity some watermarking mechanisms cannot be realized in practice. This paper proposes a new way to apply watermarks to trace attacks easily and efficiently by deploying a honeypot.
3 IP Traceback Using Digital Watermark and Honeypot

In this paper, a production-type honeypot is deployed to entrap intrusions. Some "sensitive" files inside this honeypot act as bait; they can be *.TXT, *.DOC, *.WAV, or any other type that would attract the hacker's interest, and they are embedded with an authenticable digital watermark. Once the attacker visits a bait file, his route can be rebuilt by checking for the digital watermark across the overlay network.

3.1 Infrastructure

Fig. 1 shows the infrastructure: a distributed system comprising one or more Trace service console(s), distributed Trace agents, and specified honeypots storing sensitive data marked with a specific digital watermark.
Fig. 1. Infrastructure Layout
3.2 Function Modules

Functionally, the scheme can be divided into two main parts: the entrapment environment and the Attack Tracing System. The entrapment environment includes the honeypots and the files with digital watermarks; within it, the deployment of honeypots, the inducing and entrapping process, and the initiation of traceback requests are accomplished. The Attack Tracing System includes the Trace service console(s) and the Trace agents; this part is in charge of traceback data collection, statistical inquiry, and route rebuild computation. In practice, the numbers of honeypots and watermarks can be decided according to network scale and specific demands, as can the numbers of service console(s) and agents. If necessary, a multilevel hierarchy can be deployed.

3.2.1 The Entrapment Environment

As illustrated in Fig. 2, the honeypot subsystem is made up of four functional modules: the network deceive module, the information capture module, the information control module, and the communication control module. The honeypot subsystem and the watermarking subsystem together build up the entrapment environment. Besides the traditional information capture and information control modules, the communication module interacts with the service console(s). The network deceive module handles deceit and inducement against incoming probes or intrusions, by pretending to offer services, opening ports, and planting sensitive information. Its sole purpose is to attract attacks.
Fig. 2. Function modules of The Entrapment Environment
The information capture module monitors and records all activities in the honeypot. Once the digital watermarked file is visited, it posts a traceback request to the communication control module. The traceback request contains information such as the digital watermark feature, the size of the watermarked file, the visit time, and the destination IP address. The information control module restricts the honeypot's outbound contact: once the bait is fetched, it guarantees that no further hazard can be caused from the honeypot. In practice, this can be accomplished by honeypot bandwidth restriction and router-firewall cooperation. The communication control module transfers the traceback request to the Trace service console and vice versa, then receives the traceback result from the service console. The watermarking module produces an easy-to-verify digital watermark that is invisible to the intruder and remains traceable even after segmentation. The bait file should also be interesting enough to the intruder; otherwise no alert would be meaningful.

3.2.2 Attack Tracing System

The Attack Tracing System, constituted by the Trace service console(s) and distributed Trace agents, is described in Fig. 3. This part is in charge of trace instruction dispatch and execution, trace data collection, route rebuild computation, and statistical inquiry. The Trace service console executes two main operations: once a trace request from the honeypot is received, it marks the request with a serial number and then
dispatches instructions to the Trace agents according to the relevant digital watermark feature. When the agents' feedback arrives, the Trace service console rebuilds the attack route from the combined data and maintains a database for statistical inquiry that includes the serial number, the request source, the initiation time, and the agents' feedback data (which can act as proof), providing user statistical inquiry. The Trace agents analyze the incoming and outgoing traffic within a sliding time window once they receive the trace instructions. They detect the specific watermark feature using the relevant algorithm, and the preliminary results go back to the service console.
Fig. 3. Attack Tracing System
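The traceback request posted by the information capture module (Sect. 3.2.1) might be assembled as below; all field names are illustrative assumptions, not a defined message format:

```python
# Sketch of the traceback request sent from the honeypot when the
# watermarked bait file is accessed. Field names are our assumptions.
import time

def build_traceback_request(watermark_feature, file_size, dest_ip,
                            visit_time=None):
    return {
        "watermark_feature": watermark_feature,  # used to select a detector
        "file_size": file_size,                  # size of the bait file
        "visit_time": visit_time if visit_time is not None else time.time(),
        "dest_ip": dest_ip,                      # where the bait was sent
    }
```

The Trace service console would then stamp such a request with a serial number before dispatching instructions to the agents.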
The Trace service console has four modules: time synchronization, data synthesis processing, database, and communication control. Time synchronization provides unified time information for the Trace service console and all of the Trace agents, so that trace events can be analyzed in proper association and the database can use this common time for logging. Data synthesis processing combines all relevant feedback from the Trace agents and computes the rebuilt intrusion route. The database, on the one hand, records all relevant trace events, such as the intrusion serial number, the trace request initiator, the feedback from all agents (together with "proof" data), and the final route rebuild result; on the other hand, it enables user statistical inquiry with specific terms.
The communication control module handles the initial trace request from the honeypot, marks it with a serial number, dispatches instructions to the Trace agents, and receives the trace feedback data. A Trace agent includes time synchronization, traffic storage, data analysis, and communication control. Time synchronization serves the same purpose as in the console. Traffic storage provides the data source for analysis: because the response time (between the intrusion traffic passing the agent and the agent receiving the trace instructions from the console) is rather short thanks to the automatic program, the agent only retains traffic within a sliding time window. Data analysis picks out the source and destination IP addresses in the stored traffic and verifies them against the digital watermark feature to form a conclusion. The communication control module receives and responds to the console's requests, and also provides the proof data, e.g., the original packets carrying the digital watermark.

3.3 Implementation Flow Review

The honeypot has no service traffic with outsiders, so all network traffic connected to it can be regarded as probes or intrusions. Thus, by deploying some "forged" sensitive data in the honeypot and monitoring any access to it, we can precisely judge whether the honeypot has been intruded. Access to the entrap file with the watermark is monitored continually; once such an event occurs, a trace request is initiated by the honeypot. When the Trace service console receives a trace request from the honeypot, it numbers the request serially, then dispatches trace instructions to the agents according to the specific watermark information. Upon receiving the instructions, the Trace agents analyze the traffic stored within the sliding time window; the main target is the specific digital watermark. If the incoming traffic does not contain the corresponding watermark, that incoming path is not the one we want.
If the incoming check is positive while the outgoing check is negative, the intrusion terminates at that connection: if the watermark goes to a specific host, that host is the actual troublemaker. If the watermark exists in both incoming and outgoing traffic, the node is a midway hop. Finally, if the tracked watermark is found by a Trace agent, the relevant packets and the source/destination addresses are reported to the Trace service console. From all the Trace agents' results, the Trace service console computes the final rebuilt attack path and relevant log information, such as the request initiator, the event serial number, the initiation time, the trace results from the agents with relevant proof data, and the final trace conclusion, and stores them in the database.
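The agent-side decision described above reduces to a small classification rule; a sketch, with naming of our own:

```python
# Sketch of the per-agent decision: whether the watermark appears in the
# incoming and/or outgoing traffic determines the role of the observed node.
def classify_node(watermark_in_incoming, watermark_in_outgoing):
    if not watermark_in_incoming:
        return "not on attack route"   # watermark never reached this path
    if watermark_in_outgoing:
        return "midway hop"            # watermark passes through
    return "attacker endpoint"         # watermark enters but never leaves
```

Each agent reports its classification, together with the observed source/destination addresses, back to the Trace service console.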
After the path rebuilding, the Trace service console may feed back the final result to the request initiator. Since the honeypot is already regarded as "captured," this feedback is somewhat redundant. The whole implementation flow is shown in Fig. 4.

Fig. 4. Implementation flowchart of trace
4 Performance Analysis

In this paper, network attack traceback is transformed into tracing a specific digital watermark. The implementation is based on a honeypot platform, and the watermark is the most important factor. To explain how to trace an attack traversing three stepping stones, an example is presented in Fig. 5.
Fig. 5. Example to trace an attack traversing three stepping stones
First, the entrapment environment (IP address xxx.112.22.50) detects and induces the attack when the attacker (IP address xxx.112.22.40) initiates scanning and probing on the Internet to find targets. Once the digital watermarked file is visited, the entrapment environment posts a traceback request to the Trace service console. The Trace service console then dispatches trace instructions to the Trace agents, and each Trace agent detects watermark information in its incoming and outgoing traffic. From Fig. 5: Trace agent 3 sees the watermark in both incoming and outgoing traffic, with source/destination IPs xxx.112.22.50/xxx.112.22.30 and xxx.112.22.30/xxx.112.22.20; Trace agent 2 sees it in both directions, with source/destination IPs xxx.112.22.30/xxx.112.22.20 and xxx.112.22.20/xxx.112.22.10; Trace agent 1 sees it in both directions, with source/destination IPs xxx.112.22.20/xxx.112.22.10 and xxx.112.22.10/xxx.112.22.40; Trace agent 4 sees the watermark only in incoming traffic, with source/destination IP xxx.112.22.10/xxx.112.22.40. Obviously, the route of the watermark flow is xxx.112.22.50 → xxx.112.22.30 → xxx.112.22.20 → xxx.112.22.10 →
IP Traceback Using Digital Watermark and Honeypot
437
xxx.112.22.40. So, we can conclude that xxx.112.22.40 is the actual attacker; other Trace agents on the network may be involved, but their nodes are mid-nodes or stepping stones. From the above example, the deployment of the honeypot makes it much easier to detect all kinds of network attacks, and false alarms and missed detections are greatly reduced. Compared with existing traceback methods, this method offers the following improvements:
1. A single-packet traceback scheme is proposed. An agent can classify a single packet by examining the watermark feature, determining whether the relevant node is the end node or just a midway hop.
2. A fully automatic reaction framework is illustrated. All monitoring, checking, logging and post-processing are fully programmed; no manual interaction is needed, which improves efficiency.
3. The boundary between an ongoing attack and a completed attack is no longer important. Once an intrusion is detected, the traceback starts immediately, as programmed; the whole process does not depend on whether the attack is over, i.e., the scheme works in both cases.
4. Traceback of attacks launched via a proxy or slave host is handled. Both the incoming and outgoing traffic are checked to determine the role of each Trace agent's location: for a proxy or slave host, the watermark appears in the traffic in both directions, whereas the real attacker carries the watermark only in its incoming traffic.
5. Independence from router capability, ISP cooperation and network administrators is achieved. The whole process is pre-programmed, the routers make no extra marks, the scheme is sustainable across different ISPs, and no human intervention is needed.
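The chaining step above, which turns the per-agent watermark observations into a complete route and identifies the end node, can be sketched as follows (a minimal illustration; the data structure and function names are ours, not part of the paper's implementation):

```python
# Sketch: reconstruct the watermark route from Trace agent observations.
# Each agent reports (src, dst) pairs of flows in which the watermark was
# seen. A node showing the watermark only in its incoming traffic is the
# end node (the attacker); nodes with it in both directions are midway hops.

def reconstruct_route(observations):
    """observations: list of (src, dst) watermarked flows from all agents."""
    hops = dict(observations)          # src -> dst
    dsts = set(hops.values())
    # The route origin is a source that is never a destination (the honeypot).
    start = next(s for s in hops if s not in dsts)
    route = [start]
    while route[-1] in hops:           # follow the chain of hops
        route.append(hops[route[-1]])
    return route

# Observations taken from the example in the text:
obs = [("xxx.112.22.50", "xxx.112.22.30"),
       ("xxx.112.22.30", "xxx.112.22.20"),
       ("xxx.112.22.20", "xxx.112.22.10"),
       ("xxx.112.22.10", "xxx.112.22.40")]
route = reconstruct_route(obs)
# The last node of the route, with the watermark only in its incoming
# traffic, is identified as the attacker.
```

The same chaining also labels every intermediate node of the route as a stepping stone, which is exactly the classification step of improvement 1 above.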
5 Conclusions IP traceback is a major method for deterring hackers and punishing network criminals. Combining the honeypot concept with the digital watermark utility, a detailed novel traceback implementation is presented. Independence from router capability, ISP cooperation and network administrators is achieved, and the efficiency and accuracy are greatly improved.
Multi-priority Multi-path Selection for Video Streaming in Wireless Multimedia Sensor Networks
Lin Zhang1, Manfred Hauswirth1, Lei Shu1,*, Zhangbing Zhou1, Vinny Reynolds1, and Guangjie Han2
1 Digital Enterprise Research Institute, National University of Ireland, Galway {lin.zhang, manfred.hauswirth, lei.shu, zhangbing.zhou, vinny.reynolds}@deri.org
2 Department of Computer Science, Chonnam National University, Korea [email protected]
Abstract. Video sensors are used in wireless multimedia sensor networks (WMSNs) to enhance the capability for event description. Due to the limited transmission capacity of sensor nodes, a single path often cannot meet the requirements of video transmission, so multi-path transmission is needed. However, not every path found by a multi-path routing algorithm is suitable for transmitting video, because a long routing path with a long end-to-end transmission delay may not satisfy the time constraint of the video. Furthermore, each video stream consists of two kinds of information: an image stream and an audio stream. In different applications these play different roles, and their importance levels differ. Higher priority should be given to the more important stream (either the image stream or the audio stream) to make the best use of the limited bandwidth and energy in WMSNs. In this paper, we propose a Multi-priority Multi-path Selection (MPMPS) scheme in the transport layer that chooses the maximum number of paths from all found node-disjoint routing paths to maximize the throughput of streaming data transmission. Simulation results show that MPMPS can effectively choose the maximum number of paths for video transmission.
1 Introduction Using video sensors in wireless sensor networks (WSNs) [1, 2, 3, 4, 5] can dramatically enhance the capability of WSNs for event description. Efficiently gathering and transmitting video streaming data in WSNs is necessary when the underlying infrastructure, e.g., 3G cellular networks or WLANs, does not exist. Real-time video streaming in WSNs [6, 7] generally poses two requirements: 1) Guaranteed end-to-end transmission delay: real-time video streaming applications generally have a soft deadline, which requires that video streaming in WSNs always use the shortest routing path with the minimum end-to-end transmission delay; 2) Using multiple routing paths for transmission: packets of streaming video data are generally large, and the transmission requirements can be several times higher than the *
Corresponding author.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 439–452, 2008. © Springer-Verlag Berlin Heidelberg 2008
440
L. Zhang et al.
maximum transmission capacity (bandwidth) of sensor nodes, so multi-path transmission should be used to increase transmission performance in WSNs. Many multi-path routing protocols have been studied in the field of WSNs [8, 9]. However, most of them focus on energy efficiency, load balancing, and fault tolerance, and are extended versions of DSR [10] and AODV [11]. These multi-path routing protocols do not provide a powerful search mechanism for finding multiple optimized routing paths that minimize the path length and the end-to-end transmission delay while taking the limited energy of WSNs into consideration.
Fig. 1. An example of TPGF multi-path routing: Eight paths are found for transmission
TPGF [12] is the first multi-path routing protocol in the wireless multimedia sensor networks (WMSNs) field. It focuses on exploring the maximum number of optimal node-disjoint routing paths in the network layer, minimizing the path length and the end-to-end transmission delay while taking the limited energy of WSNs into consideration. The TPGF routing algorithm includes two phases: phase 1 explores a possible routing path; phase 2 optimizes the found routing path to the least number of hops. The algorithm finds one path per execution and can be executed repeatedly to find more node-disjoint routing paths. It successfully addresses four important issues: 1) hole bypassing; 2) guaranteed path exploration results; 3) routing path optimization; 4) node-disjoint multi-path transmission. Figure 1 shows an example of TPGF multi-path routing in a WSN with two holes. The found routing paths have varying numbers of hops. However, not every path found by TPGF can be used for transmitting video, because a long routing path with a long end-to-end transmission delay may not satisfy the time constraint of the video streaming data. Furthermore, a video stream includes two kinds of information: image and audio streams. In different applications, image and audio streams play different roles, and their importance levels may differ. For example, in fire monitoring applications, the image stream is more important than the audio stream because it can directly reflect the fire event. But in deep-ocean monitoring applications, the audio stream is more important than the image stream, since the visibility in the deep ocean is very low and the environment is extremely quiet. Therefore, instead of transmitting a video
stream back to the base station using fewer routing paths under a stricter real-time constraint, it is better to split the video stream into image and audio streams and give higher priority to the more important stream (either the image stream or the audio stream) to guarantee it the most suitable paths, as shown in Figure 2. The less important stream can be transmitted under a relatively looser real-time constraint, so that routing paths with longer end-to-end transmission delays can also be used. This increases the total data received at the base station, where the received data can be joined again or processed separately.
Fig. 2. The general model for multi-priority multi-path transmission
How to split a video stream into an image stream and an audio stream has been widely addressed by many programs [13] and is not the focus of this paper. In this paper, we propose a new Multi-priority Multi-path Selection (MPMPS) scheme that chooses the maximum number of paths from all found node-disjoint routing paths to maximize multimedia streaming data transmission while guaranteeing the end-to-end transmission delay in WMSNs. The scheme makes two contributions: 1) it supports multiple priorities; 2) it chooses the maximum number of paths to maximize the throughput of the streaming data transmission. The rest of this paper is organized as follows. In Section 2, we discuss the related work. In Section 3, we present the network model, and in Section 4 we discuss the multiple priorities. In Section 5, we formulate and analyze the problem. In Section 6, we present the Multi-priority Multi-path Selection (MPMPS) algorithm. In Section 7, we present the simulation and comparison work, and we conclude this paper in Section 8.
2 Related Work Surveys on WMSNs [14] have shown that transmitting multimedia streaming data in WSNs is still a relatively new research topic compared with other topics in WSNs such as energy-efficient routing, query processing, etc. In [15], another survey of multimedia communication in WSNs also analyzed and discussed the existing research from both the mobile multimedia and WSN fields at the application, transport and network layers. Both surveys show that existing protocols from the mobile multimedia and WSN fields are not suitable for multimedia communication in WSNs, because they do not consider the characteristics of multimedia streaming data transmission and the natural constraints of WSNs at the same time. There is a clear need for research focused on developing efficient communication protocols and algorithms in order to realize WMSN applications.
To the best of our knowledge, no research has been done on multi-path selection in WMSNs. Although multi-path selection algorithms have not yet been studied in WSNs, some work has been done on multi-path selection in other networks. In [16], the authors proposed an Energy Aware Source Routing algorithm to choose multiple routing paths in wireless ad hoc networks; the goal of this work is to maximize the network lifetime by minimizing the overhearing ratio. In [17], the authors considered the concurrent packet drop probability of multiple paths in wireless ad hoc networks, and proposed a path selection algorithm to minimize it. In [18], the authors investigated the problem of selecting multiple routing paths to provide better reliability in multi-radio, multi-channel wireless mesh networks with stationary nodes. In [19], a multi-path selection algorithm is proposed for an overlay network, focusing on minimizing the correlation of multiple paths. None of the above-mentioned multi-path selection algorithms shares our research goal, which is to choose the maximum number of paths from all found node-disjoint routing paths to maximize multimedia streaming data transmission while guaranteeing the end-to-end transmission delay. Therefore, proposing a new multi-path selection scheme for multimedia streaming in WMSNs is the key focus of this paper.
3 Network Model In this paper, we consider a homogeneous geographic WSN. The locations of the sensor nodes and the base station are fixed and can be obtained by using GPS. Each sensor node knows its own geographic location and the locations of its 1-hop neighbor nodes. All sensor nodes have the same maximum transmission capacity (bandwidth) TC. Each source node, for example a video sensor node, continuously produces a sensed video stream SV with a data generation rate RV kbps. Source nodes can dynamically adjust (increase or decrease) their data generation rate by changing the sampling frequency. The video stream from the source node is sent to the base station for further processing. We assume that only source nodes know the location of the base station, and that other sensor nodes can learn it only by receiving packets from source nodes. The video stream can be split into an image stream SI with data generation rate RI kbps and an audio stream SA with data generation rate RA kbps (RI + RA = RV). The soft real-time deadline of the image stream is TI and that of the audio stream is TA. After repeatedly executing the TPGF routing algorithm, N node-disjoint routing paths P = {p1, …, pn} are found. Each routing path pi has its own end-to-end transmission delay di based on the number of routing hops in the path. Only MI routing paths PSatisfy_Image = {pSI1, …, pSImi} with transmission delays DSatisfy_Image = {dSI1, …, dSImi} can satisfy the soft real-time deadline TI, and only MA routing paths PSatisfy_Audio = {pSA1, …, pSAma} with transmission delays DSatisfy_Audio = {dSA1, …, dSAma} can satisfy the soft real-time deadline TA. Here, we assume that a source node only tries to use an additional transmission path when all its currently used transmission paths have reached the maximum transmission capacity, and that a routing path cannot be used for transmitting two different multimedia streams at the same time. Thus, the total number of chosen paths is M (M = MI + MA).
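The quantities of this model can be captured in a small sketch (illustrative only: the class layout is ours, and the concrete numbers are the ones used in the simulation section later in the paper):

```python
from dataclasses import dataclass

# Illustrative model constants (values from the simulation section).
TC = 12              # per-node maximum transmission capacity, kbps
R_V = 72             # video generation rate R_V, kbps
R_I, R_A = 48, 24    # image/audio split; R_I + R_A == R_V
T_I, T_A = 280, 320  # soft real-time deadlines, ms


@dataclass
class Path:
    """One node-disjoint routing path p_i."""
    hops: int
    d_hop_ms: float = 20.0  # average per-hop delay

    @property
    def delay_ms(self) -> float:
        # End-to-end delay d_i grows with the hop count of the path.
        return self.hops * self.d_hop_ms


p = Path(hops=12)
# p.delay_ms is 240.0 ms, so this path satisfies both deadlines T_I and T_A.
```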
4 Multiple Priorities Supporting multiple priorities is a key feature of our MPMPS scheme. In this section, we present our multiple priorities in two aspects: 1) end-to-end transmission delay based priority; 2) context-aware multimedia content based priority. Definition 1. End-to-end transmission delay based priority. For any two paths pi and pj within the N node-disjoint routing paths P = {p1, …, pn} found by repeatedly executing the TPGF routing algorithm, if their end-to-end transmission delays satisfy di < dj, we assign the higher priority to path pi. Theorem 1. For any stream S, choosing the routing path with the higher priority from any two paths pi and pj lets it reach the base station faster than choosing the routing path with the lower priority. Proof: Let pi denote the routing path with the higher priority and pj the routing path with the lower priority. According to Definition 1, the end-to-end transmission delay di of path pi is smaller than the end-to-end transmission delay dj of path pj. Thus, if the stream S chooses routing path pi, it reaches the base station faster than if it chooses routing path pj. □ It is clear that in the MPMPS scheme, the routing path with the higher priority should always be chosen first to reduce the end-to-end transmission delay. Definition 2. Context-aware multimedia content based priority. In any situation where video sensor nodes are deployed for gathering information, if the image stream is more important for reflecting the event and describing the phenomenon, we assign the higher priority to the image stream. Conversely, if the audio stream is more important for reflecting the event and describing the phenomenon, we assign the higher priority to the audio stream. Theorem 2.
For any two given streams SI and SA, sending the stream with the higher priority first reflects and describes the event better than sending the stream with the lower priority first. Proof: Consider a fire monitoring application in a forest. Let SI denote the image stream with the higher priority and SA the audio stream with the lower priority. According to Definition 2, the image stream SI is more important for reflecting the event and describing the phenomenon. Thus, sending it first reflects and describes the event better than sending the lower-priority stream first. □ According to Theorem 2, in the MPMPS scheme the stream with the higher priority should always be sent first to reflect the events. We want to clearly mention that dynamically assigning different priorities to the image and audio streams based on the surrounding situation of the WSN is a separate research issue, known as "situation awareness in wireless sensor networks". For example, a video sensor node can use the sensor data gathered by attached light and sound sensors. When the light intensity is higher than a certain value and the sound intensity is lower than a certain value, the higher priority is
assigned to the image stream. Likewise, when the light intensity is lower than a certain value and the sound intensity is higher than a certain value, the lower priority is assigned to the image stream. In this paper, we assume the priorities of the image and audio streams are pre-assigned by a certain situation-awareness algorithm [20].
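The threshold rule described above can be sketched as follows (the threshold values LIGHT_T and SOUND_T are hypothetical placeholders, not values from the paper):

```python
# Sketch of the situation-aware priority rule described in the text.
# LIGHT_T and SOUND_T are hypothetical thresholds (arbitrary units).
LIGHT_T = 500
SOUND_T = 60

def stream_priority(light, sound):
    """Return which stream gets the higher priority, or None if ambiguous."""
    if light > LIGHT_T and sound < SOUND_T:
        return "image"   # bright, quiet scene: images carry the event
    if light < LIGHT_T and sound > SOUND_T:
        return "audio"   # dark scene with sound: audio carries the event
    return None          # fall back to a pre-assigned priority
```

When neither condition holds, the rule returns no decision and the pre-assigned priority from the situation-awareness algorithm would be used.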
5 Problem Statement and Analysis The problem of choosing the maximum number of paths M from all N node-disjoint routing paths, maximizing the throughput of video streaming data transmission while guaranteeing the end-to-end transmission delay, can be formulated as:

Maximize:    M, (M = MI + MA)                          (1)
Subject to:  PSatisfy_Image ∪ PSatisfy_Audio ⊆ P       (2)
             PSatisfy_Image ∩ PSatisfy_Audio = ∅       (3)
             M ≤ N                                     (4)
             P ≠ ∅                                     (5)
             N ≠ 0                                     (6)
             DSatisfy_Image ≠ ∅                        (7)
             DSatisfy_Audio ≠ ∅                        (8)
             M ≤ ⌈RI / TC⌉ + ⌈RA / TC⌉                 (9)
Equation (9) reflects that a routing path cannot be used for transmitting two different multimedia streams at the same time. From equation (4), it is clear that the maximum number of chosen paths M is bounded by the number of found node-disjoint routing paths N. However, N is itself bounded by two factors, as presented in Theorems 3 and 4 below. Definition 3. Node-disjoint routing path. A node-disjoint routing path is a routing path consisting of a set of sensor nodes such that, excluding the source node and the base station, none of these sensor nodes can be reused for building another routing path. Theorem 3. Any given source node SSource_Node with NNeighbor_Node 1-hop neighbor nodes within its transmission radius can have at most NNeighbor_Node possible node-disjoint routing paths for transmitting data. Proof: Assume that there are NNeighbor_Node routing paths for a source node SSource_Node, and let the source node try to find (NNeighbor_Node + 1) node-disjoint routing paths. According to Definition 3, it will try to use each of its 1-hop neighbor nodes to build a routing path, and no used neighbor node can be used twice. When the source node SSource_Node tries to find the (NNeighbor_Node + 1)th
routing path after finding NNeighbor_Node paths, there are no more 1-hop neighbor nodes available. Thus, for this source node SSource_Node, the maximum number of possible node-disjoint routing paths is NNeighbor_Node. □ Theorem 4. For any given source node SSource_Node, the maximum number of possible node-disjoint routing paths is affected by the choice of routing algorithm. Proof: For example, in Figure 3, if the greedy forwarding routing algorithm (GPSR) [21] is used, only one routing path can be found (black path), but with a shorter end-to-end transmission delay. However, if the label-based multi-path routing (LMR) [22] is used, two routing paths can be found (green paths), but with relatively longer end-to-end transmission delays. □
Fig. 3. Multi-path GPSR vs. LMR
Based on Theorem 4, it is not hard to conclude that the goal of exploring more routing paths contradicts the goal of using approximately the shortest routing paths with the minimized end-to-end transmission delay. It is worth noting that TPGF also uses the shortest transmission path as its basic criterion, and then explores the possible number of node-disjoint routing paths. The key motivation for this criterion is that the shortest transmission path generally has the shortest end-to-end transmission delay, which is most likely to satisfy the time constraint of the multimedia stream. Corollary 1. For any given source node SSource_Node, the maximum number of finally chosen paths M has the upper bound

Min(N, ⌈RI / TC⌉ + ⌈RA / TC⌉),   (10)

where Min(para1, para2) is the function that returns the smaller value. Proof: When the end-to-end transmission delays of all node-disjoint routing paths satisfy the real-time constraints of the image and audio streams, all these paths can be chosen for transmitting data. However, the actually required number of routing paths is ⌈RI / TC⌉ + ⌈RA / TC⌉. When N ≥ ⌈RI / TC⌉ + ⌈RA / TC⌉, the final number of chosen paths is ⌈RI / TC⌉ + ⌈RA / TC⌉. When N < ⌈RI / TC⌉ + ⌈RA / TC⌉, more routing paths are required for transmitting data, but only N routing paths can be used. Thus, the upper bound on the maximum number of finally chosen paths M is Min(N, ⌈RI / TC⌉ + ⌈RA / TC⌉). □ Corollary 2. For any given source node SSource_Node, the maximum number of finally chosen paths M has the lower bound 0. Proof: When the end-to-end transmission delays of all node-disjoint routing paths are longer than the real-time constraints of both the image and audio streams, none of these routing paths can be chosen for transmitting data. Thus, the lower bound on the maximum number of finally chosen paths M is 0. □
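The upper bound of Corollary 1 is a one-line computation; a small sketch (function name is ours) makes the arithmetic concrete:

```python
from math import ceil

def max_chosen_paths(n_found, r_i, r_a, tc):
    """Upper bound from Corollary 1: M <= Min(N, ceil(R_I/TC) + ceil(R_A/TC))."""
    return min(n_found, ceil(r_i / tc) + ceil(r_a / tc))

# With the values used later in the simulation (R_I = 48, R_A = 24, TC = 12):
# ceil(48/12) + ceil(24/12) = 4 + 2 = 6 paths are needed, so with N = 6
# found paths all of them can potentially be used, while with N = 3 the
# bound collapses to N.
```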
Fig. 4. The workflow of MPMPS algorithm
After this analysis, we propose the following Multi-priority Multi-path Selection (MPMPS) scheme to solve the problem described in equations (1)-(9).
6 Multi-priority Multi-path Selection Scheme The MPMPS algorithm is executed after TPGF has explored all the node-disjoint routing paths. In MPMPS, the more important multimedia stream always chooses the routing path with the higher priority for transmission. The workflow of MPMPS is shown in Figure 4. The MPMPS algorithm has two phases: 1) searching for the maximum number of paths for the stream with the higher priority; 2) searching for the maximum number of paths for the stream with the lower priority. Due to space limitations, we only show the pseudo code of the first phase of MPMPS in Figure 5. After the search for the higher-priority stream finishes and returns a number M1, the search for the lower-priority stream is conducted using similar code and returns another number M2. The final maximum number of paths is M = M1 + M2. The time complexity of this algorithm is O(n2), where n is the number of possible routing paths that can be found by repeatedly executing TPGF.
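The two-phase selection described above can be sketched as follows. This is our own reconstruction of the selection behavior from the workflow description, not the paper's Figure 5 pseudo code; all names are ours:

```python
from math import ceil

def select_paths(path_delays, deadline_hi, deadline_lo, r_hi, r_lo, tc):
    """Two-phase path selection sketch: paths (end-to-end delays in ms) are
    assigned first to the higher-priority stream, then to the lower one."""
    # Consider paths in order of increasing delay (Definition 1 priority).
    remaining = sorted(range(len(path_delays)), key=lambda i: path_delays[i])
    chosen = {"high": [], "low": []}
    # Phase 1: higher-priority stream takes the fastest qualifying paths.
    need_hi = ceil(r_hi / tc)
    for i in list(remaining):
        if len(chosen["high"]) == need_hi:
            break
        if path_delays[i] <= deadline_hi:
            chosen["high"].append(i)
            remaining.remove(i)
    # Phase 2: lower-priority stream uses what is left.
    need_lo = ceil(r_lo / tc)
    for i in list(remaining):
        if len(chosen["low"]) == need_lo:
            break
        if path_delays[i] <= deadline_lo:
            chosen["low"].append(i)
            remaining.remove(i)
    return chosen

# Example with the six paths of the simulation section:
# image (48 kbps, 280 ms deadline), audio (24 kbps, 320 ms deadline).
delays = [240, 260, 320, 260, 240, 300]
sel = select_paths(delays, 280, 320, 48, 24, 12)
```

With these inputs the sketch assigns the four fastest paths to the image stream and the two slower ones to the audio stream, matching the assignment reported in the simulation section.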
7 Simulation and Evaluation To demonstrate and evaluate MPMPS, we use a new WSN simulator called NetTopo [23], in which TPGF is implemented. In this simulation, we consider a WMSN for a fire monitoring application in a forest, in which the image stream is more urgent and important than the audio stream in terms of reflecting the fire event.
Fig. 5. Search the maximum number of paths for the stream with the higher priority
The end-to-end transmission delay in WSNs is essentially determined by the number of hops. Thus, finding the path with the shortest transmission delay De2e amounts to finding the path with the smallest number of hops:

De2e = H * Dhop,   (11)

where H is the number of hops and Dhop is the average delay of each hop.

Table 1. Simulation parameters

Parameter                                Value
Network Size                             500 m * 500 m
Number of Base Stations                  1
Number of Sensor Nodes                   399
Number of Source Nodes                   1
Video Sensor Generation Rate R           72 kbps
Sensor Node Maximum TC                   12 kbps
Sensor Node Transmission Radius          48 m
Delay of Each Hop Dhop                   20 ms
Video Stream Time Constraint             280 ms
Split Image Stream Time Constraint       280 ms (inherited from video)
Split Audio Stream Time Constraint       320 ms (tolerable constraint)
The parameters used in our simulation are shown in Table 1. The time constraint of the video stream is 280 ms. Because the image stream plays the key role in reflecting the fire event, it inherits the 280 ms time constraint of the video stream. The time constraint of the audio stream is extended to a tolerable 320 ms, which allows it to use routing paths with relatively longer transmission delays. 7.1 Comparison As shown in Figure 6, when a fire event is detected, after repeatedly executing the TPGF routing algorithm, six node-disjoint routing paths in total are found from the video source node (red node) to the base station (green node).
Fig. 6. Six available routing paths are found by using the TPGF routing algorithm
The end-to-end transmission delays of these six routing paths are shown in Table 2.

Table 2. The end-to-end transmission delays

Routing path    End-to-end transmission delay
Path No. 1      240 ms
Path No. 2      260 ms
Path No. 3      320 ms
Path No. 4      260 ms
Path No. 5      240 ms
Path No. 6      300 ms
Since there is no related work in WMSNs, to prove the effectiveness of MPMPS we compare it with a variant (named the MPS algorithm) that does not split the video stream (72 kbps) into an image stream (48 kbps) and an audio stream (24 kbps) but still guarantees the end-to-end transmission delay of the video stream; that is, only the end-to-end transmission delays of the node-disjoint routing paths are used as the parameters for choosing qualified paths. Of the six node-disjoint routing paths, if MPS is used, only 4 paths (Paths No. 1, 2, 4 and 5) qualify for transmitting the video stream, since the deadline of the video stream is 280 ms. Thus, for every second, the data received by the base station is 48 kb (image stream 32 kb, audio stream 16 kb), as shown in Table 3.
Table 3. Data received by the base station per second when using the MPS algorithm

Path    E2E Delay    Used    Image Stream    Audio Stream
No. 1   240 ms       Yes     8 kb            4 kb
No. 2   260 ms       Yes     8 kb            4 kb
No. 3   320 ms       No      0 kb            0 kb
No. 4   260 ms       Yes     8 kb            4 kb
No. 5   240 ms       Yes     8 kb            4 kb
No. 6   300 ms       No      0 kb            0 kb
However, when using MPMPS, the video stream (72 kbps) is split into an image stream (48 kbps) and an audio stream (24 kbps). Of the six found routing paths, four are chosen (in pink) for image stream transmission and the remaining two are used for audio stream transmission, as shown in Figure 7. For every second, the data received by the base station increases from 48 kb to 72 kb (image stream 48 kb, audio stream 24 kb), as shown in Table 4.

Table 4. Data received by the base station per second when using MPMPS

Path    E2E Delay    Used    Image Stream    Audio Stream
No. 1   240 ms       Yes     12 kb           0 kb
No. 2   260 ms       Yes     12 kb           0 kb
No. 3   320 ms       Yes     0 kb            12 kb
No. 4   260 ms       Yes     12 kb           0 kb
No. 5   240 ms       Yes     12 kb           0 kb
No. 6   300 ms       Yes     0 kb            12 kb
The simulation result in Figure 8 shows that using MPMPS greatly increases the total multimedia streaming data received by the base station. 7.2 Demonstration of MPMPS Execution The execution of the MPMPS algorithm is demonstrated in Figures 9, 10, 11 and 12.
Fig. 7. Video streaming with MPMPS: four paths are chosen for the image stream and two paths are used for the audio stream
Fig. 8. Data received by the base station (kb) for every one second
Fig. 9. Choose the path No. 1
Fig. 10. Choose the path No. 5
Fig. 11. Choose the path No. 2
Fig. 12. Choose the path No. 4
Four routing paths are chosen for image stream transmission. In Figure 9, path No. 1 is chosen first since it has the shortest end-to-end transmission delay. In Figure 10, path No. 5 is chosen next because it has the second shortest end-to-end transmission delay. In Figures 11 and 12, paths No. 2 and No. 4 are chosen, respectively, for image stream transmission.
8 Conclusion Video sensors can be used to enhance the capability of WSNs for event description. Efficiently gathering and transmitting video in WSNs is extremely important when the underlying infrastructure, e.g., 3G cellular networks or WLANs, does not exist. In different applications, image and audio streams have different importance levels, and higher priority should be given to the more important stream to make the best use of the limited bandwidth and energy in WSNs. In this paper, we presented the MPMPS scheme in the transport layer, based on our previous research work, the TPGF multi-path routing algorithm. The major contributions of the MPMPS scheme are two-fold: 1) it supports multiple priorities; 2) it chooses the maximum number of paths to maximize the throughput of streaming data transmission. Simulation results show that MPMPS can effectively choose the maximum number of paths for video transmission.
Multi-priority Multi-path Selection
451
Acknowledgments

The work presented in this paper was supported by the Lion project, funded by Science Foundation Ireland under grant no. SFI/02/CE1/I131.
Energy Constrained Multipath Routing in Wireless Sensor Networks Antoine B. Bagula and Kuzamunu G. Mazandu Dept. of Computer Science, University of Cape Town, Private Bag X3 Rondebosch, South Africa [email protected]
Abstract. This paper addresses the issue of Quality of Service (QoS) routing to improve energy consumption in wireless sensor networks (WSNs). Building upon a previously proposed QoS provisioning benchmark model, we formulate the problem of routing sensed information in a WSN as a path-based energy minimization problem subject to QoS routing constraints expressed in terms of reliability, delay and geo-spatial energy consumption. Using probabilistic approximations, we transform the path-based model into a link-based model and apply methods borrowed from the zero-one optimization framework to solve this problem. Comparing the performance achieved by its solution with that of the benchmark model, simulation results reveal that our model outperforms the benchmark in terms of energy consumption and the quality of the paths used to route the sensed information.
1 Introduction

Wireless Sensor Networks (WSNs) are a family of wireless networks currently deployed in both military and civil applications to achieve different types of sensing activities such as seismic, acoustic, chemical, and physiological sensing. They consist of a large number of tiny nodes, each node being regarded as a cheap computer deployed inside the phenomenon or very close to it [1] to perform sensing, computation and communication. A typical WSN deployment scenario consists of placing sensing devices in an environment hostile to humans to sense chemical substances and communicate the results via a satellite link or a helicopter to a center where these results are processed and appropriate decisions are taken about the controlled environment. It is predicted that by allowing communication between inanimate objects, WSNs will bring a third dimension to the first mile of the future Internet, where information will not only be accessed "anywhere and anytime" but also represent "anything". As pointed out by Akyildiz et al. [1], wireless sensor networks present several limitations:

1. Sensor nodes are densely deployed and are range-limited systems; therefore efficient multi-hop routing algorithms are required.
2. Sensor nodes are unreliable and prone to failure, and the topology of sensor networks changes very frequently; hence it is desirable to set up energy-constrained multi-path routing.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 453–467, 2008. © Springer-Verlag Berlin Heidelberg 2008
3. Sensor nodes are limited in power, computational capacity and memory; thus topology control with per-node transmission power adjustment is needed [2].

Low power consumption is an important parameter upon which the efficiency of routing algorithms for wireless sensor networks depends. Single-path routing algorithms are simpler than multi-path routing and consume less energy in wireless sensor networks. However, multi-path routing may be used in delay- and reliability-constrained wireless sensor network settings to (1) increase the likelihood of reliable data delivery by sending multiple copies of data along different paths [3] and (2) decrease the data delivery delays by sharing the data transmission delay among the different paths available from the source to the destination. Energy, delay and reliability can thus become competing constraints in WSNs, raising a tradeoff between single-path and multi-path routing deployment when energy and delay minimization and reliability maximization are at stake. It was pointed out in [4] that traditional node-disjoint paths have attractive resilience properties, but they can be energy inefficient since they lead to longer alternate node-disjoint paths which consume more energy than the primary path. The work presented in [4] proposes the braided multi-path routing model, where the node-disjointedness constraint is relaxed by considering alternative paths which are partially disjoint from the primary path. A braided multi-path model based on constrained random walks to achieve almost stateless multi-path routing on a grid network is proposed in [5]. Recently, X. Huang and Y. Fang [6] proposed a braided multi-path routing model for WSNs referred to as Multi-Constrained Multi-Path routing (MCMP), where packet delivery from nodes to the sink is achieved based on QoS constraints expressed in terms of reliability and delay.
This model addresses the issue of multi-constrained QoS in wireless sensor networks, taking into account the unpredictability of the network topology and trying to minimize energy consumption. This paper models Quality of Service (QoS) routing in Wireless Sensor Networks (WSNs) to achieve energy efficiency. Building upon geo-spatial energy propagation considerations, we extend the model proposed in [6] to formulate QoS routing in WSNs as an energy optimization problem constrained by reliability, play-back delay and geo-spatial path selection constraints. We solve this problem using optimization methods borrowed from the zero-one mathematical framework. We compare the energy consumed by our model, referred to as Energy-Constrained Multi-Path routing (ECMP), to the energy consumed by the MCMP benchmark model and a Link-Disjoint Paths Routing model referred to as LDPR. In the remainder of this paper, we present a sensor network communication model in Section 2 and examine the path delay, energy and reliability behavior in Section 3. Thereafter, we present in the same section a brief formulation of the MCMP and ECMP routing problems. Finally, Section 4 proposes the ECMP model, simulation results comparing the ECMP, MCMP and LDPR algorithms are presented in Section 5, and our conclusions are presented in Section 6.
2 Sensor Network Communication Model

When deployed in sensing activities, sensor nodes communicate wirelessly using radio waves, satellite or light, and fall into three roles: (1) Sensor node, used to
Fig. 1. Sensor Nodes Scattered in a Sensor Field
sense the information; (2) Relay node, used as a relay for the information sensed by other nodes; and (3) Sink node, acting as a base station with high energy, used to transmit the sensed information to a remote processing site. A WSN operates in a multi-hop mode where sensor nodes co-operate to ensure that all sensed information and collected data are successfully relayed to the sink. This is illustrated by the sensor network communication model depicted in Figure 1, where the sensor nodes scattered in a target observation area collect and route data to the end users via the sink or base station, and the base station may communicate with the task manager node via the Internet or satellite. Sensor nodes may fall into one of the following states [7]:

1. Sensing: a sensing node monitors the source using an integrated sensor, digitizes the information, processes it, and stores the data in its on-board buffer. These data will eventually be sent to the base station.
2. Relaying: a relaying node receives data from other nodes and forwards them towards their destination.
3. Sleeping: for a sleeping node, most of the device is either shut down or works in low-power mode. A sleeping node does not participate in either sensing or relaying. However, it "wakes up" from time to time and listens to the communication channel in order to answer requests from other nodes. Upon receiving a request, a state transition to "sensing" or "relaying" may occur.
4. Dead: a dead node is no longer available to the sensor network. It has either used up its energy or has suffered vital damage. Once a node is dead, it cannot re-enter any other state.
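The four node states above can be captured as a small state machine. The sketch below encodes only what the description states explicitly (a sleeping node may wake into sensing or relaying; a dead node is terminal); the remaining transitions are assumptions added for completeness.

```python
from enum import Enum

class NodeState(Enum):
    SENSING = 1
    RELAYING = 2
    SLEEPING = 3
    DEAD = 4

# Stated above: sleeping may wake into sensing or relaying; dead is terminal.
# Transitions between the active states and into sleeping/dead are assumed.
ALLOWED = {
    NodeState.SENSING:  {NodeState.RELAYING, NodeState.SLEEPING, NodeState.DEAD},
    NodeState.RELAYING: {NodeState.SENSING, NodeState.SLEEPING, NodeState.DEAD},
    NodeState.SLEEPING: {NodeState.SENSING, NodeState.RELAYING, NodeState.DEAD},
    NodeState.DEAD:     set(),  # a dead node cannot re-enter any state
}

def can_transition(src, dst):
    return dst in ALLOWED[src]

print(can_transition(NodeState.SLEEPING, NodeState.SENSING))  # True
print(can_transition(NodeState.DEAD, NodeState.RELAYING))     # False
```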
3 A Path-Based Routing Model

Let us consider a sensor network represented by a directed graph $G = (\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is the set of sensor nodes (locations) and $\mathcal{L}$ the set of links. As a data source is usually far from the sink, with the distance exceeding the range of communication, there is a need
456
A.B. Bagula and K.G. Mazandu
to deploy a certain number of sensor nodes that may act as relays used to route data over a multi-hop path. The multi-hop path between node $s_1$ and node $s_\ell$ is represented by $p = (s_1, \ldots, s_\ell)$, an ordered list of nodes $s_i \in \mathcal{N}$ such that the pair $(s_i, s_{i+1}) \in \mathcal{L}$, for $i = 1, \ldots, \ell-1$.

3.1 Path Set Delay, Reliability and Energy

The path $p$ is a series system of links, and the path delay, i.e. the delay between the nodes $s_1$ and $s_\ell$, is given by the sum of the link delays

$$D(p) = \sum_{i=1}^{\ell-1} d(s_i, s_{i+1}) \qquad (1)$$
where $d(s_i, s_{i+1})$ is the delay of data over the link $(s_i, s_{i+1}) \in \mathcal{L}$. Similarly, the energy consumption between node $s_1$ and node $s_\ell$ is given by [1]

$$W(p) = \sum_{i=1}^{\ell-1} \omega(s_i, s_{i+1}) \qquad (2)$$
where $\omega(s_i, s_{i+1})$ is the energy required to receive and transmit data between the nodes $s_i$ and $s_{i+1}$. The necessary energy per bit for a node $s_i$ to receive a bit and then transmit it to the node $s_{i+1}$ is given by [7]

$$\omega_i(s_i, s_{i+1}) = \alpha_1 + \alpha_2 \,\| x_{s_i} - x_{s_{i+1}} \|^n \qquad (3)$$
where $\alpha_1 = \alpha_{11} + \alpha_{12}$, with $\alpha_{11}$ the energy per bit consumed by $s_i$ as transmitter and $\alpha_{12}$ the energy per bit consumed as receiver, and $\alpha_2$ accounts for the energy dissipated in the transmitting operation. Typical values are $\alpha_1 = 180\,\mathrm{nJ/bit}$ and, depending on the path loss exponent $n$ experienced by a radio transmission, $\alpha_2 = 10\,\mathrm{pJ/bit/m^2}$ for $n = 2$ or $\alpha_2 = 0.001\,\mathrm{pJ/bit/m^4}$ for $n = 4$. Here $x_{s_i}$ is the location of the sensor node $s_i$, and $\| x_{s_i} - x_{s_{i+1}} \|$ is the Euclidean distance between the two sensor nodes $s_i$ and $s_{i+1}$, $i = 1, \ldots, \ell-1$. Thus, in (2), we have

$$\omega(s_i, s_{i+1}) = f_{s_i \to s_{i+1}} \cdot \omega_i(s_i, s_{i+1}) \qquad (4)$$
where $f_{s_i \to s_{i+1}}$ denotes the data rate on the link $(s_i, s_{i+1}) \in \mathcal{L}$. Assuming that the links of a path are independent, from [8] the path reliability $R(p)$ is given by

$$R(p) = \prod_{i=1}^{\ell-1} R(s_i, s_{i+1}) \qquad (5)$$
where $R(s_i, s_{i+1})$ is the reliability of the link $(s_i, s_{i+1}) \in \mathcal{L}$. Considering the set of parallel paths $\mathcal{P} = \{p_1, \ldots, p_M\}$, the delay experienced and the energy consumed by a data source routed over the path set $\mathcal{P}$ are respectively given by

$$D(\mathcal{P}) = \max\{D(p) : p \in \mathcal{P}\} \qquad (6)$$

and

$$W(\mathcal{P}) = \sum_{p \in \mathcal{P}} W(p) \qquad (7)$$

where $D(p)$ and $W(p)$ are computed as in (1) and (2) respectively. Finally, from [8], the reliability of the data source routed over $\mathcal{P}$ is given by

$$R(\mathcal{P}) = 1 - \prod_{p \in \mathcal{P}} \bigl(1 - R(p)\bigr) \qquad (8)$$
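The path and path-set metrics of Eqs. (1)-(8) can be evaluated directly; the sketch below uses invented link delays, energies and reliabilities purely for illustration.

```python
# Numeric sketch of Eqs. (1)-(8). Each path is a list of links; each link
# carries a delay d (ms), an energy w, and a reliability r (all invented).
def path_delay(p):
    return sum(l["d"] for l in p)                 # Eq. (1)

def path_energy(p):
    return sum(l["w"] for l in p)                 # Eq. (2)

def path_reliability(p):                          # Eq. (5): product of link reliabilities
    r = 1.0
    for l in p:
        r *= l["r"]
    return r

def set_delay(P):
    return max(path_delay(p) for p in P)          # Eq. (6): slowest path dominates

def set_energy(P):
    return sum(path_energy(p) for p in P)         # Eq. (7): energies add up

def set_reliability(P):                           # Eq. (8): all paths must fail
    fail = 1.0
    for p in P:
        fail *= 1.0 - path_reliability(p)
    return 1.0 - fail

p1 = [{"d": 20, "w": 1.0, "r": 0.9}, {"d": 10, "w": 1.2, "r": 0.8}]
p2 = [{"d": 15, "w": 0.8, "r": 0.95}, {"d": 25, "w": 1.1, "r": 0.85}]
P = [p1, p2]
print(set_delay(P), round(set_energy(P), 2), round(set_reliability(P), 4))
```

Note how multipath routing trades energy for reliability: the path set consumes the energy of both paths but is far more reliable than either path alone.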
where $R(p)$ is computed using the formula given in (5).

3.2 The Path-Based Routing Problem

Let $\mathcal{P} = \{p_1, \ldots, p_M\}$ denote the set of possible paths from $s$ to the base station $b$, assumed to be stationary. Each path $p_j \in \mathcal{P}$, $j = 1, \ldots, M$, is associated with a delay $d_j$ and a reliability $r_j$. If every path $p \in \mathcal{P}$ has a delay larger than the delay $D$ required by the data source, then the data source is dropped, since no path can fulfill the delivery of the packet under that constraint. In the case of the reliability constraint, multi-path routing can be used to improve reliability. However, the use of several paths increases energy consumption, which affects the lifetime of the network. Thus, in order to save energy, the path set with the minimum number of paths is chosen as the forwarding set. The routing objective is then to find a minimum number of paths in $\mathcal{P}$ that satisfy the QoS requirements of a given data source $f$ with minimum energy consumption. This can be formulated as the optimization problem below.

Problem 0. Given delay and reliability requirements $D$ and $R$, the QoS routing problem consists of finding the smallest set of paths $P[s,b]$ from a source $s$ to the base station $b$ which minimizes the energy consumption $W(P[s,b])$ subject to delay and reliability constraints, as expressed by

$$\min_{P \subset \mathcal{P}} W(P)$$

subject to

$$D(P[s,b]) \le D \qquad (9)$$
$$R(P[s,b]) \ge R \qquad (10)$$

where $D(P[s,b])$, $R(P[s,b])$ and $W(P)$ are respectively defined by relations (6), (8) and (7).

3.3 Probabilistic Transformation

Problem 0 assumes global knowledge of the topology and network characteristics, and requires exact information about path quality, which is almost impossible to obtain in wireless sensor networks. Moreover, as expressed by equations (9) and (10), the QoS constraints are hard constraints that require QoS enforcement for the entire lifetime of a session [9]. However, after the connection is set up, there exist transient periods of
time when the QoS specification may not be honored due to frequent changes of the sensor network topology. This means that the QoS requirement can be provided only with a certain probability, referred to as soft-QoS. Thus, the problem can be reformulated as follows.
Problem 0′. Given delay and reliability requirements $D$ and $R$, the QoS routing problem consists of finding a smallest set of paths $P[s,b]$ from a source $s$ to the base station $b$ that minimizes energy consumption subject to delay and reliability constraints, as expressed by

$$\min_{P \subset \mathcal{P}} W(P)$$

subject to

$$\Pr\bigl(D(P[s,b]) \le D\bigr) \ge \alpha \qquad (11)$$
$$\Pr\bigl(R(P[s,b]) \ge R\bigr) \ge \beta \qquad (12)$$
where $\Pr(X)$ denotes the probability of event $X$, and $\alpha$ and $\beta$ are respectively the soft-QoS probabilities for delay and reliability.

3.4 Approximating Global by Local Constraints

Based on the observation that the path model is inappropriate for QoS routing in wireless sensor networks, different approximations were proposed in [6] to transform the path-based into link-based constraints by expressing the reliability and delay constraints as stochastic constraints, which are more relevant to a wireless sensor network setting. The main objective of these transformations is to redesign the routing process in a local context where the routing decision involves only a node and its direct neighbors rather than an end-to-end path. These transformations are based on the following key features:

– The links are assumed to be independent in terms of delay and reliability, and reliability and delay are expressed as random time-dependent processes, where the time $t$ is omitted in our notation for simplicity's sake.
– The delay constraint is expressed for each node $\imath$ in terms of the hop requirement $L^d_\imath = (D - D_\imath)/h_\imath$, with $D_\imath$ being the actual delay experienced by a packet at node $\imath$ and $h_\imath$ the hop count from node $\imath$ to the sink.
– The reliability requirement is expressed for each node $\imath$ in terms of the hop requirement $L^r_\imath = \sqrt[h_\imath]{R_\imath}$, with $R_\imath$ being the portion of the reliability requirement assigned to the path through node $\imath$, decided by the upstream node of $\imath$.
– The resulting local constraints are expressed by

$$x_j \left[ \frac{\alpha}{1-\alpha} \bigl(\Delta^d_{\imath j}\bigr)^2 + 2\,L^d_\imath\, d_{\imath j} - d^2_{\imath j} \right] \le \bigl(L^d_\imath\bigr)^2 \;\Big|\; L^d_\imath > d_{\imath j} \qquad (13)$$

$$\sum_{j \in N[\imath]} x_j \log Q\!\left( \frac{R_{\imath j} - r_{\imath j}}{\Delta^r_{\imath j}} \right) \ge \log \beta \qquad (14)$$

$$\sum_{j \in N[\imath]} x_j \log\bigl(1 - R_{\imath j}\bigr) \le \log\bigl(1 - L^r_\imath\bigr) \qquad (15)$$

$$0 \le R_{\imath j} \le r_{\imath j}, \quad \text{for all } j \in N[\imath] \qquad (16)$$

$$x_j = 0 \text{ or } 1, \quad \text{for all } j \in N[\imath] \qquad (17)$$

where $x_j$ is a decision variable which takes the value 1 if the path $p_j \in P[s,b]$ and 0 otherwise, $R_{\imath j}$ and $D_{\imath j}$ are respectively the reliability and delay of the link $\imath j$, and $\Delta^d_{\imath j}$ and $\Delta^r_{\imath j}$ are respectively the standard deviations of $D_{\imath j}$ and $R_{\imath j}$. The $Q$-function in (14) is defined by

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\!\left(-\frac{1}{2} t^2\right) dt \qquad (18)$$

Note that equations (13) and (14) are an approximation of the delay constraint (11), while (15) is an application of the reliability constraint (12) to a link model. These approximations are detailed in [6].
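The integral in Eq. (18) has no closed form, but the Gaussian Q-function is directly expressible through the complementary error function as Q(x) = erfc(x/√2)/2. The delay-budget numbers in the second part of the sketch are invented for illustration.

```python
import math

# Q(x) = (1/2) * erfc(x / sqrt(2)): upper-tail probability of a standard
# Gaussian, matching the definition in Eq. (18).
def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

print(q_function(0.0))  # 0.5: half the Gaussian mass lies above the mean

# Per-hop delay budget L_d = (D - D_i) / h_i from Section 3.4, with
# illustrative numbers: an 80 ms end-to-end requirement, 30 ms already
# spent at node i, and 5 hops remaining to the sink.
D, D_i, h_i = 80.0, 30.0, 5
L_d = (D - D_i) / h_i
print(L_d)  # 10.0 ms available per remaining hop
```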
4 The Energy Constrained Multipath (ECMP) Model

The MCMP model was proposed in [6] to minimize the number of paths used in forwarding a data source to the sink, with the expectation of minimizing the total transmission energy. It is expressed as follows.

Problem 1′. At each node $\imath$, find the subset $N_0 \subseteq N[\imath]$, the set of neighbors of node $\imath$, that solves the following linear zero-one program

$$\min \sum_{j \in N[\imath]} x_j \qquad (19)$$

subject to the constraints (13), (14), (15), (16) and (17) above.

However, this model does not really take the geo-spatial energy consumption in the network into consideration, as illustrated by Figure 2, since it discounts best-link selection in the case where a choice must be made between two links to satisfy the QoS requirements. As the objective is to send data from the source to the sink with the total transmission energy as low as possible, the choice between nodes $j$ and $k$ is an important factor upon which the performance of the optimization model depends.

4.1 Considering a Geo-Spatial Constraint

It is relevant in energy-efficient modelling of WSNs to account for geo-spatial energy consumption constraints. To illustrate our proposal, let us consider Figure 2, where the
Fig. 2. MCMP model Inefficiency
choice must be made between the link $(\imath, j)$ and the link $(\imath, k)$, or equivalently between node $j$ and node $k$, to be added to the subset $N_0$ of $N[\imath]$, the set of the neighbors of $\imath$, assuming that the two candidates $j$ and $k$ may both satisfy the QoS requirement for the data source. From Pythagoras' theorem, the distance between node $\imath$ and node $j$ is larger than that between $\imath$ and $k$. Combining Pythagoras' theorem with the formula in (3) for computing transmission energy, one can easily find that the transmission energy between $\imath$ and $j$ is higher than that between $\imath$ and $k$. This means that choosing $j$ as the best neighbor node to forward data to leads to higher energy consumption than selecting node $k$. However, the MCMP model makes an arbitrary choice between the nodes $j$ and $k$ as neighbor node: a random choice which is not likely to select the best node in terms of minimum energy consumption. Building upon this finding, we propose a routing model referred to as Energy-Constrained Multipath (ECMP) that overcomes this drawback by ensuring that data is transmitted towards the least energy-consuming links. The ECMP model finds the subset $N_0$ of the set $N[\imath]$ with the lowest expected transmission energy while satisfying the QoS requirements when delivering data from source to sink. The goal of the ECMP model is then to find the subset $N_0 \subseteq N[\imath]$ satisfying the QoS requirements of the data source and minimizing the total transmission energy. Indeed, denoting by $\omega(\imath, j)$ the energy required for a node $\imath$ to receive data and then transmit it to the node $j$, given by formula (4), the ECMP model assumes a neighbor selection scheme based on the geo-spatial constraint expressed by $\omega(\imath, j) \le \omega(\imath, \tilde{\jmath}) \mid \chi(\imath, j) \le \chi(\imath, \tilde{\jmath})$, where $\chi(\imath, j)$ is the Euclidean distance between $\imath$ and $j$.

4.2 The Routing Model

Let us consider a wireless sensor network represented by a directed graph $G = (\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is the set of sensor nodes and $\mathcal{L}$ is the set of wireless links between nodes.
Suppose there exists a data source $f$ at a given location $x_s$, sensed by the node $s$. This data must be routed to the base station and possesses a QoS requirement expressed in terms of a delay $D$ and a reliability $R$.
Problem 1. At each node $\imath$, find the subset $N_0 \subseteq N[\imath]$, the set of neighbors of node $\imath$, that solves the following zero-one linear program

$$\min \sum_{j \in N[\imath]} x_j \qquad (20)$$

subject to the constraints (13), (14), (15), (16) and (17) and

$$\omega(\imath, j) \le \omega(\imath, \tilde{\jmath}) \mid \chi(\imath, j) \le \chi(\imath, \tilde{\jmath}) \qquad (21)$$
Algorithmic solution. The ECMP problem, like the MCMP problem, is a deterministic linear zero-one problem. Several methods have been proposed in the literature to address this kind of problem [11, 12]. In both problems, the number of constraints is $2|N[\imath]| + 2$, and the number of decision variables is $|N[\imath]|$, the size of $N[\imath]$. Thus, the problem size is relatively small and roughly proportional to the node density. Building upon the zero-one framework proposed in [11], we considered an implementation where the two local routing problems MCMP and ECMP are solved using Balas' algorithm but with different path selection strategies: (1) a random selection for the MCMP algorithm, where at each node the next hop to the sink is selected arbitrarily among the neighbors of the node, and (2) an energy-efficient selection, where the closest neighbor in terms of Euclidean distance is selected by the node as the next hop to the sink.

An illustration. The main idea behind this path selection is illustrated by Figures 3 and 4. Figure 3 depicts a WSN where each link is associated with two positive QoS measures expressing its reliability and its delay in ms. In this WSN, data from the source node 0 to the base station (sink node) 10 is routed under two QoS constraints: (1) delay ≤ 80 ms and (2) reliability ≥ 0.9. The tree of eventual paths generated by the MCMP and ECMP algorithms is depicted in Figure 4. To each node $\imath$ of that tree is associated the node state $S_\imath = (x_0, x_1, x_2, x_3, x_4)$ describing the QoS of the eventual path segment followed by data from the source node 0. The components of the node state are described as follows:

– $x_0$ is set to the value 1 if the node satisfies the requirements to be used as a relay node, or the node is the source or the sink node; it is set to the value 0 otherwise.
– $x_1$ is the minimum hop count from the given node to the sink.
– $x_2$ is set to 0 where the routing process stops and 1 otherwise.
– $x_3$ is an indication of the delay achieved by the data entering at the given node.
– $x_4$ is set to the value 1 if the node satisfies the reliability and delay requirements and 0 otherwise.

The tree of eventual paths followed by the data from the source 0 to the sink 10 shows that three link-disjoint paths can be found, namely 0 → 1 → 4 → 9 → 10, 0 → 2 → 4 → 7 → 10 and 0 → 3 → 4 → 8 → 10, with reliabilities of 0.5814, 0.4277 and 0.618 respectively. While all three paths meet the delay requirement, none of them satisfies the reliability constraint (reliability ≥ 0.9). However, when taken together as a set of three link-disjoint paths, the three paths achieve a reliability value of 0.902, allowing data to reach the sink with the required reliability 0.9. When selecting the smallest set of neighbors of node 2 satisfying the reliability constraint, it can be observed that at node 2 both the ECMP and MCMP models will have to
Fig. 3. Illustration
Fig. 4. Tree of eventual paths
choose node 4 and select two nodes among nodes 5, 6, 8 and 9 to be added to the smallest set of neighbors of node 2 to meet the QoS constraints. While our ECMP model will pick nodes 5 and 6, which lead to minimum energy consumption since their
distances to node 2 are smaller than those of nodes 8 and 9, the MCMP model will make an arbitrary choice which will not necessarily lead to selecting the least energy-consuming neighbor nodes 5 and 6. By selecting nodes 8 and/or 9, for example, the MCMP model will increase energy consumption. It should also be observed that, though capable of finding similar paths under loose reliability constraints, the ECMP and MCMP models will perform differently under stringent reliability constraints. This is the case, for example, when the reliability threshold is reduced from 0.9 to 0.75. This will lead the ECMP model to select the paths 0 → 3 → 4 → 8 → 10 and 0 → 2 → 4 → 7 → 10, while the MCMP model selects 0 → 3 → 4 → 8 → 10 and makes an arbitrary choice between paths 0 → 1 → 4 → 9 → 10 and 0 → 2 → 4 → 7 → 10 in order to satisfy the reliability requirement. Such an arbitrary choice is not guaranteed to pick the path 0 → 2 → 4 → 7 → 10, which consumes less energy than the path 0 → 1 → 4 → 9 → 10.
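The multipath reliability claim of the illustration can be checked numerically with Eq. (8); a direct evaluation with the quoted per-path reliabilities yields a value slightly above 0.9, in line with the reported 0.902 up to rounding of the inputs.

```python
# Checking the illustration with Eq. (8): none of the three link-disjoint
# paths reaches the 0.9 target alone, but the set of three does.
path_reliabilities = [0.5814, 0.4277, 0.618]

fail = 1.0
for r in path_reliabilities:
    fail *= 1.0 - r          # probability that every path fails
combined = 1.0 - fail        # Eq. (8)

print(all(r < 0.9 for r in path_reliabilities))  # True: no single path suffices
print(combined > 0.9)                            # True: the path set suffices
```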
5 Performance Evaluation

In this section, we evaluate through experimentation the efficiency of the ECMP model against baseline single-path (SP) routing, the MCMP model and the Link-Disjoint Paths Routing (LDPR) model in terms of several performance parameters. These include the average energy consumption, the delivery ratio and the average data delivery delay. LDPR is a link-disjoint algorithm where the number of paths used is a function of the reliability requirements: the higher the reliability required, the higher the number of paths used. The LDPR model is an ideal routing model similar to the "GOD routing" model in [6], which assumes that each sensor node is aware of the instantaneous link delay and reliability and has complete knowledge of the network topology.

– Average energy consumption indicates the average energy consumed in the transmission and reception of all packets in the network. This metric reveals the efficiency of an approach with respect to energy consumption.
– Delivery ratio is one of the most important metrics in real-time applications; it indicates the number of packets that meet a specified QoS level. It is the ratio of successful packet receptions (received packets) to attempted packet transmissions (sent packets).
– Average data delivery delay is the end-to-end delay experienced by successfully received packets.

In addition, we compare the quality of the paths used by the MCMP and ECMP models in terms of (1) path length (number of hops of the paths used), (2) path multiplicity (average number of paths used to send data to the base station) and (3) path usage, showing how often a model uses its preferred paths: a measure of the stability of the model. While the path length gives an indication of QoS, since using longer paths leads to higher delays, the path multiplicity reveals energy consumption, since an algorithm which shares data over a lower number of paths will consume less energy.
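For concreteness, the three metrics can be computed from per-packet records as below; the packet data are invented for the example.

```python
# Invented per-packet records: sent flag, received flag, end-to-end delay
# for delivered packets, and the energy charged to the packet in transit.
packets = [
    {"received": True,  "delay_ms": 42.0, "energy": 0.9},
    {"received": True,  "delay_ms": 55.0, "energy": 1.1},
    {"received": False, "delay_ms": None, "energy": 0.7},
    {"received": True,  "delay_ms": 47.0, "energy": 1.0},
]

received = [p for p in packets if p["received"]]
delivery_ratio = len(received) / len(packets)                    # received / sent
avg_delay = sum(p["delay_ms"] for p in received) / len(received) # delivered only
avg_energy = sum(p["energy"] for p in packets) / len(packets)    # all packets

print(delivery_ratio)       # 0.75
print(round(avg_delay, 1))  # 48.0
```

Note that the average delay is computed over successfully received packets only, while energy is averaged over all transmission attempts, delivered or not.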
5.1 Experimental Setup

We consider a wireless sensor network where 50 sensor nodes are randomly deployed in a 100 m × 100 m square sensing field and the transmission range is 25 m. 10 of the 50 sensor nodes are selected randomly to generate data. We adopt a scenario where link reliability and delay are randomly chosen, to assess the worst case where link delay and reliability may change suddenly at any transmission instant. The reliability values are uniformly distributed in the range [0.9, 1] and the delays in the range [1, 50] ms. Note that the delay includes queueing time, transmission time, retransmission time and propagation time. The delay constraints are taken in the range [120, 210] ms with an interval of 10 ms, which produces 10 delay requirement levels, and the reliability threshold is set to 0.5. Both parameters α and β are set to 95%. The size of a data packet is 150 bytes, and a packet is assumed to have an energy field that is updated during the packet transmission to calculate the total energy consumption in the network. To achieve 10 trials for each experiment, we applied different random seeds to generate different network configurations. Each simulation lasted 900 sec.

5.2 Experimental Results

The experimental results are presented in Figures 5(a)-5(b) for the delivery ratio and data delivery delay respectively, while Figures 6(a)-6(d) depict the network energy consumption. Figures 7(a)-7(c) reveal the quality of paths in terms of path length, multiplicity and usage.
[Figure 5: two plots versus delay requirement (120–210 ms), with curves for SP, MCMP, ECMP, and LDPR routing: (a) on-time packet delivery ratio; (b) average end-to-end packet delay (ms).]

Fig. 5. Delivery Ratio and Data Delay Comparison
In terms of delivery ratio, the ECMP and MCMP models perform equally, and both models outperform single-path routing, as shown in Figure 5(a). As expected, the LDPR model achieves the best performance, since it assumes that each sensor node has complete knowledge of the network topology. The slight difference in average end-to-end delay between the ECMP and MCMP models is due to the fact that the paths used by the two models differ in the number of hops, as depicted by the route lengths in Figure 7(a). Looking at the total energy consumed in the network, we found that the ECMP model, as expected, performed better than the MCMP model, as illustrated by Figures 6(b) and 6(d). On the other hand, Figures 6(a) and 6(c) reveal that MCMP highly
Energy Constrained Multipath Routing in Wireless Sensor Networks

[Figure 6: four plots of average energy consumption versus delay requirement (120–210 ms): (a) energy for LDPR and MCMP (n = 2); (b) energy for MCMP and ECMP (n = 2); (c) energy for LDPR and MCMP (n = 4); (d) energy for MCMP and ECMP (n = 4).]

Fig. 6. Energy Efficiency Comparison
[Figure 7: three plots comparing MCMP and ECMP routing: (a) path lengths (percentage of routes versus number of links, 1–7); (b) path multiplicity (percentage of OD pairs versus number of routes used); (c) usage of the most used route (percentage of OD pairs per percentage interval).]

Fig. 7. Quality of paths
outperforms the LDPR model in terms of energy consumption. These results are in agreement with Table 1, which reveals the percentage of paths that are identical for both algorithms (strong correspondence), the percentage of paths where the two algorithms differ by one hop (weak correspondence), and the percentages of paths used by ECMP only and by MCMP only. This table reveals that the MCMP algorithm shares its traffic over more paths than the ECMP algorithm: while there is no route used by
Table 1. Path correspondence

Strong Correspondence       58.86%
Weak Correspondence         25.69%
Routes used only by ECMP     0.00%
Routes used only by MCMP    15.45%
the ECMP algorithm only, the MCMP algorithm has 15.45% more routes than ECMP. Consequently, by using smaller path sets, the ECMP algorithm can achieve greater energy savings than the MCMP model. This relative efficiency also applies to the ECMP model when compared with the LDPR model. The results in Figure 7(a) reveal that in general the ECMP model uses longer paths (in terms of number of hops) than the MCMP model. Thus, the paths used by the ECMP model are more likely to lead to higher end-to-end delays. However, this is balanced by the impact of path multiplicity, which reveals that the ECMP model uses smaller path sets, resulting in lower energy consumption. This explains the results depicted in Figure 5(b) on average end-to-end packet delay, where ECMP and MCMP achieve similar performance. Finally, the two models use approximately 99.6% single paths, and when these algorithms start using more than one path, the results depicted in Figure 7(b) reveal that the ECMP model uses smaller path sets than the MCMP model. Thus the MCMP model tends to consume more energy than the ECMP model. This is in agreement with the design of each of these models and explains the results in Figures 6(b) and 6(d) concerning the network energy consumption. The results depicted in Figure 7(c) on route usage reveal that the ECMP model uses its preferred paths more often than the MCMP model. This reveals the stability of the ECMP model compared with the MCMP model.
6 Conclusion

In this paper we analyzed the issue of using multi-path routing in wireless sensor networks and proposed Energy-constrained Multi-Path routing (ECMP), an improvement on the MCMP model proposed in [6]. The main idea driving the ECMP model is that, in the context of wireless sensor networks, efficient resource usage should reflect not only efficient bandwidth utilization but also minimal usage of energy in its strict sense. While the MCMP model routes the information over a minimum number of hops, the strength of the ECMP model lies in the fact that it trades off between the minimum number of hops and minimum energy, selecting a path with the minimum number of hops only when it is the path with minimum energy, or a longer path with minimum energy satisfying the constraints. Using the ECMP algorithm, we show that QoS support in wireless sensor networks should be based on well-defined constraints to avoid unnecessary energy consumption when delivering data. The efficiency of the proposed model is evaluated through simulation results revealing that ECMP outperforms MCMP.
References

1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless Sensor Networks: a Survey. Computer Networks 38(4), 393–422 (2002)
2. Liu, J., Li, B.: Distributed Topology Control in Wireless Sensor Networks with Asymmetric Links. In: Proceedings of IEEE Globecom (2003)
3. Dulman, S., Wu, J., Havinga, P.: An Energy Efficient Multipath Routing Algorithm for Wireless Sensor Networks. In: Proceedings of the Wireless Communications and Networking Conference (2003)
4. Ganesan, D., Govindan, R., Shenker, S., Estrin, D.: Highly-Resilient, Energy-Efficient Multipath Routing in Wireless Sensor Networks. ACM SIGMOBILE Mobile Computing and Communications 1(2) (2001)
5. Barrenechea, G., Servetto, S.: Constrained Random Walks on Random Graphs: Routing Algorithms for Large Scale Wireless Sensor Networks. In: Proc. of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (2002)
6. Huang, X., Fang, Y.: Multiconstrained QoS Multipath Routing in Wireless Sensor Networks. ACM Wireless Networks (WINET) (2007)
7. Li, W., Cassandras, C.G.: A Minimum-Power Wireless Sensor Network Self-Deployment Model. In: WCNC 2005 - IEEE Wireless Communications and Networking Conference, vol. 1, pp. 1897–1902 (March 2005)
8. Trivedi, K.S.: Probability and Statistics with Reliability, Queuing and Computer Science Applications. John Wiley & Sons, Chichester (2002)
9. Misra, S., Reisslein, M., Xue, G.: A Survey of Multimedia Streaming in Wireless Sensor Networks. IEEE Communications Surveys and Tutorials (2007)
10. Leung, K.K., Klein, T.E., Mooney, C.F., Haner, M.: Methods to Improve TCP Throughput in Wireless Networks with High Delay Variability. In: 60th Vehicular Technology Conference (VTC), vol. 4, pp. 3015–3019. IEEE (September 2004)
11. Taha, H.A.: Integer Programming: Theory, Applications, and Computations. Academic Press (1975)
12. Kocay, W., Kreher, D.L.: Graphs, Algorithms, and Optimization. Chapman & Hall/CRC Press (2005); ISBN 1-58488-396-0
Controlling Uncertainty in Personal Positioning at Minimal Measurement Cost

Hui Fang¹, Wen-Jing Hsu¹, and Larry Rudolph²

¹ Singapore-MIT Alliance, Nanyang Technological University, Nanyang Avenue, Singapore 639798
[email protected], [email protected]
² Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
[email protected]
Abstract. One interesting scenario in personal positioning involves an energy-conscious mobile user who tries to obtain estimates of his positions with sufficiently high confidence while consuming as little battery energy as possible. Besides obtaining estimates directly from a position measuring device, the user can rely on extrapolative calculations based on a user movement model and a known initial estimate. Because each measuring probe usually incurs a substantially higher cost than the extrapolative calculation, the objective is to minimize the overall cost of the measurement probes. Assuming that the user moves at a normally-distributed velocity, we consider two scenarios which differ in the probing devices used. In the first scenario, only one probing device is used. In this case, the aim is to minimize the total number of probes required. In the second scenario, two types of positioning devices are given, where one type of device offers higher positioning precision, but also at a greater probing cost. In this case, the aim is to choose an optimal combination of probes from the two types of devices. For both scenarios, we present algorithms for determining the minimum-cost probing sequences. The algorithms are computationally efficient in reducing the search space of all possible probing sequences. Our approach is based on Kalman filtering theory, which allows us to integrate estimates obtained from the measurements and the extrapolative calculations. The variances in the estimates can provably stay below the specified level throughout the journey. To the best of our knowledge, these results appear to be the first that use a mathematically rigorous approach to minimize the probing cost while guaranteeing the quality of estimates in personal positioning.
1 Introduction
With the fast growth of personal mobile devices, people are developing diverse methods to provide efficient and accurate positioning services which are useful in many potentially interesting position-based applications; see e.g. Bhattachary and Das [2] or D'Roza et al. [4].

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 468–481, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Personal positioning refers to obtaining an estimate of a mobile user's actual position at a given point in time by using individualized means. Such estimates are usually obtained either from measurements by using a positioning device such as GPS (Global Positioning System), or, alternatively, through calculations based on a predictive model of the user's movements, stretching from an existing estimate. Naturally, due to the inherent non-determinism in the user's movements, the extrapolative calculations accrue uncertainty over time, and a new probe will be needed to control the uncertainty. Now, because each measuring probe usually incurs a substantially higher cost than the extrapolative calculation, the objective is to minimize the overall cost of the measurement probes over a journey while keeping the uncertainties in any estimate below an acceptable level. Although there are already many positioning algorithms based on GPS [5,4,8,1] or on the cell IDs of the base stations for mobile phones [2,14,13] separately, there are very few existing results that combine positioning information obtained from multiple types of positioning devices. Most cell-based positioning methods consider only the concept of logical positions instead of geographical positions; see, e.g., [2]. One naturally wonders whether information from multiple sources can be combined to provide a cost-effective and consistent positioning service. Elsewhere, in [12], a system in which the positioning records from multiple sources are merged for the analysis of a mobile device's motion pattern was described. For lack of absolute knowledge about the actual position value, in this paper a positioning estimate is represented as a Gaussian distribution whose mean is interpreted as the estimated position, and whose variance serves as a measure of the uncertainty of the estimate. With this formulation, we are able to make use of the Kalman filter [6] to combine position estimates obtained from different sources.
The problem is now reduced to finding probing sequences that will minimize the probing cost while keeping the variances¹ in the estimates below a required level. Assuming that the user moves at a normally-distributed velocity, we consider two scenarios which differ in the probing devices used. In the first scenario, only one probing device is used. In this case, the aim is to minimize the total number of probes required. In the second scenario, two types of positioning devices are given, where one type of device offers higher positioning precision, but also at a greater probing cost. In this case, the aim is to choose an optimal combination of probes from the two types of devices. For both scenarios, we present computationally efficient algorithms that can determine a required optimal probing sequence. The algorithms are based on Kalman filtering theory, which allows us to integrate estimates obtained from the measurements and the extrapolative calculations. The variances in the estimates can provably stay below the specified level throughout the journey. To the best of our knowledge, these results appear to be the first that use a mathematically rigorous approach to minimize the probing cost while guaranteeing the quality of estimates in personal positioning. The remainder of the paper is organized as follows. Section 2 presents the formulation and notations. Our position inference method is presented in Section 3.

¹ Equivalently, standard deviations.
Then we discuss how to control the variances of the inferred estimates in Section 4. In Section 5 the optimal probing strategy with the minimum cost is derived. Section 6 reviews previous work related to personal positioning. Finally, in Section 7, we summarize the results and conclude with open problems for future work.
2 Definitions and Assumptions
An estimate of the device's actual position z will be modeled by a 2-D Gaussian random variable written as

z ∼ N(μ, σ² · I),   (1)
where μ denotes its mean and σ²·I denotes its covariance matrix. By the assumption of Gaussian distribution, given a measurement z, the most likely estimate of the user's actual position is μ. Moreover, the variance σ² reflects how likely the user's actual position is to lie around the best estimate. The relationship between the region of high probability and the variance is given by Theorem 0, as shown in [3].

Theorem 0 (Confidence ellipse). Let x ∼ N(0, σ_x²), y ∼ N(0, σ_y²), and let x, y be independent of each other. The probability that (x, y) lies in the area E = {(x, y) : x²/σ_x² + y²/σ_y² ≤ r²} is given by P = 1 − e^(−r²/2).

E is called the P-confidence ellipse, which is centered around the mean. For our case, in which σ_x = σ_y = σ, the ellipse is a circle whose radius is given by rσ, where r = √(2 ln(1/(1 − P))). In practice, the confidence P is often set to a constant 95%.
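Inverting P = 1 − e^(−r²/2) gives the radius multiplier above; a quick numerical sketch (function name is ours, for illustration):

```python
import math

def confidence_radius(P):
    """Radius multiplier r such that the circle of radius r*sigma contains
    the true position with probability P (Theorem 0): r = sqrt(2 ln(1/(1-P)))."""
    return math.sqrt(2.0 * math.log(1.0 / (1.0 - P)))

r = confidence_radius(0.95)
# For P = 95%, r is about 2.45.
# Sanity check: plugging r back into P = 1 - exp(-r^2/2) recovers 0.95.
assert abs(1.0 - math.exp(-r * r / 2.0) - 0.95) < 1e-12
```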
3 Position Inference by Using Kalman Filter
A common scenario in personal positioning arises when the mobile user embarks on a journey at a variable velocity, and he would like to ensure that during the journey any estimate about his position can be made with a sufficiently low variance. The position estimates may be obtained by carrying out a number of measurement probes in conjunction with calculations based on the user velocity model. Kalman filtering [15] provides a convenient framework for describing the mobile user's movement model and making predictions given the measurements. Since Rudolf Kalman introduced it in the 1960s, Kalman filtering (KF) has been widely used in many areas such as navigation, manufacturing, and dynamical control. A Kalman filter is a recursive data processing algorithm that estimates the state of a noisy linear dynamic system. It processes all available measurements to estimate the state, including both accurate and inaccurate measurements. It uses knowledge of the system and sensor dynamics, probabilistic descriptions of the system and measurement noises, and any available data about the initial values of the state to achieve the estimate while minimizing the mean squared error between the predicted value and the measured value. Here we will adapt the formalism to suit our specific application.
Our aim is to infer the device's position at any point within a period of time T. Let t_i denote the point in time where the i-th probe is carried out. At time t_k, the predicted position estimate x_k is given by

x_k = x_{k−1} + τ_{k−1} v_{k−1} + τ_{k−1} w_{k−1},   (2)

and a measurement z_k is given by

z_k = x_k + r_k,   (3)

where v_k = [v_lon(k), v_lat(k)]^T denotes the velocity and τ_{k−1} = t_k − t_{k−1}. Specifically, the velocity is obtained statistically from the mobile user's history, which results in a noise w_k. The random variables w_k and r_k represent the velocity noise and the measurement noise, respectively. They are assumed to be white noises², normally distributed, and independent of each other:

w_k ∼ N(0, σ_w² I),   (4)
r_k ∼ N(0, σ²_{r_k} I).   (5)
In practice, the standard deviation of the velocity noise σ_w might also change with each time step. Here, however, we assume it to be a constant. Using the Kalman filter (KF for short), we can combine two (or more) estimates that are obtained either from predictions or measurements. Given two concurrent independent estimates, the KF method can combine them to generate a new estimate; moreover, the new estimate is still Gaussian [15]. Theorem 1 states this fact more precisely.

Theorem 1 (KF-1). Let N(μ, V_1) and N(ν, V_2) be x's probability density functions (PDFs) conditional on two independent estimates respectively. Then x's PDF conditional on μ and ν is a Gaussian distribution N(x̂, V) where

x̂ = μ + Q(ν − μ),   (6)
V = V_1 − Q · V_1,   (7)

and, in (7),

Q = V_1 (V_1 + V_2)⁻¹.   (8)
From Theorem KF-1, it can be shown that the variance of the resulting estimate will decrease after merging the two estimates, i.e., the certainty of the position estimate improves with more position information. Specifically, Figure 1 depicts two measurements indicating positions μ = (1, 1) and ν = (5, 5), whose levels of uncertainty, represented by the standard deviations, are 2 and 1, respectively. Given the two measurements, a new estimate indicates the position (4.2, 4.2) with a higher certainty (σ = 0.89). The estimated position is closer to ν than to μ because the measurement ν is more certain than μ.

² A white noise is a random signal with a flat power spectral density, i.e., the signal's power spectral density has equal power in any band, at any center frequency, having a given bandwidth.
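The fusion rule of Theorem 1 specializes neatly to the scalar-variance case V = σ²·I used in the paper. The sketch below reproduces the Figure 1 example (the function name is ours, for illustration):

```python
# A minimal sketch of the KF-1 fusion rule (Theorem 1) for the scalar-variance
# case V = sigma^2 * I, reproducing the Figure 1 example.

def kf1_merge(mu, s1_sq, nu, s2_sq):
    """Merge two independent Gaussian estimates N(mu, s1_sq*I) and N(nu, s2_sq*I)."""
    q = s1_sq / (s1_sq + s2_sq)                             # Q = V1 (V1 + V2)^-1, eq. (8)
    x_hat = tuple(m + q * (n - m) for m, n in zip(mu, nu))  # eq. (6)
    v = s1_sq - q * s1_sq                                   # eq. (7)
    return x_hat, v

# mu = (1, 1) with sigma = 2 (variance 4); nu = (5, 5) with sigma = 1 (variance 1).
x_hat, v = kf1_merge((1.0, 1.0), 4.0, (5.0, 5.0), 1.0)
# x_hat = (4.2, 4.2) and sigma = sqrt(0.8) ≈ 0.89, as in Figure 1.
```

Note how the merged variance 0.8 is smaller than both input variances, matching the claim that certainty improves with more position information.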
Fig. 1. The two ellipses centered at (1,1) and (5,5) respectively represent two concurrent measurements with the radii reflecting their standard deviations, i.e., σ1 = 2 and σ2 = 1. By using KF-1, a new estimate can be obtained from merging the two pieces of information, which is shown as the dotted circle, centered at (4.2,4.2) with σ = 0.89.
Theorem KF-2 below says that, unless new information is obtained from measurements, the variance of an estimate derived purely from predictions will degrade over time because of variations in velocity.

Theorem 2 (KF-2). Let N(x_k, V_k) be the position of the object concerned at time t_k. Assuming the probabilistic mobility model described in (2), the position estimate N(x_{k+1}, V_{k+1}) at time t_{k+1} is given by

x_{k+1} = x_k + τ_k v_k,   (9)
V_{k+1} = V_k + τ_k V_w.   (10)
KF-1 and KF-2 together form the basis of our position inference method, which treats both the predictions and the measurements uniformly and minimizes the mean square error between the predicted and the measured positions. The complete position estimation algorithm is summarized as follows.

(1) Prediction update:

x̂_k⁻ = x̂_{k−1} + τ_{k−1} v_{k−1},   (11)
V_k⁻ = V_{k−1} + τ_{k−1} V_w.   (12)

(2) Measurement update:

K_k = V_k⁻ (V_k⁻ + V_{r_k})⁻¹,   (13)
x̂_k = x̂_k⁻ + K_k (z_k − x̂_k⁻),   (14)
V_k = (I − K_k) V_k⁻.   (15)
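For the scalar-variance case, equations (11)–(15) can be sketched as follows (the trajectory, noise variances, and measurement value are illustrative assumptions, not from the paper):

```python
# Sketch of the position-estimation recursion, eqs. (11)-(15), tracking one
# coordinate with scalar variances. All numeric values are illustrative.

def predict(x, v, velocity, tau, vw):
    """Prediction update, eqs. (11)-(12): variance grows by tau * Vw."""
    return x + tau * velocity, v + tau * vw

def correct(x_minus, v_minus, z, vr):
    """Measurement update, eqs. (13)-(15)."""
    k = v_minus / (v_minus + vr)      # Kalman gain, eq. (13)
    x = x_minus + k * (z - x_minus)   # eq. (14)
    v = (1.0 - k) * v_minus           # eq. (15)
    return x, v

x, v = 0.0, 0.25          # initial estimate and its variance
vw, vr = 0.04, 1.0        # velocity-noise and measurement-noise variances
x, v = predict(x, v, velocity=1.0, tau=2.0, vw=vw)  # variance: 0.25 + 0.08 = 0.33
x, v = correct(x, v, z=2.3, vr=vr)                  # variance shrinks below 0.33
```

Each prediction step inflates the variance by τ·V_w (Theorem KF-2), and each measurement step shrinks it again (Theorem KF-1), which is the mechanism the probing strategies in the next section exploit.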
4 Analysis of Probing Strategies
In this section we analyze several probing strategies. Here we assume that the cost of each probe is a constant.³ The first result, in Section 4.1, concerns a simple strategy where the probes are carried out periodically. We show that if the probes continue, the variance will eventually converge towards a fixed point independent of the initial estimate. The periodic probing, however, is not optimal in terms of probing cost. Intuitively, minimizing the probing cost implies maximizing the utility of each probe. In other words, one should always delay a probe until the variance of the estimate degrades to the threshold. The second result, in Section 4.2, confirms this intuition. When two types of positioning devices are available, it turns out that all the minimum-cost strategies are equivalent to one of two generic patterns. We then prove that the optimal strategy can be found within a relatively small solution space. Section 4.3 covers the details of the algorithm for finding the optimum.

4.1 What Happens If We Carry Out a Probe Periodically?
We will show that if the same type of probe is repeated at a regular time interval, the standard deviation of the positioning estimates will converge to a finite value. In order to prove this result, we first introduce Lemma 3 below.

Lemma 3. Let f(x) = a / √(1 + a²/(x² + b²)), where a, b, x ≥ 0. Then the function y = f(x) has the following properties:
1. y = f(x) is monotonically increasing, and ab/√(a² + b²) ≤ f(x) < a.
2. There exists a fixed point x* = √((−b² + b·√(b² + 4a²))/2) such that f(x*) = x*.
3. Let f^{n+1}(x) = f(f^n(x)), f¹(x) = f(x). For any x ≥ 0, f^n(x) → x* as n → ∞.

Proof. The result can be obtained by calculating the first- and second-order derivatives of f(x).

Figure 2 shows the function f(x) mentioned in Lemma 3. Notice that f(√3) = √3 in this example. Let σ denote the threshold value of the standard deviation required for the journey. Theorem 4 below determines the value to which the estimates converge if the measurements are carried out at regular intervals.

Theorem 4. Let x_k ∼ N(·, σ_k²) be a discrete Kalman filter process as defined in Section 3. Assume that only one type of probe (σ_r) is carried out at the same

³ The energy consumption of a measurement is intricately related to many device-dependent factors such as the types of circuitry, memory access patterns, and computing capabilities. Here we simply assume that a probe entails a constant amount of battery energy.
Fig. 2. The curve of the function f(x) = a / √(1 + a²/(x² + b²)) with a = 2, b = 3. Notice that x* = √3 and f(√3) = √3
time interval (i.e., τ = t_k − t_{k−1}). Then for any initial estimate with standard deviation σ_0, the standard deviation of the estimate will converge to a finite value:

lim_{k→∞} σ_k = σ*(τ, σ_r, σ_w).

Proof. Consider x_k ∼ N(·, σ_k²), the estimate at time t_k, and x_{k+1} ∼ N(·, σ²_{k+1}), the estimate at time t_{k+1}. The estimate x_{k+1} is calculated based on its immediate previous estimate x_k and the measurement z ∼ N(·, σ²_{r_{k+1}}) carried out at time t_{k+1}. Let τ_k = t_{k+1} − t_k. According to the prediction update equations (11) and (12), the standard deviation σ_t updated periodically is given by

σ_t² = σ_k² + τ_k σ_w².

Combining the latest estimate with the current measurement z, we have

1/σ²_{k+1} = 1/σ_t² + 1/σ²_{r_{k+1}},

i.e.,

σ_{k+1} = σ_{r_{k+1}} / √(1 + σ²_{r_{k+1}} / (σ_k² + σ_w² τ_k)).   (16)

By assumption, σ_{r_k} ≡ σ_r and τ_k ≡ τ. Let a = σ_r and b = σ_w √τ. Then σ_{k+1} = f(σ_k), where the function f(·) is as defined in Lemma 3. The result is immediately obtained by applying Lemma 3, which completes the proof.

Corollary 5. Let σ_k, σ_{r_k}, τ_k, and σ_w be as defined in Theorem 4. Either (1) σ_{r_{k+1}} ≤ σ_k, or (2) τ_k ≤ σ_k⁴ / (σ_w² (σ²_{r_{k+1}} − σ_k²)) is a sufficient condition for σ_{k+1} ≤ σ_k.
Proof. Let σ_{k+1} ≤ σ_k in equation (16) in the proof of Theorem 4. We obtain the sufficient condition

σ_w² (t_{k+1} − t_k) (σ²_{r_{k+1}} − σ_k²) ≤ σ_k⁴.   (17)

If σ_{r_{k+1}} ≤ σ_k, the left-hand side of inequality (17) is less than or equal to zero. Otherwise, condition (2) of Corollary 5 implies inequality (17) above. This completes the proof.

Remarks. Corollary 5 reflects two requirements on the new probe. Firstly, when the new probe itself has sufficiently low variance (i.e., σ_{r_{k+1}} ≤ σ_k), the standard deviation of the estimate at the next time step will not increase. Secondly, if the new probe does not have sufficiently low variance, then the time elapsed before the new probe must be the shorter period determined by Corollary 5. It also suggests a strategy for choosing a proper frequency at which to carry out new probes periodically on a journey.

Corollary 6. Let σ_k, σ_w, σ_r = σ_{r_k}, and σ be as defined in Corollary 5. If probes are carried out at regular time intervals, then the probing frequency must be at least f = σ_w² (σ_r² − σ²) / σ⁴ such that σ_k ≤ σ.

Proof. Let τ_k = 1/f and σ_k = σ. From Corollary 5 we have σ_{k+1} ≤ σ_k ≤ σ. This implies that all the inferred standard deviations stay within the threshold, which completes the proof.

Corollary 6 implies that, using the periodic probing strategy, the ratio of the frequencies of type-A probes to type-B probes should be proportional to (σ_a² − σ²)/(σ_b² − σ²). Figure 3 shows the standard deviation of the estimate increasing over time. Without new measurements, the standard deviation will increase beyond the threshold. A probe serves to bring down the standard deviation. With periodically carried out probes, the uncertainty of the position estimates can be controlled within the required threshold.
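The convergence claimed in Theorem 4 is easy to check numerically. A minimal sketch, assuming the Figure 2 values a = σ_r = 2 and b = σ_w√τ = 3 (the starting values are arbitrary):

```python
import math

# Numerical check of Theorem 4 / Lemma 3: iterating sigma_{k+1} = f(sigma_k)
# with a = 2, b = 3 converges to the fixed point x* = sqrt(3) from any start.

def f(x, a=2.0, b=3.0):
    return a / math.sqrt(1.0 + a * a / (x * x + b * b))

# Closed form from Lemma 3: x* = sqrt((-b^2 + b*sqrt(b^2 + 4a^2)) / 2).
x_star = math.sqrt((-9.0 + 3.0 * math.sqrt(9.0 + 16.0)) / 2.0)

for sigma0 in (0.1, 1.0, 10.0):
    s = sigma0
    for _ in range(100):
        s = f(s)
    assert abs(s - x_star) < 1e-9  # converged, independent of sigma0
# Here x_star equals sqrt(3), matching Figure 2.
```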
Fig. 3. The curve of the estimated σ over time t. Whenever the uncertainty of the estimate increases to the threshold, a probe can bring it down.
4.2 When Is It Best to Carry Out a Probe?
For convenience, we call a time duration safe if, within the period, the variance of any calculated positioning estimate stays below the required threshold value. First, we determine the longest safe duration attainable with a given number of probes. Let σ_0 be the standard deviation of the initial estimate at t = 0, where σ_0 ≤ σ. Let τ be the length of the safe duration without new probes, i.e., before the standard deviation of any prediction reaches the threshold σ.

Lemma 7. Using only one probe σ_r at a point of time t ∈ [0, τ], the safe duration is longest only when the probe is carried out at the point t = τ.

Proof. Using only the prediction update, at time t the standard deviation is given by

σ_1 = √(σ_0² + σ_w² t).

Combining σ_1 with a new probe σ_r by KF-1, we obtain a new estimate with standard deviation σ_2 given by

σ_2² = σ_1² σ_r² / (σ_1² + σ_r²).

With the new estimate, it is possible to use prediction updates for another duration t_2 before the estimated standard deviation degrades to the threshold σ, or, formally,

σ² = σ_2² + σ_w² t_2.

So the total duration after inserting a new probe is

t + t_2 = σ²/σ_w² − (σ_r²/σ_w²) / (1 + σ_r²/(σ_0² + σ_w² t)) + t,

which is a function of t. Calculating its derivative with respect to t shows that the overall duration monotonically increases with t, and the maximum value is attained at the point t = τ. This completes the proof.

Remarks. Lemma 7 states that the longest safe duration is achieved by probing exactly at the end of a safe duration, when the variance of the estimate based on prediction updates reaches the threshold.

Corollary 8. Given the same initial estimate, the safe duration that can be sustained via two simultaneous probes is shorter than that of two serial probes.

Proof. According to Lemma 7, the maximum valid duration is achieved only when the new probe is carried out at the end of the first probe's safe duration.

Corollary 8 implies that, to maximize the utility of a given number of probes, carrying out x probes sequentially is more effective than carrying out the x probes simultaneously. In other words, the probes should be carried out one at a time, each time exactly at the point when the variance of a prediction based on the previous estimate is about to exceed the threshold.
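The monotonicity argument in the proof of Lemma 7 can also be checked numerically. A minimal sketch with illustrative parameter values (σ = 1, σ_0 = 0.5, σ_w = 0.2, σ_r = 0.8 are assumptions, not from the paper):

```python
# Numerical check of Lemma 7: the total safe duration t + t2 achievable with
# one probe at time t is monotonically increasing in t, so the probe should
# be delayed until t = tau. Parameter values are illustrative.

def total_duration(t, sigma=1.0, sigma0=0.5, sigma_w=0.2, sigma_r=0.8):
    vw, vr = sigma_w ** 2, sigma_r ** 2
    # t + t2 = sigma^2/vw - (vr/vw) / (1 + vr/(sigma0^2 + vw*t)) + t
    return sigma ** 2 / vw - (vr / vw) / (1.0 + vr / (sigma0 ** 2 + vw * t)) + t

# tau: time until the initial estimate degrades to the threshold,
# i.e. sigma0^2 + vw*tau = sigma^2.
tau = (1.0 ** 2 - 0.5 ** 2) / 0.2 ** 2   # = 18.75 for these values
samples = [total_duration(t) for t in (0.0, 0.25 * tau, 0.5 * tau, tau)]
assert all(a < b for a, b in zip(samples, samples[1:]))  # increasing in t
```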
4.3 What If Two Types of Positioning Devices Are Available?
From the foregoing discussions, we can determine the duration before the next probe should be carried out. We now present a minimum-cost variance-maintaining strategy with two types of devices. Let σ be the required threshold level of the positional standard deviation. Let σ_a (resp. σ_b) denote the standard deviation of an estimate from a measurement offered by a type-A (resp. type-B) device, where σ_a < σ_b < σ. Let c and 1 denote the costs of each type-A and type-B probe respectively, where c > 1. Given a duration of time T, we will find the minimum-cost probing sequence by determining the numbers of type-A and type-B probes required over T.

Lemma 9. Any probing strategy with the minimum cost is equivalent to a probing sequence in one of the following two forms: A...AB...B, or B...BA...A.

Proof. Each of the probes in a min-cost probing sequence must extend a maximum safe duration, i.e., a probe will be carried out exactly at the point when the variance of a predictive estimate reaches the threshold. Assume that a total of d probes is required. Then the first d − 1 probes can be permuted in any order, because the overall safe duration is still the same. This completes the proof.

Let τ_a denote the duration that one type-A probe can sustain until the standard deviation again degrades to the threshold σ, and, correspondingly, τ_b for type-B. Let N_1 = ⌈T/τ_a⌉ and N_2 = ⌈T/τ_b⌉.

Lemma 10. A minimum-cost probing sequence can be obtained by comparing at most N_1 + N_2 candidate probing sequences.

Proof. Suppose that the best strategy is in the form A...AB...B in which there are n_1 As and n_2 Bs. Obviously, 0 ≤ n_1 ≤ N_1 and 0 ≤ n_2 ≤ N_2. In fact, given a value of n_1, n_2 is given by n_2 = ⌈(T − n_1 τ_a)/τ_b⌉. So the total cost is

C(n_1) = c·n_1 + n_2 = c·n_1 + ⌈(T − n_1 τ_a)/τ_b⌉.

The min-cost sequence (n_1*, n_2*) is the one with minimal cost, i.e., C(n_1*) = min_{n_1} C(n_1), which can be determined by evaluating each value of n_1, where 0 ≤ n_1 ≤ N_1. A min-cost probing strategy in the form B...BA...A can be determined by examining N_2 cost values. A min-cost sequence can thus be determined by comparing a total of N_1 + N_2 sequences, as claimed.
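The enumeration in Lemma 10 is straightforward to implement. A sketch, assuming (as in the text) that the journey of length T is covered by probes whose safe durations are τ_a and τ_b:

```python
import math

# Sketch of the enumeration in Lemma 10: try every number n1 of type-A probes
# (form A...AB...B) and fill the rest of the journey with type-B probes.

def min_cost(c, tau_a, tau_b, T):
    best = None
    for n1 in range(math.ceil(T / tau_a) + 1):
        # n2 = ceil((T - n1 * tau_a) / tau_b), never negative
        n2 = max(0, math.ceil((T - n1 * tau_a) / tau_b))
        cost = c * n1 + n2
        if best is None or cost < best[0]:
            best = (cost, n1, n2)
    return best

# The worked example from the text: c = 2.4, tau_a = 2.2, tau_b = 1.0, T = 15.3.
cost, n1, n2 = min_cost(2.4, 2.2, 1.0, 15.3)
# cost = 15.8 with n1 = 2 type-A probes and n2 = 11 type-B probes (AAB...B)
```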
For example, let c = 2.4, τ_a = 2.2, τ_b = 1.0, and T = 15.3. Then, by calculating the cost for each pair (n_1, n_2), the minimum cost 15.8 is achieved when n_1 = 2, i.e., via the strategy AAB...(11 Bs)...B.

Theorem 11. Let σ denote the standard deviation of the initial estimate. The minimum-cost strategy can be determined by comparing at most min(N_1, N_2) candidate strategies.

Proof. Let n_1, n_2 denote the numbers of type-A and type-B probes respectively. Then the total cost is c·n_1 + n_2. We will prove that the minimum cost can be achieved by strategies in both forms claimed in Lemma 9.

Case 1: The optimal probing sequence is in the form A...AB...B. In this case, we can check the pairs (n_1, n_2), where n_1 = 0, 1, ..., N_1 and n_2 = ⌈(T − n_1 τ_a)/τ_b⌉, to obtain the best strategy (i_1*, j_1*). Let C_1* denote the resulting minimum cost.

Case 2: The optimal probing sequence is in the form B...BA...A. In this case, we can check the pairs (n_1, n_2), where n_2 = 0, 1, ..., N_2 and n_1 = ⌈(T − n_2 τ_b)/τ_a⌉, to obtain the best strategy (i_2*, j_2*). Let C_2* denote the resulting minimum cost.

We will show that C_1* = C_2*. From case 1 we have

j_1* = ⌈(T − i_1* τ_a)/τ_b⌉,

which implies that

(T − τ_a i_1*)/τ_b ≤ j_1* < (T − τ_a i_1*)/τ_b + 1.

This can be rewritten as

i_1* − τ_b/τ_a < (T − τ_b j_1*)/τ_a ≤ i_1*,

which implies that

⌈(T − τ_b j_1*)/τ_a⌉ ≤ i_1*.

Let î = ⌈(T − j_1* τ_b)/τ_a⌉ and ĵ = ⌈(T − τ_a î)/τ_b⌉. Then

î ≤ i_1*.

The strategy (î, ĵ) is one in the form of case 1. Let C_1(î, ĵ) denote its cost. Then C_1(î, ĵ) ≤ C_1*. From the definition of î, we have τ_a î + τ_b j_1* ≥ T, which results in

j_1* ≥ (T − τ_a î)/τ_b,

hence j_1* ≥ ĵ. As a result, it must be true that î = i_1*. Otherwise, if î < i_1*, then we have

C_1(î, ĵ) < C_1*,
Controlling Uncertainty in Personal Positioning
479
which means that, in Case 1, (i1*, j1*) would not be the minimum-cost strategy. So we have proved î = i1*, which implies that (i1*, j1*) = (î, j1*) is a feasible strategy for Case 2, resulting in C1* ≥ C2*. By a similar argument, we have C2* ≥ C1*. We conclude that C1* = C2*. This equality shows that the minimum-cost strategy can be found in either case, which completes the proof. Remarks. Given a journey and two specific types of probes, our method can easily determine a minimum-cost probing sequence. Because our model of the user allows for uncertainty in his movements, a probe will generally yield a position measurement that differs from the prediction. When this occurs, we may adaptively recompute the best probing plan based on the latest estimate.
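The enumeration behind Lemma 10 and Theorem 11 can be sketched in a few lines of Python (our own illustration; the function name is ours, and only the A...AB...B form is enumerated, which by Theorem 11 already attains the overall minimum):

```python
import math

def min_cost_sequence(c, tau_a, tau_b, T):
    """Enumerate A...AB...B strategies: n1 type-A probes followed by just
    enough type-B probes to cover the remaining time, n2 = ceil((T - n1*tau_a)/tau_b)."""
    best_cost, best = float("inf"), None
    n1_max = math.ceil(T / tau_a)                      # N1 candidates for n1
    for n1 in range(n1_max + 1):
        n2 = max(0, math.ceil((T - n1 * tau_a) / tau_b))
        cost = c * n1 + n2
        if cost < best_cost:
            best_cost, best = cost, (n1, n2)
    return best_cost, best

# The paper's worked example: c = 2.4, tau_a = 2.2, tau_b = 1.0, T = 15.3
cost, (n1, n2) = min_cost_sequence(2.4, 2.2, 1.0, 15.3)
print(round(cost, 2), n1, n2)  # 15.8 2 11, i.e. the strategy AAB...(11 Bs)...B
```

This reproduces the minimum cost 15.8 at n1 = 2 from the example above.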
5 Related Work
Basically, personal positioning involves two levels of understanding of a mobile user's location. One is geographical positioning, which provides accurate position information such as GPS latitude-longitude. The other is logical positioning, which identifies meaningful landmarks (i.e., significant locations [1]), cell IDs, IP addresses, etc. Among the methods that support positioning services, GPS is probably the most mature one and can attain the location within 3–5 meter resolution [5]. However, GPS does not work well in indoor or urban environments. As an alternative, cell-based mobile positioning is becoming popular [4,14,11]. When in the coverage area of a cell tower, a mobile phone can receive/send information including the cell identification. Consequently, a mobile user's location may also be inferred from cell probes, though at much lower resolution. When heterogeneous position measurements are available, data fusion techniques are applied in order to obtain a good location inference. In [10], the concept of a hybrid predictor is proposed in order to use different methods in parallel to improve the estimates. The authors use prediction models based on neural networks, a Bayesian model, and a Markov model, and apply three hybrid predictors: the warm-up predictor, the majority predictor, and the confidence predictor. In their experiments, the hybrid predictors show better prediction performance than the average of the individual base methods. However, their approach treats the positioning data obtained from different sources as mathematically incoherent entities. Another widely used technique to combine multiple measurements is Kalman filtering [6]. Specifically, [9] discusses the
application of Kalman filtering to robot localization problems. The robot's location is estimated by using several sensors that output location measurements. For personal positioning, [7] proposes a method that integrates image data from portable sensors by using a Kalman filter.
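The measurement-update step that such filters rely on can be illustrated in the scalar case (a standard textbook formula, shown here as our own sketch rather than the specific filters of [7,9]):

```python
def kalman_update(x_pred, var_pred, z, var_meas):
    """Scalar Kalman measurement update: fuse a predicted position x_pred
    (variance var_pred) with a new measurement z (variance var_meas).
    The fused variance is always below both input variances."""
    k = var_pred / (var_pred + var_meas)   # Kalman gain
    x_new = x_pred + k * (z - x_pred)      # pull the prediction toward z
    var_new = (1.0 - k) * var_pred         # variance shrinks after the probe
    return x_new, var_new

# Prediction has drifted to variance 9.0; a probe measures z = 12.0 with variance 4.0
x, v = kalman_update(x_pred=10.0, var_pred=9.0, z=12.0, var_meas=4.0)
```

Here `v` drops to 36/13 ≈ 2.77, below both 9.0 and 4.0, which is exactly why a probe resets the variance below the threshold in the strategies above.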
6 Conclusions
We have presented a positioning inference method that can uniformly combine multiple sources of positioning information for personal positioning. We have also presented strategies for minimizing the probing cost while controlling the variance of positioning estimates over a journey. We envision that the method presented will be useful for personal positioning, and hence it will be of great interest to evaluate it on real devices. A natural extension of our current result is to derive a power-efficient probing strategy for multiple types of probing devices and to verify the proposed algorithms on real devices.
References 1. Ashbrook, D., Starner, T.: Learning significant locations and predicting user movement with GPS. In: International Symposium on Wearable Computing, pp. 101–108 (2002) 2. Bhattacharya, A., Das, S.K.: LeZi-update: An information-theoretic approach to track mobile users in PCS networks. In: Mobile Computing and Networking, pp. 1–12 (1999) 3. Chung, K.L.: Elementary Probability Theory with Stochastic Processes. Springer-Verlag, New York (1974) 4. D'Roza, T., Bilchev, G.: An overview of location-based services. BT Technology Journal 21(1), 20–27 (2003) 5. Hoffmann-Wellenhof, B., Lichtenegger, H., Collins, J.: GPS: Theory and Practice, 3rd edn. Springer, New York (1994) 6. Kalman, R.E.: A new approach to linear filtering and prediction problems. Transactions of the ASME – Journal of Basic Engineering 82, 35–45 (1960) 7. Kourogi, M., Kurata, T.: Personal positioning based on walking locomotion analysis with self-contained sensors and a wearable camera. In: Proc. of IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, MFI 2003, pp. 287–292 (July 2003) 8. Manesis, T., Avouris, N.: Survey of position location techniques in mobile systems. In: Proceedings of the 7th International Conference on Human Computer Interaction with Mobile Devices and Services, Salzburg, Austria, vol. 111, pp. 291–294 (2005); ISBN 1-59593-089-2 9. Negenborn, R.: Robot localization and Kalman filters – on finding your position in a noisy world. Master's thesis, Institute of Information and Computing Sciences, Utrecht University (September 2003), http://www.negenborn.net/kal loc/thesis.pdf 10. Petzold, J., Bagci, F., Trumler, W., Ungerer, T.: Hybrid predictors for next location prediction. In: Ma, J., Jin, H., Yang, L.T., Tsai, J.J.-P. (eds.) UIC 2006. LNCS, vol. 4159, pp. 125–134. Springer, Heidelberg (2006)
11. Roy, A., Das, S., Basu, K.: A predictive framework for location-aware resource management in smart homes. IEEE Transactions on Mobile Computing 6(11), 1270–1283 (2007) 12. Rudolph, L., Fang, H., Hsu, W.J.: Adaptive learning/prediction of mobile user's location via historical records of GPS and cell ID information. Technical Report NTUTR-07-01, Singapore-MIT Alliance of Nanyang Technological University, Singapore (June 2007), http://www.mit.edu/~fanghui/loc/kfpredict.pdf 13. Salcic, Z., Chan, E.: Mobile station positioning using GSM cellular phone and artificial neural networks. Wireless Personal Communications 14(3), 235–254 (2000) 14. Trevisani, E., Vitaletti, A.: Cell-ID location technique, limits and benefits: An experimental study. In: Proceedings of the Sixth IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 2004), pp. 51–60 (2004) 15. Welch, G., Bishop, G.: An introduction to the Kalman filter, http://www.cs.unc.edu/~welch
RFID System Security Using Identity-Based Cryptography Yan Liang and Chunming Rong Department of Electrical Engineering and Computer Science, University of Stavanger, Norway {yan.liang, chunming.rong}@uis.no
Abstract. RFID was first proposed as a technology for the automatic identification of objects. However, some recent RFID devices can provide additional information and can be used in other applications. Security is essential in most of these applications, and an important research challenge is to provide efficient protection for RFID systems. Identity-based cryptography provides encryption and digital signatures, and offers more convenient key management than a traditional certificate-based PKI solution. In this paper, a security management solution for RFID systems is proposed using identity-based cryptography.
1 Introduction
Radio frequency identification (RFID) is a technology that uses radio waves for the automatic identification of objects. An ordinary RFID system includes two parts: an RFID tag attached to the object and a reader that reads the information from the tag. An RFID tag includes an integrated circuit that contains the information about the object and an antenna that receives signals from the reader and transmits information to it. With the advantages of automatic identification and the rapid price reduction of RFID tags and readers, RFID technology has recently become widely used in daily life, such as in transportation payments, people and animal identification, and vehicle access control. More and more large companies and organizations have begun to use RFID technology to help manage their production and transportation and to reduce cost. As described in [9], there are two security problems in RFID systems: privacy and authentication. One kind of attack on RFID privacy is physical tracking, since a reader can sense the existence of a tag while the owner of the tag cannot detect the existence of the reader. In addition, since RFID tags can carry private information about the objects, an attacker can obtain this information illegally by physically attacking the tag or by attacking the message-transmission protocol used between the tag and the reader. It is a serious problem if this kind of private information is divulged by the attacker. In some RFID applications, in order to reduce the counterfeiting threat, the tag or reader needs to authenticate that its counterpart really is the one it should exchange messages with. F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 482–489, 2008. © Springer-Verlag Berlin Heidelberg 2008
RFID System Security Using Identity-Based Cryptography
483
Identity-based cryptography [13] is a kind of public-key cryptography [6]. The main difference from traditional public-key cryptography is that it uses public identity information, such as a person's name or address, as the user's public key, while the PKG (private key generator) generates the corresponding private key for the user. Using identity-based cryptography in an RFID system can protect the privacy of the system and provide authentication for both the tags and the reader. At the same time, it can reduce the resource requirements of the system and facilitate key management. The remainder of this paper is organized as follows. In Section 2, we introduce different kinds of RFID tags and the security problems related to them. In Section 3, we describe the principles of identity-based cryptography. In Section 4, we describe how to use identity-based cryptography in an RFID system to provide security. Section 5 concludes the paper.
2 Security in RFID Networks
In recent years, as more and more RFID products are used in daily life, security and privacy in RFID networks have become an important problem. The most important security concern in an RFID network is the protection of the RFID tags. If the information in the tags is leaked, it can cause problems both for users and for companies. Ari Juels gives a survey of recent research on RFID security and privacy in [9]. When designing security for RFID networks, the cost and power consumption of RFID tags must be taken into account, given the vast number of tags. RFID tags with limited resources can only implement weak cryptographic algorithms and thus cannot resist sophisticated attacks, while RFID tags based on powerful cryptographic algorithms mean higher cost and more power consumption. From the perspective of security, RFID tags can be classified into three kinds: basic RFID tags, which cannot perform any cryptographic mechanism; symmetric-key based RFID tags, which can perform symmetric-key cryptography; and public-key based RFID tags, which can support traditional public-key cryptographic operations. Since basic RFID tags cannot provide any type of cryptography, they pose a big challenge for protecting privacy and providing authentication in the system. In [7], [8], [10], some approaches have been proposed to protect the privacy of such RFID tags and provide a certain kind of authentication to resist counterfeiting, but each of these approaches is either easy to attack or brings some inconvenience to consumers. RFID tags that use symmetric keys can increase the difficulty of tag cloning by providing efficient authentication methods and can provide better privacy protection through data encryption. But as shown in [14], tags equipped with symmetric-key cryptography can also be successfully cloned with little effort and RF expertise.
Key management is another problem in symmetric-key based RFID systems. In a symmetric-key system, if a pair of users wants to exchange messages securely, they must share the same secret key to encrypt and decrypt the messages; thus every pair of users must share a different secret key. In an RFID system using symmetric-key cryptography, the reader needs to
484
Y. Liang and C. Rong
keep all the secret keys of the different tags; in a big system with a large number of tags, the memory requirement on the reader will be high. In addition, if the RFID reader in the system is compromised by an attacker, all the secret keys stored in the reader and the information of the tags will be divulged. Another problem with a symmetric-key system is that, since the reader and tags hold the same secret key, it cannot provide a digital-signature service for authentication. The symmetric-key approach is appropriate for a small system, but in an RFID system with a very large number of tags, it is difficult to manage security this way. Recently, some approaches using traditional public-key systems to protect RFID tags have been studied in [1], [4]. In a public-key system, every user has a pair of related keys: one public key and one private key. One key is used for encryption and the other for decryption. Besides encryption and decryption, a public-key scheme can provide a digital-signature function: the sender signs the message with its private key, and the receiver uses the sender's public key to verify the signature. A public-key scheme can also be used for key exchange, which brings some convenience to key management. Another advantage of public-key cryptography in RFID systems is that, because only the user itself knows its private key, if the reader of the RFID system is compromised, the attacker cannot obtain the private-key information of the tags, so the damage is easier to recover from. Although a public-key system can provide stronger security and more efficient management for RFID applications, its high cost has limited its implementation to the high-end market; recently, however, with the quick development of new technologies, public-key security algorithms can be implemented in RFID networks at a reasonable price [15].
3 Identity-Based Cryptography and Signature
Identity-based cryptography and signature schemes were first proposed by Shamir [13] in 1984, but it was not until 2001 that efficient identity-based encryption schemes were developed by Boneh and Franklin [3] and by Cocks [5]. The Boneh–Franklin scheme is based on bilinear pairings on elliptic curves, Cocks' scheme on quadratic residues, and both have provable security. An identity-based cryptographic scheme is a public-key based approach that two parties can use to exchange messages and effectively verify each other's signatures. Unlike traditional public-key systems, which use random strings as public keys, identity-based cryptography uses an identity that uniquely identifies the user, such as a name or address, as the public key for encryption and signature verification. By using identity-based cryptography, the system complexity can be greatly reduced, since two users do not need to exchange public or private keys and no key directory is needed. Another advantage of identity-based encryption is that after all users in a system have been issued keys by the trusted key generation center, the key generation center can be removed. As mentioned above, identity-based encryption and signature schemes are based on public-key cryptography. Instead of a key pair being generated by the user itself, every user uses its identity as its public key, and only the trusted third party named the PKG, rather than the user, generates the corresponding
private key. If a user, rather than the PKG, could derive the private key corresponding to its own public key, it could also derive other users' private keys from their public keys, and thus every user in the system could learn the others' private keys. In the identity-based cryptography approach, the PKG first creates a "master" public key and a corresponding "master" private key, and then sends the "master" public key to all interested users. If one user wants to send a message securely to another user, the receiver uses its ID to authenticate itself to the PKG and obtains its own private key generated by the PKG. The sender uses the "master" public key and the receiver's ID to generate the receiver's public key and uses this key to encrypt the message. After receiving the encrypted message, the receiver uses its private key to decrypt it. If a user wants to sign a message with its private key, it uses its ID to authenticate itself to the PKG and obtain its private key; the sender then uses this private key to generate a signature, attaches it to the message, and sends both to the receiver. After receiving the message and the signature, the receiver uses the public key generated from the "master" public key and the ID of the sender to verify the signature. Along with its advantages, identity-based cryptography also has some inherent weaknesses [2]. One is the key escrow problem: since all users' private keys are generated by the PKG, the PKG can decrypt any user's messages and create any user's digital signature without authorization, which means the PKG must be highly trusted. The identity-based scheme is therefore only appropriate for a closed group of users, such as a big company, because only in such a situation can a PKG that every user trusts be set up.
Another drawback of identity-based cryptography is the revocation problem. Because all users in the system use unique identifiers as their public keys, if a user's private key is compromised, the user needs to change its public key. If the public key is the user's name, address, or email address, it is inconvenient for the user to change it. One solution to this problem is to append a time period to the identifier used as the public key [3], but this does not solve the problem completely.
4 Identity-Based Cryptography Used in RFID Networks
To use identity-based cryptography in an RFID system, since both the RFID tags and the reader have identities, it is convenient for them to use their own identities to generate their public keys. An RFID system based on identity-based cryptography should be set up with the help of a PKG. We assume that before a reader or tag enters the system, it has been allocated a unique identity stored in its memory. The process of key generation and distribution in an RFID system using identity-based cryptography is shown in Fig. 1 and proceeds as follows:
(1) The PKG generates a "master" public key PUpkg and a related "master" private key PRpkg and saves them in its memory.
(2) The RFID reader authenticates itself to the PKG with its identity IDre.
(3) If the reader passes the authentication, the PKG generates a unique private key PRre for the reader and sends this private key together with PUpkg to the reader.
(4) When an RFID tag enters the system, it authenticates itself to the PKG with its identity IDta.
(5) If the tag passes the authentication, the PKG generates a unique private key PRta for the tag and sends PRta together with PUpkg and the identity of the reader IDre to the tag.
Fig. 1. Key generation and distribution
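A schematic sketch of this key-distribution flow follows. This is our own illustration: the class and method names are hypothetical, and an HMAC of the identity under the master key merely stands in for the pairing-based key extraction of a real IBE scheme such as Boneh–Franklin.

```python
import hashlib
import hmac
import os

class PKG:
    """Toy private-key generator illustrating only the message flow of
    steps (1)-(5); the key derivation is NOT real identity-based crypto."""

    def __init__(self):
        # Step (1): create the "master" key pair (toy stand-ins)
        self.master_private = os.urandom(32)
        self.master_public = hashlib.sha256(self.master_private).digest()

    def extract(self, identity: str, credential_ok: bool) -> bytes:
        # Steps (2)/(4): the device must first authenticate itself
        if not credential_ok:
            raise PermissionError("authentication to the PKG failed")
        # Steps (3)/(5): issue a private key bound to the identity string
        return hmac.new(self.master_private, identity.encode(),
                        hashlib.sha256).digest()

pkg = PKG()
pr_reader = pkg.extract("IDre", credential_ok=True)  # reader's private key PRre
pr_tag = pkg.extract("IDta", credential_ok=True)     # tag's private key PRta
# Any party holding master_public can derive a public key from an identity
# string; in a real IBE scheme that public key matches the PKG-issued private key.
```

The point of the sketch is that key issuance is driven entirely by identity strings, so no key directory is ever built.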
After this process, the reader knows its private key PRre and can use PUpkg and its identity to generate its public key; every tag that has entered the system knows its own private key and can generate both its own public key and the public key of the reader. If an RFID tag is required to transmit a message to the reader securely, then since the tag can generate the reader's public key PUre, it can use PUre to encrypt the message and transmit the ciphertext to the reader. As shown in Fig. 2, after receiving the message from the tag, the reader uses its private key PRre to decrypt the message; since only the reader knows PRre, the security of the message is protected. Fig. 3 illustrates the scheme for the reader to create its digital signature and have it verified. First, the reader applies a hash function to the message to generate a hash code, then uses its private key PRre to encrypt this hash code, producing the digital signature, attaches it to the original message, and sends both the digital signature and the message to the tag. After receiving them, the RFID tag uses the reader's public key PUre to decrypt the digital signature and recover the hash code. By comparing this hash code with the hash code generated from the message, the tag verifies the digital signature. Fig. 4 illustrates the scheme for the RFID tag to create its digital signature and have it verified. Since in an RFID system the reader cannot know the identity of a tag before reading it, it cannot generate the tag's public key in advance, so the general protocol used in identity-based networks cannot be applied directly. In our approach, the tag first uses its identity and its private key PRta to generate a digital signature.
When the tag needs to authenticate itself to the reader, it appends this digital signature to its identity, encrypts them with the reader's public key PUre, and sends the result to the reader; only the reader can decrypt this ciphertext and obtain the identity of the tag and the digital signature. Using the tag's identity, the reader can generate the tag's public key PUta and then use this public key to verify the digital signature.
Fig. 2. Message encryption
Fig. 3. Digital signature from reader
Fig. 4. Digital signature from tag
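The hash-sign-verify flow of Fig. 3 can be sketched with textbook RSA standing in for the identity-based primitives (our own toy illustration with a deliberately tiny key; the function names are ours, and real systems use full-size keys):

```python
import hashlib

# Toy textbook-RSA key (p=61, q=53, n=3233, e=17, d=2753); (n, e) plays the
# role of the reader's public key PUre and d the role of PRre.
n, e, d = 3233, 17, 2753

def hash_code(msg: bytes) -> int:
    """Hash the message and reduce it into the signing domain [0, n)."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def reader_sign(msg: bytes) -> int:
    # Fig. 3, reader side: encrypt the hash code with the private key d
    return pow(hash_code(msg), d, n)

def tag_verify(msg: bytes, sig: int) -> bool:
    # Fig. 3, tag side: decrypt the signature with the public key e and
    # compare the recovered hash code with one computed from the message
    return pow(sig, e, n) == hash_code(msg)

sig = reader_sign(b"hello tag")
print(tag_verify(b"hello tag", sig))  # True
```

The Fig. 4 flow adds one step on top of this: the tag encrypts its identity together with its signature under PUre before transmission, so only the reader learns which tag is responding.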
As mentioned above, there are mainly three kinds of RFID tags. Since basic RFID tags cannot support any kind of cryptography and cannot provide good security for the system, here we only compare RFID tags using identity-based cryptography with tags based on symmetric-key and traditional public-key cryptography. The most important problem for the symmetric-key approach in RFID systems is key management. The reader needs a large memory to store the secret keys of all the tags in the system for message encryption and decryption. Also, when the RFID reader receives a message from a tag, it cannot know which tag the message came from and therefore cannot know which key to use to decrypt the message; it must search through all the keys until it finds the right one. Although
some approaches [11], [12] have been proposed to reduce the cost of key searching, the key search remains a waste of time and resources. In an RFID system using identity-based cryptography, since every tag can use the public key of the reader to generate ciphertext that can be decrypted with the reader's private key, the reader does not need to know the keys of the tags; all it needs to keep is its own private key. In some RFID applications such as e-passports and visas, tag authentication is required. The symmetric-key approach, however, cannot provide digital signatures for the tags to authenticate themselves to the RFID reader. Using the identity-based scheme, the tags can generate digital signatures with their private keys and store them in the tags. When they need to authenticate themselves to the RFID reader, they transmit these digital signatures to the reader, and the reader verifies them using the public keys of the tags. For RFID systems that use traditional public-key cryptography, the system authority must store all the keys and keep a key directory for certification. In our proposed identity-based RFID system, since the identities of the tags and reader are used to generate public keys, the PKG does not need to keep a key directory, which reduces the resource requirements. Another advantage of using identity-based cryptography in RFID systems is that the reader does not need to know the public keys of the tags in advance. If the reader wants to verify the digital signature of an RFID tag, it can read the identity of the tag and use the public key generated from that identity to verify the signature. In traditional public-key systems, by contrast, if one client wants to know another client's public key, it must search the key directory to find it. An inherent weakness of identity-based cryptography is the key escrow problem; to avoid this problem, a highly trusted PKG is required.
In an RFID system using identity-based cryptography, since all the devices can belong to one company or organization, the PKG can be highly trusted and well protected, and the chance of key escrow abuse is reduced. As for the key revocation problem of identity-based cryptography: unlike other identity-based systems, which use a user's name or address, which are difficult to change, as the public key, in an RFID system the identity of the tag is used to generate the public key. If the private key of a tag is compromised, the system can allocate a new identity and create a new private key for the tag effortlessly.
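To make the key-search overhead of the symmetric-key approach discussed above concrete, the following sketch (our own illustration; names and setup are hypothetical) shows a reader trial-verifying a tag response against every stored key:

```python
import hashlib
import hmac
import os

# Hypothetical deployment: each tag shares a distinct secret key with the reader.
tag_keys = {f"tag{i}": os.urandom(16) for i in range(1000)}

def tag_respond(tag_id: str, challenge: bytes) -> bytes:
    """A tag answers a reader challenge with an HMAC under its secret key."""
    return hmac.new(tag_keys[tag_id], challenge, hashlib.sha256).digest()

def reader_identify(challenge: bytes, response: bytes):
    """Symmetric-key reader: the response does not reveal which tag sent it,
    so the reader must try every stored key -- O(number of tags) per read."""
    for tag_id, key in tag_keys.items():
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        if hmac.compare_digest(expected, response):
            return tag_id
    return None

challenge = os.urandom(16)
print(reader_identify(challenge, tag_respond("tag742", challenge)))  # tag742
```

In the identity-based scheme, by contrast, the reader decrypts every incoming message with its single private key and only then learns the tag's identity, so no per-tag key search is needed.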
5 Conclusion
In this paper, we introduced RFID technology and gave an overview of the security considerations in RFID systems. We also described the principles of identity-based cryptography and discussed its advantages and drawbacks. We then proposed using identity-based cryptography in RFID systems and described how the system generates and distributes keys to the reader and tags, and how the reader and tags use these keys to protect their privacy and authenticate each other. By comparing this approach with the symmetric-key based approach and the traditional public-key based approach, we showed that using identity-based cryptography in an RFID system yields benefits both in security protection and in key management.
References 1. Batina, L., Guajardo, J., Kerins, T., Mentens, N., Tuyls, P., Verbauwhede, I.: Public-Key Cryptography for RFID-Tags. In: Proceedings of the 5th International Conference on Pervasive Computing and Communications Workshops (2007) 2. Baek, J., Newmarch, J., Safavi-Naini, R., Susilo, W.: A Survey of Identity-Based Cryptography. In: Proceedings of the 10th Annual Conference for Australian Unix and Open System User Group (AUUG 2004), pp. 95–102 (2004) 3. Boneh, D., Franklin, M.: Identity-Based Encryption from the Weil Pairing. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139. Springer, Heidelberg (2001) 4. Bono, S., Green, M., et al.: Security Analysis of a Cryptographically Enabled RFID Device. In: USENIX Security Symposium – Security 2005, pp. 1–16 (2005) 5. Cocks, C.: An Identity-Based Encryption Scheme Based on Quadratic Residues. In: Proceedings of the 8th IMA International Conference on Cryptography and Coding (2001) 6. Diffie, W., Hellman, M.: New Directions in Cryptography. IEEE Transactions on Information Theory 22, 644–654 (1976) 7. Inoue, S., Yasuura, H.: RFID privacy using user-controllable uniqueness. In: Proceedings of the RFID Privacy Workshop (November 2003) 8. Juels, A.: Minimalist cryptography for low-cost RFID tags. In: Proceedings of the 4th Conference on Security in Communication Networks (2004) 9. Juels, A.: RFID Security and Privacy: A Research Survey. IEEE Journal on Selected Areas in Communications (2006) 10. Juels, A.: Strengthening EPC tags against cloning. In: Proceedings of the 6th International Conference on Web Information Systems Engineering (2005) 11. Molnar, D., Wagner, D.: Privacy and security in library RFID: Issues, practices, and architectures. In: Proceedings of the ACM Conference on Computer and Communications Security, pp. 210–219 (2004) 12. Molnar, D., Soppera, A., Wagner, D.: A Scalable, Delegatable Pseudonym Protocol Enabling Ownership Transfer of RFID Tags. In: The 12th Annual Workshop on Selected Areas in Cryptography. LNCS. Springer, Heidelberg (2005) 13. Shamir, A.: Identity-Based Cryptosystems and Signature Schemes. In: Blakely, G.R., Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg (1985) 14. Tuyls, P., Batina, L.: RFID-Tags for Anti-Counterfeiting. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860. Springer, Heidelberg (2006) 15. NTRU. GenuID, http://www.ntru.com/products/genuid.html
RFID: An Ideal Technology for Ubiquitous Computing? Ciaran O’Driscoll, Daniel MacCormac, Mark Deegan, Fred Mtenzi, and Brendan O’Shea Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland [email protected], {dan.maccormac, mark.deegan, fred.mtenzi, brendan.oshea}@comp.dit.ie
Abstract. This paper presents a review of RFID based approaches used for the development of smart spaces and smart objects. We explore approaches that enable RFID technology to make the transition from recognized applications such as retail to ubiquitous computing, in which computers and technology fade into the background of day-to-day life. We present the case for RFID as a key technology of ubiquitous computing due to its ability to embed itself in everyday objects and spaces. Frameworks to support the operation of RFID-based smart objects and spaces are discussed and key design concepts identified. Conceptual frameworks, based on academic research, and deployed frameworks, based on real-world implementations, are reviewed, and the potential for RFID as a truly ubiquitous technology is considered and presented.
1 Introduction
The case for RFID in retail [1] and logistics [2] has been clearly identified and is being actively pursued by many companies. RFID technology, in association with Electronic Product Codes (EPC) [3], has been identified as an enabler of Supply Chain Management (SCM) in the retail case study presented in [1]. There is a clear financial motivation for companies like Wal-Mart to have suppliers provide RFID tagging on products. In 2003, Roberti [4] referred to estimates by Sanford C. Bernstein & Co., a New York investment research house, that Wal-Mart could save nearly 8.4 billion USD per year once RFID is fully deployed throughout its supply chain and in stores. The potential for RFID beyond the scope of retail and SCM into the area of ubiquitous computing is being widely researched, and this is an area of growing interest. Ubiquitous computing involves computers and technology that blend seamlessly into day-to-day living. Weiser [5] introduced the concept of ubiquitous computing in 1991; he begins his article with: "The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it." F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 490–504, 2008. © Springer-Verlag Berlin Heidelberg 2008
RFID: An Ideal Technology for Ubiquitous Computing?
491
This description of a disappearing technology can clearly be applied to the trend in RFID technology development. The technology is maturing, size and form factors are shrinking, and the cost of passive tags is now on the order of tens of cents. As RFID becomes such a ubiquitous technology, it is of particular interest to review its use in the area of Smart Spaces [4,6] for ubiquitous computing.
2
Motivation
The motivation for this paper and our current research is to identify technologies that are suitable to support the development of ubiquitous computing environments. This paper focuses on the potential use of RFID as a truly ubiquitous technology; its goal is to determine the suitability of RFID for ubiquitous computing by reviewing existing research in this area. The objective of this research is to develop a comprehensive understanding of the state of the art in using RFID as a basis for the development of a generic ubiquitous framework that is suitable for rapid deployment. Through this research we intend to consider aspects of the usability of diverse smart objects by developing a demonstrator and performing validation tests with users.
3
Smart Space Devices
A smart device is an everyday physical object that has been enhanced by the addition of technology. An example is the smart coffee cup, MediaCup [7], which serves as a coffee cup and also possesses additional processing, sensing and networking capability that can infer information about the state of the user. The identification of smart objects is considered a prerequisite for developing smart behaviour, which also requires support mechanisms for event propagation, location management, and provision of the basic services of the object. The attachment of tags to physical objects allows automatic identification and the collection of location information when the objects are brought into the proximity of a tag detection system. RFID has been identified as a suitable technology for such object tagging [8]. RFID-enhanced objects can use passive tags, which have a unique ID and small amounts (up to about 100 bytes) of read/write memory, or active RFID systems, which have an inbuilt battery permitting transmission over up to 100 m and a life of up to 10 years [9].
3.1
Smart Space Characteristics
Smart objects operating in smart environments or spaces can respond to users' preferences or profiles and provide a variety of services. These smart spaces require system software and associated communications infrastructures
C. O’Driscoll et al.
to interact with smart objects and users. Kindberg and Fox [10] identify two key characteristics of ubiquitous systems: Physical Integration and Spontaneous Interoperation. Physical Integration is the interaction between computing nodes and the physical world; smart objects provide the interface between the physical and the virtual world. A distinction is recognized between different environments, such as private spaces in the home and public spaces at work, and boundaries are required to demarcate these different spaces. Spontaneous Interoperation involves components of the Smart Space that can change identity and functionality as required. This involves service discovery, in which components can locate a service instance that meets their needs; bootstrapping, in which addresses are allocated to permit network integration of the smart devices; and interaction, in which the components of the Smart Space share a common interoperation model to permit operation across environment boundaries. There are significant challenges in the development of systems that can provide the functionality of smart spaces. Appropriate frameworks are needed to enable the development of ubiquitous environments, and many frameworks based on alternative technologies have been developed. A review of frameworks that support RFID infrastructures is presented in the next sections.
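The service-discovery step described above can be sketched as a simple registry that components query by capability; the registry class, service names and attributes below are hypothetical illustrations, not part of any reviewed framework.

```python
# Minimal sketch of spontaneous service discovery: devices register the
# service instances they provide, and clients look them up by capability.
# All identifiers here are illustrative, not taken from a real framework.

class ServiceRegistry:
    def __init__(self):
        self._services = []  # list of (capability, attributes) records

    def register(self, capability, **attributes):
        """A smart device announces a service instance it provides."""
        self._services.append((capability, attributes))

    def discover(self, capability, **required):
        """Return service instances matching a capability and attribute filter."""
        return [attrs for cap, attrs in self._services
                if cap == capability
                and all(attrs.get(k) == v for k, v in required.items())]

registry = ServiceRegistry()
registry.register("printing", location="room-101", colour=True)
registry.register("printing", location="lobby", colour=False)

# A newly bootstrapped device looks for a colour printer in its neighbourhood.
matches = registry.discover("printing", colour=True)
```

In a real Smart Space the registry itself would be discovered during bootstrapping, and entries would carry addresses assigned to the devices at that stage.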
4
RFID Frameworks for Ubiquitous Computing
The development of a suitable framework requires consideration of many design concepts, such as those identified in [8]; these are used to frame the discussion of the frameworks reviewed below. Location, whether geographic or coordinate-based, is essential in identifying the context of smart objects; this information can be readily acquired from RFID readers interacting with tagged objects. The concept of neighbourhood is related to location and is important in supporting cooperating devices in a smart environment. Location management must consider both the physical object location and a symbolic location, as in the case of a room in a building, and these two aspects need to be integrated in a meaningful manner. Time is also an important aspect of any smart system, in order to timestamp and order events identified by different interfaces such as an RFID reader. Related to time is the concept of history, which supports the logging of events so that past events can be queried. In many scenarios an object may consist of many objects, as with a truck transporting smart objects; this is the concept of composition, which can be an essential element of a Smart Space framework. A link is needed between the physical world and the virtual world of the supporting computer system to identify events such as tags entering or leaving the Smart Space. A major design concern is the context of the smart device, which may depend on the presence of other devices, as applications or services are only provided in certain predefined circumstances. Applications or services need to assign state and behaviour to physical objects in a flexible manner to support different scenarios.
A virtual counterpart for a physical object is required to support these design concepts, as the smart object itself has limited resources, particularly when a simple RFID tag is in use. To support the virtual counterpart, objects require a unique name tag and an address to permit storage in a computer environment. These virtual counterparts must be managed from creation until their end of use, and possibly beyond, to provide a traceable history. An underlying requirement for applications or services in a smart environment is a communications infrastructure to permit access by smart objects to the services. To support mobility in the environment, a wireless solution such as Bluetooth, WLAN or ZigBee [11] is required.
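As a rough illustration of these requirements (unique naming, symbolic location, timestamped event history and composition), a virtual counterpart might be modelled as follows; all class and field names are assumptions for the sketch, not taken from [8] or any other reviewed framework.

```python
import time

# Sketch of a virtual counterpart for a tagged physical object. A real
# framework would persist these records and resolve tag IDs through a
# naming service; here everything lives in memory for illustration.

class VirtualCounterpart:
    def __init__(self, tag_id, name):
        self.tag_id = tag_id    # unique ID read from the RFID tag
        self.name = name        # human-readable name
        self.location = None    # symbolic location, e.g. "workshop"
        self.history = []       # timestamped events, queryable later
        self.contains = []      # composition: counterparts inside this one

    def observe(self, event, location, timestamp=None):
        """Record a timestamped event, e.g. a reader detecting the tag."""
        self.location = location
        self.history.append((timestamp or time.time(), event, location))

toolbox = VirtualCounterpart("tag-0001", "toolbox")
hammer = VirtualCounterpart("tag-0002", "hammer")
toolbox.contains.append(hammer)         # composition: a container of objects
hammer.observe("detected", "workshop")  # physical event mirrored virtually
```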
5
Review of Academic Frameworks
Significant research has been carried out into the development of frameworks to support smart spaces and smart objects. A number of frameworks developed as part of academic research are presented in this section. These frameworks are considered academic because the papers reviewed indicate that the systems have not been deployed in the real world for long enough periods to permit validation by users.
5.1
Intuitive Service Discovery in RFID Enhanced Networks
In [12] Antoniou et al. present an intuitive service discovery architecture that supports the interaction between smart devices, such as an IP network printer enhanced with an RFID tag. The RFID technology permits service discovery to occur through touch gestures by the user devices, which are mobile RFID devices using wireless connections. RFID was chosen as it is well suited to automatic service discovery and configuration. To encourage this approach of RFID-enhanced contactless short-range interaction, the Near Field Communication (NFC) Forum [13] was established. The proposed framework permits the automatic discovery of a printer by a new user in a shared Smart Space: an RFID-enhanced mobile phone is gestured at a printer, and this connects the phone to the local WLAN to permit use of the printer by the phone. The framework was designed to operate with a number of service discovery technologies such as UPnP [14] or Web Services. It requires a significant level of infrastructure in the form of WLAN, DNS and DHCP servers; in many public hotspots this infrastructure is readily available and would provide mobile users with the capability to select a suitable smart device with which to interact.
5.2
Service Framework with RFID and Sensor Networks
An alternative approach to developing a Smart Space is through the use of Wireless Sensor Networks (WSNs), networks of small devices with processing capability, sensor inputs and an RF transceiver for setting up a communications infrastructure, used for example in [15] for monitoring the temperature in refrigerated lorries. In [16]
Fig. 1. WISSE Implementation Scenario
Lopez et al. identify a novel approach of integrating RFID technology with a Wireless Sensor Network (WISSE). They propose a cluster-based approach for organizing the entities (smart objects) and identify how communication between groups is maintained using a particular sensor node, referred to as a correspondent, that acts as a connection between clusters. A prototype implementation of the framework is presented that uses a PDA and an RFID device and operates over WiFi rather than a true WSN technology, as shown in Figure 1. The framework uses a centralized approach to maintaining information about physical and virtual devices, in order to minimize communication overhead in the network. This has limitations in terms of scalability, particularly as WSNs are generally used in environments with many hundreds of sensor nodes.
5.3
Active and Passive Tags for Spontaneous Interaction
Siegemund and Floerkemeier [17] present an interesting set of three scenarios using smart objects based on Bluetooth-enabled active RFID tags, BTnodes [18]. The interaction between the different smart objects and users is enabled through mobile phones, using Bluetooth when users and objects are close and SMS when the distance is too great for Bluetooth. A conceptual overview of the system is shown in Figure 2. In each scenario in [17] the active tags are used to enhance ordinary objects. In the case of the egg box, the tag monitors the condition of the eggs in the box to determine whether they are intact or broken. In the smart medicine cabinet design, the cabinet reminds users when it is time to take their medicine by sending an SMS message. It keeps track of the contents of the cabinet by using an RFID reader, within the cabinet, that reads the serial numbers of the RFID tags on medication boxes. In the remote interaction case, a user can use their phone to call an object and query its state, for example to determine who is in a particular smart office. The authors define an architecture to support the operation of the different applications and identify the potential for using passive and active RFID tags together. In particular, they identify the possibility of attaching an RFID scanner to an active tag to permit active and passive RFID tags to be integrated.
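The medicine cabinet scenario above can be sketched in a few lines; the tag identifiers, the required-medication set and the `send_sms` hook are hypothetical placeholders for whatever SMS gateway the active tag's phone link would provide, not the actual implementation of [17].

```python
# Sketch of the smart medicine cabinet logic: a reader inside the cabinet
# (attached to an active tag) scans the passive tags on medication boxes,
# and a reminder is sent by SMS when required medication is missing.
# All identifiers below are hypothetical.

REQUIRED_MEDICATION = {"tag-aspirin", "tag-antibiotic"}

def check_cabinet(scanned_tags, send_sms):
    """Compare scanned contents against the required set and alert by SMS."""
    missing = REQUIRED_MEDICATION - set(scanned_tags)
    if missing:
        send_sms("+00-000-0000",
                 f"Medication missing from cabinet: {sorted(missing)}")
    return missing

alerts = []
missing = check_cabinet(["tag-aspirin"],
                        lambda number, text: alerts.append(text))
```

The same pattern (cheap passive tags on the items, one costly active tag hosting the reader and the phone link) is what makes the approach economical.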
5.4
The Smart Container
Floerkemeier et al. [19] address the issue of monitoring the contents of everyday containers using RFID. The inconvenience of monitoring the contents of containers, combined with the possibility of human error, creates the need for such a solution. Their demonstrator is called the smart box. The smart box is an example of a self-contained RFID application that does not communicate with external sensors. Examples of real-world boxes that must be monitored are the toolbox, the first aid kit and the medicine cabinet. Since these boxes may be checked many times with no abnormality being identified, this repetitive task can become irritating. Since the purpose of the application is to remove inconvenience or annoyance, it is designed to avoid further irritation by permitting people to interact with the container in their usual manner. The proposed smart box offers the following functionality:
– The contents of the box must be monitored unobtrusively.
– The desired configuration of the box must be known and any discrepancy must be indicated.
With this functionality implemented, additional services would be possible, such as the provision of detailed information on items and a mechanism for identifying which person is currently interacting with the container. Two possible interfaces are proposed for the system. The simplest is an indicator light, which turns red or green depending on the status of the box. A more detailed interface is also proposed, which displays information about the box on an LCD screen. This information includes a list of the items in the box, which items are missing, if any, and a usage history of each item. When a user adds or removes an item from the box, additional information about this item is displayed on the LCD screen. Three sample smart box applications were discussed in [19]: a smart toolbox, a smart medicine cabinet and a smart surgical kit.
Smart Toolbox.
The smart toolbox was implemented to support aircraft maintenance mechanics in their daily jobs. The system enables automatic monitoring of toolboxes, preventing situations such as missing tools or tools being returned to a toolbox to which they do not belong. A warning is also displayed if the person interacting with the box is not its owner. An RFID tag is attached to each tool, and the box itself is equipped with an RFID reader and antenna.
Smart Medicine Cabinet. This application is used to track the contents of the medicine cabinet and to ensure that necessary prescription medication is contained within the cabinet at all times. The system is capable of tracking expiry dates on drugs and hence prevents situations where patients accidentally continue to use medication beyond its expiry date. Accessibility features are also present in the system, such as the ability to view and manage the contents of the cabinet remotely via a mobile phone; for the visually impaired, who may have
Fig. 2. Overview of the main architectural components in the remote interaction and smart product monitoring scenario: when the user is in range of a smart object, interaction stubs are transferred to his/her mobile phone (1), when far away, communication takes place over the cellular phone network (2)
difficulty reading small print on prescription bottles, the system can display the name of an item on the LCD screen when it is removed from the cabinet, and a synthesized voice informs the user of the name of the medication.
Smart Surgical Kit. The third demonstration of the smart container was a smart surgical kit. The key problem this application addresses is the case in which a surgeon forgets to remove swabs and bandages used during surgery. Currently, this problem is addressed by post-surgery X-rays. This not only creates the inconvenience of needing X-rays, but also the issue of opening a wound to retrieve a misplaced swab. For this application, it is necessary to monitor both the contents of the surgical kit and the bin into which used swabs are placed. Each swab is tagged with an RFID chip to monitor its location; any item in neither the waste bin nor the surgical kit can be assumed to be in use. Floerkemeier et al. [19] felt that the smart surgical kit and the smart medicine cabinet show particular promise. The smart toolbox, while interesting, was perhaps the least useful of the three: it could serve as an interesting showcase for RFID technology, but its use in everyday life is questionable, at least until the cost of such technology is further reduced.
5.5
Smart Playing Cards
The smart playing cards application [20] applies RFID technology to an existing card game, Whist. This approach is presented as differing from many applications
of technology to gaming, where new games are developed to exploit or showcase the capabilities of the technology. The game works by using standard playing cards to which adhesive RFID tags have been attached. The cards are laid on a table, as normal, but in this instance the table has an RFID antenna mounted such that cards are detected as they are played. The system developed can determine the winning card in each trick played, can detect if a player's move would be illegal and notify accordingly, and can also keep score. Further concepts presented include the development of a training system for beginners to indicate whether a card played was a good or bad choice, and ultimately to prompt the player based on previous hands played. The rules of Whist are very straightforward, so a suitable extension of this project would be to develop systems to support other card games and to assist the novice in learning a game. The fluidity of play of the real game was compromised by the reaction speed of the system: the speed at which the antenna could detect the removal of cards from the table, for example, caused a latency that experienced players would not tolerate. Another limitation had to do with the RF properties of the antenna, which has a sensitivity pattern of a hemisphere of diameter equal to the length of the antenna. In certain circumstances, this sensitivity can cause false detections of cards that have not yet been played. The proposed solution is to use a series of smaller antennae, resulting in a 'lower' sensitivity bubble.
5.6
RFID Chef
In [21] Langheinrich et al. refer to the RFID Chef project, which is based in a household kitchen. The co-location of tagged ingredients on a smart surface allows the system to retrieve recipes appropriate to the ingredients available. The system was developed in order to explore issues including the application of RFID technologies to non-technical environments, a greater range and
Fig. 3. Schematic view of the RFID Chef hardware. The RFID reader antenna detects tagged artifacts in its vicinity and transmits raw sensory information through a reader module to the PC via serial cable.
number of interacting artefacts than in other implemented RFID applications, and an increase in interaction complexity. A schematic view of the RFID Chef hardware is shown in Figure 3. The software system comprises a number of layered software modules implemented in a range of languages (C, Java and Python):
– a Sensory Control and Input Processing Module to poll sensors;
– a Basic Event Modelling Module, which aggregates the lower-level events provided by the above module to generate events of larger granularity;
– a Context Event Modelling Module to improve the signal-to-noise ratio by applying a time-based filter to the detected events.
These modules serve to reduce jitter in the system due to the apparent arrival and departure of tags. This jitter is most likely due to the nature of RFID technology, where tags may appear to arrive at and depart from the system while in reality they have not moved. Jitter also manifests itself when objects genuinely do move in and out of the range of the sensor antenna as the chef rearranges utensils or ingredients in the kitchen.
5.7
Smart Libraries
The use of RFID in libraries has been widely written about [22,23]. An interesting use of RFID in libraries is the provision of digital library services [24] to mobile users. This system aims to eliminate the use of large desktop computers as a searching facility in libraries and to replace them with mobile devices such as PDAs and smart phones, allowing users to identify and view information on both physical and non-physical library items on their mobile device. It is proposed that this can be achieved through the use of active RFID tags as a means of storing metadata about each object in the library, in conjunction with RFID-enabled mobile devices. Each library item carries an active RFID tag that contains information about the object in question; this data can then be transmitted to a range of devices. The system uses two types of RFID-based query services: on-demand and broadcast. If query information is broadcast from the RFID tag, it is the responsibility of the mobile host to recognize the information as relevant and retrieve it. On-demand queries can be real-time or delayed: real-time RFID queries retrieve information immediately, while delayed queries allow the mobile host to voluntarily disconnect, process the query, and transmit the result when the mobile host reconnects. For efficiency, an RFID caching proxy is also used to store the results of frequent queries. The system is also capable of scaling down results based on bandwidth availability and the specification of the mobile device in question. The framework for the smart library is an interesting approach and could be used to deliver ubiquitous digital library services across a broad range of devices. Security issues when using RFID in libraries, as described by Molnar [25], would also need to be considered in such a framework.
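The caching proxy idea can be sketched as follows; the lookup function and catalogue data are illustrative assumptions, not the implementation described in [24].

```python
# Sketch of the RFID caching proxy: results of frequent queries are cached
# so that repeated lookups avoid re-querying tags or the back-end catalogue.
# The catalogue contents and lookup interface are hypothetical.

class CachingProxy:
    def __init__(self, lookup):
        self._lookup = lookup  # expensive query against tag/back-end data
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def query(self, item_id):
        if item_id in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[item_id] = self._lookup(item_id)
        return self._cache[item_id]

catalogue = {"book-42": {"title": "RFID Handbook", "shelf": "B3"}}
proxy = CachingProxy(lambda item_id: catalogue[item_id])
first = proxy.query("book-42")   # miss: fetched from the back end
second = proxy.query("book-42")  # hit: served from the cache
```

A deployed proxy would also evict stale entries and scale down results to match the requesting device, as the paper describes.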
6
Review of Deployed Frameworks
This section provides a review of RFID-based frameworks that have been deployed and are in use in museums and educational settings. These examples permit the usability of the technologies to be assessed through feedback from users.
6.1
RFID in Museums
eXspot [26,27] is a custom-designed RFID application that has been prototyped and evaluated over the past three years at the Exploratorium museum in San Francisco. Using RFID-tagged visitor cards, visitors can trigger automatic cameras to take photographs and capture information about exhibits on display at the museum. This allows bookmarking of exhibits, a feature in which visitors had previously expressed an interest [28]. The system eliminates the need for the user to carry additional items such as cameras and informational booklets. Visitors can later use their RF-tagged cards to log onto kiosks which allow them to view the photographs they have taken. They can continue their exploration either from museum kiosks or from home by logging onto a personalized web page which shows the dates they were at the museum, which exhibits they visited and the photographs taken, as well as links to additional educational information on exhibits they bookmarked. The system consists of small RFID readers attached to museum exhibits, RFID-tagged cards carried by visitors, a wireless network, a registration kiosk and dynamically generated web pages. An RFID reader unit is located at each exhibit, with a control processor and radio connectivity, a low-power RFID reader with a range of a few inches, and LEDs which show the system's state to visitors. Visitors hold their card in the vicinity of the eXspot RFID reader, which interrogates the card to read an ID number that is then sent wirelessly to a base station. The base station records the ID number, time of visit and exhibit information for access by the visitor on a later occasion. The Exploratorium [26] is not the only museum to adopt RFID. The Museum of Science and Industry in Chicago has opened an exhibition called "NetWorld" where visitors learn about the Internet. Users design personal avatars before being given an RFID-tagged visitor card.
The personalized avatars then accompany and interact with them throughout their visit as they learn about the Internet and how it works. The Museum of Natural History in Aarhus, Denmark takes a slightly different approach for its exhibit named "Flying": stuffed birds are tagged with RFID chips, and visitors carry RFID readers which they can use to scan the birds, allowing them to view a presentation of related text, quizzes, audio and video [27]. The use of RFID in educational environments can provide a rich learning experience and promote user curiosity. RFID has also been used in more traditional learning environments such as the lecture theatre; Transnote is one such example.
6.2
Transnote
Lecturers at the Japan Advanced Institute of Science and Technology are using RFID to display students' notes on a shared media board via the Transnote [29] system. The concept of shared media boards has been widely considered, as in [30], but in this example RFID is employed as the key enabling technology. The system provides a note-sharing facility, which allows users to share notes, written on regular paper, using a shared media board; this is enabled through the use of digital pens and wireless communications. Initially, a lecturer used a desktop computer to control the contents of the shared media board. However, a user survey noted that this proved to be an inconvenience, since it is more natural for the lecturer to walk around the class and interact with the students. In addition, not all lecturers were comfortable with the use of a PC, which created the need for additional training on the lecturer's part to become familiar with the interface of the new system. To solve this problem an RFID remote was introduced, taking the form of a PDA with an RFID reader attached. As the lecturer walks around the class, they may see a set of notes they wish to show to the entire class. To achieve this, the lecturer simply holds the remote over the student's RFID-tagged digital note-taking facility. Once the student's tag is identified by the lecturer's RFID reader, a command can be sent to the manager to display the corresponding notes. At this point, the lecturer may ask the student to explain the notes to the class. The student will typically explain the notes by pointing and drawing with their pen; to support this, the system uses a tracking function which supports zooming and positioning of the notes to correspond with the movement of the student's pen. The use of RFID in this system allows for a more natural approach.
It permits a lecturer to work in a fashion they are accustomed to and comfortable with, and allows students to see the lecturer approach and prepare them for having their notes displayed. One point noted is that the apparatus used by the student for note taking leaves something to be desired in terms of size.
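The core Transnote interaction (reading a student's tag and asking the media-board manager to display the matching notes) might be sketched as below; the class names, tag identifiers and notes content are hypothetical, not drawn from [29].

```python
# Sketch of the Transnote interaction: the lecturer's RFID remote reads a
# student's tag and asks the media-board manager to display that student's
# notes. All identifiers below are illustrative.

class MediaBoardManager:
    def __init__(self, notes_by_tag):
        self.notes_by_tag = notes_by_tag  # student tag ID -> notes document
        self.displayed = None

    def show_notes(self, tag_id):
        """Called when the remote reads a tag near a student's notepad."""
        if tag_id in self.notes_by_tag:
            self.displayed = self.notes_by_tag[tag_id]
        return self.displayed

board = MediaBoardManager({"student-07": "sketch of proof, page 2"})
shown = board.show_notes("student-07")  # lecturer gestures at the student
```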
7
Analysis and Review
7.1
Physical Integration
The use of passive RFID tags permits significant physical integration of the smart devices. In particular, the approach of using both passive and active tags [17] to monitor the items in a cabinet is quite innovative: it permits the use of inexpensive tags on many individual items and requires only one more costly active tag to create a range of adaptable solutions. The use of SMS as a mechanism for sending event alerts is practical and can be readily implemented with existing telecom infrastructure.
7.2
Spontaneous Interoperation
Each of the frameworks presents a different approach to solving the issues of spontaneous interoperation and, in particular, to providing service discovery for users in the Smart Space. The simplest approaches, as identified in [8], use passive tags in each case and provide a usable, inexpensive solution. However, in the case of the smart toolbox, the prototype application was developed to operate as a standalone unit; a communications infrastructure would be useful to permit the collection of data and the circulation of event alerts.
7.3
Academic Versus Deployed Frameworks
The academic frameworks provide indications of the potential uses of RFID and suggest approaches for addressing the issues of Physical Integration and Spontaneous Interoperation. The implementations identified as academic have not been validated by significant user interaction and usability testing; in order to determine the true potential of RFID technologies, these frameworks require more exhaustive testing. The prototype systems used in such academic research, as in [19,8], may not readily translate into real-world implementations, as the level of physical integration could be limited, and thus they cannot yet be considered truly ubiquitous solutions. The deployed frameworks, such as the museum example [26], have been used in real-world situations and have been evaluated by users. Spontaneous interoperation is supported in these deployed cases, as the systems can be observed in operation and do provide services autonomously. The user devices, RFID cards, also illustrate the concept of physical integration through their size and operation. However, they highlight the need for diverse collaborating technologies to create a suitable infrastructure. The deployed frameworks provide the ability to gain user feedback and permit modification of the framework and system configuration to meet usability requirements. This was particularly clear in the Transnote [29] case, where the system was modified to match users' normal lecturing styles. Through deployment, the issue of user training also became apparent, an issue often overlooked during academic research.
8
Conclusion
This review underlines the case for RFID as a ubiquitous technology to support the development of smart spaces. A number of indicative examples of the use of RFID technologies for developing ubiquitous environments were presented. The review of the academic frameworks gives an indication of the potential applications of RFID in the development and provision of smart environments and spaces. However, these approaches are limited in that they have yet to be validated in large-scale user tests. Also, before RFID can become a truly ubiquitous technology, there are still many research challenges to be faced. Such
challenges include security, privacy, deployment challenges such as health and safety and aesthetics, as well as technical challenges such as system failures and input data errors. For example, duplicate tag readings have been discussed by Welbourne et al. [31]; these involve the same tag being detected by multiple antennas simultaneously or several times by the same antenna within a very short time interval. Cleaning RFID data is an active area of research [32,33] which addresses such problems. The deployed frameworks address the real challenge in creating viable smart spaces and usable smart objects: user acceptance. Meeting Weiser's concept of disappearing computing and technologies [5] requires that users are willing to accept the technology into their everyday lives. This level of acceptance requires non-invasive technologies which are intuitive, self-organizing, self-managing and which require minimal interaction from the user.
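One common cleaning step, suppressing repeat detections of the same tag within a short window, can be sketched as follows; the window length and reading format are illustrative assumptions rather than the specific methods of [32,33].

```python
# Sketch of one RFID data-cleaning step: drop repeat detections of the same
# tag that arrive within a short time window, as happens when a tag is read
# by several antennas or several times in quick succession.

def deduplicate(readings, window=1.0):
    """readings: list of (timestamp, tag_id) pairs sorted by timestamp."""
    last_seen = {}
    cleaned = []
    for ts, tag in readings:
        # Keep a reading only if this tag was not seen within the window.
        if tag not in last_seen or ts - last_seen[tag] > window:
            cleaned.append((ts, tag))
        last_seen[tag] = ts
    return cleaned

raw = [(0.00, "tag-A"), (0.05, "tag-A"),   # duplicate within the window
       (0.10, "tag-B"), (2.00, "tag-A")]   # tag-A again, outside the window
clean = deduplicate(raw)
```

Real cleaning pipelines also smooth over missed readings (a tag briefly not seen that has not actually left), which is the dual of the duplicate problem.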
9
Future Work
The next stage in our research programme is the development of a demonstrator system using RFID technologies and of an overarching framework that addresses the limitations of existing solutions. The demonstrator will involve the deployment of a Smart Space across an academic campus to support lectures and instant tutorials, with local and remote access by participants to both pre-scheduled and ad-hoc lecturing events. In particular, the research will focus on the use of Near Field Communication (NFC) solutions, with a core aim of minimizing complexity in order to achieve a design suited to rapid deployment and intuitive user operation.
Acknowledgements
Dan MacCormac gratefully acknowledges the contribution of the Irish Research Council for Science, Engineering and Technology, funded by the National Development Plan.
References
1. Wamba, S.F., Lefebvre, L.A., Lefebvre, E.: Enabling intelligent B-to-B eCommerce supply chain management using RFID and the EPC network: A case study in the retail industry. In: The 8th International Conference on Electronic Commerce, pp. 281-288 (2006)
2. Srivastava, B.: Radio frequency ID technology: The next revolution in SCM. Business Horizons 47/6, 60-68 (2004)
3. EPC Global (2006), http://www.epcglobalinc.org
4. Roberti, M.: Analysis: RFID - Wal-Mart's network effect. CIO Insight (2006)
5. Weiser, M.: The computer for the 21st century. Scientific American 265/3, 94-104 (1991)
RFID: An Ideal Technology for Ubiquitous Computing?
503
6. Mangione-Smith, W.H.: Mobile computing and smart spaces. IEEE Concurrency 6/4 (1998)
7. Beigl, M., Gellersen, H., Schmidt, A.: Mediacups: Experience with design and use of computer-augmented everyday objects. Computer Networks 35/4, 401–409 (2001)
8. Römer, K., Schoch, T., Mattern, F., Dübendorfer, T.: Smart identification frameworks for ubiquitous computing applications. Wireless Networks 10, 689–700 (2004)
9. Philips ICODE smart label solutions, http://www.philips.semiconductors.com/products/identification/icode/index.html
10. Kindberg, T., Fox, A.: System software for ubiquitous computing. IEEE Pervasive Computing 1/1, 70–81 (2002)
11. Baker, N.: ZigBee and Bluetooth: strengths and weaknesses for industrial applications. Computing & Control Engineering Journal 16/2, 20–25 (2005)
12. Antoniou, Z., Krishnamurthi, G., Reynolds, F.: Intuitive service discovery in RFID-enhanced networks. In: Proceedings of the IEEE COMSWARE Conference (January 2006)
13. NFC Forum, http://www.nfc-forum.org
14. UPnP Forum, http://www.upnp.org
15. Shan, Q., Liu, Y., Prossec, G., Brown, D.: Wireless intelligent sensor networks for refrigerated vehicle. In: IEEE 6th CAS Symposium on Emerging Technologies: Mobile and Wireless Communication (2004)
16. Lopez, T.S., Kim, D., Park, T.: A service framework for mobile ubiquitous sensor networks and RFID. In: 1st International Symposium on Wireless Pervasive Computing, pp. 16–18 (2006)
17. Siegemund, F., Florkemeier, C.: Interaction in pervasive computing settings using Bluetooth-enabled active tags and passive RFID technology together with mobile phones. In: The First IEEE International Conference on Pervasive Computing and Communications (PerCom) (2003)
18. Beutel, J., Kasten, O.: A minimal Bluetooth-based computing and communication platform. Technical report, Computer Engineering and Networks Lab, Swiss Federal Institute of Technology (ETH) Zurich (2001)
19. Floerkemeier, C., Lampe, M., Schoch, T.: The smart box concept for ubiquitous computing environments
20. Römer, K., Domnitcheva, S.: Smart playing cards: A ubiquitous computing game. Personal and Ubiquitous Computing 6, 371–377 (2004)
21. Langheinrich, M., Mattern, F., Römer, K., Vogt, H.: First steps towards an event-based infrastructure for smart things. Technical report, ETH Zurich, Swiss Federal Institute of Technology (2000)
22. Boss, R.W.: RFID technology for libraries (2004)
23. Smart, L.: Making sense of RFID. Netconnect (Fall 2004), 4–14 (2004)
24. Morales-Salcedo, R., Ogata, H., Yano, Y.: Towards a new digital library infrastructure with RFID for mobile e-learning. In: IEEE International Workshop on Wireless and Mobile Technologies in Education (WMTE 2005) (2005)
25. Molnar, D., Wagner, D.: Privacy and security in library RFID: issues, practices, and architectures. In: Conference on Computer and Communications Security (2004)
26. Hsi, S., Semper, R., Brunette, W., Rhea, A., Borriello, G.: eXspot: A wireless RFID transceiver for recording and extending museum visits. In: Davies, N., Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205. Springer, Heidelberg (2004)
27. Hsi, S., Fait, H.: RFID enhances visitors' museum experience at the Exploratorium. Communications of the ACM 48/9, 60–65 (2005)
504
C. O’Driscoll et al.
28. Fleck, M., Frid, M., Kindberg, T., Rajani, R., O'Brien, E.: From informing to remembering: Deploying a ubiquitous system in an interactive science museum. IEEE Pervasive Computing 1, 13–21 (2002)
29. Miura, M., Kunifuji, S., Shizuki, B., Tanaka, J.: AirTransNote: augmented classrooms with digital pen devices and RFID tags. In: IEEE International Workshop on Wireless and Mobile Technologies in Education (WMTE 2005) (2005)
30. Elrod, S., Bruce, R., Gold, R., Goldberg, D.: Liveboard: A large interactive display supporting group meetings, presentations, and remote collaboration. In: Conference on Human Factors in Computing Systems (1992)
31. Welbourne, E., Balazinska, M., Borriello, G., Brunette, W.: Challenges for pervasive RFID-based infrastructures. In: PERCOMW 2007: Proceedings of the Fifth IEEE International Conference on Pervasive Computing and Communications Workshops, Washington, DC, USA, pp. 388–394. IEEE Computer Society, Los Alamitos (2007)
32. Jeffery, S.R., Garofalakis, M., Franklin, M.J.: Adaptive cleaning for RFID data streams. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 163–174. VLDB Endowment (2006)
33. Khoussainova, N., Balazinska, M., Suciu, D.: Towards correcting input data errors probabilistically using integrity constraints. In: MobiDE 2006: Proceedings of the 5th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 43–50. ACM, New York (2006)
An Experimental Analysis of Undo in Ubiquitous Computing Environments

Marco Loregian and Marco P. Locatelli
DISCo, University of Milano-Bicocca
viale Sarca 336/14, Milan, Italy
{loregian,locatelli}@disco.unimib.it
Abstract. All personal computer applications provide an undo functionality, which may implement any of the models available in the literature. Users are generally aware of what the undo function is expected to do, depending on the application in use. Ubiquitous computing systems are beginning to be understood and deployed in real-life situations, but little attention has been paid to what users expect to be able to do and undo in such systems. In this paper, we present the results of a survey we conducted to evaluate the perception of undo mechanisms with respect to a simple ubiquitous-computing environment. Our study shows that users already have a complex vision of undo, encompassing advanced features such as context awareness and compensation.
1 Introduction
The problem of undoing actions is nowadays considered to be well understood by computing system developers, as well as by personal computer users [1]. Several approaches and theoretical frameworks for undo have been proposed in the literature [2] and implemented in prototype systems [3] as well as in commercial applications. Studies have also investigated the common understanding of undo with respect to abstract situations, i.e., using non-computer-related tests to assess the impact of the undo concept at a cognitive or practical level [4]. A new wave of systems is arising: ubiquitous-computing (ubicomp) systems [5]. Currently, there is very little common knowledge of what these systems are going to be like. We are still in a world of visions, slowly evolved from Weiser's original ones [6], since too few systems have been deployed to provide real-world experience. How do the future users of ubicomp systems expect them to be? What are their expectations about general reliability, flexibility, and adaptability to unexpected situations (e.g., breakdowns)? How should ubicomp systems move away from a state that does not fit the users' will? Do the currently adopted strategies for undo and the corresponding classifications still apply? The aim of our research has been to experimentally evaluate the understanding of the will to undo in a world of interconnected devices populating the environment, pervading it with ubiquitous intelligence, and proactively satisfying users' needs.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 505–519, 2008.
© Springer-Verlag Berlin Heidelberg 2008
By doing so, we want to provide a basis for the designers of ubicomp systems, to be adopted by novel applications designed in a user-centric fashion. The paper starts from the general classification of undo strategies available in the literature; the following sections illustrate our survey and the results that validate a set of hypotheses, which are finally discussed with respect to existing approaches to undo.
2 Background
A command can be defined as the high-level (user) action that causes the execution of a set of lower-level operations by the system, and the scope of a command can be defined by considering the atomic operations it triggers, and their effects, as a whole. The term undo is appropriate when there has previously been the intention to do something. Low-level (system) actions should not, and usually cannot, be directly undone, since that would violate the scope of the command; they can only be addressed by issuing new commands dealing with the effects of those actions. Undoing a command can thus be seen as the result of the capability of a system to perform a set of actions to fulfill the user's will to reach a state as if the previous command had never been given. Abowd and Dix presented undo as a user intention, rather than just a system function [1]. Their hypothesis was that it is not important for users to know what "the undo button" does, but rather what it is for, and why it should be used. When users recognize an undesired state and are aware of what should be done to restore a previous one, they can choose whether to use the most appropriate system command, or to adopt strategies that might differ from the undo function provided as a default solution. The most common strategy for undo in PC applications is the linear one: an entire sequence of commands is undone command-by-command, in reverse chronological order (i.e., from the most recent, backwards), up to the desired command, which is then undone last [7]. Sometimes this strategy is automated, adopting a chronological model: it is possible to choose a command from a timeline, and the system automatically goes back, undoing commands step-by-step. This strategy is well suited to situations in which users behave autonomously and independently (either in a single-user or a collaborative environment), and in which the temporal order essentially determines the relations between commands [3].
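As a concrete illustration, the linear/chronological strategy can be sketched as a command history that is only ever unwound from the most recent entry backwards. This is a minimal sketch; all identifiers (Command, History, revert) are hypothetical and are not taken from any system discussed in the text.

```python
# Minimal sketch of linear undo: commands are undone strictly in
# reverse chronological order, never selectively.

class Command:
    def __init__(self, name, do, revert):
        self.name = name
        self.do = do          # callable performing the command
        self.revert = revert  # callable restoring the prior state

class History:
    def __init__(self):
        self.done = []

    def execute(self, cmd):
        cmd.do()
        self.done.append(cmd)

    def undo_back_to(self, target_name):
        """Chronological model: unwind step by step until the target
        command (inclusive) has been undone."""
        undone = []
        while self.done:
            cmd = self.done.pop()
            cmd.revert()
            undone.append(cmd.name)
            if cmd.name == target_name:
                break
        return undone

state = []
h = History()
for n in ("a", "b", "c"):
    h.execute(Command(n, lambda n=n: state.append(n),
                         lambda n=n: state.remove(n)))
print(h.undo_back_to("b"))  # ['c', 'b']: everything after 'b' is undone too
print(state)                # ['a']
```

Note that undoing "b" forces "c" to be undone as well, even if the two are logically unrelated: this is exactly the limitation that motivates the selective strategies discussed next.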
While linear undo is common and effective in simple applications, it is hardly acceptable in complex systems [8], where a selective (non-linear) strategy is preferred: a particular command can be undone while preserving all the following ones (and their effects). If the system practically undoes the whole sequence of commands and then re-issues every command except the selected one, the strategy is called script undo [9]. In contrast, cascading undo strategies also take into account the dependencies between commands; therefore, when a command is selected to be undone, only the non-dependent following commands are preserved (or redone) [4]. Interpreting selective undo [10] as an intention means allowing the user to identify which command has to be undone, to know what other command(s) could
be given to revert the effects of that command, and to be able to choose whether it is better to execute some compensating commands or to use a default strategy (if it is known what the side effects will be). Undesired side effects of undo are the result of dependencies that are not, or cannot be, automatically handled in a proper way; all others can (and should) be defined in a compensation sphere [11]. The concept of compensation sphere comes from business process management: simply put, the successful execution of a set of actions (the sphere) causes the commitment of the achieved results, while the failure of any of the steps in the set causes the execution of tasks previously defined to deal with the effects of the already executed tasks, so that the whole set of actions can be considered undone. Each operation or function in a compensation sphere has a compensation operation or function associated with it. It is not always possible to foresee or understand what the outcome of an undo operation might be: the ability to simulate or describe exactly the possible outcome (as a new state of the system) is called speculative undo [12] and can be of great help to users who are in trouble or willing to analyze different solutions. Speculative undo, and non-linear strategies in general, assume a prominent role in collaborative settings [13,14], which we analyzed in a previous work [8], i.e., when logical dependencies may affect others' decisions and commands. In this work we focus only on single-user interaction, as it is easier to evaluate in a first experiment and is a necessary part of collaborative environments.
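The cascading variant can be sketched as a closure over a dependency graph: undoing a command also selects every command that (transitively) depends on it, and the selected set is then compensated in reverse chronological order. This is an illustrative sketch under our own assumptions; the command names and the dependency map below are simplified from the dinner scenario and are not part of any described system.

```python
# Sketch of cascading undo: each command records which earlier commands
# it depends on; undoing a command selects the command plus all of its
# (transitive) dependents for compensation, and nothing else.

def cascading_undo(commands, deps, target):
    """Return the commands to compensate when `target` is undone:
    the target plus every command transitively depending on it.
    `commands` is in chronological order; `deps` maps a command to
    the commands it depends on."""
    affected = {target}
    changed = True
    while changed:
        changed = False
        for cmd in commands:
            if cmd not in affected and affected & set(deps.get(cmd, ())):
                affected.add(cmd)
                changed = True
    # Compensate in reverse chronological order, dependents first.
    return [c for c in reversed(commands) if c in affected]

# Illustrative dinner scenario: payment and delivery depend on the
# order, which depends on the dinner confirmation; air conditioning
# is temporally later but logically independent.
commands = ["confirm_dinner", "order", "payment", "delivery", "air_conditioning"]
deps = {"order": ["confirm_dinner"], "payment": ["order"], "delivery": ["order"]}
print(cascading_undo(commands, deps, "confirm_dinner"))
# ['delivery', 'payment', 'order', 'confirm_dinner'] -- air conditioning preserved
```

Unlike the linear strategy, the logically independent "air_conditioning" command survives the undo; each returned command would then be handled by its associated compensation operation, in the spirit of the compensation sphere described above.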
3 The Experiment
We know (from direct experience and from the literature) what users do and what they would like to be able to do in controlled environments, such as PC applications. The goal of our study was to determine the most intuitive choice when implementing undo mechanisms in ubicomp. Our general hypothesis is that users think of command cancellation (after the commands have already been executed) in terms of cascading undo with smart compensation mechanisms. In other words, when a ubicomp system is requested to undo a previous command, it is expected to be able to identify the logical dependencies between commands (according to their effects) and enact all the necessary measures to deal with each of them (also by performing new commands, to compensate the previous ones) in order to reach a new state that will be accepted by users. To validate the hypothesis, a typical ubicomp situation involving a request for undo was presented in a video clip to a set of users, who then answered a questionnaire. Users were chosen in order to have a non-technical sample, i.e., not being a computer science student or researcher was a preferential condition for recruitment.

3.1 Population and Sampling
Very simple guidelines drove the choice of the population to be used in the experiment: subjects should not be biased by prior knowledge of similar problems, or by too much knowledge of computer applications and systems; subjects should be
Fig. 1. Graph showing the age distribution of participants
willing to spend some time focusing on the impact of the proposed questions on other people; therefore, some sensibility towards cognitive or interaction-related issues was desirable. Moreover, we considered the possibility of smart-home systems being on sale in a few years [6], and oriented our search towards subjects likely to be in a position to buy such systems. We published a call for participation (via email and web forums) to the psychology students of our university, aware of the many curricula available in that department, ranging from neuropsychology to economics-, organization-, and technology-oriented curricula. Students participating in the experiment received a certification that is useful (and for some of them necessary) to complete their studies. Since the sample was quite uniform, we assessed the independence of values (answers to the questionnaire) by also administering the test to some (extra-sample) computer science and economics students, without seeing a significant deviation from the others. The final population was composed of 36 subjects (13 male, 23 female), with ages between 19 and 46 (mean value: 27.09) (Fig. 1). Participants were asked about their computer skills: 1 (2.8%) declared having no knowledge of computers, 11 (30.6%) declared basic skills (mostly only web browsing, email, and word processing), 16 (44.4%) intermediate skills (spreadsheets, ...), and 8 (22.2%) more advanced skills (image and sound editing, ...).

Setup of the experiment.
In order to make users quickly and clearly understand what ubiquitous computing is, a movie clip1 was produced showing an easy-to-understand situation: interaction in the kitchen of a smart home. "Along with its obvious basic food storing function, the smart refrigerator is equipped with a shopping and storage application that allows the user to keep track of the individual food items' expiry dates as well as manage shopping lists." [15] The scenario was chosen among several other typical options [5] so as not to compromise the validity of the experiment: it had to be easy for participants to grasp, 1
http://video.google.it/videoplay?docid=1025487561314098360
Fig. 2. Screenshots from the movie clip: a “fake” 16:9 style has been used to leave room for additional information (date and location, in the upper band) and subtitles
and close to their experience (e.g., not too "futuristically imaginary"). Moreover, we needed a self-explanatory presentation. Different scripts were prepared and extensively discussed among HCI researchers and with a filming crew.

The Staged Scenario. A man lives in a smart home: when he wakes up, he turns the lights on with his voice. When he walks into the bathroom, a sensor switches on the lights on a mirror and, since his health condition and diet are automatically monitored according to his wishes, his weight is automatically measured and presented on the mirror he is watching (with some more information; Fig. 2, left). During breakfast, the system proposes a menu for dinner, and the user accepts it by simply touching a screen. During the day, the system buys some food from the habitual store, pays using a dedicated account, and asks for delivery at the time when the house owner usually comes back from work. The system also turns on the air conditioning to reach a comfortable temperature by the time the user comes home. In the late afternoon, the user must cancel the dinner at home because he is unexpectedly invited out by a colleague: he just sends a text message with his mobile phone saying "Cancel2 dinner" (Fig. 2, right). At this point of the video, the screen in the kitchen shows "Undo in progress" as the image fades out.

3.2 Apparatus and Instruments
The duration of the video clip is two minutes: after the first viewing, all participants were asked whether it was sufficiently clear in describing the situation, or whether they needed some explanation or wanted to re-watch it. Participants could watch the video as many times as needed at any time: a second viewing was requested only 4 times (11.0%). No further explanation of the video was requested from the experimenters before the end of the individual questionnaire session. Both the video clip and 2
Even if "cancel" is different from "undo", the meaning is preserved in Italian: we used cancella because it is the term commonly used in PC applications.
Fig. 3. Completion time for each participant
the questionnaire were in the native language of all the participants, and carefully worded to avoid the need for a technical background and any possible misleading interpretation of the questions. Further, on this latter point, we cross-checked the interpretation given by subjects to some particularly important questions by introducing some redundancy in the questions, as will become clearer when we introduce our composite hypotheses (Sect. 4.1). Three instruments were used to collect feedback from participants. Each participant was given all instruments, one at a time, in the following sequence:

1. A one-page free-text form asking "If you were the character of the movie, what would you expect to happen after the undo request?";
2. A one-page free-text form asking "If you were the system, what would you do to fulfill the undo request?";
3. A questionnaire with 17 questions (Table 1), each to be answered with an acceptability rating (not at all, hardly, easily, or absolutely acceptable) and a three-line (at most) mandatory motivation, plus a final question presenting a table with the complete sequence of operations performed by the system during the video and asking for which of them the system should be doing something (and if so, what) when it receives the undo request.

The aim of the first two instruments was twofold: (a) to make subjects start thinking about the issue, and take into consideration the various implications of the undo request, before more detailed questions were submitted to them, and (b) to have a preliminary assessment of the attitude of the participant towards the problem (does she understand the situation?) and the experiment (is she willing to make an effort to answer the questions?). Fig. 3 presents a completion time chart: the instruments of the 24 subjects (66.7%) having completed the first two parts in 8 minutes
(mean completion time) or less were preliminarily reviewed to see whether the subject should be rejected for one of the two reasons above. This happened in only two cases (one for each reason), and the data collected from these subjects were consequently excluded from the analysis and are not presented in the figures. The mean global completion time was about 37.38 minutes (partial means: 4.66, 3.81, 28.91 minutes), to which the two minutes of the video and a few more minutes to fill in personal data on a standard form (gender, age, background knowledge, ...) must be added.
4 Results
While the first part of the questionnaire allowed us to preliminarily build qualitative user profiles, quantitative results can be collected from the third instrument (Table 1) and analyzed with respect to classical statistical measures, e.g., standard deviation and standard error of the mean (Table 3, Fig. 4). The acceptance rates were coded to numerical values: not at all as −2, hardly as −1, easily as +1, absolutely as +2. A neutral attitude could not be expressed, thus producing a non-linear scale. Even if driven by common sense (e.g., with respect to the sign), the choice of the numerical values for the acceptability rates is arbitrary, in the sense that the +2 value associated with absolutely does not mean "twice as easily" compared to +1. The primary aim of the coding is to provide a first indicator of the position of the subjects with respect to the various facets of the problem.

Table 1. Questions in the third instrument (first part). The translation into English from the native language of the participants might look more ambiguous than it was.
How much would it be acceptable:
Q1 To find a non-ideal temperature when coming back home
Q2 To have paid for, but not received shopped items
Q3 To receive a phone call from the shop, asking for a new delivery time
Q4 To find the shopping outside the home front door
Q5 To find a note from the shop saying that the delivery boy will come by the next day with the same items
Q6 To find that something that should always be available (e.g., milk) in the refrigerator is missing
Q7 To have the same menu proposed for dinner on the following day
Q8 To receive a text message from the system asking what to do with the shopping
Q9 If the air conditioner is turned off
Q10 If the system acts in complete autonomy
Q11 If the system suspends every activity until the user comes back
Q12 If the system saves the cancellation of the dinner
Q13 If the system considers the cancellation of the dinner like an intention not to return home
Q14 If the system knows what has been eaten during the day (snacks, canteen, ...)
Q15 If the shopping is left with a neighbor
Q16 If the shopping is redirected towards the user
Q17 If everything is set back as if the confirmation of dinner never happened
Table 2. Second part of the third instrument (the first items, originally split between "on" and "off", are presented together here). During the day, the system performed the following actions: when dinner is canceled, which actions should be undone, and how?

– Alarm clock on/off
– Room lights on/off
– Bathroom lights on/off
– User's weight check
– Information display on the mirror
– Dinner menu calculation
– Insertion of dinner ingredients into the food list
– Food list availability check
– Shopping list generation
– Order to the shop
– Delivery time choice
– Payment
– Air conditioning on

Table 3. Statistics on the third part of the questionnaire. Normality test P value: always <0.0001; no variable passed the normality test.
While Table 3 presents some basic data derived from the statistical analysis, the results of the specific tests (e.g., χ2) have been omitted from the following, for simplicity and brevity, as they are not very informative. As shown in Table 3, all questions but one (Q12) received at least one extremely negative response (−2) and at least one extremely positive response (+2), and most
of the means are close to or within the [−0.5, +0.5] interval, showing some sort of neutrality of responses. In addition to what we have already stated, the numerical values can also be used to show that the distribution of answers is not random, i.e., to show significance: using 95% confidence intervals, 9 of the 17 responses are significantly different from zero at 2.5% (single-tailed): Q3, Q5, Q7, Q8, Q9, Q12, Q14, Q16, Q17. However, a deeper analysis, based on the qualitative data collected through the free-text answers given in association with each numerical value, has been essential to resolve all the ambiguities. Fig. 4 (top) presents an overview of the mean and standard deviation for all the questions in the questionnaire; Fig. 4 (bottom) presents an overview of the mean and standard error. The purpose of the two charts is to give a simple presentation of the individual acceptability ratings.

Fig. 4. Questionnaires: Mean and Standard Deviation (top), Mean and Standard Error (bottom). Complete numerical data are available in Table 3.
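The coding of ratings and the descriptive statistics reported in Table 3 and Fig. 4 can be sketched as follows. The answer distribution below is invented purely for illustration; it is not data from the actual study.

```python
# Sketch of the rating coding (-2..+2, no neutral option) and of the
# mean / standard deviation / standard error computed per question.

import math

CODES = {"not at all": -2, "hardly": -1, "easily": +1, "absolutely": +2}

def describe(answers):
    """Return (mean, sample standard deviation, standard error of the
    mean) for a list of verbal acceptability ratings."""
    values = [CODES[a] for a in answers]
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    se = sd / math.sqrt(n)
    return mean, sd, se

# Invented answer distribution for one hypothetical question.
answers = ["easily"] * 20 + ["absolutely"] * 8 + ["hardly"] * 6
mean, sd, se = describe(answers)
print(f"mean={mean:.2f} sd={sd:.2f} se={se:.2f}")
```

Note that because the scale has no zero, the numeric summaries are indicative only, as the text stresses: the mean locates the overall attitude, while the motivations attached to each rating carry the interpretive weight.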
The questions in the third instrument were chosen to validate a set of hypotheses, some of which were simple (i.e., polled through a single, direct question) and some of which were composite (i.e., polled through a combination of different questions). The order in which the questions (Table 1) were presented to participants was chosen to incrementally provide knowledge for reflection. Participants could not read the questions beyond the one they were answering, but could read their previous answers again (without changing them). The incremental provision of questions sometimes led users to re-think previously given answers. In these cases (3; 8.0%), the answers given later were considered more significant, and the motivations were used to resolve the discrepancies. The analysis of the collected data can start from some general considerations. The strongly positive feedback on Q12 ("If the system saves the cancellation of the dinner"; mean 1.57) indicates that the undo request is perceived at the same level as any other command given to the system: it should be memorized so that the system can use the information in the future, or possibly undo the request itself, thus enabling a redo of the command. All subjects identified both the confirmation and the undo as commands composed of various operations, i.e., they understood that each command could trigger various, different system actions. As a consequence, users started thinking about the ramifications of commands, and making implicit assumptions regarding logical dependencies. For example, 15 subjects (41.7%) answered positively to Q13 ("If the system considers the cancellation of the dinner like an intention not to return home"), even though the total mean for the question is −0.29, showing that those who saw the dependency tended to see a weak one, while the others considered the events unrelated.
This hypothesis is also confirmed by Q1 ("To find a non-ideal temperature when coming back home"; mean −0.49) and by Q9 ("If the air conditioner is turned off"; mean 0.31). The online payment (Q2, "To have paid for, but not received shopped items"; mean −0.63), even if limited to a very small amount of money (8 euros, in the video scenario), is strongly identified by participants as a critical factor in determining how to process the undo request. By inserting this step in the scenario, we wanted to give participants something hard to manage, and something that would depend on individual payment experiences. We think it is an important factor for evaluating the general reliability of the other answers: if all steps were practically irrelevant to users (i.e., not directly involving or causing material effects), participants might have tended to answer carelessly. The strongly negative feedback on Q16 ("If the shopping is redirected towards the user"; mean −1.34), besides showing that users understand and want the system not to act outside the domestic sphere (in this very specific case, at least), thus interfering with users' lives, has been used to cross-check user answers: only 3 users (8.0%) gave a non-negative answer, and these users had the five most techno-enthusiast profiles. The role of the sample ubicomp system with respect to human relationships has been further investigated through the combination of multiple questions.
4.1 Composite Hypotheses
As already stated briefly, the questions in the questionnaire were carefully worded to avoid ambiguities, but the instrument was designed to also provide information (and data) about a pre-defined set of complex issues. These issues have been formulated in terms of composite hypotheses, i.e., the position of the subjects with respect to the problems has been investigated through more than one question, so as to have a degree of redundancy sufficient to be confident that the wording of part of the question set did not affect the overall description of the expected system characteristic. The method used to analyze the position of participants with respect to the following generic hypotheses is to combine the answers given to different questions (Table 1), also considering the 612 motivations given along with the discrete acceptability ratings, and the users' profiles. For this reason, the details of the statistical analysis of the correlation between acceptability ratings were not considered relevant enough to include, even though they were taken into account to draw conclusions. Only a synthesis of the collected answers is given for each hypothesis, with further implications left to the discussion (Sect. 5).

H1. Is the system expected to act as a mediator between human parties? [Questions: Q3, Q15, Q16] No, the role of the system is not (just) to automatically open communication channels between humans, at least in the scenario presented. The system is expected to act on behalf of the user, respecting his privacy and time (see H3 below). More precisely, the system should act carefully with respect to people. Questions Q15 and Q16 show that the involvement of external people and the extension of the scope of the system outside the domestic boundaries are perceived as extremely unacceptable (means: −0.74 and −1.34, respectively).

H2. What is the degree of autonomy expected from the system?
[Questions: Q1, Q3, Q8, Q10, Q11, Q14] If the system does not have all the information needed to operate (e.g., if it is unable to determine if and at what time the user is returning, or why he is not returning, and this information is not provided by the text message in the scenario), it should open a communication channel with the user in order to obtain such data. The suspension of other regular activities is not acceptable (Q11, mean −0.57), but the amount of interaction must also be limited so as not to bother the user. The results highlight the necessity of having a clear notion of context, defined and presented to the potential users of these kinds of systems. It must be evident which information is going to be necessary to enact proper system behaviors, leaving the users in full control (i.e., aware of the actions autonomously performed, and possibly able to confirm or stop them). Speculative undo [12], even if not explicitly presented to subjects, was mentioned (in terms of "I would like to know in advance what could happen" or similar) 7 times (19.4%).

H3. How much context information should the system be able to automatically collect and process? [Questions: Q1, Q10, Q12, Q13, Q14]
516
M. Loregian and M.P. Locatelli
Regardless of the number of sensors and devices in the system, there seems to be a limit to the intrusiveness that users can tolerate. Q14 ("If the system knows what has been eaten during the day (snacks, canteen, ...)", mean 0.34) presented the possibility of allowing the system to monitor the user outside the house. This could have at least two motivations (not included in the question): (a) to have better control of nutrition and diet, and (b) to possibly understand why the dinner was canceled (e.g., stomach ache, overeating, ...). Even if the majority of participants (26, 72.2%) caught only the former aspect, the general agreement was that the system should be allowed to monitor "remote" activity, provided that the information is not used against the user (i.e., to punish him in case of overeating) and that privacy is preserved.
5 Discussion
By analyzing the conclusive form, which explicitly asked which operations in the participants' opinion should be affected by the undo request, we tried to give an interpretation of the experiment with respect to the existing undo models (Fig. 5). The following interpretation has been adopted:

– linear undo: cancelling the dinner causes all operations following the earlier confirmation of the dinner to be undone, including those not logically dependent on it (i.e., starting the air conditioning);
– selective undo: cancelling the dinner leaves the system in a state where only the operations that are logically dependent on the earlier confirmation of the dinner are undone;
– other: either some dependent operations are not identified to be undone, or some operations that are neither logically nor temporally dependent are identified to be undone.

Apparently, participants identified linear undo as the most suitable for the situation (45.7%), liked a selective strategy less (31.4%), and a significant part of the participants (22.9%) answered in a way that cannot be classified as belonging to either of the other models. However, this is a misleading conclusion that must be explained by analyzing the cases with respect to the whole experiment. What counts as a linear strategy is primarily the suspension of the air conditioning, which seemingly has nothing to do with the dinner. However, participants who indicated that the air conditioning should be stopped implicitly assumed that this should be done because the user is not coming home soon. This information is provided by the video, because the character is invited out for dinner, but it is not given to the system along with the undo command (the text message only says "cancel dinner"). The approaches envisioned by participants can also be interpreted from other perspectives: smart (or proactive) versus non-smart solutions, and solutions with or without compensation (i.e., the execution of new commands).
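The distinction between the linear and selective interpretations can be illustrated with a minimal sketch (command names and the dependency relation are invented for the example, not taken from the experiment):

```python
# Minimal sketch of linear vs. selective undo over a command history.
# Command names and the dependency set are illustrative assumptions.
history = ["confirm_dinner", "preheat_oven", "start_air_conditioning", "defrost_food"]
depends_on_dinner = {"preheat_oven", "defrost_food"}  # logical dependencies

def linear_undo(history, target):
    """Roll back the target and every command issued after it."""
    idx = history.index(target)
    return history[idx:]

def selective_undo(history, target, deps):
    """Roll back only the target and the commands logically dependent on it."""
    return [c for c in history if c == target or c in deps]

print(linear_undo(history, "confirm_dinner"))
# rolls back everything, including start_air_conditioning
print(selective_undo(history, "confirm_dinner", depends_on_dinner))
# rolls back only the dinner-related commands
```

Under the linear interpretation the air conditioning is undone simply because it was started later; under the selective one it survives, since it is not logically dependent on the dinner.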
Table 4 presents a classification of user-suggested solutions according to the two criteria. Solutions dynamically adapting to the undo request, resolving logical dependencies between commands and operations, and not relying on further interaction with
An Experimental Analysis of Undo in Ubiquitous Computing Environments
517
Fig. 5. Interpretation of the results adopting undo models already presented in the literature

Table 4. Classification of solutions envisioned by subjects according to smartness and compensation

                  Smart        Non-Smart
Compensation      16 (44.4%)   6 (16.7%)
No Compensation   2 (5.6%)     12 (33.3%)
the user have been classified as smart (18; 50.0%). Solutions where new commands are automatically issued by the system to reach a new state have been classified as with compensation (22; 61.1%). Our initial hypothesis, suggesting the adoption of a cascading undo with smart compensation, is supported by the data, though not as strongly as we would have expected. The lack of numerical convergence in the collected responses, in conjunction with the many different suggested strategies and minor preferences, shows that users have a wide range of expectations and personal preferences with respect to the behavior of smart systems: learning and mixed-initiative systems would be preferred to purely smart systems. Analyzing the answers to the questionnaire, we found that users want to be in full control of the system's actions (see composite hypothesis H2), even if this means adding some interaction overhead (requests for confirmation of autonomous system commands). Participants trusted the system to be able to resolve logical dependencies, and to propose and negotiate sound solutions to the undo problems with respect to the situation. Our experiment can thus lead to a definition of undo for ubicomp systems that relies on context, i.e., the activity the system is supporting, the situation in which the activity is taking place, the actors involved (with their unique individual features), and so on.
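The marginal percentages quoted in this section follow directly from the cells of Table 4:

```python
# Cells of Table 4 (counts over the 36 classified solutions).
smart_comp, nonsmart_comp = 16, 6
smart_nocomp, nonsmart_nocomp = 2, 12

total = smart_comp + nonsmart_comp + smart_nocomp + nonsmart_nocomp
smart = smart_comp + smart_nocomp              # row-independent "smart" margin
compensation = smart_comp + nonsmart_comp      # "with compensation" margin

print(total, f"{100 * smart / total:.1f}%", f"{100 * compensation / total:.1f}%")
# 36 50.0% 61.1%
```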
6 Concluding Remarks and Future Work
The undo models proposed for “traditional” computing applications, even in distributed or collaborative settings, are not suited to ubicomp systems. The notion of ubiquitous intelligence allows for a broader spectrum of options, of which users
already seem to have some notion. Context awareness, flexibility, and smartness (almost in terms of proactivity) are more than just desired features; they are expected. The results of our experiment can provide initial directions to the designers of ubicomp systems, since they should be aware of users' expectations. Undo for ubiquitous computing cannot be designed just as a function to visit the timeline of a system backwards. The implications of the actions performed by users have to be taken into account to complement the trace of the issued commands and of their predictable effects, for which compensation can already be defined in advance. The technological possibilities of ubiquitous intelligence can help this task by providing the required data, which is then to be processed to support complex processes such as the identification of a set of possible undo strategies in response to a specific request. Particular attention should also be paid to the problem of redo: what amount and kind of information should be preserved after an undo? Our subjects suggested that they agree on keeping the information, but what could it be used for, and how? Our experiment showed that the material component of the problem of undoing actions in ubicomp systems is not secondary. Some observations on this side can already be made with respect to problems that users are already facing: e.g., undo in e-commerce is still often implemented and presented in a rough (non-contextual) fashion, and users can withdraw from purchases only either within a limited time or by directly negotiating the event with the seller. This is just one of the many examples in which undo does not yet fulfill users' expectations [8]. The next steps of our research will be to investigate how the results of our experiment might lead to novel functionalities in the systems we are working on [16].
In particular, we plan to implement user-driven context-aware undo strategies in some of the scenarios on which we are testing our ubicomp middleware platform.
References

1. Abowd, G.D., Dix, A.J.: Giving undo attention. Interact. Comput. 4, 317–342 (1992)
2. Leeman Jr., G.B.: A formal approach to undo operations in programming languages. ACM Trans. Program. Lang. Syst. 8, 50–87 (1986)
3. Edwards, W.K., Igarashi, T., LaMarca, A., Mynatt, E.D.: A temporal model for multi-level undo and redo. In: UIST 2000: Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, pp. 31–40. ACM Press, New York (2000)
4. Cass, A.G., Fernandes, C.S.T., Polidore, A.: An empirical evaluation of undo mechanisms. In: NordiCHI 2006: Proceedings of the 4th Nordic Conference on Human-Computer Interaction, pp. 19–27. ACM Press, New York (2006)
5. Weiser, M.: The computer for the 21st century. Scientific American 265(3), 94–104 (1991)
6. Bell, G., Dourish, P.: Yesterday's tomorrows: notes on ubiquitous computing's dominant vision. Personal Ubiquitous Comput. 11, 133–143 (2007)
7. Prakash, A., Knister, M.J.: A framework for undoing actions in collaborative systems. ACM Trans. Comput.-Hum. Interact. 1, 295–330 (1994)
8. Agostini, A., De Michelis, G., Loregian, M.: Undo in workflow management systems. In: van der Aalst, W.M.P., ter Hofstede, A.H.M., Weske, M. (eds.) BPM 2003. LNCS, vol. 2678, pp. 321–335. Springer, Heidelberg (2003)
9. Archer, J.E., Conway, R., Schneider, F.B.: User recovery and reversal in interactive systems. ACM Trans. Program. Lang. Syst. 6, 1–19 (1984)
10. Berlage, T.: A selective undo mechanism for graphical user interfaces based on command objects. ACM Trans. Comput.-Hum. Interact. 1, 269–294 (1994)
11. Chessell, M., Griffin, C., Vines, D., Butler, M., Ferreira, C., Henderson, P.: Extending the concept of transaction compensation. IBM Syst. J. 41, 743–758 (2002)
12. O'Brien, J., Shapiro, M.: Undo for anyone, anywhere, anytime. In: EW11: Proceedings of the 11th ACM SIGOPS European Workshop: Beyond the PC, p. 31. ACM Press, New York (2004)
13. Sun, C.: Undo as concurrent inverse in group editors. ACM Trans. Comput.-Hum. Interact. 9, 309–361 (2002)
14. Ressel, M., Nitsche-Ruhland, D., Gunzenhäuser, R.: An integrating, transformation-oriented approach to concurrency control and undo in group editors. In: CSCW 1996: Proceedings of the 1996 ACM Conference on Computer Supported Cooperative Work, pp. 288–297. ACM Press, New York (1996)
15. Park, S.H., Won, S.H., Lee, J.B., Kim, S.W.: Smart home - digitally engineered domestic life. Personal Ubiquitous Comput. 7, 189–196 (2003)
16. Locatelli, M.P., Loregian, M.: Active coordination artifacts in collaborative ubiquitous-computing environments. In: Schiele, B., Dey, A.K., Gellersen, H., de Ruyter, B., Tscheligi, M., Wichert, R., Aarts, E., Buchmann, A. (eds.) AmI 2007. LNCS, vol. 4794, pp. 177–194. Springer, Heidelberg (2007)
Towards a Collaborative Reputation Based Service Provider Selection in Ubiquitous Computing Environments

Malamati Louta

Technological Education Institute of Western Macedonia, Department of Business Administration, Koila 50100 Kozani, Greece
University of Western Macedonia, Department of Information and Communications Technologies Engineering, 50100 Kozani, Greece
[email protected]
Abstract. From a market-based perspective, the entities composing dynamic ubiquitous computing environments may be classified into two main categories that are, in principle, in conflict. These are the Service Resource Requestors (SRRs), wishing to use services and/or exploit resources offered by other system entities, and the Service Resource Providers (SRPs), offering the services/resources requested. Under the assumption that a number of SRPs may handle the SRRs' requests, the SRRs may decide on the most appropriate SRP for the service/resource requested on the basis of a weighted combination of the evaluation of the quality of their offer (performance related factor) and of their reputation rating (reliability related factor). In this context, a reputation mechanism is proposed which rates SRPs with respect to whether or not they honoured the agreements they have established with the SRRs. The reputation mechanism is collaborative, distributed, and robust against false and spurious ratings.

Keywords: Intelligent Multi Agent Systems, Performance & Reliability Related Factors, Collaborative Reputation Mechanism, Ubiquitous Computing.
1 Introduction

Highly competitive and dynamic ubiquitous computing environments (including pervasive, peer-to-peer, grid computing, mobile ad-hoc & sensor networks and electronic communities) comprise system entities which may be classified into two main categories that, in principle, are in conflict. These two categories are: the entities that wish to use services and/or exploit resources offered by other system entities (Service/Resource Requestors - SRRs) and the entities that offer the services/resources requested (Service/Resource Providers - SRPs). In general, the SRPs' main role is to develop, promote and provide the desired services and resources trustworthily, at a high quality level, in a time- and cost-effective manner.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 520–534, 2008. © Springer-Verlag Berlin Heidelberg 2008
The aim of this paper, in accordance with efficient service operation objectives, is to propose enhancements to the sophistication of the functionality that ubiquitous intelligent computing environments can offer. Hence, the following business case, which falls into the realm of ubiquitous computing, may be pursued: SRRs should be provided with mechanisms that enable them to find and associate with the most appropriate SRPs, i.e., those offering the desirable quality of service at a certain time period in a cost-efficient manner while exhibiting reliable behavior. This study presents such mechanisms. Specifically, the SRRs' decision on the most appropriate SRP is based on a weighted combination of the evaluation of the quality of the SRPs' offer (performance related factor) and of their reputation rating (reliability related factor). The SRPs' performance evaluation factor reflects the fact that there may in general be different levels of satisfaction with respect to the various SRPs' offers, while the reliability factor is introduced in order to reflect whether or not the SRP finally provides to the SRR the service/resource that corresponds to the established contract terms. A wide variety of negotiation mechanisms, including auctions, bilateral (1-to-1) and/or multilateral (M-to-N) negotiation models and strategies, as well as posted-offer schemes (i.e., a non-negotiable, take-it-or-leave-it offer), may be adopted in order to evaluate the quality of the SRP offers and establish the 'best' possible contract terms and conditions with respect to service/resource access and provision [1][2]. However, seeking to maximize their welfare while achieving their own goals and aims, entities may misbehave (intentionally or unintentionally), acting selfishly and thus leading to a significant deterioration of the system's performance.
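The weighted-combination selection rule just described can be sketched as follows (the weights, provider names and example scores are illustrative assumptions; the paper formalizes the rule in later sections):

```python
# Sketch of the SRR's provider-selection rule: a weighted combination of the
# offer-quality (performance) factor and the reputation (reliability) factor.
# Weights and example values are assumptions made for illustration.
def select_srp(offers, reputations, w_quality=0.6, w_reliability=0.4):
    """Return the SRP with the highest combined score; both factors in [0, 1]."""
    def score(srp):
        return w_quality * offers[srp] + w_reliability * reputations[srp]
    return max(offers, key=score)

offers = {"P1": 0.9, "P2": 0.7}        # evaluated quality of each SRP's offer
reputations = {"P1": 0.2, "P2": 0.8}   # reputation ratings of each SRP
print(select_srp(offers, reputations))  # P2: lower offer, but far more reliable
```

Shifting the weights toward quality alone would flip the choice, which is exactly the trade-off the combination is meant to expose.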
Therefore, trust mechanisms should be exploited in order to build the necessary trust relationships among the system entities [3], enabling them to automatically adapt their strategies to different levels of cooperation and trust. Traditional models aiming to avoid strategic misbehaviour (e.g., authentication and authorization schemes [4][5], Trusted Third Parties (TTPs) [6]) may be inadequate or even impossible to apply due to the complexity, heterogeneity and high variability of the environment. Reputation mechanisms are employed to provide a "softer" security layer, sustaining rational cooperation and serving as an incentive for good behaviour, as good players are rewarded by the society whereas bad players are penalized [7]. In the context of this study, for the evaluation of the reliability of SRPs, a collaborative reputation mechanism is proposed which takes into account the SRPs' past performance in consistently satisfying SRRs' expectations. To be more specific, the reputation mechanism rates the SRPs with respect to whether or not they honoured the agreements established with the SRRs, thus introducing the concept of trust among the involved parties. The reputation mechanism considers both first-hand information (acquired from the evaluator SRR's past experiences with the target SRP) and second-hand information (disseminated by other SRRs), is decentralized, and exhibits robust behaviour against spurious and false reputation ratings. This study is based upon the notion of interacting intelligent agents which participate in activities on behalf of their owners, while exhibiting properties such as autonomy, reactiveness and proactiveness, in order to achieve particular objectives and accomplish their goals [8]. The rest of the paper is structured as follows. The related research literature is revisited in Section 2.
In Section 3, the fundamental concepts of the proposed collaborative reputation mechanism are presented, aiming to offer an efficient way of building
the necessary level of trust in the ubiquitous intelligent computing environments. In Section 4, the reputation ratings system is mathematically formulated. In Section 5 the SRRs decision on the most appropriate SRP is formally described. Section 6 provides some initial results, indicative of the efficiency of the SRPs selection mechanism presented in this study. Finally, in Section 7, conclusions are drawn and directions for future plans are presented.
2 Related Research

The issue of trust has been gaining an increasing amount of attention in a number of research communities. In [9], current research on trust management in distributed systems is surveyed and some open research areas are explored. Specifically, the authors discuss representative trust models in the context of P2P systems, mobile ad-hoc networks and electronic communities, including public-key cryptography, the resurrecting duckling model, and distributed evidence- and recommendation-based trust models. Open research issues identified include (a) trust/reputation value storage while preserving its consistency, (b) mitigation of the impact of false accusations / malicious behaviour, and (c) combination and exploitation of trust values across different applications. In [10], a set of aspects is proposed to classify computational trust and reputation models. The classification dimensions considered are the reference model (cognitive or game-theoretical), the information sources taken into account for trust and reputation value calculation (direct experiences and witness information, sociological aspects of agents' behavior, prejudice), the global or subjective view of trust and reputation values, the model's context dependence (the capability of dealing with several contexts at the same time while maintaining different trust/reputation values associated with these contexts for each partner), the incorporation of mechanisms to deal with agents showing different degrees of cheating behavior, the type of information expected from witnesses (Boolean information or continuous measures), and the trust/reputation reliability measure. A representative selection of trust and reputation models is classified on the basis of the aforementioned criteria.
Based on this study, the authors believe that a good mechanism to increase efficiency of actual trust and reputation models and also to overcome the lack of confidence in e-markets is the introduction of sociological aspects as part of these models. In general, reputation mechanisms establish trust by exploiting learning from experience concepts in order to obtain a reliability value of system participants in the form of rating based on other entities’ view/opinion. Reputation related information may be disseminated to a large number of system participants in order to adjust their strategies and behaviour, multiplying thus the expected future gains of honest parties which bear the loss incurred by cooperating and acting for the maximization of the social welfare. Current reputation system implementations consider feedback given by Buyers in the form of ratings in order to capture information on Seller’s past behavior, while the reputation value is computed as the sum (or the mean) of those ratings, either incorporating all ratings or considering only a period of time (e.g., six months) [11], [12]. In [13] the authors, after discussing on desired properties for reputation mechanisms for online communities, describe Sporas and Histos reputation
mechanisms for loosely and highly connected online communities, respectively, that were implemented in Kasbah electronic marketplace. Sporas reputation mechanism provides a global reputation value for each member of the online community, associated with him/her as part of his/her identity. Histos builds a more personalized system, illustrating pairwise ratings as a directed graph with nodes representing users and weighted edges representing the most recent reputation rating given by one user to another. [14] introduces PeerTrust, an adaptive and dynamic reputation based trust model that helps participants/peers to evaluate the trustworthiness of each other based on the community feedback about participants’ past behavior. Five important factors are taken into account for the calculation of trust: the feedback a peer obtains from others, the feedback scope (such as the total number of transactions that a peer has with other peers), the credibility factor of the feedback source, the transaction context factor for discriminating mission critical transactions from less or non critical ones and the community context factor for addressing community related characteristics and vulnerabilities. In [15], the authors base the decision concerning the trustworthiness of a party on a combination of local information acquired from direct interactions with the specific party (if available) and information acquired from witnesses (trusted third parties that have interacted with the specific party in the past). In order to obtain testimonies from witnesses, a trust net is built by seeking and following referrals from its neighbours, which may be adaptively chosen. Their approach relies upon the assumption that the vast majority of agents provide honest ratings, in order to override the effect of spurious ratings generated by malicious agents. 
In [16], the authors study how to efficiently detect deceptive agents, following some introduced models of deception, based on a variant of the weighted majority algorithm applied to belief functions. In the current version of this study, a reputation mechanism is exploited in order to estimate the reliability of each SRP. This estimation constitutes the reliability related factor and is introduced in order to reflect whether or not the SRP finally provides to the SRR the service/resource that corresponds to the established contract terms. The SRP's reliability is reduced whenever the SRP does not honour the agreement contract terms, reached in general via a negotiation process. The proposed reputation mechanism is collaborative (in addition to the information acquired through their own experiences, the agents exploit information disseminated by other parties, enabling them to identify misbehaving parties in a time-efficient manner), distributed, and robust against false and/or spurious feedback.
3 General Elements of the Designed Reputation Mechanism

Most reputation-based systems in the related research literature aim to enable entities to make decisions on which parties to negotiate/cooperate with or to exclude, after they have been informed about the reputation ratings of the parties of interest. In this study, we do not directly exclude / isolate the SRPs that are deemed misbehaving, but instead base the SRRs' decision on the most appropriate SRP on a weighted combination of the evaluation of the quality of the SRPs' offer (performance related factor) and of their reputation rating (reliability related factor).
Each system entity is represented by an intelligent agent acting on behalf of its owner in order to accomplish its tasks. Thus, two agent categories are introduced. Service/Resource Requestor Agents (SRRAs) are assigned the role of capturing the SRRs' preferences, requirements and constraints regarding the requested services/resources, delivering them in a suitable form to the appropriate SRP entities, acquiring and evaluating the corresponding SRPs' offers, and, ultimately, selecting the most appropriate SRP on the basis of the quality of its offer and its reputation rating. Service/Resource Provider Agents (SRPAs) are the entities acting on behalf of the SRPs. Their role is to collect the SRR preferences, requirements and constraints and to make a corresponding offer, taking also into account certain environmental criteria. SRRAs and SRPAs are both considered to be rational and self-interested, aiming to maximise their owners' profit. The proposed reputation mechanism is collaborative in the sense that it considers both first-hand information (acquired from the SRRA's past experiences with the SRPAs) and second-hand information (disseminated by other SRRAs). To be more specific, each SRRA keeps a record of the reputation ratings of the SRPAs it has negotiated with and been served by in the past. This rating, based on the direct experiences of the evaluator SRRA with the target SRPA, forms the first factor contributing to the overall SRPA reputation rating. Concerning the SRPAs' reputation ratings based on feedback given by other SRRAs on their experiences in the system (the second factor, based on witness information), the problem of finding proper witnesses is addressed by means of a Service/Resource Provider Reputation Broker (SRPRB) component, through which the evaluator SRRA obtains references to the SRRAs that have previously been served by the SRPAs under evaluation.
At this point, some clarifications with respect to the proposed model should be made. First, each SRRA is willing to share its experiences and to provide, whenever asked, the reputation ratings of the SRPAs formed on the basis of its past direct interactions. Second, the SRPRB component maintains a list of the SRPAs providing a specific service/resource, as well as a list of the SRRAs that have previously interacted with and been served by a specific SRPA. Third, the reliability of SRPAs is treated as a behavioural aspect, independent of the services/resources provided; thus, the witness list may be composed of SRRAs which have had direct interactions with the specific SRPA in the past, regardless of the service/resource consumed. Fourth, SRPAs have a solid interest in informing the SRPRB about the services/resources they currently offer, while the SRRAs are authorized to access and obtain witness references only if they send feedback concerning the preferred partner for their past interactions in the system. This policy-based approach addresses the inherent incentive problem of reputation mechanisms, allowing the SRPRB to keep accurate and up-to-date information. True feedback cannot be automatically assumed: second-hand information can be spurious (e.g., parties may choose to misreport their experience out of jealousy or in order to discredit trustworthy providers). In general, a mechanism for eliciting true feedback in the absence of TTPs is needed. According to the simplest possible approach that may be adopted in order to account for possible inaccuracies in the information provided by the witness SRRAs (both intentional and unintentional), the evaluator SRRA can mostly rely on its own experiences rather than on the target
SRPA's reputation ratings provided by the witness SRRAs after contacting them. In this respect, the SRPA's reputation ratings provided by the witness SRRAs may be attributed a relatively low significance factor. In this paper, we consider that each SRRA is associated with a dynamically updated trust level, which reflects whether the SRRA provides feedback on its experiences with the SRPAs truthfully and accurately. In essence, this trust level is a measure of the credibility of the witness information. To be more specific, in order to handle inaccurate information, an honesty probability is attributed to each SRRA, i.e., a measure of the likelihood that a SRRA gives feedback compliant with the real picture of service provisioning. Second-hand information obtained from trustworthy SRRAs (associated with a high honesty probability) is given a higher significance factor, whereas reports (positive or negative) coming from untrustworthy sources have a small impact on the formation of the SRPAs' reputation ratings. The evaluator SRRA uses the reputation mechanism to decide on the most appropriate SRPA, especially in cases where the SRRA doubts the accuracy of the information provided by the SRPA. A learning period is required in order for the SRRAs to obtain fundamental information about the SRPAs. In case reputation-specific information is not available to the SRRA (neither through its own experiences nor through the witnesses), the reliability related factor is not considered for the SRP selection. It should be noted that the reputation mechanism comes at the cost of keeping reputation-related information at each SRRA and updating it after service provision / resource consumption has taken place.
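The role of the SRPRB described above can be sketched as a simple registry (the class and method names are assumptions for illustration, not part of the paper's design):

```python
# Minimal sketch of the SRPRB: it maps each service to the SRPAs offering it,
# and each SRPA to the SRRAs it has served (the witness list). Names assumed.
class ReputationBroker:
    def __init__(self):
        self.providers_by_service = {}   # service -> set of SRPA ids
        self.witnesses_by_provider = {}  # SRPA id -> set of SRRA ids

    def register_offer(self, srpa, service):
        """SRPAs advertise the services/resources they currently offer."""
        self.providers_by_service.setdefault(service, set()).add(srpa)

    def record_interaction(self, srra, srpa):
        """SRRAs report past interactions, becoming witnesses for the SRPA."""
        self.witnesses_by_provider.setdefault(srpa, set()).add(srra)

    def witnesses(self, srpa):
        """Witness references for a target SRPA, regardless of the service."""
        return self.witnesses_by_provider.get(srpa, set())

broker = ReputationBroker()
broker.register_offer("P1", "printing")
broker.record_interaction("R2", "P1")
broker.record_interaction("R3", "P1")
print(sorted(broker.witnesses("P1")))  # ['R2', 'R3']
```

Note that `witnesses` ignores the service, mirroring the third clarification above: reliability is treated as a behavioural aspect independent of the service consumed.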
4 Reputation Rating System Formulation

Let us assume the presence of M candidate SRPAs interacting with N SRRAs concerning the provisioning of Q services/resources S = {s_1, s_2, ..., s_Q} requested in a ubiquitous intelligent computing environment. Let the set of agents that represent Service Resource Providers be denoted by P = {P_1, P_2, ..., P_M} and the set of agents that represent Service Resource Requestors be denoted by R = {R_1, R_2, ..., R_N}. We hereafter consider the request of a SRRA Ri (the evaluator) regarding the provision of a service s_c (s_c ∈ S), which, without loss of generality, is provided by all candidate SRPAs P = {P_1, P_2, ..., P_M}. In order to estimate the reputation rating of a target SRPA Pj, the evaluator SRRA Ri needs to retrieve from the SRPRB the list R_w of n witnesses (R_w ⊆ R). Thereafter, Ri contacts the n witnesses in order to get feedback reports on the behaviour of Pj. The target SRPA Pj's overall reputation rating ORR^{Ri}(Pj) may be estimated by the evaluator SRRA Ri in accordance with the following formula:

ORR^{Ri}(Pj) = w_{Ri} · DRR^{Ri}(Pj) + Σ_{k=1}^{n} w_{Rk} · DRR^{Rk}(Pj) .    (1)
where DRR^{Rx}(Pj) denotes the reputation rating of the target SRPA Pj as formed by SRRA Rx on the basis of its direct experiences with Pj in the past. As may be observed from equation (1), the reputation rating of the target Pj is a weighted combination of two factors. The first factor contributing to the reputation rating value is based on the direct experiences of the evaluator agent Ri, while the second factor depends on information regarding Pj's past behaviour gathered from n witnesses. The set of the n witnesses is a subset of the set R = {R_1, R_2, ..., R_N}. Weight w_{Rx} provides the relative significance of the reputation rating of the target SRPA Pj as formed by the SRRA Rx to the overall reputation rating estimated by the evaluator Ri. In general, w_{Rx} is a measure of the credibility of witness Rx and may be a function of the trust level attributed to each SRRA Rx by the evaluator Ri. It has been assumed that the weights w_{Rx} are normalized to add up to 1 (i.e., w_{Ri} + Σ_{k=1}^{n} w_{Rk} = 1). Thus, weight w_{Rx} may be given by the following equation:

w_{Rx} = TL^{Ri}(Rx) / Σ_{x ∈ {i} ∪ {1,...,n}} TL^{Ri}(Rx) .    (2)
where TL^{Ri}(Rx) is the trust level attributed to SRRA Rx by the evaluator Ri. It has been assumed that TL^{Ri}(Rx) ∈ [0,1], with level 1 denoting a witness Rx that is fully trusted in the eyes of the evaluator Ri. One may easily conclude that for the evaluator Ri it stands that TL^{Ri}(Ri) = 1. Concerning the formation of the reputation ratings DRR^{Rx}(Pj), each SRRA may rate SRPA Pj with respect to its reputation after a transaction has taken place in accordance with the following equation:

DRR^{Rx}_{post}(Pj) = DRR^{Rx}_{pre}(Pj) + k_r · l(DRR^{Rx}_{pre}(Pj)) · {rr^{Rx}(Pj) − E[rr^{Rx}(Pj)]} .    (3)
Rx Rx ( Pj ) are the SRPA Pj reliability based rating after where DRR post ( Pj ) and DRR pre Rx and before the updating procedure. It has been assumed that DRR post ( Pj ) and Rx DRR pre ( Pj ) lie within the [0,1] range, where a value close to 0 indicates a misbehaving SRP. rr Rx ( Pj ) is a (reward) function reflecting whether the service quality is compliant with the picture established during the negotiation phase with SRRA Rx and E[rr Rx ( Pj )] is the mean (expected) value of the rr Rx ( Pj ) variable. In general the larger the rr Rx ( Pj ) value, the better the SRPA Pj behaves with respect to the agreed terms and conditions of the established contract, and therefore the more positive the influence on the rating of the Pj . It should be noted that SRP’s misbehaviour (or at least deterioration of its previous behaviour) leads to a decreased post rating value, since the {rr Rx ( Pj ) − E Rx [rr ( Pj )]} quantity is negative. The rr Rx ( Pj )
Towards a Collaborative Reputation Based Service Provider Selection
527
function may be implemented in several ways. In the context of this study, it was assumed without loss of generality that the $rr^{R_x}(P_j)$ values vary from 0.1 to 1. Factor $k_r$ ($k_r \in (0,1]$) determines the relative significance of the new outcome with respect to the old one; in essence, this value determines the memory of the system. Small $k_r$ values mean that the memory of the system is large. However, good behaviour will gradually improve the SRPA $P_j$'s reputation ratings. Finally, $l(DRR_{pre}^{R_x}(P_j))$ is a function of the $P_j$ reputation rating $DRR_{pre}^{R_x}(P_j)$ and is introduced in order to keep the $P_j$ rating within the range $[0,1]$. A wide range of functions may be defined. In the current version of this study, we considered the simple polynomial model $l(DRR_{pre}^{R_x}(P_j)) = 1 - DRR_{pre}^{R_x}(P_j)$, for which $l(DRR_{pre}^{R_x}(P_j)) \to 1$ as $DRR_{pre}^{R_x}(P_j) \to 0$ and $l(DRR_{pre}^{R_x}(P_j)) \to 0$ as $DRR_{pre}^{R_x}(P_j) \to 1$. Other functions could be defined as well.

The trustworthiness of witnesses $TL^{R_i}(R_x)$ initially assumes a high value; that is, all witnesses are initially considered to report their experiences to $R_i$ honestly. However, as already noted, the trust level is dynamically updated in order to account for the potential dissemination of misinformation by witnesses in the system. Specifically, a witness $R_x$ is considered to misreport his/her past experiences if the target $P_j$'s overall reputation rating $ORR^{R_i}(P_j)$, as estimated by the evaluator $R_i$ by means of equation (1), is beyond a given distance of the rating $DRR^{R_x}(P_j)$ obtained from the witness $R_x$ (formed in accordance with equation (3)), in which case the following expression holds:

$$\left| ORR^{R_i}(P_j) - DRR^{R_x}(P_j) \right| > e . \qquad (4)$$
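The rating update of equation (3), with the bounding function l(x) = 1 − x used in this study, can be sketched as follows (a minimal illustration; the reward value, its mean and the k_r value are our own choices):

```python
def update_rating(drr_pre, rr, rr_mean, k_r=0.3):
    """One update step of equation (3) with l(x) = 1 - x.

    drr_pre: rating DRR_pre before the transaction, in [0, 1].
    rr:      reward observed for the transaction (0.1 .. 1).
    rr_mean: expected value E[rr] of the reward variable.
    k_r:     factor in (0, 1]; small k_r = long system memory.
    """
    drr_post = drr_pre + k_r * (1.0 - drr_pre) * (rr - rr_mean)
    # Clamp defensively: the paper keeps ratings in [0, 1].
    return min(1.0, max(0.0, drr_post))

r = 0.5
for _ in range(20):          # consistently good behaviour ...
    r = update_rating(r, rr=1.0, rr_mean=0.55)
print(round(r, 2))           # ... gradually drives the rating towards 1
```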
where $e$ is the predetermined distance level. As may be observed, this approach can be quite efficient in case the population of witnesses honestly reporting their experiences is large with respect to the dishonest witnesses. To account for the remaining cases, the evaluator also takes into account the distance of the reputation rating of the target $P_j$ formed on the basis of its own direct experiences, $DRR^{R_i}(P_j)$. In case $\left| DRR^{R_i}(P_j) - DRR^{R_x}(P_j) \right| > e$ (assuming that the evaluator $R_i$ has obtained the information required from its direct experiences and has formed an accurate picture of the target $P_j$'s reliability), the evaluator may conclude that the witness misreports its experiences. Otherwise, in case the evaluator $R_i$ does not have a real picture of the target $P_j$'s behaviour, it adjusts the trustworthiness of the witnesses considered for the formation of $P_j$'s reputation only in case $P_j$ is selected for the provisioning
528
M. Louta
of the service / resource and after service provisioning has taken place and the reputation rating has been accordingly updated. Witnesses' trustworthiness may be updated on the basis of the following expression, in a manner similar to equation (3):

$$TL_{post}^{R_i}(R_x) = TL_{pre}^{R_i}(R_x) + k_b \cdot l\bigl(TL_{pre}^{R_i}(R_x)\bigr) \cdot a . \qquad (5)$$
where $TL_{post}^{R_i}(R_x)$ and $TL_{pre}^{R_i}(R_x)$ are the witness $R_x$'s trustworthiness after and before the updating procedure. It has been assumed that $TL_{post}^{R_i}(R_x)$ and $TL_{pre}^{R_i}(R_x)$ lie within the $[0,1]$ range, where a value close to 0 indicates a dishonest witness. For the reward / penalty parameter $a$ the following expression holds (the slash denotes that either the overall rating or the evaluator's direct rating serves as the reference, as discussed above):

$$\begin{cases} a < 0, & \left| ORR^{R_i}(P_j) \,/\, DRR^{R_i}(P_j) - DRR^{R_x}(P_j) \right| > e \\ a > 0, & \left| ORR^{R_i}(P_j) \,/\, DRR^{R_i}(P_j) - DRR^{R_x}(P_j) \right| < e \end{cases} \qquad (6)$$
$l(TL_{pre}^{R_i}(R_x))$ is a function of the witness trustworthiness $TL_{pre}^{R_i}(R_x)$ and is introduced in order to keep the witness trustworthiness level within the range $[0,1]$. In the current version of this study, in accordance with equation (3), $l(TL_{pre}^{R_i}(R_x)) = 1 - TL_{pre}^{R_i}(R_x)$, for which $l(TL_{pre}^{R_i}(R_x)) \to 1$ as $TL_{pre}^{R_i}(R_x) \to 0$ and $l(TL_{pre}^{R_i}(R_x)) \to 0$ as $TL_{pre}^{R_i}(R_x) \to 1$. Factor $k_b$ ($k_b \in (0,1]$)
determines the relative significance of the new outcome with respect to the old one, thus constituting the memory of the system. At this point, it should be mentioned that the reliability rating of the SRPA $P_j$ requires in some cases (e.g., when consumption of network or computational resources is entailed in the service provision process) a mechanism for evaluating whether the service quality was compliant with the picture promised during the negotiation phase.
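The checks of equation (4) and the trust update of equations (5)–(6) can be sketched together (the tolerance e, the factor k_b and the magnitude of a are illustrative values of ours):

```python
def update_trust(tl_pre, reported, estimate, e=0.15, k_b=0.2, step=0.5):
    """Witness-trust update of equations (4)-(6) with l(x) = 1 - x.

    tl_pre:   witness trust level before the update, in [0, 1].
    reported: rating DRR^{Rx}(Pj) reported by the witness.
    estimate: the evaluator's reference rating (ORR or its own DRR).
    """
    # Equations (4)/(6): penalise a report further than e from the
    # reference, reward one within the tolerance.
    a = -step if abs(estimate - reported) > e else step
    tl_post = tl_pre + k_b * (1.0 - tl_pre) * a      # equation (5)
    return min(1.0, max(0.0, tl_post))

print(update_trust(0.9, reported=0.2, estimate=0.9))   # penalised witness
print(update_trust(0.9, reported=0.85, estimate=0.9))  # rewarded witness
```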
5 Decision on the Best Service / Resource Provider In this study, SRPs that are deemed misbehaving are not directly ostracised, but instead the SRRs’ decision on the most appropriate SRP is based on a weighted combination of the evaluation of the SRPs’ offer quality (performance related factor) and of their reputation rating (reliability related factor). Considering the fact that there may in general be different levels of satisfaction with respect to the various SRPs’ offers, there may be SRPs that, in principle, do not satisfy the SRR with their offer. The evaluator SRRA Ri decides on the most appropriate candidate SRPA Pj for the provision of service/resource sc ∈ S (i.e., the SRPA best serving its current service / resource request) on the basis of the following formula:
$$\text{Maximise } APR(P_j) = w_p \cdot U_B\bigl(C_{final}^{P_j}\bigr) + w_r \cdot ORR^{R_i}(P_j) . \qquad (7)$$
As may be observed, $APR(P_j)$ is an objective function that models the performance and the reliability of the SRPA $P_j$. Its terms comprise the overall anticipated SRRA satisfaction stemming from the final contract $C_{final}^{P_j}$ reached within the negotiation phase, which may be expressed by the utility function $U_B(C_{final}^{P_j})$ ($U_B(C_{final}^{P_j}) \in [0,1]$ [17], [18]) with respect to the contract/offer proposed to the evaluator $R_i$, and the reputation rating of the target $P_j$. Of course, one of the two factors (anticipated SRRA satisfaction or SRPA reputation rating) can be omitted in certain variants of the general problem version considered in this paper. Weights $w_p$ and $w_r$ provide the relative value of the anticipated user satisfaction and the reputation related part. It is assumed that the weights $w_p$ and $w_r$ are normalized to add up to 1 (i.e., $w_p + w_r = 1$).
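The decision rule of equation (7) reduces to an argmax over the candidate set; a minimal sketch (names and weight values are ours):

```python
def select_provider(candidates, w_p=0.5, w_r=0.5):
    """Pick the SRP maximising equation (7).

    candidates: {name: (utility, reputation)} with both values in [0, 1].
    w_p, w_r:   weights for offer quality and reputation, w_p + w_r = 1.
    """
    return max(candidates,
               key=lambda p: w_p * candidates[p][0] + w_r * candidates[p][1])

# Identical offers (utility 0.7): the choice reduces to reputation alone.
offers = {"P1": (0.7, 0.8), "P4": (0.7, 0.2), "P5": (0.7, 0.9)}
print(select_provider(offers))  # -> P5
```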
6 Performance Evaluation This section provides some indicative results on the behaviour of the service/resource provider selection mechanisms proposed in this paper. We hereafter assume the existence of an area that falls into the domain of $P = \{P_1, P_2, \ldots, P_M\}$ candidate service providers (that is, a specific request may be handled by any of the candidate SRPs belonging to the set $P$). Regarding the different service/resource requestors that access the area, it is assumed that $N$ classes exist. The SRR classes are interested in the same service/resource, differentiated, however, with respect to the quality/quantity level required. In order to make the test case more realistic (and general), not all SRPs are assumed to offer all possible quantity/quality levels. The SRPs that do not offer the quality/quantity level requested by SRR class $R_i$ constitute the set $I(R_i)$, which comprises the SRPs that are inappropriate for the specific request and should therefore be excluded. Hereafter, it is assumed that $N = 10$ and $M = 10$. Table 1 presents the set of SRPs that are inappropriate for service/resource requests originating from each SRR class. The proposed framework was empirically evaluated by simulating the interactions among SRRAs and SRPAs, considering the simplest possible case. Specifically, it was assumed that the SRPAs which can handle the request, satisfying all requirements of the requestor class, offer exactly the same contract to the evaluator SRRAs (the same service/resource characteristics with exactly the same terms and conditions). In such a case, the service/resource provider selection is reduced to choosing the one with the highest reputation value (the second factor contributing to equation (7)), since the overall satisfaction stemming from the proposed contract (expressed by the first factor of equation (7)) contributes the same amount to the objective function value for all candidate SRPs.
Table 1. Set of inappropriate SRPs for each SRR class
SRR Class   Inappropriate SRPs
R1          -
R2          P7, P8, P9, P10
R3          -
R4          P7, P9
R5          P2, P5, P7, P8, P9, P10
R6          P3, P5, P7, P8, P9, P10
R7          P2, P3, P4, P5, P7, P8, P9, P10
R8          P3, P5, P7, P9, P10
R9          P3, P5, P7, P9, P10
R10         -
Table 2 presents, for each SRR class, the ranking of the SRPs with respect to their overall reliability (estimated on the basis of equation (1)) after the learning period, which reflects whether each SRP usually meets the quality expectation raised (or promised) by the proposed contract. In order to test this aspect, each SRP has been associated with a reliability probability, i.e., a measure of the likelihood that the SRP delivers the service compliant with the agreement established. This probability has been set to the values illustrated in Table 3. Specifically, with probability 0.9 SRP P5

Table 2. Service Resource Providers Reliability Ranking
SRR Class   Service Resource Providers Reliability Ranking (1st ... 10th)
R1          P5, P1, P8, P7, P10, P3, P6, P2, P9, P4
R2          P5, P1, P3, P6, P2, P4
R3          P5, P1, P8, P7, P3, P6, P10, P9, P2, P4
R4          P5, P1, P8, P3, P6, P10, P2, P4
R5          P1, P3, P6, P4
R6          P1, P6, P2, P4
R7          P1, P6
R8          P1, P8, P6, P2, P4
R9          P1, P8, P6, P2, P4
R10         P5, P1, P7, P8, P6, P3, P10, P9, P2, P4
Table 3. Honesty probability associated with each SRP
SRP     Honesty Probability
P1      0.8
P2      0.4
P3      0.6
P4      0.2
P5      0.9
P6      0.6
P7      0.7
P8      0.7
P9      0.4
P10     0.6
complies with its promises, whereas P4 maintains its promises with probability 0.2. A mixture of extreme and moderate values has been chosen in order to test the scheme under diverse conditions. Figure 1 illustrates the reputation rating of each SRP, as estimated by SRR class R1, after the learning period. During the learning period, in order for the SRRAs to obtain fundamental information on the reliability related behavioural aspects of the
[Figure: bar chart omitted]
Fig. 1. Reputation Ratings for all SRPs as formed by SRR class R1 after the learning period
SRPAs, each SRRA selects SRPAs for the service / resource provisioning on a round-robin basis (i.e., the service / resource requests are served by iterating over the candidate SRP list). We considered that an accurate picture of the SRPs' behaviour is formed by an SRRA after a number of transactions (e.g., 100) have been conducted with each candidate SRP. Finally, all witnesses are assumed to behave honestly; thus, $TL^{R_i}(R_x) = 1$ for all witnesses. As may be observed from Table 2 (and Figure 1), considering SRR class R1, the most appropriate SRP is P5 (ranked first), followed by SRP P1 (ranked second), SRP P8 (ranked third), and then P7, P10, P3, P6, P2, P9, while SRP P4 occupies the 10th ranking position. Empty spaces in Table 2 are attributed to the fact that, for a specific SRR class, there may be SRPs that do not offer the required quality/quantity level for the service/resource as requested (i.e., inappropriate SRPs). It may easily be concluded that the SRPs P5 and P1 each handle 50% of the requests originating from all SRR classes, because each is the most suitable for adequately serving 5 out of the 10 SRR classes (that is, each offers the required quality/quantity level for the service/resource requested by those SRR classes in a more reliable manner than the rest of the SRPs). Specifically, SRP P5 is more reliable than P1, but is characterized as inappropriate for SRR classes R5, R6, R7, R8, R9, whereas P1 offers the requested service / resource characteristics for all SRR classes. Additionally, one may observe slight differences in the SRP ranking positions for different SRR classes. As an example, the difference between SRR classes R1 and R3 should be noted: for SRR class R1, the 5th, 6th, and 7th positions are occupied by SRPs P10, P3 and P6 respectively, while for SRR class R3, the 5th, 6th, and 7th positions are occupied by SRPs P3, P6 and P10.
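The learning period described above can be reproduced in miniature: each provider honours or breaks its contract according to an honesty probability as in Table 3, and repeated applications of the equation (3) update turn these outcomes into direct ratings (a simplified sketch; the reward values and parameters are our own choices):

```python
import random

def simulate_learning(honesty, transactions=100, k_r=0.1, seed=1):
    """Estimate a direct rating per SRP from repeated transactions.

    honesty: {srp: probability of honouring the contract}, as in Table 3.
    A kept promise yields reward 1.0, a broken one 0.1 (the extremes of
    the rr range assumed in the paper); E[rr] is fixed at 0.55 here.
    """
    rng = random.Random(seed)
    ratings = {srp: 0.5 for srp in honesty}          # neutral start
    for srp, p in honesty.items():
        for _ in range(transactions):                # repeated trials per SRP
            rr = 1.0 if rng.random() < p else 0.1
            d = ratings[srp]
            ratings[srp] = min(1.0, max(0.0, d + k_r * (1 - d) * (rr - 0.55)))
    return ratings

honesty = {"P1": 0.8, "P4": 0.2, "P5": 0.9}
r = simulate_learning(honesty)
print(sorted(honesty, key=r.get, reverse=True))      # most reliable first
```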
These changes may be attributed to the fact that SRPs P3, P6 and P10 are associated with the same honesty probability. At this point it should be noted that, in order to take into account new SRPs that enter the system and/or potential changes in SRPs' reliability related behavioural aspects, the SRR decision on the most appropriate SRP reverts to a random scheme after a prespecified number of transactions has been completed in the system, until possibly outdated information is updated. In general, the proposed scheme performs better than random SRP selection, by 30% on average.
7 Conclusions Two main entity-related categories may be identified in dynamic ubiquitous computing environments: the Service Resource Requestors (SRRs), wishing to use services and/or exploit resources offered by the other system entities, and the Service Resource Providers (SRPs), offering the services/resources requested. In this study, under the assumption that a number of SRPs may handle and serve the SRRs' requests, the SRRs are enabled to decide on the most appropriate SRP for the service / resource requested on the basis of a weighted combination of the evaluation of the quality of their offer (performance related factor) and of their reputation rating (reliability related factor).
Specifically, a reputation mechanism is proposed which helps estimate SRPs' trustworthiness and predict their future behaviour, taking into account their past performance in consistently satisfying SRRs' expectations by honouring (or not) the agreements they have established with the SRRs. The reputation mechanism is distributed, considers both first-hand information (acquired from the SRR's direct past experiences with the SRPs) and second-hand information (disseminated from other SRRs' past experiences with the SRPs), and exhibits robust behaviour against spurious reputation ratings. The reputation framework designed has been empirically evaluated and has performed well. Initial results indicate that the proposed SRP selection scheme (based only on the reliability related factor) outperforms random SRP selection by 30% on average, in case honest feedback provision is considered from the vast majority of the witness population. Future plans involve our framework's extensive empirical evaluation considering witnesses' intentional provisioning of false feedback, as well as its comparison against existing reputation models and trust frameworks.
References
1. Jennings, N., Faratin, P., Lomuscio, A., Parsons, S., Sierra, C., Wooldridge, M.: Automated Negotiation: Prospects, Methods, and Challenges. International Journal of Group Decision and Negotiation 10(2), 199–215 (2001)
2. Louta, M., Roussaki, I., Pechlivanos, L.: An Intelligent Agent Negotiation Strategy in the Electronic Marketplace Environment. European Journal of Operational Research 187(3), 1327–1345 (2006)
3. Louta, M., Roussaki, I., Pechlivanos, L.: Reputation Based Intelligent Agent Negotiation Frameworks in the E-Marketplace. In: International Conference on E-Business, Setubal, Portugal, pp. 5–12 (2006)
4. Callas, J., Donnerhacke, L., Finney, H., Shaw, D., Thayer, R.: OpenPGP Message Format. RFC 4880, IETF (2007), http://www.ietf.org/rfc/rfc4880.txt
5. Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., Polk, W.: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. Internet Draft, IETF (2007), http://www.ietf.org/internet-drafts/draft-ietf-pkix-rfc3280bis-09.txt
6. Atif, Y.: Building Trust in E-Commerce. IEEE Internet Computing Magazine 6(1), 18–24 (2002)
7. Zacharia, G., Maes, P.: Trust management through reputation mechanism. Applied Artificial Intelligence Journal 14(9), 881–908 (2000)
8. He, M., Jennings, N., Leung, H.: On agent-mediated electronic commerce. IEEE Transactions on Knowledge and Data Engineering 15(4), 985–1003 (2003)
9. Li, H., Singhal, M.: Trust Management in Distributed Systems. IEEE Computer 40(2), 45–53 (2007)
10. Sabater, J., Sierra, C.: Review on Computational Trust and Reputation Models. Artificial Intelligence Review 24(1), 33–60 (2005)
11. eBay, http://www.ebay.com
12. OnSale, http://www.onsale.com/exchange.htm
13. Zacharia, G., Moukas, A., Maes, P.: Collaborative Reputation Mechanisms in Electronic Marketplaces. In: 32nd Hawaii International Conference on System Sciences, Los Alamitos, CA, USA, pp. 1–7 (1999)
14. Xiong, L., Liu, L.: Reputation and Trust. In: Advances in Security and Payment Methods for Mobile Commerce, pp. 19–35. Idea Group Inc. (2005)
15. Yu, B., Singh, M.: A social mechanism of reputation management in electronic communities. In: Klusch, M., Kerschberg, L. (eds.) CIA 2000. LNCS (LNAI), vol. 1860, pp. 154–165. Springer, Heidelberg (2000)
16. Yu, B., Singh, M.: Detecting Deception in Reputation Management. In: 2nd International Joint Conference on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, pp. 73–80 (2003)
17. Raiffa, H.: The Art and Science of Negotiation. Harvard University Press, Cambridge (1982)
18. Roussaki, I., Louta, M., Pechlivanos, L.: An Efficient Negotiation Model for the Next Generation Electronic Marketplace. In: IEEE Mediterranean Electrotechnical Conference, Dubrovnik, Croatia, pp. 615–618 (2004)
Petri Net-Based Episode Detection and Story Generation from Ubiquitous Life Log Young-Seol Lee and Sung-Bae Cho Dept. of Computer Science, Yonsei University 262 Seongsanno, Seodaemun-gu, Seoul 120-749, Korea [email protected], [email protected]
Abstract. As mobile devices improve their performance in ubiquitous spaces, they support a greater variety of applications with more information. Active exploration has followed into collecting and utilizing the information accumulated in ubiquitous environments. If the user information in mobile devices can be summarized in a more intuitive and interesting form, such as a cartoon, it can help the user share his experience with other people and recall meaningful memories. In this paper, we propose a method that organizes a story and generates cartoons using the information collected in a ubiquitous environment. We generate cartoons with a Petri net-based method, which describes the causal relationships between experienced events and the preconditions of events. The mobile information used in the experiment was collected from two female undergraduate students for two weeks. We confirm the potential of the proposed method for ambient intelligence through the analysis of the results.
1 Introduction Recently, the improved performance of mobile devices and the increasing diversity of mobile applications allow us to accumulate user information in ubiquitous environments. There is also active research on collecting and exploiting this information. Raento et al. developed a framework for gathering user information on smart phones [1]. This framework collects GSM cell IDs, Bluetooth and GPS records, telephone call logs, short message service usage, multimedia and so on, and sends them to a server. The contexts in that research are used as annotations of pictures or media. Panu et al. compiled logs and extracted features in a mobile environment [2]. They used a mobile device with a GPS receiver, microphone, and temperature, moisture and illumination sensors. On the other hand, researchers have tried to help users recall memories through ubiquitous information. Aizawa and Hori developed an application which recorded life-log video using a small wearable camera and used ubiquitous information as a key to retrieve the necessary parts from the video data [3]. Gemmell, Gordon and Lueder suggested a personal database system, MyLifeBits [4], which collected a large amount of information about a person from SenseCam [5] and a PC, and constructed a personal database to find necessary information easily.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 535–547, 2008. © Springer-Verlag Berlin Heidelberg 2008
As the quantity of information grows, the need for organizing the accumulated information increases. If we organize ubiquitous information in a more intuitive and interesting form, such as a cartoon or narrative, it can help us recall our memories or share interesting experiences with others. Our previous work, AniDiary (Anywhere Diary), collects mobile contexts, infers memory landmarks and generates cartoons [6]. The cartoons represent memorable events in daily life. The system shows good performance with artificial data, but it does not always produce good results with real life logs. It has problems such as repeated cartoons, a lack of diversity in the cartoons, and too few generated cartoons. AniDiary generates a story of cartoons based on landmarks and the relationships between them in Bayesian networks. A landmark is connected to other landmarks or contexts by causal relationships. Because two or more landmarks associated by causality are semantically similar, the character and background images of the cartoons for those landmarks become similar, too. Moreover, it is difficult to generate different text messages for the cartoons. In this paper, we propose a Petri net-based method for the organization of a story and the generation of cartoons. Mobile contexts contain fragmentary information collected in ubiquitous spaces. A mere enumeration of this information is not enough to generate a story. In order to generate an effective story, we define causal relationships between the pieces of information and organize them with prior knowledge. Here, we create a story from a combination of various episodes. The episodes are described by causal relationships between experienced events and their preconditions. An episode can also include undetected events which may have happened. This paper is organized as follows. In section 2, we introduce the related works and our previous work, AniDiary. Section 3 explains the proposed method, and section 4 describes experimental results.
Finally, section 5 gives a summary and discusses further studies.
2 Related Works 2.1 Ubiquitous Information Collection and Usage There are many studies on ubiquitous information collection. The VTT research center presented a uniform framework of mobile software that provided systematic methods for acquiring and processing useful context information [7]. The framework allowed semantic contexts to be recognized in real time and delivered contexts to client applications in an event-based manner. Mantyjarvi developed adaptive UI application control using the framework and fuzzy contexts [8]. Carnegie Mellon University built SenSay, a context-aware mobile phone that adapts to dynamically changing environmental and physiological states [9]. It informs callers of the status of the SenSay user. The University of Helsinki and the Helsinki Institute for Information Technology developed ContextPhone, a software platform that collects user contexts. It is provided as a set of open source C++ libraries and source code components [1].
Most of the previous personal data management systems used ubiquitous information as one of several useful indexes. A context-based video retrieval system recorded life-log video with ubiquitous contexts and used them to search parts of the video [3]. People used contexts from wearable sensors for search, and the system then returned candidate video segments. MyLifeBits was an implementation of a personal record database system [4]. In this system, personal information collected from a PC and SenseCam was stored in a relational database system, MS SQL Server, together with the relationships among the items of personal information. 2.2 AniDiary Our previous work, AniDiary, collected and pre-processed logs from a smart phone [6]. It logged photographs, MP3 music files, short-message-service use, phone battery status, call history and so on. Useful contexts were extracted from them through statistical analysis of, e.g., data frequency, intensity and duration. It used modular Bayesian networks designed from domain ontology to find memorable events. These memorable events were named landmarks. We generated a story of cartoons as a composition of the events.
Fig. 1. System overview
3 The Proposed Method In this paper, we propose a method that generates a story using mobile contexts. The overall diagram of the cartoon generation system is depicted in Figure 1. The whole system consists of context collection, landmark inference, story generation and cartoon visualization. In the context collection stage, we collect logs from a mobile device, preprocess the logs and convert them to contexts. We infer the user's emotions or activities through a Bayesian network; these are named landmarks. Context collection and landmark detection are the same as in our previous work, AniDiary. The proposed method is an alternative story generation process, presented in Figure 2. Our story generation consists of 3 stages: episode selection, connection of the selected episodes, and visualization using cartoons. 3.1 Definition of Basic Concepts In this paper, a 'story' is a sequence of events that can abstract the user's daily life. An episode is a subsection of a story, which has a topic related to a memorable event and various events. A landmark is the user's memorable emotion or activity inferred from mobile contexts. An event is one of the components of an episode, which represents human behavior or experience. A main event is an event that corresponds to a landmark. The episode is the kernel of our story generation system. A story has more than one episode. The topic of an episode can be any human event, activity or experience. Table 2 shows examples of topics in an episode. We refer to a multilingual lexical semantic network [10] for the classes and topics. Table 1. Summary of basic concepts
Classification   Definition
Story            Abstraction of daily life; set of episodes
Episode          Subsection of story; set of events related to a topic
Landmark         Emotion or activity inferred from mobile context
Event            Human behavior or experience
Main event       An event that corresponds with a landmark

Table 2. Classes and topics of episodes
Class                          Topic
Entertainment                  Soccer, basketball, baseball, game, etc.
Travel / viewing               Travel, dancing, drama, musical, etc.
Workplace / school             Retirement, vacation, graduation, etc.
Military service               Entrance, discharge, etc.
Living / household chores      Make up, bathing, meal, cleaning, etc.
Business / trade               Shopping, eye shopping, etc.
Friendship / social gathering  Farewell, reconciliation, competition, etc.
Religion                       Worship, prayer, etc.
Congratulation / funeral       Marriage, divorce, funeral, etc.
[Figure: diagram omitted — landmarks 1..N select episodes 1..N (Stage 1: episode selection); the selected episodes are linked in sequence (Stage 2: episode connection); and the episodes are rendered as comic cuts 1..M (Stage 3: visualization)]
Fig. 2. Story generation process
[Figure: diagram omitted — a class contains topics, and each topic groups events]
Fig. 3. Hierarchy of topics and events
3.2 Story Representation Using Petri Net The representation of a story has the following prerequisites. First, the story must be created from mobile contexts. Although the quantity of mobile contexts is large, only a few contexts are directly related to the user's experience. Second, the generated story must coincide with the user's experience; therefore, we need to insert the user's experience into the story. Third, the method must be able to denote causal relationships between events in a story. We express a story as a network of events. Here, we use our own script in order to represent a story, based on Petri nets [11]. A Petri net is more effective for representing parallel events or various situations
540
Y.-S. Lee and S.-B. Cho
formally than story grammars [12], story trees [13] and story graphs [14]. It also makes it possible to diversify the flow of a story easily according to mobile contexts. Most script-based approaches for story generation change the flow of the story at specific branches; the flow of the story is therefore always linear, and they have difficulty generating various stories. In real life, two or more events often occur at the same time, but script-based methods generally cannot handle this; a Petri net can represent two or more parallel events that occur at the same time. In this paper, we define a story as a connection of various episodes: Story = {Episode1, Episode2, ..., Episoden}. A Petri net is expressive enough to represent an episode: it can clearly describe the temporal and causal relationships between events as well as the preconditions related to the user's context or previous events. It can also naturally express the transmission of the result of an event by transferring tokens from the result of one event to the preconditions of other events. Table 3 shows the components of a Petri net in episode representation. Table 3. Elements of Petri net for episode representation
Name        Description
place       Condition causally required by an action or event; result or status after the event.
transition  User's action or event.
arc (edge)  Link between places and transitions; representation of the input or output functions which connect the user's actions and statuses.
token       Representation of the current status of an episode; a means for transmitting the result of an episode to posterior episodes.
Fig. 4. An example of episode represented using Petri net
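The token game behind Table 3 — an event (transition) fires only when every input place holds enough tokens, consuming them and producing tokens in its output places — can be sketched as follows (an illustrative structure; class and label names are our own):

```python
class EpisodeNet:
    """Tiny Petri net: places hold tokens, transitions are events."""

    def __init__(self, marking):
        self.marking = dict(marking)          # place -> token count
        self.transitions = {}                 # event -> (inputs, outputs)

    def add_event(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        inputs, outputs = self.transitions[name]
        assert self.enabled(name), f"{name} is not enabled"
        for p, n in inputs.items():           # consume input tokens
            self.marking[p] -= n
        for p, n in outputs.items():          # produce result tokens
            self.marking[p] = self.marking.get(p, 0) + n

# A one-event episode: the landmark token enables the main event,
# whose result token can feed the preconditions of a later episode.
net = EpisodeNet({"landmark:shopping": 1})
net.add_event("main:shopping",
              inputs={"landmark:shopping": 1},
              outputs={"result:shopping_done": 1})
if net.enabled("main:shopping"):
    net.fire("main:shopping")
print(net.marking)  # -> {'landmark:shopping': 0, 'result:shopping_done': 1}
```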
Figure 4 is an example of an episode represented by a Petri net. The episode begins by inserting a token at the initial place of the episode. The main event is connected to the topic of the episode, which requires one or more landmarks. The result of the previous episode changes the flow of the current episode; because of this, the whole story can be changed by the user's previous experience. Finally, the result of the current episode is stored by inserting tokens in the result place.

3.3 Modified Story Representation

An episode represents events, the conditions under which events occur, and the results of events. The episode template has the classification id of the topic, a unique id, the episode name, the landmark related to the main event, the branches of the story flow, and the status of processing all events of the episode.

Class = {ClassId, ClassName}
Episode = {Class, EpisodeName, CoreSet, BranchSet, StatusSet, EventSet, ArcSet}

When the original Petri net is used for episode representation, all preconditions are places. However, the preconditions can be specialized into landmarks, user contexts, results of episodes, conditions of events, and preconditions for selecting an event. Landmarks are used as preconditions for the occurrence of a main event; we denote them as core nodes in the proposed method. The number of tokens in a core node is the strength of the influence of a landmark.

CoreSet = {Core1, Core2, ..., Coren}
Core = {EpisodeId, LandmarkName, NumToken}

The result of the previous episode affects the divergence of the current episode, which is represented by a status node. The number of tokens in a status node decides the maintenance period of the status in the status queue.

StatusSet = {Status1, Status2, ..., Statusn}
Status = {EpisodeId, ResultName, NumToken}

A status stored in the status queue is used to branch the story flow of a posterior episode. The posterior episode describes the information necessary for branching the story, shown as a branch node.
BranchSet = {Branch1, Branch2, ..., Branchn}
Branch = {EpisodeId, StatusName}

Sometimes, the occurrence of a more detailed event requires mobile contexts, such as place, people and so on. For example, 'shopping in a department store' requires a place. Such events need a way to describe the conditions on the context directly; this is the context node. The number of tokens affects the strength of the context.

ContextSet = {Context1, Context2, ..., Contextn}
Context = {EpisodeId, ContextName, NumToken}

The key factor of a story is the event. The other nodes, such as status, core, branch, and context, are preconditions for the occurrence of an event. The priority of an event node decides which event occurs in a conflict condition.
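The tuple templates above map naturally onto record types; a hypothetical rendering (the field names follow the tuples in the text, everything else is our choice):

```python
from dataclasses import dataclass

@dataclass
class Core:                      # landmark precondition of the main event
    episode_id: str
    landmark_name: str
    num_token: int               # strength of the landmark's influence

@dataclass
class Status:                    # result of a previous episode's main event
    episode_id: str
    result_name: str
    num_token: int               # maintenance period in the status queue

@dataclass
class Branch:                    # status required to branch the story flow
    episode_id: str
    status_name: str

@dataclass
class Context:                   # raw mobile-context precondition
    episode_id: str
    context_name: str
    num_token: int               # strength of the context

shopping = Core("ep-shopping", "Shopping", num_token=2)
print(shopping.landmark_name)    # -> Shopping
```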
Y.-S. Lee and S.-B. Cho
EventSet = {Event1, Event2, Event3, ..., Eventn}
Event = {EpisodeId, EventName, Priority}

The priority of an event is decided by the number of conditions required for its occurrence. For example, when 'shopping with friends' and 'shopping' are detected at the same time, we let 'shopping with friends' occur, since it requires one more condition than 'shopping': that the result of the previous episode was 'meeting with friends.' To sum up, a place is a general condition for a general event. A core is a condition for a main event connected to the user's activity or emotion related to a landmark. A status stores the result of the main event of an episode. A branch selects the story branch based on the status of the previous episode. Finally, a context is a condition directly related to mobile context extracted from statistical analysis of raw data. Table 4 summarizes the conditions of events in an episode. An arc connects an event to its preconditions; it is the same as an arc in the original Petri net.

ArcSet = {Arc1, Arc2, ..., Arcn}
ArcTemplate = {EpisodeId, ArcID, SrcType, SrcID, DstType, DstID}

3.4 Selection and Connection of Episodes

The first stage of story generation is the selection of episodes that represent the user's experience based on landmarks. For selection, an episode includes a main event related to a landmark. Figure 5 shows the structure of an episode. In order to reflect the user's experience collected from the mobile device, an episode organizes causally connected events with the main event at the center. An episode can include undetected events as well as the main event inferred from mobile context. It is necessary to include undetected events, which can occur causally, in an episode for the coherence of the story; undetected events help to make the story natural. The result of a selected episode affects the posterior flow of episodes.

Table 4. Preconditions for events in episode
Name     Meaning
Place    Premise or precondition for an event
Core     Precondition for a main event
Status   Result of a main event of an episode
Branch   Precondition for branching story
Context  Precondition for a more real event
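The priority rule described earlier ('shopping with friends' wins over 'shopping' because it has one more satisfied precondition) can be sketched as follows; the event tuples mirror the {EpisodeId, EventName, Priority} template, and the sample values are illustrative.

```python
# Sketch of conflict resolution between simultaneously enabled events:
# the enabled event with the larger priority (number of satisfied
# preconditions) fires.
def select_event(enabled_events):
    """Pick the enabled event with the highest priority."""
    return max(enabled_events, key=lambda ev: ev[2])

enabled = [
    (11, "shopping", 1),
    (11, "shopping with friends", 2),  # extra condition: previous episode
                                       # resulted in 'meeting with friends'
]
assert select_event(enabled)[1] == "shopping with friends"
```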
People experience various events in daily life. These events are independent and inconsistent; therefore, a story consists of various episodes as an abstraction of daily life. Selected episodes are connected in temporal sequence, and the result of a prior episode affects posterior episodes. Most script-based approaches select the current story branch by considering only the immediately previous result: once a branch is selected, a prior result cannot affect any later episode. For example, the story tree, one of the script-based approaches, connects a leaf node of the previous tree to the root node of the next tree; the result of the previous tree can affect the current story tree but not later trees. The proposed method instead uses a status queue for storing the results of episodes. Figure 6 shows the process of status transmission between episodes.
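A minimal sketch of the status queue described above: each result is kept for a maintenance period given by its token count, so results of earlier episodes can still branch later episodes, unlike a story tree. The class and method names are our own assumptions.

```python
# Sketch of the status queue: results of episodes are enqueued with a
# time-to-live (the token count), and later episodes consult the queue
# when choosing a branch. Implementation details are illustrative.
class StatusQueue:
    def __init__(self):
        self.entries = []   # list of [result_name, remaining_tokens]

    def push(self, result_name, num_token):
        self.entries.append([result_name, num_token])

    def tick(self):
        """Advance one episode: age every status, drop expired ones."""
        for e in self.entries:
            e[1] -= 1
        self.entries = [e for e in self.entries if e[1] > 0]

    def holds(self, result_name):
        return any(name == result_name for name, _ in self.entries)

q = StatusQueue()
q.push("meeting with friends", 2)   # survives two following episodes
q.tick()
assert q.holds("meeting with friends")
q.tick()
assert not q.holds("meeting with friends")
```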
Petri Net-Based Episode Detection and Story Generation
Fig. 5. Episode structure
Fig. 6. Transmission of status between episodes
3.5 Cartoon Generation

Visualizing the story helps the user's intuitive understanding. A cartoon is generated by combining a character image, a background image, and text. Table 5 shows an example of the cartoon images for the story of Figure 7. Visualized images are easily evaluated in subjective tests or comparisons.

<events date="20060309">
  <event name="go to movie theater by subway" />
  <event name="watching movie" />
  <event name="take a bus" />
  <event name="shopping in mart" />
</events>

Fig. 7. XML representation for story
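The XML story of Fig. 7 can be turned into an ordered list of cartoon panels with the standard library; the snippet reproduces the events (with a closing tag), and the variable names are our own.

```python
# Sketch: parse the Fig. 7 story XML into an ordered list of cartoon
# panels (one panel per event).
import xml.etree.ElementTree as ET

story_xml = """
<events date="20060309">
  <event name="go to movie theater by subway" />
  <event name="watching movie" />
  <event name="take a bus" />
  <event name="shopping in mart" />
</events>
"""

root = ET.fromstring(story_xml)
panels = [ev.get("name") for ev in root.findall("event")]
assert panels[0] == "go to movie theater by subway"
assert len(panels) == 4
```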
Table 5. An example of cartoons generated
Going to Movie | Watching movie | Taking a bus | Shopping (cartoon images omitted)
4 Experimental Results

4.1 Environment

We use landmarks and mobile contexts collected from two female undergraduate students for two weeks [15]. They are used to select episodes and generate a story. Table 6 summarizes the topics of the 32 episodes. If a prior episode and the posterior episode are the same, the story includes the episode only once and increases the influence of the episode's result. Also, a series of episodes without results is ignored. The subjects carried a smart phone connected to a GPS device and reported visited places, activities, and emotions as a summary of their daily life.

Table 6. Topics of episodes used in the experiment
No. Name                            No. Name
1   Viewing                         17  Cooking
2   Using means of transportation   18  Exercise
3   Traffic jam                     19  Taking a photo of food
4   Study at a late hour            20  Study while listening to music
5   Dance hall                      21  Using subway
6   Meeting                         22  Doze
7   Taking a photo of goods         23  Taking a photo with joy
8   Take a walk                     24  Conversation by telephone
9   Washing up                      25  Annoyance SMS
10  Take a photo of myself          26  Having tea
11  Shopping                        27  Using computer
12  Taking lessons                  28  Scenery place
13  Spam SMS                        29  Take a photo of scene
14  Meal                            30  Leisure hours
15  Indoor exercise                 31  Alone
16  Eating out                      32  Study alone
4.2 Results

Table 7 compares the subject's report and the generated story for a specific day. On that day, the subject took an express bus to her hometown. Time, place, and experience are extracted from the subject's report; the table shows the generated cartoons and the corresponding experiences. Most trivial events, such as conversations by telephone, short-message-service, and listening to music, are not included in the user's report: subjects tend to ignore them and are more interested in events related to their activity or emotion.

Table 7. Comparison between report and cartoon
Time   Place              Experience
00:40  Home               Go to bed
12:00  Home               Wake up
16:30  Bus station 1      Go to restaurant by bus
17:10  Restaurant         Meal with family
18:30  Bus station 2      Go to subway station
19:00  Subway station 1   Go to station near terminal
19:50  Subway station 2   Arrive at the subway station 2
20:05  Express terminal   Go to hometown by express
20:40  Express terminal   Traffic jam
22:20  Hometown           Arrive at hometown
(The generated-cartoon column of the original table consists of images and is omitted here; rows without a cartoon are marked '-' in the original.)
On the whole, the generated cartoons are fewer than the events in the report; many events in the report are missed. This is due to insufficient contexts and the lack of diversity of events in the episodes. If user-adaptive episodes were included, the generated story would contain more events related to the report and mirror the user's daily life better. We calculate precision to check the coincidence between the cartoons and the user report. A cartoon consists of a character's activity and a background place. If both the activity and the place of a cartoon coincide with the report, the cartoon's accuracy is evaluated as 1 point; if only the activity or only the place is accurate, its accuracy is 0.5 point.

precision = (sum of accuracy of cartoons) / (number of cartoons)

Subject 1's precision is 27% and subject 2's precision is 43%. If meaningless events such as SMS or telephone calls are included, the precisions increase to 59% and 67%, respectively.
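The scoring rule above can be sketched directly; the sample report and cartoons below are made up for illustration, not taken from the experiment.

```python
# Sketch of the precision measure: a cartoon scores 1.0 when both its
# activity and place match some report entry, 0.5 when exactly one does.
def cartoon_accuracy(cartoon, report_events):
    best = 0.0
    for activity, place in report_events:
        hits = (cartoon[0] == activity) + (cartoon[1] == place)
        best = max(best, hits * 0.5)
    return best

def precision(cartoons, report_events):
    return sum(cartoon_accuracy(c, report_events) for c in cartoons) / len(cartoons)

report = [("meal", "restaurant"), ("traffic jam", "express terminal")]
cartoons = [("meal", "restaurant"),   # both match -> 1.0
            ("meal", "home"),         # activity only -> 0.5
            ("shopping", "mart")]     # no match -> 0.0
assert precision(cartoons, report) == 0.5
```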
5 Conclusion and Future Work

In this paper, we attempted to create a story and generate cartoons from mobile contexts. A story is a sequence of events with temporal or causal relationships. We proposed a method that selects episodes and organizes them into a story. The experiments show the potential of the proposed method to generate stories from a real-life log; the generated cartoons are compared with the user's report in terms of precision. The method needs further improvements, such as increasing the types of context, the accuracy of landmark detection, and the number of supported episodes. We would also enhance the episode composition algorithm: more contexts and episodes will improve the precision and diversity of the generated cartoons. If users can design their own episodes, this can help to generate stories appropriate to the user's preferences. Finally, user modeling can also help personalize the user's story.
Acknowledgements This research was supported by MKE, Korea under ITRC IITA-2008-(C1090-08010046).
References 1. Raento, M., Oulasvirta, A., Petit, R., Toivonen, H.: ContextPhone - A prototyping platform for context-aware mobile applications. IEEE Pervasive Computing 4(2), 51–59 (2005) 2. Panu, K., Jani, M., Juha, K., Heikki, K., Esko-Juhani, M.: Managing context information in mobile devices. IEEE Pervasive Computing 2(3), 42–51 (2003) 3. Aizawa, K., Hori, T.: Context-based video retrieval system for the life-log applications. In: Proc. of MIR 2003, pp. 31–38. ACM, New York (2003) 4. Gemmell, J., Bell, G., Lueder, R.: MyLifeBits: A personal database for everything. Communications of the ACM 49(1), 88–95 (2006)
5. Smeaton, A.F., O'Connor, N., Jones, G., Gaughan, G., Lee, H., Gurrin, C.: SenseCam visual diaries generating memories for life. Poster presented at the Memories for Life Colloquium 2006 (2006) 6. Cho, S.-B., Kim, K.-J., Hwang, K.-S., Song, I.-J.: AniDiary: Daily cartoon-style diary exploits Bayesian networks. IEEE Pervasive Computing 6(3), 66–75 (2007) 7. Korpipää, P., Mantyjarvi, J., Kela, J., Keranen, H., Malm, E.-J.: Managing context information in mobile devices. IEEE Pervasive Computing 2(3), 42–51 (2003) 8. Mantyjarvi, J., Seppanen, T.: Adapting applications in handheld devices using fuzzy context representation. Interacting with Computers 15(4), 521–538 (2003) 9. Siewiorek, D., Smailagic, A., Furukawa, J., Krause, A., Moraveji, N., Reiger, K., Shaffer, J., Wong, F.: SenSay: A context-aware mobile phone. In: IEEE International Symposium on Wearable Computers, pp. 248–249 (2003) 10. Choi, K.-S.: CoreNet: Chinese-Japanese-Korean WordNet with shared semantic hierarchy. In: Natural Language Processing and Knowledge Engineering, pp. 767–770 (2003) 11. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall (1981) 12. Propp, V.: Morphology of the Folktale. University of Texas Press (1968) 13. Correira, A.: Computing story trees. American Journal of Computational Linguistics 6(3-4), 135–149 (1980) 14. Riedl, M.O., Young, R.M.: From linear story generation to branching story graphs. IEEE Computer Graphics and Applications 26(3), 23–31 (2006) 15. Hwang, K.-S., Cho, S.-B.: Modular Bayesian networks for inferring landmarks on mobile daily life. In: The 19th Australian Joint Conference on Artificial Intelligence, pp. 929–933 (2006)
Protection Techniques of Secret Information in Non-tamper Proof Devices of Smart Home Network

Abedelaziz Mohaisen¹, YoungJae Maeng², Joenil Kang², DaeHun Nyang², KyungHee Lee³, Dowon Hong¹, and JongWook Han¹

¹ Electronics and Telecommunication Research Institute, Daejeon 305-700, Korea
{a.mohaisen,dwhong,hanjw}@etri.re.kr
² Information Security Research Laboratory, INHA University, Incheon, Korea
{brendig,dreamx}@seclab.inha.ac.kr, [email protected]
³ Electrical and Computer Engineering Department, Suwon University, Suwon, Korea
[email protected]
Abstract. The problem of revealing secret information in a home network becomes critical when one or more devices are physically captured and the attacker can statically analyze the entire device's memory. Since the trusted platform module, which assumes a tamper-proof chip in each device, is not an option, we investigate software-based solutions. This paper introduces several mechanisms and schemes, for varying scenarios, for protecting secret information in non-tamper-proof devices of the smart home environment. The mechanisms provided herein utilize several existing algorithms, techniques, and building blocks that require no extra hardware and are computationally efficient on typical home network devices. To demonstrate the value of the contributions, an extensive analysis of the different scenarios, including security and cost estimations, is provided. Keywords: home network, secret information, authentication, code attestation, secret sharing, security and system integration.
1 Introduction
A Home Network (HN) is composed of a diversity of devices that can communicate with each other internally, and with devices in other networks externally, through a control point (CP). In the communication-enabled HN devices, the user's input affects device behavior in the HN model, for ease of accessibility and to provide a living-convenient environment [1]. In such an environment, secret information stored and maintained in each device is used for several security purposes, including device authentication and encryption and/or decryption of the internally or externally exchanged data via the CP. Generally speaking, revealing the secret information in the HN may cause critical damage and threats to the privacy of part or the whole of the network. Thus, HN designers and operators need to carefully maintain the secret information stored in HN
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 548–562, 2008. © Springer-Verlag Berlin Heidelberg 2008
Protection Techniques of Secret Information
devices under several expected and unexpected attacking scenarios, including physical attack. One way to prevent physical attack is to use the Trusted Platform Module (TPM) [2]. Using the TPM requires integrated hardware and operating-system support, both of which complicate the overall design and structure of the HN device; technically, once a TPM is used on an HN device, the device price is greatly increased. In addition, even though the TPM leaves no chance for a malicious attack to succeed, it is not always guaranteed, and sometimes undesirable, to provide a TPM for every single device in an HN with a diversity of devices of varying prices and capabilities. Based on this, we assume the TPM is not an option in the HN environment. In an HN with non-TPM devices, an attacker has no difficulty extracting secret information from the devices. Even if the secret information is encrypted and the key for decryption is stored outside the device, once the externally stored key needs to be restored for a decryption operation, the device itself needs to be authenticated by the external key-storing device. To overcome this reiterated problem, as the TPM scenario is out of the question, we introduce a solution based on secure cryptographic functions that are already in use. In this paper, we introduce a scheme to protect secret information in a software fashion. We revise several existing building blocks (such as RSA, secret sharing, and code attestation) for protecting the secret information without any additional hardware components. Our schemes utilize these blocks in cooperation with other novel schemes and show satisfactory performance and resource consumption, as will be shown. On the side of the paper structure, Section 2 introduces the assumptions, term elucidations, and contributions, followed by our schemes in Sections 3 and 4.
A comparison between our introduced schemes, demonstrating their advantages and limits, followed by a hybrid scheme that merges both schemes, is shown in Section 5. Security evaluation and cost estimation are discussed in Section 6. Finally, we conclude with concluding remarks in Section 7.
2 Definitions, Assumptions and Contributions

2.1 Definitions and Building Blocks
Definition 1 (Secret Information - SI). SI is manipulated data that provides a benefit to someone, where revealing this data may cause damage to its owner. The influence of secret information can also be indirect, where it is difficult to identify the scale of the potential benefit or damage; for example, one party's communication leads to another party's benefit or damage [3], so the data exchanged between parties is secret information at most. SI in this paper means information that has a direct impact on some entity when exposed. This SI can be any of the following: a private key used in a public key infrastructure, a secret key of symmetric cryptography, a bio-informatics
A. Mohaisen et al.
key, or an authentication key for an HN device. Certainly, information that is valuable by its mere existence, such as private information, is also SI. Definition 2 (Software-Based Protection [4]). A method used for twisting the secret information by modifying or manipulating this information through statically changed software. In an untrusted environment, an attacker can physically capture a device, gain full control of its resources including the memory, and thus reveal any SI. Consequently, we cannot protect SI statically stored in local memory under the software-based method [4]. For the same reason, we extend the software-based method so that not only the information stored locally on the device but also other network components may affect the overall security of the locally stored SI, based on the network status. For example, a device may delete the SI to minimize the possible damage if it is informed of a possible attack. Definition 3 (RSA Cryptosystem [5]). In RSA, let p and q be two large enough prime numbers, n = pq and φ(n) = (p − 1)(q − 1). PKc is chosen such that 1 < PKc < φ(n) and gcd(PKc, φ(n)) = 1, and SKc is chosen to satisfy SKc·PKc ≡ 1 mod φ(n), where SKc is the CP's private key and PKc is the CP's public key. Definition 4 (Secret Sharing [6]). Shamir's secret sharing is defined as follows: let f(x) be a polynomial of degree t − 1 defined as f(x) = s0 + a1x + a2x² + · · · + a(t−1)x^(t−1), where s0 is the initial secret. Partial secrets are computed as f(1), . . . , f(n) and distributed to different parties. The recovery of the initial secret s0 (and therefore the polynomial f(x)) requires a collusion between a group of n nodes where n ≥ t. That is, f(x) can be reconstructed by interpolation iff (if and only if) the number of linearly independent components of the construction of f(x) known to a single user is at least t. Definition 5 (Diffie-Hellman key exchange [7]). Let g ∈ Zp be a generator, where p is a large enough prime.
A shared key between two parties P1 and P2 is established according to the following interaction. Both parties know the public parameters g and p. P1 picks a ← Zp and computes A = g^a mod p; P2 picks b ← Zp and computes B = g^b mod p. P1 sends A to P2, and P2 sends B to P1. Then P1 computes KA = B^a mod p and P2 computes KB = A^b mod p.
Note that KA = KB = A^b = (g^a)^b = B^a = (g^b)^a = g^(ab) mod p. Definition 6 (Code Attestation). In code attestation, a code is verified by comparing the code (or a representative of the code) to another copy held by a trusted party. This is performed through the CP by sending an attestation command that enforces the execution of attestation code on the remote device. Examples of these attestation methods are detailed in [8,9,10].
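The exchange of Definition 5 can be sketched as follows; the prime below (2^64 − 59) and the generator are illustrative demo values, far too small for real use.

```python
# Toy sketch of the Diffie-Hellman exchange of Definition 5.
import secrets

p = 2**64 - 59   # a known small prime, for demonstration only (NOT secure)
g = 5

a = secrets.randbelow(p - 3) + 2   # P1's secret exponent
b = secrets.randbelow(p - 3) + 2   # P2's secret exponent

A = pow(g, a, p)   # P1 -> P2
B = pow(g, b, p)   # P2 -> P1

K_A = pow(B, a, p)                      # computed by P1
K_B = pow(A, b, p)                      # computed by P2
assert K_A == K_B == pow(g, a * b, p)   # shared key g^(ab) mod p
```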
2.2 Assumption
Throughout this paper, the following assumptions are used. Note that these assumptions are based on the architecture of the home network in Fig. 1, representing the unit-household part.
1) The devices in the home network can communicate with each other and pass operation commands to devices within the same network or to devices in other networks. The communication with devices in other networks is carried out through the control point. The control point (CP) is a special device (or combination of devices) connected to the community network via a 10/100 Ethernet infrastructure (see Fig. 1)¹ [11,1].
2) The different devices within the same HN can communicate with each other in a single-hop fashion without going through the CP. Thus, modifying messages between two communicating parties is hard to carry out unless one of the two parties is compromised. This assumption is rationalized by several other networks that communicate in a single hop (e.g., wireless sensor networks [12]).
3) An adversary needs at least a period of time longer than a threshold (say, τ seconds) to compromise a device successfully [13]. Also, to analyze the contents of a physically compromised HN device, the adversary needs to perform the analysis outside the scope of the HN itself (offline analysis).
4) In most cases the CP is trustworthy, but it does not hold any secret information, since we cannot guarantee the CP's safety against unknown attacks. This keeps the information secure even when the CP is compromised by an attacker².
5) The communication traffic between two devices is encrypted using a session key obtained by Diffie-Hellman key exchange (def. 5), except for a few specific messages, as will be shown later.
6) The CP and the other devices have certificates based on the RSA public key cryptosystem (def. 3).
2.3 Contributions
We investigate SI protection schemes for an HN whose devices lack a TPM. Our contribution includes four parts, detailed as follows:
– First, we introduce a novel scheme in which the CP broadcasts periodical messages that let the different devices in the network determine whether they are safe (based on whether the network is under attack or not).
– Second, we introduce a scheme based on Shamir's secret sharing [6] and the code attestation method [9], combining their security levels, for distributing SI shares among different HN devices and authenticating each share by authenticating the holding device itself using code attestation [9].
¹ The CP in our case is a server connected to a gateway.
² Indeed, the compromise of the CP will affect the communication links to other devices in other networks passed through the CP.
Fig. 1. The typical architecture of a home network, representing the premium HomeNet solution from LG Electronics™. The control point (CP) represents the combination of the HomeNet server and the gateway. Our proposed schemes concern the unit-household part.
– Third, as each scheme has several advantages over the other, we introduce a comparison between the two proposed schemes considering their different characteristics, advantages, and disadvantages.
– Fourth and last, we merge both of the schemes above into a hybrid scheme that enhances the overall security of SI protection and uses both schemes' merits to overcome their limits.
3 Safe Umbrella: Time-Based Scheme
The safe umbrella is our first method to protect the SI. When using the safe umbrella, we store the SI in volatile memory (VM) in each device (or a group of devices) in the network. Whether the SI is kept in memory depends on the status of the device and the network with respect to the probability of attack. That is, a device erases the SI stored in its own volatile memory when it detects a possibly critical attack (e.g., when the device is informed, in a trustworthy manner, that the number of compromised devices is greater than a threshold value). However, it is not easy to discover an attacker's malicious behavior: even though some devices are under attack, other devices might be unaware of the attack.
3.1 Overview
In the safe umbrella, we assume that each device can be informed about the network status by receiving a regular message. To perform this, two scenarios are introduced. In the first scenario, a message reporting the existence of a possible attacker is generated by an HN device, informing each device to delete the SI which
Fig. 2. Illustration of the safe umbrella
is stored in the corresponding device's memory. In the second scenario, the CP periodically broadcasts messages to inform the devices in the network about the network status, i.e., whether or not there are attackers in the network. Obviously, the first scenario is cheatable, since a device under an attacker's control may deviate from sending the proper signal to the other devices. Furthermore, there is a high probability that a device under attack cannot send the warning message on time. Thus, we use the second scenario for estimating and examining the conditions of the network. In this scenario, we assume that the CP periodically broadcasts a Safe Guarantee Message (SGM) determining the status of the network, as shown in Fig. 2. Upon receiving the SGM, the different devices in the network check the safety of the network indicated by the SGM, following one of these scenarios:
– Each device deletes the SI from its volatile memory when the received SGM is not correct.
– Each device may delete the SI according to exception errors generated by the execution of code (except errors related to hardware configurations). In this case, the devices must be able to distinguish exception errors and relate them either to a possible attack (malicious code running) or to a normal running error (hardware exceptions).
To determine the status of the network, a time-constrained verification process is held in what we call the "signed time-stamp" phase, shown in the following section.
3.2 Signed Time-Stamp
Each device in the network can confirm its security status by time synchronization with the signed time-stamp broadcast. This procedure is as follows:
1. The CP broadcasts a time-stamp tcenter signed by the CP's private key to the devices it controls, within an estimated attack time τ; this τ period helps each device decide whether it received the message in time. This step is performed as follows:
CP −→ broadcast: DSKc(tcenter)
where DSKc(tcenter) is tcenter signed using the CP's private key and the decryption algorithm D.
2. Each device checks the received time stamp tcenter – decrypted with the CP's public key – by comparing it with the device's own time stamp tdevice as follows (before all, the message is ignored if the decrypted tcenter is not a valid time stamp):
1: tcenter = EPKc(DSKc(tcenter))
2: ε = |tcenter − tdevice|
3: if ε ≤ εt, network status = safe.
4: else, network status = unsafe; erase SI.
3. If the SGM is not received within the estimated time (i.e., εt), each device erases the SI, as a high probability of attack exists and the SGM generator may be compromised. Step 2 above is clarified as follows: each device decides whether it is in a safe status by checking whether the time difference between the device's local time stamp (tdevice) and the CP's time stamp (tcenter) is less than εt. εt accounts for the transmission delay and an additional time drift between the CP and each device. If the time difference is greater than εt, which indicates that an attacker probably intercepted the time packet or the CP is compromised, the device erases the SI from its memory to avoid possibly revealing it to the attacker.
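The check in Step 2 can be sketched with textbook RSA; the key values and the drift bound below are illustrative assumptions, not values from the paper, and a real deployment would sign a hash with proper padding.

```python
# Sketch of the SGM check (Step 2) with textbook RSA on toy parameters.
import time

p_, q_ = 999983, 1000003              # toy primes (NOT secure)
n = p_ * q_
e = 65537                             # PKc
d = pow(e, -1, (p_ - 1) * (q_ - 1))  # SKc with e*d ≡ 1 mod φ(n)

EPS_T = 2  # assumed drift + transmission-delay bound, in seconds

def sign_timestamp(t_center: int) -> int:
    """CP side: D_SKc(t_center)."""
    return pow(t_center % n, d, n)

def check_sgm(sgm: int, t_device: int) -> str:
    """Device side: recover t_center with E_PKc and compare clocks."""
    t_center = pow(sgm, e, n)         # E_PKc(D_SKc(t_center))
    return "safe" if abs(t_center - t_device) <= EPS_T else "unsafe"

now = int(time.time())
assert check_sgm(sign_timestamp(now), now) == "safe"
assert check_sgm(sign_timestamp(now), now + 100) == "unsafe"  # stale -> erase SI
```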
4 Secret Sharing with Code Attestation
The above scheme, however, is vulnerable to two possible deviation factors. First, even when the CP is not compromised, it may fail to send the SGM on time. The other deviation is related to the device itself: a time drift not considered in the design may lead to the deletion of SI even though the network is not under attack. To overcome the deviations exposed in the time-based scheme, we introduce secret sharing with authentication via code attestation as a solution. The scheme is detailed in the following sections.
4.1 Motivation: Necessity of Secret Sharing
Assuming that the HN devices are in an untrusted environment, we further assume that an attacker has the ability to extract the whole memory contents of any device, including the SI. In the previous scheme, we store the SI only in volatile memory, which requires much care to maintain. However, SI is generally not for "one-time use," so recovering the SI from a remote recovery storage system will be required. We overcome this dilemma by diffusing (i.e., distributing) the SI into several devices using secret sharing, to reduce the impact of the compromise of one or several devices. In the following, we introduce the secret sharing scheme based on the HN scenario and architecture introduced earlier.
Secret Sharing. In the secret sharing scheme (def. 4), the SI is divided into partial secrets and the resulting shares are distributed among different devices [6]. More precisely, the SI is divided into n ≥ t partial secrets by assigning f(i) to the i-th device, where f(x) = s + a1x + a2x² + · · · + a(t−1)x^(t−1) is a polynomial with random coefficients and of degree t − 1. Devices can recover and then use the SI by recovering the coefficients of the polynomial after gathering t of the n partial secrets. Threshold Cryptography. The main goal of threshold cryptography is the protection of information by fault-tolerant distribution of this information among the devices in the network [14]. A device can use the SI when the received shares number at least a threshold value t out of n, where n is the number of devices in the HN. Since recovering the whole secret is possible for an attacker once he compromises enough nodes, recovering secrets may lead to security problems. Therefore, we use threshold cryptography, which makes use of the SI from its shares without fully recovering it. Even so, by adapting secret sharing or threshold cryptography to the HN as shown in this procedure, we cannot eliminate all arising problems. For example, since these methods distribute the SI, at the point of recovery we cannot verify the device that holds the share used for recovering the partial or whole secret. In the following, we review a method that may be beneficial for device authentication.
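The (t, n) sharing of Definition 4 can be sketched as follows; the field prime and the sample secret are illustrative assumptions, and recovery uses Lagrange interpolation at x = 0.

```python
# Minimal sketch of Shamir's (t, n) secret sharing over GF(P).
# The prime P must exceed the secret and n; 2^61 - 1 is illustrative.
import random

P = 2**61 - 1  # a Mersenne prime used as the field modulus (assumption)

def split(secret, t, n):
    """Distribute `secret` into n shares; any t of them recover it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers f(0) = secret."""
    secret = 0
    for xj, yj in shares:
        num, den = 1, 1
        for xm, _ in shares:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, t=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 shares suffice
assert recover(shares[1:4]) == 123456789
```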
4.2 Code Attestation for Shares' Authentication
Generally, authentication between two parties is performed using information that both [15] or at least one [16] of the two parties knows. However, this kind of authentication cannot be trusted in an untrusted environment, because the adversary can perform authentication illegally after obtaining the SI from a physically captured device. Therefore, a received value (keys or whatever) used for authentication
Fig. 3. Illustration of secret information recovery process in which the initiating device is authenticated via the CP which confirms the authenticity to other devices in the network. Based on the authenticity of the device, other devices may send their partial secrets to it.
might be delivered by an attacker and needs to be verified (i.e., authenticated) to confirm that the value was generated by an uncompromised device. Code Attestation: To solve the above problem, code attestation is used (def. 6). In code attestation, we verify a device's whole memory to detect any malicious code that may exist. Therefore, this scheme requires the authenticator to have a copy of the same code as the device, to emulate it and generate the same value required for authentication. If the code attestation detects an abnormality, i.e., modified code, the device is considered unavailable. Code attestation is performed through the CP by sending attestation code to enforce its execution, or with code already stored in the device. It is important to exclude any possibility that an attacker injects malicious code to evade the attestation itself. SWATT [9], PIONEER [8], and remote code attestation [10] are well-known code attestation methods and can be used as building blocks for any attestation-based scheme, including our system.
4.3 Integration: Secret Sharing for Home Network
Once a device joins the network, it obtains a public key certificate from the CP after being verified by the CP through code attestation. The device generates a public and a private key and then sends the private key to the CP. To reduce the computation on the device's side, the CP may instead generate the keys and send them to the device. The device deletes the private key after sending it. Then, the following is performed:

1. The CP, which holds the SI of the device, generates a polynomial f (x) of degree t − 1 with t random coefficients (i.e., a0 . . . at−1 ).
2. The CP erases all information related to the SI from its memory after distributing f (i) to each device. This guarantees that the CP is no longer a bottleneck after the initiation phase.
3. When required, an HN device requests the other devices that hold partial secrets to send them, so that it can recover the SI (as in Fig. 3).
4. Before the devices send their partial secrets to the initiating device, the CP verifies the authenticity of that device and informs the devices holding the partial secrets of the authentication result.
5. If the initiating device is authenticated by the CP, the CP confirms this to the other devices by sending response messages. The response messages are encrypted with the CP's private key so that the other devices in the network can verify the message with the CP's public key within the validation time τ.
6. The other devices verify the authentication response and decide whether to reply to the initiating device's request. The responses to the initiated requests are encrypted with a session key (see def. 5).
7. The initiating device recovers the secret information from its shares (see def. 4) after decrypting the shares using the shared session key (def. 5).

Otherwise, a device may request a service from the CP, in which case the CP performs the service by assembling results from the devices using threshold cryptography, as in Fig. 4. The CP
Protection Techniques of Secret Information
Fig. 4. Illustration of SI merging. Unlike the recovery process, all operations merging the secret from its partial shares are performed at the CP.
requests partial services from the other devices instead of the initiating device and sends the service response back to that device. This solution, however, depends on the trustworthiness of the CP; that is, the CP becomes a bottleneck in such a design.
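The validity window in step 5 above can be sketched as follows. Since the paper does not spell out the CP's signature routine, this illustration substitutes an HMAC under a key assumed to be pre-distributed by the CP, and an assumed τ of 30 seconds; the freshness logic (accept a confirmation only within τ of the CP's timestamp) is the same in either case.

```python
# Sketch of step 5's freshness check. The CP's confirmation carries a
# timestamp; share-holding devices accept it only if the tag verifies and
# the confirmation is no older than the validation time tau.
import hmac, hashlib, time

TAU = 30.0  # assumed validation time in seconds

def cp_confirm(key: bytes, device_id: str, now: float = None):
    """CP's tagged confirmation that device_id passed authentication."""
    ts = time.time() if now is None else now
    msg = f"{device_id}|{ts}".encode()
    return msg, hmac.new(key, msg, hashlib.sha256).digest()

def device_accepts(key: bytes, msg: bytes, tag: bytes, now: float = None) -> bool:
    """A share-holding device checks the tag and the validation window."""
    if not hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest()):
        return False
    ts = float(msg.rsplit(b"|", 1)[1].decode())
    now = time.time() if now is None else now
    return 0 <= now - ts <= TAU
```

A confirmation that arrives after the window, or whose content was tampered with, is rejected and no partial secrets are sent.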
5 Comparison and Extension

5.1 Comparison
Both the safe umbrella and secret sharing have their own characteristics, which determine their advantages and disadvantages. These characteristics range from the hardware implementation (i.e., memory type) to the functionality (authentication, rejoining, secret recovery, etc.). Table 1 gives a brief comparison. One interesting advantage of the safe umbrella over secret sharing is the ease of secret key usage, whereas secret sharing requires gathering partial secrets through multiple request and response messages, as shown earlier. However, the security of the safe umbrella is heuristic in that it is determined by the attacker's ability relative to the time slot available for authentication.

Table 1. Comparison of Safe Umbrella and Secret Sharing. VM refers to volatile memory and NVM refers to non-volatile memory.

Feature                 Safe Umbrella          Secret Sharing
Device Authentication   public key of device   code of device
Storage of SI           VM of device           NVM of other devices
Rejoining Network       impossible             possible
Restoring Secret Key    impossible             possible
Usage of Secret Key     direct                 indirect
Fig. 5. Architectural view and building blocks of the hybrid scheme that makes use of both schemes’ advantages
The obvious disadvantage of the safe umbrella scheme, however, is that it is impossible to restore the secret key (for further use) or to rejoin the network. On the other hand, secret sharing provides the rejoining feature via code attestation and the ability to restore the SI, which makes it a better solution for a long-lived network. However, these two features come at the cost of using the CP, which could be a bottleneck in the design. As shown above, the safe umbrella and secret sharing each have advantageous characteristics over the other. To make use of the advantages of both, we briefly introduce a hybrid scheme that uses both schemes as building blocks.
5.2 Hybrid Scheme
Based on the different characteristics of the two introduced schemes, we provide a hybrid scheme that maintains the SI in the device's volatile memory, as the safe umbrella does, while enabling rejoining the network using secret sharing with code attestation. The overall procedure is as follows:

1. When a device attempts to enter the network, the CP performs code attestation on the device.
2. If the device is authenticated successfully, it requests the CP to restore its own secret key for future use.
3. The device saves the restored secret key in its own volatile memory for future usage.
4. The CP broadcasts the SGM so that devices recognize themselves as safe.
5. If the SGM is delayed beyond the estimated time, the device erases the secret key from its volatile memory.

By doing so, a device is always re-keyed upon joining the network. Later, when the device receives a valid SGM, it can again restore the secret key after attesting its memory contents. An illustration of the architecture and the building blocks of the hybrid scheme is shown in Fig. 5.
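The key-erasure behavior in steps 3 to 5 can be modeled in a few lines. This is a toy sketch under an assumed SGM period (not the authors' implementation), with time passed explicitly instead of real broadcasts:

```python
# Toy model of the hybrid scheme's steps 3-5: the restored key lives only in
# volatile memory and is erased when the Safe Group Message (SGM) is delayed
# past the estimated time, forcing re-keying on rejoin.
SGM_INTERVAL = 10.0  # assumed upper bound on the SGM broadcast period, in seconds

class HomeDevice:
    def __init__(self, restored_key: bytes, now: float):
        self.volatile_key = restored_key  # step 3: volatile memory only, never NVM
        self.last_sgm = now

    def on_sgm(self, now: float):
        """Step 4: a valid SGM tells the device it is still safe."""
        self.last_sgm = now

    def tick(self, now: float):
        """Step 5: erase the key if the SGM is overdue."""
        if now - self.last_sgm > SGM_INTERVAL:
            self.volatile_key = b""
```

Once erased, the key can only be restored by going through code attestation again, which is exactly the re-keying-on-rejoin property described above.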
6 Security Evaluation and Cost Estimation

6.1 Security Evaluation
In this section we estimate the security from different points of view, including the chances of restoring the SI, attacks on running devices, the control point's criticality, and the number of devices and its impact on security.

Restoring SI. Our scheme deletes the SI stored in volatile memory when the device senses an abnormal condition (regardless of whether the revealed information is the designated SI or not)3. The SI can be restored through secret sharing, which is available only to a device that is authenticated by the CP through code attestation. Code attestation is not perfect against a powerful attacker with powerful hardware, but we believe it is hard to bring such powerful attack machinery into a home environment and to deceive the CP (by communicating with the attacker's own CP while pretending to be a legal device). Finally, malicious code in a compromised device that behaves as a legal device in order to obtain the SI will be detected by code attestation when the device tries to join the network.

Attack on a Running Device. To obtain the SI on a running device, an adversary must convince the device that networking and the safe umbrella are working correctly until he obtains the secret. Otherwise, the adversary needs to obtain the right of execution to incapacitate the safe umbrella, which is infeasible in the HN considering the hardness of the physical attack.

Control Point. Even though an attack on the CP is difficult to perform, an adversary could extract information if he controlled the CP. Because the CP no longer holds the SI after the initiation phase, to obtain the SI an attacker must disguise himself as a legal device in order to request and collect partial secrets, which is hard to do in practice, as shown earlier.

Number of Devices. An adversary could restore the SI by attacking as many devices as possible in the HN. The purpose of secret sharing is to disperse the attacker's targets; accordingly, this attack is a matter of cost. In other words, the more devices are used for secret sharing, the more security is guaranteed.
6.2 Cost Estimation
In this section we consider the computational overhead introduced by the different building blocks of our schemes.

3 We assume that the device can delete the SI before its physical capture under any abnormal condition. Furthermore, we believe that revealing the SI, in whole or in part, to an attacker will affect the security of the whole system in one way or another.
Validating the SGM: RSA [5] requires more resources than symmetric key cryptography; however, by choosing a small public exponent, as in many other applications (e.g., RFID and sensor networks), verification can be faster than in the normal case. The verification procedure is essentially an encryption-like exponentiation with the public key, applied to a message signed with the private key. Therefore, even a device with limited computational ability can verify a message signed with the CP's private key at a relatively small cost. In sum, the required overhead per operation is one signing operation at the CP and n verifications on the devices' side, which means that a single verification is required per device. Secret Sharing and Threshold Cryptography: Threshold cryptography requires many exponentiation operations. Fortunately, in the proposed scheme, threshold cryptography is performed by the CP upon request from a device, so the device's cost is only that of the request and response for the service. For the secret sharing used to decentralize the secret key, devices require operations such as Lagrange interpolation (over the polynomial). As the polynomial degree is related to the security parameter and the size of the network, the overall cost increases in proportion to the number of devices. Session Key Generation: Session key generation is required for the secure exchange of the shares. The computation of the DH-based key requires two exponentiation operations, as shown in def. 5. Code Attestation: A number of operations such as hashing and checksumming are needed to perform code attestation. Code attestation is performed only when a device tries to join the network. Even though the cost of code attestation is larger than one might expect, it is rarely performed. Thus, we believe that the cost of code attestation is acceptable for most common HN devices 4 .
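The two-exponentiation cost of the DH-based session key noted above can be made concrete: each side performs one modular exponentiation to form its public value and one to derive the shared key. The 127-bit prime and generator below are illustrative placeholders only; a real deployment would use a standardized DH group.

```python
# Sketch of the Diffie-Hellman session key (def. 5): exactly two modular
# exponentiations per device, one per function below.
import random

p = 2**127 - 1  # a Mersenne prime, far too small for real security
g = 3

def dh_keypair():
    x = random.randrange(2, p - 1)
    return x, pow(g, x, p)            # exponentiation 1: public value g^x mod p

def dh_shared(x, peer_public):
    return pow(peer_public, x, p)     # exponentiation 2: session key (g^y)^x mod p

a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()
assert dh_shared(a_priv, b_pub) == dh_shared(b_priv, a_pub)
```

Both devices arrive at the same value g^(xy) mod p, which then serves as the session key for encrypting the shares.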
Since some HN devices operate only when they are needed or when scheduled by the network operator or owner, the CP may not know whether a device is simply powered off or under attack. For this reason, the CP may perform code attestation on a device for authentication when the device attempts to join the network. Code attestation, whose essential functionality is detecting malicious code on the device, can thus also serve our authentication goal.
7 Concluding Remarks
We introduced several schemes for protecting the SI in HN devices. Our schemes are based on several existing technologies and algorithms adapted to the home network environment and its specifics. Motivated by the special communication pattern of the home network, we introduced the safe umbrella, which, to some extent, solves the problem of SI protection. To overcome the limits of
4 In fact, code attestation is very applicable to sensor nodes, which are essential components in the home network environment.
the safe umbrella, we introduced the secret sharing scheme with code attestation, in which the secrets are maintained for a long-lived network. To show the value of our proposed schemes, we presented a detailed security and cost analysis in addition to a discussion of several network scenarios. In future work, we will investigate other methods to substitute for the SGM time stamp and look for other applications of the hybrid scheme. It will also be beneficial to consider attack scenarios other than physical capture.
Acknowledgment. The authors would like to thank the anonymous reviewers for their valuable comments. This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2008-C1090-0801-0028).
References
1. LG Electronics: HomeNet (2007), http://www.lge.com/products/homenetwork/homenetwork.jsp
2. Trusted Computing Group: Trusted computing platform alliance main specification version 1.1b (2003)
3. Sengodan, S., Edlund, R.Z.L.: On securing home networks. In: INET (2001)
4. Wallach, D.S., Balfanz, D., Dean, D., Felten, E.W.: Extensible security architectures for Java. In: 16th Symposium on Operating Systems Principles, pp. 116–128 (1997)
5. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21, 120–126 (1978)
6. Shamir, A.: How to share a secret. Commun. ACM 22, 612–613 (1979)
7. Diffie, W., Hellman, M.E.: New directions in cryptography. IEEE Transactions on Information Theory 22, 644–654 (1976)
8. Seshadri, A., Luk, M., Shi, E., Perrig, A., van Doorn, L., Khosla, P.K.: Pioneer: verifying code integrity and enforcing untampered code execution on legacy systems. In: Herbert, A., Birman, K.P. (eds.) SOSP, pp. 1–16. ACM (2005)
9. Seshadri, A., Perrig, A., van Doorn, L., Khosla, P.K.: SWATT: Software-based attestation for embedded devices. In: IEEE Symposium on Security and Privacy, p. 272. IEEE Computer Society (2004)
10. Shaneck, M., Mahadevan, K., Kher, V., Kim, Y.: Remote software-based attestation for wireless sensors. In: Molva, R., Tsudik, G., Westhoff, D. (eds.) ESAS 2005. LNCS, vol. 3813, pp. 27–41. Springer, Heidelberg (2005)
11. Nakamura, M., Tanaka, A., Igaki, H., Tamada, H.: Adapting legacy home appliances to home network systems using web services. In: ICWS, pp. 849–858 (2006)
12. Perrig, A., Stankovic, J.A., Wagner, D.: Security in wireless sensor networks. Commun. ACM 47, 53–57 (2004)
13. Perrig, A., Szewczyk, R., Tygar, J.D., Wen, V., Culler, D.E.: SPINS: Security protocols for sensor networks. Wireless Networks 8, 521–534 (2002)
14. Desmedt, Y., Frankel, Y.: Threshold cryptosystems. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 307–315. Springer, Heidelberg (1990)
15. Bellovin, S.M., Merritt, M.: Augmented encrypted key exchange: A password-based protocol secure against dictionary attacks and password file compromise. In: ACM Conference on Computer and Communications Security, pp. 244–250 (1993)
16. Feige, U., Fiat, A., Shamir, A.: Zero-knowledge proofs of identity. J. Cryptology 1, 77–94 (1988)
Universal Remote Control for the Smart World Jukka Riekki1, Ivan Sanchez1, and Mikko Pyykkönen2 1
University of Oulu Department of Electrical and Information Engineering and Infotech Oulu P.O. Box 4500, 90014 University of Oulu, Finland {jukka.riekki,ivan.milara}@ee.oulu.fi 2 University of Lapland Department of Industrial Design P.O. Box 122, 96101 Rovaniemi, Finland [email protected]
Abstract. In this paper, we discuss how to build user friendly user interfaces to the smart world. We present the REACHeS architecture for controlling Internet services through physical user interfaces, using a mobile terminal and icons placed in the environment. An icon advertises a service that can be started by touching the icon with a mobile terminal. This service activation configures the mobile terminal as a remote control for the service. We have implemented this architecture and designed an icon set. The physical user interface is based on RFID technology: the terminals are equipped with RFID readers and RFID tags are placed under the icons. We present the first prototype applications and the first usability tests that we have carried out.
1 Introduction The number of devices that provide services to us in our daily environment increases at an accelerating pace. Hence, the effort required from us, the users, to learn to use all these services is increasing as well. Furthermore, using the services most often requires us to interrupt our daily activities and to focus on the service’s user interface. How could we change this situation? How could we develop our environment into a smart world that allows us to focus on our daily life and just use the services when we need them, without studying their use beforehand and without distracting our focus too much from the everyday activities? In this paper, we discuss how this challenge can be tackled with users’ personal mobile terminals and physical user interfaces. We present the REACHeS system that changes mobile terminals into universal remote controls for the smart world. When a user wants to use a service, she/he first scans visually the local environment for icons advertising services locally available. Then, the user touches the icon advertising the desired service with her/his mobile terminal. The action of touching generates an event that is received by the REACHeS system. The event is generated by the mobile terminal’s RFID reader when it reads the RFID tag placed under the icon. REACHeS delivers the event to the server implementing the requested service. The service replies by creating a user interface into the mobile terminal and taking the necessary F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 563–577, 2008. © Springer-Verlag Berlin Heidelberg 2008
J. Riekki, I. Sanchez, and M. Pyykkönen
local resources into use. As a result, the user controls the service, and the resources used by it, with the mobile terminal. The REACHeS system transmits all communication between the service, the mobile terminal, and the local resources. It is also in charge of allocating resources. The main contributions of this work are the icon set and the REACHeS architecture. These contributions enable a wide variety of Internet services to be controlled by touching icons in the environment with mobile terminals. In addition, the architecture facilitates controlling local resources like wall displays. A further contribution is building physical user interfaces from off-the-shelf components. Icons advertising RFID tags have not been studied much; some suggestions for visualising RFID tags can be found, but no comparable icon set exists [1,2,3]. Earlier versions of our icons can be found in [4,5]. Here, we present the largest uniform icon set so far and elaborate further on different possibilities for placing the icons in the environment. Remote controls and systems resembling REACHeS have been studied more; see, for example, [6,7,8,9,10,11]. However, we are not aware of any other architecture that integrates mobile terminals, services, and local resources into a single system using a comparable set of widely used and robust technologies. We presented the REACHeS architecture and the first prototypes in [12]. Here, we focus on the communication between system components and report performance measurements, user interfaces that are consistent with the icons, and the first usability studies. The rest of the paper is organized as follows: in the next two sections, we present the physical user interface and the REACHeS system. The fourth section describes two prototypes and the fifth section the performance measurements and the usability study. The sixth section contains the discussion and comparison with related work. The seventh section concludes the paper.
2 Physical User Interface We see user friendliness as one central characteristic of the smart world. Why should we develop more technology into our daily environment if it is difficult to use and it takes time from our everyday activities? New technology should make our life easier and free time for the activities that we value! Our approach to improving user friendliness is to offer a physical user interface for the user. The physical user interface consists of icons and a mobile terminal. The icons are placed in the environment and they advertise the services that are available locally. A service is activated by touching the corresponding icon with a mobile terminal. Although other technologies could also be used to implement the physical user interface, we focus on RFID technology in this paper. A technology with a short reading distance (less than 5 cm) is used, so the user needs to intentionally bring the terminal near the icon to initiate the reading of an RFID tag. Hence, the reading events can be interpreted as commands to the system serving the user, and it is natural to describe the selection as "touching" an icon. An icon advertises a point in the environment that can be touched with a terminal and a service that is started when the icon is touched. An icon forms, together with the RFID tag placed behind the icon, a two-sided interface between the physical and digital worlds. For the user, this interface advertises an available service, and for the system this interface contains the data needed to start the service.
The main challenge in icon design is to communicate the icons' meaning clearly to all potential users. Basically, an icon has to communicate enough information for the user to decide whether to touch the icon or not. If users do not recognize that an icon belongs to the physical UI, some services will not be activated. On the other hand, if a user recognizes an icon but interprets it incorrectly, the activated service is not the expected one. Moreover, if the user touches an icon that is not part of the physical UI, the user might conclude that the system is broken. Icon design is clearly a challenging task, especially when the goal is that the icons can be placed anywhere in our everyday environment where users might potentially use the advertised services. To tackle this challenge, we have divided the icons into two parts: the outer part is a general icon that communicates to the user a point that can be touched. The inner part, in its turn, is a special icon that advertises a service. Figure 1 presents our icon set. The first icon is the general icon. The next five icons advertise services performing simple actions: print something, show some entity's calendar, send a message to or call some entity, and locate some entity. The first icon on the second row, Play Slideshow, starts a specific application. The following five icons have been designed to advertise the corresponding place on some document, for example on a map.
Fig. 1. The icon set
The first icon on the third row, Remote control, advertises that some resource can be controlled remotely by touching here. The next four icons advertise services that are related to an information storage, that is, to a container. The following icon, Drop to wall display, advertises a service that drops a document from the mobile terminal to a wall display. The next icon, Pick from wall display, advertises a document transfer in the opposite direction. The second icon on the fourth row, Save, saves a document when touched, and the third icon, Play audio, plays an audio file. The next icon, Blog, brings a specific blog page to the mobile terminal's display. The Join the game icon configures the user's mobile terminal as a game pad for a game played in the local environment. The last icon, Stop, might be used to stop a service. Not all information about the service can be encoded in the icon, no matter how well it is designed. For example, the Drop to wall display icon specifies the drop action. But which document is to be dropped, and to which wall display? A shared understanding is clearly required between the user and the system; they both have to interpret the situation in the same way. Icon placement can be used in building this understanding. For example, if we place the Drop to wall display icon on the wall display (or next to the display), it is obvious that the document is to be dropped on just that wall display. Placing a Drop to wall display icon on a wall display is an example of direct tag placement. In this case, an icon is placed on an entity that has a role in the service. Here, it is clear that the action is "Drop a document to this wall display". However, not all icons can be placed directly on an entity that has a role in the service. In such cases, indirect placement offers a solution: the icon is placed on a document that describes an entity having a role in the service. Documents advertising music might contain Play audio icons.
The icons advertising places (Attraction, Beach, etc.) can be placed, for example, on a map or in a tourist guide. Furthermore, as this technology is new for the majority of users, we might start the usage by placing icons on posters that instruct the users. For example, a poster might describe a local resource and inform the user that touching the Remote control icon (included in the poster) brings a remote control UI onto the mobile terminal's display. In weak placement, the entity on which the icon is placed has no role in the service. A better alternative is to use indirect placement, that is, to place the icon on a poster describing the service, at least until the icon is widely known. Moreover, in embedded placement, an RFID tag is installed under an icon that is already present in the environment, for example, under a name tag in an office corridor. Finally, an icon can be placed on top of an RFID reader. In this case, the icon is not bound to one service; instead, the system can dynamically change the data that is read into a terminal when the icon is touched, and the icon needs to advertise this to the users. This approach allows any mobile terminal to be used (provided it has a network connection and can receive data pushed from a server). The terminal just needs to be equipped with an RFID tag. When the user brings the terminal near the reader, data identifying the terminal is read from the tag and delivered to a server.
3 REACHeS We suggest that the user's personal mobile terminal is the main interaction device in the smart world. The REACHeS (Remotely Enabling and Controlling Heterogeneous Services) system provides services with a simple interface for creating a user interface
Fig. 2. The REACHeS architecture
into the mobile terminal and communicating with it (Figure 2). The system also controls the local resources (wall displays, etc) and offers for the services interfaces for using these resources. When a user wants to use some service, she/he browses the environment, selects the corresponding icon, and touches it with a mobile terminal. The RFID reader installed in the mobile terminal reads data from the RFID tag that is placed under the icon (number 1 in the Figure). The data read from the tag is delivered as an event to the REACHeS system (2). The REACHeS passes the data to the server that is responsible for the service in question (3). The service replies with a message that determines the user interface to be created to the mobile terminal (4). REACHeS adapts and passes this information to the mobile terminal (5). As a result, the user sees the remote control interface on the mobile terminal display and can start to control the service. During operation, REACHeS transmits the messages that are sent from the mobile terminal to the service and vice versa (as indicated by 2-5). The service can also request the control of local resources (e.g. a wall display) from REACHeS (6). In such a case, REACHeS redirects events from the service to the local resources, and vice versa. From the service’s point of view, the events flow directly to the resource, thus REACHeS establishes a kind of virtual path between the service and the resource (7). In a typical message sequence, a command given by the user is first delivered through REACHeS to the service and then the service responds by sending a response to the mobile terminal and a command to a local resource. The communication proceeds like this till the service is stopped.
The main components of REACHeS are the User Interface Gateway (UIG) and the System Display Control (SDC). UIG is the core component that delivers the messages sent to REACHeS to their intended destinations. When a service component sends a message to a local resource, UIG directs the message to the component controlling the resource. Thus far, we have defined only the component controlling displays (SDC), but components controlling other resource types are straightforward to add. REACHeS also contains adaptor components for client (i.e., mobile terminal) GUIs, interface components for the external services, an administration component, and a database for storing information about local resources, registered services, and the relations between them. The communicated messages deliver events. Messaging is implemented using the HTTP protocol. An event is represented either directly as message parameters or as a simple XML fragment of the form <event>parameter1 … parametern</event>, where the element name is the name of the event and its content carries the parameters of the event. The first alternative is used in communication between clients and services, while the second one is used to transmit messages from the service to a resource. REACHeS is implemented using servlets and runs in Tomcat 5.5. The mobile phones (Nokia 6131 NFC) are equipped with NFC-compliant RFID readers. The phones implement JSR 257 (Contactless Communication API) for communicating with the RFID reader. Near Field Communication (NFC) [13] is a standards-based, short-range wireless connectivity technology. Data is communicated as NFC Data Exchange Format (NDEF) messages, which are composed of records that can contain, for example, a URL or location data [14]. Applications use record types to identify the semantics and structure of the record content. We have two versions of the client application. The first one is simply the phone's Web browser; the second client version is a MIDlet application.
The browser client is started when a tag containing a URL is read. An HTTP request is sent (message number 2 in Figure 2) and the received webpage (5) is shown on the display. The MIDlet is started when an NDEF record associated with the MIDlet is read from a tag. The MIDlet sends an HTTP request to the REACHeS system (2) and receives a response (5). In both cases, the communication proceeds as described above. Wall displays are the first local resources that we have integrated into the REACHeS system. A service controls a wall display by sending HTTP Post requests to the REACHeS which passes the requests to the wall display’s browser. A request contains one or more events. The name of an event is a method name from Table 1 and the event parameters are that method’s parameters. Flash is used to play multimedia content. The multimedia player (JWMediaPlayer, [15]) is a Flash application that is loaded in a webpage using the insertFlashObject() method. When a service receives a command from the user (play, pause, stop, etc), it processes the command and sends the corresponding sendCommandToObject() method call as an event to the wall display’s browser. The external services have to be able to communicate with the REACHeS using the HTTP protocol. A service implementation contains a service component, tags for starting the service, the icons, and the client application. In addition, data can be stored in files in XML format. For example, when a user activates a photo album application, the photographs to be shown can be specified in an XML file. The location of this file can be given as a parameter in the URL that is stored in the tag. This approach allows the same application to be activated with different parameters easily.
Table 1. The interface for controlling wall displays

- insertBody(source): Shows the source webpage on the wall display.
- insertWebPage(url): Shows the url webpage on the wall display.
- changeAttribute(element, attribute, value): Changes, in the current document, the value of the element's attribute to value. This can be used, for example, to change the image that is shown on the webpage.
- playAudio(file): Starts playing the audio file file.
- stopAudioFile(): Stops playing the current audio file.
- insertFlashObject(flashFile, id, destination, filelist, extra): Inserts the flashFile object in the element destination. Id is used as an identifier when events are sent to the object. The filelist parameter passes a list of files to be played by the Flash player. Extra parameters such as width, height, and autostart can also be delivered.
- sendCommandToObject(id, command, extra): Sends the command to the object id. Extra parameters can be given as well.
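On the display side, incoming events must be mapped to the corresponding actions. The dispatcher below is a hypothetical sketch (the real display is driven by a browser, so the actions would manipulate the page rather than return strings); it only illustrates the dispatch-by-event-name pattern implied by Table 1.

```java
// Hypothetical sketch of display-side event dispatch. Each event name from
// Table 1 selects an action; here the action is just a readable trace string.
public class DisplayDispatcher {
    public static String dispatch(String event, String... args) {
        switch (event) {
            case "insertWebPage":       return "show page " + args[0];
            case "changeAttribute":     return "set " + args[0] + "." + args[1] + "=" + args[2];
            case "playAudio":           return "play " + args[0];
            case "stopAudioFile":       return "stop audio";
            case "sendCommandToObject": return "object " + args[0] + " <- " + args[1];
            default:                    return "unknown event: " + event;
        }
    }
}
```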
4 Prototypes

The Product Browser application lets the user browse advertisements on the wall display using a mobile terminal as a remote control. The wall display might be placed in a shop window or on the wall of a shopping mall, for example. A Remote Control icon is bound to the Product Browser (i.e., the RFID tag contains a set of NDEF records defining the service and its parameters). Another option would be a special Product Browser icon. The wall display and the service need to be registered with REACHeS before the application is used. When a user touches a tag (Figure 3), the remote control sends to REACHeS an HTTP GET request that specifies the Start event (2 in Figure 2). This event is listed in the first row of Table 2; the actual server address and port number have been replaced with "server" and "port". The file playlist.xml follows the RSS 2.0 specification and specifies the information (images and text) that is shown on the wall display. REACHeS responds to the request by reserving the display and sending a request to the Product Browser service (3 in Figure 2). This request resembles the one received from the mobile terminal. The
Fig. 3. User starts the Product Browser application
J. Riekki, I. Sanchez, and M. Pyykkönen

Table 2. Examples of communication

Message          Data
Start            server:port/reaches?service=show_products&event=start&display=10.20.41.8&playlist=server:port/productBrowser/playlist.xml
Update display   <event>insertWebPage server:port/reaches/productBrowser/main.html
Next ad request  server:port/reaches?service=show_products&event=next
Next ad event    <event>changeAttribute mainPhoto src server:port/…/productBrowser/images/image1.jpg
Close            server:port/reaches?service=show_products&event=close
Fig. 4. The Product Browser user interface on the mobile phone’s display
Fig. 5. The first advertisement in the Product Browser
service accesses the Internet if necessary and responds by updating both the mobile terminal’s display and the wall display. The second row of Table 2 shows the event the REACHeS receives from the service and redirects to the display. Now, the wall
display presents the title page and the mobile phone's display presents the user interface for controlling the Product Browser (Figure 4). The user can browse the advertisements by pressing the buttons in the lower part of the display. The small "i" button opens a webpage presenting additional information. When the user selects, for example, the Next command, REACHeS passes this command to the service as the HTTP request listed in the third row of Table 2. The service replies by sending an event specifying that the next advertisement should be shown on the wall display (fourth row in Table 2). REACHeS sends the event to the corresponding display. As a result, the first advertisement is shown on the wall display (Figure 5). The commands given by the user are processed in this manner until the user closes the service with the HTTP request that causes all resources associated with this service to be released (fifth row in Table 2). Other prototypes, such as the Multimedia Player and the SlideShow Viewer, are described in [12].
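The requests in Table 2 are ordinary query strings, so on the server side the service name, event, and parameters can be recovered with a simple parser. The class below is a hypothetical sketch, not the actual REACHeS code.

```java
// Hypothetical sketch: extracting service, event, and parameters from a
// REACHeS-style request URL such as the rows of Table 2.
import java.util.LinkedHashMap;
import java.util.Map;

public class RequestParser {
    public static Map<String, String> parseQuery(String url) {
        Map<String, String> out = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) return out;                       // no query string
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }
}
```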
5 Testing

5.1 Performance

We performed the measurements listed in Table 3. Due to space constraints, we do not report the detailed test setup or sequence but focus on the results. The aim of these measurements was to verify our assumption that the mobile network delay dominates the overall latency (EGPRS was used), to measure the significance of the service communication in the overall latency, and to study the communication between the service and the display. We measured latencies for both internal services (running on the same machine as REACHeS) and external services (running on different machines). Table 4 shows the results. The times are slightly longer for external services, but the difference is small. Start events require much time because both REACHeS and the service must allocate the necessary resources. For the other events, the latency is (on average) 1.2 seconds for external services. The mobile data network is clearly the slowest component, as the REACHeS Execution Time is 100 times shorter than the Total Latency for common events. The last three experiments indicate that all the other system components have acceptable performance.

Table 3. Performance measurements

1. Total Latency: the time from a button press that generates an event until the phone receives a message informing it that the event has been processed by the service.
2. REACHeS Execution Time: the time from the moment REACHeS receives a request from the mobile phone until it sends a response back.
3. Service Execution Time: the time from the moment REACHeS sends a request to the service until it receives a response back.
4. Display Update Time: the time from the moment the service sends a request to REACHeS until the display performs the required action.
Table 4. The performance of the REACHeS system

Experiment                   Service location   Start event [ms]   Other events [ms]
1. Total Latency             Internal           3300               1200
                             External           4500               1100
2. REACHeS Execution Time    Internal           1265               10
                             External           1461               13
3. Service Execution Time    Internal           1095               2
                             External           1076               5
4. Display Update Time       Internal           1398               171
                             External           1887               202
When a wall display is used, the Total Latency is not as important as the latency from the moment the user sends an event until the moment the effect is shown on the wall display. This Effective Time can be approximated from the measured times as follows:

    Effective_Time = Total_Latency / 2 + Display_Update_Time
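The estimate halves the Total Latency (by the symmetric-delay assumption stated in the text) and adds the Display Update Time. A one-line helper makes the arithmetic checkable; with the common-event values from Table 4 as we read them (1200 ms and 171 ms internal, 1100 ms and 202 ms external) it reproduces the 771 ms and 752 ms figures of Table 5.

```java
// Effective_Time = Total_Latency / 2 + Display_Update_Time, in milliseconds.
// Halving Total_Latency assumes the uplink and downlink delays are equal.
public class EffectiveTime {
    public static double effectiveTime(double totalLatencyMs, double displayUpdateMs) {
        return totalLatencyMs / 2.0 + displayUpdateMs;
    }
}
```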
We assume that the HTTP request made from the mobile phone and the response sent from REACHeS have the same network delay. The Display Update Time has no effect on the Total Latency because the service does not wait for a response from the display before it sends a response back to REACHeS. Table 5 shows that the Effective Time for common events is always under one second.

Table 5. The Effective Time [ms]

                Internal Service   External Service
Start           3040               4130
Other events    771                752
5.2 Usability Study

We performed a usability study comparing three ways of controlling a wall display: (1) the display's touch screen, (2) gestures recognized by a hand-held sensor device [16], and (3) a mobile terminal as a remote control. The last method utilized the REACHeS system. The wall display is presented in Figure 6. The test group consisted of 11 participants with an average age of 28 years. We tested two applications: a Photo Album resembling the Product Browser and a remote browser control that allows a user to control a wall display's browser. Each available command was associated with a button in the remote control's UI and had its own gesture as well. The normal browser GUI was used directly to control the wall display. First, we tested how intuitive the different methods were. The participants used both applications with all three control methods without knowing how each command was performed. The majority of the participants succeeded in relating the commands to the buttons of the mobile terminal (five out of eight commands on average), but only two out of eight gestures were guessed.
Fig. 6. The wall display
At the second stage, we explained to the participants how to perform each command. Then, the participants had to complete a small task, and the time to accomplish the task was measured. The touch screen was clearly faster (average time 82 seconds) than our remote control system (average time 139 seconds) or gesture control (average time 144 seconds), most probably because the participants used the familiar Windows-style GUI directly and there were no wireless links or sensor data processing increasing the latency. On the other hand, both the gesture control and the remote control GUI were new to the participants, and their development was at quite an early stage when this test was carried out (an earlier version of the mobile phone GUI was used). This immaturity probably also contributed to the low rate of relating the commands to the buttons and gestures. Furthermore, the participants had to share their attention between the mobile GUI and the wall display. We noticed that the participants mainly focused on the feedback shown on the wall display. Selecting hyperlinks was easiest with the touch screen, as the correct link could be selected directly. When gestures and the remote control were used, the participants had to move through the hyperlinks in the order they appeared in the webpage. The participants evaluated the touch screen to be otherwise the best interaction device, but the remote control was valued as equally reliable. The remote control was evaluated to be better than the gesture control. When the remote control was used, the participants needed to touch an icon to activate the service. The majority of the participants (64%) considered the icons intuitive or very intuitive and felt that no extra textual information is required to explain them. The participants commented on the need to be instructed that an icon needs to be touched to start a service. For the majority (82%) of users, the system worked as they expected.
The REACHeS was evaluated by the majority to be reliable and easy to use. Over half (56%) of the
participants felt comfortable or very comfortable using the keypad to send commands to the service, and most (91%) felt comfortable or very comfortable using the mobile phone display as a remote control GUI. Many participants noted as a positive aspect that the mobile phone is a familiar device. We also noticed that the cognitive load was not very high, as we could establish short conversations with the participants.
6 Discussion

We presented an architecture for controlling Internet services through physical user interfaces and an icon set designed for the physical user interfaces. A user activates a service by touching the corresponding icon with a mobile terminal. This user interface is physical, as mobile terminals are used as physical objects rather than as traditional I/O devices. Such user interfaces are also known as tangible user interfaces [17]. Ailisto et al. [18] classify this approach to service activation as physical selection and present concepts for selection by touching, pointing, and scanning. Our approach is similar to their concept of selection by touching. An icon forms, together with the RFID tag placed behind it, a two-sided interface between the physical and digital worlds. Such interfaces have been suggested by many others; for example, Want et al. [19] published their often-referenced work in the late 1990s. A central characteristic of our approach is the emphasis on advertising services with icons. Icons of this kind have not been studied much. Want et al. [19] do not discuss specific icons for advertising RFID tags but focus more on unobtrusive tagging of physical objects. We classify this as embedded tag placement. Tungare et al. [2] emphasize that the user interacts with objects and present a few examples of iconic representations. Ailisto et al. [18] do not suggest any visual appearance for RFID tags. Arnall [3] explores the visual link between information and physical things. Välkkynen et al. [1] present a few suggestions for visualising RFID tags. We presented our earlier icon sets in [4,5]. Although there is not much work on the visual appearance of RFID tags, icons indicating RFID tags are already approaching our everyday life. For most of us, the first touch of an RFID icon might be the icon used to mark passports that contain RFID tags [20].
The main difference between the physical user interface presented in this paper and the graphical user interfaces of commercial mobile terminals (e.g., the iPhone) is that the icons of the physical user interface are placed in the local environment, whereas the icons of a GUI are presented on the mobile terminal's display. Placing the icons (and RFID tags) in the environment enables easy interaction with a large number of services. The detailed information required to start a locally available service is hidden from the user (in an RFID tag). The number of tags that can be placed in our daily environment – and hence the number of services that can be started by touching a tag – is large. When a service is started by selecting an icon from the mobile phone's display, a compromise has to be made between the number of icons and the amount of information the user needs to enter. One alternative is to automatically discover the locally available services and present only the icons corresponding to these services on the mobile phone's display. However, such a solution requires a considerably more complex implementation than the one presented in this paper.
We discussed the challenges of designing and placing icons. The icons need to be recognized as part of the physical user interface and they need to be interpreted correctly. As not all information can be encoded in the icons or stored in the tags, the placement can be used to provide information about the service. Additional information can be expressed in the form of rules shared by the users and the system. We suggested that the user's personal mobile terminal is the main interaction device in the smart world. The implemented REACHeS system illustrates how well-known, widely used technologies like HTTP, XML, and Flash suffice to build novel context-aware applications. Moreover, the mobile terminals are commercial, off-the-shelf mobile phones. Extra applications do not have to be installed on the computers controlling the displays or on the mobile phone when the browser client is used. The potential of using mobile devices as universal control devices has been reported earlier [8,6,7,11]. However, these works do not cover a system that integrates the mobile terminals, services, and local resources into a single system. Raghunath et al. [9] propose controlling external displays using mobile phones, and Broll et al. [21] present a system that allows Web Services to manipulate mobile phone displays. Both systems are more complex than REACHeS but at the same time offer less functionality. The Elope middleware [10] uses mobile phones with embedded RFID readers to start interoperation between components in the smart space. The middleware is in charge of setting up the parameters to start the communication between agents and services. REACHeS also controls the communication between resources and services. Furthermore, a mobile phone is not used as a remote control in Elope. The REACHeS system functions reliably. The performance measurements indicate that the EGPRS connection is the bottleneck, as can be expected.
A WLAN connection would undoubtedly decrease the latency, but unfortunately terminals offering both a WLAN connection and an integrated RFID reader are not yet available. The current version is usable in many different applications. The latency is too high for real-time applications but small enough for browsing content. We are currently working with wall displays. However, numerous useful services do not require a wall display; the mobile phone's UI capabilities suffice. Physical user interfaces can produce more fluent user interaction even for common simple services such as making a phone call or sending a short message. Our earlier usability studies [4] indicate that the icons are easy to learn and that special icons are preferred over a general icon that brings a list of choices to the terminal's display. The study reported here indicates that a mobile terminal is a reliable and easy-to-use remote control. A wall display's own touch screen offers the fastest interaction but is not always the most suitable interaction device. A remote control is a viable option when a user-specific UI is justified and when touching the wall display is not possible. Furthermore, the overwhelming performance of the touch screen can be partly explained by the immaturity of the remote control prototype. However, the remote control has one characteristic shortcoming, namely that a user has to share her/his attention between two user interfaces: the remote control UI and the wall display UI. The usability study indicated that users pay more attention to the wall display; hence, all important messages should be shown on the wall display. In this paper, we focused on starting services by touching. Icons can also be used to control an already running service, for example to pause a video. Another
interesting research topic is combining several touches, for example to first pick a document from a container and then drop it onto a wall display. These topics are on our research agenda. In the near future, we will also develop the icons further, build more prototypes, and organize more usability studies. After having built and tested several prototypes, we can analyze the merits and drawbacks of REACHeS in more detail. We have also started to study the security issues. Finally, developing REACHeS to communicate with mobile phones that contain an RFID tag instead of a reader clearly offers great potential, as this improvement would considerably increase the number of mobile phones that can be used to access the services.
7 Conclusions

We presented an architecture for controlling Internet services through physical user interfaces, together with an icon set, the first prototypes, and usability tests. Although the usability tests indicate that we can achieve good usability, our research on this kind of physical user interface is at an early stage. We have implemented some prototypes and used some icons in these prototypes, but most of the suggested services currently exist only on paper. We will continue this research by building more prototypes and testing them in real-life settings. A considerable amount of work clearly lies ahead, but we argue that this is the only way to gain the deep understanding required to build truly user-friendly services for the forthcoming smart world.
Acknowledgments

This study was funded by the Finnish Funding Organization for Technology and Innovation and companies. Marta Cortés is acknowledged for taking photographs and collaborating in the prototype implementation. The test group is acknowledged for participating in the usability study.
References

[1] Välkkynen, P., Tuomisto, T., Korhonen, I.: Suggestions for Visualising Physical Hyperlinks. In: Proc. Pervasive Mobile Interaction Devices, Dublin, Ireland, May 2006, pp. 245–254 (2006)
[2] Tungare, M., Pyla, P.S., Bafna, P., Glina, V., Zheng, W., Yu, X., Balli, U., Harrison, S.: Embodied data objects: tangible interfaces to information appliances. In: 44th ACM Southeast Conference (ACM SE 2006), Florida, USA, March 10–12, pp. 359–364 (2006)
[3] Arnall, T.: A graphic language for touch-based interactions. In: Proc. Mobile Interaction with the Real World, in conj. with the 8th Intl. Conf. on Human Computer Interaction with Mobile Devices and Services (MobileHCI 2006), Espoo, Finland, September 2006, pp. 18–22 (2006)
[4] Riekki, J., Salminen, T., Alakärppä, I.: Requesting Pervasive Services by Touching RFID Tags. IEEE Pervasive Computing 5(1), 40–46 (2006)
[5] Riekki, J.: RFID and smart spaces. Int. J. Internet Protocol Technology 2(3/4), 143–152 (2007)
[6] Myers, B.A.: Mobile Devices for Control. In: Paternó, F. (ed.) Mobile HCI 2002. LNCS, vol. 2411, pp. 1–8. Springer, Heidelberg (2002)
[7] Christof, R.: The Mobile Phone as a Universal Interaction Device – Are There Limits? In: Proc. of the Workshop Mobile Interaction with the Real World, 8th Intl. Conf. on Human Computer Interaction with Mobile Devices and Services, Espoo, Finland (2006)
[8] Nichols, J., Myers, B.: Controlling Home and Office Appliances with Smart Phones. IEEE Pervasive Computing 5(3), 60–67 (2006)
[9] Raghunath, M., Ravi, N., Rosu, M.-C., Narayanaswami, C.: Inverted browser: a novel approach towards display symbiosis. In: Proc. Fourth Annual IEEE International Conference on Pervasive Computing and Communications, Pisa, Italy, pp. 71–76 (2006)
[10] Pering, T., Ballagas, R., Want, R.: Spontaneous marriages of mobile devices and interactive spaces. Communications of the ACM 48(9) (September 2005)
[11] Ballagas, R., Borchers, J., Rohs, M., Sheridan, J.G.: The Smart Phone: A Ubiquitous Input Device. IEEE Pervasive Computing 5(1), 70–77 (2006)
[12] Sánchez, I., Cortés, M., Riekki, J.: Controlling Multimedia Players using NFC Enabled Mobile Phones. In: Proc. 6th International Conference on Mobile and Ubiquitous Multimedia (MUM 2007), Oulu, Finland, December 12–14 (2007)
[13] NFC Technology Architecture Specification. NFC Forum (2006)
[14] NFC Data Exchange Format (NDEF) Technical Specification. NFC Forum (2006)
[15] ABOUT JEROENWIJERING.COM (17.1.2008), http://www.joerenwijering.com
[16] Kauppila, M., Pirttikangas, S., Su, X., Riekki, J.: Accelerometer Based Gestural Control of Browser Applications. In: Proc. Int. Workshop on Real Field Identification (RFId2007), UCS 2007, Tokyo, Japan, November 25, 2007, pp. 2–17 (2007)
[17] Ullmer, B., Ishii, H.: Emerging frameworks for tangible user interfaces. IBM Systems Journal 39(3–4), 915–931 (2000)
[18] Ailisto, H., Pohjanheimo, L., Välkkynen, P., Strömmer, E., Tuomisto, T., Korhonen, I.: Bridging the physical and virtual worlds by local connectivity-based physical selection. Personal and Ubiquitous Computing 10(6), 333–344 (2006)
[19] Want, R., Fishkin, K.P., Gujar, A., Harrison, B.L.: Bridging Physical and Virtual Worlds with Electronic Tags. In: Proc. SIGCHI Conference on Human Factors in Computing Systems, Pittsburgh, Pennsylvania, USA, pp. 370–377 (1999)
[20] Biometric passport (16.1.2008), http://en.wikipedia.org/wiki/Biometric_passport
[21] Broll, G., Siorpaes, S., Rukzio, E., Paolucci, M., Haamard, J., Wagner, M., Schmidt, A.: Supporting Mobile Services Usage through Physical Mobile Interaction. In: Proc. 5th IEEE Intl. Conf. on Pervasive Computing and Communications, White Plains, NY, USA, pp. 262–271 (2007)
Mobile Navigation System for the Elderly – Preliminary Experiment and Evaluation

Takahiro Kawamura 1, Keisuke Umezu 1,2, and Akihiko Ohsuga 1

1 Graduate School of Information Systems, University of Electro-Communications
2 NEC Corporation

Abstract. The aging of society is emerging as a global problem. Participation of elderly people in social activities is highly desirable in several perspectives of the society. Thus, we propose a mobile service that supports elderly people wishing to go out and promotes their participation in social activities. This system collects information on the existence of nearby barriers and provides timely information to users via a mobile phone equipped with GPS. The nearby barriers include not only permanent barriers such as stairs, but also temporary ones such as crowds and parked bicycles. Further, we propose functionality that selects and informs barrier information suitable for each user. This paper illustrates the public experiment that we conducted in Tokyo, and confirms the accuracy of the information filtering.
1 Introduction

Recently, the rapid aging of society has emerged as a global problem. In Japan, elderly people (over 65 years of age) account for more than 20%. This proportion is expected to rise to 26% in 2015 [1]. Elderly people have several problems and anxieties in contemporary society. For example, 30% of elderly people in Japan have no friends [1]. Furthermore, people with disabilities have to make a great deal of effort in order to move around outside their homes. Therefore, many people rarely leave their homes. However, the engagement of the elderly in social activities, including not only work but also voluntary or local activities, is a key to the vitalization of the entire society. In this paper, we propose a mobile navigation system for the elderly that employs cellular phones equipped with GPS. It automatically selects barrier information in an urban environment according to the user's location and profile, and then notifies the user via the cellular phone. We have evaluated the usability regarding the burden of operating the system and the accuracy of barrier notification corresponding to the user profile. The remainder of this paper is organized as follows. Section 2 defines the requirements of a navigation system for the elderly on the basis of the results of our preliminary survey. The system overview is described in Section 3, and the barrier notification mechanism in Section 4. Sections 5 and 6 present the experiments and results with regard to the usability and the notification accuracy, respectively. In Section 7, we refer to related works and present our conclusions.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 578–590, 2008.
© Springer-Verlag Berlin Heidelberg 2008
2 Requirements of Mobile Navigation for the Elderly

2.1 Survey Results
The survey was conducted orally, and the subjects were the authors' neighbors and relatives and members of the Japan National Council of Social Welfare. We collected mainly problems and complaints concerning daily life and decided to focus on the following three problems.

1. Decrease in the willingness to leave the home. The main factors decreasing the willingness of the elderly to leave the home are barriers such as pedestrian ways with steps and stairs, illegally parked cars and bicycles, and a lack of sidewalks.
2. Desire to be independent. The elderly are grateful for the support from their families and volunteers, but most of them are reluctant to accept it.
3. Difficulty in using new machines. The elderly tend not to be good at using appliances and computer systems or are unfamiliar with them.

2.2 System Requirements for Elderly People
To solve these problems, we extracted the following three system requirements.

1. Barrier / useful information notification. Although the ultimate solution is to remove the barriers, we propose to use ICT technology to provide users who go out with advance notification of the existence of barriers, and then indicate alternative routes in order to avoid the barriers and reduce anxiety. Of course, the notified content is not restricted to barriers. Useful information such as the location of public rest rooms, slopes, and places of interest would motivate the elderly to go out.
2. Easy-to-use interface. The system is required to be easily operated by the elderly. Therefore, when operation is necessary, interface elements such as buttons should be of an appropriate size and well designed. Further, icons would be useful.
3. Autonomous system behavior. Autonomous behavior is especially important for the elderly, who tend to find information systems difficult to use. Current navigation systems for pedestrians, such as NAVITIME [2], require active operation by the user, i.e., goal input, route choice, symbol selection, etc. In a system for the elderly, however, an "agent" should actively provide information to passive users.

2.3 Proposal of Mobile Navigation System for the Elderly
To satisfy the above requirements and promote participation of the elderly in social activities, we have developed a mobile navigation system using a GPS
T. Kawamura, K. Umezu, and A. Ohsuga

Table 1. Handicap and Will to Go

                                        No disability   Possible to go out alone   Need support to go out
Go out actively                         73.5%           32.7%                      0.0%
Go out if necessary or stay at home     27.5%           67.3%                      100.0%
cellular phone to support the elderly so that they can go out in a relaxed manner. It corresponds to a personal agent that resides in the cellular phone and automatically notifies the user of necessary content, such as barriers, in the street. The system also provides a content-submission mechanism; it is like a social bookmarking system for the real world. Needless to say, it is practically impossible to input all content in the world and trace the changes every day. A social mechanism involving the individual users is necessary for content collection. We selected the cellular phone equipped with GPS as the system client, as it will be available for widespread use in the near future, although some navigation systems are based on proprietary gadgets. In Japan, 3G cellular phones have been required to be equipped with GPS from 2007 onward, and some phones customized for the elderly have sold well. Note that our target users are people who are physically able to go out but tend to stay at home [1] (Table 1). Disabled people who need physical support are outside the scope of this paper. Also, although the system we envisage would be effective for the elderly, it would be useful for other people too.
3 Mobile Navigation System

3.1 System Overview
Fig. 1 shows a use case. First of all, the user starts the "agent" application on the phone and keeps it running before going out. When the user comes to content (a barrier or significant spot) registered in the SNBI (Searching NearBy Information) server, the agent barks "bow wow" to alert the user and displays content consisting of a picture and a brief description (Fig. 2 at right). We set a large character size and high sound volume in view of the impaired vision and hearing experienced by many of the elderly. If the user pushes the orange button when notified (Fig. 3 at left), a map of the area indicating the location of the content is displayed (Fig. 3 at right). For content submission, the user pushes a circle button for useful information or a cross button for a barrier at the location (Fig. 4 at left). Then, the built-in camera is activated (currently, 90% of cellular phones sold in Japan are equipped with a digital camera), and the user can input a title, comment, tags, and a time such as 9am–5pm after taking a photo (Fig. 4 at right). After completing the inputs, the user pushes the send button to register the information and the current location to the SNBI server. The content becomes available for all subsequent users who approach that location.
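The proximity test behind this notification can be sketched as follows. The names and the 50 m radius are hypothetical (the paper does not state the notification radius); distances between GPS fixes are computed with the standard haversine formula.

```java
// Hypothetical sketch: notify the user when a registered content location
// lies within a chosen radius of the current GPS fix (haversine distance).
public class NearbyContent {
    static final double EARTH_RADIUS_M = 6_371_000.0;

    public static double distanceMeters(double lat1, double lon1,
                                        double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static boolean shouldNotify(double userLat, double userLon,
                                       double contentLat, double contentLon,
                                       double radiusMeters) {
        return distanceMeters(userLat, userLon, contentLat, contentLon) <= radiusMeters;
    }
}
```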
Fig. 1. Usecase of mobile navigation
Fig. 2. The “agent” and content notification
3.2 Network Architecture
The client system, which has been implemented as a Java application (iAppli) on cellular phones of NTT DoCoMo, uses NTT’s business “mopera” GPS location service. The SNBI server is implemented as a Java servlet. If the client queries the SNBI server to get the nearest content, the GPS module is automatically invoked
Fig. 3. Orange button and map
Fig. 4. Content submission
and the retrieved location information is sent to the SNBI server from a DLP (DoCoMo Location Platform) server (Fig. 5). Then, the SNBI server searches the DB (content table) based on the location, and returns the content, if found. If not, the map including the current location is sent to the client. This is repeated every 2 to 3 minutes. The locations are stored in another DB (location history table) in the SNBI server, and are used to trace and predict the route. The SNBI Server is composed of FreeBSD 6.2, MySQL 4.0 and Tomcat 6.0 on Xeon 2.80GHzCwith 1024MB memory. 3.3
3.3 Content Notification Mechanism
Content selection is a key issue for the navigation system. If the user is not skilled at operating the system and does not get the desired results, the system is likely to be abandoned. Therefore, unnecessary content notifications and the operations they entail must be reduced.
Fig. 5. System architecture
Furthermore, there are naturally many barriers, and many ways around each of them. For example, in the case of a pedestrian bridge there are two ways: take a detour (in other words, find a crossing) or cross the bridge. In such cases, the notification mechanism should select the proper approaches according to the user’s profile and notify the user of them. Note that if the way around is “cross the bridge”, the “bridge” is not a barrier and should not be notified. In this paper, we used a Bayesian network [3] to select the content. A Bayesian network estimates probabilistic parameters from a given statistical data set and a causal association model, and can be used for prediction under uncertainty [4]. For content selection, it is practically impossible to collect every case for every profile and then write rules in an if-then style; the Bayesian network estimates a probability distribution even for missing data and can reason about unseen cases. Fig. 6 shows our Bayesian network model and parameters for content selection. It has environmental parameters (weather, temperature) and the user profile (purpose of going out, familiarity with the place, ability to walk a long distance, willingness to walk), and predicts the importance of each case for the user after filtering out locations. The location filtering selects the items that lie within a 100-degree sector ahead on the user’s route, which is approximated as the straight line from the previous position to the current position. With this content selection based on the user profile, we can do the following: if the user does not know the place well, it is important to point out the nearest police box, but if the user goes there frequently, such a notification is unnecessary. Furthermore, an escalator is of course important for
Fig. 6. Bayesian network for content selection
a disabled user, but stairs should be recommended to the user who tends to walk for exercise.
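The location filtering described in Section 3.3 — a 100-degree sector ahead of the route, taken as the straight line from the previous to the current GPS fix — can be sketched as follows. The coordinate handling is an assumption; the paper does not give the exact geometry.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360

def ahead_of_user(prev, cur, content, sector_deg=100.0):
    """True if `content` lies inside the sector_deg-wide cone ahead of the
    user's route, where the heading is the bearing from prev to cur."""
    heading = bearing_deg(*prev, *cur)
    to_content = bearing_deg(*cur, *content)
    diff = abs((to_content - heading + 180) % 360 - 180)  # smallest angle between bearings
    return diff <= sector_deg / 2
```

Items passing this geometric filter would then be ranked by the Bayesian network before notification.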
4 Experiment on Usability

4.1 Experiment Overview
First, to check the usability of our system, we conducted an experiment at Ueno Park, Tokyo, for 10 days in August 2007. Test users were randomly selected from the elderly people walking in the park (Fig. 7). Since the area includes museums, ponds, and many other places of interest, it is popular with the elderly. Before the experiment, we collected 20 to 30 barriers and significant spots, such as places where street people and crows gather, stairs and steps, and benches in the shade, and registered them in advance.
Fig. 7. Snapshot of experiment
Table 2. Profiles of test users

ID  Sex, Num  Age  Freq. of visit  Disability  Stairs and Steps  Have a Cell Phone?
1   F, 2p     70s  Several times   Legs        Never             No
2   F, 2p     70s  2 to 3          Legs        Never             No
3   M         60s  2 to 3          Legs        Never             Yes
4   M & F     80s  Several times   None        Prefer not to     Yes
5   F, 2p     80s  Several times   None        Prefer not to     No
6   M & F     70s  2 to 3          None        Prefer not to     No
7   M & F     60s  First time      None        OK                No
8   M, 2p     70s  Several times   None        OK                No
9   M & F     70s  Several times   None        OK                Yes
10  M         80s  Several times   None        OK                No
Fig. 8. Walk routes of 10 users at Ueno Park
We first explained the purpose of the experiment and instructed the subjects in the use of the client, that is, how to receive notifications and how to submit content. The experiment was conducted from each user’s current position to the user’s goal, after which we orally collected responses to a questionnaire for about 5 minutes. As a result, we obtained the walk routes (Fig. 8) and the questionnaire responses for 10 pairs of elderly people (Table 2).
4.2 Experiment Results
Content Notification. In terms of notification timing relative to the user’s position, the notified content was: – in front of the user’s direction (8 cases, 24%),
Fig. 9. Content notification
– at the same location as the user (10 cases, 30%), – behind the user’s direction (8 cases, 24%), – misaligned, such as on another street (7 cases, 21%). Also, in their responses to the questionnaire, half of the test users complained that the notifications were not provided at the right time (Fig. 9). We intend to improve this point by estimating the user’s walking speed and direction; notification of content relative to the user’s position is discussed in the next section. Nevertheless, more than 40% of the users said that the barrier notifications were useful overall. Therefore, although there is room for improvement, the utility of a mobile navigation system that automatically notifies the proper content while the user is walking outside has been confirmed to some extent. Content Submission. In this experiment, none of the users submitted a barrier or other useful information. The major reasons are thought to be the difficulty of operation for first-time users and the relatively short distance each user walked in the experiment, resulting in a lack of interesting information to report. We therefore intend to devise an easier way of submitting content. On the other hand, more than 70% of the users expressed a desire to submit content, so we confirmed that the elderly subjects would be willing to cooperate in enriching the content, provided it is made easier to do so. User Interface. Some users commented that the maps displayed on the phone were not easy to see. Others commented that the operations are generally not difficult, as content retrieval involves pushing just one button (Fig. 10). The large images and large characters were appreciated, and the content was readily understood by the elderly subjects.
Fig. 10. User Interface

Table 3. Users’ requests

Content useful for notification    Requests for notification
police box                         automatic vending machine
street people and crows            park map
benches in the shade               toilets
                                   resting places
                                   restaurants
Users’ Impression. Table 3 shows the content whose notification the elderly subjects considered would be useful. In total, 90% of the users mentioned that this service is fun and they would like to use it again. However, there was a precondition for continuous use, that is, easier operation overall. We observed that the user’s route selection is different depending on whether the user is familiar with the locality or not. For example, users familiar with the locality walked on side roads in the shade, but those unfamiliar with the locality walked on the main street in the sun.

Table 4. Static and dynamic barriers

Static barrier           Dynamic barrier
steep stairs             crowd in station
pedestrian bridge        bicycles on sidewalk
road without sidewalk    road construction
no resting place         children in public space
steps, stairs            road under the sun
                         space without people in the night
                         street people, crows
                         hawkers

Furthermore, we found the barriers can be
classified into two categories: static barriers and dynamic ones (Table 4). The former are structural barriers such as stairs and steps, whereas the latter change dynamically with the passage of time. For example, the users who walked on the shady side of the street changed their route in the evening, because street people and crows appeared as it became darker. These observations highlighted the importance of selecting notification content according to the user profile and time. In the next section, we evaluate the accuracy of content notification.
5 Experiment on Accuracy of Content Selection

5.1 Experiment Overview
We conducted a second experiment to evaluate our content selection mechanism. First, in light of the knowledge gained from the first experiment and the responses to the questionnaires, we created 324 tuples of user profiles, environmental parameters, barrier information, and possible reactions as a data set for learning (Table 5). This data set was input to the Bayesian network, BayoNet. We also created 50 tuples of user profiles, environmental parameters, and barrier information as a data set for evaluation, and the Bayesian network inferred the proper reactions for it. In this experiment, any barrier for which the determined reaction was “neglect” was considered to be unnecessary content and was therefore not notified; the other content was notified together with the possible reactions. Second, four test users selected one of the proper reactions (as correct data) for each case in the evaluation data set, and we compared their choices with the results of the Bayesian network.
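The comparison described above — the mechanism’s selected reaction checked against the reactions chosen by the four test users — amounts to a simple tally, sketched below with illustrative reaction labels.

```python
def agreement_count(predicted, user_choices, threshold=3):
    """Count evaluation cases where at least `threshold` of the test users
    picked the same reaction as the content selection mechanism."""
    agreed = 0
    for pred, votes in zip(predicted, user_choices):
        if sum(1 for v in votes if v == pred) >= threshold:
            agreed += 1
    return agreed
```

With the 50-case evaluation set, a count of 35 cases meeting the three-of-four threshold corresponds to the 70% agreement reported below.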
5.2 Experiment Results
The experimental result is illustrated in Fig. 11. In 35 cases (70%), the results of our content selection mechanism were the same as the answers of three or more of the four test users. Given that we provided six alternative reactions in this experiment, the result indicates, as a preliminary finding, the high accuracy of our content selection mechanism.

Table 5. Part of the learning data set

Weather  T      Locality  Willingness to walk  Barrier               Reaction
Fine     30C+   No        not walk             Bicycles on street    proceed with caution
Cloudy   5C-    Yes       walk for exercise    stairs in station     escalator
Rain     other  No        walk for exercise    Bicycles on sidewalk  detour
Fine     other  Little    walk for exercise    stairs in station     neglect
Fine     5C-    Little    not walk             crowd in station      change time slot
Cloudy   other  Little    not walk             road w/o sidewalk     proceed with caution
Cloudy   5C-    Yes       walk for exercise    street people         detour
Rain     5C-    Little    not walk             stairs in station     elevator
Fig. 11. Comparison between the Bayesian network and the users’ selections
However, the results of the Bayesian network differed from the answers chosen by all four users in 4 cases (8%). The reason is thought to be that the learning data set was too small, and we are planning to conduct another experiment using a larger data set. Furthermore, the results of the Bayesian network differed from the answers chosen by two of the users in 11 cases (22%); for example, 4 of these cases concerned bicycles on the sidewalk. One of the reasons is that the users perceive the degree of danger differently. Therefore, in addition to building a larger data set, we are considering personalizing the content selection in the future, to reflect the user’s previous content selections and frequently visited points retrieved from trace logs.
6 Related Work
Sakamura et al. conducted an experiment using the Ubiquitous Communicator [5]. It is a dedicated terminal that reads RFID tags embedded in advance at several places in a town and then shows information on places nearby. RFIDs have a higher sensitivity than GPS, which makes them particularly advantageous for applications within the home. However, for outdoor use the cost of embedding RFID tags throughout a city poses a considerable problem. Yairi et al. created barrier-free maps for a few towns and launched a commercial service [6]. Users browse the Web-based online maps on their PCs and can input barriers; the service can also create a route to a designated goal in light of those barriers. However, there is no function to notify users who are outdoors of the barrier information. Kurihara et al. are using high-resolution GPS to create a barrier-free map automatically [7]. In this project, the user’s route is extracted by a high-resolution GPS attached to a wheelchair, whose error is less than a few centimeters. Then, steps and flatness are automatically detected from the trace, speed, etc. We would like to consider the use of that GPS, but in this paper we selected the GPS cellular phone because it is already in widespread use.
With regard to a social bookmarking feature in the real world, there are various lines of research such as Space Tag [8], xExplorer [9], Voting With Your Feet [10], Place-Its [11], etc. In those cases, GPS is used not only for information notification, but also for users’ communication and reminder systems. Although we have not described our content submission mechanism at length in this paper, we are currently considering the incorporation of those features to promote the use of the content submission mechanism by the elderly.
7 Conclusion
In this paper, we proposed a mobile navigation system for GPS cellular phones to help the elderly go out into society. We then described two experiments, on the system’s usability and on the accuracy of the Bayesian-network content notification mechanism, respectively. On the basis of those experiments, we confirmed that this “agent” system is useful for the elderly in terms of its notification feature, although the interface needs further improvement. We are currently building a larger learning data set to improve the content notification mechanism and to support personalization, so that we can promote the participation of the elderly in social activities.
References

1. Cabinet Office, Government of Japan: Annual Report on the Aging Society, http://www8.cao.go.jp/kourei/english/annualreport/index-wh.html
2. NAVITIME JAPAN: NAVITIME (in Japanese), http://www.navitime.co.jp/
3. Motomura, Y.: BAYONET: Bayesian Network on Neural Network. In: Foundations of Real-World Intelligence, pp. 28–37. CSLI Publications, Stanford (2001)
4. Pearl, J.: Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 241–288 (1986)
5. Ubiquitous ID Center: Ubiquitous Communicator, http://www.uidcenter.org/index-en.html
6. Shobunsha: barrier / barrier-free map (in Japanese), http://www.mapple.co.jp/corporate/product/22.html
7. Kurihara, M., Nonaka, H., Yoshikawa, T.: Use of Highly Accurate GPS in Network-Based Barrier-Free Street Map Creation System. In: International Conference on Systems, Man & Cybernetics, pp. 1169–1173 (2004)
8. Tarumi, H., Morishita, K., Ito, Y., Kambayashi, Y.: Communication through Virtual Active Objects Overlaid onto the Real World. In: Third International Conference on Collaborative Virtual Environments (2000)
9. Munemori, J., Tri, T.M., Itou, J.: Forbidden City Explorer: A Guide System that Gives Priority to Shared Images. In: Third International Conference on Mobile Computing and Ubiquitous Networking, pp. 248–253 (2006)
10. Froehlich, J., Chen, M.Y., Smith, I.E., Potter, F.: Voting With Your Feet: An Investigative Study of the Relationship Between Place Visit Behavior and Preference. In: Eighth International Conference on Ubiquitous Computing (2006)
11. Sohn, T., Li, K.A., Lee, G., Smith, I., Scott, J., Griswold, W.G.: Place-Its: A Study of Location-Based Reminders on Mobile Phones. In: Seventh International Conference on Ubiquitous Computing (2005)
Time Stamp Protocol for Smart Environment Services

Deok-Gyu Lee1, Jong-Wook Han1, Jong Hyuk Park2, Sang Soo Yeo3, and Young-Sik Jeong4

1 Electronics and Telecommunications Research Institute, 161 Gajeong-dong, Yuseong-gu, Daejeon, Korea, 305-700 [email protected]
2 Computer Engineering Department, Kyungnam University, 449 Wolyong-dong, Masan, Kyungnam, Korea, 631-701 [email protected]
3 BTWorks Inc., Korea [email protected]
4 Dept. of Computer Engineering, Wonkwang University, 344-2 Shinyong-Dong, Iksan, Jeonbuk, 570-749, S. Korea [email protected]
Abstract. With the development of wireless mobile communication, the number of users has increased greatly. However, the first- and second-generation systems, being based on voice and text, could not satisfy users’ demands for high-speed wireless Internet communication such as multimedia services. Future wireless services will be centered on data and multimedia rather than voice alone. This development also brings problems: because wireless transmissions are exposed, the radio network suffers from unlawful use and eavesdropping by illegitimate users. Methods are needed to thwart malicious third parties, both in contracts between two parties and for documents or accounting information that a malicious third party might access. As a solution to these problems, content certification for documents and time-point confirmation are required. This paper applies the content certification and time-point verification services already in use in wired networks to the smart environments that will be developed in the future, and proposes an efficient method that uses the individual devices of the smart environment as they are.
1 Introduction

Mobile communication has made rapid progress since the first and second generations, with users increasing at the same time. However, first- and second-generation mobile communications do not satisfy the needs of users for high-speed Internet communication such as multimedia service, as they were developed with a major focus on voice and text service. Recently, mobile communication has been providing a variety of services, such as data and multimedia services in addition to voice service, and new services are also being developed. Mobile communication service is not limited by time and geographical

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 591–601, 2008. © Springer-Verlag Berlin Heidelberg 2008
restrictions, providing the convenience of voice and data service. Since it utilizes radio waves as its communication medium, however, it is weak in security. IMT-2000 is a third-generation mobile communication system. It makes most of the services available on a wired network accessible in the wireless network and ensures their quality. However, because wireless transmission is exposed, unauthorized third parties can intercept a transmission and use it, and malicious third parties may be able to eavesdrop through the shared transmission medium. Malicious third parties can cause this type of problem, and any user can conduct vicious acts in the wireless network. The challenge for the wireless network lies in preventing such malicious intentions. One method is to have an authentication authority that validates communication between users, together with a time verification service. This type of service should be used for communication between users and for billing in coordination with the billing company. This paper discusses how to implement the time verification service and the content certification service by applying the wired-network services to the IMT-2000. Considering the current development phase of the IMT-2000, however, it is impossible to examine the entire service and its system. In the case of CDMA 1X EV-DO (Evolution-Data Optimized), one view holds that it is an intermediate form between the second generation and the IMT-2000; the other view is that it is the IMT-2000 from the service provider’s perspective. These two views make CDMA 1X EV-DO difficult to define completely as IMT-2000. Nevertheless, the main purpose of the IMT-2000 is to provide wired-network services over the wireless network, and with this purpose in focus this paper proposes its methods. Part 2 of this paper discusses the IMT-2000 and its security requirements. Part 3 discusses the notarization service. Part 4 introduces the time verification service.
Lastly, Part 5 analyzes the proposed methods and gives the conclusion.
2 Related Work

The smart environment has grown out of the development of earlier communication services, and it is realized on top of a variety of services and communications. This section reviews the IMT-2000 and existing research related to the smart environment.

2.1 IMT-2000 Overview

Compared with second-generation mobile communication systems such as GSM (Global System for Mobile Communication) or IS-95 CDMA (Code Division Multiple Access), the IMT-2000 system features multimedia service and global roaming. This change in mobile communication requires information protection technology developed to fit the new environment. Corresponding to these demands, the IMT-2000 leading groups, such as the 3GPP (3rd Generation Partnership Project) and 3GPP2 (3rd Generation Partnership Project 2), are in the process of standardizing their technologies. In particular, the 3GPP standardizes the asynchronous method and consists of ETSI (European Telecommunications Standards
Institute), ARIB (Association of Radio Industries and Businesses), TTA (Telecommunication Technology Association), T1 (T1 Committee), and TTC (Telecommunication Technology Committee). Currently, various groups under the 3GPP are very active. The TSG SA WG3 (Technical Specification Group Service and System Aspects Working Group 3) is responsible for the security architecture, authentication mechanism, and encryption algorithms related to information protection. The security features required in the IMT-2000 as proposed by the 3GPP are shown below.

1. Network Access Security: provides safe access to the 3G (3rd Generation) service and prevents attacks from a third party on the radio link.
2. Network Domain Security: provides protection for transmitted information and signaling information on the wired portion of the network.
3. User Domain Security: provides safe access to the MS (Mobile Station).
4. Application Domain Security: enables the safe transmission of messages between the user and the service provider domain.

2.2 3GPP Authentication Mechanism

2.2.1 Authentication Overview

The subscriber authentication of the 3GPP is performed by proving that the USIM (Universal Subscriber Identity Module) and the AuC (Authentication Center) hold the same secret key. Prior to providing the service, the VLR (Visitor Location Register) uses the AV (Authentication Vector) created by the HLR (Home Location Register)/AuC to authenticate subscribers. If the authentication is successful, the VLR and MS share the CK (Ciphering Key) and IK (Integrity Key) used in the security mode. Based on the value of the KSI (Key Set Identifier), the VLR determines whether the AKA process should be omitted. The 3GPP authentication mechanism can be described using the transmitted messages: initially the Initial L3 message, then the AKA process, and finally the Security Mode Command. The detailed process is shown below.
The MS transmits the Initial L3 message to the VLR during the MM (Mobility Management) connection process. Actually, the RRC (Radio Resource Control) connection configuration takes place between the MS and the RNC (Radio Network Controller) before the AKA process occurs. At this time, the MS transmits its encryption algorithm and integrity algorithm to the RNC. The VLR determines whether AKA should be performed based on the KSI value, because the Initial L3 message contains the user ID (Identification), KSI, and LAI. As such, the ID verification process can occur between the old subscriber’s network (VLRo) and the new subscriber’s network (VLRn). If AKA is performed, the VLR requests the creation of the AV from the HLR/AuC. If the authentication is successful, the security mode negotiation process is initiated.

2.2.2 Authentication and Security-Related Processes

This chapter discusses the authentication process of each interface in detail. First, the AKA between the VLR and the HLR/AuC is discussed. The AKA between the MS
(Mobile Station) and the VLR, the distribution of authentication data from a previously visited location together with the IMSI (International Mobile Subscriber Identity), and lastly the security mode setup process are then investigated.

2.2.2.1 AKA between VLR and HLR/AuC. If the VLR (Visitor Location Register) does not have the AV (Authentication Vector) required to authenticate users, it requests a new AV from the HLR/AuC. The AV is a prerequisite for the authentication process of the MS, since authentication is required prior to the security mode. Thus, if the VLR does not have an AV, it transmits a request message to the HLR/AuC and receives the AV as a response. The transmitted message includes the identity, node type, and IMSI. After the HLR/AuC creates the authentication vector as a response, it transmits the authentication vector to the VLR.

2.2.2.2 AKA between MS and VLR. The purpose of the AKA is to mutually authenticate the MS and the VLR and to set up a new CK (Cipher Key) and IK (Integrity Key). When the VLR sends the authentication request message including the RAND (Random Number), AUTN (Authentication Token), and KSI to the MS, the MS compares the MAC (Message Authentication Code), one of the elements that make up the AUTN, with the XMAC (Expected Message Authentication Code). If they differ, the MS sends an authentication failure message, giving the MAC mismatch as the reason, to the VLR. If they match, the MS compares the SQNMS (Sequence Number) created in the USIM with the SQN, another element of the AUTN, and can report a synchronization failure with the AUTS (Authentication Synchronization Failure Parameter). After determining whether the SQN is within the valid range, the MS either sends the authentication response message, i.e., the terminal’s success response for the network authentication, or sends an authentication failure message stating that the SQN is out of range to the VLR. If the MS authenticates the network, it computes the RES as a response.
The RES (Response) is transmitted to the VLR along with the authentication response message and compared with the XRES (Expected Response) to see whether they are identical. This comparison allows the network to authenticate the terminal and completes the authentication process; if the comparison fails, an authentication reject message is transmitted to the MS.

2.2.2.3 Process for Distributing IMSI and Authentication Data from Previously Visited Network and Identifying Them. This process occurs when moving the authentication data from the previously visited network, VLRo, to the newly visited network, VLRn, on the identical SN (Serving Network). Right after the VLRn receives the Initial L3 message containing the TMSI and the LAI from the MS, it transmits a message to verify the TMSI with the VLRo. The VLRo retrieves the information related to the TMSI from its own database and transmits the CK and IK to the VLRn; otherwise, it transmits a TMSI check failure message. If the identification verification process mentioned earlier fails, the VLRn uses the IMSI to verify the MS and the identification.

2.2.2.4 The Security Mode Setup Process. If the AKA is successfully completed or is omitted because of the KSI verification, the security mode negotiation process is initiated. Many negotiation processes between the MS and the RNS, such as choosing
the encryption algorithm used during the process, occur. This paper does not discuss these processes in detail because they are less relevant to its subject. However, the integrity check of the signaling message is considered as a requirement.
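The mutual check of Section 2.2.2.2 — MAC vs. XMAC on the terminal side, RES vs. XRES on the network side — can be sketched as follows. The real f1/f2 are the 3GPP MILENAGE functions; HMAC-SHA256 merely stands in for them here, and SQN masking with the anonymity key is omitted.

```python
import hmac, hashlib, os

def f1(k: bytes, sqn: bytes, rand: bytes) -> bytes:
    """Stand-in for MILENAGE f1: network authentication code (MAC/XMAC)."""
    return hmac.new(k, b"f1" + sqn + rand, hashlib.sha256).digest()[:8]

def f2(k: bytes, rand: bytes) -> bytes:
    """Stand-in for MILENAGE f2: user response (RES/XRES)."""
    return hmac.new(k, b"f2" + rand, hashlib.sha256).digest()[:8]

K = os.urandom(16)              # secret key shared by the USIM and the AuC
SQN = (42).to_bytes(6, "big")   # sequence number maintained by the AuC
RAND = os.urandom(16)           # challenge chosen by the HLR/AuC

# HLR/AuC builds the authentication vector handed to the VLR.
XRES, MAC = f2(K, RAND), f1(K, SQN, RAND)

# MS side: recompute XMAC from the AUTN contents and compare with MAC.
XMAC = f1(K, SQN, RAND)
assert hmac.compare_digest(MAC, XMAC)   # the MS authenticates the network

# MS answers with RES; the VLR compares it with XRES.
RES = f2(K, RAND)
assert hmac.compare_digest(RES, XRES)   # the network authenticates the MS
```

The two assertions correspond to the two directions of the mutual authentication; a MAC mismatch or an out-of-range SQN would instead trigger the failure messages described above.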
3 Notarization Service

While transactions proceed normally, whether in real-life business or on the network, there is no special problem. However, solutions must be established for the case where a problem occurs with a transaction. For example, ways of checking the eligibility of the opposite party and of documenting business transactions (contracts, orders, memoranda, etc.) must be considered. The time verification service contained in this notarization service must meet all notarization requirements. The notarization service is provided in the presence of a notary; the time verification service includes the function of verifying the time and focuses on time verification rather than on the presence of a notary. In real life, a certificate with a seal impression and a register book can be used to validate the opposite party. These documents are official and legally backed by authorities. For existing transactions, the opposite party can also be verified with a telephone call. In the case of e-commerce, the threats from third parties must be resolved and, at the same time, the trust of the opposite party must be acquired. This paper defines electronic notarization as the concept of obtaining the safety and trust of an electronic exchange by proving who exchanged what, and when, in the business transactions of the network. Specifically, electronic notarization maintains the trust of the parties concerned and plays a central role in implementing reliable transactions. It increases provability and provides a useful means of preventing or resolving disputes. The electronic notarization function includes the specification of sender/recipient, delivery confirmation, alteration detection, visual electronic preservation, access logs, and process logs. The requirements for electronic notarization are shown below; they are identical to those of the time verification service.

1. Authentication: Authentication is carried out to enable a third party to prove who has provided what information to whom.
2. Integrity: The contents delivered between the parties concerned must not be revealed or altered during delivery.
3. Delivery Confirmation: There must be a way to ensure that the digital information of the sender has been delivered to the recipient.
4. Readability: The contents of the data must be renderable in readable form as needed.
5. Resolvability: Data must be stored in a format that can be restored within the preservation period.
6. Timing: There must be a way to verify the information or examine the accuracy of the date-time data.
7. Responsibility: The most important element of responsibility is to ensure safety. The responsibility for the operation must be clarified.
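A minimal time verification token satisfying the authentication, integrity, and timing requirements above can be sketched as follows. A real notary authority would place a digital signature over the hash-time pair (as in RFC 3161 time-stamp tokens); here an HMAC with a placeholder key stands in for the signature, and all names are illustrative.

```python
import hashlib, hmac, json, time

NOTARY_KEY = b"demo-notary-secret"   # placeholder for the notary's signing key

def issue_timestamp_token(document: bytes, now=None):
    """Bind a hash of the document to a time value and authenticate the pair."""
    token = {
        "digest": hashlib.sha256(document).hexdigest(),        # integrity
        "time": now if now is not None else int(time.time()),  # timing
    }
    payload = json.dumps(token, sort_keys=True).encode()
    token["tag"] = hmac.new(NOTARY_KEY, payload, hashlib.sha256).hexdigest()
    return token

def verify_timestamp_token(document: bytes, token) -> bool:
    """Check the notary tag and that the document hash still matches."""
    payload = json.dumps({"digest": token["digest"], "time": token["time"]},
                         sort_keys=True).encode()
    ok_tag = hmac.compare_digest(
        token["tag"], hmac.new(NOTARY_KEY, payload, hashlib.sha256).hexdigest())
    ok_digest = token["digest"] == hashlib.sha256(document).hexdigest()
    return ok_tag and ok_digest   # alteration detection + time verification
```

Verification fails if either the document was altered after stamping or the token itself was tampered with, which covers the alteration-detection and timing requirements; delivery confirmation and preservation need additional mechanisms.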
4 Time Verification Service in IMT-2000

This chapter proposes the PKI-based time verification service structure that can be applied in the IMT-2000 environment.

4.1 Elements

The elements used in the proposed method are described below. The proposed method consists of the USIM, ME, SN, NA, and AuC, whose roles are as follows.
• AuC (Authentication Center) / HLR (Home Location Register): sends the public key certificate to the user.
• NA (Notary Authority): once the communication request via the AuC is completed, the NA receives a key from the AuC and creates a session key based on the received key.
• SN (Service Network) / VLR (Visitor Location Register): when ME0 and ME1 perform encrypted communication, the encrypted contents are stored in the SN.
• ME (Mobile Equipment): requests communication with the other MS and stores the key received from the NA in the USIM.
• USIM (Universal Subscriber Identity Module): stores the key.

4.2 System Values

The proposed method uses the following values:
• CKey: key created in the AuC
• NKey: key created in the NA
• EKey: session key used between MEs
• Res: response value
• ID*: identity (*: ME: Mobile Equipment, AuC: Authentication Center, NA: Notary Authority)
• H(): hash function
• PKME0, SKME0: public and private keys of ME0
• PKME1, SKME1: public and private keys of ME1
• PKAuC, SKAuC: public and private keys of AuC
• PKNA, SKNA: public and private keys of NA
• SigAuC, SigME0, SigME1: signatures of AuC, ME0, and ME1
• TSME0, TSSN, TSAuC: timestamps of ME0, SN, and AuC

4.3 Proposed Method

Figure 4 schematizes the protocol of the proposed method. The entire protocol consists of three stages: requesting communication, transmitting and distributing a key, and providing messages. The stage of requesting communication is the point at which communication is started by the ME0.
If the completion message is delivered for the communication request, the AuC creates a key to deliver to the ME and transmits the message based on this key. If unexpected activities are found in the message afterwards, the time verification service is provided.
Time Stamp Protocol for Smart Environment Services
597
Based on Figure 3, the detailed description of the protocol is provided as follows:

1) Requesting communication

The request and response needed for the first communication are described below. At this stage, the initial user requests communication with the other party for the time verification service and the other party accepts the request; it is therefore described as a general request stage. Each user initially passes authentication through the AuC and afterwards receives the service on the strength of that AuC authentication, without repeating the authentication process.
① The ME0 requests communication with the ME1 from the SN0.
② The SN0 requests the location of the ME1 from the AuC.
③ The AuC requests the information of the SN0 and the communication with the ME1 from the SN1.
④ The SN1 verifies the request of the ME0 for communication with the ME1.
⑤ The ME1 responds to the communication verification.
⑥ The SN1 transmits the response results to the AuC.
⑦ The AuC transmits the response for the request to the SN0. Lastly, the SN0 transmits the response to the ME0.
2) Generating a key

The key creation required for message encryption and time verification in the proposed method is described below. At this stage, the key required for the time verification service is created. Using the provided key, support such as verification of the point in time and content certification will later be provided.
① When stage ⑥ of the communication request is completed, the AuC confirms that correct communication between the ME0 and the ME1 has been established and that a key creation request has been made.
② The AuC creates the CKey (formula (1)) and transmits the message of formula (2) to the NA.

CKey = H(IDAuC || IDME0 || IDME1 || SKAuC)   (1)
PKNA(CKey || SigAuC(H(CKey)))   (2)

③ The NA responds to the transmission of the CKey (formula (1)) from the AuC.

The stage of requesting communication and distributing the key is now complete.
3) Message transmission with key distribution, transmission, and time verification

The proposed method specifies the message transmission stage after the key distribution stage. A user who wants to transmit a message encrypts it with the key created by the AuC and the NA (received via the NA) and appends a timestamp for time verification; the SN of each user in the middle stores the message. If a problem occurs, the time verification or content certification service is provided upon the user's request.

① Using the CKey (formula (1)) received from the AuC, the NA creates the NKey (formula (3)) and the Key (formula (4)).

NKey = H(IDNA || SKNA)   (3)
Key = H(IDME0 || IDME1 || CKey || NKey)   (4)
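The chained derivation of formulas (1), (3) and (4) can be sketched in a few lines; SHA-256 and the byte encodings of the identities and secrets are our assumptions for illustration, since the paper does not fix a concrete hash function.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Hash of the concatenation ('||') of the given byte strings."""
    return hashlib.sha256(b"".join(parts)).digest()

# Hypothetical identities and long-term secrets (placeholders, not from the paper).
ID_AuC, ID_ME0, ID_ME1, ID_NA = b"AuC", b"ME0", b"ME1", b"NA"
SK_AuC, SK_NA = b"auc-private-key", b"na-private-key"

# Formula (1): CKey = H(ID_AuC || ID_ME0 || ID_ME1 || SK_AuC), created by the AuC.
CKey = h(ID_AuC, ID_ME0, ID_ME1, SK_AuC)

# Formula (3): NKey = H(ID_NA || SK_NA), created by the NA.
NKey = h(ID_NA, SK_NA)

# Formula (4): session Key = H(ID_ME0 || ID_ME1 || CKey || NKey).
Key = h(ID_ME0, ID_ME1, CKey, NKey)

assert len(Key) == 32  # SHA-256 digest length
```

Because CKey binds the AuC's private key and NKey binds the NA's, neither side can recompute the session Key alone, which is the collusion-resistance property argued in Section 5.2.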
598
D.-G. Lee et al.
② Using the created key, the AuC transmits formula (5) and formula (6), keyed to each ME, to the SN0 and the SN1.

PKME0(IDME0 || Key || SigAuC(IDME0 || Key))   (5)
PKME1(IDME1 || Key || SigAuC(IDME1 || Key))   (6)

③ In turn, the SN0 and the SN1 transmit formulas (5) and (6) to the ME0 and the ME1.
④ The ME0 and the ME1 store the received key in the USIM.
⑤ The ME0 encrypts the message M with the Key and transmits it.

EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0)))   (7)

⑥ After the SN0 stores the transmitted formula (7), it transmits the formula to the SN1. When storing, the SN0 attaches a timestamp signed with its own secret key.

SigSN0(EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0))) || TSSN0)   (8)

⑦ Likewise, after the SN1 stores the transmitted formula (7), it transmits the formula to the ME1. The SN1 also attaches its timestamp and stores the formula.

SigSN1(EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0))) || TSSN1)   (9)

⑧ After the SN1 stores the formula, it sends the verification message to the NA.
⑨ After the ME1 verifies the received formula (7), it sends the verification response to the SN1.

EKey(IDME0 || RES, SigME1(IDME0 || RES))   (10)

⑩ The SN1 sends the response to the SN0, which then sends the response message to the ME0.
⑪ The ME0 sends the response verification message and completes the entire process.
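As an illustration of formulas (7) and (8), the sketch below builds ME0's encrypted, timestamped message and SN0's signed record. The stand-in primitives (HMAC in place of public-key signatures, a keystream XOR in place of EKey) and all key values are our assumptions for a runnable example, not the paper's algorithms.

```python
import hmac, hashlib, time

def sign(sk: bytes, data: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 (the paper uses public-key signatures)."""
    return hmac.new(sk, data, hashlib.sha256).digest()

def enc(key: bytes, data: bytes) -> bytes:
    """Stand-in for E_Key; a real system would use an authenticated cipher."""
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

Key = b"\x11" * 32      # session key from formula (4) (placeholder)
SK_ME0 = b"me0-secret"  # ME0's signing key (placeholder)
SK_SN0 = b"sn0-secret"  # SN0's signing key (placeholder)

# Formula (7): ME0 encrypts ID_ME0 || M || TS_ME0 together with its signature.
M, TS_ME0 = b"message", str(int(time.time())).encode()
inner = b"ME0" + b"||" + M + b"||" + TS_ME0
msg7 = enc(Key, inner + b"," + sign(SK_ME0, hashlib.sha256(inner).digest()))

# Formula (8): SN0 stores msg7 and attaches its own signed timestamp TS_SN0.
TS_SN0 = str(int(time.time())).encode()
msg8 = sign(SK_SN0, msg7 + b"||" + TS_SN0) + msg7 + b"||" + TS_SN0

assert enc(Key, enc(Key, inner)) == inner  # the XOR stand-in is its own inverse
```

Note that the SN signs over the ciphertext only: it can timestamp and archive the message (formula (8)) without ever being able to read it, since it never receives the Key until a dispute arises.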
5 Analysis of the Proposed Method

This section analyzes the time verification, the certification stage, and the security details of the proposed method.

5.1 Time Verification and Content Certification

This section discusses the time verification stage of the proposed method. When a message is altered for any reason, a user can request confirmation for the time verification and content certification from the NA. Confirmation can be requested by a single user or by all the users involved. The NA is responsible for verifying the final alteration.
① The uncertainty check for the message is performed upon the request of the ME.
② If a request for the uncertainty check is received from a user, the NA requests the CKey (formula (1)) from the AuC. Using the CKey and the NKey (formula (3))
transmitted from the AuC, the NA creates a Key (formula (4)) and transmits it to the SN that stores the message.

AuC -> NA: CKey
CKey = H(IDAuC || IDME0 || IDME1 || SKAuC)   (1)
NKey = H(IDNA || SKNA)   (3)
Key = H(IDME0 || IDME1 || CKey || NKey)   (4)

③ Each SN performs decryption using the delivered key. The users can use the timestamp TS of the SN, stored in the SN, to receive confirmation of the transmission time and the content certification.

SigSN0(EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0))) || TSSN0)   (8)
SigSN1(EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0))) || TSSN1)   (9)
5.2 Consideration of the Proposed Method

The proposed method prevents eavesdropping and alteration on the transmission path using the public key of each object or the session key. Moreover, it prevents each object from maliciously colluding with other objects. Even if a user colludes with the server, the CKey cannot be created on the server, since it is a hash value computed from the user IDs and the private key of the AuC; illegal collusion between a user and the server is thus prevented. Even if the SN has illegal intentions, it cannot verify the message until a confirmation request is made, since it cannot obtain the key delivered from the NA. For time verification, the encrypted message stored in the SN can be used to provide the content certification service to the users or to a legal authority.

Table 1 analyzes the proposed method; the results are shown below. Authentication with the AuC is carried out first, and the communication request to the other party is then made, since the authentication service is initially provided through the AuC. Thus, if the authentication with the AuC is correct, authentication with the party concerned follows. Integrity is implemented through the signatures of the parties concerned. Threats on the communication path, or from the sender or recipient, can be handled using the keys created in the HE and the NA. Likewise, data for the delivery certification stored in the SN can be decrypted using the keys of the NA and the AuC when resolving future disputes between the parties concerned. The delivery certification can be verified in the SN to which the parties concerned belong, because it is stored in each SN after the delivery of documents; these documents can be retrieved when a problem occurs later.

Readability means that data can be read depending on its contents; the parties concerned should be able to read the data. This feature allows the AuC and the SN to use the created key to read data upon the request of the parties concerned. Preservability determines how long data should be preserved depending on its importance. The party concerned can determine when a message was transmitted through the timestamp. The timing verifies when data was transmitted: attaching a timestamp at each element the data passes through makes it possible to verify whether the data has been changed or altered.
Table 1. Proposed Scheme Analysis

Authentication
  Proposed scheme: AuC authentication according to the existing certification system; each party concerned is certified.
  Analysis: This can reduce the authentication process between users by verifying with the party concerned.

Integrity
  Proposed scheme: EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0)))
  Analysis: Encryption using the signatures between the parties concerned, based on the delivered messages and the keys created by the AuC and NA.

Delivery receipt
  Proposed scheme: SigSN0(EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0))) || TSSN0)
  Analysis: Delivery certification can be verified by storing this message in the SN and through the verification process as necessary.

Continuance
  Proposed scheme: EKey(IDME0 || M || TSME0, SigME0(H(IDME0 || M || TSME0)))
  Analysis: Each timing is verified by comparing the elements using the TS at each delivery point.

Responsibility
  Proposed scheme: CKey = H(IDAuC || IDME0 || IDME1 || SKAuC)
  Analysis: The key of the AuC is created at the request of the NA, and the creation and delivery of a key is done in the NA. The NA and AuC are responsible for the operation.
Lastly, the AuC and the NA are responsible for the operation. Likewise, the SN, together with the AuC, maintains the authentication and trust system. Thus, the AuC and the NA are responsible for all operations, excluding illegal ones between the related parties.
6 Conclusion

Mobile communication has provided many services through its first and second generations. Preparing for future services, the third generation anticipates not only multimedia services but also many others. However, problems such as contracts between users, billing-time issues, and content certification between a user and a content provider can occur. To resolve these problems, a time verification service with a notary authority has been proposed. Since the time verification service can provide not only the notarization
service but also the content certification service, the key can be obtained from the notary authority to validate the content when a user raises an issue about the records. This paper introduced a part of future IMT-2000 services. The services mentioned here are expected in the near future, and still more services providing convenience and safety are foreseen.
An Analysis of the Manufacturing Messaging Specification Protocol

Jan Tore Sørensen¹ and Martin Gilje Jaatun²

¹ mnemonic as, NO-0167 Oslo, Norway
² SINTEF ICT, NO-7465 Trondheim, Norway
[email protected]
Abstract. The Manufacturing Messaging Specification (MMS) protocol is widely used in industrial process control applications, but it is poorly documented. In this paper we present an analysis of the MMS protocol in order to improve understanding of MMS in the context of information security. Our findings show that MMS has insufficient security mechanisms, and the meagre security mechanisms that are available are not implemented in commercially available industrial devices. Keywords: Process control; Industrial Networks; Protocols; Security.
1 Introduction
We will in this paper present an analysis of the Manufacturing Messaging Specification (MMS¹) protocol and propose improvements based on our findings. MMS is defined in the ISO 9506 standard [1], and there is also some information available in white-papers from SISCO² [2,3,4]. We have tested the MMS protocol as implemented in a major brand of controller used in the process control industry. In the following we look closer into the construction of MMS packages and how they may be altered and forged.

MMS is an application-layer protocol which specifies services for the exchange of real-time data and supervisory control information between networked devices and/or computer applications. It is designed to provide a generic messaging system for communication between heterogeneous industrial devices, and the specification only describes the network-visible aspects of communication. By choosing this strategy, MMS does not specify the internal workings of an entity, only the communication between a client and a server, allowing vendors full flexibility in their implementation. In order to provide this independence, MMS defines a complete communication mechanism between entities, composed of [3]:

1. Objects: A set of standard objects which must exist in every conformant device, on which operations can be executed (examples: read and write local variables, signal events).

¹ In this paper, MMS does not stand for Multimedia Messaging Service, as is often the case elsewhere.
² System Integration Specialists Company - not to be confused with Cisco.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 602–615, 2008. © Springer-Verlag Berlin Heidelberg 2008
2. Messages: A set of standard messages exchanged between a client and a server station for the purpose of controlling these objects.
3. Encoding Rules: A set of encoding rules for these messages (how values and parameters are mapped to bits and bytes when transmitted).
4. Protocol: A set of protocols (rules for exchanging messages between devices).

From the definitions of objects, services and behavior, MMS composes a model named the Virtual Manufacturing Device (VMD) Model. The VMD uses an object-oriented approach to represent different physical industrial (real) devices in a generic manner. Some of these objects are variables, variable type definitions, programs, events, historical logs (called journals) and semaphores. Along with the definition of these objects, MMS defines a set of communication services that an application can use to manipulate these objects.

We observe that in the literature the terms services, service primitives and messages are all used to describe the functions that manipulate objects or their attributes. We will therefore in this paper use the term service primitive, as this is used in the ISO 9506 standard [1], unless we are citing directly from a written source, in which case the quote will be evident in the text. The standard also refers to physical industrial devices as "real devices" and we will continue to use this terminology to avoid confusion.

As MMS is based on an object-oriented approach, we will give a brief introduction to the addressing and object hierarchy of MMS before focusing on the network communication.

1.1 Architecture and Addressing
The MMS architecture is based on a common client-server model. Real devices used in industrial networks often contain an MMS server, allowing the device to be monitored and managed from an MMS client. An MMS client is typically part of a control builder application or an MMS-to-OPC gateway (MMS/OPC GW). A control builder is an application used to program and monitor industrial controllers. Both the control builder and the MMS/OPC GW use service primitives provided in MMS to manage devices containing MMS servers. This is depicted in Fig. 1 [2]. As MMS does not specify how to address clients and servers, an entity containing an MMS client or server must rely on the addressing scheme of underlying protocols when establishing an application association to support the MMS environment [1]. In practice, clients and servers are addressed by their IP address, and the MMS server uses port number 102. The addressing allows an MMS context to be negotiated between two peer applications. To address an MMS object variable, MMS provides several different address modes. MMS allows an address to have different syntax, based on the implementer's choice of what is most appropriate for that device. The specification distinguishes between named and unnamed variables.
Fig. 1. The VMD model depicting communication between an MMS client and an MMS server
1.2 MMS Objects, Service Primitives and Access Control
Associated with each object is a set of variables that describe values in a given instance of the object. For each object there are corresponding MMS service primitives that allow client applications to access and manipulate those objects. The top-level object in MMS is the VMD, which has at least one network-visible address.

Each real device is represented by a real object with vendor-specific features. The VMD model maps the real objects and devices onto virtual objects and devices, described in a generic manner in conformance with the VMD model. In other words, a real variable is an element of typed data that is contained within a VMD object, while an MMS variable is part of a virtual object that represents a mechanism for the MMS client to access the real variable. The MMS server containing the virtual MMS object can be understood as a communication driver which hides the specifics of a real device from the client. From the client's point of view, the virtual MMS variable represents a pointer or an access method to the real variable, and only the MMS server with its objects and its behavior is visible to the client. The MMS client can never interact with real device variables directly.

All MMS objects contain an access method variable. This attribute contains the information which a device needs to identify the real variable as described above; it holds the values necessary to find the memory location of the real variable, whose contents lie outside MMS. A special method, PUBLIC, is standardized for accessing the real variables.
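The virtual-to-real mapping described above can be sketched as follows; the class and attribute names are ours for illustration, not taken from ISO 9506:

```python
class RealDevice:
    """Vendor-specific device internals, invisible to the MMS client."""
    def __init__(self):
        self._memory = {0x10: 42}          # real variables at memory addresses

class MMSVariable:
    """Virtual object: an access method to a real variable, per the VMD model."""
    def __init__(self, device: RealDevice, address: int):
        self._device = device
        self._address = address            # the access-method information

    def get(self) -> int:                  # corresponds to the 'Get' primitive
        return self._device._memory[self._address]

    def set(self, value: int) -> None:     # corresponds to the 'Set' primitive
        self._device._memory[self._address] = value

# The client only ever sees the virtual variable, never the device memory.
counter = MMSVariable(RealDevice(), 0x10)
counter.set(counter.get() + 1)
assert counter.get() == 43
```

The point of the sketch is the indirection: the client manipulates only the virtual object, and the vendor's executive function (here, the trivial dictionary lookup) is free to map it onto any internal representation.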
Table 1. The basic methods inherited from the VMD object

MMS method        General Description
Get               This method is used to obtain the value of a specified object.
Set               This method is used to write/put a value or contents into a specified object.
Query Attributes  This method is used to obtain structure or capability information of a specified object.
Create            This method allows objects of particular classes to be instantiated.
Rename            This method allows instantiated objects to be renamed.
Delete            This method allows instantiated objects to be destroyed.
For each object there are corresponding MMS service primitives that allow client applications to access and manipulate those objects. MMS defines the service primitives of both clients and servers, but the VMD focuses only on specifying the network-visible behavior of MMS servers. Each vendor of an MMS server device is thus responsible for hiding vendor-specific details of the real objects and devices by providing an executive function which maps the real entities up to the virtual level, in compliance with the VMD model definitions. To ensure vendor compliance with the VMD model, the standard specifies how devices containing an MMS server shall provide a consistent and well-defined view of the objects contained in the VMD. MMS thus provides a common interface for communication with different devices through the generic virtual objects.

All MMS objects except the Operator Station object inherit six abstract services from the VMD object. These are depicted and described in table 1. For example, the service primitives Read and RequestDomainUpload, for the objects Named Variable List and Domain respectively, inherit from the abstract service primitive Get.

MMS uses access control lists to provide explicit control of the ability to access or alter MMS objects. Protection requirements for an MMS variable are inherited from the underlying real variable in the real device. These requirements are established by the access method in the MMS object. ISO 9506 [1] states that each object within an MMS implementation must contain a reference to an Access Control List object that specifies the conditions under which services directed at the named object may succeed. For the purposes of specifying the control conditions, services are grouped into six classes as described in table 1. Access control is enforced through special mechanisms provided by MMS.
These mechanisms include possession of a semaphore, the identity of the user (Application Reference), and the submission of a password (which may be arbitrarily complex).
1.3 Network Services
As we have stated earlier, MMS is not by itself a communication protocol, as it only defines messages that have to be transported by an unspecified network.
MMS was originally developed as a part of the MAP specification [5], and is therefore specified on all seven OSI layers as depicted in figure 3. MAP was originally created by General Motors as an internal standard for communications in industrial automation networks. It is now a public, multivendor communications standard for industrial automation equipment. MMS supports the use of both confirmed and unconfirmed services, but we will in this paper focus on the confirmed services. MMS defines the following Protocol Data Units (PDUs) for a confirmed service exchange:

– Confirmed-RequestPDU
– Confirmed-ResponsePDU
– Confirmed-ErrorPDU
– Cancel-RequestPDU
– Cancel-ResponsePDU
– Cancel-ErrorPDU
– RejectPDU
These messages are used in the communication between an MMS client and server. When a client wishes to invoke a service primitive on the server-side application (the VMD), the transitions depicted in Fig. 2 may occur, depending on the response from the VMD. Before a service primitive is called through a Confirmed-RequestPDU, the server is in the Responder Idle state. Upon receipt of a Confirmed-RequestPDU (1) for any of the confirmed services, the MMS-provider issues an indication primitive to the application, specifying the particular service being requested and an invoke ID that identifies the service instance, and enters the Service Pending Responder state. Upon receipt of a response service primitive from the overlaying application, containing a result parameter specifying the service previously indicated and an invoke ID that identifies the service instance, the MMS-provider sends a Confirmed-ResponsePDU (2) which specifies the service type and the invoke ID from the response primitive along with the requested data; a state transition into the Responder Idle state then occurs. If the application cannot provide the requested data, the MMS layer will receive a response service primitive containing a Result parameter specifying an error state, along with the invoke ID used to identify the requesting MMS service instance. The MMS layer then sends a Confirmed-ErrorPDU (3) containing the service type and the invoke ID from the response primitive; sending the Confirmed-ErrorPDU triggers a state transition back into the Responder Idle state. Upon receipt of a Cancel-RequestPDU from the client containing the invoke ID of the matching service instance request, the MMS layer on the server issues a cancel indication service primitive to the overlaying application. This indication specifies the invoke ID of the service request to be canceled, obtained from the Cancel-RequestPDU parameters. The Canceling Service Responder state (4) is then entered.
Now two things may occur on the server side: 1) If the application has provided the requested data, the server sends a
Fig. 2. The MMS Confirmed Service Request as seen by the Service Responder (server)
Cancel-ResponsePDU containing the data and a Confirmed-ErrorPDU, entering the Responder Idle state through transition 6 (T6). 2) If the application has not provided the requested data, the server sends a Cancel-ResponsePDU without any data and enters the Service Pending Responder state through T5. The server then sends a Confirmed-ErrorPDU through T3 and returns to the Responder Idle state. According to [1] the MMS runs on the network stack depicted in Fig. 3. Like all ISO standards, this network stack relates to the Open System Interconnection stack, describing abstract service layers such as the session and presentation layers. We will now give a short description of some of the protocols/layers based on their relevance to this paper's theme.
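The responder-side transitions just described can be summarized as a small table-driven state machine. The state and event names follow the text; the encoding and the combined "Cancel-ResponsePDU+data" event label are ours.

```python
# (state, event) -> next state, for the MMS confirmed-service responder (Fig. 2).
TRANSITIONS = {
    ("Responder Idle", "Confirmed-RequestPDU"): "Service Pending Responder",
    ("Service Pending Responder", "Confirmed-ResponsePDU"): "Responder Idle",
    ("Service Pending Responder", "Confirmed-ErrorPDU"): "Responder Idle",
    ("Service Pending Responder", "Cancel-RequestPDU"): "Canceling Service Responder",
    # T6: cancel answered together with the data that was already available.
    ("Canceling Service Responder", "Cancel-ResponsePDU+data"): "Responder Idle",
    # T5: cancel answered without data; an error PDU is still pending (T3).
    ("Canceling Service Responder", "Cancel-ResponsePDU"): "Service Pending Responder",
}

def step(state: str, event: str) -> str:
    """Advance the responder state machine by one received/sent PDU."""
    return TRANSITIONS[(state, event)]

# The T5/T3 path: request, cancel, cancel-response without data, then error.
s = step("Responder Idle", "Confirmed-RequestPDU")
s = step(s, "Cancel-RequestPDU")
s = step(s, "Cancel-ResponsePDU")     # T5: back to pending
s = step(s, "Confirmed-ErrorPDU")     # T3: error completes the exchange
assert s == "Responder Idle"
```

A lookup that raises `KeyError` models a protocol violation, which in MMS would surface as a RejectPDU.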
2 ASN.1
MMS uses ASN.1 to encode data at the OSI presentation layer before transmission. The ASN.1 representation of data is independent of machine-oriented structures and encodings and also of the physical representation of the data (referred to as transfer syntax in communication terminology). MMS uses BER to encode ASN.1 data before transmission. As we will be decoding BER code, we will explain BER encoding in the next section.
3 BER
The Basic Encoding Rules (BER) are one of the original sets of encoding rules specified by the ASN.1 standard for encoding abstract information into a concrete data stream. The rules, collectively referred to as a transfer syntax in ASN.1 parlance, specify
Fig. 3. The MMS communication stack specified as OSI layers
the exact octet sequences which are used to encode any given data item before it is transmitted over a network. The BER syntax is defined by the ITU-T's X.690 standards document, which is part of the ASN.1 document series. BER is a self-identifying and self-delimiting encoding scheme, which means that each data value can be identified, extracted and decoded individually [6]. Each data element is encoded using a triplet consisting of a type identifier (tag), a length description and the actual data element. The use of such a triplet for encoding is commonly referred to as a tag-length-value (TLV) encoding. A generic triplet is depicted below.

[identifier (tag)] [length (of the contents)] [contents]

The use of TLV encoding allows any receiver to decode the ASN.1 information from an incomplete information stream, without any pre-knowledge of the size, content or semantic meaning of the data, assuming that the communicating parties share the same context-specific module definitions. BER uses a unique code as an identifier for an ASN.1 data type. This identifier is encoded as one or more bytes of every data type and creates the tag.

Table 2. Description of the BER identifier

Bit 7  Bit 6  Bit 5  Bits 4-0   Implication
0      0                        UNIVERSAL
0      1                        APPLICATION SPECIFIC
1      0                        CONTEXT SPECIFIC
1      1                        PRIVATE
              0                 primitive data type
              1                 constructed data type
                     X X X X X  numeric identifier

The identifier is well-structured to allow the representation of three levels of
information within one such code, as illustrated in table 2. On the highest level, represented by the two highest-order bits of the tag octet(s), the class of the data type is encoded [6]. The third-highest bit of the identifier indicates whether the represented data type is a primitive or a constructed one. A constructed data type can be seen as a complex or compound data type hierarchically based on one or more primitive data types. The remainder of the identifier is a numeric tag associated with a data type within a class. Tags ranging from 0 to 30 can be encoded directly in the remaining 5 bits of the octet. For larger tags, these 5 bits are set to 11111, and one or more subsequent octets are used to encode the tag.
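The bit layout of table 2 translates directly into code. This sketch decodes only single-octet identifiers (low tag numbers, 0 to 30):

```python
BER_CLASSES = ["UNIVERSAL", "APPLICATION SPECIFIC", "CONTEXT SPECIFIC", "PRIVATE"]

def decode_identifier(octet: int) -> tuple:
    """Decode a single-octet BER identifier into (class, constructed?, tag)."""
    cls = BER_CLASSES[(octet >> 6) & 0b11]   # bits 7-6: class of the data type
    constructed = bool((octet >> 5) & 0b1)   # bit 5: primitive vs. constructed
    tag = octet & 0b11111                    # bits 4-0: tag number (0..30)
    if tag == 0b11111:
        raise ValueError("high tag number: continued in subsequent octets")
    return cls, constructed, tag

# 0x30 = 0b00110000: UNIVERSAL class, constructed, tag 16 (SEQUENCE).
assert decode_identifier(0x30) == ("UNIVERSAL", True, 16)
# 0x02: UNIVERSAL class, primitive, tag 2 (INTEGER).
assert decode_identifier(0x02) == ("UNIVERSAL", False, 2)
```

A tag value of all ones in the low five bits signals the multi-octet form, which a full decoder would continue parsing in the subsequent octets.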
4 Analysis of MMS Communication
ISO 8823 states that the OSI transport protocol exchanges information between peers in discrete units of information called Transport Protocol Data Units (TPDUs) [7]. This exposes a fundamental difference between TCP and the network service expected by Transport Protocol Class 0 (TP0): TCP manages a continuous stream of octets with no explicit boundaries, while TP0 expects information to be sent and delivered in discrete objects termed network service data units. Therefore RFC 1006 [8] specifies that all TPDUs shall be encapsulated in discrete units called TPKTs. The TPKT layer, depicted in figure 4, is used to provide these discrete packets to the OSI Connection-Oriented Transport Protocol (COTP) on top of TCP.

We have intercepted some packets using Wireshark. As Wireshark did not support the MMS protocol at the time of testing, we were forced to manually decode the MMS PDUs. When looking at MMS communication in Wireshark we found the underlying MMS protocol stack depicted in Fig. 4. As there was no publicly available documentation on how the vendor had implemented the MMS protocol, we had to analyse the protocol stack. As we see in Fig. 4, there are two protocols running on top of TCP. Above TCP we find TPKT, which is a packet format used to emulate the ISO transport service COTP on top of TCP. RFC 1006 [8] describes how to implement ISO's transport protocol class 0 on top of TCP. ISO 9506 [1] stipulates the use of OSI transport class 4 in conjunction with MMS; nevertheless, RFC 1006 [8] describes the use of OSI transport class 0 to emulate an ISO transport service on top of TCP.
Fig. 4. The MMS communication stack as Wireshark detects it
Fig. 5. Format of TPKT header
Fig. 6. Format of COTP PDU
The reason for using ISO's OSI transport class 0 on top of TCP/IP instead of transport class 4 is that transport class 0 achieves functionality identical to transport class 4 when running on top of TCP. The TCP layer provides a reliable transport service through error detection and retransmission, and it also handles segmentation and reassembly of PDUs. As TCP provides all these properties as part of its service to the next layer, there is no reason to implement them again. A TPKT consists of two parts: a packet header and a TPDU. The format of the header is constant regardless of the type of packet, as illustrated in Fig. 5. The field labeled vrsn is the version number, which according to RFC 1006 [8] is always three. The next field, reserved, is reserved for future use. The last field is the packet length; it contains the length of the entire packet in octets, including the packet header. The maximum TPDU size is 65531 octets, with a payload of at most 65524 octets.

According to Wireshark, we find COTP above the TPKT layer. RFC 905 [9] describes the ISO 8073 specification. The COTP PDU is described in Fig. 6. The header length in octets is indicated by a binary number in the length indicator (LI) field; this field has a maximum value of 254 (1111 1110), as the value 255 (1111 1111) is reserved for possible extensions. The next field is divided into two parts: first the PDU type specification (T), which describes the structure of the rest of the PDU, e.g., Data Transfer (1111) as described in Fig. 6. The PDU type is encoded as a four-bit word; the full list of type codes can be found in [9]. The second part is the credit part (CDT), which is used to indicate a reliable transport service, but this is always set to 0000 as TP0 does not offer reliable transport. The third field contains the TPDU number and an end-of-transfer indication flag. In all data transfer packets
An Analysis of the Manufacturing Messaging Specification Protocol
611
the EOT flag is set and the TPDU number is zero; this might be because the service relies on TCP sequence numbering at the transport layer, but we have not found any written documentation to support this theory. We note that there is no reference to ACSE in our packet dump. We verified through Wireshark's documentation that ACSE is a supported protocol [10]. That leaves us with two possible conclusions:
1. The MMS protocol uses an implementation of ACSE which does not conform to the standard, leaving Wireshark unable to decode that packet layer.
2. The implementors of the MMS protocol omitted the ACSE layer when implementing the protocol.
We are fully capable of decoding the whole payload of the COTP PDU to MMS-structured ASN.1 text. We therefore conclude that the current implementation of MMS has omitted the ACSE layer, rendering the ACSE authentication facilities void. This means that there are no authentication or access control facilities at the lower layers of the MMS stack.
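The TPKT and COTP framing described above is simple enough to parse directly. The following sketch is our own illustration (the function and variable names are not from any vendor tool); it splits a raw TCP payload into its TPKT fields, COTP type and user data:

```python
import struct

def parse_tpkt_cotp(frame: bytes):
    """Split a raw TCP payload into TPKT fields, COTP fields and user data.

    Field layout follows RFC 1006 (TPKT) and RFC 905 / ISO 8073 (COTP);
    only the fixed part of the COTP header is interpreted here.
    """
    # TPKT: vrsn (1 octet, always 3), reserved (1 octet), packet length
    # (2 octets, big-endian, counting the whole packet incl. this header).
    vrsn, _reserved, length = struct.unpack("!BBH", frame[:4])
    assert vrsn == 3, "RFC 1006 fixes the TPKT version at 3"

    # COTP: the length indicator (LI) gives the remaining header length in
    # octets; the next octet holds the PDU type T (high nibble, 1111 = Data
    # Transfer) and CDT (low nibble, always 0000 for class 0).
    li = frame[4]
    pdu_type = frame[5] >> 4
    cdt = frame[5] & 0x0F
    user_data = frame[4 + 1 + li:]  # payload after LI octet + li header octets
    return vrsn, length, pdu_type, cdt, user_data
```

For a minimal, hand-constructed DT TPDU such as 03 00 00 08 02 f0 80 55, this returns version 3, packet length 8, PDU type 1111 and the single payload octet.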
5 Decoding MMS Communication
Now that we know the underlying protocols on which MMS runs, we will study the MMS message communication between the MMS client and the MMS server and try to determine whether there are any signs of security mechanisms. We used the client software to create a small program which we downloaded to the controller over MMS. The program was a very simple counting application, as described in the C code below:

int i = 0;
while (TRUE)
    i = i + 1;

The controller then reports the value of i back to the client at regular intervals using MMS. Once the program was downloaded to the controller and running, we used Wireshark to capture MMS communication on the network. The first thing we noticed when we examined the packet dump in Wireshark was a pattern in the packet communication repeating itself with a period of eight. This pattern was first identified by packet sizes repeating themselves with a period of eight. We note that we chose an arbitrary packet in our packet dump as our starting point and decoded packets sequentially from that point. We chose this strategy to simulate an attacker tapping into a network at an arbitrary point in time.
5.1 Decoding the First PDU
We will now look closer at the first PDU and attempt to decode it. We have extracted the payload of the COTP packet at our randomly chosen starting point. We know from the manufacturer's web page that the equipment employs MMS.

a0 41 02 01 7b a4 3c a1 3a a0 38 30 0c a0 0a
80 08 24 4d 53 47 24 31 24 24 30 15 a0 13 80
11 24 48 57 53 34 35 38 35 34 33 32 30 3a 4e
4f 52 4d 30 11 a0 0f 80 0d 24 4d 53 47 24 35
35 32 36 35 38 39 36
The packet above is encoded in BER's TLV format. We know this from [2] and [3], which are whitepapers publicly available on the Internet. We therefore use the decoding rules described in Section 3 to decode each TLV pair. When decoding this first PDU, with the help of the MMS syntax module [11], we found that it is a confirmed-Request PDU. This confirmed-Request PDU contains an integer id named invokeID with value 123 and a confirmedServiceRequest for a read operation. The read request specifies a listOfVariables with three items. Each item is a vmd-specific object name containing the identifier. We decoded these identifiers to:
– $MSG$1$$
– $HWS45854320:NORM
– $MSG$55265896
Space does not permit going through the entire decoding process, but for illustrative purposes the beginning of the first PDU is decoded below. Using Table 2 we decode the first PDU to the following textual ASN.1 structure:

a0 41                     CONTENT SPECIFIC constructed nr 0   LENGTH=65
02 01 7b                  UNIVERSAL primitive nr 2 (INTEGER)  LENGTH=1   123
a4 3c                     CONTENT SPECIFIC constructed nr 4   LENGTH=60
a1 3a                     CONTENT SPECIFIC constructed nr 1   LENGTH=58
a0 38                     CONTENT SPECIFIC constructed nr 0   LENGTH=56
30 0c                     UNIVERSAL constructed nr 16 (SEQUENCE)  LENGTH=12
a0 0a                     CONTENT SPECIFIC constructed nr 0   LENGTH=10
80 08                     CONTENT SPECIFIC primitive nr 0     LENGTH=8
24 4d 53 47 24 31 24 24   $ M S G $ 1 $ $
... etc.
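Decoding such TLV structures mechanically takes only a few lines. The sketch below is our own illustration (not the tool used in the analysis) and handles only what this dump requires: short-form definite lengths and low tag numbers.

```python
def decode_tlv(buf: bytes, offset: int = 0):
    """Decode one BER TLV at buf[offset:].

    Returns (tag_class, constructed, tag_number, value, next_offset).
    Restricted to short-form lengths (< 128) and tag numbers < 31,
    which covers every TLV in the captured confirmed-Request PDU.
    """
    first = buf[offset]
    tag_class = first >> 6            # 0 = UNIVERSAL, 2 = CONTENT SPECIFIC
    constructed = bool(first & 0x20)  # constructed vs primitive bit
    tag_number = first & 0x1F
    length = buf[offset + 1]          # short form only
    start = offset + 2
    return tag_class, constructed, tag_number, buf[start:start + length], start + length
```

Applied to 02 01 7b it yields a UNIVERSAL primitive with tag number 2 (INTEGER) and value 0x7b, i.e. the invokeID 123; recursing into constructed values walks the whole PDU.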
We observe that MMS utilizes many CONTENT SPECIFIC tags to identify MMS-specific data types. As stated earlier, we can use the MMS module definition to decode these tags. This module is publicly available at [11] or through Google™.
6 Security in MMS
Having analysed the MMS protocol communication, we will in this section look into the security mechanisms defined by the MMS standard. The standard does specify means for access control through accessControlList objects. We quote from the ISO standard [1]: The &accessMethod field for a Named Variable object shall specify the mode of access. If the Address is declarable (and obtainable) using MMS services, the &accessMethod field shall have the value public, and the Address attribute shall be defined and available to MMS clients requesting the attributes of the Named Variable object. Otherwise, the value of this field is a local issue. The public access method shall not be available unless vadr is supported. From the quote above we see that each Named Variable object has an accessMethod field, which specifies the mode of access. According to the standard, if the address is declarable and obtainable using MMS service primitives, the accessMethod shall have the value public. Access to all objects can be controlled by a special object, the Access Control List, which specifies which clients may read, delete or modify the object. On a general level MMS specifies that if the &accessMethod is public the following field shall appear, and if the &accessMethod is anything but public, the following field shall not appear. But there are some exceptions.
An MMS server may declare an MMS variable that exists only at the instant of access. Such a variable does not have an address per se, but is still accessible at that instant. We see that the standard provides mechanisms for access control, but to our knowledge there are no other security mechanisms included in the MMS standard. We quote from the Security Considerations section of the ISO standard [1]: When implementing MMS in secure or safety critical applications, features of the OSI security architecture may need to be implemented. This International Standard provides simple facilities for authentication (passwords) and access control. Systems requiring a higher degree of security will have to consider features beyond the scope of this International Standard. This International Standard does not provide facilities for non-repudiation. As stated above, MMS itself is not designed with information security in mind. This indicates that security should be enforced at some lower layer, but as we have seen through our analysis of MMS, no security is enforced at any layer. The ACSE layer could have offered some security features, but as the ACSE layer is omitted from our implementation, those features are forfeited. According to the ISO standard, the MMS protocol should have implemented some simple facilities for password authentication and access control. We wanted to study these mechanisms to see what security they really offer, but as they are at best optional and at worst not implemented, they provide no security whatsoever; our analysis has shown that they do not exist. In our analysis of MMS we have found no protection against replay attacks. This is a major concern, as anyone with access to the network may capture a packet and then replay it on the network at an inappropriate moment. We do not regard the invokeID field as such a mechanism, as it is easily changed.
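To illustrate how little stands in the way of a replay, consider the following sketch. It is purely illustrative: the frame bytes would be whatever was sniffed, and port 102 is simply the conventional RFC 1006 (ISO transport over TCP) port; nothing in the observed stack would recognize the frame as a duplicate.

```python
import socket

def replay(frame: bytes, host: str, port: int = 102) -> None:
    # Re-send a previously captured TPKT frame verbatim. No sequence
    # numbers, nonces or authentication exist above TCP to reject it.
    with socket.create_connection((host, port)) as s:
        s.sendall(frame)
```

An attacker only needs the raw bytes of one sniffed request; the invokeID can be left untouched or trivially rewritten.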
7 Conclusion
The Manufacturing Messaging Specification (MMS) is a complex protocol that is rendered even more complex by the implementation of an OSI transport protocol on top of TCP. MMS offers very limited security mechanisms, and the equipment we have studied does not appear to have implemented even these. It is clear that if MMS is to be used in process control networks that must fulfil information security requirements, major modifications have to be made to the protocol.
Acknowledgements The research for this paper was conducted while Mr. Sørensen was a student at the Norwegian University of Science and Technology (NTNU).
References
1. Industrial automation systems – Manufacturing Message Specification – Part 1. ISO Standard ISO 9506-1:2003(E) (2003)
2. Overview and introduction to the Manufacturing Message Specification (MMS). System Integration Specialists Company (SISCO), 6605 19.5 Mile Road, Sterling Heights, MI 48314-1408, USA, Tech. Rep. (1995), http://www.sisconet.com/downloads/mmsovrlg.pdf
3. Falk, H., Robbins, J.: An Explanation of the Architecture of the MMS Standard. System Integration Specialists Company (SISCO), 6605 19.5 Mile Road, Sterling Heights, MI 48314-1408, USA, Tech. Rep. (1995)
4. Falk, H., Burns, D.M.: MMS and ASN.1 Encoding. System Integration Specialists Company (SISCO), 6605 19.5 Mile Road, Sterling Heights, MI 48314-1408, USA, Tech. Rep. (2001)
5. Floyd, L., Ronald, D.: Manufacturing automation protocol. In: Conference Record – International Conference on Communications, pp. 620–624 (1985)
6. Basic encoding rules. Vijay Mukhi's Computer Institute, India (February 2007), http://www.vijaymukhi.com/vmis/ber.htm
7. Information technology – Open Systems Interconnection – Connection-oriented presentation protocol: Protocol specification. ISO Standard ISO/IEC 8823-1:1994 (1994)
8. Rose, M.T., Cass, D.E.: RFC 1006: ISO transport services on top of the TCP: Version 3 (May 1987), obsoletes RFC 983, updated by RFC 2126. Status: STANDARD, ftp://ftp.internic.net/rfc/rfc1006.txt
9. McKenzie, A.M.: RFC 905: ISO transport protocol specification ISO DP 8073 (April 1984), ftp://ftp.internic.net/rfc/rfc905.txt
10. Wireshark wiki on ACSE. Published on Wireshark's wiki page (June 2006), http://wiki.wireshark.org/ACSE
11. SISCO's MMS syntax. Published on System Integration Specialists Company (SISCO), Inc. homepage (February 2007), http://www.sisconet.com/downloads/mms abstract syntax.txt
A Long-Distance Time Domain Sound Localization Jhing-Fa Wang, Jia-chang Wang, Bo-Wei Chen, and Zheng-Wei Sun Department of Electrical Engineering, National Cheng-Kung University No.1, Dasyue Rd., East District, Tainan City 701, Taiwan, R.O.C. [email protected]
Abstract. This paper presents a computer simulation of a long-distance time domain sound source localization algorithm that localizes a single sound source using two separate microphone channels. We use the Time-Difference-Of-Arrival (TDOA) method, estimating the time delay between the sound signals arriving at the two microphones with the Average-Magnitude-Difference-Function (AMDF) algorithm. This algorithm has the main advantage of requiring no multiplication operations. The key design objectives are to reduce hardware circuit complexity and to implement the algorithm on an FPGA board. The computer simulation results show average errors within 8° at distances of 1 to 5 meters.
1 Introduction
Numerous sound source localization systems using microphone arrays have been explored in the past [1][2][3]. A precise and robust localization system is often necessary for a variety of applications, such as speech recognition systems, robots [4] and toy products. In recent years, portable consumer electronics have trended toward short, small, light and thin designs, making multiple microphones unsuitable due to area limitations. Currently, the most widely used sound source localization algorithm is Generalized Cross-Correlation (GCC) [5], which obtains the TDOA by searching for correlation peak values. The GCC algorithm has already been implemented on an FPGA board [6] and on a chip [7]. In this paper, we instead use an algorithm called the Average-Magnitude-Difference-Function (AMDF). This method maintains high accuracy compared with other methods [8][9]. It uses only subtraction operations to obtain the TDOA and is expected to reduce design complexity. This paper is organized as follows. Section 2 presents our motivation. Section 3 introduces the principle of the Average-Magnitude-Difference-Function. Section 4 demonstrates the sound source localization algorithm simulation flow and the expected hardware architecture. Section 5 shows the experimental results for all angles and distances.
2 Motivation
In recent years, market trends have moved toward entertainment and Human-Computer Interaction (HCI). Many novel toy products can be seen in the market, such as
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 616–625, 2008. © Springer-Verlag Berlin Heidelberg 2008
Ugobe's Pleo [10] and Hasbro's Furby [11]. For this reason, we want to make the localization system applicable to general toys. The localization system is able to keep the localization error within 10° at long distances, which is acceptable for toy applications.
3 TDOA Estimation Method
This paper applies the hyperbolic property to sound source localization. Hyperbolic localization estimates the TDOA between the microphone receivers through a time-difference estimation algorithm. The estimated TDOA is then converted to an angle measurement relative to the microphones, which locates the sound source position.

3.1 Hyperbolic Sound Localization Principle

Fig. 1 illustrates hyperbolic sound localization. To explain the hyperbolic property, we first make the following assumptions. The focus-to-focus distance is 2c, namely the MIC1-to-MIC2 distance. The difference between the distances from any point on the curve to the two foci is 2a, namely the difference between the distances from the sound source to the two microphones. We can then locate the sound source angle θ from

x²/a² − y²/(c² − a²) = 1                              (1)

y = ± (√(c² − a²)/a) · x                              (2)
Fig. 1. Hyperbolic sound location curve
2c = |MIC1 − MIC2|                                    (3)

2a = TDOA · v                                         (4)

θ = arccos(a/c) = arccos(TDOA · v / |MIC1 − MIC2|)    (5)
where v is the velocity of sound.

3.2 Incoming Signal Model for Time Delay Estimation
Assume a sound signal s(t) is transmitted from a remote source with noise n(t); the two separated microphones can then be modeled as

x₁(t) = s(t − τ₁) + n₁(t)
x₂(t) = s(t − τ₂) + n₂(t)                             (6)

where τ₁ and τ₂ are the signal delay times. This model assumes s(t), n₁(t) and n₂(t) are real, jointly stationary random processes and that s(t) is independent of the noise terms n₁(t) and n₂(t). Assuming τ₁ < τ₂, we can rewrite Equation (6) as

x₁(t) = s(t) + n₁(t)
x₂(t) = s(t − τ) + n₂(t)                              (7)

where τ = τ₂ − τ₁ is the time delay. We need to estimate τ, the TDOA of the sound signal s(t) between the two microphones.

3.3 Average-Magnitude-Difference-Function Algorithm
Various time delay estimation methods have been proposed and compared. For accurate estimation of τ, we use the Average-Magnitude-Difference-Function (AMDF) algorithm. From Equation (8) we can see that no multiplication operation is required, which reduces the design complexity and makes a hardware implementation practical.

AMDF(τ) = (1/N) Σₜ |x₁(t) − x₂(t + τ)|                (8)
4 Simulation Flow and Hardware Architecture
The flow chart of the Average-Magnitude-Difference-Function algorithm is shown in Fig. 2. The algorithm is simulated and verified in MATLAB. The sound signal is received from a microphone pair and amplified by the microphone pre-amplifier shown in Fig. 3, whose output is passed to the PC line-in port. The parameter settings for the sound source localization simulation are listed in Table 1. Fig. 4 shows the sound source localization hardware scheme. The
input signals are sampled and then stored in two input buffers (one per channel). With the voice activity detector (VAD), we extract the voice components and store them in the buffer. Finally, the TDOA values are output after the buffer contents of each channel have been processed by the adder and subtractor.
(Flow in Fig. 2: microphone pre-amplifier → ADC conversion → framed sound signals s1(t), s2(t) → AMDF algorithm → TDOA-to-angle converter)
Fig. 2. Average-Magnitude-Difference-Function algorithm design flow
Fig. 3. Two channel microphone pre-amplifier
Fig. 4. Hardware architecture of sound source localization (two VAD-gated input buffers, one per channel, feeding an adder/subtractor datapath through multiplexers to produce the TDOA)

Table 1. Parameter setting for sound source localization simulation

Input bit                    16 bit
Sampling frequency           64 kHz
Inter-microphone separation  14 cm
Microphone type              Omni-directional
5 Experimental Results and Discussion
Fig. 5 gives the experimental environment diagram, and Fig. 6 shows the real experimental environment at a distance of 1 meter. We tested all combinations of angle and distance between the speaker and the receiver, from −75° to 75° in 15° increments and from 1 meter to 5 meters. Every angle was tested twenty-five times at each distance, for a total of 1375 test samples. Table 2 shows, for distances of 1 to 5 meters, the percentage of trials at each angle with an experimental error lower than 10°. From Table 2 we see that the accuracy is almost always between 88% and 100% at angles of 0° to 60°, regardless of distance. When the angle reaches 75° at distances of 4 and 5 meters, the accuracy is poor. This is caused by the echo effect, also called the reverberation effect; rooms of miscellaneous shapes cause a variety of reverberation effects [12], and we are currently trying to reduce this effect. Fig. 7 to Fig. 11 show the average error measurement results for distances from 1 meter to 5 meters. The experimental results show average errors within 6.5° for distances of 1 to 3 meters; for 4 and 5 meters, the average errors are within 8° due to the echo effect. Compared with other work [13], this paper covers longer distances and the accuracy values are almost all greater than 88%. Table 3 shows the comparison at a distance of 1 meter.
Table 2. Accuracy of the sound source localization experiments for all angles (percentage of trials with error below 10°)

Length    Side    0°      15°    30°    45°    60°    75°
1 meter   Left    100%    88%    100%   96%    96%    88%
          Right           100%   100%   100%   96%    100%
2 meter   Left    92%     100%   100%   100%   100%   96%
          Right           100%   96%    96%    84%    68%
3 meter   Left    100%    100%   100%   100%   92%    88%
          Right           100%   100%   100%   100%   84%
4 meter   Left    100%    100%   100%   100%   88%    68%
          Right           100%   96%    100%   88%    72%
5 meter   Left    100%    100%   100%   96%    68%    60%
          Right           100%   96%    100%   92%    56%

(At 0° a single measurement series was taken per distance.)
Fig. 5. Experimental environment diagram (a 12.5 m × 8 m room with French windows, a 2.6 m wide wall, a wood wall and a door; the two microphones are 14 cm apart at a height of 1.35 m; test positions range from 1 m to 5 m at angles 0° to 75°)
Fig. 6. Real experimental environment
Fig. 7. Measurement results for distance of 1 meter
Fig. 8. Measurement results for distance of 2 meters
Fig. 9. Measurement results for distance of 3 meters
Fig. 10. Measurement results for distance of 4 meters
Fig. 11. Measurement results for distance of 5 meters
Table 3. Comparison of the accuracy of error within 10° at a distance of 1 meter

              [13]     This paper
Error ≤ 10°   92.86%   96.7%
6 Conclusion
A computer simulation of long-distance sound source localization using the Average-Magnitude-Difference-Function algorithm has been discussed. The experimental results indicate that the average errors are within 8° of accuracy from 1 meter to 5 meters, which is suitable for several application fields. The AMDF algorithm also has the advantage of requiring no multiplication operations, and hence the hardware complexity can be reduced. Future activities include FPGA implementation, circuit synthesis, and finally integration into a chip.
References
1. DiBiase, J., Silverman, H., Brandstein, M.S.: Robust localization in reverberant rooms. In: Brandstein, M.S., Ward, D.B. (eds.) Microphone Arrays: Signal Processing Techniques and Applications, pp. 131–154. Springer, New York (2001)
2. Aarabi, P., Zaky, S.: Integrated vision and sound localization. In: Proc. 3rd International Conference on Information Fusion, Paris, France (July 2000)
3. Aarabi, P., Zaky, S.: Robust sound localization using multi-source audiovisual information fusion. Information Fusion 3(2), 209–223 (2001)
4. Nakadai, K., Hidai, K., Okuno, H.G., Kitano, H.: Real-time speaker localization and speech separation by audio-visual integration. IEEE International Conference on Robotics and Automation 1, 1043–1049 (2002)
5. Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 24(4), 320–327 (1976)
6. Nguyen, D., Aarabi, P., Sheikholeslami, A.: Real-time sound localization using field-programmable gate arrays. IEEE Acoustics, Speech, and Signal Processing 2, 6–10 (2003)
7. Halupka, D., Mathai, N.J., Aarabi, P., Sheikholeslami, A.: Robust sound localization in 0.18 µm CMOS. IEEE Transactions on Signal Processing 53(6), 2243–2250 (2005)
8. Fertner, A., Sjolund, A.: Comparison of various time delay estimation methods by computer simulation. IEEE Transactions on Acoustics, Speech, and Signal Processing 34(5), 1329–1330 (1986)
9. Ross, M., Shaffer, H., Cohen, A., Freudberg, R., Manley, H.: Average magnitude difference function pitch extractor. IEEE Transactions on Acoustics, Speech, and Signal Processing 22(5), 353–362 (1974)
10. Ugobe, http://www.pleoworld.com/
11. Hasbro, http://www.hasbro.com/
12. Varma, K.: Time-Delay-Estimate Based Direction-of-Arrival Estimation for Speech in Reverberant Environments. M.S. thesis, Virginia Polytechnic Institute and State University (2002)
13. Sha-Li, T.: Two-dimensional Source Localization: Implementation and Discussions of Time Domain Methodologies. M.S. thesis, National Tsing Hua University, Taiwan (2005)
Towards Dataintegration from WITSML to ISO 15926 Kari Anne Haaland Thorsen and Chunming Rong Department of electrical engineering and computer science, University of Stavanger, 4036 Stavanger, Norway [email protected], [email protected]
Abstract. To enable information sharing and automatic reasoning across applications and domains, with relevant context and content, we need to model and map data into common standards with semantic annotation. The project Integrated Operations (IO) has been estimated to increase the value of the petroleum resources on the Norwegian continental shelf (NCS) by NOK 250 billion in NPV (EUR 30 billion). It has been decided to use ISO 15926 as the instrument for integrating data across disciplines and business domains: each domain's metadata is mapped onto the common ontology defined in ISO 15926. Only in this manner can true integration emerge. There is a need for a mapping of WITSML types and structures in ISO 15926. Much work has already been done, but a vital part still remains: at present it is not possible to categorize and identify WITSML base types and WITSML schema structures in ISO 15926. Keywords: Ontology, Semantic Web, Integrated Operations (IO), data integration.
1 Introduction
Currently, integration is mainly focused on ad hoc application-to-application integration. Many integrated services already exist in high-end and specialized application areas, but there is still a lack of real data integration. Capturing and mapping data into a common standardized data format, with semantic annotation, enables large amounts of data to be aggregated and dealt with in real time, and decision making can be automated. Information becomes more accurate and more dependable, and can be viewed in new ways. The urgent need for skilled labour due to economic growth can be alleviated by automating processes, freeing such labour for less mundane tasks. The development of integrated and autonomous systems depends on the ability to share knowledge between distributed heterogeneous data sources. So far, a promising solution seems to be letting data sources use a common ontology, which offers the ability to access information directly at its source and to work with data from disparate sources as if they were located in a central repository. The modern oil and gas industry is to a large extent a knowledge and information industry, and the capacity for handling large amounts of information has a key impact on both added value and the ability to protect the natural environment. Improving the industry's
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 626–635, 2008. © Springer-Verlag Berlin Heidelberg 2008
ability to structure, transmit, process and share information can make significant contributions to productivity and environmental protection. For instance, the project Integrated Operations (IO), whose aim is to support the industry in reaching better, faster and more reliable decisions, has been estimated to increase the value of the petroleum resources on the Norwegian continental shelf by NOK 250 billion in Net Present Value (EUR 30 billion) [1]. IO consists of collaborative efforts in the oil and gas industry to support operational decisions about offshore installations from onshore control centres, developing common standards, integrated solutions, and new technologies. It has been decided to use ISO 15926 as the instrument for integrating data across disciplines and business domains.
Fig. 1. Integrated Operations (expert and knowledge data and real-time data from the operation centre are integrated through a common oil and gas ontology)
The focus is on integration of offshore and onshore operations, real-time simulation and optimization of key work processes, and integrated operation centres of operators and vendors. A vital factor for the success of IO is to capture and map the different domains' metadata onto the common ontology defined by ISO 15926. Only in this manner can true integration emerge.
2 Previous Work For the time being, exchange of data demand vast human interaction to ensure information-consistency. Drilling processes generate a great deal of real time data. When processed in real time, precise and accustomed representation of data can give valuable information on-the-fly, thus improve drilling processes and increase production. So far drilling operation centres can gather and visualize real-time drilling data. However, there is a need to transform these real-time measurements into information which can support decision making during drilling and completion operations. The
concept of using models of the well behaviour to estimate machine limits in real time can be transposed to the drilling operation centre for making predictive estimations and risk quantifications of the remaining operations. The Semantic Web facilitates the connection of different services, enables automation of processes, and makes possible data integration and communication of complex data in real time. Real-time sensor data from the well can be analyzed on the fly and compared with stored expert data to predict the optimal drilling plan, estimate the current risk picture, and project the expected response of the well in the near future while continuing the current operation. The probabilistic estimation of the occurrence of unexpected events completes the drilling team's understanding of the current situation, and the evaluation of the operational limits to complete the operation, using a projection into the near future, gives a good indicator to support decision making. As facilities move subsea, automation and integration of data processing are essential for control and situation awareness, so that the facilities can be used efficiently. PCA (POSC Caesar Association) [2], the leading global not-for-profit standardization organization for the process industry, including oil and gas, has through ISO TC 184/SC4 developed a methodology for data integration across disciplines and phases; this work has been documented in ISO 15926, also named "Integration of life-cycle data for process plants including oil and gas production facilities". ISO 15926 is extraordinarily robust and complete, both in its specification and in the technical infrastructure through which it is deployed. These characteristics differentiate ISO 15926 from other, less formal, attempts at standards development.
Other standards initiatives have broken down due to their inability to scale the submissions process, cover the structural requirements of multiple disciplines, and provide an enterprise-grade technology foundation [2]. Using the methodology of ISO 15926, PCA has, in close collaboration with the Norwegian offshore industry, developed an OGO (Oil and Gas Ontology) for important upstream business processes: drilling, development, production, and operation. The ontology is administered by PCA and is stored in PCA's Reference Data Service (RDS) [3]. This ontology enables the offshore industry to:
− do data integration within and across business domains
− create an architecture for web services
− include reasoning as part of the ontology for creation of autonomous solutions
− include uncertainty as part of the ontology to cope with risks
− be able to store data over time
The utility value of ISO 15926 depends on the industry mapping their local expressions onto it. By mapping metadata to ISO 15926, it becomes easy for different applications to communicate and cooperate. ISO 15926 enables mapping between different sources' vocabularies: it defines structures and definitions for rig-related information and terminologies, thus easing the interpretation and translation of terminologies and terms between different applications and working domains.
WITSML (Wellsite Information Transfer Standard Markup Language) [4] is a standard for sending well site information in an XML document format between business partners. It is already in use in different areas, e.g., in daily drilling reports on the NCS, but to enable real-time, automatic capture and representation of well site information in different systems and applications, a current issue is to have a mapping of WITSML in ISO 15926. Only then can the standard be fully utilized, and well site information be captured and represented in systems that do not support WITSML. For example, integrating drilling data, such as production volumes and productive vs. non-productive times, into a resource planning tool will allow significant improvements in maintenance scheduling, in identifying resource consumption, and in automating resource allocation. Much work has already been done here: most of the WITSML expressions and terminologies have already been mapped onto ISO 15926, but a vital part still remains. At present it is not possible to categorize and identify WITSML base types and WITSML schema structures in ISO 15926. Only with this mapping can true integration be fulfilled.
3 Mapping
WITSML was initially developed by the WITSML project, an oil industry initiative sponsored by BP and Statoil, and later by Shell, as a new standard for drilling information transfer. The aim of the WITSML standard is: the "right time" seamless flow of well site data between operators and service companies to speed and enhance decision making. WITSML is written in XML and follows XML Schema structures. The WITSML abstract types, listed in Table 1 and Table 2, are all derived from xsd built-in simple data types, directly or through other WITSML abstract types. WITSML abstract types are intended as the supertypes of all other WITSML types (e.g., abstractInt is the supertype of all other Int types in WITSML) and disallow an "empty" type. As implied by their names, these abstract types are not intended to be used directly, except to derive other types in WITSML.
3.1 Mapping WITSML Data Types onto ISO 15926
As the WITSML abstract types are all derived from xsd built-in simple data types [4], directly or through other WITSML abstract types, it first seemed like a good solution to map the xsd schema structures and data types in ISO 15926. But this structure is standardized and maintained by W3C, and it is W3C's responsibility to maintain and update the standard. A mapping in ISO 15926 would require constant attention to updates and changes and could easily lead to inconsistency between the standard and ISO 15926's representation of it. We therefore concluded that it should not be the responsibility of ISO 15926 to represent the structure of xsd schemas.
K.A.H. Thorsen and C. Rong

Table 1. Abstract Base Types

| Type | Base | Notes |
|---|---|---|
| abstractBoolean | xsd:boolean | |
| abstractDateTime | xsd:dateTime | |
| abstractDate | xsd:date | |
| abstractYear | xsd:gYear | |
| abstractDouble | xsd:double | |
| abstractShort | xsd:short | |
| abstractInt | xsd:int | |
| abstractString | xsd:string | Collapses all white spaces to a single space |
| abstractMeasure | witsml:abstractDouble | Quantity that has a value with a unit of measure, given in the uom attribute of the subtype |
| abstractMaximumLengthString | witsml:abstractString | Defines the max. length of a string to be stored in a database |
| abstractUncollapsString | xsd:string | Maintains white spaces |

Table 2. Content specific abstract types

| Type | Base | Notes |
|---|---|---|
| abstractPositiveCount | witsml:abstractShort | |
| abstractNameString | witsml:abstractString | User assigned human recognizable contextual name types |
| abstractUidString | witsml:abstractString | Locally unique identifiers |
| abstractCommentString | witsml:abstractMaximumLengthString | Comment or remark intended for humans |
| abstractTypeEnum | witsml:abstractString | Enumerated "types" |
| abstractUomEnum | witsml:abstractString | Units of measure |
Hence, ISO 15926 should not capture the schema structure of WITSML, but it should be able to capture the semantic structure and interpretation of concepts in WITSML. The structure of WITSML schemas reflects a model of the real world, e.g. a view of the composition of a rig: which parts it contains, what information it holds, etc., or the interpretation of the concept well bore and the fact that a well bore must belong to a well. ISO 15926 should capture this structuring of well data independently of whether it is represented in an XML-based schema or something else. This is something that reflects the actual nature and not some specific technology used to represent it, and supports the idea that it should be easy to change technology. To accomplish this there is still a need to be able to identify what a WITSML base type represents. E.g. a witsml:abstractBoolean is a representation of a boolean value. Thus, ISO 15926 should be able to identify WITSML representations of abstract concepts like integer, boolean, string, etc.
Towards Dataintegration from WITSML to ISO 15926
An XSD simple type is a set of Unicode strings together with a semantic interpretation of the strings [5]. I.e., they are not the data types themselves but a way of representing the data types. The ISO 15926-2 class class of information representation classifies patterns used to represent information [6] and seems like a suitable superclass for WITSML base types. None of the current subclasses of class of information representation seems suitable as a superclass for WITSML base types, and a new class should be considered. As a subclass of class of information representation we find class of EXPRESS information representation (a class_of_information_representation that is defined by ISO 10303-11 [3]), which holds all the simple data types of the EXPRESS language. The EXPRESS language is a data specification language, which allows unambiguous data definitions and specification of constraints on the data it defines [7]. Our suggestion is to create a subclass similar to class of EXPRESS information representation (Figure 2 and Figure 3), as they both present a way of representing data types.
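The proposed placement of the new class can be sketched as a small superclass lookup. The class names follow the text; the structure mirrors Figures 2 and 3, and the flat dictionary representation is purely illustrative, not an ISO 15926 data model:

```python
# Illustrative superclass links: the proposed
# "class of WITSML information representation" sits next to the existing
# "class of EXPRESS information representation" under
# "class of information representation".
SUPERCLASS = {
    "class_of_EXPRESS_information_representation": "class_of_information_representation",
    "class_of_WITSML_information_representation": "class_of_information_representation",
    # A WITSML base type would then be classified under the new class:
    "witsml:abstractBoolean": "class_of_WITSML_information_representation",
}

def ancestors(cls):
    """Return the chain of superclasses up to the top of this small taxonomy."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain
```

With this arrangement, every WITSML base type is recognisable as a pattern used to represent information, exactly parallel to how the EXPRESS simple data types are handled.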
Fig. 2. EXPRESS information representation. The arrow indicates a classification relationship with the arrow head pointing towards the super class.
Fig. 3. WITSML information representation
3.2 Relating WITSML Base Types to Their Content

In the previous section we identified WITSML base types as representations of some kind of information. Furthermore, we need to identify what type of information they represent.
In ISO 15926, attributes are not represented in the same way as in familiar programming languages like Java, C++, etc., but by references to entity data types that describe relationships between different objects. Instead of saying that an object has an attribute, e.g. a car has an engine, this would be modelled by a relationship entity saying that an engine is a part of a car. This modelling concept maintains the fact that the engine exists independently of the car. The same is true for a simple data type, like an integer. The integer 5 exists independently of its XML representation, and the diameter of a drill pipe exists independently of its representation. The relation between a witsml:basetype and the actual information it represents should be modelled by a relationship entity describing the given relationship (illustrated in Figure 4). E.g. a witsml:integer represents an "arithmetic number that is an integer number", and a witsml:abstractBoolean represents a "mathematical value which indicates whether something is the case or not".
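The attribute-versus-relationship distinction can be illustrated with a minimal sketch. The relation names (`part_of`, `representation_of`) and the triple store are illustrative placeholders, not ISO 15926 entity types:

```python
# Instead of storing "engine" as an attribute of a car object, ISO 15926-style
# modelling records a relationship entity between two independent objects.
relationships = []

def relate(rel_type, subject, obj):
    """Record a relationship entity as a (type, subject, object) triple."""
    relationships.append((rel_type, subject, obj))

# The engine exists independently of the car; only the relation ties them together.
relate("part_of", "engine_1", "car_1")

# Likewise, a witsml:abstractBoolean *represents* a boolean value; the value
# exists independently of its XML representation.
relate("representation_of", "witsml:abstractBoolean",
       "mathematical value which indicates whether something is the case or not")

def related(rel_type, subject):
    """All objects the subject is related to via the given relationship type."""
    return [o for (r, s, o) in relationships if r == rel_type and s == subject]
```

Deleting `car_1` from such a model would remove the `part_of` triple but leave `engine_1` intact, which is exactly the independence the relationship-entity style preserves.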
Fig. 4. Information representation
All abstract base types, except abstractMeasure (Tables 1 and 2), can be considered derived from existing XSD types, with some added constraints (e.g. max length on strings), and represent the same data types as their XSD counterparts. In Table 3 we give an overview of the WITSML abstract base types and the corresponding ISO 15926 reference data they should represent. AbstractMeasure is not directly derived from any XSD data type but is composite reference data. As the documentation states, it is the superclass of all quantities that have a value with a unit of measure (e.g. 3m, 5.4m/s, 8kg), and thus contains both a
Table 3. Data type representation. The table gives an overview of which data types/information the different WITSML types represent. The data type/information refers to reference data in the RDS. The third column lists the RDL definition for the suggested reference data types.

| WITSML base type | Reference data represented | RDL definition |
|---|---|---|
| abstractBoolean | Boolean (note) | A mathematical value which indicates whether something is the case or not |
| abstractInt | Integer | An arithmetic number that is an integer number |
| abstractPositiveCount | Integer | |
| abstractDouble | Real | An arithmetic number that is a real number |
| abstractShort | Real | |
| abstractDateTime | Point in time | An event that is a commonality of point in time |
| abstractDate | Period in time | A possible_individual that is all space for part of time – a temporal part of the universe |
| abstractYear | Period in time | |
| abstractString | String | A text that is purely a sequence of characters |
| abstractMaximumLengthString | String | |
| abstractUncollapsString | String | |
| abstractNameString | String | |
| abstractUidString | String | |
| abstractCommentString | String | |
| abstractTypeEnum | String | |
| abstractUomEnum | String | |
| abstractMeasure | Property | A class_of_individual that is a member of a continuum of a class_of_property. The property may be quantified by mapping to a number on a scale. |
numeric value and a scale. In ISO 15926 this would be categorised as a singlePropertyDimension (a property space that is a single and complete continuum of properties each of which maps to a single number [6]). All WITSML specific quantity and scale expressions, such as those for acceleration, need to be mapped to their corresponding ISO 15926 reference data. Figure 5 gives an illustration of this mapping: abstractMeasure is the superclass of accelerationLinearMeasure, which contains the accelerationLinearUom "m/s2". How they represent the data types, and under what constraints (e.g. maximum length of a string or value space constraints on number types), should be specified in the documentation. The documentation should for instance hold a reference to the W3C XSD datatype specification documents.
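A minimal sketch of the composite nature of abstractMeasure follows. The class name `Measure` is illustrative; the point is only that a single property dimension pairs a magnitude with a scale, unlike the simple types above:

```python
from dataclasses import dataclass

@dataclass
class Measure:
    """An abstractMeasure-like quantity: a numeric value plus a unit-of-measure scale,
    mirroring a singlePropertyDimension that maps each property to a single number."""
    value: float
    uom: str   # the scale, e.g. "m/s2" for an accelerationLinearMeasure

# A linear acceleration: the magnitude alone is meaningless without its scale.
a = Measure(9.81, "m/s2")
```

Mapping such a value into ISO 15926 therefore requires reference data for both parts: the number itself and the unit-of-measure scale it is given in.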
Fig. 5. Mapping of WITSML abstractMeasure. In ISO 15926 attributes are represented by references to entity data types that describe relationships between different objects. The diamond indicates such a relationship. Linear Acceleration would be related to two attributes: a number which states the magnitude of the acceleration (not illustrated here), and the scale the acceleration is given in.
In this manner the WITSML structure is reflected in the ISO 15926 reference data, but the XML standard, which states how the data are represented, is left outside.
4 Conclusions and Further Work

The focus here has been on mapping WITSML specific data types onto ISO 15926. We concluded that it is not the responsibility of ISO 15926 to represent the structure of XSD schemas. This structure is standardised and governed by W3C, and it is W3C's responsibility to maintain and update the standard. A mapping onto ISO 15926
requires constant attention to updates and changes, and can easily lead to inconsistency between the standard and the ISO 15926 representation of it. We identified the need to be able to specify, in the ISO 15926 model, that a data type is represented by a WITSML type. In the light of this, a suggested addition to the RDS is proposed. Further work would be to compare the ISO 15926 structure and the WITSML structure, and based on these findings suggest further additions to the RDS to capture WITSML's information structuring, e.g. what a wellbore is and that it must belong to a well. Technically it would be logical to assume that this structure would be the same, as a well has a given construction. But individual views may result in different modellings of the constructions. Where divergences appear they must be identified and analysed to be able to construct a mapping between the different interpretations. By integrating different domain specific expressions in one common ontology standard, like ISO 15926, we make autonomous interactions between different applications and domains possible. This will improve and increase the efficiency of many work processes, and may also relieve the urgent need for skilled labour, as such labour can be relocated to less mundane tasks. Ubiquitous integrated computing will renovate workflow processes and realize new business models to meet the challenges of the fast growing digital convergence.
References

1. Berrefjord, Thomassen, A.S.: Ståsted, drivkrefter og mulige strategiske fremdriftsutsikter mht. utvikling og innføring av Integrerte Operasjoner på norsk sokkel. OLF, p. 47 (2006)
2. POSC Caesar [cited September 10, 2007], http://www.posccaesar.com/
3. Reference Data Service (RDS), POSC Caesar
4. Wellsite Information Transfer Standard Markup Language [cited September 2, 2007], http://www.witsml.org/
5. Møller, A., Schwartzbach, M.I.: An Introduction to XML and Web Technologies. Addison-Wesley, Harlow, England (2006)
6. International Organization for Standardization: Industrial automation systems and integration – Integration of life-cycle data for process plants including oil and gas production facilities, Part 2: Data model. ISO, Geneva (2003)
7. International Organization for Standardization: Industrial automation systems and integration: product data representation and exchange, Part 11, Description methods: The EXPRESS language reference manual. ISO, Geneva (2004)
A SIP-Based Session Mobility Management Framework for Ubiquitous Multimedia Services

Chung-Ming Huang and Chang-Zhou Tsai

Laboratory of Multimedia Mobile Networking, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
[email protected]
Abstract. This paper proposes a SIP-based session mobility management framework for achieving session handoff that is transparent to the underlying wireless or wired technology between network domains. Session mobility is concerned with a presentation session changing its attachment device from one client device to another client device while still continuing the presentation session. This concept requires migrating related information of the current service execution, including service content, service data, etc., from one device to another. Since different devices and their attached networks have different capabilities and performance, media streaming service quality should be adapted when the session handoff occurs. MPEG-21 provides efforts for multimedia adaptation to bring suitable multimedia services to users. MPEG-21 Digital Item Adaptation (DIA) provides normative XML descriptions for handling multimedia content adaptation, but does not specify relations to existing technologies for transport mechanisms. Thus, we adopt MPEG-21 DIA for multimedia session and usage environment description in our proposed SIP-based session mobility management framework.

Keywords: Ubiquitous Network, Session Mobility, SIP, MPEG-21 DIA.
1 Introduction
Ubiquitous networking and computing environments are being realized because of (1) wireless networks providing more bandwidth and supporting higher moving speeds, (2) devices getting more powerful while continuing to miniaturize, and (3) many new applications being developed. One of the main goals of ubiquitous networking and computing is to allow users to access services anytime and anywhere by adapting to users' current situations. For example, for receiving better quality of a multimedia service, a user may want to change to the office desktop that connects to Fast Ethernet from his smart phone that connects to
This research is supported by the National Science Council of the Republic of China, Taiwan, under the contract number NSC 96-2219-E-006-007, and Intel Microelectronics Asia Ltd., Taiwan Branch.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 636–646, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. An illustrated configuration of the user environment (networks shown: WLAN, ADSL, cellular network and Fast Ethernet, connected through an IP network)
cellular network when he is coming into his office; on the other hand, if the user wants to continue a multimedia presentation session when he is moving out of his office, e.g. from VVoIP to VoIP, he may want to switch the presentation session from his desktop that connects to Fast Ethernet to his PDA that connects to WLAN. However, different devices have their own capabilities and performance. For example, a PDA has low computing power and a small screen. Given the capabilities of these devices, end-users are able to access their multimedia content anywhere and anytime. The concept of Universal Multimedia Access [1] requires content providers to produce multimedia resources suitable for a wide variety of devices. This kind of research makes it possible to access multimedia contents with different devices. Since users can use their devices to access multimedia contents, they need a method to transparently switch from one device to another. The above concept belongs to the session mobility management issue in the ubiquitous networking and computing era. Session mobility is concerned with a presentation session changing its attached device from one client device to another client device while still continuing the presentation session. Session mobility can exist in various situations, depending on the user's behavior. The user may move among different network domains, which is illustrated in Figure 1, and switch the presentation session from one client device to another client device. In order to achieve session mobility, some problems must be resolved. (1) How to establish the session mobility management architecture to provide smooth session handoff and the most suitable multimedia service quality over heterogeneous networking environments? (2) How to divert multimedia services between distinct devices/networks using the established session mobility management architecture? (3) How to describe the information of the network/multimedia systematically and exchange these pieces of information between the server and the client? (4) How to make the device or service re-work effectively? In this paper, we propose a SIP-based session mobility management framework for achieving session handoff that is transparent to the underlying wireless or wired technology among different network domains. For the signaling protocol aspect, we adopt SIP [2] to resolve this problem because it is an application layer
protocol and thus is easy to deploy without the need of modifying the underlying networks' internal infrastructure. SIP, which can initialize, maintain and terminate sessions for users, is a widespread signaling protocol for multimedia services in the Internet. In other words, the SIP-based framework can easily work across heterogeneous computing platforms. For the multimedia adaptation aspect, we adopt MPEG-21 Digital Item Adaptation (DIA) [3] to adapt multimedia contents for heterogeneous networking environments. MPEG-21 DIA provides normative XML descriptions for handling multimedia content adaptation, but does not specify relations to existing technologies for transport mechanisms. Furthermore, SIP [4,5,6] can use the eXtensible Markup Language (XML) for session description. Among different devices, we can use SIP to intercommunicate and carry MPEG-21 DIA tools to specify different devices' capabilities and operating environments such that we can smoothly integrate the two technologies. The remaining part of the paper is organized as follows. Section 2 presents related work on session mobility. Section 3 introduces the proposed SIP-based session mobility management architecture. Section 4 gives the session handoff mechanism. Section 5 describes the implementation. Finally, Section 6 has concluding remarks.
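The idea of carrying an XML usage-environment description in a SIP message body can be sketched as follows. This is illustrative only, not a complete SIP stack: the URIs and the `<UsageEnvironment/>` element name are placeholders, and mandatory headers such as Via, Call-ID and CSeq are omitted for brevity:

```python
def build_invite(to_uri: str, from_uri: str, xml_body: str) -> str:
    """Build an abbreviated SIP INVITE whose body carries an XML description
    (Content-Type application/xml) in place of the usual SDP payload."""
    headers = [
        f"INVITE {to_uri} SIP/2.0",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>",
        "Content-Type: application/xml",          # XML instead of application/sdp
        f"Content-Length: {len(xml_body.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + xml_body

msg = build_invite("sip:server@example.com", "sip:uac2@example.com",
                   "<UsageEnvironment/>")
```

Because XML allows arbitrarily nested descriptions, such a body can express the device and network capabilities that SDP's flat ABNF-style format cannot easily capture.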
2 Related Work
The research of session mobility has been studied in various domains. In [7], the authors proposed a browser session preservation and migration (BSPM) infrastructure. This infrastructure utilizes the BSPM proxy to store the session. Although this work has a solution for BSPM, the browser needs to install the BSPM plug-in at each device. In [8], the authors proposed a middleware for supporting device and network adaptation in various network environments, where the session can be separated across several devices, such as from the cellular phone to an external camera, display device and microphone. The middleware has to be installed on every client device. In [9], the authors also proposed a middleware architecture to resolve the session handoff issue. This work tries to minimize application level support for session transfer mechanisms. The handoff manager resides in the network infrastructure, and thus allows session mobility to be deployed on heterogeneous devices with minimum constraints. Although this work has a good solution for session handoff, the system has a high deployment cost. In [10,11], the authors proposed a SIP-based signaling mechanism for session transfer. It utilizes the SIP REFER method and third party call control to achieve session handoff. In [11], the authors further proposed the use of an XML document to allow a user to define the configuration behaviors of devices. Although the SIP-based method can be easily deployed, SDP cannot easily specify the user's environment because SDP uses an ABNF-like format that has a restricted depth of the definition hierarchy. In [12], the authors used MPEG-21 Digital Items to realize session mobility between two different devices. However, this work does not use a well-known signaling protocol such that it may not be suitable for different operating systems.
3 System Architecture
In order to resolve the session handoff problem, we propose a SIP-based session mobility management architecture that uses the MPEG-21 DIA tools to provide adapted multimedia streaming for different devices with different capabilities, e.g., desktop PC, notebook, PDA, etc. Thus, a diverse set of terminal devices can receive adaptive streaming services from the multimedia server even though the devices' attached networks are different, e.g., ADSL, 3G, etc., and may differ in bandwidth, latencies, etc. Figure 2 depicts the two main entities that contain several components in this architecture.

3.1 Client and Server
Fig. 2. System components

Fig. 3. The workflow of the architecture

Figure 3 depicts the message flow between client and server. The main components of the architecture are as follows:

– Agents: An agent is constructed on the server side and the client side respectively. The agent is a coordinator for signal processing and media processing. The client agent can send media request messages with the requested media descriptions through the session control manager. A requested media description includes the requested media name, requested frame number, usage environment, etc. The server agent can retrieve the requested media description from the client side through the digital items handler, and then trigger the adaptation decision engine to generate the adaptive media content based on the media description.

– Digital Items (DIs) Handler: The DIs handler deals with digital items, which are represented using the MPEG-21 Digital Item Declaration Language (DIDL), between the agent and session manager at both client and server sides. In the second part of MPEG-21, the Digital Item Declaration (DID) [13] provides the required flexibility and makes it possible to declare a Digital Item composed of multiple multimedia resources. DID is a container structure allowing users to describe the relationship between different elements of the Digital Item. A Digital Item Declaration can be constructed in XML using DIDL. In order to achieve device independence, at the client side the DIs handler generates the MPEG-21 DIA Usage Environment Description (UED) into the SIP message body, which contains information about terminal capabilities, network capabilities, or any type of information about a Digital Item that is needed for multimedia adaptation. At the server side, the DIs handler is responsible for retrieving digital items from the SIP message body and parsing them for the adaptation decision engine's use.

– Adaptation Decision Engine (ADE): The ADE executes media content adaptation using MPEG-21 DIA elements that are conveyed from the session control manager. According to the UED, the media content may be adapted to different resolutions for different device screen sizes.

– Streaming Control Manager: At the server side, this component is responsible for sending RTP streams and receiving RTCP packets, and performs the actual data-rate adaptation. At the client side, this component is responsible for receiving RTP streams and sending RTCP packets. When a client device receives the media stream, it replies with the received frame number to the multimedia server periodically using RTCP packets.

– Session Control Manager: This component is responsible for signal exchange and session maintenance using the SIP protocol. Besides session establishment, modification and termination, this component also achieves other session handoff functionalities.
Fig. 4. The session handoff mechanism – push mode. Message sequence between UAC1, UAC2 and the multimedia server (UAS): (1) Media Packets; trigger session handoff; (2) UPDATE (Pause Media); (3) Refer (Push); (4) INVITE (Replace); (5) 200 OK; (6) 202 Accept; (7) Bye; (8) 200 OK; (9) Ack; (10) Media Packets
– Session Database and Media Database: The session database keeps track of session information that the client device received during the media session. Session information includes the user information, movie name and the received frame number. The received frame number is dynamically changed by the streaming control manager. The media database stores multimedia contents.
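The per-session record kept in the session database can be sketched as follows. The function names (`start_session`, `on_rtcp_report`) and the in-memory dictionary are illustrative placeholders; the point is that the received frame number is refreshed whenever an RTCP receiver report arrives:

```python
# Sketch of the session database: one entry per user holding the session
# information named in the text (movie name and received frame number).
sessions = {}

def start_session(user: str, movie: str) -> None:
    """Create a session record when a media session is established."""
    sessions[user] = {"movie": movie, "received_frame": 0}

def on_rtcp_report(user: str, frame_number: int) -> None:
    """Called periodically by the streaming control manager as the client
    reports its received frame number via RTCP."""
    sessions[user]["received_frame"] = frame_number

start_session("alice", "demo.mpg")
on_rtcp_report("alice", 1200)
```

Keeping this record on the server side is what later allows a new device to resume the session (push, pull or error recovery mode) from the last reported frame.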
4 Session Handoff Mechanism
In this section, we introduce three session handoff schemes that take advantage of the SIP protocol to carry the MPEG-21 Digital Items between server and client. According to the user behavior, the three session handoff conditions that can exist are push mode, pull mode and error recovery mode.

– Push mode: This mode allows the user to move the current media session from the current client device to the target client device. For example, when a user goes home, he may switch the video session from his PDA connected to the 3G network to the desktop connected to ADSL in the living room, for receiving a better quality of service and lowering the communication cost. Considering the push mode depicted in Figure 4, a media session is handed off from UAC1 to UAC2. When a user prepares to switch the client device, UAC1 must inform the media server of an impending handoff of the media session and suspend the current media session. At the same time, UAC1 collects the information about the current session, e.g., the current media frame number, and then transmits session status information to UAC2 for handoff preparation. When UAC2 receives the refer message [14,15] with
Fig. 5. The session handoff mechanism – pull mode. Message sequence between UAC1, UAC2 and the multimedia server (UAS): (1) Media Packets; trigger session handoff; (2) Refer (Pull); (3) UPDATE (Pause Media); (4) 202 Accept; (5) 200 OK; (6) Refer (Push); (7) INVITE (Replace); (8) 202 Accept; (9) Bye; (10) 200 OK; (11) Ack; (12) Media Packets
current session information, UAC2 then sends the invite message, which contains UAC2's UED and session information, to the multimedia server. After receiving the invite message, the multimedia server adapts the media content using UAC2's UED and transfers the media session to UAC2.

– Pull mode: This mode allows a user to utilize his device to get a remote media session. For example, when a user needs to move to another place and wants to continue the media session, he may use his PDA to get the media session from the digital TV directly and does not need to be close to the digital TV. Figure 5 depicts the pull mode session handoff control scheme. When a user wants to move the media session from a remote device (UAC1) to his hand-held device (UAC2), he can use his hand-held device to inform the remote device to switch the media session. Then, UAC1 executes the push mode session handoff procedure in order to switch the media session to UAC2.

– Error Recovery mode: This mode is used to recover the media session at a new device when the original device suddenly crashes. For example, when a user's current client device suddenly crashes, the user may want to continue the multimedia session without restarting it. If the user can find another client device to resume the multimedia session, he can retrieve the related session information from the multimedia server based on the user's information. Figure 6 depicts the control flow chart of the error recovery mode. When a user wants to resume the prior multimedia session at a new device with Internet connectivity, he must send a message to the multimedia server with the "record" parameter and user information to check the authority. If the session database does not record the session information, the multimedia server may establish a new media session to the client device. On the
Fig. 6. The session handoff mechanism – error recovery mode. Flow chart: on a SIP Invite message, the server parses the message body and checks whether the parameter is "Record"; if not, it initializes a new media session. Otherwise it checks whether the session database has the related information; if so, it retrieves the received frame number, sends the first GOP, and streams the frames after the received frame number to the client device.
contrary, when the session database has the user's session information, the user can resume the multimedia presentation session at the new client device. The multimedia server then sends the beginning of the GOP and streams the frames after the received frame number to the client device. The client device must receive the beginning of the GOP because other frames reference the beginning of the GOP. Thus, the client device can recover the media session.
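The decision logic of the error recovery mode can be sketched as follows. The GOP size, the function name `handle_invite` and the return tuple are illustrative assumptions, but the branching follows the flow chart of Figure 6: resume from the start of the GOP only when a "record" parameter is present and the session database holds the user's state:

```python
GOP_SIZE = 15  # illustrative GOP length in frames

def handle_invite(session_db: dict, user: str, record: bool = False):
    """Decide how to serve an INVITE in error recovery mode.

    Returns (action, gop_start_frame, next_frame_to_stream): on resume, the
    server first sends the beginning of the GOP (needed as a reference for the
    following frames), then streams the frames after the received frame number.
    """
    if record and user in session_db:
        last = session_db[user]["received_frame"]
        gop_start = (last // GOP_SIZE) * GOP_SIZE  # beginning of the current GOP
        return ("resume", gop_start, last + 1)
    return ("new_session", 0, 0)

db = {"alice": {"movie": "demo.mpg", "received_frame": 1203}}
```

With a received frame number of 1203 and 15-frame GOPs, the session would resume by resending frames from 1200 (the GOP boundary) and continuing the stream from frame 1204.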
5 Implementation
In order to achieve session mobility between two devices, we have established an experimental environment that has a UAC and a UAS. We implemented the UAS as a ubiquitous Video-on-Demand (VoD) service over a heterogeneous network environment, e.g., wired and wireless networks; the system environment is described as follows. The operating system on the server is Linux Fedora Core 4, running on top of an Intel Pentium 4 at 2.8GHz with 512MB of RAM. In the experiments, we construct a session between the server and two
Fig. 7. The session handoff from client1 to client2 (steps (1)–(5) between the server side, client 1 and client 2)
different clients, in which (i) one is a notebook with Linux Fedora Core 4 and an Intel Pentium 4 running at 1.8GHz with 256MB of RAM and (ii) the other is a desktop PC with Linux Fedora Core 4 and an Intel Pentium 4 running at 2.8GHz with 512MB of RAM. Regarding libraries, we use the GNU oSIP library and the eXoSIP library to realize the SIP protocol. For the RTP stream, we use the GNU oRTP library. In order to carry the MPEG-21 DIA tools, we remove the SDP from the eXoSIP library and use the GDAL library for XML parsing. Figure 7 depicts the session handoff procedure between client1 and client2. (1) Using the program, users can perform multimedia presentation and handoff directly in the interface. The interface of this program enables a user to input usage information and choose the movie. (2) The media resolution of client1 is 352*288. (3) When the user wants to move to another place, he uses client2 to get the multimedia presentation session from client1. (4) After receiving the session state information, client2 sends the usage environment and media session description information to the multimedia server. (5) Then, client2 receives the media session whose resolution is 176*144.
6 Conclusion
Session mobility is concerned with a presentation session changing its attachment device from one client device to another client device while still continuing the presentation session. Over the past years, SIP has become the protocol for
multimedia session management over the Internet and can be conveniently adopted over wireless networks. In this paper, we have proposed a SIP-based session mobility management framework for achieving session handoff that is transparent to the underlying wireless or wired technology among network domains, and used MPEG-21 DIA for multimedia session and usage environment description. Since we integrate the two technologies, the scalability of our framework is shown through the interoperability of heterogeneous networks and environments. The main merits of the proposed framework are that users can have session handoff over heterogeneous networks and obtain a suitable quality of service depending on the target device's capability, operating environment and network situation.
References

1. Perkis, A., Abdeljaoued, Y., Christopoulos, C., Ebrahimi, T., Chicharo, J.: Universal multimedia access from wired and wireless systems. Birkhauser Boston Transactions on Circuits, Systems and Signal Processing 20(3-4), 387–402 (2001)
2. Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., Schooler, E.: SIP: Session Initiation Protocol. RFC 3261 (June 2002)
3. ISO/IEC: ISO/IEC 21000-7:2004 Information technology – Multimedia framework (MPEG-21) – Part 7: Digital Item Adaptation (October 2004)
4. Kutscher, D., Ott, J., Bormann, C.: Session description and capability negotiation. Internet-Draft: draft-ietf-mmusic-sdpng-06.txt (August 2005)
5. Ott, J., Perkins, C.: SDPng transition. Internet-Draft: draft-ietf-mmusic-sdpng-trans-04.txt (November 2003)
6. Levin, O., Even, R., Hagendorf, P.: XML schema for media control. Internet-Draft: draft-levin-mmusic-xml-media-control-12 (May 2008)
7. Song, H., Chu, H.H., Kurakake, S.: Browser session preservation and migration. In: Poster Session of International World Wide Web Conference 2002, May 2002, p. 2 (2002)
8. Ohta, K., Yoshikawa, T., Nakagawa, T., Isoda, Y., Kurakake, S.: Adaptive terminal middleware for session mobility. In: Proceedings of the 23rd International Conference on Distributed Computing Systems Workshops (ICDCSW 2003), pp. 394–399 (May 2003)
9. Cui, Y., Nahrstedt, K., Xu, D.: Seamless user-level handoff in ubiquitous multimedia service delivery. Multimedia Tools and Applications Journal 22(2), 137–170 (2004)
10. Shacham, R., Schulzrinne, H., Thakolsri, S., Kellerer, W.: The virtual device: Expanding wireless communication services through service discovery and session mobility. In: Proceedings of the IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, vol. 4, pp. 73–81 (August 2005)
11. Shacham, R., Schulzrinne, H., Thakolsri, S., Kellerer, W.: Ubiquitous device personalization and use: The next generation of IP multimedia communications. ACM Transactions on Multimedia Computing, Communications and Applications 3(2), 12 (2007)
12. De Keukelaere, F., De Sutter, R., Van de Walle, R.: MPEG-21 session mobility on mobile devices. In: Proceedings of the 2005 International Conference on Internet Computing, pp. 287–293 (June 2005)
646
C.-M. Huang and C.-Z. Tsai
AwarePen - Classification Probability and Fuzziness in a Context Aware Application

Martin Berchtold¹, Till Riedel¹, Michael Beigl², and Christian Decker¹

¹ TecO, University of Karlsruhe, Vincenz-Priessnitz-Str. 3, 76131 Karlsruhe, Germany
² Distributed and Ubiquitous Systems, Muehlenpfordtstr. 23, 38106 Braunschweig, Germany
Abstract. Fuzzy inference has proven to be a candidate technology for context recognition systems. In comparison to probability theory, its advantage is its more natural mapping of real-world phenomena onto context. This paper reports on our experience with building and using a monolithic fuzzy-based system (a TSK-FIS) to recognize real-world events and to classify these events into several categories. It also reports on some drawbacks of this approach that we have found. To overcome these drawbacks a novel concept is proposed in this paper. The concept combines fuzzy-based approaches with probabilistic methods, and separates the monolithic fuzzy-based system into several modules. The core advantage of the concept lies in the separation of detection complexity into distinct modules, each of them using fuzzy-based inference for context classification. Separation of detection functionality is supported by an automatic process using transition probabilities between context classifiers to optimize the detection quality of the resulting detection system. This way our approach incorporates the advantages of fuzzy-based and probabilistic systems. This paper shows experimental results of an existing system using a monolithic FIS approach, and reports on the advantages of a modular approach.
1 Introduction
Fuzzy systems have been known for years and have been successfully applied in many domains like control systems, medical systems and white ware electronics [1]. The characteristic of fuzzy control systems is their ability to classify noisy and vague input information in an application-appropriate way. For example, in a process control system the recognized status of an engine might be neither halted nor running after the machine receives the stop signal. Fuzzy systems handle this by fuzzifying membership functions, so that basically the engine state belongs to both classes to some degree. This behaviour is similar to detecting situations of human activity: between successive activities, there is a state which lies between two activity classes, so it is impossible to fix which class this activity belongs to. As shown in our previous work [2], the fuzzy logic approach can be successfully applied to activity recognition. This can be carried out by assigning fuzzy context classes to detected sensor events [2].
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 647–661, 2008. © Springer-Verlag Berlin Heidelberg 2008
648
M. Berchtold et al.
Another class of techniques successfully applied to the area of activity and context recognition are probabilistic methods. These systems often use a filter-based approach to improve the recognition rate of context recognition systems. This is achieved by assigning an a priori probability to the following classification and adjusting the recognized information accordingly. This paper will show how we improve context and activity recognition systems by applying both methods in combination. Section 2 briefly introduces characteristics of fuzzy set theory and discusses some differences between fuzzy sets and probability theory. In section 3 we illustrate our initial fuzzy inference (FI) based context recognition method and also discuss recognition results for this approach. In section 4 we argue that FI based methods can be further improved by separating the FI functionality into modules. Section 5 demonstrates how to fuse our fuzzy-based approach with probabilistic methods to further increase the recognition rate. Section 6 discusses which degrees of freedom become accessible through our approach. The paper is summarized in section 7.
2 Probability vs. Fuzziness

2.1 Fuzzy Set Theory
A general fuzzy set (x, μÃ(x)) is a tuple of a value x and a membership μÃ(x). The membership function μÃ : U → [0, 1] expresses the degree to which an element - e.g. a sensor value - belongs to the fuzzy set Ã - e.g. a context class. In a crisp set A the membership μA : U → {0, 1} would equal 1 for all members. Typical fuzzy membership functions are Gaussian, triangular, trapezoid, etc. functions μ : U → [0, 1] with a maximum of one and a minimum of zero. In general, fuzzy sets are an extension of crisp set theory, and fuzzy logic is therefore an extension of Boolean logic. The fuzzy equivalents of Boolean operators are functions f : [0, 1]² → [0, 1].

2.2 Differences of Fuzzy Sets to Probability Theory
Since the invention of fuzzy set theory by Lotfi A. Zadeh [3] a controversy has evolved over whether to use fuzzy logic or probability theory. A probability-based view of fuzzy sets and of the differences to probability was published by Laviolette et al. [4] and is briefly summarized in the following: Basic Differences: In value a membership is the same as a probability, both being elements of the interval [0, 1], but the semantics differ and the two can therefore not be compared. A membership does not follow the laws of probability. The Included Middle: A non-zero membership can be held to several fuzzy sets simultaneously. Probability theory defines states in a distinct way, so only the probability of one state can be expressed at one point in time, which was criticized by Zadeh [5].
Toleration of Vagueness: The key idea in fuzzy set theory is the definition of vagueness [6]. In probability theory there is no such concept. Promotion of Emulation: Fuzzy logic is designed to emulate human behaviour and thinking about real-world concepts, whereas probability theory does not fully harness this natural potential [7]. Others [8] claim that the emulation of human reasoning contradicts many empirical studies. The Axiomatic 'Inadequacy' of Probability: Kandel and Byatt [6] claim that the laws of probability are not realistic since human beings do not conform to them. Kosko [9] claims that conditional probability theory lacks an adequate axiomatic foundation. This is disproved by Fishburne [10]. Fuzzy Set Operations: The fuzzy operators are generalizations of the operators from crisp set theory [3]. We claim that there are sufficient reasons to use both - as suggested by Zadeh himself [11]. Contextual coherence can in some cases be more consistently expressed in fuzzy sets, while in other cases it is more clearly defined through probability states. We will study the limitations and the mutual complementarity based on a ubiquitous appliance that we built, the AwarePen.

2.3 Applications Using Fuzzy Logic and/or Statistical Methods
There is a wide use of fuzzy logic techniques to model contextual coherences on a semantic level. Guarino and Saffiotti [12] used fuzzy logic to let the everyday user monitor the state of ubiquitous systems. They claim that fuzzy logic is especially suited to overcome heterogeneity in ubiquitous sensing systems. The modelling is done by hand on a semantic level, which is not applicable in our case. Another human computer interface approach using fuzzy context can be found in [13]. One of the arguments stated there for using fuzzy logic is that it provides a continuous control signal, whereas Boolean logic provides only discontinuous, threshold-based control signals. This system is also modelled manually. An application using adaptive network-based fuzzy inference systems (ANFIS) is introduced in [2]. Here the fuzzy system is used to represent the error another system may cause when recognizing contexts. Our goal is to maximize the context recognition rate and minimize the calculation effort at the same time. A system that uses a probabilistic approach, based on the concept of anxiety, to represent regular activities in a smart house is introduced in [14]. The anxiety is formulated using probabilistic models. Another work [15] performs localization based on Bayesian networks, inferring the location of a wireless client from signal quality levels. It additionally discusses how probabilistic modelling can be applied to a diverse range of applications using sensor data. There is also a hybrid approach using both fuzzy logic and probabilistic models in a context aware system: the proposed system [16] uses Bayesian networks for deriving consequences and fuzzy logic to draw attention to subjective decisions. In this paper we focus more on context recognition and classification.
3 The AwarePen Architecture
The general architecture of the AwarePen consists of a hardware and a software part, whereas the software model is obtained offline through intelligent system identification. The overall architecture is shown in fig. 1.
[Figure 1 diagram: data annotation and offline system identification (subtractive clustering, least squares regression, ANFIS with hybrid learning) feeding the online AwarePen TSK-FIS, which maps mean and variance of the adxl x/y/z acceleration axes onto the classes 'lying' = 0, 'cell ringing' = 1, 'playing' = 2, 'pointing' = 3, 'writing horiz.' = 4, 'writing vertic.' = 5, 'sitting' = 6, 'standing' = 7, 'walking' = 8]
Fig. 1. General software architecture of AwarePen artefact
Our software design is based on long experience with smart artefacts in general and the AwarePen [2] especially. A central aspect of our design is a fuzzy inference system (FIS): the system is capable of automatically obtaining the necessary parameters. The mapping can be interpreted fuzzily, and thus stability towards unknown input is much better in our system design than with neural networks. The fuzziness of context classes is also a central aspect in our design principles. Looking at the fuzzy side of the system design, there is initially no probabilistic aspect in our architecture of the AwarePen, but a step-by-step analytical and experimental approach will deliver distinct facts for applying probabilistic methods in system identification and design.

3.1 The Hardware
The software system is implemented on the AwarePen hardware. This hardware consists of a small electronic PCB (fig. 2 middle and right) containing a digital signal processing microcontroller unit (DSP-MCU, dsPIC by Microchip) and a collection of sensors. The DSP-MCU is dedicated to processing sensor data, allowing fast, real-time sampling and FIS computation. For communication and main system operation we used the Particle pPart [17] platform, which is plugged onto the sensor PCB. In this paper we focus on 3D acceleration sensors only. Boards and battery are housed inside a marker pen (fig. 2 left).

3.2 Online Fuzzy Inference System (FIS)
One way to map queued sensor data (mean and variance) onto context classes is through a fuzzy inference system. The results of this mapping can be again
Fig. 2. Pictures of AwarePen hardware with pen (left), sensor board with dsPIC on top side (middle) and sensor board with sensors on top side (right)
interpreted as fuzzy classes. The fuzziness is the actual error caused by the mapping FIS. Here the fuzzy mapping is semantically correct for ALL context classes that have overlapping patterns and are therefore not clearly distinguishable. With the AwarePen, the classes 'pointing' and 'playing' are good examples of fuzziness: while 'playing' around, movements can occur which are typical for 'pointing'. These circumstances are semantically correctly expressible with fuzziness, and in particular with the 'included middle' concept of fuzzy sets.

Takagi-Sugeno-Kang-FIS. Takagi, Sugeno and Kang [18,19] (TSK) fuzzy inference systems are fuzzy rule-based structures which are especially suited for automated construction. The TSK-FIS also maps unknown data to zero, making it especially suitable for partially incomplete training sets. In a TSK-FIS the consequence of the implication is not a functional membership to a fuzzy set but a constant or linear function. The consequence of rule j depends on the input of the FIS:

f_j(\vec{v}_t) := \sum_{i=1}^{n} a_{ij} v_i + a_{(n+1)j}

The linguistic equivalent of a rule is formulated accordingly:

IF F_{1j}(v_1) AND F_{2j}(v_2) AND ... AND F_{nj}(v_n) THEN f_j(\vec{v}_t)

The membership functions of the rule are non-linear Gaussian functions. The antecedent part of rule j determines the weight w_j accordingly:

w_j(\vec{v}_t) := \prod_{i=1}^{n} F_{ij}(v_i)

The projection from input \vec{v}_t := (v_1, v_2, ..., v_n) onto the classifiable one-dimensional set is a weighted sum average, which is a combination of fuzzy reasoning and defuzzification. The weighted sum average is calculated over the rules j = 1, ..., m as follows:

S(\vec{v}_t) := \frac{\sum_{j=1}^{m} w_j(\vec{v}_t) f_j(\vec{v}_t)}{\sum_{j=1}^{m} w_j(\vec{v}_t)}
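As a concrete illustration, the mapping S and the subsequent fuzzy class decision can be sketched in a few lines (a minimal NumPy sketch; the function names and the parameter layout are our own assumptions, not the AwarePen implementation):

```python
import numpy as np

def tsk_eval(v, centers, sigmas, A):
    """Evaluate a TSK-FIS for one input vector v (shape (n,)).
    centers, sigmas: (m, n) parameters of the Gaussian memberships F_ij;
    A: (m, n+1) consequent weights a_ij, the last column being a_(n+1)j."""
    F = np.exp(-((v - centers) ** 2) / (2 * sigmas ** 2))  # F_ij(v_i)
    w = F.prod(axis=1)                                     # rule weights w_j
    f = A[:, :-1] @ v + A[:, -1]                           # consequents f_j(v)
    return (w * f).sum() / w.sum()                         # weighted sum average S(v)

def fuzzy_classify(s, num_classes):
    """Interpret each class identifier as a triangular fuzzy number of
    width one centered on the identifier; return (c, membership)."""
    c = int(min(max(round(s), 0), num_classes - 1))
    return c, max(0.0, 1.0 - abs(s - c))
```

With a single rule the weights cancel, so S equals that rule's consequent; with several rules S interpolates between the consequents according to the Gaussian memberships.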
Fuzzy Classification. The outcome of the TSK-FIS mapping needs to be assigned to one of the classes the projection should result in. This assignment is done fuzzily, so the result is not only a class identifier, but also a membership indicating the clearness of the classification process. Each class identifier is interpreted as a triangular-shaped fuzzy number. The mean of the fuzzy number is the identifier itself, with the highest membership of one. The crisp decision which identifier is the mapping outcome is made based on the highest membership to one of the class identifiers. The overall output of the FIS mapping and classification is a tuple (c, μ_c) of class identifier and the membership to it.

3.3 Offline System Identification
The system identification of the FIS is not performed on the embedded device, but on a PC. The computed, highly efficient FIS is then downloaded onto the AwarePen embedded device for in-situ FIS-based classification. It is important that the resulting FIS represents the mapping function as precisely as possible, but higher precision results in more rules. This conflicts with efficiency: the more rules a FIS has, the better the mapping, but the less efficient the evaluation. Subtractive Clustering. An unsupervised clustering algorithm is needed to perform system identification. Each cluster results in a fuzzy rule representing the data in the cluster and its influence on the mapping result. Since there is no knowledge about how many clusters are required, an algorithm is needed that computes the number of clusters automatically. For example, mountain clustering [20] could be suitable, but is highly dependent on the grid structure. We instead opt for subtractive clustering [21]. This algorithm considers every data point a possible cluster center, so no prior specifications are required. Chiu [22] gives a description of the parameters subtractive clustering needs for a good cluster determination. Throughout this paper we use different parameters for the subtractive clustering to achieve different numbers of clusters. The subtractive clustering is used to determine the number of rules m, the antecedent weights w_j and the shape of the initial membership functions F_ij. Based on the initial membership functions, a linear regression can provide the consequent functions. Least Squares Linear Regression. The weights a_ij of the consequent functions f_j are calculated through linear regression. The least squares method fits the functions f_j to the data set.
A linear equation for the differentiated error between designated and actual output - which can be calculated with the rules and initial membership functions the subtractive clustering identified - is solved for the whole data set with a numeric method. Singular value decomposition (SVD) is used to solve the over-determined linear equation. Using the initial membership functions F_ij, the rules j and the linear consequences f_j, a neural fuzzy network can be constructed. The neural fuzzy network is used to tune the parameters a_ij, m_ij and σ_ij² in an iterative training towards a minimum error.
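The two identification steps just described can be sketched as follows - a hedged NumPy sketch of Chiu-style subtractive clustering and of the SVD-based least-squares fit of the consequent weights (function names, the radii ra/rb and the stopping threshold eps are our illustrative choices, not the parameters used in the paper):

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, eps=0.15):
    """Estimate cluster centers: every data point is a candidate center,
    so the number of clusters (and thus rules) falls out automatically."""
    alpha, beta = 4 / ra ** 2, 4 / rb ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    P = np.exp(-alpha * d2).sum(axis=1)                  # potential of each point
    centers, p_first = [], P.max()
    for _ in range(len(X)):                              # at most one center per point
        c = P.argmax()
        if P[c] <= eps * p_first:
            break
        centers.append(X[c])
        P = P - P[c] * np.exp(-beta * d2[:, c])          # subtract revised potential
    return np.array(centers)

def fit_consequents(V, y, W):
    """Fit the consequent weights a_ij by least squares: for fixed rule
    weights W (shape (T, m)) the FIS output is linear in the stacked a_ij,
    so the over-determined system is solved via SVD (np.linalg.lstsq).
    V: (T, n) inputs, y: (T,) designated outputs. Returns (m, n+1) weights."""
    T, n = V.shape
    Wn = W / W.sum(axis=1, keepdims=True)        # normalized rule weights
    Vb = np.hstack([V, np.ones((T, 1))])         # inputs plus bias term
    Phi = (Wn[:, :, None] * Vb[:, None, :]).reshape(T, -1)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a.reshape(-1, n + 1)
```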
Adaptive-Network-based FIS. A functionally identical representation of a FIS as a neural network is an Adaptive-Network-based FIS (ANFIS) [23]. Most of the network's neurons are operators; only the membership functions F_ij and the linear consequences f_j are adaptable neurons. This neural fuzzy network is used to tune the adaptable parameters a_ij of the linear consequences, and m_ij and σ_ij² of the Gaussian membership functions. The tuning process is done iteratively through a hybrid learning algorithm. Hybrid Learning. The learning algorithm is hybrid since it consists of a forward and a backward pass. In the backward pass we carry out a back-propagation of the error between designated and real output of the ANFIS to the layer of the Gaussian membership functions. The back-propagation uses a gradient descent method that searches for a preferably global minimum of the error in an error hyperplane. The forward pass performs another iteration of the least squares method with the newly adapted membership functions from the backward pass. The hybrid learning stops when a degradation of the error on a separate check data set is continuously observed. The resulting ANFIS represents the qualitative non-normalized TSK-FIS S.
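A single iteration of such a hybrid scheme can be sketched as follows (our own simplification: the backward pass uses a finite-difference gradient on the membership centers in place of true back-propagation, and the widths σ_ij are kept fixed):

```python
import numpy as np

def hybrid_step(V, y, centers, sigmas, lr=0.01, h=1e-5):
    """One hybrid iteration: the forward pass refits the linear consequents
    by least squares for the current Gaussian memberships; the backward
    pass moves the membership centers down the error gradient."""
    T = len(V)
    Vb = np.hstack([V, np.ones((T, 1))])

    def sse(c):
        W = np.exp(-((V[:, None, :] - c) ** 2) / (2 * sigmas ** 2)).prod(-1)
        Wn = W / W.sum(axis=1, keepdims=True)
        Phi = (Wn[:, :, None] * Vb[:, None, :]).reshape(T, -1)
        a, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # forward pass
        return ((Phi @ a - y) ** 2).sum()

    err = sse(centers)
    grad = np.zeros_like(centers)
    for idx in np.ndindex(*centers.shape):            # numeric backward pass
        cp = centers.copy()
        cp[idx] += h
        grad[idx] = (sse(cp) - err) / h
    return centers - lr * grad, err
```

Iterating `hybrid_step` until the check-data error starts to degrade mirrors the stopping criterion described above.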
4 Divide and Conquer: Fragmentation of Complex FIS Rules
We have shown in [2] that a FIS is capable of detecting even complex context information from sensor readings. Nevertheless, handling and detection quality can be improved by dividing the processing of a FIS into several subparts. Throughout this section such a FIS is divided step by step into subparts to maximize classification results while keeping the calculation effort to a minimum. The division is not done literally; instead, the set of classes each FIS maps onto is divided. At first, a single FIS with a varying number of rules represents all classes (nine in the example of figure 1). In the next step, two FISs do the same mapping, also with a variable set of rules each. In the last subsection the mapping is done via four queued FISs, whose order is determined statically according to the best mapping result.

4.1 Analysis of Complex FIS Mapping Error
The usage of one FIS for mapping onto nine context classes with a reasonable number of rules is nearly impossible. Separating the patterns from acceleration sensors is difficult when the patterns are too similar. For example, separating the patterns of 'writing horizontally' and 'pointing' is hardly possible with only mean and variance data, because they are nearly the same for both. In fuzzy set theory it was proved [24] that every function can be represented to infinite precision by a FIS with an infinite number of rules, but such theory is of little value for a resource-limited embedded system. The chosen clustering shows best results with a low number of fuzzy rules, but still is not able to separate the
Fig. 3. Percentage of correctly classified data pairs for one FIS classifying all classes (grey ×) and two FISs mapping onto the classes (black +)
patterns to a satisfying degree. The percentage of correctly classified data pairs for a varying number of rules is shown in figure 3 as the grey curve with × markers.

4.2 Dual-FIS Context Recognition
A better result than in the previous section can be achieved when the big FIS representing all classes is divided into several FISs. We start here by explaining how to separate into two FISs, and later explain how to apply this method to an n-FIS system. In our first example, each FIS maps onto four (FIS A) or five (FIS B) basic contextual classes (such as 'writing', 'playing', etc.; see fig. 4). To allow the transition from one FIS to another, we add an additional classifier that represents all complementary classes. To train each FIS correctly, an equal amount of data pairs for the basic classes and for the complementary class is required. In our two-FIS example, the training data for the complementary class consists of data for all classes the second FIS is mapping onto. Correct selection of the training data is important, as performance depends on correct detection of the basic classes, but also on correct detection of the complementary class. The recognition percentage for an average rule evaluation based on a test data set is plotted in fig. 3 (black curve with + markers) against the equal number of rules for the FIS recognizing all context classes. For this plot the order giving the best classification result is chosen in our example (first FIS A, then FIS B). For evaluation purposes different numbers of rules for each FIS are combined and analyzed with respect to the average recognition rate for a test data set. The recognition rate in percent is plotted as a surface in figure 6 on the left and as a contour plot on the right. A brief analysis of these plots can be found in the discussion section. To implement each of the two FISs above, it is important to know the number of rules that have to be processed by each FIS for correct classification. Due to the high variance of possible input data, it is difficult to give a general estimation
[Figure 4 diagram: FIS A maps the classes 'lying' = 0, 'cell ringing' = 1, 'writing horiz.' = 4, 'writing vertic.' = 5 plus the complementary class -1; FIS B maps 'playing' = 2, 'pointing' = 3, 'sit' = 6, 'stay' = 7, 'walk' = 8 plus the complementary class -1; on a complementary-class result the input is handed to the other FIS]
Fig. 4. Schematic of contextual classifier working with two FIS - (left) combination with 50% and (right) with 48% correct classifications
of the mean number of rules. A mean estimation of the number of rules to be processed for an input can, however, be given:

N_{AB} = P(A)(n_A + P(B|A) n_B) + P(B)(n_B + P(A|B) n_A)
       = n (1 + P(A) P(B|A) + P(B) P(A|B)),   for n_A = n_B = n and P(A) + P(B) = 1

The probabilities are distinguished only by the number of context classes represented by each FIS and not by the average recognition rate for the classes or the complementary one. Hereby P(X) with X ∈ {A, B} is the probability that FIS X is evaluated, according to the number of classes it classifies. The probability that the second FIS Y ∈ {A, B} needs to be executed - if the first FIS X is not capable of classifying the data - is expressed through P(Y|X). The number of rules each FIS consists of is n_X with X ∈ {A, B}, where in our case n_A = n_B. If each FIS handles the same number of rules and all these probabilities equal ½, the mean rule evaluation is 1½n.

4.3 Queued Fuzzy Inference Systems
We can continue the above approach by further subdividing the classification into four FISs. For example, one FIS (FIS_lying) represents 'lying' and 'cell ringing', the next one (FIS_hand) classifies 'playing' and 'pointing', another one (FIS_write) maps 'writing horizontally' and 'writing vertically', and the last (FIS_pants) manages the classes 'sitting', 'staying' and 'walking'. The FISs are queued one after another. The average number of rule evaluations, 16.15, is somewhat high, but the recognition rate has improved to 68.44%.
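The queueing just described follows a chain-of-responsibility pattern; together with the rule-count estimate N_AB from the previous subsection it can be sketched as follows (the classifier interface - a callable returning a class and the number of rules it evaluated - is our assumption):

```python
COMP_CLASS = -1  # identifier of the complementary class

def classify_queued(fis_queue, v):
    """Ask each FIS in turn; a FIS answers either with one of its own
    classes or with the complementary class, in which case the next FIS
    in the queue is consulted. Rule evaluations are counted along the way."""
    rules_evaluated = 0
    for fis in fis_queue:
        cls, n_rules = fis(v)
        rules_evaluated += n_rules
        if cls != COMP_CLASS:
            break
    return cls, rules_evaluated

def expected_rule_evals(p_a, p_b_given_a, p_a_given_b, n_a, n_b):
    """Mean number of rule evaluations N_AB for a two-FIS queue."""
    p_b = 1 - p_a
    return p_a * (n_a + p_b_given_a * n_b) + p_b * (n_b + p_a_given_b * n_a)
```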
5 Stateful Interpretation and Probabilistic Optimizations
Up to this point the best order of the FISs in the queue was based on the overall classification accuracy of the different sub-classifiers on a test-data run, and was therefore static at run-time. Even better results can be reached using a dynamic ordering of the FISs based on the current state of the stochastic process. Basic knowledge about the statistical behaviour of the classifiers and of the underlying process that is classified allows us to optimize the order of the FIS queue dynamically,
Fig. 5. 4 FISs dynamically queued - black (+) data pairs are correctly classified and red (×) are misclassified - 85.35% correctly classified with an average of 10 rule evaluations
and avoid unnecessary execution of FIS modules. Although this adds some complexity, we can show that we not only save resources, but also improve recognition rates. One discovery we made looking at typical AwarePen states in a typical usage process is that the transition of a classified state to itself in the following classification is more likely than a transition to any other state. Furthermore, we also see that some classes, like 'playing' after 'writing', have higher transition probabilities than, for example, 'walking' after 'writing'. In order to make use of this statistical feature in our experiments, we intuitively grouped similar classes into a kind of 'macro state'. The transition probability from such a state towards a state with classes from another FIS is much lower than towards itself. We used this property to pre-compute an order of the classification queue such that we always execute first the classifier that matched in the last step. This way we not only reduce the expected number of rule executions, but also improve the recognition rates, because it becomes less likely that earlier classifiers in the queue make mistakes. The results of such a scheme are shown in figure 5 and as a confusion matrix in table 1. The recognition rate is again improved: the dynamic FIS queue classifies 85% of the check data correctly.

5.1 Modelling and Optimization of Classification Probabilities
From this last dynamic experiment we can see that probabilistic modelling can help us maximize the statistical classification correctness. The underlying trained fuzzy system provides a quality of classification that is valuable when interpreting a single classification in an application context. When optimizing the system quantitatively, the previous experiments suggest that we should leave
Table 1. 4 FISs dynamically queued - confusion matrix (rows: actual class; columns ˆ0-ˆ8: classified class; 'Error': rejected as complementary; all values in %)

Class  Error    ˆ0       ˆ1       ˆ2       ˆ3       ˆ4       ˆ5       ˆ6        ˆ7        ˆ8
0      0       94.0594   4.9505   0.9901   0        0        0        0         0         0
1      0       30.8511  69.1489   0        0        0        0        0         0         0
2      5.7692   3.8462   1.9231  63.4615   3.8462  13.4615   7.6923   0         0         0
3      0        1.0417   0       12.5000  80.2083   6.2500   0        0         0         0
4      2.1053   1.0526   1.0526   6.3158   0       87.3684   2.1053   0         0         0
5      0.9524   0.9524   0.9524   0.9524   2.8571  15.2381  78.0952   0         0         0
6      0        0        0        0        0        0        0      100.0000    0         0
7      0        0        0        0        0        0        0        0       100.0000    0
8      0        0        0        0        0        0        0        0         4.3478   95.6522
the fuzzy domain in favour of probabilistic modelling, as we no longer look at a single classification but at multiple samples. In this section we therefore provide a first probabilistic formalization based on our experimental findings to explain the relationship between recognition rates, class grouping and different classifier sequences. In order to model the overall statistical performance of the proposed queued interpretation system, we model the probability of a correct classification P(X̂_t = X_t), i.e. that the real state X equals the classified state X̂. Both random variables are defined over the same set C. In the dynamic scheme we optimized the classification queues on the previous state, which needs to be analyzed separately:

P(\hat{X}_t = X_t) = \sum_{\hat{x}_{t-1}} P(\hat{x}_{t-1}) \sum_{x_t} P(x_t \mid \hat{x}_{t-1}) \, P(\hat{X}_t = x_t \mid x_t, \hat{x}_{t-1})
To reflect the different precomputed classifier sequences we introduce a total order relation <_{\hat{x}_{t-1}} on the classifiers K_i, giving the execution order of the queue precomputed for the previous match \hat{x}_{t-1}; K_{i,t} = ⊥ denotes that classifier K_i maps sample t to its complementary class, and m(x) indexes the classifier responsible for class x.
Because the internal classifiers are stateless, i.e. independent of \hat{x}_{t-1}, we finally obtain a model for the overall recognition performance based on the recognition rates, the class grouping and the classifier sequences:

P(\hat{X}_t = X_t) = \sum_{\hat{x}_{t-1}} \Big( P(\hat{x}_{t-1}) \sum_{x_t} \Big( P(x_t \mid \hat{x}_{t-1}) \, P(K_{m(x_t)} = x_t \mid x_t) \prod_{i \mid K_i <_{\hat{x}_{t-1}} K_{m(x_t)}} P(K_{i,t} = ⊥ \mid x_t) \Big) \Big)
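A small numeric sketch of this model (the encoding of the quantities is our own: `order` plays the role of the total order, `group` that of the mapping m):

```python
import numpy as np

def overall_recognition(prior, trans, tp, miss, group, order):
    """Evaluate the recognition model above (sketch).
    prior[xp]    ~ P(x̂_{t-1} = xp)
    trans[xp][x] ~ P(x_t = x | x̂_{t-1} = xp)
    tp[x]        ~ P(K_m(x) = x | x), true-positive rate of x's classifier
    miss[i][x]   ~ P(K_{i,t} = ⊥ | x), classifier i rejects class x
    group[x]     = m(x); order[xp] = classifier queue used after match xp"""
    total = 0.0
    for xp, p_xp in enumerate(prior):
        for x, p_x in enumerate(trans[xp]):
            queue = order[xp]
            ahead = queue[: queue.index(group[x])]   # classifiers tried first
            p_reach = float(np.prod([miss[i][x] for i in ahead]))
            total += p_xp * p_x * tp[x] * p_reach
    return total
```

On a toy two-class, two-classifier process this reproduces the effect observed experimentally: a queue reordered by the previous match scores at least as well as a static one.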
5.2 Definition of a Cost Function
We can use this model to derive optimization strategies for designing classifiers. However, because in theory every classifier function can be represented by an infinite number of rules in our TSK-FIS [24], we have to consider the execution complexity of the TSK-FIS when optimizing classification. To model this trade-off, we introduce a composite cost function that incorporates the amortized recognition rates and the resource usage. The proposed optimization function

minimize cost = E[class\_cost] + E[exec\_cost]

is composed of the expectation of false classification costs and the expected rule execution costs. We use P(\hat{X}_t = x_t \mid x_t, \hat{x}_{t-1}), which is associated with a cost class_cost_1 = 0, and its inverse 1 - P(\hat{X}_t = x_t \mid x_t, \hat{x}_{t-1}), which is associated with an application specific cost class_cost_0, to define the expectation:

E[class\_cost] = (1 - P(\hat{X}_t = x_t \mid x_t, \hat{x}_{t-1})) \, class\_cost_0

Because of the strictly compositional setup of the constructed FIS, the execution time is proportional to the number of rules. In order to obtain the expected execution costs, we generalize the equation of section 4.2 in a similar way as the equation for the recognition rates, based on the number of rules |K_i| of each FIS and a cost factor rule_cost:

E[exec\_cost] = rule\_cost \sum_{\hat{x}_{t-1}} \sum_{x_t} P(\hat{x}_{t-1}) \, P(x_t \mid \hat{x}_{t-1}) \sum_{K_i} |K_i| \prod_{K_j <_{r(\hat{x}_{t-1})} K_i} P(K_{j,t} = ⊥ \mid x_t)
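Under the same encoding as before, the composite cost can be estimated as follows (a sketch; we fold the state distribution into both expectations, and all parameter names are our own):

```python
def expected_cost(prior, trans, correct, miss, sizes, order,
                  class_cost0=1.0, rule_cost=0.01):
    """Sketch of E[class_cost] + E[exec_cost].
    correct[xp][x] ~ P(X̂_t = x_t | x_t = x, x̂_{t-1} = xp); sizes[i] = |K_i|;
    prior, trans, miss and order are as in the recognition model above."""
    cost = 0.0
    for xp, p_xp in enumerate(prior):
        for x, p_x in enumerate(trans[xp]):
            cost += p_xp * p_x * (1.0 - correct[xp][x]) * class_cost0
            p_reach = 1.0                    # probability classifier i is executed
            for i in order[xp]:
                cost += p_xp * p_x * p_reach * sizes[i] * rule_cost
                p_reach *= miss[i][x]        # next one runs only if i answered ⊥
    return cost
```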
With a given algorithm for implementing the classifiers K - such as our fuzzy system - as the internal classification algorithm, we can optimize the class mapping m and the queue mapping r to achieve a better recognition rate. Both the number of classifiers and the number of rules in a classifier can additionally be adapted. Resulting from this modelling we can easily see the interconnection of classifier grouping and recognition rates via the state and the transition probabilities. Because the true positive rates of the classifiers depend on the initial grouping, and the transition probabilities P(x_t \mid \hat{x}_{t-1}) are again recursively dependent on the recognition rates, we get a highly non-linear optimization problem. We can solve this problem heuristically based on expert knowledge, as we did at the beginning of this section, or we can do an automated design space exploration using empirical data.
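In the simplest static case, such a design space exploration can be sketched as an exhaustive search over the queue orders, with `cost_of` standing in for an empirical cost estimate:

```python
from itertools import permutations

def best_static_queue(classifier_ids, cost_of):
    """Try every static queue order and keep the cheapest one (sketch;
    feasible only for the small classifier counts considered here)."""
    return min(permutations(classifier_ids), key=cost_of)
```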
[Figure 6 plots: % correctly classified (roughly 40-55%) over the number of rules of the first and second FIS (2-12 rules each)]
Fig. 6. Surface plot (left) and contour plot (right) of the percentage of correctly classified data pairs for two FISs - better FIS first in line
6 Discussion
As we saw in our earlier experiments, we can find interesting trade-offs between execution time and classification accuracy. Understanding those trade-offs in relation to the underlying statistical process holds a high potential for model-guided optimization. Analyzing the cost function we can see different possibilities for minimization. In the simplest case above, which splits a FIS into two separate classifiers, we only changed the execution order of the split classifiers. Because we used only static queues - i.e. the classification process is independent of x̂_t, and P(x_t) was equally distributed - the expectation for the cost is solely dependent on the recognition rates of the correct and complementary state, as already seen in figure 4. The second degree of freedom is modifying the internal FIS classifiers themselves by changing the rule numbers, and thus also adapting the execution cost expectation. We plotted the resulting optimization space in figure 6, comparing recognition rates depending on the number of rules in a classifier. The rates show a certain degree of linearity in their distribution, which suggests that classifier systems can be modelled statistically. We can see that the maximal recognition rate is roughly distributed around the "natural" cluster amount returned by our initial subtractive clustering. Knowing this optimal number of rules and the according recognition rate allows us to estimate the recognition rates of FISs with a different number of rules. This means that even though the optimization space grows quickly already for the static case, we could easily find good solutions. The figure also shows that we can find Pareto optimal solutions for cost (proportional to the distance to zero) and recognition rate (height/colour). The third degree of freedom, discussed here and represented in our model, is the grouping of classes by common or separate classifiers. Naturally this affects the local recognition rates, as already seen in figure 3.
In our dynamic approach, which can change the queue order through a pre-computed lookup based on the previous match, the grouping additionally affects the joint probability
660
M. Berchtold et al.
for having to detect previous complementary classes. This probability would be 1 if Km(xt) = Km(x̂t), and thus optimal if the previously classified state is handled by the same classifier as the current one. We also see potential to heuristically optimize such dynamic cases based on the model. By grouping classes into classifiers by their transition probabilities at the beginning of the last section, we exploited the observation that P(xt | x̂t−1) is dominated by P(xt | xt−1) for high overall recognition rates. The results shown in Figure 5 already hint at the potential of such an approach.
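To make the cost expectation concrete, the following sketch computes the expected execution cost of a static two-classifier chain of the kind discussed above. The decomposition of the fall-through probability, the parameter names and all numeric values are illustrative assumptions for this sketch, not values taken from our experiments.

```java
// Sketch (illustrative numbers): expected execution cost of a static
// two-classifier chain. The first classifier in the queue always runs;
// the second runs only when the first yields no match, i.e. either the
// first classifier misses one of its own classes or correctly rejects
// a complementary class.
public class ChainCost {
    // p1: prior probability that the true state belongs to classifier 1
    // r1: recognition rate of classifier 1 on its own classes
    // q1: rate at which classifier 1 correctly rejects complementary classes
    // c1, c2: execution costs of the two classifiers
    public static double expectedCost(double p1, double r1, double q1,
                                      double c1, double c2) {
        double fallThrough = p1 * (1 - r1) + (1 - p1) * q1;
        return c1 + fallThrough * c2;
    }

    public static void main(String[] args) {
        // Compare the two possible queue orders for asymmetric rates.
        double costAB = expectedCost(0.5, 0.9, 0.8, 10, 12);
        double costBA = expectedCost(0.5, 0.7, 0.95, 12, 10);
        System.out.println(costAB < costBA ? "A-first is cheaper" : "B-first is cheaper");
    }
}
```

Under such a model, choosing the queue order reduces to comparing the two expectations, which is exactly the kind of off-line analysis the model enables.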
7 Conclusion and Future Work
Motivated by the problems of extending our fuzzy classification system to support a larger quantity and diversity of recognizable context classes on a single, resource-constrained device, we started looking at the intrinsic problems of monolithic classifiers like our initial TSK-FIS. Because we strongly believe in the expressiveness and intuitiveness of automatically learned fuzzy inference systems, we looked for ways to increase scalability. We showed in this paper that a divide-and-conquer approach allows complex classifications while maintaining low execution overhead. Our experiments indicated that the maximum gain can be obtained by carefully designing the division of the classifiers. Different groupings, rule counts and execution sequences for the classifiers all affect the recognition rates, and it soon becomes clear that many of these effects are interdependent. Even a simple set-up, which comprises a classifier split into sub-classifiers acting in a chain-of-responsibility pattern, allows many degrees of freedom for optimization. These are difficult to survey for an appliance developer who only wants to optimize the recognition rates while being bound in complexity by hardware constraints. Therefore, we reviewed our experimental findings and formalized the classification process in a probabilistic model. This model captures the different effects of a divided classifier system and directly motivates different optimization strategies for maximizing the expected recognition rate while minimizing the expected costs. In this paper we outlined how a fuzzy inference system can profit from a complex top-down probabilistic analysis, which can be done off-line. In future work we plan to develop a tool chain that helps the artefact developer design classifiers that are automatically optimized based on the parameters of the underlying process. 
We motivated and showed that combining a bottom-up, pre-trained classification system with a probabilistic approach is advantageous both from a system performance perspective and for the simplicity of understanding and using such a system in ubiquitous computing settings. We believe that the complexity of embedded ubiquitous applications can only be dealt with by an intuitive design abstraction like fuzzy inference systems, paired with optimization technologies for automated design-space exploration based on probabilistic modelling of the appliance and its context.
AwarePen - Classification Probability and Fuzziness
661
References
1. Elkan, C.: The paradoxical success of fuzzy logic. IEEE Expert: Intelligent Systems and Their Applications 9(4), 3–8 (1994)
2. Berchtold, M., Decker, C., Riedel, T., Zimmer, T., Beigl, M.: Using a context quality measure for improving smart appliances. In: IWSAWC (2007)
3. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
4. Laviolette, M., Seaman, J., Barrett, J.D., Woodall, W.: A probabilistic and statistical view of fuzzy methods. Technometrics 37, 249–261 (1995)
5. Zadeh, L.A.: Is probability theory sufficient for dealing with uncertainty in AI? A negative view. In: Uncertainty in Artificial Intelligence. Elsevier, Amsterdam (1986)
6. Kandel, A., Byatt, W.: Fuzzy sets, fuzzy algebra and fuzzy statistics. IEEE, Los Alamitos (1978)
7. Kosko, B.: The probability monopoly. Fuzzy Systems 2 (1994)
8. Kahneman, D., Slovic, P., Tversky, A.: Judgement under uncertainty: Heuristics and biases. Cambridge University Press, Cambridge (1982)
9. Kosko, B.: Fuzziness vs. probability. International Journal of General Sys. (1990)
10. Fishburn, P.: The axioms of subjective probability. Statistical Sci. (1986)
11. Zadeh, L.A.: Discussion: Probability theory and fuzzy logic are complementary rather than competitive. Technometrics 37, 271–276 (1995)
12. Guarino, D., Saffiotti, A.: Monitoring the state of a ubiquitous robotic system: A fuzzy logic approach. In: Fuzzy Systems Conference, pp. 1–6 (2007)
13. Mäntyjärvi, J., Seppänen, T.: Adapting applications in mobile terminals using fuzzy context information. In: Mobile HCI, London, UK. Springer, Heidelberg (2002)
14. West, G., Greenhill, S., Venkatesh, S.: A probabilistic approach to the anxious home for activity monitoring. In: Computer Software and Applications Conf. (2005)
15. Castro, P., Chiu, P., Kremenek, T., Muntz, R.R.: A probabilistic room location service for wireless networked environments. Ubiquitous Computing (2001)
16. Park, J., Lee, S., Yeom, K., Kim, S., Lee, S.: A context-aware system in ubiquitous environment: a research guide assistant. Cybernetics and Intelligent Sys. (2004)
17. TecO: Telecooperation Office, Univ. of Karlsruhe (2006), http://particle.teco.edu
18. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its application to modelling and control. IEEE Trans. Syst. Man and Cybernetics (1985)
19. Sugeno, M., Kang, G.: Structure identification of fuzzy model. Fuzzy Sets and Systems 26(1), 15–33 (1988)
20. Yager, R., Filev, D.: Generation of fuzzy rules by mountain clustering. Journal on Intelligent Fuzzy Systems 2, 209–219 (1994)
21. Chiu, S.: Method and software for extracting fuzzy classification rules by subtractive clustering. IEEE Control Systems Magazine, 461–465 (1996)
22. Chiu, S.: Extracting Fuzzy Rules from Data for Function Approximation and Pattern Classification. Chapter 9 in: Fuzzy Information Engineering: A Guided Tour of Applications. John Wiley & Sons, Chichester (1997)
23. Jang, J.S.R.: ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics 23, 665–685 (1993)
24. Wang, L.X.: Adaptive Fuzzy Systems and Control. Prentice-Hall, Englewood Cliffs (1998)
A Model Driven Development Method for Developing Context-Aware Pervasive Systems Estefanía Serral, Pedro Valderas, and Vicente Pelechano Departamento de Sistemas Informáticos y Computación Technical University of Valencia, Camí de Vera s/n, E-46022, Spain {eserral, pvalderas, pele}@dsic.upv.es
Abstract. In this work we introduce a software engineering method for developing context-aware pervasive systems which is based on MDA and Software Factories. This method allows us to describe a context-aware pervasive system at a high level of abstraction by means of a set of models, and then automatically generate the system code from these models. To do this, a method proposed by the authors in previous work is extended to fully support context-awareness. The introduced extensions are: (1) a set of models that allow us to represent the context information at the conceptual level; (2) a strategy to generate the system code automatically from the models; and (3) mechanisms for storing and updating the context information and reasoning about it at runtime.
1 Introduction
The vision of Weiser [15], where computers are everywhere and people interact with electronically enriched environments that are sensitive and responsive to them, is ever closer. To achieve this, pervasive systems must be completely integrated into the environment and disappear from the user's point of view. In this sense, the way in which these systems allow users to interact with them becomes critical if we want to provide users with a satisfactory experience of use. Researchers are aware of this aspect and have attempted to improve user interaction through the notion of context-awareness. A context-aware pervasive system is a system that provides users with specific functionality but is also able to sense their context of use and adapt its behaviour accordingly. We think the development of context-aware pervasive systems is too complex an activity to be handled with ad-hoc solutions. Hence, solid engineering methods are required to develop robust systems. We propose the use of a Model-Driven Development (MDD) method. MDD is a software engineering approach that holds that models are the most important artefacts in the development cycle. System developers build and transform system models to achieve automatic code generation. The Object Management Group (OMG) has reflected these trends in its Model Driven Architecture (MDA) proposal [9], which it supports with UML 2.0. Microsoft is also promoting MDD with Software Factories [10] and the DSL tools [18]. F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 662–676, 2008. © Springer-Verlag Berlin Heidelberg 2008
A Model Driven Development Method
663
MDD provides many benefits, such as the following: 1) Using abstract models rather than low-level code, system maintenance, update and evolution are much more productive and economical. Besides, being able to work at a higher level of abstraction allows pervasive software developers to deal with heterogeneity more easily. 2) Simulation and early requirement validation are easier with the availability of a model. 3) System design and specification are more intuitive using models than with other kinds of artefacts. 4) MDD facilitates systematic reuse of know-how, software best practices and development assets. In this context, we have developed an MDD method for pervasive systems that provides mechanisms to represent pervasive systems in a set of models and then automatically generate the corresponding system code from these models. This method has been developed over several years in our research group. In [20] [16] the main characteristics of this method can be found, as well as examples of its application in the development of several case studies. Furthermore, we have developed tools that support both the visual creation of models and the automatic generation of code [11], as well as tools for validating systems through simulation [20]. However, mechanisms to properly consider the context are not provided in these preliminary works. This problem is solved by the work presented in this paper. We extend this MDD method to properly consider context information in the different stages of its development process. In particular, we introduce:
1. A set of models that allow us to properly consider context data at the conceptual level.
2. A transformation engine to transform these models into the corresponding Java code.
3. A strategy that allows the system to access and update the context data and reason about it at runtime, in order to analyze it and adapt the system behaviour accordingly.
The rest of the paper is organized as follows: Section 2 presents a detailed description of the concept of context as well as the main characteristics that a context-aware system must present. Section 3 introduces a case study developed with our method, which is presented in Section 4. Section 5 presents the PervML models, paying special attention to how these models have been extended to properly support context. Section 6 and Section 7 explain the implementation framework used by the method and the code generation strategy, respectively, and how they have been extended to properly consider the newly introduced mechanisms. Section 8 presents the related work. Finally, conclusions and further work are explained in Section 9.
2 The Concept of Context
According to Dey [1], context is "any information that can be used to characterize the situation of an entity", and an entity is "a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves". To identify this information, we have analyzed several proposals and have finally based our work on the SOUPA ontology [2]. SOUPA can be considered one of the most influential published ontology models for pervasive systems. Therefore, according to
664
E. Serral, P. Valderas, and V. Pelechano
this ontology, the context information is made up of: 1) system users (name, age, address, etc.); 2) user preferences; 3) space information; 4) system services; 5) privacy and security policies (which indicate what operations each user can execute); 6) services available to a certain user at the current time; 7) temporal information (date and time, working days, etc.); 8) user mobility; 9) service state; and 10) user actions (what the user is doing at the present moment and what he/she did in the past). Besides, we can identify two types of context information: 1) information that is available when the system is being specified, such as information about users, user preferences, space information, system services, and privacy and security policies; and 2) information that is only available at runtime, such as temporal information, user mobility, service state, available services and the actions that users perform. To properly support this context information, a context-aware pervasive system must present the following characteristics [1]: 1) presentation of information and services to a user: the system must provide mechanisms that allow users to properly access the information and functionality provided by the system; 2) tagging of context to information to support later retrieval: the system must process the context data to allow efficient access to it; 3) automatic execution of a service operation for a user: the system must be able to analyze user behaviour and changes in context to automatically execute operations when users need them. In this work we explain how our method allows us to develop context-aware pervasive systems that present the first two features. As for the last one, we are currently working on it. Although it has been left as further work, some details about the strategy that we are following to support this feature can be found in [17].
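The split between design-time and runtime availability can be captured as a simple tagging of the ten kinds of context information listed above. The following type is an illustrative sketch only, not part of PervML.

```java
// Sketch: the ten kinds of context information identified from the SOUPA
// ontology, tagged by when they become available. The enum itself and its
// naming are assumptions made for illustration.
public enum ContextKind {
    // Available when the system is being specified (design time):
    USERS(true), USER_PREFERENCES(true), SPACE_INFORMATION(true),
    SYSTEM_SERVICES(true), POLICIES(true),
    // Only available at runtime:
    AVAILABLE_SERVICES(false), TEMPORAL_INFORMATION(false),
    USER_MOBILITY(false), SERVICE_STATE(false), USER_ACTIONS(false);

    public final boolean designTime;

    ContextKind(boolean designTime) { this.designTime = designTime; }
}
```

A distinction of this kind is what motivates modelling the first group at the conceptual level while handling the second group through runtime mechanisms.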
3 Case Study: The Smart Home
We have developed a context-aware pervasive system for a two-floor smart home where a couple lives. The plan of the first floor of the house is shown on the left of Fig. 4. The system provides a great variety of services, such as: lighting intelligence management, which controls the lighting according to the users' presence and the light intensity (for instance, the intensity of the garden lighting is increased as the outside light decreases); multimedia management, which allows us to store, manage and reproduce multimedia archives; air conditioning management, which is in charge of achieving the optimum temperature; and security management, which, when activated, reacts to presence detected inside the home or to a door or window being opened by starting the presence simulation, starting to record and informing the users; etc.
4 An MDD Method for Context-Aware Pervasive Systems
The method that we propose to develop context-aware pervasive systems provides: • The PervML models: a set of models for specifying context-aware pervasive systems using conceptual primitives suitable for this domain. To properly capture the context data available at design time, we have added to the conceptual model the Location Diagram (to specify space information) and the User Model (to specify users and policies).
• A transformation engine and an implementation framework to automatically translate the PervML models into the system's Java code. The constructs provided by this framework have been extended with mechanisms to manage the context information at runtime. • An ontology that defines the concepts introduced in PervML, and a transformation engine to translate the PervML models into the system's OWL specification according to that ontology. We have incorporated both the ontology and the transformation to automatically obtain a description of the system that can be used for adaptation purposes. The main characteristics of this approach can be found in [17]. With this strategy, the system information is stored in an OWL specification that is updated at runtime. To do this, new constructs have been incorporated into the implementation framework introduced above. This allows us to infer knowledge from the context data at runtime by means of reasoners and machine-learning algorithms.
Fig. 1. The MDD Development process
As Fig. 1 shows, the system is developed by performing the following steps: Step 1. Conceptual Modelling: The system is specified at a high level of abstraction by means of the PervML models. PervML allows us to define the system functionality with abstract mechanisms that are independent of the technology. Step 2. Code Generation: Two model transformations are executed to translate the PervML models into both 1) Java code and 2) an OWL specification: 1) The generated Java code implements the functionality that supports the services that the context-aware pervasive system must provide to users. This code extends the framework, which provides a common architecture for all the systems that are developed using the method. 2) The OWL specification describes the context-aware pervasive system using concepts of the PervML ontology. This specification is used to infer context knowledge at runtime. This knowledge is used by the system to support adaptation to context changes and the automatic execution of service operations for a user. Moreover, the generated Java code updates the OWL specification at runtime according to the way in which users interact with the system. Step 3. Driver implementation: An OSGi developer must implement the drivers that manage the access from the implementation framework to the selected devices. These must be developed by hand, since they deal with technology-dependent issues. However, we can maintain a driver repository with the drivers used in previous systems. Thus, if a device or external software system was used in a previous system, the same driver can be reused.
Step 4. Deployment: The Java files are configured to use the selected drivers. This configuration only requires setting up the driver identifiers. Then, on the one hand, the generated files are compiled, packaged into bundles (JAR files) and deployed in the OSGi server together with the implementation framework and the drivers. On the other hand, the reasoner runs with the OWL specification, which is updated by the framework according to the runtime information and the information extracted from the rules loaded into the reasoner.
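The driver reuse point from Step 3 suggests a repository keyed by device type. The sketch below illustrates the idea in plain Java; the Driver interface and the repository API are assumptions for illustration, not the actual OSGi or framework interfaces.

```java
import java.util.*;

// Sketch: a driver repository keyed by device type. If a device was used
// in a previous system, the hand-written driver registered for it can be
// reused instead of implementing a new one.
public class DriverRepository {
    public interface Driver {
        String deviceType(); // e.g. "GradualLamp", "MovementDetector"
    }

    private final Map<String, Driver> byType = new HashMap<>();

    public void register(Driver d) {
        byType.put(d.deviceType(), d);
    }

    // Returns a previously written driver, if one exists for this type.
    public Optional<Driver> lookup(String deviceType) {
        return Optional.ofNullable(byType.get(deviceType));
    }
}
```

In a deployment step, a hit in the repository would mean the corresponding bundle can simply be redeployed; a miss would mean a new driver must be written by hand.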
5 Conceptual Modelling: Supporting Context in the PervML Models
PervML [8] is a language designed to provide pervasive system developers with a set of constructs that allows them to precisely describe a pervasive system. We have extended this language to properly support context information. Thus, in this section we first give an overview of PervML, using it to partially specify the smart home case study, and then explain in detail how the PervML models have been extended to support the description of context information.
5.1 PervML: An Overview
PervML promotes a separation of roles where developers are categorized as System Analysts and System Architects. On the one hand, System Analysts capture system requirements and describe the pervasive system using the service metaphor as the main conceptual primitive. They specify three models, which are shown in Fig. 2: the Services Model, the Structural Model, and the Interaction Model.
Fig. 2. A partial view of the System Analyst models
The Services Model describes the kinds of services that are provided in the system. The diagram that represents this model in Fig. 2 shows that the smart home provides services for controlling the lighting, for managing the security, etc. In addition to the information shown in Fig. 2, the description of a kind of service includes (1) pre- and postconditions (expressed in the Object Constraint Language (OCL)) for every operation, (2) a Protocol State Machine (which indicates the operations that can be invoked at a specific moment), and (3) triggers (which allow specifying the proactive behaviour of the services). The Structural Model is used to indicate the instances of every service that are provided by the system. The instances are represented as components, and the service that each one provides is depicted as an interface. For example, Fig. 2 shows the SittingRoomBlind component, which provides the BlindManagement service. Dependency relationships between components can be included to specify that one component uses the functionality provided by another. In order not to overload the figure, only the sitting room components are shown. For instance, we have defined several components such as SittingRoomLightingControl, which uses the functionality provided by the SittingRoomPresenceDetection and SittingRoomLighting components, also defined in the model. The Interaction Model specifies the communication that is produced as a reaction to some system event. Every interaction is described by a set of UML 2.0 sequence diagrams. Thus, the analyst identifies the components of the Structural Model that participate in the interaction, defines the messages that the components must interchange, and specifies the condition that triggers the interaction using OCL. The actions described in the diagram are executed when the condition is satisfied. Fig. 2 shows the interaction that is in charge of opening the blinds and switching on the water heater when the alarm clock goes off at 7:00 a.m.
Fig. 3 contents: Component Structure Model, BindingProvider Model, and Functional Model, with the functional specification of SittingRoomBlind:
SittingRoomBlind void open()
    if _ST_B1.getPosition() = "CLOSED" then
        _ST_B1.up()
        _ST_B1.up()
    else if _ST_B1.getPosition() = "MIDDLEPOSITION" then
        _ST_B1.up()
    endif
Fig. 3. A partial view of the System Architect models
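The functional specification shown in Fig. 3 could be rendered in plain Java roughly as follows. The BlindDevice interface and the constructor wiring are assumptions made for this sketch; the device handle corresponds to _ST_B1 in the figure, and the real method generates comparable code from the Functional Model.

```java
// Sketch: Java rendering of the ASL functional specification for the
// open() operation of the SittingRoomBlind component (Fig. 3).
public class SittingRoomBlindJava {
    public interface BlindDevice {
        String getPosition(); // "CLOSED", "MIDDLEPOSITION" or "OPEN" (assumed)
        void up();            // move the blind up one step
    }

    private final BlindDevice stB1; // corresponds to _ST_B1 in the figure

    public SittingRoomBlindJava(BlindDevice stB1) {
        this.stB1 = stB1;
    }

    public void open() {
        if ("CLOSED".equals(stB1.getPosition())) {
            stB1.up(); // closed -> middle
            stB1.up(); // middle -> open
        } else if ("MIDDLEPOSITION".equals(stB1.getPosition())) {
            stB1.up(); // middle -> open
        }
        // already open: nothing to do
    }
}
```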
On the other hand, System Architects specify which devices and/or existing software systems support the system services. We refer to these elements (devices and software systems) as binding providers, because they bind the pervasive system with its physical or
logical environment. System Architects specify three models (the Binding Providers Model, the Component Structure Model, and the Functional Model), which are shown in Fig. 3. The Binding Providers Model describes the different kinds of devices that are used in the system. Fig. 3 shows some of the binding providers for our smart home. For instance, the diagram specifies that the MovementDetector sensor provides an operation to know whether it detects some movement. The Component Structure Model is used to assign devices and software systems to the system components. For instance, SittingRoomLighting uses the ST_GL1 and ST_GL2 devices, which are instances of the GradualLamp binding provider. Besides, the same device can be used by different components. The Functional Model specifies the actions that are executed when an operation of a service is invoked. These actions are specified using the Action Semantics Language (ASL) of UML. Every operation provided by every component must have an associated functional specification. Fig. 3 shows the actions that are executed when the open operation of the SittingRoomBlind component is invoked.
5.2 Capturing Context Information at the Conceptual Level
The context data that is captured at the conceptual level is related to services, users, policies and space information. Services are modelled in the System Analyst models; however, requirements about users, policies and space information are not captured in PervML. Hence, we add the Location Diagram to the Structural Model (to specify space information) and we design the User Model (to specify users and policies). Next, we explain them in detail.
5.2.1 Location Diagram
In the Structural Model, the System Analyst uses a UML 2.0 component diagram to indicate which components are available in each location to provide the services defined in the Services Model. However, there is no explicit specification of the locations where these components are deployed. 
Hence, we have designed a new diagram for this model: the Location Diagram. This diagram describes the different areas where services can be located or where users can move. It is specified by means of a UML package diagram. Each package represents a certain area, and the hierarchy between packages symbolizes the spatial hierarchy between the areas represented by those packages. In addition, two types of association can exist between the areas (packages): adjacency and mobility (or accessibility). Adjacency means that two zones are next to each other, whereas mobility is the possibility of going from one area to another. Adjacency is represented by a line between two areas. Since mobility implies adjacency, it is represented by adding arrows to the line between two areas. Thus, this model allows us to infer information such as transitive relations. The Location Diagram for the first floor of our smart home is shown in Fig. 4. The figure shows, for instance, that Hall and Garden are adjacent but a user cannot go from one to the other, whereas Hall and SittingRoom are adjacent and a user can also move between them.
Fig. 4. Location Diagram of Floor1
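The mobility semantics above lend themselves to a small graph model from which transitive relations can be inferred by a simple search. The sketch below is illustrative; the area names follow Fig. 4, but the Kitchen edge and the class itself are assumptions, not PervML artifacts.

```java
import java.util.*;

// Sketch: the Location Diagram as a graph of mobility arrows, from which
// the transitive relation "can a user reach B from A" is inferred.
public class LocationDiagram {
    private final Map<String, Set<String>> mobility = new HashMap<>();

    // One-directional mobility arrow: the user can go from 'from' to 'to'.
    public void addMobility(String from, String to) {
        mobility.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // Double-headed arrow in the diagram: movement in both directions.
    public void addTwoWayMobility(String a, String b) {
        addMobility(a, b);
        addMobility(b, a);
    }

    // Inferred transitive relation via breadth-first search.
    public boolean reachable(String from, String to) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(from);
        Set<String> seen = new HashSet<>(queue);
        while (!queue.isEmpty()) {
            String area = queue.poll();
            if (area.equals(to)) return true;
            for (String next : mobility.getOrDefault(area, Set.of())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return false;
    }
}
```

Note that adjacency-only pairs (such as Hall and Garden in Fig. 4) simply have no mobility edge, so they are never reported as reachable.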
5.2.2 User Model
The User Model has been designed to specify the context information about users and policies. It is specified by the System Analyst and is defined from two elements:
• A Policy Diagram, which is used to specify the policies of the system. System Analysts must specify the policies by indicating a list of available operations for each policy. The analyst can specify the list of allowed operations in different ways: 1) by adding a service, so that every operation of every component that provides that service is allowed; 2) by adding a component, so that every operation of that component is allowed; 3) by adding a service operation, so that this operation of every component that provides the service is permitted; or 4) by adding a component operation, so that this operation of this component is permitted. Furthermore, inheritance relations can be established in this model. These relations allow us to define the capacities of a policy taking the capacities of an already defined policy as a basis. For instance, in the example of Fig. 5 the Father can execute operations related to the Lighting, GradualLighting and BlindManagement services. The Child can execute the same operations except those of the BlindManagement service.
• A set of User Characterization Templates, which specify the users of the system. For each user, Pervasive System Analysts must indicate the following information:
- The policy associated with the user (which is specified in the Policy Diagram).
- The following personal data: name, surname, gender, date of birth and marital status.
- The following contact data: email, telephone number, mobile phone and address.
- Social relations, i.e., information related to people that the user knows.
Both the personal and the contact data are those proposed by SOUPA to characterize users. Fig. 5 shows an example of a User Characterization Template. This example represents the user Peter in the system, whose policy is Father. Finally, note that this model provides support for privacy, security and system views, since users are able to see and execute only the system actions that they are authorized to use. On the other hand, the construction of this model is optional. If System Analysts do not create this model, three policies (with a default user for each of them) are defined: (1) the administrator, who can execute all the available operations in the system, including configuration operations; (2) the limited user, who is able to execute
all the available operations except configuration operations; and (3) the guest, who can only consult the state.
Fig. 5. A Policy diagram on the left and a User Characterization Template on the right
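Policies with inheritance reduce to sets of allowed operations built on top of a parent policy's set. The sketch below mirrors the Father/Child example of Fig. 5; the classes and the coarse service-level granularity are assumptions made for illustration, not PervML code.

```java
import java.util.*;

// Sketch: policies as sets of allowed services with inheritance, and the
// permission check implied by the Policy Diagram.
public class PolicyDemo {
    public static class Policy {
        private final Set<String> allowed = new HashSet<>();

        public Policy(Policy parent, String... services) {
            if (parent != null) allowed.addAll(parent.allowed); // inheritance
            allowed.addAll(Arrays.asList(services));
        }

        public boolean permits(String service) {
            return allowed.contains(service);
        }
    }

    // Fig. 5 example: Father takes Child's capacities as a basis and
    // additionally gets BlindManagement.
    public static Map<String, Policy> examplePolicies() {
        Map<String, Policy> policies = new HashMap<>();
        Policy child = new Policy(null, "Lighting", "GradualLighting");
        policies.put("Child", child);
        policies.put("Father", new Policy(child, "BlindManagement"));
        return policies;
    }
}
```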
6 Implementation Framework for Building Context-Aware Pervasive Systems
According to the development process presented in Section 4, the next step consists of generating code that implements the system functionality. This code extends an implementation framework. Thus, before introducing the code generation strategy, we explain the main characteristics of the framework. The proposed implementation framework has been built on top of the OSGi middleware [19]. It raises the abstraction level of the target platform by providing constructs similar to the primitives proposed in the PervML models. Besides, the framework significantly reduces the amount of code that must be generated, because it encapsulates the common functionality and structure of the elements that are generated by our method. The framework applies the Layers architectural pattern [12], which allows us to organize the system elements in layers with well-defined responsibilities. Hence, with the aim of providing facilities for integrating several technologies (EIB networks, web services, etc.) and for supporting multiple user interfaces, the framework architecture has been designed in three layers: 1) the Driver Layer, which is in charge of managing the access to the devices and external software; 2) the Logical Layer, which supports the generation of the system logic. It is subdivided into two sub-layers: the Communications sub-layer, which supports code generation for binding providers, and the Services sub-layer, which provides the functionality as it is required. We have added the Security sub-layer, which supports code generation for users and policies and ensures that each user can only execute the operations allowed for him/her. 3) The Interface Layer, which manages the access to the system by any kind of client. A more detailed description of these layers can be found in [8]. 
Regarding the context-aware characteristics of the system, the implementation framework provides mechanisms to support: (1) the presentation of information and services to a user, and (2) the tagging of context to information to support later retrieval (i.e., extending the OWL specification with runtime context information). Next, we explain in more detail the Logical and Interface layers, which support these context-aware characteristics.
6.1 Logical Layer
The Logical Layer provides the set of classes that facilitate the generation of the system logic. To do this, these classes provide constructs that are similar to PervML concepts, such as service or binding provider. These classes are extended to generate the final system. They define the attributes that must take a value when the framework is instantiated and implement the execution strategies of each element using the Template Method pattern [12]. Furthermore, they also provide support for tagging runtime context to information to support later retrieval. Fig. 6 shows a partial view of these classes as well as the different relationships among them. The classes inside the dashed rectangle support the Communications and Services sub-layers. For instance, they support the implementation of the different binding providers that connect the system with the physical devices. The hierarchy of classes inside the dotted rectangle facilitates code generation from the policies and users specified in the User Model. For instance, the User class provides mechanisms for checking security policies: when a system service is activated by a specific user, the corresponding User object checks whether the user can activate the operation.
Fig. 6. Logical Layer Implementation
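The execution strategy outlined above can be sketched as follows. This is a simplified illustration with hypothetical names (the real framework classes are those of Fig. 6, whose code is not listed in the paper): an abstract service class fixes the activation sequence as a template method — check the user's policy, run the operation, and record the action for the runtime context log discussed below — while concrete generated services only fill in the operation itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Template Method execution strategy.
class User {
    private final String name;
    private final Set<String> allowedOperations;       // taken from the assigned policy
    final List<String> actionLog = new ArrayList<>();  // stands in for the OWL specification

    User(String name, Set<String> allowedOperations) {
        this.name = name;
        this.allowedOperations = allowedOperations;
    }
    boolean mayExecute(String operation) { return allowedOperations.contains(operation); }
    void recordAction(String operation)  { actionLog.add(name + ":" + operation); }
}

abstract class AbstractService {
    // Template method: the activation sequence is fixed here...
    final boolean activate(User user, String operation) {
        if (!user.mayExecute(operation)) return false; // security policy check
        doOperation(operation);                        // hook filled in by generated classes
        user.recordAction(operation);                  // runtime context update
        return true;
    }
    // ...and generated service classes only implement this hook.
    protected abstract void doOperation(String operation);
}

class LightingService extends AbstractService {
    protected void doOperation(String operation) {
        System.out.println("lighting: " + operation);
    }
}

public class TemplateMethodSketch {
    public static void main(String[] args) {
        User peter = new User("Peter", Set.of("switchOn"));
        AbstractService lighting = new LightingService();
        System.out.println(lighting.activate(peter, "switchOn"));   // true
        System.out.println(lighting.activate(peter, "switchOff"));  // false: not allowed
    }
}
```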
Updating Context Information at Runtime. The User class also encapsulates methods for processing runtime context data. In particular, this class is in charge of keeping the OWL specification obtained from the PervML models updated with context information related to user actions. To understand how this is done, recall that the User class is in charge of checking security policies. Every time a service is activated (through the Interface Layer), a method of the User class is automatically executed. This method keeps the OWL specification updated by adding the interactions with the system at runtime. When a service operation is executed, the system inserts the corresponding OWL description of this change into the OWL specification. For instance, when the user Peter switches on the kitchen lighting, the User class that represents him checks whether the policy assigned to Peter allows him to execute this operation. If the operation is allowed by his policy, the User class that represents Peter uses the service to carry out the operation. Finally, if the operation is performed successfully, the User class creates the OWL action description and adds it to the OWL specification.

672
E. Serral, P. Valderas, and V. Pelechano

6.2 Interface Layer

The Interface Layer supports the presentation of information and services to users. To implement this layer, we have used the Model-View-Controller (MVC) pattern [12], which allows us to offer different interfaces for interacting with the system. Following this strategy, the components of the Services Layer correspond to the model; we have implemented a Controller class that is reusable for every interface; and specific viewers must be implemented for every supported interface. Thus, in addition to implementing the Controller class, we have implemented some viewers, for example, the one corresponding to a Web interface. Fig. 7 shows the Controller class as well as the classes that implement this Web interface. This Web interface has been implemented for access from desktop Web browsers.
Fig. 7. Web Interface (the ServicesListingServlet and ServicesUIServlet classes and the Controller class they use)
The Controller class provides methods that allow users: (1) to select the service instance (component, in PervML terminology) that the user wants to interact with — for instance, the method getServicesFromLocation(location) returns all the services that are available to a user in a specific location; and (2) to interact (get information and request functionality) with one specific service instance — for instance, ManageOperation(componentPID, operation, parameters) is in charge of executing a component operation using the Java reflection capabilities. These methods take the user identification into account: they only show the services and the operations specified in the User Model policies for the identified user. Fig. 7 also shows the ServicesListingServlet class and the ServicesUIServlet class, which are implemented as Java Servlets. These classes provide the view of the Web interface and invoke the Controller operations in order to generate the web pages that are shown to users. The ServicesListingServlet class invokes the first set of methods, whereas the ServicesUIServlet class invokes the second set. It is worth remarking that every class in Fig. 7 is a concrete class; that is, these classes do not have to be extended. Therefore, these elements are reusable for all the pervasive systems that are developed
using the proposed approach. This reusability is feasible because (1) every PervMLComponent implements an interface that is known by the Controller, and (2) the Java reflection capabilities are used for invoking previously unknown methods.
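A reflection-based operation dispatch of the kind performed by ManageOperation can be sketched as follows. This is a simplified illustration with hypothetical names; the real Controller additionally filters the services and operations against the user's policy:

```java
import java.lang.reflect.Method;

// Hypothetical component class standing in for a generated PervML component.
class KitchenLamp {
    private boolean on;
    public void switchOn()  { on = true;  }
    public void switchOff() { on = false; }
    public boolean isOn()   { return on; }
}

public class ReflectiveController {
    // Invokes an operation on a component whose concrete type is unknown
    // at compile time, as the Controller does for generated components.
    public static Object manageOperation(Object component, String operation,
                                         Object... parameters) throws Exception {
        Class<?>[] types = new Class<?>[parameters.length];
        for (int i = 0; i < parameters.length; i++) types[i] = parameters[i].getClass();
        Method m = component.getClass().getMethod(operation, types);
        return m.invoke(component, parameters);
    }

    public static void main(String[] args) throws Exception {
        Object lamp = new KitchenLamp();             // static type is just Object
        manageOperation(lamp, "switchOn");
        System.out.println(manageOperation(lamp, "isOn"));  // prints true
    }
}
```

This is why the servlet classes can stay concrete and reusable: they never name a component class, only operation strings resolved at runtime.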
7 Code Generation Strategy: From Models to Code

In this section we explain how the code generation strategy is extended to support these new mechanisms. The required extensions are: 1) extending the existing model-to-code transformation in order to generate code from the context information defined in the PervML models (explained next); and 2) a new model-to-code transformation that generates the OWL specification from the PervML models (this transformation can be found in [17]).

7.1 Considering Context in the PervML-to-Java-Code Transformation

In this subsection we first present a brief overview of the transformation strategy, and next we explain how it is extended to support context.

Transformation Strategy. To transform the context information into Java code we use Eclipse [13], a flexible and extensible platform with many plug-ins that provide functionality for specific purposes (one of these purposes is the creation of modelling tools and code generation engines). We use the following plug-ins to support code generation: the Eclipse Modelling Framework (EMF) plug-in and the MOFScript tool. EMF is a set of modelling tools and code generation facilities for specifying metamodels and managing model instances. We use it first to create the PervML metamodel and then to automatically generate from it: 1) a set of Java classes representing each PervML metamodel concept (these Java classes provide methods to modify PervML models according to the metamodel); and 2) a basic tree editor that facilitates the development of PervML models according to the metamodel. The MOFScript tool is an implementation of the MOFScript model-to-text transformation language. MOFScript allows us to create model-to-code transformation rules that are applied over the different elements of a source model in order to generate the corresponding code.
We have used MOFScript to create model-to-code rules that generate code from the PervML models created with the editor generated by EMF. Thus, in order to transform PervML models into Java code, we first define these models using the EMF editor and then apply the set of defined MOFScript rules to them.

Supporting Context in the Transformation Process. In order to properly consider the context information in the transformation process, we extend the PervML metamodel defined in [11] by using EMF. This extension introduces the abstractions required by the newly introduced models for capturing context at the conceptual level. Next, we define new MOFScript rules that transform these new abstractions into code. To define these rules, we first establish the mappings between the added models and the outputs produced from them. An intuitive description of these mappings is the following:

• From the Structural Model: every component produces a concrete class in the Services Layer that extends the abstract class produced for the service that the
component implements. This concrete class is in charge of implementing the operations that were defined in its interface. It also contains the location where the component is placed, maintains links with the components that it uses, and provides mechanisms for reacting to their changes.

• From the User Model: every policy produces a concrete class in the Security sub-layer that extends the abstract class Policy of the framework; this class is in charge of restricting the operations that can be accessed under the policy. Every user produces a concrete class in the Security sub-layer that extends the abstract class User of the framework; this class contains the user properties (contact data and personal data), the policy assigned to the user, the relationships with other users, and the system access data.
Fig. 8. On the left, a MOFScript rule; on the right, the code generated by this rule
Once the mappings have been defined, we have implemented a set of MOFScript rules from them. Fig. 8 shows an example of these rules and the code that it generates. This rule generates the code of the Father policy: the code in charge of initializing the name of the policy and the service operations allowed for it.
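The generated code shown in Fig. 8 is not reproduced in the text; following the mapping described above, the shape of a generated policy class might look like the sketch below. This is a hypothetical reconstruction: the actual framework Policy class and the concrete operations of the Father policy are not listed in the paper.

```java
import java.util.Set;

// Hypothetical framework base class (the real one lives in the Security sub-layer).
abstract class Policy {
    private final String name;
    protected Policy(String name) { this.name = name; }
    public String getName() { return name; }
    public abstract boolean allows(String serviceOperation);
}

// Shape of the code a MOFScript rule could generate for the Father policy.
public class FatherPolicy extends Policy {
    // Assumed example operations; the real set comes from the User Model.
    private static final Set<String> ALLOWED =
        Set.of("Lighting.switchOn", "Lighting.switchOff", "Blinds.raise");

    public FatherPolicy() { super("Father"); }

    @Override
    public boolean allows(String serviceOperation) {
        return ALLOWED.contains(serviceOperation);
    }
}
```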
8 Related Work

This section presents an overview of some of the most relevant context-aware pervasive systems. The CASS (Context-Awareness Sub-Structure) project [7] is a middleware designed for context-aware mobile applications. CASS supports high-level context data abstractions and the separation of both context-based inferences and behaviour from the application code. The CORTEX project has built a context-aware middleware based on the Sentient Object Model [5]. It is suitable for the development of context-aware
applications in ad-hoc mobile environments. It also allows developers to use data from disparate sensors, represent application context, and reason about the context efficiently. CoBrA and SOCAM, for instance, deal with context data based on ontologies. CoBrA (Context Broker Architecture) [3] is an agent-based architecture that supports context-aware computing in intelligent spaces. CoBrA adopts an OWL-based ontology approach, and it offers a context inference engine that uses rule-based ontology reasoning. SOCAM (Service-Oriented Context-Aware Middleware) [6] is another architecture for building context-aware mobile services. It divides a pervasive computing domain into several sub-domains and then defines each sub-domain in OWL to reduce the complexity of context processing. SOCAM also implements a context reasoning engine that reasons over the knowledge base. None of these works attempts to develop context-aware pervasive systems by applying MDD and Software Factories guidelines. Following these guidelines, our method supports the entire development trajectory from the system specification to the system in execution; i.e., it allows us to generate the context-aware pervasive system code from the PervML models automatically and completely. In our approach, the system is specified using primitives at a high level of abstraction, which makes it more intuitive than traditional methods. Besides, it allows us to focus on satisfying system requirements and reduces development time, because developers do not spend time solving technological problems. Moreover, none of these works approaches reasoning from the point of view of OWL individuals [17] in order to infer relevant information about pervasive systems and adapt their behaviour accordingly.
9 Conclusions and Further Work

In this work, we have extended the model-driven development method for pervasive systems proposed in [8] in order to develop context-aware pervasive systems. The proposed method applies the MDA and Software Factories guidelines and allows us to properly specify the context information in a set of models and to automatically generate the code of the context-aware pervasive system from these models. The proposed method provides us with:

1. A set of models that allow specifying a context-aware pervasive system.
2. Mechanisms for storing the context information in OWL so that it can be accessed by the system at runtime.
3. An extension of the implementation framework to support the translation of the added models into code and the updating of the OWL specification according to the context information produced at runtime.
4. A transformation engine to translate the context models into Java code.
As further work, we plan to: 1) allow the reconfiguration of the system at the user's initiative, such as creating new users or policies; 2) study machine learning algorithms to predict next actions and extract behaviour patterns in order to execute them automatically; 3) apply reverse engineering in order to adapt the system to context information; and 4) develop more case studies of context-aware pervasive systems.
References

1. Dey, A.K.: Understanding and Using Context. Personal and Ubiquitous Computing (2001)
2. Chen, H., Finin, T., Joshi, A.: An ontology for context-aware pervasive computing environments. Knowledge Engineering Review, Special Issue on Ontologies for Distributed Systems 18(3), 197–207 (2004)
3. Chen, H.: An Intelligent Broker Architecture for Pervasive Context-Aware Systems. PhD thesis, University of Maryland, Baltimore County (2004)
4. Roman, M., Hess, C., Cerqueira, R., Ranganathan, A., Campbell, R.H., Nahrstedt, K.: A middleware infrastructure for active spaces. IEEE Pervasive Computing (2002)
5. Biegel, G., Cahill, V.: A framework for developing mobile, context-aware applications. In: Proceedings of the 2nd IEEE Conference on Pervasive Computing and Communication (2004)
6. Gu, T., Pung, H.K., Zhang, D.Q.: A middleware for building context-aware mobile services. In: Proceedings of IEEE VTC, Milan, Italy (2004)
7. Fahy, P., Clarke, S.: CASS – a middleware for mobile context-aware applications. In: Workshop on Context Awareness, MobiSys 2004 (2004)
8. Muñoz, J., Pelechano, V.: Applying Software Factories to Pervasive Systems: A Platform Specific Framework. In: ICEIS 2006, Paphos, Cyprus, pp. 337–342 (2006). ISBN 972-8865-43-0
9. Object Management Group: Model Driven Architecture Guide (2003)
10. Greenfield, J., Short, K., Cook, S., Kent, S.: Software Factories. Wiley Publishing Inc., Chichester (2004)
11. Cetina, C., Serral, E., Muñoz, J., Pelechano, V.: Tool Support for Model Driven Development of Pervasive Systems. In: MOMPES 2007, Braga, Portugal, pp. 33–41 (2007)
12. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading (1994)
13. http://www.eclipse.org
14. http://www.w3.org/TR/owl-features/
15. Weiser, M.: The Computer for the 21st Century. Scientific American 265(3), 94–104 (1991)
16. Muñoz, J., Pelechano, V., Cetina, C.: Implementing a Pervasive Meetings Room: A Model Driven Approach. In: IWUC 2006, Paphos, Cyprus, pp. 13–20 (2006). ISBN 972-8865-51-1
17. Serral, E., Valderas, P., Muñoz, J., Pelechano, V.: Towards a Model Driven Development of Context-aware Systems for AmI Environments. In: AmI.d 2007, Nice, France (2007)
18. http://msdn2.microsoft.com/en-us/vstudio/aa718368.aspx
19. http://www.osgi.org/
20. Muñoz, J., Ruiz, I., Pelechano, V., Cetina, C.: Un framework para la simulación de sistemas pervasivos. In: UCAmI 2005, Granada, Spain, pp. 181–190 (2005). ISBN 84-9732-442-0
Intelligent System Architecture for Context-Awareness in Ubiquitous Computing Jae-Woo Chang and Seung-Tae Hong Dept. of Computer Engineering Chonbuk National University, Chonju, Chonbuk 561-756, South Korea [email protected], [email protected]
Abstract. In this paper, we design an intelligent system architecture for dealing with context-aware application services in ubiquitous computing. The intelligent system architecture is composed of middleware, a context server, and clients. The middleware component of our intelligent system architecture plays an important role in recognizing a moving node with mobility by using the Bluetooth wireless communication technology, as well as in executing an appropriate execution module according to the context acquired from a context server. The context server functions as a manager that efficiently stores context information, such as a user's current status, physical environment, and the resources of a computing system, into the database server. To verify the usefulness of our intelligent system architecture, we finally develop a context-aware application system based on it, which provides users with a music playing service in a ubiquitous computing environment.
1 Introduction

In traditional computing environments, users actively choose to interact with computers. By contrast, ubiquitous computing is embedded in the users' physical environments and integrates seamlessly with their everyday tasks [1]. Mark Weiser at the Xerox Palo Alto Research Center identified the goal of future computing to be invisible computing [2]. An effective software infrastructure for running applications in ubiquitous computing must be capable of finding, adapting, and delivering the appropriate applications to the user's computing environment based on the user's context. Thus, context-aware systems determine which user tasks are most relevant to a user in a particular context. These tasks may be determined based on history, preferences, or other knowledge of the user's behavior, as well as on the environmental conditions. Once the user has selected a task from the list of relevant tasks, an application may have to move seamlessly from one device to another and from one environment to another based on the user's activity. Context-awareness is one of the most important technologies in ubiquitous computing; it facilitates information acquisition and execution by supporting interoperability between users and devices based on the users' context. In this paper, we design an intelligent system architecture for dealing with context-aware application services in ubiquitous computing. The intelligent system architecture is composed of middleware, a context server, and clients. The middleware component of

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 677–686, 2008. © Springer-Verlag Berlin Heidelberg 2008
our intelligent system architecture plays an important role in recognizing a moving node with mobility by using the Bluetooth wireless communication technology, as well as in executing an appropriate execution module according to the context acquired from a context server. In addition, the context server functions as a manager that efficiently stores context information, such as a user's current status, physical environment, and computing system resources, into the database server. In order to verify the usefulness of our intelligent system architecture, we develop a context-aware application system based on it, which provides a music playing service in a ubiquitous computing environment. The remainder of this paper is organized as follows. The next section discusses related work. In Section 3, we describe the intelligent system architecture for context-awareness, including clients, middleware, and the context server. In Section 4, we present the development of our context-aware application system based on the intelligent system architecture. Finally, we draw our conclusions in Section 5.
2 Related Work

Context-aware computing was first discussed by Schilit and Theimer [3] in 1994. A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task [4]. A context-awareness system is to be fully controlled and guided by time-varying contextual conditions, and the system's progress should remain both predictable and deterministic [5]. In this section, we introduce some typical systems for context-awareness. First, INRIA in France [6] proposed a general infrastructure based on contextual objects for designing adaptive distributed information systems that keep the level of the delivered service despite environmental variations. The contextual objects (COs) were mainly motivated by the inadequacy of current paradigms for context-aware systems. The use of COs does not greatly complicate the development of an application, which may be developed as a collection of COs. They also presented a general framework for context-aware systems, which provides application developers with an architecture for designing and implementing adaptive systems and supports a wide variety of adaptations. Secondly, AT&T Laboratories Cambridge in the UK [7] presented a platform for context-aware computing that enables applications to follow mobile users as they move around a building. The platform is particularly suitable for richly equipped, networked environments. Users are required to carry a small sensor tag, which identifies them to the system and locates them accurately in three dimensions. Thirdly, Arizona State University [8] presented the Reconfigurable Context-Sensitive Middleware (RCSM), which makes use of the contextual data of a device and its surrounding environment to initiate and manage ad hoc communication with other devices.
RCSM provides core middleware services by using dedicated reconfigurable Field Programmable Gate Arrays (FPGAs), a context-based reflection and adaptation triggering mechanism, and an object request broker that is context-sensitive and invokes remote objects based on contextual and environmental factors, thereby facilitating the autonomous exchange of information. Finally, Lancaster University in the UK [9] presented a comprehensive description of the GUIDE project, which has been developed to provide city visitors with a hand-held context-aware tourist guide. The development of GUIDE has involved: capturing a real set of application requirements, investigating the properties of a cell-based wireless communications technology in a built-up environment and deploying a
network based on this technology around the city, designing and populating an information model to represent attractions and key buildings within the city, prototyping the development of a distributed application running across portable GUIDE units and stationary cell-servers.
3 Intelligent System Architecture for Context-Awareness

3.1 System Architecture

Context is any information that can be used to characterize the situation of an entity [4]. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves. In this section, we propose an intelligent system architecture for supporting various context-adaptive systems, which is divided into three components: context server, middleware (fixed node), and moving node (client). First, the context server serves to insert remote objects into an object database and context information into a context database, as well as to retrieve them from both databases. Secondly, a fixed node functions as a middleware component to find, insert, and execute a remote object for context-awareness. Finally, a moving node serves to execute a predefined built-in program according to the context information acquired from the middleware component. Figure 1 shows the intelligent system architecture for supporting context-awareness. Because our intelligent system architecture combines the advantages of the INRIA work with those of the AT&T work, it has a couple of powerful features. First, the middleware component can define context objects describing context information as
Fig. 1. Intelligent system architecture for supporting context-awareness
Fig. 2. Communication of the three components
well as keep track of a user's current location. Secondly, our context server can store context objects and their values depending on a variety of contexts, and it can manage users' current locations acquired from a set of fixed nodes by using spatial indexing. Finally, our client can provide users with adaptive application services based on the context objects. Meanwhile, the context server communicates with a middleware component over a TCP/IP network, while a moving node communicates with a middleware component using Bluetooth wireless communication [10]. In order to make a connection to a fixed node, a moving node maintains the addresses of all the fixed nodes with which it can communicate by using the Bluetooth socket module. Once a connection between them is established, the moving node communicates with the fixed node periodically and determines whether or not the connection between them should be held. Figure 2 shows the communication between each pair of components in the architecture.

3.2 Middleware Component

The middleware component of our intelligent system architecture consists of three layers: the detection/monitoring layer, the context-awareness layer, and the application layer. First, the detection/monitoring layer serves to monitor the locations of remote objects, the network status, and computing resources, i.e., CPU usage, memory usage, bandwidth, and event information related to devices, including Bluetooth. Secondly, the context-awareness layer is the essential part for handling context-aware application services. It is divided into five managers: the script processor, the remote object manager, the context manager, the context selection manager, and the communication proxy manager. The script processor analyzes the content of the context-awareness definition script and executes its specified actions. The remote object manager manages a data structure for all the context objects used in application programs. The context manager manages context and environmental information, including user preferences and user location. The context selection manager chooses the most appropriate context information for the current situation. The communication proxy manager serves to communicate with the context server and to temporarily retain data for retransmission in case of failure. Finally, being executed independently of the middleware component, the application layer provides users with a set of functions to develop various context-aware applications by using the application programming interface (API) of the middleware component. Figure 3 shows the three-layered structure of our middleware component.
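Returning to the connection management described in Sect. 3.1, the moving node's selection of a fixed node can be simulated with a small sketch. Everything here is hypothetical: the addresses, the reachability predicate, and the simulated scan stand in for the real Bluetooth inquiry and socket calls, which are not part of the Java standard library:

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified simulation of a moving node selecting a fixed node.
public class MovingNodeSketch {
    // A moving node keeps the addresses of all fixed nodes it may talk to.
    static final List<String> FIXED_NODE_ADDRESSES =
        List.of("00:11:22:33:44:01", "00:11:22:33:44:02", "00:11:22:33:44:03");

    // Returns the first reachable fixed node, or null if none responds.
    // 'reachable' stands in for an actual Bluetooth connection attempt.
    static String connect(List<String> addresses, Predicate<String> reachable) {
        for (String addr : addresses) {
            if (reachable.test(addr)) return addr;
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulate: only the second fixed node is in range.
        String peer = connect(FIXED_NODE_ADDRESSES, a -> a.endsWith("02"));
        System.out.println("connected to " + peer);
        // A real moving node would now ping this peer periodically and fall
        // back to re-scanning the address list when the connection drops.
    }
}
```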
Fig. 3. Three-layered structure of our middleware component
3.3 Context Server Component

For context-awareness, the context server is required to store and retrieve remote objects and the context information extracted from them. We design a context server that can efficiently manage both the remote objects and the context information using the MySQL DBMS. This choice reduces the development time, compared with building our own storage system, and increases the reliability of the developed system. The designed context server analyzes and executes the content of the packets delivered from the middleware component. That is, the server determines whether a packet's content is contexts or context objects and stores them into the corresponding database. Figure 4 shows the structure of the context server, which is divided into four managers: the communication manager (CM), the packet analysis manager (PAM), the context/object manager (COM), and the MySQL query manager (SQM). Based on its parsing, the PAM calls the most appropriate functions, i.e., context APIs, in the COM. The COM translates the content delivered from the PAM into SQL statements and delivers the SQL statements to the SQM for execution. The context APIs (application programming interfaces) of the COM are ContextDefine, ContextDestroy, ContextInsertTuple, ContextDeleteTuple, ContextSearch, ContextSearchTuples, and
Fig. 4. Structure of context server
ContextCustom. The SQM executes the SQL statements from the COM using the MySQL DBMS and delivers the result to the middleware component via the CM. The SQM includes a MySQL API module implemented using the MySQL C libraries. The MySQL API module executes the SQL statements according to their kind: if the SQL statement is a SELECT statement that returns a result, the module calls the mysql-Reader API; otherwise, it calls the mysql-Nonquery API. The result of the SQL statement is delivered to the middleware component directly.

3.4 Procedure of Context-Aware Services

Using our intelligent system architecture, the procedure for executing context-aware services can be divided into three phases with nine steps, as shown in Figure 5. The first phase is to find remote objects. For this, a moving node broadcasts a connection request signal so as to connect with a middleware component by using Bluetooth (1). The middleware component covering the area of interest analyzes the signal and establishes a connection to the corresponding moving node (2). Once the connection between them is established, the moving node delivers its own information to the middleware component (3). The second phase is to store remote objects. For this, when remote objects are found, the middleware component delivers their information to the context server through a TCP/IP network (4). The server stores the information delivered from the middleware component into the context database (5). The final phase is to execute the services for the remote objects. For this, the server sends the middleware component the context information that is the most suitable for the current situation from the context database (6, 7). The middleware component executes its predefined programs based on the context information (8) and notifies the moving node of the corresponding service being executed (9). The moving node finally executes the service requested by the middleware component.
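As a rough illustration of how the COM might translate a context API call into SQL and how the SQM might pick the right execution path, consider the following sketch. The table layout, method names, and string building are our assumptions for illustration only; the real managers work against MySQL through the C client libraries:

```java
// Hypothetical sketch of the COM/SQM division of labour.
public class ContextServerSketch {
    // COM: translate a ContextInsertTuple-style call into an SQL statement.
    static String contextInsertTuple(String table, String[] columns, String[] values) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
        sql.append(String.join(", ", columns)).append(") VALUES (");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sql.append(", ");
            sql.append('\'').append(values[i]).append('\'');
        }
        return sql.append(")").toString();
    }

    // SQM: choose the reader path for result-returning statements,
    // mirroring the mysql-Reader / mysql-Nonquery split described above.
    static boolean needsReader(String sql) {
        return sql.trim().toUpperCase().startsWith("SELECT");
    }

    public static void main(String[] args) {
        String sql = contextInsertTuple("user_context",
                new String[] {"User_ID", "Location"},
                new String[] {"u001", "DB Lab"});
        System.out.println(sql);
        System.out.println(needsReader(sql));  // false: non-query path
    }
}
```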
Fig. 5. Procedure for context-aware services using our intelligent system architecture
4 Development of Context-Aware Application System

We implement the intelligent system architecture using the GCC 2.95.4 compiler under Red Hat Linux 7.3 (kernel version 2.4.20) on a machine with a 1.7 GHz Pentium-IV CPU and 512 MB of main memory. In order to show its efficiency, we develop a context-aware application system based on the intelligent system architecture. The context-aware application system serves to provide users with a music playing service in a ubiquitous computing environment. In the system, when a user belonging to a moving node approaches a fixed node, the fixed node starts playing the user's preferred music according to his or her location. In general, each user has a list of his or her preferred music, and a user can even have different lists of popular music depending on the time of day, i.e., morning, noon, and night. For example, when a user hearing his or her preferred music in one place moves to another place, the fixed node located in the previous place stops playing the music while the fixed node in the new place starts playing it. The fixed node differentiates one user from another and records the time when a user enters the area of the fixed node in order to play his or her preferred music depending on the time. In the context server, a user record for the music playing application service has six attributes: User_ID, User_Name, Location, Music_M, Music_A, and Music_E. The User_ID serves as a primary key to identify a user uniquely. The User_Name is the user's name, and the Location is the user's current location, which is changed whenever a middleware component finds the location of a moving node. Finally, the Music_M, Music_A, and Music_E attributes represent the user's preferred music files in the morning, at noon, and at night, respectively.
The context server manages a list of music files for a user, processes queries given from a fixed node, and delivers the corresponding music file to the fixed node by its request.
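As an illustration, the six-attribute user record and the time-of-day lookup described above could be sketched as follows. This is our own Python sketch: the field names follow the paper, while the morning/noon/night cut-off hours are our assumption, since the paper does not specify them.

```python
def music_for(user, hour):
    """Return the user's preferred music file for the given hour of day.
    The 12:00 and 18:00 cut-offs are illustrative assumptions."""
    if hour < 12:
        return user["Music_M"]   # morning
    elif hour < 18:
        return user["Music_A"]   # noon / afternoon
    return user["Music_E"]       # evening / night

# A user record with the six attributes held by the context server.
user = {
    "User_ID": 1, "User_Name": "alice", "Location": "DB Lab",
    "Music_M": "morning.mp3", "Music_A": "noon.mp3", "Music_E": "night.mp3",
}

# music_for(user, 9) -> "morning.mp3"
```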
684
J.-W. Chang and S.-T. Hong
We develop our context-aware application system providing a music playing service using Affix 2.0.2 as the Bluetooth protocol stack and GCC 2.95.4 as the compiler, under Redhat Linux 7.3 (kernel version 2.4.20) on an 866 MHz Pentium-III CPU with 64 MB of main memory. In addition, the Bluetooth device follows the Version 1.1/Class 1 specification and connects to PCs via USB interfaces [11]. Because our context-aware application system is implemented as a set of modules, it can easily be turned into a general-purpose system by exchanging the communication module and the DBMS module. To determine whether our context-aware application system works well, we test it using a scenario adopted from Cricket [12], one of the MIT Oxygen projects. We test the execution of the system in three cases: first, when a user covered by a moving node approaches a fixed node or moves away from it; second, when two different users approach a fixed node; and finally, when a user approaches a fixed node at different times. Because the first case is the most general one, we explain it in more detail. For our testing environment, we place two fixed nodes in the database laboratory (DB Lab) and the media communication laboratory (Media Lab) of Chonbuk National University, respectively, where the middleware component can detect a moving node using Bluetooth wireless communication. A corridor of about 60 meters connects DB Lab and Media Lab. We test the execution of the system when a user carrying a moving node moves from DB Lab to Media Lab or in the reverse direction. Figure 6 shows a test case in which a user carrying a moving node approaches a fixed node. First, the fixed node receives a user name from the moving node as the moving node approaches (1).
Secondly, the fixed node determines whether the information of the user has already been stored in the context server. If it has, the context server searches for the music file belonging to the user at the current time and downloads the music file from the database (2). When the fixed node detects that a user is too far away to communicate with, it stops and removes the music playing process. To analyze the performance of our context-aware application system, we measure average times using the beacon boundary detection adopted in Cricket. Table 1 shows the average times needed to become aware of contexts. First, as a moving node approaches a fixed node, it takes 1.34 seconds for the fixed node to make a connection with the moving node. This is the time for the fixed node to detect the presence of a moving node once it enters the communication boundary of the fixed node; it mainly depends on the Bluetooth wireless communication specification. Secondly, it takes 0.50 seconds for the fixed node to start the music playing service after the connection is made. This is the time for the fixed node to search for the profile of the corresponding user and to call the module that plays music; the searching time for a user's profile depends both on the TCP/IP packet transfer time and on the DBMS performance of the context server. Finally, as a moving node moves away from a fixed node, it takes 1.45 seconds for the fixed node to disconnect from the moving node. This is the time for the fixed node to detect the absence of the moving node. The time is relatively long because the kernel keeps trying to communicate with the moving node even after it has left the communication boundary of the fixed node. Therefore, it is very reasonable for the
Intelligent System Architecture for Context-Awareness in Ubiquitous Computing
685
Fig. 6. Test case in which a user approaches a fixed node

Table 1. Average time for connection and disconnection activities
Activity                                                                         Time (sec)
Average time for a middleware component to make a connection to a moving node        1.34
Average time for a middleware component to start a music playing service             0.50
Average time for a middleware component to make a disconnection to a moving node     1.45
fixed node to set the time limit to two seconds. If it took a long time for a fixed node to establish a connection to a moving node and to detect a context from it, a user might consider the situation a fault. Because the context detection time is less than two seconds, the context-aware application system is adequate for the music playing service.
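The timing argument can be summarized as a small sanity check. This is our own sketch, not part of the system; the numbers are the measured averages reported in Table 1.

```python
# Measured averages from Table 1, in seconds.
TIMINGS = {
    "connect": 1.34,        # detect an approaching moving node
    "start_service": 0.50,  # fetch the user profile and start playback
    "disconnect": 1.45,     # detect that the node has left
}
LIMIT = 2.0  # seconds before a user would perceive the service as faulty

def within_limit(event):
    """A single context-detection step must finish inside the limit."""
    return TIMINGS[event] < LIMIT

# Even connect + service start (1.84 s) stays under the two-second budget.
```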
5 Conclusions and Future Work

In this paper, we designed and implemented an intelligent system architecture for supporting context-aware application systems in ubiquitous computing. The middleware component of our intelligent system architecture played an important role in
recognizing a moving node with mobility using Bluetooth wireless communication, as well as in executing an appropriate execution module according to the context acquired from a context server. In addition, the context server functions as a manager that efficiently stores context information, such as the user's current status, the physical environment, and the resources of a computing system, into the database server. To verify the usefulness of our intelligent system architecture, we developed a context-aware application system based on it, which provides users with a music playing service in a ubiquitous computing environment. We tested it using a scenario adopted from Cricket, one of the MIT Oxygen projects. It was shown that our context-aware application system takes about 1.5 seconds to make a connection (or disconnection) between a fixed node and a moving node, which is considered reasonable for our music playing application service. As future work, an inference engine that acquires new context information from the existing contexts should be studied.

Acknowledgement. This work is financially supported by the Ministry of Education and Human Resources Development (MOE), the Ministry of Commerce, Industry and Energy (MOCIE) and the Ministry of Labor (MOLAB) through the fostering project of the Lab of Excellency.
References

1. Banavar, G., Bernstein, A.: Issues and Challenges in Ubiquitous Computing: Software Infrastructure and Design Challenges for Ubiquitous Computing Applications. Communications of the ACM 45(12), 92–96 (2002)
2. Weiser, M.: The Computer for the Twenty-First Century. Scientific American, 94–104 (September 1991)
3. Dey, A.K., Abowd, G.D.: Towards a Better Understanding of Context and Context-Awareness. In: CHI 2000 Workshop on the What, Who, Where, When, and How of Context-Awareness (2000)
4. Dey, A.K.: Understanding and Using Context. Personal and Ubiquitous Computing Journal 5(1), 4–7 (2001)
5. Choy, K.L., Lee, W.B.: A multi-agent-based architecture for enterprise customer and supplier cooperation context-aware information systems. In: Proc. of 3rd Int'l Conf. on Autonomic and Autonomous Systems (2001)
6. Couderc, P., Kermarrec, A.M.: Improving Level of Service for Mobile Users Using Context-Awareness. In: Proc. of 18th IEEE Symposium on Reliable Distributed Systems, pp. 24–33 (1999)
7. Harter, A., Hopper, A., Steggles, P., Ward, A., Webster, P.: The Anatomy of a Context-Aware Application. Wireless Networks 8(2/3), 187–197 (2002)
8. Yau, S.S., Karim, F.: Context-sensitive Middleware for Real-time Software in Ubiquitous Computing Environments. In: Proc. of 4th IEEE Symposium on Object-oriented Real-time Distributed Computing, pp. 163–170 (2001)
9. Cheverst, K., Davies, N., Mitchell, K., Friday, A.: Experiences of Developing and Deploying a Context-Aware Tourist Guide: The GUIDE Project. In: Proc. of 6th Int'l Conference on Mobile Computing and Networking (2000)
10. Bluetooth Version 1.1 Profile, http://www.bluetooth.com
11. Affix: Bluetooth Protocol Stack for Linux, http://affix.sourceforge.net
12. Priyantha, N.B., Chakraborty, A., Balakrishnan, H.: The Cricket Location Support System. In: 6th ACM/IEEE Int'l Conf. on Mobile Computing and Networking (MOBICOM), pp. 32–43 (2000)
User-Based Constraint Strategy in Ontology Matching

Feiyu Lin and Kurt Sandkuhl

Jönköping University, Gjuterigatan 5, 551 11 Jönköping, Sweden
{feiyu.lin, kurt.sandkuhl}@jth.hj.se
Abstract. Ontologies, as essential elements of the semantic web, have been developed for various purposes and in many different domains. Many ontologies are quite extensive and sophisticated, capturing essential knowledge in a domain. However, ontology users or engineers do not only use their own ontologies, but also want to integrate or adapt other ontologies, or even apply multiple ontologies to solve a problem. As ontologies themselves can be heterogeneous, it is necessary to find ways to integrate various ontologies and enable cooperation between them. Ontology matching for finding similar parts in the source ontologies or finding translation rules between ontologies is an important first step. Different strategies (e.g., string similarity, synonyms, structure similarity and strategies based on instances) for determining similarity between entities are used in current ontology matching systems. A lot of existing research work in automatic matching does not sufficiently take into account the user's requirements in the ontology matching process. In the real world, the user's or domain expert's requirements play an important role. In this paper, we propose to integrate a user-based constraint strategy into ontology matching. The proposed approach includes excluding rules, dominant key and extended key definitions, and extended key aggregation.
1 Introduction
Ontologies, as essential elements of the semantic web, have been developed for various purposes and domains. Many ontologies are quite extensive and sophisticated, capturing essential knowledge in a domain. Ontologies are considered an important contribution to solving the data heterogeneity problem. However, ontologies themselves can also be heterogeneous [1]. Klein categorized possible mismatches of ontology heterogeneity at the language level and the ontology level [2]. On the other hand, ontology users or engineers do not only use their own ontologies, but also want to integrate or adapt other ontologies, or even apply multiple ontologies to solve a problem. In this context, it is necessary to find ways to integrate various ontologies and enable cooperation between them. Ontology matching for finding similar parts in the source ontologies or finding translation rules between ontologies is an important first step.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 687–696, 2008. © Springer-Verlag Berlin Heidelberg 2008
In database application domains, schema matching approaches can be classified as schema-level or instance-level approaches. Schema-level matchers only consider schema information (e.g., the properties of elements, name, description, data type, relationship types (part-of, is-a, etc.), constraints, schema structure), but no information about instances. Instance-level matchers use insight into the contents and meaning of schema elements. The main benefit of evaluating instances is an exact characterization of the actual schema elements [3]. In some sense, ontologies and database schemata are quite similar; for example, both provide a vocabulary of terms, and both constrain the meaning of the terms used in the vocabulary. Thus, solutions from schema matching could be beneficial to ontology matching [4]. The research work in this paper focuses on the instance-level part of ontology matching, i.e., we assume that schema-level matching has been performed a priori, for example using different schema-level strategies (e.g., string similarity, synonyms, structure similarity) to calculate the similarity between entities. A lot of existing research work in automatic matching does not sufficiently take into account the user's requirements in the ontology matching process. In the real world, the user's or domain expert's requirements play an important role. For example, during a crime investigation process, DNA should be the primary key to identify people from the expert's point of view. If the DNA of two instances in different ontologies matches, these two instances can be identified as equal and can be integrated. In this paper, we propose a user-based constraint strategy and show how to integrate this strategy into ontology matching. The rest of this paper is organized as follows. In section 2, we give an example that motivates this research. In section 3, we describe existing approaches. In section 4, we propose the user-based constraint strategy.
Applying user-based constraints in multi ontology matching is discussed in section 5. In section 6, a case is studied. Conclusions are given in section 7.
2 Use Case Scenario
In the following we show an example motivating the use of user-based constraints. Based on this example, we first discuss why user-based constraints are needed and then discuss the approach in the context of the general ontology matching problem. We consider the crime investigation process as an example. If the corpse of a dead person is discovered, the police need to answer different questions, e.g., what is the name of the deceased? Was he/she murdered? Who is the suspect? Since the crime investigation domain is complex and dynamic, we need new ways of information extraction and knowledge representation which are based on ontologies [5]. However, an ontology allowing the machine to fully understand the problem would probably be too complex. Suppose we have three ontologies: corpse, missing people and suspect. The partial corpse ontology (see Figure 1; some parts of the corpse ontology are based on [5]) has different properties which are connected to concepts, e.g., DNA, height, sex,
Fig. 1. Partial Corpse Ontology
Fig. 2. Partial Missing People Ontology
age, blood type, hair, estimated death time, criminal character, etc. The partial missing people ontology (see Figure 2) has different properties which are connected to concepts, e.g., DNA, height, sex, age, blood, hair color, name, birthday, missing date, etc. The suspect ontology (see Figure 3) contains suspect characteristics such as fingerprint, motive of crime, availability of crime time, etc. We assume there are instances in corpse, missing people and suspect. The necessity of user-based constraints can be motivated by the following aspects:
– There is no primary key in the ontologies. For example, to identify whether the corpse is in the missing people ontology, the DNA could be the primary key. Even with the result corpse.DNA = missing people.DNA from schema-level matching, the machine cannot know the importance of the DNA concept during the matching. The user or domain expert needs to set the primary key when a 1–1 correspondence is needed in the ontology matching.
– There is no extended key in the ontologies. If there is no common key, a combination of properties can be used to calculate the similarity of instances.
Fig. 3. Partial Suspect Ontology
For example, the combination of properties such as height, age, cloth, and hair can be used to calculate the matching probability between the corpse and all missing people.
– Semantic constraints are required. For example, neither automatic ontology matching on the schema level nor on the instance level can find the relationship between age in corpse and birthday in missing people. The user or domain expert needs to set correspondences such as age = (current year) − (born year(birthday)).
– Mismatches can result from automatic schema-level matching. For example, there are two blood type concepts in the corpse ontology (see Figure 1): one is the corpse's own blood type, which belongs to body character; the other is another person's blood, found in the surroundings of the crime scene or on the corpse. After automatic schema-level matching between corpse and missing people, both blood type concepts in corpse will be found to correspond to blood in missing people. The user should clarify that body character's range blood type corresponds to blood in missing people. But if the ontology matching is between corpse and suspect, the user needs to declare that biological trace's range blood type in corpse corresponds to blood in suspect.
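The age/birthday correspondence in the list above can, for example, be encoded as a small conversion rule. The following is our own Python sketch: the helper names and the 2-year tolerance are illustrative assumptions, and we use the conventional direction age = current year − born year(birthday).

```python
from datetime import date

def age_from_birthday(birthday, reference):
    # User-defined correspondence: age = current year - born year(birthday)
    return reference.year - birthday.year

def age_constraint_holds(estimated_age, birthday, reference, tolerance=2):
    """True if the estimated age and the birthday-derived age agree
    within an expert-chosen tolerance (in years)."""
    return abs(age_from_birthday(birthday, reference) - estimated_age) <= tolerance

# A corpse with estimated age 30, found on 2007-11-01, versus a missing
# person born 1975-10-01: derived age 32, within a 2-year tolerance.
match = age_constraint_holds(30, date(1975, 10, 1), date(2007, 11, 1))
```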
3 Related Work
Euzenat summarized different constraint-based approaches, e.g., comparing entities' properties, the ranges of properties, entities' cardinality or multiplicity, and the transitivity or symmetry of properties [6]. For example, if the key id in Product has the same type as the key isbn in Volume, these properties should correspond if the classes are the same. However, these approaches are based on the internal structure of entities. In the database integration research area, the entity identification problem is to determine whether different object instances from multiple databases refer to the same real-world entity. Lim et al. summarized five categories of approaches to solving the entity identification problem, e.g., using key equivalence, user-specified equivalence, use of probabilistic key equivalence, use of probabilistic attribute
equivalence, and use of heuristic rules. However, these approaches have limitations. They therefore proposed the use of extended keys and semantic constraint information to determine the same object in different databases at the instance level [7]. In this approach, the extended key is the set of keys or identity rules needed to identify an instance in the real world. The database administrators supply the semantic constraint information, which is formulated as identity rules and used as part of the extended key. We adapt this extended key approach and apply it to ontology matching. We also propose a polygon-based method to aggregate the extended key result. Ehrig and Sure mentioned that their implemented approach was based on manually encoded mapping rules [8]. The ontology domain expert formulated machine-interpretable rules giving hints on whether two entities are identical. However, the ontology matching itself is not yet encoded through rules.
4 User-Based Constraint Strategy
In this section, we first introduce the general approach of the user-based constraint strategy, then we explain the parts of the approach using examples. The identification of a corpse among the missing people, as described in section 2, is chosen as our example. We assume that automatic schema-level ontology matching has been performed a priori. Before applying instance-level matching, the user can define constraints as follows.

4.1 Define Semantic Constraint
Based on the automatic schema-level results, the user or domain expert has to supply a semantic constraint complement. Let O1 and O2 represent two ontologies, and let C1 and C2 be concepts from O1 and O2, respectively. The correspondence between C1 and C2 is expressed as O1.C1 = O2.C2; required operators (e.g., +, −, etc.) can be included in the equation. For example, the correspondence between age in corpse and birthday in missing people can be set as a rule on the schema level:

corpse.(body character).age = (current year) − (missing people).(born year(birthday))

The user should be able to express real-world concept correspondences in order to enhance the schema-level results of ontology matching. For example, in the ontology matching between corpse and missing people, the user should declare that the similarity of body character's range blood type with blood in missing people follows the rule:

corpse.(body character).(blood type) = (missing people).blood

4.2 Define Excluding Rules
Assume that i1 and i2 are instances of O1 and O2, respectively, and that there exists a concept correspondence O1.C1 = O2.C2. The excluding rule is then:

V(O1.i1).C1 ≠ V(O2.i2).C2 → (i1 ≠ i2).
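Such excluding rules are straightforward to encode. The following sketch is our own Python encoding (not the authors' implementation); the attribute names come from the example ontologies, and each rule rules an instance pair out as soon as one corresponding attribute pair mismatches.

```python
def sex_rule(corpse, candidate):
    # V(corpse).sex != V(candidate).sex  ->  not the same person
    return corpse["sex"] != candidate["sex"]

def blood_rule(corpse, candidate):
    return corpse["blood_type"] != candidate["blood"]

EXCLUDING_RULES = [sex_rule, blood_rule]

def excluded(corpse, candidate):
    """True if any excluding rule fires, i.e. the candidate cannot be
    the same real-world entity as the corpse."""
    return any(rule(corpse, candidate) for rule in EXCLUDING_RULES)

corpse = {"sex": "female", "blood_type": "A"}
candidates = [
    {"sex": "male", "blood": "A"},
    {"sex": "female", "blood": "B"},
    {"sex": "female", "blood": "A"},
]
survivors = [c for c in candidates if not excluded(corpse, c)]
# survivors keeps only the female, blood-type-A candidate
```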
V means the concept's value of the instance. Excluding rules should be simple, so that their result can be obtained easily. For example, if the corpse is female, the male instances in missing people can be excluded immediately. The user can define the rule as:

((corpse.i1).(body character).sex = female) ∧ (((missing people).i2).sex = male) → (i1 ≠ i2).

Another example: if the corpse's blood type is A, instances with other blood types in missing people can be excluded. A rule supporting this example is:

((corpse.i1).(body character).blood type = A) ∧ (((missing people).i2).blood ≠ A) → (i1 ≠ i2).

4.3 Define Dominant Key
Setting a dominant key that identifies the instance is another part of the user-based constraint strategy. If there are several dominant keys, the user should define different priorities for them. For the identification of a corpse among the missing people, fingerprint, DNA and iris can each be used to identify the instance in question independently. Due to the difficulty of the procedures, comparing fingerprints is easier than comparing irises, and comparing DNA is the most difficult. The user can define the priority of the dominant keys as follows:

corpse.fingerprint > corpse.iris > corpse.DNA

4.4 Define Extended Key
If there is no dominant key for the instances, but there is a combination of concepts or properties applicable as an extended key from which the similarity of instances can be calculated, the user can define the rules of combination to be applied. For example, if there are no records about fingerprints, DNA and iris in the missing people ontology, the combination of a person's age, height, hair and dressing can be a good reference to identify the corpse; of course, the death time should be near or after the date the person disappeared. For example, the corpse's age is 30, its height is 180 cm, its hair is brown, its clothing is a T-shirt, and the estimated death time is 2007-11-01. One missing person in the ontology has the following information: birthday 1975-10-01, height 178 cm, hair gold, dressing T-shirt, missing date 2007-10-28. Assuming the user defines that every 2 years of age difference leads to 10 percent dissimilarity, the age similarity between corpse and missing person would be:

1 − (|V(corpse.i1).(body character).age − V(missing people.i2).((current year) − (born year(birthday)))| / 2) × 0.1

For all concepts or properties in the extended key, the user can define special rules for the correspondence similarity. Based on the user's rules, assume that we get the similarity results shown in Table 1.

4.5 Polygon-Based Aggregation of the Extended Key
In this section, we extend our polygon-based aggregation approach introduced in [9]. We aggregate extended keys by using polygons. To compare similarity
Table 1. Extended Key Similarity Between Corpse and Missing People

Corpse                              Missing People             Similarity
age (30)                            birthday (1975-10-01)      0.9
height (180cm)                      height (178cm)             0.9
estimate death time (2007-11-01)    missing date (2007-10-28)  0.8
cloth (T-shirt)                     dressing (T-shirt)         1.0
hair (brown)                        hair color (gold)          0.7
SubArea (Missing People) = 1.7404, BigArea (Corpse) = 2.3776
Fig. 4. Comparing Two Instances in Polygons
between instances, we compare the areas of the polygons corresponding to each instance, i.e., if the polygons have exactly the same area, the similarity between the two instances represented by the polygons is maximal. We use the following rules to map extended key values to polygons:
– Choose the standard ontology. All values of the standard instance are marked as 1 unit when creating the polygon.
– Circle radii are used to represent the similarity values of the instances. The circle is divided by the number of similarity elements, and the points are added clockwise. The polygon is constructed by connecting the points created. When a similarity is 0, its node is not taken into account when connecting the points.
For example, the corpse instance is chosen as the standard instance; there are five elements, so there is one radius every 72°. Every radius is 1 unit. The polygon of the corpse instance is constructed as the large polygon in Figure 4. In the same way, the instance of missing people is drawn as the small, filled polygon in Figure 4, whose radius values are based on the similarity results (see Table 1). The similarity of the two instances, based on the ratio of their polygon areas, is then:

1.7404 / 2.3776 = 0.7320    (1)
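The polygon construction and area comparison can be reproduced with a few lines of code. This is our own Python sketch of the method: a polygon whose vertices lie on evenly spaced rays decomposes into triangles of area ½·r_i·r_{i+1}·sin θ, which reproduces the areas and the ratio from Eq. (1).

```python
import math

def polygon_area(radii):
    """Area of the polygon whose vertices lie on rays spaced evenly
    around a circle, at the given radial distances. Each adjacent
    vertex pair contributes a triangle of area
    1/2 * r_i * r_{i+1} * sin(theta)."""
    n = len(radii)
    theta = 2 * math.pi / n
    return 0.5 * sum(radii[i] * radii[(i + 1) % n] * math.sin(theta)
                     for i in range(n))

# Similarity values from Table 1, taken in clockwise order.
similarities = [0.9, 0.9, 0.8, 1.0, 0.7]

sub_area = polygon_area(similarities)      # ~1.7404 (missing people polygon)
big_area = polygon_area([1.0] * 5)         # ~2.3776 (unit corpse pentagon)
instance_similarity = sub_area / big_area  # ~0.7320
```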
5 User-Based Constraint in Multi Ontology Matching
After processing automatic schema-level matching and user-based constraint instance matching, the source ontology instances are transformed into target ontology instances according to the used relations. This leads to an enrichment of the knowledge captured in the source ontology. The enriched knowledge of the ontology is useful in multi ontology matching, as comparisons become possible that could not be performed without the enrichment. There are two ways to represent the enriched ontology. One way is merging the source and target ontologies into a new ontology and combining the instances based on the new ontology structure. The other way is keeping the ontologies independent, but including the new knowledge based on rules. We focus on the second way in this paper. For example, to answer the question "who is the suspect?" in the corpse example, the ontologies for corpse, missing people and suspect need to be considered together. If the corpse can be identified, or a correspondence or similarity to a missing people instance is discovered, the corpse instance knowledge is expanded to include the missing people instance knowledge, e.g., name, social relationship, professional, etc. To include correspondences of missing people instances into corpse instances, the user can define rules such as:

(corpse(i1) = missing people(i2)) → (Vcorpse(i1).name = Vmissing people(i2).name), (Vcorpse(i1).professional = Vmissing people(i2).professional), (Vcorpse(i1).social relationship = Vmissing people(i2).social relationship), ...   (2)

Based on the matching of the corpse and suspect ontologies, the criminal character information is compared, e.g., fingerprint, footprint, blood, availability of crime time. However, based on the corpse and missing people ontology matching results, more information can be examined. For example, the social relationship between the corpse and the suspect can be helpful to identify the motive of the crime; this can be checked by looking for the suspect's name in the missing person's social relationship, or for the missing person's name in the suspect's society relationship. The rule can be formulated as:

(corpse(i1) = missing people(i2)) ∧ (Vmissing people(i2).name ⊆ Vsuspect(i3).social relationship) → (suspect(i3) has a social relationship with corpse(i1))   (3)

6 Case Study
The ontologies corpse, missing people and suspect are our test examples. Let us assume there is an instance in the corpse ontology which shall be identified in the missing people ontology, which has 1000 instances. The user-defined excluding
rules are applied first, because comparing the dominant keys (like fingerprint, iris and DNA) is more complicated than applying excluding rules, such as excluding all male instances. If 550 persons are male and the others female, and if the corpse is female, the number of candidates decreases from 1000 instances to 450. Based on the user-defined priority for the dominant keys, other attributes like age, hair, etc. are not compared in this process. If the dominant key values are missing and the corpse cannot be identified this way, the extended key will be applied, i.e., the dominant key will be skipped. User-defined semantic constraints can provide more information and improve the accuracy of the results. Without user-defined rules, all the attributes of the instances would have to be compared, which obviously is less efficient. Furthermore, if the results of such a comparison were integrated, this could lead to incorrect results. User-defined constraints including rules are helpful in multi ontology matching. These rules make comparisons possible which are not feasible without such constraints or rules. For example, as shown in section 5: does the suspect have a social relationship with the corpse? This question cannot be answered by comparing corpse and suspect directly. If the corpse can be found in the missing people and if the user defines instance-including rules, this question can be answered. The above discussion of the case study shows that our approach can contribute to performance improvement by decreasing the amount of computation. Furthermore, the approach leads to improved efficiency and exactness. However, the approach requires that the user or domain expert is involved in the ontology matching process. But in many application cases, the user or domain expert plays a very important role anyhow (as shown in section 2) and should be involved.
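The filtering order of this case study (cheap excluding rules first, then dominant keys in the user-defined priority order, then the extended key as a fallback) can be sketched as follows. This is our own illustrative Python encoding, not the authors' implementation, and the attribute names are assumptions.

```python
DOMINANT_KEYS = ["fingerprint", "iris", "DNA"]  # user-defined priority order

def identify(corpse, candidates, extended_key_similarity):
    # 1. Excluding rules: cheap tests first, e.g. drop instances
    #    of the wrong sex before any expensive comparison.
    pool = [c for c in candidates if c["sex"] == corpse["sex"]]
    # 2. Dominant keys, in priority order; each identifies on its own.
    for key in DOMINANT_KEYS:
        if corpse.get(key) is not None:
            hits = [c for c in pool if c.get(key) == corpse[key]]
            if hits:
                return hits
    # 3. No usable dominant key: fall back to the extended key,
    #    ranking the remaining candidates by aggregated similarity.
    return sorted(pool, key=lambda c: extended_key_similarity(corpse, c),
                  reverse=True)

# 1000 candidates, 550 of them male: the excluding rule alone shrinks
# the pool to 450 before any dominant-key comparison takes place.
pool_demo = identify({"sex": "female"},
                     [{"sex": "male"}] * 550 + [{"sex": "female"}] * 450,
                     lambda corpse, cand: 0.5)
```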
7 Conclusions and Future Work
In this paper, we discussed and motivated the need for user-based constraints in ontology matching. The user or domain expert needs to provide semantic constraints during the ontology matching process. The proposed approach for user-based constraints includes excluding rules, dominant key and extended key definitions, and extended key aggregation. We illustrated how to integrate user-based constraints into the ontology matching process. We also proposed using a polygon-based approach to aggregate the extended key's values when determining similarity. Furthermore, the application of user-based constraints in multi ontology matching was discussed. One of the problems of the polygon-based approach is, as discussed in [9], how to add weights to the polygons reflecting the importance of extended keys. For example, in the case of corpse identification using an extended key (see section 4.5), the user currently cannot set special weights for the key values, such as "the age value has higher importance compared to the other keys". This will be addressed in future work. The main limitation of our work is currently that it has not been implemented and tested in real-world settings. Future work will include integrating user-based constraints into an ontology matching system.
Based on this integration, we will be able to perform experiments comparing the performance and results of ontology matching with and without user-based constraints.
Acknowledgements Part of this work was financed by the Hamrin Foundation (Hamrin Stiftelsen), project Media Information Logistics.
References

1. Euzenat, J., Castro, R.G., Ehrig, M.: D2.2.2: Specification of a benchmarking methodology for alignment techniques. Technical report, NoE Knowledge Web project deliverable (2004)
2. Klein, M.: Combining and relating ontologies: an analysis of problems and solutions (2001)
3. Rahm, E., Bernstein, P.: A survey of approaches to automatic schema matching. The VLDB Journal 10(4), 334–350 (2001)
4. Shvaiko, P., Euzenat, J.: A survey of schema-based matching approaches. Journal on Data Semantics (2005)
5. Dzemydiene, D., Kazemikaitiene, E.: Ontology-based decision support system for crime investigation processes. Information Systems Development, 427–438 (2005)
6. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)
7. Lim, E.-P., Srivastava, J., Prabhakar, S., Richardson, J.: Entity identification in database integration. In: Proceedings of the Ninth International Conference on Data Engineering, April 19–23, 1993, pp. 294–301 (1993)
8. Ehrig, M., Sure, Y.: Ontology mapping – an integrated approach. In: Bussler, C.J., Davies, J., Fensel, D., Studer, R. (eds.) ESWS 2004. LNCS, vol. 3053, pp. 76–91. Springer, Heidelberg (2004)
9. Lin, F., Sandkuhl, K.: Polygon-based similarity aggregation for ontology matching. In: Frontiers of High Performance Computing and Networking – ISPA 2007 Workshops. LNCS, pp. 255–264. Springer, Berlin (2007)
RFID-Based Interactive Learning in Science Museums

Yo-Ping Huang 1, Yueh-Tsun Chang 2, and Frode Eika Sandnes 3

1 Dept. of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 10608
[email protected]
2 Dept. of Computer Science and Information Engineering, Tatung University, Taipei, Taiwan 10451
[email protected]
3 Faculty of Engineering, Oslo University College, Oslo, Norway
[email protected]
Abstract. Science and engineering education play a very important role in the industrial development of a country. Although most people are aware of the importance of science, students often find it boring and uninteresting, as traditional textbooks only provide static information. Students who achieve better results usually have practical experience related to the subject or have been exposed to interactive learning tools. In order to promote science education, a learning system based on RFID is proposed that provides a personalized learning service. This paper introduces the architecture of the proposed service and the technologies used, including RFID and collaborative filtering. The contributions of the proposed system and how profiles are used to provide personalization are also discussed. Evaluations reveal that the system inspires and nurtures visitors' interest in science. Keywords: Digital learning, RFID, collaborative filtering, learning assistant service.
1 Introduction Museums and galleries are often established to raise public awareness of a particular subject, to provide inspiration and to offer general education. Many such facilities do not sufficiently manage to captivate the audience, and visitors consequently do not return. One problem is that there may be a lack of trained guides present. Guides are sometimes only offered at certain times and only for groups of visitors. A good guide can make all the difference between a mediocre and a memorable museum experience, as a talented guide is able to captivate the audience with stories. To compensate for the lack of human guides, visitors can often borrow devices such as headsets, handheld controllers and audio playback devices. Pre-recorded audio is then used to present exhibition-related introductions to the visitors. However, static audio is not interactive and may be too uninteresting for very young visitors with a short concentration span. Another problem is that some visiting routes may be blocked by visitors grouped around other popular exhibitions. F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 697–709, 2008. © Springer-Verlag Berlin Heidelberg 2008
698
Y.-P. Huang, Y.-T. Chang, and F.E. Sandnes
In order to make the museum guide experience more dynamic, multimedia content and tests (learning objects) designed by educators who are experts in the specific domains are combined with an interactive learning assistant. The learning assistant builds on a visitor's personal profile [14-17], RFID technology where each visitor carries a card with an embedded RFID tag holding a unique identification number, collaborative filtering, and a history of learning records, which are combined to provide a personalized learning service. The service is called the Hands Free Interaction Guide System. Users obtain learning materials and questions by touching the sensor installed at the viewing kiosks [1] with their RFID-equipped membership card. The system also provides an optimal route recommendation derived from the user's browsing history [2-4]. This system has been in active use at the National Taiwan Science Education Center (NTSEC) in Taipei, Taiwan since October 2007. This paper is structured as follows. Section 2 introduces the motivation, the system architecture and selected components of interest. The strategy for generating personalized interactive contents by collaborative filtering is illustrated in Section 3, together with a brief introduction to RFID technology. Section 4 presents an evaluation of the system based on visitors of different age groups. Section 5 concludes the paper.
2 Motivation and Technologies The goal of the system is to make science education more attractive. Visitors cannot always be given a guided tour when they come to the education center, yet the learning outcome depends on the skills of the guides who explain the exhibitions. In order to provide a self-learning environment, interactive kiosks were designed to provide age-appropriate content when visitors are browsing unsupervised. The objective was to replace the professional guide. Guides use various pedagogical strategies to make the contents interesting and attractive: different descriptions are used depending on the age of the students, and the students are asked suitable questions to stimulate the learning process. In order to provide personalized learning, RFID technology is used to associate each user with a unique identification. 2.1 Radio Frequency Identification Radio Frequency Identification (RFID) technology comprises a reader and a tag. The reader receives the identity of an object from the embedded tag wirelessly using radio waves. RFID has been successfully applied in logistics, healthcare, small-value payments and personal identification, with Wal-Mart being one well-known case. Researchers have also experimented with embedding a small capsule containing an RFID tag in their arms to assess its effectiveness and practicability as a form of personal ID for physical access control and healthcare. In our system, the RFID-embedded membership card is used to uniquely identify visitors. A trial card is also provided to occasional visitors, but it only provides a limited service and cannot be used to gather personalized information.
When visitors touch the kiosk sensor with their cards, collaborative filtering is used to make recommendations [5]. Collaborative filtering differs from rule-based mining and classification [11, 12], which are also used to provide better learning recommendations. The science education center has a range of interactive displays, such as simple electric circuits, vacuum discharge and solar batteries. Some of the displays are permanent while others are only on display temporarily for a particular theme or exhibition. 2.2 Collaborative Filtering Collaborative filtering [6, 7] uses the profiles of visitors as the building blocks for making recommendations. Such information is gained from the browsing and learning activities of visitors who have similar or related interests and learning behaviors. Item-based filtering is used as the recommendation mining method. The purpose of the adopted methodology is to connect different science subjects based on the visitors' selections and on rules pre-defined by educators based on their teaching experience. The selection records of visitors can be used to establish associations between subjects. For example, the interactive kiosk collects visitors' responses to a learning test, i.e., wrong or right answers [8, 10, 13, 18]. After connecting the subjects covered by the visitors, their learning habits are estimated. Thus, the methodology can retrieve relationships to other subjects while the visitor is learning a specific subject. The following equation is used for calculating the similarity of learning efficiency between subjects.
\[
\operatorname{sim}(a, b) = \operatorname{corr}_{ab} = \frac{\sum_{j=1}^{N}\left(p_{aj} - \bar{p}_{a}\right)\left(p_{bj} - \bar{p}_{b}\right)}{\sqrt{\sum_{j=1}^{N}\left(p_{aj} - \bar{p}_{a}\right)^{2}}\,\sqrt{\sum_{j=1}^{N}\left(p_{bj} - \bar{p}_{b}\right)^{2}}} \tag{1}
\]
Then, the most suitable subject for recommendation is determined by:
\[
\operatorname{Score}(c, j) = \frac{\sum_{i \in H}\left(p_{ij} - \bar{p}_{i}\right) \times \operatorname{sim}(c, i)}{\sum_{i \in H} \operatorname{sim}(c, i)} \tag{2}
\]
where \(p_{aj}\) represents subject a's rating by visitor j and \(\bar{p}_{a}\) is subject a's average rating over all visitors. As illustrated in Table 1, subjects viewed by a visitor are assigned 1, while subjects not viewed are assigned 0. In this example, there is a high similarity of learning between subjects B and E. Therefore, any visitor who has viewed subject B but not subject E should be recommended subject E. After the recommendations are calculated, the system accordingly suggests that visitor E view subject E after viewing subject B. The system can also employ pre-defined rules based on the recommendations of educators and exhibition managers. This may be useful for suggesting browsing routes for new exhibitions that lack a browsing history. The browsing records of visitors and their positions have to be continually monitored for the suggestions to be valuable [9].
Table 1. Example of collaborative filtering

            Visitor A  Visitor B  Visitor C  Visitor D  Visitor E
Subject A       0          1          0          0          1
Subject B       1          1          0          1          1
Subject C       0          0          1          0          0
Subject D       1          0          0          0          0
Subject E       1          1          0          1          0
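As an illustration, Eqs. (1) and (2) can be computed directly from the Table 1 ratings. The sketch below is ours, not code from the paper; the function and variable names are our own, and the 0/1 entries are the viewed flags from the example.

```python
import math

# Table 1: each subject's 0/1 viewing flags for visitors A..E.
views = {
    "A": [0, 1, 0, 0, 1],
    "B": [1, 1, 0, 1, 1],
    "C": [0, 0, 1, 0, 0],
    "D": [1, 0, 0, 0, 0],
    "E": [1, 1, 0, 1, 0],
}

def sim(a, b):
    """Pearson correlation between two subjects' rating vectors (Eq. 1)."""
    pa, pb = views[a], views[b]
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    num = sum((x - ma) * (y - mb) for x, y in zip(pa, pb))
    den = (math.sqrt(sum((x - ma) ** 2 for x in pa))
           * math.sqrt(sum((y - mb) ** 2 for y in pb)))
    return num / den if den else 0.0

def score(c, j, history):
    """Recommendation score of subject c for visitor j (index into the
    rating vectors), given the subjects in the visitor's history H (Eq. 2)."""
    means = {i: sum(views[i]) / len(views[i]) for i in history}
    den = sum(sim(c, i) for i in history)
    num = sum((views[i][j] - means[i]) * sim(c, i) for i in history)
    return num / den if den else 0.0

# Subject E is the most similar to subject B, matching the text:
print(max((s for s in views if s != "B"), key=lambda s: sim("B", s)))  # prints: E
```

With these flags, sim("B", "E") is about 0.61, so a visitor who has viewed B but not E receives E as a recommendation.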
3 System Components and Architecture To ensure adaptability, expandability and stability, the system has been divided into two components, namely the interactive kiosks and the central server. Our implementation covers 20 different science subjects distributed across three floors. Each interactive kiosk serves a single subject. The 20 kiosks are connected to a single central server that collects all the browsing records and calculates personal learning profiles for the visitors. The personal learning history can later be accessed from the website using a membership account. Physical constraints on installing a wired network in the exhibition area forced the system to use wireless connections for most interactive kiosks, which also causes some connection problems in the system design. Consequently, to ensure stability and a high quality of service, a three-layer control design was adopted to prevent the learning service from crashing. As illustrated in Fig. 1, the network reconnection checking module is located in the first layer to ensure all connections are running correctly. The module periodically attempts to reconnect to the central server whenever a network connection is broken. A reconnection module is deployed in the network checking layer to resolve wireless network interruptions, prevent resource exhaustion and prevent wireless devices from overheating. Network failures usually occur unpredictably, yet the service kiosk must maintain users' learning records. The reconnection module therefore notifies the learning service layer, which uploads the learning records once the connection is available again. This procedure guarantees that visitors can retrieve their records despite network failures. The network checking layer thus ensures a stable and robust connection. In the second layer, the learning service provides teaching materials and tests according to the visitor's profile.
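The record-buffering behaviour described above (learning records are kept locally and uploaded once the connection returns) can be sketched as follows. This is an illustrative sketch only; the class and method names are our assumptions, not the system's actual implementation.

```python
from collections import deque

class RecordUploader:
    """Buffer learning records locally and flush them to the central
    server whenever the network connection is available."""

    def __init__(self, send):
        self.send = send          # callable that raises ConnectionError when the network is down
        self.pending = deque()    # records not yet confirmed by the server

    def upload(self, record):
        """Queue a new learning record and try to flush the whole buffer."""
        self.pending.append(record)
        return self.flush()

    def flush(self):
        """Push buffered records in order; stop at the first failure and
        keep the remainder for the next reconnection attempt."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False      # network still down; records are kept
            self.pending.popleft()
        return True
```

A record is only removed from the queue after the send call succeeds, so no record is lost if the wireless link drops mid-session.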
Every kiosk is equipped with an embedded RFID reader near its screen. The reader senses the RFID tag embedded in the visitor's member card and verifies the visitor's profile, which is retrieved wirelessly from the central server. The central server's recommendation engine determines the suitable learning materials. For each subject there are three levels of learning materials, and each level comprises unique multimedia contents and tests. Visitors view the contents via an interactive interface using a trackball or a touch panel. In total the system contains 60 independent digital learning objects (20 kiosks with three levels each). The system automatically selects a suitable level according to the visitor's profile, but the visitor can also choose a
different level manually. Fig. 2 shows that the visitor's name is displayed after the visitor logs into the system, and that the user may select the level manually. After the visitor has watched the media, the system asks the visitor three questions to assess the learning behavior. These personal records are stored on the central server and used for subsequent recommendations. Although steps are taken to ensure a stable network link, the system may run out of CPU resources. To overcome this problem, a resource controller is implemented in the third layer. This controller periodically monitors the system resources and restarts the learning service when the resource utilization exceeds a given threshold. Through experimentation, 80% CPU usage sustained for 20 seconds was found to be a suitable threshold. This three-layer design is shown in Fig. 1. Each layer is subdivided into several sub-objects. The first layer comprises the reconnection and network status monitors. The RFID, multimedia and recommendation modules are implemented on the interactive kiosks and the server. The third layer contains the system reloader that restarts the service and the resource surveillance that invokes the reloader.
Fig. 1. System architecture (network checking layer: reconnection and network status; learning service layer: RFID, multimedia and recommendation modules; resource controller layer: system reloader and resource surveillance)
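The third-layer policy (restart the learning service when CPU usage stays above 80% for 20 seconds) amounts to a sustained-threshold check. The sketch below is ours: it is driven by periodic usage readings rather than a real system monitor, and the function name is hypothetical.

```python
def should_restart(readings, threshold=80.0, sustain=20.0, interval=1.0):
    """Return True once CPU usage has stayed above `threshold` for at
    least `sustain` seconds, given one reading per `interval` seconds.
    The 80% / 20 s defaults are the values found experimentally in the text."""
    over = 0.0
    for usage in readings:
        # Accumulate time spent over the threshold; any dip resets the clock.
        over = over + interval if usage > threshold else 0.0
        if over >= sustain:
            return True
    return False
```

In deployment, a surveillance loop would feed this check with live CPU readings and invoke the system reloader as soon as it returns True.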
The main service is located in the learning service layer. In this layer, the personalized learning procedure is designed by educators and generated automatically by the server. The learning system architecture is shown in Fig. 2. It is divided into three main parts, namely login, learning and response. The components enclosed by rectangles and ellipses are located on the kiosk and the server, respectively. Initially, visitors log into the
system using their member card. When the visitor is logged in, a personalized welcome message is sent from the server. It is generated according to the unique ID and contains the visitor's name and a customized message, for example a birthday or Christmas greeting. The server also identifies which materials and tests are most suitable for the visitor. The learning record is sent to the server and stored. The interactive kiosk then switches to the welcome screen to await the next visitor. The process is illustrated in Fig. 2.
Fig. 2. The system flowchart (login part: personalized welcome and topic displayer; learning part: suitable learning materials and tests; response part: personalized response and recommendations, learning record)
The hardware configuration of the system is shown in Fig. 3. The interactive kiosk includes an RFID reader, a display and a wireless device. All the learning materials are pre-stored on each kiosk to guard against unpredictable crashes; each kiosk stores sixty videos and up to 600 questions. To prevent malicious attacks and damage, each service kiosk is made of durable materials and the display panel is protected by a transparent plastic plate. The server stores the visitor's learning history in the learning material database and responds with recommendations based on the visitors' learning histories. A single server is capable of serving more than 100 interactive kiosks, depending on the available network bandwidth and the processing power of the machine. In this specific case, the implemented server serves only 20 kiosks.
Fig. 3. The hardware setup (a central server with the learning material database serving multiple interactive stations and their visitors)
In collaboration with a team of educators, a series of multimedia learning objects was designed. Examples of the personalized welcome and celebration messages are shown in Fig. 4(a) and Fig. 4(b), respectively. The system also connects to the server to retrieve the profile of the current visitor in order to provide further services, as shown in Fig. 4(b). The kiosk supports a variety of celebration events, including birthday celebrations, Chinese New Year, Father's Day, Mother's Day, Christmas, etc. The interface also reminds visitors of their membership expiration date. The system selects one of three available levels of learning material for the visitor, depending on the visitor's historical record. Each level contains three different questions that are presented to the user after an instructional video is played. The level selection and a teaching example are illustrated in Fig. 4(c) and 4(d). These questions are devised by educators with a strong track record in science education and supervision. In most cases, visitors prefer to watch Flash videos with rich animation rather than textual descriptions. Thus, most learning materials are presented as animations to attract visitors' attention. However, not every subject can easily be transformed into a simple animation, and a few textual materials are still used at some specific exhibitions. Fig. 4(e) shows the question interface. After the visitor answers, the system stores the answer in the server database. When the learning session is finished, the kiosk retrieves the visitor's e-mail account from the database and asks whether to send the learning records to the visitor. The user can choose whether an e-mail with information about the visit is to be forwarded. A website is provided that allows visitors to later browse and download visitor-specific items related to their museum visit. These learning histories can later be browsed at home through a web-based interface. Visitors
can also print their personal learning record for reference. Generated recommendations presented on the interactive kiosks are shown in Fig. 4(f). If visitors leave their e-mail address, the system can also send them the day's learning history via e-mail; the interface is shown in Fig. 4(g). Fig. 4(h) shows a questionnaire for gathering comments from visitors with the purpose of improving the quality of the learning materials and tests.
Fig. 4(a). Welcome
Fig. 4(b). Celebration
Fig. 4(c). Level selection
Fig. 4(d). Video
Fig. 4(e). Learning test
Fig. 4(f). Recommendation
Fig. 4(g). Learning record
Fig. 4(h). Visitor response
4 System Evaluation The system has been deployed since October 2007 and has been used for almost 8000 sessions during this period. To assess the performance of the system, statistics were collected from October to December 2007. These are presented in Fig. 5 and Fig. 6, which show the usage statistics for each interactive kiosk and the learning performance associated with different age groups, respectively. 4.1 The Usage Frequency of the Interactive Kiosks Fig. 5(a) and 5(b) show the usage frequencies for the system. Fig. 5(a) shows that members visit the fourth kiosk the most. The reason is that this interactive kiosk is near the entrance. Some kiosks are used less, which may be due to their less attractive textual interfaces. The result shown in Fig. 5(a) implies that most visitors are likely to get a superficial understanding through quick and casual observation rather than to learn a specific subject during their visit to the museum. Another interesting observation is that the tenth to fourteenth kiosks exhibit similar statistics, which may arise from their physical closeness and the relatedness of their subjects. A visitor
Fig. 5(a). Usage statistics for members
Fig. 5(b). Usage statistics for non-members
browsing one kiosk may continue to find related subjects of interest at the neighboring kiosks. Therefore, it is suggested that related subjects should be presented on kiosks in close vicinity. Moreover, the relationship between subjects should be taken as an important factor when establishing recommendation and routing rules. Fig. 5(b) shows the usage statistics for non-members; most non-members preferred the sixteenth and nineteenth kiosks, which cover Vacuum Discharge and Conservation of Angular Momentum. The vacuum discharge exhibit shows a vacuum sphere containing electric discharge light, and the Conservation of Angular Momentum exhibit features a rotating plate that visitors can stand on to experience angular momentum. Clearly, these two exhibitions are more interactive and interesting to visitors. These statistics suggest that most visitors are not serious about learning new subjects but are mainly interested in playing with novel exhibits. Therefore, more special and instructive exhibitions should be designed to attract non-member visitors. The statistics reveal that non-members are more interested in dazzling exhibitions than members; one reason for this may be that members want a deeper understanding of a specific subject. The results indicate that more dazzling and interactive exhibitions are needed to get the general public interested in science. The total statistics are shown in Fig. 5(c). The most used kiosk is the Vacuum Discharge, and the least popular kiosk is the Dangers of Our Environment, because it is inconspicuously located and the materials used are more conventional than the other items on display.
Fig. 5(c). Total usage statistics
4.2 The Learning Performance of Different Age Groups Fig. 6(a) and 6(b) present the ratio of correct answers. Fig. 6(a) shows that the rate of correct answers from students aged fifteen to twenty is zero. This suggests that the system does not appeal to students in this age segment. However, another explanation is that the educational strategy of NTSEC focuses on elementary and junior high school students, so few high school students appear in the database. Moreover, for the age group of 20 and up, the questions are easy enough to answer and the rate of correct answers is very high. The results for every other age segment display a high correct learning rate of more than 80%. This suggests that the materials are suitable for both children and adults. According to Fig. 6(b), students younger than fifteen still make some mistakes after having viewed the learning materials. Based on this result the material was reviewed. It
Fig. 6(a). The superior learning performance
Fig. 6(b). The inferior learning performance
was found that the vocabulary is too complex and does not match the vocabulary level of young children. This material is currently being revised to make it more suitable for younger visitors.
5 Conclusion and Future Work In this paper, we presented a personalized learning assistant system based on RFID technology and a recommendation mining method. The proposed system has been in operation for several months in a museum, and educators have endorsed it. Future work includes combining RFID with a camera for personalized capture, remote teaching, and indoor localization for social contact and routing recommendations. A touch panel will be adopted for a richer interactive experience. Moving the learning interface and agent onto the visitor's cell phone for better personal learning is also being considered for the future design.
Acknowledgments This work was supported by the National Science Council, Taiwan under Grants NSC 95-2745-E-036-001-URD and NSC 95-2745-E-036-002-URD.
References

1. Bellotti, F., Berta, R., Margarone, M.: User testing a hypermedia tour guide. IEEE Pervasive Computing 1(2), 33–41 (2002)
2. Davies, N., Cheverst, K., Mitchell, K., Efrat, A.: Using and determining location in a context-sensitive tour guide. IEEE Computer 34(8), 35–41 (2001)
3. Derntl, M., Hummel, K.A.: Modeling context-aware e-learning scenarios. In: Proc. of the 3rd IEEE International Conference on Pervasive Computing and Communications Workshops, Kauai Island, HI, USA, March 2005, pp. 337–342 (2005)
4. Facer, K., Joiner, R., Stanton, D., Reid, J., Hull, R., Kirk, D.: Savannah: mobile gaming and learning. Journal of Computer Assisted Learning 20(4), 399–409 (2004)
5. Billsus, D., Pazzani, M.J.: Learning collaborative information filters. In: Proc. of International Conference on Machine Learning, Madison, WI, USA, pp. 46–54 (July 1998)
6. Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., Sartin, M.: Combining content-based and collaborative filters in an online newspaper. In: Proc. of ACM SIGIR Workshop on Recommender Systems, Berkeley, CA (1999)
7. Good, N., Schafer, J., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., Riedl, J.: Combining collaborative filtering with personal agents for better recommendations. In: Proc. of the Conference of the American Association of Artificial Intelligence, Orlando, FL, USA, July 1999, pp. 439–446 (1999)
8. Haverkamp, D.S., Gauch, S.: Intelligent information agents: review and challenges for distributed information sources. Journal of the American Society for Information Science 49(9), 304–311 (1998)
9. Bahl, P., Padmanabhan, V.N.: RADAR: An in-building RF-based user location and tracking system. In: Proc. of IEEE Computer & Communications Societies, Tel-Aviv, Israel, vol. 2, pp. 775–784 (March 2000)
10. Ponnusamy, R., Gopal, T.V.: A user adaptive self-proclamative multi-agent based recommendation system design for e-learning digital libraries. In: IEEE Conference on Cybernetics and Intelligent Systems, pp. 1–7 (June 2006)
11. Wen, Q., He, J.: Personalized recommendation services based on service-oriented architecture. In: IEEE Asia-Pacific Conference on Services Computing, pp. 356–361 (December 2006)
12. Xi-Zheng, Z.: Building personalized recommendation system in e-commerce using association rule-based mining and classification. In: International Conference on Machine Learning and Cybernetics, vol. 7, pp. 4113–4118 (August 2007)
13. Zaiane, O.R.: Building a recommender agent for e-learning systems. In: International Conference on Computers in Education, vol. 1, pp. 55–59 (December 2002)
14. Chen, T., Han, W.-L., Wang, H.-D., Zhou, Y.-X., Xu, B., Zang, B.-Y.: Content recommendation system based on private dynamic user profile. In: International Conference on Machine Learning and Cybernetics, vol. 4, pp. 2112–2118 (August 2007)
15. Puntheeranurak, S., Tsuji, H.: Mining Web logs for a personalized recommender system. In: 3rd International Conference on Information Technology: Research and Education, pp. 445–448 (June 2005)
16. Youji, O., Wakita, R., Yano, Y.: Web based self-directed learning environment using learner's annotation. In: International Conference on Computers in Education, vol. 2, pp. 1207–1211 (December 2002)
17. Tan, X., Yao, M., Xu, M.: An effective technique for personalization recommendation based on access sequential patterns. In: IEEE Asia-Pacific Conference on Services Computing, pp. 42–46 (December 2006)
18. Furukawa, M., Watanabe, M., Kinoshita, M., Kakazu, Y.: A mathematical model for learning agents on a multi-agent system. In: IEEE International Symposium on Computational Intelligence in Robotics and Automation, vol. 3, pp. 1369–1374 (July 2003)
Real-Time Detection of Passing Objects Using Virtual Gate and Motion Vector Analysis

Daw-Tung Lin and Li-Wei Liu

Department of Computer Science and Information Engineering, National Taipei University, 151 University Rd., San-Shia, Taipei 237, Taiwan
Abstract. Real-time human tracking and pedestrian counting in very complex situations with different directions of motion is important for video surveillance and daily-life applications. This work presents a virtual gate method for pedestrian detection that does not require constructing a background model a priori. The proposed method utilizes motion estimation with three-step search and a novel motion vector analysis algorithm that detects moving objects passing through the gate along any desired direction. The method is particularly applicable to complex situations. The experimental results demonstrate that the proposed strategy is reliable. Keywords: People counting, video surveillance, motion estimation, three-step search, motion vector analysis.
1 Introduction
To accurately track and count pedestrians in very complex situations with different directions of motion is important for video surveillance. Examples include the entrance control of buildings, estimation of customer flow in marketplaces, detection of overcrowding hazards in public areas, and the management of effective departure intervals for mass transportation systems. Vision-based surveillance is a very active research area in the computer vision community. Sacchi et al. developed a people counting application for tourist site monitoring [1]. Sidla et al. presented a vision-based pedestrian detection and tracking system which is able to count people in very crowded situations [2]. Snidaro et al. and Yang et al. utilized multi-camera approaches to overcome mutual occlusion problems [3,4]. Liu et al. focused on the problem of segmenting groups of people into individuals with a model-based segmentation method and then applied it to a people counting application [5]. Zhao and Nevatia proposed a head detection algorithm utilizing camera calibration information and ground plane parameters to estimate the individuals [6]. Several classification
This work was supported in part by the National Science Council, Taiwan, R.O.C. grants NSC96-2221-E-305-008-MY2, NSC 95-2221-E-305 -006 and Ministry of Economics 94EC17A02S1-032, 95EC17A02S1-032.
F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 710–719, 2008. © Springer-Verlag Berlin Heidelberg 2008
methods, such as radial basis function networks and support vector machines, have been adopted to solve the pedestrian identification problem based on extracted features [7,8]. One popular video-based people counter solution is based on a top-view camera configuration [9,10,11,12,3]. However, we consider moving object detection and people counting as added value for security and safety applications in which cameras have already been mounted facing the entrance of a particular location. In this paper, we propose a vision-based real-time people counting system for complex situations that does not depend on an existing background model. Furthermore, users can specify any moving direction of people passing through a virtual gate. If a person passes through the gate in a direction that the user has appointed in advance, the system captures the image immediately and analyzes it for other applications. Figure 1 shows a scenario of objects passing through a virtual gate defined by a user, illustrated with red lines. Notably, many people pass through this virtual gate, and other objects, including vehicles and pedestrians, may move in any direction in this scene as well. The objective of the proposed system is to capture the image when people pass through the virtual gate in the direction specified by the user.
Fig. 1. Scenario of the virtual gate (illustrated with red line) and specified direction (denoted with green arrow)
The rest of this paper is organized as follows. Section 2 illustrates the overall architecture and user interface of the proposed system. Section 3 elucidates the technique of motion detection; the principle and algorithm of motion estimation and motion vector analysis are then described. Section 4 describes the experimental setup. Several simulations of the proposed approach are presented, with experimental results demonstrating the effectiveness of the moving object detection method. Conclusions are provided in Section 5, along with recommendations for future research.
2 System Architecture
To deal with complex environmental situations and changes, we do not adopt traditional background modeling for foreground subtraction, which could lead to a large number of false motion alarms. Thus, one of our main goals is to accurately extract the region of interest (ROI) for moving objects without using a background model. We also developed a graphical user interface (GUI) with which the user can define the virtual gate, detecting area, capture area, motion vector display button and other detailed parameters of the system. Figure 2 presents the major functions, which are indicated by arrows.
Fig. 2. Graphic user interface (GUI) of the proposed system
The proposed method is decomposed into the following two steps: I. Motion Estimation: due to the complicated environment, we apply a motion estimation approach to determine whether any potential moving objects have entered the scene; II. Motion Vector Analysis: after motion estimation, we record the ROI of moving objects until the ROI reaches the detecting area. When the detecting area has been reached, the moving direction of tracked objects is analyzed to decide whether it matches the direction appointed by the user. The overview block diagram of the proposed algorithm is shown in Fig. 3. First, the user defines the virtual gate, detecting area and capture area using the convenient click-and-drag functions in our GUI. For many applications, only one or two locations in a video sequence need to be monitored. For instance, if we deploy a camera a few meters above the ground facing the entrance of a convenience store, all we need to know is how many people enter the store. With the proposed system, the user can define the detecting and capture areas to monitor the customers without extra equipment. The motion vector is computed by a motion estimation method using the three-step
Real-Time Detection of Passing Objects
Fig. 3. Block diagram of the proposed motion detection algorithm
search (TSS) algorithm [13], which has been widely applied in video compression. Section 3.1 elucidates this algorithm in detail. To analyze the motion vectors of moving objects, we propose a motion vector analysis approach in which the final judgment is made. The principle and algorithm of motion vector analysis are described in detail in Section 3.2.
3 The Proposed Motion Detection Algorithm

3.1 Motion Estimation
Motion estimation plays an important role in video coding. The block matching (BM) algorithm is a well-known method of motion estimation. In the proposed algorithm, we first divide the current image frame into fixed-size blocks. Next, the motion vector of each block is estimated by finding the most similar block in the previous frame according to the defined matching criterion. That is, for each block in the current image frame, we need to determine the best-matched block within a search window in the previous image frame. Let the size of the search window be $(2P+1)\times(2P+1)$ and the size of each block be $N \times N$ with $N = 2^m$. The matching criterion used in our system is the mean square error (MSE), expressed as:

$$MSE_{(k,l)}(x, y) = \frac{1}{N \times N} \sum_{i=1}^{N} \sum_{j=1}^{N} \left| F_t(k+i, l+j) - F_{t-1}(k+x+i, l+y+j) \right|^2 \qquad (1)$$
for $-P \le x, y \le P$, where $F_t(k+i, l+j)$ denotes the gray value of the pixel at position $(i, j)$ in the current block $(k, l)$, and $F_{t-1}(k+x+i, l+y+j)$ denotes the gray value of the pixel at position $(i, j)$ in the block $(k+x, l+y)$ of the previous frame. Therefore, the best-matched block within the search window in the previous image frame is the one with the minimum matching error, i.e., minimal $MSE_{(k,l)}(x, y)$.
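As a concrete illustration, the matching criterion of Eq. (1) and its exhaustive minimization over the search window can be sketched as follows. This is our own minimal sketch, not the authors' implementation; frames are assumed to be 2D grayscale NumPy arrays, and the function names are ours:

```python
import numpy as np

def block_mse(curr, prev, k, l, x, y, N=8):
    """Eq. (1): mean squared error between block (k, l) of the current
    frame and the candidate block displaced by (x, y) in the previous
    frame. (k, l) is the top-left pixel of the block."""
    a = curr[k:k + N, l:l + N].astype(np.float64)
    b = prev[k + x:k + x + N, l + y:l + y + N].astype(np.float64)
    return float(np.mean((a - b) ** 2))

def best_match(curr, prev, k, l, P=7, N=8):
    """Full-search minimization of Eq. (1) over the (2P+1) x (2P+1)
    search window; returns the motion vector (x, y)."""
    h, w = prev.shape
    best, mv = float("inf"), (0, 0)
    for x in range(-P, P + 1):
        for y in range(-P, P + 1):
            # skip displacements whose candidate block leaves the frame
            if 0 <= k + x <= h - N and 0 <= l + y <= w - N:
                err = block_mse(curr, prev, k, l, x, y, N)
                if err < best:
                    best, mv = err, (x, y)
    return mv
```

The exhaustive loop is exactly the full-search strategy whose cost motivates TSS below.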
One of the simplest block matching algorithms is the full search (FS) algorithm, which provides an optimal solution by searching all candidates within the search window. However, its high computational complexity makes it difficult to implement in real time. Thus, we apply the three-step search (TSS) algorithm [13] in our system due to its simplicity, robustness and near-optimal performance. By searching with a large pattern in the first step, TSS finds the global minimum matching error more efficiently. The algorithm is described as follows. Step 1: Choose an initial center location. Eight blocks at a distance of the step size from the center (around the center block) are selected for comparison. Step 2: Move the center to the point with the minimum matching error. The step size is reduced by half.
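Iterating these two steps until the step size drops below one yields the full TSS procedure; a minimal sketch under the same assumptions as before (our function names, grayscale frames as 2D NumPy arrays, the Eq. (1) criterion):

```python
import numpy as np

def block_mse(curr, prev, k, l, x, y, N=8):
    # Eq. (1): matching criterion between the current block and the
    # candidate block displaced by (x, y) in the previous frame
    a = curr[k:k + N, l:l + N].astype(np.float64)
    b = prev[k + x:k + x + N, l + y:l + y + N].astype(np.float64)
    return float(np.mean((a - b) ** 2))

def three_step_search(curr, prev, k, l, N=8, step=4):
    """TSS: probe the centre and its eight neighbours at the current step
    size, move the centre to the best match, halve the step, and repeat
    until the step size drops below one."""
    h, w = prev.shape
    cx, cy = 0, 0                          # current centre displacement
    while step >= 1:
        best, best_c = None, (cx, cy)
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if 0 <= k + x <= h - N and 0 <= l + y <= w - N:
                    err = block_mse(curr, prev, k, l, x, y, N)
                    if best is None or err < best:
                        best, best_c = err, (x, y)
        cx, cy = best_c
        step //= 2
    return cx, cy                          # estimated motion vector
```

With an initial step of 4, the three stages reach displacements up to 4+2+1 = 7, matching the P = 7 full-search window at a fraction of the comparisons.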
Fig. 4. Example path for convergence of Three Step Search method
Steps 1 and 2 are repeated until the step size becomes smaller than 1. Fig. 4 shows a simple example path for the convergence of this algorithm. A distance of four is applied in the first-stage search. The red dots labeled 1 denote the blocks chosen in the first stage, in which the best match occurs at the lower-right dot. Thus, the new center is moved to this location for the second-stage search (denoted by the blue dotted arrow in Fig. 4). By reducing the distance to two, new neighbors are chosen; the orange dots labeled 2 are the blocks chosen for the second-stage search. Assuming the best match happens at the lower-middle dot, the new center location is moved to it. Similarly, the green squares labeled 3 represent the blocks chosen for the third-stage comparison. The matching calculation is performed and stops at the location with the minimum value, because the distance has been reduced to one at this point.

3.2 Motion Vector Analysis
After motion estimation, the motion vector $MV_{(k,l)}$ of the block $(k, l)$ is given by:

$$MV_{(k,l)} = \arg\min_{(x,y)} MSE_{(k,l)}(x, y) \qquad (2)$$
where the vectors $(k+x, l+y)$ are valid block coordinates. The proposed system computes and records the motion vector of each block for further analysis. If there is no moving object in the desired detecting area (i.e., blocks in the detecting area have no significant motion variations), the system waits until sufficient motion changes occur in the detecting area. Once sufficient motion changes appear, the system analyzes the motion vector of each block in the detecting area based on our criterion. To determine whether the moving objects in the detecting area are passing through the virtual gate along the desired direction, the moving direction and moving distance of each block are compared and analyzed between the current and previous image frames. Another benefit of this information is that the detection results can be stabilized and enhanced by setting a vigilance threshold; by adopting only the direction of moving objects, the detection results might not be reliable enough. Therefore, we evaluate the direction $MB_{(k,l)}$ of the block $(k, l)$ according to the X- and Y-axis components derived from the motion vector $MV_{(k,l)}$, expressed as the following equation.
$$MB_{(k,l)} = \arctan(\beta/\alpha) \times \frac{180}{\pi} \qquad (3)$$
where $\beta$ denotes the Y-axis component of the motion vector $MV_{(k,l)}$, $\alpha$ denotes the X-axis component of $MV_{(k,l)}$, and the result of $\arctan$ is in radians. For the next analysis, the difference between the direction of the motion vector $MV_{(k,l)}$ of block $(k, l)$ and the desired direction $MV_d$ defined by the user can be computed by the following equation:

$$MBD_{(k,l)} = T_1 - \left| MB_{(k,l)} - MB_d \right| \qquad (4)$$
where the motion vector angle $MB_d$ of the desired direction is computed by Equation (3), and $T_1$ is the threshold of $MBD_{(k,l)}$. If a higher value of $T_1$ is chosen, the detection constraint becomes looser; $T_1 = 64$ was chosen through extensive experiments in this study. After the direction information has been estimated, the distance value $MB_{(k,l)}$ of block $(k, l)$ is deduced as:

$$MB_{(k,l)} = \left\| MV_{(k,l)} \right\| = \sqrt{\alpha^2 + \beta^2} \qquad (5)$$
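Equations (3)–(5) can be sketched as follows. Two details here are our own assumptions beyond the text: `atan2` replaces the plain arctan(β/α) to resolve the quadrant ambiguity, and the angular difference in Eq. (4) is wrapped to [0°, 180°]:

```python
import math

def block_direction(mv):
    """Eq. (3): direction of a motion vector in degrees.
    atan2 is used instead of arctan(beta/alpha) (our assumption) so that
    opposite directions are not confused and alpha = 0 is handled."""
    alpha, beta = mv
    return math.atan2(beta, alpha) * 180.0 / math.pi

def direction_score(mv, mv_d, T1=64):
    """Eq. (4): MBD = T1 - |MB - MB_d|, with the angular difference
    wrapped to [0, 180] (our assumption); T1 = 64 as in the paper."""
    diff = abs(block_direction(mv) - block_direction(mv_d))
    diff = min(diff, 360.0 - diff)
    return T1 - diff

def magnitude(mv):
    """Eq. (5): moving distance of the block."""
    alpha, beta = mv
    return math.hypot(alpha, beta)
```

A block moving exactly along the desired direction scores T1 = 64; a perpendicular block scores 64 − 90 = −26 and is rejected.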
Although the direction and distance of each motion vector have been computed, we still need to integrate them for the final judgment. One benefit of this integration is that the system can treat the integrated value as a judgment weighting factor: a higher value indicates a higher probability that block $(k, l)$ belongs to an ROI object consistent with the desired moving direction. Another benefit is that we can track a block over time, so the system can ensure that the block belongs to a moving object rather than to noise. The integration value $MBI_{(k,l,t)}$ of the block $(k, l)$ is defined as:
$$MBI_{(k,l,t)} = \begin{cases} MBI_{(k,l,t-1)} + MB_{(k,l)} \times MBD_{(k,l)} & \text{if } MBD_{(k,l)} \ge T_2 \text{ and } MB_{(k,l)} \ge T_2, \\ MBI_{(k,l,t-1)}/2 & \text{otherwise.} \end{cases} \qquad (6)$$

where $T_2$ is a threshold of the final-step parameters. The larger $T_2$ is, the more blocks of the moving object are included; we set $T_2$ to 16 in this study. From Equation (6), if a noisy block resides in the detecting area, its integration value becomes smaller than the value of the same block in the previous image frame, so the noisy block is not considered part of a moving object. To obtain complete object information without broken fragments, a morphological operation and a labeling method [14] are applied to the integration values $MBI_{(k,l,t)}$ in our algorithm. In this paper, we use the closing operation (dilation followed by erosion), one of the popular techniques for eliminating noise, to obtain smoothed integration values. We then apply a connected-components algorithm to label these smoothed values. Because this algorithm is effective in determining whole moving objects, moving regions can be characterized not only by their position but also by their size and other shape information. After the morphological operation and labeling are performed, each object is determined to be either a main object or a part of an object; parts are merged into the closest main object. Blocks with the same label are considered a moving object passing through the virtual gate along the desired direction, and the system then captures the image from the defined region in the video sequence.
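The per-frame update of Eq. (6) is small enough to state directly (function name and example values are ours):

```python
def update_integration(mbi_prev, mb, mbd, T2=16):
    """Eq. (6): accumulate the integration value of a block that moves far
    enough (mb >= T2) in roughly the desired direction (mbd >= T2);
    otherwise halve it so that noise decays quickly over frames."""
    if mbd >= T2 and mb >= T2:
        return mbi_prev + mb * mbd
    return mbi_prev / 2.0
```

For instance, a block with mb = 20 and mbd = 70 accumulates 1400 per frame, while a block that stops matching is halved each frame and soon falls below any detection threshold; this is why a single noisy match cannot trigger a capture.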
4 Simulation Results
The proposed pedestrian detection system runs at about 30 frames per second (30 fps) on an Intel Core 2 Duo 2.4 GHz, both for a live analog CCD camera and for video sequence files. Experiments were performed to demonstrate the detection performance of the proposed system. To test the performance, we recorded five video clips containing several people entering and exiting a grocery store; the length of each clip is around twelve minutes. Figure 5 demonstrates the results of motion estimation: Figs. 5(a), (b) and (c) show three consecutive image frames of the test video clips. Figure 5(d) shows the motion vectors of the moving object corresponding to the motion estimation between the previous frame (Fig. 5(a)) and the current frame (Fig. 5(b)); the red arrows in Fig. 5(d) stand for the motion vector of each block. Similarly, Fig. 5(e) presents the motion vectors (plotted as red arrows) of the moving object corresponding to the motion estimation between the previous frame (Fig. 5(b)) and the current frame (Fig. 5(c)). In this study, the block size and the search window size were 8 and 15, respectively.
Fig. 5. Experimental results of motion estimation
Fig. 6. Experimental results of motion object detection for entering direction
Figures 6(a)(b)(c) and (e)(f)(g) show two consecutive video sequences of an object passing through the gate in two contrary directions, exiting and entering, respectively. Assume we wish to monitor the people entering the grocery store using the proposed system. Figure 6(d) shows that there is no match result in the right-hand side window, because the moving object in Fig. 6(a)(b)(c) is exiting the virtual gate, which does not match the desired direction. Figure 6(h) demonstrates the correct outcome for the incoming case. When the moving object in Fig. 6(e)(f)(g) is passing the gate along the desired
Table 1. Evaluation of the proposed system with five real-world video clips

Test Sequence  Direction  No. of people  Count  Accuracy (%)
1              Entering   27             25     93
1              Exiting    29             26     90
2              Entering   18             17     94
2              Exiting    17             15     88
3              Entering   14             13     93
3              Exiting    13             11     85
4              Entering   14             14     100
4              Exiting    11             10     91
5              Entering   32             30     94
5              Exiting    29             26     90
Overall                                         92
direction, the system captures the snapshot and shows the matching motion blocks. To test the performance of the proposed algorithm, we evaluated it on the five video clips mentioned above. Table 1 presents the statistical results of the experiments; the overall detection accuracy is 92%.
5 Conclusion
This work presents a virtual gate method for pedestrian detection without the need to construct a background model a priori. The proposed method combines motion estimation with three-step search and a novel motion vector analysis algorithm that detects moving objects passing through the gate along any desired direction. This method is particularly useful for complex situations. The experimental results demonstrate that our system produces reliable performance. Because of its low computational complexity, our system can even be used in other applications as a pre-processing step to reduce false alarms in moving-object detection. Future work includes combining more than two features for detection, and finding effective methods of combining multiple features for performance enhancement.
References

1. Sacchi, C., Gera, G., Marcenaro, L., Regazzoni, C.S.: Advanced image-processing tools for counting people in tourist site-monitoring applications. Signal Processing 81, 1017–1040 (2001)
2. Sidla, O., Lypetskyy, Y., Brandle, N., Seer, S.: Pedestrian detection and tracking for counting applications in crowded situations. In: Proc. IEEE Conference on Advanced Video and Signal Based Surveillance, p. 70 (2006)
3. Snidaro, L., Micheloni, C., Chiavedale, C.: Video security for ambient intelligence. IEEE Trans. Systems, Man and Cybernetics A 35(1), 133–144 (2005)
4. Yang, D.B., Gonzalez-Banos, H.H., Guibas, L.J.: Counting people in crowds with a real-time network of simple image sensors. In: Proc. IEEE Int. Conf. Computer Vision, vol. 1, pp. 122–129 (2003)
5. Liu, X., Rittscher, J., Perera, A., Krahnstoever, N.: Detecting and counting people in surveillance applications. In: Proc. IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 306–311 (2005)
6. Zhao, T., Nevatia, R.: Tracking multiple humans in complex situations. IEEE Trans. Pattern Analysis and Machine Intelligence 26(9), 1208–1221 (2004)
7. Lin, S.-F., Chen, J.-Y., Chao, H.-X.: Estimation of number of people in crowded scenes using perspective transformation. IEEE Trans. Systems, Man and Cybernetics A 31(6), 645–654 (2001)
8. Huang, D., Chow, T.W.S.: A people-counting system using a hybrid RBF neural network. Neural Processing Letters 18, 97–113 (2003)
9. Albiol, A., Mora, I., Naranjo, V.: Real-time high density people counter using morphological tools. IEEE Trans. Intelligent Transportation Systems 2(4), 204–218 (2001)
10. Chen, T.-H., Chen, T.-Y., Chen, Z.-X.: An intelligent people-flow counting method for passing through a gate. In: Proc. IEEE Conference on Control, Automation, Robotics and Vision, pp. 1–6 (2006)
11. Kim, J.W., Choi, K.S., Choi, B.D., Ko, S.J.: Real-time vision-based people counting system for security door. In: International Technical Conference on Circuits/Systems, Computers and Communications, pp. 1416–1419 (2002)
12. Septian, H., Tao, J., Tan, Y.-P.: People counting by video segmentation and tracking. In: Proc. IEEE Conference on Control, Automation, Robotics and Vision, pp. 1–4 (2006)
13. Koga, T., Iinuma, K., Hirano, A., Iijima, Y., Ishiguro, T.: Motion compensated interframe coding for video conferencing. In: Nat. Telecommun. Conf., New Orleans, LA, USA, pp. G5.3.1–G5.3.5 (December 1981)
14. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice Hall, Englewood Cliffs (2002)
A Ubiquitous Interactive Museum Guide

Yo-Ping Huang 1, Tsun-Wei Chang 2, and Frode Eika Sandnes 3

1 Dept. of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 10608, [email protected]
2 Dept. of Computer Science and Information Engineering, De Lin Institute of Technology, Taipei, Taiwan 236, [email protected]
3 Faculty of Engineering, Oslo University College, Oslo, Norway, [email protected]
Abstract. This paper proposes a PDA-based guide and recommendation system. The intelligent guide provides an alternative to current audio-based guide tools. Techniques from data mining and extension theory, together with RFID technology, are deployed to provide better interaction between exhibitions and visitors.

Keywords: Guide system, RFID, extension theory, data mining.
1 Introduction

Tourism plays a substantial role in the economies of some countries: it can secure employment, foreign exchange earnings, investment and regional development. To attract more tourists and local visitors, many venues such as natural parks, museums, art galleries, hotels and restaurants tailor various personalized routes to individual needs. Recently, various recommender-related systems have been proposed in the literature [1-3]. Most museums or exhibition centers offer assistance to tourists who explore the exhibits in the form of a personal guide or information booths. However, such manual services can be problematic. Lack of staff is the key obstacle: a prior appointment with a museum guide is often required, especially on weekends and public holidays. Without a personal guide, visitors often have to rely on a pre-recorded audio tape, a tape player and a headset for information. Moreover, museums or exhibition centers often set a series of routes through the exhibits [4]; consequently, popular exhibits may become congested with too many visitors. To tackle these limitations, a guide system running on handheld devices is proposed. The proposed guide system recommends interesting exhibitions to users by means of data mining. Furthermore, extension theory is used to recommend items to the users that are related to the items they have already browsed. Moreover, some visitors prefer to browse the exhibitions alone and to explore the diversity of the exhibitions.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 720–731, 2008. © Springer-Verlag Berlin Heidelberg 2008
This paper is organized as follows. Section 2 presents an overview of the proposed system, its detailed design and related techniques. Section 3 presents the recommendation method based on association rules. Experimental results and discussions are presented in Section 4. Conclusions and future research are given in Section 5.
2 System Implementation

The two main components of the proposed guide and recommendation system are the client-side PDA and the server-side database. The PDA client comprises the guide user interface, and the server-side database stores visitors' log files. The system architecture is depicted in Fig. 1. A traditional audio-tape-based guide system can only deliver static exhibition contents. To enhance the quality of interaction, the proposed guide and recommendation system is implemented on a handheld device. The RFID reader, wireless network and EPC Class 1 electronic tags allow the PDA-based platform to support a dynamic and interactive learning model. ISO 15693 passive electronic tags are attached to the information plaque of each exhibition. When a user holds the PDA, equipped with an RFID reader, close to a tag, the detailed contents of the exhibition, including historical background, the motivation behind the creations, etc., are downloaded via the wireless network. In addition, personalized guidance is recommended to individuals according to their association-rule-based preferences. The following sections describe the components of our system.
Fig. 1. The architecture of the proposed system
2.1 The Client-Side PDA-Based Guide System

The PDA shown in Fig. 2 comprises eight modules. The corresponding functions include:

(1) RFID Reader Module: This module emits radio waves for reading the passive electronic tag. Both the tag ID and the electronic tag memory can be read.
(2) RFID Identification Module: This module sends control commands to the RFID reader module. When the RFID reader receives the response messages and identifies the corresponding ID, the relevant contents can be accessed directly from the PDA or downloaded via the mobile communication module.
(3) RFID Middleware – Mapping Module: This module identifies the ID of the exhibition obtained from the RFID Identification Module.
(4) Wireless/Mobile Communication Module: This module transmits data from the server to the PDA through WLAN, Bluetooth, or 3G.
(5) Guide Content Storage Module: This module saves the guide information or content in advance so that the detailed explanation of contents can be fetched quickly.
(6) Guide Content Display Module: This module presents contents such as textual introductions and multimedia to the users.
(7) Guide Activity Monitor Module: This module records the visitors' routes and the browsed exhibitions. A route of additional items to browse is suggested by the proposed recommendation model.
(8) Guide Interaction Module: This is the user interface of the proposed mobile guide system. All the exhibitions and related information are presented in this module.
Fig. 2. The modules of the PDA guide system
2.2 Server Functionalities of the Guide System

There are five modules in the backend, as shown in Fig. 3, namely:

(1) Network Communication Module: Information and multimedia contents can be received or transmitted between the Network Communication Module and the Mobile Communication Module on the PDA.
(2) Guide Content Repository Module: This module stores the exhibition contents, related multimedia data and special exhibition programs.
(3) Data Mining Module: This module finds association rules based on visitors' browsing records. Recommendations are made according to these association rules.
(4) Video Streaming Module: The multimedia data are encoded as video streams and then transmitted to the PDA.
(5) Guide Activity Log Repository Module: This module maintains and manages the overall system.
Fig. 3. The backend modules of the guide system
2.3 RFID-Enabled Interactive Guide System

The Radio Frequency Identification (RFID) technique emerged in the 1980s [5-6]. Since then, RFID has been applied in a diverse set of domains. RFID systems comprise tags and readers. The tag contains an object identification, which is read by the reader and then compared to the corresponding identification stored in the database; when a match is found, detailed information is retrieved. Domains such as education, business and science can successfully utilize RFID [7-8]. Due to its wide applicability, convenience, wireless information transmission, robustness to the environment and the possibility of performing multiple reads and writes, RFID is adopted in the proposed system.

2.4 Collaborative Filtering

Usually, galleries position fine arts according to categories such as photographic collections, Chinese ink-water paintings, oil paintings or ancient architecture in different zones. Alternatively, artifacts from different categories are mixed and presented together for some particular theme. The main goal of the proposed system is to customize an optimal route for visitors. While browsing exhibitions in one zone, a visitor may miss works by particular artists that belong to different themes but have similar characteristics. An optimal guiding route is therefore recommended based on the item-based collaborative filtering (CF) method [10-11].
The exhibit center first gathers all the information about what the visitors have viewed; the relationships between the browsed items can then be identified, and more related exhibitions can be recommended to visitors when they browse particular items. The CF method first identifies visitors whose interests in exhibitions are similar to those of the target user. The recommendation is then issued under the assumption that such groups of visitors have similar interests in exhibitions. The CF method continues by identifying the correlations between items within visitors' groups; once the correlations have been found, potential exhibitions are recommended to the visitors. The preference similarity of exhibitions between visitors is calculated as follows.
$$sim(a, b) = \frac{\sum_{j=1}^{N} (p_{aj} - \bar{p}_a)(p_{bj} - \bar{p}_b)}{\sqrt{\sum_{j=1}^{N} (p_{aj} - \bar{p}_a)^2} \sqrt{\sum_{j=1}^{N} (p_{bj} - \bar{p}_b)^2}} \qquad (1)$$

$$Score(c, j) = \frac{\sum_{i \in H} (p_{ij} - \bar{p}_i) \times sim(c, i)}{\sum_{i \in H} sim(c, i)} \qquad (2)$$
where $p_{aj}$ is user $a$'s rating of exhibition $j$, and $\bar{p}_a$ is user $a$'s average rating of exhibitions.
Table 1 illustrates the CF method: an exhibition browsed by a visitor is marked with 1, otherwise with 0. According to this table, visitor A has high preference similarity to visitor D. Hence, any exhibition browsed by visitor A but not yet browsed by visitor D should be recommended to visitor D.

Table 1. Collaborative filtering example
          People  Scenery  Bird  Flower  Mountain
visitor A    1       1       0     0        1
visitor B    0       1       0     1        1
visitor C    0       0       1     0        0
visitor D    1       1       0     0        0
visitor E    1       0       0     1        0
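Using the browsing records of Table 1 as binary ratings, Eqs. (1) and (2) can be sketched as follows (a minimal sketch with our own function names; the paper does not prescribe this exact code):

```python
import math

def pearson_sim(a, b):
    """Eq. (1): Pearson correlation between two visitors' rating vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - ma) ** 2 for x in a))
           * math.sqrt(sum((y - mb) ** 2 for y in b)))
    return num / den if den else 0.0

# browsing records from Table 1
# (exhibitions: People, Scenery, Bird, Flower, Mountain)
ratings = {
    "A": [1, 1, 0, 0, 1],
    "B": [0, 1, 0, 1, 1],
    "C": [0, 0, 1, 0, 0],
    "D": [1, 1, 0, 0, 0],
    "E": [1, 0, 0, 1, 0],
}

def mean(r):
    return sum(r) / len(r)

def score(c, j, H):
    """Eq. (2): predicted preference of visitor c for exhibition j,
    weighted over a neighbour set H of similar visitors."""
    num = sum((ratings[i][j] - mean(ratings[i])) * pearson_sim(ratings[c], ratings[i])
              for i in H)
    den = sum(pearson_sim(ratings[c], ratings[i]) for i in H)
    return num / den if den else 0.0
```

On this data, visitor D's most similar visitor under Eq. (1) is indeed A, and Mountain (browsed by A, not by D) receives a positive score for D, matching the recommendation described above.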
3 Employing Extension Theory to Refine Recommendation

The CF recommendation method recommends items for groups of visitors who have similar taste. Hence, the automatic guide system has the ability to "guess" which exhibitions should be recommended to different visitors. Further relationships between exhibitions are found by mining association rules from visitor profiles.

3.1 The Construction of the FP-Tree
In this section a simple database example, shown in Table 2, is used to introduce the FP-tree. The minimum support value is predefined as 2. Just as in the Apriori
algorithm, the database is first scanned for frequent 1-itemsets with their support counts. The list L of frequent 1-itemsets, sorted by support count in descending order, is L = [4:4, 2:3, 3:3, 6:3, 1:2]. The corresponding FP-tree is shown in Fig. 4. While building the FP-tree, every transaction must follow the same order as the sorted list L. If no node for an item exists along the current branch, a new node is inserted into the tree and initialized with a support count of 1; otherwise, the support count is increased by 1.

Table 2. The database used to build the FP-tree

TID  Transaction (items)
100  2, 3, 4, 6
200  2, 4, 5, 6, 7
300  1, 2, 3, 4
400  1, 3, 4, 6
Fig. 4. A FP-tree from the database in Table 2
When the FP-tree has been built, the mining procedure proceeds on the FP-tree as follows: (1) construct the corresponding conditional pattern base, starting from frequent patterns of length 1; (2) build its conditional FP-tree; (3) mine the conditional FP-tree recursively. During steps 1 and 2, a table containing the conditional pattern base of the items in the list L is generated before mining the FP-tree (see Table 3). The items in Table 3 are in the inverse order of the list L; in other words, when item 1 is checked, all frequent itemsets containing 1 are found. For example, if we want to mine item 6 with its conditional FP-tree, it is not necessary to consider item 1, because all frequent itemsets containing 1 are generated when the conditional FP-tree of item 1 is mined.
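The tree construction and the conditional pattern bases can be sketched as follows (our own minimal implementation; ties in the support ordering are broken by item value, which reproduces the list L above):

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fptree(transactions, min_support=2):
    """Build an FP-tree: order each transaction's frequent items by
    descending 1-item support, then insert along a shared-prefix path."""
    support = Counter(i for t in transactions for i in t)
    freq = {i for i, c in support.items() if c >= min_support}
    order = sorted(freq, key=lambda i: (-support[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root = Node(None, None)
    header = defaultdict(list)            # item -> its nodes in the tree
    for t in transactions:
        path = sorted((i for i in t if i in freq), key=rank.__getitem__)
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, order

def conditional_pattern_base(header, item):
    """Prefix paths ending at `item`, each with that node's count."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base
```

Running this on the transactions of Table 2 reproduces L = [4, 2, 3, 6, 1] and the conditional pattern bases of items 1 and 6 shown in Table 3.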
Considering item 1, it is evident that there are two branches in the FP-tree, <(4 2 3:1)> and <(4 3 6:1)>; these two branches are the conditional pattern base of item 1. Based on this conditional pattern base, it is easy to generate the relative conditional FP-tree: <4:2, 3:2> is the conditional FP-tree of item 1. Since the support count of <6:1> is smaller than the minimum support, it cannot be an element of the conditional FP-tree of item 1. Figure 5 demonstrates how to build and mine the conditional FP-tree.

Table 3. The conditional pattern base and relative conditional FP-tree

Item  Conditional pattern base        Conditional FP-tree
1     {(4 2 3:1), (4 3 6:1)}          <4:2, 3:2>
6     {(4 2 3:1), (4 2:1), (4 3:1)}   <4:3, 2:2>, <3:2>
3     {(4 2:2), (4:1)}                <4:3>
2     {(4:3)}                         <4:3>
Fig. 5. The conditional FP-trees for item 1
Frequent patterns can be found using the above-mentioned steps. Meanwhile, candidate itemsets need not be generated, so the computing cost is reduced. Nevertheless, the FP-tree has a high memory requirement. Instead of loading the entire database into memory, the database can be partitioned into several projected databases; each projected database is used to build an FP-tree and is mined separately [12].

3.2 Extension Set Theory
Classical set theory uses 0 and 1 to represent the certainty of an event, while extension set theory adopts real numbers in (−∞, +∞) to indicate the degree of certainty of an event [13-14]. Positive and negative numbers indicate the degrees to which an event is possible or impossible, respectively. Definition: Suppose U is the universe of discourse. For any element u, u ∈ U, there exists a real number K(u) ∈ (−∞, +∞). We call the relative set
Ã = {(u, y) | u ∈ U, y = K(u) ∈ (−∞, +∞)} an extension set on the universe U. Here y = K(u) is the incidence function of Ã, and K(u) is the degree to which u is related to Ã. A = {u | u ∈ U, K(u) ≥ 0} is the positive domain of Ã, Ā = {u | u ∈ U, K(u) ≤ 0} is the negative domain of Ã, and J0 = {u | u ∈ U, K(u) = 0} is the zero boundary of Ã. Obviously, if u ∈ J0, then u ∈ A and u ∈ Ā at the same time. An example illustrates the concept of extension sets. Suppose U is a set of processed machine spare parts and the standard diameter is 50±0.1 mm. When the diameter of a spare part is between 49.9 mm and 50.1 mm, it belongs to the qualified products; outside this range it belongs to the unqualified products. Under classic set theory there are thus only two groups of machine spares, qualified and unqualified. Actually, if a spare part's diameter is larger than 50.1 mm, it is a returnable product; on the contrary, if the diameter is smaller than 49.9 mm, it is a rejected product. Figure 6 shows the difference between a classical set and an extension set.
Fig. 6. Classic versus extension sets: (a) classic set; (b) extension set
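The spare-part example can be expressed with a hypothetical incidence function K(u) = 0.1 − |u − 50| (our choice; the paper does not define K for this example), which is positive inside the tolerance band and negative outside it:

```python
def incidence(u, centre=50.0, tol=0.1):
    """Hypothetical incidence function K(u) = tol - |u - centre|:
    positive inside the 50 +/- 0.1 mm tolerance band, negative outside,
    zero on the boundary (the zero boundary J0)."""
    return tol - abs(u - centre)

def classify(u):
    """Classic sets only separate qualified from unqualified parts; the
    extension view also distinguishes *how* a part is unqualified."""
    k = incidence(u)
    if k > 0:
        return "qualified"
    if k == 0:
        return "boundary"
    return "returnable" if u > 50.0 else "rejected"
```

The sign of K(u) reproduces the classic qualified/unqualified split, while its value grades how far a part is from the standard, which is the extra information an extension set provides.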
3.3 Data Mining Based on FP-Tree and Extension Sets
Traditional FP-tree mining uses a single threshold to derive the frequent itemsets, which means that the mining results depend highly on the pre-assigned threshold. A new
strategy is proposed for mining FP-trees. We are interested not only in extracting frequent patterns based on the pre-assigned threshold (lower bound), but also in infrequent patterns, since these infrequent patterns may become frequent under some conditions. For example, a product that is not popular is not a frequent item, but it may become frequent when associated with other products. Our goal is to discover both the original frequent patterns and the extension frequent patterns, so as to recommend more valuable exhibitions to visitors. To mine the extension frequent patterns, an extension threshold has to be set in advance. Currently, a default value is set as (threshold − 0.005) × num_of_transaction, where num_of_transaction is the total number of transactions. A domain expert can also determine the extension threshold.
Fig. 7. The selected range is moved to the head of the array
Fig. 8. Header tables and their corresponding FP-trees: (a) num_original only considers the original part; (b) num_exten only considers the extension part
A Ubiquitous Interactive Museum Guide
The traditional FP-tree employs all large 1-itemsets as nodes. We are not interested in items that are always frequent. Thus, we prune some large 1-itemsets and specify a range of large 1-itemsets of interest. First, all the large 1-itemsets are put in an array in descending order of support. If the upper bound is not 1, the selected range is moved to the head of the array; Figure 7 illustrates this process. In this way, only the interesting range of frequent patterns is mined. Since the new FP-tree is built from both original large 1-itemsets and extension large 1-itemsets, the header table must be constructed in two parts, controlled by two parameters. After extracting the large 1-itemsets, the two parameters num_original and num_exten are counted to represent the numbers of original and extension large 1-itemsets, respectively. These parameters control how many header-table nodes are read. Figure 8 shows the two header tables and their corresponding FP-trees. While an FP-tree is mined, the number of paths is counted because it affects the decision-making procedure, which is executed recursively.
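The range selection and two-part header table described above can be sketched as follows. This is our illustrative reading of Figures 7 and 8, not the authors' code; the function name, the item counts, and the upper-bound filter are assumptions.

```python
# Illustrative sketch: select the interesting range of large 1-itemsets and
# split the header table into original and extension parts.

def build_header(counts, total_tx, threshold, upper_bound=1.0):
    """Return (header, num_original, num_exten) for the new FP-tree."""
    ext_threshold = (threshold - 0.005) * total_tx          # default extension support
    items = sorted(counts, key=counts.get, reverse=True)    # descending support order
    # Drop always-frequent items above the upper bound, moving the selected
    # range to the head of the array as in Figure 7.
    items = [i for i in items if counts[i] <= upper_bound * total_tx]
    original = [i for i in items if counts[i] >= threshold * total_tx]
    exten = [i for i in items if ext_threshold <= counts[i] < threshold * total_tx]
    return original + exten, len(original), len(exten)

counts = {"A": 900, "B": 250, "C": 180, "D": 120, "E": 40}
header, num_original, num_exten = build_header(counts, 10000, 0.02, upper_bound=0.05)
print(header, num_original, num_exten)   # ['B', 'C'] 1 1
```

Here "A" is pruned as always-frequent, "B" passes the original threshold (200), and "C" falls in the extension band between 150 and 200.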
4 Experimental Results and Discussions

Figure 9 illustrates the overall functionality of the system described herein.
Fig. 9. The guide system architecture
Y.-P. Huang, T.-W. Chang, and F.E. Sandnes
The guide system is developed in Embedded C 4.0 and runs on a PDA in a Windows Mobile environment. The main screen of the system is shown in Figure 10(a). The proposed system supports four languages: 1) Chinese, 2) Taiwanese, 3) English, and 4) Japanese. After selecting a language, users enter the guide system.
(a) The main screen of the guide system.
(b) The selection screen, for users to click.
(c) The screen for viewing browsing history.
(d) The related descriptions.
(e) The video introduction.
(f) Search by painted year.
(g) The keyword-based painting search.
(h) The texture descriptions for painting.
Fig. 10. The operations of the proposed guide system deployed in The Li Mei-shu Memorial Gallery
5 Conclusions

An interactive guide system based on RFID technology, information retrieval, association rules, and personalized recommendation is proposed to assist visitors in browsing exhibitions. Visitors view exhibition-related multimedia content on PDAs equipped with RFID readers. The exhibition centers gather visitor information from the interactions between the PDAs and the server, and the results are used for making recommendations to future visitors.

Acknowledgments. This work was supported by the National Science Council, Taiwan under Grants NSC95-2745-E-036-001-URD and NSC95-2745-E-036-002-URD.
References
1. Kim, C.-Y., Lee, J.-K., Cho, Y.-H., Kim, D.-H.: VISCORS: A Visual Content Recommender for the Mobile Web. IEEE Intelligent Systems 19(6), 32–39 (2004)
2. Huang, Y.-P., Tsai, T.-W.: A Fuzzy Semantic Approach to Retrieving Bird Information Using Handheld Devices. IEEE Intelligent Systems 20(1), 16–23 (2005)
3. Brown, P.J., Bovey, J.D., Chen, X.: Context-Aware Applications: From the Laboratory to the Market Place. IEEE Personal Communications 4(5), 58–64 (1997)
4. Derntl, M., Hummel, K.A.: Modeling Context-Aware E-Learning Scenarios. In: The 3rd IEEE International Conference on Pervasive Computing and Communications Workshops, USA, pp. 337–342 (2005)
5. Haverkamp, D.S., Gauch, S.: Intelligent Information Agents: Review and Challenges for Distributed Information Sources. Journal of the American Society of Information Science 49(9), 304–311 (1998)
6. Landt, J.: The History of RFID. IEEE Potentials 24(4), 8–11 (2005)
7. Weinstein, R.: RFID: A Technical Overview and Its Application to the Enterprise. IT Professional 7(3), 27–33 (2005)
8. Roussos, G.: Enabling RFID in Retail. IEEE Computer 39(3), 25–30 (2006)
9. Billsus, D., Pazzani, M.J.: Learning Collaborative Information Filters. In: International Conference on Machine Learning, USA, pp. 46–54 (1998)
10. Gokhale, M., Miranda, A., Murnikov, T., Netes, P.D., Sartin, M.: Combining Content-Based and Collaborative Filters in an Online Newspaper. In: ACM SIGIR Workshop on Recommender Systems: Algorithms and Evaluation, USA (1999)
11. Good, N., Schafer, J., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., Riedl, J.: Combining Collaborative Filtering with Personal Agents for Better Recommendations. In: International Conference of the American Association of Artificial Intelligence, USA, pp. 439–446 (1999)
12. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2001)
13. Cai, W.: The Extension Set and Incompatible Problem. Journal of Scientific Exploration 1, 81–93 (1983)
14. Huang, Y.-P., Chen, H.-J.: Using Extension Theory to Design a Fast Data Processing Model. In: IEEE-SMC International Conference, USA, pp. 3410–3415 (2000)
Dynamic Probabilistic Packet Marking with Partial Non-Preemption Wei Yen and Jeng-Shian Sung Computer Science and Engineering Department, Tatung University, Taipei, Taiwan, R.O.C. [email protected]
Abstract. This paper studies the technique of probabilistic packet marking. The main purpose is to allow the receiver to trace the true sources of marked packets and rebuild their paths rapidly. In many existing schemes, marked packets are overwritten by other marking routers because of the nature of probabilistic packet marking. Although some IP traceback schemes mitigate this overwriting problem, they usually make unrealistic assumptions [11] or do not consider the load-balancing issue among the routers. In probabilistic packet marking, how to address the re-marking problem and how to balance the routers' overhead are two important design considerations. A scheme is proposed in this paper to address these problems. Using computer simulation, we observe that our scheme greatly reduces re-marking. Compared with current IP traceback schemes, our scheme requires up to 65% fewer marked packets, and the packet-marking overhead is distributed more evenly across the routers. Keywords: DDoS, IP traceback, probabilistic packet marking, re-marking, convergence packet number.
1 Introduction

In this age, the Internet has become an important way to obtain information. This technology enables revolutions in ways of doing business and increases productivity. On the downside, many hackers and even criminals are attracted to using the Internet as their playground. They exploit network security holes and raise all kinds of security issues. Because new network attack techniques keep emerging and old ones morph constantly, network security issues have posed enormous threats and challenges in recent years. One attack technique is the so-called Denial of Service (DoS) attack [1, 2, 3]. A DoS attack prevents normal users from accessing their desired services. Typically, the attacker assaults the targeted victim system with malicious packets. The loss of network connections and services is achieved by depleting the computing resources or consuming the bandwidth of the victim systems. In the worst case, if the load is heavy enough, the targeted victim cannot function and will be forced to shut down. The goal of IP traceback is to discover the routing path or, at least, the maximum subset of it. This function can verify the sender of IP packets destined to a receiver.

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 732–745, 2008. © Springer-Verlag Berlin Heidelberg 2008
One of the most important applications of IP traceback is to deal with certain DoS or DDoS attacks, where the source IP addresses may be spoofed by the attackers. Once we find out the real perpetrator, we can hold him accountable for what he does. It also helps in mitigating DoS or DDoS attacks by isolating the identified attack source. Packet marking is one family of IP traceback techniques. Generally speaking, when a router in a packet marking scheme decides to mark a packet, it writes its own information into the packet header. After the victim receives a number of marked packets from the routers and collects sufficient information from the headers, the attack path and the sender of the packets can be computed. If the victim is attacked by malicious attackers, it can use these marked packets to trace the upstream routers and locate the attack source. In short, packet marking is a viable traceback technique that can help the victim find malicious attackers. Many DoS or DDoS attackers use spoofed IP addresses to hide the traces of their attacks. To solve the spoofing problem, the packet marking technique asks the routers on the way to the victim to leave information in the packets. This information, collected from the attack packets, can be used to reveal the concealed identity of the sender. Although packet marking can be effective against spoofing, the victim usually needs a large number of marked packets to compute the attack path. The main reason for this inefficiency is the repeated marking performed by the routers at large: re-marking wastes the effort of the upstream marking routers and results in an unbalanced number of packets marked by each router. In this paper, we propose a novel scheme that requires fewer marked packets for the path calculation.
Our approach helps the victim rebuild the path rapidly and reduces the marking overhead in the routers. In general, we use non-preemptive compensation to avoid repeated marking. Moreover, the marking probability of each router is no longer a constant; it is adjusted so that the numbers of packets marked by the routers are statistically equal. In summary, we are motivated to improve the performance of probabilistic packet marking by cutting the required number of marked packets and lowering the router overhead. The rest of the paper is organized as follows. In Section 2, we describe DDoS attack types and packet marking schemes, including deterministic packet marking (DPM), probabilistic packet marking (PPM), and other IP traceback schemes. In Section 3, we present the proposed Dynamic Probabilistic Packet Marking with Partial Non-Preemption (DPPM-PNP) scheme; the supporting algorithms in the routers and the victim are also included in this section. In Section 4, we analyze the performance of DPPM-PNP and compare it with that of other schemes. Finally, we conclude the paper in Section 5, which highlights the main results and contributions of this research.
2 Background: DDoS Attacks and Packet Marking Schemes

In the following, we cover two aspects. First, we introduce DDoS attacks; the purpose is to understand whether packet marking can be used
Fig. 1. DDoS attack model
as a countermeasure to these attack methods. Then, the existing packet marking schemes are surveyed. Figure 1 shows that the packets from the attackers can take different attack paths. In the following, we outline several challenges in dealing with DDoS attacks. First, when an attack packet is sent to the victim, it may not be immediately identified as ill-intended because it may very well appear normal to the victim. The attack packet will be treated like any normal packet and receive service. However, since its request is phony, the victim wastes its resources on serving the attack packet, which leads to degradation in system performance. Because the packet may carry a legitimate source ID and request common services, it is difficult for the victim to detect the attack packet. Many web sites employ schemes such as typing in somewhat distorted letters to confirm human interaction before granting services; a traceback mechanism can be triggered to locate the source of the attack packet if a user repeatedly fails the confirmation test. Secondly, when the victim does detect that it is under a DDoS attack, it may use the implemented defense mechanisms to counter the attack. However, it is possible that the zombies spoof their IP addresses to avoid being pinpointed by the victim [4]. In IP spoofing, the source replaces its IP address in the packet header with another IP address, concealing the true source and covering up the attack path. Hence, even if the attack is detected by the victim, the true identities and locations of the zombies remain elusive. As these observations show, unless such attacks are effectively countered, victims remain exposed to malicious attackers. Next, we introduce the packet marking technique for addressing these problems.
2.1 Probabilistic Packet Marking (PPM)

In PPM schemes, each router marks packets randomly and independently [5, 6, 8]. A random number is generated when a router receives a packet and is compared with a pre-defined threshold known as the marking probability. The router marks the packet if the random number is smaller than the marking probability; otherwise, the packet is left intact. Therefore, a packet can be marked multiple times by the routers on the path, and a previous mark is overwritten by the later one. This phenomenon is referred to as the re-marking problem. As we will explain later, re-marking slows down the traceback. When the receiver tries to rebuild the path, it depends on the router information carried in the packets. Since each marked packet contains only a fraction of the required information, the receiver has to wait for all available information to arrive. It should be noted that when the marking function is only partially supported in the network, tracing back the entire path may be impossible. Hence, we are interested in finding the so-called maximum subset of the route, which is an ordered sequence containing the routers with marking capability in the path. In addition, the number of marked packets necessary for the receiver to compute this maximum subset is called the convergence packet number.
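The marking decision above can be sketched as a small simulation. This is an illustrative node-sampling model, not the authors' code; the router ids, seed, and packet count are our assumptions.

```python
# Minimal sketch of node-sampling PPM: each router independently marks a
# packet with probability p, overwriting any previous mark, so only the
# mark of the last marking router survives in the header.
import random

def forward(path, p, rng):
    """Pass one packet along `path`; return the id of the last router to mark it."""
    mark = None
    for router_id in path:
        if rng.random() < p:      # compare random number with marking probability
            mark = router_id      # re-marking: earlier marks are overwritten
    return mark

rng = random.Random(1)
path = list(range(1, 26))         # 25 routers, router 1 nearest the source
marks = [forward(path, 0.05, rng) for _ in range(10000)]
# A mark from a router d hops from the receiver survives with p*(1-p)^(d-1),
# so marks from routers near the source are rarely observed.
print(marks.count(25) > marks.count(1))   # True
```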
p(1 − p)^(d−1)    (1)
There are several variations of PPM, including node append, node sampling, and edge sampling [7]. Node append requires unavailable space in the packet header to accommodate the marking information and is largely considered an unrealistic approach. Node and edge sampling are both subject to the re-marking problem. Eq. (1) characterizes the probability of a marked packet reaching its receiver, where p is the marking probability of each router and d denotes the number of routers between the current router and the receiver. If the marking probability is too large, the receiver may have difficulty getting marked packets from the routers close to the source of the packet. Eq. (1) therefore becomes a constraint as far as the convergence packet number is concerned.

2.2 Probabilistic Packet Marking with Non-Preemptive Compensation (PPM-NPC)

The PPM-NPC scheme improves the edge sampling scheme of PPM [9]. In this scheme, a compensation counter is adopted in each router to allow non-preemptive actions, i.e., no overwriting by the downstream routers. The idea is detailed in the following. Assume a marked packet goes through a router which is supposed to mark the packet based on its random number and the marking probability. In the PPM-NPC scheme, the packet is not marked by the router; instead, a compensation counter in the router is increased by one. This modification removes re-marking situations. The compensation counter thus records the number of packets that should be marked by the router in the future. When (i) the router receives a packet that has never been marked previously and
(ii) the compensation counter is not zero, the router will mark the packet and decrease the counter by one. In summary, the re-marking problem of the PPM scheme is avoided in the PPM-NPC scheme and, in turn, the number of required marked packets is reduced. According to [9], a 52% reduction is achieved by the PPM-NPC scheme in some cases when compared with the edge sampling scheme.

2.3 Efficient Dynamic Probabilistic Packet Marking (EDPPM)

The EDPPM scheme adjusts the marking probability of each router depending on its location in the route. The marking probabilities of the routers far away from the receiver are higher than those of the closer routers. This adjustment is meant to ensure that the numbers of packets marked by each router are statistically the same. Basically, this scheme uses the TTL value of the marked packet to change the marking probability of the router, so each router can derive its own marking probability for marking packets. This design reduces the re-marking problem because, for the routers remote from the receiver, the marking probabilities are much higher than for the routers close to the receiver.

p_d = 1/d,  d = 1, 2, 3, ...    (2)
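The property behind Eq. (2), computed in [10], that a marked packet reaches the destination with probability 1/D regardless of which router marked it, can be checked numerically. The sketch below is ours; the path length of 17 hops is only an illustrative choice taken from the average reported in [13].

```python
# With p_d = 1/d, router d's mark survives if d marks the packet and no
# closer router (d+1..D) overwrites it: (1/d) * prod_{j=d+1..D} (1 - 1/j).
# The product telescopes to 1/D, independent of d.

def survival_probability(d, D):
    prob = 1.0 / d                     # router d marks the packet
    for j in range(d + 1, D + 1):
        prob *= 1.0 - 1.0 / j          # no closer router overwrites the mark
    return prob

D = 17                                  # illustrative path length
for d in (1, 5, 17):
    print(round(survival_probability(d, D), 10))   # 1/17 for every d
```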
Eq. (2) gives the marking probability used in the EDPPM scheme. The parameter d indicates the hop distance from the source to the router, and p_d is the marking probability of the router at distance d. The marking probability of the router nearest the source is the highest; as the routers get closer to the receiver, their marking probabilities become smaller. A packet marked by a remote router is then delivered to the receiver more easily because the re-marking problem is mitigated. According to the calculation in [10], the probability that a marked packet reaches the destination is independent of the router where it is marked; it is fixed at 1/D, where D is the hop distance of the route. Although the re-marking problem seems solved in the PPM-NPC scheme, a sufficient condition must be met first. When the marking probability of each router is relatively
Fig. 2. Drawback of the PPM-NPC scheme
smaller, this scheme works as designed. However, if the marking probability of each router is high enough, some routers will never have a chance to mark their router information. Specifically, if the product of the number of routers in the path and the marking probability exceeds one, then some of the compensation counters will not converge. For example, if the number of routers is 15 and the marking probability is 0.1, as shown in Figure 2, then the compensation counters in the last five routers near the receiver grow to infinity because there are no mark-free packets left for the counters to deflate. Therefore, when the marking probability is too high, the receiver will never be able to find the complete path. To tackle this critical problem, we could set a marking probability small enough to accommodate most routes in practical networking environments. Unfortunately, this solution becomes inefficient for shorter routes and defies the goal of the scheme. This issue is addressed in the proposed DPPM-PNP scheme. The routers that cannot leave their marks on packets in the stochastic sense are referred to as starving routers.
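The starvation effect can be reproduced with a small simulation of the compensation-counter logic. This is our reading of the PPM-NPC rules in [9]; the class name, seed, and parameters are assumptions, not the authors' implementation.

```python
# Sketch of a PPM-NPC router's counter logic. When D * p > 1 (here 15 * 0.1),
# routers near the receiver see almost no mark-free packets, so their
# compensation counters grow without bound.
import random

class NpcRouter:
    def __init__(self, router_id, p):
        self.router_id, self.p, self.counter = router_id, p, 0

    def process(self, mark, rng):
        wants_to_mark = rng.random() < self.p
        if mark is None:                      # mark-free packet arrives
            if wants_to_mark:
                return self.router_id         # ordinary marking
            if self.counter > 0:
                self.counter -= 1             # compensate: mark and deflate
                return self.router_id
            return None
        if wants_to_mark:
            self.counter += 1                 # non-preemption: never overwrite
        return mark

rng = random.Random(7)
routers = [NpcRouter(i, 0.1) for i in range(1, 16)]   # 15 routers, p = 0.1
for _ in range(5000):
    mark = None
    for r in routers:
        mark = r.process(mark, rng)
print([r.counter for r in routers[-3:]])   # counters near the receiver keep growing
```

The router nearest the source only ever sees mark-free packets, so its counter stays at zero, while the last few routers accumulate hundreds of unpayable marking debts.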
3 Dynamic Probabilistic Packet Marking with Partial Non-Preemption (DPPM-PNP)

The DPPM-PNP scheme utilizes the no-overwrite property of the PPM-NPC scheme to mark the packets. To address the non-convergence problem of the PPM-NPC scheme, the DPPM-PNP scheme lets starving routers re-mark already marked packets with a dynamic marking probability. The dynamic marking probability is used by the starving routers to adjust their marking behavior: the closer a starving router is to the receiver, the smaller its marking probability is compared with farther starving routers. This approach reduces re-marking, so our scheme can achieve convergence with a small number of marked packets at the receiver. In the DPPM-PNP scheme, we utilize the start, end, and distance fields of marked packets from the edge sampling scheme. In addition, our scheme marks two further pieces of router information, the Time-To-Live (TTL) and the marking probability of the router, which are written into the ID field of the marked packet by the marking routers. The marked TTL helps determine the distance between the last router that marked the packet and the current router. The marking probability of the router is recorded in the packet so that the downstream starving routers can compute their own marking probabilities. The marking probabilities of the starving routers need to be kept low because the DPPM-PNP scheme uses these probabilities for the re-marking function. A typical operating system sets the TTL to one of the values 32, 64, 128, or 255 [12]; these four values are considered default values. Moreover, very few packets travel through more than 25 routers in practical networks [14, 15]; in fact, a packet usually passes through 17 routers on average [13]. In DPPM-PNP, each router retrieves the marked_TTL, the marked_P, the packet_TTL, and the distance value from a marked packet header.
In the first step, if the router is a starving router, it uses the information in its database for comparison with the retrieved
information. When a router receives a packet, the distance value of the packet and the distance value in the routing table are compared. When the former is smaller, the router updates its database with this new value; otherwise, it keeps its original distance value. The objective is to figure out the minimum distance between the router and the source. The second step concerns changing the marking probability of the starving routers. If the marking probability is high, some routers suffer from starvation as in the PPM-NPC scheme, so we utilize the compensation counter in each router: when the counter reaches a threshold value, the marking probability needs to be changed. When a router receives a marked packet and leaves it alone, the compensation counter is increased by one. However, if the router finds a mark-free packet while its compensation counter is not zero, it marks the packet and reduces the counter by one. If the counter reaches C = 2/p, where p is the marking probability of the router, the router concludes that it is in a starving state. Figure 3(a) shows how to change and compute the marking probability of the starving routers. For example, suppose there are 30 routers labeled from 0 to 29 and their marking probability is 0.04. In such a case, the five routers near the receiver will rarely see any mark-free packets; hence, their compensation counters will be increased whenever they decide to mark an already marked packet. Assume the 26th router's compensation counter reaches the threshold value C. The 26th router then computes a new marking probability with Eq. (3):
P_i = 1 / (1/Marked_P + (Marked_TTL − Packet_TTL))    (3)
where P_i is the new marking probability of the starving router and Marked_P is the marking probability recorded in the packet. Using this formula, the new marking probability of the 26th router in Figure 3(a) is 1/26. After changing its marking probability, the 26th router uses this new marking probability to re-mark the marked packets. Therefore, the receiver will receive packets marked by the 26th router. In our example, after the 26th router changes its marking probability and starts to re-mark the marked packets, the other starving routers may start to adjust their marking probabilities due to threshold violation. As shown in Figure 3(b), the compensation counter of the 27th router also exceeds the threshold value. It then adopts Eq. (3) to compute the new marking probability and enters the rewrite mode; using the provided information, the 27th router obtains 1/27 as its new marking probability. It should be noted that a starving router in the downstream of another starving router may get into the rewrite state prior to its upstream peer, because the compensation counter in each router operates independently. In fact, the starving routers closer to the destination are usually prone to earlier entrance to the rewrite state; such a case is illustrated in Figure 3(c). Using Eq. (3) and the required information, the new marking probability is calculated as demonstrated in Figure 3(c). It is obvious that the new marking probabilities are identical regardless of the situation under which the starving router gets into the rewrite state.
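The probability update of Eq. (3) can be written out directly. The field names follow the paper; the TTL values below are hypothetical (assuming a default TTL of 64) and the scaffolding is ours.

```python
# Sketch of the starving-router probability update of Eq. (3).

def new_marking_probability(marked_p, marked_ttl, packet_ttl):
    """Eq. (3): P_i = 1 / (1/Marked_P + (Marked_TTL - Packet_TTL))."""
    return 1.0 / (1.0 / marked_p + (marked_ttl - packet_ttl))

# Paper's example: 30 routers with p = 0.04 = 1/25. A packet marked one hop
# upstream (Marked_TTL - Packet_TTL = 1) gives the 26th router 1/26; a packet
# marked two hops upstream gives the 27th router 1/(25 + 2) = 1/27, the same
# value it would get from a packet the 26th router re-marked with 1/26.
print(round(new_marking_probability(0.04, 64, 63), 6))   # 1/26 ~ 0.038462
print(round(new_marking_probability(0.04, 64, 62), 6))   # 1/27 ~ 0.037037
```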
Fig. 3. An example of the DPPM-PNP scheme (panels (a)–(c))
4 Performance Evaluation

In this section, we compare our scheme with other existing probabilistic packet marking mechanisms. First, the convergence packet numbers of the DPPM-PNP scheme, the edge sampling scheme, and the PPM-NPC scheme are measured and compared. The convergence packet number indicates the number of packets received by the destination before it sees at least one marked packet from each of the routers in the connection path. The convergence packet number is expected to increase as the length of the path extends. In Figure 4, the marking probability is fixed at 0.01. This experiment shows that the edge sampling scheme is inferior to the PPM-NPC scheme and the DPPM-PNP scheme. On the other hand, the PPM-NPC scheme and the DPPM-PNP scheme
Fig. 4. Convergence packet number (P = 0.01)

Fig. 5. Convergence packet number (P = 0.1)
share similar performance. In this simulation, the marking probability is relatively small given the path length, and the re-marking problem is mild. Actually, most of these convergence packets are mark-free packets; in other words, more mark-free packets arrive at the receiver than marked packets. As we explained previously, a small marking probability is inefficient in terms of path convergence time. The PPM-NPC scheme and the DPPM-PNP scheme are better than the edge sampling scheme because they completely remove the re-marking problem. To see how the marking probability affects the performance of these schemes, it is changed to 0.1 and the experiment is repeated; the performance is plotted in Figure 5. When the marking probability is 0.1, the convergence packet numbers demonstrate a salient drop in all three marking schemes. This is attributed to the elimination of
Table 1. Convergence packet number in the PPM-NPC

P \ Routers    5     10    15    20    25
0.01          226   292   338   365   385
0.02          118   146   170   185   195
0.03           78    99   113   122   129
0.04           59    75    83    94   124
0.05           39    51    61    91    ∞
0.06           39    51    62    ∞     ∞
0.07           30    40    ∞     ∞     ∞
0.08           29    39    ∞     ∞     ∞
0.09           24    36    ∞     ∞     ∞
0.1            22    44    ∞     ∞     ∞
0.2            14    ∞     ∞     ∞     ∞
0.3            ∞     ∞     ∞     ∞     ∞
Fig. 6. Convergence packet number (P = 0.2)

Fig. 7. Convergence packet number (P = 0.05)
mark-free packets. Nevertheless, the re-marking problem starts to become troublesome. In addition, the non-convergence problem of the PPM-NPC scheme takes place when the path length is greater than 10. This situation prohibits the routers closer to the destination from getting any mark-free packet; these routers become starving, and the destination will not see any packet marked by these starving routers. The DPPM-PNP scheme shines in this situation because it allows the starving routers to switch to the rewrite mode. Table 1 emphasizes the non-convergence problem of the PPM-NPC scheme: it occurs when the product of the path length and the marking probability exceeds one. The DPPM-PNP scheme does not have this problem; moreover, it outperforms the other two schemes in Figure 5. When the marking probability is set to 0.2, the mark-free packet numbers are further reduced, as shown in Figure 6. However, the re-marking problem becomes so serious that the receiver needs 3140 packets to converge when the edge sampling scheme is used and there are 25 routers in the path. The PPM-NPC scheme is basically useless in this situation. Next, we consider the overhead of the packet marking schemes at each router. We compare four schemes: the DPPM-PNP scheme, the edge sampling scheme, the PPM-NPC scheme, and the EDPPM scheme. First, we show the convergence packet numbers for these schemes. In Figure 7, we find that the EDPPM scheme is better than the other schemes. The reason is that the EDPPM scheme adjusts the marking probability of each router based on its order in the path. This approach reduces the re-marking problem, and the receiver sees similar numbers of marked packets from the routers. It is noted that DPPM-PNP and EDPPM share similar performance as the length of the path increases.
Fig. 8. Number of marked packets by each router observed by the receiver
Fig. 9. Marking overhead for each router
In Figure 8, 100,000 packets are sent from the source to the receiver with the marking probability set to 0.05, except for the EDPPM scheme. The numbers of packets marked by each router are counted and summarized in this figure. We see that for both EDPPM and DPPM-PNP, the numbers of packets marked by the routers are close to one another. However, this observation hides an important issue for the EDPPM scheme: the routers close to the source suffer from the heavy workload imposed by packet
marking. On the other hand, the DPPM-PNP scheme uses the compensation counter to avoid this problem. In Figure 9, the marking overhead is illustrated. If the traffic in the network is heavy, the routers with a heavy marking workload may experience unacceptable system slowdown; in the worst case, this situation can lead to intensive packet loss at these routers. The DPPM-PNP scheme does not have this issue because each router marks approximately the same number of packets, as Figure 9 suggests.
5 Conclusion

We review packet marking schemes and introduce a novel solution in this paper. Early works focus on the convergence packet number and the re-marking problem with rule-based designs that do not consider the marking overhead or the marking probability. Our scheme exhibits several good characteristics. First, convergence is guaranteed. Secondly, the marking probability and the number of routers can be chosen freely; our scheme adapts to the operating environment. In the experiments, we compare our scheme with the edge sampling scheme, the PPM-NPC scheme, and the EDPPM scheme. With our scheme, the convergence packet number is reduced by 65% compared with the edge sampling scheme when the marking probability is 0.2 and the number of routers is 25. Our scheme also solves the non-convergence problem of the PPM-NPC scheme. In addition, we analyze the number of packets marked by each router under various marking probabilities and find that the marking load is distributed more evenly than in the EDPPM scheme. In summary, our scheme reduces the re-marking problem, shortens the convergence packet number, and balances the number of packets marked by each router.

Acknowledgments. The authors would like to express their appreciation to the National Science Council and Tatung University for supporting this research under grants NSC 95-2745-E-036-004-URD and B95-I02-035, respectively.
References

1. CERT Coordination Center: CERT CA-1997-28 IP Denial-of-Service Attacks (December 1997), http://www.cert.org/advisories/CA-1997-28.html
2. CERT Coordination Center: Denial of Service Attacks (February 1999), http://www.cert.org/tech_tips/denial_of_service.html
3. CERT Coordination Center: Internet Denial of Service Attacks and the Federal Response (February 2000), http://www.cert.org/congressional_testimony/Fithen_testimony_Feb29.html
4. Ferguson, P.: Network Ingress Filtering: Defeating Denial of Service Attacks Which Employ IP Source Address Spoofing, RFC 2267 (January 1998)
5. Belenky, A., Ansari, N.: On IP traceback. IEEE Communications Magazine 41(7), 142–153 (2003)
6. Savage, S., Wetherall, D., Karlin, A., Anderson, T.: Network support for IP traceback. IEEE/ACM Transactions on Networking 9(3), 226–237 (2001)
Dynamic Probabilistic Packet Marking with Partial Non-Preemption
7. Savage, S., Wetherall, D., Karlin, A., Anderson, T.: Practical network support for IP traceback. In: Proc. of ACM SIGCOMM 2000, vol. 30, pp. 295–306 (August 2000)
8. Park, K., Lee, H.: On the effectiveness of probabilistic packet marking for IP traceback under denial-of-service attack. In: Proc. of IEEE INFOCOM 2001, vol. 1, pp. 338–347 (April 2001)
9. Tseng, Y., Chen, H., Hsieh, W.: Probabilistic Packet Marking with Non-Preemptive Compensation. IEEE Communications Letters 8(6), 359–361 (2004)
10. Liu, J., Lee, Z.-J., Chung, Y.-C.: Efficient dynamic probabilistic packet marking for IP traceback. In: The 11th IEEE International Conference on Networks, pp. 475–480 (October 2003)
11. Al-Duwairi, B., Govindarasu, M.: Novel Hybrid Schemes Employing Packet Marking and Logging for IP Traceback. IEEE Transactions on Parallel and Distributed Systems 17(5) (May 2006)
12. SWITCH: Default TTL Values in TCP/IP (1999), http://www.switch.ch/docs/ttl_default.html
13. Theilmann, W., Rothermel, K.: Dynamic Distance Maps of the Internet. In: Proc. of the 2000 IEEE INFOCOM Conference, vol. 1 (March 2000)
14. Carter, R., Crovella, M.: Server Selection Using Dynamic Path Characterization in Wide-Area Networks. In: Proc. of the IEEE INFOCOM Conference, vol. 3 (April 1997)
15. Cooperative Association for Internet Data Analysis: Skitter analysis (2000), http://www.caida.org/tools/measurement/skitter
Fractal Model Based Face Recognition for Ubiquitous Environments

Shuenn-Shyang Wang, Su-Wei Lin, and Cheng-Ming Cho
Department of Electrical Engineering, Tatung University, 40 Chungshan N. Road, 3rd Sec., Taipei, Taiwan, R.O.C.
[email protected]
Abstract. In ubiquitous environments, it is necessary to identify the user in order to utilize knowledge of the user's behavior and situation. In this paper, we propose a novel method of face recognition using dominant facial region extraction and a fractal model. To improve the performance of the face recognition system, we propose an algorithm to extract from the face image the dominant facial region, which includes the most discriminative part of the face; each dominant facial region is represented by its fractal model and stored in a database. The fractal model of the dominant facial region is then utilized as the fractal facial features for face recognition. To further improve the performance of the face recognition system, we also propose the techniques of a weighting mask and DC_Free MSE. Finally, experimental results are presented that demonstrate the excellent performance of our face recognition approach.
1 Introduction

A smart environment has various sensors and digital appliances networked together. To enable a user to obtain high-level services from networked appliances, it is necessary to identify the user, so that knowledge of the user's behavior and situation can be utilized without any explicit manipulation of computers. It is therefore desirable to have a realizable face recognition system that can identify the user with high speed and high accuracy. In the literature, many algorithms have been proposed for face recognition [1-5]. The fractal model has been applied in many applications [6-14], and recently it has been devised as a new approach to face recognition [6-10]. To date, face recognition approaches based on the fractal model have been implemented using the whole face including the hair. It is expected that fractal-based face recognition can be further improved. In this paper we propose some new techniques to improve the performance. Firstly, we propose an algorithm to extract from the face image the dominant facial region, which includes the most discriminative part of the face, so that the influence of different hair styles can be removed. The dominant facial region is then represented by its fractal model and stored in a database. The fractal model of the dominant facial region is thus utilized as the fractal facial features to realize the face recognition. Note that the dominant facial region is a square region of the face including the eyes, nose

F.E. Sandnes et al. (Eds.): UIC 2008, LNCS 5061, pp. 746–760, 2008. © Springer-Verlag Berlin Heidelberg 2008
and mouth, so that the influence of different hair styles can be removed. Secondly, a weighting mask and DC_Free MSE (mean square error) are proposed for the recognition stage. The same person may show different facial expressions, such as a smile, an angry or a sad expression; the weighting mask is proposed to make the system robust against such variations in expression. Moreover, under different lighting conditions the recognition error for the same person may vary greatly; the DC_Free MSE is presented to eliminate the effect of lighting variation. The rest of this paper is organized as follows. Section 2 proposes an algorithm to extract the dominant facial region from the face image. Section 3 shows how to obtain the fractal facial features from the dominant facial region and presents the new techniques of the weighting mask and DC_Free MSE to construct an improved face recognition system. Experimental results and conclusions are given in Sections 4 and 5, respectively.
2 The Dominant Facial Region Extraction

This section presents a method to extract the dominant facial region from the input face image. We process the face image with the Sobel operator and a binarization operator [14] to obtain a binary image with an enhanced contour of the face. The binary image is then normalized to a preset size and histogram equalization is applied to reduce the influence of illumination. Seven steps are needed to locate the dominant facial region of a face image.

Step 1: The face image, as shown in Fig. 1, is first processed by the Sobel operator and a binarization operator to obtain the binary image shown in Fig. 2. Scanning from left to right and summing the number of marked pixels in the binary image column by column, when one column contains more than 8 marked pixels with
Fig. 1. The face image
Fig. 2. The binary image
value “1” and appears first, we consider this column to be the left boundary of the head. In the same way, we can find the right boundary of the head. Note that the reason for requiring 8 marked pixels is to reduce the effect of noise.

Step 2: Detect the top of the head, which is the upper boundary of the head.

Step 3: Scanning row by row, we locate the vertical position of the neck by finding the smallest distance between the right and left boundaries of the face in the binary image.

Step 4: We locate the vertical positions of the eyes and the mouth. In view of Fig. 3, we denote the distance between the top of the head and the eyes as D_he, between the head and the mouth as D_hm, and between the head and the neck as D_hn. Using the fact that the vertical position of the eyes is about (0.43~0.50)*D_hn from the top of the head, we take the vertical projection of this region; its maximum gives the vertical position of the
Fig. 3. Some distances defined in the binary image
eyes. In the same way, we can locate the mouth using the knowledge that its vertical position is at (0.73~0.85)*D_hn from the top of the head.

Step 5: We locate the vertical centerline of the face by exploiting its symmetry. We define a search window that encloses the face. Since the face is symmetric with respect to its vertical centerline, we define a symmetry measurement to search for the vertical centerline. For each pixel with coordinate (i, j) in the search window, the symmetry measurement is set as

s(i, j) = Σ_{k=1}^{R} |f(i, j+k) − f(i, j−k)|    (2.1)

where R is a constant indicating the range over which the symmetry measurement is computed and f(i, j) is the gray value at position (i, j). The vertical centerline is the vertical line with the smallest s(i, j).

Step 6: We locate the horizontal positions of the eyes. Denoting the width of the head as D_hw, the horizontal position of the right eye lies in the region about (center − 0.2*D_hw ~ center − 0.15*D_hw). In the same way, the horizontal position of the left eye lies in the region about (center + 0.15*D_hw ~ center + 0.2*D_hw). We then find the maximum projection of each region, which is where the eye lies.

Step 7: We locate the dominant facial region of the face image, as shown in Fig. 4. First, we define the vertical margin of the face as margin_y = D_hn/13. The upper boundary of the region we want to locate is then at eyes − 2*D_hn/13 and the lower boundary is at mouth − D_hn/13. The width of the face is face_size = |(mouth − D_hn/13) − (eyes − 2*D_hn/13)|. Based on the centerline of the face obtained in Step 5, we set the left and right boundaries of the dominant facial region to (center − face_size/2) and (center + face_size/2), respectively.
Fig. 4. The region marked by the square on the face is the dominant facial region
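Steps 1 and 5 can be sketched as follows. This is a simplified illustration with NumPy: summing the symmetry measure of Eq. (2.1) over the rows of the search window is our interpretation, and all function names are ours, not the paper's.

```python
import numpy as np

def left_boundary(binary, min_marks=8):
    """Step 1: first column (scanning left to right) with more than
    `min_marks` pixels set. `binary` is a 2-D 0/1 array produced by
    Sobel filtering and thresholding."""
    counts = binary.sum(axis=0)
    cols = np.nonzero(counts > min_marks)[0]
    return int(cols[0]) if cols.size else None

def vertical_centerline(gray, rows, col_range, R=20):
    """Step 5: column minimizing the symmetry measure s(i, j) of
    Eq. (2.1), summed over the given rows of the search window."""
    best_col, best_s = None, float("inf")
    for j in col_range:
        s = 0.0
        for i in rows:
            for k in range(1, R + 1):
                s += abs(float(gray[i, j + k]) - float(gray[i, j - k]))
        if s < best_s:
            best_s, best_col = s, j
    return best_col
```

The right boundary of Step 1 follows by scanning the column counts from the other side, and the eye/mouth projections of Steps 4 and 6 are analogous row/column sums over the stated sub-regions.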
After obtaining the dominant facial region by the above steps, we normalize it to the preset dimension and apply histogram equalization [14] to the normalized dominant facial region. The lighting effect in the dominant facial region can thus be reduced. Fig. 5 shows the effect of histogram equalization.
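The histogram equalization step is the classic CDF remapping for 8-bit images; a minimal sketch (our own implementation, not the paper's code):

```python
import numpy as np

def equalize_histogram(gray):
    """Classic histogram equalization for an 8-bit grayscale image,
    as applied to the normalized dominant facial region."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero CDF value
    n = gray.size
    # Map each gray level through the normalized CDF.
    lut = np.clip(np.round((cdf - cdf_min) / (n - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[gray]
```

For an image whose gray levels are already uniformly distributed the mapping is the identity, while a narrow two-tone image is stretched to the full 0~255 range.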
Fig. 5. (a) Before histogram equalization; (b) after histogram equalization
3 Face Recognition Using Fractal Model

After extracting the dominant facial region and applying normalization and histogram equalization, we apply fractal image coding [9,11] to obtain the corresponding fractal facial features. Fig. 6 illustrates the concept of face recognition by fractal model. There are two stages: the training stage and the recognition stage. The first step in the training stage is to extract the fractal facial features from the dominant facial region. To obtain them, the dominant facial region is partitioned into non-overlapping smaller blocks (range blocks) and overlapping larger blocks (domain blocks). For each range block, a full search is done to find the domain block whose contractive transformation best approximates the range block; a distance metric such as the root mean square (RMS) error can be used to measure the approximation error. The parameters of the corresponding transformation for each range block are then stored, and the union of these parameters represents the fractal facial features of the face image. In the recognition stage, face recognition is achieved using the fractal facial features obtained in the training stage. The dominant facial region of the input face image is iterated once by the fractal facial features of each object (person) in the face database. Then the distances from the dominant facial region of the input face image to each iterated face image are calculated using the mean square error (MSE), which is defined as
MSE = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} (A_ij − B_ij)²,  1 ≤ i, j ≤ N    (3.1)

where N is the size of the block, and A_ij and B_ij are the gray values of the pixels in block A
and block B. Finally, we normalize all the MSEs to the range 0~1 by dividing by the maximum MSE among them, defining the recognition error (RE):
RE_i = MSE_i / max_{1≤i≤N} MSE_i    (3.2)
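A minimal sketch of the training-stage full search together with the MSE of Eq. (3.1). The least-squares luminance scale/offset transform and the non-overlapping domain grid are simplifying assumptions rather than the paper's exact coder, and image dimensions are assumed to be multiples of the domain size:

```python
import numpy as np

def mse(a, b):
    """Eq. (3.1): mean squared error between two N x N blocks."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def encode(image, r=4, d=8):
    """Toy fractal coder: for each non-overlapping r x r range block,
    full-search the d x d domain blocks (downsampled to r x r) for the
    best match under a luminance scale s and offset o. Returns one
    (y, x, s, o) parameter tuple per range block."""
    h, w = image.shape
    # Precompute downsampled domain blocks (stride d for brevity).
    domains = []
    for y in range(0, h - d + 1, d):
        for x in range(0, w - d + 1, d):
            blk = image[y:y + d, x:x + d].astype(float)
            small = blk.reshape(r, d // r, r, d // r).mean(axis=(1, 3))
            domains.append((y, x, small))
    features = []
    for ry in range(0, h, r):
        for rx in range(0, w, r):
            rng = image[ry:ry + r, rx:rx + r].astype(float)
            best = None
            for y, x, dom in domains:
                # Least-squares luminance scale/offset for this pair.
                dv = dom - dom.mean()
                denom = (dv ** 2).sum()
                s = ((rng - rng.mean()) * dv).sum() / denom if denom else 0.0
                s = max(-1.0, min(1.0, s))     # keep the map contractive
                o = rng.mean() - s * dom.mean()
                err = mse(rng, s * dom + o)
                if best is None or err < best[0]:
                    best = (err, y, x, s, o)
            features.append(best[1:])
    return features
```

The stored tuples play the role of the fractal facial features; decoding (iterating the transformations once, as in the recognition stage) applies each stored (s, o) to its domain block and writes the result into the corresponding range block.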
where MSE_i is the MSE between the input face image and the face image iterated using the fractal facial features of object i. The minimum RE is the smallest RE_i; that is, RE = min_i RE_i. If the minimum RE is less than a threshold, the input face image is recognized as the corresponding object; otherwise, the input face image is considered not in
Fig. 6. Face recognition by fractal model
Fig. 7(a). The face image of person No. 6 after one iteration by its own fractal facial features

Fig. 7(b). The face image of person No. 1 after one iteration by the fractal facial features of person No. 6
Fig. 7(c). The recognition errors for the input face image of person No. 6. Note that RE_6 is the minimum RE.
the face database. Fig. 7 shows an example in which person No. 6 is recognized by the fractal-based face recognition approach. To further improve the performance of the face recognition system, we propose two techniques: the weighting mask and DC_Free MSE.

(A) Weighting Mask: A weighting mask for calculating the MSE is proposed to remove the influence of facial expressions. Fig. 8 shows the T-shaped weighting mask in the face region. The square region is the dominant facial region; the weight of the pixels inside the T-shaped region is 1.0 and the weight of the pixels outside it is 0.1. The weighting is motivated by the observation that the T-shaped region, which contains the eyes and nose, changes little across facial expressions. Thus the minimum recognition
Fig. 8. The T-shaped weighting mask and the corresponding region: (a) the T-shaped weighting mask in the facial region; (b) the region of the weighting mask (with dimensions L, 2L/3, L/4, and L/3 annotated in the original figure)
error of the same person in the database is almost invariant even across different expressions. The experimental results are shown in Section 4.

(B) DC_Free MSE: Fig. 9 shows facial images under different lighting conditions. Under brighter or darker lighting, the MSE between the dominant facial region of the input face image and that of the iterated face image may vary greatly for the same person. We therefore propose the DC_Free MSE to avoid the effect of varying lighting conditions. The idea is to remove the average (DC) component of the dominant facial region from the MSE in the recognition stage. The DC_Free MSE is defined as:
DC_Free MSE = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} ((A_ij − avgA) − (B_ij − avgB))²    (3.3)

where N×N is the block size, A_ij and B_ij are the gray values of the pixels in block A and block B, and avgA and avgB are the average gray values of block A and block B, respectively. The recognition error based on DC_Free MSE is given by

RE_i = DC_Free MSE_i / max_{1≤i≤N} DC_Free MSE_i    (3.4)

where DC_Free MSE_i is the DC_Free MSE between the input face image and the face image iterated for person No. i, and N is the number of objects in the face database.
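Eq. (3.3) is straightforward to implement; the sketch below also shows why it cancels a uniform brightness shift between the two blocks:

```python
import numpy as np

def dc_free_mse(a, b):
    """Eq. (3.3): MSE after removing each block's average gray value,
    which cancels a global brightness offset between the two blocks."""
    a = a.astype(float)
    b = b.astype(float)
    diff = (a - a.mean()) - (b - b.mean())
    return float(np.mean(diff ** 2))

# A uniformly brighter copy of a block has zero DC_Free MSE, while the
# plain MSE of Eq. (3.1) would grow with the square of the offset.
base = np.arange(16, dtype=float).reshape(4, 4)
assert dc_free_mse(base, base + 40) == 0.0
```

This mirrors the behavior reported in Table 1, where the MSE varies with lighting but the DC_Free MSE stays nearly constant.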
Fig. 9. (a) Dominant facial region in normal lighting; (b) in lower lighting; (c) in brighter lighting
Finally, an improved face recognition system based on the weighting mask and DC_Free MSE can be constructed. The procedure is summarized as follows:

Training Stage:
Step 1: Extract the dominant facial region from an input face image.
Step 2: Obtain the fractal facial features by fractal coding of the dominant facial region.
Step 3: Store the fractal facial features of the dominant facial region in the face database.

Recognition Stage:
Step 1: Extract the dominant facial region from an input face image.
Step 2: Iterate the dominant facial region of the input face image by the fractal facial features of each person in the face database.
Step 3: Calculate all the recognition errors with DC_Free MSE and the weighting mask.
Step 4: Find the object (person) with the minimum recognition error.
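The recognition-stage steps might be tied together as follows. This is a sketch: plain per-pixel MSE stands in for the weighted DC_Free MSE of the improved system, and the threshold value is a placeholder of ours, not taken from the paper:

```python
import numpy as np

def recognize(input_region, iterated, threshold=0.5):
    """Steps 2-4: given the input dominant facial region and the
    once-iterated image for each enrolled person, return the index of
    the matching person, or None if the face is not in the database."""
    errors = np.array([
        np.mean((input_region.astype(float) - img.astype(float)) ** 2)
        for img in iterated
    ])
    re = errors / errors.max()      # Eq. (3.2): normalize RE to 0~1
    best = int(re.argmin())
    return best if re[best] < threshold else None
```

Because the REs are normalized by their maximum, the largest RE is always 1; the decision rests on how small the best candidate's RE is relative to the rest.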
4 Experimental Results

To assess the performance of the proposed algorithm, several experiments were carried out on a database consisting of 30 front-view face images of size 320x240 with 256 gray levels. The whole experiment is conducted in two stages. In the first stage, the dominant facial region of each face image is located, extracted and normalized. Fig. 10 shows the dominant facial regions of all 30 candidates. The dimension of the facial region after normalization is 128x128, and all the fractal facial features of the face images are obtained and stored. Here, the domain block size and the range block size are
Fig. 10. Extracted dominant facial regions of the 30 persons (No. 1 ~ No. 30) in the face database
set to 8x8 and 4x4 pixels, respectively. In the second stage, the face recognition procedure is performed.

4.1 The Experimental Results of Face Recognition by Fractal Model

In the first experiment, the lighting condition of the surroundings is well controlled. Fig. 11 shows the partial results when person No. 9 and person No. 22 are input for recognition. As expected, the recognition error corresponding to the person himself is the minimum among all. When all persons are input for recognition, the overall experimental results show that a 100% recognition rate is achieved.
Fig. 11. The partial results for two input face images: (1) the RE for the input face of person No. 9; (2) the RE for the input face of person No. 22
4.2 The Experimental Results with Weighting Mask

In the second experiment, a weighting mask is applied for the case where person No. 13 is input for recognition. Face images of person No. 13 with several facial expressions, shown in Fig. 12, are input for recognition and their recognition errors are
Fig. 12. Person No. 13 with different expressions: (a) a normal face; (b) a smiling face; (c) a sad face
Fig. 13. The variation of the minimum recognition error for the same person with different expressions: (a) no expression (minimum recognition error 0.385); (b) a smile (0.452); (c) sadness (0.468)
Fig. 14. The minimum recognition error remains close for the same person with different facial expressions: (a) normal expression (minimum recognition error 0.287); (b) smiling expression (0.291)
Fig. 14 (continued). (c) sad expression (minimum recognition error 0.298)
calculated. Fig. 13 shows the variation of the recognition errors without the weighting mask for the same person, while Fig. 14 shows the recognition errors with the weighting mask. With the weighting mask, the minimum recognition errors for the same person with different expressions are close, demonstrating that the influence of different expressions can be overcome.

4.3 The Experimental Results with DC_Free MSE

In the third experiment, the lighting condition of the surroundings is not well controlled in the recognition stage, so the luminance of the input face can be influenced by the lighting condition. Fig. 15 shows faces extracted under normal, darker and brighter lighting conditions. Table 1 shows the recognition errors according to the MSE as well as the DC_Free MSE.
Fig. 15. (a) The face in normal lighting; (b) the face in darker lighting; (c) the face in brighter lighting
Fractal Model Based Face Recognition for Ubiquitous Environments
759
Table 1. The recognition errors of MSE and DC_Free MSE

               Normal Face   Darker Face   Brighter Face
MSE            177.63        439.81        587.21
DC_Free MSE    170.59        164.01        169.21
It is found that the MSEs for the normal, darker and brighter faces are quite different, but their DC_Free MSEs are close. This demonstrates that using DC_Free MSE the influence of the lighting condition can be removed.

4.4 The Experimental Results with Weighting Mask and DC_Free MSE

In this experiment, all the faces in the face database are recognized by the improved face recognition system, which incorporates the proposed techniques of the weighting mask and DC_Free MSE.
Fig. 16. The partial results recognized by the improved face recognition system: (1) the RE for the input face of person No. 9; (2) the RE for the input face of person No. 22
Fig. 16 shows the partial results when person No. 9 and person No. 22 are input for recognition. Comparing Fig. 16 with Fig. 11, it is found that the improved face recognition system achieves better performance. Extensive experiments demonstrate a high recognition rate for input face images with different expressions and different lighting conditions.
5 Conclusion

In this paper, we have proposed an improved face recognition system using dominant facial region extraction and a fractal model. To improve the performance of the face recognition system, we proposed an algorithm to extract the dominant facial region
from the face image, which includes the most discriminative part of the face; each dominant facial region is then represented by its fractal model and stored in the database. The fractal model of the dominant facial region is thus utilized as the fractal facial features for face recognition. To further improve the performance, we also proposed the techniques of the weighting mask and DC_Free MSE. The experiments demonstrate that a high recognition rate can be achieved. Our approach needs little computational time for recognition, and when a new face is added to the database, the whole database does not need to be reconstructed.
References

1. Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face Recognition: A Convolutional Neural-Network Approach. IEEE Trans. on PAMI 20, 673–686 (1998)
2. Sakaue, F., Shakunaga, T.: Combination of projectional and locational decompositions for robust face recognition. In: Proc. IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, pp. 407–421 (2005)
3. Sakaue, F., Kobayashi, M., Migita, T., Shakunaga, T.: A real-life test face recognition system for dialogue interface robot in ubiquitous environments. In: The 18th Int. Conference on Pattern Recognition, pp. 1155–1160 (2006)
4. Lam, K.M., Yan, H.: An Analytic-to-Holistic Approach for Face Recognition Based on a Single Frontal View. IEEE Trans. on PAMI 20, 670–688 (1998)
5. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3, 71–86 (1991)
6. Kouzani, A.Z., He, F., Sammut, K.: Towards invariant face recognition. Information Sciences 123, 75–101 (2000)
7. Tan, T., Yan, H.: Face Recognition by Fractal Transformation. In: IEEE International Conference on ASSP, pp. 3537–3540 (1999)
8. Jacquin, A.E.: Fractal Image Coding: A Review. Proceedings of the IEEE 81(10), 1451–1465 (1993)
9. Lu, N.: Fractal Imaging. Academic Press, London (1997)
10. Wu, H.-Y.: A System of Facial Feature Extraction and Recognition. Master's thesis, Department of Electrical Engineering, Tatung University (1999)
11. Li, C.-H.: Two-Level Fractal Image Coding Algorithm. Master's thesis, Department of Electrical Engineering, Tatung University (1996)
12. Jacobs, A.E., Fisher, Y., Boss, R.D.: Image coding based on a fractal theory of iterated contractive image transformations. IEEE Trans. Image Processing 1(1), 18–30 (1992)
13. Jacquin, A.E.: Fractal Image Coding: A Review. Proceedings of the IEEE 81(10), 257–261 (1993)
14. Jain, A.K.: Fundamentals of Digital Image Processing. Prentice Hall, New Jersey (1986)
Author Index
Apduhan, Bernady O.
61, 397
Govindaraju, Venu 75 Gu, Gaung-Hui 297
Bagula, Antoine B. 453 Becker, Benjamin 216 Beigl, Michael 647 Berbers, Yolande 46 Berchtold, Martin 647 Berger, Michael 143 Blumenthal, Jan 20 Bosse, Tibor 229 Braun, Iris 143 Bunoza, Ana 258 Catarinucci, Luca 119 Chamizo, Javier 105 Chang, Jae-Woo 677 Chang, Tsun-Wei 720 Chang, Yao-Jen 284 Chang, Yueh-Tsun 697 Cheah, Zi Bin 271 Chen, Bo-Wei 616 Chen, Zhaoji 3 Cho, Cheng-Ming 746 Cho, Sung-Bae 158, 535 Choi, Hyohyun 363 Choi, Keunho 386 Chowdhury, Mohammad M.R. Colella, Riccardo 119 Cui, Min-Woo 323 Dargie, Waltenegus 143 Decker, Christian 647 Deegan, Mark 490 DiBari, Angelo 119 Du, Kejun 187 Esposito, Alessandra
119
Fang, Hui 468 Faruk, A.B.M. Omar Filho, Julio Oliveira Fu, Ping-Fang 352
271 258
Garnes, H˚ avard Husev˚ ag 271 G´ omez, Juan Miguel 105
Haaland Thorsen, Kari Anne 626 Han, Guangjie 439 Han, Jong-Wook 548, 591 Harada, Fumiko 411 Hauswirth, Manfred 439 Herrmann, Klaus 20 Hong, Dowon 338, 548 Hong, Seung-Tae 677 Hoogendoorn, Mark 229 Hsu, Wen-Jing 468 Huang, Benxiong 426 Huang, Chen 426 Huang, Chung-Ming 636 Huang, Kuo-Feng 352 Huang, Runhe 61, 397 Huang, Yo-Ping 697, 720 Huber, Manuel 216 Hussain, Chauhdary Sajjad 323 Hwang, Ren-Hung 201 Indulska, Jadwiga
105
1
Jaatun, Martin Gilje 271, 602 Jayaraman, Bharat 75 Jeong, Ik Rae 338 Jeong, Young-Sik 591 Jho, Nam-Su 338 Jin, Qun 61 Kang, Joenil 548 Kashif, Ali 311 Kawamura, Takahiro 578 Kawashima, Tomomi 61 Kim, Dong-Sub 311 Kim, Jung-Hwan 311, 323 Kim, Kee-Bum 323 Kim, Kyoung-Yun 386 Kim, Tai-hoon 35 Klein, Michel C.A. 229 Klinker, Gudrun 216 Kuan, Ta-Wen 297 Kulyukin, Vladimir 244
Kutiyanawala, Aliasgar 244 Kwon, Ohbyung 90, 386 Lee, Deok-Gyu 591 Lee, KyungHee 548 Lee, Yonnim 90 Lee, Young-Seol 535 Liang, Yan 482 Liao, Zhensong 35 Lin, Daw-Tung 710 Lin, Feiyu 687 Lin, Su-Wei 746 Line, Maria B. 271 Liu, Li-Wei 710 Locatelli, Marco P. 505 Loregian, Marco 505 Louta, Malamati 520 Ma, Jianhua 61, 397 MacCormac, Daniel 490 Maeng, YoungJae 548 Mazandu, Kuzamunu G. 453 Meersman, Robert 169 Menon, Vivek 75 Ming, Xue 311 Mohaisen, Abedelaziz 338, 548 Mtenzi, Fred 490 Nakamura, Yuichi 131 Nakanishi, Kei 397 Noll, Josef 105 Nyang, DaeHun 338, 548 O’Driscoll, Ciaran 490 O’Shea, Brendan 490 Ohsuga, Akihiko 578 Øyan, Petter 2 Pan, Liuqing 426 Park, Han-Saem 158 Park, Jong Hyuk 35, 591 Park, Myong-Soon 311, 323 Pelechano, Vicente 662 Preuveneers, Davy 46 Pyykk¨ onen, Mikko 563 Qin, Weijun
187
Reynolds, Vinny 439 Riedel, Till 647
Riekki, Jukka 563 Rong, Chunming 482, 626 Rosenstiel, Wolfgang 258 Rothermel, Kurt 20 Rudolph, Larry 468 Sanchez, Ivan 563 Sandkuhl, Kurt 687 Sandnes, Frode Eika 697, 720 Schuhmann, Stephan 20 Schøyen, Anne Liseth 2 Serral, Estefan´ıa 662 Shi, Yuanchun 187 Shimakawa, Hiromitsu 411 Shon, Taeshik 363 Shu, Lei 439 Sommer, J¨ urgen 258 Song, In-Jee 158 Springer, Thomas 143 Sun, Jun-Zhao 373 Sun, Zheng-Wei 616 Sung, Jeng-Shian 732 Sørensen, Jan Tore 602 Takada, Hideyuki 411 Tang, Yan 169 Tarricone, Luciano 119 Timmermann, Dirk 20 Treur, Jan 229 Tsai, Chang-Zhou 636 Tsai, Shan-Yi 201 Umezu, Keisuke
578
Valderas, Pedro
662
Wang, Chiung-Ying 201 Wang, Jhing-Fa 297, 616 Wang, Jia-chang 297, 616 Wang, Jun-Xuan 352 Wang, Shuenn-Shyang 746 Wang, Xinmei 426 Wang, Ying-Hong 352 Wedum, Petter 271 Wu, Shang-Yao 284 Wustmann, Patrick 143 Yamahara, Hiroyuki 411 Yang, Laurence Tianruo 35 Yau, Stephen S. 3 Yen, Wei 732 Yeo, Sang Soo 591
Yi, Zaiyao 426
Zhang, Lin 439 Zhou, Xingshe 131 Zhou, Zhangbing 439 Zou, Deqing 35