George A. Tsihrintzis, Ernesto Damiani, Maria Virvou, Robert J. Howlett, and Lakhmi C. Jain (Eds.) Intelligent Interactive Multimedia Systems and Services
Smart Innovation, Systems and Technologies 6 Editors-in-Chief Prof. Robert James Howlett KES International PO Box 2115 Shoreham-by-sea BN43 9AF UK E-mail:
[email protected]
Prof. Lakhmi C. Jain School of Electrical and Information Engineering University of South Australia Adelaide, Mawson Lakes Campus South Australia SA 5095 Australia E-mail:
[email protected]
Further volumes of this series can be found on our homepage: springer.com Vol. 1. Toyoaki Nishida, Lakhmi C. Jain, and Colette Faucher (Eds.) Modeling Machine Emotions for Realizing Intelligence, 2010 ISBN 978-3-642-12603-1 Vol. 2. George A. Tsihrintzis, Maria Virvou, and Lakhmi C. Jain (Eds.) Multimedia Services in Intelligent Environments – Software Development Challenges and Solutions, 2010 ISBN 978-3-642-13354-1 Vol. 3. George A. Tsihrintzis and Lakhmi C. Jain (Eds.) Multimedia Services in Intelligent Environments – Integrated Systems, 2010 ISBN 978-3-642-13395-4 Vol. 4. Gloria Phillips-Wren, Lakhmi C. Jain, Kazumi Nakamatsu, and Robert J. Howlett (Eds.) Advances in Intelligent Decision Technologies – Proceedings of the Second KES International Symposium IDT 2010, 2010 ISBN 978-3-642-14615-2 Vol. 5. Robert James Howlett (Ed.) Innovation through Knowledge Transfer, 2010 ISBN 978-3-642-14593-3 Vol. 6. George A. Tsihrintzis, Ernesto Damiani, Maria Virvou, Robert J. Howlett, and Lakhmi C. Jain (Eds.) Intelligent Interactive Multimedia Systems and Services, 2010 ISBN 978-3-642-14618-3
George A. Tsihrintzis, Ernesto Damiani, Maria Virvou, Robert J. Howlett, and Lakhmi C. Jain (Eds.)
Intelligent Interactive Multimedia Systems and Services
123
Prof. George A. Tsihrintzis
Prof. Robert J. Howlett
Dept. of Informatics University of Piraeus Piraeus 185 34, Greece E-mail:
[email protected]
KES International P.O. Box 2115, Shoreham-by-Sea BN43 9AF, UK Email:
[email protected] Tel.: +44 2081 330306 Mob.: +44 7905 987544
Prof. Ernesto Damiani Università degli Studi di Milano Dipto. Tecnologie dell'Informazione Via Bramante, 65 26013 Crema, Italy E-mail:
[email protected]
Prof. Maria Virvou Dept. of Informatics University of Piraeus Piraeus 185 34, Greece E-mail:
[email protected]
Prof. Lakhmi C. Jain School of Electrical and Information Engineering, University of South Australia, Adelaide, Mawson Lakes Campus, South Australia SA 5095, Australia E-mail:
[email protected]
ISBN 978-3-642-14618-3
e-ISBN 978-3-642-14619-0
DOI 10.1007/978-3-642-14619-0 Smart Innovation, Systems and Technologies
ISSN 2190-3018
Library of Congress Control Number: 2010930914 c 2010 Springer-Verlag Berlin Heidelberg This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed on acid-free paper 987654321 springer.com
Foreword
KES International (KES) is a worldwide organisation that provides a professional community and association for researchers, originally in the discipline of Knowledge Based and Intelligent Engineering Systems, but now extending into other related areas. Through this, KES provides its members with opportunities for publication and beneficial interaction. The focus of KES is research and technology transfer in the area of Intelligent Systems, i.e. computer-based software systems that operate in a manner analogous to the human brain, in order to perform advanced tasks. Recently KES has started to extend its area of interest to encompass the contribution that intelligent systems can make to sustainability and renewable energy, and also the knowledge transfer, innovation and enterprise agenda. Involving several thousand researchers, managers and engineers drawn from universities and companies world-wide, KES is in an excellent position to facilitate international research co-operation and generate synergy in the area of artificial intelligence applied to real-world ‘Smart’ systems and the underlying related theory. The KES annual conference covers a broad spectrum of intelligent systems topics and attracts several hundred delegates from a range of countries round the world. KES also organises symposia on specific technical topics, for example, Agent and Multi Agent Systems, Intelligent Decision Technologies, Intelligent Interactive Multimedia Systems and Services, Sustainability in Energy and Buildings and Innovations through Knowledge Transfer. KES is responsible for two peer-reviewed journals, the International Journal of Knowledge based and Intelligent Engineering Systems, and Intelligent Decision Technologies: an International Journal. KES supports a number of book series in partnership with major scientific publishers. Published by Springer, ‘Smart Innovative Systems and Technologies’ is the KES flagship book series. The aim of the series is to make available a platform for the publication of books (in both hard copy and electronic form) on all aspects of single and multi-disciplinary research involving smart innovative systems and technologies, in order to make the latest results available in a readily-accessible form. The series covers systems that employ knowledge and intelligence in a broad sense. Its focus is systems having embedded knowledge and intelligence, which may be applied to the solution of world industrial, economic and environmental problems and the knowledge-transfer methodologies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of scientific and technological disciplines.
Examples of applicable areas to be covered by the series include intelligent decision support, smart robotics and mechatronics, knowledge engineering, intelligent multi-media, intelligent product design, intelligent medical systems, smart industrial products, smart alternative energy systems, and underpinning areas such as smart systems theory and practice, knowledge transfer, innovation and enterprise. The series includes conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes take responsibility for ensuring that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.
Professor Robert J. Howlett
Executive Chair, KES International
Visiting Professor, Enterprise: Bournemouth University
United Kingdom
Preface
This volume contains the Proceedings of the 3rd International Symposium on Intelligent Interactive Multimedia Systems and Services (KES-IIMSS 2010). This third edition of the KES-IIMSS Symposium was jointly organized by the Department of Informatics of the University of Piraeus, Greece, and the Department of Information Technologies of the University of Milan, Italy, in conjunction with KES International. KES-IIMSS is a new series of international scientific symposia aimed at presenting novel research in the fields of intelligent multimedia systems relevant to the development of a new generation of interactive, user-centric services. The major theme underlying this year's symposium is the rapid integration of multimedia processing techniques within a new wave of user-centric services and processes. Indeed, pervasive computing has blurred the traditional distinction between conventional information technologies and multimedia processing, making multimedia an integral part of a new generation of IT-based interactive systems. KES-IIMSS symposia, following the general structure of KES events, aim at providing an internationally respected forum for presenting and publishing high-quality results of scientific research, while allowing for timely dissemination of research breakthroughs and novel ideas via a number of autonomous special sessions and workshops on emerging issues and topics identified each year. KES-IIMSS-2010 co-located events include: (1) the International Workshop on Human-Computer Interaction in Knowledge-based Environments, (2) the International Workshop on Interactive Multimodal Environment and (3) two invited sessions, respectively on Intelligent Healthcare Information Management and Pervasive Systems for Healthcare. KES-IIMSS-2010 is also co-located with the 2nd International Symposium on Intelligent Decision Technologies (KES-IDT-2010).
We are very satisfied with the quality of the program and would like to thank the authors for choosing KES-IIMSS as the forum for the presentation of their work. Also, we gratefully acknowledge the hard work of the KES-IIMSS international program committee members and of the additional reviewers for selecting the accepted conference papers.
General Co-chairs: Maria Virvou, Ernesto Damiani, George A. Tsihrintzis
Executive Chair: R. J. Howlett
Honorary Chair: Prof. Lakhmi C. Jain
Liaison Chair - Asia: Prof. Toyohide Watanabe, Nagoya University, Japan
Programme Coordinator: Dr. Marco Anisetti, University of Milan, Italy
General Track Coordinator: Dr. Valerio Bellandi, University of Milan, Italy
Contents
Adapting Spreading Activation Techniques towards a New Approach to Content-Based Recommender Systems . . . . . 1
Yolanda Blanco-Fernández, Martín López-Nores, José J. Pazos-Arias

A Framework for Automatic Detection of Abandoned Luggage in Airport Terminal . . . . . 13
Grzegorz Szwoch, Piotr Dalka, Andrzej Czyżewski

Modeling Student's Knowledge on Programming Using Fuzzy Techniques . . . . . 23
Konstantina Chrysafiadi, Maria Virvou

Camera Angle Invariant Shape Recognition in Surveillance Systems . . . . . 33
D. Ellwart, A. Czyżewski

Multicriteria-Based Decision for Services Discovery and Selection . . . . . 41
Younès El Bouzekri El Idrissi, Rachida Ajhoun, M.A. Janati Idrissi

Building a Minimalistic Multimedia User Interface for Quadriplegic Patients . . . . . 53
Constantinos Patsakis, Nikolaos Alexandris

Biofeedback-Based Brain Hemispheric Synchronizing Employing Man-Machine Interface . . . . . 59
Kaszuba Katarzyna, Kopaczewski Krzysztof, Odya Piotr, Kostek Bożena

Performance of Watermarking-Based DTD Algorithm under Time-Varying Echo Path Conditions . . . . . 69
Andrzej Ciarkowski, Andrzej Czyżewski

Applying HTM-Based System to Recognize Object in Visual Attention . . . . . 79
Hoai-Bac Le, Anh-Phuong Pham, Thanh-Thang Tran

Constant Bitrate Image Scrambling Method Using CAVLC in H.264 . . . . . 91
Junsang Cho, Gwanggil Jeon, Jungil Seo, Seongmin Hong, Jechang Jeong

Color Image Restoration Technique Using Gradient Edge Direction Detection . . . . . 101
Gwanggil Jeon, Sang-Jun Park, Abdellah Chehri, Junsang Cho, Jechang Jeong

Watermarking with the UPC and DWT . . . . . 111
Evelyn Brannock, Michael Weeks

Building a Novel Web Service Framework – Through a Case Study of Bioinformatics . . . . . 121
Hei-Chia Wang, Wei-Chun Chang, Ching-seh Wu

An Architecture for Collaborative Translational Research Utilizing the Honest Broker System . . . . . 137
Christopher Gillies, Nilesh Patel, Gautam Singh, Ishwar Sethi, Jan Akervall, George Wilson

Simulated Annealing in Finding Optimum Groups of Learners of UML . . . . . 147
Kalliopi Tourtoglou, Maria Virvou

A Smart Network Architecture for e-Health Applications . . . . . 157
Abdellah Chehri, Hussein Mouftah, Gwanggil Jeon

A Music Recommender Based on Artificial Immune Systems . . . . . 167
Aristomenis S. Lampropoulos, Dionysios N. Sotiropoulos, George A. Tsihrintzis

The iCabiNET System: Building Standard Medication Records from the Networked Home . . . . . 181
Martín López-Nores, Yolanda Blanco-Fernández, José J. Pazos-Arias, Jorge García-Duque

Multi-agent Framework Based on Web Service in Medical Data Quality Improvement for e-Healthcare Information Systems . . . . . 191
Ching-Seh Wu, Wei-Chun Chang, Nilesh Patel, Ishwar Sethi

Towards a Unified Data Management and Decision Support System for Health Care . . . . . 205
Robert D. Kent, Ziad Kobti, Anne Snowdon, Akshai Aggarwal

A Glove-Based Interface for 3D Medical Image Visualization . . . . . 221
Luigi Gallo

Open Issues in IDS Design for Wireless Biomedical Sensor Networks . . . . . 231
Luigi Coppolino, Luigi Romano

Context-Aware Notifications: A Healthcare System for a Nursing Home . . . . . 241
Sandra Nava-Muñoz, Alberto L. Morán, Victoria Meza-Kubo

Multimodality in Pervasive Environment . . . . . 251
Marco Anisetti, Valerio Bellandi, Paolo Ceravolo, Ernesto Damiani

Optimizing the Location Prediction of a Moving Patient to Prevent the Accident . . . . . 261
Wei-Chun Chang, Ching-Seh Wu, Chih-Chiang Fang, Ishwar K. Sethi

An MDE Parameterized Transformation for Adaptive User Interfaces . . . . . 275
Wided Bouchelligua, Nesrine Mezhoudi, Adel Mahfoudhi, Olfa Daassi, Mourad Abed

Agent Based MPEG Query Format Middleware for Standardized Multimedia Retrieval . . . . . 287
Mario Döller, Günther Hölbling, Christine Webersberger

Query Result Aggregation in Distributed Multimedia Databases . . . . . 299
Christian Vilsmaier, David Coquil, Florian Stegmaier, Mario Döller, Lionel Brunie, Harald Kosch

Sensor-Aware Web interface . . . . . 309
Marco Anisetti, Valerio Bellandi, Ernesto Damiani, Alessandro Mondoni, Luigi Arnone

Author Index . . . . . 323
Adapting Spreading Activation Techniques towards a New Approach to Content-Based Recommender Systems
Yolanda Blanco-Fernández, Martín López-Nores, and José J. Pazos-Arias
Department of Telematics Engineering, University of Vigo, 36310, Spain
e-mail: {yolanda,mlnores,jose}@det.uvigo.es
Abstract. Recommender systems fight information overload by automatically selecting items that match the personal preferences of each user. Content-based recommenders suggest items similar to those the user liked in the past by resorting to syntactic matching techniques, which leads to overspecialized recommendations. The so-called collaborative approaches fight this problem by considering the preferences of other users, which results in new limitations. In this paper, we avoid the intrinsic downsides of collaborative solutions and diversify the content-based recommendations by reasoning about the semantics of the user's preferences. Specifically, we present a novel domain-independent content-based recommendation strategy that exploits Spreading Activation techniques as the reasoning mechanism. Our contribution consists of adapting and extending the internals of traditional SA techniques in order to fulfill the personalization requirements of a recommender system. The resulting reasoning-driven strategy enables the discovery of additional knowledge about the user's preferences and leads to more accurate and diverse content-based recommendations. Our approach has been preliminarily validated with a set of viewers who received recommendations of Digital TV contents.
Work funded by the Ministerio de Educación y Ciencia (Gobierno de España) research project TSI2007-61599, and by the Consellería de Educación e Ordenación Universitaria (Xunta de Galicia) incentives file 2007/000016-0.

1 Introduction
Recommender systems provide personalized advice to users about items they might be interested in. These tools are already helping people efficiently manage content overload and reduce complexity when searching for relevant information. The first recommendation strategy was content-based filtering [8], which consists of
suggesting items similar to those the user liked in the past. In spite of its accuracy, this technique is limited due to the similarity metrics employed, which are based on syntactic matching approaches that can only detect similarity between items that share all or some of their attributes [1]. Consequently, traditional content-based approaches lead to overspecialized suggestions including only items that bear strong resemblance to those the user already knows (which are recorded in his/her profile). In order to fight overspecialization, researchers devised collaborative filtering [7] –whose idea is to move from the experience of an individual user's profile to the experiences of a community of like-minded users (his/her neighbors)–, and they even combined content-based and collaborative filtering in hybrid approaches [4]. Although collaborative (and hybrid) approaches mitigate the effects of overspecialization by considering the interests of other users, they bring in new limitations, such as the sparsity problem (related to difficulties in selecting each individual's neighborhood when there is not much knowledge about the users' preferences), privacy concerns bound to the confidentiality of the users' personal data, and scalability problems due to the management of many user profiles. The contribution of our paper is a content-based strategy that, instead of considering other individuals' preferences, diversifies the recommendations by exploiting semantic reasoning about the user's interests, so that we overcome overspecialization without suffering the intrinsic limitations of collaborative and hybrid solutions. A reasoning-driven recommender system requires three components: an ontology that contains classes and properties referring to the semantic annotations of the available items and their relationships; personal profiles that keep track of the items the user (dis)liked, along with ratings measuring his/her level of interest in them (typically negative values for unappealing items and positive values for interesting items); and a recommendation strategy that adopts reasoning techniques to infer knowledge about the user's preferences, by uncovering semantic relationships between the items registered in his/her profile and others formalized in the ontology. For example, if a TV viewer has enjoyed a program about keep-fit, a reasoning-driven recommender would exploit an ontology like that depicted in Fig. 1 to infer that s/he likes personal cares, thus being able to suggest a program about fashion like Personal shopper tips. The so-called Spreading Activation (SA) techniques are especially useful for this purpose because they are able to efficiently manage huge knowledge networks (such as ontologies), working as follows: first, these techniques activate a set of concepts in the considered network; then, after a spreading process based on the relationships modeled in the network, they select other concepts significantly related to those initially activated. As per these guidelines, our idea is to harness SA techniques as follows: the initially activated concepts would be the user's preferences, while those finally selected would refer to the items recommended by our content-based strategy.
For that purpose, our approach must: (i) identify the nodes and links to be modeled in the knowledge network of each user (starting from the classes and properties formalized in the ontology and from his/her profile), and (ii) define the spreading process aimed at processing the user network and selecting our contentbased recommendations. In order to solve overspecialization, our spreading process
Fig. 1 Subset of classes, properties and instances formalized in a TV ontology.
relies on a reasoning mechanism able to discover complex semantic associations between the user's preferences and the available items, which are hidden behind the knowledge formalized in the system ontology. Besides, this spreading process puts the focus on the user's interests, so that the resulting recommendations evolve as his/her preferences change over time. This paper is organized as follows: Section 2 describes the internals and limitations of existing SA techniques to be adopted in a recommender system. Next, Section 3 explains the improvements we propose in order to diversify traditional content-based recommendations by a novel approach to SA techniques. Section 4 describes a sample scenario to illustrate how our reasoning-driven strategy works. Section 5 summarizes some preliminary testing experiences and discusses scalability-related concerns. Finally, Section 6 concludes the paper and outlines possible lines of future work.
2 Spreading Activation Techniques
SA techniques are computational mechanisms able to efficiently explore huge generic networks of nodes interconnected by links. According to the guidelines established in [5], these techniques work as follows.
• Each node is associated with a weight (named activation level), so that the more relevant the node in the network, the higher its activation level. Besides, each link joining two nodes has a weight, in such a way that the stronger the relationship between both nodes, the higher the assigned weight. Initially, a set of nodes are selected and the nodes connected with them by links (named neighbor nodes) are activated. In this process, the activation levels of the initially selected nodes are spread until reaching their neighbors in the network.
• The activation level of a reached node is computed by considering the levels of its neighbors and the weights assigned to the links that join them to each other. Consequently, the more relevant the neighbors of a given node (i.e. the higher their activation levels) and the stronger the relationship between the node and its neighbors (i.e. the higher the weights of the links between them), the more relevant the node will be in the network. A compact formulation of this update is sketched at the end of this section.
• This spreading process is repeated successively until reaching all the nodes of the network. Finally, the highest activation levels correspond to the nodes that are most closely related to those initially selected.
We have identified two severe drawbacks that prevent us from exploiting the inferential capabilities of traditional SA techniques in our reasoning-driven recommendation strategy. These drawbacks lie within (i) the kind of links modeled in the considered network, and (ii) the weighting processes of those links. On the one hand, the kind of the modeled links is closely related to the richness of the reasoning processes carried out during the spreading process. These links establish paths to propagate the relevance of the initially activated nodes to other nodes closely related to them. This way, some nodes might never be detected if there are no links reaching them in the network. Existing SA techniques model very simple relationships, which lead to poor inferences and prevent the discovery of the knowledge hidden behind more complex associations (see examples in [9, 10, 6, 11]). In other words, if the links model only simple relationships (like those detected by a syntax-driven approach), the recommendations resulting from SA techniques would continue being overspecialized. The second limitation of traditional SA approaches is related to the weighting processes of the links modeled in the network. According to the guidelines described at the beginning of this section, these weights remain invariable over time, because their values depend either on the existence of a relationship between the two linked nodes or on the strength of this relationship. This static weighting process is not appropriate for our personalization process, where it is necessary that the weights assigned to the links of the user's network enable the system to: (i) learn automatically his/her preferences from the feedback provided after recommendations, and (ii) adapt dynamically the spread-based inference process as these preferences evolve. In the next section, we will explain how our reasoning-driven approach fights the above limitations by extending traditional SA techniques so that they can be adopted in a void-of-overspecialization content-based recommender system.
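In symbols, the basic update performed in each step of the spreading process described in the bullet points above can be written as follows (this is a standard formulation of spreading activation in the spirit of [5]; the paper itself does not give an explicit expression):

$$A(n_j) \;=\; \sum_{n_i \in \mathcal{N}(n_j)} A(n_i)\, w_{ij}$$

where $A(\cdot)$ denotes the activation level, $\mathcal{N}(n_j)$ is the set of neighbor nodes of $n_j$, and $w_{ij}$ is the weight of the link joining $n_i$ and $n_j$; the update is repeated until all the nodes of the network have been reached.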
3 Our Content-Based Recommendation Strategy
First, it is necessary to delimit the knowledge network to be processed by our improved SA techniques. Starting from the system ontology, our strategy creates a network for each user by including both his/her preferences and other concepts (nodes) strongly related to those interests. In order to identify those concepts, we firstly locate in the domain ontology the items rated in the user's profile. Next, we traverse successively the properties bound to these items until reaching new class instances in the ontology, referring to other items and their attributes. In order to guarantee computational feasibility, we have developed a controlled inference mechanism that progressively filters the instances of classes and properties that do not provide useful knowledge for the personalization process. Specifically, as new nodes are reached from a given instance, we firstly quantify their relevance for the user by an index named semantic intensity. In order to measure the semantic intensity of a node n, we take into account various ontology-dependent pre-filtering criteria (detailed in [3]), so that the more significant the relationship between a given node and the user's preferences, the higher the resulting value. Next, the nodes whose intensity indexes are not greater than a specific threshold are disregarded, so that our inference mechanism continues traversing only the properties that permit reaching new nodes from those that are relevant for the user. The following step in our strategy consists of processing the user network by SA techniques. In this regard, we have extended the existing approaches by overcoming the limitations pointed out in Section 2. On the one hand, our approach extends the simple relationships adopted by traditional SA techniques by considering both the properties defined in the ontology and a set of semantic associations (which will be categorized in Section 3.1) inferred from them. This rich variety of relationships permits establishing links that propagate the relevance of the items selected by the pre-filtering phase, leading to diverse, enhanced recommendations. On the other hand, to fulfill the personalization requirements of a recommender system, our link weighting process depends not only on the two nodes joined by the considered link, but also on (the strength of) their relationship to the items defined in the user profile, as we will describe in Section 3.2. This way, the links of the network created for the user are updated as our strategy learns new knowledge about his/her preferences, leading to tailor-made recommendations after the spreading process.
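As an illustration of the controlled expansion described above, the following Python sketch builds a user's network from the items rated in the profile, pruning nodes whose semantic intensity does not exceed a threshold. The data structures, the intensity() placeholder and the threshold value are assumptions made for the example; the actual pre-filtering criteria are those detailed in [3].

```python
def build_user_network(ontology, profile, intensity, threshold=0.3):
    """Expand the user's network from the items rated in his/her profile.

    ontology:  dict mapping a class instance to an iterable of
               (property, neighbor instance) pairs
    profile:   dict mapping rated items to the user's ratings
    intensity: callable(node, profile) -> semantic intensity, a placeholder
               for the ontology-dependent pre-filtering criteria of [3]
    """
    nodes, links = set(profile), set()
    frontier = list(profile)
    while frontier:
        current = frontier.pop()
        for prop, neighbor in ontology.get(current, []):
            if neighbor in nodes:
                links.add((current, prop, neighbor))
                continue
            # nodes whose semantic intensity does not exceed the threshold are
            # disregarded, so only properties reaching relevant nodes keep
            # being traversed
            if intensity(neighbor, profile) > threshold:
                nodes.add(neighbor)
                links.add((current, prop, neighbor))
                frontier.append(neighbor)
    return nodes, links
```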
3.1 Semantic Associations
Once the nodes related to the user's interests (and the properties linking them to each other) have been selected, our strategy infers semantic associations between the instances referring to items. Specifically, we have borrowed from [2] the following semantic associations:
• ρ-path association. In our approach, two items are ρ-pathAssociated when they are linked by a chain or sequence of properties in the ontology (e.g. in Fig. 1, it
is possible to trace a sequence between the movies Chocolat and Paris, je t'aime through the instance referring to their starring actress Juliette Binoche).
• ρ-join association. Two items are ρ-joinAssociated when their respective attributes belong to a common class in the domain ontology (e.g. the tourism documentary about Toulouse depicted in Fig. 1 and the movie Paris, je t'aime would be associated through the common class France cities).
• ρ-cp association. Two items are ρ-cpAssociated when they share a common ancestor in some hierarchy defined in the ontology (e.g., all the movies depicted at the top of Fig. 1 are associated by the ancestor Fiction Contents).
Our strategy harnesses the knowledge learned from the semantic associations in order to draw new links in the user's network, which improves the reasoning capabilities of traditional SA techniques. Specifically, we incorporate a new link for each semantic association discovered between the items defined in the user's network. We call real links those referring to property instances formalized in the ontology, and virtual links those corresponding to the semantic associations inferred from it.
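The two class-based associations can be tested directly once each item's attributes and the class hierarchy of the ontology are available; the ρ-path test amounts to a path search over property chains and is omitted here. The sketch below is only illustrative: the data structures (class_of, parents) are hypothetical names, not part of the paper.

```python
def rho_join_associated(attributes_a, attributes_b, class_of):
    """Two items are rho-joinAssociated when some of their attributes belong
    to a common class (e.g. Toulouse and Paris both belong to France cities)."""
    classes_a = {class_of[a] for a in attributes_a if a in class_of}
    classes_b = {class_of[b] for b in attributes_b if b in class_of}
    return bool(classes_a & classes_b)


def rho_cp_associated(item_a, item_b, parents):
    """Two items are rho-cpAssociated when they share a common ancestor in a
    hierarchy of the ontology (e.g. movies descending from Fiction Contents)."""
    def ancestors(item):
        seen, stack = set(), list(parents.get(item, []))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents.get(p, []))
        return seen
    return bool(ancestors(item_a) & ancestors(item_b))
```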
3.2 Weighting of Links in the User's Network
Before selecting our content-based recommendations by the spreading activation process, it is necessary to weigh the links modeled in the user's network. Instead of considering that the weight of a link between two nodes depends only on the strength of their mutual relationship, our approach imposes two constraints on the links to be weighed.
• First, given two nodes joined by a link, we consider that the stronger the (semantic) relationship between the two linked nodes and the user's preferences, the higher the weight of the link. To measure how relevant a node is for a user, we consider either the rating of this node in his/her profile (if the node is known by the user) or the value of the semantic intensity of this node (otherwise).
• Second, the weights are dynamically adjusted as the user's preferences evolve over time, thus offering permanently updated content-based recommendations.
Besides, the weights assigned to the virtual links are lower than those set for the real links. The intuition behind this idea is that the relationship existing between two nodes joined by a real link is explicitly represented in the system ontology by means of properties, while the relationship between two nodes joined by a virtual link has been inferred by a reasoning-driven prediction process.
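A minimal sketch of a weighting function that respects the two constraints above: the weight grows with the relevance of the two linked nodes for the user (ratings for nodes known by the user, semantic intensities otherwise), and virtual links are penalized with respect to real ones. The averaging and the 0.5 penalty factor are illustrative assumptions; the paper does not give the exact expression.

```python
def link_weight(node_a, node_b, profile, intensities, is_virtual,
                virtual_penalty=0.5):
    """Weight of the link joining node_a and node_b in the user's network."""
    def relevance(node):
        # rating if the node is known by the user, semantic intensity otherwise
        return profile[node] if node in profile else intensities.get(node, 0.0)

    weight = 0.5 * (relevance(node_a) + relevance(node_b))
    if is_virtual:
        # inferred (virtual) relationships are trusted less than explicit
        # ontology properties, so their links receive lower weights
        weight *= virtual_penalty
    return weight
```

Because the weights depend on the profile, re-running this function after each feedback cycle keeps the network aligned with the user's current preferences, as required by the second constraint.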
3.3 Selection of Recommendations: Our Spreading Activation Process For the selection of the items finally recommended to the user, we use an improved spreading activation mechanism. Firstly, we activate in the user’s network the nodes
referring to the items defined in his/her profile, and assign them an initial activation level equal to their respective ratings. Next, the activation levels of the user's preferences are propagated through his/her network by using an iterative algorithm, which activates all the nodes in parallel in each iteration. Specifically, the algorithm computes the activation level of each node in the user's SA network by adding the contribution from all of its neighbor nodes. This contribution considers both the activation level of each neighbor node and the weight of the link (real or virtual) joining it to the considered node. For that reason, the more relevant the neighbors of a node (i.e. higher activation levels) and the stronger the relationships between them and the considered node (i.e. higher weights of links), the more significant this node will be for the user. According to the guidelines of traditional SA techniques (see Section 2), once the spreading process has reached all the nodes in the user's network, the highest activation levels correspond to items meeting two conditions: (i) their neighbor nodes are also relevant for the user and (ii) they are closely related to the user's interests. For that reason, these nodes identify the items finally suggested by our content-based strategy.
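Putting the pieces together, a compact sketch of the selection step could look as follows: the profile ratings seed the activation levels, the levels are propagated iteratively through the weighted user network, and the unseen items with the highest final levels become the recommendations. The number of iterations and the data structures are assumptions made for illustration, not the paper's exact algorithm.

```python
def recommend(neighbors, weights, profile, item_nodes, iterations=5, top_n=5):
    """Select content-based recommendations for one user (illustrative sketch).

    neighbors:  dict node -> list of adjacent nodes in the user's network
    weights:    dict (node_a, node_b) -> weight of the (real or virtual) link
    profile:    dict item -> rating given by the user (negative = disliked)
    item_nodes: set of nodes that represent recommendable items
    """
    # nodes referring to the profile items are activated with their ratings
    activation = dict(profile)
    for _ in range(iterations):
        incoming = {}
        for node, level in activation.items():
            for nb in neighbors.get(node, []):
                w = weights.get((node, nb), 0.0)
                incoming[nb] = incoming.get(nb, 0.0) + level * w
        for node, contrib in incoming.items():
            activation[node] = activation.get(node, 0.0) + contrib
    # unseen items with the highest activation levels are suggested; items
    # linked to negatively rated contents accumulate negative contributions
    candidates = [(level, node) for node, level in activation.items()
                  if node in item_nodes and node not in profile]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in candidates[:top_n]]
```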
4 Example of Reasoning-Driven Content-Based Recommendations This section presents a sample scenario in the scope of Digital TV, where we recommend programs to a user U who has enjoyed the comedy romance Chocolat starring Juliette Binoche, the documentary The Falklands crisis: the untold story, and the program Toulouse in a nutshell about the main tourist attractions of this French city. First, our strategy selects in the TV ontology depicted in Fig. 1 the instances that are relevant for U by considering his/her personal preferences. After inferring semantic associations among them, we create the user network depicted in Fig. 2.
Fig. 2 Network used by SA techniques to select content-based recommendations for U
As represented in this network, we have selected class instances that share common ancestors with U's preferences:
• First, the nodes Paris, je t'aime and The English patient are included in U's network because they are Comedy and Romance movies (respectively), just like the film Chocolat the user has appreciated. According to what we explained in Section 3.1, both ancestors lead us to infer the following associations between U's preferences and the nodes of his/her network: ρ-cpAssociated (Chocolat, Paris, je t'aime) and ρ-cpAssociated (Chocolat, The English patient).
• Second, the nodes World War II (which is the topic of the movie The English patient) and Paris (which is the city where the movie Paris, je t'aime is set) are relevant for U because this user has liked other instances belonging to their classes in the ontology (specifically, Falklands war belonging to the War conflicts class, and Toulouse belonging to France cities). These common classes permit discovering the following associations: ρ-joinAssociated (The Falklands crisis: an untold story, The English patient) and ρ-joinAssociated (Toulouse in a nutshell, Paris je t'aime).
Once the links of U's network have been weighted, we process the represented knowledge by our SA techniques in order to select content-based recommendations. After spreading the activation levels of U's preferences until reaching all the nodes in his/her network, our strategy suggests the TV programs with the highest levels. As per Section 3.3, these programs receive links from other contents which are appealing to the user. This way, our content-based approach suggests to U the movies The English patient and Paris, je t'aime by exploiting the associations inferred between these contents and U's preferences.
• The English patient: The activation level of this movie gets higher thanks to the links from two programs relevant for U: the documentary The Falklands crisis: an untold story and the film Chocolat starring Juliette Binoche. The war topic turns the documentary into a program appealing to the user, whereas Chocolat is relevant because it involves his/her favorite actress.
• Paris, je t'aime: As shown in Fig. 2, Chocolat and Toulouse in a nutshell inject positive weights into the Paris, je t'aime node. Both programs are especially relevant for U, thus increasing the activation level of Paris, je t'aime, a movie the user may appreciate due to two reasons: (i) his/her favorite actress takes part in it, and (ii) the movie is set in a city of France, a country that seems to be interesting for U in view of the documentary about customs and tourist attractions U has liked.
To conclude, consider the following situation: a program in U's network receives links from nodes referring to contents the user has rated negatively. Here the weights of the links would be very low, which contributes to decreasing the activation level of the program after the spreading process. This reveals an important benefit of our strategy, which is also able to identify contents that must not be recommended to the user because they are (semantically) associated to programs U did not like in the past.
5 Experimental Evaluation
To validate our proposal we have developed a prototype TV recommender system that works with an ontology containing about 50,000 nodes referring to specific TV programs and their semantic attributes. The knowledge formalized in our ontology was queried by an OWL-specific API (Application Programming Interface) provided by Protégé (http://protege.stanford.edu/), a free open-source tool that includes mechanisms to create, view and query the classes, properties and specific instances formalized in OWL (Web Ontology Language, http://www.w3.org/TR/owl-features/) ontologies. Besides, we have developed a tool (named Reasoning Inspector) that permits understanding the kind of semantic associations that lead to our diverse content-based recommendations. In order to implement this tool, we have used an ontology-viewing plugin provided by Protégé (TGVizTab, http://users.ecs.soton.ac.uk/ha/TGVizTab/) to create and browse interactively generic graphs.
5.1 Preliminary Testing Experiences
Our tests involved 150 users (recruited from among our undergraduate students, their relatives and friends), who provided us with both their initial preferences and their relevance feedback about our reasoning-based recommendations. For those purposes, the users accessed a Web form where a list of 200 TV programs was shown, classified into a hierarchy of genres to facilitate browsing tasks. The users identified the contents they liked and disliked by assigning specific ratings to each TV program (the programs were shown with a brief synopsis, so that the users could rate even programs they did not know). The information about the users' preferences was processed by our validation tool, which was in charge of modeling the users' profiles and running our reasoning-based strategy. The list of suggested TV programs was e-mailed, and feedback about these recommendations was requested from each viewer. Next, our validation tool updated the users' profiles according to their relevance feedback, and the content-based strategy was executed again to check that the offered recommendations adapted as the users' preferences evolved over time. The processes of sending recommendations and acquiring relevance feedback were repeated for one week, and conveniently monitored by our Reasoning Inspector. After the 7-day testing period, a questionnaire was e-mailed to each user, asking about his/her perception of our personalization services. Most users (78%) rated the diversity of our reasoning-based recommendations as very positive or positive, whereas only 12% of them remained indifferent towards the received suggestions. Nearly all the users noticed the diverse nature of our recommendations. In fact, many users (about 76%) told us that they did not know some of the suggested TV programs; however, they admitted that the way of relating the programs to their personal preferences was really “ingenious”,
“peculiar but appropriate” and even “intelligent”. Lastly, from the questionnaires, we also discovered that most of the users (84%) would be willing to pay a small fee for receiving our recommendations, which evidences the interest of our content-based approach.
5.2 Computational Viability and Scalability-Related Concerns
We have defined some optimization features aimed at ensuring the scalability and computational feasibility of our reasoning-driven content-based strategy. Firstly, due to the iterative nature of the algorithm used in our spreading process, our strategy can return suboptimal solutions to guarantee fast responses to the users, practically in real time. Besides, our implementation works with a master server that shares out the computational burden among a set of slave personalization servers, which return content-based recommendations by running our strategy based on SA techniques and accessing a database that stores the system ontology and the users' profiles. Our third feature consists of distributing the tasks involved in our strategy among several servers: the ontology server updates the items in the ontology and computes off-line parameters that can be reused as new users log into the recommender system, while the profiles server updates the users' profiles by adding new preferences and ratings for items. Finally, we maintain multiple instances of the profiles servers and ontology servers. Besides, in order to avoid bottlenecks when accessing the ontology and users' profiles, each instance of these servers works with a replica of the system database.
6 Conclusions and Further Work
This paper fights the overspecialized nature of traditional content-based recommendations, which include only items very similar to those the user already knows (mainly due to the adoption of syntactic matching techniques). The novelty is that we overcome this limitation without considering the preferences of other individuals, which was the solution proposed so far in the literature, at the expense of causing other severe drawbacks. Our recommendation strategy harnesses the benefits of semantic reasoning over an underlying ontology as a means to discover additional knowledge about the user's preferences, making it possible to compare them to the available items in a more effective way. This way, instead of suggesting items very similar to those the user liked in the past, our strategy recommends items semantically related to his/her preferences. For that purpose, we have extended existing semantic reasoning mechanisms, so that they can be adopted in a personalization scenario where the focus is put on the user's preferences. Specifically, we have described how semantic associations and SA techniques fit together in our content-based recommendation strategy: the associations help to diversify the recommendations because they discover hidden relationships between the user's preferences and the available items, while our improved SA techniques make it possible (i) to process efficiently the knowledge learned from
those associations, and (ii) to evolve the recommendations as the user's preferences change. Our contribution is generic and can be reused in multiple contexts, becoming an easy-to-adopt starting point to implement diverse personalization services. As future work, we plan to carry out a quantitative evaluation driven by accuracy metrics such as MSE, Hit Rate, recall and precision. Besides assessing our reasoning-driven personalization capabilities, we will exploit the data gathered from a greater number of users in order to compare our approach against existing collaborative and hybrid works in terms of performance and personalization quality.
References
1. Adomavicius, G., Tuzhilin, A.: Towards the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 739–749 (2005)
2. Anyanwu, K., Sheth, A.: ρ-Queries: enabling querying for semantic associations on the Semantic Web. In: Proceedings of the 12th International World Wide Web Conference (WWW 2003), pp. 115–125 (2003)
3. Blanco-Fernández, Y., Pazos-Arias, J.J., Gil-Solla, A., Ramos-Cabrer, M., López-Nores, M.: A flexible semantic inference methodology to reason about user preferences in knowledge-based recommender systems. Knowledge-Based Systems 21(4), 305–320 (2008)
4. Cornelis, C., Lu, J., Guo, X., Zhang, G.: One-and-only item recommendation with fuzzy logic techniques. Information Sciences 177(1), 4906–4921 (2007)
5. Crestani, F.: Application of Spreading Activation techniques in information retrieval. Artificial Intelligence Review 11(6), 453–482 (1997)
6. Huang, Z., Chen, H., Zeng, D.: Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems 22(1), 116–142 (2004)
7. Liu, D., Lai, C., Lee, W.: A hybrid of sequential rules and collaborative filtering for product recommendation. Information Sciences 179(20), 3505–3519 (2009)
8. Pazzani, M., Billsus, D.: Content-based recommendation systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web. LNCS, vol. 4321, pp. 325–341. Springer, Heidelberg (2007)
9. Rocha, C., Schwabe, D., Poggi, M.: A hybrid approach for searching in the Semantic Web. In: Proceedings of the 13th International World Wide Web Conference (WWW 2004), pp. 74–84 (2004)
10. Stojanovic, N., Studer, R., Stojanovic, L.: An approach for ranking of query results in the Semantic Web. In: Fensel, D., Sycara, K., Mylopoulos, J. (eds.) ISWC 2003. LNCS, vol. 2870, pp. 500–516. Springer, Heidelberg (2003)
11. Troussov, A., Sogrin, M., Judge, J., Botvich, D.: Mining socio-semantic networks using spreading activation techniques. In: Proceedings of the 8th International Conference on Knowledge Management and Knowledge Technologies, pp. 8–16 (2008)
A Framework for Automatic Detection of Abandoned Luggage in Airport Terminal
Grzegorz Szwoch, Piotr Dalka, and Andrzej Czyżewski
Gdansk University of Technology, Multimedia Systems Department, Narutowicza 11/12, 80-233 Gdansk, Poland
e-mail: [email protected]
Abstract. A framework for automatic detection of events in a video stream transmitted from a monitoring system is presented. The framework is based on the widely used background subtraction and object tracking algorithms. The authors elaborated an algorithm for detection of left and removed objects based on morphological processing and edge detection. The event detection algorithm collects and analyzes data of all the moving objects in order to detect events defined by rules. A system was installed at the airport for detecting abandoned luggage. The results of the tests indicate that the system generally works as expected, but the low-level modules currently limit the system performance in some problematic conditions. The proposed solution may supplement the existing monitoring systems in order to improve the detection of security threats.
1 Introduction
Automatic event detection in surveillance systems is becoming a necessity. The enormous number of video cameras used in facilities such as shopping malls, public transport stations and airports makes it impossible for human operators to watch and analyze all video streams in real time. Such systems are typically used only as a forensic tool rather than a preventive or interceptive tool. Therefore, it is easy to miss harmful activities like theft, robbery, vandalism, fighting or luggage abandonment, as well as frequent events that may be dangerous, like unauthorized presence in restricted areas. In the last few years many publications regarding automatic video surveillance systems have been presented. These systems are usually focused on a single type of human activity. Events regarding human behavior may be divided into three main groups. The first group contains activities that do not involve other persons or objects, such as loitering [1] or sudden human pose changes like going
from standing to lying down that might indicate a pedestrian collapse [2]. The second group includes neutral human interactions like walking together, approaching, ignoring, meeting, splitting [3] and violent ones, such as fist fighting, kicking or hitting with objects [4]. The last group contains human activities that are related to the environment. This includes intrusion or trespassing [5], wrong direction of movement [6], vandalism [7] and luggage abandonment. Determining object stationarity and finding an object left behind (e.g. a backpack or briefcase) is a critical task that contributes to the safety and security of all public transport passengers or shopping mall customers. Abandoned luggage detection was the main topic of the PETS (Performance Evaluation of Tracking and Surveillance) Workshop in 2006. The majority of papers presented there employ background subtraction to detect foreground objects that are classified as newly appeared stationary objects using simple heuristics [8] or a Bayesian inference framework [9]. Other methods regarding this topic may be found in the literature. Spagnolo et al. classify objects as abandoned or removed by matching the boundaries of static foreground regions [10]. Another solution divides a video frame into blocks that are classified as background and foreground; a non-moving foreground block is assumed to be stationary [5]. This method is robust against frequent dynamic occlusions caused by moving people. This paper presents a framework for the detection of a wide range of events in video monitoring systems, using rules defined by the system operator. The framework is based on the widely used background subtraction and object tracking algorithms and adds procedures developed especially for the presented system, as described in Section 2. An example application of the system at the airport for detection of abandoned luggage is briefly presented in Section 3, and the paper ends with conclusions and an indication of areas for future development.
2 Framework Design and Implementation
2.1 System Overview
The framework for automatic event detection is composed of several modules, as depicted in Fig. 1. The low-level algorithms extract information on moving objects from camera images. The detailed explanation of these algorithms lies beyond the scope of this paper, hence only the general idea is described here. Moving objects are detected in the camera frames using a background modeling method based on a Gaussian Mixture Model [11] (five distributions were used for each pixel). The results of background modeling are processed by detecting and removing shadow pixels (based on the color and luminance of the pixels) and by performing morphological operations on the detected objects in order to remove small areas and to fill holes inside the objects. Next, movements of the detected objects (blobs) are tracked in successive image frames using a method based on Kalman filters [12]. The state of each tracked object (tracker) in each frame is described by an eight-element vector describing its position, velocity and change in the position and velocity. The state of each Kalman filter is updated for each image frame, so the movement of each
Fig. 1 General block diagram of the proposed framework for the automatic event detection
object is tracked continuously. The next blocks of the analysis system are designed for the purpose of the presented framework. The task of the classification module is to assign detected moving objects to classes (human, luggage, etc.). An event detection module analyses the current and the past states of objects and evaluates rules in order to check if any of the defined events occurred. The results of event detection are sent to the framework output for further processing (visualization, logging, camera steering). The remaining part of this paper discusses relevant modules of the framework in detail.
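For readers who want to experiment with the low-level chain described above (background subtraction with a Gaussian mixture model, shadow removal and morphological clean-up), a rough OpenCV-based sketch is given below. It is not the authors' implementation: the MOG2 subtractor (which, like the paper's model, defaults to five Gaussians per pixel), the thresholds and the kernel sizes are illustrative choices, and the Kalman-filter tracking stage (for which OpenCV's cv2.KalmanFilter could serve) is omitted.

```python
import cv2
import numpy as np

def extract_blobs(frame, subtractor, min_area=200):
    """Foreground blob extraction: GMM background model, shadow removal,
    morphological clean-up and contour detection (illustrative parameters)."""
    mask = subtractor.apply(frame)
    mask[mask == 127] = 0                       # MOG2 marks shadow pixels as 127
    mask = cv2.medianBlur(mask, 5)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small areas
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill holes in blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]

# example usage on a video stream
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
# cap = cv2.VideoCapture("camera_stream.avi")
# ok, frame = cap.read()
# blobs = extract_blobs(frame, subtractor)
```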
2.2 Resolving the Splitting Trackers The main problem that had to be resolved in the object tracking procedure implemented in the discussed framework was handling of ’splitting objects’, e.g. if a person leaves their luggage and walks away. In this situation, the tracker that was assigned to a person carrying a luggage has to track further movements of the same person and a new tracker has to be created for the luggage. A following approach is proposed for this task. First, groups of matching trackers and blobs are formed. Each group contains all the blobs that match at least one tracker in the group and all the trackers that match at least one blob in the group. The match is defined as at least one pixel common to the bounding boxes of the blob and the tracker. Within a single group, blobs matching each tracker are divided into subgroups (in some cases a subgroup may contain only one blob) separated by a distance larger than the threshold value. If all the blobs matching a given tracker form a single subgroup, the state of the tracker is updated with the whole blob group. If there is more than one subgroup of blobs matching the tracker, it is necessary to select one subgroup and assign the tracker to it. In order to find the sub-group that matches the tracker, two types of object descriptors are used - color and texture. Color descriptors are calculated using a two-dimensional chrominance histogram of the image representing the object. The texture descriptors are calculated from the same image, using a gray level co-occurrence matrix [13]. Five texture descriptors are used (contrast, energy, mean, variance and correlation) for three color channels and four directions of pixel adjacency, resulting in a vector of 60 parameters describing single object’s appearance. In order to find a subgroup of blobs matching the tracker, a vector of
16
texture descriptors for the tracker, DT, is compared with a corresponding vector DB calculated for the subgroup of blobs, using the formula:

ST = 1 − (1/N) · Σ_{i=1..N} |DTi − DBi| / max(DTi, DBi)    (1)

where N is the number of elements in both vectors. The final similarity measure is a weighted sum of the texture similarity ST and the color histogram similarity SC (equal to a correlation coefficient of the histograms of a tracker and a group of blobs):

S = WT · ST + WC · SC    (2)

where the values of the weighting coefficients (WT = 0.75, WC = 0.25) were found empirically. The subgroup of blobs with the largest S value is used to update the state of the tracker. After each tracker in the group is processed, the remaining (unassigned) blobs are used to construct new trackers. As a result, in the case of a person leaving luggage, the existing tracker follows the person and a new tracker is created for the left luggage.
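A direct transcription of Eqs. (1)-(2) could look as follows; the small guard against division by zero is an addition not present in the paper, and OpenCV's histogram correlation is used for the SC term.

```python
import cv2
import numpy as np

W_T, W_C = 0.75, 0.25        # weighting coefficients reported in the paper

def texture_similarity(d_tracker, d_blobs):
    """Eq. (1) for two 60-element texture descriptor vectors."""
    d_t = np.asarray(d_tracker, dtype=float)
    d_b = np.asarray(d_blobs, dtype=float)
    denom = np.maximum(d_t, d_b)
    denom[np.abs(denom) < 1e-9] = 1e-9     # guard added here, not part of Eq. (1)
    return 1.0 - np.mean(np.abs(d_t - d_b) / denom)

def color_similarity(hist_tracker, hist_blobs):
    """Correlation of the 2-D chrominance histograms (the S_C term)."""
    return cv2.compareHist(np.float32(hist_tracker), np.float32(hist_blobs),
                           cv2.HISTCMP_CORREL)

def combined_similarity(d_tracker, d_blobs, hist_tracker, hist_blobs):
    """Eq. (2): the subgroup of blobs with the largest S updates the tracker."""
    s_t = texture_similarity(d_tracker, d_blobs)
    s_c = color_similarity(hist_tracker, hist_blobs)
    return W_T * s_t + W_C * s_c
```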
2.3 Detection of Left or Removed Objects
For the purpose of event detection, each tracker has to be assigned to a proper class (human, vehicle, luggage, etc.). In the test system intended to work at the airport, a simplified classification is used: each object is classified either as a human or as luggage, based on analysis of the object's velocity and its size and shape variability. It is assumed that luggage (as a separate object) remains stationary and does not change its dimensions significantly (some fluctuations in size and shape of objects are inevitable due to inaccuracies of the background subtraction procedure). In order to increase the accuracy of this simple classifier, a number of past states of the object are taken into account, together with its current state. The averaged parameters (speed, changes in size and shape) are compared with thresholds; if the typical values are exceeded, the object is classified as human, otherwise it is classified as luggage. In the further stages of system development, the classification procedure will be expanded so that more object classes will be defined. The main problem in this approach is that, due to the nature of the background subtraction algorithm, leaving an object in the scene produces the same effect as removing an object that was a part of the background (e.g. a luggage item that remained stationary for a prolonged time). In both cases a new tracker is created, containing either a left object or a 'hole' in the background (remaining after the object was taken). The system has to decide whether the detected object was left or taken. This is achieved by examining the content (image) of the newly created tracker. It is expected that the edges of a left object are located close to the blob's border, while no distinct edges are present in the case of a taken object (provided that the background is sufficiently smooth). The proposed procedure works as follows. First, the grayscale image B of the object (blob) and its mask M (having non-zero values for pixels belonging to the blob
and zero values otherwise) are processed by the Canny detector in order to find the edges. The result of edge detection in the mask (E_M) is processed by morphological dilation in order to increase the detection margin:

E_Md = E_M ⊕ SE        (3)

where SE is a 7 × 7 structuring element. Next, the result of the dilation is combined with E_B, the result of edge detection in the image B:

R = E_Md ∩ E_B        (4)

and the resulting image is dilated using the same structuring element:

R_d = R ⊕ SE        (5)

Finally, a measure of the difference between the object and the background is calculated as:

D = N_R / N_M        (6)

where

N_R = CNZ(R_d),  N_M = CNZ(E_Md)        (7)
and CNZ() is a function that counts the number of non-zero pixels in a grayscale image. If the blob represents a left object, D is expected to be significantly larger than for a removed object. Therefore, the analyzed object is classified as a left one if D > Tcl, or as a taken object otherwise, where Tcl is a detection threshold for classification. A proper selection of the threshold allows for accurate detection of taken and left objects regardless of the background (which may also contain edges) and of errors in background subtraction. Fig. 2 presents an example of the procedure described above, for the left and removed object cases (Tcl = 0.6). It may be seen that the proposed procedure allows for proper distinction of taken and left objects based on the value of the D measure.
Fig. 2 Example of detection of left and taken objects using the procedure described in the text.
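The left/taken decision of Eqs. (3)-(7) can be sketched with OpenCV roughly as follows. The Canny thresholds and the 8-bit input assumption are illustrative choices; the 7 × 7 structuring element and the threshold Tcl = 0.6 follow the values quoted in the text.

```python
# Sketch of the left/removed decision (Eqs. 3-7) for a newly created tracker.
import cv2
import numpy as np

SE = np.ones((7, 7), np.uint8)   # 7x7 structuring element
T_CL = 0.6                       # classification threshold from the paper

def is_left_object(blob_gray, blob_mask, canny_lo=50, canny_hi=150):
    """Return (decision, D): True means 'left object', False means 'taken'."""
    mask8 = (blob_mask > 0).astype(np.uint8) * 255   # binary mask as 0/255 image
    e_m = cv2.Canny(mask8, canny_lo, canny_hi)       # edges of the mask
    e_b = cv2.Canny(blob_gray, canny_lo, canny_hi)   # edges of the blob image B
    e_md = cv2.dilate(e_m, SE)                       # Eq. 3
    r = cv2.bitwise_and(e_md, e_b)                   # Eq. 4
    r_d = cv2.dilate(r, SE)                          # Eq. 5
    n_r = cv2.countNonZero(r_d)                      # Eq. 7
    n_m = cv2.countNonZero(e_md)
    d = n_r / n_m if n_m else 0.0                    # Eq. 6
    return d > T_CL, d
```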
2.4 Event Detection The event detection module utilizes data acquired from the previous modules in order to detect events defined by rules. Detected events may refer to simple cases (an object entering or leaving an area, crossing a barrier, stopping, moving, etc.) as well as to more complex situations (abandoned luggage, theft and others). In this section of the paper, detection of the abandoned luggage scenario is used as an example. The event detector stores a history of each tracker's states (in the experiments, the last five states were used). The object state contains all the parameters needed for event detection (position of an object, its velocity and direction of movement, type, values of descriptors, etc.). Event rules are tested against all past states. If a rule is fulfilled for a defined number of states (e.g. three of the total five), an event is detected. This approach allows for time-filtering of instantaneous events, reducing the number of false-positive decisions. The framework allows a user to add new rules; each rule is analyzed in parallel based on the same tracker data. An example rule for detection of abandoned luggage may be formulated in plain English as follows: if an object of type 'human' leaves an object of type 'luggage', moves away from the luggage to a distance d and does not approach the luggage for a time period t, an 'abandoned luggage' event is detected. An implementation of this rule in the event detector is presented in Fig. 3. Only objects classified as left luggage are processed by the rule. For each frame, the distance d between the luggage and its 'parent' (the person who left the luggage) is calculated. If d exceeds the threshold Td (or the parent has left the screen) for a time period Tt, an alarm is sent to the system output. If the person goes back to the luggage (d < Td) before the time Tt passes, the counter t is reset. The rule may be extended with additional conditions, e.g. the event may be detected only if the person and/or the luggage are detected in a defined area or if the person crosses a defined barrier. The main idea of the event detection module presented here remains valid for more complex scenarios. Other rules (stolen object, loitering, etc.) may operate in a similar manner.
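A condensed sketch of the abandoned-luggage rule of Fig. 3 is given below. The tracker objects, their attributes and the frame rate are illustrative assumptions; the threshold values correspond to the d = 3 m and t = 15 s parameters used in the tests reported in Section 3.

```python
# Minimal sketch of the abandoned-luggage rule: alarm when the
# luggage-parent distance stays above T_D for longer than T_T seconds.
import math

T_D = 3.0    # distance threshold [m] (value used in the tests)
T_T = 15.0   # time threshold [s] (value used in the tests)
FPS = 10     # assumed processing rate [frames/s]

def distance_m(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

class AbandonedLuggageRule:
    def __init__(self):
        self.counter = 0.0   # seconds the distance condition has held

    def update(self, luggage, parent):
        """Call once per frame for every tracker classified as left luggage.
        `luggage` and `parent` are hypothetical tracker objects exposing
        `.position` (in meters) and `.visible`."""
        parent_gone = parent is None or not parent.visible
        if parent_gone or distance_m(luggage.position, parent.position) > T_D:
            self.counter += 1.0 / FPS
        else:
            self.counter = 0.0          # the person came back to the luggage
        return self.counter >= T_T      # True -> raise 'abandoned luggage' alarm
```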
3 Test Results The framework for automatic event detection described in Section 2 was implemented in the C++ programming language, using the OpenCV library [14] for performing low-level image operations and for implementing the Kalman filters. The first version of the system for abandoned luggage detection is currently being tested at the Poznan-Lawica airport in Poland. A camera is mounted in the arrivals hall, at a height of 3.20 meters, overlooking the area in which most of the abandoned luggage cases were observed. The area visible to the camera is calibrated using Tsai's method [15] so that distances between objects can be analyzed in meters instead of pixels. The system runs on a PC with the Fedora Linux operating system, a 2.5 GHz quad-core processor and 4 GB of memory, and is able to process 10 video frames per second in real time (resolution 1600 × 900 pixels; the fps limit is imposed by the camera). A thorough evaluation of the accuracy of the proposed abandoned luggage detector requires
Fig. 3 Block diagram of the detector of abandoned luggage described in text.
ground-truth data, which has not been collected yet. Therefore, an extensive quantitative analysis of the system performance remains future work. Based on initial tests and visual evaluation of the results provided by the system, it may be stated that the detection accuracy is satisfactory in good and moderate conditions (stable light, moderately busy scene). An example of proper detection of abandoned luggage is presented in Fig. 4. The lighting conditions in this example are not optimal, so the results of background subtraction and object tracking are inaccurate to some degree. However, the algorithms described in this paper generally work as expected. The tracker representing the person walking away from the luggage (Fig. 4a and 4b) matches two blobs separated by a distance larger than the threshold (30 pixels). The similarity factors calculated for the tracker and both blobs (using Eq. 1 for the texture similarity ST, the correlation coefficient for the color histogram similarity SC and Eq. 2 for the total similarity) are: ST = 0.83, SC = 0.86, S = 0.84 for the tracker compared with the 'person blob' and ST = 0.51, SC = 0.99, S = 0.64 for the tracker compared with the 'luggage blob'. Although the result calculated for the color histograms is invalid in this case, the final similarity measure is correct and allows for proper assignment of the existing tracker to the person, while a new tracker is created for the left luggage (Fig. 4c). This new tracker is correctly classified as a left object, based on the edge measure calculated using Eq. 6: D = 0.71 (Tcl = 0.6).
Fig. 4 Example result of tests of the automatic detector of the abandoned luggage, performed at the Poznan-Lawica airport: (a) person with luggage, (b) person leaving luggage, (c) new tracker created for the abandoned luggage, (d) event rule matched - abandoned luggage detected.
After the distance between the person and the luggage exceeds the defined threshold (d = 3 m) for the defined time period (t = 15 s), the event detector detects that the luggage has been abandoned (Fig. 4d) and sends an alarm to its output. Conditions that may cause event misdetection by the system are related mainly to inaccuracy of the background subtraction module due to changes in lighting. In the presented example, sunlight falling through the glass walls of the hall causes reflections on the floor that disturbed the background subtraction procedure and resulted in the creation of numerous false trackers. Another problem is related to object tracking in crowded scenes, with a large number of moving objects overlapping each other for a prolonged time. In such situations, the number of erroneous decisions made by the tracking procedure increases significantly. As a result, the event detector is fed with invalid data and fails to provide the expected results. These kinds of problems will be addressed in future research.
4 Conclusions An extensible framework for rule-based event detection was proposed and its application in a real-life scenario at an airport was initiated. The tests performed so far showed that the algorithms developed for this system (detection of left and removed objects, event detection based on defined rules) allow for the detection of defined events, such as abandoned luggage, under normal conditions (stable light, moderately crowded scene). The results of the tests are promising; however, the accuracy of the system decreases in less favorable conditions. Therefore, in order to develop a fully working system, future research will focus on improving the accuracy of the background subtraction and object tracking algorithms in difficult conditions. The fully developed system for abandoned luggage detection may provide a helpful tool for improving the level of public safety in airport terminals and other public areas. Moreover, the framework is designed in a way that allows straightforward extension with other analysis modules, so it may be implemented in both existing and new security systems, helping to improve the level of safety in public facilities.
Acknowledgments Research is subsidized by the Polish Ministry of Science and Higher Education within Grant No. R00 O0005/3 and by the European Commission within FP7 project “INDECT” (Grant Agreement No. 218086). The authors wish to thank the staff of the Poznan-Lawica airport for making the tests of the system possible.
References 1. Bird, N.D., Masoud, O., Papanikolopoulos, N.P., Isaacs, A.: Detection of loitering individuals in public transportation areas. IEEE Trans. Intell. Transp. Syst. 6(2), 167–177 (2005) 2. Lee, M.W., Nevatia, R.: Body part detection for human pose estimation and tracking. Proc. Motion Video Comput., 23 (2007) 3. Blunsden, S., Andrade, E., Fisher, R.: Non parametric classification of human interaction. In: Proc. 3rd Iberian Conf. Pattern Recog. Image Anal., pp. 347–354 (2007) 4. Datta, A., Shah, M., Lobo, N.D.V.: Person-on-person violence detection in video data. In: Proc. of the 16th Int. Conf. on Pattern Recognition, vol. 1, pp. 433–438 (2002) 5. Black, J., Velastin, S.A., Boghossian, B.: A real time surveillance system for metropolitan railways. In: Proc. IEEE Conf. Adv. Video Signal Based Surveillance, pp. 189–194 (2005) 6. Kang, S., Abidi, B., Abidi, M.: Integration of color and shape for detecting and tracking security breaches in airports. In: Proc. 38th Annu. Int. Carnahan Conf. Security Technol., pp. 289–294 (2004) 7. Ghazal, M., Vazquez, C., Amer, A.: Real-time automatic detection of vandalism behavior in video sequences. In: Proc. IEEE Int. Conf. Syst., Man, Cybern., pp. 1056–1060 (2007)
8. Auvinet, E., Grossmann, E., Rougier, C., Dahmane, M., Meunier, J.: Left-Luggage Detection using Homographies and Simple Heuristics. In: Proc. IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, pp. 51–58 (2006) 9. Lv, F., Song, X., Wu, B., Singh, V.K., Nevatia, R.: Left Luggage Detection using Bayesian Inference. In: Proc. of IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, pp. 83–90 (2006) 10. Spagnolo, P., Caroppo, A., Leo, M., Martiriggiano, T., D’Orazio, T.: An abandoned/removed objects detection algorithm and its evaluation on PETS datasets. In: Proc. IEEE Int. Conf. Video Signal Based Surveillance, p. 17 (2006) 11. KadewTraKuPong, P., Bowden, R.: A real time adaptive visual surveillance system for tracking low-resolution colour targets in dynamically changing scenes. J. of Image and Vision Computing 21(10), 913–929 (2003) 12. Welch, G., Bishop, G.: An introduction to the Kalman filter. Technical report, TR-95041, Department of Computer Science, University of North Carolina (2004) 13. Hall-Beyer, M.: The GLCM Tutorial (2007), http://www.fp.ucalgary.ca/mhallbey/tutorial.htm 14. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision With the OpenCV Library. O’Reilly Media, Sebastopol (2008) 15. Tsai, R.: A Versatile Camera Calibration Technique For High Accuracy 3d Machine Vision Metrology Using Off-The-Shelf TV Cameras And Lenses. IEEE J. Robotics Automat. RA-3(4) (1987)
Modeling Student’s Knowledge on Programming Using Fuzzy Techniques Konstantina Chrysafiadi and Maria Virvou
Abstract. In this paper we describe the student modeling component of a web-based educational application that teaches the programming language Pascal using fuzzy logic techniques. To build a student model we have to diagnose the needs, misconceptions and cognitive abilities of each individual student. However, student diagnosis is fraught with uncertainty, and one possible approach to deal with this is fuzzy student modeling. Thus, we choose fuzzy logic techniques to describe the student's knowledge level and cognitive abilities. Furthermore, we use a mechanism of rules over the fuzzy sets, which is triggered after any change in the student's knowledge level of a domain concept and updates the student's knowledge level of all concepts related to that concept.
1 Introduction Over the last decade, the use of the Internet in the field of education has been continually growing. Web-based educational applications offer easy access and also are independent of platform. However, a web-based application is used by a very large audience which consists of learners with different characteristics and needs and the teacher is absent during the instruction process [1, 2], so it has to be characterized by high adaptivity. Adaptive web applications have the ability to deal with different users’ needs for enhancing usability and comprehension and for dealing with large repositories [3]. Adaptivity requires dynamic adaptation of the web application tailored to individual users’ needs. It has long been recognized that in order to build a good interactive computer system with a heterogeneous user community, it is necessary to have the development of individual user models [4]. To build a student model we have to diagnose the needs, misconceptions and cognitive abilities of each individual student. However, student diagnosis is fraught with uncertainty, and Konstantina Chrysafiadi and Maria Virvou Department of Informatics, University of Piraeus, 80 Karaoli & Dimitriou St., Piraeus 18534, Greece e-mail:
[email protected],
[email protected]
one possible approach to deal with this is fuzzy student modeling. Fuzzy sets theory was introduced by Zadeh [5], according to whom the main contribution of fuzzy logic is a methodology for computing with words, which cannot be done equally well with other methods [6]. Thus, fuzzy logic techniques can be used to improve the performance of a web-based educational application. Several researchers have found fuzzy student modeling adequate for carrying out the system's assessment and pedagogical functions [7, 8, 9, 10, 11, 12, 13]. In this paper we describe the student modeling component of a web-based educational application that teaches the programming language Pascal using fuzzy logic techniques. We have selected this knowledge domain because there is significant scientific interest in the effective tutoring of programming courses [14, 15, 16, 17]. Indeed, efforts to adopt fuzzy logic in intelligent tutoring systems for programming learning have been made [18]. The student model is mainly based on the student cognitive model. Therefore, we focus on which parts of the domain knowledge of programming the student knows and how well. For describing the student's cognitive state we use fuzzy techniques. The data from the fuzzy user model is the basis for the system adaptation.
2 The Domain Knowledge One of the most important components of an adaptive educational application is the representation of knowledge. To enable communication between system and learner at the content level, the domain model of the system has to be adequate with respect to inferences and relations of domain entities to the mental domain of a human expert [19]. Taking this into account, the domain knowledge of our application is organized into a hierarchical tree, with domain topics as intermediate nodes and learning objectives as leaf nodes [20]. The domain topics consist of course modules (Table 1). The full collection of course modules covers the whole programming domain to be taught. There are topics which concern declarations of variables and constants, expressions and operands, input and output expressions, the sequential execution of a program, the selection statement (if statement), the iteration statements (for loop, while loop and do-until loop), sorting and searching algorithms, arrays, functions and procedures. Each topic has its learning objectives, which are classified according to Bloom's taxonomy [21]. Learning objectives determine the concepts that must be understood and learned in each chapter. For example, the learning objectives for the topic of variables are:

• To learn the types of variables and understand their differences and how to use them
• To learn how to declare variables
• To learn how to select the appropriate type of variable each time
• To learn the rules about variables' names
• To learn to use variables saving memory

The hierarchy of the tree depicts the difficulty level of the domain topics and the order in which each topic has to be taught. The creation of the hierarchy is based on
Table 1 Course modules
COURSE MODULES

Module  Description                               Module  Description
C1      Variables & Constants                     C17     Finding max/min in a For Loop
C2      Assignment Statement                      C18     The While Loop
C3      Numeric Operators                         C19     Calculation of sum in a While Loop
C4      Mathematic Functions                      C20     Counting in a While Loop
C5      Comparative Operators                     C21     Calculation of average in a While Loop
C6      Logical Operators                         C22     Finding max/min in a While Loop
C7      Input-Output Statement                    C23     The doWhile Loop
C8      Sequential Execution of a Program         C24     Arrays: One dimension
C9      The if statement                          C25     Searching Algorithms
C10     The if-else if statement                  C26     Sorting Algorithms
C11     The nested if                             C27     Arrays: Two dimensions
C12     Calculation of max/min                    C28     Working with rows
C13     The For Loop                              C29     Working with columns
C14     Calculation of sum in a For Loop          C30     Working with diagonal
C15     Calculation of average in a For Loop      C31     Procedures
C16     Counting in a For Loop                    C32     Functions
the knowledge domain, on the logic of programming languages and on the dependencies that exist among the components of a programming language. For example, the teaching of variables and operands precedes the teaching of the if statement, and the learning of a sorting algorithm presupposes the existence of knowledge of selection and iteration statements. The domain knowledge also encodes relations between different chapters and the categorization of these chapters according to their difficulty level.
3 Student Modeling The student model is mainly based on the student cognitive model. Therefore, we focus on which parts of the domain knowledge of programming the student knows and how well. Phrases such as "He is very good at the if statement", "She has a lack of knowledge at arrays", "He is moderate at the for loop structure" are vague and imprecise. Moreover, statements such as "She knows at 70% the chapter of calculating a sum in a for loop", "He succeeded 85% in the questions about the numeric operators" do not explicitly state whether s/he has assimilated the corresponding programming issue or has to revise it. Consequently, the representation of a student's knowledge and cognitive abilities is imprecise, and one possible approach to deal with this is fuzzy set techniques, with their ability to naturally represent human conceptualization. We define the following four fuzzy sets for describing student knowledge of a domain concept:
• Unknown (Un): the degree of success in the domain concept is from 0% to 60%.
• Unsatisfactory Known (UK): the degree of success in the domain concept is from 55% to 75%.
• Known (K): the degree of success in the domain concept is from 70% to 90%.
• Learned (L): the degree of success in the domain concept is from 85% to 100%.

The membership functions for the four fuzzy sets are depicted in fig. 1, and are the following:

μ_Un(x) = 1                   for x ≤ 55
          1 − (x − 55)/5      for 55 < x < 60
          0                   for x ≥ 60

μ_UK(x) = (x − 55)/5          for 55 < x < 60
          1                   for 60 ≤ x ≤ 70
          1 − (x − 70)/5      for 70 < x < 75
          0                   for x ≤ 55 or x ≥ 75

μ_K(x) =  (x − 70)/5          for 70 < x < 75
          1                   for 75 ≤ x ≤ 85
          1 − (x − 85)/5      for 85 < x < 90
          0                   for x ≤ 70 or x ≥ 90

μ_L(x) =  (x − 85)/5          for 85 < x < 90
          1                   for 90 ≤ x ≤ 100
          0                   for x ≤ 85
where x is the student’s degree of success in a domain concept.
Fig. 1 Partition for knowledge level of a chapter
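For concreteness, the four membership functions above can be coded directly as in the sketch below (x is the degree of success on a 0-100 scale); this is only a restatement of the definitions, not part of the original system.

```python
# The four fuzzy sets for a student's knowledge level of a domain concept.
def mu_unknown(x):
    if x <= 55: return 1.0
    if x < 60:  return 1.0 - (x - 55) / 5.0
    return 0.0

def mu_unsat_known(x):
    if 55 < x < 60:   return (x - 55) / 5.0
    if 60 <= x <= 70: return 1.0
    if 70 < x < 75:   return 1.0 - (x - 70) / 5.0
    return 0.0

def mu_known(x):
    if 70 < x < 75:   return (x - 70) / 5.0
    if 75 <= x <= 85: return 1.0
    if 85 < x < 90:   return 1.0 - (x - 85) / 5.0
    return 0.0

def mu_learned(x):
    if x <= 85: return 0.0
    if x < 90:  return (x - 85) / 5.0
    return 1.0

def knowledge_quadruplet(x):
    return (mu_unknown(x), mu_unsat_known(x), mu_known(x), mu_learned(x))

# knowledge_quadruplet(87) -> (0.0, 0.0, 0.6, 0.4), as in the example below
```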
The following expressions hold:

μ_Un, μ_UK, μ_K, μ_L ∈ [0, 1]
μ_Un + μ_UK + μ_K + μ_L = 1
if μ_Un > 0 then μ_K = μ_L = 0
if μ_UK > 0 then μ_L = 0
if μ_K > 0 then μ_Un = 0
if μ_L > 0 then μ_Un = μ_UK = 0

Thus, a quadruplet (μ_Un, μ_UK, μ_K, μ_L) is used to express the student's knowledge of a domain concept. For example, if a student succeeds 87% at the domain concept of "Variables and Constants", then her/his knowledge state of this domain concept is described by the quadruplet (0, 0, 0.6, 0.4), which means that the domain
concept of "Variables and Constants" is 60% Known and 40% Learned for the student. To determine the knowledge level of a student, the student takes a test at the end of each chapter. This test includes true/false exercises, multiple choice exercises, fill-in-the-gap exercises, in which the student fills in a symbol or a command in order to complete a program, and exercises in which users have to put certain parts of a program in the right order. The degree of success in this test does not coincide with the student's degree of success in the domain concept, which is used for calculating the quadruplet (μ_Un, μ_UK, μ_K, μ_L). This happens because, in the field of programming, each chapter contains both syntax and semantic knowledge. For example, the domain concept of the while loop includes the syntax of the while statement as well as issues about the function and the logic of use of the while statement. So, if a student takes the test that follows the chapter of the while loop and answers 80% of the questions correctly, it does not mean that s/he knows the chapter. Let us consider that the above test consisted of 100 questions, of which 50 concerned the syntax of the while statement and the other 50 concerned the logical use of the while loop. Also, let us consider that the student answered correctly all 50 questions that concerned the syntax of the while statement, but only 30 of the other 50 questions. Thus, the student knows the syntax of the while statement perfectly, but we cannot conclude that s/he knows the whole domain concept of the while loop. With regard to the above, we define the following two fuzzy sets for describing domain concepts:

• Syntax Knowledge (A)
• Semantic Knowledge (B)

The membership functions for these fuzzy sets are depicted in fig. 2, and are the following:

μ_A : A → [0, 1],  μ_A(c) = p_A(c)/100
μ_B : B → [0, 1],  μ_B(c) = p_B(c)/100

where c is the domain concept, p_A(c) is the percentage of syntax knowledge that is included in chapter c and p_B(c) is the percentage of semantic knowledge that is included in chapter c. The following expressions hold:

p_A(c) + p_B(c) = 100  and  μ_A(c) + μ_B(c) = 1

The values of μ_A and μ_B for each chapter of our knowledge domain are shown in Table 2. These values are the means of the p_A and p_B values defined by ten experts on programming. Thus, the dyad (μ_A, μ_B) is used to express the type of knowledge of a domain concept. Therefore, the student of the above example, who answered 80% of the test questions correctly, has a 75.2% degree of success in the domain concept of the while loop. That is because, according to Table 2, the type of knowledge of the domain concept of the while loop is (0.38, 0.62) and the student knows 100% of the syntax knowledge (s/he answered correctly all the questions that concerned the syntax of the while statement) and 60% of the semantic knowledge of the chapter (s/he answered correctly 30 of the 50 questions that concerned the logic of the while statement); thus s/he knows 0.38·1 + 0.62·0.6 = 0.752, i.e. 75.2%, of the chapter of the while loop.
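The worked example can be reproduced with a few lines of code; the chapter split (0.38, 0.62) is the Table 2 entry for the while loop chapter (C18), and the function name is illustrative.

```python
# Degree of success combining syntax and semantic grades with the
# (mu_A, mu_B) split of a chapter (all grades on a 0-100 scale).
def degree_of_success(mu_a, mu_b, grade_syntax, grade_semantic):
    """x = mu_A(c) * g_s + mu_B(c) * g_l."""
    return mu_a * grade_syntax + mu_b * grade_semantic

# While loop chapter (C18): mu_A = 0.38, mu_B = 0.62
x = degree_of_success(0.38, 0.62, 100, 60)   # -> 75.2
```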
Fig. 2 Partition for type of knowledge of a chapter

Table 2 Values of membership functions for knowledge type of each chapter

Chapter  μA(c)  μB(c)    Chapter  μA(c)  μB(c)    Chapter  μA(c)  μB(c)
C1       0.48   0.52     C12      0      1        C23      0.39   0.61
C2       0.76   0.24     C13      0.47   0.53     C24      0.40   0.60
C3       0.80   0.20     C14      0      1        C25      0      1
C4       1      0        C15      0      1        C26      0      1
C5       1      0        C16      0      1        C27      0.45   0.55
C6       0.41   0.59     C17      0      1        C28      0      1
C7       0.82   0.18     C18      0.38   0.62     C29      0      1
C8       0.53   0.47     C19      0      1        C30      0      1
C9       0.46   0.54     C20      0      1        C31      0.46   0.54
C10      0.46   0.54     C21      0      1        C32      0.46   0.54
C11      0.39   0.61     C22      0      1
Taking into account all the above, the steps that are followed in order to determine a student's knowledge level of a domain concept c are the following:

• Step 1: The student reads the chapter c and takes the corresponding test; her/his grade on the test is g (100 is excellent), with g = g_s + g_l (g_s: the grade on syntax knowledge, g_l: the grade on semantic knowledge).
• Step 2: We calculate the student's degree of success x in c, using the formula x = μ_A(c)·g_s + μ_B(c)·g_l.
• Step 3: We find the values of the membership functions for the fuzzy sets that describe the student's knowledge state of c and we define the quadruplet (μ_Un, μ_UK, μ_K, μ_L) for c.

A characteristic of the domain knowledge of programming is that experience and the continuous development of programs lead to better understanding and assimilation of programming issues that were taught earlier. For example, if a student knows 70% of the chapter of calculating a sum in a for loop and s/he is examined in the chapter of calculating a sum in a while loop and succeeds 95%,
then the student's knowledge level of the chapter of calculating a sum in a for loop has to become higher than it is. Furthermore, if a student performs poorly in a chapter, then her/his knowledge level of previously associated chapters has to be reduced. In addition, the learning of some programming issues helps the better understanding of chapters that follow. For example, a student who has been taught how to calculate a sum and how to count already knows, to some degree, the next chapter, which concerns how to calculate an average in a loop. In other words, if we symbolize with Ci → Cj that there is a dependency between the chapters Ci and Cj, and more concretely that chapter Ci precedes chapter Cj, then the following four facts can happen:

f1. Considering the knowledge level of Ci, the knowledge level of Cj increases.
f2. Considering the knowledge level of Ci, the knowledge level of Cj decreases.
f3. Considering the knowledge level of Cj, the knowledge level of Ci increases.
f4. Considering the knowledge level of Cj, the knowledge level of Ci decreases.
When f1 and f3 happen, the student model expands. On the contrary, when f2 and f4 happen, the student model reduces. In other words, after any change of the knowledge value of a domain concept, an inference mechanism is triggered that updates the values of all concepts related to this concept. An akin mechanism has also been used in [22]; however, there only the values of all essential prerequisite concepts are updated and the student model only expands. Let us define D as the fuzzy set describing the dependencies between the domain concepts, μD(Ci,Cj) as the membership function of the dependency relation of Cj on Ci, and μD(Cj,Ci) as the membership function of the dependency relation of Ci on Cj. The values of μD(Ci,Cj) and μD(Cj,Ci) were determined by ten experts on programming and are shown in Table 3.

Table 3 Values of the membership function of dependencies

Ci     Cj   μD(Ci,Cj)  μD(Cj,Ci)      Ci   Cj   μD(Ci,Cj)  μD(Cj,Ci)
9-10   11   0.64       0.70           19   21   0.39       1.00
14     19   1.00       1.00           20   21   0.41       1.00
14     15   0.81       1.00           12   22   0.37       0.29
15     21   0.52       1.00           24   27   0.43       0.51
14     16   0.45       0.42           27   28   0.33       0.99
16     20   1.00       1.00           28   29   0.77       0.77
17     22   1.00       1.00           27   29   0.33       0.99
12     17   0.37       0.29           27   30   0.27       0.78
19     20   0.45       0.42
Concerning the precedence relation Ci → Cj, the knowledge level of the chapters can change according to the following rules:

• Concerning the knowledge level of Ci (Cj), the knowledge level of Cj (Ci) increases according to (S1, S2 are knowledge levels with S1 < S2):
– R1: If Cj (Ci) is S1 and Ci (Cj) is S1, then Cj (Ci) remains S1 with μS1(Cj) = max[μS1(Cj), μS1(Ci)·μD(Ci,Cj)] (respectively μS1(Ci) = max[μS1(Ci), μS1(Cj)·μD(Cj,Ci)]).
– R2: If Cj (Ci) is S1 and Ci (Cj) is S2, then Cj (Ci) becomes S2 with μS2(Cj) = μS2(Ci)·μD(Ci,Cj) (respectively μS2(Ci) = μS2(Cj)·μD(Cj,Ci)).

• Concerning the knowledge level of Ci, the knowledge level of Cj reduces according to:

– R3: If Cj is 100% Learned, then it does not change.
– R4: If Cj is S and Ci is Unknown, then Cj becomes Unknown with μUn(Cj) = μUn(Ci)·μD(Ci,Cj), where S is a knowledge level with S > Unknown.
– R5: If Cj is S and Ci is Unsatisfactory Known, then Cj becomes Unsatisfactory Known if μD(Ci,Cj) = 1, or Known with μK(Cj) = 1 − μUK(Cj) = 1 − μUK(Ci)·μD(Ci,Cj), where S is a knowledge level with S > Unsatisfactory Known.
– R6: If Cj is Partially Learned and Ci is Known, then Cj remains Known with μK(Cj) = μK(Ci)·μD(Ci,Cj).

• Concerning the knowledge level of Cj, the knowledge level of Ci reduces according to:

– R7: We use the formula pi = min[pi, (1 − μD(Ci,Cj))·pi + pj], where pi and pj are the student's degrees of success in Ci and Cj respectively, and then, using the new pi, we determine the new quadruplet (μUn, μUK, μK, μL) for Ci.

Let us see some examples between the chapters 14 ("Calculation of sum in a For Loop") and 15 ("Calculation of average in a For Loop"). The prerequisite relation is C14 → C15, with μD(C14,C15) = 0.81 and μD(C15,C14) = 1.

i. Let us consider that C15 is 40% Learned and the student is examined at C14 and it is concluded that C14 is 100% Learned, so the knowledge level of C15 will increase according to rule R1 and it will become Learned with μL(C15) = max[μL(C15), μL(C14)·μD(C14,C15)] = max[0.4, 1·0.81] = 0.81. So, C15 will become 81% Learned.
ii. Let us say that C15 is 75% Known and the student is examined at C14 and it is concluded that C14 is 100% Unsatisfactory Known, so the knowledge level of C15 will decrease according to rule R5 and it will become Known with μK(C15) = 1 − μUK(C15) = 1 − μUK(C14)·μD(C14,C15) = 1 − 1·0.81 = 0.19. So, C15 will become 19% Known.
iii. Let us consider that C15 is 100% Unsatisfactory Known and the student is examined at C14 and it is concluded that C14 is 30% Unknown, so the knowledge level of C15 will decrease according to rule R4 and it will become Unknown with μUn(C15) = μUn(C14)·μD(C14,C15) = 0.3·0.81 = 0.243. So, C15 will become 24.3% Unknown.
iv. Let us consider that C14 is 60% Unknown and the student is examined at C15 and it is concluded that C15 is 100% Known, so the knowledge level of C14 will increase according to rule R2 and it will become Known with μK(C14) = μK(C15)·μD(C15,C14) = 1·1 = 1. So, C14 will become 100% Known.
v. Let us consider that C14 is 20% Learned and the student is examined at C15 and it is concluded that C15 is 100% Unsatisfactory Known, so the knowledge level of C14 will decrease according to rule R7: p14 = min[p14, (1 − μD(C14,C15))·p14 + p15] = min[86, (1 − 0.81)·86 + 68] = min[86, 84.34] = 84.34. So the new quadruplet (μUn, μUK, μK, μL) for C14 is (0, 0, 1, 0). In other words, C14 will become 100% Known.
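As an illustration of how such an update could be implemented, the sketch below codes rule R7 (the only rule that works directly on degrees of success) and reproduces example (v); the function name is an assumption.

```python
# Rule R7: reduce the prerequisite chapter Ci when the dependent chapter Cj
# was answered poorly. p_i, p_j are degrees of success (0-100) and mu_d is
# the dependency strength mu_D(Ci, Cj) from Table 3.
def reduce_prerequisite(p_i, p_j, mu_d):
    """R7: p_i = min(p_i, (1 - mu_d) * p_i + p_j)."""
    return min(p_i, (1.0 - mu_d) * p_i + p_j)

# Example (v): C14 at 86%, C15 at 68%, mu_D(C14, C15) = 0.81
p14 = reduce_prerequisite(86, 68, 0.81)   # -> 84.34, i.e. quadruplet (0, 0, 1, 0)
```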
4 Conclusion Our target in this paper was to create an adaptive web-based educational application which teaches the programming language Pascal. The personalized support is realized through the application's user model, which adopts fuzzy logic techniques. Due to the fact that students' actions, misconceptions and needs are imprecise information, we chose fuzzy logic to manage the uncertainty and to describe human descriptions of knowledge and of students' cognitive abilities. We use fuzzy sets in order to describe, for each domain concept, how well it is known and learned, and a mechanism of rules which is triggered after any change of the knowledge value of a domain concept and updates the knowledge values of all concepts related to this concept. These techniques define our fuzzy user model, which is the basis for our system adaptation.
References 1. Smith, C., Grech, C., Gillham, D.: Online Delivery of a Professional Practice Course: An Opportunity to Enhance Life-long Learning for Asian Nurses. In: The Proceedings of Distance Education: An Open Question? Conference, Adelaide, Australia, September 11-13 (2000) 2. Carro, R., Pulido, E., Rodriguez, P.: An Adaptive Driving Course Based on HTML Dynamic Generation. In: Proceedings of World Conference on the WWW and Internet, WebNet 1999, Hawaii, USA, October 25-30, vol. 1, pp. 171–176 (1999) 3. Garlatti, S., Iksal, S.: Declarative specifications for adaptive hypermedia based on a semantic web approach. In: Brusilovsky, P., Corbett, A.T., de Rosis, F. (eds.) UM 2003. LNCS (LNAI), vol. 2702, pp. 81–85. Springer, Heidelberg (2003) 4. Rich, E.: Users are individuals: individualising user models. Int. J. Human-Computer Studies 51, 323–338 (1999) 5. Zadeh, L.A.: Fuzzy sets, information and control 8, 338–353 (1965) 6. Zadeh, L.A.: Fuzzy logic=Computing with words. IEEE Transactions on Fuzzy System 4(2), 103–111 (1996) 7. Katz, S., Lesgold, A., Eggan, G., Gordin, M.: Modelling the Student in SHERLOCK II. In: Greer, J., McCalla, G. (eds.) Student Modelling: The Key to Individualized Knowledge-based Istruction, pp. 99–125. Springer, Berlin (1994) 8. Kosba, E., Dimitrova, V., Boyle, R.: Using Fuzzy Techniques to Model Students in WebBased Learning Environments. In: Palade, V., Howlett, R.J., Jain, L. (eds.) KES 2003. LNCS (LNAI), vol. 2774, pp. 222–229. Springer, Heidelberg (2003) 9. Xu, D., Wang, H., Su, K.: Intelligent Student with Fuzzy Models. In: Proceedings of the 35th Hawaii International Conference on System Sciences (2002)
10. Panagiotou, M., Grigoriadou, M.: An Application of Fuzzy Logic to Student Modeling. In: Proceedings of the IFIP World conference on Computer in Education (WCCE 1995), Birmigham (1995) 11. Warendorf, K., Tsao, S.J.: Application of Fuzzy Logic Techniques in the BSS1 Tutoring System. Journal of Artificial Intelligence in Education 8(1), 113–146 (1997) 12. Suarez-Cansino, J., Hernandez-Gomez, A.: Adaptive Testing System Modeled Through Fuzzy Logic. In: 2nd WSEAS Int. Conf on Computer Engineering and Applications (CEA 2008), Acapulco, Mexico, January 25-27, pp. 85–89 (2008) 13. Nyk¨anen, O.: Inducing Fuzzy Models for Student Classification. Educational Technology & Society 9(2), 223–234 (2006) 14. Byckling, P., Sajaniemi, J.: A role-based analysis model for the evaluation of novices’ programming knowledge development. In: Proceedings of the 2006 international workshop on Computing education research, Canterbury, United Kingdom, pp. 85–96 (2006) 15. Grunbacher, P., Seyff, N., Briggs, R., Hoh, P.I., Kitapsi, H., Port, D.: Making every student a winner: The WinWin approach in software engineering education. Journal of Systems and Software 80(8), 1191–1200 (2007) 16. Wei, F., Moritz, S.H., Parvez, S.M., Blank, G.D.: A student model for object-oriented design and programming. Journal of Computing Sciences in Colleges 20(5), 260–273 (2005) 17. Woszczynski, A., Haddad, H.M., Zgambo, A.F.: Towards a model of student success in programming cources. In: Proceedings of the 43rd Annual Southeast Regional Conference, Kennesaw, Georgia, pp. 301–302 (2005) 18. Jurado, F., Santos, O.C., Redondo, M.A., Boticario, J.G., Ortega, M.: Providing Dynamic Instructional Adaptation in Programming Learning. In: Corchado, E., Abraham, A., Pedrycz, W. (eds.) HAIS 2008. LNCS (LNAI), vol. 5271, pp. 329–336. Springer, Heidelberg (2008) 19. Peylo, C., Teiken, W., Rollinger, C., Gust, H.: An ontology as domain model in a webbased educational system for prolog. In: Etheredge, J., Manaris, B. (eds.) Proceedings of the 13th International Florida Artificial Intelligence Research Society Conference, pp. 55–59. AAAI Press, Menlo Park (2000) 20. Kumar, A.: Rule-Based Adaptive Problem Generation in Programming Tutors and its Evaluation. In: Proceedings of the 12th International Conference on Artificial Intelligence in Education, Amsterdam, pp. 35–43 (2005) 21. Bloom, B.S.: Taxonomy of Educational Objectives. In: Handbook I: The Cognitive Domain. David McKay Co Inc., New York (1956) 22. Kavˇciˇc, A.: Fuzzy User Modeling for Adaptation in Educational Hypermedia. IEEE Transactions on Systems, Man and Cybernetics. Part C: Applications and Reviews 34(4), 439–449 (2004)
Camera Angle Invariant Shape Recognition in Surveillance Systems D. Ellwart and A. Czy˙zewski
Abstract. A method for human action recognition in surveillance systems is described. Problems within this task are discussed and a solution based on 3D object models is proposed. The idea is presented and some of its limitations are discussed. Shape description methods are introduced along with their main features, and the parameterization algorithm used in this work is presented. The classification problem, restricted to binary cases, is discussed. Support vector machine classifier scores are shown and an additional step for improving classification is introduced. The obtained results are discussed and further research directions are outlined.
1 Introduction Monitoring systems are becoming more and more popular every day. They are used by public authorities as much as by private users, mostly for security purposes. However, multi-camera systems are difficult to manage manually. Considering public surveillance systems, even several operators would not be able to monitor every single camera continuously 24 hours a day. That is why much emphasis is put on research into automated event detection which, given proper fidelity, could greatly improve the quality of such a system. These events can be defined differently depending on consumer needs. For example, an alert could be triggered if some activity in restricted areas is observed, but also many more advanced rules including object features could be used. This work concentrates on shape analysis for the purpose of silhouette recognition under various camera angles. Besides the possibility of being a great support to a general object classifier, it allows for classifying human behaviour. This could lead to an improvement in the detection of dangerous shape-based events, such as a person lying on the ground. D. Ellwart and A. Czyżewski Multimedia Systems Department www.sound.eti.pg.gda.pl Gdansk University of Technology, Poland e-mail:
[email protected],
[email protected]
2 Proposed System Human silhouettes can tell a lot about the performed actions. Such information could be helpful for many tasks, such as object classification or event detection. To achieve this goal, a shape recognition algorithm is needed. Unfortunately, this task is not a common pattern recognition problem. In surveillance systems, cameras might be placed differently, depending on the possibilities in certain places. They can be attached to a wall, located on top of buildings or fixed to a pole. For that reason the shape of an observed object differs depending on the camera orientation angle. Within spherical coordinates, both observation angles are meaningful (Fig. 1). The horizontal angle is related to the orientation of the camera and to its altitude above the ground. The vertical angle is connected to the analyzed object and its spin around its own axis. Therefore, instead of recognizing shapes in two-dimensional images, 3D pose models (Fig. 1) are needed to fully cover the viewing variants [1]. Automated surveillance systems are built by assembling many complex algorithms. Before even approaching the shape classification problem, the input image should be processed to extract any moving objects visible in the scene. Because this work concentrates on real-time shape parameterization and classification, the input data, in the form of binary masks, is prepared manually using videos from existing monitoring systems. In this way the evaluation of the proposed algorithm is independent of any errors possibly introduced in earlier processing stages.
Fig. 1 Difficulties with recognizing 3D objects using 2D images (left). Example models demonstrating 3 different classes (right).
Fig. 2 General scheme of data flow in the proposed system
Data flow in the designed system is shown in Fig. 2. The dashed border marks the operations done only once per algorithm execution. All 2D images referring to the specified camera angle are read and parameterized during this phase. Then these data, as a set of vectors, are used to train the classifier. The algorithmic group within the solid border is called for every single input video frame. All steps made before the decision are described in the following paragraphs.
3 Shape Parameterization In order to create a shape classification algorithm which could work in real-time applications, object parameterization has to be done to lower the amount of data being further processed. Describing a shape, even in two-dimensional images, is quite a challenging task which requires some assumptions to be made. Firstly, it is important to define detection conditions such as sensitivity to shape rotation and size variety. The next step is to choose the proper description algorithm. There are two main groups among the parameterization methods: the first of them is based on the shape's contour and the other group considers the shape as a region [2]. Many of these methods are burdened with some restrictions, which is why the choice should be considered wisely. Popularly used algorithms include Zernike's Moments (region-based), Chain Code (contour-based), Curvature Scale Space (contour-based) and Pairwise Geometric Histogram (contour-based) [3] [4] [5] [6] [7] [8] [9] [10]. Most of the above algorithms can produce very precise results, but can be quite hungry for computing power. The main feature of the description method used in this work is its simplicity and rotation invariance. After acquiring the mask of an analyzed shape, it is resized to reference dimensions keeping its proportions. In the next step the shape centroid is found (Eq. 1) and then the mask thickness along the half lines shown in Fig. 3 is computed:

(c_x, c_y) = (M_10 / M_00, M_01 / M_00)        (1)

M_ij = Σ_x Σ_y x^i y^j I(x, y)        (2)
where M_ij is the moment of order (i + j) and I(x, y) denotes the mask image. Parameterization occurs along the half line bounded by the shape's CoG (center of gravity). This line is reoriented according to Eq. 3 with a specified angle step (Δα). The lower its value, the better the resulting accuracy.

l : y sin(α) = x cos(α)        (3)

α = k·Δα,  k = 0, 1, ..., (360/Δα) − 1        (4)
where α is the angle between the X axis and the current half-line orientation. The smallest meaningful value of Δα (the angle resolution) results from the assumed reference dimension. It is related to that size in the following way:
Δα_min = arctan( 1 / max(referenceX, referenceY) )        (5)
Finally, at this point, the processed shape can be treated as a one-dimensional signal. Still, one more issue should be remarked on here. Namely, when input shapes are larger than the specified dimensions, they are shrunk during the resizing phase. In the opposite situation, when the mask is too small, it has to be stretched, which implies the use of an interpolation algorithm. In our experiments the linear method was used.
Fig. 3 Stages of shape parameterization. Obtaining object mask (left), resizing to reference dimension (middle) and calculating its thickness (right)
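A rough sketch of the whole parameterization step is given below, assuming OpenCV as in the surrounding papers' implementations. The reference size, the angle step and the fixed-size resize (instead of a strictly proportion-preserving one) are simplifying assumptions, and "thickness" is interpreted here as the count of mask pixels along each half line.

```python
# Radial thickness descriptor: resize the mask, find the centroid from
# image moments (Eqs. 1-2), then measure mask thickness along half lines
# every delta_alpha degrees (Eqs. 3-4).
import cv2
import numpy as np

def radial_descriptor(mask, ref_size=(64, 64), delta_alpha=5.0):
    mask = cv2.resize(mask, ref_size, interpolation=cv2.INTER_LINEAR)
    mask = (mask > 0).astype(np.uint8)
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:
        raise ValueError("empty mask")
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]     # Eq. 1
    h, w = mask.shape
    max_r = int(np.hypot(w, h))
    features = []
    for k in range(int(360 / delta_alpha)):               # Eq. 4
        alpha = np.deg2rad(k * delta_alpha)
        thickness = 0
        for r in range(max_r):                            # walk along the half line
            x = int(round(cx + r * np.cos(alpha)))
            y = int(round(cy + r * np.sin(alpha)))
            if 0 <= x < w and 0 <= y < h and mask[y, x]:
                thickness += 1
        features.append(thickness)
    return np.array(features, dtype=float)
```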
4 Classification Having to compare input data with many models directly (distance measurements) and having to assign it to a proper class in real-time applications is quite difficult, because as the database grows the time needed to make a decision increases as well. To avoid this problem, a machine learning algorithm was chosen. The SVM (support vector machine) classifier used takes as input vectors the values obtained in the parameterization step. This algorithm has many advantages over commonly used neural networks. It is not burdened with any local minima and, in the case of many possible solutions, it picks the best class separation in the sense of the widest margin between classes [11]. This method needs a learning stage to divide the whole parameter space into given segments. Such an action lengthens the data preprocessing phase but makes the decision almost instant. Popularly used kernels within this type of classifier include linear, polynomial and radial basis functions. It is said that in most classification tasks the RBF is a good first choice [12]; hence this kernel function is used. It can be defined by the following equation:
k(x, x′) = exp(−γ ‖x − x′‖²)        (6)
where k(x, x′) is the kernel function that defines the feature space and γ is the width of the radial basis function. A growing class count could cause difficulties when trying to separate each one of the classes by hyperplanes. Because of this problem, an OVA (one vs. all) approach has been used (Fig. 4) [13]. Every class has a corresponding SVM classifier. The extracted shape parameters are passed to each classifier, which makes a binary decision. Next, all the outputs are put into a vector and the overall decision is made. It may happen that a shape is assigned to more than one class in this way. In this particular situation the final decision is marked as ambiguous. The previously described methodology relates to an analysis done in every single image frame without referring to previous results. It is only a static approach, which has many flaws. Often one action can be considered as a stage of another (i.e. 'stand' is a stage of 'walk'). That is why a routine resolving the object state by using additional information has been implemented (Fig. 5). This algorithm, labeled as "dynamic classification", stores the decisions made by the previous block for a specified period of time. From the last n decisions about an object's membership, the class which appears most frequently is picked. In other words, all decisions are averaged using a window of length n. Some errors can be discarded in this way; hence object labeling can be performed more smoothly. For the purpose of further tests, four classes were created. With regard to human behavior, 14 models were built covering actions like bend, walk and stand. Two more models describing a passenger car were formed.
Fig. 4 Binary decision classifiers scheme. Input - vector consisting of shape features; output - vector of the size N, representing all the classifiers decisions
Fig. 5 Two step approach for object classification
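The two-step scheme can be sketched as follows, using scikit-learn's SVC purely for illustration; the class list, the γ value and the window length n are assumptions, not the values used in the reported experiments.

```python
# One-vs-all RBF SVMs for the static decision, plus a majority vote over
# the last n frames for the dynamic (smoothed) decision.
from collections import Counter, deque
import numpy as np
from sklearn.svm import SVC

CLASSES = ["Bend", "Stand", "Walk", "Car"]

def train_ova(train_vectors, train_labels, gamma=0.1):
    models = {}
    for cls in CLASSES:
        y = np.array([1 if lab == cls else 0 for lab in train_labels])
        models[cls] = SVC(kernel="rbf", gamma=gamma).fit(train_vectors, y)
    return models

def static_decision(models, feature_vector):
    hits = [cls for cls, m in models.items() if m.predict([feature_vector])[0] == 1]
    if len(hits) == 1:
        return hits[0]
    return "None" if not hits else "Ambiguous"

class DynamicClassifier:
    def __init__(self, n=10):
        self.history = deque(maxlen=n)       # last n static decisions

    def update(self, static_label):
        self.history.append(static_label)
        return Counter(self.history).most_common(1)[0][0]
```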
5 Experiments Tests were carried out to evaluate the proposed approach to silhouette recognition. As mentioned before, the input data for the first stage of tests was prepared manually as a set of object masks. All of the earlier built models were projected with an angle step around their own axis equal to 20 degrees. Hence, a total of 288 projected images was created as a training database. After parameterization (Sec. 3) the resulting vectors were passed to the classifier. In the first step, only the static classifier was used. Verified results for recognition under two different camera angles are shown in Tab. 1. As presented in Tab. 1, recognizing objects observed from a higher angle results in a lower accuracy. This is not a general statement, because having a different set of objects could lead to an opposite conclusion. However, in this case watching from a higher angle resulted in some misclassifications between the classes 'Car' and 'Bend'. This could be caused by the small training data set for these classes (only 36 images for each) or the shape ambiguity visible under certain angles for these objects. Although this classifier works well on static images (mean accuracy 77.5% and 74.2%), its results for video sequences are poor. Simple human motion can be divided into various stages. Considering the finite number of models available for a certain action, continuous recognition becomes hard to achieve. Therefore another processing block was attached after the existing classifier, allowing for smoother results through averaging (Sec. 4).

Table 1 Classification results for static approach

        Accuracy [%] (20 degrees)   Accuracy [%] (40 degrees)
Bend    75.0                        65.2
Stand   78.6                        90.0
Walk    70.0                        86.0
Car     88.9                        60.0
None    75.0                        70.0

This action allowed for
achieving better performances (over 80%). It is worth mentioning that, depending on the reference point in the scene, the camera horizontal angle can be defined in various ways. To unify this value, the orientation is described with respect to the point in the center of view. This assumption can cause some errors during the recognition phase, especially when considering silhouettes placed near the frame borders.
6 Conclusions Human silhouette recognition that is insensitive to the camera angle is a challenging task. Apart from recognizing human actions, it may become a way to communicate with the system operators or, more precisely, to draw their attention to a specified event. For example, a person waving in front of a camera could mean that something has happened and the operator should be alerted immediately. The proposed algorithms were aimed at fast shape recognition in real monitoring systems. Many situations that occurred during the experiments prove that shape information itself is insufficient to determine human actions accurately enough. Therefore further work should include correlating this data with other obtained parameters, such as velocity and the objects' real dimensions.
Acknowledgements Research is subsidized by the Polish Ministry of Science and Higher Education within Grant No. R00 O0005/3 and by the European Commission within FP7 project “INDECT” (Grant Agreement No. 218086).
References 1. Oh, S., Lee, Y., Hong, K., Kim, K., Jung, K.: View-point Insensitive Human Pose Recognition using Neural Network. Proceedings of World Academy of Science, Engineering and Technology 34 (October 2008) ISSN: 2070-3740 2. Martinez, J.M.: MPEG-7 Overview (version 10), Palma de Mallorca (October 2004) 3. Kim, H.-K., Kim, J.-D.: Region-based shape descriptor invariant to rotation, scale and translation. Signal Processing: Image Communication 16, 87–93 (2000) 4. Amayeh, G.R., Erol, A., Bebis, G., Nicolescu, M.: Accurate and Efficent Computation of High Order Zernike Moments. In: Bebis, G., Boyle, R., Koracin, D., Parvin, B. (eds.) ISVC 2005. LNCS, vol. 3804, pp. 462–469. Springer, Heidelberg (2005) 5. Kim, H.-K., Kim, J.-D., Sim, D.-G., Oh, D.-I.: A modified Zernike moment shape descriptor invariant to translation, rotation and scale for similarity-based image retrival. In: IEEE International Conference on Multimedia and Expo., vol. 1, pp. 307–310 (2000) 6. Kopf, S., Haenselmann, T., Effelsberg, W.: Enhancing curvature scale space features for robust shape classification. In: IEEE International Conference on Multimedia and Expo., Amsterdam (July 2005) 7. Bebis, G., Papadourakis, G., Orphanoudakis, S.: Recognition Using Curvature Scale Space and Artificial Neural Networks. In: Signal and Image Processing, Las Vegas (October 1998)
8. Ashbrook, A.P., Thacker, N.A., Rockett, P.I.: Multiple shape recognition using pairwise geometric histogram based algorithms. In: IEEE 5th International Conference on Image Processing and its Applications, Edinburgh, July 1995, pp. 90–94 (1995) ISBN: 0-85296642-3 9. Evans, A., Thacker, N., Mayhew, J.: The Use of Geometric Histograms for Model-Based Object Recognition. In: 4th British Machine Vision Conference, September 1993, pp. 429–438 (1993) 10. Huet, B., Hancock, E.R.: Line Pattern Retrieval Using Relational Histograms. IEEE Transactions on Pattern Analysis and machine Intelligence 21(12), 1363–1370 (1999) 11. Jakkula, V.: Tutorial on Support Vector machine (SVM), School of EECS, Washington State University, http://eecs.wsu.edu/˜vjakkula/SVMTutorial.doc 12. Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A Practical Guide to Support Vector Classification. Technical report, Department of Computer Science, National Taiwan University (July 2003) 13. Hsu, C.-W., Lin, C.-J.: A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2) (March 2002)
Multicriteria-Based Decision for Services Discovery and Selection Youn`es El Bouzekri El Idrissi, Rachida Ajhoun, and M.A. Janati Idrissi
Abstract. The requirements brought by the heterogeneity of service discovery protocols lead to a non-optimal resource distribution in pervasive environments. In such environments, servers and clients use different discovery protocols for publishing or looking for services, respectively. Such a lack of cooperation between service discovery protocols (SDPs) reduces the network performance. Interoperability between SDPs would allow users with a large variety of devices to discover and make use of numerous network-based services. However, selecting the most favorable service among a set of appropriate services according to a given request can become complex and time-consuming for the user. In such situations, interaction between the user and the device should be minimal for ease of use. The main goal of this work is to improve service discovery by including intelligent service selection based on the AHP multi-criteria method within an OSGi-based middleware for service discovery. Keywords: Discovery protocols, interoperability, context-aware, multi-criteria selection and AHP method.
1 Introduction In pervasive environments, users are surrounded by numerous services provided by local as well as remote suppliers disseminated through different networks. In order to reach these services, service discovery protocols (SDPs) play a fundamental role in enhancing resource utilization. With the existing SDPs, it is already possible to discover and make use of numerous network services on different kinds of user devices. Interoperability among different discovery protocols would make it possible to discover services beyond the local area and through different domains. Younès El Bouzekri El Idrissi, Rachida Ajhoun, and M.A. Janati Idrissi École Nationale Supérieure d'Informatique et d'Analyse des Systèmes, Mohammed V-Souissi University, Rabat, Morocco e-mail:
[email protected], {ajhoun,janati}@ensias.ma
Moreover, the multitude of service suppliers leads to a situation where users may have a large list of discovered services as a result of a query; selecting the most favorable one of them can then become a complex and time-consuming task for the average user. In this situation, the interaction between the user and his/her own device should rather be minimal for ease of use. To deal with these issues, we present in this paper an approach based on an OSGi-based middleware for service discovery and selection [4]. The selection consists of electing the most relevant service for a given user (Section II). To do so, the proposed middleware explores the service descriptions stored in the OSGi registry, denoted OBR (Oscar Bundle Repository), and the characteristics of the user to make a decision (Section III). The decision is based on a multi-criteria method called AHP (Analytic Hierarchy Process), which is detailed in Section IV. The AHP method was proposed in the eighties by Saaty [11] to help decision makers make decisions in economics and business. It has been largely used in multi-criteria decision analysis, for example for the vendor selection problem through different scales [12]; furthermore, it was introduced by [3] as a technique for the strategic selection of alliance partners. In computer science it was used to compute the estimation of quality for software components, which was an interesting approach [2]. The performance of AHP and the Weighted Scoring Method has been evaluated by means of a real-life case study used to make decisions on Web services under multiple objectives [13]. We remark that AHP is a promising technique for service ranking, since we introduce it directly into service discovery and selection. The AHP method in this work is leveraged for assigning weight values to the characteristics of the proposed model. These weights are then used to evaluate the quality of the criteria and, finally, the overall quality of the service by using the appropriate metrics. We present in Section V the methodology for criteria weighting. Then, in Section VI we show how services are ranked.
2 OSGi-Based Middleware for Service Discovery
The interoperability between traditional discovery protocols appears to be a required mechanism in pervasive environments for achieving optimal resource exploitation. To this end, we have developed a middleware based on the OSGi specification [4]. The middleware acts as a component for service discovery and context-aware service selection. The interoperability is ensured by adding a specific API for every SDP involved, as shown in Figure 1. Thus, every request made with an SDP is translated by the API so as to update the middleware registry (OBR). Generally, the scenario comprises six steps (Figure 1). First, the service provider subscribes its services with functional information using a specific service discovery protocol. Then (step II), the interoperability middleware recognizes this subscription through the SDP driver. In this step, the SDP driver registers the service in the OSGi service registry called OBR. The third step represents a service lookup by a user. As a result of such a lookup, the SDP driver browses the OBR for any subscribed services (step IV). The results of this lookup are sent to the service filtering layer for selection (step V). Finally, the filtering layer outputs the set of services chosen by the selection decision, i.e., the most relevant ones for the user in a specific situation (step VI). For making the decision, the middleware must be able to explore the service properties; the next section highlights how these properties can be collected.
Fig. 1 The discovery interoperability and selection scenario
3 Getting Contextual and Non-functional Service Properties
With the multi-criteria method, we explore both functional and non-functional information in order to select one service as the output of the method. A crucial issue that we tackle, and that has not been addressed by previous work in the field, is the following: how can non-functional (contextual) information be acquired? After studying the OSGi specification and analyzing the OBR described in RFC-0112 Bundle Repository [1], we claim that it is possible to extract the necessary properties. Indeed, the description format allowed by the OBR (Figure 2) can accommodate more than merely functional fields, in the form of non-functional information such as capabilities or requirements, which can be beneficial for enhancing the service lookup. In particular, the capabilities class can carry property types such as the display and memory capabilities needed for the service execution. Other types, such as execution environment and configuration, are also useful. The repository presents a description of the set of available resources registered by suppliers. Similarly, after the discovery process, the returned list of discovered services contains the same kind of description. Each resource description is characterized in terms of capabilities as well as the hardware requirements necessary for execution on the client side. All of this knowledge about services is gathered during the discovery process when browsing the OBR and, finally, the result of the discovery is formatted as an XML message. A sample XML description of a service named Service 1 illustrates the OSGi specification of the repository. As shown in Figure 2, a service can be described as a set of XML tags. Every tag yields a value for a service property, which can be a number, a name, a boolean, or a set of properties. Exploring the service description makes it possible to build a model for service selection. The property management described in the next sections is one of the ingredients for implementing a model for service selection based on contextual information. To this end, we implement our model by means of the multi-criteria AHP method, which belongs to the family of complete aggregation methods. Other methods exist, such as the WSM (Weighted Sum Method) and the WPM (Weighted Product Method) [8]; however, the AHP method provides a hierarchical analysis of the problem and computes the best alternative even with intangible criteria.
Fig. 2 XML description for discovered services
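To illustrate how such non-functional information could be collected programmatically, the sketch below parses an OBR-style resource description and gathers its capability properties. The tag and attribute names are assumed for the example and are not quoted verbatim from RFC-0112; it is only a minimal illustration of the idea, not the middleware's actual code.

```python
import xml.etree.ElementTree as ET

# Illustrative OBR-style resource description; tag and attribute names are assumptions.
SAMPLE = """
<resource id="Service1" presentationname="Service 1">
  <capability name="display"><p n="width" v="480"/><p n="height" v="320"/></capability>
  <capability name="memory"><p n="required" v="64"/></capability>
  <capability name="network"><p n="type" v="WLAN"/></capability>
</resource>
"""

def extract_capabilities(xml_text):
    """Collect the non-functional properties of a discovered service as {capability: {prop: value}}."""
    root = ET.fromstring(xml_text)
    caps = {}
    for cap in root.findall("capability"):
        caps[cap.get("name")] = {p.get("n"): p.get("v") for p in cap.findall("p")}
    return caps

print(extract_capabilities(SAMPLE))
# {'display': {'width': '480', 'height': '320'}, 'memory': {'required': '64'}, 'network': {'type': 'WLAN'}}
```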
4 AHP Multi-criteria-Based Method for Evaluating Services
The execution of the AHP method is a set of actions that carefully delimit the scope of the problem environment. It is based on the well-defined mathematical structure of consistent matrices and the ability of their associated eigenvectors to generate true or approximate weights [9]. For a more objective determination of the criteria weights, the AHP method is performed in three stages:
• At the beginning of the process, the overall decision problem is decomposed into a hierarchic structure in which the top represents the overall objective and the lower levels represent criteria, sub-criteria and alternatives. The alternatives present a set of mutually exclusive options, the best of which has to be chosen.
• Second, comparative judgments are carried out so as to reduce the multi-decision alternatives problem. The first comparison is among criteria, in order to derive a final weight for each criterion. The second comparison is among alternatives according to one criterion, using the scale of Table 1 [5]. Example: let C1 be a criterion, and let A1, A2, A3 be the set of alternatives. If A1 is moderately more important than A2, the judgment is 3; if A1 is very strongly more important than A3, the judgment is 7. The comparison between an alternative and itself yields 1. With rows and columns ordered A1, A2, A3, this comparison procedure yields the following comparison matrix for criterion C1:

$$M_{C1} = \begin{pmatrix} 1 & 3 & 7 \\ 1/3 & 1 & 4 \\ 1/7 & 1/4 & 1 \end{pmatrix}$$
Table 1 The numerical assessments and their linguistic meanings

Numerical assessment   Linguistic meaning
1                      Equally important
3                      Moderately more important
5                      Strongly more important
7                      Very strongly more important
9                      Extremely more important
2, 4, 6, 8             Intermediate values of importance
• The comparison procedure is repeated at each level of the hierarchy, if such levels exist. After all matrices have been developed, the eigenvectors (relative weights used to compute the final weights) are computed, and the maximum eigenvalue, denoted λmax, is derived for each comparison matrix. λmax is used for calculating the consistency ratio (CR) [10] of the estimated vector, in order to validate whether the pairwise comparison matrix provides a consistent evaluation.
• The last step consists of comparing priorities and producing the final score of each alternative. The composite weights of all alternatives are calculated from the preferences derived from the matrices (formula 1). Let e_{ij} denote the judgment of alternative i according to criterion j, and W_j the weight of criterion j; the final score of a given alternative is calculated as

$$S_i = \sum_{j=1}^{N} W_j \times e_{ij} \qquad (1)$$

where N is the number of criteria. The goal is then given by the alternative with the best score.
5 Service Selection Using the AHP Method
5.1 Application Components
To demonstrate the feasibility of the approach based on the method sketched above, we describe in this section our case study: the discovery of the services of an m-learning (madar-learning) platform [6] by learners. The learning platform is deployed over four networks: LAN, WLAN, ad-hoc and Bluetooth. In each network there are service providers, which publish and subscribe their courses and the services accompanying the learner within the OSGi registry (OBR). The rules of our application are based on the three main items presented below.
5.1.1 The Network Infrastructure
The case study (m-learning) takes place in our engineering school, and learners can move and make use of services across four independent networks:
• LAN: the school backbone network, which is the most powerful and secure infrastructure.
• WLAN: several access points are spread throughout the school, allowing users to access the LAN and the Internet.
• Ad-hoc: point-to-point connections established between devices without using an infrastructure (e.g., a shared printer).
• Bluetooth: ad-hoc connectivity using the Bluetooth technology.
Obviously, these networks do not provide similar capabilities and their reliability is not the same. For this reason, we rank these networks with different values according to their performance; the values are recorded in the matrix (Table 4) of Section 6.
5.1.2 Services Involved
Within an m-learning platform, learners are surrounded by several general services (printing, scanning, administrative services, ...) as well as learning services (pedagogical services, translators, collaborative tasks, simulators, ...). Since learning components can be provided by any user device, we consider in this application four services (S1, S2, S3, S4) belonging respectively to the networks N1:LAN, N2:WLAN, N3:Ad-hoc and N4:Bluetooth. According to the service description presented in Figure 2, services are classified by their properties, compared with the user's properties. We consider only the network and the device capabilities as user properties. Within a heterogeneous environment such as our application case, users can make use of a large variety of devices. To manage the input data of the algorithm, we classify the devices into three classes:
• Class A: workstations and laptops;
• Class B: PDAs and tactile tablets;
• Class C: smartphones and mobile phones.
Class A is the most powerful one, and there are no constraints: all resources are supported by this class. However, other parameters can influence the best choice, depending also on the environment of the device and the localization of the resource. For this reason, we have defined a set of selection criteria supporting context-aware selection by weighting service properties according to each specific criterion.
5.1.3 Criterion Definition
Obviously, resources closer to the requester, i.e., localized in the same network as the user, are preferred. Furthermore, they must be well rendered and displayed on the device. In particular, thanks to content adaptation, the learning platform provides services in numerous variants for every class of client device. The selection process should choose the best service in terms of capabilities and performance. Moreover, light and static resources are suitable for devices of classes B and C, respectively.
Therefore, the display and hardware performance criteria should be considered the most important ones. Thus, we define four criteria, presented with their descriptions in Table 2.

Table 2 Criteria and their descriptions

Criterion            Description
Display capability   Checks whether the service can be displayed on the device screen, taking into account the screen size, for an optimal user presentation.
Device capability    Concerns the technical capabilities available on the device; checks whether the device is able to run a given service.
Network capability   Services available through a reliable, high-rate network are favored.
Proximity            Nearby services are preferred, for load balancing and because they require no routing mechanism and no network interoperability.
These rules provide the system with a basis for knowledge extraction and for filtering the services relevant to a specific user profile. This is done by weighting every criterion and then evaluating the set of services with respect to each criterion. To do so, we rank services based on pairwise comparison and then record their evaluations in the matrices presented below (Tables 3 and 4).
5.2 Service Ranking Method
As Figure 2 shows, the values drawn from the XML context information description are not only numerals, but also text and booleans. The middleware should rank services automatically; to this end, it compares the service properties with those of the user who requested a service. Invoking the correct evaluation function is decisive for the selection, but it depends on the data type of the criterion. We define four metrics, one for each criterion. When the value of the criterion can be expressed as a numeral, a numeric metric (formula 2) is computed with respect to the corresponding user value in order to evaluate the criterion. We use appropriate numbers to denote display capability, memory storage and processor capabilities. We have developed an algorithm which ranks services for the subsequent comparison with the AHP method. Service ranking is illustrated by formulas 2 and 3, where Di and Pi represent the evaluation of a given service according to display capabilities and device performance, respectively.
$$D_i = \frac{1}{\left|1 - \log\frac{d_i}{d_u}\right|} \qquad (2)$$

where d_u is the display capability of the user device, and

$$P_i = \frac{m_i \cdot v_i}{m_u \cdot v_u} \qquad (3)$$

where (d_i, v_i, m_i) and (d_u, v_u, m_u) are, respectively, the vectors of service requirements and of user device capabilities in terms of display, processing and memory. In this way, the algorithm classifies services using formulas that yield values for pairwise comparison, and then ranks the services by score. The next step is to map these values onto the scale of Table 1 in order to build the pairwise comparison matrices. Even if a criterion is of boolean type, it is possible to derive a metric by checking the similarity between the corresponding properties of the service and of the user. Thus, the exact match is used as the boolean metric (formula 4), which checks the proximity Pr_i. Moreover, the boolean type is used not only to capture 0 or 1 values, but also to match text values (formula 5) indicating the performance of the network in which the service is available.

$$Pr_i = \begin{cases} 1 & \text{when the user and the service are in the same network} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

For the capabilities of the network in which the service supplier is located, we define four lexical judgments which help us construct the comparison matrix. These judgments express the relative importance of the network types over which the m-learning platform is deployed:

$$N_i = \begin{cases} \text{LAN} & \text{very strongly important} \\ \text{WLAN} & \text{strongly more important} \\ \text{ad-hoc} & \text{moderately important} \\ \text{Bluetooth} & \text{normal} \end{cases} \qquad (5)$$

The main strength of these metrics is that the evaluation rules depend on the attribute type. As a result, one metric rule can be reused for different services. The next step of the scenario is to build the comparison matrices used by the AHP method.
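A sketch of how metrics (2)-(4) could be evaluated is given below. The base of the logarithm and the orientation of the d_i/d_u ratio are not fully legible in the source and are assumed here, and the numeric property values are hypothetical; the sketch only illustrates the shape of the evaluation functions.

```python
import math

def display_metric(d_service, d_user):
    """Formula (2): closeness of the service's display requirement to the device's display capability."""
    denom = abs(1.0 - math.log10(d_service / d_user))
    return 1.0 / denom if denom > 0 else float("inf")   # guard against division by zero

def performance_metric(m_service, v_service, m_user, v_user):
    """Formula (3): memory x processing required by the service relative to what the device offers."""
    return (m_service * v_service) / (m_user * v_user)

def proximity_metric(service_network, user_network):
    """Formula (4): exact-match boolean metric for proximity."""
    return 1 if service_network == user_network else 0

# Hypothetical property values drawn from an OBR-style description and a user profile.
print(display_metric(d_service=480, d_user=800))
print(performance_metric(m_service=64, v_service=200, m_user=256, v_user=600))
print(proximity_metric("WLAN", "WLAN"))
```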
6 Obtaining Results by the AHP Method
In this section we present an application of the AHP method based on the metrics defined previously. These metrics, computed by the algorithm using the formulas above, provide the scale values for the pairwise comparison matrices. As explained before, we define four criteria (Figure 3), and we consider four alternatives (S1, S2, S3 and S4) belonging respectively to the four networks (LAN, WLAN, ad-hoc and Bluetooth). We consider in this case that S2 resides in the same network as the user. The judgments for each criterion are computed by the algorithm. Before developing the within-alternatives matrices, we start with the elicitation and calculation of the between-criteria preferences. This appears at the highest level of the problem hierarchy and is intended to capture the relative importance, or preference, of each criterion among the decision criteria. As shown in Table 3, a 4-by-4 matrix is constructed for the between-criteria comparison; the same procedure is then applied to the decision alternatives at the lower level. We denote by NC and DC the network and device capabilities, respectively. The next step consists in constructing the matrices which reflect the comparison of the alternatives according to each single criterion. The algorithm is applied to the services in order to obtain the relative priority of each decision alternative with respect to every criterion. Once the preferences are elicited and the pairwise preference matrices are constructed, the right-eigenvector method is applied to each matrix to derive the relative priorities of the decision alternatives; a consistency ratio is also calculated for each matrix and reported in Table 4. In Table 4, service S1 is the most preferred with respect to the display, network and device capability criteria, whereas service S2 is the nearest to the user (same network). Applying formula (1) to all services yields the final classification of the services by their total scores (Figure 4).

Fig. 3 The problem hierarchy

Table 3 Pairwise comparison matrix for the four decision criteria

            Display   DC     NC     Proximity   Weight
Display     1         2      4      3           0.467
DC          1/2       1      3      2           0.277
NC          1/4       1/3    1      2           0.160
Proximity   1/3       1/2    1/2    1           0.095
Sum of weights: 1. The consistency ratio: CR = 0.01
Table 4 Individual pairwise comparison matrices for the four decision alternatives, compared in the context of a single decision criterion

Criterion: Display capabilities
Candidate   S1     S2     S3     S4    Weight
S1          1      4      2      2     0.447
S2          1/4    1      1/2    2     0.264
S3          1/2    2      1      2     0.159
S4          1/2    1/2    1/2    1     0.131
The consistency ratio: CR = 0.02

Criterion: Network capabilities
Candidate   S1     S2     S3     S4    Weight
S1          1      2      3      4     0.467
S2          1/2    1      2      3     0.277
S3          1/3    1/2    1      2     0.160
S4          1/4    1/3    1/2    1     0.095
The consistency ratio: CR = 0.01

Criterion: Device capability
Candidate   S1     S2     S3     S4    Weight
S1          1      1      2      3     0.354
S2          1      1      2      3     0.354
S3          1/2    1/2    1      1     0.161
S4          1/3    1/2    1/2    1     0.131
The consistency ratio: CR = 0.01

Criterion: Proximity
Candidate   S1     S2     S3     S4    Weight
S1          1      4      4      4     0.143
S2          1/4    1      1      1     0.571
S3          1/4    1      1      1     0.143
S4          1/4    1      1      1     0.143
The consistency ratio: CR = 0.00
Fig. 4 The overall goal by the AHP method
The output of the method corresponds to the goal expected from our experiment. The process was repeated more than 20 times for different situations; the results were always satisfactory and all judgments were reasonable. The strength of the AHP method lies in its ability to structure the criteria of complex problems hierarchically. Its main drawback lies in the high number of pairwise comparisons between (sub-)criteria [7]. We note that this drawback matters only when there are many hierarchical levels, in which case the pairwise comparisons may become difficult; in contrast, the method remains simple to use with at most four levels.
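For illustration, the aggregation of formula (1) can be reproduced directly from the published priority vectors; the numbers below are simply the weights reported in Tables 3 and 4, reused here as a worked example rather than a re-computation from raw data.

```python
# Aggregating the published priority vectors with formula (1).
criteria_weights = [0.467, 0.277, 0.160, 0.095]      # Display, DC, NC, Proximity

alternative_weights = [
    [0.447, 0.264, 0.159, 0.131],   # S1..S4 under Display capabilities
    [0.354, 0.354, 0.161, 0.131],   # S1..S4 under Device capability
    [0.467, 0.277, 0.160, 0.095],   # S1..S4 under Network capabilities
    [0.143, 0.571, 0.143, 0.143],   # S1..S4 under Proximity
]

scores = [
    sum(w_c * alternative_weights[j][i] for j, w_c in enumerate(criteria_weights))
    for i in range(4)
]
ranking = sorted(zip(["S1", "S2", "S3", "S4"], scores), key=lambda p: -p[1])
print(ranking)   # with these weights S1 ranks first; S2 benefits mainly from Proximity
```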
7 Conclusion and Future Work
Although the interoperability of service discovery protocols enhances the performance of pervasive environments, it is not sufficient to provide the user with facilities and convenience. The user, or the learner in our study, should be accompanied by the system, which should allow a simple and transparent use. For this purpose, we use the Analytic Hierarchy Process (AHP) method to rank services and then suggest the best one with respect to the user context. We have demonstrated the use of this technique in an experiment on m-learning platform services. Finally, we evaluate the overall goal, which is one service from the discovered list; this evaluation relies on pairwise comparison for selecting the most suitable service. Defining new criteria such as user
preferences, social-network-based profiles and collaborative trends seems beneficial for the learning activities. To do so, we should enrich the model with ontologies and with further information provided by other resources.
References
1. RFC-0112 Bundle Repository, by the OSGi Alliance and Richard S. Hall (2005)
2. Grover, P.S., Sharma, A., Kumar, R.: Estimation of quality for software components: an empirical approach. ACM SIGSOFT Software Engineering Notes 33(6) (2008)
3. Erdal, N., Gulcin, B., Orhan, F.: Selection of the strategic alliance partner in logistics value chain. International Journal of Production Economics 113, 148–158 (2008)
4. El Bouzekri, Y., Ajhoun, A.: An OSGi-based approach for context-aware discovery protocols interoperability. In: International Conference on Mobile Communication and Pervasive Computing, MCPC 2009, Leipzig, Germany; published in the CoSIWN International Journal (2009)
5. Buyukozkan, G., Isiklar, G.: Using a multi-criteria decision making approach to evaluate mobile phone alternatives. Computer Standards & Interfaces 29, 265–274 (2007)
6. Najima, D.: Towards an interoperable and extensible learning environment: madar-learning. In: International Conference on New Generation Networks and Services, NGNS 2009 (2009)
7. Aguilo, J., Duran, O.: Computer-aided machine-tool selection based on a fuzzy-AHP approach. Expert Systems with Applications 34(3), 1787–1794 (2007)
8. Canbolat, M.S., Cakir, O.: A web-based decision support system for multi-criteria inventory classification using fuzzy AHP methodology. Expert Systems with Applications 35(3) (October 2008)
9. Saaty, T.L.: Decision-making with the AHP: Why is the principal eigenvector necessary. European Journal of Operational Research 145(1), 85–91 (2003)
10. Saaty, T.L.: Fundamentals of Decision Making and Priority Theory, 2nd edn. RWS Publications, Pittsburgh (2000)
11. Vargas, L.G., Saaty, T.L.: Models, Methods, Concepts and Applications of the Analytic Hierarchy Process. Kluwer Academic Publishers, Dordrecht (2001)
12. Haleem, A., Kumar, S., Parashar, N.: Analytical hierarchy process applied to vendor selection problem: Small scale, medium scale and large scale industries. Business Intelligence Journal 2(2), 355–362 (2009)
13. Stummer, C., Neubauer, T.: Interactive selection of web services under multiple objectives. Information Technology and Management, Business and Economics (2009)
Building a Minimalistic Multimedia User Interface for Quadriplegic Patients Constantinos Patsakis and Nikolaos Alexandris
Abstract. In this work we explore the possibilities of building a multimedia user interface that facilitates the everyday needs of quadriplegic patients. The proposed interface is built using current technology, at the lowest possible cost and with off-the-shelf solutions.
1 Introduction
Current trends in software development create more intuitive user interfaces; users may control their devices mostly with their motion. We have, of course, passed the point of using only keyboards or mouse movement: more than movement, such as gestures, can now be recognized. At the same time, game consoles are pushing more intuitive interaction between users and consoles; classic examples are the Nintendo Wii and the PlayStation Eye. In the first case, using three-dimensional controllers, users may enter data through body movement. In the second case, users may enter data through body movement using the attached camera and special software that recognizes body movements. This new kind of human-computer interaction has led to a great increase in the revenues of these companies, as more and more users buy these game consoles to enjoy more active computer games. In the meantime, neuroscience has developed to a great extent; as a result, mind-scanning tools have become affordable, simplified and small in size.
Constantinos Patsakis, Department of Informatics, University of Piraeus, 80 str Karaoli & Dimitriou, 18534 Piraeus, Greece, e-mail:
[email protected]
Nikolaos Alexandris, Visiting Professor at the University of Canberra, Faculty of Information Sciences and Engineering; Professor at the Department of Informatics, University of Piraeus, e-mail:
[email protected]
Fig. 1 Games developed using EEG sensors. Mindflex (left) and Star Wars Force Trainer (right) are two such games, the first developed by Mattel and the second by Uncle Milton. In these games, the user concentrates in order to lift a ball with a fan whose speed is controlled by the strength of the brainwave signals detected by the EEG sensors.
Fig. 2 NeuroSky and Emotiv: two affordable EEG sensor solutions that offer APIs for developers.
In fact, several companies have taken advantage of this and have developed games using portable brain scanners; two well-known examples can be seen in Figure 1, while other sensors on the market (Figure 2) offer tools to developers. Of course, the measurements they are able to track are not very accurate, but used properly they may create a new, even more intuitive interaction between the user and the computer. Quadriplegic patients are a special case: due to traumas of the spinal cord or brain damage, their body movement is minimal. The effect of these traumas is that even if their brain sends signals to their arms or legs, they are unable to move
them. Moreover, in many cases their speaking abilities may have been damaged as well. This means that even though a computer could help them greatly in their everyday life, it is very difficult for them to interact with it, as they are in no position to handle most current input devices or to use speech recognition software. In this work, we present a minimalistic multimedia user interface we have developed, which can be used by quadriplegic patients, offering them considerable comfort and independence for several everyday tasks. The use of portable brain scanners may help them in their everyday life, given that, due to their condition, it is not possible for them to use many other means of interaction. The proposed user interface takes into consideration the patient's obvious condition, as well as the total cost of the application.
2 Current Tools
The wireless headsets mentioned above can read and interpret brain waves. More precisely, they are able to detect the level of concentration. This is done using the dry neural sensors on the headset, specifically by measuring the beta waves that are associated with concentration. The brain-scanning techniques available so far can be categorized into three groups: (a) non-invasive, (b) semi-invasive and (c) invasive, according to where the sensors are placed. In the first category, for example, we have Electroencephalography (EEG), which measures the electrical potentials produced by the brain, and Magnetoencephalography (MEG), which measures the magnetic fields produced by the brain. Both techniques can be applied using external sensors. In the case of Electrocorticography (ECoG), the electrodes have to be placed on the surface of the brain, making it a semi-invasive technique. EEG sensors can detect brain responses that are triggered by sensory stimuli, cognitive processes or external stimuli and that follow specific patterns. In order to separate the correct data from the noise, several techniques are used, such as Laplacian spatial filtering or Independent Component Analysis [10, 11].
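As a simple illustration of the spatial-filtering idea mentioned above, the sketch below applies a surface-Laplacian filter, subtracting from each channel the mean of its neighbours. The neighbour map and channel names are purely hypothetical; a real montage would derive neighbours from the actual electrode geometry.

```python
import numpy as np

# Hypothetical neighbour map for a small montage.
NEIGHBOURS = {"Fp1": ["Fp2", "F3"], "Fp2": ["Fp1", "F4"], "F3": ["Fp1"], "F4": ["Fp2"]}

def laplacian_filter(signals):
    """Surface-Laplacian spatial filtering: each channel minus the mean of its neighbours.
    `signals` maps channel name -> 1-D numpy array of samples."""
    filtered = {}
    for ch, x in signals.items():
        neigh = [signals[n] for n in NEIGHBOURS.get(ch, []) if n in signals]
        filtered[ch] = x - np.mean(neigh, axis=0) if neigh else x.copy()
    return filtered

sig = {ch: np.random.randn(1000) for ch in NEIGHBOURS}
print({ch: float(x.std()) for ch, x in laplacian_filter(sig).items()})
```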
3 The Proposed User Interface
The proposed interface takes advantage of current EEG sensor technology to control "circular menus", or more precisely tree menus, using the patient's brainwaves. Given that quadriplegic patients are in no position to move controllers and, in many cases, may have difficulties speaking, they may use their thoughts to control a user interface. Unfortunately, the affordable, small-scale mind scanners currently on the market can only interpret a small fraction of the brainwaves; however, as will become apparent, even this small fraction can be enough. Using off-the-shelf programmable brain scanners, we can detect the level of meditation and concentration of the user. These values provide the data needed by our interface and, as it turns out, they can easily be modified by the user's thoughts. The main idea is that the patient is resting in a room wearing a programmable EEG sensor. In the room we have installed a PC and a big screen opposite the patient. Attached to the PC's serial or parallel port is the nurse-calling button that every hospital provides at patients' beds, and the PC may have an Internet connection. Before using the system, there is a calibration and training phase. During this phase, the patient becomes acquainted with the user interface, while, depending on the strength of the patient's brainwaves, we set the thresholds that determine when a "click" should be registered. Thus, concentrating for a given interval becomes the analogue of clicking in a common graphical user interface. The user interface we created is a decision-tree-like menu: in each step, the user selects the type of action he wants, and the next step gives him more options. Because of the type of interaction, the number of options in each step is limited, in order to help the user and to avoid mistakes. The menu cycles from option to option over a given period of time, until the user triggers a click event with his thoughts. Using the brainwave sensors, the user only has to concentrate when the option he wants becomes highlighted in order to pick it from the menu. In the first step, the user is offered the most urgent and most common tasks as options. This means that the user can select anything from calling for an emergency to changing the TV channel and volume. The main menu can be seen in Figure 4. As the figure shows, the interface is minimal: the user sees nothing but the list of tasks that he can select in each step. The complete list of selectable tasks is shown in Figure 3. Of course, these tasks can easily be extended or even restricted. In every step, the tasks are displayed as big buttons and are highlighted with a vivid color, making it obvious to the user which task can be selected at any given moment.

Fig. 3 The menu options in each step.
Fig. 4 The main menu of the interface. In each step of the menu, the current option is highlighted in red, while the others are blue. If the user concentrates while on the highlighted option, the option is selected. Each option is highlighted for a certain period of time before the highlight moves to the next one.
This minimality was chosen because it does not confuse the user and his thoughts, thus minimizing the probability of wrong input. If the user does not select anything after three "rounds", the interface exits and continues with the previously selected task, e.g., playing the next music track. One of the features of this interface is the ability to enable communication between the patient and people who are not close to him. The patient has the option to communicate with other people using messages, which are written by the patient using the T9 system and an on-screen layout resembling a mobile phone keypad. There is also the option to browse the Internet using a list of favorite web sites, in order to keep up with current news. Obviously, while the patient is wearing the brain scanner, his levels of concentration and meditation will change continuously. In order to trigger the user interface, the user has to produce a certain "pattern of thoughts"; this means, for example, that he has to concentrate strongly for two seconds, then meditate for two seconds, and then concentrate again. The pattern and its duration can be decided during calibration. The idea is that the interface is not accessible all the time, but only when triggered, because otherwise the constant change of thoughts would not let the patient, for example, watch a movie, as he would soon trigger another event from the menu. To assist the user, the meditation and concentration values are always visible in two small progress bars on the lower part of the screen, so the user knows whether he has to concentrate or meditate more in order to trigger an event.
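A minimal sketch of such a trigger-pattern detector is given below, assuming attention and meditation values in a 0-100 range and example thresholds and phase durations that would in practice come from the calibration phase; the sensor API and the state machine are illustrative, not the actual implementation.

```python
import time

# Hypothetical calibration results and the "pattern of thoughts" described in the text:
# concentrate ~2 s, meditate ~2 s, concentrate again.
CONC_THRESHOLD = 70          # attention level (0-100) counted as "concentrating"
MED_THRESHOLD = 70           # meditation level (0-100) counted as "meditating"
PHASE_SECONDS = 2.0
PATTERN = ["concentrate", "meditate", "concentrate"]

def classify(attention, meditation):
    if attention >= CONC_THRESHOLD:
        return "concentrate"
    if meditation >= MED_THRESHOLD:
        return "meditate"
    return "rest"

def wait_for_trigger(read_values):
    """Blocks until the calibrated pattern is observed; read_values() yields (attention, meditation)."""
    phase, phase_start = 0, None
    for attention, meditation in read_values():
        state = classify(attention, meditation)
        now = time.monotonic()
        if state == PATTERN[phase]:
            if phase_start is None:
                phase_start = now
            if now - phase_start >= PHASE_SECONDS:
                phase, phase_start = phase + 1, None
                if phase == len(PATTERN):
                    return True                       # "click" / interface trigger detected
        elif phase > 0 and state == PATTERN[phase - 1]:
            pass                                      # still finishing the previous phase; tolerate it
        else:
            phase, phase_start = 0, None              # pattern broken, start over
    return False

def simulated_stream():
    # ~3 s concentration, ~3 s meditation, ~3 s concentration at 10 samples per second.
    for attention, meditation in [(90, 10)] * 30 + [(10, 90)] * 30 + [(90, 10)] * 30:
        time.sleep(0.1)
        yield attention, meditation

print(wait_for_trigger(simulated_stream))
```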
4 Possible Improvements
Possible improvements go hand in hand with progress in the field of mind-scanning techniques. Current studies show that it is possible to detect and interpret more data from a patient's brain [1, 3, 4, 6], or even to detect intentions [5]. If this can be done with cheaper and smaller equipment, it may be possible
58
C. Patsakis and N. Alexandris
to ease the life of quadriplegic patients, enabling them to complete everyday tasks, which anyone else takes for granted, on their own. Exoskeletons or even other implants might also be used. Our current research is attempting to decode more data with off-the-shelf equipment and to explore other possible extensions and applications of such devices.
5 Conclusions
The proposed user interface is of course minimal and might seem trivial, yet the support it offers to quadriplegic patients, mostly in psychological terms, is great. The aim of this interface is not to copy common complex interfaces, but to give patients at least some freedom: the freedom to choose on their own what they want to listen to or watch. The biggest freedom, however, is to enable them to communicate with people beyond their room whenever they choose to, and to have a life outside their rooms. These are things that most healthy people take for granted. It is certain that we will soon have portable scanners able to detect more than just concentration and meditation, which will enable more events than just "clicking". When this becomes possible, we may have two- or even three-dimensional data, enabling better and even more intuitive user interfaces, offering more comfort and ease to the patients.
References
1. Swisher, J.D., Gatenby, J.C., Gore, J.C., Wolfe, B.A., Moon, C.H., Kim, S.G., Tong, F.: Multiscale pattern analysis of orientation-selective activity in the primary visual cortex. Journal of Neuroscience 30(1), 325–330 (2010)
2. Harrison, S.A., Tong, F.: Decoding reveals the contents of visual working memory in early visual areas. Nature 458, 632–635 (2009)
3. Haynes, J.: Detecting deception from neuroimaging signals: a data-driven perspective. Trends Cogn. Sci. 12(4), 126–127 (2008)
4. Soon, C.S., Brass, M., Heinze, H.J., Haynes, J.D.: Unconscious determinants of free decisions in the human brain. Nat. Neurosci. (2008)
5. Haynes, J., Sakai, K., Rees, G., Gilbert, S., Frith, C., Passingham, R.E.: Reading hidden intentions in the human brain. Curr. Biol. 17(4), 323–328 (2007)
6. Haynes, J., Rees, G.: Decoding mental states from brain activity in humans. Nat. Rev. Neurosci. 7(7), 523–534 (2006)
7. Monti, M.M., Vanhaudenhuyse, A., Coleman, M.R., Boly, M., Pickard, J.D., Tshibanda, L., Owen, A.M., Laureys, S.: Willful modulation of brain activity in disorders of consciousness. N. Engl. J. Med. (February 3, 2010)
8. Sutton, S., et al.: Evoked-potential correlates of stimulus uncertainty. Science 150, 1187–1188 (1965)
9. Naatanen, R.: Processing negativity: an evoked-potential reflection of selective attention. Psychological Bulletin 92, 605–640 (1982)
10. McFarland, D.J., et al.: Spatial filter selection for EEG-based communication. Electroencephalography and Clinical Neurophysiology 103, 386–394 (1997)
11. Lee, T.-W., Girolami, M., Sejnowski, T.J., Hughes, H.: Independent Component Analysis using an extended Infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation 11, 417–441 (1999)
Biofeedback-Based Brain Hemispheric Synchronizing Employing Man-Machine Interface Kaszuba Katarzyna, Kopaczewski Krzysztof, Odya Piotr, and Kostek Bożena
Abstract. In this paper an approach to building a brain-computer-based hemispheric synchronization system is presented. The concept utilizes wireless EEG signal registration and acquisition as well as advanced pre-processing methods. The influence of various techniques for filtering EOG artifacts on brain state recognition is examined. The emphasis is put on brain state recognition using band-pass filtration for the separation of individual brain rhythms. In particular, the recognition of alpha and beta states is examined to assess whether synchronization occurred. Two independent methods of hemispheric synchronization analysis are given and discussed: the first consists in calculating statistical parameters for the entire registered signal, and the second in using wavelet-based feature statistics for different lengths of time windows. Perspectives for the system development are presented in the conclusions.
Keywords: hemisphere synchronization, EEG, EOG, wavelet transform, alpha waves, beta waves.
1 Introduction
Creating a brain-computer interface is currently a challenge for many researchers; however, most of them focus on using EEG (Electroencephalography) or MEG (Magnetoencephalography) signals to control a PC, while others concentrate on realizing biofeedback to help cure migraines or sleeping disorders. The approach presented here is dedicated to creating a system that supports concentration techniques by increasing hemispheric synchronization for effective learning. In this paper a synchronization state is mainly recognized by examining which of the brain rhythms dominates. A decrease in the beta band and an increase in the alpha band is the most desired state to achieve, as it constitutes evidence of hemisphere synchronization [1,4]. The remainder of the article is divided into six sections.
Kaszuba Katarzyna, Kopaczewski Krzysztof, Odya Piotr, and Kostek Bożena, Gdansk University of Technology, Multimedia Systems Department, Gdansk, Poland
In the first section, a wireless recording system is described. This is a unique approach to brain state examination, since only four electrodes located on the subject's forehead are used. The second section is dedicated to the signal recording methods employed in this study: the recording setup is described and the recording protocols are presented. The next part of the paper focuses on pre-processing techniques; an advanced method of filtering EOG artifacts using the wavelet transform is given, and band-pass filtration for separating the different brain rhythms is also presented. A brain source localization algorithm is applied to decide whether synchronization occurs. Two independent methods of brain state detection are presented: the first uses statistical parameters calculated for the entire recording, while the second is based on discrete wavelet features calculated for different lengths of time windows. All calculations are made on offline signals, since this approach guarantees better flexibility in data analysis. The results are then presented and discussed, and conclusions are provided.
2 Recording System
Taking into account the comfort of the future user together with simplicity of use, a wireless EEG recording system was chosen for the system developed at the Multimedia Systems Department of the Gdansk University of Technology (GUT). In this case, the wireless Enobio device is the basis of the system discussed [11]. The Enobio device can record four signals from electrodes located on the subject's forehead. The placement of the electrodes is shown in Figure 1.
Fig. 1 Localization of the electrodes of the Enobio device: frontal and overhead view [11].
Since such a localization of electrodes results in a large amount of EOG artifacts in the recorded signal [4,6], an attempt was made to re-configure the electrode placement. However, it turned out that the new configuration is not suitable for recording, since the Enobio electrodes are incapable of registering signals through the subject's hair. To deal with the EOG artifacts, it was therefore decided to use advanced signal filtration methods. The device provides a 250 Hz sampling frequency along with 0.589 µV resolution. Since the Enobio framework only allows data recording into a single file, an attempt to create independent software will be made in the future.
3 Data Collection
Two separate data collection protocols were applied, since two independent approaches to hemisphere synchronization were examined. It was crucial for the signal registration that the subject stays still and does not make unnecessary movements, as these might interfere with the useful information in the signal. Measurements were performed in a recording studio, since this guaranteed no additional disturbances. During the recording the lights stayed off [4,6]. In the first registration scenario five subjects (males, average age approx. 24) were examined. Each subject participated in two recording sessions. The signal registration was divided into three stages:
• 1 minute without stimulation,
• 10 minutes with stimulation,
• 1 minute after stimulation.
The test participant was always sitting in a comfortable chair with the eyes closed. Two different stimulus signals were chosen for this experiment. The first session was carried out using only binaural beat sounds; the sound frequency was adjusted during the stimulation to intensify the alpha band. The second strategy extended the stimulus set of the previous scenario with isochronic and pulse sounds, whose frequencies were dynamically changed according to the pattern presented in Figure 2. The alpha band was amplified and the remaining bands were attenuated in both experiments [7,10]. In the second registration protocol, signals were recorded for four individual subjects (3 males, 1 female, average age approx. 24). In this case the protocol was divided into the two following scenarios:
• 15 minutes of listening to classical music, eyes closed, sitting;
• solving two sets of mathematical calculations, eyes open, sitting.
For the first scenario, it was assumed that alpha waves should dominate and that hemisphere synchronization was achieved [1,2,4]. The second scenario, i.e. calculating mathematical expressions, was supposed to increase the left-hemisphere activity, resulting in an increase of the beta state. This hypothesis was then validated using source localization techniques, as explained later on. For the source localization, a head model was calculated using a sphere modeling method [2,4].
Fig. 2 Frequency adjustment pattern during stimulation with binaural beats with isochrones and pulse sounds
Twenty-one head points were used. These points correspond to the electrode locations of an equivalent standard EEG recording system. Based on the obtained field matrix, the sources were computed with the minimum-norm imaging method [4]. This experiment confirmed the hypothesis for most of the recorded signals. Recordings which did not fulfill the assumption were excluded from the next stage of the experiment, so as to ensure that no misleading information was given to the classifier used in the experiments.
4 Data Pre-processing
Since the electrode reconfiguration did not give the desired effect, advanced filtration methods had to be applied. In the first step, the signal band between the high-pass cut-off frequency fHP = 0.5 Hz and the low-pass cut-off frequency fLP = 50 Hz was separated using a set of two third-order filters. The DC offset was thereby eliminated and interference from the power supply was excluded from the useful band [4,6]. The main concern for the signal classification analysis was to eliminate EOG (electrooculogram) artifacts from the signal, since even a small amount of artifacts can significantly change the classification accuracy. In this experiment, two methods of reducing these interferences were examined: a simple signal subtraction and adaptive filtration in the time-frequency domain. In both cases the two outer channels were used as the EOG reference signal, while the inner channels were treated as those carrying the corrupted EEG signals. Such an approach results in a loss of amplitude in the filtered signal, but it ensures that no undesired EOG components remain in the signal. The signal subtraction was realized with the following formula:

$$EEG = EEG_{corrupted} - EOG_{reference} \qquad (1)$$

Fig. 3 Block diagram of the EOG signal removal from the registered EEG signal.
Fig. 4 Results of adaptive filtration and signal subtraction for a 6-second length signal.
Fig. 5 Separated delta, theta, alpha and beta bands for a signal without the ocular artifact filtration.
Before the adaptive filtration was applied, both the corrupted EEG and the reference EOG signals had been transformed into the time-frequency domain using the wavelet transform. The third-order symlet mother wavelet function was applied and the decomposition level was set to 8. Adaptive least-squares filters were tested: the recursive least squares (RLS) filter (with memory) and the normalized least mean squares (NLMS) filter (without memory). The adaptive filtration process is shown in Figure 3 [5,6]. For the RLS filtration a third-order filter was applied with the forgetting factor (λ, the exponential weighting factor) set to 0.4, while the NLMS filtration used a fourth-order filter with step size 0.4 and leakage factor 1 (no leakage). The results of the techniques employed are presented in Figure 4. Both variants of the adaptive filtration gave similar results, whereas the subtraction reduced the amplitude of the useful signal. In future development of the system, an extra EOG registration signal will be used to avoid the amplitude loss in the filtered signal.
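For illustration, a minimal time-domain NLMS canceller is sketched below; the paper applies the adaptive filtering to wavelet coefficients, but the update rule itself is the same. The filter order (4) and step size (0.4) follow the text, while the regularization constant and the synthetic test signals are assumptions made for the sketch.

```python
import numpy as np

def nlms_cancel(reference, corrupted, order=4, mu=0.4, eps=1e-8):
    """NLMS adaptive cancellation: estimates the EOG component present in `corrupted`
    from `reference` and returns the error signal, i.e. the cleaned EEG."""
    w = np.zeros(order)
    cleaned = np.zeros_like(corrupted, dtype=float)
    for n in range(len(corrupted)):
        # Most recent `order` reference samples, newest first (zero-padded at the start).
        x = reference[max(0, n - order + 1): n + 1][::-1]
        x = np.pad(x, (0, order - len(x)))
        y_hat = np.dot(w, x)                              # estimated EOG contribution
        e = corrupted[n] - y_hat                          # residual = EEG estimate
        w = w + (mu / (eps + np.dot(x, x))) * e * x       # normalized LMS update, no leakage
        cleaned[n] = e
    return cleaned

# Example with synthetic data: weak EEG-like noise plus a scaled copy of the EOG reference.
rng = np.random.default_rng(0)
eog = rng.standard_normal(2000)
eeg = 0.1 * rng.standard_normal(2000)
corrupted = eeg + 0.8 * eog
print(np.std(corrupted - eeg), np.std(nlms_cancel(eog, corrupted) - eeg))
```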
Fig. 6 Separated delta, theta, alpha and beta band for a signal with the ocular artifact filtration.
Then band-pass filtration was applied to extract the brain rhythms. The wave separation was performed in two cases: for the signal with and without EOG reduction. The results were then compared to see whether the filtration improved the visibility of the rhythms (Figures 5-6). It can be noticed that the signals without artifact elimination have higher amplitudes, but they contain noise and do not have the desired wave shape. A loss of amplitude can be observed for the filtered data; however, in this case the correct shape of the brain rhythms appears.
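The following sketch shows one way to perform the band-separation step with zero-phase Butterworth band-pass filters; the 250 Hz sampling rate and the 0.5-50 Hz useful band are taken from the text, whereas the individual rhythm boundaries (delta, theta, alpha, beta) are conventional values assumed here rather than quoted from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250                       # Enobio sampling rate given in the text
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def bandpass(x, low, high, fs=FS, order=3):
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def separate_rhythms(eeg):
    """Pre-filter to 0.5-50 Hz, then split into the conventional brain-rhythm bands."""
    eeg = bandpass(eeg, 0.5, 50)
    return {name: bandpass(eeg, lo, hi) for name, (lo, hi) in BANDS.items()}

rhythms = separate_rhythms(np.random.randn(5 * FS))   # 5 s of dummy data
print({k: float(np.std(v)) for k, v in rhythms.items()})
```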
5 Brain Synchronization Detection
Two sets of coefficients were calculated to detect hemisphere synchronization. The first approach was based on average amplitude features. The average amplitude value was defined with formula 2 [8, 9]:

$$mean_{band} = \sum_{i=x_s}^{x_e} |x_i| \qquad (2)$$
where x_i is a signal sample, x_s and x_e are the samples corresponding to the lower and upper frequency bounds of the band, and |.| denotes the absolute value. Then the following features were calculated:
• nearest-band ratio, e.g. mean_theta / mean_alpha;
• percentage content of each band in the entire signal;
• theta-band to beta-band ratio.
As an additional factor, a correlation coefficient between the hemispheres was calculated [8,9]. In the second approach the signal was windowed without overlapping. Four different time-window lengths were used: 5 s, 2 s, 1 s and 0.5 s. A stationary wavelet transform was applied using the third-order symlet mother wavelet function. The transform was calculated for the entire signal band and for four brain rhythm sub-bands: alpha, beta, low beta and high beta waves. This resulted in a set of 40 features for each channel. Four statistical values were extracted from the wavelet-based coefficients [5]: from each frame the maximum and minimum values were chosen and, in addition, the mean and the variance were calculated. These features were then examined to see how well they separate the domination of the alpha and beta states, by means of parameter visualization and cumulative histograms. An example of the parameter map for the 5 s time window is given in Figure 7.

Fig. 7 Distribution of feature values in the low beta band and for the entire band in the 5s time window.

Despite the fact that the distributions of all feature values lie in very close proximity to each other, it is still possible to isolate two separate sets for the domination of the alpha and beta states. The possibility of distinguishing between classes based on these data was tested employing the Logistic Model Tree (LMT). This classifier combines the advantages of linear logistic regression with tree induction and is used for supervised learning tasks. In this method a single model tree is produced by creating a linear regression model in each node of the tree. The tree is grown with the C4.5 splitting algorithm, while the model is pruned back with the CART algorithm [3].
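A possible implementation sketch of the window-based feature set is given below. It assumes conventional band edges for alpha, low beta, high beta and beta (the paper does not list numeric bounds), uses PyWavelets' stationary wavelet transform with the third-order symlet as in the text, and keeps only the four statistics per band, so the exact 40-feature composition is illustrative rather than a reproduction of the study's feature vector.

```python
import numpy as np
import pywt
from scipy.signal import butter, sosfiltfilt

FS = 250
# Conventional band edges assumed for illustration.
SUB_BANDS = {"alpha": (8, 13), "low_beta": (13, 20), "high_beta": (20, 30), "beta": (13, 30)}

def window_features(channel, win_seconds=1.0, swt_level=3):
    """Per-window statistics (min, max, mean, variance) of SWT detail coefficients,
    computed for the full band and for each sub-band."""
    win = int(win_seconds * FS)
    features = []
    for start in range(0, len(channel) - win + 1, win):          # no overlap
        frame = channel[start:start + win]
        row = []
        for band in [None] + list(SUB_BANDS.values()):
            x = frame
            if band is not None:
                sos = butter(3, band, btype="bandpass", fs=FS, output="sos")
                x = sosfiltfilt(sos, frame)
            n = (len(x) // 2**swt_level) * 2**swt_level          # SWT needs a multiple of 2^level
            coeffs = pywt.swt(x[:n], "sym3", level=swt_level)
            detail = coeffs[0][1]                                # detail coefficients at the deepest level
            row += [detail.min(), detail.max(), detail.mean(), detail.var()]
        features.append(row)
    return np.array(features)                                    # one feature vector per window

print(window_features(np.random.randn(10 * FS)).shape)
```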
6 Results
The results for the sound stimulation, divided into positive and negative ones, are given in Tables 1-2. Distinguishing hemisphere synchronization based on the statistical analysis gave inconclusive outcomes. However, it can be noticed that the number of satisfying results is slightly higher than the number of negative ones. This may indicate that, with a more complex classification method, better results might be achieved.
Table 1 Results of binaural beat sound stimulation

Indicator              Positives   Negatives   Total
comparison             55%         45%         40
percentage contents    93%         17%         44
low wave share         46%         54%         11
correlation            59%         41%         32
Table 2 Results of binaural beat, isochrone and pulse sound stimulations

Indicator              Positives   Negatives   Total
comparison             57%         43%         30
percentage contents    77%         23%         36
low waves share        44%         56%         9
correlation            64%         36%         28
Table 3 Confusion matrices (rows: actual class, columns: classified as alpha / beta) for the 80- and 16-feature sets with time windows of 5 s, 2 s, 1 s and 0.5 s

5 s time window
80-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
80-feature set, cross-validation:    alpha: 99.5% / 0.5%     beta: 7.7% / 92.3%
16-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
16-feature set, cross-validation:    alpha: 99.7% / 0.3%     beta: 3.8% / 96.2%

2 s time window
80-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
80-feature set, cross-validation:    alpha: 99.7% / 0.3%     beta: 3.8% / 96.2%
16-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
16-feature set, cross-validation:    alpha: 99.7% / 0.3%     beta: 4.5% / 95.5%

1 s time window
80-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
80-feature set, cross-validation:    alpha: 99.7% / 0.3%     beta: 1.9% / 98.1%
16-feature set, training set:        alpha: 99.7% / 0.3%     beta: 0% / 100.0%
16-feature set, cross-validation:    alpha: 99.5% / 0.5%     beta: 3.4% / 96.6%

0.5 s time window
80-feature set, training set:        alpha: 100.0% / 0%      beta: 0% / 100.0%
80-feature set, cross-validation:    alpha: 99.5% / 0.5%     beta: 2.8% / 97.2%
16-feature set, training set:        alpha: 99.7% / 0.3%     beta: 1.3% / 98.7%
16-feature set, cross-validation:    alpha: 99.3% / 0.7%     beta: 7.3% / 92.7%
The experiment also confirmed that an increase in the alpha band amplitude causes a corresponding increase in the correlation coefficient. Since the wavelet transform coefficients gave a satisfactory separation of the defined classes, the attribute sets were tested using the LMT classifier with the cross-validation technique [3]; the efficiency on the training set was also examined. Two classification scenarios were used: the first one uses the set of 80 wavelet features and the second one employs only the 16 features extracted from the entire band of the signal. The results of the classifications are presented in Table 3. It is visible that the highest recognition was achieved for the frame length of 1 s and the set of 80 features. A slight decrease in classifier efficiency can be observed when a shorter time window is used. This could be a problem for a future brain-interface application, as it would reduce the time resolution of the designed system. Still, the high classification accuracy ensures good perspectives for the system development. Using the reduced set of features, a slight decrease in classifier efficiency can be observed for the time windows of 0.5 s, 1 s and 2 s; however, for the 5 s time window the accuracy improved. This may imply that in some cases employing too many parameters can lead to misclassification. A limited parameter set cannot fit the training data perfectly, but it may provide the classifier with a greater ability to generalize. Since using a smaller number of features results in a simpler classification model and faster data processing, the reduced set will be used in future research.
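As an illustration of the evaluation protocol only, the sketch below runs a k-fold cross-validation of a window-wise alpha/beta classifier and prints a row-normalized confusion matrix comparable in form to Table 3. LMT is a Weka algorithm and is not available in scikit-learn, so a plain logistic regression model stands in for it here; the synthetic data and all parameters are placeholders, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# X: one row of wavelet features per window; y: 0 for alpha-dominant, 1 for beta-dominant.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 16))
y = (X[:, 0] + 0.5 * rng.standard_normal(400) > 0).astype(int)

pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=10)
cm = confusion_matrix(y, pred, normalize="true") * 100   # row-normalized, in percent
print(np.round(cm, 1))
```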
7 Conclusion
The gathered knowledge implies that it is possible to build a fully functional system that uses hemispheric synchronization in brain-computer interfaces. Since different brain states can be distinguished, the designed system may guarantee effective sound or visual feedback to increase concentration and the effectiveness of the learning process. To achieve these goals, the application developed will use a set of mnemotechnic exercises combined with an optimally selected feedback signal. Optimal "usage of the brain" may be guaranteed by dividing the learning process into active phases (e.g. exercises) and break phases, so as to recover the full functionality of one's mind. After each session, statistics of the hemisphere activities should be provided, to better judge the correlation between the achieved state and the task performance.
Acknowledgments
Research funded within the project No. POIG.01.03.01-22-017/08, entitled "Elaboration of a series of multimodal interfaces and their implementation to educational, medical, security and industrial applications". The project is subsidized by the European Regional Development Fund and by the Polish State budget.
References
1. Kemp, B., Varri, A., da Rosa, A., Nielsen, K.D., Gade, J., Penzel, T.: Analysis of brain synchronization based on noise-driven feedback models. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society (1991)
2. Khemakhem, R., Zouch, W., Hamida, B.A., Taleb-Ahmed, A., Feki, I.: Source localization using the inverse problem methods. IJCSNS International Journal of Computer Science and Network Security 0(4) (2009)
3. Landwehr, N., Hall, M., Frank, E.: Logistic Model Trees. Department of Computer Science, University of Freiburg, Germany; Department of Computer Science, University of Waikato, Hamilton, New Zealand (2004)
4. Sanei, S., Chambers, J.A.: EEG Signal Processing. Centre of Digital Signal Processing, Cardiff University, UK (2008)
5. Samar, V.J., Bopardikar, A., Rao, R., Schwarz, K.: Wavelet analysis of neuroelectric waveforms: a conceptual tutorial. Brain and Language 66, 7–60 (1999)
6. Senthil, K.P., Arumuganathan, R., Sivakumar, K., Vimal, C.: An adaptive method to remove ocular artifacts from EEG signals using wavelet transform. Journal of Applied Sciences Research, 741–745 (2009)
7. Settapat, S., Ohkura, M.: An alpha-wave-based binaural beat sound control system using fuzzy logic and autoregressive forecasting model. In: SICE Annual Conference (2008)
8. Zunairah, H., Murat, N., Mohd, T., Hanafiah, Z.M., Lias, S., Shilawani, R.S., Kadir, A., Rahman, H.A.: Initial investigation of brainwave synchronization after five sessions of horizontal rotation intervention using EEG. In: 5th International Colloquium on Signal Processing & Its Applications, CSPA (2009)
9. http://www.eeg-biofeedback.com.pl/index.php?mod=vademecum
10. http://www.hemi-sync.pl (in Polish)
11. http://starlab.es/products/enobio
Performance of Watermarking-Based DTD Algorithm under Time-Varying Echo Path Conditions Andrzej Ciarkowski and Andrzej Czyżewski
Abstract. A novel double-talk detection (DTD) algorithm based on techniques similar to those used for audio signal watermarking was introduced by the authors. The application of the described DTD algorithm within an acoustic echo cancellation system is presented. The problem of DTD robustness to time-varying conditions of the acoustic echo path is discussed, and an explanation as to why such conditions occur in practical situations is provided. The environment and the procedure used for simulating the test conditions and evaluating the DTD algorithms are presented. Results of comparing the performance of the introduced watermarking DTD with the well-established Geigel DTD algorithm are presented.
1 Introduction
Acoustic echo is one of the most important factors affecting the quality and comprehensibility of speech in communications systems. This statement is especially true for rapidly developing systems based on the Internet protocol, commonly known as VoIP (Voice over Internet Protocol) applications, which tend to introduce higher delays and consequently more noticeable echo compared to traditional telephone systems (POTS). Another reason for the increased interest in acoustic echo elimination systems is the changing behavioral patterns of telephony users: traditional phone handsets are frequently being replaced by laptop computers with a built-in loudspeaker and microphone acting as a speakerphone terminal. Such a configuration, which is inherently echo-prone due to the high coupling between the sound source and the receiver, is also common in teleconferencing applications and car hands-free adapters, making the telecommunications acoustic echo problem even more tangible.
Andrzej Ciarkowski and Andrzej Czyżewski, Gdansk University of Technology, Multimedia Systems Department, 80-233 Gdansk, Poland, Narutowicza 11/12, e-mail:
(microphone) and returns to the original speaker. A high amount of echo mixed with the signal from the local (near-end) speaker distorts the communication, making his speech unintelligible, and forces the far-end speaker to concentrate harder on understanding the message, which is not only stressful but can even lead to dangerous accidents in the case of car hands-free conversation. To counteract this problem in full-duplex communications setups, acoustic echo cancellation (AEC) algorithms are used. Such algorithms typically process the incoming microphone signal in order to remove from it an estimate of the echo signal, obtained through the transformation of the recently reproduced far-end speaker signal. Most of the AEC algorithms proposed in the literature use adaptive filtering in order to estimate the echo path response. This allows obtaining an accurate estimate of the echo signal through filter adaptation and effectively eliminating the echo from the microphone input by simple signal subtraction, provided that its content is the sole echo signal [1]. Such an assumption, however, is hardly realistic, as besides the echo signal the microphone signal will typically contain some amount of noise and, most importantly, the near-end speech from the local speaker. The latter case, which is called double-talk [2], has to be detected so that the process of filter adaptation can be stopped. This prevents the adaptive filter from “detuning” from the echo path response, which would lead to substantial distortion of the microphone signal. The detection of such a condition is the task of the double-talk detector (DTD) algorithm, which is considered the most significant and troublesome element of an AEC system. The literature lists numerous approaches to the subject of double-talk detection [3], which vastly differ in accuracy and complexity. The simplest algorithms are based on a comparison of the energy of the incoming microphone signal with the reproduced far-end signal. They usually perform poorly when applied to acoustic echo; however, their computational complexity is very low, which allows for cheap implementation. The most notable representative of this group is the Geigel algorithm [4], commonly used for hybrid echo cancellation in POTS systems. More accurate DTD solutions are typically based on the observation that the echo signal is highly correlated with the far-end speech signal, while the correlation between double-talk and the far-end speech signal remains low. The detection statistic for such algorithms is based on the calculation of correlation coefficients [5]. These methods tend to have a high computational cost, which in some cases is unacceptable; however, their accuracy is usually better than in the case of energy-based methods. The subsequent section of this paper introduces a DTD algorithm developed at the Multimedia Systems Department, based on a novel approach utilizing audio signal watermarking techniques. Typically, DTD algorithms rely on a comparison of the far-end and microphone signals. Although the principle of operation of the presented algorithm is quite different, it allows obtaining comparable, and in some cases even better, results than traditional solutions. The detailed foundation of the proposed solution is presented in an earlier paper and a related patent application [6, 7]; however, the general principles of operation are explained in order to emphasize its ability to remain robust under time-varying echo path conditions, which more closely resemble the circumstances of real-life hands-free operation.
In such a case the response of the audio path is affected by movements of the conversation participants and other persons in the room, the opening of doors and, what is characteristic for audio-and-video conferencing using a laptop computer with a built-in
microphone and camera, adjustments of the camera position directly translating into movements of the microphone. The next section is devoted to the description of the evaluation procedure applied to the DTD algorithms in order to obtain test results. The procedure includes applying a room impulse response that changes in time and introducing a variable amount of delay to the far-end speaker signal. Finally, test results comparing the robustness of the proposed algorithm against the Geigel DTD are presented and conclusions are drawn.
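Since the Geigel detector serves as the comparison baseline throughout Section 4, its decision rule is worth recalling. The sketch below is a generic textbook formulation; the buffer length and threshold values are illustrative assumptions, not the settings used in the reported experiments.

```python
import numpy as np

def geigel_dtd(x, u, window=128, threshold=0.5):
    """Generic Geigel double-talk detector.

    x: far-end signal, u: microphone signal (same length).
    Double-talk is declared at sample n when |u(n)| exceeds `threshold`
    times the largest magnitude of the last `window` far-end samples.
    Returns a boolean array (True = double-talk declared).
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    decisions = np.zeros(len(u), dtype=bool)
    for n in range(len(u)):
        recent = np.abs(x[max(0, n - window + 1):n + 1])
        decisions[n] = np.abs(u[n]) > threshold * recent.max()
    return decisions
```

Because this rule compares instantaneous signal magnitudes only, it is cheap but sensitive to the attenuation and delay of the acoustic echo path, which is precisely the weakness examined in the experiments below.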
2 Watermarking-Based DTD Overview
While most DTD algorithms rely on a comparison of the far-end and microphone signals, the proposed algorithm utilizes a different approach, related to the so-called “fragile” watermarking techniques typically used for protection of multimedia contents against tampering [8]. Fragile watermarking has the property that the signature embedded into the protected signal is destroyed and becomes unreadable when the signal is modified. In the case of the double-talk detector algorithm, the signal protected from “tampering” is the far-end speaker signal, and the tampering is the addition of the near-end signal to it. At the same time, any linear modifications of the signal resulting from the convolution with the impulse response of the audio path should not be considered tampering, so that the embedded signature is detectable in the “sole” echo signal arriving at the microphone and suppressed in the combined echo-and-near-end signal. The information content of the signature is not important in this application, as only the binary decision whether the signature is present or not is required. The applied signature embedding and detection scheme should also be robust against A/D and D/A conversions, which are inevitable in a telephony application, while being transparent (i.e. imperceptible) to the listener and not affecting the intelligibility of the speech or the perceived quality of the signal. Finally, a minor addition of noise and non-linear distortions resulting from imperfections of the analogue elements of the audio path should not impair the ability of the algorithm to detect the presence of the signature in the echo signal. The binary decision coming from the signature detection block of the above-described arrangement is the inverse of the expected output of a DTD algorithm. The correct detection of the signature in the microphone signal indicates that near-end speech is not present, making it possible to control the adaptation process of the adaptive filter. The described concept is presented in Fig. 1. An adaptive filter is used to obtain an estimate of the audio path impulse response h_a(n) based on the original far-end speaker signal x(n) and the microphone signal u(n). The far-end speaker signal is then filtered with the estimated impulse response, yielding the echo estimate h_f(n), which is subtracted from the microphone signal u(n), yielding in turn the signal e(n) with cancelled echo. In order to allow DTD operation, the far-end speaker signal x(n) passes through the signature embedding block prior to reproduction in the loudspeaker, producing the signal x_w(n) with the embedded signature. This signature is detected in the signature detection block, yielding the detection statistic f_d(n), which is compared with the detection threshold T_d, giving as a result the binary decision y(n) used to control the adaptation process.
Fig. 1 General concept of AEC algorithm with DTD based on audio signal watermarking
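The control structure of Fig. 1 can be illustrated with a short NLMS sketch in which the DTD decision freezes adaptation; the step size, regularization constant and filter length below are illustrative assumptions, and the hold period equal to the filter length follows the description given later in this section.

```python
import numpy as np

def nlms_aec(x_w, u, filt_len=256, mu=0.5, eps=1e-6, dtd=None):
    """NLMS echo canceller whose adaptation is gated by a DTD decision.

    x_w : far-end signal (with embedded signature) fed to the loudspeaker.
    u   : microphone signal.
    dtd : optional boolean array; True at sample n means double-talk was
          detected, which freezes adaptation for `filt_len` samples.
    Returns the echo-cancelled signal e(n).
    """
    h = np.zeros(filt_len)              # estimate of the echo path response
    e = np.zeros(len(u))
    hold = 0                            # remaining samples of adaptation hold
    for n in range(len(u)):
        x_vec = x_w[max(0, n - filt_len + 1):n + 1][::-1]
        x_vec = np.pad(x_vec, (0, filt_len - len(x_vec)))
        e[n] = u[n] - np.dot(h, x_vec)  # subtract the echo estimate
        if dtd is not None and dtd[n]:
            hold = filt_len             # freeze adaptation on double-talk
        if hold > 0:
            hold -= 1
            continue
        h += (mu / (np.dot(x_vec, x_vec) + eps)) * e[n] * x_vec
    return e
```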
The above-listed requirements regarding the signature embedding and detection process make the choice of a suitable watermarking algorithm problematic. The most commonly used audio watermarking methods are either limited to the digital domain only or are too susceptible to the noise and reverberation added in the acoustic path. Research on this subject led to the choice of the echo hiding method [9, 10, 11, 12], which adds to the signal single or multiple echoes with a short delay (below 30 ms), so the effect perceived by the listener is only a slight “coloring” of the sound timbre. In watermarking systems the information content of the signature is carried by modulations of the embedded echo delay; since no information needs to be transmitted in this case, a constant, predefined echo delay is used during signature embedding, which eases the detection process. It was determined that the use of multiple echoes makes the signature detection more accurate. A detailed description of the design of the signature embedding and detection procedure is contained in the literature [6]. On the foundation of the described DTD algorithm, an acoustic echo cancellation system was created in the form presented in Fig. 1. The system includes an adaptive NLMS filter [13], whose length (filter order) is determined by the expected echo delay (the length of the echo path impulse response h_a(n)). Each single detection of the double-talk condition by the DTD block causes adaptation of the filter to be held for a time period corresponding to the filter length, in order to prevent the filter from processing the near-end speaker talkspurt “tail” stored within its buffer. The AEC block was also augmented with a dynamics expander block which suppresses the residual echo signal present in e(n) after the filtering.
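A minimal illustration of the echo-hiding idea is sketched below: the signature is embedded by adding attenuated copies of the signal at fixed short delays, and a simple detection statistic is the cepstral energy at those delays. The delay values, the gain and the use of the real cepstrum are illustrative assumptions; the actual embedding and detection design used by the authors is described in [6].

```python
import numpy as np

def embed_echo_signature(x, delays=(80, 120), gain=0.2):
    """Echo hiding: add attenuated echoes at fixed short delays (in samples).
    At 8000 Sa/s, 80 and 120 samples correspond to 10 ms and 15 ms."""
    x = np.asarray(x, dtype=float)
    x_w = x.copy()
    for d in delays:
        x_w[d:] += gain * x[:len(x) - d]
    return x_w

def signature_statistic(frame, delays=(80, 120)):
    """Detection statistic f_d: cepstral energy at the embedded delays."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    return sum(abs(cepstrum[d]) for d in delays)

# The DTD decision compares f_d against the threshold T_d: a statistic below
# T_d suggests that near-end speech has corrupted the signature (double-talk).
```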
3 Evaluation Procedure
The applied evaluation procedure was based on the methodology for objective evaluation of double-talk detectors described in [14]; however, it was modified in order to
include the simulation of a time-variable echo path response. The test input signals for the evaluation are the far-end speech x(n) and the near-end speech v(n). The microphone signal u(n) is synthesized as part of the evaluation process from the DTD output signal x_w(n), which is subject to attenuation, delaying, filtering with a room impulse response, noise addition, and mixing with the near-end speech v(n). The actual difference between the procedure described in the literature [14] and the presented approach lies in the fact that the actual delay and the room impulse response applied to the signal change with each processing iteration. The delay introduced during the evaluation procedure directly reflects the local-loop echo path delay and consists of two components. The first component is constant, with a predefined value related to the buffering occurring in the sound interfaces of the terminal. The value of this kind of delay depends on the terminal software and hardware and typically ranges from 10 to 100 ms. The second component is variable and reflects the additional delay introduced in the acoustic path. Its non-constant characteristic relates to the changing arrangement of objects in the acoustic field, with the conversation participant being the most important one. The simulation of the time-variable delay component is based on an input delay characteristic d(n), which describes the amount of delay in samples for each far-end speech signal sample x(n). The characteristic d(n) is generated through linear interpolation of cue points given in the form of a vector of pairs {N; d(N)}. The actual delay is applied to the x(n) signal by interpolation with a windowed-sinc kernel. During the experiments a Hamming window of length 21 samples was used. It is worth emphasizing that the echo delay typically perceived during phone calls (especially when VoIP technology is used) is higher than the local-loop delay described in the above paragraph. This is caused by the fact that the echo perceived by the far-end speaker also includes a significant network delay component which adds up to the described delays, but is not present in the AEC system running in the near-end speaker’s local loop. The variable room impulse response h(n) is simulated through the weighted addition of two separate impulse responses h1(n) and h2(n), with the weights updated with each processing iteration according to Eq. (1):

h(n) = h1(n) · r(n) + h2(n) · [1 − r(n)]    (1)
The impulse response weights vector r(n) used during the presented experiments was a linear characteristic starting at 1 and shifting towards 0. The DTD accuracy is expressed as the probability of miss (P_m), which describes the risk of not detecting the double-talk, and the probability of false alarm (P_f), which describes the risk of declaring a double-talk that is not present in the signal. The proper measurement of these values requires designating a pattern signal u_pat(n) acting as a reference for comparison with the obtained DTD results. This signal, and the resulting values of P_m and P_f, are calculated via the procedure described in the literature [14].
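A sketch of the echo-path simulation described above could look as follows; it reuses Eq. (1) for the time-varying response and builds d(n) by linear interpolation of the cue points. The per-sample integer rounding of the delay is a simplification (the authors apply the delay with a 21-tap Hamming-windowed sinc kernel), and the parameter names are ours.

```python
import numpy as np

def delay_characteristic(cue_points, length):
    """Linearly interpolate {N; d(N)} cue points into a per-sample delay d(n)."""
    idx, val = zip(*cue_points)
    return np.interp(np.arange(length), idx, val)

def simulate_mic_signal(x_w, v, h1, h2, cue_points, nfr_gain=1.0):
    """Synthesize u(n): time-varying response per Eq. (1), a variable delay,
    and added near-end speech v(n)."""
    n_total = len(x_w)
    d = delay_characteristic(cue_points, n_total)
    r = np.linspace(1.0, 0.0, n_total)            # weight vector r(n)
    L = len(h1)
    u = np.zeros(n_total)
    for n in range(n_total):
        h = r[n] * h1 + (1.0 - r[n]) * h2          # Eq. (1)
        m = n - int(round(d[n]))                   # integer delay (simplification)
        past = x_w[max(0, m - L + 1):m + 1][::-1] if m >= 0 else np.zeros(0)
        past = np.pad(past, (0, L - len(past)))
        u[n] = np.dot(h, past) + nfr_gain * v[n]   # echo plus near-end speech
    return u
```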
4 Test Results
The test set included 5 recordings of speech in the Polish language. The recordings were sampled at a rate of 8000 Sa/s, consistent with narrow-band telephony
Fig. 2 Variable delay component characteristic used during experiments
Fig. 3 Room impulse responses h1 (n) and h2 (n) used in experiments
applications. One of the excerpts, designated as the far-end speech signal, had a length of 5 seconds; the remaining ones had a length of 2 seconds and were used as near-end speech fragments. Due to the difference in the lengths of the far- and near-end speech excerpts, for each near-end speech signal a series of tests was performed during which the near-end signal was introduced at different times t_n into the far-end signal. The results presented in this paper were obtained for t_n ∈ {0; 1 s; 2 s; 3 s}, so that the whole time span of the far-end signal could be covered. Therefore, the results of the P_m measurement are averaged over the 4 results obtained in a series. During the tests a constant delay component of 40 ms was used, while the variable delay ranged from 0 to 300 samples (0 to 37.5 ms), as presented in Fig. 2. The impulse responses h1(n) and h2(n) used during the experiments refer to the same room, but were recorded with two microphones located in different places. They are presented in Fig. 3. For consistency with the results presented in the literature [6], the tests included measuring P_m at a fixed level of P_f ∈ {0.1; 0.3} while sweeping through the near-to-far end ratio (NFR) in the range of -20 to 10 dB [14]. In order to evaluate the robustness of the DTDs under time-varying echo path conditions, the same measurements were performed with both the time-variable delay and impulse response simulations turned off. The
Fig. 4 Probability of miss (Pm ) of Watermarking-based and Geigel DTDs at probability of false alarm Pf fixed at 0.1 while operating under time-variable and non-variable conditions
Fig. 5 Probability of miss (Pm ) of Watermarking-based and Geigel DTDs at the probability of false alarm Pf fixed at 0.3 while operating under time-variable and non-variable conditions.
results of the comparison between the behavior of the proposed watermarking-based DTD and the Geigel DTD when performing under both time-variable and “stable” conditions are presented in Fig. 4 for the probability of false alarm P_f = 0.1 and in Fig. 5 for P_f = 0.3, respectively. Both tested DTDs show some degradation of detection quality when exposed to time-varying echo path conditions; however, the scale of the degradation differs.
Fig. 6 Relative increase in probability of miss P_m for both DTDs while working under time-varying conditions.
To analyze the phenomenon better, a relative degradation measure has been introduced, which reflects how a given algorithm performs at a specific NFR level in time-varying conditions compared to “stable” conditions. This relation is presented in Fig. 6. The averaged measured relative increase in P_m for the watermarking-based DTD amounts to 1.046 for P_f = 0.3 and 1.054 for P_f = 0.1, which is significantly lower than in the case of the Geigel DTD, where the respective values were 1.118 and 1.134.
5 Conclusions
The presented DTD algorithm based on audio watermarking techniques has been tested to verify its robustness against the harsh operating conditions of the most demanding environments, such as busy teleconferencing rooms or car hands-free installations. These environments share a common property, dissimilar to typical handset-operated telephone calls, in that the response of the echo path is subject to changes which may occur during the call. Such conditions pose an additional difficulty for acoustic echo cancellation systems due to the time-dependency of the introduced delay and reverberation characteristic. The presented experiments prove that the proposed DTD algorithm is suitable for use in such environments, as the performance degradation resulting from the aforementioned factors is low. The robustness of the discussed algorithm is higher than that of the Geigel DTD. That is because the detection procedure does not rely on direct signal comparison, but is based on the detection of an embedded signature, which is only slightly affected by the adverse conditions if a proper watermarking method is chosen. The Geigel DTD algorithm used for the comparison
is a very simple one, so it does not reflect the current state of the art; however, it is still in common use due to its low computational cost. Nevertheless, testing the proposed AEC system against modern algorithms based on cross-correlation may provide a subject for further research. As “fast” cross-correlation optimization methods use the estimate of the echo path obtained from the adaptive filter in consecutive analysis steps, a changing impulse response may adversely affect their performance. Therefore, it would be interesting to organize an experiment to verify this assumption in practice. Consequently, one of the directions of future research the authors are planning to undertake is to extend the range of evaluated DTD algorithms by including more up-to-date solutions. Furthermore, more complex scenarios will be prepared in order to evaluate the influence of the variable delay and the changing room response independently. Another planned extension of the evaluation procedure is increasing the number of employed speech excerpts so that the test set is more complete.
Acknowledgments Research funded by the Polish Ministry of Education and Science within Grant No. PBZ-MNiSW-02/II/2007
References 1. ITU-T G.167, Acoustic Echo Controllers, International Telecommunication Union, Geneva, Switzerland (1993) 2. Kuo, S.M., Lee, B.H., Tian, W.: Adaptive Echo Cancellation. In: Real-Time Digital Signal Processing: Implementations and Applications. Wiley, NewYork (2006) 3. Vu, T., Ding, H., Bouchard, M.: A Survey of Double-Talk Detection Schemes for Echo Cancellation Applications. Can. Acoust. 32, 144–145 (2004) 4. Duttweiler, D.L.: A Twelve-Channel Digital Echo Canceller. IEEE Trans. Commun. 26, 647–653 (1978) 5. Benesty, J., Morgan, D.R., Cho, J.H.: A New Class of Doubletalk Detectors Based on Cross-Correlation. IEEE Trans. Speech Audio Process. 8, 168–172 (2000) 6. Szwoch, G., Czyzewski, A., Ciarkowski, A.: A Double-Talk Detector Using Audio Watermarking. J. Audio Eng. Soc. 57, 916–926 (2009) 7. Czyzewski, A., Szwoch, G.: Method and Apparatus for Acoustic Echo Cancellation in VoIP Terminal. International Patent Application No. PCT/PL2008/000048 (2008) 8. Dittmann, J., Mukherjee, A., Steinebach, M.: Media-Independent Watermarking Classification and the Need for Combining Digital Video and Audio Watermarking for Media Authentication. In: Proc. International Conference on Information Technology: Coding and Computing (ITCC 2000), March 27-29, p. 62 (2000) 9. Gruhl, D., Lu, A., Bender, W.: Echo Hiding. In: Proc. Information Hiding Workshop, Cambridge, UK, pp. 295–315 (1996) 10. Oh, H.O., Kim, H.W., et al.: Transparent and Robust Audio Watermarking with a New Echo Embedding Technique. In: Proc. IEEE Int. Conf. on Multimedia and Expo, Tokyo, Japan, pp. 433–436 (2001)
11. Kim, H.J., Choi, Y.H.: A Novel Echo-Hiding Scheme with Backward and Forward Kernels. IEEE Trans. Circuits Sys. for Video Technol. 13, 885–889 (2003) 12. Yan, B., Sun, S.H., Lu, Z.M.: Improved Echo Hiding Using Power Cepstrum and Simulated Annealing Based Synchronization Technique. In: Proc. 2nd Int. Conf. on Machine Learning and Cybernetics, Xi’an, China, pp. 2142–2147 (2003) 13. Haykin, S.: Adaptive Filter Theory, 4th edn. Prentice Hall, New Jersey (2002) 14. Cho, J., Morgan, D.: An Objective Technique for Evaluating Doubletalk Detectors in Acoustic Echo Cancellers. IEEE Trans. Speech Audio Process. 7, 718–724 (1999)
Applying HTM-Based System to Recognize Object in Visual Attention
Hoai-Bac Le, Anh-Phuong Pham, and Thanh-Thang Tran
Hoai-Bac Le and Anh-Phuong Pham: Faculty of Information Technology, University of Science, HCM City, Vietnam
Thanh-Thang Tran: Vocation Department, Ton Duc Thang University, HCM City, Vietnam
Abstract. In our previous work [2], we presented a model of visual attention in which space-based attention happens prior to object-based attention, using a Hierarchical Temporal Memory (HTM) system. In this paper, we propose a novel model applying an alternative flow of visual attention in which the object-based attention happens earlier than the space-based attention. The new approach is able to recognize multiple objects, while the previous one only identifies a single object in an image. In addition, moving the object around the image centre is applied to improve object identification at any position in the image. Experiments and results for identifying one object and two separated objects in a multi-object image are presented. Keywords: Image Processing; Visual attention; Space-based and Object-based attention; Hierarchical Temporal Memory.
1 Introduction
The attention mechanism in almost every visual system aims to limit processing to the important information relevant to behaviors or visual tasks [4]. It is well known from behavioral studies that there are two complementary modes of selection: space-driven and object-driven. Advocates of space-based attention argue that attention selects regions of space independent of the objects they contain. Attention is like a spotlight illuminating a region of space. Objects that fall within the beam are processed; objects that fall outside it are not. Advocates of object-based attention argue that attention selects objects rather than regions of space. Selection
is spatial because objects necessarily occupy regions of space, but objects, rather than the regions themselves, are the things that are selected [3]. Many studies show that both of these attention modes coexist and influence each other, with the object-driven mode happening earlier or later than the space-driven mode. Our previous model [2] used space-then-object. That is, object-based effects occur within the focus of spatial attention. Basically, it is a schema combining a Hierarchical Temporal Memory Space-based Network (HTM-SBN) and Hierarchical Temporal Memory Object-based Networks (HTM-OBNs) for object recognition. The HTM-SBN is trained to identify several highly possible objects. For each object, there is an associated HTM-OBN which is trained to recognize parts of the corresponding object. When an object is presented, its full image is applied to the HTM-SBN to identify several candidates. The HTM-OBNs of these candidate objects are then applied to recognize individual parts. The average result over all parts is then used as the recognition value of that object. The object with the highest value is considered the final output. We point out two problems, as well as solutions, of our previous model as follows: Problem 1. How to identify an object in a particular trained image if the object is moved to any position in the image? Basically, the system is able to recognize a trained image in which the object is located at a particular position. However, if the object is moved to other positions in the image, the system is unable to recognize it unless it is trained at those positions. The solution is to move the object to the location nearest the trained one. Firstly, the HTM networks are trained on images having the identifying object positioned at the centre of the image. Next, when a testing image is presented, the unidentified object is segmented out and moved around the centre position of the image within a predefined radius. Each position-created image is identified using the HTM networks. Finally, the output is the one having the highest recognition value. Problem 2. How to identify multiple objects in an image? For instance, an image may contain a chair and a table concurrently. The object-driven mode is able to find candidate parts based on well-known trained ones. When an object is presented, it is segmented into many individual parts based on color. Each part is identified through the HTM-OBNs to find the most probable part candidates. Then, the candidates are combined with each other to create available objects. Finally, the objects are identified in the space-based mode using their own HTM-SBNs. In this paper, we propose a new system based on the above solutions. The new approach is able to identify multiple objects at any position in an image. The remainder of this paper is organized as follows. In Section 2, we present the way to generate the training-testing image set and train the HTM-SBNs and HTM-OBNs. Section 3 introduces and explains our new approach. In Section 4, we describe some experiments as well as results. We then discuss limitations and extensions in Section 5. Finally, we show the related works and conclusion.
2 Image Set and HTM-Based Networks
For the training and testing images, we assume that they have been pre-processed so that the object's parts are colored differently. In other words, they are created from solid-colored parts. We mention the way to convert a natural object to our assumed object in the discussion section. We use the centralization-rotating method [2] to create the training and testing image set for each object. The HTM network is applied as the training and inference system for object recognition. In this section, we present the following items: • Generating the training and testing image set. • Training the HTM-SBNs and HTM-OBNs.
2.1 Generating Training and Testing Image Set
The total number of objects in the system is four. It consists of “Chair”, “Table”, “Computer” and “Telephone”, whose parts are colored differently. Each object is in a 64 × 64 image. An object may have all or only a few of its own parts, as shown in Table 1. An object is considered multiple part-based if it has more than one associated part, and single part-based if it has only one unique part. Each multiple part-based and single part-based object is placed in 3D space and the rotating method at centralization [2] is used to create the image set. Particularly, the object is rotated 360° about Oy while the camera is concurrently moved from 0° to 45° on xOy. Each generated image is regarded as a timing feature of the object. The output contains 200 continuous frames. We divide the output into training and testing image sets such that the half of the pictures having even indices are considered training ones, while the others are testing ones. For the training image set, all objects, including multiple part-based and single part-based ones, are moved to the centre position of the image and converted to binary images. We use these binary images as input to train the HTM-based networks.
2.2 Training HTM-SBNs and HTM-OBNs
Basically, the HTM-SBN and HTM-OBN networks have the same structure, using a Hierarchical Temporal Memory (HTM) network as the learning and inference system. The HTM-SBN and HTM-OBN are considered as the space-driven and object-driven modes of visual attention, respectively. The HTM-OBN is used to identify individual parts of an object, while the HTM-SBN is used to recognize part-based combinations of an object. For training the HTM networks, input images are selected from the training image set. The training images are multiple part-based for the HTM-SBN and single part-based for the HTM-OBN. Each object has one associated HTM-OBN and HTM-SBN. When an image is tested using an HTM-based network, the output is a prediction vector. Each element in the vector includes a belief value and an element name. The correctly identified object output is the element having the highest belief. For HTM-OBN,
Table 1 List of multiple part-based and single part-based for each object.

Object    | Multiple part-based                                                             | Single part-based
Computer  | Case + Monitor; Case + Keyboard; Monitor + Keyboard; Case + Monitor + Keyboard | Case; Keyboard; Monitor
Chair     | Face + 4 Legs; Face + Back; Back + 4 Legs; Face + Back + 4 Legs                | Face; Back; Leg1 (Front-Left); Leg2 (Front-Right); Leg3 (Back-Left); Leg4 (Back-Right)
Table     | Face + 4 Legs                                                                   | Face; Leg1 (Front-Left); Leg2 (Front-Right); Leg3 (Back-Left); Leg4 (Back-Right)
Telephone | Hand + Base; Hand + Button; Base + Button; Hand + Base + Button                | Hand; Base; Button
assume that object O has k parts and the output of the network is v = (p1, p2, ..., pk). The correctly identified part for an input part P is calculated by:

OBN_O(P) = max{Belief(p_t), t = 1...k}    (1)

For HTM-SBN, assume that object O has k part-based combinations and the output of the network is v = (c1, c2, ..., ck). The correctly identified combination for an input combination C is calculated by:

SBN_O(C) = max{Belief(c_t), t = 1...k}    (2)
3 A New Approach
Assume that a multi-object image is considered as many different connected solid-colored parts. We use the object-driven mode of visual attention to find the best identified
candidates of parts through the HTM-OBNs. Next, all these candidates are combined with each other to create available objects. Each object is then identified using its own HTM-SBN. The output of the system is a vector v = (O_1, O_2, ..., O_m), where m is the number of objects involved in the system. We choose m = 4 because there are 4 testing objects, as described in Section 2.1. The new approach has two differences in comparison with the previous model [2]: • Using object-then-space instead of space-then-object in visual attention. That is, the system focuses on identifying individual parts prior to the whole object in space. Therefore, the system is able to recognize multiple objects in an image. • Before an object is presented to the HTM networks, it is pre-processed by being moved around the centre position. All position-created images are then identified using the HTM networks to find the most correct one as the output. When a color image is presented to the model, it is passed through the following phases. Phase 1. Pre-processing images. An input image is segmented into many parts based on color. Then, the parts' colors are converted into binary. Next, we move the parts around the centre position within a predefined radius RADIUS_OBN. We select RADIUS_OBN = 2. So, a sample “Monitor” segment generates 9 position-created images, as shown in Fig. 1.
Fig. 1 Moving “Monitor” segment around centre position for position-created images with RADIUS_OBN = 2.
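A sketch of the Phase 1 shifting step might look as follows; the assumption that the nine images come from the offset grid {−RADIUS_OBN, 0, +RADIUS_OBN} in both directions is ours, inferred from the 3 × 3 layout of Fig. 1, and wrap-around shifting is used purely for brevity.

```python
import numpy as np

def center_segment(mask):
    """Translate a binary segment so its bounding-box centre sits at the image centre."""
    ys, xs = np.nonzero(mask)
    cy, cx = (ys.min() + ys.max()) // 2, (xs.min() + xs.max()) // 2
    h, w = mask.shape
    return np.roll(np.roll(mask, h // 2 - cy, axis=0), w // 2 - cx, axis=1)

def position_created_images(mask, radius=2):
    """Generate the 3 x 3 grid of shifted copies around the centre position."""
    centered = center_segment(mask)
    images = []
    for dy in (-radius, 0, radius):
        for dx in (-radius, 0, radius):
            images.append(np.roll(np.roll(centered, dy, axis=0), dx, axis=1))
    return images   # 9 binary images, as in Fig. 1
```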
Phase 2. Identifies the object's parts. We identify segments based on their position-created images. They are passed through all HTM-OBNs to find a list of the best candidates.

Value(Segment) = {max{OBN_Oi(Seg_j), j = 1...9}, i = 1...m}    (3)
We sort the list in decreasing order of belief value.

Value(Segment) = {Value_1 > Value_2 > ... > Value_m}    (4)
With a predefined parameter TOP_OBN_PARTS (N), the top N candidate parts of the segment are returned as output.

Value(Segment) = {Value_1, Value_2, ..., Value_N}    (5)
Phase 3. Builds objects. We build available objects based on all different combinations among the candidate parts. Then, these part-based combinations are moved around the centre position within a predefined radius RADIUS_SBN. We select RADIUS_SBN = 2, so the number of position-created images per input is 9, as in Fig. 1. Phase 4. Identifies objects. Each object has an associated HTM-SBN. In this phase, the position-created combinations of object O are passed through its own HTM-SBN to find the best one in space. Assume that object O has k candidate part-based combinations; the correctly identified one is calculated by:

Value(Object) = max{SBN_O(C_ij), i = 1...k, j = 1...9}    (6)
Assume that the vector v = (O_1, O_2, ..., O_m) is the output of the system. Each element is the object value for a particular object. So, v is:

v = {Value(O_1), Value(O_2), ..., Value(O_m)}    (7)

We sort v in decreasing order of the elements' belief values.

v = {v_1 > v_2 > ... > v_m}    (8)

With a predefined parameter TOP_SBN_OBJECTS (N), we consider the top N elements in vector v as the correctly identified objects for an input image I.

Output(I) = {v_1 > v_2 > ... > v_N}    (9)
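Putting Eqs. (3)–(9) together, the decision logic of Phases 2–4 can be sketched as below, reusing the position_created_images helper from Phase 1. The callables in htm_obns / htm_sbns stand in for the trained networks' inference, and the way part combinations are merged is our assumption.

```python
import numpy as np
from itertools import combinations

def recognize(seg_masks, htm_obns, htm_sbns,
              radius_obn=2, radius_sbn=2, top_obn_parts=2, top_sbn_objects=1):
    """seg_masks: binary masks, one per colour-segmented part.
    htm_obns / htm_sbns: dicts {object name: callable(image) -> belief},
    standing in for the trained HTM networks (placeholders)."""
    # Phase 2: score each segment against every object's HTM-OBN (Eqs. 3-5)
    candidates = []
    for mask in seg_masks:
        scores = sorted(
            ((max(obn(img) for img in position_created_images(mask, radius_obn)), obj)
             for obj, obn in htm_obns.items()), reverse=True)
        candidates.append(scores[:top_obn_parts])

    # Phases 3-4: merge, in all combinations, the segments whose candidate
    # list contains a given object, and score them with its HTM-SBN (Eq. 6)
    object_values = {}
    for obj, sbn in htm_sbns.items():
        ids = [i for i, c in enumerate(candidates) if any(o == obj for _, o in c)]
        best = 0.0
        for r in range(1, len(ids) + 1):
            for combo in combinations(ids, r):
                merged = np.logical_or.reduce([seg_masks[i] for i in combo])
                best = max(best, max(sbn(img) for img in
                                     position_created_images(merged, radius_sbn)))
        object_values[obj] = best

    # Eqs. (7)-(9): rank the object values and keep the top N
    ranked = sorted(object_values.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_sbn_objects]
```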
4 Experiments
We present two experiments as well as their results: identifying one object and identifying two separated objects in 128 × 128 images. Testing images are randomly selected from the testing and training image sets. These images are multiple part-based, and the identifying object has all of its solid-colored parts. Next, the object is placed at a random position in the testing image. The correct percentage for an object is calculated as the number of correctly identified images over the number of input testing images. Then, we calculate the average correct percentage over the whole testing and training image set. At the end of this section, we discuss the evaluation method
Table 2 List of parameters used in all experiments.

Parameter       | Value
RADIUS_OBN      | 2
RADIUS_SBN      | 2
TOP_OBN_PARTS   | 2
TOP_SBN_OBJECTS | 1 or 2, the number of correctly identified objects returned as output of the system
and compare the model with our previous one [2]. We configure the parameters used in all experiments as shown in Table 2.
4.1 Experiment 1
Subjects. Identifying one object.
Procedures. The value of the TOP_SBN_OBJECTS parameter is set to one. When an image is presented, the system returns one identified object name. A sample of the testing images in this experiment is shown in Fig. 2.
Fig. 2 Sample of testing “Chair” images.

Table 3 Result.

                   | Chair | Table | Computer | Telephone | Avg. Perc.
Testing Image Set  | 96%   | 100%  | 100%     | 100%      | 99%
Training Image Set | 83%   | 96%   | 100%     | 100%      | 94.7%
4.2 Experiment 2
Subjects. Identifying two separated objects.
Procedures. The value of the TOP_SBN_OBJECTS parameter is set to two. When an image is presented, the system returns two identified object names. For the testing images, each object is combined with the others to generate two-object combinations. Particularly, the way to generate these testing combinations for a particular object O_i with an object O_k is as follows: • In O_i's image set, we select images one by one and combine each with a randomly selected image from O_k's image set. • We then place the selected objects at randomly different positions in an image such that they do not cover each other. A sample of the testing combinations in this experiment is shown in Fig. 3.
4.3 Comparison with Other Systems
We compare our model with the previous one [2]. We name the present model the OBN-SBN system and our previous model the SBN-OBN system. Basically,
Fig. 3 Sample of testing separated multi-object images.

Table 4 Result: Combining “Chair” with the others.

Chair combined with | Table | Computer | Telephone | Avg. Perc.
Testing Image Set   | 46%   | 85%      | 96%       | 75.6%
Training Image Set  | 40%   | 73%      | 82%       | 65%
Table 5 Result: Combining “Table” with the others.

Table combined with | Chair | Computer | Telephone | Avg. Perc.
Testing Image Set   | 54%   | 86%      | 64%       | 68%
Training Image Set  | 30%   | 88%      | 75%       | 64.3%
Table 6 Result: Combining “Computer” with the others.

Computer combined with | Chair | Table | Telephone | Avg. Perc.
Testing Image Set      | 62%   | 80%   | 92%       | 78%
Training Image Set     | 81%   | 92%   | 100%      | 91%
Table 7 Result: Combining “Telephone” with the others.

Telephone combined with | Chair | Table | Computer | Avg. Perc.
Testing Image Set       | 88%   | 71%   | 100%     | 86.3%
Training Image Set      | 85%   | 82%   | 100%     | 89%
Fig. 4 Objects’ belief accuracy between SBN-OBN system and OBN-SBN system.
the SBN-OBN model is able to identify one object at the centre position, while the OBN-SBN model identifies multiple objects at any positions in the image. Therefore, the comparison is only done for the test case in which the image has only one object at the centre position. The comparison result shows two parts: (1) correct object identification accuracy and (2) the object's belief accuracy. The object's belief accuracy shows the belief percentage of the correctly identified object(s) in comparison with the others in the output vector v = (O_1, O_2, ..., O_m). For this test case, O_1 is the correct object of the system. So, it is calculated by:
BeliefAccuracy(O_1) = [Belief(O_1) / Σ_{j=1}^{m} Belief(O_j)] · 100%    (10)
We use our testing data set for testing both systems. For the output vector v, we choose m = 3. The results show that the object identification accuracy of both systems is 100%. That is, both the OBN-SBN and SBN-OBN systems are able to identify the object correctly. However, the belief accuracy of the correctly identified object from the OBN-SBN system is higher than that of the SBN-OBN system, as shown in Fig. 4.
5 Discussions
Generally, the object identification accuracy in Experiment 2 is not high. The reason is that the position-created images, which are generated by moving around the centre position, are not close enough to the trained ones. Therefore, when such an image is presented to the HTM network, the belief value is not high. The key to the solution is to modify the current way of generating the training image set. That is, for each generated feature of the object obtained with the centralization-rotating method, we keep moving it around the centre position within a predefined radius to generate a collection of images for that feature. These images are then trained by the HTM networks, so when an image is presented it could be identified better than it is now. As described in Section 2.2, the HTM-SBN is associated with a particular object and trained on all available part-based combinations. In further research, we have an idea to hierarchically create relationships among combinations to upgrade the HTM-SBN, making it a semantic network which we name a Hierarchical Space-based Network (HSBN). Using the semantic network, the system is able to predict and locate the next parts based on the identified ones. Moreover, the network shows that only a limited number of part-based combinations are proper for object identification in space. As an example, we use the semantic network of the “Table” object shown in Fig. 5. Assume that P2 and P3 are identified correctly through the HTM-OBN. So, P2 and P3 become activated nodes. They are then identified using the HTM-SBN at node C2. If the belief returned by C2 is over a predefined belief threshold, node C2 becomes activated. C2's status is passed up to node B. At this stage, the system points out that node B is activated only if node C3 is also activated. If so, P4 is predicted to be the next identified part. Using position information about P4 at node C3, the system is able to locate P4 in the image and segment it out of the image. We continue to identify P4 through the HTM-OBN. Next, the combination of P4 and P3 is recognized at node C3 in space. If node C3 is activated, its status in conjunction with C2's status makes node B be identified using the HTM-SBN. In case node B gets activated, its status is directly passed up to the root node A. It means the “Table” object is identified properly in the space-driven mode. Furthermore, the network shows that there is another way to identify the “Table” object using another path, which is from node C1 to node A. Actually, using the semantic network in this way is an illustration of top-down processing control in visual attention. When an object is presented, the system only selects a random number of parts instead of all parts for identification at the first stage of processing.
Fig. 5 A sample of Hierarchical Space-based Network for “Table” object.
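The hierarchical activation scheme sketched around Fig. 5 could be prototyped roughly as follows; the node structure, the threshold value and the bottom-up propagation rule are our reading of the description above rather than an implementation from the paper, and alternative paths such as C1 → A would need an OR rule that this sketch omits.

```python
class HSBNNode:
    """Node in a Hierarchical Space-based Network: leaves are parts,
    internal nodes are part combinations verified in space by an HTM-SBN."""

    def __init__(self, name, children=(), scorer=None, threshold=0.6):
        self.name = name
        self.children = list(children)
        self.scorer = scorer        # callable(image) -> belief; None for leaves
        self.threshold = threshold
        self.active = False

    def activate(self, image, leaf_beliefs):
        """Propagate activation bottom-up; leaf_beliefs maps part name -> HTM-OBN belief."""
        if not self.children:       # leaf: a part identified by the HTM-OBN
            self.active = leaf_beliefs.get(self.name, 0.0) >= self.threshold
            return self.active
        # internal node: all children must be active, then the combination
        # itself is verified in space by the associated HTM-SBN scorer
        if all(child.activate(image, leaf_beliefs) for child in self.children):
            self.active = self.scorer(image) >= self.threshold
        return self.active

# Rough wiring for Fig. 5: the root A ("Table") is reached either through C1
# or through B, where B combines C2 (parts P2, P3) and C3 (parts P3, P4).
```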
The assumption that an image has solid-colored parts seems hard to achieve in real applications. The solution is that we could use the HTM-SBN to predict available objects. For each identified object, we know its parts and the corresponding part-based HTM network. Next, we use a semantic network, as described above, to locate the object's parts. However, this problem needs more research. So far, the HTM network is able to identify objects which have been trained exactly. However, if the object is zoomed in or out, so that it is bigger or smaller than the originally trained ones, the network is unable to recognize it properly. The solution is that we could create many size-created HTM networks according to the image resolution. When an input image is presented, it is passed through all size-created HTM networks to find the best candidates. An object or part can be considered to consist of many built-in sub-objects or sub-parts, respectively. So, we suggest a new way for object identification: the system firstly identifies sub-objects or sub-parts instead of the whole object or part, and the process of object recognition is organized in hierarchical layers. The recognition value from the lower layer, the sub-object/part, is moved up and used in the higher layers, the whole object/part. This approach can be associated with the semantic network for predicting the most probable object/part. We name this model a multi-layer object identification system.
6 Conclusion
HTM is an elegant idea. It promises to build machines that approach or exceed human-level performance for many cognitive tasks [7]. Based on the HTM network, Hall and Poplin attempted to break CAPTCHA letters [8]; Bobier and Wirth experimented with content-based image retrieval [9]; Kapuscinski and Wysocki tested recognition of signed Polish words [10]. Generally, these applications stopped at using the HTM network as a training and testing framework, without applying it to build computational models of any cognitive theories. In this paper, we focus on realizing
visual attention theories using HTM-based networks in computer vision. Particularly, we create a computational model in which object-based attention happens prior to space-based attention for object recognition. Moreover, moving the object around the centre position of the image enables the system to identify objects at any position in the image. The new approach is able to identify not only a single object but also multiple objects at any position in an image. However, there are still limitations, such as not processing natural images, scaled images, or heavily overlapped objects. We think that more research in the direction of creating and improving computational models based on the combination of visual attention theories and HTM-based networks will lead to the creation of revolutionary machines, making applications more intelligent for object recognition, like humans.
References 1. Scholl, B.J.: Objects and attention: the state of the art. Cognition 80, 1–46 (2001) 2. Le, H.B., Tran, T.T.: Recognizing objects in images using visual attention schema. In: Proceeding of Springer Berlin / Heidelberg, vol. 226, pp. 129–144 (July 2009) 3. Logan, G.D.: The CODE theory of Visual Attention: An Integration of Space-Based and Object-Based Attention. Psychological Review 103(4), 603–649 (1996) 4. Sun, Y., Fisher, R.: Object-based visual attention for computer vision. Artificial Intelligence 146, 77–123 (2003) 5. Chun, M.M., Wolfe, J.M.: Visual Attention. Blackwell Handbook of Perception (2000) 6. Itti, L., Koch, C.: Computational modelling of visual attention. Nature Reviews Neuroscience 2, 194–203 (2001) 7. Hawkins, J., Blakeslee, S.: On Intelligence, Times Books (2004) 8. Hall, Y.J., Poplin, R.E.: Using Numenta’s hierarchical temporal memory to recognize CAPTCHAs 9. Bobier, B.A., Wirth, M.: Content-based image retrieval using hierarchical temporal memory. In: Proceeding of the 16th ACM international conference on Multimedia, pp. 925–928 (2008) 10. Kapuscinski, T., Wysocki, M.: Using Hierarchical Temporal Memory for Recognition of Signed Polish Words. Computer Recognition Systems 3 57, 355–362 (2009)
Constant Bitrate Image Scrambling Method Using CAVLC in H.264
Junsang Cho, Gwanggil Jeon, Jungil Seo, Seongmin Hong, and Jechang Jeong
Junsang Cho, Seongmin Hong, and Jechang Jeong: Department of Electronics and Computer Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul, Korea
Gwanggil Jeon: School of Information Technology and Engineering (SITE), University of Ottawa, 800 King Edward, P.O. Box 450, Ottawa, Ontario, Canada, K1N 6N5
Jungil Seo: S/W Lab R&D 2 Team, IT Solution Business, Samsung Electronics Co. Ltd., 416 Maetan-3Dong, Yeongtong-Gu, Suwon-Si, Gyeonggi-Do, 443-742, Korea
Abstract. As the digital multimedia industry has grown, the demand for protecting the copyright of multimedia contents has increased. Digital video scrambling is a digital video protection technique known as an effective strategy to protect contents. In this paper, a digital scrambling method based on a shuffled VLC table and a coefficient level scrambling method are proposed as content protection techniques. In the CAVLC procedure, the levels of each coefficient are first transformed by the key. The VLC tables are then shuffled using another key mapping procedure. This method can be applied to the latest video CODECs such as MPEG-4 and H.264. The image can be reconstructed by the key holding the correct information. We found that the compression performance does not decrease due to the scrambling, so attaching the scheme to existing CODECs is worthwhile. The implementation is made possible by the substitution of codewords in the look-up table, which keeps the computational complexity low. The experimental results show that both content protection techniques investigated are able to efficiently scramble the images without any overhead or complexity.
1 Introduction
The use of digital multimedia has increased rapidly. As existing analog data is digitized, the media changes from the available analog methods to digital. However,
most digital data has the defect that it is very easy to copy or distribute it illegally. Especially in the case of most digital videos, since they are transferred via public routes, such as satellite, cable, or the Internet, the risk of illegal copying and use is much higher. Therefore, the importance of protecting digital multimedia data is growing rapidly, and research has been performed on several protection techniques for video. Scrambling is one protection technique for digital videos. The main idea behind scrambling is simple. The transferred video is modified or encrypted using a special key. The receiver then reconstructs the video using the known special key. Therefore, only certified users who have the special key can correctly restore the distorted video. Even if an unauthorized user attempts to descramble the received video, the rights can be protected, because without the key only the distorted video produced by the scrambler is displayed instead of the original. Such scrambling techniques have been studied in recent years [1], [2], [3]. Some scrambling methods distort the video data directly in the spatial domain [1]. However, direct distortion in the spatial domain changes the statistical properties, which causes difficulty in the video compression and affects the coding efficiency. Another scrambling method involves the use of the DCT or a wavelet transform in the frequency domain [2]. The scrambling method based on the wavelet transform mixes the coefficients in each block. The sub-band produced by the wavelet transform is divided into blocks, and the coefficients in these blocks are mixed using a special table to distort the video. This method mixes an important bit or the sign bit selectively. However, since most video compression algorithms, such as MPEG-2/4 and H.264, do not contain a wavelet transform, it is difficult to apply the wavelet transform to video signals. The scrambling method in the frequency domain based on the DCT distorts the video by mixing the coefficients, like the method based on the wavelet transform. For example, only the DC coefficients in a frame are gathered and mixed using a special table, or the coefficients in the same frequency position are gathered to make a frequency spectrum like the wavelet transform. Coefficients at the same frequency can then be mixed to scramble the video. There are other scrambling methods that use cryptographic algorithms, such as the data encryption standard (DES) [3] and the advanced encryption standard (AES) [4], [5]. That is to say, the compressed video itself is ciphered using algorithms such as DES or AES. However, cipher algorithms such as DES and AES have high computational costs; therefore, the complexity is much higher when the whole video signal is ciphered. To overcome these drawbacks, a scrambling algorithm with a high coding efficiency was proposed [6], [7]. In [8], CABAC with scrambling was proposed, but the overhead of the encoded bitstream still remained a problem. Given these attempts, we propose a coding-friendly and efficient scrambling algorithm. This paper proposes a scrambling method which is implemented in the entropy encoding stage (in H.264, CAVLC is used as the entropy coding method [8]). One method is coefficient level scrambling (CLS). In CLS, through coefficient handling, we can brighten or darken the scrambled image. Another method is shuffling the CAVLC table. In this case, only the DC table is used, for easy encoder and decoder implementation.
The suggested algorithm uses a codeword exchanged CAVLC table and
distorts the original video efficiently. We group the codewords represented in the CAVLC table by codeword length, and exchange codewords within the same group according to the key. These two scrambling methods improve the performance over that of existing methods while maintaining the compressed bitstream rate after scrambling. The rest of this paper is organized as follows. In Section 2, we describe the specific scrambling process; the CLS and shuffled-CAVLC-table methods are specified with flowcharts and figures. The experimental results are explained in Section 3, and concluding remarks are drawn in Section 4.
2 Proposed Algorithm
It is well known that CAVLC in H.264 is the entropy coding method for encoding the 4 × 4 (or 2 × 2) blocks' DCT-transformed coefficients [9]. In a 4 × 4 block, the quantized transformed coefficients are aligned in zig-zag order, as shown in Fig. 1. After the ordering of the coefficients, the CAVLC encoding procedure is performed as shown in Fig. 2. First, the number of total non-zero coefficients and trailing ones is encoded. After that, the signs of the trailing ones are encoded; if the number of trailing ones is 2, then 2 bits are needed to encode the trailing_ones_sign_flag. Next, the remaining nonzero coefficient levels are encoded. In this part, the remaining nonzero coefficients are encoded from the highest frequency coefficient down to the DC coefficient. After that, total_zeros is encoded, where total_zeros is the number of zeros before the last nonzero coefficient. Finally, the number of zeros preceding each nonzero coefficient (run_before) is encoded. We propose two scrambling methods, which are performed within the CAVLC procedure. Using these methods, we can get a scrambled image without increasing the total number of bits.
Fig. 1 Zig-zag scan order for encoding 4 × 4 block. (DCT and quantized block).
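For reference, the conventional 4 × 4 zig-zag order of Fig. 1 can be generated programmatically; the sketch below produces the standard frame-mode scan by walking the anti-diagonals in alternating direction (the generation method is just one convenient way to obtain the fixed order).

```python
def zigzag_order(n=4):
    """Return the zig-zag scan order for an n x n block as (row, col) pairs."""
    order = []
    for s in range(2 * n - 1):                      # anti-diagonals i + j = s
        i_vals = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(i, s - i) for i in i_vals]
        order.extend(diag if s % 2 else reversed(diag))
    return order

def zigzag_scan(block):
    """Reorder a quantized n x n block of coefficients into a 1-D zig-zag list."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# zigzag_order(4) yields (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), ...
# i.e. raster indices 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15.
```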
Fig. 2 CAVLC encoding procedure.
2.1 Coefficient Levels Scrambling
The first scrambling method is coefficient levels scrambling (CLS). In this method, the bitstream of the levels is changed according to the key value. The key value is inserted in the user data section so that it can be used at the decoder. In the CAVLC encoding procedure, the remaining nonzero coefficients are encoded from the highest frequency coefficient down to the DC coefficient. When encoding the coefficients' levels, the encoded contents are organized as a length and an information part. For example, let us say the length is four and the information is two; the total bitstream is then 0010. So if we shuffle the information part while keeping its length, we are able to produce scrambled video without increasing the total number of bits. The overall flowchart of CLS is shown in Fig. 3. Using the CLS method, we can scramble the video to the intended quality according to the key value. We use 3 bits to embed the key value. The numbers are then mapped according to the values; the dynamic range of the integer number used in the matching process is between -3 and 4. This approach is useful for contents when
Fig. 3 Coefficient levels scrambler.
the luminance of the scrambled image is to be bright: in that case, by adding a positive number, we obtain a brighter scrambled image. On the other hand, we can acquire a darker scrambled image by using a negative number.
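The exact bit-level transformation applied to the levels is not spelled out above, so the sketch below shows one plausible reading: the 3-bit key selects an offset in [-3, 4] that is added to the "information" part of a level codeword modulo its bit length, so the codeword length, and hence the bitrate, is preserved. The function names and the modular addition are illustrative assumptions.

```python
def key_to_offset(key_bits):
    """Map a 3-bit key string to an integer offset in the range [-3, 4]
    (one plausible mapping of the 3-bit key mentioned in the text)."""
    return int(key_bits, 2) - 3              # '000' -> -3, ..., '111' -> 4

def scramble_level_info(info, length, key_bits):
    """Toy CLS step: alter the 'information' part of a level codeword while
    keeping its bit length, so the total number of bits is unchanged."""
    offset = key_to_offset(key_bits)
    return (info + offset) % (1 << length)   # still representable in `length` bits

# Example: length = 4 and info = 2 give the codeword '0010'; with key '110'
# (offset +3) the information becomes 5, i.e. '0101', still 4 bits long.
```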
2.2 Scrambling Using Shuffled VLC Table
The VLC table is used for efficient coding in some parts of the CAVLC encoding procedure. For example, for the coding of TotalCoeff and TrailingOnes, context-adaptive VLC tables are used. In this paper, we only shuffle the VLC table for “NumCoeff and TrailingOnes for ChromaDC”, which does little to affect the overall encoding or decoding procedure. When shuffling the VLC table, only codewords which have the same length are exchanged, in order to maintain the total bitrate. The key value is used for selecting the shuffling group of the VLC table. Fig. 4 shows the flowchart of the scrambling algorithm using the shuffled VLC table. First, the table is grouped according to the length of the bitstream. We group it as A, B, and C, which represent bit lengths of 8, 7, and 6, respectively. The codewords can be exchanged within the same group in order to scramble the image. The group may be chosen according to the key value. Table 1 shows the VLC table for encoding the DC value and its shuffling procedure when group C is selected for the codeword shuffling. The descrambling process is as follows: first, the image data stream which is transmitted to the decoder contains the scrambled codewords and the key. The key is read before the entropy decoding. The original VLD table is reconstructed according to the key. In this procedure, only a legal receiver should have the authority to reconstruct the image. Therefore, only an approved receiver knows the rules of the key and the exchange procedure.
Fig. 4 Flowchart of selective scrambling algorithm using VLC table.
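A compact sketch of the table shuffling described in Section 2.2 follows; treating the key as a seed for a pseudo-random permutation within each equal-length group is our choice of illustration, not the exact exchange rule of the paper.

```python
import random

def shuffle_vlc_table(table, key, group_lengths=(6, 7, 8)):
    """Shuffle codewords within equal-length groups of a VLC table.

    table:         dict mapping symbol -> codeword string (e.g. '000110').
    key:           integer shared by scrambler and descrambler; it seeds the
                   permutation (and can also select which group is shuffled).
    group_lengths: codeword lengths forming the groups C, B and A of the text.
    Every symbol keeps its original codeword length, so the bitrate is unchanged.
    """
    rng = random.Random(key)
    shuffled = dict(table)
    for length in group_lengths:
        symbols = [s for s, cw in table.items() if len(cw) == length]
        codewords = [table[s] for s in symbols]
        rng.shuffle(codewords)
        shuffled.update(zip(symbols, codewords))
    return shuffled
```

The descrambler simply regenerates the same permutation from the received key and swaps the codewords back before entropy decoding.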
3 Experimental Results
We implemented the proposed algorithm in the JM16.0 reference software. Four test sequences (Coastguard, Stefan, News, and Mobile) were used to test the algorithm. Each image sequence consisted of 40 P-picture frames of 352 × 288 pixels each. The frame rate of the input images is 30 Hz. We adopted a window size of 16 × 16 for the integer full search. The experiments were run on an Intel Core 2 CPU at 2.13 GHz. By using the CLS method, we can get the scrambled image according to the selected numbers. If the number is positive, the luminance of the scrambled image becomes brighter. On the other hand, if the number is negative, the scrambled image becomes darker, as shown in Figs. 5 and 6.
Table 1 VLC table for encoding DC value and its shuffling procedure (when the codeword of group C is shuffled).
Fig. 7 shows the effect of the scrambling method using the VLC table of DC levels; the scrambled images are produced according to the key values. The main difference of our approach, compared with previous algorithms ([8], [10]), is that our scrambler does not require any additional bitrate (Table 2). The implementation of the proposed algorithm is made possible by the substitution of codewords in the table or bitstream, so that the computational complexity remains very low.
Fig. 5 Scrambled (a) Coastguard and (b) Stefan images when the selected number is a positive value.
Fig. 6 Scrambled (a) Coastguard and (b) Stefan images when the selected number is a negative value.
Table 2 Average bit rate (Kbit/s) comparison table.

Sequence     Original   Scrambled   Bitrate overhead
Coastguard   1631.43    1631.43     0.00 %
News          247.96     247.96     0.00 %
Stefan       1163.9     1163.9      0.00 %
Mobile       2538.88    2538.88     0.00 %
Fig. 7 Scrambled Coastguard and News images using shuffled VLC table. The key values are 1 to 3, from the top image to the bottom image, respectively.
4 Conclusion
In this paper, we propose a digital video scrambling algorithm that modifies the CAVLC encoding procedure. Using the CLS method, we can brighten or darken scrambled images according to the selected numbers. The other method is scrambling using a shuffled VLC table. These two methods were implemented in the H.264/AVC JM 16.0 reference software. The simulation results show that the proposed algorithms can selectively scramble the images according to the key. The bitrate is the same as for the unscrambled image, so the coding efficiency is preserved. The scrambling is performed only by adding offsets to levels or shuffling the table, making implementation easy. This scheme is suitable for mobile and entertainment applications.
Acknowledgements. This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program (10570).
References 1. Hobbs, G.L.: Video Scrambling. U.S. Patent 5815572 (1998) 2. Zeng, W., Lei, S.: Efficient frequency domain selective scrambling of digital video. IEEE Transactions on Multimedia 5, 118–129 (2003) 3. Data Encryption Standard. FIPS PUB 46 (1977) 4. Wang, C., Yu, H.-B., Zheng, M.: A DCT-based MPEG-2 transparent scrambling algorithm. IEEE Transactions on Consumer Electronics 49, 1208–1213 (2003) 5. Qiao, L., Nahrstedt, K.: Comparison of MPEG encryption algorithms. Computers and Graphics 22(4), 437–448 (1998) 6. Dufaux, F., Ebrahimi, T.: Scrambling for privacy protection in video surveillance systems. IEEE Trans. Circuits Syst. Video Technol. 18(8), 1168–1174 (2008) 7. Tong, L., Dai, F., Zhang, Y., Li, J.: Prediction restricted H.264/AVC video scrambling for privacy protection. Electron. Lett. 46(1), 47–49 (2010) 8. Lee, H.-J., Nam, J.: Low complexity controllable scrambler/descrambler for H.264/AVC in compressed domain. In: Proc. 14th Annual ACM International Conf. on Multimedia (2006) 9. Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC). Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG 10. Kankanhalli, M.S., Guan, T.: Compressed-domain scrambler/descrambler for digital video. IEEE Trans. Consumer Electronics 48(2), 356–365 (2002)
Color Image Restoration Technique Using Gradient Edge Direction Detection Gwanggil Jeon, Sang-Jun Park, Abdellah Chehri, Junsang Cho, and Jechang Jeong
1 Introduction
To achieve commercial cost effectiveness, most digital cameras use a color filter array (CFA) to capture images with a single charge-coupled device (CCD) sensor array, resulting in a sub-sampled raw image with a single red (R), green (G), or blue (B) component for each pixel of the image [1], [2]. To rebuild a full color image from the raw sensory data, a color interpolation using neighboring samples is required in order to estimate the two missing color components for each pixel. Such a color plane interpolation is termed demosaicing. In this paper, we consider the interpolation for Bayer color arrays, as shown in Fig. 1(a). A large number of demosaicing methods have been documented [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [16], [17], [18], the simplest of which is bilinear interpolation [3]. However, this method does not consider the relationships among the R, G, and B components, and it creates artifacts, especially across high-frequency parts such as edges. To enhance the quality of the restoration, several demosaicing methods, such as frequency domain-based and edge direction-based ones, have been presented. The methods based on the frequency domain have been investigated in [9], [12]. In general, their underlying principle is the reconstruction of the missing color components of the R/B channels. This is accomplished by alternating the high frequency subbands of the G channel and iteratively projecting the information onto the constraint sets.
Gwanggil Jeon and Abdellah Chehri School of Information Technology and Engineering (SITE), University of Ottawa, 800 King Edward, P.O. Box 450, Ottawa, Ontario, Canada, K1N 6N5 e-mail:
[email protected],
[email protected] Sang-Jun Park, Junsang Cho, and Jechang Jeong Department of Electronics and Computer Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul, Korea e-mail:
[email protected],
[email protected],
[email protected]
Fig. 1 (a) Bayer color filter array pattern, (b) reference luminance pixel neighborhood, (c) indicated number of edge directions.
Fig. 2 Image cut through the CFA image filter (image number 19).
The edge direction detection-based demosaicing methods determine the edge direction using edge indicators, which interpolate the color difference (G/R or G/B) along the edge direction [17], [18]. The algorithm presented in this paper belongs to this latter category. There are two primary reasons for the importance of the G color component. One is that the G color component carries double the amount of information compared to the R and B components; hence, G channel reconstruction is particularly important to human perception [2]. The other reason is that the human visual system (HVS) is more sensitive to changes in the G component than to changes in the R or B components. Therefore, the G color component has been considered as an alternative to the luminance (L) component. However, the G color component itself does not fully represent the characteristics of the L component. In this paper, we first calculate the L component; then, using the computed L component, we calculate the unknown R, G, and B components in the chosen Bayer color arrays. The rest of this paper is organized as follows: in Section 2, we present the proposed method. In Section 3, simulation results are obtained to demonstrate the feasibility of the proposed design. Finally, Section 4 presents our conclusions.
2 Proposed Algorithm
2.1 Luminance Component Interpolation
Most of the demosaicing methods first estimate the G component value, and then, based on this value, they interpolate the missing R and B components. The reasoning behind this methodology is that the G component contains half of all of the color information, while the R and B components are each assigned to a quarter of the pixels [3]. Typically, however, the L component is composed of 29.9% R, 58.7% G, and 11.4% B, or roughly one half G and one quarter each for R and B [16].

L = 0.299 × R + 0.587 × G + 0.114 × B.   (1)

L = R + 2G + B.   (2)

Another reason we use the L component instead of the G component is that the images obtained from a CFA are represented as black and white images, as shown in Fig. 2. This means that each pixel has just one color component; therefore, depending on the position of the pixel, the calculated G value differs from the original value. In this paper, we first calculate the R, G, and B components using a simple filter such as a bilinear interpolator. For example, the results of bilinear interpolation can be obtained by the following equations and Fig. 1(a):

G_{33} = (G_{23} + G_{34} + G_{43} + G_{32}) / 4.   (3)

R_{33} = (R_{22} + R_{24} + R_{44} + R_{42}) / 4.   (4)
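The following sketch illustrates Eqs. (1)-(4). The (row, column) indexing convention and the assumption that the R, G, and B planes are filled in this simple averaging fashion before the luminance is formed are ours, for illustration only.

```python
import numpy as np

def bilinear_fill(plane, i, j, offsets):
    """Average of the four neighbors listed in `offsets`, as in
    Eqs. (3)-(4); e.g. the axial G neighbors around an R/B position or
    the diagonal R neighbors around a B position."""
    return np.mean([plane[i + di, j + dj] for di, dj in offsets])

# Offsets assumed for the Bayer layout of Fig. 1(a).
G_OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]    # Eq. (3)
R_OFFSETS = [(-1, -1), (-1, 1), (1, 1), (1, -1)]  # Eq. (4)

def luminance(R, G, B):
    """Simplified luminance plane of Eq. (2), computed from the
    bilinearly pre-filled R, G, B planes."""
    return R + 2.0 * G + B
```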
2.2 Green Channel Reconstruction at the Red or Blue Positions
To calculate the G component at the L_{33} position, we take into account 12 luminance components (L_{12}, L_{23}, L_{14}, L_{25}, L_{34}, L_{45}, L_{54}, L_{43}, L_{52}, L_{41}, L_{32}, L_{21}), as shown in Figs. 1(b) and 1(c). Let us assume that the distance between neighboring pixels is d. Then the distance between L_{33} and L_{23}, L_{34}, L_{43}, L_{32} is d, and the distance between L_{33} and the others is √5 d. These pixels are located at "horizontally d, vertically 2d" or "horizontally 2d, vertically d" positions. We assume that each edge direction has an indicator k = 1, 2, ..., 12. The horizontal direction displacement parameter (HDDP) and the vertical direction displacement parameter (VDDP) are represented as d_{H,k} and d_{V,k}, respectively, as shown in Table 1. Let ω_k(i, j) be the weight for direction k when interpolating a missing color at position (i, j), such that

ω_k(i, j) = [1 / (1 + δ_k(i, j))]^2.   (5)
where δ_k(i, j) is obtained from Equation (6), which considers both horizontal and vertical directional weights (0 ≤ ρ_V, ρ_H ≤ 1).

δ_k(i, j) = ρ_V(i, j){γ_k(i − 2, j) + γ_k(i + 2, j)} + ρ_H(i, j){γ_k(i, j − 2) + γ_k(i, j + 2)} + γ_k(i, j).   (6)

ρ_V(i, j) = Γ_H(i, j) / [Γ_H(i, j) + Γ_V(i, j)].   (7)

ρ_H(i, j) = 1 − ρ_V(i, j).   (8)

The horizontal and vertical directional weights are determined by two parameters Γ_H and Γ_V, such that

Γ_H(i, j) = |L(i + 2, j) − L(i, j)| + |L(i + 1, j) − L(i − 1, j)| + |L(i, j) − L(i − 2, j)|   (9)

Γ_V(i, j) = |L(i, j + 2) − L(i, j)| + |L(i, j + 1) − L(i, j − 1)| + |L(i, j) − L(i, j − 2)|   (10)

The kth degree of gradient γ_k(i, j) is calculated as

γ_k(i, j) = |L(i + d_{H,k}, j + d_{V,k}) − L(i − d_{H,k}, j − d_{V,k})| + |L(i + 2d_{H,k}, j + 2d_{V,k}) − L(i, j)| + |L(i, j) − L(i − 2d_{H,k}, j − 2d_{V,k})|   (11)
Table 1 Parameters d_{H,k} and d_{V,k} for computing the green component at the red and blue positions.

k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k    d_{H,k}  d_{V,k}
1     -1        0       5     -2       -1        9      2        1
2      0       -1       6     -1       -2       10      1        2
3      1        0       7      1       -2       11     -1        2
4      0        1       8      2       -1       12     -2        1
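A sketch of Eqs. (6)-(11) using the offsets of Table 1 above is given below. The convention that i indexes rows and j indexes columns, and the small epsilon added to guard against a zero denominator in Eq. (7), are illustrative assumptions.

```python
import numpy as np

# (dH_k, dV_k) offsets of Table 1 for the 12 edge directions.
DIRECTIONS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-2, -1), (-1, -2),
              (1, -2), (2, -1), (2, 1), (1, 2), (-1, 2), (-2, 1)]

def gamma_k(L, i, j, dh, dv):
    """Directional gradient of Eq. (11) on the luminance plane L."""
    return (abs(L[i + dh, j + dv] - L[i - dh, j - dv])
            + abs(L[i + 2 * dh, j + 2 * dv] - L[i, j])
            + abs(L[i, j] - L[i - 2 * dh, j - 2 * dv]))

def delta_k(L, i, j, dh, dv, eps=1e-12):
    """Directional weight term of Eqs. (6)-(8)."""
    gH = (abs(L[i + 2, j] - L[i, j]) + abs(L[i + 1, j] - L[i - 1, j])
          + abs(L[i, j] - L[i - 2, j]))                      # Eq. (9)
    gV = (abs(L[i, j + 2] - L[i, j]) + abs(L[i, j + 1] - L[i, j - 1])
          + abs(L[i, j] - L[i, j - 2]))                      # Eq. (10)
    rho_v = gH / (gH + gV + eps)                             # Eq. (7)
    rho_h = 1.0 - rho_v                                      # Eq. (8)
    return (rho_v * (gamma_k(L, i - 2, j, dh, dv) + gamma_k(L, i + 2, j, dh, dv))
            + rho_h * (gamma_k(L, i, j - 2, dh, dv) + gamma_k(L, i, j + 2, dh, dv))
            + gamma_k(L, i, j, dh, dv))                      # Eq. (6)
```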
Fig. 3 B component calculations at the G position (a) in the horizontal edge direction, (b) in the vertical edge direction; (c) all directions for the B component calculation at the R position, (d) all directions for the R component calculation at the B position.
The G values at the B and R component positions can now be calculated, respectively, as

G(i, j) = B(i, j) + [Σ_{k=1}^{12} δ_k(i, j) κ_{B,k}(i, j)] / [Σ_{k=1}^{12} δ_k(i, j)].   (12)

G(i, j) = R(i, j) + [Σ_{k=1}^{12} δ_k(i, j) κ_{R,k}(i, j)] / [Σ_{k=1}^{12} δ_k(i, j)].   (13)

Equations (12) and (13) compute the G component given B (or R) information. The parameters κ_{B,k}(i, j) and κ_{R,k}(i, j) represent the differences between the G value and the B value, and between the G value and the R value, in the kth edge direction. κ_{B,k}(i, j) and κ_{R,k}(i, j) can be calculated, respectively, as

κ_{B,k}(i, j) = G(i + d_{H,k}, j + d_{V,k}) − B(i + d_{H,k}, j + d_{V,k}).   (14)

κ_{R,k}(i, j) = G(i + d_{H,k}, j + d_{V,k}) − R(i + d_{H,k}, j + d_{V,k}).   (15)
2.3 Red and Blue Components Predictions at the Green Position
To interpolate the B component at the G position, we use six horizontal or vertical edge directions, as shown in Figs. 3(a) and 3(b). The employed HDDP d_{H,k} and VDDP d_{V,k} are shown in Tables 2 and 3. The B and R channels at the G position are computed, respectively, as

B(i, j) = G(i, j) − [Σ_{k=1}^{6} δ_k(i, j) κ_{B,k}(i, j)] / [Σ_{k=1}^{6} δ_k(i, j)].   (16)
Table 2 Parameters d_{H,k} and d_{V,k} for computing the red component at the green position.

k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}
1      0       -1       3      2        1       5     -2        1
2      2       -1       4      0        1       6     -2       -1
Table 3 Parameters d_{H,k} and d_{V,k} for computing the blue component at the green position.

k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}
1      1       -2       3      1        2       5     -1        0
2      1        0       4     -1        2       6     -1       -2
Table 4 Parameters d_{H,k} and d_{V,k} for computing the blue component at the red position.

k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}
1     -1        0       3      0       -1       5      1        0       7      0        1
2     -1       -1       4      1       -1       6      1        1       8     -1        1
Table 5 Parameters d_{H,k} and d_{V,k} for computing the red component at the blue position.

k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}    k   d_{H,k}  d_{V,k}
1     -1        0       3      0       -1       5      1        0       7      0        1
2     -1       -1       4      1       -1       6      1        1       8     -1        1
R(i, j) = G(i, j) − [Σ_{k=1}^{6} δ_k(i, j) κ_{R,k}(i, j)] / [Σ_{k=1}^{6} δ_k(i, j)].   (17)
As the R and B channels at the G position have each been interpolated, we may use all eight edge direction neighboring pixels, as shown in Figs. 3(c) and 3(d), when we calculate the R channel at the B position or the B channel at the R position. The lookup tables for HDDP d_{H,k} and VDDP d_{V,k} are shown in Tables 4 and 5, respectively. Finally, the B component at the R position and the R component at the B position are interpolated, respectively, as

B(i, j) = G(i, j) − [Σ_{k=1}^{8} δ_k(i, j) κ_{B,k}(i, j)] / [Σ_{k=1}^{8} δ_k(i, j)].   (18)

R(i, j) = G(i, j) − [Σ_{k=1}^{8} δ_k(i, j) κ_{R,k}(i, j)] / [Σ_{k=1}^{8} δ_k(i, j)].   (19)
3 Simulation Results
We evaluate the performance of each method using the 24 Kodak CD testing images with a pixel size of 768 × 512 [19]. The algorithms are implemented on a computer with a Pentium IV processor (3.2 GHz). The operating system is MS-Windows Vista and the development environment is MS Visual C++ 2005. In the experimental results, we use two objective performance measures: one is the color peak signal-to-noise ratio (CPSNR) and the other is the spatial S-CIELAB color difference (S-CIELAB ΔE*). The CPSNR and S-CIELAB ΔE* of a color image of size W × H are given by Equations (20) and (21), respectively.

CPSNR = 10 log_{10} [255^2 × 3(W × H) / Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} Σ_{c∈Ξ} [x^c_{ori}(i, j) − x^c_{dm}(i, j)]^2].   (20)
Fig. 4 CPSNR quality comparison of the 24 test images.
Fig. 5 S-CIELAB Δ E ∗ quality comparisons of the 24 test images.
ΔE* = [1 / (W × H)] × Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} Σ_{c∈Ψ} [LAB^c_{ori}(i, j) − LAB^c_{dm}(i, j)]^2.   (21)

where Ξ = {R, G, B}, and x^R_{ori}(i, j), x^G_{ori}(i, j), and x^B_{ori}(i, j) denote the three color components of the pixel at location (i, j). Additionally, Ψ = {L, a, b}, and LAB^L_{ori}, LAB^a_{ori}, and LAB^b_{ori} denote the three CIELAB color components of the pixel at location (i, j). Figures 4 and 5 show the CPSNR and S-CIELAB ΔE* results, respectively. From these we are able to confirm that the proposed method yields the best objective performance out of the eight different methods. Figures 6 and 7 display magnified sub-images cut from the original full color testing images numbers 16 and 20, respectively; these are the demosaiced images obtained using the Bilinear, ACPI, DWCI, and the proposed methods. A visual comparison demonstrates that our proposed method produces fewer color artifacts than do the other demosaicing methods.
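A straightforward implementation of Eq. (20) is sketched below, assuming 8-bit color images stored as H × W × 3 arrays; function and variable names are illustrative.

```python
import numpy as np

def cpsnr(original, demosaiced):
    """Colour PSNR of Eq. (20) for two H x W x 3 images (8-bit range)."""
    diff = original.astype(np.float64) - demosaiced.astype(np.float64)
    mse = np.mean(diff ** 2)   # averages over all W*H pixels and the 3 channels
    return 10.0 * np.log10(255.0 ** 2 / mse)
```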
Fig. 6 Magnified subimages cut from testing image number 16. (a) Bilinear, (b) ACPI, (c) DWCI, (d) proposed method.
Fig. 7 Magnified subimages cut from testing image number 20. (a) Bilinear, (b) ACPI, (c) DWCI, (d) proposed method.
4 Conclusion
This paper proposed a new demosaicing algorithm. By combining the luminance component-based demosaicing algorithm with the acquired gradient edge information
and the weighting function, an edge-sensitive demosaicing algorithm is achieved. The experimental results show that the proposed method has an advantage in terms of both CPSNR and S-CIELAB ΔE*.
Acknowledgements. This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program (10570).
References 1. Lukac, R., Plataniotis, K.N.: Color filter arrays: Design and performance analysis. IEEE Trans. Consum. Electron. 51(4), 321–354 (2005) 2. Bayer, B.E.: Color Imaging Array. U.S. Patent 3 971 065 (1976) 3. Adams, A.E.: Interactions between color plane interpolation and other image processing functions in electronic photography. In: Proc. SPIE, vol. 2416, pp. 144–151 (1995) 4. Ramanath, B., Synder, W.E.B., Bilbro, G.L.: Demosaicking methods for Bayer color array. Journal of Electronic Imaging 11(3), 306–615 (2002) 5. Battiato, S., Gallo, G.B., Stanco, F.: A locally adaptive zooming algorithm for digital images. Image and Vision Computing 20(11), 805–812 (2002) 6. Pei, S.C., Tam, I.K.: Effective color interpolation in CCD color filter arrays using signal correlation. IEEE Trans. Circuits and Systems Video Tech. 13(6), 503–513 (2003) 7. Lukac, R., Plataniotis, K.N., Hatzinakos, D., Aleksic, H.: A novel cost effective demosaicing approach. IEEE Trans. Consum. Electron. 50(1), 256–261 (2004) 8. Li, X.: Demosaicing by successive approximation. IEEE Trans. Image Process. 14(3), 370–379 (2005) 9. Gunturk, B.K., Altunbasak, Y., Mersereau, R.M.: Color plane interpolation, using alternating projections. IEEE Trans. Image Process. 11(9), 997–1013 (2002) 10. Wu, X., Zhang, L.: Joint Spatial-Temporal Color Demosaicking. In: Kalviainen, H., Parkkinen, J., Kaarna, A. (eds.) SCIA 2005. LNCS, vol. 3540, pp. 235–252. Springer, Heidelberg (2005) 11. Lukac, R., Plataniotis, K.N., Hatzinakos, D.: Color image zooming on the Bayer pattern, using alternating projections. IEEE Trans. Circuits Syst. Video Technol. 15(11), 1475– 1492 (2005) 12. Alleysson, D., Susstrunk, S., Herault, J.: Color demosaicing by estimating luminance and opponent chromatic signals in the Fourier domain. In: Proc. IS&T/SID 10th Color Imaging Conference, vol. 10, pp. 331–336 (2002) 13. Zapryanov, G., Nikolova, I.: Demosaicing methods for pseudo-random Bayer color filter array. In: Proc. ProRisc 2005, pp. 687–692 (2005) 14. Lukac, R., Plataniotis, K.N., Hatzinakos, D., Aleksic, M.: A new CFA interpolation framework. Signal Process 86(7), 1559–1579 (2006) 15. Chung, K.H., Chan, Y.H., Fu, C.H., Chan, Y.L.: A high performance lossless Bayer image compression scheme. In: Proc. IEEE ICIP 2007, vol. 2, pp. II-353–II-356 (2007) 16. Chung, K.-L., Yang, W.-J., Yan, W.-M., Wang, C.-C.: Demosaicing of color filter array captured images using gradient edge detection masks and adaptive heterogeneityprojection. IEEE Trans. Image Process. 17(12), 2356–2367 (2008) 17. Chang, H.-A., Chen, H.: Directionally weighted color interpolation for digital cameras. In: Proc. IEEE ISCAS 2005, vol. 6, pp. 6284–6287 (2005) 18. Zhang, L., Wu, X.: Color demosaicking via directional linear minimum mean squareerror estimation. IEEE Trans. Image Process. 14(12), 2167–2178 (2005) 19. http://r0k.us/graphics/kodak/
Watermarking with the UPC and DWT Evelyn Brannock and Michael Weeks
1 Introduction
Advancements in digital technologies offer new and easier methods by which consumers can purchase and enjoy creative content, for example, by downloading and viewing movies on a notebook computer. However, these advancements also enable those who did not author and do not own digital media to profit illegally from the efforts of others. Preserving control over creative digital content poses a laborious challenge to those who wish to retain the rights to their work. In the past, U.S. copyright law has tried to provide protection for this intellectual property. However, legal means have proven not to be sufficient for the creators and owners of the work, and the unauthorized copying and redistribution of their copyrighted works have caused their economic returns to decline. For example, the International Intellectual Property Alliance (IIPA) estimates the annual loss of revenue due to copyright violations in 2003 to be 1.5 billion U.S. dollars in the U.S. motion picture industry and 2.3 billion U.S. dollars in the record and music industries [7]. Consequently, the study of technological approaches to stall digital piracy has become increasingly important and relevant. One such approach is digital watermarking. Watermarks serve to identify the owner and creator of the content and thus help in investigating abusive duplication. The suggested method is to embed a UPC symbol into the digital media itself. The North American retail industry has used the Universal Product Code since 1973. There are two different versions of UPC bar codes, UPC-A and UPC-E. The UPC-A bar code is 12 digits long, including its checksum. UPC-E bar codes are special shortened versions of UPC-A bar codes and are 6 digits long. Both UPC-A and UPC-E bar codes may have optional 2 or 5 digit supplemental codes appended to them. Because a checksum is required, and the checksum calculation is complex, these characteristics make the UPC a natural choice for a watermark. The impact of embedding such a watermark, the perceptibility of the watermark, and the robustness of the embedded watermark will be investigated, in an extension of [14]. For the purposes of this study, an uncomplicated key will be used. The complexity and
size of the key do not have any effect on the visibility of the watermark. However, for commercial applications the key should be large enough that it is difficult to decipher and will deter exhaustive search attacks. In this chapter, the next section covers wavelets, section 3 examines digital watermarking principles, and section 4 discusses the method used. Section 5 presents results, and section 6 concludes the chapter.
2 Wavelets
The time and frequency information provided by the wavelet transform is well suited for the analysis of time-varying, transient signals. The Wavelet Transform suits many applications very well since most of the real-life signals encountered are time varying in nature [28]. The Discrete Wavelet Transform (DWT) is currently applied in a wide variety of applications in science, engineering, mathematics and computer science, including audio and video compression, removal of noise in audio, and the simulation of wireless antenna distribution. The DWT is used to implement a simple watermarking scheme. When the 1-D DWT is applied in both the horizontal and the vertical directions, the 2-D DWT is the result. The signal is channeled into similar (low-pass) and discontinuous/rapidly-changing (high-pass) sub-signals by the low-pass and high-pass filters of the wavelet transform. Therefore, the DWT separates an image into a lower resolution approximation image (LL) as well as horizontal (HL), vertical (LH) and diagonal (HH) detail components. The slow changing aspects of a signal are preserved in the channel with the low-pass filter and the quickly changing parts are kept in the high-pass filter's channel. The end result of the first iteration of the 2-D DWT is the decomposition of the image into sub-images, 3 details and 1 approximation. The approximation looks just like the original, only scaled down to 1/4 of the size. The regions that human vision is less sensitive to, such as the high resolution detail bands (LH, HL, and HH), can be isolated using the 2-D DWT. Therefore we can embed high energy watermarks in these regions. Embedding watermarks in these regions allows us to increase the robustness of our watermark, with little to no additional impact on image quality [12]. Multi-resolution is the process of taking a filter's output and putting it through another pair of analysis filters. The fact that the DWT can be employed in a multi-scale analysis can be exploited to improve the watermarking algorithm's efficacy. The first approximation will be used as a "seed" image and we can then recursively apply the DWT a second and third time (or however many times is necessary to find all of the areas of interest) [28]. See [13] for more background on wavelets, and [9] for wavelet history.
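As an illustration (not the chapter's original code), a single level of the 2-D DWT and its recursive application to the approximation can be obtained with the PyWavelets library as follows; the random array merely stands in for a grayscale image.

```python
import numpy as np
import pywt  # PyWavelets; any library offering a 2-D DWT would do

image = np.random.rand(256, 256)      # stand-in for a grayscale image

# One analysis level: approximation (LL) plus horizontal, vertical and
# diagonal detail sub-bands (HL, LH and HH in the chapter's notation),
# each 1/4 the size of the input.
LL, (horizontal, vertical, diagonal) = pywt.dwt2(image, "haar")

# Multiresolution: feed the approximation through the filter pair again.
LL2, details2 = pywt.dwt2(LL, "haar")

print(image.shape, LL.shape, LL2.shape)   # (256, 256) (128, 128) (64, 64)
```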
3 Digital Watermarking Hartung and Kutter define a digital watermark as a digital code unremovably, robustly, and imperceptibly embedded in the host data and typically contains information about origin, status, and/or destination of the data [5].
The intention of the digital watermark is to hide the embedded data, often without the knowledge of the viewer or user, which is a form of steganography.
3.1 Blind Watermarking There are two main types of watermarks. A blind (or public) watermark is invisible, and is extracted “blindly” without knowledge of the original host image or the watermark itself. The second is non-blind (also asymmetric marking or private), in which the watermark is embedded in the original host, and is intentionally visible to the human observer. The original data is required for watermark extraction [14, 5]. The blind watermark has many more applications than the visible watermark, and we use it in this chapter. Blind watermarking hides the embedded data, often without the knowledge of the viewer or user, so it is a form of steganography. Watermarks add the property of robustness, which is the ability to withstand most common attacks [14]. The watermark can be considered to have been successfully attacked if its existence is determined, either by visual or technological means. Attacks usually include two types: removing the watermark and rendering the watermark undetectable [11].
3.2 Scoring and Evaluation
There is usually a compromise among the size of the watermark, the watermark robustness, and its perceptibility [10]. The goal is to programmatically emulate the human visual system with an objective measure that mirrors the response from human observers [3]. Therefore, the quality measure used for this evaluation of the embedding and extracting of the UPC symbol in the watermarking algorithm is the IQI (Image Quality Index) proposed by Wang and Bovik [16]. The IQI is designed by modeling any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion. The IQI does not explicitly employ the human visual system model, and the index is described mathematically. Wang and Bovik report that experiments on various image distortion types show that it exhibits surprising consistency with subjective quality measurement [16].
The implementation of the IQI in this chapter supports Wang and Bovik's conclusions. While the reader may not have the same comfort level with the IQI as with the PSNR, we want to establish that the IQI does a good job of measuring quality for this application. Earlier research with the PSNR tended to be misleading, as one recovered image could have a negligible difference in PSNR compared to a second image, while its recovered watermark was clearly superior. For visual corroboration, compare figure 1 to the recovered watermarks shown in section 5, such as figures 2 and 3. The IQIs of the recovered watermarks measure 0.4309 and 0.4237 respectively, and the watermark is clearly visible.
If x is the original signal of length N and y is the test signal of the same length N, the IQI Q is defined as

Q = 4 σ_{xy} x̄ ȳ / [(σ_x^2 + σ_y^2)((x̄)^2 + (ȳ)^2)]

[16], [17]. One component of the measurement is the linear correlation between x and y. The next measurement is the mean luminance between x and y. The last component measures how similar the contrasts of the images are. The range of Q is −1 to 1. The best value is 1, which occurs when the tested image is exactly equal to the original image, and the worst value occurs at −1 when y_i = 2x̄ − x_i for all i = 1, 2, ..., N. This represents two images that are "opposites." For example, in a black and white image, for the IQI to be −1, every location in which x contains a zero, y would contain a pixel of the value one, and for every one in x, y would have a zero in its place.
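For reference, the index can be computed as follows. Note that Wang and Bovik apply the index over sliding windows and average the local values, whereas this sketch evaluates it globally over the whole signal.

```python
import numpy as np

def iqi(x, y):
    """Wang-Bovik quality index Q for two equally sized signals,
    evaluated globally (the original method slides a window over the
    image and averages the local Q values)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))

print(iqi(np.arange(16), np.arange(16)))   # identical signals -> 1.0
```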
3.3 Watermarking Techniques
All watermarking methods share a watermark embedding system and a watermark extraction system [5, 14]. There are two main watermarking techniques available: spatial domain and frequency domain. The technique used in this chapter embeds the watermark using the DWT, i.e., a frequency domain method. Signal transforms other than wavelets have historically been used for watermarking, including the global [18] and block-based DCT [19], the DFT [20], and the Fourier-Mellin transform [21]. Other methods combine multiple transforms. Some use the properties of the DFT and the DWT ([22]) while others utilize the DCT and the DWT, as Corvi does in [23]. Meerwald, in [24], gives compelling reasons why wavelets are excellent for watermarking. The DWT performs well for analyzing image features such as edges and textures. The multiresolution capability allows the watermarker to reach signal information that may not otherwise be obtainable when using other transforms. The wavelet also has superior HVS modeling: the way that the wavelet transform breaks down an image into the approximation and the details closely mimics how the human eye is thought to behave. As compared to the DCT, whose computational complexity is O(n log n), the DWT's complexity is O(n). As well, the wavelet transform is flexible and can be chosen according to the properties of the image [24]. One of the most exploited characteristics of wavelets is multiresolution. Three or four octaves may be used, depending on how large the image is. As well, watermarking methods have been proposed that do not rely on multiresolution and embed the information in the low-frequency subbands of the first octave [25]. There are some commercially distributed watermarking software applications, such as VeriData iDem and SureSign Enterprise from Signum Technologies [26]. Visual Watermark, from Visual Watermark, provides a batch watermarking system that aims to protect digital photos [27].
4 Implementation
A simple raster UPC watermark image was embedded in each of the images. It was created using the UPC Bar Codes 4.0 software from Elfring Fonts Inc. [2]. The bar code was used for experimental and research purposes only; it was not assigned by the Uniform Code Council [4], nor is it meant for industry use. The watermark is 140 × 76 pixels.

Fig. 1 UPC watermark bitmap embedded
The Haar wavelet, the orthogonal 4-coefficient Daubechies wavelet (i.e., db2), the 32nd order Daubechies wavelet (i.e., db32), three biorthogonal wavelets, including a reverse biorthogonal wavelet (bior2.2, bior5.5, rbio6.8), the 8-coefficient symlet wavelet, and the 4th order coiflet wavelet were used to compare results.
4.1 Images Used Varying sizes, complexity and types of images were chosen. All images are grayscale. The image database consists of ten images including “Barbara” - 512 × 512 pixels, “Boat” - 512 × 512 pixels [15],“Circuit” - 272 × 280 pixels, “Dog on Porch” - 256 × 256 pixels [28], “Frog” - 620 × 498 pixels [15], “Gold Hill” 512 × 512 pixels [15], “Library” - 464 × 352 [15], “Mandrill” - 512 × 512 pixels (available at [15]), “Mountain” - 640 × 580 pixels [15], and “Text” - 256 × 256 pixels [15]. All of the images have been used with permission or are non-proprietary.
4.2 Algorithm
First we apply the 2-D DWT to the image. This gives us four quadrants, each 1/4 the size of the original image. We manipulate two quadrants of coefficients, the horizontal details (HL) and vertical details (LH), because these are most likely the least visible to the human eye. A pseudo random noise pattern is generated using the secret key image as a seed, and each of the bits of the watermark is embedded in the horizontal (HL) and vertical (LH) coefficient sub-bands using this pattern. The equation used to embed the watermark is:

W'_i = W_i + α W_i x_i    for all pixels in LH, HL
W'_i = W_i                for all pixels in HH, LL   [8]
W'_i is the watermarked image, W_i is the original image, and α is a scaling factor. Increasing α increases the robustness of the watermark, but decreases the quality of the watermarked image. We use the same α (the constant 2 on a scale of 1 to 5) as used in [8]. Finally, we write the image and calculate the IQI. To extract the watermark, we apply the 2-D DWT to the watermarked image W*_i. The same secret key is used to seed the random function and to generate the pseudo random noise pattern. The correlation, z, between the pseudo random noise and the horizontal and vertical details is found, and if that correlation exceeds a threshold (the mean of the correlation), a pixel in the watermark is located. y_i is a candidate pixel of the watermark and M is the length of the watermark.

z = (1/M) Σ_{i=1}^{M} W*_i × y_i   [8]
As a last step, the extracted watermark is written and the IQI is calculated to determine the success of the operation.
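A minimal sketch of the embedding step only is shown below, assuming PyWavelets for the DWT. The parameter names, the tiling of the watermark bits over the sub-bands, and the stand-in input data are illustrative assumptions, not the exact implementation used in the experiments.

```python
import numpy as np
import pywt  # PyWavelets stands in for the wavelet toolbox used in the chapter

def embed_upc_watermark(image, watermark_bits, key, alpha=2.0, wavelet="haar"):
    """Embedding step only: W' = W + alpha * W * x on the horizontal and
    vertical detail sub-bands, where x is a key-seeded pseudo-random
    pattern modulated by the watermark bits; LL and HH are untouched."""
    LL, (H, V, D) = pywt.dwt2(image.astype(np.float64), wavelet)
    rng = np.random.RandomState(key)          # the secret key seeds the pattern
    for band in (H, V):
        pn = rng.choice([-1.0, 1.0], size=band.shape)
        bits = np.resize(watermark_bits.ravel(), band.shape)  # tile the bitmap
        band += alpha * band * (pn * bits)
    return pywt.idwt2((LL, (H, V, D)), wavelet)

# Example with stand-in data: a 256x256 "image" and a 140x76 binary mark.
img = np.random.rand(256, 256) * 255
upc = (np.random.rand(140, 76) > 0.5).astype(float)
marked = embed_upc_watermark(img, upc, key=42)
```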
5 Results
The following images (figures 2 through 7) are some of the representational results of the watermarking implementation in this project. The caption for each image states the name of the original image, the wavelet used, and the IQI received. Table 1 shows the statistics obtained: it examines the watermark embedding process, showing the average IQI obtained across the ten images, and it also examines the watermark extraction results. When viewing the resulting recovered UPC watermarks, recall that the scanner reads the bars only. The numbers are only necessary if the scanner cannot read the bars and the UPC must be entered manually. However, both the bars and the numbers are quite clear.

Fig. 2 Recovered UPC watermark from the frog image using the Haar wavelet with an IQI of 0.4309
Fig. 3 Recovered UPC watermark from the boat image using the Haar wavelet with an IQI of 0.4237
Fig. 4 Recovered UPC watermark bitmap from the gold hill image using the Haar with an IQI of 0.4227
Fig. 5 Recovered UPC watermark bitmap from the Barbara image using the Haar with an IQI of 0.4091
Fig. 6 Recovered UPC watermark bitmap from the library image using the Haar with an IQI of 0.3387
Fig. 7 Recovered UPC watermark from the frog image using the biorthogonal 2.2 wavelet with an IQI of 0.2540
Table 1 Watermark Embedding and Extraction

Wavelet          Average Embedding IQI   Average Extraction IQI
Haar                  .3056                    .3743
Daubechies            .1862                   -.2201
Daubechies 32         .2357                   -.0053
Bior 2.2              .2342                    .2051
Bior 5.5              .2095                   -.0428
Symlets 8             .2168                   -.0267
Coiflets 4            .2181                   -.0106
Rev. Bior 6.8         .2206                    .0138
All                   .2283                    .0359
6 Conclusions
Historically, the Discrete Wavelet Transform has shown its effectiveness for watermarking applications. It allows the embedding of a watermark at higher level frequencies, which are not as visible to the human eye, via access to the wavelet coefficients in the HL and LH detail sub-bands. However, the type of watermark has not been given as much attention. Using the Haar wavelet, with an extraction average of .3743, produces desirable results. From visual observation, an IQI above .30 produces an acceptable bar code. See figures 2, 3, 4, 5 and 6, with IQIs of 0.4309, 0.4237, 0.4227, 0.4091 and 0.3387 respectively, to confirm. As a point of interest, none of the other wavelet families produce acceptable values using the IQI measure, but the Haar results are very positive and well worth further examination. From Table 1, we see that the average embedding IQI ranges from .1862 to .3056. This means that embedding the watermark will distort the image, but that the Haar wavelet distorts it the least. The extraction column shows that the Haar wavelet gives the best reconstructed watermark. Though this average IQI value is low (.3743), figures 2 through 6 clearly show that it is recognizable. The lowest IQI pictured, .3387, in figure 6 is still readable, and is well below the average IQI received. If the UPC is left in its virgin state, the great majority of the bitmap is black. This study suggests some manipulation of the bar code could perhaps occur before embedding it into the digital media. In this study we see that, with no manipulation, the resulting extracted UPC is definitely readable for the results using the Haar wavelet. As a watermark, the UPC suggests itself. It is unique, standardized [4] and has properties that are understood by current and widely available technology, such as scanners. As well, it may be possible that the recovered watermark could be scanned automatically. A digital product could have the bar-code hidden (or at least inconspicuous), but perhaps detectable, as in an automatic cash-register system, adding just another safeguard to prevent the theft of digital media. This chapter uniquely shows that using a UPC symbol as a watermark is viable. It also demonstrates, for the first time, that the Haar wavelet provides superior performance using the UPC as a watermark.
References 1. 2007 Global Software Piracy Report, Technical Report, BSA (Business Software Alliance) and IDC (International Data Corporation), UCB//CSD-02-1175 (May 2008) 2. UPC Bar Code Font Set, Elfring Fonts website, http://www.elfring.com/barupc.htm/ (Accessed January 2010) 3. Eskicioglu, A.M., Fisher, P.S.: Image quality measures and their performance. IEEE Transactions on Communications 43(12), 2959–2965 (1995) 4. GS1 US BarCodes, GS1 US website, World Wide Web electronic publication, http://barcodes.gs1us.org/dnn-bcec/Default.aspx (Accessed January 2010)
5. Hartung, F., Kutter, M.: Multimedia watermarking techniques. Proceedings of the IEEE 87(7), 1079–1107 (1999) 6. International Federation of the Phonographic Industry, Technical Report IFPI music piracy report, UCB//CSD-02-1175 (June 2001) 7. International Intellectual Property Alliance, IPA 2002-2003 estimated trade losses due to copyright piracy. World Wide Web electronic publication (February 2004), http://www.iipa.com/pdf/2004SPEC301LOSS.pdf 8. Inoue, H., Miyazaki, A., Katsura, T.: An image watermarking method based on the wavelet transform. In: ICIP 1999. Proceedings. 1999 International Conference on Image Processing, vol. (1), pp. 296–300 (1999) 9. Jaffard, S., Meyer, Y., Ryan, R.D.: Wavelets Tools for Science and Technology. Society for Industrial and Applied Mathematics. SIAM, Philadelphia (2001) 10. Johnson, N.F., Duric, Z., Jajodia, S.: Information Hiding: Steganography and Watermarking - Attacks and Countermeasures. In: Advances in Information Security, vol. 1. Kluwer Academic Publishers, Norwell (2006) 11. Kirovski, D., Petitcolas, F.A.P.: Blind Pattern Matching Attack on Watermarking Systems. IEEE Transactions on Signal Processing 51(4), 1045–1053 (2003) 12. Langelaar, G., Setyawan, I., Lagendijk, R.L.: Watermarking Digital Image and Video Data. IEEE Signal Processing Magazine (17), 20–43 (2000) 13. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Pattern Analysis and Machine Intelligence 11(7), 674–693 (1989) 14. Petitcolas, F.A.P., Anderson, R.J., Kuhn, M.G.: Information hiding - A survey. Proceedings of the IEEE 87(7), 1062–1078 (1999) 15. Some test images are from: Vrscay, E.R., Mendivil, F., Kunze, H., La Torre, D., Alexander, S.K., Forte, B.: Waterloo Repertoire GreySet (1 and 2), In: Waterloo Fractal Coding and Analysis Group website, University of Waterloo, Waterloo, Ontario, Canada, http://links.uwaterloo.ca/greyset1.ba-se.html and http://links.uwaterloo.ca/-greyset2.base.html (Accessed January 2010) 16. Wang, Z., Bovik, A.: A universal image quality index. IEEE Signal Processing Letters 9(3), 81–84 (2002) 17. Wang, Z.: IQI, IQI website (2004), http://www.cns.nyu.edu/˜wang/files/ -research/qualityindex/demo.html (Accessed January 2010), web Page 18. Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: Secure Spread Spectrum Watermarking for Multimedia. IEEE Transactions on Image Processing 6(12), 1613–1621 (1995) 19. Koch, E., Zhao, J.: Towards Robust and Hidden Image Copyright Labeling. In: Proceedings of 1995 IEEE Workshop on Nonlinear Signal and Image Processing, June 1995, pp. 452–455 (1995) 20. Ramkumar, M., Akansu, A.N., Alatan, A.A.: A robust data hiding scheme for images using DFT. In: IEEE Proceedings of the International Conference on Image Processing, pp. 211–215 (1999) 21. Lin, C.-y., Wu, M., Member, S., Bloom, J.A., Cox, I.J., Member, S., Miller, M.L., Lui, Y.M.: Rotation, scale and translation resilient watermarking for images. IEEE Trans. Image Processing 10, 767–782 (2001) 22. Onishi, J., Matsui, K.: A method of watermarking with multiresolution analysis and pseudo noise sequences. Systems and Computers in Japan 10(5), 11–19 (1998) 23. Corvi, M., Nicchiotti, G.: Wavelet Based Image Watermarking for Copyright Protection. In: The 10th Scandinavian Conference on Image Analysis (SCIA 1997), June 1997, pp. 157–163 (1997)
24. Meerwald, P., Uhl, A.: A survey of wavelet-domain watermarking algorithms. In: Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents III, pp. 505–516 (2001) 25. Xie, L., Arce, G.R.: Joint Wavelet Compression and Authentication Watermarking. In: 1998 Proceedings of the IEEE International Conference on Image Processing - ICIP, October 1998, pp. 427–431 (1998) 26. Signum Technologies, SureSign Enterprise website, http://www.signumtech.com (Accessed April 2010) 27. Visual Watermark, Visual Watermark website, http://www.visualwatermark.com (Accessed April 2010) 28. Weeks, M.: Digital Signal Processing Using MATLAB and Wavelets. Infinity Science Press (2006)
Building a Novel Web Service Framework -Through a Case Study of Bioinformatics Hei-Chia Wang, Wei-Chun Chang, and Ching-seh Wu
Abstract. A service-oriented framework aimed at composing requested Web services automatically is proposed and described. The concept of service-oriented architecture is regarded as an open standard for utilizing distributed software components. The main characteristics of the architecture include better interaction, distribution, and easy communication, with the Internet as the communication platform among the components. The flexibility of choosing the most suitable software components to satisfy various kinds of service requests over the Internet is a key strength of the architecture. A central issue of the service concept is that the available Web service components can be provided by different providers; the composition process then has to target the most suitable ones according to the users' criteria. Another ongoing research issue is the transformation of a user's request (recorded as a workflow) into an XML-based (eXtensible Markup Language) format for the composition process. Searching for the most suitable Web service components depends on the criteria set by the users, and quality of service can help to locate these target service components. We propose a solution framework that deals with the aforementioned issues. To illustrate the feasibility of our framework's design, a case study from the biological field was applied and the results are presented and discussed. Keywords: Web service, Web service composition, workflow, XML.
1 Introduction
Under the flourishing development of Web technologies, applications over the Internet have evolved from simple communication to computational-based
Hei-Chia Wang Institute of Information Management, National Cheng Kung University Wei-Chun Chang Department of Information Management, Shu-Te University e-mail:
[email protected] Ching-seh Wu Department of Computer Science and Engineering, Oakland University, Michigan, USA
information systems [1]. The exploration and development of Web technologies provide the global infrastructure for exchanging and sharing information and computational resources over the Internet. The Internet has become a global platform where organizations and individuals can communicate with each other in order to carry out various commercial activities and to provide value-added services. The concept of composing Web services (WS) is the embodiment of implementing these activities and services. Supporting technologies for WSs provide users the functionality to choose and combine the required service components in order to fulfill their requests. Enterprises also benefit from using WSs to reduce transaction time and to improve the productivity of their work over the Internet. Accordingly, it is necessary for a company to utilize WSs to communicate and carry out business activities with its partners in order to maintain its advantages and remain competitive on the market [2]. Web service technology is thus not only offering competitive advantages, but also becoming the key to staying competitive. Currently, the composition and utilization of WSs conform to the service-oriented architecture (SOA) design proposed by the WWW consortium [3]. In the SOA, service providers register their service components at a public Universal Description, Discovery, and Integration (UDDI) server, and service requesters discover from an UDDI server the service components required to accomplish the functionalities they request. The SOA implementation will soon encounter great difficulties in searching for and composing suitable service components when large numbers of service components are registered for public usage. Moreover, matching the requirements of customers who request services over the Internet is another challenging issue, in which the composition process and recommendation strategies are concerned. The key to successfully utilizing WSs is to search for and organize a set of service components from an UDDI server that are suitable to perform the functionalities matching the users' criteria. Currently, most UDDI servers only provide simple classifications and service descriptions for the service registrants; therefore, it is very difficult for requesters to search for the suitable service components they need. One of the key stages in the SOA is to search all available WS components published or registered by various service providers. UDDI servers provide standard taxonomies that can be used to describe each entry (a WS component) using search terms conforming to industry standards. Meanwhile, service providers can register their service components based on this categorization. Following the categorization, service requesters can quickly query service components and efficiently compose them together. A foreseeable problem is the efficiency of searching for suitable components through the UDDI categorization when large numbers of service components are available. Moreover, current categorization methods only focus on business classification, product and service types, and service provider location. To correctly discover the suitable service components, a further classification of the functions that the WSs offer should be added. Therefore, searching for suitable WS components in order to complete a WS composition is an urgent problem that remains to be solved.
In this article, we study the possibility of constructing a framework that can process the WS requesting procedure automatically. The framework provides the functionalities of service classification, a graphical user interface design for
constructing workflows, and an effective and automated WS composition method, including the transformation of user requirements into an eXtensible Markup Language (XML) representation, workflow establishment, service search, and composition recommendation.
2 Literature Review
2.1 Web Service Request
According to the document published by the WWW consortium [3], WSs can be described as software entities that are capable of delivering certain functionalities for customers over a network. Following the definitions and specifications of WS, any organization, company, or even individual developer who can deliver such functional entities can register and publish their service components to an UDDI registry for public usage. The concept of WS composition is to compose several independent service components in order to accomplish new functional services requested by customers [4]. The composition process is very complex because of the interoperability issues of integrating various service components to match the task requirements in a workflow. The key to successfully composing useful components for a workflow is to automate the composition process instead of manually integrating all required service components. Technologies to support the concept of automated service composition include search automation, service selection, and consistency assurance between semantics and data types [5]. In addition, several composition languages have been proposed to improve the performance of the WS composition process, e.g. Web Service Flow Language (WSFL, reference), XML Language (XLANG, reference), Business Process Execution Language For Web Services (BPEL4WS, reference), DARPA Agent Markup Language for Services (DAMLS, reference). To complete and satisfy a WS request, a corresponding process is required. The process aims to accomplish the implementation of the services from collecting user requirements to the composition of corresponding service components.
2.2 Before the Composition - Organizing the Request Workflow
A service request can be decomposed into a sequence of tasks, where each task is accomplished by a WS component [6]. The objective of the process is to search for suitable WS components and combine them in order to produce the functional results requested by users. The starting point of utilizing WSs is the requirements from customers. Generally, we can apply a use-case or scenario-based method to elicit and represent the requirements. As a result, a workflow is generated as the service request (input) in the SOA. Therefore, service requests can be treated as Internet workflows that require corresponding pieces of software to complete the task sequence requested by users.
2.3 Composition Solution - Technologies to Support WS Composition
When an enterprise begins to utilize some basic WSs, higher-level functional demands, such as service security, service composition, and service semantics, will arise afterwards. These demands are critical to successfully deploying WSs in the enterprise [7]. Supporting research has been proposed to improve the performance of the automated composition process, and several composition models have been proposed to accomplish it. However, these models have shortcomings that need to be studied in order to improve their performance. For instance, in WebDG and IRS-II, users are strongly expected to have the ability to describe their demands and procedures in an XML-based format, and these tools do not provide an effective user interface to assist users through the design procedure. As a result, it is very difficult to transform users' demands into formatted workflows. Regarding workflow construction, WebDG and Model Driven Service Composition [8] do not reuse workflows in WS composition, and these models do not demarcate workflows according to their abstraction level. Two problems occur due to the inefficiency of workflow reusability and the repetition of workflow construction. In addition, the process used to search for suitable service components is difficult to automate in most of the aforementioned composition models.
3 Composition Integration - Proposing an Automated Service Composition Framework
To solve the composition integration problem, an automated WS composition framework is proposed that transforms users' demands into an XML-based representation format and carries out the composition process automatically (see Figure 1). The framework includes three main functional components to achieve this goal: a Graphical User Interface (GUI) tool to assist the workflow construction, a searching mechanism based on a semantic classification technique, and an automated composition process. The GUI tool is used to transform users' demands into workflows; it converts incoming workflows into rules stored in a database, which can then be used to infer the WS composition automatically. For the construction of the second functional component, i.e. the searching mechanism, we utilize a semantic categorization technique. The working principle is to categorize a service component based on its WS description. Each WS is associated with several classifications according to its functional description, which makes it faster to search for similar service components. Accordingly, a multidimensional search mechanism is applied to look for the most suitable WS components matching the criteria of the service process, and to recommend solutions to customers in order to automate the process of WS composition. The approach utilizes Q-grams, semantic information, and quality of service (QoS) to calculate the similarity between tasks in the workflow and WSs that have the same functionalities in
order to increase its correctness. A key issue is to consider the composability of WS components [8]. Finally, we integrate these functional components and propose an automated framework to combine suitable WS components for customers.

Fig. 1 The architecture diagram of the automated composition framework.
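For illustration, a simple Q-gram similarity between a task description and a service description can be computed as follows. The Dice coefficient used here is one common choice of q-gram similarity; the chapter does not fix the exact formula, and the example strings are hypothetical.

```python
def qgrams(text: str, q: int = 3) -> set:
    """Set of letter q-grams of a lower-cased string, padded so that the
    word boundaries contribute grams as well."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over q-gram sets; one common q-gram similarity."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2.0 * len(ga & gb) / (len(ga) + len(gb))

print(qgram_similarity("blast sequence alignment", "sequence alignment (BLAST)"))
```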
3.1 The Infrastructure of the Framework
3.1.1 Request Workflow
The foundation of satisfying users' requests over the Internet comprises the analysis and construction of the service workflow, the automation of composing suitable WS components, and WS component reusability. To satisfy users' criteria, we propose a framework (see Figure 1) in which a repository section and several modules are integrated into the SOA. A structure model is proposed and shown in Figure 2. In this research, we focus on the grey blocks presented in the figure. In the structure model, three participants are presented: service provider, process constructor, and client. The roles of the participants are defined as follows:
• Service provider: The service provider utilizes WSDL combined with QoS, precondition, and effect information to describe a specific application program, and registers the Web service to the UDDI server; the UDDI server then creates a link to the Web service.
• Process constructor: The process constructor builds, for a specific domain, the execution order of the operating or planning activities of the services; these processes are later used in Web services composition.
• Client: The client states his demands in order to obtain the service composition dynamically via the service-oriented platform and thereby achieve his goal.
Fig. 2 The proposed architecture of the automatic Web services composition model.
In addition, this research offers a broker mechanism, a database, and workflows that operate with UDDI to obtain a workflow composed automatically to satisfy the user's demand. This broker mechanism includes six modules, which are defined and explained as follows.
• Classification Module: When a service provider registers a specific application to the Web Services Registry (UDDI), the system derives functional keys from the Web service description given by the provider and assigns the Web service to a number of classifications accordingly. This makes it possible, when searching for a Web service, to filter out services with dissimilar functions and to improve the search efficiency and accuracy.
• Rule Convert Module: When the process constructor builds a workflow via the browser, we offer a graphical user interface for constructing the workflow and utilize the Rule Convert Module to change the workflow into a rule format. The Web service description language is used to describe the execution order of the service operations or plans in a specific domain, which is then used in Web services composition.
• Request Convert Module: Transfers the functional requirements into a WSDL document via the browser.
• Process Control Module: Offers a mechanism to change the WSDL produced by the Request Convert Module into a Web service workflow. The workflow shows the order in which the Web services are carried out; it is divided into two kinds, the Abstract Web Process and the Concrete Web Process, the difference being that the Concrete Web Process points out the linked Web services explicitly.
• Process Select Module: According to the Abstract Web Process and the user's demands for the Web service, many Concrete Web Processes may be produced, so the Process
Select Module offers a mechanism to choose the Concrete Web Process that satisfies the user's demands.
• Execution Control Module: The Execution Control Module includes two parts, the Execution Code and the Fault handling mechanism. The Execution Code transfers the Concrete Web Process into a Web service description language document and uses the workflow engine to interpret and execute the document. The Fault handling mechanism offers a fault-tolerant way to choose a suitable substitute Web service automatically when an incorrect link causes a failure, or when the communication or information of the application program providing the service becomes invalid.
With the above-mentioned modules, clients only need to choose the required functions through the browser; the Web services can then be composed automatically, the best process can be chosen from the many feasible Web service compositions, and the resulting Web services description language document can be transferred, executed, and linked to the Web services. The functional classification of Web services increases the search efficiency and accuracy, and thus greatly strengthens the management of automatic Web services composition. Reusing workflows through the Process Control Module can reduce the development time of Web service processes and achieve the goal of automatic composition. We now introduce our services composition structure from the viewpoint of the three roles: process constructor, service provider, and client.
3.1.2 Roles
In the architecture, three main roles are identified to cooperate in the process: the service provider, the process constructor, and the client. The service provider uses the Web Service Description Language (WSDL), combined with Quality of Service (QoS), precondition, and effect information, to describe a specific WS component (see Figure 1). The service component is then published to a UDDI server for public usage.
Fig. 3 Automatic Web services composition structure: process constructor.
Process constructor
The process constructor constructs a sequence of operating or planning service activities in a specific domain (see Figure 3). These processes are used afterwards in the composition of WSs. As Figure 3 shows, when the process constructor uses the process construction interface (Figure 4) to build a process, the system converts the user-constructed process into rule structures and stores them in the rule base, where they can be used when constructing processes via inference.
Fig. 4 Process construct interface.
We adopt BPEL4WS to describe the workflow in order to express how a succession of Web services is operated; the workflow is then stored in the workflow database so that the workflow engine can use it when the workflow is executed. The process construction interface is divided into three main blocks: the left side shows the input parameters, the output parameters, the workflows that already exist, and the functional classification; the right side is the workflow input area; and the buttons below are the controls for constructing the workflow. When constructing the execution order of a workflow, a software developer or organization puts several activities together to reach a specific goal and follows the established process in the conventional way of carrying it out. Because different process types carry out their orders somewhat differently, we gather the processes proposed in [9, 10] as the basic process composition modes used in our research.
Service provider
Service providers register their Web services and provide the relevant organizational information, information about the service, and QoS information. The functional description analysis method mainly uses the Classification Module to analyze the Web service description. First, words without information value (stopwords), such as "and" and "of", are removed. We adopt the stopword list proposed by Fox [11] and also remove duplicate words to obtain the Web service's functional keywords. We then apply the Porter stemming algorithm to stem the functional keywords so that each word is expressed by its true features, and we use WordNet to expand the meaning of each word. Finally, we check whether the same functional classification has already been stored in the Service Classification database: if the classification already exists, the Web service is assigned to it according to the functional keywords it offers and the assignment is stored in the Service Classification database; otherwise, a new classification is set up and the Web service is assigned to it. With this method, the more functional keywords are obtained from a Web service description, the more classifications the Web service belongs to.
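To make the classification step more concrete, the following Python sketch illustrates one possible reading of the pipeline just described (stopword removal, Porter stemming, WordNet-based expansion, and assignment to classifications). It is only an illustrative approximation, not the authors' implementation: the stopword set here is a tiny placeholder rather than Fox's full list, and the ServiceClassifier class, its method names, and the in-memory classification store are hypothetical.

```python
# Illustrative sketch only; assumes nltk with the 'wordnet' corpus installed
# (nltk.download('wordnet')). Class and method names are hypothetical.
from collections import defaultdict
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet

STOPWORDS = {"and", "of", "the", "a", "to", "for"}  # placeholder, not Fox's full list


class ServiceClassifier:
    def __init__(self):
        self.stemmer = PorterStemmer()
        self.classifications = defaultdict(set)  # functional key -> set of service ids

    def functional_keys(self, description: str) -> set:
        # 1) tokenize, drop stopwords and duplicates
        tokens = {t.lower() for t in description.split() if t.lower() not in STOPWORDS}
        # 2) stem each remaining token (Porter stemming)
        keys = {self.stemmer.stem(t) for t in tokens}
        # 3) expand word meanings with WordNet synonyms
        for t in tokens:
            for syn in wordnet.synsets(t):
                keys.update(lemma.lower() for lemma in syn.lemma_names())
        return keys

    def register(self, service_id: str, description: str) -> None:
        # Assign the service to every classification named by one of its keys;
        # an unseen key implicitly creates a new classification.
        for key in self.functional_keys(description):
            self.classifications[key].add(service_id)


# Usage sketch:
# classifier = ServiceClassifier()
# classifier.register("WS-042", "alignment and analysis of gene sequences")
```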
Client
From the client's point of view, the service composition life-cycle consists of the following five phases:
• Planning phase: The user chooses the functions the process needs and provides the Web service quality requirements and relevant information such as input and output.
• Definition phase: This phase mainly transfers the functions determined by the client, together with the relevant information, into a requirement specification in WSDL form.
• Scheduling phase: In this phase the requirement specification is used to produce the abstract composition automatically, via the rule base and the abstract Web process database.
• Construction phase: This phase produces the concrete and unambiguous service composition process (Concrete Composition) that explicitly points out the service links. We subdivide it into a Matchmaking Phase (which searches for Web services that match each service operation or planned activity of the abstract service composition process) and a Recommendation Phase (which offers a mechanism to choose the most suitable process from the feasible concrete service processes produced by the Matchmaking Phase and hands it over to the Execution phase).
• Execution phase: The concrete service process is transferred into a Web service composition specification file, and the process is carried out by the workflow engine.
In the service composition life-cycle, we focus mainly on the first four phases. The client posts demands in order to obtain the composed service components dynamically via the service-oriented platform (see Figure 2).
3.1.3 Composability Mechanism
The composability mechanism is mainly based on the abstract service composition produced in the planning phase. The mechanism combines suitable WS components based on a search over a UDDI server. It is divided into two major parts, the match and composability functions (see Figure 5). For the match function, we adopted a multi-dimensional approach including q-grams [10], semantic information, and QoS to compare the similarity between tasks in the process and to identify WS components with the same functionality, in order to increase the accuracy of the composition results. We used a composability function [8] to compare the parameters (i.e., I/O data types and bindings) of WS components so as to verify that the components are able to communicate with each other.
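As a rough illustration of how such a match function could look, the sketch below compares task and service names with a Dice-style overlap of 3-gram multisets and checks composability by comparing declared output and input data types. The paper does not give the exact similarity formula or data structures, so the q-gram coefficient, the ServiceDesc record, and the 0.6 threshold are assumptions for illustration only.

```python
# Hedged sketch: q-gram similarity and a simple I/O-type composability check.
from dataclasses import dataclass, field
from collections import Counter


def qgrams(text: str, q: int = 3) -> Counter:
    """Multiset of q-grams of a padded, lower-cased string."""
    padded = f"{'#' * (q - 1)}{text.lower()}{'#' * (q - 1)}"
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice-style overlap of q-gram multisets in [0, 1] (one common variant)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    common = sum((ga & gb).values())
    return 2.0 * common / (sum(ga.values()) + sum(gb.values()))


@dataclass
class ServiceDesc:            # hypothetical component description
    name: str
    inputs: set = field(default_factory=set)   # input data types
    outputs: set = field(default_factory=set)  # output data types


def matches_task(task_name: str, service: ServiceDesc, threshold: float = 0.6) -> bool:
    return qgram_similarity(task_name, service.name) >= threshold


def composable(upstream: ServiceDesc, downstream: ServiceDesc) -> bool:
    """True if the upstream outputs cover the downstream inputs (I/O binding check)."""
    return downstream.inputs <= upstream.outputs
```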
3.2 Quality of Composition Mechanism
When multiple WS compositions with similar functionalities are generated, we need a recommendation mechanism to assist users in targeting the most suitable composition so that they can accomplish their requests. Commonly, the Collaborative Filtering Approach (CFA) [12] is used to recommend a suitable WS composition to the user. The CFA considers the common characteristics of community members and their suggestions to propose a solution.
Fig. 5 Composability Mechanism.
However, product recommendation differs from workflow recommendation: the quality measurement of the WS components is lost in the CFA, so it is unsuitable for WS quality evaluation. In the WS composition process, we adopted the stochastic workflow reduction (SWR) algorithm to evaluate QoS. The algorithm accommodates four models for the evaluation: sequential, parallel, conditional, and loop. We added an evaluate process formula (EPF) to quantify the measurement for the sorting scheme applied in the recommendation mechanism. The EPF is given in equation (1):

EPF(P) = PQoSD(P, P̄, time) + PQoSD(P, P̄, cost) + PQoSD(P, P̄, reliability)    (1)

where P is the flow, P̄ is the average flow, time is the execution time, cost is the cost, and reliability is the implementation reliability. To calculate the QoS difference between flow P and the average QoS, we used two further equations (eqs. 2 and 3):

PQoSD(P, P̄, QoS_dim) = (P − P̄)/P̄ if an increase in the dimension is a benefit, and −(P − P̄)/P̄ if a decrease is a benefit    (2)

P̄ = (Σ_{i=1}^{j} P_i) / j    (3)

where QoS_dim represents the different QoS attributes, P_i indicates the ith flow, and j indicates the number of flows. The EPF(P) is used to compare the current workflow with the average workflow. If a flow has an EPF(P) higher than the average flow, the flow obtains a higher ranking in the recommendation list.
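The following Python sketch shows one way equations (1)-(3) could be evaluated over a set of candidate flows for ranking. It is a simplified reading of the mechanism: the flow records, the choice of which dimensions count as "increase is a benefit" (here only reliability), and the omission of the SWR reduction step itself are assumptions made for illustration.

```python
# Hedged sketch of the EPF-based ranking (eqs. 1-3); not the authors' implementation.
from statistics import mean

# Assumption: an increase in reliability is a benefit; for time and cost a decrease is.
INCREASE_IS_BENEFIT = {"time": False, "cost": False, "reliability": True}


def pqosd(value: float, average: float, dim: str) -> float:
    """QoS difference of one flow against the average flow for one dimension (eq. 2)."""
    diff = (value - average) / average
    return diff if INCREASE_IS_BENEFIT[dim] else -diff


def epf(flow: dict, averages: dict) -> float:
    """Evaluate process formula (eq. 1): sum of PQoSD over all dimensions."""
    return sum(pqosd(flow[d], averages[d], d) for d in INCREASE_IS_BENEFIT)


def rank_flows(flows: list) -> list:
    # Average flow per dimension (eq. 3), then sort by EPF, best first.
    averages = {d: mean(f[d] for f in flows) for d in INCREASE_IS_BENEFIT}
    return sorted(flows, key=lambda f: epf(f, averages), reverse=True)


# Usage with illustrative numbers:
# candidates = [{"time": 12.0, "cost": 3.0, "reliability": 0.97},
#               {"time": 20.0, "cost": 2.0, "reliability": 0.99}]
# best = rank_flows(candidates)[0]
```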
4 Case Study and Evaluation 4.1 Case Scenario To verify the feasibility of this framework, we adopted a case study from the biological domain which requests the services of analyzing the information of gene
Fig. 6 The gene sequence analyzing process.
sequences (see Figure 6). The diagram presents the workflow for analyzing gene sequences that is used in the tool. In the diagram, 90 WS components under 10 functional categories are available to serve users' requests for the analysis of biological gene sequences. Users request the analysis services through the tool's GUI. The tool allows users to operate several functional keys in cooperation with the workflow, which is used to trigger the process of automatic WS composition. After users issue their requests in the form of functional words, our tool matches the corresponding workflow and enters the process of service composition automatically.
4.2 Experimental Delimitation
Thirty participants with a biological knowledge background were selected to test the tool and to join the evaluation session. The participants accessed WS query information, and the results were collected for the tool evaluation. To verify the effectiveness of applying our framework to a special domain, we compared the results generated by our tool with the results generated by a general UDDI WS composition. The evaluation of result accuracy is based on two measurement indexes, i.e., composition time and precision value. The main purpose is to illustrate the reusability of similar workflows and the time saved when applying our tool to a specific domain. The evaluation indexes are described as follows.
• Composition time: measures the time spent on composing the whole WS request.
• Precision value: measures the relevant items discovered by the tool. The value illustrates the number of relevant flows recommended by the automatic flow composition. The precision value is measured by equation (4):

precision = (number of relevant flows retrieved) / (total number of flows)    (4)
Moreover, we adopted Likert's five-point rating scale (1 for very unsatisfied, 5 for very satisfied) for the participants' evaluation of system operation, execution time, and satisfaction with the recommended flow composition. These subjective data were also used to verify the effectiveness of the tool prototype. Three factors (usability of the system interface, execution time, and satisfaction with the recommendation) were studied through the data analysis.
4.3 Experimental Results
To investigate the effectiveness of our framework, we applied the tool to the bioinformatics domain using the WS components described above. The composition time is presented in Table 1, and the precision of composing the WS components with the tool is illustrated in Figure 7. The detailed results are described as follows.
Table 1 The composition time of completing an automatic WSs composition.
                              Number of functions
Composition time (seconds)     2     3     4     5     6
Below 50                      28    21    13     6     8
50-250                         0     7    15    22    20
250-450                        2     0     0     0     0
450-650                        0     2     1     0     0
Above 650                      0     0     1     2     2
Table 1 presents the composition time for completing an automatic WS composition: the number of participants is listed against the time spent while operating our tool to compose the necessary functional components. The data show that over 94% (170/180) of the participants spent less than 250 seconds to complete the composition for all composition combinations (2 to 6 functions). The reason that the remaining 6% of participants needed more than 250 seconds is the lack of workflow abstractions at the beginning of the operation, which made composition difficult because no reusable workflows were available for reference during the composition process. The results illustrate that reusing workflows at different abstraction levels can effectively reduce the time needed to compose a process. The next evaluation analyzes the composition time of a semi-automatic composition tool and of our automatic composition tool against different numbers of functions. The average composition time against the number of functions is illustrated in Figure 7. In the diagram, when the users adopted the semi-automatic tool, the time-spent curve rises dramatically when the number of functions increases from 5 to 6. On the other hand, the composition time of our tool maintains a steady and smooth curve. This indicates that, for the semi-automatic tool, the growth rate of composing functional components rises as the number of functions increases.
Fig. 7 The composition time of the two tools.
Fig. 8 Automatic composition precision.

Table 2 The satisfaction of using the automatic WS composing tool.
Scale                                     1      2      3      4      5
Number of participants                    0      1      3     15     11
Percentage (total: 30 participants)      0%   3.3%    10%    50%  36.7%
Average evaluation: 4.2
This is due to the complexity of composing service requests with a high number of functions. With respect to the precision rate of the composition, the main objective is to evaluate the number of recommended solutions that satisfy users' requests. The measurement equation is given in eq. (4); the results are presented in Figure 8. In the diagram, most of the precision values are higher than 0.7. Analyzing the data further, we found that the reason for some low precision values is the low number of recommended workflows. For example, if the tool recommended 5 workflows and only 2 workflows matched the users' requirement, the precision value is 0.4, which yields a low overall value.
The unsuitable recommendations are due to the trade-off decision between time and reliability (the conflicting nature of these two objectives); the optimization of conflicting objectives is beyond the scope of this paper. To investigate the satisfaction of using our tool, we adopted Likert's five-point rating scale (1 = Strongly Dissatisfied, 2 = Dissatisfied, 3 = Neutral, 4 = Satisfied, 5 = Strongly Satisfied) for the users to evaluate the usage of our tool. The evaluation focuses on the tool operation, the execution time, and the workflow recommendation. The results are presented in Table 2: more than 86% of the participants are satisfied with our tool design, which illustrates the effectiveness and usefulness of the design.
5 Conclusion
The emergence and development of network technologies have made WS usage widespread among enterprises. Therefore, managing WSs and recommending the most suitable WS composition to users according to their demands is urgently required. In this paper, we proposed a framework with a systematic structure to match these requirements. When providers publish their service components, we classify the services based on their functionality; this reduces the time needed to search for required service components. Moreover, a user-friendly interface simplifies the complexity of using the system: users can easily operate the system without special knowledge or skills. Next, to solve the communication and interoperability problems that arise when composing service components, we applied composability and match functions to automatically select the most suitable combination. Through the tool implementation of the framework, we can solve the problem of composing WS components dynamically; this is the main achievement of our study. In a real-world case study, 30 participants with a related knowledge background were invited to join the study and the experimental data were collected. Our tool reduced the composition time compared with the results of using a semi-automatic tool. Moreover, the precision value of the recommended compositions is greater than 0.7 in most cases. Finally, the satisfaction with tool usage is as high as 87% of all participants. From the experimental results, we have demonstrated the effectiveness and usefulness of our framework through the implementation of an automatic tool. Although the proposed framework performed well in composing service components automatically, several issues can be discussed further in order to improve the performance of the composition framework. First, the functionality classification proposed in our framework is based on the service descriptions; differences between function and word similarity can cause mismatched selection of service components, which also caused the minor loss of precision in our experiment. Second, we did not implement the fault handling mechanism for the case in which a faulty binding point is selected; a secondary selection could be used to replace the faulty one when such a condition occurs. In summary, the framework we propose here is effective and useful for composing WSs in the bioinformatics domain. Our framework is more comprehensive than the related service composition methodologies mentioned above, and it can easily be used to construct an automatic composition tool for enterprise WS requests.
References
1. Papazoglou, M.P., Traverso, P., Dustdar, S., Leymann, F.: Service-Oriented Computing: State of the Art and Research Challenges. In: Computer, pp. 64–71. IEEE Computer Society, Los Alamitos (2007)
2. Li, Y., Lu, X., Chao, K.-M., Huang, Y., Younas, M.: The realization of service-oriented e-Marketplaces. Information Systems Frontiers 8 (September 2006)
3. World Wide Web Consortium: Web Services Architecture (2004)
4. Dustdar, S., Schreiner, W.: A survey on web services composition. International Journal of Web and Grid Services 1, 1–30 (2005)
5. Majithia, S., Walker, D.W., Gray, W.A.: A Framework for Automated Service Composition in Service-Oriented Architectures. In: Bussler, C.J., Davies, J., Fensel, D., Studer, R. (eds.) ESWS 2004. LNCS, vol. 3053, pp. 269–283. Springer, Heidelberg (2004)
6. Huang, G.Q., Huang, J., Mak, K.L.: Agent-based workflow management in collaborative product development on the Internet. Computer-Aided Design 32, 133–144 (2000)
7. Wang, H., Huang, J.Z., Qu, Y., Xie, J.: Web services: problems and future directions. Web Semantics: Science, Services and Agents on the World Wide Web 1, 309–320 (April 2004)
8. Orriëns, B., Yang, J., Papazoglou, M.P.: Model Driven Service Composition. In: Orlowska, M.E., Weerawarana, S., Papazoglou, M.P., Yang, J. (eds.) ICSOC 2003. LNCS, vol. 2910, pp. 75–90. Springer, Heidelberg (2003)
9. Sycara, K., Paolucci, M., Ankolekar, A., Srinivasan, N.: Automated discovery, interaction and composition of Semantic Web services. Web Semantics: Science, Services and Agents on the World Wide Web 1, 27–46 (2003)
10. Cardoso, J., Sheth, A., Arnold, J., Kochut, K.: Quality of service for workflows and web service processes. Web Semantics: Science, Services and Agents on the World Wide Web 1, 281–308 (April 2004)
11. Fox, C.: A Stop List for General Text. SIGIR Forum 24, 19–35 (1989)
12. Li, Y., Lu, L., Xuefeng, L.: A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce. Expert Systems with Applications 28, 67–77 (2005)
An Architecture for Collaborative Translational Research Utilizing the Honest Broker System Christopher Gillies, Nilesh Patel, Gautam Singh, Ishwar Sethi, Jan Akervall, and George Wilson
Abstract. Translational biomedical research involves a diverse group of researchers, data sources, and data types in voluminous quantities. A collaborative research application is needed to connect these diverse elements. Two approaches are used to integrate biomedical research data: a central data warehouse approach, in which all the information is imported into one repository, and a mediator approach, which integrates the data on demand. We propose the Beaumont BioBank Integration Management System (BIMS), a collaborative research architecture based on the mediator approach utilizing the Honest Broker System. Our system provides a collaborative, flexible, and secure environment capable of accelerating biomedical research.
1 Introduction
Translational biomedical research is the application of a discovery to the practice of medicine; a transfer of research from "bench to bedside". The vision for translational research is personalized medicine (see Figure 1). Personalized medicine focuses on giving treatments to patients based on detailed knowledge of their individual genotype and disease characteristics. At the heart of personalized medicine are biomarkers generated from genomic and proteomic studies. A biomarker is a gene, a set of genes, or gene products such as proteins. The process of translational research involves a diverse group of researchers and data sources including nurses, scientists, clinicians, and biostatisticians. This diversity leads to a slow, ad hoc process in which progress depends on the completion of tasks by individuals in a sequence of events, and it leads to dispersion and lack of control over information.
Christopher Gillies, Nilesh Patel, Gautam Singh, and Ishwar Sethi
Oakland University, Department of Computer Science and Engineering
e-mail: {cegillie,npatel,singh,sethi}@oakland.edu
Jan Akervall and George Wilson
William Beaumont Hospital, Research Institute BioBank
e-mail: {jan.akervall,george.wilson}@beaumont.edu
A strong
collaborative research information technology platform that securely integrates biospecimen, clinical, laboratory analysis, and molecular data is needed to accelerate translational research. This poses several problems, as each aspect of the translational research process uses different platforms which do not normally communicate with each other and which have their own individual issues and considerations. For instance, clinical data are available from a variety of sources including medical records and pathology reports, and these may also be incorporated into electronic health records; these pose significant patient confidentiality issues that need to be overcome to enable seamless integration with research data. Laboratory analysis data come from high-throughput molecular instruments such as DNA microarrays and mass spectrometry, which produce different types of data with individualized analytical requirements. To expedite translational research, applications have been developed to combine data from multiple data sources into one centralized area. The centralized area creates a global view of the data sources; this process is data integration. Data integration allows data mining to be carried out on the integrated data sets. Two approaches have been used for data integration in the translational biomedical research domain: data warehouses and mediators. A data warehouse is a central repository into which all the data sources used are imported and which is updated periodically. A mediator answers queries by integrating data on demand. Data warehouses are faster than mediators; however, they can contain stale data because of the periodic updates. In this paper we will: 1) discuss the requirements of a computer-supported collaborative work system for biomedical research; 2) discuss systems that have been built to assist with translational research; 3) explain our architecture and workflow for the Beaumont BioBank Integration Management System (BIMS); 4) compare our architecture with other systems; and 5) elaborate on future areas of research that will improve the system. The BIMS is a secure, collaborative, and customizable translational application that will expedite translational research.
2 Computer Supported Collaborative Work Systems for Biomedical Research
Biomedical research is a complex, multidisciplinary, collaborative process that takes an enormous amount of time to complete. In this section we first discuss the requirements of a Computer Supported Cooperative Work (CSCW) system defined by Stark et al. in [1]; we then describe two integration platforms that meet these requirements to varying degrees. After describing these biomedical integration tools, we discuss the Honest Broker (HB) system defined and implemented by Boyd et al. in [2 and 3], which is a central component of the BIMS. The seven requirements defined by Stark et al. include:
1. User and Role Management. A diverse group of individuals will be using the system; therefore specific roles must be defined and access rights must be set for each role.
Fig. 1 Translational Research Vision.
2. Transparency of physical storage. Data is spread across many systems; this complexity must be hidden from the researchers. A CSCW should be based on a Service Oriented Architecture for maximum flexibility.
3. Flexible data presentation. The multidisciplinary nature of the CSCW creates a need for customizable data presentation for each user role.
4. Flexible integration and composition of services. Investigators use many analysis tools for biomedical research; these tools need to be available through the CSCW.
5. Support of cooperative functions. Collaborative functions such as wikis, discussion boards, or blogs should be available to researchers to increase communication.
6. Data-coupled communication mechanisms. Data objects such as images should be compatible with communication mechanisms such as the discussion board.
7. Knowledge creation and knowledge processing. If a hypothesis is verified through a medical study, this knowledge may be used in subsequent research. Methods to formalize and share the acquired knowledge are necessary.
An example of a data warehouse-based solution was developed at the University of Leipzig by Kirsten et al. in [4]. Two systems were used in the integration process: a study management tool called eRN and a data warehouse called GeWare. The eRN system was used to collect personal, clinical, and pathological data from predefined web forms. GeWare had information imported into it from eRN, microarrays, laboratory annotations, and publicly available gene/clone annotations. A mapping table containing a patient identifier and the chip identifier was used to link the information imported into GeWare. In addition, GeWare provides mechanisms to query laboratory analysis fields together with clinical fields using standard Boolean operators. The query result identifies the chips, patients, or genes that can be exported
for further analysis. Overall, Kirsten et al. developed a central repository that contained all the data from the various data sources; data mining tools were then built against the GeWare central repository. A mediator-based approach was developed as a part of the caBIG [5] project. To accelerate biomedical translational research, the National Cancer Institute created the caBIG project. One tool developed under the caBIG umbrella for data integration is caIntegrator2 [6]. CaIntegrator2 relies on other tools in order to operate, namely Bioconductor [7], caArray [8], caBIO [9], caGrid [10], GenePattern [11], and NBIA [12]. In the future caTissue [13], a biospecimen repository, will be a part of the caIntegrator2 workflow. CaIntegrator2 is a web-based platform created in Java. The workflow for caIntegrator2 is as follows: 1) Perform gene expression analysis and upload the data to caArray (the microarray data repository). 2) Upload images to the National Biomedical Image Archive (NBIA); this step is optional. 3) Create a caIntegrator2 study and define its parameters. 4) Upload clinical data to caIntegrator2 via a comma-separated file. 5) Add a link to the caArray instance where the gene expression files are stored. 6) Provide a mapping file between the patients in the clinical fields and the gene expression data files uploaded to caArray. 7) Upload the mapping and annotation file for NBIA. 8) Perform analysis using the Bioconductor and GenePattern tools provided on the web interface. CaIntegrator2 queries caBIO, a data warehouse containing genomic and proteomic data from public repositories, for gene annotations. The advantage of the caBIG platform is national interoperability for cancer research. Biobanks that have a more general scope may find caIntegrator2 of limited use because of its cancer-only focus. Additionally, caIntegrator2 lacks the collaborative functionalities necessary for biomedical research. Protection of patient privacy is a fundamental principle of medicine. With the increasing ability to access patient information through the electronic health record and link it to sensitive research data such as genotyping information, there is a need for a mechanism to regulate the linking of identifiable data elements of protected health information (PHI) (e.g., name and address) with the data transferred between clinical and research systems. Boyd et al. in [2 and 3] developed the HB concept, which handles the data transfer and health identifiers for all the systems within a healthcare integration network. The goals of the HB include facilitating data sharing, maintaining a master patient index, de-identifying data in conformance with the HIPAA limited dataset, and routing data to preauthorized destinations. The HB acts as a mediator, so systems can interoperate with each other without needing to know any of each other's implementation details. Every patient is tracked and managed by the HB. Web services are used to encode and send the messages between the systems. Security is a major concern: first, an SSL handshake takes place, with a dual authentication option. Authentication is done using X.509 certificates that are pre-installed on the participating systems. Both the HB and the external system must identify and authenticate each other in order to authorize the transmission. In addition, traffic is restricted to specific IP addresses and data formats. Backups are stored in a physically secured offsite location. The HB stores no clinical data; instead, it acts as a router between the various data sources.
Boyd et al. used the HB for homogeneous integration; we will use the HB for heterogeneous integration.
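To illustrate the routing-without-storing idea behind the honest broker concept, the sketch below shows a toy broker that maintains only a master patient index and hands research systems a de-identified view of a clinical record. It is a conceptual illustration only: the class, its methods, and the fields treated as direct identifiers are assumptions, not the interface of the actual HB or BIMS implementation, which communicates through SOAP web services over X.509-authenticated channels.

```python
# Toy illustration of honest-broker-style linkage; not the real HB interface.
import uuid

# Fields treated here as direct identifiers to strip (illustrative, not the HIPAA list).
DIRECT_IDENTIFIERS = {"name", "address", "mrn"}


class ToyHonestBroker:
    def __init__(self):
        self._index = {}  # research_id -> medical record number (master patient index)

    def register_patient(self, mrn: str) -> str:
        """Create a research identifier linked to an MRN; only the broker keeps the link."""
        research_id = str(uuid.uuid4())
        self._index[research_id] = mrn
        return research_id

    def fetch_for_research(self, research_id: str, clinical_lookup) -> dict:
        """Route a request to the clinical system and return a de-identified record."""
        record = clinical_lookup(self._index[research_id])
        return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Usage with an in-memory stand-in for the clinical system:
# clinical_db = {"12345": {"mrn": "12345", "name": "...", "diagnosis": "T2a", "age": 61}}
# broker = ToyHonestBroker()
# rid = broker.register_patient("12345")
# print(broker.fetch_for_research(rid, clinical_db.get))  # {'diagnosis': 'T2a', 'age': 61}
```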
3 Beaumont BioBank Integration Management System Architecture
The Beaumont BioBank collects, stores, and analyzes human specimens for translational research purposes, to discover and implement novel biomarkers for the early detection of diseases. Tissue, blood, urine, saliva, and CSF samples are analyzed using genomic, proteomic, and other laboratory platforms. In order to analyze correlations between biomarker analysis and clinical outcome data on a daily basis, investigators need a software-based interface that connects three critical elements: BIGR [14] (the biospecimen database), Crossbreak (the clinical database), and the laboratory analysis data. With a user-friendly web-based interface, researchers could manage and track the progress of a study in real time. The overall goal is the faster discovery of correlations that will eventually result in clinical tools for diagnosis and risk analysis of patients in a personalized manner. In this section we discuss the BIMS, which was designed to fulfill the above requirements. First we examine the components that are part of the BIMS, and secondly we explain the workflow of the BIMS for a gene expression study. The BIMS architecture allows customization of its components. Below is a list of the components involved in the system; see Figure 2 for a pictorial representation.
1. Honest Broker is the centerpiece of the entire platform. It will handle secure communication between all the components, and it will act as the master patient index for the platform. It is a simplified version of the HB system defined by Boyd et al. in [2 and 3]; in other words, all the identifiers will be stored in the HB's underlying database. For maximum flexibility, the communication protocol will be SOAP web services.
2. BIMS Web Application (BIMSWA) will communicate securely with the other components of the system via the HB. This component will be the integration hub for the research platform. Collaboration tools will be available for researchers, such as a discussion board for each study. The BIMS system will handle the roles for the diverse group of users of the system.
3. Study Database (SD) is a repository that will store study information, users, and all the laboratory analysis data such as gene expression and SELDI-MS data. The data format and management will follow the NCBI GEO [15] and NCBI Peptidome [16] structures.
4. Biospecimen Repository (BR) is the system that manages all the donors, samples, and barcodes for the Biobank. For our reference version we are using a commercial tool called BIGR from Healthcare IT; however, other biospecimen repositories could be used (such as caTissue).
5. Institutional Review Board Application (IRBA) is a software tool that manages study information for the Institutional Review Board (IRB). When a study is given approval by the IRB and this has been input into the IRBA, a request will be sent to the HB, which inserts the IRB study identifier into the SD.
Fig. 2 The components involved with the BIMS.
6. Clinical Electronic Data Capture System (CEDC) will store all the clinical parameters for the patients involved in the studies. Specifically, in our implementation we will be using an in-house developed tool called Crossbreak. As with the BR, another EDC could be used, such as the Velos system [17].
7. NCBI Entrez Web Services [18]. The HB will query the Entrez web services to collect biological annotations for the analysis results.
8. R Bioconductor [19 and 7] will perform preprocessing, normalization, clustering, classification, and graphical analytics. Scripts will be available on the R server accessed by the HB.
9. NCBI GEO [15] will be the public submission repository for microarray data after a study is finished.
10. NCBI Peptidome [16] will be the public submission repository for proteomic data after a study is finished.
Below is a workflow for a gene expression study based on the workflow defined by Eder et al. in [20 and 21]. In the future, our system will support many other analysis platforms, such as the surface-enhanced laser desorption/ionization mass spectrometer [22].
1. Study Creation. A study will be defined in the study manager portion of the BIMSWA, and collection protocols for the samples involved in that study will be established. The collection protocol identifiers will be input into the BIMSWA; these identifiers will provide the link between a study in the BIMSWA and the BR. After a study is defined in the BIMSWA, a request is sent to the HB system to insert this data into the SD. A study then proceeds to the sample collection phase.
2. Sample Collection. Donor information, sample annotations, and barcode information will be input into the BR. This step repeats until enough samples have been collected to conduct a study.
3. Sample Selection. Once enough samples have been collected, an investigator will select a subset of samples to conduct the study. The selection is done through the BIMSWA and triggers the HB system to insert entries into the master patient index if they do not already exist for the relevant donors of the samples. After these entries have been created, a user will insert the relevant patient information, such as the Medical Record Number (MRN), into the master patient index. After the samples have been selected for the study, a request will be sent to the IRBA to ascertain the approval status of the protocol. If the study is approved, a request from the IRBA will be sent to the HB system with the IRB study identifier.
4. Gene expression profile. An investigator will perform a gene expression analysis on the samples for the study. Next, the investigator will upload the gene expression matrix to the BIMSWA, which will send a request with the data to the HB, and the HB will store the data in the SD. The gene expression matrix will have the sample identifier in the column headings, and the HB will use this sample identifier to link the gene expression data to a specific donor.
5. Link Probes to Gene Annotations and Ontologies. This will occur on demand after the gene expression profiles are created. The HB will query the NCBI Entrez web services to collect this information by extracting probe annotations from the gene expression matrix. Initially we will use the annotations provided by the microarray manufacturer; these annotations are extra columns provided in the gene expression matrix. An example of an annotation is a gene ontology (GO) identifier that can be extracted and queried against the NCBI Entrez web services for more detailed gene information.
6. Link Donors to Patients. After the samples for the study have been selected, the BIMSWA will display a list of the donors for the selected samples. The MRN of the patient corresponding to each donor will be input into the BIMSWA, and the HB system will input this information into the master patient index database. A request will then be sent to the CEDC for clinical data collection for these patients, which starts the clinical data collection phase.
7. Clinical data collection. The clinical data for the patients will be input into the CEDC. The investigators will decide which clinical fields to collect. After all the clinical data have been input, the HB system will insert the patient identifiers from the CEDC into the master patient index database. The HB now has a link between the patient clinical information in the CEDC and the samples in the BR.
8. Group Samples. The investigators will view information about the patients, samples, gene expression profiles, and relevant gene information. The researchers will then form a hypothesis and group the samples based on the clinical and annotation data.
9. Analysis. During this phase, preprocessing and normalization take place, followed by analysis using R and the Bioconductor package, which are hosted on the R server. The investigators can access the R scripts via the BIMSWA.
Fig. 3 BIMS workflow for a gene expression study.
10. Plotting. R and the Bioconductor’s plotting capabilities will be available for the investigators via the BIMSWA. 11. Publication and Data Submission. After the manuscript is created, the gene expression data and study details will be submitted to the NCBI GEO system.
4 Discussion
The architecture we propose is a secure, collaborative, and flexible system that has many advantages over the other cited systems. Security is maintained using SSL, role management, the HB, and adherence to the HIPAA limited dataset guidelines. Our system does not have a data warehouse, so we will not suffer from the stale-data problem with public molecular resources that can occur with data warehouse solutions such as the system described in [4]. The disadvantage is that, since the system offers real-time integration, it could be slower than data warehouse-based solutions. Researcher communication will increase thanks to collaboration tools such as a discussion board for each study. All these capabilities will facilitate translational research and potentially lead to more effective cures for patients. The flexibility of the system comes from the HB system, which will require minimal recoding
for different custom components utilized by other institutions. For example, if an institution is using Velos as a CEDC, then only the mapping from the HB CEDC component to the Velos system has to be modified for the system to function normally. The same is true for the BR. The BIMSWA portion will make all the data, and the analysis tools from the Bioconductor, available to the researcher. Our architecture fulfils the seven requirements defined by Stark et al. in [1]:
1. User and Role Management. Our system has a diverse set of roles for users of the system that provide security and customizable views. The roles in use in our system are administrator, biobank staff, principal investigator, co-investigator, coordinator, and honest broker. The honest broker role is responsible for mapping the donor identifier to a patient medical record number.
2. Transparency of physical storage. The BIMSWA will provide views over the HB system, which handles all the integration and data management.
3. Flexible data presentation. Each user role has a different perspective in the BIMSWA. The principal investigator and co-investigator perspectives will have customizable views so that only the data of interest is visible.
4. Flexible integration and composition of services. The HB system will access different services, such as NCBI Entrez and the Bioconductor tool suite.
5. Support of cooperative functions. Each study in the BIMS system will have a study discussion board to promote communication between the researchers involved in the study. Only researchers involved in the study can post to its discussion board.
6. Data-coupled communication mechanisms. The discussion board will provide a way for researchers to comment on the various data objects involved in a study, for example a graphical analysis created with the R statistical package.
7. Knowledge creation and knowledge processing. The results of a study will be made available to all users of the system, so subsequent research can build upon existing studies.
5 Conclusion
We propose a simple yet flexible architecture for the BIMS. The architecture is completely customizable and can fit into other institutions. This system will improve research efficiency and collaboration. By integrating the data repositories, analysis tools, and collaborative capabilities into the BIMS we fulfil the requirements discussed by Stark et al. in [1], and it is our belief that the BIMS will accelerate translational research. Future areas of research include automatic information extraction from electronic health record systems, the application of advanced data mining techniques for biomarker discovery, and the utilization of standards-based communication formats.
References
1. Stark, K., et al.: GATiB-CSCW, Medical Research Supported by a Service-Oriented Collaborative System. In: Bellahsène, Z., Léonard, M. (eds.) CAiSE 2008. LNCS, vol. 5074, pp. 148–162. Springer, Heidelberg (2008)
2. Boyd, A., et al.: An 'Honest Broker' mechanism to maintain privacy for patient care and academic medical research. International Journal of Medical Informatics 76(5-6), 407–411 (2007)
3. Boyd, A., et al.: The University of Michigan Honest Broker: A Web-based Service for Clinical and Translational Research and Practice. J. Am. Med. Inform. Assoc. 16, 784–791 (2009)
4. Kirsten, T., et al.: An Integrated Platform for Analyzing Molecular-Biological Data Within Clinical Studies. In: Grust, T., Höpfner, H., Illarramendi, A., Jablonski, S., Mesiti, M., Müller, S., Patranjan, P.-L., Sattler, K.-U., Spiliopoulou, M., Wijsen, J. (eds.) EDBT 2006. LNCS, vol. 4254, pp. 399–410. Springer, Heidelberg (2006)
5. caBIG, https://cabig.nci.nih.gov/ (Accessed 12-15-2009)
6. caIntegrator2, https://cabig.nci.nih.gov/tools/caIntegrator2 (Accessed 12-9-2009)
7. Bioconductor, http://www.bioconductor.org/ (Accessed 2-16-2010)
8. caArray, https://cabig.nci.nih.gov/tools/caArray (Accessed 12-15-2009)
9. caBIO, https://wiki.nci.nih.gov/display/caBIO/caBIO (Accessed 2-15-2010)
10. caGrid, http://cagrid.org/display/cagridhome/Home (Accessed 12-11-2009)
11. GenePattern, https://cabig.nci.nih.gov/tools/GenePattern (Accessed 2-18-2010)
12. NBIA, https://cabig.nci.nih.gov/tools/NCIA (Accessed 2-19-2010)
13. caTissue, https://cabig.nci.nih.gov/tools/catissuesuite (Accessed 12-7-2009)
14. BIGR, http://www.healthcit.com/front/BIGR/biobanking-and-biomaterials-management (Accessed 2-15-2010)
15. Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/ (Accessed 2-18-2010)
16. Peptidome, http://www.ncbi.nlm.nih.gov/peptidome (Accessed 2-17-2010)
17. Velos, http://www.velos.com/products_eres_overview.shtml (Accessed 2-16-2010)
18. NCBI Entrez, http://www.ncbi.nlm.nih.gov/sites/gquery (Accessed 2-19-2010)
19. R, http://www.r-project.org/ (Accessed 2-17-2010)
20. Eder, J., et al.: Data Management for Federated Biobanks. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) Database and Expert Systems Applications. LNCS, vol. 5690, pp. 184–195. Springer, Heidelberg (2009)
21. Eder, J., et al.: Information Systems for Federated Biobanks. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems I, pp. 156–190 (2009)
22. Bio-Rad ProteinChip SELDI, http://www.bio-rad.com/proteinchip (Accessed 3-5-2010)
23. Nagarajan, R., et al.: Database Challenges in the Integration of Biomedical Data Sets. In: VLDB, pp. 1202–1213 (2004)
Simulated Annealing in Finding Optimum Groups of Learners of UML Kalliopi Tourtoglou and Maria Virvou
1 Introduction
The Simulated Annealing (SA) algorithm (Kirkpatrick et al. 1983) is a probabilistic algorithm that serves as a general optimization technique for solving combinatorial optimization problems. Local optimization algorithms start with an initial solution and repeatedly search the neighborhood for a better solution with a lower cost, so the locally optimal solution to which they converge is the one with the lowest cost in the neighborhood. Simulated Annealing is motivated by the desire to avoid getting trapped in poor local optima and hence occasionally allows "uphill moves" to solutions of higher cost, doing this under the guidance of a control parameter called the temperature (Johnson et al. 1989). SA has been applied successfully to many problems characterized by a large discrete configuration space, too large for an exhaustive search, over which a cost function is to be minimized (or maximized) (Monien et al. 1995). Therefore, Simulated Annealing seems suitable for searching for optimum groups of learners, as there is a large search space: there are many students with various characteristics that can be combined into groups in many ways. The Simulated Annealing algorithm has already been applied in several studies in a variety of sciences, such as statistics, biology (Wilson and Cui 1990), biomedicine (Czaplicki et al. 2006), graphics (Monien et al. 1995), (Davidson and Harel 1996), engineering (Cagan et al. 1997), computer science (Aerts et al. 2003), (Sánchez-Ante et al. 2000), (Terzi et al. 2004), design of circuits (Yao and Kanani 1995) and scheduling (Anagnostopoulos et al. 2006), (Zhou et al. 2006), (Kazem et al. 2008), (Kazarlis 2005), but not in a collaborative learning system like AUTO-COLLEAGUE. AUTO-COLLEAGUE (AUTOmated COLLaborativE leArning Uml Environment) is a Computer-Supported Collaborative Learning (CSCL) system for UML. CSCL systems are learning software environments that
Kalliopi Tourtoglou and Maria Virvou
Department of Informatics, University of Piraeus, 80 Karaoli & Dimitriou St., 18534 Piraeus, Greece
e-mail:
[email protected],
[email protected]
allow distant users to collaborate with each other during an effective learning process. Computer-supported collaborative learning (CSCL) is one of the most promising innovations to improve teaching and learning with the help of modern information and communication technology (Lehtinen et al. 1999). In AUTO-COLLEAGUE, learners are grouped into teams by the trainer of the system and work collaboratively while trying to solve UML tests. One of the main functionalities of the system is that it suggests to the trainer optimum combinations of learners into groups, taking into consideration their level of expertise and personality characteristics. There are also other systems that propose group formation, such as a Group Formation Tool (Christodoulopoulos et al. 2007), I-MINDS (Soh L-K. 2004), a Mathematical Approach (Graf et al. 2006), OmadoGenesis (Gogoulou et al. 2007), TANGOW (Martin et al. 2004) and GroupMe (Ounnas et al. 2009). These systems use Fuzzy C-means, Ant Colony Optimization, Genetic and Random Selection algorithms and are based mainly on forming heterogeneous groups.
2 Description of the System
AUTO-COLLEAGUE is a collaborative learning system for UML. It is a multi-user environment where trainees/learners log in via the network. They are organized into groups and try to solve problems/tests on UML. They can collaborate with each other through a chat system in order either simply to communicate or to help each other. There is also another user in the system: the trainer. The role of the trainer is to observe the actions and performance of the trainees, to adjust specific settings that are crucial for the system, and to make the grouping arrangements according to the advice of the system. The core of the system is the student-modelling component, which is responsible for tracing the learners' characteristics to be evaluated by the optimum group generator. The student models are built using the perturbation modelling technique (Holt et al. 1994) and stereotypes (Rich 1983). The stereotypes are collections of user facets/characteristics. The stereotypes used in our system are classified into two categories: the Level of Expertise and the Personality. The Level of Expertise describes the level of knowledge of the domain, which is UML. The stereotypes of this category are: Basics, Junior, Senior and Expert. The Personality stereotypes refer to the personality characteristics of the learners. We have used the taxonomy of the five-factor model of personality (Norman 1963), from which we chose the personality traits that are most suitable for an educational environment in which students work in groups. Specifically, these stereotypes are: participative, willing to help, diligent, sceptical, efficient, hurried, unconcentrated and self-confident. The system can suggest optimum groups of learners (via the form illustrated in Figure 1) after running the simulated annealing algorithm, taking into consideration the two criteria described in the next section (stereotype combinations and group structure). In the left part of this form, the groups suggested by the system are listed in a hierarchical tree view, where the roots are the teams. In the right part of the form, an evaluation report is shown. In this report, four evaluation characteristics of the current group suggestion are listed.
Fig. 1 Groups Building Form (Suggested Groups).
Failed Combinations is the number of combinations between learners that the system should not have made according to the searching criteria. Failed Groups is the number of groups that include Failed Combinations. In a similar way, Successful Combinations states the number of successful combinations between learners according to the searching criteria, and Successful Groups is the number of groups that include Successful Combinations. The existence of failed combinations may be the outcome of a failure of the search algorithm, but the most common reason is the lack of available students with characteristics that fit the searching criteria. However, the trainer can manually change the formation of the groups by adding, deleting or moving trainees after consulting their individual models.
3 Criteria for Finding Optimum Groups of Learners
The criteria for finding optimum groups of learners include the stereotype combinations and the group structure. This means that the search algorithm will trace the values of these criteria to arrive at a solution. The trainer of the system parameterizes them. The stereotype combinations are categorized into desired combinations and undesired combinations. The desired combinations between stereotypes are those stereotype combinations that the trainer considers effective to coexist in a group. For example, the desired combination "Sceptical-Hurried" means for the system that it should pair learners belonging to these stereotypes in the same group. On the other hand, undesired combinations are those stereotype combinations that should not coexist in the same group.
Fig. 2 Form of Defining the Criteria Related to Stereotypes.
For example, if "Sceptical-Unconcentrated" is defined as an undesired combination, the system will try not to pair learners assigned to these stereotypes together in the same group. The trainer can define these combinations in the form illustrated in Figure 2. The data presented in this figure are the stereotype combinations predefined in the system. They are the results of an empirical study (Tourtoglou and Virvou 2008) conducted with experienced trainers in order to find the most effective pairs of stereotypes to avoid and to aim at. The structure of the groups describes the kind and the number of roles that constitute each group. A role reflects the status (related to the level of expertise) of a student in a group. The form for defining the group structure and roles is illustrated in Figure 3. Roles are shown in the lower left part of the form; every role is assigned a level of expertise. The groups are defined in the upper left part of the form and the structure for each group in the right part. The predefined roles are: Junior Student, Senior Student and Expert Student. Each of these roles is associated with specific levels of expertise, which describe the degree of knowledge of the trainees in the UML domain. The trainer can define the structure of the groups with respect to these roles using the form shown in Figure 3. The first step is to add teams and then to define the types and roles for each of these teams. For example, Team 1 must consist of 2 Junior Students, 1 Senior Student and 1 Expert Student. This structure will be used as a criterion for the search algorithm for optimum group formation.
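By way of illustration, the trainer-defined criteria described above could be represented as simple data structures like those in the following Python sketch, which also counts how many pairs inside a proposed group are desired or undesired. The pair and structure values reuse the examples from the text ("Sceptical-Hurried", "Sceptical-Unconcentrated", Team 1 with 2 Junior, 1 Senior and 1 Expert Student); the function and variable names are illustrative assumptions, not the actual AUTO-COLLEAGUE data model.

```python
# Illustrative representation of the grouping criteria; not the system's actual model.
from itertools import combinations

# Desired and undesired stereotype pairs, stored order-independently as frozensets.
DESIRED = {frozenset({"Sceptical", "Hurried"})}
UNDESIRED = {frozenset({"Sceptical", "Unconcentrated"})}

# Required role structure per team (example from the text).
GROUP_STRUCTURE = {"Team 1": {"Junior Student": 2, "Senior Student": 1, "Expert Student": 1}}


def count_combinations(group):
    """Count desired and undesired stereotype pairs among the members of one group.

    `group` is a list of members, each given as a set of stereotype names.
    """
    successful = failed = 0
    for a, b in combinations(group, 2):
        for pair in (frozenset({x, y}) for x in a for y in b):
            if pair in DESIRED:
                successful += 1
            elif pair in UNDESIRED:
                failed += 1
    return successful, failed


# Usage:
# team = [{"Sceptical", "Junior"}, {"Hurried", "Senior"}, {"Unconcentrated"}]
# print(count_combinations(team))  # (1, 1)
```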
4 Use of Simulated Annealing
4.1 Theoretical Background
The search for the best solution is implemented using the Simulated Annealing (SA) algorithm (Kirkpatrick et al. 1983), a probabilistic algorithm that serves as a general optimization technique for solving combinatorial optimization problems.
Fig. 3 Form of Defining the Criteria Related to Groups’ Structure.
Local optimization algorithms start with an initial solution and repeatedly search the neighborhood for a better solution with a lower cost, so the locally optimal solution to which they converge is the one with the lowest cost in the neighborhood. Simulated Annealing is motivated by the desire to avoid getting trapped in poor local optima and hence occasionally allows "uphill moves" to solutions of higher cost, doing this under the guidance of a control parameter called the temperature (Johnson et al. 1989). The temperature is used in the acceptance probability that the algorithm evaluates to decide whether a solution is acceptable. The initial value of the temperature is high and is then reduced as the algorithm progresses. There are also two maximum limits on the number of repetitions of the algorithm without finding a better solution per temperature value. The first limit indicates that the temperature must change. The second limit represents the termination criterion of the algorithm; alternatively, the termination criterion can be temperature = 0 instead of a limit. The initial value of the temperature and its changes are controlled by the so-called cooling schedule/strategy. The cost of each solution generated is calculated by the objective/cost function. The flow chart of the process of finding optimum groups of learners based on SA is illustrated in Figure 4. The first step of the algorithm is to start with an initial solution. Then a new (neighboring) solution is generated and evaluated using the objective/cost function. If the cost of the new solution is lower than the cost of the current solution, the current solution is updated to the new solution. If not, an additional criterion is applied based on the probability p = exp(−(δf)/T), where δf is the difference between the costs of the new solution and the current solution. If p is larger than a random number between 0 and 1, the current solution is updated to the new solution. The algorithm is repeated up to this point until the temperature is to be changed (according to the cooling schedule).
Fig. 4 Flow chart for finding optimum groups based on SA.
The next step is to change the temperature and decide whether the search should be terminated according to the termination criteria. If the search should not be terminated, the algorithm is repeated. In order to apply the SA algorithm, it is necessary to define the configuration space, the method of finding the neighborhoods, the objective function and the cooling schedule/strategy.
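The following Python sketch puts the flow just described into code: generate and evaluate a random neighbor, accept it if it is better or, with probability p = exp(−δf/T), if it is worse, lower the temperature after a fixed number of unproductive iterations, and stop after a second, larger limit. The concrete limits, the initial temperature and the cooling factor are placeholders; the paper parameterizes these through the cooling schedule discussed in section 4.5.

```python
# Generic SA skeleton following the flow chart of Fig. 4; parameter values are placeholders.
import math
import random


def simulated_annealing(initial_solution, neighbor, cost,
                        t0=1.0, cooling=0.9,
                        inner_limit=50, termination_limit=500):
    current, current_cost = initial_solution, cost(initial_solution)
    best, best_cost = current, current_cost
    temperature = t0
    since_improvement = 0          # repetitions without finding a better solution

    while since_improvement < termination_limit and temperature > 0:
        for _ in range(inner_limit):           # first limit: repetitions per temperature
            candidate = neighbor(current)      # random perturbation of the current solution
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Accept better solutions always; worse ones with probability exp(-delta/T).
            if delta < 0 or random.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
                    since_improvement = 0
                    continue
            since_improvement += 1
        temperature *= cooling                  # exponential cooling (see section 4.5)

    return best, best_cost
```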
4.2 Configuration Space The configuration space is the set of possible solutions. In our case, the possible solutions are all the possible organizations of the students into groups that satisfy the criteria related to the defined groups' structures and the desired and undesired combinations of user stereotypes. If G is a finite set of groups and U is a finite set of students, then the solution space is the finite set O ⊆ P(G × U).
4.3 Finding Neighborhoods The method of finding the neighborhoods concerns the way the next solution is calculated. In our implementation, there is a generator of random changes/perturbations in the combinations of students to form groups. This means that the next solution (the neighbor) will be the current solution with a random change/perturbation. The initial solution given by the generator is a random grouping of the learners. It organizes all the learners into groups according to the groups’ structure (explained in
section 3) defined by the trainer of the system. For example, if there are 6 defined teams with specific roles assigned to them, the generator will try to form 6 groups of learners with these roles. However, there may not be an adequate number of learners to meet the requirements of the groups' structure criterion in the configuration space. In this case (which is the usual one), the generator places the appropriate learners in the groups where it can and then completes the remaining group members randomly (without considering their role). Every subsequent solution is generated by a random change to the current solution in the group membership of two randomly selected learners. This random change is also called a perturbation of the current solution.
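A minimal sketch of this perturbation step, here interpreted as exchanging the groups of the two selected learners; the solution representation (a mapping from learner identifiers to group indices) is an assumption made only for illustration.

import random

def perturb(solution):
    # Return a neighbour of `solution` by swapping the group membership
    # of two randomly selected learners (the "perturbation" described above).
    neighbour = dict(solution)
    a, b = random.sample(list(neighbour), 2)
    neighbour[a], neighbour[b] = neighbour[b], neighbour[a]
    return neighbour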
4.4 Objective/Cost Function The objective function refers to the method of evaluating the cost of the solution. The result of this function expresses how much it would cost to follow a solution. The greater the cost is, the more disadvantageous the solution will be. Therefore, the algorithm condition of accepting a solution is satisfied when its cost is lower than the cost of the current solution. In our implementation, the objective/cost function returns an evaluation degree of the solution and is defined as:

fcost : P(G × U) → [−1, 1],   fcost(x) = − (fSC(x) − fFC(x)) / fTC(x)     (1)

where fSC : P(G × U) → N, fFC : P(G × U) → N, fTC : P(G × U) → N,
x is the solution, fSC(x) returns the number of successful combinations of the solution x, fFC(x) returns the number of failed combinations of the solution x and fTC(x) returns the total number of combinations of the solution x. The successful/failed combinations are the combinations that are/are not in line with the desired combinations of stereotypes and the role structure of the groups defined by the trainer (section 3). The total number of combinations is not the sum of the successful and failed combinations, as the solution may include combinations that are neither successful nor failed. The result of fcost(x) is a real number between −1 and 1. For example, if solution x contains 8 successful and 3 failed combinations out of a total of 12 combinations, then fcost(x) is calculated as:

fcost(x) = − (8 − 3) / 12 ≈ −0.42     (2)
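The cost function of Eq. 1 reduces to a few lines of code once the numbers of successful, failed and total combinations of a solution are known; counting those combinations against the trainer's criteria is assumed to happen elsewhere. A sketch, reproducing the worked example of Eq. 2:

def f_cost(successful, failed, total):
    # Eq. 1: the more successful combinations relative to failed ones,
    # the lower (more negative, i.e. better) the cost.
    return -(successful - failed) / total

print(f_cost(8, 3, 12))  # approximately -0.417, the example of Eq. 2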
4.5 Cooling Schedule The cooling schedule/strategy is very important for the efficiency of the algorithm and is related to the definition of an initial temperature T for the algorithm and the way of decreasing it during the search. The simplest and most commonly used cooling schedule is the exponential one. According to it, Ti+1 = a · Ti, where a is a
constant, usually selected to be between 0.5 and 1. We have chosen to use a = 0.9. A method commonly used for determining the initial value of the temperature is to calculate the formula

T0 = −δf+ / ln(p0)     (3)
where T0 is the initial temperature, δ f + is the average increase in cost for a number of random transitions (in our case random rearrangements of students into groups), and p0 is the initial acceptance probability. A usual value used for p0 is 0.8. For 15 random solutions, we found that the average increase in cost δ f + was approximately 4. Therefore, T0 was calculated to be approximately 18.
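As a quick check of the figures quoted above, the initial temperature of Eq. 3 can be computed directly; the values δf+ ≈ 4 and p0 = 0.8 used below are simply those reported in the text, not new measurements.

import math

def initial_temperature(avg_cost_increase, p0=0.8):
    # Eq. 3: choose T0 so that an average uphill move of size
    # avg_cost_increase is accepted with probability p0.
    return -avg_cost_increase / math.log(p0)

print(initial_temperature(4.0))  # about 17.9, i.e. roughly 18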
5 Conclusions In this paper, we described how we have applied the Simulated Annealing algorithm for finding optimum groups of learners in AUTO-COLLEAGUE, a collaborative learning environment for UML. AUTO-COLLEAGUE is a CSCL system implemented using student modelling based on stereotypes and the perturbation modelling technique. The stereotypes used describe the level of expertise and the personality of the students and are traced constantly as the trainees use the system.
References [Aerts et al.] Aerts, J.C.J.H., van Herwijnen, M., Stewart, T.: Using Simulated Annealing and Spatial Goal Programming for Solving a Multi Site Land Use Allocation Problem. In: Evolutionary Multi Criterion Optimization, pp. 448–463. Springer, Heidelberg (2003) [Anagnostopoulos et al.] Anagnostopoulos, A., Michel, L., Hentenryck, P.V., Vergados, Y.: A simulated annealing approach to the traveling tournament problem. J. of Scheduling 9(2), 177–193 (2006) [Cagan et al.] Cagan, J., Clark, R., Dastidar, P., Szykman, S., Weisser, P.: Hvac Cad Layout Tools: A Case Study of University/Industry Collaboration. In: Proceedings of the Optimization in Industry Conference (1997) [Christodoulopoulos &, Papanikolaou] Christodoulopoulos, C.E., Papanikolaou, K.A.: A Group Formation Tool in an E-Learning Context. In: Proceedings of the 19th IEEE international Conference on Tools with Artificial intelligence, vol. 02 (2007) [Gogoulou et al.] Gogoulou, A., Gouli, G., Boas, E., Liakou, E., Grigoriadou, M.: Forming homogeneous, heterogeneous and mixed groups of learners. In: Proceedings of the Personalisation in E-Learning Environments at Individual and Group Level Workshop, in 11th International Conference on User Modeling, pp. 33–40 (2007)
[Graf & Bekele] Graf, S., Bekele, R.: Forming Heterogeneous Groups for Intelligent Collaborative Learning Systems with Ant Colony Optimization. In: Ikeda, M., Ashley, K.D., Chan, T.-W. (eds.) ITS 2006. LNCS, vol. 4053, pp. 217–226. Springer, Heidelberg (2006) [Czaplicki et al.] Czaplicki, J., Corn´elissen, G., Halberg, F.: GOSA, a simulated annealingbased program for global optimization of nonlinear problems, also reveals transyears. J. Appl. Biomed. 4, 87–94 (2006) [Davidson & Harel,] Davidson, R., Harel, D.: Drawing graphs nicely using simulated annealing. ACM Transactions on Graphics (TOG) 15(4), 301–331 (1996) [Holt et al.] Holt, P., Dubs, S., Jones, M., Greer, J.: The State of Student Modelling. In: Greer, J., McCalla, G. (eds.) Student Modelling: The Key To Individualized Knowledge-Based Instruction, pp. 3–35. Springer, Berlin (1994) [Johnson et al.] Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation. Part I, graph partitioning. Oper. Res. 37(6), 865–892 (1989) [Kazarlis] Kazarlis, S.: Solving University Timetabling Problems Using Advanced Genetic Algorithms. In: 5th International Conference on Technology and Automation, ICTA 2005 (2005) [Kazem et al.] Kazem, A.A., Rahmani, A.M., Aghdam, H.H.: A Modified Simulated Annealing Algorithm for Static Task Scheduling in Grid Computing. In: Proceedings of the 2008 international Conference on Computer Science and information Technology, ICCSIT, August 29-September 02, pp. 623–627. IEEE Computer Society, Washington (2008) [Kirkpatrick et al.] Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by Simulated Annealing. Science 220, 671–680 (1983) [Lehtinen et al.] Lehtinen, E., Hakkarainen, K., Lipponen, L., Rahikainen1, M., Muukkonen, H.: Computer supported collaborative learning: A review. The J.H.G.I. Giesbers Reports on Education, 10, Department of Educational Sciences, University on Nijmegen (1999) [Martin & Paredes] Martin, E., Paredes, P.: Using learning styles for dynamic group formation in adaptive collaborative hypermedia systems. In: Proceedings of the 4th International Conference on Web-engineering, Munich, pp. 188–198 (2004) [Monien et al.] Monien, B., Ramme, F., Salmen, H.: A Parallel Simulated Annealing Algorithm for Generating 3D Layouts of Undirected Graphs. In: Brandenburg, F.J. (ed.) GD 1995. LNCS, vol. 1027, pp. 396–408. Springer, Heidelberg (1996) [Norman] Norman, W.T.: Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology 66, 574–583 (1963) [Ounnas & Davis] Ounnas, A., Davis, H.C., Millard, D.E.: A Framework for Semantic Group Formation in Education. Educational Technology & Society 12(4), 43–55 (2009) [Rich] Rich, E.: Users are individuals: Individualizing user models. Journal of Man-machine Studies 18(3), 199–214 (1983) [S´anchez-Ante et al.] S´anchez-Ante, G., Ramos, F., Sol´ıs, J.F.: Cooperative Simulated Annealing for Path Planning in Multi-robot Systems. In: Cair´o, O., Cant´u, F.J. (eds.) MICAI 2000. LNCS, vol. 1793, pp. 148–157. Springer, Heidelberg (2000) [Soh] Soh, L.-K.: On Cooperative Learning Teams for Multiagent Team Formation. In: Technical Report WS-04-06 of the AAAI’s 2004 Workshop on Forming and Maintaining Coalitions and Teams in Adaptive Multiagent Systems, San Jose, CS, pp. 37–44 (2004) [Terzi et al.] Terzi, E., Vakali, A., Angelis, L.: A Simulated Annealing Approach for Multimedia Data Placement. 
The Journal of Systems and Software 73, 467–480 (2004)
[Tourtoglou & Virvou] Tourtoglou, K., Virvou, M.: User Stereotypes for Student Modelling in Collaborative Learning: Adaptive Advice to Trainers. In: Virvou, M., Nakamura, T. (eds.) Proceeding of the 2008 Conference on Knowledge-Based Software Engineering: Proceedings of the Eighth Joint Conference on Knowledge-Based Software Engineering. Frontiers in Artificial Intelligence and Applications, vol. 180, pp. 505–514. IOS Press, Amsterdam (2008) [Wilson] Wilson, S.R., Cui, W.: Applications of simulated annealing to peptides. Biopolymers 29(1), 225–235 (1990) [Yao & Kanani] Yao, X., Kanani, N.: Call routing by simulated annealing. In: Forsyth, G.F., Ali, M. (eds.) Proceedings of the 8th international Conference on industrial and Engineering Applications of Artificial intelligence and Expert Systems, International conference on Industrial and engineering applications of artificial intelligence and expert systems, Melbourne, Australia, June 05-09, pp. 737–744. Gordon and Breach Science Publishers, Newark (1995) [Zhou et al.] Zhou, S., Liu, Y., Jiang, D.: A Genetic-Annealing Algorithm for Task Scheduling Based on Precedence Task Duplication. In: Proceedings of the Sixth IEEE international Conference on Computer and information Technology, CIT, September 20 - 22, p. 117. IEEE Computer Society, Washington (2006)
A Smart Network Architecture for e-Health Applications Abdellah Chehri, Hussein Mouftah, and Gwanggil Jeon
Abstract. Most conventional telemedicine systems use fixed wired LAN, cable, and other land-line systems to transmit medical data or operations. As wireless technology becomes increasingly pervasive, e-health professionals are considering wireless networks for their mobile medicine systems. With the advent of e-health care, a wide range of technologies can now be applied to provide medical care products and services. In this paper, we evaluate a smart sensor network architecture for e-health applications. This architecture is based on multiple complementary wireless communications access networks between the patient and the system, through UMTS, WiMax, and the Internet.
1 Introduction The world is facing problems in providing high-quality healthcare services to citizens at a reasonable cost, due to the increasing share of the aging population. As the average life span increases, people aged 60 or older are the fastest growing population group in the world. According to a study performed by the United Nations, by 2050 nearly 2 billion people will be 60 or older, representing 22% of the world's population. With this demographic change, the prevalence of chronic conditions such as chronic respiratory and vascular diseases increases, and a large proportion of people aged 60 and older live with at least one chronic disease [1]. Decreasing both the cost of healthcare services and the load on medical practitioners requires a dramatic change in the way future healthcare services are provided [2]; the expected necessary change is a move from reactive to preventive medicine [3], [4]. Abdellah Chehri, Hussein Mouftah, and Gwanggil Jeon SITE, University of Ottawa, 800 King Edward Avenue, Ontario, K1N 6N5, Canada e-mail: {achehri,mouftah}@uottawa.ca,
[email protected]
In order to improve efficiency in medical healthcare, strong demands for the introduction of new wireless technology come from various parties, such as medical societies as well as information and communications technology (ICT) industries. Recent advances in wireless technology have led to the development of wireless body area networks (WBAN), where a set of communicating devices are located around the human body [5]. The application and use of wireless devices and computer-based technologies in health care have undergone an evolutionary process. Advances in information, telecommunication, and network technologies have led to the emergence of a revolutionary new paradigm for health care that some refer to as e-health [6]. e-Health informatics is evolving quickly, driven by:
• Rapid pace of innovation in information and communications technologies and their increasingly pervasive deployment in biomedicine and healthcare.
• Exponential growth of knowledge in the biomedical and clinical sciences.
• Organizing and structuring of medical data as an accessible, up-to-date and coherent electronic resource.
These systems use modern wireless communication and information technologies to provide clinical care to remotely located individuals. As research in this field progresses, it will be possible to provide a better quality of life to patients while reducing healthcare costs. Enabling underlying infrastructures, such as wireless medical sensor devices and wearable medical systems integrating sensors on the patient's body, can offer pervasive solutions for continuous health status monitoring through biomedical, biochemical and physical measurements. Remote monitoring systems typically collect these patient readings and then transmit them to a remote server for storage and later examination by healthcare professionals [4]. In this paper, we analyze a smart architecture for e-health applications. To this end, we propose a new mobile system to provide unconfined e-health services, one that combines WLAN (Wireless Local Area Network) with wireless body sensor networks (WBANs), and the Internet or UMTS (Universal Mobile Telecommunications System). Using this architecture, patients, doctors, and nurses could be empowered to receive or provide real-time, distant health care services. Both patients and providers would have the freedom to be anywhere in the world while sending, receiving, checking, and examining medical data in a timely fashion. The remainder of the paper is organized as follows. In Section II, we give the state of the art of e-health applications. The system architecture is described in Section III. The challenges are summarized in Section IV. Finally, we conclude the paper in Section V.
2 State-of-the-Art There are several issues that will lead to the use of e-health applications: a critical shortage of medical staff, an increase in chronic illnesses or medical conditions
requiring continuous monitoring, a complex medical environment in which diagnostic errors can contribute to increased hospital admissions, increased healthcare costs, and decreased access to healthcare providers [7]. E-health uses communication technologies to provide conduits for information exchange between patients and medical staff. For example, the recent development of remote patient monitoring systems using sensor network techniques has eliminated wire lines and given patients the freedom to move around inside a hospital, clinic, or patient ward [8], [9]. In e-health applications, the patient's body can be equipped with smart sensors. The collected signals from the sensors are transmitted via a wireless connection to the nearest receiver, and then to an access point or PDA (Personal Digital Assistant). The data are then transmitted to the central nursing system (via GSM, UMTS or the Internet). The introduction of wireless connections to exchange sensor data could provide great flexibility for both patients and medical staff. This property allows more mobility for the patient and more ease for the doctors during their interventions (e.g., surgical operations). Thus, it is more comfortable for patients as well as medical personnel in comparison to conventional wired sensors. Some patients undergoing medical monitoring need to move around for work, travel, or other outdoor activities. Existing healthcare systems will not permit this freedom of movement. Keeping these patients (for example, those with chronic diseases who need long-term health services) confined all the time is undesirable; thus, mobile medicine or monitoring can play a useful role [10], [11]. The requirements for reliability, flexibility and portability make wireless sensor technologies particularly attractive for e-health applications [12]. Venkatasubramanian et al. identified the following challenges for developing a health monitoring system using wireless sensor networks [13]:
• Robustness (the ability of a system to avoid service failures);
• Energy efficiency;
• Real-time information gathering;
• Dependability of the collected data (the data should be accurate, allowing medical staff to accurately diagnose the patient's condition);
• Security (the privacy and confidentiality of the data must be preserved).
A smart homecare system can hold the essential elements of diagnostics used in medical facilities. It extends healthcare from traditional clinic or hospital settings to the patient's home. A smart homecare system benefits the healthcare providers and their patients, allowing 24/7 physical monitoring, reducing costs and increasing efficiency. Wearable sensors can notice even small changes in vital signs that humans might overlook [15]. There are several projects for e-health applications and remote medical monitoring [16]. The following are the most relevant ones:
• Code-Blue: a wireless sensor network developed at Harvard University and intended to assist the triage process for monitoring victims in emergencies and disaster scenarios. The caregivers, using a query interface, can access data obtained from the sensors [17].
• AMON: encapsulates many sensors (blood pressure, pulse oximetry, ECG, accelerometer and skin temperature) into one wrist-worn device that is connected directly to a telemedicine center via a GSM network, allowing direct contact with the patient [18]. • AlarmNet: continuously monitors assisted-living and independent-living residents. The system integrates information from sensors in the living areas as well as body sensors. It features a query protocol for streaming online sensor data to user interfaces [19].
3 Smart Network Architecture for e-Health Applications Our proposed design attempts to integrate several smart services to provide a scalable and interoperable e-health solution. We describe the platform below.
3.1 Access Techniques The main requirements for medical institutions when sharing information between different users (medical staff and patients) are reliability and accuracy. Wherever electronic data exist, an efficient way to support healthcare activities is to have stable and fast access to these resources. With mobile medicine, reliability and security are key technical considerations. The two widely used WLAN standards are IEEE 802.11 and Bluetooth, and they are being increasingly implemented in many systems. Both standards operate in the unlicensed industrial, scientific, and medical (ISM) band ranging from 2400 to 2483.5 MHz. A trend toward utilization of the 2.4 GHz frequency band can be observed, since it is license-free and available worldwide. Due to the extensive use of this frequency band, it is not possible to completely rule out a mutual influence of wireless systems operating in parallel. To reduce the interference, spread spectrum (SS) techniques are used. For example, Bluetooth uses frequency-hopping SS with 1600 hops per second, hopping among 79 channels spaced at 1 MHz, while IEEE 802.11 uses direct-sequence SS for transmission. Bluetooth forms networks on an ad hoc basis and covers short-range cells of less than 10 meters; IEEE 802.11 covers cells up to 50 meters in size [9].
3.2 Telemedicine and Teleconsultation Telemedicine may be as simple as two health professionals discussing over videoconferencing equipment to conduct a real-time consultation between medical specialists in two different places or between doctor and patient. IP video eliminates the cabling, modulator and PTZ control costs. PoE (Power over Ethernet) can also be used to power IP cameras, which enables cameras to be located up to 100 meters from the nearest AC source. Sony PTZ IP video cameras cost approximately $1,600, which is less than half of the conventional method's cost.
IP video quality is dependent on lighting, encoding/compression scheme and the image update rate. MPEG-4 encoding provides a high quality image while using less than 1 Mbps bandwidth. As the image size and update rate are reduced the required bandwidth drops significantly so that several dozen low and medium quality images can be transmitted.
3.3 Body Sensor Networks In general, body sensor networks (BSNs) are wireless networks that support the use of biomedical sensors and are characterized by (1) very low transmit power, to coexist with other medical equipment and provide efficient energy consumption, (2) high data rate, to allow applications with high QoS constraints, and (3) low cost, low complexity and miniature size, to allow real feasibility. BSNs, unlike wired monitoring systems, provide long-term and continuous monitoring of patients under their natural physiological states, even when they move. The system allows unobtrusive ubiquitous monitoring and can generate early warnings if received signals deviate from predefined personalized ranges. Figure 1 illustrates the architecture of a body sensor network. One patient is equipped with several sensors monitoring different parameters. A body sensor network is made up of one or more body area networks and a base station. When the information has been gathered in the sensor network, it is forwarded to this base station. The information is then received at a relay station and passed on through a backbone network. In the end, the information can be viewed at terminals or monitoring stations that are connected to the network. This system has the potential of making remote monitoring and immediate diagnostics a reality [20], [21], [22]. Sensors are heterogeneous, and all integrate into the human body. The number and type of biosensors vary from one patient to another depending on the state of the patient. The most common types of biosensors are EEG (electroencephalography) to measure the electrical activity produced by the brain, ECG (electrocardiogram) to record the electrical activity of the heart over time, EMG (electromyography) to evaluate physiologic properties of muscles, blood pressure, heart rate and glucose monitors, SpO2 (pulse oximetry) to measure oxygen saturation in blood, and sensors to measure body temperature [23], [24]. As shown in Table 1, services are classified according to the characteristics of the physiological measurements or the type of application, which can be real-time or non-real-time with a high or low data rate.

Table 1 Service Classification of Physiological Measurements.

Type of Service                                                   Data rate   Latency   Class of Service
ECG                                                               High        Low       Real-time high rate
EEG, EOG, EMG                                                     Low         Low       Real-time low rate
Blood pressure, body temperature, heart rate, glucose monitor    Low         High      Non real-time low rate
Medical image, X-ray, MRI                                         High        High      Non real-time high rate
Fig. 1 Body Sensor Network organization.
3.4 Overall System Design This system interfaces with healthcare providers, doctors, care-givers and the medical call centers (see Fig. 2), and it is also integrated with a mobile platform for the occupant to remotely control his or her home and another mobile platform for the doctors and nurses to view the state of their patients. Typically they will receive an alarm in case of a health problem of their patients. The figure shows that the system is integrated with vital-sign monitoring devices and sensors. The system can integrate wireless devices and sensors using different communication methods such as Bluetooth, UMTS, WiFi, ZigBee, and other technologies. The architecture considered in our work uses wireless communication standards with publicly available specifications, namely Bluetooth (IEEE 802.15.1), WiFi (IEEE 802.11), WPAN (IEEE 802.15.4 / ZigBee) and wireless mesh networks. When comparing these wireless technologies, we had to choose relevant evaluation criteria. The main criteria considered in this architecture were robustness, range, energy consumption, availability, usability and security.
3.5 Bluetooth (IEEE 802.15.1.) Bluetooth technology permits devices to communicate with each other, synchronize data, and connect to the Internet at high speeds without wires or cables. A Bluetooth radio and baseband controller can be installed on a device that links to a Universal
Fig. 2 Overall system design.
Serial Bus (USB) port or integrated on a system board to add Bluetooth functionality to a computer or other host device. In our proposed system, the access point (either the local area network access point, or a Bluetooth-enabled mobile phone for remote area networks) is the master, and the Bluetooth-enabled medical instruments are slaves.
3.6 WiFi (IEEE 802.11) Wireless Local Area Networks are becoming increasingly popular due to their flexibility and convenience. Users can deploy WLANs to transmit data, voice and video within individual buildings, meeting rooms, warehouses, across campuses, metropolitan areas, etc. WLANs are being widely implemented in many venues from hospitals and airports to retail, manufacturing and corporate environments [7].
3.7 ZigBee (IEEE 802.15.4) ZigBee is a wireless automation networking standard based on an international standard (called IEEE 802.15.4, similar to the 802.11 standards used for Wi-Fi networks). As mentioned earlier, ZigBee systems use a peer-to-peer networking infrastructure, called mesh networking, to reach throughout the home. ZigBee provides a data rate of 250 Kbps, while using chips that are inexpensive to manufacture.
3.8 Wireless Mesh Networks Wireless Mesh Networks (WMN) are believed to be a highly promising technology and will play an increasingly important role in future generation wireless mobile networks. WMN is characterized by dynamic self-organization, self-configuration and self-healing to enable quick deployment, easy maintenance, low cost, high scalability and reliable services, as well as enhancing network capacity, connectivity and resilience [14].
4 Challenges The introduction of wireless connections to exchange sensor data could provide great flexibility for both patients and medical staff. In fact, wireless communication tends to be one of the major trends in medical applications to increase usability and comfort in long-term patient monitoring. The above requirements are slightly more general than those mentioned in [25]. They result in several challenges for the system platform which will have to be addressed when compared to the state of the art. The overall system architecture will have to deal with enormous device diversity. In order to guarantee successful execution of clinical decision support systems for long-term monitoring of patients based on clinical practice guidelines, integration and, more importantly, interoperability of devices must be ensured. For example, in the case of BSNs, many biomedical sensor devices are already available, and active research on body platforms is ongoing, with initial products expected on the market soon. The clinical decision support systems should be able to communicate with heterogeneous medical devices supplied by various different vendors. A huge number of medical devices exist today. Some are equipped with wireless interfaces (Ultra-wideband or Bluetooth; future devices may also support emerging standards like Wireless USB) or wired interfaces (USB, serial interface).
5 Conclusion In this paper we described a smart architecture for healthcare applications, specifically a monitoring system for patients requiring continuous remote monitoring. The whole architecture of the system, based on wireless communication, was discussed. A prototype system has been developed to help patients at home or outside. This architecture is based on multiple complementary wireless communications access networks between the patient and the system, through UMTS, WMN, and the Internet. The overall system is also expected to help reduce the cost of the current healthcare service.
References 1. Kubitscke, L.: The possible role of Demographic Change & Ageing, http://ec.europa.eu/informationsociety/events/phs2007/ docs/slides/phs2007-kubitschke-s2b.pdf 2. Fieldand, M., Lohr, K.: Guidelines for Clinical Practice: From development to use. Institute of Medicine, National Academy Press, Washington, DC (1992) 3. Tan, J.: E-Health Care Information Systems: An Introduction for Students and Professionals, 624 p., Hardcover, ISBN: 978 4. Nee, O., Hein, A., Gorath, T., Hulsmann, N., Laleci, G.: SAPHIRE: Intelligent Healthcare Monitoring based on Semantic Interoperability Platform - Pilot Applications. IEE Proceedings Communications Special Issue on Telemedicine and e-Health Communication Systems 5. Alasaarela, E., Nemana, R., DeMello, S.: Drivers and challenges of wireless solutions in future healthcare. In: International Conference on eHealth, Telemedicine, and Social Medicine (2009) 6. Nourizadeh, S., Song, Y.Q., Thomesse, J.P., Deroussent, C.: A Distributed Elderly Healthcare System. In: MobiHealth 2009, Porto, Portugal, January 16 (2009) 7. Zielinski, K., Duplaga, M., Ingram, D.: Information Technology Solutions for Health Care. Health Informatics Series. Springer, London Ltd. (2006) 8. Woodward, B., Rasid, M.: Wireless telemedicine: The next step?. In: Proceedings of the annual IEEE Conference on Information Technology Applications in Biomedicine (2003) 9. Hung, K., Zhang, Y.T.: Usage of Bluetooth in wireless sensors for tele-healthcare. In: Proceedings of the joint EMBS/BMES Conference, Houston, TX (October 2002) 10. Dobrescu, R., Dobrescu, M., Popescu, D.: Embedded Wireless Homecare Monitoring System. In: International Conference on eHealth, Telemedicine, and Social Medicine (2009) 11. Banitsas, K.A., Tachakra, S., Istepanian, R.: Operational parameters of a medical wireless LAN: Security, range and interference issues. In: Proceedings of the joint EMBS/BMES conference, Houston, TX (October 2002) 12. Ng, H.S., Tan, M.L., Wong, C.C.: Wireless Technologies for Telemedicine. Technology Journal 24 (2006) 13. Venkatasubramanian, K.D., Guofeng, M., Tridib, Q., Annamalai, J.: A Wireless Sensor Network based Health Monitoring Infrastructure and Testbed. In: Proc. of the IEEE Int. Conf. on Dist. Comp. in Sensor Sys., June/July 2005, pp. 406–407 (2005) 14. Zhang, Y., Luo, J., Hu, H.: Wireless Mesh Networking, Architectures, protocols and standards. Auerbach Publications (2007) 15. Stankovic, J., Doan, Q., Fang, L., He, Z., Kiran, R., Lin, S.: Wireless Sensor Networks for In-Home Healthcare: Potential and Challenges. In: Workshop on High Confidence Medical Devices Software and systems, HCMDSS (2005) 16. Jurik, A., Alfred, C.: Remote Medical Monitoring. IEEE Computer 41(4), 96–99 (2008) 17. Moulton, D., Fulford-jones, T., Welsh, M., Moulton, S.: CodeBlue: An Ad Hoc Sensor Network Infrastructure for Emergency Medical Care. In: MobiSys 2004 Workshop on Applications of Mobile Embedded Systems, WAMES 2004 (2004) 18. Anliker, et al.: Amon: A Wearable Multiparameter Medical Monitoring and Alert System. IEEE Transaction on Information Technology in Biomedicine 8, 415–427 (2004) 19. Alarmnet : alarmnet: Assisted-living and Residential Monitoring Network - A wireless Sensor Network for Smart Healthcare (2007), http://www.cs.virginia.edu/wsn/medical/
20. Hongliang, R., Meng, M., Chen, X.: Physiological Information Acquisition through Wireless Biomedical Sensor Networks. In: Proceedings of the IEEE International Conference on Information Acquisition, June 27-July 3 (2005) 21. Blount, M.: Remote health-care monitoring using Personal Care Connect. IBM Systems Journal 46(1) (2007) 22. Baker, R., et al.: Wireless Sensor Networks for Home Health Care. In: 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW 2007). IEEE, Los Alamitos (2007) 23. Ben Slimane, J., Song, Y.Q., Koubaa, A., Frikha, M.: A Three-Tiered Architecture for Large-Scale Wireless Hospital Sensor Networks. In: MobiHealthInf 2009, pp. 20–31 (2009) 24. Gyselinckx, B., Van Hoof, C., Donnay, S.: Body area networks: the ascent of autonomous wireless microsystems, pp. 73–83. Springer, Heidelberg (2006) 25. Stankovic, A., Cao, Q., Doan, T., Fang, L., He, Z., Kiran, R., Lin, S., Son, S., Stoleru, R., Wood, A.: Wireless Sensor Networks for In-Home Healthcare: Potential and Challenges. In: Proceedings of Workshop on High Confidence Medical Devices Software and Systems, HCMDSS (2005)
A Music Recommender Based on Artificial Immune Systems Aristomenis S. Lampropoulos, Dionysios N. Sotiropoulos, and George A. Tsihrintzis
Abstract. In this paper, we address the recommendation process as a one-class classification problem based on content features and a Negative Selection (NS) algorithm that captures user preferences. Specifically, we develop an Artificial Immune System (AIS) based on a Negative Selection Algorithm that forms the core of a music recommendation system. The NS-based learning algorithm allows our system to build a classifier of all music pieces in a database and make personalized recommendations to users. This is achieved quite efficiently through the intrinsic property of NS algorithms to discriminate "self-objects" (i.e. music pieces the user likes) from "non self-objects", especially when the class of non self-objects is vast compared to the class of self-objects and the examples (samples) of music pieces come only from the class of self-objects (music pieces the user likes). Our recommender has been fully implemented and evaluated and found to outperform state-of-the-art recommender systems based on support vector machine methodologies.
1 Introduction The recent availability of digital storage space in huge quantities led to the creation of large repositories of music files for use by broad classes of computer users. In turn, users are able to obtain instant access to extensive collections of music files. Clearly, users cannot manage these resources of music files properly, as the huge amount of data makes the time needed to organize and search these databases prohibitively long. In turn, this fact gives rise to a need for systems that have the ability to automatically identify user information needs and help users to choose an appropriate set of music files, when the users face difficulties or lack the knowledge to make such decisions themselves. Such systems are known as recommender systems and, Aristomenis S. Lampropoulos, Dionysios N. Sotiropoulos, and George A. Tsihrintzis Department of Informatics University of Piraeus, 80 Karaoli and Dimitriou St, Piraeus 18534 Greece e-mail: {arislamp,dsotirop,geoatsi}@unipi.gr
somehow, represent a similar social process of recommendation and help relieve some of the pressure of information overload. Recommender systems take into account users' personal needs and interests and provide information in a way that will be most appropriate and valuable to them. During recent years, recommender systems have received a lot of attention from several research groups worldwide. The main difference between a recommender system and a search engine or a retrieval system is that a recommender system not only finds results but also selects objects (or items) that satisfy the specific querying user's needs. Thus, recommender systems are supplied with an individualization/personalization process of the results they return to their user. In other words, recommender systems attempt to predict items that a user might be interested in. In our present work, we present a music recommender system which is trained with a small number of examples of user-preferred music pieces. The system computes features that are automatically extracted from the audio content of music pieces, without the need for creation and maintenance of textual meta-data. Therefore, our system makes music recommendations on a personalized basis, i.e., without having to match the user's interests to some other user's interests. In this way, our system overcomes well-known problems associated with Collaborative Filtering, such as non-association, user bias, or cold start1 . More specifically, the paper is organized as follows: in Section 2, we present an overview of related work in music recommendation systems. In Section 3, we formulate the recommendation problem as a one-class classification problem and focus on SVM- and AIS-based one-class classification. In Section 4, we evaluate recommendation methods and present experimental results. Finally, in Section 5, we draw conclusions and point to future related work.
2 Related Work There is a variety of works that relate specifically to music recommendation with respect to user need to find favorite music pieces. Generally, music recommender systems can be categorized according to methods and meta-data that are utilized. Two major music recommendation approaches seem to have been followed in the literature, namely (1) collaborative filtering and (2) content-based recommendation [17, 8]. Collaborative methods [1, 3, 12] recommend pieces to a user by considering someone else’s ratings of those pieces. For example, suppose that there is a target user who likes pieces A and B. If several other users like A, B, and C, C will probably be recommended to the target user. This technique is widely utilized in practical web-shopping services (e.g., Amazon and iTunes music store) and has been demonstrated to be rather effective. On the other hand, content-based methods 1
The problem of non-association arises when two similar items have never been wanted by the same user, their relationship is not known explicitly or, in item based Collaboration Filtering, those two items cannot be classified into the same group. On the other hand, the problem of user bias may be present in past ratings, while the problem of cold-start is related to new users who have no or insufficient ratings.
[6, 10, 9, 2, 15] recommend pieces that are similar to users' favorites in terms of music content such as moods and rhythms. This allows a rich artist variety and various pieces, including unrated ones, to be recommended. To achieve this, it is necessary to associate user preferences with music content by using a practical database where most users tend to rate few pieces as favorites. A relevance feedback approach for music recommendation was presented in [6], based on the TreeQ vector quantization process initially proposed by Foote [5]. More specifically, relevance feedback was incorporated into the user model by modifying the quantization weights of desired vectors. Also, a relevance feedback music retrieval system based on SVM Active Learning, which retrieves the desired music piece according to mood and style similarity, was presented in [10]. Contrary to previous works, our system is based on content-based music meta-data and formulates the recommendation process as a one-class classification problem. The one-class classification approach is original, as it has not been followed in the relevant literature before. The one-class classification process is implemented utilizing either an SVM-based or an NS algorithm to capture user preferences. From our findings, we draw the conclusion that the NS algorithm performs better than the SVM-based algorithm.
3 Recommendation as a One-Class Classification Problem The main problem dominating the design of an efficient multimedia recommender system is the difficulty faced by its users when attempting to articulate their needs. However, users are extremely good at characterizing a specific instance of multimedia information as preferable or not. This entails that it is possible to obtain a sufficient number of positive and negative examples from the user in order to employ an appropriate machine learning methodology to acquire a user preference profile. Positive and negative evidence concerning the preferences of a specific user is utilized by the machine learning methodology so as to derive a model of how that particular user valuates the information content of a multimedia file. Such a model could enable a recommender system to classify new multimedia files as desirable or non-desirable according to the acquired model of the user behavior. However, the burden of obtaining a sufficient number of positive and negative examples from a user is not negligible. Additionally, users find it considerably hard to explicitly express what they consider as non-desirable, since the reward they will eventually receive does not outweigh the cost undertaken in terms of time and effort. This insight has led us to lay the foundations of our music piece recommender system on a positive feedback incremental learning procedure, since it is more convenient for a user to characterize a certain music piece as desirable. Specifically, the system unobtrusively monitors the user's behavior by storing the past positive feedback in order to build a model that describes the musical habits of a particular user. In our approach we formulate the problem of recommendation as a one-class classification problem where the one-class to be considered is the class of desirable
patterns and the complementary space of the universe of discourse corresponds to the class of non-desirable patterns. One-class classification is an umbrella term that covers a specific subset of learning problems that try to induce a general function that can discriminate between two classes of interest, given the constraint that training patterns are available only from one class. Since samples from both classes are not available, machine learning models based on defining a boundary between the two classes are not applicable. Therefore, a natural choice in order to overcome this problem is building a model that either provides a statistical description for the class of the available patterns or a description concerning the shape / structure of the class that generated the training samples. Otherwise stated, our primary concern is to derive an inductive bias which will form the basis for the classification of new incoming patterns. In the context of building a music piece recommender system, available training patterns correspond to those multimedia instances that a particular user identified and assigned to the class of preferable patterns. The recommendation of new music pieces is then performed by utilizing the one-class classifier for assigning the rest of the music pieces in the database either to the class of desirable patterns or to the complementary class of non-desirable patterns. In this paper, we develop an Artificial Immune System (AIS)-based music piece recommendation system where the one-class classifier is implemented by utilizing a specific class of immune-inspired algorithms called Negative Selection (NS) algorithms. The recommendation performance of our system was tested against the performance of Support Vector Machines (SVM)-based one-class classification approaches, which constitute a state-of-the-art classification paradigm. The relative recommendation performance of the two classifiers is demonstrated in the Experimental Results of Section 4.2.
3.1 SVM-Based One-Class Classification A classifier based on one-class support vector machines (One-Class SVM) is a supervised classification system that finds an optimal hypersphere which encompasses within its bounds as many training data points as possible. The training patterns originate only from one class, namely the class of positive patterns. In the context of music piece recommendation, the class of positive patterns is interpreted as the class of desirable music pieces for a particular user. A One-Class SVM classifier attempts at obtaining a hypersphere of minimal radius into a feature space H using an appropriate kernel function that will generalize best on future data. This means that the majority of incoming data pertaining to the class of positive patterns will fall within the bounds of the learnt hypersphere. The optimal solution gives rise to a decision function of the following form: f (s) =
∑_{i=1}^{n} ∑_{j=1}^{n} ai aj Φ(si) · Φ(sj) + 2 ∑_{i=1}^{n} ai Φ(si) · Φ(s) − Φ(s) · Φ(s)     (1)

such that 0 ≤ ai ≤ 1/(υn) and ∑_{i=1}^{n} ai = 1,     (2)

f(s) = +1, if s ∈ S; −1, otherwise.     (3)
A significant characteristic of SVMs is that only a small fraction of the ai coefficients are non-zero. The corresponding pairs of si entries (known as margin support vectors) fully define the decision function. Given that the training patterns appear only in dot product terms Φ(si) · Φ(sj), we can employ a positive definite kernel function K(si, sj) = Φ(si) · Φ(sj) to implicitly map into a higher dimensional space and compute the dot product. Specifically, in our approach we utilize the Gaussian kernel function, which is of the form K(si, sj) = exp{−‖si − sj‖² / (2σ²)}. Here, S is the set of positive samples that a particular user provides to the system as an initial estimation of the kind of music files that he / she considers desirable. The One-Class SVM classifier subsequently learns a hypersphere in the feature space that encompasses as many of the given positive samples as possible. The induction principle which forms the basis for the classification of new music files is stated as: a new sample is assigned to the class of desirable patterns if the corresponding feature vector lies within the bounds of the learned hypersphere, otherwise it is assigned to the class of non-desirable patterns [11].
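For readers who wish to experiment with this induction principle, the following sketch fits a one-class SVM with a Gaussian (RBF) kernel using scikit-learn. It is not the authors' implementation; the feature data and the parameter values (nu, gamma) are placeholder assumptions.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
liked_features = rng.random((50, 30))      # placeholder: 50 liked pieces, 30-dim feature vectors
candidate_features = rng.random((5, 30))   # placeholder: unseen pieces to classify

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(liked_features)
print(clf.predict(candidate_features))     # +1 = desirable (recommend), -1 = non-desirable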
3.2 Negative Selection-Based One-Class Classification Negative Selection (NS) was the first among the various mechanisms in the immune system that were explored as a computational paradigm in order to develop an AIS. NS involves the discrimination between "self" and "non-self" cells and is considered one of the major mechanisms in the complex immune system of the vertebrates. From a computational perspective, the process of classifying a cell as "self" or "non-self" constitutes the dominant pattern recognition capability of the adaptive immune system. Any molecule recognized by this self/non-self discrimination process is called an antigen (Ag). When a non-self antigen is identified (a pathogen), the adaptive immune system elicits an immune response (specific to that kind of antigen) which relies on the secretion of antibody molecules (Ab) by the B-Lymphocytes. Artificial negative selection constitutes a computational imitation of the self/non-self discrimination process which was first designed to address change detection problems. NS as a computational paradigm derives its inspiration from the T-cell maturation process that happens in the thymus. T-cells of enormous diversity are assembled by utilizing a pseudo-random rearrangement process, and those recognizing any self cell are eliminated from the available T-cell repertoire. Any method which is based on this process is identified by the term Negative Selection Algorithm.
The main characteristics of any NS algorithm, as originally formulated by Forrest et al. [4], involve the Generation and Detection stages. In the Generation stage, a set of candidate detectors (antibodies) is randomly generated and subsequently censored against a given set of self samples. In other words, if a detector matches any self sample it is eliminated, while those candidates that do not match any self sample are kept as detectors, which are used during the Detection stage in order to classify incoming data as "self" or "non-self". From a machine learning point of view, this class of algorithms features the negative representation of information, since they output the complementary concept of the real target concept. Finally, negative selection algorithms constitute a specific instance of one-class classification, since their goal is to discriminate between two classes on a training basis of only self samples. More formally, let U be the universe of discourse, which is the set of all possible feature vectors such that U = [0, 1]^L. We assume normalized feature vectors of real-valued attributes, such that each feature vector may be viewed as a point in the L-dimensional unit hypercube. S ⊂ U is the subset of all possible feature vectors that corresponds to the class of self samples, representing those music files that a particular user would characterize as desirable. The set of elements that corresponds to the class of non-desirable music files, S̄, is the complementary set of non-self patterns, S̄ = U \ S. The fundamental directives of any negative selection algorithm are, firstly, to generate a set D ⊂ S̄ of detectors such that each one of them does not recognize any instances from the self space and, secondly, that each detector recognizes at least one element of the non-self space. In order to formulate the self/non-self discrimination process, which lies at the core of any negative selection algorithm, we should designate a matching rule that defines when a specific self sample or detector recognizes a given element from the universe of discourse. In this approach the matching rule is modelled by the normalized Euclidean distance between two vectors in the unit hypercube. Specifically, an element x recognizes an element y if their normalized distance is less than a given threshold thres, such that (1/√L) ‖x − y‖ ≤ thres, where thres ∈ (0, 1).
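The matching rule can be written as a small helper function. The dimensionality L = 30 used below is only an assumption matching the MARSYAS features mentioned later in the paper.

import math

def matches(x, y, thres, L=30):
    # An element x recognizes an element y if their normalized Euclidean
    # distance (1/sqrt(L)) * ||x - y|| is at most the threshold.
    distance = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y))) / math.sqrt(L)
    return distance <= thres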
3.3 Artificial Immune System-Based Recommendation In a previous work of ours [13, 14], we used artificial immune systems for clustering and classification of music pieces. In this paper, we adopt a NS algorithm named “Real Valued Negative Selection with Variable Sized Detectors” that was originally formulated by [7]. This approach was incorporated in our music recommender system as the core mechanism for inferring the class of unseen data in order to make recommendations for a particular user. A major advantage of this NS algorithm is that it utilizes a statistical inference methodology in order to decide whether the number of generated detectors covers the space of non-self samples above a predefined percentage. The set of the generated detectors is expected to identify any feature vector originated from the class of the non-desirable music files. Therefore,
the criterion for recommending a new music file is that it is not recognized by any element in the set of the generated detectors. Let S denote the space of self samples, corresponding to those feature vectors that are representative of the class of desirable music files. However, the class of desirable patterns is not known a priori, as this would require an exhaustive evaluation of every music piece in the database by a particular user. Instead, we are given an initial set of positive examples S = {s1, s2, . . . , sn}, which is a subset of the space of self samples. The primary goal of the negative selection algorithm is to generate a set of detectors D = {d1, d2, . . . , dm} ⊂ S̄ that cover as much as possible of the complementary space of negative samples. Each detector may be considered as a hypersphere in the unit cube U = [0, 1]^L, centered at d ∈ U with a corresponding radius Radius(d).

∀d ∈ D, ∃x ∈ S̄ : ‖d − x‖norm ≤ Radius(d)     (4)

∀d ∈ D, ∀s ∈ S : ‖d − s‖norm > Rself     (5)
Each detector recognizes any point in the universe of discourse that lies within the bounds of the associated hypersphere. Likewise, each self sample si defines a hypersphere of radius Rself and is considered to be recognized by any point that falls within its bounds. The generation phase of detectors is governed by the two primary objectives quantified by Eq. 4 and Eq. 5. The Generation phase of the Real Valued Negative Selection Algorithm with Variable Detectors is described by the following pseudocode:

[D] = Generate_V-Detectors(S, Rself, C0, a, DeltaTmax)

S         : set of self samples, S ⊆ [0, 1]^L
D         : set of detectors, D ⊆ [0, 1]^L
L         : dimensionality of the input space
Rself     : self radius
Radius(d) : function returning the radius of a given detector d
C0        : estimated coverage of the non-self space
Tmax      : maximum number of detectors
DeltaTmax : additional number of detectors
T         : current number of detectors
t         : number of candidate detectors that matched an already covered non-self region
a         : maximum acceptable probability that the non-self coverage is less than C0
norminv(p, mu, sigma) : function computing the inverse of the normal cdf with parameters mu and sigma at probability p
Za        : z score for a confidence level of (1 − a), i.e. the positive z value separating an area of a in the right tail of the standard normal distribution
Zt        : z score corresponding to the sample statistic p̂, where p̂ = t / Tmax

D ← ∅
p ← C0
q ← 1 − p
Za ← norminv(1 − a, 0, 1)
Tmax ← max(5/p, 5/q) + DeltaTmax
T ← 0
t ← 0
Zt ← (t − Tmax · p) / sqrt(Tmax · p · q)
while (Zt ≤ Za and T ≤ Tmax)
    x ← random sample from [0, 1]^L
    while (∃s ∈ S : ‖x − s‖norm ≤ Rself)
        x ← random sample from [0, 1]^L
    end
    T ← T + 1
    Radius ← min over s ∈ S of (‖x − s‖norm − Rself)
    if (∀d ∈ D : ‖d − x‖norm > Radius(d))
        D ← D ∪ {⟨x, Radius⟩}
    else
        t ← t + 1
        Zt ← (t − Tmax · p) / sqrt(Tmax · p · q)
    end
end
return(D)

f(s) = −1, if ∃d ∈ D : ‖d − s‖norm ≤ Radius(d); +1, otherwise.     (6)
The Detection Stage involves the utilization of the generated detectors in order to assign an unseen music file to the class of desirable or non-desirable patterns. The discrimination function obtained by the negative selection algorithm may be formulated as described by Eq. 6.
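A compact Python transcription of the Generation and Detection stages above may help clarify the algorithm's flow. It is a sketch rather than the authors' code: the feature dimensionality, the default parameter values and the hard-coded z score for a = 0.05 are assumptions.

import math
import random

def norm_dist(x, y, dim):
    # Normalized Euclidean distance used as the matching rule (Section 3.2).
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y))) / math.sqrt(dim)

def generate_v_detectors(self_samples, r_self, c0=0.99, a=0.05, delta_tmax=100, dim=30):
    # Generation stage: accumulate variable-sized detectors covering non-self space.
    p, q = c0, 1.0 - c0
    z_a = 1.6449                          # norminv(1 - a, 0, 1) for a = 0.05 (assumed)
    t_max = max(5 / p, 5 / q) + delta_tmax
    detectors, T, t = [], 0, 0
    z_t = (t - t_max * p) / math.sqrt(t_max * p * q)
    while z_t <= z_a and T <= t_max:
        x = [random.random() for _ in range(dim)]
        while any(norm_dist(x, s, dim) <= r_self for s in self_samples):
            x = [random.random() for _ in range(dim)]   # resample until outside all self hyperspheres
        T += 1
        radius = min(norm_dist(x, s, dim) - r_self for s in self_samples)
        if all(norm_dist(centre, x, dim) > rad for centre, rad in detectors):
            detectors.append((x, radius))
        else:
            t += 1                                       # candidate fell in an already covered region
            z_t = (t - t_max * p) / math.sqrt(t_max * p * q)
    return detectors

def recommend(candidate, detectors, dim=30):
    # Detection stage (Eq. 6): recommend only if no detector recognizes the candidate.
    return all(norm_dist(centre, candidate, dim) > rad for centre, rad in detectors)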
4 System Evaluation The recommendation efficiency of our system, which incorporates the Real Valued Negative Selection Algorithm with Variable Detectors, was compared against the recommendation efficiency of the One-Class SVM. In order to illustrate the performance of our recommender method, we utilize a set of 1000 music files from 10 classes of western music, each of which contains one hundred (100) pieces. Music files were assigned ratings by 15 users. Users have stated their opinions for at least 150 music files. As we mentioned before, ratings follow the numerical scale
from 1 (low preference) to 3 (high preference), while a rating of 0 indicates that the corresponding files were not rated and, therefore, are considered as non-preferred. For the evaluation of the efficiency of our recommender method, we applied a 10-fold cross-validation procedure to the ratings of each user. Thus, 90% of the dataset of each user was used for training and the other 10% for testing. In each fold, the query to our system included a total of 40 music files, consisting of 10 pieces from each genre. This process was iterated with different disjoint partitions and the results were averaged. Finally, the results of all users were averaged. The accuracy of the predictions of our method was evaluated using the mean absolute error (MAE) and ranked scoring (RS) criteria [1]. Let U = {u1, u2, . . . , um} and I = {i1, i2, . . . , in} be the sets of users and items, respectively, of the database from which our Recommender System makes recommendations. Each item in the database corresponds to a 30-dimensional feature vector, derived by MARSYAS [16], in a high-dimensional Euclidean vector space V. These features not only provide a low-level representation of the statistical properties of the music signal but also include high-level information, extracted by psychoacoustic algorithms, in order to represent rhythmic content (rhythm, beat and tempo information) and pitch content describing the melody and harmony of a music signal. Each user assigns a unique rating value to each item in the database within the range {0, 1, 2, 3}. Thus, user ratings define four disjoint classes of increasing degree of interest, namely C0, C1, C2 and C3. C0 corresponds to the class of non-desirable/negative patterns, while the class of desirable/positive patterns may be defined as the union C1 ∪ C2 ∪ C3. In order to indicate the user involvement in defining the four classes of interest, we may write that ∀u ∈ U,
V = C0 (u) ∪C1 (u) ∪C2 (u) ∪C3 (u),
(7)
where C0(u) ∩ C1(u) ∩ C2(u) ∩ C3(u) = ∅.
(8)
More specifically, letting R(u, i) be the rating value that the user u assigned to item i, the four classes of interest may be defined via the following equations:

C0(u) = {i ∈ I : R(u, i) = 0}
C1(u) = {i ∈ I : R(u, i) = 1}
C2(u) = {i ∈ I : R(u, i) = 2}
C3(u) = {i ∈ I : R(u, i) = 3}
(9)
We need to mention that if I(u) denotes the subset of items for which user u provided a rating, it follows that ∀u ∈ U, I(u) = I. The training procedure of the first level of our cascade classification architecture aims at developing one one-class classifier per user. These one-class classifiers are trained to recognize those data instances that have originated from the positive class of patterns. Each one-class classifier realizes a discrimination function, denoted by fu(v) with v a vector in V, which is learnt from the fraction of training positive
patterns. More specifically, if fu,k(v) is the discrimination function that corresponds to user u at fold k, then this function is the result of training the one-class classifier on C1(u, k) ∪ C2(u, k) ∪ C3(u, k). The purpose of each discrimination function fu(v) is to recognize the testing positive patterns P(u) against the complete set of negative patterns N(u). On the other hand, the training procedure of the second level of our cascade classification architecture aims at developing one multi-class classifier per user. This is achieved by training the multi-class classifier on the same set of positive data, C1(u, k) ∪ C2(u, k) ∪ C3(u, k), but this time with the purpose of discriminating among the various (sub-)classes of the data pertaining to the set P(u, k). In other words, each second-level classifier realizes a discrimination function denoted by gu(v), the purpose of which is to partition the space of testing (positive) data P(u) into the 3 corresponding subspaces, C1(u), C2(u) and C3(u), of desirable patterns. To explicitly indicate the discrimination function concerning user u at fold k, we use gu,k(v). The testing procedure concerning the first level of our cascade classifier involves the assignment of a unique value within the set {−1, +1} to each input element, such that:

∀u ∈ U, ∀k ∈ [K], ∀v ∈ P(u, k) ∪ N(u, k), fu,k(v) ∈ {−1, +1}.
(10)

The subset of testing instances that are assigned to the class of desirable patterns are subsequently fed into the second-level classifier, which assigns to them a particular rating value within the range {1, 2, 3}. Specifically,

∀u ∈ U, ∀k ∈ [K], ∀v ∈ P(u, k) ∪ N(u, k) : fu,k(v) = +1, gu,k(v) ∈ {1, 2, 3}
(11)
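The following Java fragment sketches how the two-stage decision of Eqs. 10-11 could be wired together. The two interfaces are hypothetical placeholders for the trained first-level one-class model and the second-level multi-class model (e.g. the V-detector and a multi-class SVM); they are not part of any existing library.

/* Sketch of the two-level cascade: reject with rating 0, or pass to the multi-class stage. */
interface OneClassModel  { int decide(double[] v); }   // returns -1 or +1, cf. Eq. 10
interface MultiClassModel { int rate(double[] v); }    // returns a rating in {1, 2, 3}, cf. Eq. 11

class CascadeRecommender {
    private final OneClassModel firstLevel;
    private final MultiClassModel secondLevel;

    CascadeRecommender(OneClassModel firstLevel, MultiClassModel secondLevel) {
        this.firstLevel = firstLevel;
        this.secondLevel = secondLevel;
    }

    /** Predicted rating for one 30-dimensional MARSYAS feature vector. */
    int predictRating(double[] v) {
        if (firstLevel.decide(v) == -1) {
            return 0;                  // assigned to the non-desirable class C0
        }
        return secondLevel.rate(v);    // partition the desirable space into C1, C2, C3
    }
}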
4.1 Measuring the Efficiency of Recommender Methods
The efficiency of the adapted cascade classification scheme was measured in terms of the Mean Absolute Error and the Ranked Scoring measures. The Mean Absolute Error (MAE) constitutes the most commonly used measure for evaluating the efficiency of RS. More formally, the MAE concerning user u at fold k may be defined as:

MAE(u, k) = ∑_{v ∈ P(u,k) ∪ N(u,k)} |R̂u,k(v) − Ru,k(v)| / (|P(u, k)| + |N(u, k)|)
(12)

where R̂u,k(v) denotes the predicted rating and Ru,k(v) the actual rating of item v.
The Ranked Scoring (RS)[1] assumes that the recommendation is presented to the user as a list of items ranked by their predicted ratings. Specifically, RS assesses the expected utility of a ranked list of items by multiplying the utility of an item for the user by the probability that the item will be viewed by the user. The utility of an item is computed as the difference between its observed rating and the default or neutral rating d in the domain, which can be either the midpoint of the rating scale or the average rating in the dataset. On the other hand, the probability of viewing decays exponentially as the rank of items increases. Formally, the RS of a ranked list
Table 1 Mean Absolute Error (MAE)

Recommendation Method                                               MAE
SVM: 1st Level One-Class SVM, 2nd Level Multiclass SVM              0.73
AIS V-Detector: 1st Level V-Detector, 2nd Level Multiclass SVM      0.67
Table 2 Ranked Scoring (RS)

Recommendation Method                                               RS
SVM: 1st Level One-Class SVM, 2nd Level Multiclass SVM              2.2
AIS V-Detector: 1st Level V-Detector, 2nd Level Multiclass SVM      2.3
of items vj ∈ P(ui, k) ∪ N(ui, k), sorted by the index j in order of declining predicted rating R̂ui,k(vj), for a particular user ui at fold k is given by:

RSui,k = ∑_{vj ∈ P(ui,k) ∪ N(ui,k)} max{Rui,k(vj) − d, 0} × 1 / 2^{(j−1)/(α−1)}
(13)

where α denotes the viewing half-life [1].
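As a concrete reference for how these two measures can be computed, the sketch below gives plain-Java versions of Eqs. 12-13. Method and parameter names are ours; the ranked-scoring routine assumes the test items have already been sorted by decreasing predicted rating and uses the half-life decay of Breese et al. [1].

/** Evaluation measures for one user and one fold (Eqs. 12-13). */
class RecommenderMetrics {

    /** Mean Absolute Error (Eq. 12) between predicted and actual ratings. */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / actual.length;
    }

    /** Ranked Scoring (Eq. 13): item utility max(r - d, 0), discounted by an exponentially
     *  decaying viewing probability with half-life parameter alpha; the input array holds
     *  the actual ratings sorted by decreasing predicted rating. */
    static double rankedScoring(double[] actualSortedByPrediction, double defaultRating, double alpha) {
        double rs = 0.0;
        for (int j = 1; j <= actualSortedByPrediction.length; j++) {
            double utility = Math.max(actualSortedByPrediction[j - 1] - defaultRating, 0.0);
            rs += utility / Math.pow(2.0, (j - 1) / (alpha - 1.0));
        }
        return rs;
    }
}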
The performance of the two methods with respect to these two criteria was compared. As shown in Tables 1 and 2, respectively, the recommendation method based on the V-Detector performed better in terms of MAE and was slightly better in terms of the RS measure. The kernel utilized for the One-Class SVM was polynomial with d = 3, and the fraction (parameter ν) was set to 45%, which means that only 55% of the positive data of each user remained on the positive side of the One-Class SVM hyperplane. For the V-Detector, Rself is a crucial parameter affecting the recommendation performance of our system and was set experimentally. Unlike systems based on collaborative filtering, the number of users does not influence the efficiency of our system, because it is an item-based recommender system. However, we are in the process of extending our experimental results with more users.
5 Conclusions and Future Work
In this paper, we addressed the music recommendation process as a one-class classification problem. Specifically, we followed an innovative approach, not previously explored, which is based on Artificial Immune Systems (AIS) and the Negative Selection (NS) Algorithm. The NS-based learning algorithm allows our system to build a classifier over all music pieces in a database and make personalized recommendations to users. This is achieved quite efficiently through the intrinsic property of the NS algorithm to discriminate "self-objects" (i.e. music pieces the user likes) from "non-self objects", especially when the class of non-self objects is vast compared to the class of self-objects and the examples (samples) of music pieces come only from the class of self-objects (music pieces the user likes). Our recommender
has been fully implemented and evaluated, and was found to outperform state-of-the-art recommenders based on support vector machine methodologies. We are currently conducting further experiments and improvements to our system. This and other related research work is in progress and will be reported elsewhere in the near future.
References

1. Breese, J., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI 1998), pp. 43–52. Morgan Kaufmann, San Francisco (1998)
2. Celma, O., Ramírez, M., Herrera, P.: Foafing the music: A music recommendation system based on RSS feeds and user preferences. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), London, UK (2005)
3. Cohen, W., Fan, W.: Web-collaborative filtering: recommending music by crawling the web. Computer Networks (Amsterdam, Netherlands: 1999) 33(1-6), 685–698 (2000)
4. D'haeseleer, P., Forrest, S., Helman, P.: An immunological approach to change detection: Algorithms, analysis and implications. In: Proc. of IEEE Symposium on Security and Privacy (1996)
5. Foote, J.: An overview of audio information retrieval. Multimedia Systems 7(1), 2–10 (1999)
6. Hoashi, K., Matsumoto, K., Inoue, N.: Personalization of user profiles for content-based music retrieval based on relevance feedback. In: Proc. ACM International Conference on Multimedia 2003, pp. 110–119. ACM Press, New York (2003)
7. Ji, Z., Dasgupta, D.: Real-valued negative selection algorithm with variable-sized detectors. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 287–298. Springer, Heidelberg (2004)
8. Lee, J.H., Downie, J.S.: Survey of music information needs, uses, and seeking behaviours: Preliminary findings. In: Proc. 5th International Conference on Music Information Retrieval (2004)
9. Logan, B.: Music recommendation from song sets. In: Proc. 5th International Conference on Music Information Retrieval, pp. 425–428 (2004)
10. Mandel, M., Poliner, G., Ellis, D.: Support vector machine active learning for music retrieval. ACM Multimedia Systems Journal 12(1), 3–13 (2006)
11. Manevitz, L.M., Yousef, M., Cristianini, N., Shawe-Taylor, J., Williamson, B.: One-class SVMs for document classification. Journal of Machine Learning Research 2, 139–154 (2001)
12. Shardanand, U., Maes, P.: Social information filtering: Algorithms for automating "word of mouth". In: Proc. ACM CHI 1995 Conference on Human Factors in Computing Systems, pp. 210–217 (1995)
13. Sotiropoulos, D.N., Lampropoulos, A.S., Tsihrintzis, G.A.: Artificial immune system-based music piece similarity measures and database organization. In: Proc. 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, Smolenice, Slovak Republic (2005)
14. Sotiropoulos, D.N., Lampropoulos, A.S., Tsihrintzis, G.A.: Artificial immune system-based music genre classification. In: New Directions in Intelligent Interactive Multimedia. Studies in Computational Intelligence. Springer, Heidelberg (2008)
15. Sotiropoulos, D.N., Lampropoulos, A.S., Tsihrintzis, G.A.: MUSIPER: a system for modeling music similarity perception based on objective feature subset selection. User Modeling and User-Adapted Interaction 18(4), 315–348 (2008)
16. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5) (2002)
17. Uitdenbogerd, A., van Schyndel, R.: A review of factors affecting music recommender success. In: Proc. 3rd International Conference on Music Information Retrieval (2002)
The iCabiNET System: Building Standard Medication Records from the Networked Home
Martín López-Nores, Yolanda Blanco-Fernández, José J. Pazos-Arias, and Jorge García-Duque
Abstract. Electronic Health Records (EHR) are a crucial element towards the adoption of information technologies in healthcare. One of the goals pursued with these artifacts is to prevent medicine misuse, for which EHR standards define fields to record medical prescriptions and medication regimens. Unfortunately, the information stored in an EHR about how and when the patient actually takes his/her medicines is most often imprecise and incomplete, which implies severe health risks and brings down the benefits of technology. There exist solutions to get accurate records from inpatient settings (i.e. when the patient is treated in hospital), but not from contexts of daily life (e.g. when the patient takes medicines at home or at work), even though these are a breeding ground for medication misuse. In this paper, we present an approach to fill this gap, building on a system that monitors medicine intake from within a residential network and relying on EHR standards for the storage and exchange of health-related information.
Martín López-Nores, Yolanda Blanco-Fernández, José J. Pazos-Arias, and Jorge García-Duque
Department of Telematics Engineering, University of Vigo
e-mail: {mlnores,yolanda,jose,jgd}@det.uvigo.es

1 Introduction
An ongoing revolution in the field of healthcare aims at incorporating information technologies, in the belief that they will serve to prevent errors and achieve greater efficiency. One of the milestones in this movement is the management of Electronic Health Records (EHR) as longitudinal collections of health-related information about individual patients, including current and historical health conditions, medical referrals, tests, treatments, demographic information and non-clinical administrative data [22]. Among others, EHR systems are expected to play a role in solving the problem of medication misapplication or misuse, which is growing to the point of being as dangerous and costly as many illnesses [6]. The solution is quite advanced on the side of physicians, since there exist tools to annotate and track
prescribed medicines and regimens directly in an EHR [17]. The same happens with pharmacy and nursing thanks to the so-called Electronic Medication Administration Records (EMAR), which allow accurate monitoring of the medication regimen of a patient while being treated in hospital [14]. Yet, it remains an open issue to enter information in the EHR from the patient's side in his/her daily life, to keep track of what medicines he/she takes, when and in what doses. According to [6], this side represents the greatest share of the problem, as people tend not to follow their medication regimens properly. The reasons range from forgetfulness to voluntary discontinuations when the patients' symptoms improve or they experience side effects. Overall, it has been estimated (see [21]) that the medications and regimens listed in EHR are inaccurate in at least 40% of the cases. A solution to the aforementioned problem may come from the use of smart medicine managers, which appear as a new range of domestic appliances intended to monitor the intake of medicines. In this paper, we describe the mechanisms recently introduced in the iCabiNET system [16] to interact with EHR, which imply two contributions to the state of the art. On the one hand, the idea itself represents a step forward with regard to previous medicine managers (e.g. those of [23, 12]), which could only record medication information in local logs. On the other, whereas most of the previous works in home-based telemedicine (e.g. [9, 4, 5]) used proprietary languages and formats for the storage and exchange of health-related information, the iCabiNET incorporates standard ones, opening the doors to interoperability with future systems.
2 The iCabiNET System As shown in Fig. 1, the iCabiNET system appears as an element of a residential network, ready to communicate directly with other appliances installed in a house, and with the outside world through a residential gateway. Within these settings, the operation of the system consists of two major steps: (i) gathering information about the medicines and doses available, and (ii) processing that information to identify and react to actual or potential misuses. The iCabiNET approach to monitoring the intake of medicines relies on smart packaging technologies, currently promoted by stakeholders of the pharmaceutical industry. As explained in [10], the idea is to integrate RFID devices and different types of sensors with the packaging of the medicines, to allow tracking not only medicine names, but also the doses available with no additional equipment. A common example is that of smart blister packs, which record the removal of a tablet by breaking an electric flow into the RFID’s integrated circuit. Other possibilities exist for liquid medicines, ointments and so on. Smart packaging allows monitoring the intake of drugs from anywhere, with the highest precision, with no need to keep all drugs in a unique place, and with no risk of mistaking one drug for another. The system simply gathers information by polling RFID readers deployed around the house (the ‘RR’ devices in Fig. 1) or connected to the residential network from other places (e.g. from the office).
Fig. 1 The iCabiNET system in a residential network.
Fig. 2 The operational scheme of the iCabiNET.
Primarily, the iCabiNET is intended to enforce some medical prescriptions (e.g. “the patient should take one of these tablets every 4 to 6 hours”). Accordingly, in the operational scheme of the system (Fig. 2), there is a ‘Watchdog’ module that continually supervises the information gathered about medicines and doses, to check that the former remain in good condition and the latter decrease correctly with time. This module notifies any odd circumstances by triggering different types of events. Events are the input for a second module, the ‘Actuator’, to decide what actions will be performed to issue warnings or deliver health care information to the patient. This module firstly considers generic statements with no liaison to specific appliances, such as those of Table 1. Then, it instantiates those statements on demand, using the appliances it finds most convenient at any time. In doing so, the ‘Actuator’
Table 1 Some events and generic actions they might trigger

Event              Generic action
Oblivion           "Wait up to 90 minutes before reminding the user"
Expiration         "Deliver increasingly serious warnings day after day"
Depletion          "Arrange an appointment with the doctor to get a new prescription"
Interaction        "Recommend an innocuous combination of drugs with the same effects"
Discontinuation    "Restart the medication at a lower dose"
Abuse              "Warn the local health authorities"
considers patient data, context information provided by external devices (e.g. about whether the patient is sleeping, watching TV or out of home) and descriptions of the connected appliances and the operations they can do. Thus, a “warn the patient” action can be automatically made to trigger an alarm clock, to interrupt a TV program and display some message on screen, or to make a telephone call. Details on how this is achieved, plus a couple of usage scenarios, can be found in [16]. The range of possibilities enabled by the residential network to reach the patient is precisely the point that makes the iCabiNET most advantageous with regard to previous systems.
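As a rough illustration of how such generic statements can be attached to events, the fragment below organises the event-to-action mapping of Table 1. The class and enum names are ours and the concrete appliance is abstracted behind a hypothetical interface; this is a sketch of the idea, not the actual iCabiNET code.

import java.util.EnumMap;
import java.util.Map;

/** Minimal sketch of the Watchdog/Actuator pipeline of Fig. 2, driven by the events of Table 1. */
enum MedicationEvent { OBLIVION, EXPIRATION, DEPLETION, INTERACTION, DISCONTINUATION, ABUSE }

interface Appliance {
    void perform(String genericAction);   // e.g. trigger an alarm clock or overlay a TV message
}

class Actuator {
    private final Map<MedicationEvent, String> genericActions = new EnumMap<>(MedicationEvent.class);
    private final Appliance appliance;     // chosen at run time among the connected appliances

    Actuator(Appliance appliance) {
        this.appliance = appliance;
        genericActions.put(MedicationEvent.OBLIVION,
                "Wait up to 90 minutes before reminding the user");
        genericActions.put(MedicationEvent.EXPIRATION,
                "Deliver increasingly serious warnings day after day");
        genericActions.put(MedicationEvent.DEPLETION,
                "Arrange an appointment with the doctor to get a new prescription");
        genericActions.put(MedicationEvent.INTERACTION,
                "Recommend an innocuous combination of drugs with the same effects");
        genericActions.put(MedicationEvent.DISCONTINUATION,
                "Restart the medication at a lower dose");
        genericActions.put(MedicationEvent.ABUSE,
                "Warn the local health authorities");
    }

    /** Called by the Watchdog whenever it detects an odd circumstance. */
    void onEvent(MedicationEvent event) {
        appliance.perform(genericActions.get(event));
    }
}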
3 The New iCabiNET Design When enhancing the iCabiNET system to interact with standard EHR repositories, our aim was to come up with a system that would comply with future developments in European research, which placed the EN13606 standard (Health informatics Electronic Health Record Communication) as the reference framework. Notwithstanding, we have opted to work on top of openEHR specifications [19], because they are freely available and there is already a nice set of software tools supporting them. In principle, openEHR implementations can easily generate and consume EN13606 communication extracts, so an eventual adaptation of the iCabiNET to EN13606 technology should not require much effort. Next, we describe the new version of the iCabiNET system, looking at the information units it handles and its architectural design. We also present the results of some evaluation experiments carried out to assess the implementation we have produced.
3.1 Exchanging Archetypes In the iCabiNET version presented in [16], all the health-related information was managed using ad hoc syntaxes and messages. The new one, in contrast, proceeds by exchanging instances of archetypes included in the openEHR repository.1 1
http://www.openehr.org/clinicalmodels/archetypes.html
Archetypes are reusable, formal expressions of constraints on a reference information model that define the valid structure, cardinality and content of EHR component instances. These artifacts constitute a technology-independent, single-source expression of the semantics, which can drive databases, user interface definitions, message schemas and all other technical expressions. Thus, it is possible to develop most of an EHR system from an implementation of the reference model, plus generic logic supporting the so-called Archetype Definition Language (ADL) for storage, validation, querying, etc. To begin with, the iCabiNET retrieves all the prescriptions issued for a patient by reading an instance of the openEHR-EHR-SECTION.medication.v1 archetype from his/her EHR. This can be done periodically or in response to the identification of new medicines, as when the patient enters his/her house with some pills he/she has just got from the drugstore. Each medication order appears as an instance of the openEHR-EHR-INSTRUCTION.medication.v1 archetype, specifying what medication to take, when, for how long, etc. Among others, this archetype contains formal specifications for the following fields, which can therefore be monitored automatically:

• Name of the medication (free or coded text).
• Generic name (free or coded text).
• Dose unit (coded text).
• Form of the medication (coded text).
• Strength per dose unit (quantity, in various units).
• Dose to be administered at one time (by absolute quantity or by dose unit).
• Dose duration (time): The time over which an individual dose is to be administered.
• Safety limits:
  – Maximum dose unit frequency (frequency units): The maximum number of dose units to be taken in a particular time.
  – Minimum dose interval (time): The minimum safe interval between doses.
  – Maximum dose interval (time): The maximum safe interval between doses.
• Administration information:
  – Date and time of first administration (formatted as yyyy-??-??T??:??:??).
  – Date and time of last administration (formatted as yyyy-??-??T??:??:??).
• Dispensing information:
  – Quantity (quantity, in various units).
  – Number of authorized repeat dispensing (count): The number of times this quantity of medication may be dispensed before a further prescription is required.
  – Brand substitution allowed (Boolean): True if an alternative brand may be substituted when dispensing.

In addition to these formal fields, there is a free/coded text field called Administration instructions, introduced as a placeholder for any detailed instructions about
how to administer each medication. Though this is not normative, we have been using this field to specify generic actions to drive the operation of the 'Actuator' module of the iCabiNET (remember Fig. 2), and thus supersede the default behaviors. To keep track of medication regimens, we use the archetype openEHR-EHR-ACTION.medication.v1, which is intended to describe any actions arising from a medication order. The system stores successive entries (one per medicine administration) filling in the following fields:

• Sequence number (count).
• Batch number (text), given by the manufacturer and retrieved from the RFID.
• Date and time of administration (formatted as yyyy-??-??T??:??:??).

Actually, the openEHR-EHR-ACTION.medication.v1 archetype, just like openEHR-EHR-INSTRUCTION.medication.v1 (see above), differentiates the dates/times of first and last administration. Normally, the iCabiNET stores the same values in both fields. It only stores different ones in cases of uncertainty, as when the patient has been out of the reach of RFID readers for some time and he/she reappears with fewer doses than when he/she left. Again, this is a non-normative use of the archetype.
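Purely as an illustration of the information the 'Archetype manager' has to serialise for each administration entry, a minimal data holder could look as follows. This is a hypothetical helper class of ours, not a class of the openEHR reference model or of its Java reference implementation.

import java.util.Date;

/** Hypothetical holder mirroring the fields stored per medicine administration (Sect. 3.1). */
class MedicationAdministrationEntry {
    final long sequenceNumber;        // Sequence number (count)
    final String batchNumber;         // Batch number read from the RFID tag
    final Date firstAdministration;   // normally equal to lastAdministration
    final Date lastAdministration;    // differs only in cases of uncertainty (see text)

    MedicationAdministrationEntry(long sequenceNumber, String batchNumber,
                                  Date firstAdministration, Date lastAdministration) {
        this.sequenceNumber = sequenceNumber;
        this.batchNumber = batchNumber;
        this.firstAdministration = firstAdministration;
        this.lastAdministration = lastAdministration;
    }

    /** True when the exact administration time is unknown (e.g. the patient was out of RFID reach). */
    boolean isUncertain() {
        return !firstAdministration.equals(lastAdministration);
    }
}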
3.2 The New Architecture
Originally, thinking of the television as the main interface of the networked home, we built the iCabiNET system as a software application for Digital TV set-top boxes. To harness our previous expertise (see [8]), we chose a framework that combines the MHP (Multimedia Home Platform) and OSGi (Open Services Gateway initiative) standards [7, 20]. The former is one of the specifications that dominates the Digital TV landscape worldwide, especially with satellite, terrestrial and cable broadcast networks [2]. The latter is a multi-purpose software architecture that is becoming increasingly popular to enable interactions in residential networks [15]. Both standards are based on the Java language. As presented in [16], the iCabiNET implementation contained the Java code of the 'Watchdog' and 'Actuator' modules. These communicated with RFID readers and in-home appliances over Ethernet local networks, and with devices out of home through a residential gateway. With the enhancements presented in this paper, the new design spans from the software to the middleware and hardware layers of an MHP set-top box, as shown in Fig. 3. The key element of the new design is the 'Archetype manager' module that appears in the software layer of the set-top box. This module is in charge of (i) converting instances of the openEHR-EHR-INSTRUCTION.medication.v1 archetype into inputs to the 'Watchdog', and (ii) turning the outputs of the 'Actuator' into instances of openEHR-EHR-ACTION.medication.v1. To perform these tasks correctly, the 'Archetype manager' relies on an ADL parser/analyzer tool, which is actually an adaptation of the one included in the Java reference implementation of the openEHR framework. The changes have to do with using JDOM
Fig. 3 The new design of the iCabiNET system and an EHR repository.
libraries, since these are the means provided by the MHP middleware to process XML content. The ‘Archetype manager’ is also responsible for accessing the EHR repositories. To this aim, it can invoke two different interfaces: the one based on the remote call paradigm of Java RMI, and the other based on Web Services technology for mobile devices (WS J2ME). The Java RMI approach is very similar to that of CORBAmed (now called the Healthcare Domain Task Force) [11], which has been dominant for more than 10 years in e-health practice; notwithstanding, we have added support for Web Services because they are becoming increasingly popular, as noted in [13]. Access to either interface, again, is accomplished by a tailored version of the client component included in the Java reference implementation of openEHR. Downwards in the architecture, all communications between the iCabiNET and EHR repositories flow through SSL tunnels, secured by the cryptographic functions of the Security and Trust Services API for mobile devices (SATSA). Specifically, we have considered RSA-1024 and RSA-2048 encryption schemes. Other functions enable access to smart cards, which the ‘Archetype manager’ may use to store information locally during periods of disconnection from the Internet.
3.3 Evaluation
The new design of the iCabiNET has been tested to work in the laboratory, deploying MHP development set-top boxes, off-the-shelf residential gateways and an experimental EHR repository system, as depicted in Fig. 3. We built the repository using the Java reference implementation of openEHR. Its design, also shown in Fig. 3, includes the following layers:
• At the bottom, we use the MySQL Connector/J driver to access a MySQL database that stores all the information through a JDBC interface.
• The core services layer uses the three main back-end services included in the Java reference implementation: the EHR service, to understand the classes of the openEHR reference model; the archetype service, to understand the format and syntax of the archetypes; and the demographic service, to manage the identities of all parties involved in the medication process, from prescription to dispensing or administration.
• The ADL layer includes the parser and analyzer tool included in the reference implementation to process archetype instances.
• At the top, there are two interfaces to communicate with client applications as explained above, using either Java RMI or Web Services technology.

It is worth noting that this design is essentially an openEHR and Java adaptation of the repository system presented in [18] for EN13606 technology. For a first round of experiments, we completed the settings for our trials with purpose-built smart blister packs, inasmuch as smart packaging technology is not yet available in retail drugs (only in clinical trials). We also developed our own RFID readers for those blister packs, because we did not find alternatives in the market ready to deliver data over an OSGi network. With such equipment, we gathered data from 10 volunteers over a period of 4 months to check the correct operation of the iCabiNET and EHR repository implementations, with emphasis on exercising all methods of the Java RMI and Web Services interfaces. In turn, we could characterize the network traffic generated by average patients as a function of the number of medications they have to take and the prescribed dose intervals. We then made a second round of experiments with simulated patients to assess the performance and scalability of the repository. The main observation here is that Web Services technology implies some extra overhead and computational cost, to the point that a low-end dedicated server (with 1 GHz of processor speed and 1 GB of RAM) could serve nearly 38,000 patients in real time using exclusively the Web Services interface, whereas it could serve 50,000 using exclusively the Java RMI counterpart. From the point of view of the set-top boxes, the differences in terms of computational cost were practically negligible. With the background of these experiments, we are currently having negotiations with a privately-funded local hospital to include the new version of the iCabiNET system as part of their continuous care program for patients with chronic heart disease. Our aim is to assess the system in practice, with a testbed of no less than 50 real patients for whom the correct intake of drugs is of utmost importance.
4 Conclusions and Future Work
The development of smart medicine managers with networking capabilities can help maintain EHR with accurate information about medicine intake, and thereby reduce the huge impact of misapplications and misuses on the well-being of people and the economies of health systems. With the enhancements we have presented in this paper, the iCabiNET system implements the first approach to this idea, introducing support for EHR standards in a field that had relied on proprietary formats
and languages thus far. Specifically, we have adopted the framework provided by the openEHR freely-available specifications, which lie at the core of the EN13606 European standard. Our work has benefitted greatly from the availability of open-source supporting tools and reference implementations, which do not yet exist for EN13606 or other standards like HL7. This software could be readily incorporated in MHP-compliant set-top boxes. The iCabiNET communicates with EHR repositories by exchanging instances of openEHR archetypes, whose formal definition prevents ambiguities in the interpretation of the data. Notwithstanding, our implementation makes non-normative use of a couple of fields, so we leave it for further analysis whether it would be necessary to define specialized versions of the archetypes considered. Also, it would be interesting to formalize the terminology used to specify generic actions for the iCabiNET to handle the events related to medication misuse. Now that we can use MHP applications to manage information stored in EHR repositories, our ongoing research aims at developing a new wave of recommender systems that bring together TV watching, e-commerce and telemedicine. A recommender system is a filtering application intended to discover items that are likely to be of interest for a target user, as inferred from a profile that gathers information about the TV programs he/she watches or the commercial products he/she purchases [1]. Our goal is to introduce the EHR as a new dimension of the user's profile, in the belief that having access to health-related information will enable a number of features that may have a significant impact in the field of interactive services to the home. For example, it would be possible to advertise over-the-counter drugs to viewers who may benefit from them, taking care of recommended doses, interactions and so on. Also, we could take advantage of propitious scenes of TV programs to offer relaxation products or services to viewers who may be stress-prone (e.g. as inferred from knowledge about their jobs). In the same line, we can achieve greater targeting for herbal, first-aid or dietetic products, for rehabilitation or assistance services and so on, following factual information about the viewers' health problems or factual/inferred information about their hobbies. Finally, we can avoid offering certain commercial products to viewers touched by diabetes, coeliac disease or allergies in general. These ideas will be soon implemented in the AVATAR system presented in [3].

Acknowledgements. This work was supported in part by the Ministerio de Educación y Ciencia (Gobierno de España) project TSI2007-61599, and by the Consellería de Innovación, Industria e Comercio (Xunta de Galicia) project PGIDIT05PXIC32204PN.
References

1. Adomavicius, G., Tuzhilin, A.: Towards the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 739–749 (2005)
2. Benoit, H.: Digital Television: Satellite, cable, terrestrial, IPTV, Mobile TV in the DVB framework. Focal Press, St. Louis (MO) (2008)
3. Blanco-Fernández, Y., Pazos-Arias, J.J., López-Nores, M., Gil-Solla, A., Ramos-Cabrer, M.: AVATAR: An improved solution for personalized TV based on semantic inference. IEEE Transactions on Consumer Electronics 52(1), 223–231 (2006)
4. Bobbie, P.O., Yussiff, A.L., Ramisetty, S., Pujari, S.: Designing an embedded electronic-prescription application for home-based telemedicine using OSGi framework. In: Proc. Intl. Conf. on Embedded Systems and Applications (ESA), Las Vegas (NV), USA (2003)
5. Curé, O.: XIMSA: Extended interactive multimedia system for auto-medication. In: Proc. 17th IEEE Symp. on Computer-Based Medical Systems (CBMS), Bethesda (MD), USA (2004)
6. Downey, G., Hind, C., Kettle, J.: The abuse and misuse of prescribed and over-the-counter medicines. Hosp. Pharm. 7(9), 242–250 (2000)
7. DVB: Multimedia Home Platform standard, version 1.2. Draft Technical Specification 102 590 V1.1.1 (2007)
8. Díaz-Redondo, R.P., Fernández-Vilas, A., Ramos-Cabrer, M., Pazos-Arias, J.J.: Exploiting OSGi capabilities from MHP applications. Virtual Reality Broadcast 4(16) (2007)
9. Ghinea, G., Asgari, S., Moradi, A., Serif, T.: A Jini-based solution for electronic prescriptions. IEEE Trans. Inf. Tech. Biomed. 10(4), 794–802 (2006)
10. Goodrich, N.: Smart packaging (2006), http://www.pac.ca/Chapters/ONSmartPkgDownloads.html
11. Healthcare Domain Task Force, http://healthcare.omg.org
12. Ho, L., Moh, M., Walker, Z., Hamada, T., Su, C.F.: A prototype on RFID and sensor networks for elder healthcare. In: Proc. SIGCOMM Workshops, Philadelphia, PA, USA (2005)
13. Iakovidis, I., Asuman, D., Purcarea, O., Comyn, G., Laleci, G.B.: Interoperability of eHealth systems - selection of recent EU's Research Programme developments. In: Proc. Intl. Conf. eHealth (CeHR), Regensburg, Germany (2007)
14. Ladak, P.: Electronic medication order entry and administration - challenges of clinical and technical integration. In: Proc. Conf. Healthcare Information and Management Systems Society (HIMSS), Amelia Island, FL, USA (2003)
15. Lee, C., Nordstedt, D., Helal, S.: Enabling smart spaces with OSGi. IEEE Perv. Comp. 2(3), 89–94 (2003)
16. López-Nores, M., Pazos-Arias, J.J., García-Duque, J., Blanco-Fernández, Y.: A smart medicine manager delivering health care to the networked home and beyond. In: Proc. Intl. Conf. Health Informatics (HEALTHINF), Funchal, Portugal (2008)
17. Mehta, N.B., Partin, M.H.: Electronic health records: A primer for practicing physicians. Cleveland Clinic J. Med. 74(5), 826–830 (2007)
18. Muñoz, A., Somolinos, R., Pascual, M., Fragua, J.A., González, M.A., Monteagudo, J.L., Salvador, C.H.: Proof-of-concept design and development of an EN13606-based Electronic Health Care Record service. J. Am. Med. Inform. Assoc. 14(1), 118–129 (2007)
19. openEHR: http://www.openehr.org
20. OSGi Alliance: Open Services Gateway initiative, http://www.osgi.org
21. Staroselsky, M., Volk, L.A., Tsurikova, R., Newmark, L.P., Lippincott, M., Litvak, I., Kittler, A., Wang, T., Wald, J., Bates, D.W.: An effort to improve electronic health record medication list accuracy between visits: Patients' and physicians' response. Intl. J. Med. Inform. 77(3), 153–160 (2008)
22. Tang, P.C., Ash, J.S., Bates, D.W., Overhage, J.M., Sands, D.Z.: Personal health records: Definitions, benefits and strategies for overcoming barriers to adoption. J. Am. Med. Inform. Assoc. 13(2), 121–126 (2006)
23. Wan, D.: Magic Medicine Cabinet: A situated portal for consumer healthcare. In: Proc. 1st Intl. Symp.
Handheld and Ubiquitous Computing (HUC), Karlsruhe, Germany (1999)
Multi-agent Framework Based on Web Service in Medical Data Quality Improvement for e-Healthcare Information Systems Ching-Seh Wu, Wei-Chun Chang, Nilesh Patel, and Ishwar Sethi
Ching-Seh Wu, Nilesh Patel, and Ishwar Sethi
Department of Computer Science and Engineering, Oakland University, USA
e-mail: {cwu,npatel,isethi}@oakland.edu
Wei-Chun Chang
Department and Graduate School of Information Management, Shu-Te University, Taiwan
e-mail: [email protected]

1 Introduction
It has been reported that between 44,000 and 98,000 deaths occur annually as a consequence of medical errors within American hospitals alone [1], and the US National Association of Boards of Pharmacy reports that as many as 7,000 deaths occur in the US each year because of incorrect prescriptions [2]. Therefore, there is a great desire to improve access to new healthcare methods, and the challenge of delivering healthcare has become significant. In an attempt to meet these great demands, health systems have increasingly looked at deploying information technology to scale resources, to reduce queues, to avoid errors and to provide modern treatments to remote communities. Many medical information systems have been proposed in the literature to assist in managing and advising on medical treatments in order to prevent any type of medical error. From the individualized care point of view, in order for clinicians to make the best diagnosis and decide on treatment, all the relevant health information of the patient needs to be available and transparently accessible to them regardless of the location where it is stored. Moreover, computer-aided tools are now essential for interpreting patient-specific data in order to determine the most suitable therapy from the diagnosis, but existing systems lack collaborative ability because they employ different design methods [3]. Many researchers have been trying to apply service-oriented architecture (SOA) to deal with the distributed environment of e-healthcare information systems [6]. The objective of SOA is to provide better quality of service to users; the new services are called web services [4]. Following the definitions and specifications of web services, any organization, company, or even individual developer who can deliver such functional entities can register and
publish their service components to a Universal Description, Discovery, and Integration (UDDI) registry for public use. Web services can be as simple as a single transaction, e.g. the querying of a medical record, or more complex multi-services, e.g. supply chain management systems from business to business (B2B), and many others [4]. However, current web service developments mostly focus on providing either a single service or at most a few. Focusing on single services without being prepared for complex and large-scale web services causes technological bottlenecks to develop. Therefore, in order to enhance service-oriented integration in distributed e-Healthcare environments, the collection and composition of web service components for complex and large-scale web service applications need to be developed and improved. In composing web services, both single service components and series of service components that can support large-scale tasks need to be found. Ko and Neches also point out that current web service research focuses only on developing mechanisms to describe and locate individual service components in a network environment [5]. In the dynamic optimization of medical data quality, the information regarding suitable medical data service components needs to be acquired from many medical data service providers whose components are registered in a UDDI registry repository. The next step is to negotiate with different medical data service providers in order to integrate suitable medical data components. The optimization of medical data selection is successful when the multiple objectives set by a medical data service requester are met, such as the reliability of medical data components, results of diagnosis, and cycles of consultation [8]. To evaluate web service composition, several aspects of the quality of service have been proposed, e.g. web service composition (Business Process Execution Language for Web Services, BPEL4WS) [7], web service coordination, web service transaction, web service security, and web service reliability. This paper applies the SOA and Web Service concepts specified above to put forward a model of multiple intelligent agents that assist in the improvement of medical data quality in a distributed e-Healthcare information system environment and that are able to optimize the medical data provided by multiple heterogeneous systems via Internet Web Services. Furthermore, to improve the accuracy of doctors' diagnoses, many methods for Medical Diagnostic and Treatment Advice Systems have been developed to assist medical doctors in decision making, such as rule-based reasoning, fuzzy inference, neural networks, etc. [8, 9, 10]. Intelligent agents are another approach taken by researchers to assist in different domains, such as business processes, remote education services, and project management [11, 12, 13]. The objectives of this research are to design and develop medical data quality models and to develop the methodologies and algorithms of our multi-agent framework to assist in monitoring and optimizing data quality for e-Healthcare information systems. In the following sections, we first describe the preliminary aspects of our study, focusing on medical data quality in terms of data extraction, in Section 2. In Section 3, static and dynamic models of medical data quality are designed and developed using UML notations. These models will be implemented for healthcare intelligent agents to monitor and keep track of the medical data recording and extraction process.
Fig. 1 Healthcare Consultation Governance Cycle.
2 Preliminaries of Medical Data Quality
Data quality refers to many different aspects. In Table 1, aspects of data quality are grouped into two categories of dimensions: the measurable dimension and the intangible dimension. However, the main focus of medical data quality in this research is on the measurable Accuracy dimension of data quality. The accuracy of medical data in this study refers to how faithfully the medical data represent reality after the data extraction process carried out during the healthcare governance cycle specified in Figure 1. To obtain an accurate set of medical data for healthcare consultation, this study has designed healthcare intelligent agents to monitor and track the data extraction process.
2.1 Healthcare Governance Cycle
In order to design intelligent agents that monitor and keep track of medical data processing, the healthcare governance cycle is illustrated in Figure 1. Within a healthcare consultation, the General Practitioner (GP), such as a family doctor, uses a networked Health Maintenance Organization (HMO) to find relevant healthcare knowledge for the treatment. The healthcare data from each consultation are stored in the medical database. The medical database records information in a concise format, with compressed detailed clinical coding regarding symptoms, diagnostic results, treatments, prescriptions, and other medical information for the consultation. When a particular piece of medical information is retrieved for a further or subsequent healthcare consultation/reference, the compressed medical data must be extracted into a format understandable to GPs. Feedback on the medical data quality is then used to improve patient care in the next iteration of the healthcare consultation. The major concern about medical data quality arises from the data extraction process. One of the major challenges in the healthcare domain is the
Table 1 Aspects of the data quality.
extraction of comprehensible knowledge from medical diagnosis data. Data accuracy and consistency must be maintained during the extraction process. In order to make sure that the data extraction process maintains a good quality of up-to-date medical information, medical data quality models are created for the further design of healthcare intelligent agents.
3 Modeling the Medical Data Quality Using UML
A saying from Software Engineering goes [14], "If you can model it, you can implement it." We have designed the class diagram, the activity diagram, the use case diagram, and the sequence diagram for modeling the static view of medical data quality and the dynamic behavior of the medical data extraction process using UML. By developing these models, we are able to look into the details of the medical data recording/retrieval process as well as the data extraction process. This will help us design the multiple intelligent agents that monitor and track the data recording/retrieval and extraction processes to assist in medical data quality improvement. The use case diagram is a tool for modeling the features and functions of an information system. The Use Case Diagram in Figure 2 shows that our system consists of medical data quality features such as data extraction, data migration, data cleaning, data integration, data processing and data analysis. Data analysis involves feedback and quality assessment. This Use Case Diagram is the first step toward the definition of the behavior of the medical data quality involved in a healthcare information process. For this particular study, we only focus on the data extraction aspect of medical data quality. The rest of the medical data quality issues specified in Figure 2 have been reserved for future study and development. The class diagram of the medical data quality model in Figure 3 contains all classes/objects that are associated with medical data processes and queries. Each class/object in the model was used to generate data quality metrics/attributes for intelligent agents to keep track of and monitor. When the data extraction process is conducted, the
Fig. 2 Medical Data Quality Modeling - Use Case Diagram.
medical information in the classes/objects will be collected in the medical knowledge base for inference conducted by the intelligent agents. The activity diagram in Figure 4 models the activities and tasks involved in the medical data extraction process. These key activities include using the hospital query language (HQL) of the hospital information system, the medical data recording process, and the looping process for data updates. The activity diagram helps design the internal monitoring process of the healthcare intelligent agents. The sequence diagram of the medical data model in Figure 5 shows the process sequence of medical data extraction. The sequence diagram enhances the process definition from the activity diagram. The healthcare intelligent agent uses these two models to identify quality items to be monitored.
4 Open Design of the Intelligent Agent Prototype
Once both the dynamic and static models for medical data quality had been designed, the intelligent agent was developed according to these models to monitor and keep track of medical data recording and processing. Our intelligent agent design is an open and collaborative infrastructure, so that each agent has the same structure and can communicate with the others. An agent, as illustrated in Figure 6, consists of three service components: the collaboration service, the quality monitor service, and the reporting service, as described in Figure 7. The collaboration service
Fig. 3 Medical Quality Modeling - Class Diagram.
enables agents to plug & play medical web forms for portable medical records and to plug & play data workflows, medical protocols, and clinical guidelines in a distributed heterogeneous medical information environment, and enables an information exchange service among intelligent agents. There are two existing frameworks that can be used to implement intelligent agents: the Java Expert System Shell (JESS) and the Java Agent DEvelopment Framework (JADE). The interior implementation of services for an agent was carried out by using JESS. The exterior communication behavior in a distributed e-healthcare environment was carried out by using JADE. In general, healthcare intelligent agents have been developed to assist the e-healthcare information system in the following medical data quality improvement activities:

• update the medical knowledge base
• define criteria for healthcare data queries
• determine if a threshold value of data quality has been reached
• optimize data quality for accurate diagnosis
• keep track of the patient's healthcare profile
• communicate with other agents
Fig. 4 Medical Data Quality Modeling - Activity Diagram.
JESS was used to develop and implement the interior inference process of the intelligent agents, as specified in Figure 7. The intelligent agent can take inputs from healthcare experts and transform them into healthcare knowledge for the inference engine to make consultation judgments. The healthcare intelligent agent was developed with the features described in Figure 8. It is able to take inputs from medical events such as the patient's medical history, diagnosis knowledge from experts, and medical symptoms. Its interior features include updating medical knowledge, defining criteria for hospital queries, determining if the data quality threshold value has been reached, keeping track of the patient's medical profiles, and communicating with other agents.
Fig. 5 Medical Data Quality Modeling - Sequence Diagram.
5 Optimization of Medical Data Quality
One of the most important concerns of this study is the selection of medical data for quality improvement over a distributed e-healthcare information environment. The foundation of satisfying data quality over the distributed medical data environment comprises the analysis and construction of the medical data service workflow, the automation of composing/optimizing suitable medical data Web Service components, and medical data Web Service component reusability. To satisfy data quality criteria, we propose a framework (see Figure 9) in which we integrate an intelligent agent, a medical data repository section and several modules into the Service Oriented Architecture (SOA). Evolutionary Algorithms (EAs) have been applied as the search algorithms to find the optimal medical data in the distributed e-healthcare information environment, as specified in Figure 9. "Survival of the fittest" [15] is a principle of the natural environment which is used in the medical data selection algorithm to generate survivors, i.e. the optimal data selections in the distributed healthcare environment. The original principles of the EC theory are based on Darwin's theory of natural selection to solve real-world problems [6]. EAs have been successfully applied in optimizing solutions for a variety of domains [6]. The strength of EC techniques comes from the stochastic strategy of their search operators. The major components in EC are search operators acting on a population of chromosomes. EC was developed to solve complex problems which were not easy to solve by existing algorithms [6, 7]. The method utilized in the algorithm to progress the search from ancestors to offspring is a collective learning process: species information is collected during the evolutionary process, and the offspring that inherit good genes from their parents survive
Fig. 6 Open Module Design of an Intelligent Agent.
the competition. This is the first characteristic of EAs. Next, the generation of descendants is handled by the search operators, crossover and mutation, which explore variations in species information in order to generate offspring. Crossover operators exchange information between mating partners. On the other hand, a mutation operator, which mutates a single gene with very small probability, is used to change the genetic material in an individual. Finally, the third characteristic that defines EAs is the evaluation scheme, which is used to decide who the survivor is. The evaluation scheme is the most diverse characteristic of the three due to the different objectives used to select the different solutions needed in different domains. The evaluation scheme can be as simple as good or bad, a binary decision; or as complex as nonlinear, using multiple mathematical equations to assess trade-offs between multiple objectives. For this study, EC techniques provided stochastic searching techniques aimed at global optimization. Global optimization searches for the best performance of solutions in the objective space. A general global optimization problem can be defined as follows:

f*(x) → min_{x∈Ω} f(x)   (1)
subject to c(x)   (2)

where f*(x) is the global optimum in the objective space, obtained when determining the minimum of the function f(x); x is a vector of variables which lies in the feasible region
Fig. 7 Intelligent Agent Inference structure.
Ω , any x in Ω defines a feasible solution in which x conforms to the constraints c(x). A similar definition can also be applied to the maximization of objective functions. The design objective of this study was to develop an EC-based process incorporated with a current web service transaction procedure (see Figure 11) to search the optimal medical data quality solution space. The space was created by collecting information of data service components through UDDI registries for the optimization of medical data web service composition. This type of evolutionary process has also been developed and tested in requirements engineering in order to search for the optimal quality solutions for system specification [15]. The fundamental designs of an EC-based process in this study were focused on the definition of medical data search space, chromosome structure design, objective function definitions, and
Fig. 8 A Healthcare Intelligent Agent.
Fig. 9 Optimizing medical data quality framework in distributed medical data environment.
the quality fitness assessment algorithm. In general, to apply the process to medical data web service composition, the major steps of the process are defined as follows. 1) Collecting the medical data of component registrants: the size of the medical data search space is determined by the number of component registrants collected from the available UDDI registries. Therefore, it is very important to obtain the information on all available medical data locations/components from the component registration agents. The information regarding the description of service components can be collected from a component library as specified in [16]. The communication protocol is based on a set of API messages (i.e., UDDI 3.0 and up). 2) Modeling medical data resources from different providers: medical data service components are classified and organized into database tables based on the functionalities and characteristics of the medical data services requested. The workflow of the medical data
Fig. 10 Design of the evolution process of medical data selection.
service can be modeled by using the scenario-based method used in previous sections to describe the task steps required to complete medical data web service applications [15]. 3) Applying the sequence of medical data web service composition and chromosome encoding/decoding: the task sequence of medical data web services to be optimized is defined. A sub-task service in a task sequence can be defined as

\mathrm{component}_{j,i}, \; \mathrm{sub\text{-}task}_{j}   (3)
where it is assumed that one sub-task can be completed by a single medical data service component. By utilizing the collected information on medical data component registrants, a web service task sequence is transformed into a binary string, i.e. a quality solution is encoded into a chromosome. The chromosome mapping mechanism utilizes a hierarchical structure [15] for encoding/decoding between the task sequence and the chromosome. 4) Quality Fitness Assessment: to evaluate the quality of medical data optimization, multiple parameters or attributes are used in the metrics to evaluate performance and quality. The metric measurement focuses on different aspects depending on what the data quality criteria require. Such measurement is a key element in evaluating the performance and quality of medical data optimization.
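For illustration only, a minimal Python sketch of how steps 3) and 4) could be realized is given below; the candidate components, quality attributes and weights are hypothetical assumptions, and the paper's own implementation maps solutions onto binary strings through the hierarchical structure of [15] rather than the integer genes used here.

```python
import random

# Hypothetical candidate components per sub-task, annotated with
# illustrative quality attributes (availability, response time in ms).
CANDIDATES = [
    [{"availability": 0.99, "response_ms": 120}, {"availability": 0.95, "response_ms": 60}],   # sub-task 0
    [{"availability": 0.90, "response_ms": 200}, {"availability": 0.97, "response_ms": 150}],  # sub-task 1
    [{"availability": 0.98, "response_ms": 80},  {"availability": 0.92, "response_ms": 40}],   # sub-task 2
]

def random_chromosome():
    # One gene per sub-task: the index of the chosen component.
    return [random.randrange(len(c)) for c in CANDIDATES]

def fitness(chrom, w_avail=0.7, w_speed=0.3):
    # Weighted aggregate of the illustrative quality attributes (higher is better).
    chosen = [CANDIDATES[j][g] for j, g in enumerate(chrom)]
    avail = sum(c["availability"] for c in chosen) / len(chosen)
    speed = 1.0 - min(1.0, sum(c["response_ms"] for c in chosen) / (300.0 * len(chosen)))
    return w_avail * avail + w_speed * speed

def evolve(pop_size=20, generations=50, p_mut=0.05):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # evaluation scheme: survival of the fittest
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [random.randrange(len(CANDIDATES[j])) if random.random() < p_mut else g
                     for j, g in enumerate(child)]   # mutation with a small probability
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())
```

Under these assumptions the loop simply selects, for each sub-task, the registered component whose weighted quality attributes score best, which is the role the fitness assessment plays in the framework described above.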
6 Conclusions
Accurate judgment about healthcare treatment depends mainly on the selection of good-quality medical data, especially in a distributed and heterogeneous healthcare information environment. This paper proposed a multi-agent framework based on the SOA of Web Services that can assist in medical data quality improvement
by monitoring the data extraction process, keeping track of data recoding/retrieval, and optimizing the selection of medical data from a distributed and heterogeneous e-healthcare environment. This study starts by creating static and dynamic models for medical data quality in terms of data extraction, so that the domain of objects and processes is defined. The open design of the healthcare intelligent agents follows the definitions from the medical data quality models. This design enables the intelligent agent to provide external communication for collaborative services, internal inference shells for monitoring and tracking the data extraction process, and a report printing service. To solve the problem of data selection and quality optimization in a distributed e-healthcare environment, an evolutionary computing algorithm was integrated into the Service Oriented Architecture of Web Services. In the SOA, the healthcare intelligent agent also plays a major role as the service agent for the medical data registration service and the data requesting service. This multi-agent framework has been developed using the Java Expert System Shell (JESS) and the Java Agent DEvelopment Framework (JADE). The system will be practically deployed and integrated with the e-healthcare information systems of our local hospitals.
References
1. Kohn, L.T., Corrigan, J.M., Donaldson, M.S.: To Err Is Human: Building a Safer Health System. The Institute of Medicine (IOM), USA (1999) ISBN-13: 978-0-309-06837-6
2. Branigan, T.A.: Medication Errors and Board Practice. Colorado State Board of Pharmacy News. Nat’l Assoc. of Boards of Pharmacy, pp. 1–4 (February 2002), http://www.nabp.net/ftp-files/newsletters/CO/CO022002.pdf
3. Sokolowski, R.: Expressing Health Care Objects in XML. In: IEEE 8th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Palo Alto, California, pp. 341–342 (1999)
4. Web Services Organization (2004), http://www.webservices.org/
5. Ko, I.Y., Neches, R.: Composing Web Services for Large-Scale Tasks. IEEE Internet Computing 7, 52–59 (2003)
6. Menasce, D.A.: Web Server Software Architectures. IEEE Internet Computing 7, 78–81 (2003)
7. Business Process Execution Language for Web Services Importer/Exporter Technology (2004), http://www.ibm.com/
8. Hayashi, Y., Setiono, R.: Combining neural network predictions for medical diagnosis. Comput. Biol. Med. 32(4), 237–246 (2002)
9. Papageorgiou, E., Stylios, C., Groumpos, P.: A Combined Fuzzy Cognitive Map and Decision Trees Model for Medical Decision Making. In: 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2006, pp. 6117–6120 (2006)
10. Tsai, M.C., et al.: Cooperative medical decision making and learning by the sharing of web-based electronic notebooks and logs. In: 12th IEEE Symposium on Computer-Based Medical Systems, pp. 90–95 (1999)
11. Wu, C.-S., Chang, W.-C., Sethi, I.K.: A Metric-based Multi-System for Software Project Management. In: Proceedings of the IEEE/ACIS Eighth International Conference on Computer and Information Science (2009)
12. Devamalar, P.M., et al.: Design of Real Time Web Centric Intelligent Health Care Diagnostic System using Object Oriented Modeling. In: The 2nd International Conference on Digital Object, pp. 1665–1671 (2008)
13. Bellifemine, F., Rimassa, G.: Developing multi-agent systems with a FIPA-compliant agent framework. Software - Practice and Experience 31(2), 103 (2001)
14. Perkins, A.: Business Rules = Meta-Data. In: International Conference on Technology of Object-Oriented Languages, p. 285 (2000)
15. Chang, W.C.: Optimising system requirements with evolutionary algorithms. Department of Computation, UMIST, The University of Manchester Institute of Science and Technology, Manchester, p. 165 (2004)
16. Yang, J.: Web service componentization. Communications of the ACM 46, 35–40 (2003)
Towards a Unified Data Management and Decision Support System for Health Care
Robert D. Kent, Ziad Kobti, Anne Snowdon, and Akshai Aggarwal
Robert D. Kent, Ziad Kobti, and Akshai Aggarwal: School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4
Anne Snowdon: Odette School of Business, University of Windsor, Windsor, Ontario, Canada N9B 3P4
Abstract. We report on progress in the development of a unified data management and decision support system, UDMDSS, for application to injury prevention in health care. Our system is based on a modular architecture which supports real-time web-based desktop and mobile data acquisition, semantic data models and queries, Bayesian statistical analysis, artificial intelligence agent-based techniques to assist in modelling and simulation, subjective logic for conditional reasoning with uncertainty, advanced reporting capabilities and other features. This research work is being conducted within a multi-disciplinary team of researchers and practitioners and has been applied to a Canadian national study on child safety in automobiles and also in the context of patient falls in a hospital.
1 Introduction
Healthcare is a highly complex and multidimensional system that has a social contract with societies to provide safe and effective healthcare to citizens in a timely manner. This social contract requires effective decision making based on strong, empirical evidence in order to achieve the best-practice quality of care that all patients desire and expect from their health care system. Information systems currently collect massive amounts of data that are then painstakingly entered into databases, usually by management information systems (MIS) departments, which generate reports at the operational or executive level. Information systems that support effective decision making at the level of the individual patient (using patient-centred data), to support decisions by health professionals, are very limited, however. For example, an elderly patient who has a history of falls at a long term care facility is admitted to hospital. There is no information system that allows for a seamless record of falls reporting
between the long term care facility and the hospital to support effective management of the patient’s risk of falling in hospital. Health records and health information remain paper based in most institutions, thereby limiting timely access to clinical data by either the health professional managing patient care or the institutional decision makers designing effective health services. Similarly, at the system level, large data sets of health information are managed and reported using sophisticated online analytical processing (OLAP) tools. However, such datasets are historical in nature and, therefore, queries and analysis reports are necessarily retrospective, often 1 to 5 years old. While such reports are useful for making aggregate-level decisions and for pattern recognition, an individual physician may benefit very little from such macro-level data. At the level of the individual patient and healthcare practitioner, data that is real-time and patient specific will have a higher impact on the quality of care than aggregate data. However, real-time, patient-centric data management and decision support systems are almost non-existent.
Furthermore, recent advances have set the stage for rapid changes in health information collection and sharing (Bilykh et al, 2003; Blanquer et al, 2005; Huang et al, 2005; Hwang & Aravamudham, 2004; Lim, Choi, & Shin, 2005; Smirnov et al, 2005). In particular, web-based communication and wireless devices, including sensors, cellular telephones and Personal Digital Assistants (PDAs), have been found to be useful in health research and practice for the distribution of information for teaching or patient care (Berry et al, 2004; Kim et al, 2005; Kopec et al, 2005; Paik et al, 2005; Sanderson et al, 2005; Schaffers, 2005; Skov & Hoegh, 2006). The difficulty of collecting and storing data, and the distributed and ubiquitous nature of the devices being used in the healthcare sector, provided us with the motivation to initiate a project for developing a real-time data management and decision support system in the area of injury prevention.
In order to meet these challenges, a multi-disciplinary team of researchers working in the area of injury prevention collaborated to develop and test a unique tool that offers rapid and “real time” decision support for health professionals and health sector leaders. While the system could have been developed in any area of the healthcare sector, the development team selected “injury prevention” as the application domain because injury is currently the leading cause of death worldwide for individuals aged two to adult. Currently, injury research provides important evidence regarding the prevalence and severity of injury, and the factors that contribute to injury outcomes in populations. Yet, injury data is often difficult and time consuming to collect, requiring health professionals and policy makers to make decisions and set policy based on evidence that may have been collected up to 5 years previously. The overarching goal of this project is to develop a Unified Data Management and Decision Support System (UDMDSS) that could achieve a simple and cost-effective method of collecting data using handheld devices and could also analyze the data quickly, using advances in information system technology and modelling and simulation based on semantics, artificial intelligence and subjective reasoning approaches.
The core system software was originally developed for the 2006 national child seat survey, which collected over 20,000 vehicle observations at 187 randomly selected intersections across Canada. The data was automatically downloaded into the UDMDSS database using handheld computer devices (Blackberry™). This study resulted from a partnership
Fig. 1 Functional overview of UDMDSS architecture.
with Transport Canada and Research in Motion (RIM) and yielded a very rich data set suitable for building, developing and testing the impacts of various policy parameters such as legislation, enforcement, and the influence of social networks on safety seat use in these families (Snowdon et al, 2006; Snowdon et al, 2008). In health care and injury prevention there are many problems and challenges; developing policies and practices for health and safety is a complex endeavour. There is a critical need for systems and approaches that provide and support evidence-based research to augment traditional expert knowledge and technical acumen, which depend upon experience, ongoing training and education, and many other factors. Thus, the goals of our team include deriving requirements for increased multi-disciplinary approaches to determining the “best” strategies for security and ethics, facts and measurement, expertise versus heuristics, culture and policy, ontologies, prototypes and exemplars, conceptualization and reasoning, and learning and evolutionary systems. Throughout our development approach we consider the total survey error paradigm, in which gathering as much information as possible about the survey design, methodology, implementation, practice, application, analyses, and so on, is essential to understanding the quality of the final analytical results. The nature of the workflow in the UDMDSS is illustrated in Figure 1. In the rest of this paper we describe the current architecture and capabilities of the UDMDSS, as well as work in progress.
2 A UDMDSS Primer
The Unified Data Management and Decision Support System (UDMDSS) is a comprehensive, multi-tiered software system that includes the range of information management capabilities necessary for securely collecting and maintaining health data, systematically analysing those data (both at the individual level and the aggregate level), and producing informed reports that capture the meaning of the data to support best-practice decision making in the health sector. The flexibility and adaptability of the software make it ideally suited to health care professionals, policy makers, health system leaders, and decision makers. In this section we discuss the UDMDSS from both functional and architectural perspectives.
The UDMDSS offers a large suite of information technology functions designed to collect and analyze data for a particular patient so that care can be personalized to their unique needs. For example, the system can collect information on key factors associated with falls in the elderly, and then produce a meaningful analysis of an individual patient’s falls risk, or a history of falls complete with the most influential factors that contribute to the elderly patient’s fall in a clinical setting. Similarly, this software system can collect, house, and produce trend analyses of population data so that policy makers can make informed decisions about policy innovation to support key health indicators. One of the most important features of this software is the adaptive user interface supporting the entry of any type of health data. Furthermore, data entry occurs in “real time” (immediately and automatically), and meaningful analyses can be generated with a simple query or command in the software system. The UDMDSS has already been beta-tested in one Ontario hospital (Kent et al, 2008) and with a national government agency (Transport Canada) (Snowdon et al, 2006; Snowdon et al, 2008).
As shown in Figure 2, the UDMDSS consists of six primary functional components: the Real-Time Data Management (RTDM) Module, Persistent Storage (PS), the Artificial Intelligence (AI) Module, the Statistical Analysis (SA) Module, the Subjective Logic (SL) Module, and the Business Logic and Decision Support (BLDS) Module. The RTDM module uploads and saves the collected field data into the PS. The PS is a repository for the data and the database schema of the UDMDSS. The SA module performs statistical analysis of the stored data and sends its results to the AI module for the purpose of initializing the agent-based simulation parameters. The AI module starts simulating its historical dataset after calibrating its model through two validation tests. When the simulation results are aligned with the survey statistics by the BLDS module, the model is considered stabilized and is initialized with these values to be used in actual production mode. The simulation data generated in production mode are stored in the persistent storage. The SL module is used to evaluate the impacts of uncertainty and belief on the data and on subjective reasoning. The BLDS module displays the results in a graphical format to the user. We now describe each of these components in more detail.
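As a purely schematic illustration of the data flow just described (and not of the actual UDMDSS implementation), the following Python sketch wires hypothetical stand-ins for the modules into the calibrate-then-produce cycle; all class names, fields and sample records are invented.

```python
# Illustrative stand-ins for the UDMDSS modules; not the actual system code.

class PersistentStorage:
    def __init__(self):
        self.records = []
    def save(self, rows):
        self.records.extend(rows)

class StatisticalAnalysis:
    def summarize(self, records):
        ages = [r["age"] for r in records]
        return {"n": len(records), "mean_age": sum(ages) / len(ages)}

class AIModule:
    def calibrate(self, summary):
        # In the real system this step sets agent-based simulation parameters.
        self.params = summary
    def simulate(self):
        return [{"age": self.params["mean_age"], "simulated": True}]

class BLDSModule:
    def aligned(self, summary, simulated):
        return abs(summary["mean_age"] - simulated[0]["age"]) < 1e-6
    def report(self, summary):
        print("Report:", summary)

# RTDM: field data arriving from handheld devices (hypothetical records).
field_data = [{"age": 81}, {"age": 84}, {"age": 79}]

ps, sa, ai, blds = PersistentStorage(), StatisticalAnalysis(), AIModule(), BLDSModule()
ps.save(field_data)                       # RTDM -> PS
summary = sa.summarize(ps.records)        # PS -> SA
ai.calibrate(summary)                     # SA -> AI (initialize simulation parameters)
simulated = ai.simulate()
if blds.aligned(summary, simulated):      # BLDS checks alignment with the survey statistics
    ps.save(simulated)                    # production-mode simulation data back to PS
blds.report(summary)
```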
2.1 Real-Time Data Management (RTDM) Module
The overall objective of this module is to collect distributed data from various sources, extract and upload the collected data into the database, and attach semantics to the stored data for the purpose of statistical analysis and report generation. The data collection module consists of two main components: data acquisition and the Unified Survey Management Server. Data acquisition refers to the process of collecting data via direct or indirect public communication. Two kinds of data centres are identified: the wireless data centre and the permanent or semi-permanent data centre. The former consists of lightweight mobile data centres (e.g., wireless handheld devices such as the Blackberry™, PDAs, etc.) that allow the surveyor to collect data from virtually any place where effective wireless internet communication is available. The latter is not intended to be mobile and thus can be equipped with powerful computation hardware. Such centres may include hospitals or other public healthcare facilities. The Unified Survey Management System (USMS) is a multi-tiered, distributed system for creating and deploying surveys (Bates & Gawande, 2003; Newsted et al, 1998; Wright, 2005) online for data collection (Kent et al, 2007; Snowdon et al, 2006). The USMS is integrated with other modules, particularly BLDS, to provide analytical capabilities for post-survey statistical correlation studies as well as real-time trend analysis while
Fig. 2 Architecture of UDMDSS.
Fig. 3 The Unified Survey Management System module.
field surveys are ongoing. Recent work involving agent-based modeling of social networks that uses survey data suggested incorporating various aspects of decision support in order to provide an additional analytical framework for developing and testing intervention strategies using feedback. Since 2006, the USMS has been undergoing further testing and development for application in a hospital setting to examine its utility for patient care reporting systems with nursing staff (Kent et al, 2007). Currently, USMS has been extended to deal more effectively with these and similar application scenarios in health systems by redesigning and implementing an ad hoc, peer-to-peer based grid strategy (Chervenak & Cai, 2006) using service oriented architecture (Blanquer et al, 2005; Omar & Taleb-Bendiab, 2006). This approach enables USMS to achieve local autonomy at each peer node while supporting secure data exchange and replication across the peer network, scalability of the network, shared access to resources and other features. Through these capabilities, it is possible to use USMS in other relevant areas, such as
institutional and community based health programs, global health applications for epidemiology and disaster monitoring where “real-time” data collection and analysis are required to support rapid decision making to respond to natural disasters. Services offered by the USMS: From the user perspectives, the USMS affords the following capabilities: local autonomy with built-in logic; easy to set up and manage for local IT staff; minimum training requirements for administrators and users; support for any data collection tool with the ability to edit data from survey questions; dynamically alter surveys during initial validation and testing phases, and lock surveys once validated; minimization of data entry error; maintenance of data integrity; portability (cellular to desktops); automatic download to servers and easy retrieval of data; flexible analysis and modeling features; automated, real time data analysis for efficient and effective decision making; and, meaningful data representation for multilevel (e.g., front line, managerial) interpretation and policy decision support. A partial list of UDMDSS services is included in Table 1.
2.2 Persistent Storage (PS) Module
For cognitive modeling and simulation we require a suitable model for fundamental data (i.e. measurements and values) and a logical data storage model to be referenced. To achieve these goals we require the definition of (i) the type system of computer software and hardware, (ii) meta-level descriptions of the data and resources, (iii) meta-meta-level descriptions of the relationships and concepts inherent in the data, (iv) support for Bayesian statistical analysis on the data and conceptual reasoning on the concepts, (v) support for semantics-based data mining, and (vi) integration with an ontological framework to enhance support for improved user interfaces, including domain-specific natural language (e.g. expert, layperson). For our purposes we determined that an Object-Relational DBMS approach with overlaid semantics is appropriate. The semantic framework is established using XML (XML Schema, 2004) and the Resource Description Framework, with proposed extensions to the standard to incorporate semantics (RDF Vocabulary Description Language; RDF Semantics). Within the core USMS module we established a high degree of database system interoperability using the Open Grid Services Architecture for Data Access and Integration (OGSA-DAI), and we further support data mining with an emerging Generic Query Toolkit and language (Zhu et al, 2007).
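A purely illustrative sketch of such an overlaid semantic layer is shown below; the vocabulary and property names are hypothetical, not the UDMDSS ontology, and the rdflib library is used only as one convenient way to build and query an RDF graph in Python.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# Hypothetical vocabulary for survey observations; not the actual UDMDSS schema.
EX = Namespace("http://example.org/udmdss/")

g = Graph()
obs = URIRef(EX["observation/0001"])
g.add((obs, RDF.type, EX.FallIncident))
g.add((obs, EX.patientAge, Literal(82, datatype=XSD.integer)))
g.add((obs, EX.setting, Literal("long-term care facility")))
g.add((obs, EX.reportedBy, Literal("nursing staff")))

# A simple SPARQL query over the semantic layer: all fall incidents with patient age.
query = """
    SELECT ?obs ?age WHERE {
        ?obs a ex:FallIncident ;
             ex:patientAge ?age .
    }
"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.obs, row.age)
```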
2.3 Statistical Analysis (SA) Module In order to analyze collected data, it is necessary to provide efficient, well-defined access to the data. The UDMDSS imposes structure on the data model inferred through the survey structure itself. Thus, at the time of survey creation, many aspects of the data model structure can be accounted for and imposed in the database. We have linked our system with the R programming and analysis framework (The R project,
Table 1 UDMDSS Services.
Task Management: Task analysis; Service composition, choreography; Resource planning, coordination; Scheduling, Load Balancing
Survey: Creation, editing; Approval (Sponsor, Ethics); Data collection; Query Interface and View Rendering
Policy Knowledge: Ontology design, creation, integration; Metadata, Concept management; Simulation; Reasoning; Agents
Data: Survey storage, editing, transaction; Metadata, Indexing, Cataloguing; Discovery; Survey data analysis; Data mining, reporting; Database federation, replication
Security: Registration (Role based); Authentication; Authorization; Encryption; Auditing/Assurance; Accounting
Communication: Peer management and Networking; Web services; Wireless; Satellite; Grid and Cloud
2010) in order to provide capabilities for modelling, and we provide well-defined and choreographed data pipelines from the database to the statistical computing modules.
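One possible shape for such a database-to-R pipeline is sketched below; the database file, table name, column names and the summarise.R script are assumptions made for the example and do not describe the actual, choreographed UDMDSS pipelines.

```python
import csv
import sqlite3
import subprocess
import tempfile

# Pull survey responses from a (hypothetical) relational store.
conn = sqlite3.connect("udmdss.db")
rows = conn.execute("SELECT patient_age, fall_count FROM fall_reports").fetchall()

# Stage the data as CSV for the statistical computing module.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["patient_age", "fall_count"])
    writer.writerows(rows)
    csv_path = f.name

# Hand the staged file to R for modelling (summarise.R is an illustrative script name).
subprocess.run(["Rscript", "summarise.R", csv_path], check=True)
```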
2.4 Artificial Intelligence (AI) Module
The AI module is a key component which provides the intelligence of the UDMDSS by means of multi-agent based modeling that simulates an artificial society using various evolutionary algorithm techniques such as particle swarm (Ahmad et al, 2007; Coelho & Mariani, 2006) and ant colony optimization (Dorigo et al, 2006). The AI module's multi-agent based model is built around the Cultural Algorithm framework (Reynolds, 2002). The Cultural Algorithm is a population-based evolutionary algorithm that provides a well-established framework for representing the population, culture and knowledge (Kobti et al, 2003; Kobti et al, 2005; Ulieru et al, 2006). In our beta-testing of the UDMDSS, the AI module ran simulations on the Child Safety data. First, the data are laid onto a relational database with a database schema appropriate for the Child Safety Model. Then, the model feeds on the database to initialize object entities such as Driver, Child, and Model properties. The implementation of the Child Safety Model has been evaluated by performing various high-level tests and outputs, verified against actual statistics and available data. This process was termed “validation testing” because it was designed to validate and calibrate the model. These were trial-and-error-based procedures and were performed several times until the model began to act rationally. These tests, along with their counterparts, provided a well-calibrated and rational agent model for final testing in our study.
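The sketch below is only a schematic Python illustration of the cultural-algorithm idea (a population whose variation is guided by a shared belief space); the fitness function, parameters and knowledge representation are invented and are far simpler than the Child Safety Model.

```python
import random

def fitness(x):
    # Hypothetical objective: agents whose behaviour parameter is close to 0.8 do best.
    return -abs(x - 0.8)

def cultural_algorithm(pop_size=30, generations=40):
    population = [random.random() for _ in range(pop_size)]
    belief_space = (0.0, 1.0)                      # normative knowledge: a promising range

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)

        # Acceptance function: the top agents update the shared belief space.
        elites = population[: pop_size // 5]
        belief_space = (min(elites), max(elites))

        # Influence function: new agents are generated inside the accepted range.
        lo, hi = belief_space
        offspring = [random.uniform(lo, hi) for _ in range(pop_size // 2)]
        population = population[: pop_size - len(offspring)] + offspring

    return max(population, key=fitness), belief_space

best, beliefs = cultural_algorithm()
print("best agent parameter:", best, "belief space:", beliefs)
```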
2.5 Subjective Logic (SL) Module
Reasoning with uncertainty has been considered in Dempster-Shafer theory (Shafer, 1976; Shafer, 1990) and in the context of subjective logic (Jøsang, 2007; Jøsang, 2008). We apply these principles and this formalism to survey and questionnaire design and analysis. Part of our goal is to model the so-called total survey error paradigm evident in data acquisition and analysis.
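For readers unfamiliar with the formalism, the following Python sketch shows a binomial subjective opinion (belief, disbelief, uncertainty, base rate) and Jøsang's standard cumulative fusion of two such opinions; it is a textbook illustration rather than code from the SL module, and the opinion values are invented.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    b: float  # belief
    d: float  # disbelief
    u: float  # uncertainty (b + d + u = 1)
    a: float  # base rate (prior probability in the absence of evidence)

    def expected(self):
        # Probability expectation of a binomial opinion.
        return self.b + self.a * self.u

def cumulative_fuse(x, y):
    """Cumulative fusion of two independent opinions about the same proposition."""
    k = x.u + y.u - x.u * y.u
    if k == 0:  # both opinions dogmatic (u = 0): average them
        return Opinion((x.b + y.b) / 2, (x.d + y.d) / 2, 0.0, (x.a + y.a) / 2)
    b = (x.b * y.u + y.b * x.u) / k
    d = (x.d * y.u + y.d * x.u) / k
    u = (x.u * y.u) / k
    a = x.a if x.a == y.a else (x.a + y.a) / 2  # simplified base-rate handling
    return Opinion(b, d, u, a)

# Two (invented) observers' opinions that a reported fall record is accurate.
nurse = Opinion(b=0.7, d=0.1, u=0.2, a=0.5)
audit = Opinion(b=0.5, d=0.2, u=0.3, a=0.5)
fused = cumulative_fuse(nurse, audit)
print(fused, "expected probability:", round(fused.expected(), 3))
```

The fused opinion has lower uncertainty than either source alone, which is the property that makes such operators attractive for combining survey evidence of varying quality.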
2.6 Business Logic and Decision Support (BLDS) Module We contain our business logic and rules within the BLDS decision support module patterned after several other systems in the literature (De Leeuw et al, 2008; Homer et al, 2006; Kohli et al, 1999; Lyell et al, 2008; Simon, 1977; Turban & Aronson, 2000). This module is evolving towards our goal of incorporating intelligence and reasoning.
3 Work in Progress
While the UDMDSS has been successfully beta-tested with hospital and Transport Canada data, the system is still in its infancy relative to the goals established by our research group. Additional data management and decision support features must be present in the UDMDSS for it to be a fully viable platform. In this section we describe several enhancements currently under development. The current RTDM module allows for the automatic entry and download of various data types (e.g., numerical data such as vital signs or a quantitative score, text data such as a health event or scenario, multiple choice, rating scales, etc.) into the Persistent Storage; however, it does not have built-in quality control for the input data. For example, while entering the data of a 6-month-old infant, a weight of 50 kg is erroneous even though this is an acceptable value for an adult patient. Thus, we are incorporating quality control logic for all types of data entry in the system. Furthermore, the logic will also alert the user about the data quality problem, with the ability to correct the data in real time or discard it altogether. This approach is intended to overcome, in part, the data cleaning problem experienced in most survey analyses. The BLDS module of the UDMDSS currently generates basic summary statistics only, suitable for a proof of concept but incomplete for more complex analysis. We are extending the analytical capabilities of the UDMDSS by incorporating Bayesian statistical analysis features into it; this is actually quite straightforward using the R system (The R project, 2010). Another enhancement of the existing UDMDSS is the introduction of a Concept Analyzer, which clusters the collected data according to concepts that are meaningful in the “real-world” sense and examines potential relationships among these clusters. The Concept Analyzer, as indicated in Figure 1, relies on the World Knowledge Base to build its knowledge about the world. The research strategy that our group follows involves the development of practical, robust
software modules fitting seamlessly into an overarching system core of UDMDSS. Each research activity listed below reflects current work in progress.
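A minimal sketch of the kind of data-entry quality-control logic described above (catching, for instance, a 50 kg weight recorded for a 6-month-old infant) is given below; the rule table and field names are illustrative assumptions, not the rules being built into the RTDM module.

```python
# Illustrative, age-dependent plausibility ranges; not the actual UDMDSS rules.
WEIGHT_RANGES_KG = [
    (0.0, 1.0, (2.0, 12.0)),       # under 1 year
    (1.0, 12.0, (7.0, 60.0)),      # child
    (12.0, 120.0, (25.0, 250.0)),  # adolescent/adult
]

def check_weight(age_years, weight_kg):
    """Return None if plausible, otherwise a message the user can act on."""
    for lo_age, hi_age, (lo_w, hi_w) in WEIGHT_RANGES_KG:
        if lo_age <= age_years < hi_age:
            if lo_w <= weight_kg <= hi_w:
                return None
            return (f"weight {weight_kg} kg is outside the plausible range "
                    f"{lo_w}-{hi_w} kg for age {age_years} years; correct or discard?")
    return "age out of supported range"

# A 6-month-old recorded at 50 kg triggers an alert; the same weight for an adult does not.
print(check_weight(0.5, 50))
print(check_weight(35, 50))
```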
3.1 Questionnaire Design Workbench
For the efficient design of surveys, patient reports and data queries, we implemented a survey creation module and associated software tools. At this stage, metamodelling concerns and data quality requirements are addressed directly. This will provide users with capabilities to become involved in the creative process at different levels of expertise and knowledge and to incorporate their own policies. We use XML and RDF frameworks extended to include semantics.
3.2 Semantics and Ontologies
Using tools and frameworks such as XML, XSL/T/FO, DAML+OIL, OWL, Protégé and so on, we produce a semantic data model suitable for implementation on open-source database tools such as PostgreSQL, or on proprietary database products. Our goal is to dramatically lessen the administrative burden of database model creation and management, especially of refactoring a database model, while also serving to automatically generate a substantial basis to support semantic-web-based data mining.
3.3 Bayesian Statistical Analysis Using open source software systems such as R, we are implementing a standard toolkit of statistical functions for complex analysis. In a forthcoming research application with Transport Canada we will test and validate our approach and demonstrate the capability to perform various complex statistical analyses, data quality control and real-time and post-trial types of trending.
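As a simple, hypothetical illustration of the kind of Bayesian analysis such a toolkit supports (not an analysis performed in this project), a Beta-Binomial update of an estimated misuse rate from observed counts can be written directly:

```python
# Beta-Binomial update: a standard conjugate Bayesian model, shown with invented numbers.
prior_alpha, prior_beta = 1.0, 1.0         # uniform prior on the misuse rate
observed_misuse, observed_total = 37, 120  # hypothetical field observations

post_alpha = prior_alpha + observed_misuse
post_beta = prior_beta + (observed_total - observed_misuse)

posterior_mean = post_alpha / (post_alpha + post_beta)
print(f"posterior mean misuse rate: {posterior_mean:.3f}")
print(f"posterior distribution: Beta({post_alpha:.0f}, {post_beta:.0f})")
```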
3.4 Reporting Schema and Interface For a local hospital we produced a specific reporting interface for patient falls using XSLT-FO document transformation frameworks. We will extend this and produce a Reporting Schema and Interface workbench to permit users to specify much of their document formatting needs intuitively while enabling complex reports to be generated directly from data. Reports may be produced in XML, HTML, PDF and other open formats and may include formatted text, tables and graphics. The power of XSL, through XPath, permits significant data analysis to be done using additional processing logic so that graphics, for example, may be generated from raw data and users are exempted from having to know the technical details.
3.5 Mobile and Web-Based Communications We have established the means of using web-based data communications to deploy questionnaires and retrieve data on a limited number of BlackberryTM models during 2005-06. We have extended this, using software simulators, to other models and products. We will produce a management module that will permit users to add and manage devices to a list of devices used in their organization. Features include specific user/device assignment and authorization, potential use of GPS tracking, data security, and potential for downloadable applications and interfaces to mobile devices.
3.6 Agent Modelling We are continuing our development and application of this research to the proposed work with Transport Canada and several health agencies in Ontario. This will provide a test-bed for validating the world and cultural model used, the methods of knowledge transfer determined most effective in the model and a means of performing policy analysis and research for the intended project goals.
3.7 Concept Analysis Using results of research in the areas of Conceptual Spaces (Goguen, 2005; Gardenfors, 2000; Gardenfors, 2004; Rickard et al, 2007) we are developing strategies and algorithms for pattern identification and conceptual reasoning within health survey application contexts, including the proposed national survey with Transport Canada. The ability to reason with high-level concepts is expected to be increasingly important to practitioners and health professionals. At the same time, the ability to derive concepts from clusters of data and complex relationships is expected to further the goals and capabilities of agent modelling systems.
3.8 Development Tools and Environment
An important objective in UDMDSS development was to utilize various software components that satisfy open standards and licensing. In this respect, our system can be easily and freely employed in a variety of research contexts without the concerns that derive from using proprietary software and systems. The USMS component of the UDMDSS runs under the Solaris 10 and Linux (Debian, Fedora) operating systems using the Apache 2.2 web server. OpenSSL 0.9.7d has been used to create a self-signed certificate for the SSL connection. The survey application can run in Internet Explorer, Mozilla Firefox and various Blackberry™ model browsers, for which we employed Research in Motion's Blackberry™ development kits. Programming and scripting languages are Java (JDK, JVM), JXTA (Lim et al, 2005), Python, PHP 5, JpGraph 1.17, XML, XSLT-FO, WML, and XHTML. Network protocols include HTTP, HTTPS, WAP, UDP, and TCP. Database tools and applications
include PostgreSQL, Oracle 10g XE, MySQL, and Datalog (Liu, 1995). For metadata creation, we have used Protégé, DAML+OIL, RDF, and RDFS. Support for emerging standards exists with OGSA/OGSI, OGSA-DAI/DQP and HL7 health protocols. Agent modelling and simulation uses the Repast software environment. Statistical modeling and analysis is performed using the R system. Currently, the DSS is running under Fedora Core 4 Linux and Oracle 10g XE on an Intel Pentium IV 2.8 GHz with 512 MB of RAM. All of the above machines are behind a firewall which provides a basic packet-filtering facility to block “illegal” traffic as specified by the security policy. Authenticated access is provided through a secure HTTPS session. In addition to the tools and systems noted above, we continue to study other software components for suitability and support of interchangeable components. It is evident from this discussion that a key component of the DSS is based on open-source technologies and, therefore, our software release plan closely follows the emerging business models of open-source software.
3.9 Conclusion In this paper we have presented an architecture for a unified data management and decision support system, UDMDSS. The architecture is modular and implemented fully using open standards, systems and software tools. The UDMDSS is intended for use in injury prevention research and by practitioners and provides integration of Bayesian statistics, agent-modelling and simulation of culture and subjective logic with uncertainty. In addition, our software supports web-based and mobile data acquisition and reporting options. A variety of toolkits may be used for creation of surveys or data forms, inclusion of semantic interfaces and domain ontologies and a multi-layered approach to meta-data and meta-modeling. We have successfully demonstrated the utility of our UDMDSS system approach in two applications, namely, in the reporting and analysis of patient falls in a hospital setting and in the acquisition of remote survey data and subsequent agent modelling of the culture of automotive safety for children. Acknowledgements. The authors acknowledge the work of the USMS team and volunteers that worked on collecting the survey data throughout Canada. RDK and AWS thank Research in Motion for providing Blackberry units for use in this research and NCE Auto21 and NSERC Discovery for funding support.
References [Ahmad et al.] Ahmad, R., Yung-Chuan, L., Rahimi, S., Gupta, B.: A multi-agent based approach for particle swarm optimization. In: International Conference on Integration of Knowledge Intensive Multi-Agent Systems (KIMAS 2007), pp. 267–271 (2007) [Bates & Gawande] Bates, D.W., Gawande, A.A.: Patient safety: Improving safety with information technology. New England Journal of Medicine 348(25), 2526–2534 (2003)
[Berry et al.] Berry, D.L., Trigg, L.G., Lober, W.B., Karras, B.T., Galligan, M.L., Austin-Seymour, M., et al.: Computerized symptom and quality-of-life assessment for patients with cancer part I: Development and pilot testing. Oncology Nursing Forum 31(5), 75–83 (2004) [Bilykh et al.] Bilykh, I., Bychkov, Y., Dahlem, D., Jahnke, J.H., McCallum, G., Obry, C., et al.: Can GRID services provide answers to the challenges of national health information sharing? In: 2003 Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, Ontario, Canada, pp. 39–53 (2003) [Blanquer et al.] Blanquer, I., Hernandez, V., Segrelles, D., Robles, M., Garcia, J.M., Robledo, J.V.: Clinical decision support systems (CDSS) in GRID environments, from grid to healthgrid. In: Healthgrid 2005, vol. 112, pp. 80–89 (2005) [Chervenak & Cai] Chervenak, A.L., Cai, M.: Applying peer-to-peer techniques to grid replica location services. Journal of Grid Computing 4(1), 1572–9814 (2006) [Coelho & Mariani] Coelho, L., Mariani, V.C.: An efficient particle swarm optimization approach based on cultural algorithm applied to mechanical design. In: IEEE Congress on Evolutionary Computation (CEC 2006), pp. 1099–1104 (2006) [Collier et al.] Collier, N., Howe, T., North, M.: Onward and upward: The transition to repast 2.0. In: First Annual North American Association for Computational Social and Organizational Science Conference, Pittsburgh, PA, USA (2003) [De Leeuw et al.] De Leeuw, E.D., Hox, J.J., Dillman, D.A.: International handbook of survey methodology. In: European Association of Methodology. Lawrence Erlbaum Associates, Mahwah (2008) [Dorigo et al.] Dorigo, M., Birattari, M., Stutzle, T.: Ant colony optimization. IEEE Computational Intelligence Magazine 1(4), 28–39 (2006) [XML] Extensible Markup Language (XML) 1.0 (Fifth Edition): http://www.w3.org/XML/ [Gärdenfors 2000] Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000) [Gärdenfors 2004] Gärdenfors, P.: Conceptual Spaces as a Framework for Knowledge Representation. Mind and Matter 2(2), 9–27 (2004) [Goguen] Goguen, J.: What Is a Concept? In: Dau, F., Mugnier, M.-L., Stumme, G. (eds.) ICCS 2005. LNCS (LNAI), vol. 3596, pp. 52–77. Springer, Heidelberg (2005) [Gorry & Morton] Gorry, G.A., Scott Morton, M.S.: A Framework for Management Information Systems. Sloan Management Review 13, 1 (1971) [Hine et al.] Hine, N., Judson, A., Ashraf, S., Arnott, J., Sixsmith, A., Brown, S., et al.: Modelling the behaviour of elderly people as a means of monitoring well being. In: Ardissono, L., Brna, P., Mitrović, A. (eds.) UM 2005. LNCS (LNAI), vol. 3538, pp. 241–250. Springer, Heidelberg (2005) [Homer & Hirsch] Homer, J., Hirsch, G.: System Dynamics Modeling for Public Health: Background and Opportunities. American Journal of Public Health 96(3), 452–458 (2006) [Huang et al.] Huang, C., Lanza, V., Rajasekaran, S., Dubitzky, W.: HealthGrid - bridging life science and information technology. Journal of Clinical Monitoring and Computing 19(4-5), 259–262 (2005) [Keen & Morton] Keen, P.G.W., Scott Morton, M.S.: Decision Support Systems: An Organizational Perspective. Addison-Wesley, Reading (1978) [Kent et al.] Kent, R.D., Snowdon, A., Preney, P., Kim, D., Ren, J., Aggarwal, A., et al.: USMS: An open-source IT solution for effective health and safety data collection and decision support. Halifax 7 (CPSI: Canadian Patient Safety Institute), Ottawa (2007)
[Kim et al.] Kim, D., Kent, R., Aggarwal, A., Preney, P.: Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids. In: 1st WSEAS International Symposium on GRID COMPUTING, Corfu Island, Greece (2005) [Kobti et al. 2003] Kobti, Z., Reynolds, R.G., Kohler, T.: A multi-agent simulation using cultural algorithms: The effect of culture on the resilience of social systems. In: IEEE Conference on Evolutionary Computation, CEC 2003, vol. 3, pp. 1988–1995 (2003) [Kobti et al. 2005] Kobti, Z., Snowdon, A.W., Kent, R.D., Dunlop, T., Rahaman, S.: A multiagent model prototype for child vehicle safety injury prevention. In: Agent 2005 Conference on: Generative Social Processes, Models, and Mechanisms, Argonne National Laboratory. University of Chicago, Chicago (2005) [Kobti et al. 2010] Kobti, Z., Snowdon, A.W., Kent, R.D., Bhandari, G., Rahaman, S.F., Preney, P.D., Zhu, L., Kolga, C.A., Tiessen, B.: Towards a “Just-in-Time” Distributed Decision Support System in Health Care Research (2010) (submitted) [Kohli et al.] Kohli, R., Tan, J.K., Piontek, F.A.D., Ziege, E., Groot, H.: Integrating cost information with health management support system: an enhanced methodology to assess health care quality drivers. Topics in Health Information Management 20(1), 80–95 (1999) [Kopec et al.] Kopec, D., Eckhardt, R., Tamang, S., Reinharth, D.: Towards a mobile intelligent information system with application to HIV/AIDS. Studies in Health Technology and Informatics 114, 30–35 (2005) [Krug &Sharma] Krug, E.G., Sharma, G.K., Lozano, R.: The global burden of injuries. American Journal of Public Health 90(4), 523–526 (2000), Retrieved from http://search.ebscohost.com/login.aspx?direct=true& db=cin20&AN=2001043079&site=ehost-live [Jøsang 2007] Jøsang, A.: Probabilistic Logic under Uncertainty. In: Gudmundsson, J., Jay, B. (eds.) Proceedings of the Thirteenth Computing: The Australasian Theory Symposium (CATS 007), Conferences in Research and Practice in Information Technology (CRPIT), Ballarat, Victoria, Australia, January 2007, vol. 65, pp. 101–110 (2007) [Jøsang 2008] Jøsang, A.: Conditional Reasoning with Subjective Logic. Journal of Multiple-Valued Logic and Soft Computing 15(1), 5–38 (2008) [Lim] Lim, B., Choi, K., Shin, D.: A JXTA-based architecture for efficient and adaptive healthcare services. In: Kim, C. (ed.) ICOIN 2005. LNCS, vol. 3391, pp. 776–785. Springer, Heidelberg (2005) [Liu] Liu, M.: Relationlog: A typed extension to datalog with sets and tuples (1995) [Lu et al.] Lu, J., Naeem, T., Stav, J.B.: A distributed information system for healthcare web services. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds.) APWeb Workshops 2006. LNCS, vol. 3842, pp. 783–790. Springer, Heidelberg (2006) [Lyell et al.] Lyell, D., Sadsad, R.,, G.: Health Systems Simulation. In: Encyclopedia of Healthcare Information Systems, vol. II (2008), Information Science Reference [Newsted & Huff] Newsted, P., Huff, S., Munro, M.: Survey instruments in IS. MISQ Discovery, OGSA-DAI 3.0 documentation (December 1998), http://www.ogsadai. org.uk/documentation/ogsadai3.0/ogsadai3.0-gt/; Omar, W.M., Taleb-Bendiab, A.: Service oriented architecture for E-health support services based on grid computing. In: IEEE International Conference on Services Computing (SCC 2006), pp. 135–142 (2006) [RDF Semantics] A work in progress (W3C Working Draft), http://www.w3.org/TR/rdf-mt/ [Reynolds] Reynolds, R. G. Cultural algorithm: A tutorial, http://ai.cs. 
wayne.edu/ai/availablePapersOnLine/CULTURAL_ALGORITHMS_ 2002-05-03/CULTURAL_ALGORITHMS_2002-05-03.ppt
[Rickard et al.] Rickard, J.T., Aisbett, J., Gibbon, G.: Reformulation of the theory of conceptual spaces. Information Sciences 177, 4539–4565 (2007) [Salem et al.] Salem, F.D., Krishnaswamy, S., Loke, S.W., Rakotonirainy, A.: Context-aware ubiquitous data mining based agent model for intersection safety. In: Enokido, T., Yan, L., Xiao, B., Kim, D.Y., Dai, Y.-S., Yang, L.T. (eds.) EUC-WS 2005. LNCS, vol. 3823, pp. 61–70. Springer, Heidelberg (2005) [Sanderson et al.] Sanderson, N., Goebel, V., Munthe-Kaas, E.: Metadata management for ad-hoc InfoWare - A rescue and emergency use case for mobile ad-hoc scenarios. In: Meersman, R., Tari, Z. (eds.) OTM 2005. LNCS, vol. 3761, pp. 1365–1380. Springer, Heidelberg (2005) [Schaffers] Schaffers, H.: Innovation and systems change: The example of mobile, collaborative workplaces. AI & Society 19(4), 334–347 (2005) [Shafer & Glenn 1976] Shafer, Glenn: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976) [Shafer & Glenn 1990] Shafer, Glenn: Perspectives on the theory and practice of belief functions. International Journal of Approximate Reasoning 3, 1–40 (1990) [Simon] Simon, H.: The New Science of Management Decision. Prentice Hall, Englewood Cliffs (1977) [Skov &Hoegh] Skov, M.B., Hoegh, R.T.: Supporting information access in a hospital ward by a context-aware mobile electronic patient record. Personal and Ubiquitous Computing, 1–10 (2006) [Smirnov et al.] Smirnov, A., Pashkin, M., Chilov, N., Levashova, T.: Ontology-based knowledge repository support for healthgrids, from grid to healthgrid. In: Healthgrid 2005, vol. 112, pp. 47–56 (2005) [Snowdon et al. 2007] Snowdon, A., Howard, A., Boase, P.: The Development of a Protocol for a National Study of Canadian Children’s Safety in Vehicles. In: Canadian Association of Road Safety Professionals, Canadian Multidisciplinary Road Safety Conference Proceedings, Montreal, Quebec (June 2007) [Snowdon et al. 2008] Snowdon, A., Hussein, A., Ahmed, E.: Children at Risk: Predictors of car seat safety misuse in Ontario. Accident Analysis and Prevention 40, 1418–1423 (2008) [Snowdon et al. 2006] Snowdon, A., Kent, R., Kobti, Z., Howard, A.: Development of a wireless, web-services based survey for road to measure vehicle safety for Canadian children. In: 8th World Conference on Injury Prevention and Safety Promotion, Durban, South Africa (2006) [Snowdon et al. 2008] Snowdon, A.W., Hussein, A., Purc-Stevenson, R., Bruce, B., Kolga, C., Boase, P., et al.: Are we there yet? Canada’s progress towards achieving road safety vision 2010 for children travelling in vehicles. International Journal of Injury Control and Safety Promotion (2008) [Snowdon et al. 2009] Snowdon, A.W., Hussein, A., Purc-Stevenson, R., Follo, G., Ahmed, E.: A longitudinal study of the effectiveness of a multi-media intervention on parents’ knowledge and use of vehicle safety systems for children. Accident Analysis & Prevention 41(3), 498–505 (2009), doi:10.1016/j.aap.2009.01.013 [R project] The R project for statistical computing (2010), http://www.r-project.org/ [Turban, & Aronson] Turban, E., Aronson, J.E.: Decision Support Systems and Intelligent Systems. Prentice Hall, Upper Saddle River (2000) [Turner et al.] Turner, C., McClure, R., Nixon, J., Spinks, A.: Community-based programs to promote car seat restraints in children 0-16 years - a systematic review. Accident Analysis & Prevention 37, 77–83 (2005)
[Ulieru et al.] Ulieru, M., Hadzic, M., Chang, E.: Soft computing agents for e-health in application to the research and control of unknown diseases. Information Sciences 176(9), 1190–1214 (2006) [Wigton] Wigton, R.S.: Use of Linear-Models to Analyze Physicians Decisions. Medical Decision Making 8(4), 241–252 (1988) [Wright] Wright, K.B.: Researching internet-based populations: Advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. Journal of Computer-Mediated Communication 10(3) (2005) [XML Schema] XML Schema (2004), http://www.w3.org/XML/Schema [XSL Transformations] XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/xslt [Zhu et al] Zhu, L., Ezeife, C.I., Kent, R.D.: Generic query toolkit: A query interface generator integrating data mining. In: Information Resources Management Association (IRMA), Vancouver (2007)
A Glove-Based Interface for 3D Medical Image Visualization Luigi Gallo
Abstract. In this paper, a low-cost and portable 3D user interface for exploring medical data is presented. By means of a data glove, equipped with five bend sensors and an accelerometer, and a Wiimote, which tracks additional InfraRed Light Emitting Diodes placed on the glove, 3D imaging data can be visualized and manipulated in a semi-immersive virtual environment. The paper also details the interaction techniques we specifically designed for a medical imaging scenario and provides implementation details of the integration of the interface into an open-source medical image viewer.
1 Introduction and Background
The enormous flood of data provided by modern imaging units, together with recent advances in the architecture of modern graphics processing units, is leading toward the direct 3D exploration of volumetric medical data. In this context, the immersive visualization provided by Virtual Reality (VR) technologies can further support radiologists by increasing their perception of both the shape and position of anatomical structures and by simplifying their understanding of the spatial relationships between them. However, for an effective application of VR technologies in medical facilities, additional constraints have to be considered. Radiologists are not VR experts and are not inclined to use new tools if they require time-consuming training and system configuration. For effective use, the cognitive load required to use the interface has to be minimized. Moreover, together with intuitive and effective interaction techniques, ergonomic and non-obstructive input devices are required, since the activity of medical image analysis is executed many times a day by different clinical practitioners.
Luigi Gallo
1 Introduction and Background The enormous flood of data provided from modern imaging units together with the recent advances in the architecture of modern graphics processing units are leading toward the direct 3D exploration of volumetric medical data. In this context, immersive visualization provided by Virtual Reality (VR) technologies can further support radiologists by increasing their perception of both the shape and position of anatomical structures and by simplifying their understanding of the spatial relationships between anatomical structures. However, for an effective application of VR technologies in medical facilities, additional constraints have to be considered. Radiologists are not VR experts and are not inclined to use new tools if they require time-consuming training and system configuration. For effective use, the cognitive load required to use the interface has to be minimized. Moreover, together with intuitive and effective interaction techniques, ergonomic and non-obstructive input devices are required, since the activity of medical image analysis is executed many times a day by different clinical practitioners. Luigi Gallo ICAR-CNR, Via P. Castellino 111, 80131, Naples, Italy e-mail:
[email protected]
Fig. 1 Glove-based inspection of volumetric medical data in a semi-immersive virtual environment.
Accordingly, Hand-Computer interaction may be a suitable paradigm. There is a long history of the adoption of data gloves in medicine, particularly in the area of surgical training [14]. Basically, data gloves are normal gloves with embedded sensors. The device provides accurate information about hand movements but, for a full 6-degrees-of-freedom direct manipulation of virtual objects, positional trackers placed on the glove are also required. Tracking systems are available using a variety of technologies ranging from mechanical devices to magnetic, acoustic or optical trackers [15], but most of them are not really suitable for use in medical facilities, mainly because they suffer from interference due to ambient noise and metal or because they limit the user’s freedom of motion [1]. Furthermore, data gloves and tracking systems are generally expensive. This is the reason why, in recent years, many research groups have been working on the design of low-cost and less cumbersome glove-based interaction systems. In [13], a low-power and low-cost data glove is presented. Both bend sensors and accelerometers are used to detect finger and hand movements. In [11], a commodity camera is used to track visual markers on the fingertips, whereas a software module computes the position of each fingertip in real-time. In [3], a single consumer-grade webcam is used to enable 3D hand tracking. Recently, the Wiimote, the controller of the Nintendo Wii™ console, has also emerged as a low-cost input device suitable for 3D medical imaging. 3D interfaces which use the Wiimote as the input device alone [6] and together with a speech recognition module [5] have been proposed and implemented. The Wiimote, wrapped in a sterile plastic hull, has also been effectively adopted in an operating theatre to allow the intraoperative modification of resection plans during liver operations [8]. In this paper, we describe a 3D user interface that uses a Wiimote-enhanced wireless data glove as the input device and provides interaction techniques specifically developed for exploring medical data in semi-immersive virtual environments. In greater detail, the interface allows you to rotate and move 3D reconstructions of anatomical parts, to dolly the camera and to control the position of a 3D cursor
over the object shapes. Different sources of data are considered: positional data provided by the Wiimote, which tracks the InfraRed (IR) Light Emitting Diodes (LEDs) placed on the glove; orientation data provided by the accelerometer integrated into the glove; and finger joint movement data provided by the finger bend sensors of the glove. Besides the inexpensiveness of the whole system, a main advantage of the proposed interface is its portability. To manipulate 3D data, all that is required is to wear the glove and place the Wiimote in front of it. The user interface described hereafter has been integrated into MITO (Medical Imaging TOolkit) [9], an open-source PACS-integrated medical image viewer. The rest of the paper is organized as follows. Section 2 introduces the input device, providing details about the data glove used and discussing the main benefits and limitations of the possible Wiimote + glove configurations. Section 3 describes the interaction techniques specifically developed for medical image analysis and the mapping between hand gestures and state transitions. Section 4 provides implementation details focused on the low-level handling of the near-real-time constraints required for interactive manipulation and on the integration of the interface into the medical image viewer. Finally, Section 5 concludes the paper.
2 Input Device
The input device used to interact with 3D medical data is shown in Fig. 2. It is a wireless data glove equipped with additional IR LEDs, which are tracked by a Wiimote placed at the top or the bottom of the display. The data glove is able to measure the finger flexions (1 sensor per finger) and the orientation (pitch and roll) of the user’s hand, while the Wiimote is able to track the IR LEDs placed on the glove. In the following, we provide some technical details about the glove and the tracker and consider the benefits and limitations of different system configurations.
2.1 Data Glove
The data glove we used in our experiments is the DG5 VHand 2.0 glove [2]. It is equipped with one bend flex sensor per finger and an accelerometer able to sense hand movements and to deduce the hand orientation (roll and pitch) along the 3 main axes. It is a wireless (it communicates via a Bluetooth link), inexpensive and low-power device that guarantees a long operating period. The finger sensor resolution is 10 bit (1024 points), the hand orientation resolution is 0.5◦ and the measured hand acceleration ranges from -2g to 2g, at a sampling rate of 25 Hz.
2.2 Wiimote
The Wii Remote, also known as the Wiimote, is a wireless, ergonomic and economical input device. It communicates via a Bluetooth wireless link and follows the Bluetooth Human Interface Device (HID) standard. The Wiimote is able to send reports to the
Fig. 2 The input device: a data glove equipped with (a) additional infrared light emitting diodes, (b) a 3-axis accelerometer, (c) a bluetooth module, (d) batteries and switch.
host with a maximum frequency of 100 reports per second (100 Hz). The controller movements can be sensed, over a range of +/- 3g with 10% sensitivity, thanks to a 3-axis linear accelerometer. Another feature, the only one used in the system described in this paper, is the optical sensor at the front of the controller, which is able to track up to four infrared hotspots with a resolution of 1024×768.
2.3 System Configuration
One or more Wiimotes can be configured in many ways in order to track the position in space of a data glove equipped with additional IR LEDs. The optical sensor of the Wiimote has a field of view slightly less than 45◦, and can track infrared sources as far as 4 m away, providing two-dimensional information. According to the interaction techniques, and more precisely to the degrees of freedom involved in the interaction tasks, at least three different system configurations can be set up. Single Wiimote - Single LED. In this configuration, a single Wiimote is used to track a single infrared source placed on the forefinger of the glove. This is the simplest configuration. However, by using this configuration, the tracking system can provide the glove position with only 2 degrees of freedom. The two-dimensional camera of the Wiimote cannot deduce the hand position in space along the z axis in Fig. 3. Nonetheless, this configuration is very robust, and it avoids occupying the CPU cycles necessary to execute triangulation algorithms. Double Wiimote - Single LED. In this configuration, two Wiimotes are used to track a single infrared source placed on the forefinger of the glove. The coverage area, in this case, is smaller and it varies according to the angle at which the devices are set. With this configuration, the system can also deduce, by means of a
Fig. 3 Coverage area of the Wiimote plus data glove interface in a Single Wiimote - Single LED configuration.
triangulation algorithm, the position of the glove along the z axis of Fig. 3, on condition that the IR LED stays in the coverage areas of both Wiimotes. The main limitations of this configuration are the increased complexity of the interface and the computational load required to execute the triangulation algorithm 100 times per second. Moreover, the requirement to place the two Wiimotes at the correct distance and with the proper angle limits the portability of the system. Single Wiimote - Double LED. In this configuration, a single Wiimote is used to track two infrared sources. This approach also allows the position of the glove in space to be deduced, but the computation of the position should additionally consider the orientation data coming from the glove. The main limitation of this configuration is the occlusion problem. The position in which to place the second LED has to be chosen in accordance with the specific interaction techniques developed. In greater detail, the interaction tasks that also require the z component of the glove location in space (e.g. zooming or dollying, which usually require a movement of the hand along the z axis) need to be mapped to gestures that keep both IR LEDs in the Wiimote's direct line of sight. It has to be emphasized that in the Wiimote-enhanced whiteboard system described in [10], which also uses the Wiimote camera to track an IR stylus, the user is constrained to stay very close to the screen, since the Wiimote camera does not track the IR beam directly but rather its reflection on the projection screen. However, we believe that the "reflection" approach is not suitable for pointing in immersive environments for two main reasons: i) when immersive visualization is provided, the object is often perceived as located in the space in front of the display screen, which makes it awkward for users to interact if they are constrained to stay close to the projection screen; ii) many projection screens used in immersive environments are not as reflective in the infrared band as standard ones, which makes it impossible for the Wiimote camera to track the IR source. When pointing is performed in mid-air, the beam angle (β in Fig. 3) of the IR LED used also has to be considered. The wider the beam angle is, the better the
Wiimote camera can track the IR LED when it is not directly oriented toward it (e.g. when pointing at the borders of a large screen display). After considering the aforementioned benefits and limitations of the various approaches, we have chosen to adopt a Single Wiimote - Single LED configuration, equipping the glove with an IR LED with a viewing angle of 60°.
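For the Double Wiimote - Single LED case, the depth computation mentioned above can be illustrated with a minimal sketch. We assume an idealized setup that is not described in the paper: two Wiimotes with parallel optical axes, separated horizontally by a known baseline, each reporting the LED's horizontal pixel coordinate on its 1024-pixel-wide sensor. Real setups with angled cameras require a full triangulation instead.

```cpp
#include <cmath>
#include <optional>

// Idealized depth-from-disparity for two parallel Wiimote cameras (illustrative assumption).
// baseline_m: horizontal distance between the two cameras, in metres.
// x_left, x_right: horizontal pixel coordinate of the IR LED in each camera (0..1023).
// Returns the distance along the z axis, or nothing if the disparity is degenerate.
std::optional<double> depthFromDisparity(double baseline_m, double x_left, double x_right) {
    constexpr double kSensorWidthPx   = 1024.0;
    constexpr double kHorizontalFovDeg = 45.0;           // approximate Wiimote field of view
    const double pi       = std::acos(-1.0);
    const double focal_px = (kSensorWidthPx / 2.0) /
                            std::tan((kHorizontalFovDeg / 2.0) * pi / 180.0);
    const double disparity = x_left - x_right;            // pixels
    if (disparity <= 0.0) return std::nullopt;            // LED not seen by both cameras
    return focal_px * baseline_m / disparity;             // z = f * B / d
}
```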
3 Interaction Techniques
Typical manipulation metaphors in virtual environments are based on three main steps: selection, positioning, and orientation [12]. In the medical imaging scenario, to manipulate 3D data users should be able to select an anatomical part, position it in 3D space and change its orientation appropriately. Hand-computer interaction techniques allow users to execute these tasks in a natural and therefore easy-to-learn way. For example, the orientation of the hand can become the orientation of the object once it has been grabbed. Nonetheless, data gloves do not have buttons on them, so grabbing an object requires associating this action with a particular hand gesture. Choosing the appropriate gestures for switching between interaction modalities is a key activity in the design of a glove-based interface. The choice has to take into consideration both the hardware characteristics of the input device and the particular domain in which the interaction tasks take place. The gestures we have chosen for the interface are depicted in Fig. 4. The interface allows the user to perform four interaction tasks:
• Rotation - to rotate the 3D object;
• Pointing - to control the 3D cursor position over the object's shape;
• Moving - to control the 3D object position in the viewport;
• Dollying - to control the distance between the object and the camera.
Fig. 4 Hand gestures - the interface switches between its states by detecting the user’s single hand gestures.
Fig. 5 A finite state machine describing the mapping between hand gestures and state transitions of the interface.
The data provided by the Wiimote and by the glove's sensors are interpreted differently according to the current interaction modality. The finite state machine reported in Fig. 5 depicts the mapping between hand gestures and state transitions of the interface.
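A minimal sketch of such a gesture-to-state mapping is given below. The interface states are those of Fig. 5; the flexion threshold, the rule that a finger counts as extended below that threshold, and the helper names are assumptions introduced only to illustrate how the finger-bend readings could drive the state machine.

```cpp
#include <array>

enum class InterfaceState { Idle, Pointing, Moving, Rotation, Dollying };

// flexion[i] in [0,1]: 0 = fully extended, 1 = fully bent.
// Assumed finger order: 0 = thumb, 1 = index, 2 = middle, 3 = ring, 4 = little.
InterfaceState classifyGesture(const std::array<float, 5>& flexion,
                               InterfaceState current) {
    constexpr float kBent = 0.6f;                         // assumed threshold
    auto extended = [&](int i) { return flexion[i] < kBent; };

    const bool thumb  = extended(0), index = extended(1);
    const bool others = extended(2) || extended(3) || extended(4);

    if (index && !thumb && !others) return InterfaceState::Pointing;   // index only
    // Moving is reachable only from Pointing, as in Fig. 5.
    if (index && thumb && !others &&
        (current == InterfaceState::Pointing || current == InterfaceState::Moving))
        return InterfaceState::Moving;                                  // thumb + index
    if (!thumb && !index && !others) return InterfaceState::Rotation;   // closed fist
    if (thumb && index && extended(2) && extended(3) && extended(4))
        return InterfaceState::Dollying;                                // open hand
    return InterfaceState::Idle;
}
```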
3.1 Pointing
In order to perform the pointing activity, a user has to point his index finger toward the object while the other fingers are bent (see Fig. 4). Given that the IR LED is placed on the forefinger of the glove, this gesture preserves a direct line of sight between the Wiimote and the glove. Pointing in 3D space requires users to control a 3D cursor. However, the Wiimote can track the index finger of the glove only in a two-dimensional space. To overcome this limitation, we have adopted the 3D depth-enhanced pointer [7], which consists of a 3D cursor whose position can be controlled through just two degrees of freedom. Following this approach, a user only has to move his index finger to indicate the pointer display coordinates on the 2D plane of the Wiimote camera. The pointer's depth is then automatically determined from the user's viewing direction so that the pointer binds to the visible surfaces of the objects. Moreover, in order to enhance the accuracy of pointing and to reduce the effects of hand tremors, the positional data provided by the Wiimote are filtered by using the Smoothed Pointing technique [4].
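The Smoothed Pointing technique itself is defined in [4]; purely to illustrate the idea of tremor filtering, the sketch below applies a plain exponential moving average to the raw 2D readings. The smoothing factor is an arbitrary assumption, and this is not the algorithm actually used by the interface, which adapts the smoothing to pointer speed.

```cpp
// Simple exponential smoothing of the 2D IR readings (illustrative only;
// the interface uses the adaptive Smoothed Pointing technique [4]).
struct SmoothedCursor2D {
    double alpha = 0.2;        // assumed smoothing factor in (0,1]; lower = smoother
    double x = 0.0, y = 0.0;   // filtered camera-plane coordinates
    bool initialized = false;

    void update(double raw_x, double raw_y) {
        if (!initialized) { x = raw_x; y = raw_y; initialized = true; return; }
        x = alpha * raw_x + (1.0 - alpha) * x;
        y = alpha * raw_y + (1.0 - alpha) * y;
    }
};
```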
3.2 Moving To move the object a user has to stick out his thumb and index finger while the other fingers are bent (see Fig. 4). Among the possible gestures, we have chosen this particular movement since it allows users to keep steady the index finger, the
only one tracked by the Wiimote camera. The Moving state is the only one that is not directly connected with the Idle state, but it is only connected with the Pointing state. The reason for such a design is that we observed that users frequently switch between pointing and moving tasks. Once an object has been selected, to move it a user has only to move his index finger to indicate the object display coordinates on the 2D plane of the Wiimote camera. Since the user’s thumb is stuck out, the user will control the object and not the pointer, which stays fixed. Bending the thumb fixes the object and switches the control to the cursor position.
3.3 Rotation
To rotate the 3D object, a user has to bend all his fingers. Then, the object can be manipulated by rotating the hand (pitch and roll). To perform large rotations, the object must be grabbed, rotated, released and then grabbed again, following a screwdriver rotation metaphor. The center of rotation is fixed by default at the center of the object. However, the user can change it by moving the object in the viewport: the center of the viewport becomes the center of rotation. By alternating rotation and moving tasks, every point in space can be selected as the center of rotation.
3.4 Dollying
To dolly, which means to move the camera toward or away from the object, a user has to stick out all his fingers. Then, by moving his or her hand toward or away from the display, the user can modify the distance between the camera and the object. Again, to perform large camera movements, the user has to enter the Dolly state, move his hand, then enter the Idle state, move his hand back, and repeat the operation.
4 Implementation Details The user interface hereafter described has been integrated in MITO (Medical Imaging TOolkit) [9], an open-source, PACS-integrated medical image viewer that is currently under evaluation by the clinicians of the Second Polyclinic of Naples. MITO was developed from scratch by using only open-source libraries and is fully compliant with the DICOM (Digital Imaging and COmmunications in Medicine) standard for image communication and file formats. In greater detail, MITO is dedicated to DICOM images produced by medical equipment (computed tomography, positron emission tomography, magnetic resonance imaging, etc.), which can be transferred from any PACS (Picture Archive and Communication System). With the exception of the Wiimote driver, presently available only for MS Windows OSs, MITO is a platform-independent application, ready for use in every medical facility.
Interaction in MITO follows the event-state-action paradigm, so that the control flow is completely determined by the user's inputs. To integrate a new interaction modality, a designer has to develop an event handler, which is the procedure in charge of associating events with the corresponding actions, and a driver for the input device, which is in charge of generating properly formatted events. The state of the interface defines the set of possible events that can be received. For the 3D medical data manipulation tasks we considered, as depicted in Fig. 5, the states are: Idle, Pointing, Rotation, Dollying, and Moving. Since the interface must be near-real-time, a key aspect of the whole process is event generation. The input device has to provide data with a short, bounded lag and at a constant rate. When the VHand data glove and a Wiimote are used, there is a problem in retrieving data from different channels. The data glove uses a synchronous serial communication: the glove continuously (25 samples per second) transmits a 20 byte packet to the host device, and the driver must wait on the serial link for the packet to be delivered. The Wiimote, instead, follows the Bluetooth Human Interface Device (HID) standard, and the driver actively polls the HID for data at a rate of about 100 polls per second. To merge the data glove and the Wiimote data so as to generate 100 events per second (compared to the 25 of the glove), the driver has been divided into two different threads. The Wiimote thread is in charge of polling the controller to read the infrared LED position data (100 samples per second) and of creating the event. The data glove thread, instead, is in charge of reading the accelerometer and bend sensor data and storing them in a region of memory shared with the Wiimote thread and protected with a semaphore. In this way, the driver generates 100 events per second containing both glove and Wiimote data, only 25 of which will contain new glove sensor data.
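The following self-contained sketch illustrates the two-thread arrangement just described: a glove thread updates a shared sample at about 25 Hz while a Wiimote thread polls at roughly 100 Hz and emits merged events. All names, the use of a mutex in place of the semaphore, and the stand-in acquisition and dispatch functions are our own assumptions for illustration; the real driver talks to the serial port and the Bluetooth HID stack.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

struct GloveData   { float flexion[5]; float roll, pitch; };   // from the 25 Hz serial packet
struct WiimoteData { float ir_x, ir_y; };                      // from the ~100 Hz HID report
struct InputEvent  { WiimoteData wiimote; GloveData glove; bool fresh_glove; };

// Stubs standing in for the real serial/HID/event code (assumed names, not MITO's API).
GloveData readGlovePacket() {                                  // blocks ~40 ms like the serial link
    std::this_thread::sleep_for(std::chrono::milliseconds(40));
    return GloveData{};
}
WiimoteData pollWiimote() { return WiimoteData{}; }
void dispatchEvent(const InputEvent&) { /* hand the event to the viewer's event handler */ }

std::mutex        glove_mutex;                                 // plays the role of the semaphore
GloveData         shared_glove{};
std::atomic<bool> glove_fresh{false};
std::atomic<bool> running{true};

void gloveThread() {                                           // ~25 samples per second
    while (running) {
        GloveData sample = readGlovePacket();
        std::lock_guard<std::mutex> lock(glove_mutex);
        shared_glove = sample;
        glove_fresh  = true;
    }
}

void wiimoteThread() {                                         // ~100 events per second
    while (running) {
        InputEvent ev{};
        ev.wiimote = pollWiimote();
        {
            std::lock_guard<std::mutex> lock(glove_mutex);
            ev.glove       = shared_glove;                     // latest glove sample
            ev.fresh_glove = glove_fresh.exchange(false);      // new glove data ~25 times per second
        }
        dispatchEvent(ev);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::thread g(gloveThread), w(wiimoteThread);
    std::this_thread::sleep_for(std::chrono::seconds(1));      // run briefly for demonstration
    running = false;
    g.join(); w.join();
}
```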
5 Conclusions In this paper, an interface suitable for exploring 3D medical data in virtual environments at a distance has been presented. The interface allows you to manipulate 3D reconstructions of anatomical parts and to control the position of a 3D cursor which can move over the object shapes. The interface is portable, not cumbersome and low cost, because it does not require a tracking system but only a data glove plus a Wiimote. To foster its use in medical facilities, the interface has been integrated in an open-source, PACS-integrated medical image viewer. Future work will focus on evaluating the hybrid data glove together with the interaction techniques by comparing them with other low cost interfaces in a real medical scenario.
References
1. Burdea, G.C., Coiffet, P.: Virtual Reality Technology. John Wiley & Sons, Inc., New York (2003)
2. DGTech Engineering Solutions: DG5 VHand 2.0 OEM Technical Datasheet (2007)
3. Fredriksson, J., Ryen, S.B., Fjeld, M.: Real-time 3D hand-computer interaction: optimization and complexity reduction. In: NordiCHI 2008: Proc. of the 5th Nordic conference on Human-computer interaction, pp. 133–141. ACM, New York (2008)
4. Gallo, L., Ciampi, M., Minutolo, A.: Smoothed pointing: a user-friendly technique for precision enhanced remote pointing. In: Proc. of the International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2010, pp. 712–717. IEEE Computer Society, Los Alamitos (2010)
5. Gallo, L., De Pietro, G., Coronato, A., Marra, I.: Toward a natural interface to virtual medical imaging environments. In: Proc. of the working conference on Advanced Visual Interfaces AVI 2008, pp. 429–432. ACM, New York (2008)
6. Gallo, L., De Pietro, G., Marra, I.: 3D interaction with volumetric medical data: experiencing the Wiimote. In: Proc. of the 1st international conference on Ambient media and systems, Ambi-Sys 2008, pp. 1–6. ICST, Brussels (2008)
7. Gallo, L., Minutolo, A., De Pietro, G.: A user interface for VR-ready 3D medical imaging by off-the-shelf input devices. Computers in Biology and Medicine 40(3), 350–358 (2010)
8. Hansen, C., Köhn, A., Schlichting, S., Weiler, F., Zidowitz, S., Kleemann, M., Peitgen, H.O.: Intraoperative modification of resection plans for liver surgery. International Journal of Computer Assisted Radiology and Surgery 3(3-4), 291–297 (2008)
9. ICAR-CNR, IBB-CNR: Medical Imaging TOolkit, MITO (2009), http://amico.icar.cnr.it/mito.php
10. Lee, J.C.: Hacking the Nintendo Wii Remote. IEEE Pervasive Computing 7(3), 39–45 (2008)
11. Pamplona, V.F., Fernandes, L.A.F., Prauchner, J.L.: The image-based data glove. In: Proc. of the 10th Symposium on Virtual and Augmented Reality, SVR 2008 (2008)
12. Poupyrev, I., Weghorst, S., Billinghurst, M., Ichikawa, T.: A framework and testbed for studying manipulation techniques for immersive VR. In: Proc. of the ACM symposium on Virtual reality software and technology VRST 1997, pp. 21–28. ACM, New York (1997)
13. Sama, M., Pacella, V., Farella, E., Benini, L., Riccò, B.: 3dID: a low-power, low-cost hand motion capture device. In: Proc. of the conference on Design, automation and test in Europe DATE 2006, pp. 136–141. EDAA, Leuven (2006)
14. Satava, R.M.: Medical applications of Virtual Reality. J. of Med. Sys. 19(3), 275–280 (1995)
15. Sturman, D.J., Zeltzer, D.: A survey of glove-based input. IEEE Comput. Graph. Appl. 14(1), 30–39 (1994)
Open Issues in IDS Design for Wireless Biomedical Sensor Networks Luigi Coppolino and Luigi Romano
Abstract. Wearable sensor devices play a key role in the wide adoption of pervasive and continuous healthcare monitoring systems. A first generation of wearable devices was represented by e-textiles. A new generation of devices is instead represented by the so-called Biomedical Wireless Sensor Networks (BWSN), that is, ad-hoc networks of wireless devices deployed on (or in proximity of) the patients, e.g. in the form of sticking plasters, sensing various biomedical parameters and routing such values toward a data warehousing system. While BWSNs provide a non-intrusive and flexible solution for continuous patient monitoring, they also introduce new challenges in preserving the privacy and security of the collected sensitive data. In this paper we present an overview of BWSN-related security issues and discuss the main approaches used to cope with them. In particular, we analyze how Intrusion Detection Systems designed to protect generic Wireless Sensor Networks must be customized to be useful in the healthcare field. We also propose some modifications to the applied IDS to better fit BWSN-specific requirements, and we identify limits of the proposed approach, paving the way for future work.
Luigi Coppolino and Luigi Romano
Dipartimento per le Tecnologie (DiT), Università degli Studi di Napoli "Parthenope", Centro Direzionale di Napoli, Isola C4, 80143 Napoli, Italy
e-mail: {luigi.coppolino,luigi.romano}@uniparthenope.it
1 Introduction
Telemedicine and Telecare represent an invaluable possibility for both cost reduction and finer-grained diagnostic and therapeutic actions in the healthcare field [1, 2]. Applications of such new ways of practicing medicine range from monitoring elderly patients in their own homes to monitoring hospitalized patients in smart wards [3], and new exciting application scenarios can be imagined due to the advent of special programmable devices aimed at dispensing drugs. Such devices can close the loop by integrating sensing and therapeutic systems; as an example, in
pharmacotherapy this would allow combining drugs and changing doses dynamically with environmental and patient conditions [4]. One of the main factors for the acceptance of telemedicine is the ability to carry out measurements in a "natural" environment and without interfering with the patient's daily activities. A class of sensors that fits this purpose is that of wearable sensors, non-invasive devices for the monitoring of biological parameters. Wearable sensors are supported either directly on the human body or on a piece of clothing, and are designed to be worn in comfort, enabling prolonged use. A first technological advance in wearable sensors was the adoption of e-textiles, that is, sensors, electrodes, and connections integrated into fabrics. Another recent innovation is represented by the so-called Biomedical Wireless Sensor Networks (BWSN). BWSN are the result of applying Wireless Sensor Networks [5] to healthcare: they consist of a cloud of wireless networked low-power biosensor devices that wirelessly monitor patients' physiological signals (EEG, ECG, GSR, blood pressure, blood flow, pulse-oximetry, glucose level, etc.). The introduction of BWSN in telemedicine systems poses a number of challenges for preserving the privacy and security of both the sensed data and the patient's health. A first objective of this paper is to present a new IDS solution for WSN. We present a "hybrid" IDS, developed in the context of the INSPIRE European project, aimed at protecting generic WSN, which has been used with success in the context of Critical Infrastructure Protection. The second objective of the paper is to analyze how Intrusion Detection System (IDS) solutions designed for generic WSNs, and in particular the one we developed, must be customized in order to be usable in BWSN scenarios. We report some of the main issues we encountered while applying an IDS for a WSN to healthcare monitoring in a real environment and highlight the need for more investigation on how to adapt IDS solutions to BWSN applications. The rest of the paper is organized as follows: the next section presents background material necessary to better understand BWSN and some interesting work related to privacy and security in BWSN; section 3 presents the IDS solution we proposed to protect a WSN; section 4 evaluates the results of applying our IDS solution to a BWSN deployed in the field; finally, section 5 summarizes our conclusions and paves the way for our future work.
2 Background and Related Work
2.1 Biomedical Wireless Sensor Networks
BWSN are the result of the convergence among biosensors, wireless communication and network technologies. One of the most promising applications of BWSN is in smart wards. At present, patients in hospitals are monitored with different levels of intensity, ranging from intermittent (at intervals of a few hours), to intensive (every hour), and finally to continuous monitoring in the intensive care unit. The adoption of BWSN would provide a cheap and flexible way to continuously monitor patients. The wireless nature of the sensors serves a twofold purpose: on one side it allows a simpler deployment of the sensors (typically applied in the form of
Fig. 1 Typical deployment of BSNs in the context of a smart ward.
sticking plasters); on the other side, it is the only way to provide certain features, such as the monitoring of gastrointestinal diseases, which can be obtained with special swallowed pills embedding wireless transceivers containing sensors that can detect enzymes, nucleic acids, intestinal acidity, pressure, contractions of intestinal muscle, and so on. The deployment of a BWSN can follow two different approaches: i) sensors are deployed on the patient and collect her vital signs; in this case we refer to the network of sensors as a Body Sensor Network (BSN) [7]; ii) the BSN deployed to collect biometric data cooperates with other wireless sensors deployed to collect environmental data necessary to obtain information related to the context where monitoring happens (e.g. environmental temperature, light and so on). Figure 1 shows a typical deployment of a BSN in the context of a smart ward. As the picture shows, a cluster of nodes is deployed on a single person. The cluster is composed of specific-purpose sensor nodes and a more general-purpose node acting as gateway for the cluster. The ward contains many such clusters, interconnected in an ad-hoc network delivering data to the back-end system through a base station.
2.2 Privacy and Security Issues in Biomedical Wireless Sensor Networks
Many papers have been written regarding privacy and security issues in BWSN [8, 9, 10]. Security and privacy in BWSN can mainly be considered at three different levels of the system: i) the network; ii) the back-end (server and data repository); and iii) the nodes. As for the network, the wireless nature of communications makes
it particularly easy to eavesdrop on the communication and/or inject packets into it. Typically such issues are addressed by adopting cryptographic techniques. Unfortunately, because wireless BSN nodes have severely constrained resources, cryptography can be too expensive in terms of system overhead; hence security is often traded off against system performance and resource usage. This has led to more attention being paid to robust and efficient key management schemes, which serve as the fundamental requirement for encryption and authentication [10]. Regarding the back-end, because it is a traditional information system, common information security solutions can be applied. Finally, attacks can target the nodes of the BWSN. In [10] such attacks are categorized in two main classes: i) outsider attacks, and ii) insider attacks. Outsider attacks are carried out by nodes not authenticated as part of the network. The main activity such attackers can carry out is eavesdropping, possibly to crack cryptographic keys. Much more dangerous from a security point of view are insider attacks, carried out by attackers recognized as authorized nodes of the network. Such attacks can follow the discovery of a cryptographic key or the capture of a legitimate node. The attacker in this case is able to conduct many illegitimate actions: 1) unauthorized access to sensitive data, with a consequent loss of data confidentiality; 2) injection of false data packets; 3) selective packet forwarding, which implies legitimate data being dropped; and 4) alteration of legitimate data, possibly leading to incorrect diagnosis and treatment. To cope with insider attacks, the only feasible solution is to adopt an IDS to recognize the ongoing attack and possibly stop it. Many IDS solutions have been proposed for generic WSN; they are discussed in the next section.
3 A Hybrid Intrusion Detection System for WSNs
In this section we present a new IDS solution for WSNs. The presented IDS was designed in the context of the INSPIRE project, where it was successfully adopted to protect WSNs deployed to monitor Critical Infrastructures. Before discussing the architecture and functions of the IDS, we need to present the protocol that the protected WSN is intended to work with and the attack model we refer to in the following experimental campaign. Multihop protocols. There is a great variety of deployed routing protocols for WSNs; in the rest of the paper we consider CTP as the reference protocol, as it is a very popular choice and as it is the one executed by the sensors we used in the experimental campaign. CTP is an enhancement of the Multihop protocol that uses a shortest-path-first algorithm. It gives priority to the routes presenting the lowest cost to the base station. This cost can be based on either the hop count to the base station or a link quality estimation (in terms of both receive quality and send quality). The best routes are selected to choose the parent node, which is simply the next node toward the base station. Multihop nodes periodically send route update messages with routing information for their neighbours. These route messages contain the expected transmission cost to the
base station and the link quality for every neighbour node. These messages can be exploited by an attacker who is able to forge packets, thus modifying the expected behaviour of the network. Attack model. By sending false information an attacker can subvert the routing relationships and the paths that data would take. Popular examples of such attacks include the sinkhole attack, sleep deprivation, the replay attack, and the wormhole attack [8]. Many of these attacks aim at disrupting the communication in the WSN by consuming system resources (CPU, memory, battery). As an example, sleep deprivation has the intent of forcing transmissions into the network so as to reduce the lifetime of the nodes. This kind of attack has a minimal impact in the context of healthcare since: i) a disruption of the service would immediately result in some alarm being triggered (since expected data would not reach the back-end system) and the attacker being discovered; ii) BWSN are typically deployed in environments such as smart homes or smart wards where, as an example, the dynamics of the attacks typically unfold over periods much longer than the time to battery recharge. In the rest of the paper we focus instead on the sinkhole attack, as it is a launch pad for many other attacks and as it apparently preserves the functionality of the network, and hence is particularly hard to detect. In the sinkhole attack the attacker node sends route packets with a low hop count value or other attractive metric values to its adjacent forwarding elements. In this way, the malicious node looks very attractive to the surrounding sensors, which will use the attacker node as an intermediary. As a result, many sensor nodes will route their traffic through the compromised node. The attacker will thus be able to alter the content of the data flow, throw it away, or launch additional attacks (e.g. the selective forwarding attack, the blackhole attack, and more). A Hybrid Intrusion Detection System for WSNs. When designing IDS agents that are to be run on WSN nodes of low capability, the following constraints must be taken into account: low memory footprint, low CPU usage, and low network footprint. Current IDS solutions for WSNs can be categorized in two main classes: centralized [12, 13] and distributed [14, 15] solutions. In centralized solutions, sensor nodes feed an IDS agent running on a host connected to the WSN with the control data needed for the subsequent detection actions. The agent analyzes the collected data to gain a global view of the network and possibly detect ongoing attacks. Since routing attacks can prevent control packets from reaching the IDS agent, centralized approaches are typically vulnerable to such attacks. In decentralized solutions, sensor nodes run the logic for detecting attacks. Typically this implies the execution of agreement protocols to allow each node to share its view of the network with a set of neighbors. While distributed solutions are more tolerant, they consume additional resources, especially in terms of number of transmissions, and might result in an inconsistent view of the network (as decisions may be adopted based on a local view of the network). Another typical classification of IDS solutions is based on the kind of analysis adopted to recognize an attack.
Fig. 2 IDS High Level Architecture.
Two classes of IDSs are available: anomaly based and misuse based. In misuse based IDSs, the detection agent holds knowledge, in the form of attack signatures, of the kinds of malicious activities the attacker may conduct. Then, at runtime, the detection agent checks the current network traffic against its list of signatures. In anomaly based solutions, the detection agent is aware of the "normal" behaviour of the system, typically represented through a mathematical model, and checks the actual behaviour of the system against such a model: any deviation from the expected behaviour is considered a potential attack. Misuse based solutions are typically more accurate, but they are unable to detect the so-called zero-day attacks. On the other hand, while anomaly based solutions present a higher detection rate, they also result in a higher number of "false positives" (normal traffic recognized as an attack). We propose a hybrid detection solution where every node runs a simple detection agent which is in charge of recognizing suspicious nodes. Suspicious nodes are temporarily inserted in a black list and an alarm is sent to the central agent. The final decision is delegated to the central agent. Fig. 2 shows details of the IDS architecture supporting our hybrid approach. In particular, the IDS Local Agent (LA), running on a sensor node, is made of: i) the Local Packet Monitor, which is in charge of analyzing the traffic flowing through the node; ii) the Control Data Collector, which gathers measures to be sent to the IDS Central Agent (CA); iii) the Local Detection Engine, which is in charge of: a) detecting suspicious activities and raising alerts; b) receiving responses from the CA; and c) performing recovery actions. The Local Detection Engine works by identifying anomalous events with respect to the measures taken by the Control Data Collector. The occurrence of specific combinations of such events is taken as a "weak" detection of an ongoing attack and triggers temporary reaction actions. The use of an anomaly based approach for local detection might result in a high number of false positives. The use of a temporary local decision mitigates this side effect and, especially, avoids triggering reactions for intermittent anomalies. The temporary decisions taken by the LA can be made persistent by the CA, which is in charge of recognizing attacks by exploiting control data and alarms sent by LAs. In particular, it recognizes attacks based on patterns of features characteristic of
specific attacks (misuse based detection). When the CA takes its final decision, the base station propagates the decision to the LAs for its enforcement. Specifically, when an alert is triggered by the Local Detection Engine, the LA performs the following actions:
1. changes the parent; as an example, it can switch to the last parent;
2. adds the suspicious node to the black list;
3. sends an alert message to the CA containing the suspicious node ID;
4. waits for a response from the CA; if such a response is missing, the LA forces another run starting from point 1 until either a decision from the CA is received or all the neighbours have been tried as parent. In the latter case the decision of the LA is made persistent.
The black list can be implemented by adding a flag to the existing neighbours table stored on each node.
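A compact sketch of this local reaction logic is given below, under our own assumptions about data structures and message primitives; the real agent runs on resource-constrained motes and interacts with the routing layer directly, so this is only an illustration of the control flow.

```cpp
#include <cstdint>
#include <vector>

// Entry of the neighbours table; the black list is just an extra flag, as suggested above.
struct Neighbour { uint16_t id; uint8_t cost; bool blacklisted; };

// Stubs standing in for the radio/routing primitives of a real mote (assumed names).
static void setParent(uint16_t) {}
static void sendAlertToCA(uint16_t) {}
static bool waitForCAResponse(bool& confirmed) { confirmed = true; return true; } // stub: CA confirms

// Local reaction when the Local Detection Engine flags the current parent as suspicious.
void onSuspiciousParent(std::vector<Neighbour>& table, uint16_t suspiciousId) {
    for (auto& n : table)
        if (n.id == suspiciousId) n.blacklisted = true;        // step 2: temporary black list
    sendAlertToCA(suspiciousId);                               // step 3: alert the central agent

    for (auto& candidate : table) {                            // steps 1 and 4
        if (candidate.blacklisted) continue;
        setParent(candidate.id);                               // step 1: switch to another parent
        bool confirmed = false;
        if (waitForCAResponse(confirmed)) {                    // step 4: wait for the CA decision
            if (!confirmed)                                     // false alarm: undo the black list
                for (auto& n : table)
                    if (n.id == suspiciousId) n.blacklisted = false;
            return;
        }
        // No response from the CA: try the next non-blacklisted neighbour as parent.
    }
    // All neighbours tried without a CA decision: the LA decision is made persistent.
}
```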
4 Testbed Description and Experimental Results
The main goal of this paper was to evaluate how to adapt an IDS solution for WSN to BWSN. To this end we conducted an experimental campaign, deploying in a rehabilitation center an implementation of the hybrid IDS solution described in the previous section. The proposed IDS was developed in the context of the INSPIRE project and was aimed at protecting critical infrastructures (such as the power distribution network). Experiments were conducted in a testbed (Fig. 3) comprising 5 rooms and 18 patients. The wireless sensor network deployed in the ward was composed of an environmental WSN and many BSNs, and was connected to the centralized storage and analysis system through a USB programming board (WSN base station). Each patient was equipped with three ECG sensors and a collector sensor that was in charge of transmitting (802.15.4-compliant radio transmitter, multihop routing protocol) the collected ECG data to the WSN base station. The collector sensor was also equipped with an accelerometer and was hence able to act as a motion and position sensor. The environmental sensor network was a CrossBow IRIS mote network: each sensor node was equipped with an 802.15.4 radio transmitter and some environmental sensors (light, temperature, humidity, noise). Experimental campaign: run 1. The campaign lasted seventy-two hours. Periodically, every half hour, a sinkhole attack was randomly launched from a node of one of the BSNs. The results of the experimental campaign are summarized in Table 1. In detail, when the nodes in the WSBN were almost static (e.g. during nighttime) the results were very positive: we had a detection rate of 99% and a false positive (FP) rate of about 1%. During the daytime the results dropped to 80% and 2.8%, respectively. Subsequent analysis confirmed that the worse results were mainly due to the increased mobility of the nodes in the WSBN. In particular, this had a dramatic impact during the visiting hours (the ones characterized by the
Fig. 3 Experimental campaign testbed.
Table 1 Average Detection and False Positive (w.r.t. total amount of alarms) rates experienced during the experimental campaign.
Testing Period   Detection   False Positive
Nightly          99%         1%
Daily            80%         2.8%
highest mobility of the nodes), when the detection rate fell to only 74% and the FP rate grew to 3.9%. The number of FPs remained low due to the hybrid nature of the IDS, as most of the alarms triggered by local agents were not confirmed by the central agent. IDS adjustment. From the analysis of the results it also emerged that the weaknesses of the proposed solution can be blamed on the IDS central agent, because some of the patterns of features characteristic of specific attacks are heavily based on the WSBN topology; for example, some patterns are related to the number of neighbors. We attempted a preliminary enhancement of the proposed IDS to better address the characteristics of WSBN. In particular, to mitigate the impact of mobility on intrusion detection, we modified the IDS so as to consider, for the detection process, only the data collected from nodes of the WSBN that remained static in the topology during the detection period. Experimental campaign: run 2. After this change we extended the experimental campaign by a further 24 hours: when the WSBN nodes were characterized by high mobility, we obtained a detection rate
of 89% and an FP rate of about 1%. The detection rate remained slightly lower due to less data being collected; as an example, data from a highly mobile attacker/attacked node are completely ignored by the IDS. On the other hand, the false positive rate was considerably reduced.
5 Conclusions and Future Work
WSBN are a promising technology for healthcare applications. One of the limiting factors for their adoption is the difficulty of preserving the privacy and security of the managed data. While in certain cases it is possible to apply approaches and technologies already widely adopted for computer systems, such as cryptography, in other cases this is not practicable or does not provide satisfactory results; this is the case for intrusion detection systems. In this paper we have first presented an IDS solution for WSN; then we have examined how intrusion detection techniques specifically developed for WSN work when facing WSBN. We observed poor results due to the high mobility that characterizes the monitored system during certain hours of the day. After some modifications to the data analysis process, the IDS showed better performance; nevertheless, it still was not suitable for use in a critical environment such as the one described in this paper. A possibility for increasing the IDS performance is to allow the IDS Central Agent to correlate the gathered data with the observed modifications of the network topology. This would require more complex computations, but it is still feasible in the proposed hybrid IDS, as it would be done by the IDS Central Agent, which does not present resource constraints. We plan to work on this possibility and to conduct further experiments in an even larger testbed.
References
1. Alexander, Q., Xiao, Y., Hu, F.: Telemedicine for pervasive healthcare. In: Mobile Telemedicine: A Computing and Networking Perspective. Auerbach Publications, Taylor & Francis, New York (2008)
2. Choi, Y.B., Krause, J.S., Seo, H., Capitan, K.E., Chung, K.: Telemedicine in the USA: standardization through information management and technical applications. IEEE Communications Magazine 44, 41–48 (2006)
3. Ogawa, M., Togawa, T.: The concept of the home health monitoring. In: Proceedings of the 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry, Healthcom 2003, June 6-7, pp. 71–73 (2003)
4. Varshney, U.: Pervasive Healthcare and Wireless Health Monitoring. In: Mobile Networks and Applications, pp. 113–127 (2007)
5. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38(4), 393–422 (2002)
6. ZigBee Alliance, ZigBee Wireless Sensor Applications for Health, Wellness and Fitness (March 2009)
7. Espina, J., Falck, T., Mülhens, O.: Network Topologies, Communication Protocols, and Standards. In: Body Sensor Networks, ch. 5, Springer, Heidelberg (2006)
8. Misic, J., Amini, F., Khan, M.: On Security Attacks in Healthcare WSNs implemented on 802.15.4 Beacon Enabled Clusters. In: First International Workshop on Pervasive Digital Healthcare (PerCare) in conjunction with IEEE Percom (2008)
9. Leon, Hipolito, Gracia: Security and Privacy Survey for WSN in e-Health Applications. In: 2009 Electronics, Robotics and Automotive Mechanics Conference (2009)
10. Dimitriou, T., Ioannis, K.: Security Issues in Biomedical Wireless Sensor Networks. In: 1st International Symposium on Applied Sciences in Biomedical and Communication Technologies (2008)
11. Lo, B., Yang, G.Z.: Key technical challenges and current implementations of body sensor networks. In: Proc. 2nd International Workshop on Body Sensor Networks, BSN 2005 (April 2005)
12. Yu, B., Xiao, B.: Detecting selective forwarding attacks in wireless sensor networks. In: 20th International Parallel and Distributed Processing Symposium, IPDPS 2006, April 25-29, p. 8 (2006)
13. Ngai, E.C., Liu, J., Lyu, M.R.: An efficient intruder detection algorithm against sinkhole attacks in wireless sensor networks. Comput. Commun. 30, 2353–2364 (2007)
14. Krontiris, I., Giannetsos, T., Dimitriou, T.: LIDeA: a distributed lightweight intrusion detection architecture for sensor networks. In: Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, SecureComm 2008, Istanbul, Turkey, September 22-25 (2008)
15. da Silva, A.P., Martins, M.H., Rocha, B.P., Loureiro, A.A., Ruiz, L.B., Wong, H.C.: Decentralized intrusion detection in wireless sensor networks. In: Proceedings of the 1st ACM International Workshop on Quality of Service and Security in Wireless and Mobile Networks, Q2SWinet 2005, Montreal, Quebec, Canada, October 13 (2005)
Context-Aware Notifications: A Healthcare System for a Nursing Home
Sandra Nava-Muñoz, Alberto L. Morán, and Victoria Meza-Kubo
Abstract. Taking care of an elder with cognitive impairment is not an easy task. The caregiver needs to be aware of what surrounds the elder at all times to be able to prevent, avoid or manage any event that may affect the elder’s quality of life. Based on an observational study to improve our understanding regarding the care practices of a nursing home, we proposed a system for context-aware notifications, aiming at increasing caregivers’ awareness about elders’ situations of care. The main feature of the system architecture is that it considers both the Situation of Care of Elders and the Situation of Caregivers to setup notifications. The results of an initial evaluation in a nursing home are promising. During the evaluation period, caregivers perceived an increase in their awareness of the situations of care and a safer working environment regarding the provided quality of care.
Sandra Nava-Muñoz
Facultad de Ingeniería, UABC, Ensenada, México, and Facultad de Ingeniería, UASLP, San Luis Potosí, México
Alberto L. Morán
Facultad de Ciencias, UABC, Ensenada, México
Victoria Meza-Kubo
Facultad de Ingeniería, UASLP, San Luis Potosí, México
e-mail: {snava,alberto.moran,mmeza}@uabc.edu.mx
1 Introduction
According to the latest estimates of the Alzheimer's Disease International Society, there are currently 30 million people with dementia worldwide, and that number will increase to over 100 million by 2050 [1]. This is significant not only due to the number of people affected by the disease, but also due to the number of people required to care for these patients. A caregiver is someone who takes care of the basic needs of patients and monitors their daily living [2]. As the deterioration and dependency of the elderly increases, the complexity of the tasks for their care also
increases [3], and generates more stress on the caregiver [4]. Caregivers must be alert to everything around the elder, in order to prevent an accident or to act quickly as a consequence of any situation of care (SC); otherwise the elder's quality of life may be affected. However, delegating this responsibility to the caregiver without additional help may be critical and exhausting for her, due to her attending multiple patients and performing multiple chores (e.g. provision of care, housekeeping, personal care, etc.) at the place of care (e.g. nursing home, hospital, or private home). Therefore, this paper proposes the use of computer technology to enhance the caregiver's awareness level, by means of notifications regarding elders' SC. The remaining sections of the paper are organized as follows. Sections 2 and 3 briefly describe related work and the methodology followed in this work, respectively. Section 4 describes a case study which enabled us to develop an initial understanding regarding the main aspects of the elder care process in the nursing home. Section 5 presents the design of the proposed notification architecture. Section 6 describes and discusses the process followed to evaluate this architecture, as well as the obtained results. Finally, section 7 concludes the paper and establishes some lines of future work.
2 Technologies to Support the Care of Elders
The literature reports several technological developments for elder healthcare. These systems inform caregivers about the elder's health status, activity or risk situations. These technologies focus on a variety of settings, including elders in hospitals, nursing homes or private homes, and under different types of caregivers, including professional, formal or informal caregivers (according to the classification of Verginadis [5]). For instance, Jaichandar et al. [6] propose the use of alarms in hospitals, which are sent to a caregiver's mobile device (beeper), indicating that an elder has been in the same position in bed for a very long time, so that caregivers can quickly change his position. Corchado and colleagues [7] developed a geriatric intelligent environment that integrates multi-agent systems, mobile devices, RFID and Wi-Fi technologies to facilitate the management and control of nursing homes for the elderly. Finally, Chung-Chih et al. [8] propose the use of lights, music and mobile phones to indicate a warning about possible elder risks, allowing caregivers to prevent them in a timely manner. As in these examples, notification technologies share two broad features: i) mechanisms to capture contextual information to determine the elder's care needs, and ii) mechanisms for notification delivery to alert the caregiver of the occurred event. In this paper, we argue for the need for specialized mechanisms which, in addition to the two previous features, allow identifying the situation surrounding the caregiver during the elder's SC, and consider it to decide when to deliver and how to present a notification.
3 Methodology
With the aim of proposing a design of context-aware notification technologies to support elderly care, we conducted several studies in a nursing home. Some of
the questions that have guided this research include: what?, why?, when?, how? and whom to notify?, as proposed in [9]. We have divided the work into three stages: Step 1. Initial understanding. In this stage, we conducted a case study that aims at: i) understanding the care practices followed, ii) characterizing the elders' Situations of Care (SC), and iii) characterizing the Situation of the Caregiver, that is, which activities are performed by the caregiver during the notification of a SC and which allow or limit her to provide care in an appropriate and timely manner. Step 2. Design. Based on the findings of the first stage, we envisioned a scenario of use, proposed some implications for the design of context-aware systems, and, based on these, informed the design of an initial notification architecture. Step 3. Preliminary evaluation. Finally, we conducted an evaluation study in order to i) determine the caregivers' perception of the usefulness of a particular implementation of this notification architecture and ii) analyze the tool's effect regarding work and information overload on the caregiver.
4 Initial Understanding
In order to obtain an initial understanding of the nursing home care practices, we performed a case study to identify and characterize the elders' SC (who participates, involved activities, location, etc.), as well as to identify and characterize the Situation of the Caregiver during an elder's SC (activities performed, location, limitations to cope with the SC, etc.). The study consisted of structured interviews with 8 caregivers and direct observation (non-participatory shadowing technique) of 4 of them while they performed their care activities. The observation lasted 40 hours and 29 minutes.
4.1 Elder Care in a Nursing Home In a nursing home, formal caregivers are the ones responsible for the direct care of the elderly. In these environments, the number of elders receiving attention tends to be greater than that of the caregivers providing care, which increases the complexity of caretaking activities. Although caregivers seek to satisfy all the basic needs of the elderly at any time, there are moments when an elder is not under the supervision of a caregiver. Thus, the possibility of occurrence of a SC exists. Informally defined, a Situation of Care is not only a situation of risk, but also any situation that might affect the elder’s quality of life (e.g. abnormal vital signs).
4.2 Context of an Elder’s Situation of Care Based on the observation and interviews, we identified 18 SC in the nursing home. Examples include Invasion of personal physical space, Invasion of personal food, Discussions and aggressive behavior between patients, and Aggressive behavior towards caregivers. The description of a SC, referred to as the Context of the Situation
of Care, comprises 5 elements: i) identity, the actors involved; ii) location, where the situation occurs; iii) activity, the elder's current activity; iv) time, the moment at which the situation occurs; and v) SC, the actual event taking place, which may compromise the elder's quality of life. Further, a SC can be seen as having a Life Cycle that comprises three states: i) before, the context that precedes a SC (e.g. an elder is near the outside door), ii) during, when the SC is actually happening (e.g. the elder is going through the door) and iii) after, the effect of the occurrence of a SC (e.g. the elder has crossed the door and is out of the house). These three states allow caregivers to act on a SC, to prevent it (before), attend it (during) or recover from it (after).
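The context just described maps naturally onto a small record. The sketch below is only an illustrative data model of the five elements and the life-cycle state; the field types and names are our assumptions, not part of the system described in the paper.

```cpp
#include <string>
#include <ctime>

enum class LifeCycleState { Before, During, After };   // prevention, attention, recovery

// Context of an elder's Situation of Care (SC), following the five elements above.
struct SituationOfCareContext {
    std::string    identity;    // actors involved, e.g. "Gilberto"
    std::string    location;    // where the situation occurs, e.g. "courtyard"
    std::string    activity;    // elder's current activity, e.g. "rising from chair"
    std::time_t    time;        // moment at which the situation occurs
    std::string    situation;   // the event itself, e.g. "risk of fall"
    LifeCycleState state;       // before / during / after
};
```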
4.3 Context of the Situation of the Caregiver
We defined a Situation of the Caregiver as the features surrounding a caregiver whenever a SC occurs. The two principal elements of the Context of the Situation of the Caregiver are the activities being performed and her location. The main activities performed by caregivers, with the average percentage of time devoted to each, were: patient's personal hygiene (18.2%), patient mobility (13.2%), and provision of food (12.3%). Further, the locations of these activities, with the average percentage of time spent in each, were: bedroom (22%), restroom (14.7%), and courtyard (14.1%). Caregiver activities were classified according to a criticality level. When a caregiver is performing an activity with high criticality, it will be difficult or not advisable for her to leave her current activity to attend a less critical SC. In contrast, an activity with low criticality could be easily interrupted to deal with a SC of a higher criticality level. When a SC occurs and a caregiver becomes aware of it, she should take her context into account (criticality of her current activity, criticality of the SC, location, etc.) in order to decide whether to attend the SC immediately, or to look for a peer who could take care of it, also based on that peer's location and activity.
4.4 Limitations for the Provision of Care Based on the main findings of the understanding study, we identify some constraints that caregivers face while performing their work. These limitations include: • Lack of mechanisms to support the caregiver in the perception of a SC. Usually the caregiver becomes aware of a SC by herself and most of the time when it is happening or when it has already happened. • Limited awareness of what surrounds the elderly, in the sense of being able to prevent a SC. In many cases, caregiver’s activities do not allow her to know at all times the elders’ statuses (activity, location, etc.). • Limited awareness of activities and locations of other caregivers on duty. There are no mechanisms in the nursing home that allow a caregiver to know the activities and locations of her peers.
These limitations provide an opportunity to propose technology to furnish caregivers with support for the execution of their care activities, in order to increase their awareness on the SC and the quality of the provided care.
5 Design of the Notification Architecture
5.1 Use Scenario
Considering the initial understanding, we projected an ensemble of scenarios where caregivers are informed about the SC by means of a context-aware notification mechanism. An example follows: At a nursing home located in Ensenada, Baja California, three caregivers take care of twelve elders who have different levels of cognitive decline. One morning, Gilberto, an elder with walking problems, was sitting on a chair in the courtyard. Miranda, one caregiver, was at the nursing station performing her morning documentation activity. Sophie, another caregiver, was with Luisa in her bedroom, helping her with her personal hygiene. Finally, Maria, the other caregiver, was in the hallway helping Juanita walk towards the dining room. In an attempt to walk to the other side of the courtyard, Gilberto rises from the chair and takes the walking aid. At that instant, the notification system generates an alarm indicating the SC (e.g. Gilberto, rises from chair, courtyard) and, based on the Situation of the Caregivers ((Miranda, documentation, nursing station), (Sophie, patient personal hygiene, patient bedroom), (Maria, patient mobilization, hallway)), sends an audio message to Miranda. She hears it and interrupts her documentation activity to hurry towards the courtyard where Gilberto is. She barely arrives in time to prevent a fall. Thus, the notification mechanism decided to notify her, as she was the one performing the least critical activity.
5.2 Design Implications
Based on the projected scenarios, we identified three design implications that could be used to inform the design of a context-aware notification system that helps caregivers increase their awareness regarding the elders' SC: 1. Set up the notification according to the context of the SC. This means including all necessary features of the elder's context in the notification message, such as the SC, identity, location, time and activity of the elder. In this way, the caregiver will have all the required information (e.g. to whom, where, when and what is happening) to take an informed and timely decision about how to address it in the most adequate manner. 2. Assign priorities to the notification, and deliver it accordingly. Based on the purposes of a notification identified from the Life Cycle of a SC (prevention, attention and recovery) and on its criticality, prioritize notifications and deliver them according to the resulting urgency level.
3. Personalize notification delivery according to the caregiver’s context. This includes selecting whom to notify and defining an adequate presentation mechanism considering the context of the Situation of the Caregiver; most notably the caregiver’s activity and proximity to where the SC occurs. The definition of the presentation mechanism involves considering the device type (e.g. ambient display, mobile device) and the media that will be used to present the notification (e.g. audio message, text message).
5.3 Notification Architecture The architecture consists of three main modules that are arranged in 3 layers (see Figure 1): Layer 1. Context Acquisition, aims to capture the Context of the elder’s SC and the Context of the Situation of the Caregiver. Context may be acquired through different types of sensors, such as: radio-frequency (RFID’s) [7][8]; infrared [10]; motion [11]; electricity [12][13]; temperature [6][13]; and vision [14][15]; among others. Based on the gathered information, context can also be estimated (e.g. activity or risk situations) by several methods, including: pattern recognition [13][14]; Bayesian networks [16]; and Markov models [15]; among others. Layer 2. Notification Composition, is in charge of constructing and sending the notification. This functionality is achieved by means of i) Message composition, this component creates the notification message based on the context of the SC (see Figure 2 for an example); ii) User selection, this component considers the context of
Fig. 1 Context-Aware Notification Architecture.
Fig. 2 Creation of a Notification Message.
the SC and the priority assigned to the notification to decide to which caregivers the notification will be sent; iii) Mechanism selection, which determines which of the mechanisms and devices available to the caregiver are the most appropriate to deliver the notification according to the Situation of the Caregiver; and iv) Notification storage and delivery, which stores the notification in a log and sends it to the selected devices. Layer 3. Notification Presentation receives the notification and presents it through the selected mechanism or device. General notification information should be transformed or adapted for presentation according to the mechanism and device capabilities. Part of our work is to explore diverse notification mechanisms given a certain context. The proposed architecture may include one or more of the context acquisition mechanisms presented here. The evaluation presented in the next section implements the acquisition and inference functionality by means of a Wizard of Oz [17]. This allowed us to focus the evaluation on creating, delivering and presenting the notifications to caregivers.
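The user-selection and mechanism-selection components of Layer 2 can be sketched as follows. The caregiver activity criticalities, the device names and the selection rule (notify the caregiver performing the least critical activity, as in the scenario of Section 5.1) are assumptions used only for illustration.

```python
# Criticality of caregiver activities (lower = easier to interrupt); assumed values.
ACTIVITY_CRITICALITY = {
    "documentation": 1,
    "patient mobilization": 2,
    "patient personal hygiene": 3,
}

def select_caregiver(caregiver_situations):
    """Pick the caregiver performing the least critical activity.

    caregiver_situations: list of (name, activity, location) tuples,
    i.e. the Situation of the Caregivers."""
    return min(caregiver_situations,
               key=lambda s: ACTIVITY_CRITICALITY.get(s[1], 0))

def select_mechanism(urgency, available_devices):
    """Choose a presentation mechanism among the devices available to the selected
    caregiver (assumed preference: audio for the most urgent notifications)."""
    if urgency == 1 and "mobile_audio" in available_devices:
        return "mobile_audio"
    if "ambient_display" in available_devices:
        return "ambient_display"
    return available_devices[0]

# Example drawn from the scenario in Section 5.1:
caregivers = [
    ("Miranda", "documentation", "nursing station"),
    ("Sophie", "patient personal hygiene", "patient bedroom"),
    ("Maria", "patient mobilization", "hallway"),
]
print(select_caregiver(caregivers))  # ('Miranda', 'documentation', 'nursing station')
```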
6 Evaluation of an Audio Notification Mechanism in a Nursing Home In order to assess caregivers' perception of the usefulness of a context-aware notification system, and to obtain feedback on the proposed notification concepts and mechanisms, we conducted a preliminary evaluation study.
6.1 Setting The study was conducted at a local nursing home. Participants included 12 elders with various mental diseases (e.g. Alzheimer's disease) and different stages of cognitive impairment, and 5 caregivers (three women and two men). For this evaluation we implemented an audio notification mechanism on mobile devices (radio transmitters) carried by the caregivers. The radio transmitter was used with a headset for hands-free operation. To capture the context of the elderly, we installed six video cameras in different locations of the nursing home. The context was then filtered by a researcher (Wizard of Oz) to create the notification message whenever a SC occurred. Notifications were sent by the researcher via audio with a radio transmitter and stored in a written log. Finally, notifications were received on the caregiver's radio transmitter.
6.2 Procedure We conducted a three-hour pilot session so that caregivers could familiarize themselves with the audio device. Subsequently, there were 9 sessions to observe the daily activities of patients and caregivers (an average of 3.89 hours per day) for a total duration of 35 hours and 10 minutes. At the end of each session, caregivers answered questionnaires regarding the notifications that occurred during that day. Finally, at the end of the study, a structured interview was conducted to obtain information regarding the effectiveness and usefulness of the notification system.
6.3 Results Firstly, the results validate the importance of notifying the identified SC and the importance of including contextual information in the notification message. A second set of results indicates the perception of caregivers on the notification mechanism.
6.3.1 Notified Situations of Care
During the 35 hours of the study a total of 97 notifications were sent. These notifications were classified into 9 types of notification messages. Table 1 shows the notified messages, the relevance of the notification according to the caregivers' perception, and the frequency with which they were sent. The questionnaires allowed us to know the caregivers' perception of the status of the SC when it was notified and attended. They considered that i) 79% of the notifications concerned situations that were occurring (e.g. patient is standing at the door of another patient's room), ii) 6% of the notifications concerned situations that had already occurred (e.g. the patient had already taken the drink of another patient, and the caregiver took the glass and verified its contents to ensure that it would not hurt her), and finally, iii) 15% of the notifications prevented the occurrence of a SC (e.g. by notifying that a patient stood up to walk without assistance, a "fall" situation was prevented).

Table 1 Notified Situations of Care

Message                                                            Relevance      Frequency
Patient1 is helping Patient2 to walk outside                       High           1
Patient stood up and is walking outside                            High-Medium    5
Patient is drinking someone else's beverage at the dining room     High-Medium    1
Patient is calling to a caregiver                                  Medium-High    11
Patient has entered to restroom (sensitive area)                   Medium-High    4
Patient1 is at the door of Patient2's room                         High           4
Patient1 has entered to Patient2's room                            Medium-High    3
Patient is close to main door                                      High-Medium    39
Patient is close to the outside door                               Medium         29
6.3.2 Perception of Usefulness and Work Overload Regarding Notifications
Concerning caregivers' perception about having a notification mechanism as support for their care activities, they mentioned that they felt that they were "more vigilant", "more careful", and "more alert". They also perceived that notifications allowed them to be more in control of what happened at the residence regarding the SC. A second aspect of the evaluation referred to the caregivers' perception of information or work overload generated by the notification service. Caregivers considered that there was no information overload, as on average they received 10 notifications per 3.89-hour session (i.e. about one notification every 23 minutes). Furthermore, regarding work overload, caregivers mentioned that they did not perceive an increase in their workload. They even considered that notifications may have reduced the number of times they had to voluntarily stop their activities to go check on the status of elders, thus allowing them to complete their activities with less interruption and in less time.
7 Conclusions and Future Work In this work we propose the concepts of Situation of Care and Situation of the Caregiver, which group the contextual information of a situation or event that might affect the quality of life of the elder, and of the situation that surrounds the caregiver at the moment a SC occurs, respectively. Derived from our initial understanding and based on a proposed set of design implications, we proposed a component-based architecture including components for: i) context acquisition, ii) notification composition and iii) notification presentation. In order to validate our proposal, we conducted a preliminary evaluation in a local nursing home. The results confirm not only the feasibility of providing notifications to caregivers by means of context-aware systems in the nursing home, but also provide evidence of the usefulness and ease of use of our proposal according to the perception of the caregivers. Regarding future work, we consider two main directions: i) extending the architecture to include mechanisms that enable coordination among caregivers to address the Situations of Care; and ii) developing and validating a prototype that considers the Situation of the Caregiver for the presentation of notifications.
References 1. A newsletter for Alzheimer’s Disease International, The International Federation of Alzheimer’s Disease and Related Disorders Societies Inc. Global Perspective, ALZ 18(3) (December 2008) 2. Rockwood, K., MacKnight, C.: Understanding dementia, A primer of diagnosis and management. Pottersfield Press Ltd., Halifax (2001) 3. Feinberg, L.: The state of the art: caregiver assessment in practice settings. In: Family Caregiver Alliance, National Center on Caregiving, San Francisco (2002)
4. Reinhard, S.C., Given, B., Petlick, N.H., Bemis, A.: Supporting Family Caregivers in Providing Care. Chapter in: Patient Safety and Quality: An Evidence-Based Handbook for Nurses, AHRQ Publication No. 08-0043 (2008) 5. Verginadis, Y., Gouvas, P., Bouras, T., Mentzas, G.: Conceptual Modeling of Service-Oriented Programmable Smart Assistive Environments. In: Proc. of the 1st ACM Int. Conf. on Pervasive Tech. Related to Assistive Environments PETRA 2008 (2008) 6. Jaichandar, K., Rajesh, M., Kumar, S., Chua, A.: A Semi Autonomous Control and Monitoring System for Bed Sores Prevention. ACM, New York (2007) 7. Corchado, J.M., Bajo, J., Abraham, A.: GerAmi: Improving Healthcare Delivery in Geriatric Residences. IEEE Intelligent Systems (2008) 8. Chung-Chih, L., Ping-Yeh, L., Po-Kuan, L., Guan-Yu, H., Wei-Lun, L., Ren-Guey, L.: A Healthcare Integration System for Disease Assessment and Safety Monitoring of Dementia Patients. IEEE Transactions on Information Technology in Biomedicine 12(5) (September 2008) 9. Nava-Muñoz, S., Morán, A.L., Tentori, M.: A Taxonomy of Notification Technology for Assisting the Caregivers of Elders with Cognitive Decline. In: 13th International Conference on Human-Computer Interaction (HCI 2009), San Diego, CA, USA, July 19-24, pp. 956–960 (2009) ISBN 978-3-642-02884-7 10. Giroux, S., Bauchet, B., Pigot, H., Lussier-Desrochers, D., Lachappelle, Y.: Pervasive behavior tracking for cognitive assistance. In: Proceedings of the 1st ACM International Conference on PErvasive Technologies Related to Assistive Environments, PETRA 2008 (2008) 11. Kanai, H., Tsuruma, G., Nakada, T., Kunifuji, S.: Notification of Dangerous Situation for Elderly People using Visual Cues. In: Proceedings of the 13th International Conference on Intelligent User Interfaces, IUI 2008 (2008) 12. Corte, G., Gally, F., Berenquer, M., Mourrain, C., Couturier, P.: Non-invasive monitoring of the activities of daily living of elderly people at home - a pilot study of the usage of domestic appliances. Journal of Telemedicine and Telecare, 1–5 (2008) 13. Kröse, B., van Kasteren, T., Gibson, C., van den Dool, T.: CARE: Context Awareness in Residences for Elderly. In: Proceedings of the Sixth International Conference of the International Society for Gerontechnology, ISG 2008 (2008) 14. Martinez-Perez, F.E., Gonzalez-Fraga, J.A., Tentori, M.: Artifacts' Roaming Beats Recognition for Estimating Care Activities in a Nursing Home. In: 4th International Conference on Pervasive Computing Technologies for Healthcare 2010, München, Germany (2010) ISBN: 978-963-9799-89-9 15. Hoey, J., von Bertoldi, A., Poupart, P., Mihailidis, A.: Assisting Persons with Dementia during Handwashing Using a Partially Observable Markov Decision Process. In: Proceedings of the 5th International Conference on Computer Vision Systems, ICVS 2007 (2007) 16. Philipose, M., Fishkin, K., Perkowitz, M., Patterson, D., Fox, D., Kautz, H., Hähnel, D.: Inferring Activities from Interactions with Objects. IEEE Pervasive Computing (2004) 17. Kelley, J.F.: An iterative design methodology for user-friendly natural-language office information applications. ACM Transactions on Office Information Systems 2, 26–41 (1984)
Multimodality in Pervasive Environment Marco Anisetti, Valerio Bellandi, Paolo Ceravolo, and Ernesto Damiani
Abstract. Future pervasive environments are expected to immerse users in a consistent world of probes, sensors and actuators. Multimodal interfaces combined with social computing interactions and high-performance networking can foster a new generation of pervasive environments. However, much work is still needed to harness the full potential of multimodal interaction. In this paper we discuss some short-term research goals, including advanced techniques for joining and correlating multiple data flows, each with its own approximations and uncertainty models. Also, we discuss some longer term objectives, like providing users with a mental model of their own multimodal “aura”, enabling them to collaborate with the network infrastructure toward inter-modal correlation of multimodal inputs, much in the same way as the human brain extracts a single self-conscious experience from multiple sensorial data flows.
Marco Anisetti, Valerio Bellandi, Paolo Ceravolo, and Ernesto Damiani
Università degli Studi di Milano, Dipartimento di Tecnologie dell'Informazione
e-mail: {marco.anisetti,valerio.bellandi}@unimi.it, {paolo.ceravolo,ernesto.damiani}@unimi.it
1 Multimodal Systems: Introduction Humans naturally communicate with each other in a multimodal fashion: we speak, gesture, gaze and move, generating a rich, multi-streamed flow of multimedia information. Interacting with machines has traditionally been a much simpler affair, typically generating a single flow of uniform events like the discrete mouse clicks entered sequentially in a graphical user interface. As the global information infrastructure is becoming more and more pervasive, however, digital business transactions are performed in diverse situations, using a variety of mobile devices and across multiple communication channels. Rather than being forced to assume a fixed, pre-set position in front of a machine, users move freely around their work
environment, starting and monitoring different transactions. Mobile terminals get smaller and lighter, yet at the same time the requirements to be able to interact with pervasive applications keep expanding. Terminal devices are increasingly equipped with sensors, such as video cameras or audio microphones, capable of collecting information from the environment. Our own voice, hands, and whole body, once augmented by sensors (e.g. of pressure or acceleration), become the ultimate mobile multimodal input devices. In this new paradigm of multimodal access to networked media, a much richer context representation regarding both users and the resources they access is made available to applications. The outcome of an interaction may well depend on where the user is when a certain application-related event takes place, where she is headed, or even whether she is sitting at her desk alone or walking accompanied by others. However, multimodal context information is based on sensor data that are hardly ever perfect or certain, especially within unsupervised environments. Still, theories and models proposed so far for representing and managing sensor data are mostly aimed at ensuring semantics-aware interoperability of the sensor infrastructure [21], leaving uncertainty management aside. This lack of attention to uncertainty is partly due to the idea that different modes can confirm, or disprove, each other's results. Early multimodal systems were based on joint recognition of active modes, such as speech and handwriting, for which there is now a large body of research work. Today, context-aware systems sense and incorporate data about illumination, noise level, location, time, people other than the user, as well as many other pieces of information to adjust their model of the user's environment. However, the emergence of novel pervasive computing applications, which combine active interaction modes with passive modality channels based on perception, context, environment and ambience [1, 2, 3], has raised new challenges linked to the imprecision and time-dependence of multimedia predicates, and to difficulties in conjoining facts coming from different modal streams. We argue that the inherent uncertainty of sensor data cannot be hidden simply by representing them as crisp information. Facts representing users' position and posture (e.g., as shown in a video feed), for instance, are semantically very different from traditional database records, being both highly dynamic and uncertain. Designing and implementing systems that take the best advantage of recognition-based modalities of interaction and multi-sensory observations is difficult. In pervasive environments, sensors that can capture data about the user's physical state or behavior have been used to gather cues which can help the system perceive users' status [4, 5, 6]; however, these attempts have only very partially succeeded, due to problems in using different modalities to support or disprove one another. Our lack of understanding of how modalities must be combined in the user interface often leads to a flawed understanding of the user's intent. Short-term research objectives include solving well-known technical issues of traditional multimodal interaction (e.g. that carried out via camera, speech and pen interfaces). Natural modalities of interaction, such as speech and gestures, rely on recognition-based technologies, which are inherently error prone.
In pervasive computing applications, where the capture and the analysis of passive modes are key, errors are much greater. Issues to be handled include managing uncertain
and error-prone sources having heterogeneous uncertainty models in a robust and consistent way. Today, strategies for uncertainty reduction mostly work at the interface level, either by constraining user behavior into less error-prone interaction (i.e. "error reduction by design"), or by exploiting other information coming from other modalities (i.e. "error reduction by cross-modality"). However, it is important to remember that in pervasive environments users may not even be aware that their behavior is monitored by a system. They may also have a wrong understanding of what data the various devices capture, and how it is used. Traditional methods of cross-modality error correction are ill adapted to pervasive computing applications, and research is urgently needed to better understand user behavior when faced with errors in this type of application. Mid- and long-term research will address putting the human in the loop of multimodal interaction, not only as a source of sensor data but as an integral part, fully in control of the process of capturing and understanding her own multimodal flows [7]. For instance, users will be supported in forming new mental models of the network and of the networked media they interact with. These mental models will provide them with effortless awareness of what data about them is captured and recorded, and how it is used, enabling the development of new user-centric strategies to cope with errors in pervasive computing applications. Like its sensor-less predecessor, the coming Semantic Web of Sensors is focusing on representation formats for handling sensor context (e.g. source of information, temporal location, dependencies and so on) rather than on handling uncertainty. The reason is probably that uncertainty (e.g., about locations or motion parameters) is assumed to have been successfully handled by some sort of low-level layer. In this paper, we argue that this is not the case. Multimodal interaction involves not only uncertain sensor data, but also uncertain inferences based on these data; and the nature of inference uncertainty (e.g. whether it is a frequency-based probability or a belief) depends on application semantics, and cannot be handled by any low-level layer. Mathematical models for reasoning with uniformly uncertain information have been successfully applied in several situations, but predicates inferred from heterogeneous sensor data exhibit different types of uncertainty (for example, sensor-based predicates like "user accompanied by someone" and "user close to door") and require hybrid reasoning strategies. An Ontology of Uncertainty, like the one proposed by the W3C's URW3-XG incubator group1, provides an important first step: a vocabulary to annotate different sources of information with different types of uncertainty. Here we argue that such annotations can be mapped to hybrid reasoning and representation strategies. As a proof of concept of this approach we present, in this paper, a Patient Monitoring System (PMS) implementing a semantics-aware matching strategy, where (i) the semantics of each assertion is represented explicitly as instances of an ontology, and (ii) the representation of matchings is also based on a specific ontological representation. An uncertainty type is assigned to each relation using SWRL rules; this allows us to divide the knowledge base into sub-parts according to the specific uncertainty type.
1 http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/
The Ontology of Uncertainty, proposed by the W3C's URW3-XG incubator group, allows an explicit definition of the various types of uncertainty. By assigning a reasoner process to each model, it is then possible to manage different independent sources of information. In our case, the different sources of information derive from sensors producing data of a different nature (i.e. video, audio, RFID, positioning sensors).
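A compact sketch of the partitioning idea is shown below: assertions annotated with their uncertainty model are divided into sub-parts, each of which can then be handed to the reasoner suited to that model. The dictionary encoding of the annotations is an assumption made for illustration, not the PMS representation.

```python
# Assertions annotated with their uncertainty model, in the spirit of the
# Ontology of Uncertainty (the dictionary encoding is an assumption).
assertions = [
    {"statement": "user1 identified",        "uncertaintyModel": "Probability", "value": 0.8},
    {"statement": "nearTo(user1, door1)",    "uncertaintyModel": "Fuzzy",       "value": 0.7},
    {"statement": "careGiver(user1, user2)", "uncertaintyModel": "Temporal",    "value": 1.0},
]

def partition_by_model(kb):
    """Divide the knowledge base into sub-parts, one per annotated uncertainty model,
    so that each sub-part can be processed by the reasoner suited to that model."""
    parts = {}
    for a in kb:
        parts.setdefault(a["uncertaintyModel"], []).append(a)
    return parts

print(sorted(partition_by_model(assertions)))  # ['Fuzzy', 'Probability', 'Temporal']
```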
2 Uncertain Information Representation and Reasoning Issues Uncertainty falls at a meta-level with respect to truth; it arises when the available knowledge does not provide sufficient information to decide if a statement is true or false. Many researchers have tried to classify uncertainty types, starting from Epistemic Uncertainty, which comes from the limited knowledge of the agent that generates the statement, and proceeding to Aleatory Uncertainty, which is intrinsic in statements reporting quantitative observations of the world, including scientific ones. Uncertainty can also be classified according to its source: Objective, if it derives from a repeatable observation, and Subjective, if the uncertainty in the information is derived from an informal evaluation. Depending on the "coordinates" of the uncertainty to deal with, a certain model (such as Fuzzy theories, Probabilistic theories and Possibility theory) can be more suitable than another. Once we have resolved to (i) establish a unified vocabulary about uncertainty models, such as gradual truth values and probabilities, and (ii) use it for building meta-assertions describing the type of uncertainty of predicates inferred from sensor data flows, we face the problem of what to do with these meta-assertions, i.e. how to take them into account when composing predicates inferred from sensor data. Unfortunately this is a difficult problem. As stated in [8], probability and possibility theories cannot be made fully compositional with respect to all the logical connectives without a relevant loss of expressiveness. Some work in this direction has nevertheless been done by imposing restrictions on the expressiveness of the logics. Among the most relevant studies, [13] proposes a definition of possibilistic fuzzy description logics by associating weights, representing degrees of uncertainty, with fuzzy description logic formulas2. Other works like [9, 10] define probabilistic description logic programs by assigning probability degrees to fuzzy description logic programs. In [14] the authors propose a framework for sharing information between three different models of uncertainty, where the fuzzy linguistic truth values are propagated through the three models in a non-monotonic way, by exploiting the extension principle [17] and aggregation of linguistic values. Generally speaking, the choice between uncertainty composition techniques depends on the situation. A shared representation of uncertainty types would indeed facilitate automatic selection of a composition technique. For this reason the URW3-XG1 working group has proposed an ontology (called Ontology of Uncertainty) as a generic meta-model for representing the semantics of the uncertainty in various assertions. This ontology
2 An extension of Fuzzy Description Logics in the field of Possibility theory has also been presented in [12].
is designed for a flexible environment, where different uncertainty types can arise in the same knowledge base, so the selection of the correct model for inference is driven by the information in the ontology. Of course, the URW3-XG incubator group was concerned with representation only, and did not specify how to deal with situations where more than one model is involved in the inference process; this is exactly the open issue we want to address. In the literature, we are not aware of hybrid reasoning processes that can handle the flexible integration of different models. In [16, 14, 15, 11], interoperability has been studied and defined on a set of selected inference models. Adding new models to the framework can easily result in a revision of the underlying theory.
3 Nursing Home Monitoring Framework In this section, we shall briefly present an example of handling heterogeneous uncertainty types in access control. In the logics-based approach to access control policy evaluation [19], evaluating a policy means computing an inference. In other words, for each access request r to a resource R, the policy evaluation system needs to evaluate whether the access policy PR logically implies the request r, taking into account all available predicates representing the context in which the access takes place. If this evaluation terminates correctly, access is granted/denied depending on its outcome. Otherwise, a negotiation phase takes place to extract additional information from the user and/or the environment, so that the evaluation can be concluded. As said before, predicates for a multimodal pervasive environment can support an extended context of interaction for each user and resource in the environment, modeling their state and spatial-temporal relationships with other users/resources [20]. Here, context is not the static situation of a predefined environment; rather, it is a dynamic part of the process of interacting with a changing environment, composed of mobile users and resources [18]. The environment is monitored by different sensors producing assertions on users and resources that are annotated with the model of uncertainty characterizing the specific sensor. According to the model specified, the suitable inference process is activated3. A reconciliation among the different inferences is computed only at the end of the process, allowing the conjunction of their results to be evaluated. Several techniques can be envisioned to deal with such reconciliation [12]. To provide an illustration of our method, we introduce an example related to a healthcare scenario. The general idea is that access to specific areas of a Nursing Home is allowed only if a doctor and his or her patient are together. To evaluate this condition, the system is organized into three independent inference models. A first model takes video signals as input and, using a probabilistic model, evaluates the identity and the role of the persons inside the focus of the camera. A second model, using a fuzzy approach, evaluates if the persons inside the focus of the camera are
3 Note that, in order to avoid the introduction of inconsistencies along the inference process, a safeness constraint may be applied to the flow of inference. This constraint implies that if a model A uses the conclusions of a model B, then B cannot use conclusions from A in any of its premises.
"near" the door. A third model, taking as input the conclusions of the second model plus the data available in a database, evaluates whether the persons currently stand in the right relation to one another. The conclusions from the different models are conjunctively evaluated to verify whether the access request is implied by the access policy. The access policy describing our example is expressed in assertion (1):

Open(door1) → Patient(x) ∧ Doctor(y) ∧ nearTo(x, door1) ∧ nearTo(y, door1) ∧ careGiver(y, x).    (1)
The data expressing the request to be evaluated are obtained by the system from different sources and evaluated according to the specific uncertainty model associated with each of them. These data can be interpreted as independent sets of assertions that each source produces and tags with the appropriate uncertainty model. In this way the system activates the right inference service, collects all the conclusions and finally tries to enforce the access policy. In our example the sensors in the environment produce three sources of knowledge, expressed in assertions (2), (3) and (4).

→ hasUncertainty(Sentence1, 0.8)
→ saidBy(Sentence1, camera1)
→ saidAbout(Sentence1, eyes distance)
→ uncertaintyModel(Sentence1, Probability)
Sentence1[→ user1]
→ hasUncertainty(Sentence2, 0.9)
→ saidBy(Sentence2, camera1)
→ saidAbout(Sentence2, iris)
→ uncertaintyModel(Sentence2, Probability)
Sentence2[→ user1]
→ hasUncertainty(Sentence3, 0.3)
→ saidBy(Sentence3, camera1)
→ saidAbout(Sentence3, eyes distance)
→ uncertaintyModel(Sentence3, Probability)
Sentence3[→ user3]
→ hasUncertainty(Sentence4, 0.7)
→ saidBy(Sentence4, camera1)
→ saidAbout(Sentence4, iris)
→ uncertaintyModel(Sentence4, Probability)
Sentence4[→ user3]
→ hasUncertainty(Sentence5, 0.9)
→ saidBy(Sentence5, camera1)
→ saidAbout(Sentence5, eyes distance)
→ uncertaintyModel(Sentence5, Probability)
Sentence5[→ user2]
→ hasUncertainty(Sentence6, 0.9)
→ saidBy(Sentence6, camera1)
→ saidAbout(Sentence6, iris)
→ uncertaintyModel(Sentence6, Probability)
Sentence6[→ user2]    (2)

→ hasUncertainty(Sentence7, 0.7)
→ saidBy(Sentence7, camera1)
→ saidAbout(Sentence7, nearTo)
→ uncertaintyModel(Sentence7, Fuzzy)
Sentence7[→ nearTo(user1, door1)]
→ hasUncertainty(Sentence8, 0.8)
→ saidBy(Sentence8, camera1)
→ saidAbout(Sentence8, nearTo)
→ uncertaintyModel(Sentence8, Fuzzy)
Sentence8[→ nearTo(user2, door1)]    (3)

→ hasUncertainty(Sentence9, 1)
→ saidBy(Sentence9, NHDB)
→ saidAbout(Sentence9, 12h32m23s)
→ uncertaintyModel(Sentence9, Temporal)
Sentence9[→ careGiver(user1, user2)]    (4)

The first source of knowledge is evaluated by a probabilistic reasoner that aggregates the assertions related to the same entity (users in our example) and, according to the parameter under analysis (eyes distance or iris in our example), computes the union or intersection of the corresponding probabilities.

user1: p(eyes distance) ∩ p(iris) = 0.8 ∗ 0.9 = 0.72
user2: p(eyes distance) ∩ p(iris) = 0.9 ∗ 0.9 = 0.81
user3: p(eyes distance) ∩ p(iris) = 0.3 ∗ 0.7 = 0.21
Finally, a threshold is applied to the assertions, keeping only the assertions whose confidence is high enough. The second source of knowledge is evaluated by a fuzzy reasoner that is configured to apply a threshold not to the single assertions but to the intersection or union of the assertions. In our example we consider the intersection, computed via a t-norm (the min operator):

(nearTo(user1, door1) ∩ nearTo(user2, door1)) = min(0.7, 0.8) = 0.7

If in the example we take 0.7 as the threshold, the assertions in the knowledge base are the following:

→ user1    (5)
→ user2    (6)
→ nearTo(user1, door1)    (7)
→ nearTo(user2, door1)    (8)
The third source of knowledge is not generated by a sensor; rather, it is produced by the system querying the Nursing Home database, prompted by the assertions generated by the sensors. The goal is to expand the knowledge provided by the sensors with the contextual knowledge of the specific domain in order to verify whether the access request (detected directly by observing the environment) is implied by the access rules. The type of uncertainty handled at this level is not related to vagueness or incompleteness as in the previous examples; rather, it is related to the fact that the validity of the assertions is temporary and must be verified within the temporal range of the specific situation. Querying the database, one obtains assertions (9) and (10).

user1? :→ Doctor(user1)    (9)
user2? :→ Patient(user2)    (10)
In conclusion, the system, computing the conjunction of all available sources of knowledge, is able to correctly evaluate assertion (11), which represents the context of the access request as directly derived by observing the environment.

→ Patient(user2) ∧ Doctor(user1) ∧ nearTo(user1, door1) ∧ nearTo(user2, door1) ∧ careGiver(user1, user2).    (11)
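A compact sketch of this evaluation pipeline is shown below. The assertion encoding, the threshold value and the reasoner dispatch are illustrative assumptions; the sketch only reproduces the example computations above (probabilistic product, fuzzy min, crisp database facts and their final conjunction).

```python
# Each sentence is annotated with its value and uncertainty model, mirroring
# assertions (2)-(4); the dictionary encoding is an assumption.
sentences = [
    {"about": ("user1", "identity"), "value": 0.8, "model": "Probability"},  # eyes distance
    {"about": ("user1", "identity"), "value": 0.9, "model": "Probability"},  # iris
    {"about": ("user2", "identity"), "value": 0.9, "model": "Probability"},
    {"about": ("user2", "identity"), "value": 0.9, "model": "Probability"},
    {"about": ("user3", "identity"), "value": 0.3, "model": "Probability"},
    {"about": ("user3", "identity"), "value": 0.7, "model": "Probability"},
    {"about": ("user1", "nearTo(door1)"), "value": 0.7, "model": "Fuzzy"},
    {"about": ("user2", "nearTo(door1)"), "value": 0.8, "model": "Fuzzy"},
]
db_facts = {"Doctor(user1)", "Patient(user2)", "careGiver(user1, user2)"}  # temporal validity assumed

THRESHOLD = 0.7  # assumed value

def probabilistic_reasoner(stmts):
    """Intersection of independent probabilities = product per user, then threshold."""
    users = {}
    for s in stmts:
        users[s["about"][0]] = users.get(s["about"][0], 1.0) * s["value"]
    return {u for u, p in users.items() if p >= THRESHOLD}   # user1: 0.72, user2: 0.81, user3: 0.21

def fuzzy_reasoner(stmts):
    """Intersection via the min t-norm, with the threshold applied to the combined value."""
    combined = min(s["value"] for s in stmts)                 # min(0.7, 0.8) = 0.7
    return {f"nearTo({s['about'][0]}, door1)" for s in stmts} if combined >= THRESHOLD else set()

identified = probabilistic_reasoner([s for s in sentences if s["model"] == "Probability"])
near_door = fuzzy_reasoner([s for s in sentences if s["model"] == "Fuzzy"])

# Conjunction of all sources enforces the access policy of assertion (1).
open_door = (
    "Doctor(user1)" in db_facts and "Patient(user2)" in db_facts
    and "careGiver(user1, user2)" in db_facts
    and {"user1", "user2"} <= identified
    and {"nearTo(user1, door1)", "nearTo(user2, door1)"} <= near_door
)
print(open_door)  # True
```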
4 Conclusions Like its predecessor, the coming Semantic Web of Sensors promises to be more focused on handling sensor data context (e.g. source of information, temporal location, dependencies and so on) than on handling uncertainty. The reason is probably that uncertainty (e.g., about locations or motion parameters) is assumed to be successfully handled at the individual sensor level. In this paper, we argue that this is not the case. Multimodal interaction involves not only uncertain sensor data, but
also uncertain inferences based on these data. Although mathematical models for reasoning with uncertain information have been successfully applied in several situations, predicates inferred from heterogeneous sensor data exhibit different types of uncertainty (for example, sensor-based predicates like "user accompanied by someone" and "user close to door" involve two very different notions of uncertainty). In this paper, we described a simple Use Case supporting the idea of using the W3C Ontology of Uncertainty to write meta-assertions guiding hybrid reasoning strategies on sensor data.
References 1. Abowd, G.D., Mynatt, E.D.: Charting past, present, and future research in ubiquitous computing. ACM Transactions on Computer Human Interaction 7(1), 29–58 (2000) 2. Feki, M.A., Renouard, S., Abdulrazak, B., Chollet, G., Mokhtari, M.: Coupling context awareness and multimodality in smart homes concept. In: Miesenberger, K., Klaus, J., Zagler, W.L., Burger, D. (eds.) ICCHP 2004. LNCS, vol. 3118, pp. 906–913. Springer, Heidelberg (2004) 3. Oikonomopoulos, A., Patras, I., Pantic, M.: Human action recognition with spatiotemporal salient points. IEEE Transactions on Systems, Man and Cybernetics. Part B 36(3), 710–719 (2006) 4. Anisetti, M., Bellandi, V.: Emotional State Inference Using Face Related Features. New Directions in Intelligent Interactive Multimedia Systems and Services 2, 401–411 (2009) 5. Damiani, E., Anisetti, M., Bellandi, V.: Toward Exploiting Location-based and Video Information in Negotiated Access Control Policies. In: International Conference on Information Systems Security, Calcutta, India (2005) 6. Pantic, M.: Affective Computing. In: Pagani, M. (ed.) Encyclopedia of Multimedia Technology and Networking, Hershey, PA, USA, May 2005, vol. 1, pp. 8–14. Idea Group Reference (2005) 7. Damiani, E., Khosla, R., Sethi, I.K.: Intelligent multimedia multi-agent systems: a human-centered approach. Kluwer, Dordrecht (September 2001) 8. Dubois, D., Prade, H.: Can we enforce full compositionality in uncertainty calculi. In: Proc. of the 11th Nat. Conf. on Artificial Intelligence (AAAI 1994), pp. 149–154. AAAI Press/MIT Press (1994) 9. Lukasiewicz, T., Straccia, U.: Description logic programs under probabilistic uncertainty and fuzzy vagueness. In: Mellouli, K. (ed.) ECSQARU 2007. LNCS (LNAI), vol. 4724, pp. 187–198. Springer, Heidelberg (2007) 10. Lukasiewicz, T., Straccia, U.: Uncertainty and vagueness in description logic programs for the semantic web (2007) 11. Antoniou, G., Bikakis, A.: Dr-prolog: A system for defeasible reasoning with rules and ontologies on the semantic web. IEEE Trans. Knowl. Data Eng. 19(2), 233–245 (2007) 12. Bobillo, F., Delgado, M., Gomez-Romero, J.: Extending fuzzy description logics with a possibilistic layer. In: URSW. CEUR Workshop Proceedings, vol. 327, CEUR-WS.org (2007) 13. Dubois, D., Mengin, J., Prade, H.: Possibilistic uncertainty and fuzzy features in description logic. A preliminary discussion. In: Sanchez, E. (ed.) Fuzzy logic and the semantic web, pp. 101–113. Elsevier, Amsterdam (2006) 14. Luo, X., Zhang, C., Jennings, N.R.: A hybrid model for sharing information between fuzzy, uncertain and default reasoning models in multi-agent systems. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(4), 401–450 (2002)
15. Van Melle, W.J.: A domain-independent system that aids in constructing knowledge-based consultation programs. PhD thesis, Stanford, CA, USA (1980) 16. Shortliffe, E.H., Buchanan, B.G.: A model of inexact reasoning in medicine. pp. 259–275 (1990) 17. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965) 18. Coutaz, J., Crowley, J.L., Dobson, S., Garlan, D.: Context is Key. Comm. of the ACM 48(3) (March 2005) 19. Bonatti, P.A., Samarati, P.: Logics for Authorization and Security. In: Logics for Emerging Applications of Databases 2003, pp. 277–323 (2003) 20. Damiani, E., Anisetti, M., Bellandi, V.: Toward Exploiting Location-based and Video Information in Negotiated Access Control Policies. In: International Conference on Information Systems Security (ICISS 2005), India (2005) 21. Sheth, A., Henson, C., Sahoo, S.S.: Semantic Sensor Web. IEEE Internet Computing (2008)
Optimizing the Location Prediction of a Moving Patient to Prevent the Accident Wei-Chun Chang, Ching-Seh Wu, Chih-Chiang Fang, and Ishwar K. Sethi
Abstract. An evolutionary solution for optimally forecasting the movement of a patient is proposed in this paper. With the great changes in society and environment in modern life, high living pressure results in more and more mental disorders, and the safety of mentally ill patients has become an issue. According to the statistics of the patient safety report system of the Ministry of the Interior in Taiwan, over 85% of the accidents involving patients occur among schizophrenic patients during their treatment period. This indicates the potential danger to psychotic patients while they are under treatment. Therefore, preventing accidents requires an understanding of patient movement so that a warning can be issued before an accident happens. This is a practical problem in health care. Moving object prediction addresses the problem of locating a moving object correctly; effective forecasting of the location of moving objects can therefore lead to valuable designs for real-world applications involving moving objects. To address the forecasting problem, we applied an evolutionary algorithm (EA) as the target algorithm. The algorithm provides powerful techniques for stochastic search of the fittest solutions to a problem. Hence, the objective of this paper is to devise methods that apply EA to solve practical problems. A case study is a good means to broaden the understanding of how EA can solve problems. Moreover, a case analysis applying the model to the accident prevention of inpatients is presented to identify its potential for solving the practical problem.
Wei-Chun Chang and Chih-Chiang Fang, Department of Information Management, Shu-Te University, Taiwan
Ching-Seh Wu and Ishwar K. Sethi, Department of Computer Science and Engineering, Oakland University, Michigan, USA
1 Introduction Forecasting the movement of moving objects has been a practical problem in several domains. Existing prediction methods for moving objects are normally constrained
to specific domain environments. Therefore, a general forecasting method using effective algorithms can be a good solution to the problem. With the great changes in society and environment in modern life, high living pressure results in more and more mental disorders, and the safety of mentally ill patients has become an issue. According to the statistics of the patient safety report system of the Ministry of the Interior in Taiwan, over 85% of the accidents involving patients occur among schizophrenic patients during their treatment period. This indicates the potential danger to psychotic patients while they are under treatment. The objectives of caring for mentally ill patients focus on early discovery and prevention, which can lower medical costs and reduce the percentage of accidents. Therefore, preventing accidents requires an understanding of patient movement so that a warning can be issued before an accident happens. It is a practical problem in health care. Evolutionary computing has been applied to a wide range of applications; however, most problems have involved searching for optimal solutions in complex but well understood domains. These include solving constraint satisfaction problems, combinatorial optimization, reliability optimization, flow-shop sequencing optimization, job scheduling, machine scheduling, transportation optimization and many others, as summarized by Gen and Runwei [1]. Evolutionary computing (EC) optimizers have been developed as decision support tools so parameters can be changed to observe their effects on solutions (i.e. via objective functions); for instance, in urban transport planning [2]. EC techniques have been applied to the design of electronic systems where the properties of the problem space are well known (e.g. electronics) and have been shown to produce novel solutions [3]. However, the application of EC to domains such as requirements specification and concept design, where the solution is not so well formed, represents a considerable challenge, since assessing the performance of systems which do not yet exist is more difficult. However, if EC can be applied to creating new design ideas, then new solutions to a range of problems could be evolved to match performance criteria. The principles of evolution and natural selection were published by Darwin in 1859 [4]. Since then, "survival of the fittest" has become a common term. Computer scientists have begun to study the possibilities of using it to solve complex problems when conventional algorithms are unsuccessful or inefficient [5, 6]. In order to utilize the principles of natural selection to solve complex problems, one should understand that it takes time to evolve a solution. According to evolutionary theory, it takes time to accomplish a certain level of evolution in a species; for example, if the evolution of a species takes 10 generations, and a generation lives for 10 years, it will take hundreds of years to evolve the species. Fortunately, researchers do not have to wait a hundred or a thousand years for the evolutionary process to reach a solution. Computational models can simulate the process in seconds over generations. Under these circumstances, evolutionary computation (EC), which mimics Darwin's principles of natural selection, uses the theory as the key element to design and implement computer-based problem solving systems [7, 8].
The main focus of this research is to address the solution to the moving object forecasting problem. We applied EA as the target algorithm. The algorithm provides powerful techniques for stochastic search of the fittest solutions to the problem. A case study is a good means
to broaden the understanding of how EA can solve problems. Moreover, a case analysis applying the model to the accident prevention of inpatients is presented to identify its potential for solving the practical problem. Hence, the objective of this paper is to devise methods that apply EA to solve practical problems. This raises an interesting question: "Can EA techniques scale up to handle a problem domain with moving objects and a different kind of fitness assessment than other domains?"
2 Literature Review The goal of this research is to build an evolutionary computing tool for forecasting the movement of a moving object. The tool evolves optimal component combinations of moving object movement. This chapter reviews and analyses the state-of-the-art in the solution domain of evolutionary computation (EC) and the problem domain of moving object prediction (MOP).
2.1 Moving Object Prediction The prediction of moving objects has been studied in order to solve real-world problems. In [9], the authors argued that current prediction methods for moving objects are constrained by the time factor. Also, many research studies focus on defining the movement of moving objects through mathematical formulas which can only discover the pattern of the objects. They proposed a novel prediction approach, namely the Hybrid Prediction Model, which estimates an object's future locations based on its pattern information as well as existing motion functions using the object's recent movements. The paper aims to provide more effective and accurate results for both near and distant time predictive queries. Madhavan and Schlenoff [10] developed a novel framework, named PRIDE (PRediction In Dynamic Environments), to forecast the location of a moving object for autonomous ground vehicles. For autonomous on-road driving applications, Schlenoff et al. [11] presented a hierarchical multi-resolution approach for moving object prediction via estimation-theoretic and situation-based probabilistic techniques. The solution enables the planner to make accurate plans in the presence of a dynamic environment. Based on these research studies, solutions utilizing the prediction of moving objects are very important for real-world applications.
2.2 The Evolutionary Algorithm In the EC domain, techniques for searching have improved dramatically since they were first developed in the mid-1950s to optimize solutions for a variety of problem domains [6]. Schwefel noted that EC techniques are not “one for all” solutions to all problem domains [12]. It is equally important to examine the weakness of EC techniques and eliminate drawbacks.
EC basis The principles of EC theory are based on mimicking Darwin's theory of natural selection to solve real-world problems [13]. Evolutionary algorithms (EAs) have been successful in optimizing solutions in a variety of domains [1, 5, 14, 15]. The strength of EC techniques comes from the stochastic strategy of the search operators. The major components in EC are search operators acting on a population of chromosomes. If a population is a set of possible solutions to a problem, the candidates selected for the next population will be the ones best fitted to survive the challenges from the environment. This is known as evolution of species by natural selection. EC was developed to solve complex problems which were not easy to solve by existing algorithms [14]. There are three main branches in EC: genetic algorithms (GAs), evolutionary strategies (ES), and evolutionary programming (EP). Optimization deals with problem domains that have several objective functions to be minimized or maximized, usually subject to equality and/or inequality constraints. Ideally, if no constraint exists, the basic definition of unconstrained nonlinear optimization is as follows [16]:

given f : ℜ^n → ℜ,   min_{x∈ℜ^n} f(x)    (1)

where f(x) is the objective function, n is the number of variables, x is a vector of variables, and ℜ is the real number domain. In real-world problems, most optimizations are likely to have constraints. Hence, to complete the definition above, a constraint vector c(x) is added to the optimization definition [17]:

given f : ℜ^n → ℜ,   min_{x∈ℜ^n} f(x)    (2)
subject to c(x)    (3)

To connect the discussion of optimization to this research, EC techniques provide stochastic searching techniques aimed at global optimization. Global optimization searches for the best performance of solutions presented in the objective space. A general global optimization problem can be defined as follows:

f∗(x) → min_{x∈Ω} f(x)    (4)
subject to c(x)    (5)
where f∗(x) is the global optimum in the objective space when determining the minimum of the function f(x); x is a vector of variables which lies in the feasible region Ω; any x in Ω defines a feasible solution, which should conform to the constraints c(x). A similar definition can also be applied to the maximization of objective functions. In EC, Yao discussed the effect of EA operators on the issue of global optimization, and the results of experiments were presented for three models of parallel EAs [18]. The viewpoint of performance and complexity is
discussed by applying EC techniques to solve NP-hard problems [19] with respect to approximation solutions. Yao concluded that a tailored EA can be built in a specific domain where the domain knowledge is sufficient. Moreover, he also pointed out that the foundation theory of EA has not been fully established yet. This makes the construction of EC-based applications more difficult and complex to achieve. Although global search is the accepted strategy for the evolutionary process, alternative research published by Tavares et al. argued that a new algorithm, called the "infected genes EA (igEA)", can produce approximate results without loss of diversity during the search process [20]. The particular interest of this research is the special design of search path selection. The search operators focus only on certain genes that have been tagged during the previous search process. The behavioral information of genes is collected as tagging criteria for certain groups of genes. The key to avoiding the local optimum trap is a self-adaptive scheme that dynamically spreads or reduces the infected group. Their approach inspires a concept of gene preservation through the evolutionary process, which could be effective in reusability and computational efficiency issues when component-based representation is applied. Bäck et al. reported a full investigation and historical review of EC over special domains as well as specialized mechanisms developed in EC fields [5]. When applying EC techniques to a specific domain, several key components, including search space definition, genotype-phenotype mapping (or encoding-decoding scheme), chromosome representation, constraints, fitness functions, genetic operators, and selection policies, need to be tailored with domain knowledge. These design problems have to be considered before applying the EC techniques to a specific domain.
3 The Solution Models 3.1 The Problem Definition Capturing a moving object normally involves three tasks. The first task is to detect and monitor the object. Secondly, forecasting algorithms are applied to predict its movement. After the prediction is accomplished, planning actions are taken to intercept the object. The problem domain of the case study is to seize the behavior of a moving object. The collision percentage and performance time are the criteria to be observed through the experiments. Using the information about the moving object, detected by suitable hardware, the EA can perform a stochastic search on the solution domain over generations. Hence, a suggested solution comes out once the criteria of the system are matched. The reason for replacing the forecasting algorithm in this project is to test the functionality of EAs in searching the solution space and to provide an alternative solution for targeting a moving object. Since a moving object has dynamic parameters, the assumptions in the models are made for simplicity. The moving direction of the object is assumed to be constant. The speed parameter is considered through three variables: acceleration, deceleration and constant speed.
3.2 The Basis of EA Survivors may not be the best individuals, but the fittest ones to outlive the challenges from the environment. This is the evolution of species under natural circumstances. EC, a simulated process of biological evolution, was developed to solve complex problems of practical life which are not easy to solve with existing algorithms. Among the three branches, EA is considered a powerful solution for stochastic search and optimization in complex problems. Therefore, this paper focuses on solutions based on EA. EA is the simulation model of biological evolution combined with evolutionary computation; its operations imitate the process of natural selection. The algorithm is shown as follows.

Evolutionary Algorithm
begin
  t <- 0;                           // reset the generation counter
  Initialize P(t);                  // create the initial population
  Evaluate P(t);                    // evaluate the fitness of each chromosome by the fitness function
  while (termination condition not met) {
    P'(t)  <- SelectParents(P(t));  // select a sub-population for the crossover operation
    Crossover(P'(t));               // run the crossover operator on the selected parents
    P"(t)  <- Mutate(P(t) & P'(t)); // perturb the generated chromosomes stochastically at a given mutation rate
    Evaluate P"(t);                 // evaluate the fitness of the new chromosomes
    P(t+1) <- Reproduction(P"(t));  // select the survivors among the generated chromosomes
    t <- t + 1;                     // advance to the next generation
  } // end of while
end of Evolutionary Algorithm.
In the algorithm above, selection and reproduction are the operations in EC, and the rest are the operations in EA.
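A minimal executable counterpart of the pseudocode above, written in Python, is sketched below. The bit-string representation, the population size and the toy fitness function (number of ones) are assumptions chosen only to make the skeleton runnable, not part of the authors' model.

```python
import random

POP_SIZE, CHROM_LEN, GENERATIONS = 4, 8, 20
MUTATION_RATE, CROSSOVER_RATE = 0.01, 0.5

def fitness(chrom):                     # toy objective: maximise the number of ones (assumption)
    return sum(chrom)

def select_parents(pop):                # roulette-wheel selection weighted by fitness
    weights = [fitness(c) + 1e-9 for c in pop]
    return random.choices(pop, weights=weights, k=2)

def crossover(a, b):                    # one cut point crossover
    cut = random.randint(1, CHROM_LEN - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom):                      # flip each bit with a small probability
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for t in range(GENERATIONS):
    offspring = []
    while len(offspring) < POP_SIZE:
        p1, p2 = select_parents(population)
        if random.random() < CROSSOVER_RATE:
            p1, p2 = crossover(p1, p2)
        offspring += [mutate(p1), mutate(p2)]
    # reproduction: keep the fittest individuals to hold the population size constant
    population = sorted(population + offspring, key=fitness, reverse=True)[:POP_SIZE]

print(max(fitness(c) for c in population))
```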
4 The Solution Models Two models are designed to observe the functionality of the genetic algorithm. The first model utilizes the genetic algorithm to search for the next move of the agent. The searching technique is based on the parameters of the moving object. The distance between the agent and the object is the performance criterion observed for this model. The single agent is replaced by a multi-object agent in the second model, where the instances of the agent are encoded as a single chromosome for the algorithm. Collision percentage and task time are the performance criteria and form the fitness function for the model.
Fig. 1 The model diagram of a single agent versus a moving object. r is the running distance for the agent in a time unit. The circle with the dashed line is the solution population. The course line is the prediction for the agent toward the moving object. The dashed line represents the moving direction of the object, which changes dynamically.
Single Agent vs. a Moving Object Evolutionary algorithms are applied to a single agent to search for its next move, i.e. the move yielding the shortest distance between the agent and the moving object. The searching technique is based on the speed and coordinates of the moving object. The model diagram is shown in Figure 1. The chromosome representation is a bit stream, which maps to a pair of coordinates. The mapping rules for the model are as follows:

(x, y) = (r cos θ, r sin θ)    (6)
θ = 360° ÷ ϕ    (7)

(x, y) is a pair of coordinates after mapping from the population, r is the running distance for the agent in a time unit and also the radius of the circle surrounding the coordinates of the agent, which is set to 1 in this model, θ is the population angle against the circle, and ϕ is the divider, which is set to 10 in this model. Based on the parameters above, thirty-six chromosomes form the solution domain for the case. The accuracy of the solution domain can be increased by adjusting the divider ϕ. Four chromosomes are selected as the initial population for the EA. The fitness function is the distance between the forecast coordinates and the coordinates of the moving object:

F(x, y) = √((x − x′)² + (y − y′)²)    (8)

where (x, y) are the predicted coordinates and (x′, y′) are the coordinates of the moving object. The model is implemented in C++; the object-oriented character of C++ helps to build the model in terms of objects. The starting coordinates are (5, 5) for the agent and (55, 55) for the moving object. The result is shown in Figure 2.
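The following sketch reproduces the single-agent model's genotype-to-phenotype mapping and fitness evaluation; the exhaustive listing of the thirty-six candidate angles and the simple greedy step-by-step loop are illustrative assumptions and not the authors' C++ implementation.

```python
import math

R = 1.0               # running distance per time unit, i.e. the circle radius in eq. (6)
N_CANDIDATES = 36     # thirty-six candidate moves (10 degrees apart), as stated in the text

def decode(index, agent_x, agent_y):
    """Map a chromosome index to a candidate position on the circle of radius R
    centred on the agent (equation (6))."""
    theta = math.radians(index * (360.0 / N_CANDIDATES))
    return agent_x + R * math.cos(theta), agent_y + R * math.sin(theta)

def fitness(candidate, target):
    """Distance between the predicted coordinates and the moving object (equation (8))."""
    (x, y), (tx, ty) = candidate, target
    return math.sqrt((x - tx) ** 2 + (y - ty) ** 2)

agent, target = (5.0, 5.0), (55.0, 55.0)
for step in range(10):
    # pick the candidate move with the smallest distance to the object
    best = min((decode(i, *agent) for i in range(N_CANDIDATES)),
               key=lambda c: fitness(c, target))
    agent = best
    print(step, round(fitness(agent, target), 3))
```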
Fig. 2 The fitness of predictions for the agent against the moving object. The red line indicates the shortest distance calculated from the line and circle functions in Figure 1.
After running ten generations, the algorithm generates a result close to the shortest one. The shortest distance in the model can be calculated from the following functions:

x − y = 0    (9)
(x − 5)² + (y − 5)² = 1    (10)
The first one is the linear function through the two coordinates (5, 5) and (55, 55). The second one is the circle function with center coordinates (5, 5).

Multi-objects of agent versus a moving object
The chromosome representation and fitness function are changed in the second model. Due to the time constraint, this model is given as a paper specification instead of an implementation. The model diagram is shown in Figure 3. The prediction zone, which is calculated from the speed parameters of the moving object, is divided into N sub-zones that form the phenotype of the chromosomes. The value encoded by a chromosome's bit stream ranges over [1, N] according to the zone division. The length of the bit stream is bounded by the rule

2^(m−1) < N < 2^m − 1    (11)
m is the length of the bit stream. The equations for the boundary of the population, which represents the prediction zone of the moving object in time T, are as follows.
Ds = Vd · T    (12)
Df = Va · T    (13)
Fig. 3 Forecasting model of multi-objects versus a moving object. Each object is an instance of the Agent. The N sub-zones can be calculated from the speed variables of the moving object.

Table 1 The configurations of the EA model

Operator            Configuration Description
Initial selection   Four chromosomes are selected as the initial population (random policy).
Crossover           Parent selection: roulette wheel approach with fitness probabilities; crossover rate: 0.5 (50% of chromosomes undergo crossover); crossover operator: one cut point, random policy.
Mutation            Mutation rate: 0.01; operator: random.
Reproduction        Tournament policy for next-generation selection to keep the population size constant. The fitness provides the probability of selection of chromosomes.
Ds is the shortest distance and Df is the farthest distance that the moving object can move in time T; Vd is the deceleration velocity and Va is the acceleration velocity of the moving object. The fitness function is given as follows.
F(ω) = ω · p + (1 − ω) · t    (14)
ω is the weighting number that balances the collision percentage and the time factor; p is the collision percentage obtained by simulation from the probabilities of the speed variables of the moving object; t is the time factor, which maps the running time to the domain [0, 1] depending on the time criterion. ω is used to adjust the weight given to p and t. Table 1 shows the possible configurations for the EA operations used to run the model.
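Since this second model is only specified on paper, the following small sketch (hypothetical Java, assuming that p and t are normalized to [0, 1]) illustrates how the prediction-zone bounds of equations (12)-(13) and the weighted fitness of equation (14) could be computed.

// Illustrative sketch (not part of the paper's code) of the second model's
// weighted fitness, eq. (14), with the prediction-zone bounds of eqs. (12)-(13).
public class MultiObjectFitness {

    // Prediction-zone bounds for time T (eqs. 12-13).
    static double[] predictionZone(double vDecel, double vAccel, double t) {
        double ds = vDecel * t;   // shortest distance the object can move
        double df = vAccel * t;   // farthest distance the object can move
        return new double[] { ds, df };
    }

    // Weighted fitness of eq. (14): omega balances the simulated collision
    // percentage p and the normalized time factor t (both assumed in [0, 1]).
    static double fitness(double omega, double p, double t) {
        return omega * p + (1 - omega) * t;
    }

    public static void main(String[] args) {
        double[] zone = predictionZone(0.5, 1.5, 2.0);
        System.out.printf("zone = [%.1f, %.1f], fitness = %.2f%n",
                zone[0], zone[1], fitness(0.7, 0.2, 0.4));
    }
}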
4.1 Results Analysis The two models are designed to observe the functionality of EA and EC when applied to practical problems. The first model binds to the small size (thirty-six)
of population for simplicity. The size of the population can be adapted by changing the divider ϕ in the equation above, and more accurate results can then be obtained. The model illustrates the concept of running EA and EC on the moving-agent side. The result converges to a sub-optimal solution within 10 generations (refer to figure 2). The observations on chromosome representation and fitness function are achieved in the model design. The second model suggests a different chromosome representation and fitness function associated with more environment parameters, which include the speed of the moving object and of the agent, the probabilities of the speed parameters of the moving object, the moving directions of the moving object and the agent, and the objects of class Agent. The model expands the single parameter into multiple parameters, applying EA to the challenge of a multi-objective optimization problem, which makes it more practical for real-world problems than the first model.
5 Proposing an Accident Warning System for the Inpatients' Safety
To prevent a high accident rate among inpatients, we propose an accident warning system which utilizes the aforementioned moving-object algorithm. Due to time limitations, only the specification of the system is proposed in this paper; further development of the system is still in progress. The Accident Warning System (AWS) consists of three units: a set of hardware (a transponder for the inpatients and a transceiver for any object or place that can cause harm to the inpatients) and the system itself (refer to figure 4). The specifications of these units are defined as follows.
Fig. 4 The block diagram of the proposed system to prevent accidents of the inpatients
The Transponder Unit: a mobile facility used to collect the location and other information of an inpatient as the input data that triggers the optimal prediction subsystem within the AWS. The mobile facility (e.g. an RFID tag embedded into a wrist belt) can transmit signals. The location information of an inpatient is based on a virtual coordinate system within a certain indoor area. The transmission of location information is the key task of the transponder unit. The Transceiver Unit: a signal receiver facility used to receive the signal data from the transponder unit. The signal data received from the transponder are then transformed into a digital signal and fed into the computer system as the input of the AWS. The Accident Warning System: a system used to
Fig. 5 The architecture of the proposed AWS.
run the optimal algorithm for the moving-object prediction (refer to Figure 5). The system receives the signal data from the transceiver unit and triggers the prediction algorithm. The input data are translated (through the component "Virtue Coordination") and compared (through the component "DangerObject") to judge the safety condition of the inpatient. If the location of the inpatient is close to danger objects or places that may cause an accident, a warning process is activated to prevent the accident. The advantage of the AWS is that it predicts the movement of the inpatients and triggers the prevention process in advance. Two types of forecasting model are applied to support the system. First, a single-agent model (refer to sec. 3.3) is applied to forecast the location of an inpatient. If the danger level reaches the threshold (i.e. an inpatient is predicted to approach a danger object), the second model (i.e. locating a reachable health care worker) is activated for the warning process. The main objectives of the system for caring for mentally ill patients focus on early discovery and prevention, which can lower the medical cost and reduce the percentage of accidents. The proposed system aims to provide an early warning process in order to prevent the high accident rate of schizophrenia patients during their treatment period. The idea is to forecast the location of the inpatient; if the results indicate potential danger to the inpatient, the warning process (i.e. assigning a health care worker who is near the targeted inpatient to intervene) is activated.
6 Conclusion
Through the solution models devised, observations of the functionality of EA are achieved. Applying the operations of EA to simple cases of developing a software system for predicting the movement of inpatients is proposed in section 4. The further development of the AWS will be the next step of this study. Several observations from this case study are useful to direct the structural design of this
research. First of all, encoding a solution of the problem into a chromosome is one of the critical points for applying EA to the problem domain of moving-object prediction. The fitness function is another critical point of the model design associated with the operations of EA. Thirdly, the analysis of the problem and solution domains helps to define the chromosome representation and the fitness function. Hence, one outcome of the result analysis is a suggestion for building a practical application in future research on predicting the movement locations of mentally disordered inpatients.
References 1. Gen, M., Cheng, R.: Genetic algorithms & engineering design. John Wiley & Sons, Inc., Chichester (1996) 2. Feather, M.S., Menzies, T.: Converging on the optimal attainment of requirements. In: Proceedings of the IEEE Joint International Conference on Requirements Engineering, pp. 263–270 (2002) 3. Thompson, A.: Evolutionary design for novel technologies. IEE Half-day Colloquium on Evolutionary Hardware Systems (Ref. No. 1999/033), 4/1 (1999) 4. Darwin, C.R.: On the origin of species by means of natural selection. John Murray, London (1859) 5. B¨ack, T., Hammel, U., Schwefel, H.-P.: Evolutionary Computation: comments on the history and current state. IEEE Transactions on Evolutionary Computation 1, 3–17 (1997) 6. De Jong, A.K., Fogel, D.B., Schwefel, H.-P.: A history of evolutionary computation. In: Evolutionary Computation I, basic algorithms and operators, pp. 40–58. Institute of Physics Publishing, Bristol (2000) 7. Fogel, D.B.: What is evolutionary computation? In: IEEE Spectrum, pp. 26 –32 (2000) 8. Spears, W.M., De Jong, K.A., B¨ack, T., Fogel, D.B., Garis, H.d.: An overview of evolutionary computation. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, Springer, Heidelberg (1993) 9. Jeung, H., Liu, Q., Shen, H.T., Zhou, X.: A Hybrid Prediction Model for Moving Objects. In: The 24th International Conference on Data Engineering, Cancn, Mxico (2008) 10. Madhavan, R., Schlenoff, C.: The Effect of Process Models on Short-term Prediction of Moving Objects for Autonomous Driving. International Journal of Control, Automation, and Systems 3, 509–523 (2005) 11. Schlenoff, C., Madhavan, R., Barbera, T.: A Hierarchical, Multi-Resolutional Moving Object Prediction Approach for Autonomous On-Road Driving. In: Proceeding of the 2004 ICRA, New Orleans, LA, USA (2004) 12. Schwefel, H.-P.: Advantages (and disadvantages) of evolutionary computation over other approach. In: Evolutionary Computation 1, Basic Algorithms and Operators, pp. 20–22. Institute of Physics Publishing, Bristol (2000) 13. Bck, T., Fogel, D.B., Michalewicz, T.: Evolutionary Computation 1, Basic algorithms and operators. Institute of Physics Publishing, Bristol (2000) 14. Fogel, D.B.: Evolutionary Computation: toward a new philosophy of machine intelligence. IEEE Press, Piscataway (1995) 15. Goldberg, D.E.: Genetic algorithms in search, optimization, and machine learning. Addison-Welsley, Reading (1989) 16. Dennis Jr., J.E., Schnabel, R.B.: A view of unconstrained optimization. In: Handbooks in operations reserach and management science: Optimisation, vol. 1, pp. 1–72. Elsevier Science Publishers B. V. North-Holland (1989)
17. Gill, P.E., Murray, W., Saunders, M.A., Wright, M.H.: Constrained nonlinear programming. In: Handbooks in operations reserach and management science: Optimisation, vol. 1, pp. 171–210. Elsevier Science Publishers B. V. North-Holland (1989) 18. Yao, X.: Global optimisation by evolutionary algorithms. In: Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis (pAs 1997), Aizu-Wakamatsu, Fukushima, JAPAN, pp. 282–291 (1997) 19. Lee, J., Sahni, S., Shragowitz, E.: A hypercube algorithm for the 0/1 knapsack problem. Journal of Parallel & Distributed Computing 5, 438–456 (1988) 20. Tavares, R., Teofilo, A., Silva, P., Rosa, A.C.: Infected genes evolutionary algorithm. In: Proceedings of the 1999 ACM symposium on Applied Computing, San Antonio, TX, pp. 333–338 (1999) 21. Robinson, G., El-Beltagy, M., Keane, A.: Optimisation in mechanical design. In: Evolutionary design by computers, p. 446. Morgan Kaufmann Publishers, Inc., San Francisco (1999)
An MDE Parameterized Transformation for Adaptive User Interfaces Wided Bouchelligua, Nesrine Mezhoudi, Adel Mahfoudhi, Olfa Daassi, and Mourad Abed
1 Introduction The expanding world of new communication and information technologies highlights the diversity of the platforms of interaction (PDA, mobile phone, com-puter, watch, etc). Hence, in 1999 Thevenin brought a new concept: the plasticity of interfaces [16]. The plasticity is defined as the capacity of a user interface to adapt to the context of use while preserving usability. The context of use is de-noted by the triplet user, platform, environment. Several approaches are proposed to make User Interfaces (UI) adapted to the context of use. According to [13], these approaches are classified into four categories: 1) Interfaces translation, 2) Interfaces reverseengineering and migration, 3) approaches based on the markup languages and 4) model-based approach. In our work, we adopt a model-based approach whose advantage is to apply the adaptation to the context of use to the models leading to a strong abstraction. In this paper, we propose an approach that assures the adaptation of the UI to the context of use. More particularly, we are interested in the platform variant. Our approach builds on the concept of transformation parameterized by the context as defined within the framework of the Model Driven Engineering (MDE) [1, 2, 7]. We apply this parameter setting at the level of the transformation of an Abstract User Interface (AUI) in a Concrete User Interface (CUI). In the rest of the paper, this transformation is called AUI2CUI. The parameter of this transformation is the interaction platform whose properties are described in the proposed meta-model. The remainder of this paper is structured as follows. Section 2 presents a state of the Wided Bouchelligua and Mourad Abed LAMIH, University of Valenciennes UMR CNRS 8530, BP: 311-59304 Valenciennes cedex 9, France Adel Mahfoudhi CES, ENIS Soukra Road km 3,5, B.P: w 3038 Sfax Tunisia Nesrine Mezhoudi and Olfa Daassi UTIC, ESST Taha Houssein Road 5, B.P: 56, Bab Menara, 1008 Tunis Tunisia G.A. Tsihrintzis et al. (Eds.): Intel. Interactive Multimedia Systems & Services, SIST 6, pp. 275–286. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
art on the model-based approaches for the adaptation of the UI. Section 3 clarifies the concept of the parameterized transformation in the MDE approach. Section 4 describes the AUI2CUI transformation in terms of meta-models and adaptation rules. Finally section 5 draws the conclusion and provides perspectives to future research.
2 State of the Art In this section, we present only model-based approaches for UI adaptation. In fact, the Cameleon reference framework [6] represents an excellent framework of UI adaptation as it defines four essential stages for the development of the user interfaces in a pervasive environment (Fig. 1): tasks and concepts, abstract in-terface, concrete interface, and final interface. In this area of research, we can quote the TERESA method [12] which supplies a single model which is the tasks, and allows the generation of several interfaces for various platforms. We also quote the Comets (COntext sensitive Multi-target widgETS) [5], which essentially proposes a model for the plastic interactors that are capable of adjusting to the variation of the screen size. The UsiXML (User Interface eXtensible Markup Language) [18, 11] approach represents a UI approach of engineering defined according to the Cameleon reference framework. Such an approach describes a context model consisted of three constituents: user, environment and platform. But practically only the variant platform is considered during the UI generation. The work of Sottet [14, 15] is among the first ones to have joined the Model Driven Engineering and the domain of Human Computer Interaction (HCI). His approach makes it possible to show that the concepts of the MDE could be successfully applied to the UI engineering. Sottet [14] proposes meta-models and models transformations to generate adaptable UI. Sottet defines a general context meta-model. Based on the same approach (MDE), Hachani [9] suggests the introduction of the context of use at the tasks level rather than at the interactors level. This approach is distinguished by the definition of the generic rules appropriate to all the contexts of use. However, both approaches [15] and [9] lack a detailed description
Fig. 1 Cameleon Reference Framework [18].
Fig. 2 Parameterized transformation [17].
of each constituent of the context of use.
3 MDE Parameterized Transformation
Our objective is to handle the adaptation of the UI to the context of use (platform, environment, user). To do so, we build on the parameterized transformations defined by [17]. Vale [17] describes a parameterized transformation within the framework of model driven engineering for contextual development. The methodology proposed by [17] (Fig. 2) consists in defining the correspondences ("match") between the context model and the Platform Independent Model (PIM) in order to define a Contextual PIM (CPIM). Then, an ordinary MDE transformation is used to define the Contextual Platform Specific Model (CPSM). The correspondences are assured by a parameter setting of the transformation, whose basic principle is to take into account the properties of the context during the specification of the transformation rules (Fig. 2). In the following section, we describe how this parameter setting is applied in the framework of the AUI2CUI transformation.
3 AUI2CUI Parameterized Transformation
3.1 Overview of AUI2CUI Transformation
Based on the principle of contextualisation evoked by [17], we can use the parameterized transformation in the field of UI engineering to take the context of use into account. Such a transformation requires a triplet of models <Source, Target, Parameter>. The source model and the target model are models of the functional description (initial models), while the parameter model plays the role of a context model used for the contextualisation of the target model. Fig. 3 clarifies the parameterized transformation principle in our scenario. Moreover, the parameter setting of the transformation leads to a generation strategy for UIs adaptable to the context of use. We apply this parameter setting at the level of the AUI2CUI transformation. The context of use taken into consideration concerns particularly the platform.
Fig. 3 AUI2CUI parameterized transformation.
Fig. 4 Overview of AUI2CUI transformation.
Fig. 4 shows the general framework of the AUI2CUI transformation. This transformation requires the definition of three meta-models:
• an Abstract User Interface Meta-model;
• a Concrete User Interface Meta-model;
• a Platform Meta-model.
The transformation defines a set of rules that transform a source model into a target model. These rules are parameterized by the characteristics of the platform, as sketched below.
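Viewed as code, the role of the platform as the parameter of the transformation can be pictured as follows. This is only an illustrative sketch; the type names simply mirror the meta-model names above, and the actual transformation is written in Kermeta over EMF models (see Sect. 3.6).

// Illustrative sketch: the platform model is the parameter of the AUI-to-CUI
// transformation, so the same abstract model yields different concrete models
// on different platforms. The empty classes stand in for the meta-models.
class AbstractUserInterface { }
class ConcreteUserInterface { }
class PlatformModel { }

interface Aui2CuiTransformation {
    ConcreteUserInterface transform(AbstractUserInterface aui, PlatformModel platform);
}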
3.2 Abstract User Interface Meta-model In the literature, the abstract user interface is defined in several ways. In fact, Thevenin [16] defines it as a set of interconnected workspaces. A workspace is an abstract structure in which an interaction is organized. The connection between workspaces is made according to links between the tasks and the domain concepts. In [19], the abstract user interface is defined as the logical windows and the presentation units. The interactive tasks and/or the concepts are grouped together in the form of logical windows. In our approach, the Abstract User Interface (AUI) allows the transition of the specification in the modelling of the abstract components of the interface. In order to describe the Abstract User Interface and the Concrete User interface, we have resorted to the static model of interactions [4]. Aiming at applying a modelto-model transformation, we have refined the static model of the interactions of [4]
Fig. 5 Abstract User Interface Meta-model.
in the form of two meta-models: the AUI and CUI meta-models. Indeed, AUI metamodel which is shown in Fig. 5 describes the hierarchy of the abstract components “UIComponent” corresponding to the logical groups of interactions “UISpace”. The modelling of the abstract interface of an application is then made by one or several “UIGroup” which model containers forming coherent graphic elements (a window in a Windows environment, for example). Each “UIGroup” consists of one or several “UIUnitSuit” and/or “UIUnit”. A “UIUnit” gathers a set of interaction elements which cannot be separated from a logical business point of view of the application (a treatment form for example). It can include one or several “UISubUnit”. The advantage of this modelling is to allow the creation of the application by assembling the existing elements, resulting in a strong reusability. The AUI is expressed by means of the BPMN (Business Process Modeling Notation) [3], through the use of an ad-hoc sub-process.
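To make the hierarchy more tangible, the following sketch renders the main AUI concepts as classes. It is a purely illustrative Java transcription (the meta-model itself is defined with EMF, cf. Sect. 3.5), and containment is simplified to a generic child list.

import java.util.ArrayList;
import java.util.List;

// Illustrative Java transcription of the AUI hierarchy of Fig. 5.
abstract class UIComponent {                 // any abstract interaction component
    final String name;
    UIComponent(String name) { this.name = name; }
}

abstract class UISpace extends UIComponent { // logical group of interactions
    final List<UIComponent> children = new ArrayList<>();
    UISpace(String name) { super(name); }
    void add(UIComponent child) { children.add(child); }
}

class UIGroup extends UISpace {      // coherent container, e.g. a window
    UIGroup(String name) { super(name); }
}

class UIUnitSuit extends UISpace {   // suite of logically related units
    UIUnitSuit(String name) { super(name); }
}

class UIUnit extends UISpace {       // interaction elements that belong together
    UIUnit(String name) { super(name); }
}

class UISubUnit extends UIComponent {        // element contained in a unit
    UISubUnit(String name) { super(name); }
}

For the credit-card scenario of Sect. 3.5, a UIGroup "Ask for a credit card" would, for instance, aggregate a UIUnitSuit "Login" and further units.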
3.3 Concrete User Interface Meta-model The Concrete User Interface (CUI) is deduced from the Abstract User Interface (AUI) to describe the interface in terms of graphic containers, interactors and navigation objects. It is also expressed through the BPMN notation. The CUI metamodel extended from the static model of the interactions of [4] is presented in Fig. 6. It consists of one or several windows presented in the meta-model by the “UIWindow” class. Besides, the “UIPanel” class allows the modelling of the possible hierarchies of containers. The interactors presented by the “UIField” class of the concrete interface are classified according to their types in three groups: “UIFieldMultimedia”, “UIFielData” and “UIFieldControl”. The CUI is also expressed by means of the BPMN notation [3], through the use of an ad-hoc sub-process.
Fig. 6 Concrete User Interface Meta-model.
3.4 Platform Meta-model Aiming at generating adaptable interfaces, the platform meta-modelling has become a necessity in this work. Although most of the work on plastic UI made adaptation to the platform, the latter remains without a complete and detailed meta-model. The existing approaches only describe it at a high level of abstraction or describe only the display surface of the platform which represents the most used interactional resource in the adaptations made so far. However, the adaptation can be prepared in the presence and absence of the other interaction devices. For example, if we do not have a mouse, we can suggest as a form of adaptation using a vocal interactor where the activation of the actions will be made vocally. In the same line, we note the work CC / PP (Composite Capabilities/Preference Profiles) of W3C [20]. The platform is described in four levels: a) material, b) software, c) navigator and d) network. Fig. 7 presents our platform meta-model. Generally, the platform consists of: • Calculation resources represented in Fig. 7 by the “ComputationalCapacities” class. These resources include not only the material aspect, such as the memory or processor but also the software aspect as the supported operating system; • Interaction resources which are the input-output devices represented in our metamodel by the “InteractionDevices” class. We identify two classes of interaction devices: the input devices (InputDevice class in Fig. 7) and the output devices (OutputDevice class in Fig. 7). Certain devices inherit both classes and are thus input/output devices, such as the touch screen.
Fig. 7 Platform Meta-model.
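As an illustration of the structure in Fig. 7, the sketch below transcribes the main platform concepts into Java classes. Only the class names (ComputationalCapacities, the interaction devices and the input/output split) come from the meta-model; the Java rendering, the attributes and the helper method are assumptions.

import java.util.ArrayList;
import java.util.List;

// Hypothetical Java transcription of the platform meta-model of Fig. 7.
class PlatformModel {
    ComputationalCapacities capacities;                          // hardware and software resources
    final List<InteractionDevice> devices = new ArrayList<>();   // input/output devices

    boolean has(Class<? extends InteractionDevice> kind) {
        return devices.stream().anyMatch(kind::isInstance);
    }
}

class ComputationalCapacities {
    int memoryMb;
    String processor;
    String operatingSystem;
}

abstract class InteractionDevice { }
abstract class InputDevice extends InteractionDevice { }
abstract class OutputDevice extends InteractionDevice { }

class Keyboard extends InputDevice { }
class Mouse extends InputDevice { }
class Microphone extends InputDevice { }
class Screen extends OutputDevice { int width; int height; }
// A touch screen is both an input and an output device; the Ecore model can
// inherit from both classes, whereas this Java sketch has to pick one side.
class TouchScreen extends InputDevice { int width; int height; }

An interactor-choice rule (Sect. 3.6) can then be phrased as a test such as platform.has(TouchScreen.class) before a graphical interactor is selected.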
3.5 Illustration Example The case study relates to a credit card request by a customer. This application is adapted to the context of interaction in terms of platform. More exactly, the adaptation is made in the variation of the screen size: we pass from a big screen towards a small one. The following scenario illustrates this adaptation on a precise case. Sarra is connected to the site of the bank to launch her request of credit card. She has to log in first of all by introducing her user name and password. Then she has to choose her type (private individual or company). Then, she is asked to choose the type of card which she seeks to obtain before filling in an information form. Fig. 8 shows the abstract user interface for the possession process of the credit card. This interface contains a “UIGroup” associated with the global sub-process “Ask for a credit card”. This “UIGroup” gives access to two “UIUnitSuit” (“Login” and “Determine private individual form”) and “CollapsedUIUnit” (“Select customer type”). As concrete example, in the following figure, we give the tree-based
Fig. 8 Abstract User Interface for the possession process of the credit card (case of a private individual customer).
description of "iPAQ HX2490 Pocket PC" realized by an EMF-based editor. The concretisation of the abstract interface taking this platform into account allows the generation of a concrete interface complying with the properties of this platform. For instance, for the given screen size (height="320", width="240"), we limit ourselves to a maximum of eight components per window. Moreover, the choice of the appropriate interactor is related to the input devices that exist on the platform. In this case, we have a touch screen (TouchScreen) and a text input device (TextInputDevice). That is why the concretisation in graphic form is possible. The AUI2CUI transformation consists in transforming an XML (Extensible Markup Language) source file obtained from an abstract user interface. This file is automatically generated by our AbstractUserInterface editor developed with the Graphical Modeling Framework (GMF) tool [8] of Eclipse. The result of the transformation is an XML file which is compliant with the CUI meta-model. Fig. 10 shows the visualization of the CUI with our ConcreteUserInterface editor.
3.6 Adaptation Rules The generation stages of the Concrete User Interface lean strongly on the work of [16] and [11]. The AUI2CUI transformation is implemented in Kermeta language [10] by the following four stages:
• Creation of the application: creation of the application in the "ConcreteUserInterface" target model from the "AbstractUserInterface" of the source model;
• Realization of the abstract containers;
• Choice of the interactors;
• Definition of the navigation.
We have developed a set of rules allowing the transformation of an AUI into a CUI. As an illustration, in what follows we clarify the stage of the interactor's choice. This stage aims at associating the adequate interactor with the abstract component of the AUI. Such a choice depends on the properties of the abstract component: its type (Input or Output), its nature (Specify, Select or Turn) and the platform properties (screen size, presence or absence of a keyboard, a mouse, a microphone, etc.). The
Fig. 9 The tree-based description of “iPAQ HX2490 Pocket PC”.
Fig. 10 The tree-based description of “iPAQ HX2490 Pocket PC”.
UIField class of the CUI meta-model presents a generalization of the various forms of interactors. The following code extract transforms every abstract component of the "CollapsedUIUnit" type into a "UIField" and calls the "UIFieldTreatment" method to choose the appropriate interactor. In this case, the interactor choice is executed for an abstract component of the "Specify" nature. We treat different cases as examples:
• If we have a keyboard, a mouse and a screen, then the "UIField" is specialized into a "UIFieldIn" and a "UIFieldStatic".
• Otherwise, if we have an input device of the type microphone together with a head-mounted display (visiocasque), the "UIField" is specialized into a "UIFieldSound".

operation TransformationTreatment(aui : AbstractUserInterface, uiw : UIWindow, p : PlatformModel) is do
  getAllCollapsedUnit(aui).each { cui | UIFieldTreatment(aui, p, cui, uiw) }
end

// UIField specification
operation UIFieldTreatment(inputmodel : AbstractUserInterface, paramModel : PlatformModel,
                           cui : CollapsedUIUnit, uiw : UIWindow) is do
  // recovery of the annotation attached to the abstract component
  var lnk : Link
  lnk := getLinks(inputmodel).detect{ c | c.uicomponent.name == cui.name }
  var nat : Nature init cui.nature
  var tp : AnnotationType init lnk.uicomponentannotation.type
  if (MouseExist(paramModel) and ScreenExist(paramModel) and KeyboardExist(paramModel))
     or (TouchPadExist(paramModel) and ScreenExist(paramModel) and KeyboardExist(paramModel))
     or TouchscreenExist(paramModel) then
    // treatment of an abstract component of nature "Specify"
    if (nat == Nature.Specify) then
      createFieldIn(uiw, cui, lnk)    // creation of UIFieldIn
      createUILabel(uiw, cui, lnk)    // creation of UILabel
    end
    // rest of code
  else
    if VisiocasqueExist(paramModel) and MicroExist(paramModel) then
      createFieldSound(uiw, cui, lnk) // creation of UIFieldSound
      // rest of code
    end
  end
end
4 Conclusion and Future Work
In this paper, we have presented a methodology for the development of plastic UIs of an Information System. To apply "model to model" transformations, we set up two meta-models: an Abstract User Interface meta-model and a Concrete User Interface meta-model. The adaptation of the interface to its context of use was our primary objective. In order to reach this objective, we proposed a platform meta-model describing the material and software constituents of the interaction platform. When new platforms are encountered, defining a model of the platform is sufficient, since our transformation rules are generic. We foresee multiple perspectives for our work, which concern the meta-modelling of the environment and the user as well as the integration of ergonomic properties into our transformations.
References 1. Bzivin, J., Blay, M., Bouzeghoub, M., et al.: Action spcifique CNRS sur l’Ingnierie Dirige par les Modles. Rapport de synthse (2005) 2. B´ezivin, J.: In Search of a Basic Principle for Model-Driven Engineering. Journal Novatica, Special Issues (2004) 3. BPMI, Business Process Modeling Notation version 1.0 (2004), http://www.bpmn.org
4. Brossard, A., Abed, M., Kolski, C.: Context Awareness and Model Driven Engineering: A multi-level Approach for the Development of Interactive Applications in Public Transportation. In: Proceedings of 27 th European Annual Conference on Human DecisionMaking and Manual Control, EAM 2008, Delft, Hollande (2008) 5. Calvary, G., Coutaz, J., Dˆaassi, O., Balme, L., Demeure, A.: Towards a new generation of widgets for supporting software plasticity: the “ comet ”. In: Bastide, R., Palanque, P., Roth, J. (eds.) DSV-IS 2004 and EHCI 2004. LNCS, vol. 3425, pp. 306–323. Springer, Heidelberg (2005) 6. Calvary, G., Coutaz, J., Thevenin, D., et al.: A Unifying Reference Framework for MultiTarget User Interfaces. Interacting with Computers 15(3), 289–308 (2003) 7. Favre, J.-M.: Toward a Basic Theory to Model: Model Driven Engineering. In: Workshop on Software Model Engineering, Wisme 2004, Lisbonne, Portugal (2004) 8. GMF, Graphical Modeling Framework, http://www.eclipse.org/gmf 9. Hachani, S., Dupuy-Chessa, S., Front, A.: Une approche gnrique pour l’adaptation dynamique des IHM au contexte. In: IHM 2009 Conf´erence, Grenoble, France (2009) 10. Kermeta, Kernel Metamodeling Framework, http://www.kermeta.org/ 11. Limbourg, Q., Vanderdonckt, J.: UsiXML: A User Interface Description Language Supporting Multiple Levels of Independence. In: Matera, M., Comai, S. (eds.) Engineering Advanced Web Applications, pp. 325–338. Rinton Press, Paramus (2004) 12. Mori, G., Patern´o, F., Santoro, C.: Tool Support for Designing Nomadic Applications. In: Proceedings of the International Conference on Intelligent User Interfaces, Miami, pp. 141–148 (2003) 13. Samaan, K., Tarpin-Bernard, F.: Task models and Interaction models in a Multiple User Interfaces generation process. In: Proceedings off 3rd International Workshop on TAsk MOdels and DIAgrams for user interface design TAMODIA 2004, Prague, Czeck Republic, November 2004, pp. 137–144. ACM, New York (2004) 14. Sottet, J.S., Calvary, G., Favre, J.M., Coutaz, J., Demeure, A., Balme, L.: Towards Model-Driven Engineering of Plastic User Interfaces. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, pp. 191–200. Springer, Heidelberg (2005) 15. Sottet, J.S., Calvary, G., Favre, J.M.: Mapping Model: A First Step to Ensure Usability for sustaining User Interface Plasticity. In: Proceedings od the MoDELS 2006 Workshop on Model Driven Development of Advanced User Interfaces (2006) 16. Thevenin, D.: Adaptation en Interaction Homme-Machine: Le cas de la Plasticit´e. Th`ese de doctorat, Universit´e Joseph Fourier, Grenoble I, pp. 212 (2001) 17. Vale, S., Hammoudi, S.: Context-aware Model Driven Development by Parameterized Transformation. In: Proceedings of MDISIS (2008) 18. Vanderdonckt, J.: A MDA-Compliant Environment for Developing User Interfaces of ´ Falc˜ao e Cunha, J. (eds.) CAiSE 2005. LNCS, Information Systems. In: Pastor, O., vol. 3520, pp. 16–31. Springer, Heidelberg (2005) 19. Vanderdonckt, J.: Conception assist´ee de la pr´esentation d’une interface homme-machine ergonomique pour une application de gestion hautement interactive. Th`ese de doctorat, Facult´e Notre Dame de la Paix Louvain, Belgique (1997) 20. World Wide Web Consortium, http://www.w3.org
Agent Based MPEG Query Format Middleware for Standardized Multimedia Retrieval Mario Döller, Günther Hölbling, and Christine Webersberger
Abstract. The amount of multimedia data has increased tremendously world wide in the private as well as in the public sector. An unified access, navigation and search is hardly realizable in this distributed and heterogenous environment. In this context, the paper introduces an agent based middleware system for search and retrieval in multimedia repositories that bases on the newly standardized MPEG Query Format. The strength of the system can be summarized as follows: Due to the use of a standardized query language an unified access to heterogenous retrieval systems is guaranteed. Related to the use of an agent system, we especially address network segments in domains that show bandwidth characteristics such as low transfer rate or instabilities in connections. Predominantly, this is beneficial for environments dealing with mobile devices.
1 Introduction Data, especially multimedia data, has grown in the past years in its size as well as in the amount. Locally regarded, all of these sets of data are distributed over the world and administrated within a heterogeneous landscape consisting of various multimedia databases and retrieval systems. In general, one is facing several major barriers (e.g. different query languages or different metadata formats) which prevents users experiencing a broad and unified access to different multimedia data collections available today. In this context, the ISO/IEC SC29/WG11 committee (most notably known as MPEG) finalized the MPEG Query Format [6] which standardizes the format of multimedia query requests (by the Input Query Format) and its response messages (by the Output Query Format). Additionally, the standard also provides means (by Query Management Tools) for a standardized service discovery, service selection Mario D¨oller, G¨unther H¨olbling, and Christine Webersberger Chair of Distributed Information Technology, University of Passau, Germany e-mail:
[email protected]
and the implementation of aggregated services. Related to this, the novel query format is also especially designed for the use in environments for mobile devices. Let us consider the following motivating example, where a user is on a sightseeing trip equipped with one of the high-end smartphones. As the tourist may be short in time for recovering background information about historical buildings, an asynchronous search request can be triggered (e.g., by the use of a taken photo). In general, the desired information is distributed over multiple data stores and contains a diverse (multimodal) set of data (text, video, image). However, afterwards in the hotel, the tourist has access to a more powerful device or simply has more spare time and is able to fetch and consume the results processed in the meantime of the previous request. Related to the example, this article introduces an agent based multimedia middleware framework which offers due to the use of the MPQF standard a high flexibility in terms of staking together heterogenous retrieval engines on the one side and multimedia clients on the other side. Our middleware strongly benefits from the employment of an agent system in different combined query execution modes such as synchronous, asynchronous, sequential and parallel mode. Beside support of a broad variety of platforms the main advantages are the expandability and reliability of the query process. In addition, benefits for mobile use cases can be summarized as follows: As agents commonly migrate directly to the database, the query can be executed autonomously. Thus the retrieval processes could also be finished in situation where connection problems normally lead to retrieval failures. Additionally, processing of intermediate data and result aggregation is done by the agents. Therefore, only query results have to be transferred back to the client which saves communication costs. The remainder of this paper is organized as follows: Section 2 investigates available multimedia middleware solution and standardization approaches in this direction. Our agent based middleware system is described in section 3. Exprimental results are presented in section 3.5. Finally section 4 concludes this paper.
2 Related Work As mentioned above, multimedia collections and their retrieval systems are spread over the whole globus. The concatenation of those distributed databases and systems is a well known research area (see e.g. [12, 3]). Besides, basic, middleware frameworks such as CORBA1 or RMI2 ,several systems, especially designed for the connection of multiple multimedia database systems, exist (e.g.,MOCHA [13], Instant-X [14] or Network-Integrated Multimedia Middleware (NMM) [10]). 1 2
http://www.omg.org/technology/documents/corba_spec_catalog. htm http://java.sun.com/javase/technologies/core/basic/rmi/ whitepaper/index.jsp
Instant-X [14] is a novel OSGi3 -based middleware for a generic multimedia API. It provides standard multimedia components and supports a dynamic deployment. For instance, unavailable components are automatically discovered and loaded in a peer-to-peer network on demand. The Instant-X middleware features a generic API which serves as abstraction layer for supporting the replacement of specific protocol implementations at runtime. The Network-Integrated Multimedia Middleware (NMM) [10] offers a multimedia architecture, which considers the network as an integral part and enables the use of devices distributed across a network. The system is available for multiple platforms and operating systems. Its novelty is the supported access to all kinds of networked and distributed multimedia systems, ranging from embedded and mobile systems to large scale computing clusters. Besides systems that have been developed within research projects, there exist several endeavors in producing a standardized middleware for multimedia processing. For instance, in 1998, the ISO/IEC JTC 1 SC24 working group, concerned with computer graphics and image processing, introduced the PREMO standard [7] (Presentation Environment for Multimedia Objects) which is published under the official reference ISO/IEC 14478. PREMO essentially provides a middleware specification for multimedia programming - more generally it also serves as a reference model for distributed multimedia. A second initiative has been established by the ISO/IEC SC29 WG 11 (known as MPEG) working group in 2004. The goal of the MPEG Multimedia Middleware [8] (M3W) was to improve application portability and interoperability through the specification of a set of APIs dedicated to multimedia. The main aims of this project can be summarized as follows: (1) To allow application software to execute multimedia functions with a minimum knowledge of the inner workings of the multimedia middleware, and (2) to allow the triggering of updates to the multimedia middleware to extend the API. Recent, the JPSearch project [9] of ISO/IEC SC29 WG1 (JPEG) has been instantiated which aims on the development of a standardized framework for distributed image retrieval. A successor of the M3W project has been implemented by the MPEG consortium and is called the MPEG eXtensible Middleware (MXM) [11] project. The result shall be a collection of MPEG and non-MPEG technologies organized in a middleware supporting applications for different purposes (e.g., video/audio decoding, rights management, retrieval and navigation among MM repositories).
3 Agent Based MPQF Middleware Our Agent based MPQF Middleware has been implemented on top of the JADE platform. For mobile devices the JADE Lightweight Extensible Agent Platform (Leap) is used. The following section details on the JADE framework, the components of our system, their relationship and each single step of the querying process. 3
http://www.osgi.org
3.1 Technical Foundation JADE (Java Agent Development Framework)[1] is a Java based agent framework which fully complies to FIPA4 standards. A JADE platform is an execution environment for agents that may spread over several physical hosts. It consists of several agent containers, that can run on different machines. One of these containers is the main container, which hosts the following organizational information and processes (see figure 1):
Fig. 1 JADE organizational structure (LADT: Local Agent Description Table, GADT: Global Agent Descriptor Table, CT: Container Table, IMTP: Internal Message Transport Protocol).
• The Container Table (CT) acts as a registry for all containers, storing their transport addresses and object references.
• The Global Agent Descriptor Table (GADT) provides information on all registered agents.
• The Agent Management System agent (AMS) performs most organizational tasks, such as registering new agents, deregistering agents upon deletion and taking care of the whole migration process.
• The Directory Facilitator agent (DF) provides a registry for services provided by agents and is hosted by the main container. This mechanism is similar to the yellow pages of the Universal Description, Discovery and Integration5 (UDDI) service of the Web Service technology.
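As a concrete illustration of these concepts, the fragment below shows how a JADE main container is typically bootstrapped and how an agent advertises a service in the DF. It uses the standard JADE API, but the agent class and the service type/name strings are hypothetical and are not taken from the middleware described in this paper.

import jade.core.Agent;
import jade.core.Profile;
import jade.core.ProfileImpl;
import jade.core.Runtime;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;
import jade.wrapper.ContainerController;

public class JadeBootstrapSketch {
    public static void main(String[] args) throws Exception {
        // start the JADE runtime and the main container (hosting AMS and DF)
        Runtime rt = Runtime.instance();
        Profile profile = new ProfileImpl();
        profile.setParameter(Profile.MAIN_HOST, "localhost");
        ContainerController main = rt.createMainContainer(profile);
        // create and start an agent inside the main container
        main.createNewAgent("service-1", SampleServiceAgent.class.getName(), null).start();
    }
}

// An agent that registers a (hypothetical) retrieval service with the DF,
// so that other agents can discover it through the yellow pages.
class SampleServiceAgent extends Agent {
    @Override
    protected void setup() {
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("multimedia-retrieval");   // assumed service type
        sd.setName("sample-database");        // assumed service name
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }
}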
3.2 Architecture Figure 2 shows the architecture of our system partitioned into the client-side, the middleware and the server-side (see subsection 3.4). The Client Tool offers a GUI for the creation of different types of MPQF queries and the presentation of the result sets. For the service selection a list of available multimedia databases is presented to the user. At server-side different multimedia 4
5
The Foundation for Intelligent Physical Agents provides a collection of standards to ensure interoperability between different FIPA compliant agent systems – http://www. fipa.org/repository/standardspecs.html http://www.oasis-open.org/committees/uddi-spec/doc/tcspecs. htm
Agent Based MPEG Query Format Middleware Client-side
Middleware Main Container
AMS
DF
Client Container 1 Client Tool
MPQF Query
291
Query Analyser Agent
Query Agent Type
result
Server-side Server Container 1 Interpreter
DB Communication Service
Service Manager Agent
CapDesc
DBComm Interface
JDBC
MMDB 1
Server Container 2 Interpreter
DB Communication Service
CapDesc
Client Container 2 Client Tool
MPQF Query
result
Query Analyser Agent
Query Agent Type
Service Manager Agent
DBComm Interface
JDBC
MMDB 1
JADE Agent Plattform
Fig. 2 Agent based MPQF Middleware Architecture
retrieval systems can be used. For each database an Interpreter maps arriving MPQF queries to the native query dialect used by the underlying retrieval system. It also keeps track of mapping the query results back to MPQF. In our test environment a rudimental Interpreter for Oracle Multimedia6 has been implemented (see section 3.5). Related to the MPQF standard, the Interpreter provides a service capability description containing information such as supported media types or query types for the database. Related to the container structure introduced by JADE, the following main middleware components can be identified. Dotted lines in figure 2 visualize the logical connections between different container. • Main Container: There is only one main container per middleware and agent system instance. By querying the DF all agents in the middleware are enabled to find registered server container and their associated database or the right agent providing the results of an asynchronous query. • Client Container: For each Client Tool a related client container is created. Within this container a QueryAnalyserAgent takes a query from the Client Tool and creates according to the query type a specific QueryAgent. Figure 3 shows the type hierarchy of the different QueryAgents known by the system (see 3.4 for more details). • Server Container: For each database a server container is created. Based on its unique ContainerID (registered in the DF) clients are enabled to find and query the related database. The DBCommunicationService interface is used for the communication between the QueryAgents and the Interpreter on the serverside. Management information for MPQF is retrieved via this interface (e.g. the 6
http://www.oracle.com/technology/documentation/intermedia. html
292
M. D¨oller, G. H¨olbling, and C. Webersberger jade.core.Agent <
>
MpqfMiddlewareAgents
<>
QueryAnalyserAgent
QueryAgents
ServiceManagerAgent
SynQueryAgent
AsynQueryAgent
FetchResultAgent
ManagementQueryAgent
ParHelperAgent
Fig. 3 Agent Class Hierarchy
Service Capability Descriptions) by the ServiceManagerAgent. Moreover this agent is responsible for MPQF service discovery and the mapping of JADE container IDs to database IDs used in MPQF queries and viceversa. By introducing the DBCommInterface different techniques (e.g.: RMI, CORBA) can be used for the communication between the DBCommunicationService and the Interpreter.
3.3 Multimedia Service Registration and Detection The registration of new multimedia services at the middleware is realized by the following steps. First the service provides a service capability description. Subsequently an interpreter, responsible for translating a MPQF request into the target query language, is instantiated. Finally, a Jade server container is started which handles the communication between the new multimedia service (represented by a ServiceManagementAgent agent) and the rest of the system. Every container and multimedia service is identified by an unique ID (containerID and serviceID respectively). The detection of services within the agent system is accomplished by a ManagementQueryAgent agent. This type of agent is initialized by the QueryAnalyzerAgent if a MPQF based management request is received. Then, the agent forwards the request to all registered ServiceManagementAgent agents verifying a potential intersection among desired and available capabilities. In case of a matching description the services are picked up into the result list which is forwarded to the user.
3.4 Processing Strategies As illustrated before hand, one of the advantages of an agent based system is its capability of migration among network peers. Related to this, several processing strategies can be distinguished. First of all the MPQF standard provides means for sending search requests with a synchronous or asynchronous behavior. The main difference is whether the client tool is blocked and has to wait for the query result
Agent Based MPEG Query Format Middleware
293
6. initiates ParHelper generation 3. returns mpqfID
AMS Asyn Query Agent
4. returns mpqfID 2. initializes Client Tool
1. Asynch request 8. Fetch request
Query Analyser Agent
1
13. return result list 12. aggregate result list
2
6
1
2
6
Par MMDB Helper 3 Agent2
5. request Container IDs 9
DF
7. register at DF
10. Request ParHelper Agent IDs
Fetch Result Agent
9. initializes
Par MMDB Helper 1 Agent1
9
Par MMDB Helper 4 Agent3
11. receive result list
Fig. 4 Workflow of an asynchronous request with parallel execution
(synchronous) or decides to retrieve the result set at a later point in time (asynchronous). Next, related to performance issues, one can distinguish between sequential and parallel processing. In a sequential scenario, one agent migrates from host to host and triggers the search requests. In contrast to that, during parallel processing multiple agents are spread out, gathering the results from the individual hosts. In the following, the workflow within our system is explained exemplarily on the basis of an asynchronous parallel retrieval.
3.4.1 Asynchronous Parallel Retrieval
The following scenario (see figure 4) details the processing of asynchronous MPQF requests. We assume that the MPQF request is valid, has set its immediateResult attribute to false (asynchronous) and has selected a set of services where the query should be evaluated.
1. At the beginning, an asynchronous MPQF request is created and transmitted to a QueryAnalyserAgent agent.
2. This agent analyzes the query and extracts all necessary management information such as the request type (synchronous, asynchronous), the timeout attribute and the ServiceIDs of the target services. In this scenario, as the immediateResult attribute is set to false, an AsynQueryAgent agent is established.
3+4. After the instantiation of the agent, a unique mpqfID with an empty MPQF message is created and forwarded (via the QueryAnalyserAgent agent) to the client. This mpqfID value serves as the reference for fetching the final result set of the query at a later point in time.
5. Next, the AsynQueryAgent agent contacts the DF component (see section 3.1) in order to receive the respective server ContainerIDs for the ServiceIDs.
6. In the parallel scenario, the AsynQueryAgent agent establishes, with the help of the AMS component, as many parallel helper agents (ParHelperAgentX in our example) as the number of services that need to be addressed. The helper agents receive all necessary information (MPQF query, ServiceID, etc.) in order to work properly.
7. Then, every helper agent migrates to the assigned service and executes the given MPQF query. In order to be detected later on, the helper agents need to register themselves at the DF component. The individual helper agents wait at the service hosts as long as the given timeout does not expire and they have not been fetched.
In case of a fetch operation, the following steps are accomplished (see figure 4, steps 8-13).
8. At a given point in time a client is interested in the result of its previous asynchronous request. For this purpose, an MPQF fetch query is created containing the assigned mpqfID and forwarded to the QueryAnalyserAgent agent.
9. The MPQF fetch query is analyzed and a respective FetchResultAgent agent is initialized.
10. The FetchResultAgent agent contacts the DF component in order to receive the IDs of all ParHelperAgent agents that have been responsible for executing the query with the given ID.
11. Then, all ParHelperAgent agents are successively addressed and their result sets are transferred via FIPA-ACL (Agent Communication Language) communication to the FetchResultAgent. The individual result sets are aggregated by the algorithm specified in subsection 3.4.2.
12+13. Finally, after receiving and aggregating all result sets, the final outcome is forwarded (via the QueryAnalyserAgent agent) to the client tool.
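A compact sketch of the fetch phase (steps 8-13) is given below. It is hypothetical code, not the authors' implementation: it assumes that the helper agents register in the DF under a service entry derived from the mpqfID, and it omits timeout handling and the aggregation step.

import jade.core.Agent;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;
import jade.lang.acl.ACLMessage;

// Hypothetical fetch agent: look up the helper agents registered for a given
// mpqfID in the DF and collect their partial result sets via ACL messages.
public class SampleFetchAgent extends Agent {
    @Override
    protected void setup() {
        String mpqfId = (String) getArguments()[0];   // passed in at creation time
        try {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("mpqf-helper");   // assumed service type used by helper agents
            sd.setName(mpqfId);          // helpers are assumed to register under the query id
            template.addServices(sd);
            for (DFAgentDescription helper : DFService.search(this, template)) {
                ACLMessage request = new ACLMessage(ACLMessage.REQUEST);
                request.addReceiver(helper.getName());
                request.setContent(mpqfId);
                send(request);
                ACLMessage reply = blockingReceive(); // partial MPQF result set
                // ... hand the reply over to the result aggregation step (Sect. 3.4.2)
            }
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }
}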
3.4.2 Result Aggregation
In general, merging results from different sources plays an important role in distributed information retrieval. Result aggregation is necessary when several results from different locations need to be merged into one single result. Current approaches (see for instance [2]) can be classified according to the following characteristics: the degree of overlap [15] among the involved database systems; the degree of interaction (they can be either cooperative/integrated or uncooperative/isolated [4]); and the use of additional information such as the analysis of training data. A first analysis of which result aggregation algorithms are feasible in combination with MPQF has been given in [5]. Related to this, our approach uses the Round Robin approach [2], which merges the individual result items by iterating over all result sets and selecting the top element successively.
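A minimal sketch of this Round Robin merging is given below (hypothetical Java; detecting duplicates across result sets, which a real implementation would need, is omitted).

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Round Robin merging: the merged list is built by repeatedly taking the
// next-best item from each individual, already ranked result list in turn.
public class RoundRobinMerger {

    public static <T> List<T> merge(List<List<T>> rankedResultSets) {
        List<Iterator<T>> cursors = new ArrayList<>();
        for (List<T> set : rankedResultSets) {
            cursors.add(set.iterator());
        }
        List<T> merged = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Iterator<T> cursor : cursors) {
                if (cursor.hasNext()) {        // take the current top element
                    merged.add(cursor.next());
                    progress = true;
                }
            }
        }
        return merged;
    }
}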
Table 1 Test of the sequential and the parallel processing strategies (times in ms)

        overall   analysis   helper   QAS    DB       migration   result aggr.
seq.    5303.5    25.3       0        47.4   4614.2   223.6       30
par.    2715.1    25.2       875      69.7   1346.3   0           47.6
3.5 Evaluation For evaluating our middleware platform a test system with three multimedia databases has been set up featuring the Oracle Multimedia7 retrieval system. Oracle Multimedia provides basic CBR (Content-Based-Retrieval) functionality on the basis of four low level features (global color, local color, texture and shape). For mapping a MPQF query request to Oracle specific CBR calls, an MPQF-to-OracleMultimedia Interpreter has been developed. Currently, the interpreter is only aware of executing MPQF based QueryByMedia requests where an example image is received and the most similar images are retrieved. Each database was installed on its own server with an associated Interpreter component. All server were connected to the same LAN. Based on the test setup the sequential and the parallel processing strategies were evaluated (see Table 1). The presented tests use the synchronous strategy and are in common steps fully applicable for the asynchronous types. For each strategy the time spent on the individual process steps (see process description in section 3.4) has been measured. Table 1 shows the average times over 10 test passes for each process step in milliseconds. The overall time of the retrieval process is shown in column overall. The column analysis represents the time used for analyzing the query by the QueryAnalyserAgent and the Helper column contains the time taken for creating the ParHelperAgents. The QAS column specifies the time for QueryAgent setup and the DB column states the querying times. Note, in the sequential mode this is the sum of all querying times and in the parallel mode it is the maximum querying time of the involved databases. Finally, the migration column shows the time for the whole migration process and the result aggr. column stands for result aggregation. From table 1 one can derive, that the overhead produced by our middleware, especially the time for the agent migration (223,6 ms) lies in an acceptable range with respect to the advantages of an agent based approach and the overall time (2715,5 ms vs. 5303,3 ms). Furthermore, as expected, the different processing strategies heavily differ in the time needed by querying the databases in the parallel (1346.3 ms) and in the sequential (4614,2 ms) strategy. The time for creating the ParHelperAgents (875 ms) and the migration times of the SynQueryAgent (223,6 ms) depend on the used strategy. Differences in common times like analysis and result aggr. belong to different working loads on the server. 7
http://www.oracle.com/technology/products/multimedia/index. html
4 Summarization This paper presented an agent based multimedia middleware system that relies on the newly developed MPEG Query Format and addresses especially network segments that show low transfer rates and unreliable network conditions. Related to the use of a standardized query format, interoperability among heterogeneous retrieval systems is realized. Furthermore, the adoption of an agent based system supports the integration of environments addressing mobile devices. The system features the discovery and selection of retrieval systems and the aggregation of multiple result sets during search. In addition, synchronous and asynchronous search modes have been investigated. Especially, the second is valuable for mobile applications such as tourism guides etc.
Query Result Aggregation in Distributed Multimedia Databases
Christian Vilsmaier, David Coquil, Florian Stegmaier, Mario Döller, Lionel Brunie, and Harald Kosch
Abstract. Merging the responses of multiple heterogeneous databases responding to the same query is referred to as the collection fusion problem. As the responding databases use their own search measures to compute the results, the returned result sets are usually incomparable. To solve this problem in the environment of distributed multimedia databases, we propose in this paper a new algorithm that uses the three steps of normalization, merging and filtering.
Christian Vilsmaier and Lionel Brunie
Institut National des Sciences Appliquées de Lyon, LIRIS UMR CNRS 5205, Bât. Blaise Pascal, F-69621 Villeurbanne CEDEX, France
e-mail: [email protected]

David Coquil, Florian Stegmaier, Mario Döller, and Harald Kosch
Chair of Distributed Information Systems, University of Passau, Innstrasse 43, Passau, Germany
e-mail: [email protected]

1 Introduction

Recent years have seen a tremendous increase in the amount of multimedia data and, in parallel, in the number of multimedia datastores; both are expected to increase even more in the future. In view of this upcoming multimedia data deluge, the need for efficient methods for querying these data is very high. In particular, users should be provided with query interfaces for transparently querying sets of heterogeneous and distributed multimedia datastores. Enabling this is one of the goals of our ongoing project of developing the Architecture for Interoperable Retrieval (AIR) middleware. To address the problem of the typical heterogeneity of multimedia metadata formats, AIR uses the metadata-agnostic MPEG Query Format (MPQF) query language. To realize the transparent multimedia information retrieval vision described above, several issues must be addressed. Among these, we focus in this paper on the
problem of query results aggregation. Indeed, to hide the complexity of the underlying set of repositories from the users, the results returned by the individual datastores should be aggregated into a single consistent result list. In this paper, we propose an algorithm that performs the successive steps required to compute this integrated list. If we assume that we are dealing with typical multimedia similarity-based queries, each item in an individual result list will be associated with a similarity score, called confidence. To make the confidences returned by heterogeneous datastores comparable, they must first be normalized. A merging technique can then be applied, which, for items that show up in multiple result sets, either selects one of their confidences or computes a new one, and then ranks the items in a newly generated global result set. After this step, a filter carries out a re-ranking of the global result set, using a global search measure. This re-ranked global result set can finally be sent back to the client. This article is organized as follows. In Sections 2 and 3, we present the framework of our study, namely MPQF and the AIR middleware. Section 4 discusses related work. Our algorithm is described in Section 5. Section 6 describes the experiments that we conducted to analyze its performance. Section 7 concludes the paper and presents planned future work.
2 The MPEG Query Format

The MPEG Query Format (MPQF, http://www.mpegqueryformat.org) [1] became an international standard in early 2009. It specifies a format for the interaction of multimedia clients and multimedia retrieval systems (MMRS). In detail, the standard defines the message format for multimedia requests (e.g., Query By Example or Query By Text) to heterogeneous MMRS and the message format for their responses. Furthermore, a management part provides features such as service discovery (service being a synonym for MMRS) and service capability description. Figure 1 shows the individual message types and a possible usage scenario, in which a content provider undertakes the distribution and aggregation of MPQF messages to individual MMRS. MPQF can deal with any XML-based multimedia metadata format. This versatility allows groups of users to define their own metadata sets (e.g., folksonomies and informal tags) while retaining interoperability at the vital search interface.
3 AIR

So far, multimedia information retrieval largely remains an open issue [2]. One reason is the large variety of available MMRS, which quickly leads to heterogeneous and distributed retrieval environments. The AIR [3] framework is designed as a middleware architecture whose main purpose is to operate within these heterogeneous and distributed MMRS using MPQF as its query language.
Fig. 1 Possible scenario for the use of the MPEG Query Format
Fig. 2 AIR: supported query processing strategies: (a) local query processing; (b) distributed query processing
To ensure broad coverage of heterogeneity issues, it should encompass the following two different query processing strategies in the future (see Figure 2). Figure 2(a) illustrates local query processing. Here, the query as a whole can be understood by every connected database, in this case image search engines. The heterogeneity issue in this case stems from the diversity, on the one hand, of the metadata formats used (e.g., MPEG-7 vs. Dublin Core) and, on the other hand, of the supported query languages (e.g., SQL/MM [4] vs. XQuery [5]). In order to overcome the metadata interoperability issue [6], AIR makes use of the JPSearch (http://www.jpsearch.org) Transformation Rules [7]. These rules define syntactic mappings between the JPSearch core metadata schema (acting as a pivot metadata format) and any other XML-based metadata format. In contrast to this XML-based approach, the system will be extended with an ontology-based metadata transformation approach [8] proposed by the
W3C Media Annotations Working Group (http://www.w3.org/2008/WebVideo/Annotations/). MPQF interpreters located at the actual retrieval engines perform the transformation of an MPQF query into the underlying query language. Here, result aggregation is a major obstacle and has to deal with the following issues: duplicate elimination, (re-)ranking, and (multimodal) data fusion. In contrast to local query processing, AIR is also able to deal with distributed query processing on the basis of a global data set, as illustrated in Figure 2(b). In this case, the connected databases may be completely different from each other with respect to data storage and supported query languages, but describe a global, semantically linked data set. The query language issue is solved by the use of MPQF interpreters, just as in local query processing. To enable efficient retrieval, AIR must behave like a federated database system. The main goal in this case, besides result aggregation, is optimizing the execution of interpretable segments of the initial query, in other words finding the best query execution plan. Current work is focusing on the local query processing strategy.
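As an illustration of the kind of syntactic mapping such transformation rules express, the following sketch maps a few Dublin Core fields onto a pivot schema and back; the pivot field names are invented for the example and do not reproduce the actual JPSearch core metadata schema.

```python
# Hypothetical pivot-schema mapping in the spirit of the JPSearch
# Transformation Rules; the pivot field names are illustrative only.
DC_TO_PIVOT = {
    "dc:title": "Title",
    "dc:creator": "Creator",
    "dc:date": "CreationDate",
    "dc:description": "Description",
}

def to_pivot(dc_record):
    """Map a Dublin Core record (dict) onto the pivot schema."""
    return {pivot: dc_record[dc] for dc, pivot in DC_TO_PIVOT.items()
            if dc in dc_record}

def from_pivot(pivot_record, mapping):
    """Map a pivot record onto another format, given that format's rules."""
    inverse = {pivot: target for target, pivot in mapping.items()}
    return {inverse[k]: v for k, v in pivot_record.items() if k in inverse}
```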
4 Related Work

This section presents a categorization of normalization and merging techniques related to our proposal. The categories that are important for the work presented in this article are developed in Figures 3 and 4. Let us first make some hypotheses to further characterize the multimedia query result merging problem. First, several types of configurations are possible for the component databases of the distributed system with respect to the stored data: they can be identical, overlapping, or disjoint. For our research, the different databases are assumed to be overlapping. This corresponds to the most general case in AIR's local query processing mode (see Figure 2(a)). Then, to guide the design of our merging proposal, we must decide how to interpret identical items that appear in several individual result sets (duplicates). One possibility is to consider these multiple occurrences as hints of the high relevance of the duplicates. However, following [9], we argue that the validity of this hypothesis depends on the degree of overlap between the databases. As we want to be as generic as possible and propose a solution that does not require the middleware to know the contents of the databases beforehand, we reject this assumption. On the other hand, duplicates can be used as anchor points to correlate the involved result sets, by enabling a comparison between the confidences output by different databases for the same item. As explained above, multimedia query result merging starts with a normalization phase. Among related work of this kind based on the above assumptions, [10] proposes three possible ways to correlate the confidences produced by heterogeneous databases: downloading the whole files and comparing them, obtaining a core statistic from every database and comparing these, or a combination of both techniques. In the case of multimedia content, neither downloading nor a combination of downloading and core statistics (see [11]) is feasible; the generated network traffic would be unacceptable. Therefore, only a statistical approach can be applied.
Fig. 3 Categorization of the Normalization Algorithms
[12] defined three desirable qualities for normalization algorithms: shift invariance, scale invariance, and outlier insensitivity. If a normalization scheme is shift or scale invariant, this means that it is insensitive to its input being shifted by an additive constant or scaled by a multiplicative constant. Outlier insensitivity means that the normalization is not sensitive to the confidence of a single result item, and therefore that adding an outlier to a result set does not crucially change the normalized confidences of the other result items. With these properties in mind, we rejected the idea, found in some normalization approaches such as [13], of including a learning phase, as it would damage the scalability of the approach. Figure 4 outlines the categories of result aggregation techniques found in the literature. As with normalization algorithms, we distinguish between databases containing identical, overlapping, and disjoint data sets. The desired output of these algorithms is a ranked document list sorted in descending order of relevance. This ranking is computed using a merging technique that, again, uses either the rank or the confidence assigned by the individual input databases. Several techniques have been proposed for the merging problem ([14], [15], [16], [9]). We concluded from our analysis of this related work that the majority of them take advantage of additional information. As we wanted to design the most generic method possible, we concentrated on merging techniques that rely exclusively on the confidence. Thus, we finally selected the CombMAX algorithm [14]. This algorithm is further explained in section 5.
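These properties can be checked numerically for any candidate normalizer. The sketch below does so for a simple min-max scaling, used here purely as an example (it is shift and scale invariant but not outlier insensitive); it is not the normalization adopted in section 5.

```python
import random

def min_max(confidences):
    """Example normalizer (not the one adopted in section 5)."""
    lo, hi = min(confidences), max(confidences)
    span = (hi - lo) or 1e-9
    return [(c - lo) / span for c in confidences]

def is_shift_invariant(norm, xs, c=3.7, tol=1e-9):
    return all(abs(a - b) < tol
               for a, b in zip(norm(xs), norm([x + c for x in xs])))

def is_scale_invariant(norm, xs, k=2.5, tol=1e-9):
    return all(abs(a - b) < tol
               for a, b in zip(norm(xs), norm([x * k for x in xs])))

xs = [random.random() for _ in range(50)]
print(is_shift_invariant(min_max, xs), is_scale_invariant(min_max, xs))
# An outlier-sensitivity check would compare norm(xs) with norm(xs + [10.0])
# restricted to the original items; min-max scaling fails that test.
```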
5 Description of the Proposed Algorithm

Based on our analysis of the state of the art, we designed an algorithm that uses a simple mathematical regression as its normalization method.
Fig. 4 Categorization of the Merging Algorithms
To merge results, we used the CombMAX algorithm [14], which ranks all items by their confidence; items that would show up multiple times in the ranking are inserted just once, at their highest possible rank. We chose CombMAX after comparing it with a number of other algorithms introduced in [14], [15], [16], and [9]; it seemed to be the most promising and most readily implementable algorithm for this first prototype. Certainly, other algorithms could be used as well. The normalization and merging steps are present in most common approaches. We propose to add a filtering step to the process. This third and last step of our algorithm performs a reordering of the received result items; it is not part of any algorithm known to us. The filter reorders the result items with respect to a global search criterion that is available for every result item. In our implementation, we used the color histogram, as it was available for our whole test data set. We expected the top results to be rated best, but we did not expect them to be in the optimal order. That is why we added the additional filtering step, to optimize the ranking of the top-ranked items. The meta code for the proposed approach is shown in Algorithm 1.

Algorithm 1. search(searchItem, resultSets)
Require: searchItem, resultSets
Ensure: searchItem ≠ null, |resultSets| > 0
  for i = 0 to |resultSets| − 1 do
    (rs1, rs2) ← getResultSetsWMCI(resultSets)
    regress(rs1, rs2)
  end for
  items ← getNBestResultItems(resultSets, 30)
  filter(items, searchItem)
  return items
The algorithm starts with a loop in which the confidences are processed and the result sets are normalized. The loop is carried out n−1 times, n being the number of result sets. In the first iteration, the method getResultSetsWMCI(resultSets) returns the two result sets that have the most items in common, denoted rs1 and rs2. The items of rs1 and rs2 are then normalized by the regression method regress(rs1, rs2). In every following iteration, the method getResultSetsWMCI(resultSets) again returns two result sets. These are the two result sets with the most common items, with the constraint that one of them (denoted rs1) has already been normalized and the other
(denoted rs2) has not. Using this strategy, the method regress(rs1, rs2) normalizes the items referenced by the result set rs2. By repeating the process, all result sets are eventually normalized and become comparable through their confidence values. After the loop has completed, getNBestResultItems(resultSets, 30) merges the items of all the result sets that have the highest confidence using CombMAX. These
items are stored in a ranked list and, after being reordered by filter(items, searchItem), are returned to the client. An example run of the algorithm is outlined in Figure 5.

Fig. 5 Example run of the algorithm
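A minimal sketch of Algorithm 1 follows, under these assumptions: result sets are dictionaries mapping item identifiers to confidences, regress() is a plain linear least-squares fit over the common items (the exact regression model is not fixed by the description above), and global_distance() stands for the global search measure used by the filter (a color-histogram distance in our implementation).

```python
def regress(rs_ref, rs_new):
    """Normalize the confidences of rs_new onto the scale of rs_ref using a
    least-squares linear fit over the items the two sets have in common."""
    common = [item for item in rs_new if item in rs_ref]
    n = len(common)
    if n < 2:
        return
    xs = [rs_new[i] for i in common]
    ys = [rs_ref[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1e-9
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    for item in rs_new:
        rs_new[item] = my + slope * (rs_new[item] - mx)

def comb_max(result_sets, n_best):
    """CombMAX: every item keeps the highest confidence it received."""
    merged = {}
    for rs in result_sets:
        for item, conf in rs.items():
            merged[item] = max(merged.get(item, conf), conf)
    return sorted(merged, key=merged.get, reverse=True)[:n_best]

def common_items(a, b):
    return len(set(a) & set(b))

def search(search_item, result_sets, global_distance):
    """Normalization, merging and filtering, following Algorithm 1."""
    pending = list(result_sets)
    if len(pending) < 2:
        normalized = pending
    else:
        # first iteration: the two result sets sharing the most items
        rs1, rs2 = max(((a, b) for a in pending for b in pending if a is not b),
                       key=lambda p: common_items(*p))
        regress(rs1, rs2)
        normalized = [rs1, rs2]
        pending = [rs for rs in pending if rs is not rs1 and rs is not rs2]
        while pending:
            # later iterations: one already normalized set, one pending set
            rs1, rs2 = max(((a, b) for a in normalized for b in pending),
                           key=lambda p: common_items(*p))
            regress(rs1, rs2)
            normalized.append(rs2)
            pending.remove(rs2)
    items = comb_max(normalized, 30)
    # filtering: re-rank the top items with the global search measure
    items.sort(key=lambda item: global_distance(item, search_item))
    return items
```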
6 Experimental Evaluation

The general setting of our experimental evaluation is the simulation of a set of MPQF queries aimed at a distributed system composed of heterogeneous (in terms of query processing engines) MPEG-7 databases. A graphical user interface module was implemented to provide a convenient test environment. It offers the possibility to define the following test case variables: the image that has to be searched for, the average percentage of coverage between databases, the number of databases, and the number of items per database. Another module, the test case generator, assembles a test case. To this end, it uses a predefined set of MPEG-7 files and the input variables from the GUI. The result is stored in an SQLite database. This database is used to simulate the behavior of several MPEG-7 databases. We implemented different similarity functions on the scalableColorType attribute using different p-norms. They are used alternately to simulate a heterogeneous environment. The controller module is the central instance of the implementation. It is the conducting unit between the GUI, the test case generator, and the implementation of the algorithm. The algorithm itself is implemented in a further module. It contains the algorithm controller, the instance that ensures the proper execution of the algorithm as described in section 5. The algorithm controller receives the input file and sends a request for this file to every participating database. After receiving the responses from every single database, these results were processed three times, each test run including a different set of steps of the complete process described in the previous section. For the first test run, the result sets were merged without preprocessing, using CombMAX as the merging technique. For the second test run, the result sets were preprocessed using the regression analysis technique and then merged. For the third test run, we added the filtering step. This step was applied to the best 30 results as identified in the previous step; the best 30 items were then sent back. The results of the different test runs were then compared. The algorithm was developed and tested on a MacBook1,1 with a processor speed of 2 GHz, 1 GB of memory, and an L2 cache of 2 MB. The bus speed was 667 MHz, and the operating system used was Mac OS X 10.5.6 running on kernel version Darwin 9.6.0. The values of the variables for the experiments were chosen as follows, for the following reasons. The majority of users of search engines do not consult more than 30 results; therefore the algorithm does not return more than 30 items. We used a test set of 7200 pictures, of which every 72 pictures were similar to each other. Having implemented 3 different metrics, we chose to use a set of 3 databases, each with a different metric, to ensure a certain level of heterogeneity. After a few test runs, we established an average coverage of 30% between the databases as a precondition for our algorithm, along with at least 800 items per database.
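For illustration, a similarity function of the kind used to simulate the heterogeneous engines could look as follows; the descriptor is assumed to be a plain vector of ScalableColor coefficients, and the mapping from distance to confidence is an assumption of this sketch.

```python
def p_norm_distance(desc_a, desc_b, p=2):
    """Minkowski (p-norm) distance between two ScalableColor coefficient
    vectors; choosing p = 1, 2, 3, ... yields the different similarity
    functions used to simulate a heterogeneous environment."""
    return sum(abs(a - b) ** p for a, b in zip(desc_a, desc_b)) ** (1.0 / p)

def confidence(desc_a, desc_b, p=2):
    # Map the distance to a confidence in (0, 1]; the exact mapping used by
    # the simulated databases is an assumption of this sketch.
    return 1.0 / (1.0 + p_norm_distance(desc_a, desc_b, p))
```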
Table 1 Amelioration of the result in percentage

Performed Steps \ Picture              car     duck    burger
Normalization & Merging                +5%     -25%    -4%
Normalization & Merging & Filtering    +7%     +12%    +5%
We chose three different pictures with very different characteristics (in terms of color, intensity, and shape). For each of them we ran 20 tests and obtained the results shown in Table 1. We obtained an improvement in all test cases for the complete test run. A search for the picture of the one-colored item had a 7% higher confidence when the algorithm was used compared to simple merging. The picture of a two-colored item had an increase of 12%, and the picture of a multicolored item had an increase of 5%. An interesting aspect is that in two cases the confidence after the normalization decreased compared to the simple merging case. This was, however, expected behavior, caused by the reordering induced by the regression analysis, which often places new false hits above an old hit, or places new hits too low in the result set. This actually shows the value of the filter, which corrects these misplacements and can profit from the additional hits.
7 Conclusion

In this paper, we have presented an algorithm aimed at the problem of query result merging for distributed and heterogeneous multimedia databases. This algorithm was designed in the framework of the MPQF query language and of the AIR middleware. After a study of related work in the literature, we have described our proposed algorithm and then the experiments that we conducted to evaluate it, which show a significant improvement of the results. The algorithm proposed in this paper has the advantage of being able to deal with large amounts of multimedia files, as it works without a learning phase, reference statistics, downloading parts of the files, or prior knowledge of the component databases. Moreover, it is compatible with the international standards MPEG-7 and MPQF. We showed that there is potential in algorithms that follow our suggested methodology of normalization, merging, and filtering. Planned future work includes a more detailed study of the runtime and the implementation of specific measures for dealing with outliers in the result sets, with the goal of producing the final version of the merging algorithm that will be implemented in the AIR middleware. In addition, we are currently looking at the problem of tuning heterogeneous similarity-based multimedia query engines to improve the consistency of their results; our work on result merging will provide a solid foundation for this new direction.
References

1. Döller, M., Tous, R., Gruhne, M., Yoon, K., Sano, M., Burnett, I.S.: The MPEG Query Format: On the way to unify the access to multimedia retrieval systems. IEEE Multimedia 15(4), 82–95 (2008)
2. Hanjalic, A., Lienhart, R., Ma, W.-Y., Smith, J.R.: The holy grail of multimedia information retrieval: So close or yet so far away? Proceedings of the IEEE 96, 541–547 (2008)
3. Stegmaier, F., Döller, M., Kosch, H., Hutter, A., Riegel, T.: AIR: Architecture for interoperable retrieval on distributed and heterogeneous multimedia repositories. In: 11th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2010 (to appear, 2010)
4. Melton, J., Eisenberg, A.: SQL multimedia application packages (SQL/MM). ACM SIGMOD Record 30, 97–102 (2001)
5. Boag, S., Chamberlin, D., Fernández, M.F., Florescu, D., Robie, J., Siméon, J.: XQuery 1.0: An XML query language (2007), http://www.w3.org/TR/xquery/
6. Smith, J.R.: The search for interoperability. IEEE Multimedia 15(3), 84–87 (2008)
7. Döller, M., Stegmaier, F., Kosch, H., Tous, R., Delgado, J.: Standardized interoperable image retrieval. In: ACM Symposium on Applied Computing (SAC) 2010, pp. 881–887 (2010)
8. Stegmaier, F., Bailer, W., Bürger, T., Döller, M., Höffernig, M., Lee, W., Malaisé, V., Poppe, C., Troncy, R., Kosch, H., Van de Walle, R.: How to align media metadata schemas? Design and implementation of the media ontology. In: Proceedings of the 10th International Workshop of the Multimedia Community on Semantic Multimedia Database Technologies (SeMuDaTe 2009), December 2009, vol. 539, pp. 56–69 (2009)
9. Wu, S., Crestani, F.: Shadow document methods of results merging. In: Proceedings of the 2004 ACM Symposium on Applied Computing (January 2004)
10. Si, L., Jin, R., Callan, J., Ogilvie, P.: A language modeling framework for resource selection and results merging. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management (January 2002)
11. Craswell, N., Hawking, D., Thistlewaite, P.B.: Merging results from isolated search engines. In: Proceedings of the Tenth Australasian Database Conference (1999)
12. Montague, M., Aslam, J.: Relevance score normalization for metasearch. In: Proceedings of the Tenth International Conference on Information and Knowledge Management (January 2001)
13. Berretti, S., Del Bimbo, A., Pala, P.: Merging results for distributed content based image retrieval. Multimedia Tools and Applications 24(3), 215–232 (2004)
14. Lee, J.: Analyses of multiple evidence combination. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1997 (January 1997)
15. Fox, E., Shaw, J.: Combination of multiple searches. NIST Special Publication (January 1994)
16. Wu, S., Crestani, F.: Data fusion with estimated weights. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management (January 2002)
Sensor-Aware Web interface Marco Anisetti, Valerio Bellandi, Ernesto Damiani, Alessandro Mondoni, and Luigi Arnone
Abstract. Traditionally, human machine interactions (HMI) occur in the form of simple, typically single flows of uniform events, such as mouse clicks on Graphical User Interface (GUI). Researchers in the HMI field aim to improve the HMI experience using a more natural approach. Interaction infrastructure is becoming more pervasive, and terminal devices are increasingly equipped with multiple sensors, such as video cameras or audio/video equipment, capable of collecting information from the environment. Video sensors are already placed in nearly every kind of device with which humans interact, such as PCs, mobile phones, and PDAs. In this paper, we propose that in Web 2.0-3.0, the interactive experience could be greatly improved from the client point of view with the use of a pervasive computing paradigm. We focus our attention on improvements that can be obtained with video sensors, and describe a more general architecture that is compliant with the current web infrastructure.
Marco Anisetti, Valerio Bellandi, Ernesto Damiani, and Alessandro Mondoni
Università degli Studi di Milano, Dipartimento di Tecnologie dell'Informazione
e-mail: {marco.anisetti,valerio.bellandi}@unimi.it, [email protected], [email protected]

Luigi Arnone
ST-Ericsson
e-mail: [email protected]

1 Introduction

The Web is the largest knowledge database in the world. Most of the data stored on the Web are text- or image-based, but in recent years, many kinds of sensors, including microphones and video cameras, have been installed in our daily environments such as the home and work place. This reflects the fact that humans naturally
communicate with each other in a multi-modal fashion: we speak, gesture, gaze, and move, generating a rich, multi-streamed flow of information. Interacting with machines has traditionally been a much simpler affair, typically occurring via a single flow of uniform events like the discrete mouse clicks entered sequentially in a GUI. As the global information infrastructure becomes more pervasive, however, digital transactions are performed in diverse situations, using a variety of mobile devices and across multiple communication channels. Rather than being forced to assume the standard position in front of a machine, users move freely around their work environment, starting and monitoring different transactions. In this new paradigm of multi-modal access to networked media, a much richer contextual representation of users and their environment is made available to applications. For various purposes, it becomes important to be able to access data obtained by sensors existing in our environments. Since such data include real-world information, a new worldwide social information infrastructure, which the research community calls the Sensing Web, may be realized. Early multi-modal systems were based on the recognition of active modes, such as speech and handwriting, for which there is now a large body of research. In this paper we propose an architecture for including multi-modal pervasive computing in natural web interaction (a Sensor-Aware Web interface). This work can be considered to fall under the umbrella of the Sensing Web Project launched in 2007 in Japan, and it addresses certain architectural weaknesses related to the privacy of exchanged data. The contributions of this paper are: i) the design of a Sensor-Aware Architecture compliant with current web technologies, ii) a lightweight video-based algorithm that permits automatic processing of the video stream, and iii) an improvement of the privacy properties of the Sensing Web approach. We also present examples of two applications based on the Sensor-Aware Architecture. These demos propose a new way of interacting with websites that we call Presence-Awareness. The paper is organized as follows. Section 2 provides a brief summary of related work in the Sensing Web field and video analysis. Section 3 describes the Sensor-Aware architecture. Then, in Section 4, the Sensor Layer is described, and in Section 5 two application examples are presented. Finally, in Section 6 we draw our conclusions.
2 Related Work

In the literature, there are various examples of related efforts regarding the enabling of wide-area sensing services. In [9], the authors classify these into efforts having similar application goals (sensor networks [11][12][13] and video surveillance [14]) and those that employ similar techniques (Internet service frameworks [15][16] and distributed databases [17][18][19]). In this paper, we will present two examples of the video-based Sensor-Aware approach for improving the interaction experience in a pervasive environment. In the following subsections we discuss video analysis as it relates to the functionalities included in our case study.
2.1 Video Analysis Related Work

The most common sensors in devices are video and audio sensors. This paper focuses on the analysis of video signals, because these signals enable a large number of functionalities useful for interactive purposes. The most useful functions achievable via video analysis are:

• extraction of environmental information: from a simple description of illumination and prominent color to a more complex description of the environment (open air, countryside, indoor);
• object detection: detection of the presence of particular objects like markers (for AR purposes) or complex objects like faces and cars;
• identification: of a particular object or situation, for instance, the identity of a person or emotion recognition.

Since we focus our attention on interactive applications, the most important feature that should be "controlled" in the video is the presence of users. One method of determining whether a person is present in a video is face detection. Here we briefly summarize the literature on the face-related algorithms used in the rest of this paper.

2.1.1 Face Detection
The human face is a dynamic object with a high degree of variability in its appearance, which makes face detection a difficult problem. Nevertheless, a wide variety of techniques have been proposed, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods. Wu et al. [62] used a mixture of low-level techniques, and [50] considered visual features such as edges and color, as derived in the early stages of the human visual system and shown by the various visual response patterns on the retina of the eye. Yang and Huang [63] explored the gray-scale behavior of faces, creating a multi-resolution hierarchy of images by averaging and sub-sampling. The use of a multi-resolution hierarchy and rules to guide searches influenced many later face detection efforts [39]. A number of interesting techniques [56][37] are based on feature searching processes that start with the determination of prominent facial features. Some face detection studies address this problem by grouping features into face-like constellations using more robust modeling methods [36][34][45]. Another approach [46] applies gradient-type operators over local windows, converting the input images to a directional image. In [29], an edge-grouping approach is used with several constraints for face template searches. The same process is repeated at different scales to locate features such as eyes, eyebrows, and lips. In [30] the feature searching process is based on a control strategy to guide and assess results from template-based feature detectors. The most recent work in the field of deformable templates is based on Active Appearance Model (AAM) techniques [32][31]. The first neural approaches to face detection were based on Multi-Layer Perceptrons (MLP) and showed promising results only for fairly simple datasets [25]. The first advanced neural approach that reported satisfactory results for a large, difficult
dataset was [51]. In [43] a probabilistic decision-based neural network was proposed. This is a classification neural network with a hierarchical modular structure. One of the more representative methods based on a naive Bayes classifier is described in [53][54]. A naive Bayes classifier estimates the joint probability of the local appearance and position of face patterns (subregions of the face) at multiple resolutions. One of the most common detection methods, which has effectively become the standard for face detection, is the Viola-Jones approach [60][61]. This approach describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. Recently, many extensions to the Viola-Jones detector have been proposed that rely on different formulations of Haar-like features, from a tilted version [42] to a joint formulation of several features [48]. Several extensions intended to detect faces in multiple views with in-plane rotation have also been proposed [41][35]. We chose to follow the Viola-Jones approach since its performance is compatible with a video streaming approach and the quality of the results is satisfactory. The Viola-Jones approach permits the definition of a general object detection framework.

2.1.2 Face Tracking
Tracking an object in a video sequence requires the continuous identification of its location when either the object or the camera is moving. Object tracking is a challenging problem. We concentrated our research on face tracking. There are a variety of approaches discussed in the literature, depending on the degrees of freedom of the face and camera, and the target application. Generally, we classify the approaches by considering the face model used (2D or 3D). 2D tracking typically aims to follow the image projections of objects or parts of objects whose 3D displacement results in motion that can be modeled as a 2D transformation. An adaptive model is then required to handle appearance changes due to perspective effects or to deformation. 2D model-based face tracking can provide the face’s image position in terms of centroid and scale or of an affine transformation [38]. Alternatively, more sophisticated models such as splines and deformable templates [26][40][49] can be used. In addition to motion, these methods provide basic 2D information about the observed individual’s appearance. Some methods rely on undeformable face features (e.g., eye corners) [52]; in [44], five features are used and registered to a line-based model. Another study uses the contour features for posture estimation with a rough 3D model [57]. Other methods include an ellipsoidal model [20] [21]. Dellaert et al. [33] formulated the 3D tracking of planar patches using texture mapping as the measurement model in an extended Kalman filter framework. Other work uses an extended Kalman filter to recover 3D structure, focal length, and facial pose [20]. Some feature-based approaches use Kalman filtering for improving the block matching of noticeable features [58][22] [28] during facial tracking. Recently, one group developed an approach similar to template-based AAM but using features instead of the global appearance [31]. This approach, called the Constrained Local Model (CLM), received a great deal of attention as an add-on for improving tracking
quality and facial feature location. AAMs, first proposed in [27], are closely related to Active Blobs [55] and Morphable Models [59], [23].
3 Sensor-Aware Architecture

The Sensor-Aware Architecture is an infrastructure that permits standard web pages to benefit from new pervasive interactive functionality. A web programmer can use Sensor-Aware functionality without significant changes to the way dynamic web pages are programmed. The trend toward ubiquitous Internet connectivity provides new opportunities for Internet-oriented applications [10]. Current web technologies, and the most widespread Internet access infrastructures like ADSL, are download-oriented: the direction of communication is unbalanced. Our Sensor-Aware Architecture re-balances this tendency, because sensor information flows upstream from the client side to the web server side. Unfortunately, the HTTP protocol was not created for a strongly bidirectional, continuous flow of information. Since the goal is to remain compatible with the current web infrastructure, we developed the Sensor-Aware architecture using current web technology. The entities involved in the architecture are the Sensor-Aware Server (SAS), the Sensor-Aware Client (SAC), and the Browser Connection Module.

• The SAS resides on the web server and collects sensor data from different clients and for different browsing sessions. Its main functionality is to make such information available to web pages.
• The SAC runs as a daemon on the client machine, and its principal aim is to manage sensor data and send them to the server. It is also connected to web pages to exchange initial information about the session, receive browsing events, and invoke page refreshes.
• The Browser Connection Module is a simple applet for exchanging information with the Sensor-Aware Client. To take advantage of the functionality provided by the Sensor-Aware Architecture, web pages must contain this object.

The architecture uses three distinct connections: the classical HTTP connection for web browsing, a direct bidirectional connection from client to server for sensor-related information exchange, and a local connection established between the Browser Connection Module and the SAC. The exchange of sensor information with the server is not mandatory if the sensor data are not required on the server side. The sensor-aware experience may thus be: i) client-only, a purely local effect like a change in the layout of the web pages depending on the illumination (useful for mobile devices), or ii) client-server, where sensor information flows from the client to the server, which makes a decision (e.g., face identification for web page access, where the face images must be identified using a server-side database). Note that the sensor-related signals that flow to the server can be pre-processed to save bandwidth and computational power on the server side (e.g., the information about the illumination can be computed locally). Figure 1 gives an overview of the architecture.
Fig. 1 Sensor-aware architecture
A web designer can implement a web site including Sensor-Aware functionality using a purely local approach or by exchanging information between the client and the server. One interesting aspect of the client-server approach is that it addresses the privacy problem raised by the Sensing Web project: it is possible to force the obscuration of certain information by setting the level of authorization. For instance, it is possible to obscure the faces in a video coming from remote cameras, except the faces of a user and his child. In Section 5 we will demonstrate the functionality achievable using the local approach.
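A minimal sketch of how such authorization-driven obscuration could be enforced before sensor data leave the client is given below; the identify() and blur_region() helpers and the NumPy-style frame indexing are assumptions of the example, not the actual implementation.

```python
def obscure_faces(frame, detected_faces, authorized_ids, identify, blur_region):
    """Blur every detected face whose identity is not in the authorized set.

    frame:          image array (assumed NumPy-style, indexable as [y, x])
    detected_faces: list of (x, y, w, h) boxes from the face detection analyzer
    identify:       callable mapping a face crop to an identity or None
    blur_region:    callable that blurs a rectangular region of the frame
    """
    for (x, y, w, h) in detected_faces:
        face_crop = frame[y:y + h, x:x + w]
        if identify(face_crop) not in authorized_ids:
            blur_region(frame, x, y, w, h)
    return frame
```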
4 Sensors Layer

In this section we briefly present our Sensor Layer (SL) and, in particular, the algorithms used for video processing. The sensor layer is a pre-processing layer that performs knowledge extraction from sensor data. Each sensor signal can be processed in parallel by a different analyzer. Each analyzer is focused on applying a specific algorithm for knowledge extraction. The main requirement on every analyzer concerns processing time consumption. Figure 2 shows the Sensor Layer's internal architecture. The output of each module (video, audio) is collected and encapsulated in XML format by the encapsulator component. As already described for the architecture, the information flow from client to server can include pre-processed data or raw sensor signals. For instance, an audio signal can be pre-processed at the client side with a speech recognition analyzer, in such a way that the information exchange contains words instead of audio signals, or it can be sent as is. One possible reason for sending it as an
Fig. 2 Sensor layer structure.
audio stream is that the audio spectrum may be required for a person's identification on the server side. In this paper we focus our attention on video sensors. The video module of the SL includes several analyzers, ranging from low-level analyzers for contrast or illumination to higher-level ones for object detection and environmental 3D reconstruction. We focus our attention on face analyzers. The video module links the analyzers, collects their output, and performs data condensation and reasoning to obtain the data required by the higher level. It also sends its own output to the encapsulator module.
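A minimal sketch of the encapsulation step follows, assuming each analyzer reports a flat dictionary of values; the XML element names are illustrative and not a normative format.

```python
import time
import xml.etree.ElementTree as ET

def encapsulate(analyzer_outputs):
    """Wrap the outputs of the individual analyzers into one XML document.

    analyzer_outputs: dict such as
        {"faceDetection": {"count": "2", "posture": "frontal"},
         "illumination":  {"level": "0.63"}}
    """
    root = ET.Element("sensorData", timestamp=str(time.time()))
    for analyzer, values in analyzer_outputs.items():
        node = ET.SubElement(root, "analyzer", name=analyzer)
        for key, value in values.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```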
4.1 Face Detection Analyzer

This face detection analyzer is an evolution of our previous face detection approach described in [4], and is an extension of the Viola-Jones [61] approach. The Viola-Jones approach uses four Haar-like features, whereas we use five. Furthermore, the cascade of weak classifiers in its initial stages uses the variance of the pixel intensities to discard uniform regions and speed up the searching process. In our previous work we focused on purely frontal views of faces. The face detection approach that we use for the Sensor Layer is an extension of our previous detector, but it is focused on the detection of faces in different postures. To achieve this goal, we extended a simple direct cascade into a sequential cascade with different branches. Other methods use a tree of cascades with different decision-making strategies: Wu et al. use a parallel cascade [2], Li et al. use a pyramid [3], and Huang et al. use a Width-First-Search (WFS) tree [35]. Our cascade tree decision strategy is shown in Figure 3. The first strong cascade is trained using faces in every detectable posture. At the end of the first cascade, several false positives are still allowed. The remaining branches are trained on more specific postures, but are not tested in parallel. We start with the most probable posture, which can be chosen depending on the application. For instance, for our web interface application we start with the frontal posture; in other cases, like surveillance, the most probable posture would be lateral. If the cascade for one particular posture fails, the next cascade is activated. This approach performs well when the most frequent posture appears early in the branch list. Finally, we obtain the positions of the faces in a frame and a posture for each face detected (Figure 4).
Fig. 3 Face detection cascade.
Fig. 4 Face detection results for one example extracted from the profile CMU-MIT database. The red square indicates a right profile
The outputs of this analyzer are: i) the positions of the faces, ii) the inferred posture for each detected face, and iii) the portion of the frame that contains faces. This information is organized in XML format by the video module.
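The branch-selection strategy of the cascade tree can be summarized as in the following sketch; the posture-specific cascades are represented as abstract callables, since the trained classifiers themselves are not reproduced here.

```python
def detect_posture(window, shared_cascade, posture_cascades, preferred_order):
    """Cascade-tree decision strategy for a single candidate window.

    shared_cascade:   first strong cascade, trained on all detectable postures
    posture_cascades: dict posture -> cascade callable returning True/False
    preferred_order:  postures sorted by prior probability (e.g. frontal
                      first for the web interface, lateral for surveillance)
    """
    if not shared_cascade(window):
        return None                       # rejected early, no face here
    for posture in preferred_order:
        if posture_cascades[posture](window):
            return posture                # first branch that accepts wins
    return None                           # all posture branches failed
```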
4.2 Tracking Analyzer

The tracking analyzer is based on our tracking algorithm described in [5]. This analyzer requires an initialization step for the object to be tracked.
Fig. 5 Comparison with the Boston University DB (dotted line) for rigid tracking: (a) pitch, (b) roll, (c) yaw. The three axis rotation graphs are shown with our estimation as the black solid line. Note that the y-scale of each graph varies in accordance with the extent of the subject's motion in each direction.
The tracking approach can be 2D or 3D. For 2D tracking, the initialization is provided by the output of the Face Detector; otherwise, a mask initialization strategy is performed as described in [6]. The precision of our approach for 3D tracking is comparable with that of invasive measurement systems that can be used to infer posture precisely in terms of degrees. For comparison with existing methods, we used the posture-annotated database (Boston University DB) used by [7] and [8] and performed the same rigid tracking tests. Figure 5 shows some examples of tracking using our approach compared with the ground truth for the three rotations. The system achieved high precision on average: the recovery accuracy for roll, pitch, and yaw was about 0.96, 1.98, and 2.15 degrees, respectively. Compared to the other approach, which uses a steepest descent minimization [7], we performed better in terms of the quality of the postural inferences (in [7] the authors report: roll = 1.4, pitch = 3.2, yaw = 3.8 degrees). The output of the tracking analyzer is the rotations and translations in 3D or 2D space, depending on the type of tracking used.
5 Presence-Aware Web Sites

In this section we demonstrate a "Presence-Aware" web site. A "Presence-Aware" web site reacts to the presence of one or more users on the client side. This reaction can make the browsing experience more pleasant, intuitive, and natural. The first demonstration is a web page shown on a large LCD display (Figure 6) that changes its layout depending on the distance between the screen and the persons in front of the display.
Fig. 6 Changing layout depending on the user's distance from the screen: (a) near; (b) distant

Fig. 7 Control of the video stream
The goal is to keep the content of the pages readable at various distances between the screen and the viewers. The web page requires information about the presence of a human in front of the screen and about the distance from the screen to the user. This is obtained from face detection and the window dimension of the detected face. The second demonstration is a presence-aware video streaming control (Figure 7). The functionalities covered by this demonstration are:

• Pause and Play: the video stream switches to pause if the subject is not in front of the screen. This is obtained using face detection, tracking, and motion recognition.
• Rewind or Forward: stream chapter selection is driven by the subject's head motion, and uses face detection (lateral posture) or 3D tracking if available.
The Sensor-Layer provides information about the sensors, and the decision about which to use is made at the highest level. For instance, for rewind and forward, the information required by the application can be obtained via two different approaches: 3D tracking or lateral face detection. This functionality can also be implemented with a set-top box.
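A simplified sketch of how the two demonstrations map analyzer output onto interface actions follows; the distance threshold, the assumed relation between face-window width and viewing distance, and the player methods are illustrative assumptions.

```python
def layout_for(face_boxes, near_width=180):
    """Pick a page layout from the width of the largest detected face window
    (a wider window is taken to mean the viewer is closer to the screen)."""
    if not face_boxes:
        return "default"
    widest = max(w for (_, _, w, _) in face_boxes)
    return "near" if widest >= near_width else "distant"

def video_control(face_present, posture, player):
    """Presence-aware video streaming control (player API is hypothetical)."""
    if not face_present:
        player.pause()                    # nobody in front of the screen
    elif posture == "left_profile":
        player.previous_chapter()         # head turned: rewind
    elif posture == "right_profile":
        player.next_chapter()             # head turned: forward
    else:
        player.play()
```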
6 Conclusions In this paper, we proposed a Sensor-Aware architecture for a new generation of web development. Our architecture is compliant with the current web site design and development process, but it opens a new pervasive and natural mode of web interaction. We also discussed two applications of the method based on our video analysis layer.
References 1. Huang, C., Ai, H.Z., Li, Y., Lao, S.H.: Vector Boosting for Rotation Invariant Multi-View Face Detection. In: Proc. 10th IEEE Int’l Conf. Computer Vision (2005) 2. Wu, B., Ai, H., Huang, C., Lao, S.: Fast Rotation Invariant Multi-View Face Detection Based on Real AdaBoost. In: Proc. Sixth Int’l Conf. Automatic Face and Gesture Recognition, pp. 79–84 (2004) 3. Li, S.Z., et al.: Statistical Learning of Multi-View Face Detection. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 67–81. Springer, Heidelberg (2002) 4. Anisetti, M.: Fast and robust Face Detection, Multimedia Techniques for Device and Ambient Intelligence, ch. 3. Springer, US (2009) 5. Anisetti, M., Bellandi, V., Beverina, F.: Face Tracking Algorithm Robust to Pose, Illumination and Face Expression Changes: a 3D Parametric Model Approach. In: International Conference on Computer Vision Theory and Applications, Setubal, Portogallo (2006) 6. Bellandi, V.: Automatic 3D Facial Fitting for Tracking in Video Sequence, Multimedia Techniques for Device and Ambient Intelligence, ch. 4. Springer, US (2009) 7. Xiao, J., Kanade, T., Cohn, J.F.: Robust Full-Motion Recovery of Head by Dynamic Templates and Re-registration Techniques. In: Proc. of Conference on automatic face and gesture recognition (2002) 8. Cascia, M.L., Scarloff, S., Anthitsos, V.: Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture-Mapped 3D Models. IEEE Transaction on Pattern Analysis and Machine Intelligence (2000) 9. Gibbons, P.B., Karp, B., Ke, Y., Nath, S., Seshan, S.: Irisnet: An architecture for a worldwide sensor. IEEE Pervasive Computing 2(4) 10. Campbell, J., Gibbons, P.B., Nath, S., Pillai, P., Seshan, S., Sukthankar, R.: IrisNet: An Internetscale Architecture for Multimedia Sensors. In: ACM Multimedia 2005 (2005) 11. Hill, J., et al.: System Architecture Directions for Network Sensors. In: Proc. 9th Int’l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 93–104. ACM Press, New York (2000) 12. Heidemann, J., et al.: Building Efficient Wireless Sensor Networks with Low-Level Naming. In: Proc. 18th ACM Symp. Operating Systems Principles, pp. 146–159. ACM Press, New York (2001)
13. Madden, S., Franklin, M.J.: Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data. In: Proc. 18th Int’l Conf. Data Eng., pp. 555–566. IEEE CS Press, Los Alamitos (2002) 14. Collins, R., et al.: Algorithms for Cooperative Multisensor Surveillance. Proc. IEEE 89(10), 1456–1477 (2001) 15. von Behren, J.R., et al.: Ninja: A Framework for Network Services. In: Proc. USENIX Ann. Tech. Conf., USENIX (2002) 16. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco (1999) 17. Harren, M., et al.: Complex Queries in DHT-Based Peer-to-Peer Networks. In: Druschel, P., Kaashoek, M.F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, pp. 242–250. Springer, Heidelberg (2002) 18. Silberschatz, A., Korth, H.F., Sudarshan, S.: Database Systems Concepts. McGraw-Hill, New York (2002) 19. Pu, C., Leff, A.: Replica Control in Distributed Systems: An Asynchronous Approach. In: Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 377–386. ACM Press, New York (1991) 20. Azarbayejani, A., Starner, T., Horowitz, B., Pentland, A.: Visually controlled graphics. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(6) (1993) 21. Basu, S., Essa, I., Pentland, A.: Motion regularization for model-based head tracking. In: Proc. of International Conference on Pattern recognition ICPR (1996) 22. Basu, S., Essa, I., Pentland, A.: A visual analysis/synthesis feedback loop for unconstrained face tracking research report no rr-99-051. Technical report, Institut EURECOM, Sophia Antipolis France (1999) 23. Blanz, V., Vetter, T.: A morphable model for the synthesis of 3d faces. In: Computer Graphics, Annual Conference Series, SIGGRAPH (1999) 24. Blanz, V., Vetter, T.: Face recognition based on fitting a 3d morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9), 1063–1074 (2003) 25. Burel, G., Carel, D.: Detection and localization of faces on digital images. Pattern Recognition Letter 15 (1994) 26. Cootes, T., Cooper, D., Taylor, C., Graham, J.: Active shape models – their training and application. Computer Vision and Image Understanding 61(1) (1995) 27. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance mode. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6), 681–685 (2000) 28. Cordea, M.D., Petriu, E.M., Georganas, N.D., Petriu, D.C., Whalen, T.E.: 3d head pose recovery for interactive virtual reality avatars. In: IEEE Instrumentation and Measurement Technology Conference (2001) 29. Craw, I., Ellis, H., Lishman, J.: Automatic extraction of face features. Pattern Recognition Letters (1987) 30. Craw, I., Tock, D., Bennett, A.: Finding face features. In: Sandini, G. (ed.) ECCV 1992. LNCS, vol. 588. Springer, Heidelberg (1992) 31. Cristinacce, D., Cootes, T.: Feature detection and tracking with constrained local models. In: Proceedings of the British Machine Vision Conference (2006) 32. Cristinacce, D., Cootes, T., Scott, I.: A multi-stage approach to facial feature detection. In: 15th British Machine Vision Conference (2004) 33. Dellaert, F., Thrun, S., Thorpe, C.: Jacobian images of super-resolved texture maps for model-based motion estimation and tracking. In: Proc. IEEE Workshop Applications of Computer Vision (1998)
34. Hamouz, M., Kittler, J., Kamarainen, J.-K., Kalviainen, H., Paalanen, P.: Affine-invariant face detection and localization using gmm-based feature detector and enhanced appearance model. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition (2004) 35. Huang, C., Ai, H., Li, Y., Lao, S.: High-performance rotation invariant multiview face detection. IEEE Trans on Pattern Analysis and Machines Intelligence (2007) 36. Huang, W., Mariani, R.: Face detection and precise eyes location. In: 15th International Conference on Pattern Recognition (2000) 37. Jeng, S.H., Liao, H.Y.M., Han, C.C., Chern, M.Y., Liu, Y.T.: Facial feature detection using geometrical face model: An efficient approach. Pattern Recog. 31 (1998) 38. Jepson, A., Fleet, D.J., El-Maraghi, T.: Robust on-line appearance models for vision tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003) 39. Kotropoulos, C., Pitas, I.: Rule-based face detection in frontal views. In: Proc. Int’l Conf. Acoustics, Speech and Signal Processing (1997) ˜ 40. Lanitis, A., Taylor, C.J., Cootes, T.F.: Automatic interpretation and coding of face images using flexible models. IEEE Trans. on Pattern Analysis and Machine Intelligence 19 (1997) 41. Li, S.Z., Zhang, Z.Q.: Floatboost learning and statistical face detection. IEEE Trans. on Pattern Analysis and Machines Intelligence 26, 1112–1123 (2004) 42. Lienhart, R., Maydt, J.: An extended set of haar-like features for rapid object detection. In: International Conference on Image Processing, ICIP 2002 (2002) 43. Lin, S.-H., Kung, S.-Y., Lin, L.-J.: Face recognition/detection by probabilistic decisionbased neural network. IEEE Transaction on Neural Networks 8 (1997) 44. Liu, Z., Zhang, Z.: Robust head motion computation by taking advantage of physical properties. In: International Workshop on Intelligent Comunication Technologies and Applications with Emphasis on Mobile Comunications, pp. 181–190 (1999) 45. Loy, G., Eklundh, J.-O.: Detecting symmetry and symmetric constellations of features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 508– 521. Springer, Heidelberg (2006) 46. Maio, D., Maltoni, D.: Real-time face location on gray-scale static images. Pattern Recog. 33 (2000) 47. Matthews, I., Baker, S.: Active appearance models revisited. International Journal of Computer Vision 60(2), 135–164 (2004) (in press) 48. Mita, T., Kaneko, T., Hori, O.: Joint haar-like features for face detection. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005 (2005) 49. Moses, Y., Reynard, D., Blake, A.: Robust real time tracking and classification of facial expressions. In: Proceedings of the Fifth International Conference on Computer Vision, ICCV (1995) 50. Reisfeld, D., Yeshurun, Y.: Preprocessing of face images: Detection of features and pose normalization. Comput. Vision Image Understanding 71 (1998) 51. Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Transaction on Pattern Analysis Machines Intelligence 20 (1998) 52. Sarris, N., Makris, D.: Three dimensional model based rigid tracking of a human head. In: International Workshop on Intelligent Communication Technologies and Applications with Emphasis on Mobile Communications (1999) 53. Schneiderman, H., Kanade, T.: A statistical model for 3d object detection applied to faces and cars. In: IEEE Conference on Computer Vision and Pattern Recognition (2000) 54. 
Schneiderman, H., Kanade, T.: Object detection using the statistics of parts. International Journal of Computer Vision (2004)
55. Sclaroff, S., Isidoro, J.: Active blobs. In: Proceedings of the 6th IEEE International Conference on Computer Vision (1998) 56. De Silva, L.C., Aizawa, K., Hatori, M.: Detection and tracking of facial features by using a facial feature model and deformable circular template. IEICE Trans. Inform. Systems (1995) 57. Pei, S.-C., Ko, C.-W., Su, M.-S.: Global motion estimation in modelbased image coding by tracking three-dimensional contour feature points. IEEE Transaction on circuits and system for video technology 1(2) (1998) 58. Valente, S., Dugelay, J.-L.: Face tracking and realistic animations for telecommunicant clones. IEEE Multimedia (2000) 59. Vetter, T., Poggio, T.: Linear object classes and image synthesis from a single example image. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 60. Viola, P., Jones, M.: Robust real-time object detection. In: Second Intl. Workshop on Stat. and Comp. Theories of Vision (2001) 61. Viola, P., Jones, M.: Robust real-time face detection. International Journal of Computer Vision (2004) 62. Wu, H., Chen, Q., Yachida, M.: Face detection from color images using a fuzzy pattern matching method. IEEE Trans. Pattern Analysis and Machine Intelligence 21(6), 557– 563 (1999) 63. Yang, G., Huang, T.S.: Human face detection in a complex background. Pattern Recog. 27 (1994)